Sorry for not getting back to you all sooner, and thanks for all the replies so far. I have been trying to get feedback and results from our developers in the office, and to redo some of the tests so that I can capture results to show, but things have been hectic in the run-up to Christmas.
 
As for the config, so far we are doing nothing special with these particular servers, as you will see.
 
INVITEs are relayed statefully. I use rewritehostport() to point the request at the IP of the SIP server, then call t_relay() in the routing block, as below:
 

 if (uri==myself) {

  force_rport();
  fix_nated_contact();

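  # NOTE: read as a regex, "^sip:*@*" matches any URI that starts with "sip",
  # so every request addressed to us takes this branch
  if (uri=~"^sip:*@*") {
   rewritehostport("aaa.bbb.ccc.ddd:5060");  # forward to the SIP server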
   route(1);
   return;
  };

  lookup("aliases");
  if (!uri==myself) {
   append_hf("P-hint: outbound alias\r\n");
   route(1);
  };

  # native SIP destinations are handled using our USRLOC DB
  if (!lookup("location")) {
   sl_send_reply("404", "Not Found");
   exit;
  };
  append_hf("P-hint: usrloc applied\r\n");
 }; # END-OF_(  "if (uri==myself)"  )

 route(1);
}

route[1] {
 # send it out now; use stateful forwarding as it works reliably
 # even for UDP2TCP
 if (!t_relay()) {
  sl_reply_error();
 };
 exit;
}
 
I will post more info as soon as I have it.
 
I will be happy to carry out any further tests you suggest. I am trying to come up with a standard series of tests that will eliminate as many variables as possible. As such, I am also trying to test with as many different tools as possible.
 
To this end, I have also done some very basic flood testing with sipsak, which sends SIP OPTIONS requests. So far I have not noticed the problem with OPTIONS requests, i.e. every OPTIONS request got a 200 OK back. I will have to re-test on all the servers to confirm this.
 
 
Here's a trace captured on one of the servers a while back using ngrep.
 

U 2006/11/27 14:31:47.515721 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

############################
U 2006/11/27 14:31:48.013624 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

######################################################
U 2006/11/27 14:31:49.014389 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

#######################################################################################
U 2006/11/27 14:31:51.013908 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

#################################################################################################################################################################################
U 2006/11/27 14:31:54.431844 <OpenSER's IP>:5060 -> <UAC's IP>:5060
SIP/2.0 200 OK.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d;rport=5060;received=<UAC's IP>.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>;tag=01a9973244fdde83b61ad80a93303da1.0b93.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Content-Length: 0.
.

############################
U 2006/11/27 14:31:54.433384 <OpenSER's IP>:5060 -> <UAC's IP>:5060
SIP/2.0 200 OK.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d;rport=5060;received=<UAC's IP>.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>;tag=01a9973244fdde83b61ad80a93303da1.0b93.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Content-Length: 0.
.

#################################################################
U 2006/11/27 14:31:54.634717 <OpenSER's IP>:5060 -> <UAC's IP>:5060
SIP/2.0 200 OK.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d;rport=5060;received=<UAC's IP>.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>;tag=01a9973244fdde83b61ad80a93303da1.0b93.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Content-Length: 0.

 
A few points to note:
1. All the packets are UDP packets.
2. As you can see, there is no looping, or packet loss for that matter, as the OpenSER server does receive all the packets.
3. Moreover, all the NOTIFY packets have the same Via branch parameter, Call-ID and CSeq, which, unless I'm mistaken, means they are retransmissions of a single request.
4. All requests before and after this one got their 200 OK back straight away as normal, and the server just continued as if nothing had happened, i.e. NOTIFY, 200 OK, NOTIFY, 200 OK, etc.
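
To double-check point 3, the gaps between the four NOTIFYs can be compared with the retransmission schedule that RFC 3261 gives for a non-INVITE request over UDP (Timer E: start at T1 = 500 ms, double each time, cap at T2 = 4 s). The Python sketch below is only an illustration, not something we run on the servers. The function names are my own, and it assumes the phone uses the RFC default timer values.

```python
from datetime import datetime

T1 = 0.5  # RFC 3261 default T1 in seconds (assumed to be what the phone uses)
T2 = 4.0  # RFC 3261 default T2 in seconds (assumed as well)


def timer_e_offsets(count, t1=T1, t2=T2):
    """Offsets (seconds after the first send) of the first `count` retransmissions."""
    offsets, interval, elapsed = [], t1, 0.0
    for _ in range(count):
        elapsed += interval
        offsets.append(elapsed)
        interval = min(2 * interval, t2)  # double the wait, capped at T2
    return offsets


def observed_offsets(stamps):
    """Offsets of each later packet relative to the first timestamp."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(t - times[0]).total_seconds() for t in times[1:]]


# Timestamps of the four NOTIFY packets in the trace above.
notify_stamps = ["14:31:47.515721", "14:31:48.013624",
                 "14:31:49.014389", "14:31:51.013908"]

expected = timer_e_offsets(3)
observed = observed_offsets(notify_stamps)
for exp, obs in zip(expected, observed):
    print(f"expected +{exp:.1f}s  observed +{obs:.3f}s")
```

If the offsets line up (roughly +0.5 s, +1.5 s and +3.5 s), the repeated NOTIFYs are the phone's own retransmissions while it waits for a reply. For reference, the first 200 OK in the trace is stamped about 6.9 s after the first NOTIFY.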

I realise that so far I only have this with NOTIFY requests. I am trying to find a case with INVITEs and will post a trace if I can capture one. One of the developers did see this happen with INVITEs at one point, but did not save the trace, so... :(
 
The difficulty is that the problem is hard to reproduce: it only happens every now and then, which makes it hard to trace. If it happened all the time, we could say it's a general problem or something wrong in the config, but...
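
Since it only shows up occasionally, one option is to leave ngrep running and scan the saved output afterwards for transactions that were answered late. Below is a rough Python sketch of that idea. It assumes ngrep output in the same line-by-line layout as the trace above, pairs requests and replies by Via branch and CSeq, and uses a 1 second threshold that I picked arbitrarily. The function names and the shortened sample (with placeholder addresses) are only for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

HEADER = re.compile(r"^U (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) ")
BRANCH = re.compile(r"branch=([^;\s]+)")


def parse_packets(dump):
    """Split ngrep text into packets: a timestamp plus the message lines."""
    packets, current = [], None
    for raw in dump.splitlines():
        header = HEADER.match(raw)
        if header:
            stamp = datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S.%f")
            current = {"time": stamp, "lines": []}
            packets.append(current)
        elif current is not None and raw.strip(". #"):
            # ngrep prints the trailing CR as '.', so drop it
            current["lines"].append(raw.strip().rstrip("."))
    return packets


def slow_transactions(dump, threshold=1.0):
    """List (key, request_count, delay) for transactions whose first reply came
    more than `threshold` seconds after the first request."""
    first_request, first_reply, requests = {}, {}, defaultdict(int)
    for pkt in parse_packets(dump):
        lines = pkt["lines"]
        if not lines:
            continue
        branch = cseq = None
        for line in lines[1:]:
            lowered = line.lower()
            if lowered.startswith("via:") and branch is None:
                found = BRANCH.search(line)
                branch = found.group(1) if found else None
            elif lowered.startswith("cseq:"):
                cseq = line.split(":", 1)[1].strip()
        key = (branch, cseq)
        if lines[0].startswith("SIP/2.0 "):  # status line, so this is a reply
            first_reply.setdefault(key, pkt["time"])
        else:
            first_request.setdefault(key, pkt["time"])
            requests[key] += 1
    report = []
    for key, sent in first_request.items():
        if key in first_reply:
            delay = (first_reply[key] - sent).total_seconds()
            if delay > threshold:
                report.append((key, requests[key], delay))
    return report


# Shortened sample in the layout of the trace above (addresses are placeholders).
SAMPLE = """\
U 2006/11/27 14:31:47.515721 192.0.2.10:5060 -> 192.0.2.1:5060
NOTIFY sip:example.org SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
CSeq: 11 NOTIFY.
.

##
U 2006/11/27 14:31:48.013624 192.0.2.10:5060 -> 192.0.2.1:5060
NOTIFY sip:example.org SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
CSeq: 11 NOTIFY.
.

U 2006/11/27 14:31:54.431844 192.0.2.1:5060 -> 192.0.2.10:5060
SIP/2.0 200 OK.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d;rport=5060;received=192.0.2.10.
CSeq: 11 NOTIFY.
.
"""

for key, count, delay in slow_transactions(SAMPLE):
    print(f"{key}: {count} request(s), first reply after {delay:.3f}s")
```

Run over a full day's capture, this would at least show how often the delay happens and with which request methods, which might help narrow things down.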
 
Anyway, I will post more info as soon as I have it.
 
 
On 12/19/06, Daniel-Constantin Mierla <daniel@voice-system.ro> wrote:
On 12/15/06 21:27, Jiri Kuthan wrote:
> At 10:37 15/12/2006, Daniel-Constantin Mierla wrote:
>
>
>
>> On 12/14/06 17:03, samuel wrote:
>>
>>> It might be due to a DNS query....whenever a request has to be
>>> forwarded to a domain, openSER makes a DNS query to resolve the IP.
>>> During this operation, the child processing the request will not
>>> answer to further incoming messages.
>>>
>> If it proves to be because of DNS, the best option is to install nscd (name service cache daemon), which will speed up DNS interaction a lot. Having it on the system will also help other applications do DNS queries faster (e.g., asterisk, mail servers ...). It looks really powerful, being able to cache many services, not only DNS. It comes packaged with most common distributions.
>>
>
> Actually we have tried this one and yet another one (whose name I can't recall)
> and there were some reliability issues. Unfortunately, I remember this very remotely,
> cc-ed thus serusers as this debate was there once going on -- hopefully someone
> with better memory than myself will speak up.
>
nscd is part of the GNU C Library. If you can describe the issues you
had with it, I am sure a lot of people will be happy to learn about
them, and many will strive to fix them as soon as possible -- it is
part of a core component in all Unixes.

Also, the name of the other one and the issues will help the developers
to make it better -- testing and feedback is the most appreciated.

Cheers,
Daniel



> -jiri
>
> --
> Jiri Kuthan            http://iptel.org/~jiri/
>
>
>
