Hi all,
Just wondering if anyone else has had this problem. I have noticed while tracing on my OpenSER server that every now and then the server receives a packet which it does not respond to immediately. The result is a string of retransmitted packets being sent to the server, with the server responding only a few seconds later. This does not happen all the time, maybe once or twice every hour. The rest of the time the signaling is correct and responses follow request packets in the correct order.
What I am trying to figure out is whether this is a load traffic issue (i.e. can the server not handle too much load), and if so is it OpenSER or the network or the server in general? I have run diagnostics on the servers and there is nothing wrong with the hardware.
On the other hand, could this be related to timer issues? I remember there was mention of timers in SER, but are there any default timer settings that can be tweaked?
Thanks in advance for any response.
It might be due to a DNS query. Whenever a request has to be forwarded to a domain, OpenSER makes a DNS query to resolve the IP. During this operation, the child processing the request will not answer further incoming messages.
It can also happen due to a spiral loop that stays on the server.
Without further information (config, logs) it's hard to tell which is the reason...
Hope it helps, Samuel.
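One quick way to see whether name resolution on the box is ever slow is to time a blocking lookup from the server itself. The following is a minimal sketch, assuming Python is available on the server. It only times the system resolver path (`getaddrinfo`), which is not necessarily the same call OpenSER makes, and it does not cover SRV or NAPTR lookups. The function name `time_lookup` is made up for illustration.

```python
import socket
import time


def time_lookup(hostname, port=5060):
    """Return (seconds_elapsed, resolved_ok) for one blocking name lookup."""
    start = time.monotonic()
    try:
        # Goes through the system resolver (/etc/hosts, DNS, any local cache).
        socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_UDP)
        ok = True
    except socket.gaierror:
        ok = False
    return time.monotonic() - start, ok


# Example use (the hostname is a placeholder, not taken from this thread):
#   elapsed, ok = time_lookup("sip.example.com")
#   print(f"{elapsed * 1000:.1f} ms, resolved={ok}")
```

Running this in a loop for an hour or so and logging any lookup that takes more than a second would show whether occasional multi-second stalls line up with slow resolution.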
Users mailing list Users@openser.org http://openser.org/cgi-bin/mailman/listinfo/users
On 12/14/06 17:03, samuel wrote:
It might be due to a DNS query. Whenever a request has to be forwarded to a domain, OpenSER makes a DNS query to resolve the IP. During this operation, the child processing the request will not answer further incoming messages.
If it proves to be because of DNS, the best option is to install nscd (name service cache daemon), which will speed up DNS interaction a lot. Having it on the system will also help other applications do DNS queries faster (e.g., asterisk, mail servers ...). It looks to be really powerful, being able to cache many services, not only DNS. It comes packaged with most common distributions.
It can also happen due to a spiral loop that stays on the server.
This shouldn't block the process, and if the request is t_relayed, a 100 reply should be sent back.
Cheers, Daniel
At 10:37 15/12/2006, Daniel-Constantin Mierla wrote:
If it proves to be because of DNS, the best option is to install nscd (name service cache daemon), which will speed up DNS interaction a lot. Having it on the system will also help other applications do DNS queries faster (e.g., asterisk, mail servers ...). It looks to be really powerful, being able to cache many services, not only DNS. It comes packaged with most common distributions.
Actually we have tried this one and yet another one (whose name I can't recall) and there were some reliability issues. Unfortunately, I remember this only very remotely, so I have cc-ed serusers, as this debate once took place there. Hopefully someone with a better memory than mine will speak up.
-jiri
-- Jiri Kuthan http://iptel.org/~jiri/
On Fri, 2006-12-15 at 20:27 +0100, Jiri Kuthan wrote:
Actually we have tried this one and yet another one (whose name I can't recall) and there were some reliability issues. Unfortunately, I remember this only very remotely, so I have cc-ed serusers, as this debate once took place there. Hopefully someone with a better memory than mine will speak up.
It was dnsmasq (also found in almost every distribution), which is an option if you don't want to use bind in cache-only mode. If you use dnsmasq, pay attention to the filterwin2k option: if you turn it on, SRV requests won't be forwarded, and only locally configured SRV requests will be answered.
Michal
On Sat, December 16, 2006 0:05, Michal Matyska said:
If you use dnsmasq, pay attention to the filterwin2k option: if you turn it on, SRV requests won't be forwarded, and only locally configured SRV requests will be answered.
AFAIK NAPTR records are filtered too.
regards klaus
I remember we actually had some software troubles (which is not a conceptual counterargument; it just appeared impractical software-wise). -jiri
Serusers mailing list Serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
-- Jiri Kuthan http://iptel.org/~jiri/
On 12/15/06 21:27, Jiri Kuthan wrote:
Actually we have tried this one and yet another one (whose name I can't recall) and there were some reliability issues. Unfortunately, I remember this only very remotely, so I have cc-ed serusers, as this debate once took place there. Hopefully someone with a better memory than mine will speak up.
nscd is part of the GNU C Library. If you can describe the issues you had with it, I am sure a lot of people will be happy to learn about them and many will strive to fix them as soon as possible; it is part of a core component in all Unixes.
Also, the name of the other one and the issues you hit will help the developers make it better; testing and feedback are most appreciated.
Cheers, Daniel
Sorry for not getting back to you all sooner. And thanks for all the replies thus far. I have been trying to get some feedback and results back from our developers in the office, and also to re-do some of the tests so I can capture some results to show, but it's been a bit crazy as it's nearing xmas.
In terms of the config, so far we are doing nothing special with these particular servers as you will see.
INVITES are t_relayed. I use *rewritehostport* to forward to the IP of the SIP server, then t_relay in routing block as below:
    if (uri==myself) {
        force_rport();
        fix_nated_contact();

        if (uri=~"^sip:*@*") {
            rewritehostport("aaa.bbb.ccc.ddd:5060");   # forward to the SIP server
            route(1);
            return;
        };

        lookup("aliases");
        if (!uri==myself) {
            append_hf("P-hint: outbound alias\r\n");
            route(1);
        };

        # native SIP destinations are handled using our USRLOC DB
        if (!lookup("location")) {
            sl_send_reply("404", "Not Found");
            exit;
        };
        append_hf("P-hint: usrloc applied\r\n");
    }; # END-OF_( "if (uri==myself)" )

    route(1);
    }

    route[1] {
        # send it out now; use stateful forwarding as it works reliably
        # even for UDP2TCP
        if (!t_relay()) {
            sl_reply_error();
        };
        exit;
    }
I will post more info as soon as I have it.
I will be happy to carry out any further tests you suggest. I am trying to come up with a standard series of tests that will eliminate as many variables as possible. As such, I am also trying to test with as many different tools as possible.
To this end, I have also done some very basic flood testing using SIPSak which sends SIP OPTIONS requests, and so far I have not noticed this problem with OPTIONS requests i.e. for every OPTIONS request there was a 200 OK back. I will have to re-test on *all* the servers just to confirm this.
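For catching the occasional slow reply rather than flooding, a probe that sends one OPTIONS at a time and records the round-trip time can be left running. Below is a minimal sketch under stated assumptions: it is not sipsak, the names `build_options` and `probe` are made up for illustration, it sends a single datagram with no retransmission, and it treats any datagram received back as the reply without parsing it.

```python
import socket
import time
import uuid


def build_options(target_host, local_ip, local_port):
    """Build a minimal SIP OPTIONS request as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        "Max-Forwards: 70",
        "Content-Length: 0",
        "",  # the two empty items produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(target_ip, target_port=5060, timeout=5.0):
    """Send one OPTIONS over UDP; return the reply delay in seconds, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.connect((target_ip, target_port))
        local_ip, local_port = sock.getsockname()
        request = build_options(target_ip, local_ip, local_port)
        start = time.monotonic()
        sock.send(request)
        try:
            sock.recv(65535)
        except (socket.timeout, ConnectionRefusedError):
            return None
        return time.monotonic() - start
    finally:
        sock.close()
```

Calling `probe("192.0.2.10")` once a second (the address is a documentation placeholder) and logging any delay above, say, 500 ms would show whether OPTIONS requests ever hit the same stall as the NOTIFY requests.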
Here's a trace captured on one of the servers a while back using *ngrep*.
U 2006/11/27 14:31:47.515721 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

############################
U 2006/11/27 14:31:48.013624 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

######################################################
U 2006/11/27 14:31:49.014389 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

#######################################################################################
U 2006/11/27 14:31:51.013908 <UAC's IP>:5060 -> <OpenSER's IP>:5060
NOTIFY sip:<MY DOMAIN> SIP/2.0.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Max-Forwards: 70.
Event: keep-alive.
User-Agent: Linksys/PAP2-2.0.12(LS).
Content-Length: 0.
.

#################################################################################################################################################################################
U 2006/11/27 14:31:54.431844 <OpenSER's IP>:5060 -> <UAC's IP>:5060
SIP/2.0 200 OK.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d;rport=5060;received=<UAC's IP>.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>;tag=01a9973244fdde83b61ad80a93303da1.0b93.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Content-Length: 0.
.

############################
U 2006/11/27 14:31:54.433384 <OpenSER's IP>:5060 -> <UAC's IP>:5060
SIP/2.0 200 OK.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d;rport=5060;received=<UAC's IP>.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>;tag=01a9973244fdde83b61ad80a93303da1.0b93.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Content-Length: 0.
.

#################################################################
U 2006/11/27 14:31:54.634717 <OpenSER's IP>:5060 -> <UAC's IP>:5060
SIP/2.0 200 OK.
Via: SIP/2.0/UDP 192.168.1.6:5060;branch=z9hG4bK-9afb502d;rport=5060;received=<UAC's IP>.
From: <USER1> <sip:<MY ACCT NO.>@<MY DOMAIN>>;tag=c0eac8eb24dd8c66o1.
To: <sip:<MY DOMAIN>>;tag=01a9973244fdde83b61ad80a93303da1.0b93.
Call-ID: a3281cc3-9ad08a6e@192.168.1.6.
CSeq: 11 NOTIFY.
Content-Length: 0.
A few points to note:
1. All the packets are UDP packets.
2. As you can see, there is no looping or *packet loss* for that matter, as the openser server does receive all the packets.
3. Moreover, all the packets have the same *branch number* and *Call-ID*, which unless I'm mistaken means they are retransmissions.
4. All packets *before* and *after* got the 200 OK reply back straight away as normal and the server just continued as if nothing had happened, i.e. NOTIFY, 200 OK, NOTIFY, 200 OK, etc.
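The retransmission reading can be checked against the timestamps in the trace. The sketch below computes the gaps between the four NOTIFY packets and compares them with the retransmit schedule RFC 3261 gives for non-INVITE requests over UDP (start at T1, double each time, cap at T2). The values T1 = 0.5 s and T2 = 4 s are the RFC defaults; that the Linksys device uses those defaults is an assumption.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S.%f"

# Timestamps copied from the ngrep trace above.
notify_times = [
    "2006/11/27 14:31:47.515721",
    "2006/11/27 14:31:48.013624",
    "2006/11/27 14:31:49.014389",
    "2006/11/27 14:31:51.013908",
]
first_reply = "2006/11/27 14:31:54.431844"


def gaps(stamps):
    """Seconds between consecutive timestamps."""
    parsed = [datetime.strptime(s, FMT) for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


def expected_gaps(count, t1=0.5, t2=4.0):
    """Non-INVITE retransmit intervals over UDP: T1 doubling, capped at T2."""
    return [min(t1 * 2 ** n, t2) for n in range(count)]


observed = gaps(notify_times)
delay = gaps([notify_times[0], first_reply])[0]

print([round(g, 3) for g in observed])   # [0.498, 1.001, 2.0]
print(expected_gaps(len(observed)))      # [0.5, 1.0, 2.0]
print(round(delay, 3))                   # 6.916
```

The observed gaps match the 0.5 s, 1 s, 2 s schedule closely, which supports the retransmission reading, and the first 200 OK leaves the server about 6.9 seconds after the first NOTIFY arrived.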
I realise these examples are all NOTIFY requests so far. I am trying to capture the same thing with INVITEs and will post if I can; one of the developers did see a trace of this happening with INVITEs at one point but did not save it :(.
The problem with this is that it is hard to duplicate the problem as this thing only happens every now and then and not all the time, so it is hard to trace. If it happened all the time then we could say it's a general problem or something wrong in the config, but...
Anyway, I will post more info as soon as I have it.
Just out of interest, is DNS caching required or recommended on the servers themselves?
I checked and *nscd* is present in the current installed version of CentOS on all the servers without the need for downloads or updates (at least from the packages I selected before installing). It just hasn't been configured and as far as I can tell, is not running.
Also, what sort of performance should I expect from these servers given the specs I mentioned before, in terms of CPS, for example? I am not necessarily looking for a definitive figure, but at least a ball-park figure, say a bare minimum (50 CPS or above, maybe?).
2006/12/15, Daniel-Constantin Mierla daniel@voice-system.ro:
On 12/14/06 17:03, samuel wrote:
It might be due to a DNS query. Whenever a request has to be forwarded to a domain, OpenSER makes a DNS query to resolve the IP. During this operation, the child processing the request will not answer further incoming messages.
If it proves to be because of DNS, the best option is to install nscd (name service cache daemon), which will speed up DNS interaction a lot. Having it on the system will also help other applications do DNS queries faster (e.g., asterisk, mail servers ...). It looks to be really powerful, being able to cache many services, not only DNS. It comes packaged with most common distributions.
Didn't know of this...
It can also happen due to a spiral loop that stays on the server.
This shouldn't block the process, and if the request is t_relayed, a 100 reply should be sent back.
The problem I'm referring to is when there's a misconfiguration and the proxy sends the message to itself because it thinks the message is directed to itself instead of to a subscriber's UA (alias, listen, loose_route(), ...). It is true that the 100 reply is sent back and that the child is not blocked, BUT the message is relayed to the server itself, which is probably faster than the messages coming from the UA, and therefore the children are constantly processing the "spiral" message. Coming back to Max's problem: do you see a Too Many Hops reply sent back after the "blocking period"? (This usually happens in the "spiral" case because every pass decreases Max-Forwards until the lower limit is reached.)
Posting the config and some logs/captures might help to identify the problem. Samuel.
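To give a feel for how much work one spiralling request generates, here is a minimal sketch of the Max-Forwards countdown. It assumes the proxy checks Max-Forwards on every pass and answers 483 when it reaches zero, as RFC 3261 describes; whether the config under discussion performs that check is not shown in this thread. The function name `spiral_passes` is made up for illustration.

```python
def spiral_passes(max_forwards=70):
    """Count how often a proxy handles a request it keeps relaying to itself."""
    passes = 0
    while True:
        passes += 1
        if max_forwards == 0:
            # Nothing left to spend: reject instead of forwarding again.
            return passes, "483 Too Many Hops"
        max_forwards -= 1  # decrement, then relay (back to ourselves)


print(spiral_passes(70))  # (71, '483 Too Many Hops')
```

With the Max-Forwards value of 70 seen in typical requests, a single spiralling message is handled 71 times before the 483 goes out, so a handful of them can keep all children busy for a noticeable period.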
Thanks very much for all the replies. I shall try and post a config and traces as soon as I can get them from the office.
Some more information, if it helps:
Server specs:
- HP ProLiant DL360 G4 (1U rack servers)
- 3 GHz processors (800 MHz FSB)
- 1 GB RAM
- 10K rpm SCSI HDs (in a RAID 1+0 mirror)

- Servers are running OpenSER 1.0.1 (no-TLS).
- Servers are listening on 3 ports (both tcp and udp for each port), so in openserctl ps I am seeing 4 child processes for each port.
- Servers are running CentOS Linux 4.3.
- MySQL was installed when CentOS was installed but is *not* running and is not currently being used with OpenSER.

Things I have pretty much managed to eliminate are:
1. It doesn't seem to be *hardware*. The specs for the servers are more than sufficient, I think.
2. It doesn't seem to be *traffic/load* related, as I see these problems on 2 brand new servers I have just installed with no traffic on them. However, it does seem to get worse with more traffic.
3. I don't think it's *database* related, as I have deliberately not configured *mysql* on any of the servers, to rule out database performance.
4. I haven't played with the *timers* at all so far.
5. I haven't configured *nscd* yet, and as far as I can tell it's not caching DNS.
6. Though openser is listening on tcp ports as well, currently only the udp ports are being used, as most of our customers use hardware phones. In any case, I haven't as yet seen any requests on tcp.
7. I am not sure it is DNS, as in the tests I ran I sent requests directly to the external IP of the server and not to the domain name it is responsible for. Also, the test servers are now only responsible for one domain, but in future will have more than one.
8. Also, the TTL on the domain name is really short. Ping from the server itself shows TTL=64 and ping times are low, as you would expect (< 1 ms when pinging from the server itself). Ping from outside the network (from the internet, for me, to the domain) was 12 ms on average, with no packet loss and TTL=53.
9. I have not set up any internal DNS entries for the domain. Servers are resolving the domain from entries in /etc/hosts.
Like I said, it doesn't happen all the time - just maybe once or twice every hour on the servers with more traffic.
I ran *SIPp* pointing at one of the new servers last week and at around 100CPS I was seeing about 2,000 out of approx. 10,000 calls were failing. Setup was UAC -> openser -> UAS (Both UAC and UAS were running on the same machine, but different ports). Again there is no traffic on these servers now so I have no idea why so many failed calls.
I am not sure if any of this information helps, but I am certainly open to suggestions on things to try.
Thanks in advance.
Hi,
Max Gregorian wrote:
Thanks very much for all the replies. I shall try and post a config and traces as soon as I can get them from the office.
Do you get any errors in the logs when you experience the problems? Can you see in the traces (at SIP level) what the reasons for the failed calls are, e.g. negative replies from openser, missing SIP packets, timeouts, etc.?
All this information is essential to find the problem, which after all could be a simple memory bottleneck or maybe something more complex.
I ran *SIPp* pointing at one of the new servers last week and at around 100CPS I was seeing about 2,000 out of approx. 10,000 calls were failing. Setup was UAC -> openser -> UAS (Both UAC and UAS were running on the same machine, but different ports). Again there is no traffic on these servers now so I have no idea why so many failed calls.
I am not sure if any of this information helps, but I am certainly open to suggestions on things to try.
regards, bogdan