i have two sr proxies talking with each other over tcp. one is running 3.0 and the other 3.1. sometimes i notice that no sip requests or any other tcp packets get through from 3.0 sr host to 3.1 sr host. netstat on 3.0 sr host shows that tcp connection to 3.1 sr host is in ESTABLISHED state whereas netstat on 3.1 sr shows that the same tcp connection is in FIN_WAIT1 state. it takes several minutes before the hanging tcp connection is replaced by a new working one. there is no ip level connectivity problem between the two sr hosts.
any suggestions on what might be going on? is there some known bug in 3.0 tcp implementation that could explain this?
-- juha
i did some more debugging and wireshark shows that the 3.0 sr does not even try to send anything to the 3.1 sr over the tcp connection although netstat now tells at both hosts that the connection is established. instead sr 3.0 replies immediately after receiving invite from ua:
SIP/2.0 477 Unfortunately error on sending to next hop occurred (477/TM)
there are no related messages in syslog. perhaps the tcp stack on the 3.0 host has not received acks for earlier packets and is just waiting.
-- juha
On Jan 05, 2011 at 14:29, Juha Heinanen jh@tutpro.com wrote:
i did some more debugging and wireshark shows that the 3.0 sr does not even try to send anything to the 3.1 sr over the tcp connection although netstat now tells at both hosts that the connection is established. instead sr 3.0 replies immediately after receiving invite from ua:
SIP/2.0 477 Unfortunately error on sending to next hop occurred (477/TM)
there are no related messages in syslog. perhaps the tcp stack on the 3.0 host has not received acks for earlier packets and is just waiting.
The most likely candidates are:
- blacklisted destination (due to some previous error). You could check it with sercmd dst_blacklist.view or dst_blacklist.debug.
- some local firewall rules on the OUTPUT chain running out of memory (but it's strange that you don't get any log messages)
Did you disable the tcp async mode, or you get this in async mode?
Andrei
Andrei Pelinescu-Onciul writes:
The most likely candidates are:
- blacklisted destination (due to some previous error). You could check it with sercmd dst_blacklist.view or dst_blacklist.debug.
i'll check with those commands when the hangup happens again.
- some local firewall rules on the OUTPUT chain
i don't have control of the firewall at sr 3.1 end. i know there is one, but the port that sr is listening on is open.
Did you disable the tcp async mode, or you get this in async mode?
i have not disabled async mode.
in the second case when both ends claimed that tcp connection was in established state, i had to restart sr 3.0 sip proxy in order to make packets flow again.
-- juha
it turned out that the problem below was caused by a firewall that blocked the tcp session if it had been idle for a few minutes. the problem went away when i reduced tcp_connection_lifetime from 3610 to 120 sec.
i don't know if it is possible to configure tcp_connection_lifetime on a per connection basis. for example, a tcp connection to a UA could have tcp_connection_lifetime=3610, since the tcp session is kept active by the UA sending crlfs, whereas a tcp connection to another proxy could have a shorter tcp_connection_lifetime.
-- juha
-------------------------------------------------------------
i did some more debugging and wireshark shows that the 3.0 sr does not even try to send anything to the 3.1 sr over the tcp connection although netstat now tells at both hosts that the connection is established. instead sr 3.0 replies immediately after receiving invite from ua:
SIP/2.0 477 Unfortunately error on sending to next hop occurred (477/TM)
there are no related messages in syslog. perhaps the tcp stack on the 3.0 host has not received acks for earlier packets and is just waiting.
The most likely candidates are:
- blacklisted destination (due to some previous error). You could check it with sercmd dst_blacklist.view or dst_blacklist.debug.
- some local firewall rules on the OUTPUT chain running out of memory (but it's strange that you don't get any log messages)
On Apr 05, 2011 at 16:10, Juha Heinanen jh@tutpro.com wrote:
it turned out that the problem below was caused by a firewall that blocked the tcp session if it had been idle for a few minutes. the problem went away when i reduced tcp_connection_lifetime from 3610 to 120 sec.
i don't know if it is possible to configure tcp_connection_lifetime on a per connection basis. for example, a tcp connection to a UA could have tcp_connection_lifetime=3610, since the tcp session is kept active by the UA sending crlfs, whereas a tcp connection to another proxy could have a shorter tcp_connection_lifetime.
No, it's not possible to set it on a per connection basis. It could be done (at the price of an extra int per tcp connection), but the bigger problem is how to distinguish between proxy and UA connections.
Andrei
Andrei Pelinescu-Onciul writes:
No, it's not possible to set it on a per connection basis. It could be done (at the price of an extra int per tcp connection), but the bigger problem is how to distinguish between proxy and UA connections.
Perhaps it could be done, since UA connections are associated with registrations.
-- Juha
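To make the idea concrete, here is a minimal sketch of what a per-connection lifetime could look like. Everything in it is hypothetical: the struct, the enum and the function names are made up for illustration and are not sr's actual data structures. It also assumes the proxy can already tell UA connections from proxy connections (for example via registrations, as suggested above), which is the hard part.

```c
#include <time.h>

/* Hypothetical peer classification; sr has no such enum. */
enum peer_kind { PEER_UA, PEER_PROXY };

/* Hypothetical connection record, not sr's real tcp connection struct. */
struct conn {
    time_t last_activity; /* time of the last read or write */
    int lifetime_s;       /* the "extra int" per connection */
};

/* Pick a lifetime by peer kind, using the values from this thread:
 * 3610 s for UAs that keep the session alive with CRLFs,
 * 120 s for proxy-to-proxy connections behind an idle-timeout firewall. */
int lifetime_for(enum peer_kind kind)
{
    return kind == PEER_UA ? 3610 : 120;
}

/* Initialise a connection record at time now. */
void conn_init(struct conn *c, enum peer_kind kind, time_t now)
{
    c->last_activity = now;
    c->lifetime_s = lifetime_for(kind);
}

/* Return 1 if the connection has been idle for its whole lifetime. */
int conn_expired(const struct conn *c, time_t now)
{
    return difftime(now, c->last_activity) >= (double)c->lifetime_s;
}
```

A cleanup timer would then call conn_expired() on each connection instead of comparing against one global tcp_connection_lifetime.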
On Jan 05, 2011 at 13:47, Juha Heinanen jh@tutpro.com wrote:
i have two sr proxies talking with each other over tcp. one is running 3.0 and the other 3.1. sometimes i notice that no sip requests or any other tcp packets get through from 3.0 sr host to 3.1 sr host. netstat on 3.0 sr host shows that tcp connection to 3.1 sr host is in ESTABLISHED state whereas netstat on 3.1 sr shows that the same tcp connection is in FIN_WAIT1 state. it takes several minutes before the hanging tcp connection is replaced by a new working one. there is no ip level connectivity problem between the two sr hosts.
any suggestions on what might be going on? is there some known bug in 3.0 tcp implementation that could explain this?
There is no known bug in 3.0, and even if there were one, the connection should go into CLOSE_WAIT automatically on the 3.0 host (without userspace doing anything).
It looks like somehow the 3.0 does not receive the FIN (local firewall rules?).
Andrei
Andrei Pelinescu-Onciul writes:
It looks like somehow the 3.0 does not receive the FIN (local firewall rules?).
andrei,
i'm not aware of any such firewall rules. it is possible to establish telnet session to sip port of the other host from both ends. i see from wireshark dump that sr 3.0 host tried to tcp-resend the invite, but does not get anything back. in the second case, both hosts reported that the connection was established.
is it possible to make sr send crlfcrlf keepalive to the tcp connection?
-- juha
On Jan 05, 2011 at 15:03, Juha Heinanen jh@tutpro.com wrote:
Andrei Pelinescu-Onciul writes:
It looks like somehow the 3.0 does not receive the FIN (local firewall rules?).
andrei,
i'm not aware of any such firewall rules. it is possible to establish telnet session to sip port of the other host from both ends. i see from wireshark dump that sr 3.0 host tried to tcp-resend the invite, but does not get anything back. in the second case, both hosts reported that the connection was established.
is it possible to make sr send crlfcrlf keepalive to the tcp connection?
No. You can enable responding to them (tcp_crlf_ping = yes), but not sending them.
However you could enable tcp level keepalives and on linux you can tune the intervals, e.g.:
tcp_keepalive = yes
tcp_keepidle = 60
tcp_keepintvl = 10
tcp_keepcnt = 3
Andrei
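As a rough check on what those values imply, the sketch below estimates how long a silently dropped connection can linger. It assumes the usual Linux keepalive behaviour: the first probe goes out after tcp_keepidle seconds of idle time, then one probe follows every tcp_keepintvl seconds until tcp_keepcnt probes have gone unanswered. The function name is hypothetical and not part of sr.

```c
/* Estimate, in seconds, how long an idle connection to a dead peer
 * survives before the kernel gives up on it. Assumes Linux semantics:
 * the first probe is sent after idle_s seconds without traffic, and
 * the connection is dropped once cnt probes, sent intvl_s seconds
 * apart, have all gone unanswered. */
int keepalive_detect_seconds(int idle_s, int intvl_s, int cnt)
{
    return idle_s + intvl_s * cnt;
}
```

With the values above, keepalive_detect_seconds(60, 10, 3) gives 90. While the peer is alive, a probe goes out after each 60 seconds of silence, which should also keep the session from looking idle to a firewall whose idle timeout is longer than that.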
Andrei Pelinescu-Onciul writes:
However you could enable tcp level keepalives and on linux you can tune the intervals, e.g.:
tcp_keepalive = yes
tcp_keepidle = 60
tcp_keepintvl = 10
tcp_keepcnt = 3
andrei,
my linux 2.6.32 kernel has these tcp keepalive variables:
tcp_keepalive_intvl
tcp_keepalive_probes
tcp_keepalive_time
linux tcp keepalive doc says:
Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support for most of them following the instructions explained later in this document.
question: does sr request that tcp keepalive is turned on on its tcp connections or via a function call on a particular connection?
-- juha
On Jan 09, 2011 at 21:41, Juha Heinanen jh@tutpro.com wrote:
Andrei Pelinescu-Onciul writes:
However you could enable tcp level keepalives and on linux you can tune the intervals, e.g.:
tcp_keepalive = yes
tcp_keepidle = 60
tcp_keepintvl = 10
tcp_keepcnt = 3
andrei,
my linux 2.6.32 kernel has these tcp keepalive variables:
tcp_keepalive_intvl
tcp_keepalive_probes
tcp_keepalive_time
linux tcp keepalive doc says:
Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support for most of them following the instructions explained later in this document.
question: does sr request that tcp keepalive is turned on on its tcp connections or via a function call on a particular connection?
It uses setsockopt (if tcp_keepalive = yes is present in the .cfg).
You don't have to change any kernel variables in proc for sr. You just need to enable keepalives and set the intervals in the .cfg file (the above example could be pasted directly in a ser.cfg).
Andrei
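For readers wondering what this amounts to at socket level, the sketch below shows the standard Linux setsockopt calls for enabling keepalive and setting the three intervals on one socket. It is an illustration only, not sr's actual code, and the function name enable_tcp_keepalive is made up. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options; they override the tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes sysctls for that socket only, which is why nothing in proc needs changing.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Enable TCP keepalive on fd and tune its timing (Linux-specific).
 * idle_s:  seconds of idle time before the first probe
 * intvl_s: seconds between probes
 * cnt:     unanswered probes before the connection is dropped
 * Returns 0 on success, -1 as soon as one setsockopt call fails. */
int enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    /* Keepalive is off by default; it has to be requested per socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

Calling enable_tcp_keepalive(fd, 60, 10, 3) on a connected socket would correspond to the .cfg example quoted in this thread.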
Hi Juha,
have the TCP keepalives addressed the problem? If not, could you send a PCAP? (ideally from both upstream and downstream perspective)
Thanks!
-jiri
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users