tcp hangup


Juha Heinanen

5 Jan 2011

11:47 a.m.

i have two sr proxies talking with each other over tcp. one is running 3.0 and the other 3.1. sometimes i notice that no sip requests or any other tcp packets get through from 3.0 sr host to 3.1 sr host. netstat on 3.0 sr host shows that tcp connection to 3.1 sr host is in ESTABLISHED state whereas netstat on 3.1 sr shows that the same tcp connection is in FIN_WAIT1 state. it takes several minutes before the hanging tcp connection is replaced by a new working one. there is no ip level connectivity problem between the two sr hosts.

any suggestions on what might be going on? is there some known bug in 3.0 tcp implementation that could explain this?

-- juha


Juha Heinanen

5 Jan 2011

12:29 p.m.

i did some more debugging and wireshark shows that the 3.0 sr does not even try to send anything to the 3.1 sr over the tcp connection although netstat now tells at both hosts that the connection is established. instead sr 3.0 replies immediately after receiving invite from ua:

SIP/2.0 477 Unfortunately error on sending to next hop occurred (477/TM)

there are no related messages in syslog. perhaps the tcp stack on the 3.0 host has not got acks for earlier packets and just waits there.

-- juha

Andrei Pelinescu-Onciul

12:57 p.m.

On Jan 05, 2011 at 14:29, Juha Heinanen jh@tutpro.com wrote:

> i did some more debugging and wireshark shows that the 3.0 sr does not even try to send anything to the 3.1 sr over the tcp connection although netstat now tells at both hosts that the connection is established. instead sr 3.0 replies immediately after receiving invite from ua:
>
> SIP/2.0 477 Unfortunately error on sending to next hop occurred (477/TM)
>
> there are no related messages in syslog. perhaps the tcp stack on the 3.0 host has not got acks for earlier packets and just waits there.

The most likely candidates are:
- blacklisted destination (due to some previous error). You could check it with sercmd dst_blacklist.view or dst_blacklist.debug.
- some local firewall rules on the OUTPUT chain
- running out of memory (but it's strange that you don't get any log messages)

Did you disable the tcp async mode, or do you get this in async mode?

Andrei

Juha Heinanen

1:13 p.m.

Andrei Pelinescu-Onciul writes:

> The most likely candidates are:
>
> - blacklisted destination (due to some previous error). You could check it with sercmd dst_blacklist.view or dst_blacklist.debug.

i'll check with those commands when the hangup happens again.

> - some local firewall rules on the OUTPUT chain

i don't have control of the firewall at sr 3.1 end. i know there is one, but the port that sr is listening on is open.

> Did you disable the tcp async mode, or do you get this in async mode?

i have not disabled async mode.

in the second case when both ends claimed that tcp connection was in established state, i had to restart sr 3.0 sip proxy in order to make packets flow again.

-- juha

Juha Heinanen

5 Apr 2011

1:10 p.m.

it turned out that the problem below was caused by a firewall that blocked the tcp session if it had been idle for a few minutes. the problem went away when i reduced tcp_connection_lifetime from 3610 to 120 sec.

i don't know if it is possible to configure tcp_connection_lifetime on a per connection basis. for example, a tcp connection to a UA could have tcp_connection_lifetime=3610, since the tcp session is kept active by the UA sending crlfs, whereas a tcp connection to another proxy could have a shorter tcp_connection_lifetime.

-- juha
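
To make the reasoning concrete: a connection that sr keeps open longer than the firewall's idle timeout can be silently dropped by the firewall while the proxy still considers it usable. The sketch below checks just that condition. The 300-second firewall timeout is an assumption standing in for "a few minutes" (the actual value is not given in this thread), and the function name is hypothetical, not part of sr.

```python
def may_outlive_firewall(connection_lifetime: int, firewall_idle_timeout: int) -> bool:
    """True if the proxy may keep an idle connection after the firewall has dropped it."""
    return connection_lifetime > firewall_idle_timeout


# Assumption: "a few minutes" taken as 300 seconds; the real value is unknown.
ASSUMED_FIREWALL_IDLE_TIMEOUT = 300

for lifetime in (3610, 120):
    print(lifetime, may_outlive_firewall(lifetime, ASSUMED_FIREWALL_IDLE_TIMEOUT))
```

Under that assumed timeout, a lifetime of 3610 sec leaves a long window in which the connection can go stale, while 120 sec does not.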

> i did some more debugging and wireshark shows that the 3.0 sr does not even try to send anything to the 3.1 sr over the tcp connection although netstat now tells at both hosts that the connection is established. instead sr 3.0 replies immediately after receiving invite from ua:
>
> SIP/2.0 477 Unfortunately error on sending to next hop occurred (477/TM)
>
> there is no related messages in syslog. perhaps tcp stack on 3.0 host has not got acks for earlier packets and just waits there.
>
> The most likely candidates are:
>
> - blacklisted destination (due to some previous error). You could check it with sercmd dst_blacklist.view or dst_blacklist.debug.
> - some local firewall rules on the OUTPUT chain
> - running out of memory (but it's strange that you don't get any log messages)

Andrei Pelinescu-Onciul

1:16 p.m.

On Apr 05, 2011 at 16:10, Juha Heinanen jh@tutpro.com wrote:

> it turned out that the problem below was caused by a firewall that blocked the tcp session if it had been idle for a few minutes. the problem went away when i reduced tcp_connection_lifetime from 3610 to 120 sec.
>
> i don't know if it is possible to configure tcp_connection_lifetime on a per connection basis. for example, a tcp connection to a UA could have tcp_connection_lifetime=3610, since the tcp session is kept active by the UA sending crlfs, whereas a tcp connection to another proxy could have a shorter tcp_connection_lifetime.

No, it's not possible to set it on a per connection basis. It could be done (at the price of an extra int per tcp connection), but the bigger problem is how to distinguish between proxy and UA connections.

Andrei


Juha Heinanen

1:21 p.m.

Andrei Pelinescu-Onciul writes:

> No, it's not possible to set it on a per connection basis. It could be done (at the price of an extra int per tcp connection), but the bigger problem is how to distinguish between proxy and UA connections.

Perhaps it could be done, since UA connections are associated with registrations.

-- Juha

Andrei Pelinescu-Onciul

5 Jan 2011

12:53 p.m.

On Jan 05, 2011 at 13:47, Juha Heinanen jh@tutpro.com wrote:

> i have two sr proxies talking with each other over tcp. one is running 3.0 and the other 3.1. sometimes i notice that no sip requests or any other tcp packets get through from 3.0 sr host to 3.1 sr host. netstat on 3.0 sr host shows that tcp connection to 3.1 sr host is in ESTABLISHED state whereas netstat on 3.1 sr shows that the same tcp connection is in FIN_WAIT1 state. it takes several minutes before the hanging tcp connection is replaced by a new working one. there is no ip level connectivity problem between the two sr hosts.
>
> any suggestions on what might be going on? is there some known bug in 3.0 tcp implementation that could explain this?

There is no known bug in 3.0, and even if there were one, the connection should go automatically into CLOSE_WAIT on 3.0 (without the userspace doing anything).

It looks like somehow the 3.0 does not receive the FIN (local firewall rules?).

Andrei

Juha Heinanen

1:03 p.m.

Andrei Pelinescu-Onciul writes:

> It looks like somehow the 3.0 does not receive the FIN (local firewall rules?).

andrei,

i'm not aware of any such firewall rules. it is possible to establish telnet session to sip port of the other host from both ends. i see from wireshark dump that sr 3.0 host tried to tcp-resend the invite, but does not get anything back. in the second case, both hosts reported that the connection was established.

is it possible to make sr send crlfcrlf keepalive to the tcp connection?

-- juha

Andrei Pelinescu-Onciul

1:29 p.m.

On Jan 05, 2011 at 15:03, Juha Heinanen jh@tutpro.com wrote:

> Andrei Pelinescu-Onciul writes:
>
> > It looks like somehow the 3.0 does not receive the FIN (local firewall rules?).
>
> andrei,
>
> i'm not aware of any such firewall rules. it is possible to establish telnet session to sip port of the other host from both ends. i see from wireshark dump that sr 3.0 host tried to tcp-resend the invite, but does not get anything back. in the second case, both hosts reported that the connection was established.
>
> is it possible to make sr send crlfcrlf keepalive to the tcp connection?

No. You can enable responding to them (tcp_crlf_ping = yes), but not sending them.

However you could enable tcp level keepalives and on linux you can tune the intervals, e.g.:

tcp_keepalive = yes
tcp_keepidle = 60
tcp_keepintvl = 10
tcp_keepcnt = 3

Andrei
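
As a rough worked example of what these values imply, assuming the usual Linux keepalive semantics (the first probe goes out after tcp_keepidle seconds of idleness, and the connection is dropped once tcp_keepcnt probes spaced tcp_keepintvl seconds apart have gone unanswered), the sketch below estimates how long a dead connection could linger. The function name is illustrative only and is not part of sr.

```python
def keepalive_detection_time(keepidle: int, keepintvl: int, keepcnt: int) -> int:
    """Approximate seconds from the last traffic until the kernel gives up on a dead peer."""
    # The first probe is sent after `keepidle` idle seconds; the connection is
    # dropped after `keepcnt` probes, `keepintvl` seconds apart, get no reply.
    return keepidle + keepintvl * keepcnt


print(keepalive_detection_time(60, 10, 3))  # prints 90
```

With the values from the example, a connection whose peer has vanished would be torn down after roughly a minute and a half. The probe sent after 60 idle seconds also counts as traffic for any stateful device on the path.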

Juha Heinanen

9 Jan 2011

7:41 p.m.

Andrei Pelinescu-Onciul writes:

> However you could enable tcp level keepalives and on linux you can tune the intervals, e.g.:
>
> tcp_keepalive = yes
> tcp_keepidle = 60
> tcp_keepintvl = 10
> tcp_keepcnt = 3

andrei,

my linux 2.6.32 kernel has these tcp keepalive variables:

tcp_keepalive_intvl
tcp_keepalive_probes
tcp_keepalive_time

linux tcp keepalive doc says:

Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support for most of them following the instructions explained later in this document.

question: does sr request tcp keepalive on all of its tcp connections, or does it have to be turned on via a function call on a particular connection?

-- juha

Andrei Pelinescu-Onciul

10 Jan 2011

10:32 a.m.

On Jan 09, 2011 at 21:41, Juha Heinanen jh@tutpro.com wrote:

> Andrei Pelinescu-Onciul writes:
>
> > However you could enable tcp level keepalives and on linux you can tune the intervals, e.g.:
> >
> > tcp_keepalive = yes
> > tcp_keepidle = 60
> > tcp_keepintvl = 10
> > tcp_keepcnt = 3
>
> andrei,
>
> my linux 2.6.32 kernel has these tcp keepalive variables:
>
> tcp_keepalive_intvl
> tcp_keepalive_probes
> tcp_keepalive_time
>
> linux tcp keepalive doc says:
>
> Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support for most of them following the instructions explained later in this document.
>
> question: does sr request that tcp keepalive is turned on on its tcp connections or via a function call on a particular connection?

It uses setsockopt (if tcp_keepalive = yes is present in the .cfg).

You don't have to change any kernel variables in proc for sr. You just need to enable keepalives and set the intervals in the .cfg file (the above example could be pasted directly in a ser.cfg).

Andrei
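
As an illustration of what requesting keepalive through the setsockopt interface looks like for a single socket, here is a minimal Python sketch. It is not sr's code: the function name enable_keepalive is made up for this example, the defaults simply mirror the .cfg values from the example in this thread, and the three timer options are Linux-specific, so the sketch skips any that the platform's socket module does not expose.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60, intvl: int = 10, cnt: int = 3) -> None:
    """Turn on TCP keepalive for one socket and, where supported, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timer options; these override the kernel-wide defaults
    # for this socket only, so no /proc settings need to change.
    for name, value in (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", intvl), ("TCP_KEEPCNT", cnt)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Because the options are set per socket, the kernel-wide tcp_keepalive_* variables only act as defaults for sockets that enable keepalive without setting their own timers.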

Jiri Kuthan

11 Jan 2011

3:49 p.m.

Hi Juha,

have the TCP keepalives addressed the problem? If not, could you send a PCAP? (ideally from both upstream and downstream perspective)

Thanks!

-jiri


SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
