About TCP connection failure (reject and timeout)

Iñaki Baz Castillo

15 Jun 2011

10:54 a.m.

Hi, I've opened an issue in the tracker since it seems that sip-router (today's master branch) does not react properly to TCP rejection or TCP timeout (in outgoing transactions):

http://sip-router.org/tracker/index.php?do=details&task_id=136

Could somebody attempt these two calls?

1) sip:lalala@91.121.79.216:7777;transport=tcp

The server will reject the connection (iptables REJECT action), so sip-router should immediately generate a 503 for the transaction (but instead it generates a local 408 after fr_timer, usually 32 seconds).

2) sip:lalala@1.2.3.4:5060;transport=tcp

The server will not respond because 1.2.3.4 is not reachable, so sip-router should wait for the "tcp_connect_timeout" value (10 seconds by default) and then generate a local 408 (but instead it generates a local 408 after fr_timer, usually 32 seconds).
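
For anyone reproducing this outside sip-router, the two cases can be told apart at the socket level. The sketch below is a plain Python illustration, not sip-router code: `classify_connect` is a hypothetical helper that maps a refused connect to the 503 and a connect timeout to the 408 expected above, with the 10-second default used as the timeout.

```python
import socket


def classify_connect(host, port, timeout=10.0):
    """Attempt a blocking TCP connect and return the SIP status expected
    for the outcome in the report above (hypothetical helper):

      None -> the connection was established
      503  -> the peer actively refused the connection
      408  -> no answer within `timeout` seconds

    Any other OSError (e.g. no route to host) is left to propagate,
    since the report does not say how it should be mapped.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
    except socket.timeout:
        return 408
    except ConnectionRefusedError:
        return 503
    else:
        return None
    finally:
        sock.close()
```

Calling `classify_connect("91.121.79.216", 7777)` and `classify_connect("1.2.3.4", 5060)` would exercise the two cases, assuming those hosts still behave as described.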

Thanks a lot.

-- Iñaki Baz Castillo ibc@aliax.net


Iñaki Baz Castillo

15 Jun 2011

5:06 p.m.

The issue only occurs when tcp_async=yes.

-- Iñaki Baz Castillo ibc@aliax.net

Andrei Pelinescu-Onciul

23 Jun 2011

6:32 p.m.

On Jun 15, 2011 at 19:06, Iñaki Baz Castillo ibc@aliax.net wrote:

> The issue only occurs when tcp_async=yes.

Yes, it's a known limitation. Basically, when async is on, tm has no way of knowing that a connect() has failed and has to rely on the SIP timeout. Of course this could be changed, but it would have an impact on both performance and memory usage, and it would be very hard to integrate with TLS. I would rather not do it in the near future.

The tcp_connect_timeout refers to how long the TCP connect will be attempted, but it's not linked to tm. The value is not 100% exact: the TCP timers are executed on a best-effort basis, at intervals of at most 5 s and at least 1/16 s, so you should expect an error of up to 5 s. If that's too much for you, change TCP_MAIN_SELECT_TIMEOUT and TCP_CHILD_SELECT_TIMEOUT in tcp_conn.h (btw, we don't use select() anymore; the names were not updated when we switched to epoll/kqueue/dev_poll).
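
To make the timer error concrete, here is a simplified model in Python. It is only a sketch under stated assumptions, not how the sip-router timer code is written: the timer is assumed to tick at a fixed interval starting from some phase offset, and `detection_time` is a hypothetical helper that returns the first tick at or after the connect deadline.

```python
import math

# Bounds quoted above for how often the TCP timers run (seconds).
TCP_TIMER_MAX_INTERVAL = 5.0
TCP_TIMER_MIN_INTERVAL = 1.0 / 16


def detection_time(connect_timeout, tick_interval, phase=0.0):
    """Return the time at which a periodic timer first notices that
    `connect_timeout` seconds have elapsed.

    Assumption of this model: ticks fire at phase, phase + tick_interval,
    phase + 2 * tick_interval, ... with 0 <= phase < tick_interval.
    """
    if not TCP_TIMER_MIN_INTERVAL <= tick_interval <= TCP_TIMER_MAX_INTERVAL:
        raise ValueError("tick_interval outside the quoted bounds")
    if not 0.0 <= phase < tick_interval:
        raise ValueError("phase must lie within one tick interval")
    # Number of whole intervals after `phase` needed to reach the deadline.
    ticks_needed = max(0, math.ceil((connect_timeout - phase) / tick_interval))
    return phase + ticks_needed * tick_interval
```

In this model, a 10-second tcp_connect_timeout with 5-second ticks is noticed somewhere between 10 and just under 15 seconds, depending on the phase, which matches the 5 s error mentioned above.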

Andrei


sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

Iñaki Baz Castillo

6:48 p.m.

2011/6/23 Andrei Pelinescu-Onciul andrei@iptel.org:

> Yes, it's a known limitation. Basically, when async is on, tm has no way of knowing that a connect() has failed and has to rely on the SIP timeout. Of course this could be changed, but it would have an impact on both performance and memory usage, and it would be very hard to integrate with TLS. I would rather not do it in the near future.

OK, I understand. It would be great to have it, though.

On the other hand, this has an unexpected advantage:

It unifies the behaviour of UDP and TCP/TLS. In sync mode, if a TCP connection fails, t_relay returns an error and the on_failure_route block is not executed (I still think this is a bug in the design, as a TCP connection error should trigger a local 503, so on_failure_route should be called with that 503 as the winning reply). Anyhow, in async mode, due to the limitation explained above, a TCP connection error generates a local timeout, so on_failure_route is called with 408 as the winning reply. This allows the same script code to handle UDP and TCP.
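
The cases described above can be summarised as a small decision table. This is a Python sketch for illustration only; `Outcome` and `unreachable_destination_outcome` are hypothetical names, and the UDP row (a local 408 once fr_timer expires) is an assumption based on the statement that the async behaviour matches UDP.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    t_relay_returns_error: bool         # the script sees the failure right away
    failure_route_reply: Optional[int]  # winning reply in on_failure_route, if it runs


def unreachable_destination_outcome(transport: str, tcp_async: bool) -> Outcome:
    """Summarise what the script sees when the destination cannot be reached,
    as described in this thread (not derived from the sip-router source)."""
    transport = transport.lower()
    if transport == "udp":
        # No connection to fail: the transaction times out after fr_timer.
        return Outcome(False, 408)
    if transport in ("tcp", "tls"):
        if tcp_async:
            # tm never learns that connect() failed, so it times out as well.
            return Outcome(False, 408)
        # Sync mode: t_relay fails and on_failure_route is not executed.
        return Outcome(True, None)
    raise ValueError(f"unsupported transport: {transport!r}")
```

With tcp_async on, the UDP and TCP rows are identical, which is why a single on_failure_route block that checks for 408 covers both transports.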

> The tcp_connect_timeout refers to how long the TCP connect will be attempted, but it's not linked to tm. The value is not 100% exact: the TCP timers are executed on a best-effort basis, at intervals of at most 5 s and at least 1/16 s, so you should expect an error of up to 5 s. If that's too much for you, change TCP_MAIN_SELECT_TIMEOUT and TCP_CHILD_SELECT_TIMEOUT in tcp_conn.h (btw, we don't use select() anymore; the names were not updated when we switched to epoll/kqueue/dev_poll).

Thanks. I don't need it to be more precise than 5 seconds, nor do I care, since I have to wait fr_timer seconds anyway :)

Thanks a lot.

-- Iñaki Baz Castillo ibc@aliax.net
