Yes, I understand that how the TCP connection is handled comes from the OS default settings, but as far as I understood, the patch in the commit adds TCP_USER_TIMEOUT, which should at least try to fix this behaviour?
The Kamailio core cookbook says: “tcp_send_timeout: Time in seconds after a TCP connection will be closed if it is not available for writing in this interval (and Kamailio wants to send something on it). Lower this value for faster detection of broken TCP connections. The default value is 10s. Example of usage: tcp_send_timeout=3”
And I understood that tcp_send_timeout is the parameter that sets TCP_USER_TIMEOUT, which defaults to 10 seconds in Kamailio. I was expecting this to change the OS default behaviour?
And yes, I know I can also modify the system defaults, but I would rather tweak it in Kamailio if that is possible.
-Pyry
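For reference, here is a minimal sketch of what setting TCP_USER_TIMEOUT on a socket looks like at the OS level on Linux. This is not Kamailio's code: the helper name set_user_timeout is made up for illustration, and the 10000 ms value only mirrors the "Set TCP_USER_TIMEOUT=10000 ms" log line quoted in this thread. As I read the Linux tcp(7) man page, the option limits how long transmitted data may stay unacknowledged before the kernel closes the connection, so it only comes into play when there is data in flight on that socket.

```python
import socket


def set_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (Linux only) and return the value the kernel reports.

    Hypothetical helper for illustration; timeout_ms is in milliseconds.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # 10000 ms is the value from the Kamailio log line in this thread.
        print("TCP_USER_TIMEOUT is now", set_user_timeout(s, 10000), "ms")
    finally:
        s.close()
```

Reading the option back with getsockopt is just a quick way to confirm the kernel accepted the value on that particular socket.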
From: Olle E. Johansson oej@edvina.net
Date: Friday, 4. April 2025 at 13.11
To: Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org
Cc: Pyry Aaltonen pyry.aaltonen@cuuma.com
Subject: Re: [SR-Users] TCP timeout
On 4 Apr 2025, at 10:55, Pyry Aaltonen via sr-users sr-users@lists.kamailio.org wrote:
Hello,
I've googled around about TCP timeouts in Kamailio when a TCP connection is broken.
I found this old question, and it feels like I'm in the same situation as the original poster: https://www.mail-archive.com/sr-users@lists.kamailio.org/msg15020.html
I also found this: https://kamailio.org/mailman3/hyperkitty/list/sr-dev@lists.kamailio.org/thre... and its description pretty much matches what I'm seeing.
When the TCP (or TLS) connection is interrupted in the network, it takes around 15 minutes for Kamailio to reset the outgoing TCP connection.
I see in the logs that when the Kamailio process is restarted it logs:
2025-03-27 12:25:18.697 { "level": "INFO", "module": "core", "file": "core/tcp_main.c", "line": 3282, "function": "tcp_init", "logprefix": "", "message": "Set TCP_USER_TIMEOUT=10000 ms" }
So I think the fix from https://github.com/kamailio/kamailio/commit/d893f3af1444c8c4c5db6cd53fb57770... is applied.
This was tested with 5.8.5. I set up a dispatcher TCP connection to an external host, dropped traffic to that host with iptables, waited for Kamailio to notice that the destination was down, and then removed the iptables rule. After that it takes around 15 minutes to recover. (Restarting Kamailio also helps and resolves the connection.)
This is what Kamailio prints during the test:
Apr 4 08:29:15 kamailio[156947]: { "level": "ERROR", "module": "xlog", "file": "xlog.c", "line": 278, "function": "", "logprefix": "", "message": "Destination down: OPTIONS sip:ext-host;transport=tcp (<null>)" }
Apr 4 08:44:43 kamailio[156957]: { "level": "ERROR", "module": "core", "file": "core/tcp_read.c", "line": 267, "function": "tcp_read_data", "logprefix": "", "message": "error reading: Connection timed out (110) ([]:5060 -> []:47492)" }
Apr 4 08:44:43 kamailio[156957]: { "level": "ERROR", "module": "core", "file": "core/tcp_read.c", "line": 1524, "function": "tcp_read_req", "logprefix": "", "message": "error reading - c: 0x7f8625dea9b0 r: 0x7f8625deaad8 (-1)" }
Apr 4 08:45:05 kamailio[156957]: { "level": "ERROR", "module": "xlog", "file": "xlog.c", "line": 278, "function": "", "logprefix": "Source:[ext-host]:5060, Call-id:4d0e1ccd315c817f-156947@int-host, CSeq:10", "message": "Destination up: OPTIONS sip:ext.host;transport=tcp (<null>)" }
Any advice on how to lower the timeout so that recovery is quicker in such an event?
That’s dependent on the operating system. This is one of the reasons why SIP outbound required two active TCP connections in order to have a fast failover.
If you google for “unix tcp timeout” you will find many documents with hints. I think Geoff Huston wrote a good summary once upon a time, but I can’t find it any more.
/O
https://datatracker.ietf.org/doc/html/rfc5626
“For a UA to receive incoming requests, the UA has to connect to a
server. Since the server can't connect to the UA, the UA has to make
sure that a flow is always active. This requires the UA to detect
when a flow fails. Since such detection takes time and leaves a
window of opportunity for missed incoming requests, this mechanism
allows the UA to register over multiple flows at the same time.”
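As a rough illustration of where a figure like 15 minutes can come from on Linux: the kernel documentation for the tcp_retries2 sysctl says that its default of 15 yields a hypothetical timeout of 924.6 seconds. The sketch below reproduces that arithmetic, assuming the retransmission timeout starts at the 200 ms minimum, doubles on every retry and is capped at 120 s. The function name is made up for illustration, and real timeouts depend on the measured RTT, so treat this as an estimate only. It is in the same range as the gap between the "Destination down" and "Connection timed out" log lines in this thread (08:29:15 to 08:44:43).

```python
def retransmit_timeout_ms(retries, rto_min_ms=200, rto_max_ms=120_000):
    """Estimate the total time before Linux gives up on unacknowledged data.

    Hypothetical helper: sums the waits for the initial transmission plus
    `retries` retransmissions, doubling the RTO each time up to rto_max_ms.
    Integer milliseconds are used to avoid floating point rounding.
    """
    total = 0
    rto = rto_min_ms
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max_ms)
    return total


if __name__ == "__main__":
    total_ms = retransmit_timeout_ms(15)  # 15 is the tcp_retries2 default
    print(f"{total_ms / 1000:.1f} s, about {total_ms / 60000:.1f} min")
```

Calling the function with a smaller retry count gives a feel for how lowering tcp_retries2 would shorten the detection time, for anyone who does end up changing the system defaults.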