ctylihuai created an issue (kamailio/kamailio#4238)
### Description
The Kamailio “P” node responsible for 5G messaging encountered an unhandled exception and generated a core dump during operation . This node had been continuously running for over one month (34 days) without any issues, and the failure occurred only once . As of now, no reproducible steps have been identified to trigger the problem again
### Troubleshooting
#### Reproduction It cannot be reproduced for the time being
#### Debugging Data
``` (gdb) bt full #0 0x0000aaaac25ffd10 in local_timer_list_expire (l=0xaaaac29ef340 <tcp_main_ltimer>, t=1691001057, h=0xaaaac2a10158 <tcp_main_ltimer+134680>) at core/local_timer.c:193 tl = 0xaaaac2a7b528 <tcp_reader_ltimer+197384> ret = 0 #1 0x0000aaaac25ffe98 in local_timer_expire (h=0xaaaac29ef340 <tcp_main_ltimer>, t=1691001057) at core/local_timer.c:222 No locals. #2 0x0000aaaac260031c in local_timer_run (lt=0xaaaac29ef340 <tcp_main_ltimer>, saved_ticks=1691001057) at core/local_timer.c:245 __func__ = "local_timer_run" #3 0x0000aaaac2552bec in tcp_timer_run () at core/tcp_main.c:4875 ticks = 1691001057 #4 0x0000aaaac2555074 in tcp_main_loop () at core/tcp_main.c:5060 si = 0x0 r = 48 __func__ = "tcp_main_loop" #5 0x0000aaaac23d24b0 in main_loop () at main.c:1945 i = 48 pid = 0 si = 0x0 si_desc = "udp receiver child=47 sock=10.185.16.12:5060\000\000\000\000\000\027\065\215sw\247\235\060\346\256\310\377\377\000\000|\312p\302\252\252\000\000@\346\256\310\377\377\000\000\000\263\252\302\252\252\000\000\200\346\256\310\377\377\000\000\n\000\000\000\000\000\000\000\200\346\256\310\377\377\000\000\250\345=\302\252\252\000\000\200\346\256\310\377\377\000" nrprocs = 48 woneinit = 1 __func__ = "main_loop" #6 0x0000aaaac23de5f4 in main (argc=19, argv=0xffffc8aeebe8) at main.c:3212 cfg_stream = 0xaaaafeb0d260 c = -1 r = 0 tmp = 0xffffc8aefe85 "" tmp_len = -1484867064 port = 65535 proto = -1487420584 ahost = 0x0 aport = 0 options = 0xaaaac28ec2a0 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 3762187974 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 2 ---Type <return> to continue, or q <return> to quit--- n_lst = 0xffffc8aee8f8 p = 0x0 st = {st_dev = 64772, st_ino = 4298835, st_mode = 16877, st_nlink = 2, st_uid = 103, st_gid = 105, st_rdev = 0, __pad1 = 0, st_size = 6, st_blksize = 4096, __pad2 = 0, st_blocks = 0, st_atim = { tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 1733885384, tv_nsec = 0}, st_ctim = {tv_sec = 1737443001, tv_nsec = 2493470}, __glibc_reserved = {0, 0}} tbuf = "\210\351\256\310\377\377\000\000\204\351\256\310\377\377\000\000\002\b \200\002\b \200\260\351\256\310\377\377\000\000\234\303~\247\377\377\000\000\230e;\330\000\000\000\000\310*\201\247\377\377\000\000\330\fi\247\377\377\000\000\310\352\256\310\377\377\000\000\070\353\256\310\377\377\000\000\000\000\000\000\000\000\000\000@'\201\247\377\377\000\000\000\000\000\000\000\000\000\000X\352\256\310\377\377\000\000h\352\256\310\377\377\000\000\001", '\000' <repeats 15 times>, "\230e;\330\000\000\000\000h'\201\247\377\377\000\000\000\000\000\000\000\000\000\000X\352\256\310\377\377\000\000h\352\256\310\377\377\000\000\320\352\256\310\377\377\000\000\330\fi\247\377\377\000\000\210\351\256\310\377\377\000\000"... option_index = 12 long_options = {{name = 0xaaaac28ee678 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0xaaaac28e9410 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0xaaaac28ee680 "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0xaaaac28ee688 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0xaaaac28ee690 "substdef", has_arg = 1, flag = 0x0, val = 1026}, { name = 0xaaaac28ee6a0 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0xaaaac28ee6b0 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0xaaaac28ee6c0 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0xaaaac28ee6d0 "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0xaaaac28ee6e0 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, { name = 0xaaaac28ee6f0 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0xaaaac28ee6f8 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0xaaaac28ee708 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0xaaaac28ee710 "all-errors", has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" (gdb) p tl $30 = (struct timer_ln *) 0xaaaac2a7b528 <tcp_reader_ltimer+197384> (gdb) p *tl $31 = {next = 0x0, prev = 0xaaaac2a10158 <tcp_main_ltimer+134680>, expire = 0, initial_timeout = 0, data = 0x0, f = 0x0, flags = 0, slow_idx = 0} (gdb) p *h $10 = {next = 0x0, prev = 0xfff8adcef5d8}
循环打印 h->prev内容 $177 = (struct timer_ln *) 0xaaaac2a10158 <tcp_main_ltimer+134680> next = 0x0, prev = 0xfff8adcef5d8 $178 = (struct timer_ln * volatile) 0xfff8adcef5d8 next = 0xaaaac2a10158 <tcp_main_ltimer+134680>, prev = 0xfff9437ede60 $179 = (struct timer_ln *) 0xfff9437ede60 next = 0xfff8adcef5d8, prev = 0xfffa58c31970 $180 = (struct timer_ln *) 0xfffa58c31970 next = 0xfff9437ede60, prev = 0xaaaac2a1bd48 $181 = (struct timer_ln *) 0xaaaac2a1bd48 <tcp_main_ltimer+182792> next = 0xfffa58c31970, prev = 0xfffa53da92a8 $182 = (struct timer_ln *) 0xfffa53da92a8 next = 0xaaaac2a1bd48 <tcp_main_ltimer+182792>, prev = 0xfff96e22e0e0 $183 = (struct timer_ln *) 0xfff96e22e0e0 next = 0xfffa53da92a8, prev = 0xfffa4f642a28 $184 = (struct timer_ln *) 0xfffa4f642a28 next = 0xfff96e22e0e0, prev = 0xfff880261368 $185 = (struct timer_ln *) 0xfff880261368 next = 0xfffa4f642a28, prev = 0xfff99e55db30 $186 = (struct timer_ln *) 0xfff99e55db30 next = 0xfff880261368, prev = 0xfff99fe12eb0 $187 = (struct timer_ln *) 0xfff99fe12eb0 next = 0xfff99e55db30, prev = 0xfffa7f14dfa8 $188 = (struct timer_ln *) 0xfffa7f14dfa8 next = 0xfff99fe12eb0, prev = 0xaaaac2a21658 <tcp_main_ltimer+205592 $189 = (struct timer_ln *) 0xaaaac2a21658 <tcp_main_ltimer+205592> next = 0xfffa7f14dfa8, prev = 0xfff82902dfb8 $190 = (struct timer_ln *) 0xfff82902dfb8 next = 0xaaaac2a21658 <tcp_main_ltimer+205592>, prev = 0xfffa3728f340 $191 = (struct timer_ln *) 0xfffa3728f340 next = 0xfff82902dfb8, prev = 0xfff8f0b5ec40 $192 = (struct timer_ln *) 0xfff8f0b5ec40 next = 0xfffa3728f340, prev = 0xfffaa900c338 $193 = (struct timer_ln *) 0xfffaa900c338 next = 0xfff8f0b5ec40, prev = 0xfff888f53088 $194 = (struct timer_ln *) 0xfff888f53088 next = 0xfffaa900c338, prev = 0xfff79d06ff60 $195 = (struct timer_ln *) 0xfff79d06ff60 next = 0xfff888f53088, prev = 0xfff8eea80ff8 $196 = (struct timer_ln *) 0xfff8eea80ff8 next = 0xfffa38e4f870, prev = 0xfff83d1a6a18 $197 = (struct timer_ln *) 0xfff83d1a6a18 next = 0xfff8eea80ff8, prev = 0xfffa5c69fb48 $198 = (struct timer_ln *) 0xfffa5c69fb48 next = 0xfff83d1a6a18, prev = 0xfff8b5c31c58 $199 = (struct timer_ln *) 0xfff8b5c31c58 next = 0xfffa5c69fb48, prev = 0xfff9a4ac6688 $200 = (struct timer_ln *) 0xfff9a4ac6688 next = 0xfff8b5c31c58, prev = 0xfff9e3ee0200 $201 = (struct timer_ln *) 0xfff9e3ee0200 next = 0xfff9a4ac6688, prev = 0xaaaac2a267b8 $202 = (struct timer_ln *) 0xaaaac2a267b8 <tcp_main_ltimer+226424> next = 0xfff9e3ee0200, prev = 0xfff9b6b00188 $203 = (struct timer_ln *) 0xfff9b6b00188 next = 0xaaaac2a267b8 <tcp_main_ltimer+226424>, prev = 0xfffa38e4f870 $204 = (struct timer_ln *) 0xfffa38e4f870 next = 0xfff9b6b00188, prev = 0xfff8eea80ff8 $205 = (struct timer_ln *) 0xfff8eea80ff8 next = 0xfffa38e4f870, prev = 0xfff83d1a6a18 $206 = (struct timer_ln *) 0xfff83d1a6a18 next = 0xfff8eea80ff8, prev = 0xfffa5c69fb48 $207 = (struct timer_ln *) 0xfffa5c69fb48 next = 0xfff83d1a6a18, prev = 0xfff8b5c31c58 $208 = (struct timer_ln *) 0xfff8b5c31c58 next = 0xfffa5c69fb48, prev = 0xfff9a4ac6688 $209 = (struct timer_ln *) 0xfff9a4ac6688 next = 0xfff8b5c31c58, prev = 0xfff9e3ee0200 $210 = (struct timer_ln *) 0xfff9e3ee0200 next = 0xfff9a4ac6688, prev = 0xaaaac2a267b8 <tcp_main_ltimer+226424> $211 = (struct timer_ln *) 0xaaaac2a267b8 <tcp_main_ltimer+226424> next = 0xfff9e3ee0200, prev = 0xfff9b6b00188 $212 = (struct timer_ln *) 0xfff9b6b00188 next = 0xaaaac2a267b8 <tcp_main_ltimer+226424>, prev = 0xfffa38e4f870 $213 = (struct timer_ln *) 0xfffa38e4f870 next = 0xfff9b6b00188, prev = 0xfff8eea80ff8
(gdb) set $node = h->prev (gdb) while ($node != 0)
print *$node set $node = $node->prev end
$50 = {next = 0xaaaac2a10158 <tcp_main_ltimer+134680>, prev = 0xfff9437ede60, expire = 1691004064, initial_timeout = 5760, data = 0xfff8adcef430, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $51 = {next = 0xfff8adcef5d8, prev = 0xfffa58c31970, expire = 1691004064, initial_timeout = 5760, data = 0xfff9437edcb8, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $52 = {next = 0xfff9437ede60, prev = 0xaaaac2a1bd48 <tcp_main_ltimer+182792>, expire = 1691004064, initial_timeout = 5760, data = 0xfffa58c317c8, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $53 = {next = 0xfffa58c31970, prev = 0xfffa53da92a8, expire = 3125221032, initial_timeout = 65529, data = 0xfffa51099620, f = 0xfff7f9d36838, flags = 4191381560, slow_idx = 65527} $54 = {next = 0xaaaac2a1bd48 <tcp_main_ltimer+182792>, prev = 0xfff96e22e0e0, expire = 1691004064, initial_timeout = 5760, data = 0xfffa53da9100, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $55 = {next = 0xfffa53da92a8, prev = 0xfffa4f642a28, expire = 1691004064, initial_timeout = 5760, data = 0xfff96e22df38, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $56 = {next = 0xfff96e22e0e0, prev = 0xfff880261368, expire = 1691004064, initial_timeout = 5760, data = 0xfffa4f642880, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $57 = {next = 0xfffa4f642a28, prev = 0xfff99e55db30, expire = 1691005489, initial_timeout = 5760, data = 0xfff8802611c0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $58 = {next = 0xfff880261368, prev = 0xfff99fe12eb0, expire = 1691005489, initial_timeout = 5760, data = 0xfff99e55d988, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $59 = {next = 0xfff99e55db30, prev = 0xfffa7f14dfa8, expire = 1691005489, initial_timeout = 5760, data = 0xfff99fe12d08, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $60 = {next = 0xfff99fe12eb0, prev = 0xaaaac2a21658 <tcp_main_ltimer+205592>, expire = 1691005489, initial_timeout = 5760, data = 0xfffa7f14de00, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $61 = {next = 0xfffa7f14dfa8, prev = 0xfff82902dfb8, expire = 888530032, initial_timeout = 65528, data = 0xfff81b870028, f = 0xfff798f80cc0, flags = 522077992, slow_idx = 65529} $62 = {next = 0xaaaac2a21658 <tcp_main_ltimer+205592>, prev = 0xfffa3728f340, expire = 1691005489, initial_timeout = 5760, data = 0xfff82902de10, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $63 = {next = 0xfff82902dfb8, prev = 0xfff8f0b5ec40, expire = 1691005489, initial_timeout = 5760, data = 0xfffa3728f198, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $64 = {next = 0xfffa3728f340, prev = 0xfffaa900c338, expire = 1691005489, initial_timeout = 5760, data = 0xfff8f0b5ea98, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $65 = {next = 0xfff8f0b5ec40, prev = 0xfff888f53088, expire = 1691005489, initial_timeout = 5760, data = 0xfffaa900c190, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $66 = {next = 0xfffaa900c338, prev = 0xfff79d06ff60, expire = 1691005489, initial_timeout = 5760, data = 0xfff888f52ee0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $67 = {next = 0xfff888f53088, prev = 0xfff8eea80ff8, expire = 1691005489, initial_timeout = 5760, data = 0xfff79d06fdb8, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $68 = {next = 0xfffa38e4f870, prev = 0xfff83d1a6a18, expire = 1691006791, initial_timeout = 5760, data = 0xfff8eea80e50, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $69 = {next = 0xfff8eea80ff8, prev = 0xfffa5c69fb48, expire = 1691006791, initial_timeout = 5760, data = 0xfff83d1a6870, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $70 = {next = 0xfff83d1a6a18, prev = 0xfff8b5c31c58, expire = 1691006791, initial_timeout = 5760, data = 0xfffa5c69f9a0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $71 = {next = 0xfffa5c69fb48, prev = 0xfff9a4ac6688, expire = 1691006791, initial_timeout = 5760, data = 0xfff8b5c31ab0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $72 = {next = 0xfff8b5c31c58, prev = 0xfff9e3ee0200, expire = 1691006791, initial_timeout = 5760, data = 0xfff9a4ac64e0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $73 = {next = 0xfff9a4ac6688, prev = 0xaaaac2a267b8 <tcp_main_ltimer+226424>, expire = 1691006791, initial_timeout = 5760, data = 0xfff9e3ee0058, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $74 = {next = 0xfff9e3ee0200, prev = 0xfff9b6b00188, expire = 1429254648, initial_timeout = 65528, data = 0xfff88564eb80, f = 0xfff9cc7da958, flags = 881566008, slow_idx = 65529} $75 = {next = 0xaaaac2a267b8 <tcp_main_ltimer+226424>, prev = 0xfffa38e4f870, expire = 1691006791, initial_timeout = 5760, data = 0xfff9b6afffe0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $76 = {next = 0xfff9b6b00188, prev = 0xfff8eea80ff8, expire = 1691006791, initial_timeout = 5760, data = 0xfffa38e4f6c8, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $77 = {next = 0xfffa38e4f870, prev = 0xfff83d1a6a18, expire = 1691006791, initial_timeout = 5760, data = 0xfff8eea80e50, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $78 = {next = 0xfff8eea80ff8, prev = 0xfffa5c69fb48, expire = 1691006791, initial_timeout = 5760, data = 0xfff83d1a6870, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $79 = {next = 0xfff83d1a6a18, prev = 0xfff8b5c31c58, expire = 1691006791, initial_timeout = 5760, data = 0xfffa5c69f9a0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $80 = {next = 0xfffa5c69fb48, prev = 0xfff9a4ac6688, expire = 1691006791, initial_timeout = 5760, data = 0xfff8b5c31ab0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $81 = {next = 0xfff8b5c31c58, prev = 0xfff9e3ee0200, expire = 1691006791, initial_timeout = 5760, data = 0xfff9a4ac64e0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} ---Type <return> to continue, or q <return> to quit--- $82 = {next = 0xfff9a4ac6688, prev = 0xaaaac2a267b8 <tcp_main_ltimer+226424>, expire = 1691006791, initial_timeout = 5760, data = 0xfff9e3ee0058, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} $83 = {next = 0xfff9e3ee0200, prev = 0xfff9b6b00188, expire = 1429254648, initial_timeout = 65528, data = 0xfff88564eb80, f = 0xfff9cc7da958, flags = 881566008, slow_idx = 65529} $84 = {next = 0xaaaac2a267b8 <tcp_main_ltimer+226424>, prev = 0xfffa38e4f870, expire = 1691006791, initial_timeout = 5760, data = 0xfff9b6afffe0, f = 0xaaaac2551690 <tcpconn_main_timeout>, flags = 512, slow_idx = 0} ```
#### Log Messages
``` 2025-05-02T05:39:54.151913300Z 84(131) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([49.69.41.98]:36121 ->84(131) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [10.185.18.67]:5260) 2025-05-02T05:39:54.152035620Z 84(131) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff7cd0a8ba0 r: 0xfff7cd0a8cc8 (-1) 2025-05-02T05:39:54.152046140Z 84(131) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([110.252.23.203]:27380 ->84(131) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [10.185.18.67]:5260) 2025-05-02T05:39:54.152053220Z 84(131) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff9a1658f08 r: 0xfff9a1659030 (-1) 2025-05-02T05:39:54.152067100Z 84(131) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([240e:440:1b18:1fb3:22:da32:6be9:33e7]:50529 ->84(131) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [240e:954:2000:208::31]:5460) 2025-05-02T05:39:54.152468020Z 90(137) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([124.236.136.161]:13636 ->90(137) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [10.185.18.67]:5260) 2025-05-02T05:39:54.152535370Z 87(134) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([111.61.229.67]:41796 ->87(134) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [10.185.18.67]:5260) 2025-05-02T05:39:54.152895350Z 84(131) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfffa8d174330 r: 0xfffa8d174458 (-1) 2025-05-02T05:39:54.152916500Z 84(131) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([240e:440:9a08:24:1444:71da:9ff4:a6e6]:57851 ->84(131) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [240e:954:2000:208::31]:5460) 2025-05-02T05:39:54.153179200Z 90(137) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfffa45cedc18 r: 0xfffa45cedd40 (-1) 2025-05-02T05:39:54.153425390Z 87(134) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfffa22771e70 r: 0xfffa22771f98 (-1) 2025-05-02T05:39:54.153520770Z 84(131) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff83da65638 r: 0xfff83da65760 (-1) 2025-05-02T05:39:54.153540580Z 84(131) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([2408:8431:6810:2e2:1cbb:1c9e:63e6:abf1]:61717 ->84(131) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [240e:954:2000:208::31]:5260) 2025-05-02T05:39:54.154460230Z 84(131) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff7ccf40938 r: 0xfff7ccf40a60 (-1) 2025-05-02T05:39:54.154502480Z 84(131) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([123.182.115.202]:21533 ->84(131) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [10.185.18.67]:5260) 2025-05-02T05:39:54.154629490Z 84(131) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff89ae41378 r: 0xfff89ae414a0 (-1) 2025-05-02T05:39:54.154649140Z 84(131) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection timed out (110) ([183.199.201.13]:3125 ->84(131) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [10.185.18.67]:5260) 2025-05-02T05:39:54.154830180Z 84(131) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff9acc5d970 r: 0xfff9acc5da98 (-1) 2025-05-02T05:39:54.173624940Z 90(137) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff7de186590 r: 0xfff7de1866b8 (-1) 2025-05-02T05:39:54.325541260Z 117(164) ERROR: <core> [core/tcp_read.c:297]: tcp_read_data(): error reading: Connection reset by peer (104) ([171.114.149.119]:18629 ->117(164) ERROR: <core> [core/tcp_read.c:299]: tcp_read_data(): -> [10.185.18.67]:5260) 2025-05-02T05:39:54.325597230Z 117(164) ERROR: <core> [core/tcp_read.c:1556]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0xfff83556b7e0 r: 0xfff83556b908 (-1) 2025-05-02T05:39:54.395159000Z 41(88) ERROR: <core> [core/tcp_main.c:1406]: tcp_do_connect(): connect 183.198.156.45:1272 failed Cannot assign requested address 2025-05-02T05:39:54.395211960Z 41(88) ERROR: <core> [core/tcp_main.c:1409]: tcp_do_connect(): connect 183.198.156.45:1272: (99) Cannot assign requested address 2025-05-02T05:39:54.395220020Z 41(88) ERROR: <core> [core/tcp_main.c:1545]: tcpconn_finish_connect(): 183.198.156.45:1272: tcp_do_connect for 0xfff95f7037e0 failed 2025-05-02T05:39:54.395226280Z 41(88) ERROR: <core> [core/tcp_main.c:2200]: tcp_send(): 183.198.156.45:1272: tcpconn_finish_connect(0xfff95f7037e0) failed 2025-05-02T05:39:54.395232270Z 41(88) ERROR: tm [../../core/forward.h:290]: msg_send_buffer(): tcp_send failed 2025-05-02T05:39:54.395337640Z 41(88) ERROR: sl [sl_funcs.c:415]: sl_reply_error(): stateless error reply used: Unfortunately error on sending to next hop occurred (477/SL) 2025-05-02T05:39:54.484799620Z 19(66) ERROR: <core> [core/tcp_main.c:1406]: tcp_do_connect(): connect 1.202.124.130:54952 failed Cannot assign requested address 2025-05-02T05:39:54.484847710Z 19(66) ERROR: <core> [core/tcp_main.c:1409]: tcp_do_connect(): connect 1.202.124.130:54952: (99) Cannot assign requested address 2025-05-02T05:39:54.484882280Z 19(66) ERROR: <core> [core/tcp_main.c:1545]: tcpconn_finish_connect(): 1.202.124.130:54952: tcp_do_connect for 0xfff9c7c0d500 failed 2025-05-02T05:39:54.484889320Z 19(66) ERROR: <core> [core/tcp_main.c:2200]: tcp_send(): 1.202.124.130:54952: tcpconn_finish_connect(0xfff9c7c0d500) failed 2025-05-02T05:39:54.484894930Z 19(66) ERROR: tm [../../core/forward.h:260]: msg_send_buffer(): tcp_send failed 2025-05-02T05:39:54.484899790Z 19(66) ERROR: sl [sl_funcs.c:415]: sl_reply_error(): stateless error reply used: Unfortunately error on sending to next hop occurred (477/SL) 2025-05-02T05:39:54.759458370Z 118(165) CRITICAL: <core> [core/io_wait.h:594]: io_watch_del(): invalid fd 69, not in [0, 0) 2025-05-02T05:39:54.759506810Z 118(165) ERROR: <core> [core/tcp_read.c:1765]: tcpconn_read_timeout(): io_watch_del failed for 0xfff80ff7f860 id 65244425 fd 69, state 0, flags c018, main fd 48347 ([240e::31]:63566 -> [240e::31]:5460)
note:[240e::31] is the service address of kamailio, hiding the real IP address . ```
### Possible Solutions
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
``` version: kamailio 5.7.4 (x86_64/linux) fdc771 ```
* **Operating System**:
<!-- Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04, CentOS 7.1, ...), MacOS, xBSD, Solaris, ...; Kernel details (output of `lsb_release -a` and `uname -a`) -->
``` OpenEuler 22.03 Linux localhost.localdomain 5.10.0-136.12.0.86.ctl3.x86_64 #1 SMP Fri Jan 13 08:18:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux ```
ctylihuai left a comment (kamailio/kamailio#4238)
From the GDB output, two issues are apparent: 1. The `tcp_main` process was not configured as a worker, yet a local connection in the main process triggered `tcpconn_read_timeout`. 2. The `tcp_main_ltimer` list is corrupted: the doubly-linked list contains multiple `head` pointers, some null pointers, and reader timer pointers.
henningw left a comment (kamailio/kamailio#4238)
Its strange that you are seeing TCP operations in your backtrace, when you are saying that no TCP was configured for this node. Can you share the listen cfg?
Regarding the Kamailio 5.7.4 - its an old version which is not maintained anymore from the open source project. I would suggest to update to a maintained version, e.g. one of the latest 5.8.x or 6.0.x versions.
ctylihuai left a comment (kamailio/kamailio#4238)
listen cfg config:
listen=tcp:PCSCF_IP:5460 #!define TCP_PROCESSES 48 tcp_async= yes # Alias name: tcp_buf_write tcp_conn_wq_max = 16384 tcp_rd_buf_size = 16384 tcp_wq_max = 16777216 #tcp_defer_accept = 5 #tcp_keepidle = 7200 #tcp_keepintvl = 75 #tcp_keecnt = 10 tcp_poll_method= epoll_lt tcp_send_timeout = 10 tcp_syncnt = 2 tcp_delayed_ack = no tcp_max_connections = 1000000 tcp_accept_haproxy=no tcp_reuse_port = yes tcp_keepalive = yes tcp_keepcnt = 3 tcp_keepidle = 120 tcp_keepintvl = 75 tcp_listen_backlog = 10240 tcp_clone_rcvbuf=1 #tcp_msg_read_timeout=120 #tcp_msg_data_timeout=120 tcp_accept_iplimit =0 tcp_connection_lifetime=3615 tcp_accept_aliases=no # Enable SIP outbound TCP keep-alive using PING-PONG (CRLFCRLF - CRLF). tcp_crlf_ping=yes tcp_accept_no_cl=yes #!ifdef TCP_PROCESSES tcp_children=TCP_PROCESSES loadmodule "tcpops" modparam("tcpops", "closed_event", 2) event_route[tcp:closed] { route(TCPEVENT); event_route[tcp:reset] { route(TCPEVENT); event_route[tcp:timeout] { route(TCPEVENT);
ctylihuai left a comment (kamailio/kamailio#4238)
Its strange that you are seeing TCP operations in your backtrace, when you are saying that no TCP was configured for this node. Can you share the listen cfg?
Regarding the Kamailio 5.7.4 - its an old version which is not maintained anymore from the open source project. I would suggest to update to a maintained version, e.g. one of the latest 5.8.x or 6.0.x versions.
listen cfg config:
listen=tcp:PCSCF_IP:5460 #!define TCP_PROCESSES 48 tcp_async= yes # Alias name: tcp_buf_write tcp_conn_wq_max = 16384 tcp_rd_buf_size = 16384 tcp_wq_max = 16777216 #tcp_defer_accept = 5 #tcp_keepidle = 7200 #tcp_keepintvl = 75 #tcp_keecnt = 10 tcp_poll_method= epoll_lt tcp_send_timeout = 10 tcp_syncnt = 2 tcp_delayed_ack = no tcp_max_connections = 1000000 tcp_accept_haproxy=no tcp_reuse_port = yes tcp_keepalive = yes tcp_keepcnt = 3 tcp_keepidle = 120 tcp_keepintvl = 75 tcp_listen_backlog = 10240 tcp_clone_rcvbuf=1 #tcp_msg_read_timeout=120 #tcp_msg_data_timeout=120 tcp_accept_iplimit =0 tcp_connection_lifetime=3615 tcp_accept_aliases=no tcp_crlf_ping=yes tcp_accept_no_cl=yes #!ifdef TCP_PROCESSES tcp_children=TCP_PROCESSES loadmodule "tcpops" modparam("tcpops", "closed_event", 2) event_route[tcp:closed] { route(TCPEVENT); event_route[tcp:reset] { route(TCPEVENT); event_route[tcp:timeout] { route(TCPEVENT);
henningw left a comment (kamailio/kamailio#4238)
Ok, so you have configured TCP listener:
listen=tcp:PCSCF_IP:5460
As said, I would suggest trying to do an update to a new Kamailio version before investigating further.
ctylihuai left a comment (kamailio/kamailio#4238)
Ok, so you have configured TCP listener:
listen=tcp:PCSCF_IP:5460
As said, I would suggest trying to do an update to a new Kamailio version before investigating further.
1.The tcp_main process was not configured as a worker, yet a local connection in the main process triggered tcpconn_read_timeout. Is it possible that when sending a TCP CRLF ping, the connection is disconnected, and then reconnecting and reusing the object causes this? https://github.com/kamailio/kamailio/issues/3443? The situation is somewhat similar.
Will tcp_connection_match and tcp_reuse_port be helpful?
henningw left a comment (kamailio/kamailio#4238)
What do you mean with "the tcp_main process was not configured as a worker"? As said, I would recommend to do an update before investigating further.
ctylihuai left a comment (kamailio/kamailio#4238)
What do you mean with "the tcp_main process was not configured as a worker"? As said, I would recommend to do an update before investigating further.
tcp_main did not initialize tcp_reader_ltimer and io_w, The tcp_reader_ltimer should not have a value. (gdb) p tcp_reader_ltimer $1 = {prev_ticks = 0, timer_lst = {h0 = {{next = 0x0, prev = 0x0} <repeats 12336 times>, {next = 0x0, prev = 0xaaaac2a10158 <tcp_main_ltimer+134680>}, {next = 0x0, prev = 0x0} <repeats 4047 times>}, h1 = {{ next = 0x0, prev = 0x0} <repeats 512 times>}, h2 = {{next = 0x0, prev = 0x0} <repeats 512 times>}, expired = {next = 0x0, prev = 0x0}}}
Unfortunately, we are in a commercial environment now and cannot make major version upgrade .
henningw left a comment (kamailio/kamailio#4238)
Ok, you were referring to the source code, got it. Maybe somebody else can share some insight here as well. You can review the patches done to the relevant core parts in newer releases by yourself. Otherwise if you can not update, you can of course reach out to one of the companies in the business directory offering commercial support.
ctylihuai left a comment (kamailio/kamailio#4238)
Unable to pinpoint the exact cause, but connection reuse seems implicated. We'll enable tcp_no_connect = yes to test its impact. Thanks for the help.