Hi
Thanks for the reply. We are using the following limits and I guess they are the default values real-time non-blocking time (microseconds, -R) unlimited core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 15666 max locked memory (kbytes, -l) 8192 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 15666 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited
But to me, it looks more like a deadlock somewhere around SSL_accept. According to the stack trace generated with "kamctl trap", all 8 TCP receiver processes are waiting in "#1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6". I guess the message "Too many references: cannot splice (109)" is just the consequence as the main process is no longer able to dispatch jobs to the child workers.
Meanwhile, I'm able to reproduce the issue. It requires a bit of brute force, but it deadlocks reliably after a few seconds.
I'm flooding Kamailio with register messages using sipexer. I'm running the following command 3 times in parallel and always after a few seconds, Kamailio is completely blocked. watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer -register -vl 3 -co -com -ex 0 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061"
I played a bit with the parameters "modparam("tls", "init_mode", 1)" and "tls_threads_mode = 2", but it didn't seem to change anything.
Please have in mind that we usually do not stress Kamailio that much. We have seen the issue a couple of times with around 5 to 15 SIP phones registered. But in this usual setup, the problem does not occur reliably.
Best regards Mathias
Am Di., 18. März 2025 um 21:53 Uhr schrieb Henning Westerholt <hw@gilawa.com
:
Hello,
check the ulimit settings for the kamailio user and general resource usage on that system. In development branch there is also a feature present to print the actual ulimits on startup.
Cheers,
Henning
*From:* Mathias Schneuwly via sr-users sr-users@lists.kamailio.org *Sent:* Montag, 17. März 2025 13:41 *To:* sr-users@lists.kamailio.org *Cc:* Mathias Schneuwly mathias@schneuwlys.ch *Subject:* [SR-Users] send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)
Hi
We are using kamailio 5.8.4.
Recently, we noticed, that after a few days, we see the following messages from Kamailio `2025 Mar 17 10:22:11 ttel CRITICAL: <core> [core/pass_fd.c:193]: send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)`.
When this happens, devices are no longer able to register. At the moment we have no idea what might be wrong.
Unfortunately, the internet does not reveal much information regarding this issue. In one issue related to Kamailio ( https://lists.kamailio.org/mailman3/hyperkitty/list/sr-users@lists.kamailio....), I found the command `kamctl trap` and when I execute that, I see one issue, which might be related, but I'm just guessing...
---start 1782236 ----------------------------------------------------- [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 $1 = 26 $2 = {pid = 1782236, unix_sock = 36, idx = 0, status = 1, rank = 17, desc = "tcp receiver (generic) child=0", '\000' <repeats 97 times>} #0 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 No symbol table info available. #1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6 No symbol table info available. #2 0x00007fa14ac20b49 in CRYPTO_THREAD_write_lock () from /usr/lib/libcrypto.so.3 No symbol table info available. #3 0x00007fa14adbd5da in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #4 0x00007fa14adbd9ca in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #5 0x00007fa14adbc907 in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #6 0x00007fa14adbd26c in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #7 0x00007fa14adbd68e in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #8 0x00007fa14abe75c6 in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #9 0x00007fa14abe840b in EVP_RAND_generate () from /usr/lib/libcrypto.so.3 No symbol table info available. #10 0x00007fa14acf9282 in RAND_bytes_ex () from /usr/lib/libcrypto.so.3 No symbol table info available. #11 0x00007fa14b06cce8 in ?? () from /usr/lib/libssl.so.3 No symbol table info available. #12 0x00007fa14b05bdf7 in ?? () from /usr/lib/libssl.so.3 No symbol table info available. #13 0x00007fa14b0ef036 in tls_accept (c=c@entry=0x7fa1475c0dc0, error=error@entry=0x7ffebd9443b8) at tls_server.c:457 ret = <optimized out> ssl = warning: could not convert 'ssl_st' from the host encoding (ANSI_X3.4-1968) to UTF-32. This normally should not happen, please file a bug report. 0x7fa147668bd0 cert = <optimized out> tls_c = 0x7fa14768f490 tls_log = <optimized out> __func__ = "tls_accept" #14 0x00007fa14b0f7540 in tls_h_read_f (c=0x7fa1475c0dc0, flags=0x7ffebd9647d0) at tls_server.c:1153 r = 0x7fa1475c0ee8 bytes_free = 16383 bytes_read = <optimized out> read_size = <optimized out> ssl_error = 0 ssl_read = 0 ssl = 0x7fa147668bd0 rd_buf = "\026\003\003\004\016\v\000\004\n\000\004\a\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\026\0270\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006\003U\004\003\f\versaimfs.ch0\036\027\r230302163519Z\027\r291230163519Z0`1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003"... wr_buf = "\026\003\003\000A\002\000\000=\003\003\371\204\326\361\262-9\300\344gV\2316\373\235D\314\225u8\v}\215\254DOWNGRD\001\000\3000\000\000\025\377\001\000\001\000\000\v\000\004\003\000\001\002\000#\000\000\000\027\000\000\026\003\003\a\267\v\000\a\263\000\a\260\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\r>0\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006"... rd = {buf = 0x7ffebd944480 "\026\003\003\004\016\v", pos = 1405, used = 1405, size = 65536} wr = {buf = 0x7ffebd954480 "\026\003\003", pos = 0, used = 0, size = 65536} tls_c = 0x7fa14768f490 enc_rd_buf = <optimized out> n = 0 flush_flags = 0 err_src = <optimized out> ip_buf = '\000' <repeats 63 times> x = <optimized out> tls_dbg = <optimized out> __func__ = "tls_h_read_f" redo_read = <optimized out> continue_ssl_read = <optimized out> __llevel = <optimized out> __kld = <optimized out> __llevel = <optimized out> __kld = <optimized out> #15 0x000055d2519871b4 in tcp_read_headers (c=c@entry=0x7fa1475c0dc0, read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:445 bytes = <optimized out> remaining = <optimized out> p = <optimized out> r = 0x7fa1475c0ee8 mc = <optimized out> body_len = <optimized out> tvnow = {tv_sec = 1742100857, tv_usec = 944923} tvdiff = <optimized out> mfline = <optimized out> mtransid = <optimized out> __func__ = "tcp_read_headers" #16 0x000055d251989cef in tcp_read_req (con=0x7fa1475c0dc0, bytes_read=bytes_read@entry=0x7ffebd9647c8, read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:1509 bytes = <optimized out> total_bytes = 0 resp = 1 size = <optimized out> req = 0x7fa1475c0ee8 dst = {send_sock = 0x55d251a91020, to = {s = {sa_family = 52024, sa_data = "\257Q\322U\000\000\002\000\000\000\376\177\000"}, sin = {sin_family = 52024, sin_port = 20911, sin_addr = {s_addr = 21970}, sin_zero = "\002\000\000\000\376\177\000"}, sin6 = {sin6_family = 52024, sin6_port = 20911, sin6_flowinfo = 21970, sin6_addr = {__in6_u = {__u6_addr8 = "\002\000\000\000\376\177\000\000\030\000\000\000\000\000\000", __u6_addr16 = {2, 0, 32766, 0, 24, 0, 0, 0}, __u6_addr32 = {2, 32766, 24, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 52024, __ss_padding = "\257Q\322U\000\000\002\000\000\000\376\177\000\000\030", '\000' <repeats 23 times>, "\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\262+\226Q\001 \000\000\200,5L\241\177\000\000 \020\251Q\322U\000\000 \020\251Q\322U\000\000\200\303\257Q\322U\000\00009SG\241\177\000\000\330G\226\275\376\177\000\000\b\000\000\000\000\000\000", __ss_align = 20}}, id = 1, send_flags = {f = 1, blst_imask = 0}, proto = 14 '\016', proto_pad0 = 0 '\000', proto_pad1 = 0} c = <optimized out> ret = <optimized out> again = <optimized out> __func__ = "tcp_read_req" #17 0x000055d25198ebec in handle_io (fm=fm@entry=0x7fa14c352c80, events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906 ret = 8 n = 0 read_flags = RD_CONN_SHORT_READ con = 0x7fa1475c0dc0 s = 14 resp = <optimized out> t = <optimized out> ee = 0x0 __func__ = "handle_io" error = <optimized out> #18 0x000055d25199469b in io_wait_loop_epoll (repeat=repeat@entry=0, t=2, h=0x55d251c49d60 <io_w>) at core/io_wait.h:1075 n = 1 r = 0 fm = 0x7fa14c352c80 revents = 1 __func__ = "io_wait_loop_epoll" #19 0x000055d251994e87 in tcp_receive_loop (unix_sock=<optimized out>) at core/tcp_read.c:2026 __func__ = "tcp_receive_loop" #20 0x000055d25197e72c in tcp_init_children (woneinit=woneinit@entry=0x7ffebd964d1c) at core/tcp_main.c:5391 r = <optimized out> i = <optimized out> reader_fd_1 = 39 pid = <optimized out> si_desc = "tcp receiver (generic)\000\000`M\226\275\376\177\000\000\001\000\000\000\000\000\000\000\210\035\223Q\322U\000\000 \020\251Q\322U\000\000\220\315\257Q\322U\000\000\220\242/L\241\177\000\000\025\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000%\020\251Q\322U\000\000HU\257Q\322U\000\000e\005\000\000\000\000\000\000 \020\251Q\322U\000\000\3750\235Q\322U\000" si = <optimized out> __func__ = "tcp_init_children" error = <optimized out> #21 0x000055d2517c6c7e in main_loop () at main.c:1950 i = <optimized out> pid = <optimized out> si = <optimized out> si_desc = "udp receiver child=7 sock=10.0.90.1:5060\000\000\000\000\322U\000\000?\342\017K\241\177\000\000\022\000\000\000\377\377\377\377\200\006\265L\241\177\000\000\253m\237L\241\177\000\000\361|\003\000\000\000\000\000\t\000\000\000\000\000\000\000Sat Mar 15 05:25:32 2025\n\000\264L\241\177\000" nrprocs = <optimized out> woneinit = 1 __func__ = "main_loop" error = <optimized out> #22 0x000055d2517bc151 in main (argc=<optimized out>, argv=<optimized out>) at main.c:3256 cfg_stream = <optimized out> c = <optimized out> r = <optimized out> tmp = 0x7ffebd966e7f "" tmp_len = 32673 port = 5060 proto = 0 aproto = 0 ahost = 0x0 aport = 0 options = 0x55d251aa74b8 ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 3209119183 rfd = <optimized out> debug_save = <optimized out> debug_flag = <optimized out> dont_fork_cnt = 0 n_lst = <optimized out> p = <optimized out> st = {st_dev = 24, st_ino = 46237, st_nlink = 2, st_mode = 16888, st_uid = 899, st_gid = 899, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_mtim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_ctim = {tv_sec = 1742012732, tv_nsec = 217328715}, __glibc_reserved = {0, 0, 0}} l1 = <optimized out> tbuf = '\000' <repeats 32 times>, "\030\202\237\275\376\177\000\000@\002\000\000@\003\000\000\001", '\000' <repeats 23 times>, "@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003", '\000' <repeats 11 times>, "\001", '\000' <repeats 118 times>... option_index = 12 long_options = {{name = 0x55d251a9230b "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x55d251a937a4 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x55d251a9826a "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x55d251a92310 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x55d251a92316 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x55d251a9231f "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x55d251a92329 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x55d251a92333 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x55d251a9233e "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x55d251a92347 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x55d251a938c1 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x55d251a92352 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x55d251a9235c "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x55d251a92363 "all-errors", has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" [Inferior 1 (process 1782236) detached] ---end 1782236 -------------------------------------------------------
Does anybody have an idea what this could be? I never saw that with previous versions, and now we can see it on several instances, but we are yet unable to reproduce.
Not sure if this might be related, but we also see many messages like `Mar 17 13:39:03 ttel /usr/sbin/kamailio[2882127]: CRITICAL: <core> [core/tcp_main.c:5558]: tcp_timer_check_connections(): message processing timeout on connection id: 402 (state: 0) - closing`
Best regards
Mathias