Hello Mathias,
1024 open files are not that much, you should increase it for the kamailio user. Maybe also check the actual limits by “cat /proc/$kamailio-pid/limits”
Not 100% if its related, but if you are looking to scale this would be for sure required.
About the tls_thread_mode – you should use mode 2 if you are running with openssl 3.x, otherwise you probably will observe random crashes.
If you are seeing the mentioned problem with 15 phones, this is clearly a problem and should be further investigated.
Cheers,
Henning
From: Mathias Schneuwly via sr-users <sr-users@lists.kamailio.org>
Sent: Donnerstag, 20. März 2025 08:07
To: sr-users@lists.kamailio.org
Cc: Mathias Schneuwly <mathias@schneuwlys.ch>
Subject: [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)
Hi
Thanks for the reply. We are using the following limits and I guess they are the default values
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15666
max locked memory (kbytes, -l) 8192
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15666
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
But to me, it looks more like a deadlock somewhere around SSL_accept. According to the stack trace generated with "kamctl trap", all 8 TCP receiver processes are waiting in "#1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6".
I guess the message "Too many references: cannot splice (109)" is just the consequence as the main process is no longer able to dispatch jobs to the child workers.
Meanwhile, I'm able to reproduce the issue. It requires a bit of brute force, but it deadlocks reliably after a few seconds.
I'm flooding Kamailio with register messages using sipexer. I'm running the following command 3 times in parallel and always after a few seconds, Kamailio is completely blocked.
watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru
sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer -register -vl 3 -co -com -ex 0 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru
sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061"
I played a bit with the parameters "modparam("tls", "init_mode", 1)" and "tls_threads_mode = 2", but it didn't seem to change anything.
Please have in mind that we usually do not stress Kamailio that much. We have seen the issue a couple of times with around 5 to 15 SIP phones registered. But in this usual setup, the problem does not occur reliably.
Best regards
Mathias
Am Di., 18. März 2025 um 21:53 Uhr schrieb Henning Westerholt <hw@gilawa.com>:
Hello,
check the ulimit settings for the kamailio user and general resource usage on that system. In development branch there is also a feature present to print the actual ulimits on startup.
Cheers,
Henning
From: Mathias Schneuwly via sr-users <sr-users@lists.kamailio.org>
Sent: Montag, 17. März 2025 13:41
To: sr-users@lists.kamailio.org
Cc: Mathias Schneuwly <mathias@schneuwlys.ch>
Subject: [SR-Users] send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)
Hi
We are using kamailio 5.8.4.
Recently, we noticed, that after a few days, we see the following messages from Kamailio `2025 Mar 17 10:22:11 ttel CRITICAL: <core> [core/pass_fd.c:193]: send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)`.
When this happens, devices are no longer able to register. At the moment we have no idea what might be wrong.
Unfortunately, the internet does not reveal much information regarding this issue. In one issue related to Kamailio (https://lists.kamailio.org/mailman3/hyperkitty/list/sr-users@lists.kamailio.org/thread/FKN25CGV27KA3GTS3ED3E7V3WKRZALRP/), I found the command `kamctl trap` and when I execute that, I see one issue, which might be related, but I'm just guessing...
```
---start 1782236 -----------------------------------------------------
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6
$1 = 26
$2 = {pid = 1782236, unix_sock = 36, idx = 0, status = 1, rank = 17, desc = "tcp receiver (generic) child=0", '\000' <repeats 97 times>}
#0 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6
No symbol table info available.
#2 0x00007fa14ac20b49 in CRYPTO_THREAD_write_lock () from /usr/lib/libcrypto.so.3
No symbol table info available.
#3 0x00007fa14adbd5da in ?? () from /usr/lib/libcrypto.so.3
No symbol table info available.
#4 0x00007fa14adbd9ca in ?? () from /usr/lib/libcrypto.so.3
No symbol table info available.
#5 0x00007fa14adbc907 in ?? () from /usr/lib/libcrypto.so.3
No symbol table info available.
#6 0x00007fa14adbd26c in ?? () from /usr/lib/libcrypto.so.3
No symbol table info available.
#7 0x00007fa14adbd68e in ?? () from /usr/lib/libcrypto.so.3
No symbol table info available.
#8 0x00007fa14abe75c6 in ?? () from /usr/lib/libcrypto.so.3
No symbol table info available.
#9 0x00007fa14abe840b in EVP_RAND_generate () from /usr/lib/libcrypto.so.3
No symbol table info available.
#10 0x00007fa14acf9282 in RAND_bytes_ex () from /usr/lib/libcrypto.so.3
No symbol table info available.
#11 0x00007fa14b06cce8 in ?? () from /usr/lib/libssl.so.3
No symbol table info available.
#12 0x00007fa14b05bdf7 in ?? () from /usr/lib/libssl.so.3
No symbol table info available.
#13 0x00007fa14b0ef036 in tls_accept (c=c@entry=0x7fa1475c0dc0, error=error@entry=0x7ffebd9443b8) at tls_server.c:457
ret = <optimized out>
ssl = warning: could not convert 'ssl_st' from the host encoding (ANSI_X3.4-1968) to UTF-32.
This normally should not happen, please file a bug report.
0x7fa147668bd0
cert = <optimized out>
tls_c = 0x7fa14768f490
tls_log = <optimized out>
__func__ = "tls_accept"
#14 0x00007fa14b0f7540 in tls_h_read_f (c=0x7fa1475c0dc0, flags=0x7ffebd9647d0) at tls_server.c:1153
r = 0x7fa1475c0ee8
bytes_free = 16383
bytes_read = <optimized out>
read_size = <optimized out>
ssl_error = 0
ssl_read = 0
ssl = 0x7fa147668bd0
rd_buf = "\026\003\003\004\016\v\000\004\n\000\004\a\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\026\0270\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006\003U\004\003\f\versaimfs.ch0\036\027\r230302163519Z\027\r291230163519Z0`1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003"...
wr_buf = "\026\003\003\000A\002\000\000=\003\003\371\204\326\361\262-9\300\344gV\2316\373\235D\314\225u8\v}\215\254DOWNGRD\001\000\3000\000\000\025\377\001\000\001\000\000\v\000\004\003\000\001\002\000#\000\000\000\027\000\000\026\003\003\a\267\v\000\a\263\000\a\260\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\r>0\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006"...
rd = {buf = 0x7ffebd944480 "\026\003\003\004\016\v", pos = 1405, used = 1405, size = 65536}
wr = {buf = 0x7ffebd954480 "\026\003\003", pos = 0, used = 0, size = 65536}
tls_c = 0x7fa14768f490
enc_rd_buf = <optimized out>
n = 0
flush_flags = 0
err_src = <optimized out>
ip_buf = '\000' <repeats 63 times>
x = <optimized out>
tls_dbg = <optimized out>
__func__ = "tls_h_read_f"
redo_read = <optimized out>
continue_ssl_read = <optimized out>
__llevel = <optimized out>
__kld = <optimized out>
__llevel = <optimized out>
__kld = <optimized out>
#15 0x000055d2519871b4 in tcp_read_headers (c=c@entry=0x7fa1475c0dc0, read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:445
bytes = <optimized out>
remaining = <optimized out>
p = <optimized out>
r = 0x7fa1475c0ee8
mc = <optimized out>
body_len = <optimized out>
tvnow = {tv_sec = 1742100857, tv_usec = 944923}
tvdiff = <optimized out>
mfline = <optimized out>
mtransid = <optimized out>
__func__ = "tcp_read_headers"
#16 0x000055d251989cef in tcp_read_req (con=0x7fa1475c0dc0, bytes_read=bytes_read@entry=0x7ffebd9647c8, read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:1509
bytes = <optimized out>
total_bytes = 0
resp = 1
size = <optimized out>
req = 0x7fa1475c0ee8
dst = {send_sock = 0x55d251a91020, to = {s = {sa_family = 52024, sa_data = "\257Q\322U\000\000\002\000\000\000\376\177\000"}, sin = {sin_family = 52024, sin_port = 20911, sin_addr = {s_addr = 21970}, sin_zero = "\002\000\000\000\376\177\000"}, sin6 = {sin6_family = 52024, sin6_port = 20911, sin6_flowinfo = 21970, sin6_addr = {__in6_u = {__u6_addr8 = "\002\000\000\000\376\177\000\000\030\000\000\000\000\000\000", __u6_addr16 = {2, 0, 32766, 0, 24, 0, 0, 0}, __u6_addr32 = {2, 32766, 24, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 52024, __ss_padding = "\257Q\322U\000\000\002\000\000\000\376\177\000\000\030", '\000' <repeats 23 times>, "\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\262+\226Q\001 \000\000\200,5L\241\177\000\000 \020\251Q\322U\000\000 \020\251Q\322U\000\000\200\303\257Q\322U\000\00009SG\241\177\000\000\330G\226\275\376\177\000\000\b\000\000\000\000\000\000", __ss_align = 20}}, id = 1, send_flags = {f = 1, blst_imask = 0}, proto = 14 '\016', proto_pad0 = 0 '\000', proto_pad1 = 0}
c = <optimized out>
ret = <optimized out>
again = <optimized out>
__func__ = "tcp_read_req"
#17 0x000055d25198ebec in handle_io (fm=fm@entry=0x7fa14c352c80, events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906
ret = 8
n = 0
read_flags = RD_CONN_SHORT_READ
con = 0x7fa1475c0dc0
s = 14
resp = <optimized out>
t = <optimized out>
ee = 0x0
__func__ = "handle_io"
error = <optimized out>
#18 0x000055d25199469b in io_wait_loop_epoll (repeat=repeat@entry=0, t=2, h=0x55d251c49d60 <io_w>) at core/io_wait.h:1075
n = 1
r = 0
fm = 0x7fa14c352c80
revents = 1
__func__ = "io_wait_loop_epoll"
#19 0x000055d251994e87 in tcp_receive_loop (unix_sock=<optimized out>) at core/tcp_read.c:2026
__func__ = "tcp_receive_loop"
#20 0x000055d25197e72c in tcp_init_children (woneinit=woneinit@entry=0x7ffebd964d1c) at core/tcp_main.c:5391
r = <optimized out>
i = <optimized out>
reader_fd_1 = 39
pid = <optimized out>
si_desc = "tcp receiver (generic)\000\000`M\226\275\376\177\000\000\001\000\000\000\000\000\000\000\210\035\223Q\322U\000\000 \020\251Q\322U\000\000\220\315\257Q\322U\000\000\220\242/L\241\177\000\000\025\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000%\020\251Q\322U\000\000HU\257Q\322U\000\000e\005\000\000\000\000\000\000 \020\251Q\322U\000\000\3750\235Q\322U\000"
si = <optimized out>
__func__ = "tcp_init_children"
error = <optimized out>
#21 0x000055d2517c6c7e in main_loop () at main.c:1950
i = <optimized out>
pid = <optimized out>
si = <optimized out>
si_desc = "udp receiver child=7 sock=10.0.90.1:5060\000\000\000\000\322U\000\000?\342\017K\241\177\000\000\022\000\000\000\377\377\377\377\200\006\265L\241\177\000\000\253m\237L\241\177\000\000\361|\003\000\000\000\000\000\t\000\000\000\000\000\000\000Sat Mar 15 05:25:32 2025\n\000\264L\241\177\000"
nrprocs = <optimized out>
woneinit = 1
__func__ = "main_loop"
error = <optimized out>
#22 0x000055d2517bc151 in main (argc=<optimized out>, argv=<optimized out>) at main.c:3256
cfg_stream = <optimized out>
c = <optimized out>
r = <optimized out>
tmp = 0x7ffebd966e7f ""
tmp_len = 32673
port = 5060
proto = 0
aproto = 0
ahost = 0x0
aport = 0
options = 0x55d251aa74b8 ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
ret = -1
seed = 3209119183
rfd = <optimized out>
debug_save = <optimized out>
debug_flag = <optimized out>
dont_fork_cnt = 0
n_lst = <optimized out>
p = <optimized out>
st = {st_dev = 24, st_ino = 46237, st_nlink = 2, st_mode = 16888, st_uid = 899, st_gid = 899, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_mtim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_ctim = {tv_sec = 1742012732, tv_nsec = 217328715}, __glibc_reserved = {0, 0, 0}}
l1 = <optimized out>
tbuf = '\000' <repeats 32 times>, "\030\202\237\275\376\177\000\000@\002\000\000@\003\000\000\001", '\000' <repeats 23 times>, "@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@\003", '\000' <repeats 11 times>, "\001", '\000' <repeats 118 times>...
option_index = 12
long_options = {{name = 0x55d251a9230b "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x55d251a937a4 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x55d251a9826a "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x55d251a92310 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x55d251a92316 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x55d251a9231f "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x55d251a92329 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x55d251a92333 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x55d251a9233e "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x55d251a92347 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x55d251a938c1 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x55d251a92352 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x55d251a9235c "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x55d251a92363 "all-errors", has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
__func__ = "main"
[Inferior 1 (process 1782236) detached]
---end 1782236 -------------------------------------------------------```
Does anybody have an idea what this could be? I never saw that with previous versions, and now we can see it on several instances, but we are yet unable to reproduce.
Not sure if this might be related, but we also see many messages like `Mar 17 13:39:03 ttel /usr/sbin/kamailio[2882127]: CRITICAL: <core> [core/tcp_main.c:5558]: tcp_timer_check_connections(): message processing timeout on connection id: 402 (state: 0) - closing`
Best regards
Mathias