Thank you, Henning!
Here are the configured limits for Kamailio:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31543                31543                processes
Max open files            16384                16384                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31543                31543                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
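
For reference, a listing in this layout can also be read programmatically. The following is only a minimal sketch: it assumes the text comes from /proc/<pid>/limits on Linux, where columns are separated by runs of two or more spaces. The function names parse_limits and read_limits and the SAMPLE text are made up for illustration.

```python
import re


def parse_limits(text):
    """Parse text in the /proc/<pid>/limits layout into {name: (soft, hard, units)}."""
    limits = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Limit"):
            continue  # skip blank lines and the header row
        # Assumption: columns are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line)
        if len(fields) < 3:
            continue  # not a limit row
        units = fields[3] if len(fields) > 3 else ""  # some rows have no units
        limits[fields[0]] = (fields[1], fields[2], units)
    return limits


def read_limits(pid):
    """Read the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_limits(handle.read())


# Shortened sample in the same layout as the listing above.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max open files            16384                16384                files
Max nice priority         0                    0
"""
```

Calling read_limits() with the PID of the Kamailio main process would show the limits the running process actually has, which can differ from what a login shell of the kamailio user reports.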
This is what I know so far:
There seems to be a problem with the siptrace module (which sends via TCP to a heplify-server on another host), but I’m not sure whether this is a coincidence or the cause of the problems.
I see quite a few of these before the actual problem arises:
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: <core> [core/tcp_main.c:630]: _wbufq_add(): (591 bytes): write queue full or timeout (32498, total 43196, last write 0 s ago)
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: siptrace [../../core/forward.h:261]: msg_send_buffer(): tcp_send failed
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: siptrace [siptrace_hep.c:229]: trace_send_hep3_duplicate(): cannot send hep duplicate message
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/tcp_main.c:4023]: handle_ser_child(): received CON_ERROR for 0x7fe054f8ae10 (id 727321), refcnt 2, flags 0x3096
At some point everything goes south and the log is flooded with these messages:
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: CRITICAL: <core> [core/io_wait.h:596]: io_watch_del(): invalid fd 2244, not in [0, 482)
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/tcp_main.c:4677]: handle_tcpconn_ev(): io_watch_del(3) failed: for 0x7fe054f8ae10, fd 2244
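
To get an overview of which descriptors are reported and against which range, the "invalid fd" messages can be pulled out of the log with a small script. This is only a sketch that assumes the message wording shown above; the names FD_RANGE and out_of_range_fds are made up for illustration.

```python
import re

# Matches e.g. "invalid fd 2244, not in [0, 482)".
FD_RANGE = re.compile(r"invalid fd (\d+), not in \[(\d+), (\d+)\)")


def out_of_range_fds(lines):
    """Yield (fd, low, high) for every 'invalid fd' message found in lines."""
    for line in lines:
        match = FD_RANGE.search(line)
        if match:
            yield tuple(int(group) for group in match.groups())
```

Feeding the log file line by line into out_of_range_fds() yields one tuple per matching message, which makes it easy to see whether the reported fd numbers keep growing.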
And then ultimately the above combined with the initially mentioned message:
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: CRITICAL: <core> [core/tcp_main.c:4264]: handle_ser_child(): failed to add new socket to the fd list
And occasionally also something like this:
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/io_wait.h:373]: io_watch_add(): trying to overwrite entry 2247 watched for 5 in the hash 0x5621dc404ee0 (fd:-601539712, type:22049, data:0x5621dc23c004) with (2247, 2, 0x7fe04f85c2a0)
In total these add up to roughly 3600 lines per second (!!), so the log is flooded very quickly.
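
A rate like this can be measured by grouping the log lines by their timestamp. The sketch below assumes the classic syslog prefix seen above (month, day, HH:MM:SS as the first three whitespace-separated fields); the function name lines_per_second is made up for illustration.

```python
from collections import Counter


def lines_per_second(lines):
    """Count log lines per timestamp, keyed by 'Mon D HH:MM:SS'."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 3)  # month, day, time, rest of the line
        if len(parts) >= 3:
            counts[" ".join(parts[:3])] += 1
    return counts
```

Calling most_common() on the returned Counter lists the busiest seconds first.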
Florian FLOIMAIR
Software Development - Symphony Cloud Services (1568)
From: Henning Westerholt <hw@gilawa.com>
Date: Monday, 11 November 2024 at 19:52
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Cc: Floimair Florian <f.floimair@commend.com>
Subject: [External] RE: Question regarding error message "failed to add new socket to the fd list"
Hello Florian,
Are there any other error messages before that error happens, e.g. something about memory?
Otherwise, it could indeed be a ulimit issue. The first step would be to check the actual configured limits for the kamailio user.
Cheers,
Henning
From: Floimair Florian via sr-users <sr-users@lists.kamailio.org>
Sent: Monday, 11 November 2024 17:45
To: sr-users@lists.kamailio.org
Cc: Floimair Florian <f.floimair@commend.com>
Subject: [SR-Users] Question regarding error message "failed to add new socket to the fd list"
Hi!
We recently had issues with one of our production Kamailio instances.
When they occurred, the log was filled with the following message:
CRITICAL: <core> [core/tcp_main.c:4528]: handle_new_connect(): failed to add new socket to the fd list
Now I wonder what the best approach is to prevent this.
We are using TCP/TLS only, and I think this might be related to the open files ulimit, but I am not sure about that.
Shared memory is set to 512 MB.
Can you give me a hint on what to look for?
Thank you very much!
P.S.: Sorry, I accidentally replied earlier to a post on a different, totally unrelated topic (I think I should stop working for today 😉)
FLORIAN FLOIMAIR
Software Development - Symphony Cloud Services
Commend International GmbH
Saalachstrasse 51
5020 Salzburg, Austria
commend.com
LG Salzburg / FN 178618z