Hello Florian,

 

This does indeed look like resource exhaustion, probably caused by an interaction between different modules.

 

If it happened only once, it may be difficult to reproduce. If it happens frequently, it probably makes sense to investigate further in the direction of the siptrace module.

 

Cheers,

 

Henning

 

From: Floimair Florian via sr-users <sr-users@lists.kamailio.org>
Sent: Tuesday, 12 November 2024 11:07
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Cc: Floimair Florian <f.floimair@commend.com>
Subject: [SR-Users] FW: [External] RE: Question regarding error message "failed to add new socket to the fd list"

 

Thank you Henning!

Here are the configured limits for Kamailio:

 

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31543                31543                processes
Max open files            16384                16384                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31543                31543                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
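
To judge how close a process actually gets to the "Max open files" limit listed above, the limits can be compared with the number of descriptors it currently holds. The following is a minimal sketch, assuming a Linux host and that the table above has the layout of /proc/<pid>/limits; the function names are made up for illustration and are not part of Kamailio.

```python
import os
import re


def parse_limits(text):
    """Parse /proc/<pid>/limits-style text into {name: (soft, hard)}.

    "unlimited" becomes None; numeric limits become int.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    limits = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Limit"):
            continue  # skip blank lines and the header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line)
        if len(fields) < 3:
            continue
        limits[fields[0]] = (to_int(fields[1]), to_int(fields[2]))
    return limits


def open_fd_count(pid):
    """Count the file descriptors currently open in process `pid` (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid):
    """Return (open_fds, soft_limit) for one process."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_limits(fh.read())["Max open files"]
    return open_fd_count(pid), soft
```

Calling `fd_usage(pid)` for each Kamailio process (as root or as the kamailio user) at intervals would show whether descriptor usage creeps towards 16384 before the errors start.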

 

 

This is what I know so far:

There seems to be a problem with the siptrace module (which sends via TCP to a heplify-server on another host), but I am not sure whether this is a coincidence or the cause of the problems.
I see quite a few of these before the actual problem arises:


Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: <core> [core/tcp_main.c:630]: _wbufq_add(): (591 bytes): write queue full or timeout  (32498, total 43196, last write 0 s ago)

Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: siptrace [../../core/forward.h:261]: msg_send_buffer(): tcp_send failed

Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: siptrace [siptrace_hep.c:229]: trace_send_hep3_duplicate(): cannot send hep duplicate message

Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/tcp_main.c:4023]: handle_ser_child(): received CON_ERROR for 0x7fe054f8ae10 (id 727321), refcnt 2, flags 0x3096
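
One possible reading of the first line, offered as an interpretation rather than something confirmed in this thread: 591 is the size of the message being queued, 32498 the bytes already queued on that connection, and 43196 the total queued across all connections. Assuming the per-connection limit is the documented default of the core parameter tcp_conn_wq_max (32768 bytes; the value configured on this system is not shown), the arithmetic would be:

```python
# Assumption: the per-connection write-queue limit is the documented default
# of Kamailio's tcp_conn_wq_max core parameter (32768 bytes). The value
# actually configured on the affected system is not given in the thread.
ASSUMED_CONN_WQ_MAX = 32 * 1024


def exceeds_conn_queue(queued, size, limit=ASSUMED_CONN_WQ_MAX):
    """True if adding `size` bytes to `queued` bytes would exceed `limit`."""
    return queued + size > limit


# Values from the first log line: 591 bytes to add, 32498 already queued.
print(exceeds_conn_queue(32498, 591))  # 32498 + 591 = 33089 > 32768
```

If that reading holds, the connection to the heplify-server is not draining fast enough, which would fit the suspicion about siptrace.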

 

 

 

At some point everything goes south and the log is flooded with these messages:

Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: CRITICAL: <core> [core/io_wait.h:596]: io_watch_del(): invalid fd 2244, not in [0, 482)

Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/tcp_main.c:4677]: handle_tcpconn_ev(): io_watch_del(3) failed: for 0x7fe054f8ae10, fd 2244

 

 

Ultimately, the above appear together with the message I mentioned initially:

Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: CRITICAL: <core> [core/tcp_main.c:4264]: handle_ser_child(): failed to add new socket to the fd list

 

Occasionally there is also something like this:

Nov  4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/io_wait.h:373]: io_watch_add(): trying to overwrite entry 2247 watched for 5 in the hash 0x5621dc404ee0 (fd:-601539712, type:22049, data:0x5621dc23c004) with (2247, 2, 0x7fe04f85c2a0)

 

 

In total these add up to roughly 3600 lines per second (!!), so the log floods very quickly.
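
With that much output, it can help to see which messages dominate and which appear first. Below is a minimal sketch that tallies lines per second, level and function; the regular expression is written only against the line shapes quoted in this message and may need adjusting for other syslog formats.

```python
import re
from collections import Counter

# Matches syslog lines shaped like the ones quoted in this message.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ \S+\[(?P<pid>\d+)\]: "
    r"(?P<level>\w+): (?P<module>\S+) \[[^\]]+\]: (?P<func>\w+)\(\):"
)


def tally(lines):
    """Count log lines per (timestamp, level, function)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m["ts"], m["level"], m["func"])] += 1
    return counts
```

Passing an open log file to `tally()` and printing `most_common(10)` of the result gives the heaviest message types per second; lines that do not match the pattern are ignored.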

 

Florian FLOIMAIR
Software Development - Symphony Cloud Services (1568)

 

 

 

From: Henning Westerholt <hw@gilawa.com>
Date: Monday, 11 November 2024 at 19:52
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Cc: Floimair Florian <f.floimair@commend.com>
Subject: [External] RE: Question regarding error message "failed to add new socket to the fd list"

Hello Florian,

 

Are there any other error messages before that error occurs, e.g. something about memory?

 

Otherwise, it could indeed be a ulimit issue. The first step would be to check the limits actually configured for the kamailio user.

 

Cheers,

 

Henning

 

From: Floimair Florian via sr-users <sr-users@lists.kamailio.org>
Sent: Monday, 11 November 2024 17:45
To: sr-users@lists.kamailio.org
Cc: Floimair Florian <f.floimair@commend.com>
Subject: [SR-Users] Question regarding error message "failed to add new socket to the fd list"

 

Hi!

 

We recently had issues with one of our production Kamailio instances.
When they occurred, the log was filled with the following message:

CRITICAL: <core> [core/tcp_main.c:4528]: handle_new_connect(): failed to add new socket to the fd list

 

Now I wonder what the best approach is to prevent this.

We use TCP/TLS only, and I think this might be related to the open-files ulimit, but I am not sure.

Shared memory is set to 512 MB.

Can you give me a hint on what to look for?

Thank you very much!

 

 

P.S.: Sorry, I accidentally replied to a previous post of a different topic before which is totally unrelated (I think I should stop working for today 😉)

 

 

 

FLORIAN FLOIMAIR
Software Development - Symphony Cloud Services

Commend International GmbH
Saalachstrasse 51
5020 Salzburg, Austria

commend.com

LG Salzburg / FN 178618z