Thank you Henning!
Here are the configured limits for Kamailio:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31543                31543                processes
Max open files            16384                16384                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31543                31543                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
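For anyone who wants to check these values programmatically, here is a minimal sketch. It assumes the listing is in the layout of /proc/<pid>/limits on Linux (columns separated by two or more spaces); the function name parse_limits and the shortened SAMPLE text are only illustrative.

```python
import re

def parse_limits(text):
    """Parse text laid out like /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines():
        # Limit names contain single spaces; columns are separated by 2+ spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) < 3 or cols[0] == "Limit":
            continue  # skip blank lines and the header row
        limits[cols[0]] = (cols[1], cols[2])
    return limits

# Shortened sample using values from the listing above.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max stack size            8388608              unlimited            bytes
Max processes             31543                31543                processes
Max open files            16384                16384                files
Max nice priority         0                    0
"""

limits = parse_limits(SAMPLE)
soft, hard = limits["Max open files"]
print(soft, hard)  # prints: 16384 16384
```

On a live system the same function could be fed the contents of /proc/<pid>/limits for the Kamailio main process, which shows the limits the running process actually has rather than what a login shell of the kamailio user would get.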
This is what I know so far:
There seems to be a problem with the siptrace module (sending via TCP to a heplify-server on another host), but I'm not sure whether this is a coincidence or the cause of the problems. I see quite a few of these before the actual problem arises:

Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: <core> [core/tcp_main.c:630]: _wbufq_add(): (591 bytes): write queue full or timeout (32498, total 43196, last write 0 s ago)
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: siptrace [../../core/forward.h:261]: msg_send_buffer(): tcp_send failed
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322822]: ERROR: siptrace [siptrace_hep.c:229]: trace_send_hep3_duplicate(): cannot send hep duplicate message
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/tcp_main.c:4023]: handle_ser_child(): received CON_ERROR for 0x7fe054f8ae10 (id 727321), refcnt 2, flags 0x3096
At some point everything goes south and the log is flooded with these messages:

Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: CRITICAL: <core> [core/io_wait.h:596]: io_watch_del(): invalid fd 2244, not in [0, 482)
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/tcp_main.c:4677]: handle_tcpconn_ev(): io_watch_del(3) failed: for 0x7fe054f8ae10, fd 2244
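To make the numbers in the io_watch_del() message easier to compare across many log lines, here is a small sketch that extracts the rejected descriptor and the range the message reports. The regular expression is written against the exact wording quoted above; the helper name check_fd is hypothetical.

```python
import re

# Matches the wording of the io_watch_del() message quoted above.
INVALID_FD = re.compile(r"invalid fd (\d+), not in \[(\d+), (\d+)\)")

def check_fd(line):
    """Return (fd, low, high, in_range), or None if the line does not match."""
    m = INVALID_FD.search(line)
    if m is None:
        return None
    fd, low, high = (int(g) for g in m.groups())
    # The message uses a half-open interval: low <= fd < high.
    return fd, low, high, low <= fd < high

line = ("CRITICAL: <core> [core/io_wait.h:596]: io_watch_del(): "
        "invalid fd 2244, not in [0, 482)")
print(check_fd(line))  # prints: (2244, 0, 482, False)
```

Note that 2244 is well below the 16384 open-files limit, so the range in the message is evidently something other than that ulimit. What sets the upper bound of 482 is not stated in the log and is left open here.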
Ultimately, the above appear together with the message I mentioned initially:
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: CRITICAL: <core> [core/tcp_main.c:4264]: handle_ser_child(): failed to add new socket to the fd list
Occasionally there is also something like this:
Nov 4 07:54:01 kamailio-prod-westeurope-2 /usr/sbin/kamailio[322829]: ERROR: <core> [core/io_wait.h:373]: io_watch_add(): trying to overwrite entry 2247 watched for 5 in the hash 0x5621dc404ee0 (fd:-601539712, type:22049, data:0x5621dc23c004) with (2247, 2, 0x7fe04f85c2a0)
In total these add up to roughly 3600 log lines per second (!!), so the log fills up very quickly.
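A rate like this can be broken down per second and per reporting function with a short script. The following is only a sketch: it assumes the syslog layout of the excerpts in this mail ("Mon D HH:MM:SS host prog[pid]: LEVEL: ... func(): message"), and the name count_per_second is made up for illustration.

```python
import re
from collections import Counter

# Captures timestamp, log level and the first "name():" after the level.
LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s.*?\]:\s+(\w+):.*?(\w+)\(\):"
)

def count_per_second(lines):
    """Count log lines grouped by (timestamp, level, function)."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m:
            counts[m.groups()] += 1
    return counts

sample = [
    "Nov 4 07:54:01 host /usr/sbin/kamailio[322829]: CRITICAL: <core> "
    "[core/io_wait.h:596]: io_watch_del(): invalid fd 2244, not in [0, 482)",
    "Nov 4 07:54:01 host /usr/sbin/kamailio[322829]: CRITICAL: <core> "
    "[core/io_wait.h:596]: io_watch_del(): invalid fd 2244, not in [0, 482)",
    "Nov 4 07:54:01 host /usr/sbin/kamailio[322822]: ERROR: siptrace "
    "[../../core/forward.h:261]: msg_send_buffer(): tcp_send failed",
]

for key, n in count_per_second(sample).most_common():
    print(n, *key)
```

Fed with the real log file, the most frequent entries show which function produces the bulk of the flood and in which second it starts.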
Florian FLOIMAIR Software Development - Symphony Cloud Services (1568)
From: Henning Westerholt hw@gilawa.com
Date: Monday, 11 November 2024 at 19:52
To: Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org
Cc: Floimair Florian f.floimair@commend.com
Subject: [External] RE: Question regarding error message "failed to add new socket to the fd list"
Hello Florian,
Are there any other error messages before that error happens, e.g. something about memory?
Otherwise, it could indeed be a ulimit issue. The first step would be to check the actual configured limits for the kamailio user.
Cheers,
Henning
From: Floimair Florian via sr-users sr-users@lists.kamailio.org
Sent: Monday, 11 November 2024 17:45
To: sr-users@lists.kamailio.org
Cc: Floimair Florian f.floimair@commend.com
Subject: [SR-Users] Question regarding error message "failed to add new socket to the fd list"
Hi!
We recently had issues with one of our production Kamailio instances. When they occurred, the log was filled with the following message:
CRITICAL: <core> [core/tcp_main.c:4528]: handle_new_connect(): failed to add new socket to the fd list
Now I wonder what the best approach is to prevent this.
We are using TCP/TLS only, and I think this might be related to the open-files ulimit, but I am not sure about that. Shared memory is set to 512 MB.
Can you give me a hint on what to look for? Thank you very much!
P.S.: Sorry, I accidentally replied to a previous post of a different topic before which is totally unrelated (I think I should stop working for today 😉)
FLORIAN FLOIMAIR
Software Development - Symphony Cloud Services
Commend International GmbH
Saalachstrasse 51
5020 Salzburg, Austria
commend.com
LG Salzburg / FN 178618z
Hello Florian,
This does indeed look like resource exhaustion, probably related to some interaction between different modules.
If it happened only once, it might be difficult to reproduce. If it happens frequently, it probably makes sense to investigate further in the direction of the siptrace module.
Cheers,
Henning
From: Floimair Florian via sr-users sr-users@lists.kamailio.org
Sent: Tuesday, 12 November 2024 11:07
To: Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org
Cc: Floimair Florian f.floimair@commend.com
Subject: [SR-Users] WG: [External] RE: Question regarding error message "failed to add new socket to the fd list"