Hello!
My service often starts rejecting requests with 503, and so far I have not been able to
find the exact cause. Typically, before clients start receiving 503, many of them receive
a 408. Once the service starts responding with 503, it does not recover on its own; I
have to restart it. The logs show nothing unusual that would explain what led to this.
The server has more than enough resources for the number of users of my service: 64
cores, 125GB of DDR4 RAM, 2TB of disk (of which 1TB is SSD for the database), and a
10 Gigabit link. This server runs only Kamailio; the Rtpengines run on separate servers.
Please help me figure out what causes the service failure.
OS: Debian 11, 64 cores, 125GB of DDR4 RAM; DB: MariaDB 10.5.15;
kamailio -v
version: kamailio 5.5.3 (x86_64/linux) 473cef
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST,
DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY,
USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR,
USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535,
DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
udp_children: 30; tcp_children: 34; TLS: YES; async_workers: 64.
Incoming calls are delivered using push notifications (Federico Cabiddu's method:
https://www.voztovoice.org/sites/default/files/KamilioWorld2015%20-Federico…).
NetBridging (for SIP and RTPEngine).
ims_charging is used for billing (integration with our billing system over the Diameter
protocol).
In the logs, exactly at the moment the service began refusing requests, logging stops
completely for 7 minutes (absolutely no entries), after which the TCP "queue full"
messages begin.
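To locate silent periods like this one without scrolling through the whole log, a small script along the following lines might help. It is only a sketch: `find_log_gaps`, `min_gap` and `year` are hypothetical names chosen for illustration, and the year is a placeholder because these syslog timestamps do not carry one. Wrapped continuation lines (as in the excerpt below) are simply skipped.

```python
import re
from datetime import datetime, timedelta

# Syslog-style timestamp prefix, e.g. "Oct 3 18:38:12 ".
TS_RE = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")


def find_log_gaps(lines, min_gap=timedelta(minutes=1), year=2000):
    """Return (previous_ts, next_ts, gap) for every silence of at least min_gap.

    `year` is an assumption (placeholder): the log lines do not include a year.
    Lines that do not start with a timestamp are ignored.
    """
    gaps = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # wrapped continuation of the previous entry
        stamp = f"{year} {m.group(1)} {m.group(2)} {m.group(3)}"
        ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps
```

Called with the lines of the syslog file, it returns one tuple per silent period, which could then be compared with the times at which clients started receiving 408 and 503.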
Oct 3 18:38:07 sip1-life3 kamailio[3380553]: CRITICAL: {2 3691 INVITE
672699d1-c76b-4153-b72a-fe647913a9e9} tm [../../core/forward.h:292]: msg_send_buffer():
tcp_send failed
Oct 3 18:38:09 sip1-life3 kamailio[3380539]: CRITICAL: {2 3691 INVITE
672699d1-c76b-4153-b72a-fe647913a9e9} tm [../../core/forward.h:292]: msg_send_buffer():
tcp_send failed
Oct 3 18:38:12 sip1-life3 kamailio[3380534]: CRITICAL: {2 30106 INVITE
032f8650-1d90-49ed-82f1-c50ccbef191d} tm [../../core/forward.h:292]: msg_send_buffer():
tcp_send failed
Oct 3 18:45:14 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 11, socket 224: queue full, 285 requests
queued (total handled 2899418)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 1, socket 204: queue full, 286 requests
queued (total handled 2964063)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 4, socket 210: queue full, 286 requests
queued (total handled 2923484)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 6, socket 214: queue full, 286 requests
queued (total handled 2911633)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 7, socket 216: queue full, 286 requests
queued (total handled 2907132)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 8, socket 218: queue full, 286 requests
queued (total handled 2903281)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 9, socket 220: queue full, 286 requests
queued (total handled 2903438)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 11, socket 224: queue full, 286 requests
queued (total handled 2899419)
Oct 3 18:45:15 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 12, socket 226: queue full, 286 requests
queued (total handled 2897842)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 16, socket 234: queue full, 286 requests
queued (total handled 2895429)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 17, socket 236: queue full, 286 requests
queued (total handled 2895303)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 22, socket 246: queue full, 286 requests
queued (total handled 2891873)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 26, socket 254: queue full, 286 requests
queued (total handled 2892619)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 27, socket 256: queue full, 286 requests
queued (total handled 2891769)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 224 , queue entry 0,
retries 98, connection 0x7f0b6d974348, tcp so
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 30, socket 262: queue full, 286 requests
queued (total handled 2891614)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 31, socket 264: queue full, 286 requests
queued (total handled 2890778)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 33, socket 268: queue full, 286 requests
queued (total handled 2890852)
Oct 3 18:45:16 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 0, socket 202: queue full, 287 requests
queued (total handled 3071438)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 1, socket 204: queue full, 287 requests
queued (total handled 2964064)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 2, socket 206: queue full, 287 requests
queued (total handled 2944928)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 4, socket 210: queue full, 287 requests
queued (total handled 2923485)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 5, socket 212: queue full, 287 requests
queued (total handled 2915778)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 6, socket 214: queue full, 287 requests
queued (total handled 2911634)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 7, socket 216: queue full, 287 requests
queued (total handled 2907133)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 8, socket 218: queue full, 287 requests
queued (total handled 2903282)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 9, socket 220: queue full, 287 requests
queued (total handled 2903439)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 10, socket 222: queue full, 287 requests
queued (total handled 2900227)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 11, socket 224: queue full, 287 requests
queued (total handled 2899420)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 12, socket 226: queue full, 287 requests
queued (total handled 2897843)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 204 , queue entry 0,
retries 88, connection 0x7f0b742a1968, tcp so
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 13, socket 228: queue full, 287 requests
queued (total handled 2896210)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 210 , queue entry 0,
retries 86, connection 0x7f0b74ffaa08, tcp so
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 16, socket 234: queue full, 287 requests
queued (total handled 2895430)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 214 , queue entry 0,
retries 88, connection 0x7f0b75003218, tcp so
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 216 , queue entry 1,
retries 86, connection 0x7f0b75007620, tcp so
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 218 , queue entry 2,
retries 84, connection 0x7f0b74772f70, tcp so
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 17, socket 236: queue full, 287 requests
queued (total handled 2895304)
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 220 , queue entry 0,
retries 87, connection 0x7f0b74777378, tcp so
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: ERROR:
[core/tcp_main.c:3458]: send_fd_queue_run(): send_fd failed on socket 224 , queue entry 1,
retries 83, connection 0x7f0b7477fb88, tcp so
Oct 3 18:45:17 sip1-life3 /usr/local/sbin/kamailio[3380778]: CRITICAL:
[core/tcp_main.c:4170]: send2child(): tcp child 21, socket 244: queue full, 287 requests
queued (total handled 2893042)
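
Since the excerpt above is long and repetitive, a per-child summary of the send2child() "queue full" entries may be easier to read. The following is a minimal sketch, not a Kamailio tool: `summarize_queue_full` and the field names in its result are hypothetical, and the regular expression assumes the exact message wording shown in the log above (it tolerates the line wrapping introduced by the mail client).

```python
import re

# One send2child() "queue full" entry; \s+ also matches the wrapped line breaks.
QUEUE_FULL_RE = re.compile(
    r"send2child\(\):\s+tcp\s+child\s+(\d+),\s+socket\s+(\d+):\s+queue\s+full,"
    r"\s+(\d+)\s+requests\s+queued\s+\(total\s+handled\s+(\d+)\)"
)


def summarize_queue_full(log_text):
    """Map each tcp child index to a small dict of counters from the log text."""
    stats = {}
    for m in QUEUE_FULL_RE.finditer(log_text):
        child, _socket, queued, handled = (int(g) for g in m.groups())
        entry = stats.setdefault(
            child,
            {
                "events": 0,               # number of "queue full" messages seen
                "max_queued": 0,           # largest "requests queued" value
                "first_handled": handled,  # "total handled" in the first message
                "last_handled": handled,   # "total handled" in the latest message
            },
        )
        entry["events"] += 1
        entry["max_queued"] = max(entry["max_queued"], queued)
        entry["last_handled"] = handled
    return stats
```

For the entries above, this would show for example that tcp child 11 goes from 285 to 287 queued requests while its "total handled" counter moves only from 2899418 to 2899420.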