Greetings, everyone!
It regularly happens that the service fails completely and stops serving calls. For
example, there was such an incident yesterday. Looking through the logs, right at the
start of the incident I noticed these strange entries:
kamailio[283008]: WARNING: {1 14714 ACK clMGFB7OWsiT4fQp9TPvV-RlvGqv2zPa} <core>
[core/tcp_main.c:1297]: find_listening_sock_info(): binding to source address
109.68.232.40:15061 failed: Address already in use [98]
kamailio[283008]: WARNING: {1 14714 ACK clMGFB7OWsiT4fQp9TPvV-RlvGqv2zPa} <core>
[core/tcp_main.c:1441]: tcp_do_connect(): 10.241.60.46:15061: could not find corresponding
listening socket
kamailio[283008]: ERROR: {1 14714 ACK clMGFB7OWsiT4fQp9TPvV-RlvGqv2zPa} <core>
[core/tcp_main.c:3078]: tcpconn_1st_send(): connect 10.241.60.46:15061 failed (RST)
Connection refused
kamailio[283008]: ERROR: {1 14714 ACK clMGFB7OWsiT4fQp9TPvV-RlvGqv2zPa} <core>
[core/tcp_main.c:3087]: tcpconn_1st_send(): 10.241.60.46:15061: connect & send for
0x7f7107e01750 (sock 39)
kamailio[283008]: CRITICAL: {1 14714 ACK clMGFB7OWsiT4fQp9TPvV-RlvGqv2zPa} <core>
[core/forward.h:290]: msg_send_buffer(): tcp_send failed
kamailio[283008]: ERROR: {1 14714 ACK clMGFB7OWsiT4fQp9TPvV-RlvGqv2zPa} sl
[sl_funcs.c:415]: sl_reply_error(): stateless error reply used: Unfortunately error on
sending to next hop occurred (477/SL)
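If I read the first warning correctly, Kamailio could not bind a new outgoing TCP
connection to the listening address 109.68.232.40:15061. While the incident is running
I now take a quick snapshot of the TCP state; this is only a minimal sketch, assuming
the public signaling socket is the one from the log and that ss and kamcmd are
available on the host:
# count established TCP connections that use the signaling port as source
ss -tan state established '( sport = :15061 )' | wc -l
# Kamailio's own view: opened vs. max connections, readers, writers
# (core.tcp_info is a standard core RPC, queried here via kamcmd)
kamcmd core.tcp_info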
After several such entries, this begins:
/usr/local/sbin/kamailio[283069]: ERROR: <core> [core/tcp_main.c:4787]:
tcpconn_main_timeout(): connect 31.173.85.36:10588 failed (timeout)
/usr/local/sbin/kamailio[283069]: ERROR: <core> [core/tcp_main.c:4787]:
tcpconn_main_timeout(): connect 95.24.163.132:48075 failed (timeout)
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668011
(state: 0) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668304
(state: 0) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668309
(state: 0) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668316
(state: 0) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668192
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668228
(state: 0) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668236
(state: 0) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668427
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668428
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668429
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668430
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668431
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668432
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668433
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668434
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668435
(state: 3) - closing
/usr/local/sbin/kamailio[282828]: CRITICAL: <core> [core/tcp_main.c:5526]:
tcp_timer_check_connections(): message processing timeout on connection id: 22668436
(state: 3) - closing
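The connect timeouts are towards subscriber addresses (mobile clients behind NAT), so
during the incident I also count half-open outbound connects; again, just a sketch
with standard tooling:
# half-open outbound TCP connects (SYN sent, no reply yet); a large count
# here would match the "connect ... failed (timeout)" entries above
ss -tan state syn-sent | wc -l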
Entries like these keep coming for a very long time (about 5 minutes), and it all
begins like this:
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 20, socket 273: queue full, 286 requests queued (total handled
2421746)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 29, socket 291: queue full, 286 requests queued (total handled
2419460)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 4, socket 241: queue full, 287 requests queued (total handled
2451838)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 5, socket 243: queue full, 287 requests queued (total handled
2445182)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 18, socket 269: queue full, 287 requests queued (total handled
2422113)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 20, socket 273: queue full, 287 requests queued (total handled
2421747)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 27, socket 287: queue full, 287 requests queued (total handled
2419986)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 29, socket 291: queue full, 287 requests queued (total handled
2419461)
/usr/local/sbin/kamailio[283069]: CRITICAL: <core> [core/tcp_main.c:4361]:
send2child(): tcp child 30, socket 293: queue full, 287 requests queued (total handled
2419979)
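As far as I understand it, send2child() reporting "queue full" means the TCP main
process can no longer hand incoming events to any of the 34 TCP workers, i.e. every
worker queue is saturated and no TCP traffic is processed from that point on. To see
which processes those workers are, I list them (core.ps is a standard core RPC; the
interpretation is my assumption):
# list all Kamailio processes with their roles; the "tcp receiver" entries
# are the TCP workers that the "tcp child N" messages refer to
kamcmd core.ps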
In addition, entries like the following catch my eye and make the logs hard to read:
1) kamailio[282798]: CRITICAL: {2 12704 INVITE 4cb232e6-9b5a-4a5b-aa0e-1ed73539039f} tm
[../../core/forward.h:290]: msg_send_buffer(): tcp_send failed
2) /usr/local/sbin/kamailio[282970]: ERROR: sl [sl.c:285]: send_reply(): failed to reply
stateful (tm)
3) kamailio[283053]: CRITICAL: <core> [core/tcp_read.c:267]: tcp_read_data(): error
reading: Connection reset by peer (104) ([217.66.156.172]:11604 ->
kamailio[283053]: CRITICAL: <core> [core/tcp_read.c:270]: tcp_read_data():
-> [109.68.232.40]:15061)
kamailio[283053]: CRITICAL: <core> [core/tcp_read.c:1523]: tcp_read_req():
ERROR: tcp_read_req: error reading - c: 0x7f71215bb610 r: 0x7f71215bb738 (-1)
kamailio[283007]: CRITICAL: <core> [core/tcp_read.c:267]: tcp_read_data():
error reading: Connection timed out (110) ([188.170.204.16]:52148 ->
kamailio[283007]: CRITICAL: <core> [core/tcp_read.c:270]: tcp_read_data():
-> [109.68.232.40]:15061)
kamailio[283007]: CRITICAL: <core> [core/tcp_read.c:1523]: tcp_read_req():
ERROR: tcp_read_req: error reading - c: 0x7f711b638fa0 r: 0x7f711b6390c8 (-1)
kamailio[283027]: ERROR: {1 9746 REGISTER 8e6af58d-d4ca-4dfe-a8f6-d3c84685c845} tsilo
[ts_append.c:121]: ts_append_to(): transaction [23428:1443020265] not found
4) kamailio[282820]: CRITICAL: {1 2 UPDATE 6da65a87-3224-46a1-961f-8f9ee50348cc} tm
[../../core/forward.h:290]: msg_send_buffer(): tcp_send failed
kamailio[282820]: WARNING: {1 2 UPDATE 6da65a87-3224-46a1-961f-8f9ee50348cc} tm
[t_fwd.c:1581]: t_send_branch(): sending request on branch 0 failed
kamailio[282820]: ERROR: {1 2 UPDATE 6da65a87-3224-46a1-961f-8f9ee50348cc} sl
[sl_funcs.c:415]: sl_reply_error(): stateless error reply used: Unfortunately error on
sending to next hop occurred (477/SL)
Once the queue is full, the system stops serving new calls and freezes.
kamailio -v
version: kamailio 5.7.2 (x86_64/linux) 95629c
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST,
DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC,
DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER,
USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535,
DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 95629c
OS: Debian 11, 64 cores, 125 GB of DDR4 RAM; DB: MariaDB 10.5.15;
udp_children: 30; tcp_children: 34; TLS: YES; async_workers: 64.
Incoming calls are delivered via push notifications (Federico Cabiddu's method:
https://www.voztovoice.org/sites/default/files/KamilioWorld2015%20-Federico…).
NetBridging (for SIP and RTPEngine).
ims_charging for billing (integration with our billing system over the Diameter
protocol).
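For reference, the TCP-related core settings in my kamailio.cfg look roughly like this
(children, tcp_children and async_workers match the numbers above; the other values
are illustrative placeholders, not my exact configuration):
# kamailio.cfg excerpt - all names are standard core parameters
children = 30              # UDP workers
tcp_children = 34          # TCP workers ("tcp child N" in the logs above)
async_workers = 64
tcp_async = yes            # asynchronous TCP writes
tcp_connect_timeout = 10          # illustrative value
tcp_connection_lifetime = 3605    # illustrative: a bit above REGISTER expiry
tcp_max_connections = 16384       # illustrative value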
As soon as the service starts responding with 503, it does not recover on its own; I
have to restart it.
The server is powerful enough for the number of users of my service (that is,
resources are not the problem): 64 cores, 125 GB of DDR4 RAM, 2 TB of disk (of which
1 TB is SSD for the database), and a 10 Gbit uplink. This server runs only Kamailio;
the RTPEngines run on separate servers.
Please help me figure out what is causing the service failure!