Description

When one rtpengine instance becomes unavailable on a Kamailio instance with moderate call traffic, even where there are multiple rtpengine instances configured for high availability, all processes can become blocked and Kamailio stops responding to all requests.

We produced this in a real world test, with two Kamailio servers running (20 child processes) and two rtpengine servers running. We loaded the Kamailio servers with calls, some answered, some ringing/busy/etc, and then made one rtpengine server drop all packets. Both Kamailio servers were able to briefly detect that the rtpengine instance was unavailable when it attempted to send an offer to it, but then as existing calls began to complete, both Kamailio servers became congested attempting to send "delete" commands to rtpengine. Both Kamailio servers then became unavailable and stopped responding to all messages for 10 minutes, until we restored rtpengine.

As such, an rtpengine server that crashes can cause a complete Kamailio outage.

Troubleshooting

By using the reproduction steps below, we were easily able to make Kamailio stop responding to all incoming SIP messages when taking down an rtpengine instance.

The logs below show the sequence of events when this is reproduced for a single call.

Configuration:

A call is forwarded using t_relay and rtpengine_manage. rtpengine is killed mid-call. The call ends with 486.

When the t_on_failure route is called, the config makes a singular call to rtpengine_manage, however what follows is 30 errors, all mentioning the same rtpengine instance:

ERROR: rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>

Followed by the below:

ERROR: rtpengine [rtpengine.c:2729]: rtpp_function_call(): queried nodes limit reached
WARNING: tm [t_reply.c:1094]: run_failure_handlers(): failure route execution took too long: 150153261 us

Each timeout is 5 seconds (5x retries of 1000ms), which corresponds to about 150 seconds over 30 attempts. This matches the warning generated by tm.

In this case, there was about 5 minutes of time where a process was blocked. 150 seconds for the child leg's failure route, and 150 seconds for the parent leg's failure route. When this happens for many calls at once, congestion is guaranteed.

The 30 attempts to delete the call from rtpengine comes from the default value of queried_nodes_limit. This behavior doesn't make sense to me -- it's a retry loop that runs a retry loop.

Reproduction

tldr: kill rtpengine via iptables mid-call with the default rtpengine settings and you'll see lots of errors.

You can reproduce this with one or multiple rtpengine instances configured in Kamailio. It's easier with just one.

Standard setup, but importantly:

To reproduce easily, we configured the destination server to send a 100, a 180, wait 20 seconds, then send a 486.

Place a number of calls to Kamailio (we used sipp) -- importantly, exceeding the number of child processes configured. You can reproduce the issue with just one call, but this wont make Kamailio unresponsive.

Once the calls are placed, and still in progress (ie before the 486 is returned), use iptables on the rtpengine server to drop all traffic on its control port.

When the call(s) start receiving the 486, you will notice a flood of error messages timeout waiting reply for command "delete" from RTPEngine for approximately 5 minutes per call. This blocks the process.

If it's now attempting to send the delete command to rtpengine for a number of calls greater than your child process count: attempt to send any other SIP messages to Kamailio and you will not receive a response.

Log Messages

This is a log for a single call, forwarded using t_relay. All calls to rtpengine_manage are immediately surrounded by info log messages "calling rtpengine_manage" and "completed calling rtpengine_manage".

Feb 12 23:40:54 debian-kamailio kamailio[27715]: INFO: {1 1 INVITE 1-27864@127.0.1.1} <script>: new branch [0] to sip:0421XXXXXX@192.168.X.X:9060
Feb 12 23:40:54 debian-kamailio kamailio[27715]: INFO: {1 1 INVITE 1-27864@127.0.1.1} <script>: branch_route: calling rtpengine_manage
Feb 12 23:40:54 debian-kamailio kamailio[27715]: INFO: {1 1 INVITE 1-27864@127.0.1.1} <script>: branch_route: completed calling rtpengine_manage
Feb 12 23:40:54 debian-kamailio kamailio[27732]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: calling rtpengine_manage
Feb 12 23:40:54 debian-kamailio kamailio[27732]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: completed calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27730]: INFO: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: new branch [0] to sip:61421764018@192.168.X.X:9092;transport=udp
Feb 12 23:40:55 debian-kamailio kamailio[27730]: INFO: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: branch_route: calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27730]: INFO: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: branch_route: completed calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27717]: INFO: {2 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: onreply_route: calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27717]: INFO: {2 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: onreply_route: completed calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27714]: INFO: {2 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: onreply_route: calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27714]: INFO: {2 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: onreply_route: completed calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27735]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: calling rtpengine_manage
Feb 12 23:40:55 debian-kamailio kamailio[27735]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: completed calling rtpengine_manage
Feb 12 23:41:15 debian-kamailio kamailio[27713]: INFO: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: failure_route: calling rtpengine_manage
Feb 12 23:41:20 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:41:25 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:41:30 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:41:35 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:41:40 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:41:45 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:41:50 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:41:55 debian-kamailio kamailio[27740]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: calling rtpengine_manage
Feb 12 23:41:55 debian-kamailio kamailio[27740]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: completed calling rtpengine_manage
Feb 12 23:41:55 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:00 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:05 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:10 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:16 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:21 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:26 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:31 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:36 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:41 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:46 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:51 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:42:55 debian-kamailio kamailio[27734]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: calling rtpengine_manage
Feb 12 23:42:55 debian-kamailio kamailio[27734]: INFO: {2 1 INVITE 1-27864@127.0.1.1} <script>: onreply_route: completed calling rtpengine_manage
Feb 12 23:42:56 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:01 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:06 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:11 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:16 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:21 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:26 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:31 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:36 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:41 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:46 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:46 debian-kamailio kamailio[27713]: ERROR: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} rtpengine [rtpengine.c:2729]: rtpp_function_call(): queried nodes limit reached
Feb 12 23:43:46 debian-kamailio kamailio[27713]: INFO: {1 63577835 INVITE 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: failure_route: completed calling rtpengine_manage
Feb 12 23:43:46 debian-kamailio kamailio[27713]: WARNING: tm [t_reply.c:1094]: run_failure_handlers(): failure route execution took too long: 150153207 us
Feb 12 23:43:46 debian-kamailio kamailio[27736]: INFO: {1 63577835 ACK 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: ACK: calling rtpengine_manage
Feb 12 23:43:46 debian-kamailio kamailio[27736]: INFO: {1 63577835 ACK 720bf6d9-25fb-123c-cb91-80fa5d718101} <script>: ACK: completed calling rtpengine_manage
Feb 12 23:43:46 debian-kamailio kamailio[27738]: INFO: {1 1 INVITE 1-27864@127.0.1.1} <script>: failure_route: calling rtpengine_manage
Feb 12 23:43:51 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:43:56 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:01 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:06 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:11 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:16 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:21 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:26 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:31 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:36 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:41 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:46 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:51 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:44:56 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:01 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:06 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:11 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:16 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:21 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:26 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:31 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:36 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:41 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:46 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:51 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:45:56 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:46:01 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:46:06 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:46:11 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:46:16 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:3129]: send_rtpp_command(): timeout waiting reply for command "delete" from RTPEngine <udp:192.168.X.X:2223>
Feb 12 23:46:16 debian-kamailio kamailio[27738]: ERROR: {1 1 INVITE 1-27864@127.0.1.1} rtpengine [rtpengine.c:2729]: rtpp_function_call(): queried nodes limit reached
Feb 12 23:46:16 debian-kamailio kamailio[27738]: INFO: {1 1 INVITE 1-27864@127.0.1.1} <script>: failure_route: completed calling rtpengine_manage
Feb 12 23:46:16 debian-kamailio kamailio[27738]: WARNING: tm [t_reply.c:1094]: run_failure_handlers(): failure route execution took too long: 150153261 us

Possible Solutions

To minimize the issue, rtpengine_tout_ms and rtpengine_retr can be reduced. This doesn't exactly the fix the problem, as you need reasonable values here.

Additionally, queried_nodes_limit can be reduced to limit the number of unnecessary retries, at the expense of the functionality it gives you to initially select candidate nodes.

Actual solution: I think that, when a call has already selected an rtpengine instance successfully, and that's in the hash table, then queried_nodes_limit should not apply. It should only attempt to send commands with a total number of attempts = rtpengine_retr, not rtpengine_retr * queried_nodes_limit.

By changing the behavior as above, tweaking rtpengine_tout_ms and rtpengine_retr, and increasing the number of child workers, an rtpengine becoming unavailable should cause much less impactful blocking.

Additional Information

Kamailio version: (attempted on 5.4.1 and latest master)

version: kamailio 5.4.1 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled on 23:22:15 Aug 15 2022 with gcc 8.3.0
version: kamailio 5.7.0-dev3 (x86_64/linux) 2384c3
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 2384c3
compiled on 23:22:22 Feb 12 2023 with gcc 8.3.0

Operating System:

Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

Linux debian-kamailio 4.19.0-20-amd64 #1 SMP Debian 4.19.235-1 (2022-03-17) x86_64 GNU/Linux


Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <kamailio/kamailio/issues/3370@github.com>