Hi,
We have an RTPEngine that occasionally finds itself in this state of CPU usage:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25476 root      20   0 2929600  80820   1900 S 156.2  0.1 259:05.54 rtpengine
The journal is a dizzying stream of:
Aug 06 20:58:47 gw1 rtpengine[25476]: ERR: [1074537188_44545952 port 61666]: Too many packets in UDP receive queue (more than 50), aborting loop. Dropped packets possible
It eventually leads to increasing CPU usage and a degradation of audio quality until RTPEngine is restarted.
It would be appreciated if someone could shed some light on the nature of this queue and what we can do about this problem. Is this referring to a netfilter packet-forwarding queue, or to a receive queue of the RTPEngine user-space control process? Can its size be increased or tweaked with kernel parameters?
Thanks,
-- Alex
From the 5.1 TB transferred on the 'lo' interface (RTPEngine running on the same host as Kamailio):
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 27694476978  bytes 5678003789723 (5.1 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 27694476978  bytes 5678003789723 (5.1 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
... I would surmise that this is some sort of loop / infinite loop condition, but I can't quite put my finger on what it is.
From an RTPEngine logging standpoint, the scenario begins by looking rather normal:
Aug 6 18:11:04 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Received command 'offer' from 127.0.0.1:48525
Aug 6 18:11:04 gw1 rtpengine[25476]: NOTICE: [1074537188_44545952@20.20.20.20]: Creating new call
Aug 6 18:11:04 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: offer time = 0.000865 sec
Aug 6 18:11:04 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Replying to 'offer' from 127.0.0.1:48525
Aug 6 18:11:14 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Received command 'answer' from 127.0.0.1:58668
Aug 6 18:11:14 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: answer time = 0.000123 sec
Aug 6 18:11:14 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Replying to 'answer' from 127.0.0.1:58668
Aug 6 18:11:18 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20 port 61666]: Confirmed peer address as 2.2.2.2:38236
Aug 6 18:11:23 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Received command 'offer' from 127.0.0.1:50995
Aug 6 18:11:23 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: offer time = 0.000082 sec
Aug 6 18:11:23 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Replying to 'offer' from 127.0.0.1:50995
Aug 6 18:11:23 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Received command 'offer' from 127.0.0.1:47321
Aug 6 18:11:23 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: offer time = 0.000151 sec
Aug 6 18:11:23 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20]: Replying to 'offer' from 127.0.0.1:47321
Aug 6 18:11:23 gw1 rtpengine[25476]: INFO: [1074537188_44545952@20.20.20.20 port 61666]: Switching local interface to 1.1.1.1:61666
Aug 6 18:11:23 gw1 rtpengine[25476]: ERR: [1074537188_44545952@20.20.20.20 port 61666]: Too many packets in UDP receive queue (more than 50), aborting loop. Dropped packets possible
[... ad nauseam ...]
The only thing unusual about it from a SIP perspective is that it involves an SRTP endpoint and several reinvites which don't seem to be handled as one would expect:
Aug 6 18:11:04 gw1 /usr/local/sbin/kamailio[22070]: INFO: [R-INBOUND-ROUTE-BRANCH:1074537188_44545952@2.2.2.2] Relaying media through RTPEngine
Aug 6 18:11:04 gw1 /usr/local/sbin/kamailio[22070]: INFO: [R-INBOUND-ROUTE-BRANCH:1074537188_44545952@2.2.2.2] -> Forcing Secure RTP (SRTP) profile in offer
Aug 6 18:11:04 gw1 /usr/local/sbin/kamailio[22233]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '100 Trying' from 3.3.3.3:54246
Aug 6 18:11:04 gw1 /usr/local/sbin/kamailio[22233]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '180 Ringing' from 3.3.3.3:54246
Aug 6 18:11:14 gw1 /usr/local/sbin/kamailio[22258]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 3.3.3.3:54246
Aug 6 18:11:14 gw1 /usr/local/sbin/kamailio[22258]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:14 gw1 /usr/local/sbin/kamailio[22112]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 2.2.2.2:5060
Aug 6 18:11:23 gw1 /usr/local/sbin/kamailio[22263]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Re-invite received on rtpengine'd call -- updating RTPEngine
Aug 6 18:11:23 gw1 /usr/local/sbin/kamailio[22068]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Re-invite received on rtpengine'd call -- updating RTPEngine
Aug 6 18:11:23 gw1 /usr/local/sbin/kamailio[22074]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '100 Trying' from 1.1.1.1:5060
Aug 6 18:11:23 gw1 /usr/local/sbin/kamailio[22057]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '100 Trying' from 2.2.2.2:5060
Aug 6 18:11:23 gw1 /usr/local/sbin/kamailio[22070]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 2.2.2.2:5060
Aug 6 18:11:23 gw1 /usr/local/sbin/kamailio[22070]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22065]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 2.2.2.2:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22065]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22104]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 1.1.1.1:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22104]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22088]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 2.2.2.2:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22088]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22118]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 1.1.1.1:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22118]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22100]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 2.2.2.2:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22100]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22116]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 1.1.1.1:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22116]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22094]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] Reply '200 OK' from 1.1.1.1:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22094]: INFO: [R-MAIN-REPLY:1074537188_44545952@2.2.2.2] -> Reply contains SDP; diverting through RTPEngine
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22263]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 3.3.3.3:54246
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22090]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 1.1.1.1:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22263]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 3.3.3.3:54246
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22263]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 3.3.3.3:54246
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22074]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 1.1.1.1:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22068]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 1.1.1.1:5060
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22263]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 3.3.3.3:54246
Aug 6 18:11:27 gw1 /usr/local/sbin/kamailio[22098]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> Other sequential ACK received from 1.1.1.1:5060
Aug 6 18:11:58 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE (hangup) received from 3.3.3.3:54246
Aug 6 18:11:58 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE received - tearing down rtpengine session
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22104]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE (hangup) received from 1.1.1.1:5060
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22104]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE received - tearing down rtpengine session
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE (hangup) received from 3.3.3.3:54246
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE received - tearing down rtpengine session
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE (hangup) received from 3.3.3.3:54246
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE received - tearing down rtpengine session
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE (hangup) received from 3.3.3.3:54246
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE received - tearing down rtpengine session
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE (hangup) received from 3.3.3.3:54246
Aug 6 18:12:02 gw1 /usr/local/sbin/kamailio[22211]: INFO: [R-MAIN:1074537188_44545952@2.2.2.2] -> BYE received - tearing down rtpengine session
But while this signalling chronology is anomalous, does it really explain a continuous queue overflow of some kind that lasted several hours?
-- Alex
-- Alex Balashov | Principal | Evariste Systems LLC
Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
On 2018-08-06 21:17, Alex Balashov wrote:
But while this signalling chronology is anomalous, does it really explain a continuous queue overflow of some kind that lasted several hours?
This can happen if the output SDP (rewritten by rtpengine) is fed back into rtpengine as an input SDP for the same call/branch. Then rtpengine will start looping media back to itself. There are some safeguards in the code to prevent this from happening, but if the underlying signalling is broken, it can still happen.
Cheers
On Tue, Aug 07, 2018 at 09:18:06AM -0400, Richard Fuchs wrote:
This can happen if the output SDP (rewritten by rtpengine) is fed back into rtpengine as an input SDP for the same call/branch. Then rtpengine will start looping media back to itself. There are some safeguards in the code to prevent this from happening, but if the underlying signalling is broken, it can still happen.
Ah, I see. Thank you for that insight.
I have trouble visualising where in our Kamailio config this could happen, but I suppose it's not impossible. I'll try to identify that case.
Richard,
On Tue, Aug 07, 2018 at 09:18:06AM -0400, Richard Fuchs wrote:
This can happen if the output SDP (rewritten by rtpengine) is fed back into rtpengine as an input SDP for the same call/branch. Then rtpengine will start looping media back to itself. There are some safeguards in the code to prevent this from happening, but if the underlying signalling is broken, it can still happen.
Do you think this could happen if the initial INVITE is handled this way:
rtpengine_offer("replace-origin replace-session-connection ICE=remove AVP SRTP");
And then two subsequent reinvites occurring back to back are handled this way:
rtpengine_offer("replace-origin replace-session-connection ICE=remove");
With SDP answers in both cases being handled this way:
rtpengine_answer("replace-origin replace-session-connection ICE=remove SIP-source-address");
We expected that the subsequent offers/answers would have no bearing on the existing RTP/AVP -> RTP/SAVP arrangement.
We need to call rtpengine_offer()/rtpengine_answer() instead of just invoking rtpengine_manage(), because we sometimes do a redirect query first when processing the call. When no RTPEngine is invoked on the first branch, only on branches >= 2, we find that rtpengine_manage() misbehaves, though passing it a branch ID might fix that. I'm not sure that would address this issue, though.
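Roughly, the intended routing logic looks like the sketch below. The route names and the has_totag()/has_body() checks are hypothetical simplifications; the flags are the ones quoted above, and the real config is more involved:

# Simplified sketch -- hypothetical route names. Initial INVITEs get the
# AVP->SRTP bridging offer, in-dialog re-invites do not, and answers are
# always handled the same way from a reply route.
route[MEDIA_OFFER] {
    if (!has_totag()) {
        # initial INVITE towards the SRTP endpoint
        rtpengine_offer("replace-origin replace-session-connection ICE=remove AVP SRTP");
    } else {
        # re-invite within the established dialog
        rtpengine_offer("replace-origin replace-session-connection ICE=remove");
    }
    t_on_reply("MEDIA_ANSWER");
}

onreply_route[MEDIA_ANSWER] {
    if (has_body("application/sdp")) {
        rtpengine_answer("replace-origin replace-session-connection ICE=remove SIP-source-address");
    }
}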
The reinvites are coming simultaneously from both endpoints, in the strangest instance of reinvite glare I've ever seen. It's a consistent behaviour on this particular call flow.
-- Alex
On 2018-08-07 20:39, Alex Balashov wrote:
Do you think this could happen if the initial INVITE is handled this way:
rtpengine_offer("replace-origin replace-session-connection ICE=remove AVP SRTP");
And then two subsequent reinvites occurring back to back are handled this way:
rtpengine_offer("replace-origin replace-session-connection ICE=remove");
With SDP answers in both cases being handled this way:
rtpengine_answer("replace-origin replace-session-connection ICE=remove SIP-source-address");
We expected that the subsequent offers/answers would have no bearing on the existing RTP/AVP -> RTP/SAVP arrangement.
In and of itself that should be no problem, provided that the SIP proxy actually sees the correct source address for `SIP-source-address` (and not some local address for example).
In my experience, the most common reasons for inadvertently causing a media loop are:
1) An error in the signalling script that causes rtpengine_offer (or answer) to be called twice for the same SDP, with something like msg_apply_changes in between. That would give you: original SDP -> offer -> rewritten SDP, and then rewritten SDP -> offer -> rewritten again. If you're branching, you need to make sure that rtpengine gets to see the original SDP as input in all branches. (A hypothetical fragment illustrating this follows after this list.)
2) Two SIP proxies back to back, both engaging the same instance of rtpengine. This can also be the same SIP proxy if the invite/SDP is looped back into it.
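To make 1) concrete, here is a hypothetical routing fragment (not taken from any real config) showing how the second offer ends up operating on rtpengine's own SDP rather than the endpoint's:

route[BROKEN_MEDIA_OFFER] {
    # first pass: the endpoint's SDP goes in, rtpengine's rewritten SDP
    # (carrying rtpengine's own address and port) comes back as pending changes
    rtpengine_offer("replace-origin replace-session-connection ICE=remove");

    # committing the pending changes makes the rewritten SDP the message body ...
    msg_apply_changes();

    # ... so this second call feeds rtpengine's own SDP back to it as a new offer
    # for the same call/branch, and rtpengine starts relaying media to itself
    rtpengine_offer("replace-origin replace-session-connection ICE=remove");
}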
I suggest you enable debug logging in rtpengine and inspect the actual SDP bodies that it gets to see.
Cheers
Richard,
rtpproxy classic edition had an SDP attribute that could be inserted to prevent rtpproxy from operating on the SDP if another rtpproxy had already been engaged upstream. Does RTPEngine have a similar feature?
On 2018-08-08 09:25, Alex Balashov wrote:
Richard,
rtpproxy classic edition had an SDP attribute that could be inserted to prevent rtpproxy from operating on the SDP if another rtpproxy had already been engaged upstream. Does RTPEngine have a similar feature?
Yes, if you include the `loop-protect` option.
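For example, adding it to the offer flags quoted earlier in this thread (illustrative placement only):

# 'loop-protect' makes rtpengine mark the SDP it hands out and skip processing
# of SDP that already carries the mark, analogous to the rtpproxy attribute
rtpengine_offer("loop-protect replace-origin replace-session-connection ICE=remove AVP SRTP");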
On Wed, Aug 08, 2018 at 09:38:02AM -0400, Richard Fuchs wrote:
Yes, if you include the `loop-protect` option.
Thanks! Are there any SDP interoperability concerns with using it?
The other aspect of this that I am puzzling over is where the packets in this loop are coming from. The call is long dead, the ports involved are not in use (per the RTPEngine call list), but I'm still getting:
Aug 08 12:50:00 gw1 rtpengine[18934]: ERR: [992251125_16092378@x.x.x.x port 60368]: Too many packets in UDP receive queue (more than 50), aborting loop. Dropped packets possible
Are these RTP frames that were sent at some point during the call and are somehow still looping?
-- Alex
On 2018-08-08 13:05, Alex Balashov wrote:
Thanks! Are there any SDP interoperability concerns with using it?
Not any more than what you would get with rtpproxy doing the same thing :)
The other aspect of this that I am puzzling over is where the packets in this loop are coming from. The call is long dead, the ports involved are not in use (per the RTPEngine call list), but I'm still getting:
Aug 08 12:50:00 gw1 rtpengine[18934]: ERR: [992251125_16092378@x.x.x.x port 60368]: Too many packets in UDP receive queue (more than 50), aborting loop. Dropped packets possible
Are these RTP frames that were previously sent at some point during the call looping?
Hard to say without any further details. What does "long dead" mean exactly? Seconds, minutes, hours? A deleted call disappears from the call list immediately, but components of it (ports in particular) may remain open for a short period of time until all reference counts drop to zero. That should take only seconds at most though.
I don't have it in front of me, but it's definitely more than a few seconds. Anywhere from 5 to 30 minutes.
-- Alex
-- Sent via mobile, please forgive typos and brevity.