Andres,

I believe I remember this issue being brought up a while back. If I remember correctly, SER children locked up when communicating with a locked rtpproxy over the socket interface. The "solution" was to communicate over UDP on the loopback interface instead, as this would fail that specific call but not lock the SER process.

g-)
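To illustrate why a datagram exchange with a receive timeout only fails the one request instead of hanging the worker, here is a minimal Python sketch. It is not SER or nathelper code: the function name `send_command_udp`, the placeholder payload, and the timeout and retry values are all assumptions made up for the illustration.

```python
import socket


def send_command_udp(addr, payload, timeout=1.0, retries=1):
    """Send one datagram to `addr` and wait up to `timeout` seconds for a reply.

    Hypothetical helper for illustration only. Returns the reply bytes,
    or None if no reply arrived after `retries` attempts.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Bound every receive: a hung peer costs at most `timeout` seconds.
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(payload, addr)
            try:
                reply, _peer = sock.recvfrom(4096)
                return reply
            except (socket.timeout, ConnectionError):
                # No answer this round; retry or give up on this one request.
                continue
    return None


if __name__ == "__main__":
    # A bound socket that never answers stands in for a hung proxy.
    silent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    silent.bind(("127.0.0.1", 0))
    result = send_command_udp(silent.getsockname(), b"ping", timeout=0.5, retries=2)
    print("reply:", result)
    silent.close()
```

The caller gets `None` back after at most `timeout * retries` seconds and can reject that single call, while the process stays free to handle other traffic.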
----- Original Message -----
From: "Andres" andres@telesip.net
To: serusers@lists.iptel.org
Sent: Tuesday, November 22, 2005 10:19 PM
Subject: [Serusers] SER Children Misbehaving
Today we had an incident where SER (0.9.4) children drained all the CPUs of one of our servers. Top showed:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17925 root      25   0  5644 5644  3888 R    25.5  0.2   6:26   1 ser
17929 root      25   0  5672 5672  3880 R    24.7  0.2   6:48   0 ser
17928 root      25   0  5688 5688  3872 R    24.3  0.2   6:25   1 ser
17933 root      25   0  4540 4540  3740 R    22.8  0.2   6:00   0 ser
And:

# ps -Al | grep ser
1 S     0 17901     1  0  85   0 - 14200 pause  ?  00:00:00 ser
1 S     0 17916 17901  0  75   0 - 14200 pipe_w ?  00:00:00 ser
1 S     0 17917 17901  0  75   0 - 14418 schedu ?  00:00:22 ser
1 S     0 17918 17901  0  75   0 - 14422 schedu ?  00:00:23 ser
1 S     0 17919 17901  0  75   0 - 14423 schedu ?  00:00:24 ser
1 S     0 17920 17901  0  75   0 - 14447 schedu ?  00:00:22 ser
1 S     0 17921 17901  0  75   0 - 14421 schedu ?  00:00:22 ser
1 S     0 17922 17901  0  75   0 - 14424 schedu ?  00:00:22 ser
1 S     0 17923 17901  0  75   0 - 14428 schedu ?  00:00:21 ser
1 S     0 17924 17901  0  75   0 - 14424 schedu ?  00:00:22 ser
1 R     0 17925 17901  0  85   0 - 14448 -      ?  00:06:22 ser
1 S     0 17926 17901  0  75   0 - 14457 schedu ?  00:00:49 ser
1 S     0 17927 17901  0  75   0 - 14453 schedu ?  00:00:50 ser
1 R     0 17928 17901  0  85   0 - 14477 -      ?  00:06:20 ser
1 R     0 17929 17901  0  85   0 - 14455 -      ?  00:06:44 ser
1 S     0 17930 17901  0  75   0 - 14452 schedu ?  00:00:50 ser
1 S     0 17931 17901  0  75   0 - 14448 schedu ?  00:00:50 ser
1 S     0 17932 17901  0  76   0 - 14448 schedu ?  00:00:49 ser
1 R     0 17933 17901  0  85   0 - 14235 -      ?  00:05:55 ser
As you can see, it looks like 4 children dropped out of the scheduler. The only suspicious thing is that RTPProxy became non-responsive around that time. At least that's the only thing the log shows:

Nov 22 15:56:17 /usr/local/sbin/ser[17931]: ERROR: send_rtpp_command: timeout waiting reply from a RTP proxy
Any idea why these 4 children dropped out? Any hints on how to troubleshoot this?
Thanks,
--
Andres
Network Admin
http://www.telesip.net
Serusers mailing list serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers