For this same reason we switched to the UDP interface about 6 months ago.
But as you can see, we still managed to get 4 children locked up. SER
was still able to handle millions of transactions since the last lockup (2
months ago), but we are still clueless about why this is happening. We
have rtpproxy running again in the foreground, writing all messages to
a file. Hopefully this will provide enough info to get to the bottom of
this.
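
For anyone following along, here is a rough sketch of why a UDP control
channel is expected to fail a single request instead of blocking the caller.
It is not SER's actual code. The function name send_command, the timeout and
retry values, and the cookie format are made up for illustration, and the
"cookie command" framing with the cookie echoed in the reply is an assumption
about how the control protocol matches replies to requests.

```python
import os
import socket


def send_command(addr, command, timeout=1.0, retries=3):
    """Send one command over UDP and wait for a reply carrying the same cookie.

    Returns the reply text without the cookie, or None if no matching reply
    arrived within `retries` attempts of `timeout` seconds each.
    """
    # Hypothetical cookie format: any token unique enough to match the reply.
    cookie = f"{os.getpid()}_{int.from_bytes(os.urandom(4), 'big')}"
    payload = f"{cookie} {command}".encode("ascii")

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # bound every recvfrom() call
        for _ in range(retries):
            try:
                sock.sendto(payload, addr)
                data, _peer = sock.recvfrom(4096)
            except OSError:
                # Timeout or socket error: count this attempt as failed and
                # resend, rather than waiting forever on a stuck peer.
                continue
            reply_cookie, _, body = data.decode("ascii", "replace").partition(" ")
            if reply_cookie == cookie:
                return body.strip()
            # Stale or foreign datagram: ignore it and try again.
    return None
```

With something like send_command(("127.0.0.1", 22222), "V") (the address and
the "V" command are assumptions here), the worst case is bounded by
timeout * retries, after which the caller gets None and can fail that one call.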
--
Andres
Network Admin
http://www.telesip.net
Greger V. Teigre wrote:
> Andres,
> I believe I remember this issue being brought up a while back. If I
> remember correctly, ser children locked up when communicating with a
> locked-up rtpproxy over the socket interface. The "solution" was to use
> UDP over loopback instead, as a problem would then fail that specific
> call but not lock the ser process.
> g-)
>
> ----- Original Message ----- From: "Andres" <andres@telesip.net>
> To: <serusers@lists.iptel.org>
> Sent: Tuesday, November 22, 2005 10:19 PM
> Subject: [Serusers] SER Children Misbehaving
>
>
>> Today we had an incident where SER (0.9.4) children drained all the
>> CPUs of one of our servers.
>> Top Showed:
>>   PID USER PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
>> 17925 root  25  0 5644 5644  3888 R    25.5  0.2 6:26   1 ser
>> 17929 root  25  0 5672 5672  3880 R    24.7  0.2 6:48   0 ser
>> 17928 root  25  0 5688 5688  3872 R    24.3  0.2 6:25   1 ser
>> 17933 root  25  0 4540 4540  3740 R    22.8  0.2 6:00   0 ser
>>
>> And ..
>> # ps -Al | grep ser
>> 1 S 0 17901 1 0 85 0 - 14200 pause ? 00:00:00 ser
>> 1 S 0 17916 17901 0 75 0 - 14200 pipe_w ? 00:00:00 ser
>> 1 S 0 17917 17901 0 75 0 - 14418 schedu ? 00:00:22 ser
>> 1 S 0 17918 17901 0 75 0 - 14422 schedu ? 00:00:23 ser
>> 1 S 0 17919 17901 0 75 0 - 14423 schedu ? 00:00:24 ser
>> 1 S 0 17920 17901 0 75 0 - 14447 schedu ? 00:00:22 ser
>> 1 S 0 17921 17901 0 75 0 - 14421 schedu ? 00:00:22 ser
>> 1 S 0 17922 17901 0 75 0 - 14424 schedu ? 00:00:22 ser
>> 1 S 0 17923 17901 0 75 0 - 14428 schedu ? 00:00:21 ser
>> 1 S 0 17924 17901 0 75 0 - 14424 schedu ? 00:00:22 ser
>> 1 R 0 17925 17901 0 85 0 - 14448 - ? 00:06:22 ser
>> 1 S 0 17926 17901 0 75 0 - 14457 schedu ? 00:00:49 ser
>> 1 S 0 17927 17901 0 75 0 - 14453 schedu ? 00:00:50 ser
>> 1 R 0 17928 17901 0 85 0 - 14477 - ? 00:06:20 ser
>> 1 R 0 17929 17901 0 85 0 - 14455 - ? 00:06:44 ser
>> 1 S 0 17930 17901 0 75 0 - 14452 schedu ? 00:00:50 ser
>> 1 S 0 17931 17901 0 75 0 - 14448 schedu ? 00:00:50 ser
>> 1 S 0 17932 17901 0 76 0 - 14448 schedu ? 00:00:49 ser
>> 1 R 0 17933 17901 0 85 0 - 14235 - ? 00:05:55 ser
>>
>> As you can see, it looks like 4 children dropped out of the
>> scheduler. The only suspicious thing is that RTPProxy became
>> non-responsive around that time. At least that's the only thing the
>> log shows:
>> Nov 22 15:56:17 /usr/local/sbin/ser[17931]: ERROR:
>> send_rtpp_command: timeout waiting reply from a RTP proxy
>>
>> Any idea why these 4 children dropped out? Any hints on how to
>> troubleshoot this?
>>
>> Thanks,
>>
>> --
>> Andres
>> Network Admin
>> http://www.telesip.net