This was a rather peculiar crash:
From the logs, it would appear that Kamailio simply stopped processing
messages at some point. There are about 8 minutes with no log output at
all, during a period of constant incoming traffic.
The situation was resolved only when someone manually restarted
Kamailio, at which point all of its processes exited on a normal SIGTERM:
Mar 26 20:40:10 Proxy1 /usr/local/sbin/kamailio[27498]: NOTICE: <core> [main.c:739]: handle_sigs(): Thank you for flying kamailio!!!
Mar 26 20:40:10 Proxy1 /usr/local/sbin/kamailio[27535]: INFO: <core> [main.c:850]: sig_usr(): signal 15 received.
...
But there are a few things here that are difficult to explain from the log:
1. Why was there no response from the SIP stack, and no logging
activity, for 8 minutes?
2. We have a script that checks every second whether Kamailio processes
are running, and restarts Kamailio if they are not. It also sends an
e-mail informing us when that happens.
It's a rather naive check:
ps aux | grep kamailio | grep -v 'grep kamailio' | wc -l
But in this case, the script was not triggered, which would imply that
some Kamailio processes--perhaps all--remained running.
There is no indication in the logs that any process died for any reason,
except for the 'signal 15' received by all processes at the time of
manual restart.
3. Why was a core dump generated at the time of the restart, if nothing
crashed?
#3 is the most interesting to me: if the problem were something else,
e.g. SIP worker processes blocking for some reason, then I wouldn't
expect a core dump upon service shutdown.
There is no other indication of any child process dying with SIGSEGV or
SIGABRT.
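For reference, the process check from point 2 can be sketched a little
more strictly, as below. This is only an illustration, not the script
actually in use: the function name count_exact is made up, and matching
on the bare command name from `ps -e -o comm=` is an assumed substitute
for the `grep kamailio` pipeline. Like the original, it only tests for
the presence of processes, so a Kamailio that is running but not
processing messages would still pass.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that are exactly equal to "$1".
count_exact() {
    # -x: whole-line match, -F: fixed string, -c: print the count.
    # grep exits 1 when the count is 0, so mask that with "|| true".
    grep -c -x -F -- "$1" || true
}

# Usage: feed it bare command names, so that "grep kamailio" or an editor
# opened on kamailio.cfg is never counted.
#   running=$(ps -e -o comm= | count_exact kamailio)
#   [ "$running" -eq 0 ] && echo "kamailio is not running"
```

Matching whole command names also removes the need for the
`grep -v 'grep kamailio'` step.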
-- Alex
On 03/27/2015 06:17 AM, Alex Balashov wrote:
Hello,
The system experienced another crash yesterday, but unfortunately the
core dump is not very informative, possibly because it is incomplete:
BFD: Warning: /tmp/./core.kamailio.500.1427402410.27498 is truncated:
expected core file size >= 8602058752, found: 1769852928.
[New Thread 27498]
Cannot access memory at address 0x7f52891e3168
Cannot access memory at address 0x7f52891e3168
Cannot access memory at address 0x7f52891e3168
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Failed to read a valid object file image from memory.
Core was generated by `/usr/local/sbin/kamailio -P /var/run/kamailio.pid
-m 8192 -u evaristesys -g eva'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f5286d97e45 in ?? ()
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.149.el6_6.5.x86_64
(gdb) where
#0 0x00007f5286d97e45 in ?? ()
Cannot access memory at address 0x7fffbe32a210
That's not much help at all, so I cannot say whether it crashed for the
same reasons as before.
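The sizes in the BFD warning can be put in proportion with a few lines
of shell arithmetic. This is only a sketch: the helper names pct and
to_mib are made up, and relating the expected size to the `-m 8192`
option assumes that `-m` gives the shared memory size in megabytes. On
that reading, the expected core is roughly the 8192 MB shared memory
pool plus a few MB, and only about a fifth of it was written.

```shell
#!/bin/sh
# Hypothetical helpers; integer arithmetic only, results are truncated.
pct()    { echo $(( $1 * 100 / $2 )); }   # $1 as a percentage of $2
to_mib() { echo $(( $1 / 1048576 )); }    # bytes to MiB

expected=8602058752   # "expected core file size" from the BFD warning
found=1769852928      # "found" from the same warning

echo "expected: $(to_mib "$expected") MiB"
echo "found:    $(to_mib "$found") MiB ($(pct "$found" "$expected")% of expected)"
```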
--
Alex Balashov | Principal | Evariste Systems LLC
303 Perimeter Center North, Suite 300
Atlanta, GA 30346
United States
Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/