### Description
Kamailio crashes with the message:
```
CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 30
```
### Troubleshooting
#### Reproduction
It happens sporadically: sometimes six times in a day, sometimes not for a week. On average it probably happens about once per day.
#### Debugging Data
```
Reading symbols from /usr/local/sbin/kamailio...done.
[New LWP 29579]
[New LWP 9469]
[New LWP 29581]
Missing separate debuginfo for /usr/lib64/libpq.so.5
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/00/595f0be1374285c79ea1d4cdcee72682108fab.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /usr/lib64/libstdc++.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/43/4c339faa62ca4e59eb71572a60967fe54a69ad.debug
Missing separate debuginfo for /lib64/libgcc_s.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/3f/d5f89de59e124ab1419a0bd16775b4096e84fd.debug
Core was generated by `/usr/local/sbin/kamailio -P /var/run/kamailio.pid -m 256 -M 8 -u kamailio -g ka'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fc9a9350521 in _int_malloc () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-13.16.amzn1.x86_64 glibc-2.17-106.168.amzn1.x86_64 keyutils-libs-1.5.8-3.12.amzn1.x86_64 krb5-libs-1.13.2-12.40.amzn1.x86_64 libcom_err-1.42.12-4.40.amzn1.x86_64 libcurl-7.47.1-9.68.amzn1.x86_64 libevent-2.0.21-4.19.amzn1.x86_64 libicu-50.1.2-11.12.amzn1.x86_64 libidn-1.18-2.8.amzn1.x86_64 libpsl-0.6.2-1.2.amzn1.x86_64 libselinux-2.1.10-3.22.amzn1.x86_64 libssh2-1.4.2-2.13.amzn1.x86_64 libxml2-2.9.1-6.3.49.amzn1.x86_64 nspr-4.11.0-1.37.amzn1.x86_64 nss-3.21.3-2.77.amzn1.x86_64 nss-softokn-freebl-3.16.2.3-14.4.39.amzn1.x86_64 nss-util-3.21.3-1.1.51.amzn1.x86_64 openldap-2.4.40-12.29.amzn1.x86_64 openssl-1.0.1k-15.96.amzn1.x86_64 xz-libs-5.1.2-12alpha.12.amzn1.x86_64 zlib-1.2.8-7.18.amzn1.x86_64
(gdb) bt full
#0 0x00007fc9a9350521 in _int_malloc () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fc9a935226c in malloc () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fc9a9ac45d5 in _dl_scope_free () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#3 0x00007fc9a9abf691 in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#4 0x00007fc9a9ac589b in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#5 0x00007fc9a9ac11b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#6 0x00007fc9a9ac51ab in _dl_open () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#7 0x00007fc9a94034d2 in do_dlopen () from /lib64/libc.so.6
No symbol table info available.
#8 0x00007fc9a9ac11b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#9 0x00007fc9a9403592 in __libc_dlopen_mode () from /lib64/libc.so.6
No symbol table info available.
#10 0x00007fc9a93dcac5 in init () from /lib64/libc.so.6
No symbol table info available.
#11 0x00007fc9a6c1bbb0 in pthread_once () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007fc9a93dcbdc in backtrace () from /lib64/libc.so.6
No symbol table info available.
#13 0x00007fc9a9347344 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#14 0x00007fc9a934f053 in _int_free () from /lib64/libc.so.6
No symbol table info available.
#15 0x00007fc9a933cff5 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
No symbol table info available.
#16 0x00007fc9a88c29bf in _nss_files_gethostbyname4_r () from /lib64/libnss_files.so.2
No symbol table info available.
#17 0x00007fc9a93aebb8 in gaih_inet () from /lib64/libc.so.6
No symbol table info available.
#18 0x00007fc9a93b227d in getaddrinfo () from /lib64/libc.so.6
No symbol table info available.
#19 0x00007fc99f7c86e4 in ?? () from /usr/lib64/libcurl.so.4
No symbol table info available.
#20 0x00007fc99f7d2efa in ?? () from /usr/lib64/libcurl.so.4
No symbol table info available.
#21 0x00007fc99f7d099b in ?? () from /usr/lib64/libcurl.so.4
No symbol table info available.
#22 0x00007fc9a6c16dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#23 0x00007fc9a93c8c9d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) info locals
No symbol table info available.
(gdb) list
1834 int proto;
1835 char *options;
1836 int ret;
1837 unsigned int seed;
1838 int rfd;
1839 int debug_save, debug_flag;
1840 int dont_fork_cnt;
1841 struct name_lst* n_lst;
1842 char *p;
1843 struct stat st = {0};
(gdb)
```
#### Log Messages
```
Aug 7 22:43:30 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: <core> [receive.c:173]: receive_msg(): core parsing of SIP message failed (67.191.157.42:5060/1)
Aug 7 22:43:31 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event as-feature-event
Aug 7 22:43:32 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:43:32 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:43:32 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event missed-call-summary
Aug 7 22:43:36 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:43:36 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:43:37 ip-172-31-46-236 /usr/local/sbin/kamailio[9450]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event as-feature-event
Aug 7 22:43:38 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event as-feature-event
Aug 7 22:43:40 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:43:40 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:43:44 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:43:44 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:43:47 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: ERROR: <core> [receive.c:173]: receive_msg(): core parsing of SIP message failed (67.191.157.42:5060/1)
Aug 7 22:43:48 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:43:48 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:43:50 ip-172-31-46-236 /usr/local/sbin/kamailio[9450]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event call-info
Aug 7 22:43:52 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event as-feature-event
Aug 7 22:43:52 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:43:52 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:43:52 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event as-feature-event
Aug 7 22:43:52 ip-172-31-46-236 /usr/local/sbin/kamailio[9450]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event missed-call-summary
Aug 7 22:43:56 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:43:56 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:43:57 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: <core> [receive.c:173]: receive_msg(): core parsing of SIP message failed (67.191.157.42:5060/1)
Aug 7 22:44:00 ip-172-31-46-236 /usr/local/sbin/kamailio[9450]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:44:00 ip-172-31-46-236 /usr/local/sbin/kamailio[9450]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:44:00 ip-172-31-46-236 /usr/local/sbin/kamailio[9450]: ERROR: <core> [receive.c:173]: receive_msg(): core parsing of SIP message failed (67.191.157.42:5060/1)
Aug 7 22:44:09 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:44:09 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:44:10 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:44:10 ip-172-31-46-236 /usr/local/sbin/kamailio[9452]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:44:11 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:44:11 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:44:11 ip-172-31-46-236 /usr/local/sbin/kamailio[9477]: CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 30
```
#### SIP Traffic
N/A
### Possible Solutions
N/A
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 4.4.6 (x86_64/linux) 4d49b3
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 4d49b3
compiled on 11:39:53 Jul 29 2017 with gcc 4.8.3
```
* **Operating System**:
```
Linux ip-172-31-46-236 4.4.41-36.55.amzn1.x86_64 #1 SMP Wed Jan 18 01:03:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
```
It's Amazon's Linux distribution. I believe it's CentOS based.
Are you running some module that could create additional processes? The crash doesn't seem to be related to Kamailio code. For example, you could be using one of the embedded scripting modules, such as app_lua or app_python, with scripts that create their own processes.
Also, can you enable one core file per pid (process) and then see if you get more than one core? If yes, attach the backtrace for all core files.
How can I reproduce it? Is there a way to reproduce this? As **miconda** says, it might be caused by another module. Please also check the memory usage in the Amazon monitoring. If it is caused by a memory leak, you can identify that there and report it here.
These are the modules I'm loading
```
loadmodule "db_postgres.so"
loadmodule "mi_fifo.so"
loadmodule "kex.so"
loadmodule "tm.so"
loadmodule "tmx.so"
loadmodule "sl.so"
loadmodule "rr.so"
loadmodule "pv.so"
loadmodule "maxfwd.so"
loadmodule "usrloc.so"
loadmodule "registrar.so"
loadmodule "textops.so"
loadmodule "siputils.so"
loadmodule "xlog.so"
loadmodule "sanity.so"
loadmodule "ctl.so"
loadmodule "cfg_rpc.so"
loadmodule "mi_rpc.so"
loadmodule "acc.so"
loadmodule "dispatcher.so"
loadmodule "auth.so"
loadmodule "auth_db.so"
loadmodule "presence.so"
loadmodule "presence_xml.so"
loadmodule "presence_mwi.so"
loadmodule "presence_dialoginfo.so"
loadmodule "nathelper.so"
loadmodule "utils.so"
loadmodule "path.so"
loadmodule "htable.so"
loadmodule "pike.so"
loadmodule "http_client.so"
loadmodule "http_async_client.so"
loadmodule "siptrace.so"
```
How do I enable one core per pid? A quick googling didn't turn up anything for me :(
To get a core file per process on Linux, you can use: `echo "1" > /proc/sys/kernel/core_uses_pid`
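For anyone following along, here is a minimal Python sketch for confirming that the setting took effect. It only reads the `/proc` file named above. The helper names `parse_core_uses_pid` and `core_uses_pid_enabled` are made up for illustration, and treating any non-zero value as "enabled" is an assumption about how the kernel interprets the flag.

```python
from pathlib import Path

# Path taken from the comment above; everything else here is illustrative.
CORE_USES_PID = Path("/proc/sys/kernel/core_uses_pid")


def parse_core_uses_pid(text: str) -> bool:
    """Interpret the file content: assumed non-zero means core files get a .PID suffix."""
    return int(text.strip()) != 0


def core_uses_pid_enabled(path: Path = CORE_USES_PID) -> bool:
    """Read the live kernel setting (Linux only)."""
    return parse_core_uses_pid(path.read_text())
```

Calling `core_uses_pid_enabled()` on the server should return `True` after the `echo` above. Note that a value written with `echo` does not survive a reboot.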
@SurendraPlivo
Alright, I've executed the following on the server. Do I need to restart Kamailio or anything for it to take effect?
````
echo "1" > /proc/sys/kernel/core_uses_pid
````
I've checked the memory monitor for the time around the last crash (+/- 6 hours), but there was no noticeable change in memory usage before or after the crash.
You can see the graph here: http://i.imgur.com/3v3usoJ.png
The red line marks approximately when the crash happened.
@jonastelzio no need to restart Kamailio. And yes, after seeing that, I don't think this is about a memory leak.
There should be no need to restart after enabling core per pid.
Can you check the log messages that come after:
```
Aug 7 22:44:11 ip-172-31-46-236 /usr/local/sbin/kamailio[9477]: CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 30
```
This line only says that one process stopped, not why (signal, etc.). The log messages after it should provide more information.
You can also try running the latest branch 4.4 (pull, recompile and reinstall). I just backported a patch that should take care of handling the exit of additional processes created by libs:
* https://github.com/kamailio/kamailio/commit/78684f2bba3d408e60eb8450bf915ebe...
The last core dump was 300 MB. It's running 20 processes or so, so I would guess it now needs to dump 300*20 MB?
If that's the case I need to expand the volume of this server before enabling this, else it'll potentially run out of space.
````
echo "0" > /proc/sys/kernel/core_uses_pid
````
Will disable this again I assume?
@miconda
Here are the lines immediately after the Critical
````
Aug 7 22:44:11 ip-172-31-46-236 /usr/local/sbin/kamailio[9477]: CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 30
Aug 7 22:44:13 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: maxfwd [mf_funcs.c:62]: is_maxfwd_present(): parsing MAX_FORWARD header failed!
Aug 7 22:44:13 ip-172-31-46-236 /usr/local/sbin/kamailio[9449]: ERROR: <core> [msg_translator.c:2330]: build_res_buf_from_sip_req(): alas, parse_headers failed
Aug 7 22:44:13 ip-172-31-46-236 /usr/local/sbin/kamailio[9451]: NOTICE: presence [subscribe.c:1291]: handle_subscribe(): Unsupported presence event missed-call-summary
Aug 7 22:44:14 ip-172-31-46-236 /usr/local/sbin/kamailio[9446]: ALERT: <core> [main.c:740]: handle_sigs(): child process 9469 exited by a signal 11
Aug 7 22:44:14 ip-172-31-46-236 /usr/local/sbin/kamailio[9446]: ALERT: <core> [main.c:743]: handle_sigs(): core was generated
Aug 7 22:44:24 ip-172-31-46-236 kamailio: WARNING: <core> [daemonize.c:348]: daemonize(): pid file contains old pid, replacing pid
Aug 7 22:44:24 ip-172-31-46-236 /usr/local/sbin/kamailio[29653]: ERROR: dispatcher [dispatcher.c:788]: ds_warn_fixup(): failover functions used, but required AVP parameters are NULL -- feature disabled
````
I'll grow the volume and install the latest revision on the 4.4 branch asap during off hours.
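The `handle_sigs()` ALERT in the excerpt above is the line that names the crashed pid and the signal. As a rough sketch, assuming the wording of that message stays exactly as quoted here (other Kamailio versions may phrase it differently), it can be pulled out of a syslog excerpt like this. The function name `find_child_exits` is hypothetical.

```python
import re

# Pattern modelled on the handle_sigs() ALERT line quoted in this thread.
EXIT_RE = re.compile(r"child process (\d+) exited by a signal (\d+)")


def find_child_exits(log_text: str) -> list:
    """Return a list of (pid, signal) pairs, one per child-exit report."""
    return [(int(pid), int(sig)) for pid, sig in EXIT_RE.findall(log_text)]


sample = (
    "Aug 7 22:44:11 ip-172-31-46-236 /usr/local/sbin/kamailio[9477]: "
    "CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 30\n"
    "Aug 7 22:44:14 ip-172-31-46-236 /usr/local/sbin/kamailio[9446]: "
    "ALERT: <core> [main.c:740]: handle_sigs(): child process 9469 exited by a signal 11\n"
)
print(find_child_exits(sample))  # [(9469, 11)]
```

Signal 11 is SIGSEGV on Linux x86_64, which matches the "Segmentation fault" reported by gdb in the original post.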
It usually dumps two core files: one for the process that crashed and another for the main attendant. Only when all processes segfault (or hit a similar fault) at the same time will there be a core for each process.
@miconda
Ah, as this looks like one of the processes crashing out we'll get just two core files?
Likely two, but it could still be only one. With this option enabled, you can be sure that the right core file is not lost.
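To put numbers on the disk-space question, here is a back-of-the-envelope sketch. It assumes every core file is about as large as the 300 MB one mentioned earlier, which may not hold, since processes can differ in size. The function name is illustrative.

```python
def core_space_needed_mb(core_size_mb: int, core_files: int) -> int:
    """Disk space needed if every core file has the same (assumed) size."""
    return core_size_mb * core_files


likely = core_space_needed_mb(300, 2)   # crashed process + main attendant
worst = core_space_needed_mb(300, 20)   # every process faults at the same time
print(likely, worst)  # 600 6000
```

So the likely case needs roughly 600 MB per crash, and the 20 x 300 MB figure only applies if all processes fault together.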
@miconda
Roger that! I've re-enabled core per pid. Now we play the waiting game.
Now grab the output of:
```
kamctl ps
```
so that when it crashes, you can see the type of the process based on its pid. It will be useful to see whether it is one created by Kamailio or not and, if it is, it's easier to know what kind of processing it is supposed to do.
And for the old core (the one from which you got the last log messages above), if you still have it, take from gdb the output of the following commands:
```
set $i=0
while ($i < process_no)
p pt[$i++]
end
```
to see the type of the process that exited.
This is the result of that:
````
$31 = {pid = 9446, unix_sock = -1, idx = -1, desc = "main process - attendant", '\000' <repeats 103 times>}
$32 = {pid = 9449, unix_sock = 13, idx = -1, desc = "udp receiver child=0 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$33 = {pid = 9450, unix_sock = 14, idx = -1, desc = "udp receiver child=1 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$34 = {pid = 9451, unix_sock = 15, idx = -1, desc = "udp receiver child=2 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$35 = {pid = 9452, unix_sock = 16, idx = -1, desc = "udp receiver child=3 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$36 = {pid = 9453, unix_sock = 17, idx = -1, desc = "udp receiver child=0 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$37 = {pid = 9454, unix_sock = 18, idx = -1, desc = "udp receiver child=1 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$38 = {pid = 9455, unix_sock = 19, idx = -1, desc = "udp receiver child=2 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$39 = {pid = 9456, unix_sock = 20, idx = -1, desc = "udp receiver child=3 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$40 = {pid = 9457, unix_sock = 21, idx = -1, desc = "slow timer", '\000' <repeats 117 times>}
$41 = {pid = 9458, unix_sock = 22, idx = -1, desc = "timer", '\000' <repeats 122 times>}
$42 = {pid = 9459, unix_sock = 23, idx = -1, desc = "secondary timer", '\000' <repeats 112 times>}
$43 = {pid = 9460, unix_sock = 24, idx = -1, desc = "MI FIFO", '\000' <repeats 120 times>}
$44 = {pid = 9467, unix_sock = 26, idx = -1, desc = "ctl handler", '\000' <repeats 116 times>}
$45 = {pid = 9468, unix_sock = 4, idx = -1, desc = "TIMER NH", '\000' <repeats 119 times>}
````
Process 9469 doesn't seem to be mentioned?
@miconda
I've grabbed the kamctl ps output now:
````
Process:: ID=0 PID=29653 Type=main process - attendant
Process:: ID=1 PID=29656 Type=udp receiver child=0 sock=172.31.46.236:5060 (sip.telzio.com:5060)
Process:: ID=2 PID=29657 Type=udp receiver child=1 sock=172.31.46.236:5060 (sip.telzio.com:5060)
Process:: ID=3 PID=29658 Type=udp receiver child=2 sock=172.31.46.236:5060 (sip.telzio.com:5060)
Process:: ID=4 PID=29659 Type=udp receiver child=3 sock=172.31.46.236:5060 (sip.telzio.com:5060)
Process:: ID=5 PID=29660 Type=udp receiver child=0 sock=172.31.46.236:6050 (sip.telzio.com:6050)
Process:: ID=6 PID=29661 Type=udp receiver child=1 sock=172.31.46.236:6050 (sip.telzio.com:6050)
Process:: ID=7 PID=29662 Type=udp receiver child=2 sock=172.31.46.236:6050 (sip.telzio.com:6050)
Process:: ID=8 PID=29663 Type=udp receiver child=3 sock=172.31.46.236:6050 (sip.telzio.com:6050)
Process:: ID=9 PID=29664 Type=slow timer
Process:: ID=10 PID=29665 Type=timer
Process:: ID=11 PID=29666 Type=secondary timer
Process:: ID=12 PID=29671 Type=MI FIFO
Process:: ID=13 PID=29674 Type=ctl handler
Process:: ID=14 PID=29675 Type=TIMER NH
Process:: ID=15 PID=29676 Type=Http Worker
Process:: ID=16 PID=29677 Type=tcp receiver (generic) child=0
Process:: ID=17 PID=29678 Type=tcp receiver (generic) child=1
Process:: ID=18 PID=29679 Type=tcp receiver (generic) child=2
Process:: ID=19 PID=29683 Type=tcp receiver (generic) child=3
Process:: ID=20 PID=29684 Type=tcp main process
````
With a little guesswork, if the list I posted before was complete, is it far-fetched to assume that the process that crashed was Type=Http Worker?
I'm using two HTTP modules in the Kamailio config: http_client and http_async_client.
Is it possible that this worker process is hosting http_async_client?
It looks like that module, at least, spawns a process of that type:
https://github.com/kamailio/kamailio/blob/4.4/modules/http_async_client/http...
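Once the `kamctl ps` output has been saved, the pid reported at the next crash can be looked up mechanically. A minimal sketch, assuming the `Process:: ID=… PID=… Type=…` line format shown above (the function name `parse_kamctl_ps` is made up for illustration):

```python
import re

# Line format copied from the `kamctl ps` output posted in this thread.
PS_RE = re.compile(r"Process::\s+ID=(\d+)\s+PID=(\d+)\s+Type=(.*)")


def parse_kamctl_ps(output: str) -> dict:
    """Map pid -> process type for every 'Process::' line in the output."""
    table = {}
    for line in output.splitlines():
        match = PS_RE.search(line)
        if match:
            table[int(match.group(2))] = match.group(3).strip()
    return table


saved = """\
Process:: ID=14 PID=29675 Type=TIMER NH
Process:: ID=15 PID=29676 Type=Http Worker
Process:: ID=16 PID=29677 Type=tcp receiver (generic) child=0
"""
print(parse_kamctl_ps(saved).get(29676))  # Http Worker
```

The pids in this listing belong to the restarted instance, so it only helps with the next crash. The earlier crash (pid 9469) has to be identified from the old core file.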
When you do the gdb print of processes, you may need to press ENTER or SPACE to get all of them, because of gdb output pagination.
It could be the http async process; the backtrace showed something with libcurl.
@miconda
This is really all it gives me. I've tried both Enter, Space and a bunch of other things. Nothing really indicates there would be any more data to print.
````
(gdb)
(gdb)
(gdb)
(gdb) set $i=0
(gdb) while ($i < process_no)
p pt[$i++]
end
$16 = {pid = 9446, unix_sock = -1, idx = -1, desc = "main process - attendant", '\000' <repeats 103 times>}
$17 = {pid = 9449, unix_sock = 13, idx = -1, desc = "udp receiver child=0 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$18 = {pid = 9450, unix_sock = 14, idx = -1, desc = "udp receiver child=1 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$19 = {pid = 9451, unix_sock = 15, idx = -1, desc = "udp receiver child=2 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$20 = {pid = 9452, unix_sock = 16, idx = -1, desc = "udp receiver child=3 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$21 = {pid = 9453, unix_sock = 17, idx = -1, desc = "udp receiver child=0 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$22 = {pid = 9454, unix_sock = 18, idx = -1, desc = "udp receiver child=1 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$23 = {pid = 9455, unix_sock = 19, idx = -1, desc = "udp receiver child=2 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$24 = {pid = 9456, unix_sock = 20, idx = -1, desc = "udp receiver child=3 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$25 = {pid = 9457, unix_sock = 21, idx = -1, desc = "slow timer", '\000' <repeats 117 times>}
$26 = {pid = 9458, unix_sock = 22, idx = -1, desc = "timer", '\000' <repeats 122 times>}
$27 = {pid = 9459, unix_sock = 23, idx = -1, desc = "secondary timer", '\000' <repeats 112 times>}
$28 = {pid = 9460, unix_sock = 24, idx = -1, desc = "MI FIFO", '\000' <repeats 120 times>}
$29 = {pid = 9467, unix_sock = 26, idx = -1, desc = "ctl handler", '\000' <repeats 116 times>}
$30 = {pid = 9468, unix_sock = 4, idx = -1, desc = "TIMER NH", '\000' <repeats 119 times>}
(gdb)
(gdb)
(gdb)
````
Then what is the value of process_no:
```
p process_no
```
Oh, wait, I attached to the last process myself and used the wrong variable. Replace `process_no` with `*process_count`.
@miconda
That did the trick!
````
(gdb)
(gdb) set $i=0
(gdb) while ($i < *process_count)
p pt[$i++]
end
$1 = {pid = 9446, unix_sock = -1, idx = -1, desc = "main process - attendant", '\000' <repeats 103 times>}
$2 = {pid = 9449, unix_sock = 13, idx = -1, desc = "udp receiver child=0 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$3 = {pid = 9450, unix_sock = 14, idx = -1, desc = "udp receiver child=1 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$4 = {pid = 9451, unix_sock = 15, idx = -1, desc = "udp receiver child=2 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$5 = {pid = 9452, unix_sock = 16, idx = -1, desc = "udp receiver child=3 sock=172.31.46.236:5060 (sip.domain.com:5060)", '\000' <repeats 61 times>}
$6 = {pid = 9453, unix_sock = 17, idx = -1, desc = "udp receiver child=0 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$7 = {pid = 9454, unix_sock = 18, idx = -1, desc = "udp receiver child=1 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$8 = {pid = 9455, unix_sock = 19, idx = -1, desc = "udp receiver child=2 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$9 = {pid = 9456, unix_sock = 20, idx = -1, desc = "udp receiver child=3 sock=172.31.46.236:6050 (sip.domain.com:6050)", '\000' <repeats 61 times>}
$10 = {pid = 9457, unix_sock = 21, idx = -1, desc = "slow timer", '\000' <repeats 117 times>}
$11 = {pid = 9458, unix_sock = 22, idx = -1, desc = "timer", '\000' <repeats 122 times>}
$12 = {pid = 9459, unix_sock = 23, idx = -1, desc = "secondary timer", '\000' <repeats 112 times>}
$13 = {pid = 9460, unix_sock = 24, idx = -1, desc = "MI FIFO", '\000' <repeats 120 times>}
$14 = {pid = 9467, unix_sock = 26, idx = -1, desc = "ctl handler", '\000' <repeats 116 times>}
$15 = {pid = 9468, unix_sock = 4, idx = -1, desc = "TIMER NH", '\000' <repeats 119 times>}
$16 = {pid = 9469, unix_sock = 30, idx = -1, desc = "Http Worker", '\000' <repeats 116 times>}
$17 = {pid = 9470, unix_sock = 31, idx = 0, desc = "tcp receiver (generic) child=0", '\000' <repeats 97 times>}
$18 = {pid = 9474, unix_sock = 32, idx = 1, desc = "tcp receiver (generic) child=1", '\000' <repeats 97 times>}
$19 = {pid = 9475, unix_sock = 34, idx = 2, desc = "tcp receiver (generic) child=2", '\000' <repeats 97 times>}
$20 = {pid = 9476, unix_sock = 36, idx = 3, desc = "tcp receiver (generic) child=3", '\000' <repeats 97 times>}
$21 = {pid = 9477, unix_sock = -1, idx = -1, desc = "tcp main process", '\000' <repeats 111 times>}
(gdb)
(gdb)
````
Which also confirms that pid 9469 is in fact the Http Worker.
If I've searched the Kamailio source code correctly, then only http_async_client spawns that process. I'll rewrite some config code to not use that module and just use http_client instead. It's fairly inconsequential in this case, so I'm alright with that.
If I catch another core dump with core per pid enabled I'll be sure to post that here as well.
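The lookup done by eye above can also be scripted. The sketch below parses the `p pt[$i++]` output format shown in this thread and returns the socket and description for a given pid. The function name `parse_pt_entries` is hypothetical, and the regular expression assumes gdb prints the struct fields in exactly this order.

```python
import re

# Field layout copied from the gdb output in this thread.
PT_RE = re.compile(
    r'\{pid = (\d+), unix_sock = (-?\d+), idx = (-?\d+), desc = "([^"]*)"'
)


def parse_pt_entries(gdb_output: str) -> dict:
    """Map pid -> (unix_sock, desc) for each printed pt[] entry."""
    return {
        int(pid): (int(sock), desc)
        for pid, sock, _idx, desc in PT_RE.findall(gdb_output)
    }


sample = r'''
$15 = {pid = 9468, unix_sock = 4, idx = -1, desc = "TIMER NH", '\000' <repeats 119 times>}
$16 = {pid = 9469, unix_sock = 30, idx = -1, desc = "Http Worker", '\000' <repeats 116 times>}
'''
print(parse_pt_entries(sample)[9469])  # (30, 'Http Worker')
```

The `unix_sock = 30` of pid 9469 is the same number as in the original `receive_fd(): EOF on 30` message, which appears consistent with that message being the other end noticing the Http Worker's exit.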
Changed the title of the issue to reflect better where the problem seems to be and assigned the authors of the module (as on its README), in case they know if it is an issue already fixed or may have some ideas about troubleshooting.
I've experienced a similar crash on CentOS 7.3, which comes with curl 7.29. I haven't found anything better than updating curl to 7.54.1, which I built from this src rpm: http://mirror.city-fan.org/ftp/contrib/sysutils/Mirroring/curl-7.54.1-8.0.cf....
Got another crash during the night and it dumped a core for the crashed process. However the one that was originally dumped was also from the dying Http Worker-process, so it contained nothing that wasn't already posted in this issue.
I have both cores stored away, if the module devs want me to pull anything further from them
@grumvalski - thanks, very useful detail! The dev usually has some good insights!
@jonastelzio - as @grumvalski also mentioned above, and based on the backtrace as well, it looks to be an issue in the libcurl version; there is nothing in the backtrace that relates to Kamailio code.
Try upgrading libcurl and see how it goes.
I will close this issue very soon if no one has anything else to add. If you get issues with another libcurl version, it is probably better to open another issue with the information related to that build. This one already has lots of comments that won't be relevant for the future build.
Closed #1208.
Closing as per above comments. Open a new one if you have a crash with another version of libcurl.
Had basically the same issue, and switching curl from 7.38 to 7.52 seems to have resolved it. In case anyone else stumbles on this.
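For readers checking their own systems, here is a small sketch that extracts the libcurl version from the first line of `curl --version` and compares it with the versions reported in this thread. The thread does not establish which release actually fixed the problem: 7.52 is simply the lowest version a commenter reported as working, and the helper names and sample string are illustrative.

```python
import re

# (major, minor) versions reported in this thread; not an authoritative list.
REPORTED_CRASHING = [(7, 29), (7, 38)]
REPORTED_WORKING = [(7, 52), (7, 54)]


def libcurl_version(curl_version_output: str):
    """Extract (major, minor, patch) from the 'libcurl/X.Y.Z' token, or None."""
    match = re.search(r"libcurl/(\d+)\.(\d+)\.(\d+)", curl_version_output)
    return tuple(int(part) for part in match.groups()) if match else None


def below_lowest_reported_working(version: tuple) -> bool:
    """True if the version is older than any version reported as working here."""
    return version[:2] < min(REPORTED_WORKING)


# Illustrative input; the version matches the libcurl package named in the gdb output.
line = "curl 7.47.1 (x86_64-redhat-linux-gnu) libcurl/7.47.1 NSS/3.21.3 zlib/1.2.8"
print(libcurl_version(line), below_lowest_reported_working(libcurl_version(line)))
```

A `True` result only means the installed libcurl predates the versions that commenters here found stable. It is a hint to try an upgrade, not proof that the installed version is affected.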