Hello,
I looked at the core files when you reported them, but due to busy times
traveling I didn't get the chance to follow up. Unfortunately the core files
are truncated, so there are no useful details in them.
We will release 5.1.6 in the next few days; it has a few more bits that might
help with troubleshooting here, if another good core dump file turns up.
Cheers,
Daniel
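For reference, a minimal sketch of how to make sure the next crash leaves a complete, per-process core file on a Linux system like this one. The ulimit and kernel.core_pattern settings are standard Linux; the /var/cores path is only an example, and the disable_core_dump line is Kamailio's own core parameter (to my knowledge it already defaults to no):

    # in the shell/init script that starts kamailio: no size limit on cores
    ulimit -c unlimited

    # one core per process, named with executable, PID and timestamp
    # (%e, %p, %t are standard core_pattern specifiers)
    mkdir -p /var/cores
    sysctl -w kernel.core_pattern=/var/cores/core.%e.%p.%t

    # in kamailio.cfg, keep core dumps enabled (this is the default)
    disable_core_dump=no

With that in place, a truncated core usually means the dump was cut short by a size limit or by lack of disk space rather than by kamailio itself.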
On 26.09.18 09:25, Jayesh Nambiar wrote:
Hello Daniel,
Just a quick update on this. Since running from the 5.1 branch, it has twice
happened that the server became unresponsive. It never crashed, and on forceful
shutdown it only generated a core from the shutdown routine. Here is what the
syslog contained while calls were being sent to kamailio but kamailio was not
processing any of them:
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]: CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already freed pointer (0x7fcd23ce5648), called from tm: h_table.c: free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) - aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]: CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already freed pointer (0x7fcd24a38ea8), called from tm: h_table.c: free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) - aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]: CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already freed pointer (0x7fcd23ce5648), called from tm: h_table.c: free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) - aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]: CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already freed pointer (0x7fcd24a38ea8), called from tm: h_table.c: free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) - aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]: CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already freed pointer (0x7fcd23ce5648), called from tm: h_table.c: free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) - aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]: CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already freed pointer (0x7fcd24a38ea8), called from tm: h_table.c: free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) - aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]: CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already freed pointer (0x7fcd23ce5648), called from tm: h_table.c: free_cell_helper(235), first fre
And here's the "bt full" of the core file:
[New LWP 16003]
Cannot access memory at address 0x7fcd64e1a148
Cannot access memory at address 0x7fcd64e1a140
Failed to read a valid object file image from memory.
Core was generated by `/usr/local/kamailio/sbin/kamailio -P
/var/run/siptrunk.pid -f /usr/local/carrie'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fcd6443f428 in ?? ()
(gdb) bt full
#0 0x00007fcd6443f428 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7ffed40e9e68
Thanks,
- Jayesh
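When gdb prints "Cannot access memory" and "Failed to read a valid object file image from memory" as above, the core is usually truncated or was loaded without the matching binary. A quick way to double-check, using only standard gdb commands (the core path below is a placeholder), is to load the exact kamailio executable together with the core and retry:

    gdb /usr/local/kamailio/sbin/kamailio /path/to/core

    (gdb) bt full          # full backtrace with local variables
    (gdb) frame 0          # select the innermost frame
    (gdb) info locals      # inspect its variables
    (gdb) p *dead_cell     # only meaningful in the free_cell_helper core

If the backtrace still stops immediately, the core file itself is incomplete and only a new, full dump will help.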
On Sat, Sep 15, 2018 at 7:09 PM Jayesh Nambiar <jayesh1017(a)gmail.com>
wrote:
Hello Daniel,
I have now git cloned from the 5.1 branch and restarted kamailio today. Also
an observation: it was when one particular customer with around 40-50 cps
capacity started dialing that kamailio kept crashing. I will observe in the
coming week and report any crashes accordingly.
Thanks,
- Jayesh
On Wed, Sep 12, 2018 at 3:11 AM Jayesh Nambiar <jayesh1017(a)gmail.com>
wrote:
> Thank you Daniel. I will deploy from the 5.1 branch. I will be able to
> update it in the next two days. Since this is a production system, I will
> have to find a proper time slot to get this updated. Will let you know once
> that is done.
> Thanks again.
>
> - Jayesh
>
> On Tue, Sep 11, 2018 at 2:16 PM Daniel-Constantin Mierla <
> miconda(a)gmail.com> wrote:
>
>> Hi Jayesh,
>>
>> I looked over the weekend and it seems like a race or a double free. I
>> backported some patches to the tm module from master to the 5.1 branch (a
>> counter to see if there is more than one free). Would you be able to deploy
>> from branch 5.1 or the nightly builds for 5.1? With the next crash I can
>> evaluate whether it is a double free or not (rule it out or pursue it).
>>
>> I haven't written to you yet because I planned to dig more into libev
>> (which is used by evapi), to see whether it creates threads that could mess
>> with some kamailio globals, although I expect it runs the event loop in a
>> single thread. Other work-related projects took a lot of my time these days,
>> so I'm not there yet ...
>> Cheers,
>> Daniel
>>
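The counter mentioned above amounts to bookkeeping of roughly this shape; the sketch below is only an illustration of the idea (the struct and function names are made up, this is not the actual tm/q_malloc patch), assuming a pooled allocator where the cell header stays addressable after release:

    #include <stdio.h>

    struct guarded_cell {
        int freed;              /* 0 = live, 1 = already released       */
        const char *free_file;  /* where the first release happened     */
        int free_line;
        /* ... transaction payload would follow here ...                */
    };

    static int cell_release(struct guarded_cell *c, const char *file, int line)
    {
        if (c->freed) {
            /* second free: report both call sites instead of corrupting
             * the memory pool, mirroring the qm_free() log above */
            fprintf(stderr, "BUG: freeing already freed cell %p,"
                    " called from %s:%d, first free %s:%d\n",
                    (void *)c, file, line, c->free_file, c->free_line);
            return -1;
        }
        c->freed = 1;
        c->free_file = file;
        c->free_line = line;
        /* ... hand the memory back to the pool here ... */
        return 0;
    }

    int main(void)
    {
        static struct guarded_cell cell;          /* zero-initialized   */
        cell_release(&cell, "h_table.c", 235);    /* first free: ok     */
        cell_release(&cell, "h_table.c", 235);    /* reported as a BUG  */
        return 0;
    }

With such a marker in place, a second free attempt is logged with both call sites, which is the kind of evidence needed to tell a genuine double free from plain memory corruption.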
>> On 11.09.18 09:56, Jayesh Nambiar wrote:
>>
>> Hi Daniel,
>> Just a gentle follow-up on this. Were you able to find anything useful?
>>
>> Thanks,
>>
>> - Jayesh
>>
>>
>>>>> On 07.09.18 09:56, Jayesh Nambiar wrote:
>>>>>
>>>>> Hi Daniel,
>>>>> This happens randomly. It is not a testbed and it is never reproducible
>>>>> on the test server. I can still give you access if you wish to analyse
>>>>> the core files and check what's happening here exactly.
>>>>> It simply feels like the evapi socket gives up under heavy load.
>>>>> Earlier I was running 5.0.2, and with that kamailio would get stuck, not
>>>>> sending any async events on the evapi socket. Not even the
>>>>> evapi:connection-closed event got triggered. After I upgraded to the
>>>>> latest stable version (5.1.5), it at least started crashing. Here are two
>>>>> core dumps put onto pastebin:
>>>>>
>>>>> https://pastebin.com/nn6gJapm
>>>>>
>>>>> https://pastebin.com/ph7b8vFH
>>>>>
>>>>> Thanks for all the support,
>>>>>
>>>>> - Jayesh
>>>>>
>>>>> On Fri, Sep 7, 2018 at 1:55 AM Daniel-Constantin Mierla <
>>>>> miconda(a)gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> are you able to reproduce it somehow? Or just happens randomly?
>>>>>>
>>>>>> Is it on a testbed where I could get access to investigate the
>>>>>> corefiles? If not, then I will ask for more details from the cores over
>>>>>> email; the first would be 'bt full' for both cores and 'p *dead_cell'
>>>>>> for the second one.
>>>>>> Cheers,
>>>>>> Daniel
>>>>>>
>>>>>> On 05.09.18 17:47, Jayesh Nambiar wrote:
>>>>>>
>>>>>> Hi Daniel,
>>>>>> Have got these core dumps. Let me know if I should be doing a bt
>>>>>> full. I'll pastebin and send. Thanks,
>>>>>>
>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>> #0 0x0000000000505d30 in sip_msg_shm_clone (org_msg=0x7f6ab0d9f618, sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at core/sip_msg_clone.c:491
>>>>>> 491 LUMP_LIST_LEN(len, org_msg->add_rm);
>>>>>> (gdb) bt
>>>>>> #0 0x0000000000505d30 in sip_msg_shm_clone (org_msg=0x7f6ab0d9f618, sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at core/sip_msg_clone.c:491
>>>>>> #1 0x00007f6af0bdf68d in fake_req (shmem_msg=0x7f6ab0d9f618, extra_flags=0, uac=0x7f6ab1738980, len=0x7ffdddb2e8bc) at t_reply.c:854
>>>>>> #2 0x00007f6af0c3aa27 in t_continue_helper (hash_index=58039, label=413633661, rtact=0x7f6af10500f0, cbname=0x0, cbparam=0x0) at t_suspend.c:293
>>>>>> #3 0x00007f6af0c3eed4 in t_continue (hash_index=58039, label=413633661, route=0x7f6af10500f0) at t_suspend.c:583
>>>>>> #4 0x00007f6aae4dd010 in w_t_continue (msg=0x7ffdddb2fa60, idx=0x7f6af1098e90 "8\306\t\361j\177", lbl=0x7f6af1098ff0 "\240\264\t\361j\177", rtn=0x7f6af1099150 "0\275\t\361j\177") at tmx_mod.c:760
>>>>>> #5 0x000000000045b477 in do_action (h=0x7ffdddb2f850, a=0x7f6af109ab38, msg=0x7ffdddb2fa60) at core/action.c:1085
>>>>>> #6 0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850, a=0x7f6af1096630, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>> #7 0x000000000045b234 in do_action (h=0x7ffdddb2f850, a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1058
>>>>>> #8 0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850, a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>> #9 0x0000000000468797 in run_top_route (a=0x7f6af10a0f80, msg=0x7ffdddb2fa60, c=0x0) at core/action.c:1654
>>>>>> #10 0x00007f6aabe79370 in evapi_run_cfg_route (evenv=0x7ffdddb30250, rt=3, rtname=0x7f6aac08cb18 <_evapi_rts+56>) at evapi_dispatch.c:161
>>>>>> #11 0x00007f6aabe7f271 in evapi_recv_client (loop=0x7f6aabe698e0, watcher=0x27af5e0, revents=1) at evapi_dispatch.c:467
>>>>>> #12 0x00007f6aabc5fd73 in ev_invoke_pending () from /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>> #13 0x00007f6aabc633de in ev_run () from /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>> #14 0x00007f6aabe7867c in ev_loop (loop=0x7f6aabe698e0, flags=0) at /usr/include/ev.h:835
>>>>>> #15 0x00007f6aabe83fc6 in evapi_run_dispatcher (laddr=0x7f6af0f72300 "127.0.0.1", lport=8060) at evapi_dispatch.c:705
>>>>>> #16 0x00007f6aabe6e262 in child_init (rank=0) at evapi_mod.c:213
>>>>>> #17 0x0000000000542cad in init_mod_child (m=0x7f6af0f71b70, rank=0) at core/sr_module.c:943
>>>>>> #18 0x0000000000542971 in init_mod_child (m=0x7f6af0f72968, rank=0) at core/sr_module.c:939
>>>>>> #19 0x0000000000542971 in init_mod_child (m=0x7f6af0f73d38, rank=0) at core/sr_module.c:939
>>>>>> #20 0x0000000000542971 in init_mod_child (m=0x7f6af0f74670, rank=0) at core/sr_module.c:939
>>>>>> #21 0x0000000000542971 in init_mod_child (m=0x7f6af0f76708, rank=0) at core/sr_module.c:939
>>>>>> #22 0x0000000000542971 in init_mod_child (m=0x7f6af0f76c08, rank=0) at core/sr_module.c:939
>>>>>> #23 0x0000000000542971 in init_mod_child (m=0x7f6af0f770d0, rank=0) at core/sr_module.c:939
>>>>>> #24 0x0000000000542971 in init_mod_child (m=0x7f6af0f77cf0, rank=0) at core/sr_module.c:939
>>>>>> #25 0x0000000000542971 in init_mod_child (m=0x7f6af0f78808, rank=0) at core/sr_module.c:939
>>>>>> #26 0x0000000000542971 in init_mod_child (m=0x7f6af0f78bd8, rank=0) at core/sr_module.c:939
>>>>>> #27 0x0000000000542971 in init_mod_child (m=0x7f6af0f794c8, rank=0) at core/sr_module.c:939
>>>>>> #28 0x0000000000542971 in init_mod_child (m=0x7f6af0f79920, rank=0) at core/sr_module.c:939
>>>>>> #29 0x0000000000542971 in init_mod_child (m=0x7f6af0f7a330, rank=0) at core/sr_module.c:939
>>>>>> #30 0x0000000000542971 in init_mod_child (m=0x7f6af0f7afd0, rank=0) at core/sr_module.c:939
>>>>>> #31 0x0000000000542971 in init_mod_child (m=0x7f6af0f7bc80, rank=0) at core/sr_module.c:939
>>>>>> #32 0x000000000054303d in init_child (rank=0) at core/sr_module.c:970
>>>>>> #33 0x0000000000425399 in main_loop () at main.c:1701
>>>>>> #34 0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at main.c:2638
>>>>>>
>>>>>> And this:
>>>>>> [New LWP 15804]
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>> #0 0x00007f6af0b90b70 in free_cell_helper (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c", fline=654) at h_table.c:230
>>>>>> 230 foo = tt->next;
>>>>>> (gdb) bt
>>>>>> #0 0x00007f6af0b90b70 in free_cell_helper (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c", fline=654) at h_table.c:230
>>>>>> #1 0x00007f6af0c24409 in wait_handler (ti=932640643, wait_tl=0x7f6ab0a6bb28, data=0x7f6ab0a6baa8) at timer.c:654
>>>>>> #2 0x00000000004bb445 in timer_list_expire (t=932640643, h=0x7f6ab03ad158, slow_l=0x7f6ab03ae480, slow_mark=271) at core/timer.c:874
>>>>>> #3 0x00000000004bb8ab in timer_handler () at core/timer.c:939
>>>>>> #4 0x00000000004bbd30 in timer_main () at core/timer.c:978
>>>>>> #5 0x00000000004250f9 in main_loop () at main.c:1691
>>>>>> #6 0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at main.c:2638
>>>>>>
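Frames #15 down to #10 of the first backtrace above (evapi_run_dispatcher -> ev_loop/ev_run -> ev_invoke_pending -> evapi_recv_client -> evapi_run_cfg_route) show the evapi dispatcher sitting in a libev event loop, which matches the single-threaded model Daniel expects. A minimal, self-contained sketch of that pattern, with an illustrative callback name and stdin standing in for the evapi client socket (this is not the evapi source), looks like:

    #include <ev.h>
    #include <unistd.h>

    /* called by the loop whenever the watched fd becomes readable;
     * evapi_recv_client in the backtrace plays this role */
    static void client_read_cb(struct ev_loop *loop, ev_io *w, int revents)
    {
        char buf[1024];
        ssize_t n = read(w->fd, buf, sizeof(buf));
        if (n <= 0)
            ev_break(loop, EVBREAK_ALL);   /* peer closed or error */
        /* ... otherwise dispatch the event to the routing logic ... */
        (void)revents;
    }

    int main(void)
    {
        struct ev_loop *loop = ev_default_loop(0);
        ev_io watcher;

        /* watch stdin as a stand-in for the evapi client socket */
        ev_io_init(&watcher, client_read_cb, STDIN_FILENO, EV_READ);
        ev_io_start(loop, &watcher);

        /* all callbacks run here, in the one thread that calls ev_run;
         * libev does not spawn worker threads of its own */
        ev_run(loop, 0);
        return 0;
    }

Since everything runs in a single process and thread here, a corrupted transaction cell is more plausibly explained by a race or double free across processes (the tm wait timer in the second backtrace versus the suspended transaction resumed via evapi in the first) than by libev-internal threading.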
>>>>>> On Wed, Sep 5, 2018 at 3:13 PM Daniel-Constantin Mierla <
>>>>>> miconda(a)gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> the backtrace doesn't show any hint about kamailio, only from
>>>>>>> closelog() up.
>>>>>>>
>>>>>>> It may be the core generated by the shutdown procedure. Have you
>>>>>>> enabled one core file per pid/process? If not, do it and reproduce
>>>>>>> the issue again; you may get two core files, one for the runtime
>>>>>>> issue and the other from the shutdown procedure, which is likely an
>>>>>>> effect of the first one. The one from the runtime is more relevant.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Daniel
>>>>>>>
>>>>>>> On 05.09.18 10:09, Jayesh Nambiar wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>> I'm using kamailio 5.1.5 with evapi. I have a node.js application
>>>>>>> connecting to kamailio over evapi, to which I send events and from
>>>>>>> which I also consume the events that drive my routing. I have 8 evapi
>>>>>>> workers defined in the config.
>>>>>>> The problem is that kamailio randomly crashes under high load. I'm
>>>>>>> assuming it is related to the evapi module, as the rest of the config
>>>>>>> is pretty straightforward. I could get a core file and here's the core
>>>>>>> dump:
>>>>>>> [New LWP 14042]
>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>> Program terminated with signal SIGABRT, Aborted.
>>>>>>> #0 0x00007f9995283428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>> 54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>>>>>> (gdb) bt
>>>>>>> #0 0x00007f9995283428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>> #1 0x00007f999528502a in __GI_abort () at abort.c:89
>>>>>>> #2 0x000000000041a029 in sig_alarm_abort (signo=14) at main.c:646
>>>>>>> #3 <signal handler called>
>>>>>>> #4 0x00007f999534f497 in __libc_cleanup_routine (f=<optimized out>) at ../sysdeps/nptl/libc-lockP.h:291
>>>>>>> #5 closelog () at ../misc/syslog.c:415
>>>>>>> #6 0x0000000000000000 in ?? ()
>>>>>>>
>>>>>>> Any help in this regard would allow me to identify the reason for
>>>>>>> the crash. Thanks for the support.
>>>>>>>
>>>>>>> - Jayesh
>>>>>>>
>>>>>>>
_______________________________________________
Kamailio (SER) - Users Mailing List
sr-users@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
--
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference -- www.kamailioworld.com
Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com