Hello Daniel,
Just a quick update on this. After running from the 5.1 branch, the server
became unresponsive twice. It never actually crashed, and a forced shutdown
only produced a core from the shutdown routine. Here is what syslog contained
while calls were being sent to kamailio but kamailio was not processing any
of them:
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first fre
And here's the "bt full" of the core file:
[New LWP 16003]
Cannot access memory at address 0x7fcd64e1a148
Cannot access memory at address 0x7fcd64e1a140
Failed to read a valid object file image from memory.
Core was generated by `/usr/local/kamailio/sbin/kamailio -P
/var/run/siptrunk.pid -f /usr/local/carrie'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fcd6443f428 in ?? ()
(gdb) bt full
#0 0x00007fcd6443f428 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7ffed40e9e68
Thanks,
- Jayesh
On Sat, Sep 15, 2018 at 7:09 PM Jayesh Nambiar <jayesh1017@gmail.com> wrote:
Hello Daniel,
I have now cloned from the 5.1 branch and restarted kamailio today. Also an
observation: the crashes kept happening when one particular customer with
around 40-50 cps capacity started dialing. I will observe over the coming
week and report any crashes accordingly.
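For reference, the update was roughly along these lines (the install path is
the one from my setup; evapi has to be listed explicitly because it depends
on libev):

  git clone --branch 5.1 https://github.com/kamailio/kamailio.git kamailio-5.1
  cd kamailio-5.1
  make PREFIX=/usr/local/kamailio include_modules="evapi" cfg
  make all
  make install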
Thanks,
- Jayesh
On Wed, Sep 12, 2018 at 3:11 AM Jayesh Nambiar <jayesh1017@gmail.com>
wrote:
> Thank you Daniel. I will deploy from the 5.1 branch and will be able to
> update it in the next two days. Since this is a production system, I will
> have to find a proper time slot to get this updated. Will let you know once
> that is done.
> Thanks again.
>
> - Jayesh
>
> On Tue, Sep 11, 2018 at 2:16 PM Daniel-Constantin Mierla <
> miconda@gmail.com> wrote:
>
>> Hi Jayesh,
>>
>> I looked over the weekend and it seems like a race or a double free. I
>> backported some patches to the tm module from master to the 5.1 branch (a
>> counter to see if there is more than one free). Would you be able to deploy
>> from branch 5.1 or the nightly builds for 5.1? With the next crash I can
>> evaluate (rule it out or confirm it) whether it is a double free or not.
>>
>> I haven't written to you yet because I planned to dig more into libev
>> (which is used by evapi) to see if it creates some threads that can
>> interfere with some kamailio globals, although I expect it runs the event
>> loop in a single thread. Another work-related project took a lot of my time
>> these days, so I am not there yet ...
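>> For reference, a minimal single-threaded libev loop, which is the pattern
>> I expect the evapi dispatcher to follow (per the backtrace, ev_run() calls
>> back into evapi_recv_client()), looks roughly like this; the callback and
>> fd below are placeholders, not the module's actual code:
>>
>>   #include <ev.h>
>>   #include <stdio.h>
>>
>>   /* invoked from ev_run(), in the same single thread that started the loop */
>>   static void io_cb(struct ev_loop *loop, ev_io *w, int revents)
>>   {
>>       printf("fd %d is readable\n", w->fd);
>>   }
>>
>>   int main(void)
>>   {
>>       struct ev_loop *loop = ev_default_loop(0);
>>       ev_io watcher;
>>
>>       ev_io_init(&watcher, io_cb, 0 /* placeholder fd */, EV_READ);
>>       ev_io_start(loop, &watcher);
>>       ev_run(loop, 0); /* dispatches all callbacks from this one thread */
>>       return 0;
>>   }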
>> Cheers,
>> Daniel
>>
>> On 11.09.18 09:56, Jayesh Nambiar wrote:
>>
>> Hi Daniel,
>> Just a gentle follow-up on this. Could you find anything useful?
>>
>> Thanks,
>>
>> - Jayesh
>>
>>
>>>>> On 07.09.18 09:56, Jayesh Nambiar wrote:
>>>>>
>>>>> Hi Daniel,
>>>>> This happens randomly. It is not a testbed and it is never reproducible
>>>>> on the test server. I can still give you access if you wish to analyse
>>>>> the core files to check what exactly is happening here.
>>>>> It simply feels like the evapi socket gives up under heavy load. Earlier
>>>>> I was running 5.0.2, and there kamailio would get stuck, not sending any
>>>>> async events on the evapi socket. Not even the evapi:connection-closed
>>>>> event got triggered. After I upgraded to the latest stable version
>>>>> (5.1.5), it at least started crashing. Here are two core dumps put onto
>>>>> pastebin:
>>>>>
>>>>> https://pastebin.com/nn6gJapm
>>>>>
>>>>> https://pastebin.com/ph7b8vFH
>>>>>
>>>>> Thanks for all the support,
>>>>>
>>>>> - Jayesh
>>>>>
>>>>> On Fri, Sep 7, 2018 at 1:55 AM Daniel-Constantin Mierla <
>>>>> miconda@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> are you able to reproduce it somehow? Or does it just happen randomly?
>>>>>>
>>>>>> Is it on a testbed where I could get access to investigate the
>>>>>> corefiles? If not, then I will ask for more details from the cores over
>>>>>> email; the first would be 'bt full' for both cores and 'p *dead_cell'
>>>>>> for the second one.
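>>>>>> In gdb terms that is roughly (with the core path adjusted to wherever
>>>>>> the files are written):
>>>>>>
>>>>>>   gdb /usr/local/kamailio/sbin/kamailio /path/to/core
>>>>>>   (gdb) bt full
>>>>>>
>>>>>> and, for the second core, after selecting the crashing frame:
>>>>>>
>>>>>>   (gdb) frame 0
>>>>>>   (gdb) p *dead_cell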
>>>>>> Cheers,
>>>>>> Daniel
>>>>>>
>>>>>> On 05.09.18 17:47, Jayesh Nambiar wrote:
>>>>>>
>>>>>> Hi Daniel,
>>>>>> I have got these core dumps. Let me know if I should be doing a 'bt
>>>>>> full'; I'll pastebin it and send. Thanks,
>>>>>>
>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>> #0  0x0000000000505d30 in sip_msg_shm_clone (org_msg=0x7f6ab0d9f618, sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at core/sip_msg_clone.c:491
>>>>>> 491     LUMP_LIST_LEN(len, org_msg->add_rm);
>>>>>> (gdb) bt
>>>>>> #0  0x0000000000505d30 in sip_msg_shm_clone (org_msg=0x7f6ab0d9f618, sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at core/sip_msg_clone.c:491
>>>>>> #1  0x00007f6af0bdf68d in fake_req (shmem_msg=0x7f6ab0d9f618, extra_flags=0, uac=0x7f6ab1738980, len=0x7ffdddb2e8bc) at t_reply.c:854
>>>>>> #2  0x00007f6af0c3aa27 in t_continue_helper (hash_index=58039, label=413633661, rtact=0x7f6af10500f0, cbname=0x0, cbparam=0x0) at t_suspend.c:293
>>>>>> #3  0x00007f6af0c3eed4 in t_continue (hash_index=58039, label=413633661, route=0x7f6af10500f0) at t_suspend.c:583
>>>>>> #4  0x00007f6aae4dd010 in w_t_continue (msg=0x7ffdddb2fa60, idx=0x7f6af1098e90 "8\306\t\361j\177", lbl=0x7f6af1098ff0 "\240\264\t\361j\177", rtn=0x7f6af1099150 "0\275\t\361j\177") at tmx_mod.c:760
>>>>>> #5  0x000000000045b477 in do_action (h=0x7ffdddb2f850, a=0x7f6af109ab38, msg=0x7ffdddb2fa60) at core/action.c:1085
>>>>>> #6  0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850, a=0x7f6af1096630, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>> #7  0x000000000045b234 in do_action (h=0x7ffdddb2f850, a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1058
>>>>>> #8  0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850, a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>> #9  0x0000000000468797 in run_top_route (a=0x7f6af10a0f80, msg=0x7ffdddb2fa60, c=0x0) at core/action.c:1654
>>>>>> #10 0x00007f6aabe79370 in evapi_run_cfg_route (evenv=0x7ffdddb30250, rt=3, rtname=0x7f6aac08cb18 <_evapi_rts+56>) at evapi_dispatch.c:161
>>>>>> #11 0x00007f6aabe7f271 in evapi_recv_client (loop=0x7f6aabe698e0, watcher=0x27af5e0, revents=1) at evapi_dispatch.c:467
>>>>>> #12 0x00007f6aabc5fd73 in ev_invoke_pending () from /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>> #13 0x00007f6aabc633de in ev_run () from /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>> #14 0x00007f6aabe7867c in ev_loop (loop=0x7f6aabe698e0, flags=0) at /usr/include/ev.h:835
>>>>>> #15 0x00007f6aabe83fc6 in evapi_run_dispatcher (laddr=0x7f6af0f72300 "127.0.0.1", lport=8060) at evapi_dispatch.c:705
>>>>>> #16 0x00007f6aabe6e262 in child_init (rank=0) at evapi_mod.c:213
>>>>>> #17 0x0000000000542cad in init_mod_child (m=0x7f6af0f71b70, rank=0) at core/sr_module.c:943
>>>>>> #18 0x0000000000542971 in init_mod_child (m=0x7f6af0f72968, rank=0) at core/sr_module.c:939
>>>>>> #19 0x0000000000542971 in init_mod_child (m=0x7f6af0f73d38, rank=0) at core/sr_module.c:939
>>>>>> #20 0x0000000000542971 in init_mod_child (m=0x7f6af0f74670, rank=0) at core/sr_module.c:939
>>>>>> #21 0x0000000000542971 in init_mod_child (m=0x7f6af0f76708, rank=0) at core/sr_module.c:939
>>>>>> #22 0x0000000000542971 in init_mod_child (m=0x7f6af0f76c08, rank=0) at core/sr_module.c:939
>>>>>> #23 0x0000000000542971 in init_mod_child (m=0x7f6af0f770d0, rank=0) at core/sr_module.c:939
>>>>>> #24 0x0000000000542971 in init_mod_child (m=0x7f6af0f77cf0, rank=0) at core/sr_module.c:939
>>>>>> #25 0x0000000000542971 in init_mod_child (m=0x7f6af0f78808, rank=0) at core/sr_module.c:939
>>>>>> #26 0x0000000000542971 in init_mod_child (m=0x7f6af0f78bd8, rank=0) at core/sr_module.c:939
>>>>>> #27 0x0000000000542971 in init_mod_child (m=0x7f6af0f794c8, rank=0) at core/sr_module.c:939
>>>>>> #28 0x0000000000542971 in init_mod_child (m=0x7f6af0f79920, rank=0) at core/sr_module.c:939
>>>>>> #29 0x0000000000542971 in init_mod_child (m=0x7f6af0f7a330, rank=0) at core/sr_module.c:939
>>>>>> #30 0x0000000000542971 in init_mod_child (m=0x7f6af0f7afd0, rank=0) at core/sr_module.c:939
>>>>>> #31 0x0000000000542971 in init_mod_child (m=0x7f6af0f7bc80, rank=0) at core/sr_module.c:939
>>>>>> #32 0x000000000054303d in init_child (rank=0) at core/sr_module.c:970
>>>>>> #33 0x0000000000425399 in main_loop () at main.c:1701
>>>>>> #34 0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at main.c:2638
>>>>>>
>>>>>> And this:
>>>>>> [New LWP 15804]
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>> #0  0x00007f6af0b90b70 in free_cell_helper (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c", fline=654) at h_table.c:230
>>>>>> 230     foo = tt->next;
>>>>>> (gdb) bt
>>>>>> #0  0x00007f6af0b90b70 in free_cell_helper (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c", fline=654) at h_table.c:230
>>>>>> #1  0x00007f6af0c24409 in wait_handler (ti=932640643, wait_tl=0x7f6ab0a6bb28, data=0x7f6ab0a6baa8) at timer.c:654
>>>>>> #2  0x00000000004bb445 in timer_list_expire (t=932640643, h=0x7f6ab03ad158, slow_l=0x7f6ab03ae480, slow_mark=271) at core/timer.c:874
>>>>>> #3  0x00000000004bb8ab in timer_handler () at core/timer.c:939
>>>>>> #4  0x00000000004bbd30 in timer_main () at core/timer.c:978
>>>>>> #5  0x00000000004250f9 in main_loop () at main.c:1691
>>>>>> #6  0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at main.c:2638
>>>>>>
>>>>>> On Wed, Sep 5, 2018 at 3:13 PM Daniel-Constantin Mierla <
>>>>>> miconda@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> the backtrace doesn't show any hint about kamailio, only the part
>>>>>>> from closelog() up.
>>>>>>>
>>>>>>> It may be the core generated by the shutdown procedure. Have you
>>>>>>> enabled one core file per pid/process? If not, do it and reproduce
>>>>>>> the issue again; you may get two core files, one from the runtime
>>>>>>> issue and the other one from the shutdown procedure, which is likely
>>>>>>> an effect of the first. The one from the runtime is more relevant.
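>>>>>>> On Linux that is typically just a matter of something like the
>>>>>>> following (the directory/name pattern is only an example):
>>>>>>>
>>>>>>>   ulimit -c unlimited
>>>>>>>   sysctl -w kernel.core_uses_pid=1
>>>>>>>   # or, to also pick the directory and the file name:
>>>>>>>   sysctl -w kernel.core_pattern=/tmp/core.%e.%p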
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Daniel
>>>>>>>
>>>>>>> On 05.09.18 10:09, Jayesh Nambiar wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>> I'm using kamailio 5.1.5 with evapi. I have a node.js application
>>>>>>> connecting to kamailio over evapi, to which I send events and from
>>>>>>> which I also consume the events on which I base the routing. I have 8
>>>>>>> evapi workers defined in the config.
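>>>>>>> The evapi part of the config is roughly along these lines (the
>>>>>>> address, port and xlog lines are just from my setup):
>>>>>>>
>>>>>>>   loadmodule "evapi.so"
>>>>>>>   modparam("evapi", "workers", 8)
>>>>>>>   modparam("evapi", "bind_addr", "127.0.0.1:8060")
>>>>>>>
>>>>>>>   event_route[evapi:connection-new] {
>>>>>>>       xlog("evapi client connected\n");
>>>>>>>   }
>>>>>>>   event_route[evapi:connection-closed] {
>>>>>>>       xlog("evapi client disconnected\n");
>>>>>>>   }
>>>>>>>   event_route[evapi:message-received] {
>>>>>>>       xlog("evapi message: $evapi(msg)\n");
>>>>>>>   }
>>>>>>>
>>>>>>> Events are pushed towards node.js from the routing logic with
>>>>>>> evapi_relay()/evapi_async_relay().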
>>>>>>> The problem is that kamailio randomly crashes under high load. I'm
>>>>>>> assuming it is related to the evapi module, as the rest of the config
>>>>>>> is pretty straightforward. I could get a core file and here's the core
>>>>>>> dump:
>>>>>>> [New LWP 14042]
>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>> Program terminated with signal SIGABRT, Aborted.
>>>>>>> #0  0x00007f9995283428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>>>>>> (gdb) bt
>>>>>>> #0  0x00007f9995283428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>> #1  0x00007f999528502a in __GI_abort () at abort.c:89
>>>>>>> #2  0x000000000041a029 in sig_alarm_abort (signo=14) at main.c:646
>>>>>>> #3  <signal handler called>
>>>>>>> #4  0x00007f999534f497 in __libc_cleanup_routine (f=<optimized out>) at ../sysdeps/nptl/libc-lockP.h:291
>>>>>>> #5  closelog () at ../misc/syslog.c:415
>>>>>>> #6  0x0000000000000000 in ?? ()
>>>>>>>
>>>>>>> Any help in this regard would allow me to identify the reason for
>>>>>>> the crash. Thanks for the support.
>>>>>>>
>>>>>>> - Jayesh
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Kamailio (SER) - Users Mailing List
>>>>>>> sr-users@lists.kamailio.org
>>>>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>>>>>>
>>>>>>>