We had another one of these today, under high call volume:
(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398
Is this a regression, or memory corruption from something else?
This is with CANCEL_REASON_SUPPORT #undef'd in tm_reply.h, btw.
On Oct 19, 2010 at 19:02, Alex Balashov abalashov@evaristesys.com wrote:
We had another one of these today, under high call volume:
(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398
Is this a regression, or memory corruption from something else?
It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).
This is with CANCEL_REASON_SUPPORT #undef'd in tm_reply.h, btw.
Andrei
On 10/20/2010 12:54 PM, Andrei Pelinescu-Onciul wrote:
On Oct 19, 2010 at 19:02, Alex Balashovabalashov@evaristesys.com wrote:
We had another one of these today, under high call volume:
(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398
Is this a regression, or memory corruption from something else?
It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).
Found it:
Oct 19 18:47:35 kamgw01 /usr/local/sbin/kamailio[21845]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_hash.c: dlg_set_leg_info(301) - aborting Oct 19 18:47:36 kamgw01 /usr/local/sbin/kamailio[21830]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: tm: h_table.c: free_cell(209) - aborting
Hmm, let me try to break it down:
On 20.10.2010 19:26, Alex Balashov wrote:
On 10/20/2010 12:54 PM, Andrei Pelinescu-Onciul wrote:
On Oct 19, 2010 at 19:02, Alex Balashovabalashov@evaristesys.com wrote:
We had another one of these today, under high call volume:
(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447
^^^^^^^^^^^^^^^^^^^^^^
That corresponds to a call to shm_free() regarding the callee's CSeq number:
if (dlg->cseq[DLG_CALLEE_LEG].s) shm_free(dlg->cseq[DLG_CALLEE_LEG].s);
As you can see, it's checked.
#3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at
It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).
Found it:
Oct 19 18:47:35 kamgw01 /usr/local/sbin/kamailio[21845]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_hash.c: dlg_set_leg_info(301) - aborting
That corresponds to another freeing operation, again making sure beforehand that memory is still allocated:
if (dlg->cseq[leg].s) shm_free(dlg->cseq[leg].s);
Oct 19 18:47:36 kamgw01 /usr/local/sbin/kamailio[21830]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: tm: h_table.c: free_cell(209) - aborting
That corresponds to freeing a series of To-tags:
tt=dead_cell->fwded_totags; while(tt) { ... shm_free_unsafe(tt); ... }
Although it's unsafe it doesn't look related to a dialog's CSeq number or even a dialog structure.
In short: I don't quite get it. :)
--Timo
I think the memory is corrupted here and it was corrupted before these calls were made. There was a similar issue in older versions when a module was run on top of the dialog module. That was fixed by introducing a new callback in the tm module.
Here, it may be that the memory was corrupted by some other module and the dialog caught it.
Regards, Ovidiu Sas
On Thu, Oct 21, 2010 at 9:50 AM, Timo Reimann timo.reimann@1und1.de wrote:
Hmm, let me try to break it down:
On 20.10.2010 19:26, Alex Balashov wrote:
On 10/20/2010 12:54 PM, Andrei Pelinescu-Onciul wrote:
On Oct 19, 2010 at 19:02, Alex Balashovabalashov@evaristesys.com wrote:
We had another one of these today, under high call volume:
(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447
^^^^^^^^^^^^^^^^^^^^^^
That corresponds to a call to shm_free() regarding the callee's CSeq number:
if (dlg->cseq[DLG_CALLEE_LEG].s) shm_free(dlg->cseq[DLG_CALLEE_LEG].s);
As you can see, it's checked.
#3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at
It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).
Found it:
Oct 19 18:47:35 kamgw01 /usr/local/sbin/kamailio[21845]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_hash.c: dlg_set_leg_info(301) - aborting
That corresponds to another freeing operation, again making sure beforehand that memory is still allocated:
if (dlg->cseq[leg].s) shm_free(dlg->cseq[leg].s);
Oct 19 18:47:36 kamgw01 /usr/local/sbin/kamailio[21830]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: tm: h_table.c: free_cell(209) - aborting
That corresponds to freeing a series of To-tags:
tt=dead_cell->fwded_totags; while(tt) { ... shm_free_unsafe(tt); ... }
Although it's unsafe it doesn't look related to a dialog's CSeq number or even a dialog structure.
In short: I don't quite get it. :)
--Timo
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
On 10/21/2010 09:55 AM, Ovidiu Sas wrote:
I think the memory is corrupted here and it was corrupted before these calls were made.
I agree with this interpretation.
Hello,
I did a small fix to dialog module, for a potential double free. See if the problem still persists.
Thanks, Daniel
On 10/21/10 4:07 PM, Alex Balashov wrote:
On 10/21/2010 09:55 AM, Ovidiu Sas wrote:
I think the memory is corrupted here and it was corrupted before these calls were made.
I agree with this interpretation.
Given the difficulty of reproducing the problem it will be hard to see, but I will try. :-)
On 10/29/2010 06:02 AM, Daniel-Constantin Mierla wrote:
Hello,
I did a small fix to dialog module, for a potential double free. See if the problem still persists.
Thanks, Daniel
On 10/21/10 4:07 PM, Alex Balashov wrote:
On 10/21/2010 09:55 AM, Ovidiu Sas wrote:
I think the memory is corrupted here and it was corrupted before these calls were made.
I agree with this interpretation.
On 20.10.2010 01:02, Alex Balashov wrote:
We had another one of these today, under high call volume:
(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398
Is this a regression, or memory corruption from something else?
This is with CANCEL_REASON_SUPPORT #undef'd in tm_reply.h, btw.
What version of Kamailio are you using?
--Timo
On Oct 21, 2010, at 5:41 AM, Timo Reimann timo.reimann@1und1.de wrote:
"What version of Kamailio are you using?"
Sorry, all these reports from me involve master:HEAD within last few days.
-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/
Do you see any other errors in the logs? Maybe something went wrong and there was a callback from tm trying to free the dialog structure at the same time as the dialog itself was shutting down.
Regards, Ovidiu Sas
On Thu, Oct 21, 2010 at 8:33 AM, Alex Balashov abalashov@evaristesys.com wrote:
On Oct 21, 2010, at 5:41 AM, Timo Reimann timo.reimann@1und1.de wrote:
"What version of Kamailio are you using?"
Sorry, all these reports from me involve master:HEAD within last few days.
-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev