Crash on processing dialog BYE

List overview All Threads
Download

newer

older

SF.net SVN:...

git:master: dialog(k): reset the...

Alex Balashov

19 Oct 2010 19 Oct '10

11:02 p.m.

We had another one of these today, under high call volume:

(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398

Is this a regression, or memory corruption from something else?

This is with CANCEL_REASON_SUPPORT #undef'd in tm_reply.h, btw.

-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/

Show replies by date

Andrei Pelinescu-Onciul

20 Oct 20 Oct

4:54 p.m.

On Oct 19, 2010 at 19:02, Alex Balashov abalashov@evaristesys.com wrote:

...

We had another one of these today, under high call volume:

(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398

Is this a regression, or memory corruption from something else?

It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).

...

This is with CANCEL_REASON_SUPPORT #undef'd in tm_reply.h, btw.

Andrei

Alex Balashov

5:26 p.m.

On 10/20/2010 12:54 PM, Andrei Pelinescu-Onciul wrote:

...

On Oct 19, 2010 at 19:02, Alex Balashovabalashov@evaristesys.com wrote:

...
We had another one of these today, under high call volume:

(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398

Is this a regression, or memory corruption from something else?

It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).

Found it:

Oct 19 18:47:35 kamgw01 /usr/local/sbin/kamailio[21845]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_hash.c: dlg_set_leg_info(301) - aborting Oct 19 18:47:36 kamgw01 /usr/local/sbin/kamailio[21830]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: tm: h_table.c: free_cell(209) - aborting

-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/

Timo Reimann

21 Oct 21 Oct

1:50 p.m.

Hmm, let me try to break it down:

On 20.10.2010 19:26, Alex Balashov wrote:

...

On 10/20/2010 12:54 PM, Andrei Pelinescu-Onciul wrote:

...
On Oct 19, 2010 at 19:02, Alex Balashovabalashov@evaristesys.com wrote:

...
We had another one of these today, under high call volume:

(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447

^^^^^^^^^^^^^^^^^^^^^^

That corresponds to a call to shm_free() regarding the callee's CSeq number:

if (dlg->cseq[DLG_CALLEE_LEG].s) shm_free(dlg->cseq[DLG_CALLEE_LEG].s);

As you can see, it's checked.

...

...
...
#3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at

It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).

Found it:

Oct 19 18:47:35 kamgw01 /usr/local/sbin/kamailio[21845]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_hash.c: dlg_set_leg_info(301) - aborting

That corresponds to another freeing operation, again making sure beforehand that memory is still allocated:

if (dlg->cseq[leg].s) shm_free(dlg->cseq[leg].s);

...

Oct 19 18:47:36 kamgw01 /usr/local/sbin/kamailio[21830]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: tm: h_table.c: free_cell(209) - aborting

That corresponds to freeing a series of To-tags:

tt=dead_cell->fwded_totags; while(tt) { ... shm_free_unsafe(tt); ... }

Although it's unsafe it doesn't look related to a dialog's CSeq number or even a dialog structure.

In short: I don't quite get it. :)

--Timo

Ovidiu Sas

1:55 p.m.

I think the memory is corrupted here and it was corrupted before these calls were made. There was a similar issue in older versions when a module was run on top of the dialog module. That was fixed by introducing a new callback in the tm module.

Here, it may be that the memory was corrupted by some other module and the dialog caught it.

Regards, Ovidiu Sas

On Thu, Oct 21, 2010 at 9:50 AM, Timo Reimann timo.reimann@1und1.de wrote:

...

Hmm, let me try to break it down:

On 20.10.2010 19:26, Alex Balashov wrote:

...
On 10/20/2010 12:54 PM, Andrei Pelinescu-Onciul wrote:

...
On Oct 19, 2010 at 19:02, Alex Balashovabalashov@evaristesys.com wrote:

...
We had another one of these today, under high call volume:

(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447

^^^^^^^^^^^^^^^^^^^^^^

That corresponds to a call to shm_free() regarding the callee's CSeq number:

if (dlg->cseq[DLG_CALLEE_LEG].s) shm_free(dlg->cseq[DLG_CALLEE_LEG].s);

As you can see, it's checked.

...
...
...
#3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at

It looks like somebody tried to free() twice the same pointer (if you look in the log you will see a BUG message about it, including the file and line where it was free()'d first).

Found it:

Oct 19 18:47:35 kamgw01 /usr/local/sbin/kamailio[21845]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_hash.c: dlg_set_leg_info(301) - aborting

That corresponds to another freeing operation, again making sure beforehand that memory is still allocated:

if (dlg->cseq[leg].s) shm_free(dlg->cseq[leg].s);

...
Oct 19 18:47:36 kamgw01 /usr/local/sbin/kamailio[21830]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: tm: h_table.c: free_cell(209) - aborting

That corresponds to freeing a series of To-tags:

tt=dead_cell->fwded_totags; while(tt) { ... shm_free_unsafe(tt); ... }

Although it's unsafe it doesn't look related to a dialog's CSeq number or even a dialog structure.

In short: I don't quite get it. :)

--Timo

sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

Alex Balashov

2:07 p.m.

On 10/21/2010 09:55 AM, Ovidiu Sas wrote:

...

I think the memory is corrupted here and it was corrupted before these calls were made.

I agree with this interpretation.

-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/

Daniel-Constantin Mierla

29 Oct 29 Oct

10:02 a.m.

Hello,

I did a small fix to dialog module, for a potential double free. See if the problem still persists.

Thanks, Daniel

On 10/21/10 4:07 PM, Alex Balashov wrote:

...

On 10/21/2010 09:55 AM, Ovidiu Sas wrote:

...
I think the memory is corrupted here and it was corrupted before these calls were made.

I agree with this interpretation.

-- Daniel-Constantin Mierla http://www.asipto.com

Alex Balashov

10:05 a.m.

Given the difficulty of reproducing the problem it will be hard to see, but I will try. :-)

On 10/29/2010 06:02 AM, Daniel-Constantin Mierla wrote:

...

Hello,

I did a small fix to dialog module, for a potential double free. See if the problem still persists.

Thanks, Daniel

On 10/21/10 4:07 PM, Alex Balashov wrote:

...
On 10/21/2010 09:55 AM, Ovidiu Sas wrote:

...
I think the memory is corrupted here and it was corrupted before these calls were made.

I agree with this interpretation.

-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/

Timo Reimann

21 Oct 21 Oct

9:41 a.m.

On 20.10.2010 01:02, Alex Balashov wrote:

...

We had another one of these today, under high call volume:

(gdb) where #0 0x0000003a60430265 in raise () from /lib64/libc.so.6 #1 0x0000003a60431d10 in abort () from /lib64/libc.so.6 #2 0x0000000000530a91 in qm_free (qm=0x2b343e1aa000, p=0x2b343eddf6e8, file=0x2b343dd860c3 "dialog: dlg_hash.c", func=0x2b343dd86b42 "destroy_dlg", line=176) at mem/q_malloc.c:447 #3 0x00002b343dd6a2ea in destroy_dlg (dlg=0x2b343fe1d6f8) at dlg_hash.c:176 #4 0x00002b343dd6d33a in unref_dlg (dlg=0x2b343fe1d6f8, cnt=1) at dlg_hash.c:591 #5 0x00002b343dd73a04 in profile_cleanup (msg=<value optimized out>, flags=<value optimized out>, param=0x6) at dlg_profile.c:317 #6 0x00000000004be9a1 in exec_post_script_cb (msg=0xa06e80, type=<value optimized out>) at script_cb.c:195 #7 0x00000000004989f0 in receive_msg ( buf=0x8b6300 "BYE sip:15746317356@yyy.yyy.yyy.yyy:5060 SIP/2.0\r\nRecord-Route: sip:yyy.yyy.yyy.yyy;lr;ftag=gK0ebaf6d4\r\nRecord-Route: sip:67.231.8.89;lr;ftag=gK0ebaf6d4\r\nVia: SIP/2.0/UDP xxx.xxx.xxx.xxx;branch=z9hG4bKd757."..., len=<value optimized out>, rcv_info=0x7fff4a043230) at receive.c:221 #8 0x000000000052576a in udp_rcv_loop () at udp_server.c:532 #9 0x0000000000468dcd in main_loop () at main.c:1554 #10 0x000000000046be9f in main (argc=<value optimized out>, argv=0x7fff4a043508) at main.c:2398

Is this a regression, or memory corruption from something else?

This is with CANCEL_REASON_SUPPORT #undef'd in tm_reply.h, btw.

What version of Kamailio are you using?

--Timo

Alex Balashov

12:33 p.m.

On Oct 21, 2010, at 5:41 AM, Timo Reimann timo.reimann@1und1.de wrote:

"What version of Kamailio are you using?"

Sorry, all these reports from me involve master:HEAD within last few days.

-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/

Ovidiu Sas

1:25 p.m.

Do you see any other errors in the logs? Maybe something went wrong and there was a callback from tm trying to free the dialog structure at the same time as the dialog itself was shutting down.

Regards, Ovidiu Sas

On Thu, Oct 21, 2010 at 8:33 AM, Alex Balashov abalashov@evaristesys.com wrote:

...

On Oct 21, 2010, at 5:41 AM, Timo Reimann timo.reimann@1und1.de wrote:

"What version of Kamailio are you using?"

Sorry, all these reports from me involve master:HEAD within last few days.

-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/

sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

5363

Age (days ago)

5373

Last active (days ago)

sr-dev@lists.kamailio.org

10 comments

5 participants

tags (0)

participants (5)

Alex Balashov
Andrei Pelinescu-Onciul
Daniel-Constantin Mierla
Ovidiu Sas
Timo Reimann