kamailio core at qm_status

List overview All Threads
Download

newer

older

patch: sdpops - new function...

git:master: tcp: fix _wbufq_insert...

Jijo

16 Jul 2012 16 Jul '12

2:27 p.m.

Hi All,

I'm observing a core intermittently at "qm_status (qm=0x786cd000) at mem/q_malloc.c:763" for kamailio version 3.1.0

looking at the backtrace this is occurring while doing a tcp buffer overrun. Please have a look and let me know if anyone observed this issue or how can we debug it.

Please find the backtrace and kamailio version

osbprod-V2R0:~ # /usr/sbin/kamailio -V version: kamailio 3.1.0 (i386/linux) 21a375 flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, PKG_SIZE 15MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: 21a375 compiled on 12:38:44 Apr 26 2012 with gcc 4.5.0

#0 qm_status (qm=0x786cd000) at mem/q_malloc.c:763

#1 0x0819c0f3 in qm_debug_frag (qm=0x786cd000, f=<value optimized out>) at mem/q_malloc.c:167

#2 0x0819c352 in qm_free (qm=0x786cd000, p=0x7a6cb6ac, file=0x821cce2 "<core>: tcp_main.c", func=0x822285f "wbufq_run", line=880) at mem/q_malloc.c:457

#3 0x0815f686 in wbufq_run (tcpconn=0x7a1a7548, ev=4, fd_i=-1) at tcp_main.c:880

#4 handle_tcpconn_ev (tcpconn=0x7a1a7548, ev=4, fd_i=-1) at tcp_main.c:4141

#5 0x08169bf2 in io_wait_loop_epoll () at io_wait.h:1092

#6 tcp_main_loop () at tcp_main.c:4606

#7 0x080b0404 in main_loop () at main.c:1655

#8 0x080b1f84 in main (argc=9, argv=0xbfb4c9f4) at main.c:2446

(gdb) l

758 LOG_(DEFAULT_FACILITY, memlog, "qm_status: ",

759 "dumping free list stats :\n");

760 for(h=0,i=0;h<QM_HASH_SIZE;h++){

761 unused=0;

762 for (f=qm->free_hash[h].head.u.nxt_free,j=0;

763 f!=&(qm->free_hash[h].head); f=f->u.nxt_free, i++, j++){

764 if (!FRAG_WAS_USED(f)){

765 unused++;

766 #ifdef DBG_QM_MALLOC

767 LOG_(DEFAULT_FACILITY, memlog, "qm_status: ",

Attachments:

attachment.html (text/html — 3.1 KB)

Show replies by date

Jon Bonilla

16 Jul 16 Jul

3:12 p.m.

El Mon, 16 Jul 2012 10:27:42 -0400 Jijo realjijo@gmail.com escribió:

...

Hi All,

I'm observing a core intermittently at "qm_status (qm=0x786cd000) at mem/q_malloc.c:763" for kamailio version 3.1.0

I'd say that you're using a very old version. You should update your branch to 3.1.6 or upgrade to a newer branch.

Jijo

3:28 p.m.

Thanks.. It is not easy to upgrade as it is happening at customer system. Is there any change occurred for this issue.I looked at it, but didn't see anything in q_malloc.c/qm_status()

On Mon, Jul 16, 2012 at 11:12 AM, Jon Bonilla manwe@aholab.ehu.es wrote:

...

El Mon, 16 Jul 2012 10:27:42 -0400 Jijo realjijo@gmail.com escribió:

...
Hi All,

I'm observing a core intermittently at "qm_status (qm=0x786cd000) at mem/q_malloc.c:763" for kamailio version 3.1.0

I'd say that you're using a very old version. You should update your branch to 3.1.6 or upgrade to a newer branch.

sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

Daniel-Constantin Mierla

17 Jul 17 Jul

7:06 a.m.

Hello,

does it keep being or it is one time and that's it?

That is printed at shut down or if pkg_status() or shm_status() is executed from some part of code or in config via cfgutils module functions.

You can get rid of them by setting memdbg and memlog to a value higher than debug global parameter.

Cheers, Daniel

On 7/16/12 5:28 PM, Jijo wrote:

...

Thanks.. It is not easy to upgrade as it is happening at customer system. Is there any change occurred for this issue.I looked at it, but didn't see anything in q_malloc.c/qm_status()

On Mon, Jul 16, 2012 at 11:12 AM, Jon Bonilla <manwe@aholab.ehu.es mailto:manwe@aholab.ehu.es> wrote:
El Mon, 16 Jul 2012 10:27:42 -0400
Jijo <realjijo@gmail.com <mailto:realjijo@gmail.com>> escribió:

> Hi All,
>
> I'm observing a core intermittently at "qm_status (qm=0x786cd000) at
> mem/q_malloc.c:763" for kamailio version 3.1.0
>

I'd say that you're using a very old version. You should update
your branch to
3.1.6 or upgrade to a newer branch.


_______________________________________________
sr-dev mailing list
sr-dev@lists.sip-router.org <mailto:sr-dev@lists.sip-router.org>
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

-- Daniel-Constantin Mierla - http://www.asipto.com http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012 - http://asipto.com/u/katu Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012 - http://asipto.com/u/kpw

Jijo

4:03 p.m.

Hi,

This is not happening at shutdown or status check. Its aborting when the system is active.

Thanks Jijo On Tue, Jul 17, 2012 at 3:06 AM, Daniel-Constantin Mierla <miconda@gmail.com

...

wrote:

...

Hello,

does it keep being or it is one time and that's it?

That is printed at shut down or if pkg_status() or shm_status() is executed from some part of code or in config via cfgutils module functions.

You can get rid of them by setting memdbg and memlog to a value higher than debug global parameter.

Cheers, Daniel

On 7/16/12 5:28 PM, Jijo wrote:

Thanks.. It is not easy to upgrade as it is happening at customer system. Is there any change occurred for this issue.I looked at it, but didn't see anything in q_malloc.c/qm_status()

On Mon, Jul 16, 2012 at 11:12 AM, Jon Bonilla manwe@aholab.ehu.es wrote:

...
El Mon, 16 Jul 2012 10:27:42 -0400 Jijo realjijo@gmail.com escribió:

...
Hi All,

I'm observing a core intermittently at "qm_status (qm=0x786cd000) at mem/q_malloc.c:763" for kamailio version 3.1.0

I'd say that you're using a very old version. You should update your branch to 3.1.6 or upgrade to a newer branch.

sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

sr-dev mailing listsr-dev@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

-- Daniel-Constantin Mierla - http://www.asipto.comhttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012 - http://asipto.com/u/katu Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012 - http://asipto.com/u/kpw

Daniel-Constantin Mierla

18 Jul 18 Jul

4:42 p.m.

Hello,

sorry, I just read the last messages in the thread and I didn't noticed it is about a crash, but thought it is about an annoying log message.

The backtrace is no longer matching the latest sources of the 3.1 branch, but I expect it is due to a double free issue, so you have to update to latest 3.1.6, as suggested by other users here. There were many fixes from 3.1.0 to 3.1.6.

Cheers, Daniel

On 7/17/12 6:03 PM, Jijo wrote:

...

Hi,

This is not happening at shutdown or status check. Its aborting when the system is active.

Thanks Jijo On Tue, Jul 17, 2012 at 3:06 AM, Daniel-Constantin Mierla <miconda@gmail.com mailto:miconda@gmail.com> wrote:

Hello,

does it keep being or it is one time and that's it?

That is printed at shut down or if pkg_status() or shm_status() is
executed from some part of code or in config via cfgutils module
functions.

You can get rid of them by setting memdbg and memlog to a value
higher than debug global parameter.

Cheers,
Daniel


On 7/16/12 5:28 PM, Jijo wrote:

...

Thanks.. It is not easy to upgrade as it is happening at customer
system.
Is there any change occurred for this issue.I looked at it, but
didn't see anything in q_malloc.c/qm_status()


On Mon, Jul 16, 2012 at 11:12 AM, Jon Bonilla
<manwe@aholab.ehu.es <mailto:manwe@aholab.ehu.es>> wrote:

    El Mon, 16 Jul 2012 10:27:42 -0400
    Jijo <realjijo@gmail.com <mailto:realjijo@gmail.com>> escribió:

    > Hi All,
    >
    > I'm observing a core intermittently at "qm_status
    (qm=0x786cd000) at
    > mem/q_malloc.c:763" for kamailio version 3.1.0
    >

    I'd say that you're using a very old version. You should
    update your branch to
    3.1.6 or upgrade to a newer branch.


    _______________________________________________
    sr-dev mailing list
    sr-dev@lists.sip-router.org <mailto:sr-dev@lists.sip-router.org>
    http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev




_______________________________________________
sr-dev mailing list
sr-dev@lists.sip-router.org  <mailto:sr-dev@lists.sip-router.org>
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

-- 
Daniel-Constantin Mierla -http://www.asipto.com
http://twitter.com/#!/miconda  <http://twitter.com/#%21/miconda>  -http://www.linkedin.com/in/miconda
Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012 -http://asipto.com/u/katu
Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012 -http://asipto.com/u/kpw

Jijo

19 Sep 19 Sep

2:54 p.m.

Hi All,

Finally i found the issue,

Here is one of the bad trace for SUBSCRIBE(722bytes) and NOTIFY(1282bytes) which corrupted the memory. The messages came in the order NOTIFY and SUBSCRIBE. The core is generated in a different place but I believe this could be the reason for memory corruption.

Here is the trace UDP Process 27294 processing NOTIFY and Process 27303 processing SUBSCRIBE .

The explanation and implementation is below

2012-09-19T02:06:17+01:00 [info] sipserver: [27303] INFO: <script>: CI=1-3292@10.233.20.152 -R39 - Entry R-URI=1234 FD=10.233.20.152 SI=10.233.20.152

2012-09-19T02:06:17+01:00 [info] sipserver: [27294] INFO: <script>: CI=1-3292@10.233.20.152 -R1 - Force LAN socket: tcp:10.233.20.151:5060<null>

2012-09-19T02:06:17+01:00 [info] sipserver: [27303] INFO: <script>: CI=1-3292@10.233.20.152 -R1 - Force LAN socket: tcp:10.233.20.151:5060<null>

2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core> [tcp_main.c:2357]: tcp_conn_send_put : calling wbufq_add

2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core> [tcp_main.c:730]: ERROR: wbufq_add(722 bytes): buf:SUBSCRIBE sip:1234@10.233.20.141:5063;transport=tcp SIP/2.0

Record-Route: <sip:10.233.20.151;tran

2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core> [tcp_main.c:747]: ERROR: wbufq_add(722 bytes): first:b00519f4 last:b00519f4

2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core> [tcp_main.c:774]: ERROR: wbufq_add(2 last free crt_size:722): first:b00519f4 last:b00519f4

2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core> [tcp_main.c:796]: ERROR: wbufq_insert(1282 bytes): buf:NOTIFY sip:1234@10.233.20.141:5063;transport=tcp SIP/2.0

Record-Route: <sip:10.233.20.151;transpo

2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core> [tcp_main.c:801]: ERROR: wbufq_insert(2 last free ): first:b00519f4 last:b00519f4

2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core> [tcp_main.c:820]: ERROR: wbufq_insert(22 last free ): first:b00519f4 last:b00519f4

2012-09-19T02:06:17+01:00 [err] sipserver: [27359] ERROR: <core> [tcp_main.c:887]: ERROR: wbufq_run(3 last free ): first:b00519f4 last:b00519f4

2012-09-19T02:06:17+01:00 [crit] sipserver: [27359] : <core> [mem/q_malloc.c:157]: BUG: qm_*: fragm. 0xb00519dc (address 0xb00519f4) end overwritten(0, 0)!

2012-09-19T02:06:18+01:00 [alert] sipserver: [27265] ALERT: <core> [main.c:755]: child process 27359 exited by a signal 11

2012-09-19T02:06:18+01:00 [alert] sipserver: [27265] ALERT: <core> [main.c:758]: core was generated

2012-09-19T02:06:18+01:00 [info] sipserver: [27265] INFO: <core> [main.c:770]: INFO: terminating due to SIGCHLD

Process 27294(NOTIFY) created the TCP connection structure for destination IP and just before calling wbufq_insert(), context switch happened and process 27303(SUBSCRIBE) got the cpu. Since the connection structure is already available process 27303 add the SUBSCRIBE message(722 bytes) to wbufq. Afterwards process 27294 got the CPU and invoked wbufq_insert() which basically corrupted the memory due to an overflow with the existing implementation.

inline static int _wbufq_insert(struct tcp_connection* c, const char* data,

unsigned int size)

{

struct tcp_wbuffer_queue* q;

struct tcp_wbuffer* wb;

q=&c->wbuf_q;

if (likely(q->first==0)) /* if empty, use wbufq_add */

return _wbufq_add(c, data, size);

if ((q->first==q->last) && ((q->last->b_size-q->last_used)>=size)){

/* one block with enough space in it for size bytes */

memmove(q->first->buf+size, q->first->buf, size);

memcpy(q->first->buf, data, size);

q->last_used+=size;

}

The above condition shall be true in this case and memmove was moving the pointer which was causing the overflow.

memmove(void *dest, const void *src, size_t n);

As per the memmove man page, the src shall be copied with size ‘n’ to a temporary buffer and then temporary buffer to dest.

dest is q->first->buf+size: which is basically (q->first->buf + NOTIFY MSG SIZE). so the dst will move my 1282 bytes, so we have remaining space of only 818 bytes.

src is q->first->buf: which is basically copied to temp buffer with NOTIFY SIZE(1282bytes).

Finally we are moving the buffer from temporary buffer of size 1282 bytes to buffer which we left with 818 bytes, This basically corrupt the memory and on wbufq_run we see the memory corruption

2012-09-19T02:06:17+01:00 [crit] sipserver: [27359] : <core> [mem/q_malloc.c:157]: BUG: qm_*: fragm. 0xb00519dc (address 0xb00519f4) end overwritten(0, 0)!

I think we don’t need memove, so we can change the code as below

if ((q->first==q->last) && ((q->last->b_size-q->last_used)>=size)){

/* one block with enough space in it for size bytes */

//memmove(q->first->buf+size, q->first->buf, size);

memcpy(q->first->buf+q->last_used, data, size);

q->last_used+=size;

}

OR we need only the else part as it always add the block to the first.

Thanks Jijo

On Wed, Jul 18, 2012 at 12:42 PM, Daniel-Constantin Mierla < miconda@gmail.com> wrote:

...

Hello,

sorry, I just read the last messages in the thread and I didn't noticed it is about a crash, but thought it is about an annoying log message.

The backtrace is no longer matching the latest sources of the 3.1 branch, but I expect it is due to a double free issue, so you have to update to latest 3.1.6, as suggested by other users here. There were many fixes from 3.1.0 to 3.1.6.

Cheers, Daniel

On 7/17/12 6:03 PM, Jijo wrote:

Hi,

This is not happening at shutdown or status check. Its aborting when the system is active.

Thanks Jijo On Tue, Jul 17, 2012 at 3:06 AM, Daniel-Constantin Mierla < miconda@gmail.com> wrote:

...
Hello,

does it keep being or it is one time and that's it?

That is printed at shut down or if pkg_status() or shm_status() is executed from some part of code or in config via cfgutils module functions.

You can get rid of them by setting memdbg and memlog to a value higher than debug global parameter.

Cheers, Daniel

On 7/16/12 5:28 PM, Jijo wrote:

Thanks.. It is not easy to upgrade as it is happening at customer system. Is there any change occurred for this issue.I looked at it, but didn't see anything in q_malloc.c/qm_status()

On Mon, Jul 16, 2012 at 11:12 AM, Jon Bonilla manwe@aholab.ehu.eswrote:

...
El Mon, 16 Jul 2012 10:27:42 -0400 Jijo realjijo@gmail.com escribió:

...
Hi All,

I'm observing a core intermittently at "qm_status (qm=0x786cd000) at mem/q_malloc.c:763" for kamailio version 3.1.0

I'd say that you're using a very old version. You should update your branch to 3.1.6 or upgrade to a newer branch.

sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

sr-dev mailing listsr-dev@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

-- Daniel-Constantin Mierla - http://www.asipto.comhttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012 - http://asipto.com/u/katu Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012 - http://asipto.com/u/kpw

-- Daniel-Constantin Mierla - http://www.asipto.comhttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012 - http://asipto.com/u/katu Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012 - http://asipto.com/u/kpw

Andrei Pelinescu-Onciul

1 Oct 1 Oct

10:59 a.m.

On Sep 19, 2012 at 10:54, Jijo realjijo@gmail.com wrote:

...

Hi All,

Finally i found the issue,

Here is one of the bad trace for SUBSCRIBE(722bytes) and NOTIFY(1282bytes) which corrupted the memory. The messages came in the order NOTIFY and SUBSCRIBE. The core is generated in a different place but I believe this could be the reason for memory corruption.

Here is the trace UDP Process 27294 processing NOTIFY and Process 27303 processing SUBSCRIBE .

[...]

...

Process 27294(NOTIFY) created the TCP connection structure for destination IP and just before calling wbufq_insert(), context switch happened and process 27303(SUBSCRIBE) got the cpu. Since the connection structure is already available process 27303 add the SUBSCRIBE message(722 bytes) to wbufq. Afterwards process 27294 got the CPU and invoked wbufq_insert() which basically corrupted the memory due to an overflow with the existing implementation.

Thanks a lot, it was indeed a problem in _wbufq_insert, however the fix is slightly different (see below).

...

[...]

...

I think we don?t need memove, so we can change the code as below

We need the memmove, because the memory areas might overlap (e.g we are moving a 1000 bytes message only 512 bytes further in the buffer).

...

   if ((q->first==q->last) && ((q->last->b_size-q->last_used)>=size)){

            /* one block with enough space in it for size bytes */

            //memmove(q->first->buf+size, q->first->buf, size);

            memcpy(q->first->buf+q->last_used, data, size);

The problem here was that the wrong size was used for the memmove. The fixed version is: memmove(q->first->buf+size, q->first->buf, q->last_used);

...

            q->last_used+=size;

    }

[...]

Andrei

4658

Age (days ago)

4735

Last active (days ago)

sr-dev@lists.kamailio.org

7 comments

4 participants

tags (0)

participants (4)

Andrei Pelinescu-Onciul
Daniel-Constantin Mierla
Jijo
Jon Bonilla