Hello,
yes, I'll apply it, just that I was mainly offline these days to had a
proper chance to look at it.
Thanks for troubleshooting and patching,
Daniel
On 9/20/12 2:36 PM, Jijo wrote:
Hi Daniel,
This patch needs to be applied to avoid core.
Thanks
Jijo
On Wed, Sep 19, 2012 at 10:54 AM, Jijo <realjijo(a)gmail.com
<mailto:realjijo@gmail.com>> wrote:
Hi All,
Finally i found the issue,
Here is one of the bad trace for SUBSCRIBE(722bytes) and
NOTIFY(1282bytes) which corrupted the memory. The messages came in
the order NOTIFY and SUBSCRIBE. The core is generated in a
different place but I believe this could be the reason for memory
corruption.
Here is the trace UDP Process 27294processing NOTIFY and Process
27303processing SUBSCRIBE .
The explanation and implementation is below
2012-09-19T02:06:17+01:00 [info] sipserver: [27303] INFO:
<script>: CI=1-3292(a)10.233.20.152 <mailto:CI=1-3292@10.233.20.152>
-R39 - Entry R-URI=1234 FD=10.233.20.152 SI=10.233.20.152
2012-09-19T02:06:17+01:00 [info] sipserver: [27294] INFO:
<script>: CI=1-3292(a)10.233.20.152 <mailto:CI=1-3292@10.233.20.152>
-R1 - Force LAN socket: tcp:10.233.20.151:5060
<http://10.233.20.151:5060> <null>
2012-09-19T02:06:17+01:00 [info] sipserver: [27303] INFO:
<script>: CI=1-3292(a)10.233.20.152 <mailto:CI=1-3292@10.233.20.152>
-R1 - Force LAN socket: tcp:10.233.20.151:5060
<http://10.233.20.151:5060> <null>
2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
[tcp_main.c:2357]: tcp_conn_send_put : calling wbufq_add
2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
[tcp_main.c:730]: ERROR: wbufq_add(722 bytes): buf:SUBSCRIBE
sip:1234@10.233.20.141:5063;transport=tcp SIP/2.0
Record-Route: <sip:10.233.20.151;tran
2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
[tcp_main.c:747]: ERROR: wbufq_add(722 bytes): first:b00519f4
last:b00519f4
2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
[tcp_main.c:774]: ERROR: wbufq_add(2 last free crt_size:722):
first:b00519f4 last:b00519f4
2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core>
[tcp_main.c:796]: ERROR: wbufq_insert(1282 bytes): buf:NOTIFY
sip:1234@10.233.20.141:5063;transport=tcp SIP/2.0
Record-Route: <sip:10.233.20.151;transpo
2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core>
[tcp_main.c:801]: ERROR: wbufq_insert(2 last free ):
first:b00519f4 last:b00519f4
2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core>
[tcp_main.c:820]: ERROR: wbufq_insert(22 last free ):
first:b00519f4 last:b00519f4
2012-09-19T02:06:17+01:00 [err] sipserver: [27359] ERROR: <core>
[tcp_main.c:887]: ERROR: wbufq_run(3 last free ): first:b00519f4
last:b00519f4
2012-09-19T02:06:17+01:00 [crit] sipserver: [27359] : <core>
[mem/q_malloc.c:157]: BUG: qm_*: fragm. 0xb00519dc (address
0xb00519f4) end overwritten(0, 0)!
2012-09-19T02:06:18+01:00 [alert] sipserver: [27265] ALERT: <core>
[main.c:755]: child process 27359 exited by a signal 11
2012-09-19T02:06:18+01:00 [alert] sipserver: [27265] ALERT: <core>
[main.c:758]: core was generated
2012-09-19T02:06:18+01:00 [info] sipserver: [27265] INFO: <core>
[main.c:770]: INFO: terminating due to SIGCHLD
Process 27294(NOTIFY) created the TCP connection structure for
destination IP and just before calling wbufq_insert(), context
switch happened and process 27303(SUBSCRIBE) got the cpu. Since
the connection structure is already available process 27303 add
the SUBSCRIBE message(722 bytes) to wbufq. Afterwards process
27294 got the CPU and invoked wbufq_insert() which basically
corrupted the memory due to an overflow with the existing
implementation.
inline static int _wbufq_insert(struct tcp_connection* c, const
char* data,
unsigned int size)
{
struct tcp_wbuffer_queue* q;
struct tcp_wbuffer* wb;
q=&c->wbuf_q;
if (likely(q->first==0)) /* if empty, use wbufq_add */
return _wbufq_add(c, data, size);
:
:
:
:
if ((q->first==q->last) &&
((q->last->b_size-q->last_used)>=size)){
/* one block with enough space in it for size bytes */
memmove(q->first->buf+size, q->first->buf, size);
memcpy(q->first->buf, data, size);
q->last_used+=size;
}
The above condition shall be true in this case and memmove was
moving the pointer which was causing the overflow.
memmove(void *dest, const void *src, size_t n);
As per the memmove man page, the src shall be copied with size ‘n’
to a temporary buffer and then temporary buffer to dest.
dest is q->first->buf+size: which is basically (q->first->buf +
NOTIFY MSG SIZE). so the dst will move my 1282 bytes, so we have
remaining space of only 818 bytes.
src is q->first->buf: which is basically copied to temp buffer
with NOTIFY SIZE(1282bytes).
Finally we are moving the buffer from temporary buffer of size
1282 bytes to buffer which we left with 818 bytes, This basically
corrupt the memory and on wbufq_run we see the memory corruption
2012-09-19T02:06:17+01:00 [crit] sipserver: [27359] : <core>
[mem/q_malloc.c:157]: BUG: qm_*: fragm. 0xb00519dc (address
0xb00519f4) end overwritten(0, 0)!
I think we don’t need memove, so we can change the code as below
if ((q->first==q->last) &&
((q->last->b_size-q->last_used)>=size)){
/* one block with enough space in it for size bytes */
//memmove(q->first->buf+size, q->first->buf, size);
memcpy(q->first->buf+q->last_used, data, size);
q->last_used+=size;
}
OR
we need only the else part as it always add the block to the first.
Thanks
Jijo
On Wed, Jul 18, 2012 at 12:42 PM, Daniel-Constantin Mierla
<miconda(a)gmail.com <mailto:miconda@gmail.com>> wrote:
Hello,
sorry, I just read the last messages in the thread and I
didn't noticed it is about a crash, but thought it is about an
annoying log message.
The backtrace is no longer matching the latest sources of the
3.1 branch, but I expect it is due to a double free issue, so
you have to update to latest 3.1.6, as suggested by other
users here. There were many fixes from 3.1.0 to 3.1.6.
Cheers,
Daniel
On 7/17/12 6:03 PM, Jijo wrote:
Hi,
This is not happening at shutdown or status check. Its
aborting when the system is active.
Thanks
Jijo
On Tue, Jul 17, 2012 at 3:06 AM, Daniel-Constantin Mierla
<miconda(a)gmail.com <mailto:miconda@gmail.com>> wrote:
Hello,
does it keep being or it is one time and that's it?
That is printed at shut down or if pkg_status() or
shm_status() is executed from some part of code or in
config via cfgutils module functions.
You can get rid of them by setting memdbg and memlog to a
value higher than debug global parameter.
Cheers,
Daniel
On 7/16/12 5:28 PM, Jijo wrote:
Thanks.. It is not easy to upgrade as
it is happening at
customer system.
Is there any change occurred for this issue.I looked at
it, but didn't see anything in q_malloc.c/qm_status()
On Mon, Jul 16, 2012 at 11:12 AM, Jon Bonilla
<manwe(a)aholab.ehu.es <mailto:manwe@aholab.ehu.es>> wrote:
El Mon, 16 Jul 2012 10:27:42 -0400
Jijo <realjijo(a)gmail.com
<mailto:realjijo@gmail.com>> escribió:
Hi All,
I'm observing a core intermittently at "qm_status
(qm=0x786cd000) at
mem/q_malloc.c:763" for kamailio version
3.1.0
I'd say that you're using a very old version. You
should update your branch to
3.1.6 or upgrade to a newer branch.
_______________________________________________
sr-dev mailing list
sr-dev(a)lists.sip-router.org
<mailto:sr-dev@lists.sip-router.org>
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
_______________________________________________
sr-dev mailing list
sr-dev(a)lists.sip-router.org <mailto:sr-dev@lists.sip-router.org>
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
--
Daniel-Constantin Mierla -http://www.asipto.com
http://twitter.com/#!/miconda <http://twitter.com/#%21/miconda>
-http://www.linkedin.com/in/miconda
Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012
-http://asipto.com/u/katu
Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012
-http://asipto.com/u/kpw
--
Daniel-Constantin Mierla -http://www.asipto.com
http://twitter.com/#!/miconda <http://twitter.com/#%21/miconda>
-http://www.linkedin.com/in/miconda
Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012
-http://asipto.com/u/katu
Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012
-http://asipto.com/u/kpw