Hello,
On 10/31/11 7:56 AM, Klaus Darilion wrote:
Hi Daniel!
The "out of memory" happened again. This time they were able to dump
the memory statistics before restarting the server.
There are almost no allocations from other modules, but a lot of
allocations from usrloc and snmpstats:
# grep 'd from usrloc' syslog_core_dump|wc -l
138083
# grep 'd from snmpstats' syslog_core_dump|wc -l
2837533
Is this command catching freed chunks as well?
Can you send the names of the files and lines that allocate memory
chunks and repeat a lot?
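A rough way to produce that list from the dump is sketched below. It assumes every allocation site is logged as "alloc'd from <module>: <file>: <function>(<line>)", the shape of the qm_status lines quoted further down in this thread. The function names are illustrative and not part of Kamailio.

```python
import re
from collections import Counter

# Assumed line shape, taken from the qm_status output quoted in this thread:
#   alloc'd from snmpstats: utilities.c: convertStrToCharString(62)
# \s+ also tolerates an entry that a mail client wrapped onto two lines.
ALLOC_RE = re.compile(r"alloc'd from\s+(\S+):\s+(\S+):\s+(\w+)\((\d+)\)")


def count_alloc_sites(text):
    """Count chunks per (module, file, function, line) in a memory dump."""
    return Counter(
        (m.group(1), m.group(2), m.group(3), int(m.group(4)))
        for m in ALLOC_RE.finditer(text)
    )


def format_top_sites(text, limit=10):
    """Return the most frequent allocation sites as printable lines."""
    lines = []
    for (module, src, func, line), count in count_alloc_sites(text).most_common(limit):
        lines.append(f"{count:8d}  {module}: {src}: {func}({line})")
    return lines


# Two entries copied from the dump quoted in this thread, for a quick check.
SAMPLE = """\
ALERT: qm_status: alloc'd from snmpstats: interprocess_buffer.c: handleContactCallbacks(143)
ALERT: qm_status: alloc'd from snmpstats: utilities.c: convertStrToCharString(62)
"""

if __name__ == "__main__":
    print("\n".join(format_top_sites(SAMPLE)))
```

For the real dump, read syslog_core_dump into a string and pass it to format_top_sites. The script only counts the entries that were logged; it says nothing about whether a chunk was freed later.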
Thus, snmpstats seems guilty. What about usrloc? Around 2000 clients
are registered to this Kamailio. I think 138,000 allocations for just
2000 clients is too much. Are those usrloc allocations related to the
snmpstats problem you mentioned?
AFAIS, your patch was done before 3.2 branch, thus updating to 3.2
should fix the issue (as default=turned off), correct?
Yes, it is in 3.2.0 and I hope I caught it all; at least that looked
like the problem.
Cheers,
Daniel
Thanks
Klaus
On 17.10.2011 09:33, Daniel-Constantin Mierla wrote:
Hi Klaus,
Over the weekend I looked a bit at the snmpstats module. These allocated
chunks are for exporting location records. Are you pulling them over
snmp? At first sight, there should be a free of the memory when the
records are consumed.
The fact is that they are not pulled from the usrloc module at the time
of the request over snmp, but cached in snmpstats when a registration
happens. Practically, it is a partial clone of usrloc commands, which is
not the best solution IMO, but I am not the developer. For the moment, I
added a parameter to control whether the location records should be
cached by the snmpstats module or not (if not, they cannot be exported),
to fix this issue. If you actually pull the location records over snmp,
let me know.
I could not test it, but maybe you can give it a try (perhaps you have a
testbed for 3.2 with snmpstats) and see whether the memory stays steady
with export_registrar set to 0 (which is the default):
http://kamailio.org/docs/modules/devel/modules_k/snmpstats.html#id2539456
Cheers,
Daniel
On 10/6/11 6:03 PM, Daniel-Constantin Mierla wrote:
Hello,
It seems the leak is in snmpstats. I see a lot of allocations like:
ALERT: qm_status: 37599. N address=0xf30cdf74 frag=0xf30cdf5c size=20 used=1
ALERT: qm_status: alloc'd from snmpstats: interprocess_buffer.c: handleContactCallbacks(143)
ALERT: qm_status: start check=f0f0f0f0, end check= c0c0c0c0, abcdefed
ALERT: qm_status: 37600. N address=0xf30cdfb8 frag=0xf30cdfa0 size=16 used=1
ALERT: qm_status: alloc'd from snmpstats: utilities.c: convertStrToCharString(62)
ALERT: qm_status: start check=f0f0f0f0, end check= c0c0c0c0, abcdefed
There are some from usrloc, but very likely they are ok, because they
are persistent in shm for a long time, unless snmpstats asks for some
clones of the structures from usrloc and forgets to free them (I see
one allocation is from handleContactCallbacks).
No time to look in the sources, but this is a lead to follow if you
want to investigate further.
In general, for a memleak you have to look for allocated chunks that
come from the same place in the code and are numerous. Then decide
whether it is something that should be there for a long time (like
usrloc records) or should have been freed sooner, given the number of
allocations.
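One possible way to make that decision, sketched here as an assumption rather than an established procedure, is to take two dumps some time apart and see which allocation sites keep growing. Long-lived records should level off for a stable set of clients, while a leaking site keeps climbing. The Python below assumes the "alloc'd from <module>: <file>: <function>(<line>)" line shape from the qm_status output above; the function names are illustrative.

```python
import re
from collections import Counter

# Assumed line shape, as in the qm_status output quoted above:
#   alloc'd from snmpstats: utilities.c: convertStrToCharString(62)
ALLOC_RE = re.compile(r"alloc'd from\s+(\S+):\s+(\S+):\s+(\w+)\((\d+)\)")


def site_counts(text):
    """Count chunks per (module, file, function, line) in one dump."""
    return Counter(m.groups() for m in ALLOC_RE.finditer(text))


def growth(earlier_dump, later_dump):
    """List (delta, site) for sites with more chunks in the later dump,
    largest increase first."""
    before = site_counts(earlier_dump)
    after = site_counts(later_dump)
    grown = [(after[s] - before[s], s) for s in after if after[s] > before[s]]
    return sorted(grown, reverse=True)


if __name__ == "__main__":
    # Synthetic dumps, invented for illustration only. The usrloc entry is a
    # hypothetical placeholder, not a real allocation site.
    snmp = "alloc'd from snmpstats: utilities.c: convertStrToCharString(62)\n"
    uloc = "alloc'd from usrloc: example.c: example_fn(1)\n"
    for delta, site in growth(snmp * 2 + uloc * 3, snmp * 5 + uloc * 3):
        print(f"+{delta}  {': '.join(site[:3])}({site[3]})")
```

Sites that do not grow between the two dumps are left out of the result, so only the candidates worth reading in the sources remain.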
The pkg log looks very clean, with allocations only from startup time
(maybe it is the main process).
Cheers,
Daniel
On 10/6/11 5:31 PM, Klaus Darilion wrote:
Indeed, DBG_QM_MALLOC is defined. So I have set memlog=1 and dumped
mem_info with:
sercmd cfg.set_now_int core mem_dump_pkg 13286
sercmd cfg.set_now_int core mem_dump_shm 13286
The dumps were done after ~1h uptime. I can not offload the traffic
and wait until transactions are freed, thus the logs are quite huge
(~15MByte)
http://pernau.at/kd/memlog.zip
I have no idea what I should look for - any hints on how to analyze
the mem_dump?
Thanks
Klaus
On 06.10.2011 13:07, Daniel-Constantin Mierla wrote:
> Hello,
>
> On 10/5/11 11:18 AM, Klaus Darilion wrote:
>>
>>
>> On 04.10.2011 14:03, Daniel-Constantin Mierla wrote:
>>> Hello,
>>>
>>> On 10/4/11 12:27 PM, Klaus Darilion wrote:
>>>> Meanwhile the server was restarted and the DB problems were fixed.
>>>> As it is a production server I can not reproduce it anymore.
>>>
>>> So, once it started it didn't recover, and always continued with
>>> that error? How much shm did you configure?
>>>
>>> You can try to attach from time to time to one process (can be even
>>> the main one to avoid blocking a sip worker) and walk through the shm
>>> allocated chunks, in order to see if there are some unexpected
>>> repetitions of allocation from the same place in the sources.
>>>
>>> I posted the gdb script for walking through pkg at some point, the
>>> difference will be to start from the head of shm list (i.e., starting
>>> with shm_block->first_frag instead of mem_block->first_frag):
>>>
>>>
http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory#walking_th…
>>>
>>>
>>>
>>
>> Hi Daniel!
>>
>> After reading this wiki page I came to the conclusion that for further
>> debugging I have to recompile Kamailio (using DBG_QM_MALLOC memory
>> manager instead of F_MALLOC). With the default memory manager it is
>> not possible to debug the problem. Is it correct?
> In 3.1 malloc debug was left on (with the goal of catching buffer
> overflows quickly after several years of development of not using this
> flag in production), so unless you switched it off, you should get the
> reports. You can check in the output of kamailio -V.
>
> Cheers,
> Daniel
>