xavp_rcd keeps accumulating values leading to shm exaustion You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/1111
-- Commit Summary --
* registrar: fix xavp_rcd memory consumption
-- File Changes --
M src/modules/registrar/lookup.c (8) M src/modules/registrar/reply.c (1)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/1111.patch https://github.com/kamailio/kamailio/pull/1111.diff
Are not the xavps deleted automatically when the processing of the request is finished? There should not be an explicit delete from the modules, unless strictly necessary for functionality of some feature, but the xavps and avps should not leak when processing normal SIP traffic.
Or are you using them in some event routes?
@miconda it seems so, maybe the problem lies in the delete ? i'm using in script to create a json with the `ruid` and memory is always growing. removing just these parts, memory is back to normal.
``` modparam("registrar", "xavp_rcd", "ulrcd") ... after successful save $var(amqp_payload_request) = $_s({"Event-Category" : "directory", ..., "RUID" : "$xavp(ulrcd=>ruid)", ... }); .... ```
Which memory is growing? SHM, PKG or system memory?
shared memory tracked by [root@dev-01 kamailio]# kamcmd mod.stats core shm Module: core { *xavp_new_value(94): 4808* }
only one client registering every 60 sec
On Tue, May 2, 2017 at 3:18 PM, Daniel-Constantin Mierla < notifications@github.com> wrote:
Which memory is growing? SHM, PKG or system memory?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kamailio/kamailio/pull/1111#issuecomment-298649131, or mute the thread https://github.com/notifications/unsubscribe-auth/AAvDrNYPUE7PYX0dgg3vtKCsJFVJUbQZks5r1zs3gaJpZM4NOAPn .
Can you attach log messages with debug=3 for a registration request?
[kamailio.log.gz](https://github.com/kamailio/kamailio/files/973067/kamailio.log.gz) used gzip
Hi Daniel, i isolated all the requests and made sure only registration was being processed. turns out that the problem must be elsewhere as the memory is now stable. however, the first registration uses a suspend/continue and in this case the xavp memory is not released.
[kam.log.gz](https://github.com/kamailio/kamailio/files/986341/kam.log.gz)
@miconda sorry, wrong report. the patch was applied, that's why memory is stable. initial registration issues still occurs (xavp not released after t_continue() )
@miconda this is the log without the patch applied.
[kam.log.gz](https://github.com/kamailio/kamailio/files/986380/kam.log.gz)
Back on this after Kamailio World event -- so the issue is exposed when t_suspend()/t_continue() is used for REGISTER requests? Are you using these functions in kamailio.cfg or is via kazoo module?
@miconda Hi Daniel, kazoo module does the suspend/continue, but this is only for the initial registration. the 2nd registration doesn't go thru suspend/resume but we're also getting memory leaks here.
If there is no suspend/continue, then the issue can be that the registrar module doesn't add the xavp to core list, being left unliked, otherwise the xavps (as well as avps) are destroyed at the end of executing request_route. This would have been revealed so far if there is an issue with overall avps/xavps management.
Btw, what version you are using in this case? 5.0 or 4.4?
While looking at the t_continue() I discovered that it may overwrite a previous set faked environment, which can happen if the function is used in failure_route, as a result xavps/avps lists can be lost without being freed. With the commit 9ff4dd36ad63f60905a49f8c0e437f997e54ae70, this situation should be solved.
I am not sure this is solving your case, but it would be good if you test with this patch.
I am also going to look at the registrar module to see if I spot something wrong there. Deleting the xavps in the module as proposed by this pull request is not the proper solution, it may be a workaround for some specific case, but not fixing the real cause.
using master (with your latest commits). tried the referenced commit but still have memory leak for 1st request (suspend/continue). the xavp is being created after continue (registrar reply) , it seems receive_msg cleanup is not reached (or is early reached due to suspend) in this case ? and i agree this is not the recommended fix, its only hiding possible issues. also, i applied the patch only to reply.c in registrar and it fixes re-register.
Is this with the standard kazoo platform kamailio.cfg? Can you point me to the repo with the config at the line where the xavp is created? I may help seeing the config and understanding the use case.
https://github.com/2600hz/kazoo-configs/blob/master/kamailio/registrar-role....
the xavp is not created in the script. its created by registrar build_contact (reply.c) all cases or registrar registered4 (lookup.c) that calls xavp_rcd_helper.
check route `ATTEMPT_AUTHORIZATION`
for re-register, this line https://github.com/2600hz/kazoo-configs/blob/master/kamailio/registrar-role.... check if the device is already registered and skips to save location (no suspend/continue), leak seems to occur in registrar build_contact (reply.c).
@miconda about the first registration...
the `continue` executes in pid `6518` and it never calls `xavp_reset_list()` registrar:build_contact (reply.c) is executed in this pid
``` 2(6487) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|start|received udp request REGISTER sip:teste.sip.90e9.com 2(6487) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|source 192.168.16.12:50133 -> 192.168.16.125:8000 2(6487) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|from sip:user_7z7b6s@teste.sip.90e9.com 2(6487) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|to sip:user_7z7b6s@teste.sip.90e9.com 2(6487) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|originated from external sources 2(6487) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|correcting NATed contact in registration 2(6487) INFO: <core> [core/xavp.c:461]: xavp_reset_list(): RESET XAVP 32(6517) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|authenticating user_7z7b6s@teste.sip.90e9.com via Kazoo query response 32(6517) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|end|issued auth challenge to failed registration attempt for user_7z7b6s@teste.sip.90e9.com 192.168.16.12:50133 3(6488) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|start|received udp request REGISTER sip:teste.sip.90e9.com 3(6488) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|source 192.168.16.12:50133 -> 192.168.16.125:8000 3(6488) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|from sip:user_7z7b6s@teste.sip.90e9.com 3(6488) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|to sip:user_7z7b6s@teste.sip.90e9.com 3(6488) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|originated from external sources 3(6488) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|correcting NATed contact in registration 3(6488) INFO: <core> [core/xavp.c:461]: xavp_reset_list(): RESET XAVP 33(6518) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|authenticating user_7z7b6s@teste.sip.90e9.com via Kazoo query response 33(6518) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|log|caching sip credentials for user_7z7b6s@teste.sip.90e9.com 33(6518) INFO: <script>: 79956NjgwZTVmOTQ5Zjc0MjcxMTVjYjNlOTMxNGI5NDkwYjk|end|successful registered with contact uloc-5919b147-941-1 : sip:user_7z7b6s@192.168.16.12:50133;rinstance=f24eba8fb96940a0;+sip.instance="urn:uuid:50e56137-edf1-5c8c-99ab-f5ebf869e00f";reg-id=1 6(6491) INFO: <core> [core/xavp.c:461]: xavp_reset_list(): RESET XAVP 6(6491) INFO: <core> [core/xavp.c:461]: xavp_reset_list(): RESET XAVP 8(6493) INFO: <core> [core/xavp.c:461]: xavp_reset_list(): RESET XAVP 8(6493) INFO: <core> [core/xavp.c:461]: xavp_reset_list(): RESET XAVP ```
In this case there should be a REGISTER transaction and the xavps should be linked to the transaction structure and destroyed when the transaction is cleaned up.
@miconda more info
it seems that the `faked_env( t, 0, 1);` call (restore original environent) will discard the avps xavps that can be added during processing of the script block. in this use case, registrar save (which calls reply) is called, which creates the xavps that are not being released.
from t_suspend.c ``` faked_env( t, faked_req, 1);
route_type_bk = get_route_type(); set_route_type(FAILURE_ROUTE); /* execute the pre/post -script callbacks based on original route block */ if (exec_pre_script_cb(faked_req, cb_type)>0) { if (run_top_route(route, faked_req, 0)<0) LM_ERR("failure inside run_top_route\n"); exec_post_script_cb(faked_req, cb_type); } set_route_type(route_type_bk);
/* TODO: save_msg_lumps should clone the lumps to shm mem */
/* restore original environment */ faked_env( t, 0, 1); ```
Not really -- during the execution of the route block, the x/avp lists from tm are set in the core, so the new x/avps are stored directly in the transaction. This is used for failure_route as well and any issue should have been exposed for long time. Unless it was the case of creating a faked env inside a faked env that should be fixed by my last commit.
I looked at the kazoo code and I noticed that `kz_amqp_consumer_fire_event()` is using fake sip messages to execute event routes. Are those event routes creating xavps?
Anyhow, you can try to execute `xavp_reset_list()` inside kazoo module after t_continue(), and see if it gets fixed. Then we know where to dig more.
@miconda i reverted every change and added `xavp_reset_list()` after `t_continue()` in kazoo andalso after `kz_amqp_consumer_fire_event()` and leak continues. is there a way to lookup the memory allocated values to make sure it is coming from where we think it is ?
Yes. Start kamailio and do some operations so you expose the leak.
Do a shm status dump:
* http://www.kamailio.org/docs/modules/5.1.x/modules/corex.html#corex.rpc.shm_...
Then you get details of each allocated chunks -- two pointers
* start of the memory chunk * start of the data inside the memory chunk
With gdb attach to any of the kamailio processes and print the content of the data pointer casted to xavp structure. Do it for several pointers so you see what is inside.
@miconda thanks for the info. after digging more , i think the issue is not related to `kazoo`. can you try the below script (note that kazoo is not used) ? also maybe fake a transaction to a external auth and set the `$xavp(ulattrs=>extra_info)` on the `continue`block ? with this configuration, the 1st registration doesn't leak but the re-registrations do leak.
``` ... modparam("usrloc", "xavp_contact", "ulattrs") modparam("registrar", "xavp_rcd", "ulrcd") modparam("registrar", "xavp_cfg", "regcfg") modparam("htable", "htable", "auth_cache=>size=16;autoexpire=7200;") ...
route[REGISTRAR] {
$xavp(regcfg=>match_received) = $su; if($sht(auth_cache=>$Au) != $null && registered("location", "$rz:$Au", 2, 1) == 1) { xlog("L_INFO", "$ci|register|found cached registration for $rz:$Au\n"); $var(password) = $sht(auth_cache=>$Au); route(SAVE_LOCATION); }
if( !is_present_hf("Authorization")) { auth_challenge("$fd", "0"); exit(); }
$xavp(ulattrs=>extra_info) = '{"test" : true}';
route(SAVE_LOCATION);
}
route[SAVE_LOCATION] {
if ($sht(auth_cache=>$Au) == $null) { xlog("L_INFO", "$ci|log|caching sip credentials for $Au\n"); }; $sht(auth_cache=>$Au) = $var(password); $var(save_result) = save("location", "0x04"); if($var(save_result) == -1) { auth_challenge("$fd", "0"); xlog("L_INFO", "$ci|end|issued auth challenge after failed attempt to save contact for $Au $si:$sp\n"); exit; } else { if($var(save_result) == 1) { $var(new_reg) = "true"; } else { $var(new_reg) = "false"; } } xlog("L_INFO", "$ci|end|save result $var(save_result) with contact $xavp(ulrcd=>ruid) : $ct\n"); } ```
@lazedo pushed 47 commits.
10c5ae5 pkg/kamailio/centos/7: mod kazoo enabled default build de323d8 Merge pull request #1105 from sergey-safarov/spec_update9 bca06f7 ndb_redis: remove redis_execute command with no arguments be59c13 ndb_redis: make redis_execute() return negative value on error 6525428 kazoo: missing sender alternative for presence 78dce87 alias_db: get rid of compilation warnings ed553e3 diversion: fix compilation warnings 33189c2 domain: fix compilation warnings for kemi updates 8b21031 evrexec: fixed the compile time warnings cf7ed18 registrar: fix compilation warnings 1712365 textopsx: remove unused variables 15883fe tm: fix warnings for compilation f060937 uac: updates to fix compile time warnings ec0953c Merge pull request #1112 from claudiupb/redis_rm_execute b8b900a modules: readme files regenerated - ndb_redis ... [skip ci] 9a6c06a ndb_redis: fix connection problems with pipelining 26b072c ndb_redis: fix memory leak 99a73cb Merge pull request #1115 from claudiupb/fix_redis_pipeline 4751b77 kazoo : fix json keys a081a6e sst: exported functions to kemi interface fedaceb statistics: exported functions to kemi interface 8577eb1 statsc: exported functions to kemi interface d4c392f statsd: functions exported to kemi interface ced9bfb auth_ephemeral: Some more safety checks, added SHA384 as an option ac4cc61 modules: readme files regenerated - auth_ephemeral ... [skip ci] d010101 ndb_redis: add disable server on failure feature 8279492 call_control: exported functions to kemi framework 3ce13be auth: exported has_credentials() to kemi framework c65f639 kazoo: split exchange/queue declarations e69579b rtpengine: Allow override of call_id, from/to tags a813dec modules: readme files regenerated - rtpengine ... [skip ci] d678d09 misc/examples: handle return code for ksr_route_withindlg() in kemy python example 22fc025 tm: use NAME define for initializing uac 395699a ndb_redis: rename variables to avoind name conflicts ae081b2 tm: exported load contacts functions to kemi framework c55ad4e Merge pull request #1116 from claudiupb/redis_disable_server ce7c10c modules: readme files regenerated - ndb_redis ... [skip ci] 1aa2e58 rtpengine: consume MOS values reported back by RTP proxy 37a2dc0 modules: readme files regenerated - rtpengine ... [skip ci] 0e84e7b misc/examples: fixed typo in kemi js example d268d93 misc/examples: fixed typo in sqlang kemi example c32ba91 msrp: fix missed break in parsing pv name 17be61e path: exported functions to kemi interface d5d2bbd log_custom: exported functions to kemi framework e2a5608 ndb_redis: increasing a log message level cfe19d7 registrar: fix xavp_rcd memory consumption 987e0b4 Merge branch 'registrar_xavp_rcd' of https://github.com/kamailio/kamailio.git into registrar_xavp_rcd
@lazedo pushed 1 commit.
649d968 registrar: remove 1st try
@miconda i think i found it. no leaks on both 1st and 2nd registrations
@miconda so 1st registrations are ok. i was testing with db_mode = `Write-Through` (sorry about that). with for db_mode = `db_only`, all xavps are cleared after route (with async resume).
To summarize: all ok now with patch 3fddccd ?
Was the problem that some xavps were not linked to the root list from the core?
@miconda Hi, yes, all ok with https://github.com/kamailio/kamailio/commit/3fddccd0d7a3f9662699f88d1a9a708e... . the problem was how we obtained the existing `xavp's` and adding new values. this part `xavp = list ? &list->val.v.xavp : &new_xavp;` did the trick, we were adding values to xavp list and not to reg_xavp_rcd xavp.
But even when adding to xavp list, they should be freed. Do you still have the xavp_reset_list() in kazoo module?
i didn't used kazoo for testing, but i will be adding a xavp_reset_list() to event processing (not critical). test with this script with/without the patch. have sip client re-register every 30 and you'll see `shm` memory consumption growing (`xavp_new_value`) ``` ... modparam("usrloc", "xavp_contact", "ulattrs") modparam("registrar", "xavp_rcd", "ulrcd") modparam("registrar", "xavp_cfg", "regcfg") modparam("htable", "htable", "auth_cache=>size=16;autoexpire=7200;") modparam("async", "workers", 2)
...
route { ... if (is_method("REGISTER")) { route(REGISTRAR); exit(); }
... }
route[REGISTRAR] { $xavp(regcfg=>match_received) = $su; if($sht(auth_cache=>$Au) != $null && registered("location", "$rz:$Au", 2, 1) == 1) { xlog("L_INFO", "$ci|register|found cached registration for $rz:$Au\n"); $var(password) = $sht(auth_cache=>$Au); route(SAVE_LOCATION); }
if( !is_present_hf("Authorization")) { auth_challenge("$fd", "0"); exit(); }
async_route("MY_RESUME", "1");
}
route[MY_RESUME] { $xavp(ulattrs=>channel_vars) = '{"channel-vars" : true}'; $xavp(ulattrs=>other_vars) = '{"other-vars" : false}'; route(SAVE_LOCATION); }
route[SAVE_LOCATION] { if ($sht(auth_cache=>$Au) == $null) { xlog("L_INFO", "$ci|log|caching sip credentials for $Au\n"); }; $sht(auth_cache=>$Au) = $var(password); $var(save_result) = save("location", "0x04"); if($var(save_result) == -1) { auth_challenge("$fd", "0"); xlog("L_INFO", "$ci|end|issued auth challenge after failed attempt to save contact for $Au $si:$sp\n"); exit; } else { if($var(save_result) == 1) { $var(new_reg) = "true"; } else { $var(new_reg) = "false"; } } xlog("L_INFO", "$ci|end|ulattrs $xavp(ulattrs=>channel_vars) , $xavp(ulattrs=>other_vars)\n"); xlog("L_INFO", "$ci|end|save result $var(save_result) with with expires ($xavp(ulrcd=>expires)) contact $xavp(ulrcd=>ruid) : $ct\n"); } ```
Pushed the patch split in two, as the expires part might not be needed to be backported for older releases.
If you tested and are confident in the results, then you can push any of the patches in stable branches.
@miconda Thanks Daniel. i've asked for more people to confirm the fix and i'm also deploying this to our sandbox environment. will cherry-pick the 1st to 4.4/5.0 and the 2nd to 5.0 if all goes well. so far my environment is stable.
Closed #1111.
@miconda i cherry-picked the 2 commits to 5.0 branch and the first to 4.4 branch.