### Description
Kamailio 5.1 on Debian. With `xavp_rcd` enabled in the registrar module, shared memory usage keeps growing over time.
### Troubleshooting
- `mod.stats core all` shows a large amount of allocated memory for `xavp_new_value` that keeps growing over time:
  `xavp_new_value(94): 48815768`
#### Reproduction
We see the same symptom on multiple servers.
#### Log Messages
Nothing shown in the log yet.
### Possible Solutions
Using `reg_fetch_contacts()` to read the expiry time instead of the `xavp_rcd` xavp is an alternative that works.
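A minimal sketch of that workaround, for anyone hitting the same growth. It is not the reporter's actual config. The profile name `caller` and the use of `$tu` as the AoR are assumptions made for illustration, and index `[0]` is simply the first fetched contact, which is not necessarily the contact that was just saved.

```
# Sketch only: read the expiry from usrloc instead of $xavp(ulrcd[0]=>expires).
# Assumptions: AoR taken from $tu; "caller" is an arbitrary profile name.
if (reg_fetch_contacts("location", "$tu", "caller")) {
    # first contact in the fetched profile
    $var(regtime) = $(ulc(caller=>expires)[0]);
    # release the fetched contact list once it is no longer needed
    reg_free_contacts("caller");
}
```

With this approach the `xavp_rcd` registrar parameter can be left unset, so the `ulrcd` xavp is never created.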
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.1.4 (x86_64/linux)
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled on 10:59:52 Jul 3 2018 with gcc 6.3.0
```
* **Operating System**:
```
Debian 9.7 x86_64
```
Are you using any async functions (t_suspend(), async or evapi modules ...)?
Which registrar functions do you use in the config before the point where you expect the leak to happen? Are they used in a normal classic route block, or inside an event_route block?
No async functions. Just a standard save() - one with reply sent and one for replicated registrations with no reply sent.
All are used in a classic route block.
I have an event_route for registration expirations, but nothing else. I changed one server to use `reg_fetch_contacts` instead of the rcd xavp, and that stopped the memory growth.
Can you give the relevant snippet of config that results in the leak? For example, from `save()` to the end of processing for REGISTER, or wherever you use the xavp_rcd. I want to make a basic config to try to reproduce it.
```
$var(rc) = save("location", 0x04);
if (!$var(rc)) {
    xlog("L_ERR", "-- REGISTRAR: failed to save registration: $tu => UA: $ua, IP: $si:$sp, Caller-ID: $ci, CSeq: $cs, Expiry: $hdr(Expires), Contact $ct, Return code: $var(rc)\n");
    sl_reply_error();
    exit;
}

if ($hdr(Expires) == "") {
    $var(expire) = $sel(contact.expires);
} else {
    $var(expire) = $hdr(Expires);
}
$var(regtime) = $xavp(ulrcd[0]=>expires);
```
Can you test with the latest version of the branch? There was a related fix in commit adc4493fa6861895bdcf8b459e5fbc76e80d0f1f. It would be useful to know whether that covers your case or there is something else.
I will try, but probably not until next week sometime. Please stand by. We downgraded our production servers to an older solution.
@miconda I don't think it's related. The fix in [adc4493](https://github.com/kamailio/kamailio/commit/adc4493fa6861895bdcf8b459e5fbc76...) was to prevent a crash after a leak that led to shm exhaustion.
In our case it was an instant shm exhaustion, not a slow leak. The slow leak could be related to the growth in xavp_rcd_helper or lookup_helper.
I still need to prove this, but my theory is that different processes accessed the contact xavp or the ulrcd in shm during xavp_clone and created a circular reference, for example a REGISTER and a NOTIFY (which calls lookup) arriving at the same time. The server was stable for 2 weeks with no shm growth, then shm was suddenly exhausted; this unprotected xavp_clone made the SIP workers die and the server became unresponsive.

By the way, is there a way to create/generate/force a core file with the contents of the shm memory? The core files we got (from SIP workers) could not access the shm memory, so we could not check whether the contents of the shm xavps prove my theory.
@lazedo - the core file contains the shared memory, but you need to know where to look for the xavps. They are linked either in the transaction structure, in the registrar/location structure, or from global variables, depending on what is being done at the moment the crash is generated.
You can use `abort()` (iirc, from the cfgutils module) inside kamailio.cfg to trigger a crash at a specific line, so you know what that worker did just before crashing.
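A rough sketch of what that could look like, assuming `abort()` is exported by cfgutils as recalled above and that core dumps are enabled on the system. The route name `REG_DEBUG` is hypothetical.

```
# Sketch only: force a core dump right after the xavp is read.
loadmodule "cfgutils.so"

route[REG_DEBUG] {    # hypothetical route name
    $var(regtime) = $xavp(ulrcd[0]=>expires);
    # deliberately crash this worker so the core file reflects this exact point
    abort();
}
```

This is only suitable for a test instance, since every request that reaches the route brings the server down.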
That's interesting. So if a device keeps re-registering with the same Call-ID, the structure stays open and the xavps keep hanging on to the registration structure? Could that be it? Just a wild guess... We have a short time between re-registrations.
Is this issue fixed in the latest Kamailio version, 5.2.4? I'm asking because before I upgraded my system, which was running kamailio-5.2.1, I had an issue similar to the one reported here, but after upgrading Kamailio to version 5.2.4 it's gone. So I suppose it is resolved. If you can confirm that this issue is resolved in 5.2.4, then I guess it can be closed, right?
There was a memory leak in the `xavp_clone_level_nodata` function. From a look at the code, it appears that this function was also used in the registrar lookup function if the `xavp_rcd` parameter was set. This was fixed in 0c93efec739 and also backported to the 5.2 and 5.1 branches. The fix is included in 5.2.4, which matches the comment from @sebra.
Closing this issue; if you still see it, please re-open.
Closed #1834.