Hi Nicolas,
thanks for the feedback. If you observe this crash with a "standard" Kamailio, just create a PR about it on our tracker.
Cheers,
Henning
From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Tuesday, January 24, 2023 10:36 AM
To: Henning Westerholt <hw@gilawa.com>; sr-dev@lists.kamailio.org
Subject: RE: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory (follow up), and crash with tlsf
Hello Henning,
It seems you're right. :)
I did tests over the last four days with allocation/release cycles performed every five minutes.
During the first two days, the available memory fluctuated and dropped significantly three times.
But after that, it went back up two times.
So it looks like there is no issue in the long term.
I just have to make sure the shared memory size is configured with this in mind.
Thanks again for your help!
Now, about tlsf, maybe there is an issue…
I can reproduce the crash when starting Kamailio with « -x tlsf », and it does not look like the issue is in my code…
Here is the debug trace before the segfault:
0(82419) INFO: <core> [main.c:2139]: main(): private (per process) memory: 8388608 bytes
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 1024) called from core: core/str_hash.h: str_hash_alloc(59)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 1024) returns address 0x7f8983b32110
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 256) called from core: core/str_hash.h: str_hash_alloc(59)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 256) returns address 0x7f8983b32518
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 512) called from core: core/counters.c: init_counters(117)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 512) returns address 0x7f8983b32620
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) called from core: core/counters.c: init_counters(125)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) returns address 0x7f8983b32828
0(82419) DEBUG: <core> [core/cfg.lex:1964]: pp_define(): defining id: KAMAILIO_5
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 11) called from core: core/cfg.lex: pp_define(1995)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 11) returns address 0x7f8983b328b0
0(82419) DEBUG: <core> [core/cfg.lex:1964]: pp_define(): defining id: KAMAILIO_5_5
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 13) called from core: core/cfg.lex: pp_define(1995)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 13) returns address 0x7f8983b328f0
0(82419) DEBUG: <core> [core/cfg.lex:1964]: pp_define(): defining id: KAMAILIO_5_5_2
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 15) called from core: core/cfg.lex: pp_define(1995)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 15) returns address 0x7f8983b32930
0(82419) INFO: <core> [main.c:2198]: main(): shared memory: 268435456 bytes
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 16) called from core: core/route.c: init_rlist(146)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 16) returns address 0x7f8983b32970
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) called from core: core/str_hash.h: str_hash_alloc(59)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) returns address 0x7f8983b329b0
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 48) called from core: core/route.c: route_add(124)
0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 48) returns address 0x7f8983b32a38
0(82419) DEBUG: <core> [core/route.c:129]: route_add(): mapping routing block (0x9bf900)[0] to 0
Segmentation fault
Regards,
Nicolas.
From: Henning Westerholt <hw@gilawa.com>
Sent: Monday, January 23, 2023 11:01
To: Chaigneau, Nicolas; sr-dev@lists.kamailio.org
Subject: RE: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory not recovered
Hello,
I don't think there is a generic issue with the order of operations: first allocating the new memory and then freeing the old.
This pattern is also used, for example, by modules like carrierroute for a routing data reload.
The issues in the reported statistics might be related to memory fragmentation/defragmentation. If you free first and then allocate, the memory manager will probably hand the same memory
block back to you. With the opposite order, it needs to allocate new memory blocks.
Maybe you can execute the load/reload function several times, just as an experiment; it should even out after a few tries.
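To illustrate with a minimal sketch (the variable names are made up):
/* free first, then allocate: the manager can usually hand the same block back */
shm_free(old_data);
new_data = shm_malloc(data_size);
/* allocate first, then free: a second block must be carved out while the old one
   is still in use, so the free list stays more fragmented for a while */
new_data = shm_malloc(data_size);
shm_free(old_data);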
Cheers,
Henning
From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Monday, January 23, 2023 10:37 AM
To: Henning Westerholt <hw@gilawa.com>;
sr-dev@lists.kamailio.org; Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Subject: RE: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory not recovered
Hello,
I've pushed my investigation further, and now I understand a bit better what's going on.
The following applies to the two memory managers, « qm » and « fm ».
In its simplest form, what I'm doing is the following:
alloc_new_data();
free_old_data();
After this, I'm looking at:
1) Kamailio's available shared memory (as returned by the function « shm_available »).
2) My module's shared memory usage (as shown by the command « kamcmd mod.stats my_module shm »).
What I'm observing is that Kamailio's available shared memory is steadily decreasing (but not systematically after each execution), and that my module's shared memory usage is, conversely, steadily increasing.
(My fear is, of course, that at some point an allocation will fail because the available shared memory is exhausted.)
I notice from the « mod.stats » reports that Kamailio seems to keep track of the exact function and line number where an allocation occurred.
Maybe, as long as such a reference exists, the shared memory is not properly recovered? (Even though it is properly freed using shm_free.)
To test this theory, I temporarily changed the code to:
free_old_data();
alloc_new_data();
With this, all my issues disappear. The available shared memory is stable, as is the reported module shared memory usage.
This is really weird. Is this how Kamailio shared memory is supposed to work?
How could I solve this issue?
Regards,
Nicolas.
From: Chaigneau, Nicolas
Sent: Friday, January 20, 2023 15:28
To: Henning Westerholt; sr-dev@lists.kamailio.org
Cc: Kamailio (SER) - Users Mailing List
Subject: RE: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory
Hello Henning,
Thanks for your help. :)
I'm coming back with an update, and yet more questions.
First, I tried using « fm » instead of « qm » on real data.
The results are impressive:
- Allocation time is reduced from 85 s to 49 s
- Free time is reduced from 77 s to about 2 s
- And I do not notice high SIP response times when freeing
The time difference when freeing is huge. I'm surprised that this is so much faster than « qm ». Is this just because we don't have the same debugging information?
Now, another issue I'm looking into (a possible memory leak?).
This happens with both memory managers, « qm » and « fm ».
I'm using the « shm_available » function from Kamailio to keep track of the remaining available memory in the shared memory pool.
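(For reference, this is roughly how I read the value from the module; the wrapper below is just for illustration and assumes the no-argument shm_available() macro from the core headers:)
#include "../../core/mem/shm_mem.h"
#include "../../core/dprint.h"
static void log_shm_available(void)
{
    /* log the shared memory currently reported as available */
    LM_INFO("Remaining memory available: %lu\n", (unsigned long)shm_available());
}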
I've noticed something weird. At first I thought that I had a memory leak in my code, but I'm not so sure anymore…
Each time I reload the (same) data (through an RPC command), the value of shm_available decreases.
This happens if I load new data before freeing the old data.
However, if I first free the existing data and then load the new data, the available memory shown by shm_available seems to be properly « reset ».
For example:
Remaining memory available: 758960224 # <- allocate new
Remaining memory available: 756141328 # <- allocate new, then free old
Remaining memory available: 752037032 # <- allocate new, then free old
Remaining memory available: 749523176 # <- allocate new, then free old
Remaining memory available: 1073094936 # <- free
Remaining memory available: 758958544 # <- allocate new
Remaining memory available: 756143304 # <- allocate new, then free old
Remaining memory available: 752067480 # <- allocate new, then free old
Remaining memory available: 749532680 # <- allocate new, then free old
And so on…
This is with the exact same data used each time.
I've also tried using the following command to track memory:
kamcmd mod.stats my_module shm
The results seem consistent with what shm_available reports: the memory used seems to increase for each allocation being tracked, even though the memory is properly freed (or should be: shm_free
is called as needed).
Apparently the values are only reset when the free is performed before the new allocation.
It is as if the memory being tracked is not properly « cleaned up » until everything has been freed…
I'm not sure what this entails: is the memory really not released properly, or is it just a reporting issue?
One more thing: I think there might be a bug with the command « kamcmd mod.stats my_module shm », as it can display negative values.
Maybe there's an integer overflow?
Regards,
Nicolas.
From: Henning Westerholt <hw@gilawa.com>
Sent: Thursday, January 19, 2023 15:43
To: Chaigneau, Nicolas; sr-dev@lists.kamailio.org
Cc: Kamailio (SER) - Users Mailing List
Subject: RE: Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)
Hello Nicolas,
some people are using the TLSF memory manager, so it should certainly not crash. Maybe you could create an issue about it, if you have a backtrace and it's not related to your (custom) module.
The QM memory manager provides more debugging information and can also be used to find memory leaks and such. Therefore, it's enabled by default, as most people are not using huge data sets
internally.
The FM memory manager is more lightweight, and in your scenario apparently significantly faster. Let us know if it's also working fine in the production setup.
Cheers,
Henning
From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Thursday, January 19, 2023 12:47 PM
To: Henning Westerholt <hw@gilawa.com>; Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Cc: sr-dev@lists.kamailio.org
Subject: RE: Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)
[mail resent because I was not subscribed to sr-dev; sorry for the duplicate]
Hello Henning,
Thank you for your quick response!
I do not have any error messages.
Shared memory allocation and freeing is done exclusively by the RPC process.
The workers only read that memory (and only the memory that is *not* being allocated or freed by the RPC process).
I've looked at the different shared memory managers, as you suggested.
First, « tlsf » does not work: Kamailio crashes on startup with « -x tlsf ».
A comparison of « qm » (default) and « fm »:
With « fm », the loading time is reduced by 25%.
The freeing is also much faster (maybe 4 times faster).
And I do not notice the performance issues (which I can reproduce when using « qm »).
But maybe this is because I do not have enough data in my test environment. I'll have to test this with the real data.
But these first results with « fm » look promising! :)
Could you maybe explain to me the main differences between the 3 shared memory managers? And why is « qm » the default?
Also, do you have an idea why « tlsf » makes Kamailio crash? (Does anyone use « tlsf »?)
Thanks again.
Regards,
Nicolas.
From: Henning Westerholt <hw@gilawa.com>
Sent: Thursday, January 19, 2023 08:28
To: Kamailio (SER) - Users Mailing List
Cc: Chaigneau, Nicolas; sr-dev@lists.kamailio.org
Subject: RE: Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)
Hello,
(Adding sr-dev to CC)
This indeed looks a bit strange. Do you get any error messages in the log? In which process are you freeing the memory: one of the worker processes, or the RPC process?
You could also try using another memory manager to see if you get better performance. There is a command-line parameter to choose one at startup.
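For example, to select the « fm » manager (the configuration file path is just a placeholder):
kamailio -x fm -f /etc/kamailio/kamailio.cfg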
Cheers,
Henning
From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Wednesday, January 18, 2023 6:49 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Subject: [SR-Users] Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)
Hello,
I'm encountering performance issues with Kamailio (5.5.2).
I'm using a custom Kamailio module that loads routing data into memory, using Kamailio shared memory.
This routing data is very large. It can be fully reloaded through a Kamailio RPC command (which is done once each day).
When reloading, two sets of data are maintained, one "loading" and another "current" (the latter being used to handle SIP requests).
When loading of the new data is finished, it is swapped in as "current". Then the memory of the old (now unused) data is freed. A simplified sketch of this pattern follows below.
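(The structure and names here are simplified; "routing_data_t" stands in for my own structures, and the real module of course has more bookkeeping:)
#include "../../core/mem/shm_mem.h"
static routing_data_t *current = NULL;  /* the set used by the workers for SIP requests */
static int reload_routing_data(void)
{
    routing_data_t *old;
    routing_data_t *loading;
    loading = shm_mallocxz(sizeof(routing_data_t));
    if(loading == NULL)
        return -1;
    /* ... fill "loading" from the routing data source ... */
    old = current;
    current = loading;   /* swap the new set in as "current" */
    if(old != NULL)
        shm_free(old);   /* free the old, now unused, data set */
    return 0;
}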
I've noticed that when Kamailio is freeing the old data, there is a very significant performance impact on SIP requests.
This is surprising to me, because the SIP requests do not use this old data.
This is not a CPU issue; idle CPU is at about 99% at that moment.
I'm using the following functions:
- shm_mallocxz
- shm_free
From what I understand, shm_free is actually "qm_shm_free", defined in "src/core/mem/q_malloc.c" (the default shared memory manager being "qm").
I've noticed that there is also a variant shm_free_unsafe ("qm_free"), which does not perform locking.
I'm wondering if the lock could be the cause of my performance issues?
(But I'm not sure how this could be possible, because although the SIP requests need to access the allocated shared memory, they do not directly use the functions of the shared memory manager.)
If the performance issues are caused by the lock, could I use the unsafe version "safely"? (Considering that it is guaranteed that the old data cannot be used by anyone else.)
Thanks for your help.
Regards,
Nicolas.