Hi, let me describe what I think is a common case (I've run into it a few times, and I know others have too):
- A request arrives and is handled by a worker process.
- There is a memory leak in PKG MEM (or perhaps too little memory is allocated for it).
- The process can still do basic tasks such as parsing.
- The script creates a dialog and performs other operations that require SHM memory.
- The call to t_relay() fails because there is not enough PKG mem, so the transaction is not created.
- Depending on the memory status, the process may also be unable to generate a SIP error response (so there is no response at all).
- The client starts retransmitting.
- These retransmissions do not match an existing transaction, so the whole script runs again (creating a new dialog and so on).
- t_relay() fails again.
In this case, the dialog statistics show an exorbitant number of dialogs, and so does other data retrieved from shared memory.
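The sequence above can be sketched as a toy model. This is not Kamailio code: the struct, the function names and the boolean flag are all hypothetical, and the only thing modelled is that a request creates a dialog before relaying and that retransmissions are absorbed only once a transaction exists.

```c
#include <stdbool.h>

/* Toy model of the failure sequence; hypothetical names, not Kamailio code. */
struct proxy_state {
    unsigned long dialogs;   /* dialog records created in shared memory */
    bool has_transaction;    /* true once a transaction has been created */
};

/* Handle one arrival of the same request (first copy or a retransmission).
 * pkg_exhausted models a worker whose t_relay() fails for lack of PKG mem. */
static void handle_request(struct proxy_state *st, bool pkg_exhausted)
{
    if (st->has_transaction)
        return;                      /* retransmission matches the transaction */
    st->dialogs++;                   /* the script creates a dialog first */
    if (!pkg_exhausted)
        st->has_transaction = true;  /* t_relay() succeeded */
    /* otherwise: no transaction, perhaps no reply, and the client retransmits */
}

/* Number of dialogs created after `arrivals` copies of the same request. */
static unsigned long dialogs_after(unsigned arrivals, bool pkg_exhausted)
{
    struct proxy_state st = {0, false};
    for (unsigned i = 0; i < arrivals; i++)
        handle_request(&st, pkg_exhausted);
    return st.dialogs;
}
```

In this model a healthy worker creates one dialog no matter how many copies arrive, while an exhausted worker creates one dialog per copy, which is the growth seen in the dialog statistics.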
So what I have in mind is a new script function that returns the currently available PKG mem, so the script can decide not to process the request and reply (if it can) with an error response. This would avoid creating a new dialog, running DB queries and so on.
A workaround is to call t_newtran() at the beginning of the script, but that is not a very "polite" solution. So I would prefer a script function "get_pkg_mem()" which returns the number of bytes available in PKG MEM. Then I could decide whether to continue or to stop processing the request.
Does this sound feasible and useful? Opinions?
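A minimal sketch of the decision such a function would allow, written in C rather than in the routing script. Everything here is an assumption for illustration: the threshold value, the enum and the function name are hypothetical, and the byte count is taken as a parameter because get_pkg_mem() is only a proposal at this point.

```c
/* Hypothetical threshold: minimum free PKG bytes wanted before doing
 * expensive work (dialog creation, DB queries). Not a Kamailio default. */
#define MIN_FREE_PKG_BYTES (256UL * 1024UL)

enum request_action {
    ACTION_PROCESS,  /* enough private memory: run the normal script logic */
    ACTION_REJECT    /* too little: reply with an error (if possible) and stop */
};

/* free_pkg_bytes is what the proposed get_pkg_mem() would return. */
static enum request_action decide_request(unsigned long free_pkg_bytes)
{
    if (free_pkg_bytes >= MIN_FREE_PKG_BYTES)
        return ACTION_PROCESS;
    return ACTION_REJECT;
}
```

The point of checking first is ordering: the test runs before any dialog is created, so a rejected request leaves nothing behind in shared memory.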
Thanks.
On Friday 16 April 2010, Iñaki Baz Castillo wrote:
- A request arrives and is handled by a worker process.
- There is a memory leak in PKG MEM (or perhaps too little memory is
allocated for it).
- The process can still do basic tasks such as parsing.
- The script creates a dialog and performs other operations that require
SHM memory.
- The call to t_relay() fails because there is not enough PKG mem, so
the transaction is not created.
- Depending on the memory status, the process may also be unable to
generate a SIP error response (so there is no response at all).
- The client starts retransmitting.
- These retransmissions do not match an existing transaction, so the
whole script runs again (creating a new dialog and so on).
- t_relay() fails again.
Hello Inaki,
Transactions live mainly in shared memory, so a shortage there could be another reason that t_relay()/t_newtran() fails. But you're right, it should also fail due to insufficient private memory.
So what I have in mind is a new script function that returns the currently available PKG mem, so the script can decide not to process the request and reply (if it can) with an error response. This would avoid creating a new dialog, running DB queries and so on.
There are already some functions that output the memory status, albeit to the log. Take a look at pkg_status() and shm_status() in cfgutils. So one could of course implement a PV that returns the amount of available memory, or a function that checks for a certain range.
But I wonder if this is really necessary. The last out-of-memory condition we observed in production was years ago, if I remember correctly. So I'd suggest that you just use a properly sized PKG mem pool (like 10 MB per process) and also enough shared memory, as RAM is really cheap these days. If you then still get an out-of-memory condition, there is a memory leak in the code, and that should simply be fixed instead of being worked around in the script.
Best regards,
Henning
2010/4/16 Henning Westerholt henning.westerholt@1und1.de:
Hello Inaki,
Transactions live mainly in shared memory, so a shortage there could be another reason that t_relay()/t_newtran() fails.
Yes, sorry, my fault. Basically the problem would be that the worker does not have enough memory to parse the request and create an error response.
But you're right, it should also fail due to insufficient private memory.
So what I have in mind is a new script function that returns the currently available PKG mem, so the script can decide not to process the request and reply (if it can) with an error response. This would avoid creating a new dialog, running DB queries and so on.
There are already some functions that output the memory status, albeit to the log. Take a look at pkg_status() and shm_status() in cfgutils. So one could of course implement a PV that returns the amount of available memory, or a function that checks for a certain range.
Thanks.
But I wonder if this is really necessary. The last out-of-memory condition we observed in production was years ago, if I remember correctly. So I'd suggest that you just use a properly sized PKG mem pool (like 10 MB per process) and also enough shared memory, as RAM is really cheap these days. If you then still get an out-of-memory condition, there is a memory leak in the code, and that should simply be fixed instead of being worked around in the script.
Yes, I agree. I ran into a PKG MEM issue (with 4 MB) last week using Kamailio 1.5.1. A worker on this server usually consumes just 191176 bytes (far from 4 MB), so increasing PKG MEM is not very useful in this case, as the new limit would also be reached soon (however, with 1.5.4 I have not seen this problem in 8 days, even with more traffic).
But yes, you are absolutely right: if there is a memory leak, it must be fixed rather than covered with a dirty workaround in the script :)
Thanks.
On Apr 16, 2010 at 13:58, Iñaki Baz Castillo ibc@aliax.net wrote:
So what I have in mind is a new script function that returns the currently available PKG mem, so the script can decide not to process the request and reply (if it can) with an error response. This would avoid creating a new dialog, running DB queries and so on.
There are already some functions that output the memory status, albeit to the log. Take a look at pkg_status() and shm_status() in cfgutils. So one could of course implement a PV that returns the amount of available memory, or a function that checks for a certain range.
Thanks.
Try pkg_available() or shm_available(). Note, however, that this information is not always available (it depends on the compile options). When it is not available, they always return the maximum unsigned long value, (unsigned long)-1.
You might also want to look at pkg_info(&mi) and shm_info(&mi). They fill a struct mem_info (defined in mem/meminfo.h):
struct mem_info{
	unsigned long total_size;
	unsigned long free;
	unsigned long used;
	unsigned long real_used; /*used + overhead*/
	unsigned long max_used;
	unsigned long min_frag;
	unsigned long total_frags; /* total fragment no */
};
[...]
Andrei
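A minimal sketch of how a caller might treat these values, assuming only what is stated above: that (unsigned long)-1 means "not available", and that struct mem_info has the fields quoted. The struct is copied locally so the sketch compiles on its own, and the three helper functions are hypothetical, not part of the Kamailio/SER API.

```c
#include <limits.h>
#include <stdbool.h>

/* Local copy of the struct quoted above, so this sketch is self-contained. */
struct mem_info {
    unsigned long total_size;
    unsigned long free;
    unsigned long used;
    unsigned long real_used;   /* used + overhead */
    unsigned long max_used;
    unsigned long min_frag;
    unsigned long total_frags; /* total fragment no */
};

/* (unsigned long)-1 is the "information not available" sentinel. */
static bool mem_available_known(unsigned long available)
{
    return available != ULONG_MAX;
}

/* True only when the value is known AND below the threshold; an unknown
 * value is never reported as low, so the caller does not reject traffic
 * on a build that cannot report available memory. */
static bool mem_is_low(unsigned long available, unsigned long threshold)
{
    return mem_available_known(available) && available < threshold;
}

/* Share of the pool in use, overhead included, as a percentage.
 * Returns 0.0 for an empty (zero-sized) pool. */
static double real_used_percent(const struct mem_info *mi)
{
    if (mi->total_size == 0)
        return 0.0;
    return (double)mi->real_used * 100.0 / (double)mi->total_size;
}
```

Treating the sentinel as "unknown" rather than as a huge amount of free memory is a design choice of this sketch; a caller could equally log a warning once and disable the check.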
2010/4/19 Andrei Pelinescu-Onciul andrei@iptel.org:
Try pkg_available() or shm_available(). Note, however, that this information is not always available (it depends on the compile options). When it is not available, they always return the maximum unsigned long value, (unsigned long)-1.
You might also want to look at pkg_info(&mi) and shm_info(&mi). They fill a struct mem_info (defined in mem/meminfo.h):
struct mem_info{
	unsigned long total_size;
	unsigned long free;
	unsigned long used;
	unsigned long real_used; /*used + overhead*/
	unsigned long max_used;
	unsigned long min_frag;
	unsigned long total_frags; /* total fragment no */
};
Thanks a lot.