Shared memory hash table with global hashtable lock. Add state maintaining the selected rtp node, for a given callid. Hashtable entry expiration time configurable using hash_entry_tout modparam. The actual deletion happens on the fly while insert/remove/lookup are called. Updated doku. You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/390
-- Commit Summary --
* rtpengine: hash table to keep the selected nodes
-- File Changes --
M modules/rtpengine/doc/rtpengine_admin.xml (26) M modules/rtpengine/rtpengine.c (189) A modules/rtpengine/rtpengine_hash.c (314) A modules/rtpengine/rtpengine_hash.h (30)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/390.patch https://github.com/kamailio/kamailio/pull/390.diff
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390
Basically tested it, with focus on entries expiration. Further tests will follow.
What do you think about current expired entries deletion implementation? Do you have any other advices/proposals?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-154093966
Updated pull request. Added some modparams and kamctl for the hashtable. Added a modparam to allow session deletion to manually disabled nodes (useful when performing maintenance). Updated doku.
In my opinion tests were ok. Tested this for 100 calls with media, 10 calls/s rate. Sessions were deleted successfully upon BYE. No memory leaks found at the hash table level (used "kamcmd mod.stats rtpengine shm" for this).
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-155447957
What would happen when there is a restart?
I find the feature useful overall, any limitations need to be documented when it is enabled.
Hopefully @rfuchs can comment from his point of view and merge when all is ok.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-155789303
On a restart (i.e. rtpengine's mod_destroy() is called), rtpengine_hash_table_destroy() is called which frees the entire shm hashtable.
Possible limitations: - overhead added by the rtpengine hashlock (i.e. decrease in calls/sec rate) - more shm used (i.e. one has to increase the shm using kamailio's "-m" parameter)
What is the current known limitation for kamailio (calls/sec) ?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-155802616
I was looking more at the perspective of the items in hash table being lost and if the restart was in a middle of the call, then after restart, when the BYE is processed, which rtpengine is updated?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-155804313
Hmm.. I think I get the point. If the kamailio restarts in the middle of a call, the hashtable is flushed. When the BYE comes, no entry is found for that callid and no rtpengine is updated.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-155807354
One approach to this would be to fallback to the stateless behaviour when no entry is found in the hashtable and just select a rtpengine based on the hashed callid (with 100% matching the rtpengine machine for that call if none crashed).
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-155809502
+}
+int rtpengine_hash_table_insert(void *key, void *value) {
- struct rtpengine_hash_entry *entry, *last_entry;
- struct rtpengine_hash_entry *new_entry = (struct rtpengine_hash_entry *) value;
- unsigned int hash_index;
- // check rtpengine hashtable
- if (!rtpengine_hash_table) {
LM_ERR("NULL rtpengine_hash_table");
return 0;
- }
- // get entry list
- hash_index = str_hash(key);
- entry = rtpengine_hash_table->entry_list[hash_index];
Don't you need to acquire the lock before accessing the start element for the list?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44547806
+}
+int rtpengine_hash_table_insert(void *key, void *value) {
- struct rtpengine_hash_entry *entry, *last_entry;
- struct rtpengine_hash_entry *new_entry = (struct rtpengine_hash_entry *) value;
- unsigned int hash_index;
- // check rtpengine hashtable
- if (!rtpengine_hash_table) {
LM_ERR("NULL rtpengine_hash_table");
return 0;
- }
- // get entry list
- hash_index = str_hash(key);
- entry = rtpengine_hash_table->entry_list[hash_index];
Nevermind, I see there's an empty list head. Retracted.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44548209
found: if (do_test) { node->rn_disabled = rtpp_test(node, node->rn_disabled, 0); if (node->rn_disabled) goto retry; }
- /* build hash table entry */
- struct rtpengine_hash_entry *entry = shm_malloc(sizeof(struct rtpp_node));
- if (shm_str_dup(&entry->callid, &callid) < 0) {
LM_ERR("rtpengine hash table fail to duplicate calllen=%d callid=%.*s",
callid.len, callid.len, callid.s);
- }
- entry->node = node;
- entry->next = NULL;
- entry->tout = get_ticks() + hash_table_tout;
- /* Insert the key<->entry from the hashtable */
- if (!rtpengine_hash_table_insert(&callid, entry)) {
LM_ERR("rtpengine hash table fail to insert node=%.*s for calllen=%d callid=%.*s",
The error case needs some shm_free() I think
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44548900
/* XXX Use quick-and-dirty hashing algo */
- for(sum = 0; callid.len > 0; callid.len--)
sum += callid.s[callid.len - 1];
- for(i = 0; i < callid.len; i++)
sum += callid.s[i];
"sum" is used uninitialized here
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44549819
+/*
- Main balancing routine. This DO try to keep the same proxy for
- the call if some proxies were disabled or enabled (e.g. kamctl command)
- */
+static struct rtpp_node * +select_rtpp_node(str callid, int do_test, int op) +{
- if(!active_rtpp_set) {
LM_ERR("script error - no valid set selected\n");
return NULL;
- }
- // calculate and choose a node
- if (op == OP_OFFER) {
// run the selection algorithm
"Offer" doesn't necessarily mean that this will be a new entry. A re-invite will also do an "offer" but needs to do a lookup into the hash table first. I think a lookup should always be done first, with fallback to creating a new entry in case of an offer.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44550420
found: if (do_test) { node->rn_disabled = rtpp_test(node, node->rn_disabled, 0); if (node->rn_disabled) goto retry; }
- /* build hash table entry */
- struct rtpengine_hash_entry *entry = shm_malloc(sizeof(struct rtpp_node));
- if (shm_str_dup(&entry->callid, &callid) < 0) {
LM_ERR("rtpengine hash table fail to duplicate calllen=%d callid=%.*s",
Function should abort here, otherwise risking segfault
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44551280
found: if (do_test) { node->rn_disabled = rtpp_test(node, node->rn_disabled, 0); if (node->rn_disabled) goto retry; }
- /* build hash table entry */
- struct rtpengine_hash_entry *entry = shm_malloc(sizeof(struct rtpp_node));
Return value of shm_malloc should be error checked (might be NULL)
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44551417
- }
- // get first entry from entry list; jump over unused list head
- hash_index = str_hash(key);
- entry = rtpengine_hash_table->entry_list[hash_index];
- last_entry = entry;
- // lock
- lock_get(rtpengine_hash_lock);
- while (entry) {
// if key found, return entry
if (str_equal(&entry->callid, (str *)key)) {
// unlock
lock_release(rtpengine_hash_lock);
return entry;
Returning the raw hash table entries straight from the hash table is dangerous as the entries are protected by the lock and so this opens the door to a race condition. A lookup might return one entry while the same entry is being freed in another process. Either return a copy of the entries or simply return its contents (the node should be enough).
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44553288
- str callid; // call callid
- struct rtpp_node *node; // call selected node
- struct rtpengine_hash_entry *next; // call next
+};
+/* table */ +struct rtpengine_hash_table {
- struct rtpengine_hash_entry **entry_list; // hastable
- unsigned int total; // total number of entries in the hashtable
+};
+int rtpengine_hash_table_init(int size); +int rtpengine_hash_table_destroy(); +int rtpengine_hash_table_insert(void *key, void *value);
I don't think there's a good reason to use void pointers here, since the types are known and using void pointers prevents the compiler from catching type errors.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44553486
- }
- LM_DBG("rtpengine_hash_table size = %d\n", hash_table_size);
- // init hashtable
- rtpengine_hash_table = shm_malloc(sizeof(struct rtpengine_hash_table));
- if (!rtpengine_hash_table) {
LM_ERR("no shm left to create rtpengine_hash_table\n");
return 0;
- }
- // init hashtable entry_list
- rtpengine_hash_table->entry_list = shm_malloc(hash_table_size * sizeof(struct rtpengine_hash_entry));
- // init hashtable entry_list[i] (head never filled)
- for (i = 0; i < hash_table_size; i++) {
rtpengine_hash_table->entry_list[i] = shm_malloc(sizeof(struct rtpengine_hash_entry));
I don't think shm_malloc() automatically zeroes the memory, and the lookup routines seem to do a strcmp on the head entries call-ID too, so the call-ID needs to be initialized as well.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44554482
@@ -356,6 +412,51 @@ modparam("rtpproxy", "rtp_inst_pvar", "$avp(RTP_INSTANCE)") </example>
</section>
<section id="rtpengine.p.hash_table_size">
<title><varname>hash_table_size</varname> (integer)</title>
<para>
Size of the hash table. Default value is 256.
</para>
<para>
NOTE: If configured size is <emphasis>less than</emphasis> 1, the size will be defaulted to 1.
</para>
<example>
<title>Set <varname>hash_table_size</varname> parameter</title>
+<programlisting format="linespecific"> +... +modparam("rtpproxy", "hash_table_size", "123")
Should be "rtpengine" not "rtpproxy" :)
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44554640
@@ -228,6 +235,8 @@ static pid_t mypid; static unsigned int myseqn = 0; static str extra_id_pv_param = {NULL, 0}; static char *setid_avp_param = NULL; +static int hash_table_tout = 120;
Is a default value of 2 minutes really all that useful? If a call lasts longer than 2 minutes, the final bye/delete won't be delivered to the correct RTP proxy any more, or do I misunderstand this value?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44555607
}
if (body_out) *body_out = body;
- if (op == OP_DELETE) {
/* Delete the key<->value from the hashtable */
if (!rtpengine_hash_table_remove(&callid)) {
I don't think immediately deleting the hash table entry on "delete" is a good idea as it may only be a single branch that has been deleted and a "delete" does not preclude the possibility of more commands (offer, answer, or something else) coming for the same call-ID. Perhaps resetting the expiry value for the entry to something small might be a better idea.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44556022
Why not just require the dialog module and store the selected rtpproxy in the dialog?
What happens on spiraled/hairpin calls? The call-id alone does not seem specific enough.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156028269
I would not like a dependency on dialog module, it is quite common to run rtp relay on edge proxy which is typically more or less a lightweight load balancer -- dialog management on those nodes will add a lot of overhead.
Fallback to hasing callid after a restart seams reasonable for me (giving the current behaviour).
For spiralled calls, iirc, there is an option to provide via branch to rtpproxy/rtpengine, but this is independent of selecting what rtpproxy/rtpengine to use.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156029590
We also thought of using dialog module for keeping the node. The main reason keeping the rtpengine module as independent as it could be. Maybe for spiraled calls this could be extended to consider also the from/to-tag when calculating the hash index ?!
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156029911
Alex writes:
Why not just require the dialog module and store the selected rtpproxy in the dialog?
dialog module is a complicated mess and its use cannot be warranted just for enabling this feature.
-- juha
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156032520
So, the dialog module is a no-go/
Have you considered just putting the selected rttproxy in an RR parameter? For me that has worked perfectly for years now.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156035497
Alex writes:
Have you considered just putting the selected rttproxy in an RR parameter? For me that has worked perfectly for years now.
that is what i'm doing also:
;pm[=setid]
-- juha
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156037397
That is only the setid not the selected rtp relay within the set. Afaik, there is no way to specify the rtp relay within a set from config file.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156039355
Daniel-Constantin Mierla writes:
That is only the setid not the selected rtp relay within the set. Afaik, there is no way to specify the rtp relay within a set from config file.
it works if you have only one relay per set.
-- juha
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156041957
My understanding was that the patch was adding support for keeping the relation between the selected rtpengine and the callid, when there are more than one rtpengine per set. If one of the rtpengine in the set is unavailable, then the current hashing selection becomes inconsistent if the rtpengine turns back available.
If there is only one rtpengine per set, then no hashing selection is done.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-156044158
}
if (body_out) *body_out = body;
- if (op == OP_DELETE) {
/* Delete the key<->value from the hashtable */
if (!rtpengine_hash_table_remove(&callid)) {
I understand the problem that may appear in that case you said. However, I don't understand why resetting the expiry so something smaller would solve this. Proposals: 1. reset the timer to the hash_entry_tout value 2. also consider from-tag, to-tag, maybe via-branch id (add them in the hash table entry). Then match them all, when deleting the entry; thus the deletion can be done right-away.
I'd go for 2. What do you think?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44639231
@@ -228,6 +235,8 @@ static pid_t mypid; static unsigned int myseqn = 0; static str extra_id_pv_param = {NULL, 0}; static char *setid_avp_param = NULL; +static int hash_table_tout = 120;
You are right, I'll update this to default 3600 sec. Also document this idea.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44639415
}
if (body_out) *body_out = body;
- if (op == OP_DELETE) {
/* Delete the key<->value from the hashtable */
if (!rtpengine_hash_table_remove(&callid)) {
Some notes:
* when session is created, there is no to-tag (initial invite) * the via-branch can be different for re-invite or bye, so it cannot be used as part of the key all the time
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44639702
- }
- // get first entry from entry list; jump over unused list head
- hash_index = str_hash(key);
- entry = rtpengine_hash_table->entry_list[hash_index];
- last_entry = entry;
- // lock
- lock_get(rtpengine_hash_lock);
- while (entry) {
// if key found, return entry
if (str_equal(&entry->callid, (str *)key)) {
// unlock
lock_release(rtpengine_hash_lock);
return entry;
True, I'll return just the node.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44639943
}
if (body_out) *body_out = body;
- if (op == OP_DELETE) {
/* Delete the key<->value from the hashtable */
if (!rtpengine_hash_table_remove(&callid)) {
If you can be bothered with 2, then go for it, but I think it would take a lot of work and fine-tuning to make it work just right. The proxies already do all the tag and branch handling so I see no need to duplicate it here.
Setting the expiry to the near future on "delete" would help to keep the hash table small if it was indeed the final BYE, but it would also require resetting the expiry to now+timeout on every operation, i.e. not only when the key is created but on a lookup as well.
Another option would be to extend the control protocol and have the RTP proxy tell the module whether this call is finished now or if it should be kept alive.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r44656355
Updated pull request. Some of the most important updates: - free previously allocated shm in case of reached memory limit and solve possible segfaults; tested this using -m 128 and a large table: kamailio didn't crashed when had no memory for the entries and still accepted calls afterwards. - always lookup hashtable before selecting a new node; this fallbacks to the previous behaviour - check also for viabranch when matching the call in addition to the callid; tested this for simple calls i.e. viabranch was STR_NULL; _didn't tested it_ for real-life branching scenarios as we are not using it; here I'd need some help from your side, if you use those scenarios and have time for it; IMHO the matching should be fine as I'm checking the viabranch value I get from the struct sip_msg.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-157059972
**Accidentally** pushed all the commits when trying to "git push upstream p_usrloc_NULL_checks master"; forgot the ":".
Shall I re-push the old files? (can't push -f a reset on previous commit due to locking).
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-158903750
git revert is your friend
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-158904989
Done. I reverted all your commits
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-158905844
Thank you; didn't think of it :)
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-158905971
@linuxmaniac -- you should add the commands you executed in the wiki:
* https://www.kamailio.org/wiki/devel/git-commit-guidelines#useful_commands
It will be useful for devs getting into git.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-158907208
you should add the commands you executed in the wiki: https://www.kamailio.org/wiki/devel/git-commit-guidelines#useful_commands
Done https://www.kamailio.org/wiki/devel/git-commit-guidelines#revert_already_pus...
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-158916629
Updated the pull request with focus on the hastable API: - done **per row** hashtable locking instead of global hashtable locking; this should further decrease the possible limitations of using hashtable locks; - when enabled, the allow_op modparam will allow sessions to finish for both disabled(permanent) and disabled nodes(they might be disabled due to timeout not to real machine break and one might still want the current sessions to finish);
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-163281872
- LM_DBG("rtpengine_hash_table size = %d\n", hash_table_size);
- // init hashtable
- rtpengine_hash_table = shm_malloc(sizeof(struct rtpengine_hash_table));
- if (!rtpengine_hash_table) {
LM_ERR("no shm left to create rtpengine_hash_table\n");
return 0;
- }
- memset(rtpengine_hash_table, 0, sizeof(struct rtpengine_hash_table));
- rtpengine_hash_table->size = hash_table_size;
- // init hashtable row_entry_list
- rtpengine_hash_table->row_entry_list = shm_malloc(rtpengine_hash_table->size * sizeof(struct rtpengine_hash_entry*));
- if (!rtpengine_hash_table->row_entry_list) {
LM_ERR("no shm left to create rtpengine_hash_table->row_entry_list\n");
rtpengine_hash_table_destroy();
A minor nitpick: These calls to _destroy() here won't ever actually do anything, because the _sanity_checks() will always fail.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r47195509
- LM_DBG("rtpengine_hash_table size = %d\n", hash_table_size);
- // init hashtable
- rtpengine_hash_table = shm_malloc(sizeof(struct rtpengine_hash_table));
- if (!rtpengine_hash_table) {
LM_ERR("no shm left to create rtpengine_hash_table\n");
return 0;
- }
- memset(rtpengine_hash_table, 0, sizeof(struct rtpengine_hash_table));
- rtpengine_hash_table->size = hash_table_size;
- // init hashtable row_entry_list
- rtpengine_hash_table->row_entry_list = shm_malloc(rtpengine_hash_table->size * sizeof(struct rtpengine_hash_entry*));
- if (!rtpengine_hash_table->row_entry_list) {
LM_ERR("no shm left to create rtpengine_hash_table->row_entry_list\n");
rtpengine_hash_table_destroy();
Right. The _destroy() should do sanity checking, with memory free, when possible. Will correct this.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390/files#r47196329
Other than that, looks good to me.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#issuecomment-163524427
Merged #390.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/pull/390#event-488081282