Hello World!
Kamailio 5.5 in use.
I wonder how I could prevent the following issue.
    modparam("dialog", "send_bye", 1)
    modparam("dialog", "default_timeout", 21600)

In the corresponding route:

    $dlg_ctx(timeout_route) = "DIALOG_TIMEOUT";
    dlg_manage();
The dialog starts on Node01, and its variables, status and timer are synced to Node02 via DMQ.
As both nodes hold the same information, I guess both arm the timeout timer, and sometimes the wrong node fires first.
If dlg_ontimeout() is triggered on Node02, then:
* No BYE is sent, as Node02 is not handling that dialog.
* The dialog is removed from memory on Node02 and the state is synced back to Node01.
* The dialog is removed from memory on Node01 too, but NOT from the database [*1].
* The dialog CDR is never committed on Node01.
* The call is never sent to the timeout_route.

Subsequent in-dialog messages on Node01 then result in:
dlg_onroute(): unable to find dialog
Did I misconfigure something?
Is there a way to make sure the timer is NOT triggered on the node that is not handling the dialog?
Shall I try setting the default_timeout and then using a timeout_avp to set the timeout slightly lower AFTER the dialog has started? Or would this be synced to the other node too?
I found out that session timer changes are not synced after an initial timer has been synced, so you cannot extend the session timer after setting an initial timeout. Maybe this is also true for the dialog timeout?
[*1] This also explains why, upon restarting Kamailio, the dialog is suddenly back and, after it times out again, a CDR is written with a far too long duration.
Hi
> Shall I try by setting the default_timeout and then use a timeout_avp to set that timeout slightly lower AFTER the dialog has started? Or would this be synced to the other node too?
No, this does not work. I tested it on our dev platform (Kamailio 5.6) by setting default_timeout to 5 seconds and trying to lower this value to 3 seconds during the call, to force the timeout to be triggered on the node that is handling the call.
    modparam("dialog", "default_timeout", 5)
    modparam("dialog", "timeout_avp", "$avp(dlgtimeout)")
    $avp(dlgtimeout) = 3;
    dlg_manage();

=> Still timing out after 5 seconds, on the wrong node in about half the attempts.
    dlg_manage();
    $avp(dlgtimeout) = 3;
=> Still timing out after 5 seconds, same issue.
    $dlg_ctx(default_timeout) = 3
=> Invalid!
    event_route[dialog:start] {
        $avp(dlgtimeout) = 3;
    }
=> Still timing out after 5 seconds, same issue.
Ping Olle and Alex. Any idea how to fix this issue?
Hi
Some more testing...
It looks like the following, placed in REPLY_ROUTE,

    if (is_known_dlg()) {
        dlg_set_timeout("3");
        xlog("L_INFO", "$cfg(route): DEBUG DLG Lifetime $dlg(lifetime)\n");
    }

does indeed set the lifetime to 3 seconds.
Unfortunately, this is replicated to the other DMQ node, which then again has a 50% chance of wrongly triggering the timeout, now after 3 seconds instead of the default. So there is no luck in setting a generous default lifetime and then shortening it on the node handling the call to make sure that node is the one triggering the timeout.
I had a quick glimpse at the source. As a n00b coder, I think I understand that the DMQ message is parsed and the timer is then inserted or updated on all DMQ nodes when a DMQ dialog message is received. I fear this is what causes the timer to expire on the wrong node.
So is this a bug? Shall I report an issue on GitHub?
Hi
I'm opening an issue on GitHub, as I consider this a bug.
This fixes the issue:
    route[DMQ_CAPTURE] {
        if (is_method("KDMQ")) {
            if (has_body("application/json") && $fU == 'dialog') {
                if (jansson_get("lifetime", $rb, "$var(lifetime)")) {
                    # Add 60 seconds on the DMQ peer to make sure it expires AFTER the main node.
                    $var(new_lifetime) = $var(lifetime) + 60;
                    $var(newrb) = $rb;
                    jansson_set("integer", "lifetime", $var(new_lifetime), "$var(newrb)");
                    set_body("$var(newrb)", "application/json");
                    msg_apply_changes();
                }
            }
            dmq_handle_message();
            exit;
        }
    }
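For anyone wanting to try this, here is a rough sketch of how the DMQ_CAPTURE route above might be wired in. It is not taken from my production config. It assumes that KDMQ requests from the peer node arrive through the normal request_route and that nothing else has handled them before this point. The loadmodule lines only list the modules the functions in the route come from (dmq, jansson, textops, textopsx) and belong in the global section of kamailio.cfg.

```cfg
# Modules providing the functions used in route[DMQ_CAPTURE]:
loadmodule "dmq.so"       # dmq_handle_message()
loadmodule "jansson.so"   # jansson_get(), jansson_set()
loadmodule "textops.so"   # has_body(), set_body()
loadmodule "textopsx.so"  # msg_apply_changes()

request_route {
    # Assumption: KDMQ requests reach request_route like any other request.
    # Pass them to DMQ_CAPTURE before any other processing; the route exits
    # for KDMQ and simply returns here for every other method.
    route(DMQ_CAPTURE);

    # Regular request handling continues below.
}
```

The 60 second offset is arbitrary. It only needs to be large enough that the node handling the call always fires its timer first.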
Kind regards
-Benoît Panizzon-