Hi,
I am wondering if perhaps we ought to do something with regard to
specific handling of confirmed-nonacked dialogs (CONFIRMED_NA state in
dlg_hash.h) in the dialog module. These are dialogs where a 2xx reply is
sent to the opening INVITE transaction, but no end-to-end ACK is seen[1]
by Kamailio and thus the dialog is not recorded as transitioning to
CONFIRMED state.
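Below is a minimal sketch of the state transitions involved. It assumes the numeric values of the DLG_STATE_* constants in dlg_hash.h; only CONFIRMED_NA == 3 is corroborated here, by the 'state:: 3' entries in dlg_list output. The event names and next_state() are hypothetical and heavily simplified, so this is not the module's actual state machine.

```c
#include <assert.h>

/* Dialog states; values assumed to mirror DLG_STATE_* in dlg_hash.h. */
enum dlg_state {
    DLG_UNCONFIRMED  = 1,
    DLG_EARLY        = 2,
    DLG_CONFIRMED_NA = 3,   /* 2xx sent, no ACK seen yet */
    DLG_CONFIRMED    = 4,
    DLG_DELETED      = 5
};

/* Hypothetical, reduced event set: only what matters for this question. */
enum dlg_event { EV_1XX, EV_2XX, EV_ACK, EV_BYE };

/* Simplified model: returns the state after ev is applied to s;
 * events that do not apply in state s leave it unchanged. */
static enum dlg_state next_state(enum dlg_state s, enum dlg_event ev)
{
    assert(s >= DLG_UNCONFIRMED && s <= DLG_DELETED);
    switch (ev) {
    case EV_1XX:
        return (s == DLG_UNCONFIRMED) ? DLG_EARLY : s;
    case EV_2XX:
        /* 2xx to the initial INVITE: confirmed, but not yet ACKed */
        return (s == DLG_UNCONFIRMED || s == DLG_EARLY) ? DLG_CONFIRMED_NA : s;
    case EV_ACK:
        /* only the end-to-end ACK completes the transition */
        return (s == DLG_CONFIRMED_NA) ? DLG_CONFIRMED : s;
    case EV_BYE:
        return (s == DLG_CONFIRMED_NA || s == DLG_CONFIRMED) ? DLG_DELETED : s;
    }
    return s;
}
```

A dialog that receives EV_2XX but never EV_ACK stays in DLG_CONFIRMED_NA until something else (a BYE or the dialog timeout) moves it on.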
This can happen for a variety of reasons, but the most common scenario I
run into is the CANCEL-200 OK race, where the caller cancels the call
just as the callee answers it, near-simultaneously. The 200 OK hasn't
gotten back to the caller yet, so when it receives it, it has no effect,
because from the caller's point of view, the dialog has already been
CANCEL'd. Meanwhile, the CANCEL has no effect on the callee end either,
since, from its point of view, the dialog has already transitioned into
confirmed state.
The problem I am running into a lot is that these dialogs stay tracked
up until the dialog timeout period, which can be several hours away. In
high-volume environments, they can clog up concurrent channel counts.
The receiving UAS has, of course, disposed of these dialogs long ago,
after 64*T1, but they remain "stuck" in Kamailio.
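To put rough numbers on the channel-count impact: the steady-state number of stuck dialogs follows Little's law (L = lambda * W), i.e. the rate at which unACKed dialogs appear multiplied by how long each one stays tracked. A small sketch; stuck_dialogs() is a hypothetical helper and the rate used below is purely illustrative, not a measurement.

```c
#include <assert.h>

/* Little's law: average number of dialogs stuck at any moment.
 * stuck_per_hour: how many CONFIRMED_NA dialogs appear per hour;
 * lifetime_s: how long each one stays tracked before removal (seconds). */
static double stuck_dialogs(double stuck_per_hour, double lifetime_s)
{
    assert(stuck_per_hour >= 0.0 && lifetime_s >= 0.0);
    return stuck_per_hour * (lifetime_s / 3600.0);
}
```

For example, at a hypothetical 120 unACKed dialogs per hour, a 3-hour dialog timeout keeps stuck_dialogs(120, 10800) = 360 of them tracked at any moment, whereas removing them after 64*T1 (32 s with the RFC 3261 default T1 of 500 ms) would leave about one.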
I know that RFC 3261 Section 13.3.1.4 ("The INVITE is Accepted") says:
  If the server retransmits the 2xx response for 64*T1 seconds without
  receiving an ACK, the dialog is confirmed, but the session SHOULD
  be terminated. This is accomplished with a BYE, as described
  in Section 15.
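The quoted rule can be sketched as a timer schedule. Per RFC 3261 Section 13.3.1.4, the UAS retransmits the 2xx at an interval that starts at T1 and doubles up to T2, and gives up after 64*T1. The helper retransmit_times() is hypothetical; the usage note assumes the RFC default timers T1 = 500 ms and T2 = 4 s.

```c
#include <assert.h>

/* Fills at_ms[] with the times (ms after the first 2xx send) at which the
 * UAS retransmits the 2xx, stopping before the 64*T1 deadline.
 * Returns the number of retransmissions recorded (at most max). */
static int retransmit_times(int t1_ms, int t2_ms, int at_ms[], int max)
{
    assert(t1_ms > 0 && t2_ms >= t1_ms && max >= 0);
    int deadline = 64 * t1_ms;
    int interval = t1_ms;
    int t = 0, n = 0;

    while (n < max && t + interval < deadline) {
        t += interval;
        at_ms[n++] = t;
        /* double the interval, capped at T2 */
        interval = (interval * 2 < t2_ms) ? interval * 2 : t2_ms;
    }
    return n;
}
```

With retransmit_times(500, 4000, at, 32), the UAS retransmits 10 times: at 0.5, 1.5, 3.5 and 7.5 s, then every 4 s up to 31.5 s. At 32 s (64*T1) it stops retransmitting.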
Now, I know that "SHOULD" != "MUST", and I would imagine that this is
probably the main reason why the dialog module does not time out such
dialogs according to the same timers as the callee UA might.
Nevertheless, they present a problem. Right now, I deal with it by using
a script that combs 'kamctl fifo dlg_list' for dialogs in 'state:: 3'
for more than X seconds and manually ends them.
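The selection logic of such a script might look like the sketch below. It assumes the dlg_list output has already been parsed into an array; struct dlg_entry, its confirmed_at field and find_stale_na() are hypothetical, and actually ending each selected dialog is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define STATE_CONFIRMED_NA 3   /* 'state:: 3' in dlg_list output */

/* Hypothetical, pre-parsed view of one dlg_list entry. */
struct dlg_entry {
    int state;
    time_t confirmed_at;       /* hypothetical: when the 2xx was seen */
};

/* Writes to out[] the indexes of entries that have been in CONFIRMED_NA
 * for more than max_age seconds (out must have room for n indexes).
 * Returns how many were found. */
static size_t find_stale_na(const struct dlg_entry *d, size_t n,
                            time_t now, time_t max_age, size_t *out)
{
    assert((d != NULL && out != NULL) || n == 0);
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        if (d[i].state == STATE_CONFIRMED_NA &&
            now - d[i].confirmed_at > max_age)
            out[k++] = i;
    }
    return k;
}
```

Here max_age plays the role of the "X seconds" threshold.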
But when there's a problem that one runs into in nearly every nontrivial
deployment, it seems to me it's time to consider an additional 'dialog'
modparam, or something of that ilk, that can provide an expedited timeout
for nonacked dialogs.
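One possible shape for such a modparam, as a minimal sketch: when a dialog enters CONFIRMED_NA, arm the shorter of the regular lifetime and a dedicated timeout; once the ACK moves it to CONFIRMED, arm the regular lifetime again. The parameter name confirmed_na_timeout and the helper timeout_for_state() are hypothetical, not existing dialog module code.

```c
#include <assert.h>

#define DLG_STATE_CONFIRMED_NA 3   /* assumed value, 'state:: 3' in dlg_list */

/* Hypothetical modparam: max seconds a dialog may stay unACKed.
 * 0 disables the feature, i.e. today's behaviour. */
static unsigned int confirmed_na_timeout = 0;

/* Timeout (seconds) to arm when a dialog enters new_state, given the
 * regular dialog lifetime. */
static unsigned int timeout_for_state(int new_state, unsigned int lifetime)
{
    assert(lifetime > 0);
    if (new_state == DLG_STATE_CONFIRMED_NA && confirmed_na_timeout > 0
            && confirmed_na_timeout < lifetime)
        return confirmed_na_timeout;
    return lifetime;   /* any other state, incl. CONFIRMED after the ACK */
}
```

Leaving the default at 0 keeps the current behaviour, so the change would be opt-in; a value somewhat above 64*T1 (32 s with default timers) would roughly match the point at which the UAS itself stops retransmitting the 2xx.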
I would be happy to write such a patch. I am bringing it up with the
community because I am uncertain whether this might have any unforeseen
consequences, or whether it has been covered in past dialog_ng
discussions that I have not carefully monitored.
Thanks!
-- Alex
(With apologies for cross-posting.)
[1] Or correctly associated based on tight matching.
--
Alex Balashov - Principal
Evariste Systems LLC
235 E Ponce de Leon Ave
Suite 106
Decatur, GA 30030
United States
Tel: +1-678-954-0670
Web: http://www.evaristesys.com/, http://www.alexbalashov.com/
THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.
The following task has a new comment added:
FS#420 - Crash when reloading addresses from mysql database (permissions module)
User who did this - Torrey Searle (tsearle)
----------
Thanks! Going to test the revised patch and will report back in the coming days on whether the crash is gone :-)
----------
More information can be found at the following URL:
http://sip-router.org/tracker/index.php?do=details&task_id=420#comment1410
You are receiving this message because you have requested it from the Flyspray bugtracking system. If you did not expect this message or don't want to receive mails in future, you can change your notification settings at the URL shown above.
The following task has a new comment added:
FS#420 - Crash when reloading addresses from mysql database (permissions module)
User who did this - Daniel-Constantin Mierla (miconda)
----------
It was caught and fixed shortly afterwards, but I forgot to note it here. Run 4.1.3; the fix is there.
----------
More information can be found at the following URL:
http://sip-router.org/tracker/index.php?do=details&task_id=420#comment1409
The following task has a new comment added:
FS#420 - Crash when reloading addresses from mysql database (permissions module)
User who did this - Torrey Searle (tsearle)
----------
The crash is in the following piece of code:
*_r = 0;  /* *_r is NULL from here on */
#if (MYSQL_VERSION_ID >= 40100)
while( mysql_more_results(CON_CONNECTION(_h)) && mysql_next_result(CON_CONNECTION(_h)) > 0 ) {
	MYSQL_RES *res = mysql_store_result( CON_CONNECTION(_h) );
	mysql_free_result(res);
}
#endif
RES_RESULT(*_r) = 0;  /* dereferences the now-NULL *_r: segfault */
I'm guessing you shouldn't do both
*_r = 0;
and
RES_RESULT(*_r) = 0;
in that order: the RES_RESULT line should either be removed or moved before the *_r = 0;
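A self-contained sketch of the ordering suggested above, with the MySQL calls replaced by stubs so it compiles on its own: touch *_r only while it is valid, then NULL it, then drain any leftover results. struct res, the RES_RESULT() stand-in, more_results(), next_result() and fail_store_result() are hypothetical. Note that the MySQL C API documents mysql_next_result() as returning 0 when another result is available, -1 when there are no more, and a positive value on error, so the quoted "> 0" condition appears to loop only on error; the sketch uses "== 0".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for db1_res_t and the RES_RESULT() accessor. */
struct res { void *ptr; };
#define RES_RESULT(r) ((r)->ptr)

/* Stubs modelling mysql_more_results()/mysql_next_result(): two results
 * are left over; next_result() returns 0 while one is available, -1 after. */
static int pending = 2;
static int more_results(void) { return pending > 0; }
static int next_result(void)
{
    if (pending <= 0)
        return -1;
    pending--;
    return 0;
}

/* Error path in a safe order. Returns the number of drained results. */
static int fail_store_result(struct res **_r)
{
    int drained = 0;

    assert(_r != NULL && *_r != NULL);
    RES_RESULT(*_r) = NULL;  /* use *_r only while it still points somewhere */
    free(*_r);               /* real code: the driver's result-free function */
    *_r = NULL;              /* *_r must not be dereferenced after this */

    /* Drain leftover results: 0 means "another result is ready". */
    while (more_results() && next_result() == 0)
        drained++;           /* real code: store and free each result here */
    return drained;
}
```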
----------
More information can be found at the following URL:
http://sip-router.org/tracker/index.php?do=details&task_id=420#comment1408
The following task has a new comment added:
FS#420 - Crash when reloading addresses from mysql database (permissions module)
User who did this - Torrey Searle (tsearle)
----------
In the lab I'm still getting a segfault after applying the patch:
#0  0x00007f4d380a48da in db_mysql_store_result (_h=0x7f4d38398130,
    _r=0x7fff5fcf5918) at km_dbase.c:205
#1  0x00007f4d37867867 in db_do_query_internal (_h=0x7f4d38398130, _k=0x0,
    _op=0x0, _v=0x0, _c=0x7fff5fcf5920, _n=0, _nc=5, _o=0x0,
    _r=0x7fff5fcf5918, val2str=0x7f4d380ac3a8 <db_mysql_val2str>,
    submit_query=0x7f4d380a3a76 <db_mysql_submit_query>,
    store_result=0x7f4d380a4063 <db_mysql_store_result>, _l=0)
    at db_query.c:137
#2  0x00007f4d37867ba8 in db_do_query (_h=0x7f4d38398130, _k=0x0, _op=0x0,
    _v=0x0, _c=0x7fff5fcf5920, _n=0, _nc=5, _o=0x0, _r=0x7fff5fcf5918,
    val2str=0x7f4d380ac3a8 <db_mysql_val2str>,
    submit_query=0x7f4d380a3a76 <db_mysql_submit_query>,
    store_result=0x7f4d380a4063 <db_mysql_store_result>) at db_query.c:156
#3  0x00007f4d380a4d10 in db_mysql_query (_h=0x7f4d38398130, _k=0x0, _op=0x0,
    _v=0x0, _c=0x7fff5fcf5920, _n=0, _nc=5, _o=0x0, _r=0x7fff5fcf5918)
    at km_dbase.c:263
----------
More information can be found at the following URL:
http://sip-router.org/tracker/index.php?do=details&task_id=420#comment1407