Hi,
Ovidiu Sas wrote:
It is a race between the 200ok being sent out and the update of the dialog state machine. The ACK is received while the dialog state machine was not updates for the previously sent 200ok. IIRC, in opensips this was fixed by introducing a new tm callback in order to update the dialog state machine as soon as the 200ok was sent out.
Taking a look at dlg_handlers.c from 3.0.2, one can see that the function which transitions the dialog state (dlg_onreply) is called back from TMCB_RESPONSE_FWDED or TMCB_RESPONSE_READY. While the former is called just before a response is forwarded (but hasn't been so yet), the latter is mapped to TMCB_RESPONSE_OUT which, according to t_hooks.h and the code in t_reply.c, is called *after* the response has been delivered.
In 1.5, however, dialog's reply handler function already relies on TMCB_RESPONSE_PRE_OUT which properly avoids the possibility of receiving an ACK prior to the complete processing of a previous OK response, and in turn race conditions.
Consequently, it seems to me that 3.0.x introduced a bug by not using the proper tm callback anymore, specifically not one that finishes for sure before any message has been sent out. I'm not quite sure whether TMCB_RESPONSE_READY has simply been given the wrong value (its name indicates to me that the response is still waiting to be transmitted) or the tm module uses the wrong callback macros: Apparently, 1.5's *_PRE_* callbacks have gone and *_SENT introduced instead in 3.0.2. Not sure if that circumstance is truly reflected by the code.
Cheers,
--Timo
For now, the only way to avoid this issue is to run with one single listener. This will keep the ACK into the received queue until the listener finishes the processing of received 200ok.
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 9:01 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello Ovidiu,
I've been testing reproducablility of the described error and found that it is reproducable in "standard" mode onyl. This should mean: forked application with child processes. As soon as I start the application even as non-forked or with only one child (children=1), the error was not reproducable until now. I've fastened reproduction with SIPp the call generating tool and in standard mode this error messages are visible after about 1-2 minutes (the call rate is one call per second with a pause of 1500ms between ACK and the following BYE).
Under these test circumstances no subscription for presence or dialog state was active; it was tested on a "isolated" server where no other UAs were active; only SIPp and one other UA were active.
I found that this problem is UserAgent-independent, too. Do you have an idea how to find out any more details about this problem?
regards,
Klaus Feichtinger
Datum: Thu, 10 Jun 2010 22:24:52 +0200 Von: Klaus Feichtinger klaus.feichtinger@gmx.net An: Ovidiu Sas osas@voipembedded.com CC: sr-dev@lists.sip-router.org Betreff: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2 Hello,
according tcpdump-traces on the linux machine and log entries in syslog (see below) the 200 OK passed the SIP Router (this means - the call was accounted and afterwards the bogus event was noted):
Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116202@10.16.10.99 is coming in Jun 10 18:29:12 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276208952;method=INVITE;from_tag=3120233282-36939680-1276187634579;to_tag=3338003860-56654088-1276187447718;call_id=741466613-36939680-1276187634579@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116202;dst_domain=10.16.10.90;src_ip=10.16.10.152 Jun 10 18:29:12 debian /usr/sbin/kamailio[20025]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg 0xa7eb3ea8 [2285:1225504260] with clid '741466613-36939680-1276187634579@10.16.10.152' and tags '3120233282-36939680-1276187634579' ''
However, this is not the one and only event / state combination I found in syslog. There is also an entry with event 6 in state 1 present:
Jun 10 19:35:49 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116002@10.16.10.99 is coming in Jun 10 19:35:50 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276212950;method=INVITE;from_tag=3805046041-37017968-1276191635112;to_tag=befc3f7326747dei0;call_id=789390150-37017968-1276191635112@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116002;dst_domain=10.16.10.205;src_ip=10.16.10.152 Jun 10 19:35:50 debian /usr/sbin/kamailio[20026]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 1 for dlg 0xa7e979b8 [1648:1273002173] with clid '789390150-37017968-1276191635112@10.16.10.152' and tags '3805046041-37017968-1276191635112' ''
But tomorrow I will test the setting - as recommended by you - to reduce kamailio to a single worker and check if the problem will keep repreduceable.
Regards,
Klaus Feichtinger
Bogus event 6 in state 2 means ACK received in early state. This is a little bit bizarre, because it seems that the dialog did not see the 200ok passing by. Try to use a single worker (children=1) and see if you can reproduce the problem. Also, check that the 200ok is really going through the proxy.
Regards, Ovidiu Sas
On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello list,
under special circumstances I have a problem with the DIALOG module of
SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in
general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog
[dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked
nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- GMX DSL: Internet-, Telefon- und Handy-Flat ab 19,99 EUR/mtl. Bis zu 150 EUR Startguthaben inklusive! http://portal.gmx.net/de/go/dsl
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev