Hello list,
under special circumstances I have a problem with the DIALOG module of SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
Bogus event 6 in state 2 means ACK received in early state. This is a little bit bizarre, because it seems that the dialog did not see the 200ok passing by. Try to use a single worker (children=1) and see if you can reproduce the problem. Also, check that the 200ok is really going through the proxy.
Regards, Ovidiu Sas
On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello list,
under special circumstances I have a problem with the DIALOG module of SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
GMX DSL: Internet-, Telefon- und Handy-Flat ab 19,99 EUR/mtl. Bis zu 150 EUR Startguthaben inklusive! http://portal.gmx.net/de/go/dsl
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Hello,
according tcpdump-traces on the linux machine and log entries in syslog (see below) the 200 OK passed the SIP Router (this means - the call was accounted and afterwards the bogus event was noted):
Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116202@10.16.10.99 is coming in Jun 10 18:29:12 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276208952;method=INVITE;from_tag=3120233282-36939680-1276187634579;to_tag=3338003860-56654088-1276187447718;call_id=741466613-36939680-1276187634579@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116202;dst_domain=10.16.10.90;src_ip=10.16.10.152 Jun 10 18:29:12 debian /usr/sbin/kamailio[20025]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg 0xa7eb3ea8 [2285:1225504260] with clid '741466613-36939680-1276187634579@10.16.10.152' and tags '3120233282-36939680-1276187634579' ''
However, this is not the one and only event / state combination I found in syslog. There is also an entry with event 6 in state 1 present:
Jun 10 19:35:49 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116002@10.16.10.99 is coming in Jun 10 19:35:50 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276212950;method=INVITE;from_tag=3805046041-37017968-1276191635112;to_tag=befc3f7326747dei0;call_id=789390150-37017968-1276191635112@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116002;dst_domain=10.16.10.205;src_ip=10.16.10.152 Jun 10 19:35:50 debian /usr/sbin/kamailio[20026]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 1 for dlg 0xa7e979b8 [1648:1273002173] with clid '789390150-37017968-1276191635112@10.16.10.152' and tags '3805046041-37017968-1276191635112' ''
But tomorrow I will test the setting - as recommended by you - to reduce kamailio to a single worker and check if the problem will keep repreduceable.
Regards,
Klaus Feichtinger
Bogus event 6 in state 2 means ACK received in early state. This is a little bit bizarre, because it seems that the dialog did not see the 200ok passing by. Try to use a single worker (children=1) and see if you can reproduce the problem. Also, check that the 200ok is really going through the proxy.
Regards, Ovidiu Sas
On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello list,
under special circumstances I have a problem with the DIALOG module of SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Hello Ovidiu,
I've been testing reproducablility of the described error and found that it is reproducable in "standard" mode onyl. This should mean: forked application with child processes. As soon as I start the application even as non-forked or with only one child (children=1), the error was not reproducable until now. I've fastened reproduction with SIPp the call generating tool and in standard mode this error messages are visible after about 1-2 minutes (the call rate is one call per second with a pause of 1500ms between ACK and the following BYE).
Under these test circumstances no subscription for presence or dialog state was active; it was tested on a "isolated" server where no other UAs were active; only SIPp and one other UA were active.
I found that this problem is UserAgent-independent, too. Do you have an idea how to find out any more details about this problem?
regards,
Klaus Feichtinger
Datum: Thu, 10 Jun 2010 22:24:52 +0200 Von: Klaus Feichtinger klaus.feichtinger@gmx.net An: Ovidiu Sas osas@voipembedded.com CC: sr-dev@lists.sip-router.org Betreff: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2
Hello,
according tcpdump-traces on the linux machine and log entries in syslog (see below) the 200 OK passed the SIP Router (this means - the call was accounted and afterwards the bogus event was noted):
Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116202@10.16.10.99 is coming in Jun 10 18:29:12 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276208952;method=INVITE;from_tag=3120233282-36939680-1276187634579;to_tag=3338003860-56654088-1276187447718;call_id=741466613-36939680-1276187634579@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116202;dst_domain=10.16.10.90;src_ip=10.16.10.152 Jun 10 18:29:12 debian /usr/sbin/kamailio[20025]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg 0xa7eb3ea8 [2285:1225504260] with clid '741466613-36939680-1276187634579@10.16.10.152' and tags '3120233282-36939680-1276187634579' ''
However, this is not the one and only event / state combination I found in syslog. There is also an entry with event 6 in state 1 present:
Jun 10 19:35:49 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116002@10.16.10.99 is coming in Jun 10 19:35:50 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276212950;method=INVITE;from_tag=3805046041-37017968-1276191635112;to_tag=befc3f7326747dei0;call_id=789390150-37017968-1276191635112@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116002;dst_domain=10.16.10.205;src_ip=10.16.10.152 Jun 10 19:35:50 debian /usr/sbin/kamailio[20026]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 1 for dlg 0xa7e979b8 [1648:1273002173] with clid '789390150-37017968-1276191635112@10.16.10.152' and tags '3805046041-37017968-1276191635112' ''
But tomorrow I will test the setting - as recommended by you - to reduce kamailio to a single worker and check if the problem will keep repreduceable.
Regards,
Klaus Feichtinger
Bogus event 6 in state 2 means ACK received in early state. This is a little bit bizarre, because it seems that the dialog did not see the 200ok passing by. Try to use a single worker (children=1) and see if you can reproduce the problem. Also, check that the 200ok is really going through the proxy.
Regards, Ovidiu Sas
On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello list,
under special circumstances I have a problem with the DIALOG module of
SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in
general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog
[dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked
nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Hello Klaus,
It is a race between the 200ok being sent out and the update of the dialog state machine. The ACK is received while the dialog state machine was not updates for the previously sent 200ok. IIRC, in opensips this was fixed by introducing a new tm callback in order to update the dialog state machine as soon as the 200ok was sent out. For now, the only way to avoid this issue is to run with one single listener. This will keep the ACK into the received queue until the listener finishes the processing of received 200ok.
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 9:01 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello Ovidiu,
I've been testing reproducablility of the described error and found that it is reproducable in "standard" mode onyl. This should mean: forked application with child processes. As soon as I start the application even as non-forked or with only one child (children=1), the error was not reproducable until now. I've fastened reproduction with SIPp the call generating tool and in standard mode this error messages are visible after about 1-2 minutes (the call rate is one call per second with a pause of 1500ms between ACK and the following BYE).
Under these test circumstances no subscription for presence or dialog state was active; it was tested on a "isolated" server where no other UAs were active; only SIPp and one other UA were active.
I found that this problem is UserAgent-independent, too. Do you have an idea how to find out any more details about this problem?
regards,
Klaus Feichtinger
Datum: Thu, 10 Jun 2010 22:24:52 +0200 Von: Klaus Feichtinger klaus.feichtinger@gmx.net An: Ovidiu Sas osas@voipembedded.com CC: sr-dev@lists.sip-router.org Betreff: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2
Hello,
according tcpdump-traces on the linux machine and log entries in syslog (see below) the 200 OK passed the SIP Router (this means - the call was accounted and afterwards the bogus event was noted):
Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116202@10.16.10.99 is coming in Jun 10 18:29:12 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276208952;method=INVITE;from_tag=3120233282-36939680-1276187634579;to_tag=3338003860-56654088-1276187447718;call_id=741466613-36939680-1276187634579@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116202;dst_domain=10.16.10.90;src_ip=10.16.10.152 Jun 10 18:29:12 debian /usr/sbin/kamailio[20025]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg 0xa7eb3ea8 [2285:1225504260] with clid '741466613-36939680-1276187634579@10.16.10.152' and tags '3120233282-36939680-1276187634579' ''
However, this is not the one and only event / state combination I found in syslog. There is also an entry with event 6 in state 1 present:
Jun 10 19:35:49 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116002@10.16.10.99 is coming in Jun 10 19:35:50 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276212950;method=INVITE;from_tag=3805046041-37017968-1276191635112;to_tag=befc3f7326747dei0;call_id=789390150-37017968-1276191635112@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116002;dst_domain=10.16.10.205;src_ip=10.16.10.152 Jun 10 19:35:50 debian /usr/sbin/kamailio[20026]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 1 for dlg 0xa7e979b8 [1648:1273002173] with clid '789390150-37017968-1276191635112@10.16.10.152' and tags '3805046041-37017968-1276191635112' ''
But tomorrow I will test the setting - as recommended by you - to reduce kamailio to a single worker and check if the problem will keep repreduceable.
Regards,
Klaus Feichtinger
Bogus event 6 in state 2 means ACK received in early state. This is a little bit bizarre, because it seems that the dialog did not see the 200ok passing by. Try to use a single worker (children=1) and see if you can reproduce the problem. Also, check that the 200ok is really going through the proxy.
Regards, Ovidiu Sas
On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello list,
under special circumstances I have a problem with the DIALOG module of
SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in
general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog
[dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked
nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- GMX DSL: Internet-, Telefon- und Handy-Flat ab 19,99 EUR/mtl. Bis zu 150 EUR Startguthaben inklusive! http://portal.gmx.net/de/go/dsl
On Monday 14 June 2010, Ovidiu Sas wrote:
It is a race between the 200ok being sent out and the update of the dialog state machine. The ACK is received while the dialog state machine was not updates for the previously sent 200ok. IIRC, in opensips this was fixed by introducing a new tm callback in order to update the dialog state machine as soon as the 200ok was sent out. For now, the only way to avoid this issue is to run with one single listener. This will keep the ACK into the received queue until the listener finishes the processing of received 200ok.
Hi Timo,
i don't remember exactly, did you also discussed about this issue in your recent dialog re-design proposal, or is this a new problem?
Regards,
Henning
This is an old problem, since the creation of the dialog module.
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 9:21 AM, Henning Westerholt henning.westerholt@1und1.de wrote:
On Monday 14 June 2010, Ovidiu Sas wrote:
It is a race between the 200ok being sent out and the update of the dialog state machine. The ACK is received while the dialog state machine was not updates for the previously sent 200ok. IIRC, in opensips this was fixed by introducing a new tm callback in order to update the dialog state machine as soon as the 200ok was sent out. For now, the only way to avoid this issue is to run with one single listener. This will keep the ACK into the received queue until the listener finishes the processing of received 200ok.
Hi Timo,
i don't remember exactly, did you also discussed about this issue in your recent dialog re-design proposal, or is this a new problem?
Regards,
Henning
Ovidiu Sas wrote:
This is an old problem, since the creation of the dialog module.
Don't think so. Please have a look at 1.5:
http://git.sip-router.org/cgi-bin/gitweb.cgi?p=kamailio;a=blob;f=modules/dia...
Cheers,
--Timo
On Mon, Jun 14, 2010 at 9:21 AM, Henning Westerholt henning.westerholt@1und1.de wrote:
On Monday 14 June 2010, Ovidiu Sas wrote:
It is a race between the 200ok being sent out and the update of the dialog state machine. The ACK is received while the dialog state machine was not updates for the previously sent 200ok. IIRC, in opensips this was fixed by introducing a new tm callback in order to update the dialog state machine as soon as the 200ok was sent out. For now, the only way to avoid this issue is to run with one single listener. This will keep the ACK into the received queue until the listener finishes the processing of received 200ok.
Hi Timo,
i don't remember exactly, did you also discussed about this issue in your recent dialog re-design proposal, or is this a new problem?
Regards,
Henning
Yeah, this was fixed only in 1.5 by myself (backport from opensips): http://openser.svn.sourceforge.net/viewvc/openser?view=rev&revision=5748
The problem was there since the dialog module was created (in all versions prior to 1.5) - this is what I meant by "This is an old problem, since the creation of the dialog module". In kamailio 3.0, this is a regression due to the new ser tm module. The sip-router tm module must be updated in order to avoid this regression.
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 9:56 AM, Timo Reimann timo.reimann@1und1.de wrote:
Ovidiu Sas wrote:
This is an old problem, since the creation of the dialog module.
Don't think so. Please have a look at 1.5:
http://git.sip-router.org/cgi-bin/gitweb.cgi?p=kamailio;a=blob;f=modules/dia...
Cheers,
--Timo
On Mon, Jun 14, 2010 at 9:21 AM, Henning Westerholt henning.westerholt@1und1.de wrote:
On Monday 14 June 2010, Ovidiu Sas wrote:
It is a race between the 200ok being sent out and the update of the dialog state machine. The ACK is received while the dialog state machine was not updates for the previously sent 200ok. IIRC, in opensips this was fixed by introducing a new tm callback in order to update the dialog state machine as soon as the 200ok was sent out. For now, the only way to avoid this issue is to run with one single listener. This will keep the ACK into the received queue until the listener finishes the processing of received 200ok.
Hi Timo,
i don't remember exactly, did you also discussed about this issue in your recent dialog re-design proposal, or is this a new problem?
Regards,
Henning
-- Timo Reimann Software-Developer
1&1 Internet AG - Consumer Core Ernst-Frey-Straße 9 · DE-76135 Karlsruhe Telefon: +49 721 91374-8122 timo.reimann@1und1.de
Amtsgericht Montabaur / HRB 6484 Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas Gottschlich, Robert Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Gert Nowotny, Jan Oetjen Aufsichtsratsvorsitzender: Michael Scheeren
On 6/14/10 4:06 PM, Ovidiu Sas wrote:
[...] In kamailio 3.0, this is a regression due to the new ser tm module. The sip-router tm module must be updated in order to avoid this regression.
IIRC, there was a corresponding TM callback in 3.0 that was used instead -- a callback executed before sending the reply. Unfortunately I cannot check right now the sources, I will do it later.
Cheers, Daniel
There is a new TM callback, but it is an alias to the old TMCB_RESPONSE_OUT.
/* response ready - might need to be executed before sending reply out */ #define TMCB_RESPONSE_READY TMCB_RESPONSE_OUT
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 11:31 AM, Daniel-Constantin Mierla miconda@gmail.com wrote:
On 6/14/10 4:06 PM, Ovidiu Sas wrote:
[...] In kamailio 3.0, this is a regression due to the new ser tm module. The sip-router tm module must be updated in order to avoid this regression.
IIRC, there was a corresponding TM callback in 3.0 that was used instead -- a callback executed before sending the reply. Unfortunately I cannot check right now the sources, I will do it later.
Cheers, Daniel
-- Daniel-Constantin Mierla Asterisk SIP Router Masterclass Malaga, Spain - July 5-9, 2010 http://www.asipto.com/
Hello,
last week I committed an update for this. If any of you can run some tests and report back, would be great. Plan is to backport to 3.0 as soon as some testing was done to be sure it does not break something else.
Thanks, Daniel
On 6/14/10 5:35 PM, Ovidiu Sas wrote:
There is a new TM callback, but it is an alias to the old TMCB_RESPONSE_OUT.
/* response ready - might need to be executed before sending reply out */ #define TMCB_RESPONSE_READY TMCB_RESPONSE_OUT
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 11:31 AM, Daniel-Constantin Mierla miconda@gmail.com wrote:
On 6/14/10 4:06 PM, Ovidiu Sas wrote:
[...] In kamailio 3.0, this is a regression due to the new ser tm module. The sip-router tm module must be updated in order to avoid this regression.
IIRC, there was a corresponding TM callback in 3.0 that was used instead -- a callback executed before sending the reply. Unfortunately I cannot check right now the sources, I will do it later.
Cheers, Daniel
-- Daniel-Constantin Mierla Asterisk SIP Router Masterclass Malaga, Spain - July 5-9, 2010 http://www.asipto.com/
Hello,
today I've tested the committed fix in an actual development version (tar.gz) from yesterday.
In this test I have compared the released Kamailio 3.0.2 version with the development version. The bug was reproduceable within a minute in the released version. But in the development version this fault didn't occur. The configuration is the same as at the beginning when I found this problem!
How was it tested? With SIPp (UAC + UAS) in both cases. The development version was tested with a standard call setup over sip-route and a call rate of 20 calls / second for a period of 60 minutes. No bogus event! In the released version the bogus event occured after 30 seconds with a call rate of 10 calls / seconds.
Note - this is not a classification about a call rate when this fault may start! It was only a free decision without testing the intervals in between. At the beginning I started with a call rate of 2 / second. In that case no error occured in both versions.
I hope this test is enough for a first comment.
regards,
Klaus
Am 05.07.2010 12:29, schrieb Daniel-Constantin Mierla:
Hello,
last week I committed an update for this. If any of you can run some tests and report back, would be great. Plan is to backport to 3.0 as soon as some testing was done to be sure it does not break something else.
Thanks, Daniel
On 6/14/10 5:35 PM, Ovidiu Sas wrote:
There is a new TM callback, but it is an alias to the old TMCB_RESPONSE_OUT.
/* response ready - might need to be executed before sending reply out */ #define TMCB_RESPONSE_READY TMCB_RESPONSE_OUT
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 11:31 AM, Daniel-Constantin Mierla miconda@gmail.com wrote:
On 6/14/10 4:06 PM, Ovidiu Sas wrote:
[...] In kamailio 3.0, this is a regression due to the new ser tm module. The sip-router tm module must be updated in order to avoid this regression.
IIRC, there was a corresponding TM callback in 3.0 that was used instead -- a callback executed before sending the reply. Unfortunately I cannot check right now the sources, I will do it later.
Cheers, Daniel
-- Daniel-Constantin Mierla Asterisk SIP Router Masterclass Malaga, Spain - July 5-9, 2010 http://www.asipto.com/
Hi,
Ovidiu Sas wrote:
It is a race between the 200ok being sent out and the update of the dialog state machine. The ACK is received while the dialog state machine was not updates for the previously sent 200ok. IIRC, in opensips this was fixed by introducing a new tm callback in order to update the dialog state machine as soon as the 200ok was sent out.
Taking a look at dlg_handlers.c from 3.0.2, one can see that the function which transitions the dialog state (dlg_onreply) is called back from TMCB_RESPONSE_FWDED or TMCB_RESPONSE_READY. While the former is called just before a response is forwarded (but hasn't been so yet), the latter is mapped to TMCB_RESPONSE_OUT which, according to t_hooks.h and the code in t_reply.c, is called *after* the response has been delivered.
In 1.5, however, dialog's reply handler function already relies on TMCB_RESPONSE_PRE_OUT which properly avoids the possibility of receiving an ACK prior to the complete processing of a previous OK response, and in turn race conditions.
Consequently, it seems to me that 3.0.x introduced a bug by not using the proper tm callback anymore, specifically not one that finishes for sure before any message has been sent out. I'm not quite sure whether TMCB_RESPONSE_READY has simply been given the wrong value (its name indicates to me that the response is still waiting to be transmitted) or the tm module uses the wrong callback macros: Apparently, 1.5's *_PRE_* callbacks have gone and *_SENT introduced instead in 3.0.2. Not sure if that circumstance is truly reflected by the code.
Cheers,
--Timo
For now, the only way to avoid this issue is to run with one single listener. This will keep the ACK into the received queue until the listener finishes the processing of received 200ok.
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 9:01 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello Ovidiu,
I've been testing reproducablility of the described error and found that it is reproducable in "standard" mode onyl. This should mean: forked application with child processes. As soon as I start the application even as non-forked or with only one child (children=1), the error was not reproducable until now. I've fastened reproduction with SIPp the call generating tool and in standard mode this error messages are visible after about 1-2 minutes (the call rate is one call per second with a pause of 1500ms between ACK and the following BYE).
Under these test circumstances no subscription for presence or dialog state was active; it was tested on a "isolated" server where no other UAs were active; only SIPp and one other UA were active.
I found that this problem is UserAgent-independent, too. Do you have an idea how to find out any more details about this problem?
regards,
Klaus Feichtinger
Datum: Thu, 10 Jun 2010 22:24:52 +0200 Von: Klaus Feichtinger klaus.feichtinger@gmx.net An: Ovidiu Sas osas@voipembedded.com CC: sr-dev@lists.sip-router.org Betreff: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2 Hello,
according tcpdump-traces on the linux machine and log entries in syslog (see below) the 200 OK passed the SIP Router (this means - the call was accounted and afterwards the bogus event was noted):
Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116202@10.16.10.99 is coming in Jun 10 18:29:12 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276208952;method=INVITE;from_tag=3120233282-36939680-1276187634579;to_tag=3338003860-56654088-1276187447718;call_id=741466613-36939680-1276187634579@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116202;dst_domain=10.16.10.90;src_ip=10.16.10.152 Jun 10 18:29:12 debian /usr/sbin/kamailio[20025]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 2 for dlg 0xa7eb3ea8 [2285:1225504260] with clid '741466613-36939680-1276187634579@10.16.10.152' and tags '3120233282-36939680-1276187634579' ''
However, this is not the one and only event / state combination I found in syslog. There is also an entry with event 6 in state 1 present:
Jun 10 19:35:49 debian /usr/sbin/kamailio[20026]: INFO: <script>: ---xlog: a call with sip:116002@10.16.10.99 is coming in Jun 10 19:35:50 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276212950;method=INVITE;from_tag=3805046041-37017968-1276191635112;to_tag=befc3f7326747dei0;call_id=789390150-37017968-1276191635112@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116002;dst_domain=10.16.10.205;src_ip=10.16.10.152 Jun 10 19:35:50 debian /usr/sbin/kamailio[20026]: CRITICAL: dialog [dlg_hash.c:591]: bogus event 6 in state 1 for dlg 0xa7e979b8 [1648:1273002173] with clid '789390150-37017968-1276191635112@10.16.10.152' and tags '3805046041-37017968-1276191635112' ''
But tomorrow I will test the setting - as recommended by you - to reduce kamailio to a single worker and check if the problem will keep repreduceable.
Regards,
Klaus Feichtinger
Bogus event 6 in state 2 means ACK received in early state. This is a little bit bizarre, because it seems that the dialog did not see the 200ok passing by. Try to use a single worker (children=1) and see if you can reproduce the problem. Also, check that the 200ok is really going through the proxy.
Regards, Ovidiu Sas
On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello list,
under special circumstances I have a problem with the DIALOG module of
SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in
general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog
[dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked
nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- GMX DSL: Internet-, Telefon- und Handy-Flat ab 19,99 EUR/mtl. Bis zu 150 EUR Startguthaben inklusive! http://portal.gmx.net/de/go/dsl
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Am 14.06.2010 15:01, schrieb Klaus Feichtinger:
Hello Ovidiu,
I've been testing reproducablility of the described error and found that it is reproducable in "standard" mode onyl. This should mean: forked application with child processes. As soon as I start the application even as non-forked or with only one child (children=1), the error was not reproducable until now. I've fastened reproduction with SIPp the call generating tool and in standard mode this error messages are visible after about 1-2 minutes (the call rate is one call per second with a pause of 1500ms between ACK and the following BYE).
Under these test circumstances no subscription for presence or dialog state was active; it was tested on a "isolated" server where no other UAs were active; only SIPp and one other UA were active.
I found that this problem is UserAgent-independent, too. Do you have an idea how to find out any more details about this problem?
This sounds like a race condition where process X received the 200 OK, forwards it and then gets suspended before before updating the dialog table.
Then, the ACK is received, forwarded and dialog state updated by process Y.
Maybe it can be solved by updating the dialog state before sending out the message.
regards klaus
regards,
Klaus Feichtinger
Datum: Thu, 10 Jun 2010 22:24:52 +0200 Von: Klaus Feichtingerklaus.feichtinger@gmx.net An: Ovidiu Sasosas@voipembedded.com CC: sr-dev@lists.sip-router.org Betreff: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2
Hello,
according tcpdump-traces on the linux machine and log entries in syslog (see below) the 200 OK passed the SIP Router (this means - the call was accounted and afterwards the bogus event was noted):
Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO:<script>: ---xlog: a call with sip:116202@10.16.10.99 is coming in Jun 10 18:29:12 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276208952;method=INVITE;from_tag=3120233282-36939680-1276187634579;to_tag=3338003860-56654088-1276187447718;call_id=741466613-36939680-1276187634579@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116202;dst_domain=10.16.10.90;src_ip=10.16.10.152
Jun 10 18:29:12 debian /usr/sbin/kamailio[20025]: CRITICAL: dialog
[dlg_hash.c:591]: bogus event 6 in state 2 for dlg 0xa7eb3ea8 [2285:1225504260] with clid '741466613-36939680-1276187634579@10.16.10.152' and tags '3120233282-36939680-1276187634579' ''
However, this is not the one and only event / state combination I found in syslog. There is also an entry with event 6 in state 1 present:
Jun 10 19:35:49 debian /usr/sbin/kamailio[20026]: INFO:<script>: ---xlog: a call with sip:116002@10.16.10.99 is coming in Jun 10 19:35:50 debian /usr/sbin/kamailio[20027]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1276212950;method=INVITE;from_tag=3805046041-37017968-1276191635112;to_tag=befc3f7326747dei0;call_id=789390150-37017968-1276191635112@10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116002;dst_domain=10.16.10.205;src_ip=10.16.10.152
Jun 10 19:35:50 debian /usr/sbin/kamailio[20026]: CRITICAL: dialog
[dlg_hash.c:591]: bogus event 6 in state 1 for dlg 0xa7e979b8 [1648:1273002173] with clid '789390150-37017968-1276191635112@10.16.10.152' and tags '3805046041-37017968-1276191635112' ''
But tomorrow I will test the setting - as recommended by you - to reduce kamailio to a single worker and check if the problem will keep repreduceable.
Regards,
Klaus Feichtinger
Bogus event 6 in state 2 means ACK received in early state. This is a little bit bizarre, because it seems that the dialog did not see the 200ok passing by. Try to use a single worker (children=1) and see if you can reproduce the problem. Also, check that the 200ok is really going through the proxy.
Regards, Ovidiu Sas
On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger klaus.feichtinger@gmx.net wrote:
Hello list,
under special circumstances I have a problem with the DIALOG module of
SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules + db_mysql modules. The db_mode is set to "write_through", because of redundancy requirements.
From my point of view this problem has something to do with timing in
general. E.g. a so called "data call" has a duration of about 50 ms only; INVITE....BYE). That problem occurs on this short calls only. Because of the problems in the dialog state machine, NOTIFY messages are incorrect. Even when the call is already finished, the NOTIFY message includes the state "confirmed" and causes a wrong status indication....
The detailed error message can be seen here: "Jun 9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog
[dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
I found in old mails of the developer list another error that looked
nearly the same, but it differed in the event and state of the dialog state machine (the thread can be found under the link http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the information that I've found in the list - be solved. Therefore I will ask you: is the error as displayed above another well known error / bug, which (maybe) should already be solved? Is it a new bug?
Thanks for any information.
regards,
Klaus Feichtinger
_______________________________________________ sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
_______________________________________________ sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
This is exactly how it was fixed in opensips by introducing a new tm callback: TMCB_RESPONSE_PRE_OUT.
Regards, Ovidiu Sas
On Mon, Jun 14, 2010 at 9:23 AM, Klaus Darilion klaus.mailinglists@pernau.at wrote:
Am 14.06.2010 15:01, schrieb Klaus Feichtinger:
Hello Ovidiu,
I've been testing reproducablility of the described error and found that it is reproducable in "standard" mode onyl. This should mean: forked application with child processes. As soon as I start the application even as non-forked or with only one child (children=1), the error was not reproducable until now. I've fastened reproduction with SIPp the call generating tool and in standard mode this error messages are visible after about 1-2 minutes (the call rate is one call per second with a pause of 1500ms between ACK and the following BYE).
Under these test circumstances no subscription for presence or dialog state was active; it was tested on a "isolated" server where no other UAs were active; only SIPp and one other UA were active.
I found that this problem is UserAgent-independent, too. Do you have an idea how to find out any more details about this problem?
This sounds like a race condition where process X received the 200 OK, forwards it and then gets suspended before before updating the dialog table.
Then, the ACK is received, forwarded and dialog state updated by process Y.
Maybe it can be solved by updating the dialog state before sending out the message.
regards klaus
On Monday 14 June 2010, Ovidiu Sas wrote:
This is exactly how it was fixed in opensips by introducing a new tm callback: TMCB_RESPONSE_PRE_OUT.
Hi,
this is the commit in question related to the CB:
http://www.mail-archive.com/devel@lists.opensips.org/msg01356.html
Andrei, i guess its not possible (probably not even a good idea) to just port this changes from them to our TM? Or do you thought of a similar logic for the new introduced TMCB_RESPONSE_READY callback?
Thanks and regards,
Henning
On Jun 25, 2010 at 14:36, Henning Westerholt henning.westerholt@1und1.de wrote:
On Monday 14 June 2010, Ovidiu Sas wrote:
This is exactly how it was fixed in opensips by introducing a new tm callback: TMCB_RESPONSE_PRE_OUT.
Hi,
this is the commit in question related to the CB:
http://www.mail-archive.com/devel@lists.opensips.org/msg01356.html
Andrei, i guess its not possible (probably not even a good idea) to just port this changes from them to our TM? Or do you thought of a similar logic for the new introduced TMCB_RESPONSE_READY callback?
Daniel looked some time ago at the callbacks used in kamailio and at the time he thought that the ones we have in sr are enough (there was only some name change). I'm not at all familiar with what's used or needed by the dialog module, so let's see what Daniel has to say about it.
Andrei
Hello,
On 6/26/10 11:41 AM, Andrei Pelinescu-Onciul wrote:
On Jun 25, 2010 at 14:36, Henning Westerholthenning.westerholt@1und1.de wrote:
On Monday 14 June 2010, Ovidiu Sas wrote:
This is exactly how it was fixed in opensips by introducing a new tm callback: TMCB_RESPONSE_PRE_OUT.
Hi,
this is the commit in question related to the CB:
http://www.mail-archive.com/devel@lists.opensips.org/msg01356.html
Andrei, i guess its not possible (probably not even a good idea) to just port this changes from them to our TM? Or do you thought of a similar logic for the new introduced TMCB_RESPONSE_READY callback?
Daniel looked some time ago at the callbacks used in kamailio and at the time he thought that the ones we have in sr are enough (there was only some name change). I'm not at all familiar with what's used or needed by the dialog module, so let's see what Daniel has to say about it.
Indeed, I thought it was ok when ported the module to new tm, but seems I need to review again. Ovidiu sent me also some description about this, but I had no time to look over last week due to unavailability. I will try to get to it by mid of next week, if someone has patches, please send over or attach on tracker.
Cheers, Daniel
On Sunday 27 June 2010, Daniel-Constantin Mierla wrote:
Daniel looked some time ago at the callbacks used in kamailio and at the time he thought that the ones we have in sr are enough (there was only some name change). I'm not at all familiar with what's used or needed by the dialog module, so let's see what Daniel has to say about it.
Indeed, I thought it was ok when ported the module to new tm, but seems I need to review again. Ovidiu sent me also some description about this, but I had no time to look over last week due to unavailability. I will try to get to it by mid of next week, if someone has patches, please send over or attach on tracker.
Hi Daniel,
i've included one diff link for the TM callback extensions in an earlier mail in this thread, but was not able to adapt the patch to our TM so far.
Cheers,
Henning