Hi, Kamailio 1.5.4, let's suppose this case:
1) A dialog in early state (state 2):
~# kamctl fifo profile_list_dlgs out
dialog:: hash=1055:330534072
state:: 2
2) Now try to terminate it:
~# kamctl fifo dlg_end_dlg 1055 330534072
3) It produces an error because in the early dialog there is no Contact for the callee:
ERROR:dialog:build_dlg_t: no contact available
ERROR:dialog:send_bye: failed to create dlg_t
CRITICAL:dialog:log_next_state_dlg: bogus event 7 in state 2 for dlg 0x7f25f0cde718 [3168:435552476] with clid 'hqpprxbyrutiytu@ibc-torre' and tags 'ysldc' ''
4) The dialog remains in state 2 (as the MI command shows), and later it's cancelled by the UAC.
5) MI profile_list_dlgs out will show the dialog in state 5 *forever*; it's never deleted from memory! (Perhaps it is deleted after the expiration time; I haven't checked.)
NOTE: I do know that dlg_end_dlg is not ready for early dialogs, as it should trigger a transaction cancel (CANCEL to all the branches and 408 to the UAC) rather than sending a BYE, but this feature is not implemented. However the current code could leak.
2010/7/9 Iñaki Baz Castillo ibc@aliax.net:
However the current code could leak.
And IMHO it leaks! The reason is that after the CANCEL the dialog information remains as follows:
dialog:: hash=3132:647756461
state:: 5
timestart:: 0
timeout:: 0
callid:: knfmgpcorrteiia@ibc-torre
from_uri:: sip:test_ibc@somedomain.org
from_tag:: vicxp
caller_contact:: sip:test_ibc@X.X.X.X
caller_cseq:: 326
caller_route_set::
caller_bind_addr:: udp:X.X.X.X:5060
to_uri:: sip:XXXXXX@somedomain.org
to_tag::
callee_contact::
callee_cseq::
callee_route_set::
callee_bind_addr::
That is, there are neither timestart nor timeout values, so even if the expiration time for the dialog module is set to 60 seconds, the dialog remains in memory forever!
Most probably, whenever the problem "CRITICAL:dialog:log_next_state_dlg: bogus event 7 in state 2 for dlg" occurs (due to a buggy device or whatever), the same issue could occur, so the dialog module would be leaking memory. Perhaps this has something to do with the problems I reported yesterday on a production server.
Hey Iñaki,
Iñaki Baz Castillo wrote:
2010/7/9 Iñaki Baz Castillo ibc@aliax.net:
However the current code could leak.
And IMHO it leaks! The reason is that after the CANCEL the dialog information remains as follows:
dialog:: hash=3132:647756461
state:: 5
timestart:: 0
timeout:: 0
callid:: knfmgpcorrteiia@ibc-torre
from_uri:: sip:test_ibc@somedomain.org
from_tag:: vicxp
caller_contact:: sip:test_ibc@X.X.X.X
caller_cseq:: 326
caller_route_set::
caller_bind_addr:: udp:X.X.X.X:5060
to_uri:: sip:XXXXXX@somedomain.org
to_tag::
callee_contact::
callee_cseq::
callee_route_set::
callee_bind_addr::
That is, there are neither timestart nor timeout values, so even if the expiration time for the dialog module is set to 60 seconds, the dialog remains in memory forever!
I don't think that it leaks. Please have a look at the get_expired_dlgs(unsigned int time) function in dlg_timer.c: The loop condition to get all expired dialogs is
while( tl!=end && tl->timeout <= time) {
                  ^^^^^^^^^^^^^^^^^^^
    [...]
}
The unit of "time" is ticks, so the while-loop picks those dialogs for cleanup whose timeout value is lower than or equal to the current number of ticks. Certainly, this is true for a timeout of zero.
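To make the loop condition concrete, here is a minimal, self-contained sketch of such an expiry scan. It is not the actual dlg_timer.c code: struct tl_node, take_expired() and insert_sorted() are hypothetical names, and the list is assumed to be circular, sorted by timeout, with a sentinel head node. Note that a scan like this can only pick up nodes that are actually linked into the list.

```c
#include <stddef.h>

/* Hypothetical, simplified timer-list node (not the module's real struct). */
struct tl_node {
    struct tl_node *next;
    struct tl_node *prev;
    unsigned int timeout;   /* absolute expiry time, in ticks */
};

/* Link n into the circular list so that it stays sorted by timeout. */
void insert_sorted(struct tl_node *head, struct tl_node *n)
{
    struct tl_node *pos = head->next;

    while (pos != head && pos->timeout <= n->timeout)
        pos = pos->next;

    n->next = pos;
    n->prev = pos->prev;
    pos->prev->next = n;
    pos->prev = n;
}

/*
 * Detach the leading run of nodes whose timeout is <= now and return it
 * as a NULL-terminated chain, or NULL if nothing has expired yet.
 */
struct tl_node *take_expired(struct tl_node *head, unsigned int now)
{
    struct tl_node *first = head->next;
    struct tl_node *tl = first;

    /* Same condition as in the thread: stop at the first unexpired node. */
    while (tl != head && tl->timeout <= now)
        tl = tl->next;

    if (tl == first)
        return NULL;

    /* tl is the first node that stays; cut the expired run out. */
    tl->prev->next = NULL;
    head->next = tl;
    tl->prev = head;
    first->prev = NULL;
    return first;
}
```

With nodes at timeouts 0, 50 and 100 and a current tick count of 60, the first two are returned and the third stays on the list.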
I think it makes sense this way because dlg_end_dlg isn't supposed to change the dialog state in case of failure (and checking mi_terminate_dlg(), I believe it doesn't), so a subsequent, UAC-initiated CANCEL shouldn't be any different from such a CANCEL not involving a call to dlg_end_dlg(). A leak in this case would have been detected already, hopefully.
Nevertheless, I think the current dialog module should be hotfixed such that dlg_end_dlg doesn't end dialogs in the "early" state simply because it's not capable of doing so. The upcoming, refurbished dialog module should do better though.
Cheers,
--Timo
2010/7/9 Timo Reimann timo.reimann@1und1.de:
I don't think that it leaks. Please have a look at the get_expired_dlgs(unsigned int time) function in dlg_timer.c: The loop condition to get all expired dialogs is
while( tl!=end && tl->timeout <= time) {
                  ^^^^^^^^^^^^^^^^^^^
    [...]
}
The unit of "time" is ticks, so the while-loop picks those dialogs for cleanup whose timeout value is lower than or equal to the current number of ticks. Certainly, this is true for a timeout of zero.
Hi Timo, thanks for your response. Unfortunately that is not what I get in my tests. If I now repeat step 5 of my first mail, I still see that dialog (state 5) with timeout 0 and timestart 0 when retrieving the dialog list via MI. So IMHO such information is obviously leaking (note that dialog module expiration value is 60 seconds in case).
Could you please perform the same experiment I described in my first mail? I can reproduce it 100% of the time.
Nevertheless, I think the current dialog module should be hotfixed such that dlg_end_dlg doesn't end dialogs in the "early" state simply because it's not capable of doing so. The upcoming, refurbished dialog module should do better though.
Great :)
2010/7/9 Iñaki Baz Castillo ibc@aliax.net:
(note that dialog module expiration value is 60 seconds in case).
I mean "in my case".
Hey Iñaki,
Iñaki Baz Castillo wrote:
Hi Timo, thanks for your response. Unfortunately that is not what I get in my tests. If I now repeat step 5 of my first mail, I still see that dialog (state 5) with timeout 0 and timestart 0 when retrieving the dialog list via MI. So IMHO such information is obviously leaking (note that dialog module expiration value is 60 seconds in case).
There's a good chance you are right and I am wrong because I was arguing on a mere theoretical basis. That is, I was being lazy and looked at code only. :)
Could you please perform the same experiment I described in my first mail? I can reproduce it 100% of the time.
Will do and report back as soon as I get to it.
Cheers,
--Timo
2010/7/9 Timo Reimann timo.reimann@1und1.de:
Hi Timo, thanks for your response. Unfortunately that is not what I get in my tests. If I now repeat step 5 of my first mail, I still see that dialog (state 5) with timeout 0 and timestart 0 when retrieving the dialog list via MI. So IMHO such information is obviously leaking (note that dialog module expiration value is 60 seconds in case).
There's a good chance you are right and I am wrong because I was arguing on a mere theoretical basis. That is, I was being lazy and looked at code only. :)
:)
Yes, the code you show seems to work, but I suspect that something "bad" occurs in step 2 of my experiment (trying to terminate an early dialog that still has no remote target, as there is no Contact in the 180). Perhaps after that the dialog information is changed somehow, or un-indexed from a list or whatever. But the fact is that after the dialog is terminated (CANCEL by the UAC) it remains forever in state 5.
Could you please perform the same experiment I described in my first mail? I can reproduce it 100% of the time.
Will do and report back as soon as I get to it.
Thanks a lot. BTW I'm using kamailio 1.5.4.
Hey,
Iñaki Baz Castillo wrote:
Could you please perform the same experiment I described in my first mail? I can reproduce it 100% of the time.
Will do and report back as soon as I get to it.
Thanks a lot. BTW I'm using kamailio 1.5.4.
I was able to repeat your results with Kamailio 1.5 SVN: Whenever I issue "dlg_end_dlg" on a call in the "early" state, it will never get cleaned up.
The reason is that the reference counter isn't properly decremented. Normal calls where no BYE message is forced correctly drop to zero references and finally let the dialog module clean up the terminated call. However, for calls terminated via dlg_end_dlg, the dialog's reference count never drops below one, no matter how long you wait.
I know the error is somewhere within send_bye(), and I'm in the process of closing in. Will report again once I find the right spot and probably also provide a patch right away.
Oh, and by the way: Kamailio does deny sending out BYE requests for calls not in the "confirmed" state as of now. However, this check is done in the tm module called by the dialog module, and the latter passes to the former a dialog state of DLG_CONFIRMED *always*. That is, it doesn't respect the current dialog's state. I have no clue why but will certainly consider using the real dialog state to work around the module's current deficiencies.
Cheers,
--Timo
2010/7/12 Timo Reimann timo.reimann@1und1.de:
I was able to repeat your results with Kamailio 1.5 SVN: Whenever I issue "dlg_end_dlg" on a call in the "early" state, it will never get cleaned up.
The reason is that the reference counter isn't properly decremented. Normal calls where no BYE message is forced correctly drop to zero references and finally let the dialog module clean up the terminated call. However, for calls terminated via dlg_end_dlg, the dialog's reference count never drops below one, no matter how long you wait.
I know the error is somewhere within send_bye(), and I'm in the process of closing in. Will report again once I find the right spot and probably also provide a patch right away.
Thanks a lot for your work.
Oh, and by the way: Kamailio does deny sending out BYE requests for calls not in the "confirmed" state as of now. However, this check is done in the tm module called by the dialog module, and the latter passes to the former a dialog state of DLG_CONFIRMED *always*.
Oops, and this is why it gives an error when there is no remote target (no Contact in the UAS's responses yet).
That is, it doesn't respect the current dialog's state. I have no clue why but will certainly consider using the real dialog state to work around the module's current deficiencies.
Definitely, the 'dialog' module is the weakest piece in Kamailio. Hopefully your work will change it :)
Best regards.
On Monday 12 July 2010, Iñaki Baz Castillo wrote:
[..]
That is, it doesn't respect the current dialog's state. I have no clue why but will certainly consider using the real dialog state to work around the module's current deficiencies.
Definitely, the 'dialog' module is the weakest piece in Kamailio. Hopefully your work will change it :)
Hi Iñaki,
well, there are probably some other, not that widely used modules in the tree which are therefore in a weaker state. But I agree somewhat: despite the module's popularity there are still some interesting bugs hidden inside. :-)
Cheers,
Henning
Hey Iñaki,
Iñaki Baz Castillo wrote:
2010/7/12 Timo Reimann timo.reimann@1und1.de:
Oh, and by the way: Kamailio does deny sending out BYE requests for calls not in the "confirmed" state as of now. However, this check is done in the tm module called by the dialog module, and the latter passes to the former a dialog state of DLG_CONFIRMED *always*.
Oops, and this is why it gives an error when there is no remote target (no Contact in the UAS's responses yet).
Yep, exactly.
Done with the forensics. This is why the leak happens for early dialogs:
(1) dlg_end_dlg triggers generation of a BYE request for the caller, which succeeds because the caller's Contact address was stored during processing of the initial INVITE.
(2) The reference counter is increased by one and is supposed to be decremented again during processing of the BYE response (in bye_reply_cb()) from the caller.
(3) The BYE request is sent to the caller. However, it replies with "481 Call Leg/Transaction Does Not Exist" because the request is missing the callee's To tag, which is not stored in the dialog module prior to the call's transition to the confirmed state.
(4) The caller's 481 response is dialog-handled in bye_reply_cb(). What this function basically does is, on reception of a final response, run the proxy-initiated BYE request (not the response) through the state machine and decrease the reference counter *if the new dialog state is "terminated" after completion of the state machine*. However, BYE requests in the early state do not trigger state transitions, because the dialog module cannot tell yet whether a single branch or the entire call was terminated. So the "early" state is maintained and, in consequence, the reference counter is *not decremented*.
(5) Generation of a BYE request for the callee is triggered. However, it fails because no callee Contact is available. The reference counter is not touched in this case either.
(6) When the UAC sends out a CANCEL request the call is torn down. However, the dialog structure will not be deleted because the reference counter cannot drop lower than one.
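The six steps above can be modelled with a toy reference counter. This is only an illustrative sketch under stated assumptions, not the module's code: the struct, the reduced state set and all function names are hypothetical, and the live call is assumed to hold exactly one reference of its own.

```c
#include <stdbool.h>

/* Hypothetical, reduced state set; the real module has more states. */
enum dlg_state { ST_EARLY, ST_CONFIRMED, ST_TERMINATED };

struct dlg {
    enum dlg_state state;
    int ref;
};

/* Assumption: a live call holds one reference of its own. */
void dlg_init(struct dlg *d, enum dlg_state s)
{
    d->state = s;
    d->ref = 1;
}

/* Step (2): the proxy-generated BYE takes an extra reference. */
void proxy_bye_sent(struct dlg *d)
{
    d->ref++;
}

/*
 * Step (4): on the final reply, the BYE is run through the state machine.
 * The BYE's reference is released only if the dialog ends up terminated,
 * and an early dialog does not change state here.
 */
void proxy_bye_replied(struct dlg *d)
{
    if (d->state == ST_CONFIRMED) {
        d->state = ST_TERMINATED;
        d->ref--;               /* reference held by the live call */
    }
    if (d->state == ST_TERMINATED)
        d->ref--;               /* reference taken for the BYE */
}

/* Step (6): the UAC's CANCEL tears down an early call. */
void uac_cancel(struct dlg *d)
{
    if (d->state == ST_EARLY) {
        d->state = ST_TERMINATED;
        d->ref--;               /* reference held by the live call */
    }
}

/* The structure may only be freed once nobody references it any more. */
bool dlg_can_be_freed(const struct dlg *d)
{
    return d->state == ST_TERMINATED && d->ref == 0;
}
```

In this model an early dialog that goes through proxy_bye_sent(), proxy_bye_replied() and uac_cancel() ends up terminated with one reference left, so it is never freed, while a confirmed dialog drops to zero after the BYE reply.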
The bottom line is that the dialog module leaks because it cannot handle in-early-dialog BYE requests properly. The way to deal with this at the moment is to deny proxy-initiated dialog termination unless the call is in the "confirmed" state, because that's the only state in which it may succeed. I say *may* because if a flawed or crashed UA is not able to process such a BYE properly, it will reply with a non-2xx response or not at all, respectively; in that case the proxy should consider its termination attempt to have failed and, subsequently, not delete the dialog IMHO. Even worse, you could have a situation where only one of the call participants shuts down the session while the other doesn't, leading to a weird kind of one-sided dialog. In order to avoid this, I believe the proxy should have UA response codes affect the state machinery and not rely on its BYE requests only.
However, the latter is something to be done in a future dialog module. To prevent any kind of leakage in the current implementation, my proposed fix would be to
(1) Deny proxy-initiated dialog termination unless the call is in the "confirmed" state and
(2) always decrement the reference counter during BYE response handling.
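A rough sketch of what these two rules could look like, using a hypothetical reduced dialog structure (the type, state names and functions below are illustrative assumptions, not the module's actual API):

```c
/* Hypothetical, reduced state set and dialog structure. */
enum dlg_state { ST_EARLY, ST_CONFIRMED, ST_TERMINATED };

struct dlg {
    enum dlg_state state;
    int ref;
};

/*
 * Rule (1): refuse proxy-initiated termination unless the dialog is
 * confirmed. Returns 0 if a BYE may be sent (and a reference was taken
 * for it), -1 if the request is denied and nothing was changed.
 */
int end_dlg_checked(struct dlg *d)
{
    if (d->state != ST_CONFIRMED)
        return -1;
    d->ref++;                   /* reference held while the BYE is pending */
    return 0;
}

/*
 * Rule (2): when the BYE's final reply (or timeout) is handled, release
 * the BYE's reference unconditionally, whatever the resulting state is.
 */
void bye_reply_always_unref(struct dlg *d)
{
    if (d->state == ST_CONFIRMED)
        d->state = ST_TERMINATED;
    d->ref--;
}
```

With rule (1) in place an early dialog never gets the extra reference at all; rule (2) is the safety net for any path that took it anyway.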
Let me know what you think of this.
Cheers,
--Timo
On Monday 12 July 2010, Timo Reimann wrote:
[..] However, the latter is something to be done in future dialog module. To prevent any kind of leakage in the current implementation, my proposed fix would be to
(1) Deny proxy-initiated dialog termination unless the call is in the "confirmed" state and
Hi Timo,
so if I understand correctly this should fix the leak that was reported, as the refcnt would then not be increased before it gets into this invalid internal state.
(2) always decrement the reference counter during BYE response handling.
But why is this additionally necessary? Does this apply to the handling of all BYEs?
Henning
2010/7/12 Henning Westerholt henning.westerholt@1und1.de:
On Monday 12 July 2010, Timo Reimann wrote:
[..] However, the latter is something to be done in future dialog module. To prevent any kind of leakage in the current implementation, my proposed fix would be to
(1) Deny proxy-initiated dialog termination unless the call is in the "confirmed" state and
Hi Timo,
so if I understand correctly this should fix the leak that was reported, as the refcnt would then not be increased before it gets into this invalid internal state.
I think so.
(2) always decrement the reference counter during BYE response handling.
But why is this additionally necessary?
AFAIU ("U" = understand X-D) he means that after the proxy generates the BYE it should immediately decrease the reference counter without waiting for the response.
Does this apply to the handling of all BYEs?
Good question. Let me ask another question (the same in other words):
When a BYE is received in Kamailio, is the dialog terminated when the BYE is processed? or when the BYE receives a 2XX? what about if the BYE receives a [3456]XX response (or no response at all)?
Hey,
Iñaki Baz Castillo wrote:
2010/7/12 Henning Westerholt henning.westerholt@1und1.de:
On Monday 12 July 2010, Timo Reimann wrote:
[..] However, the latter is something to be done in future dialog module. To prevent any kind of leakage in the current implementation, my proposed fix would be to
(1) Deny proxy-initiated dialog termination unless the call is in the "confirmed" state and
so if I understand correctly this should fix the leak that was reported, as the refcnt would then not be increased before it gets into this invalid internal state.
I think so.
Right -- this part of the fix makes sure that BYE requests are not even sent in the first place for calls that have not reached the confirmed state yet. Otherwise, it would inevitably lead to 481 responses and UAs not terminating because of missing tags.
(2) always decrement the reference counter during BYE response handling.
But why is this additionally necessary?
AFAIU ("U" = understand X-D) he means that after the proxy generates the BYE it should immediately decrease the reference counter without waiting for the response.
That's what I had in mind. It should guarantee that dialogs are not dangling and leaking no matter how the UAs reply and how that affects dialog state.
However...
Does this apply to the handling of all BYEs?
Good question. Let me ask another question (the same in other words):
When a BYE is received in Kamailio, is the dialog terminated when the BYE is processed? or when the BYE receives a 2XX? what about if the BYE receives a [3456]XX response (or no response at all)?
...your question made me take a look at the code again. For UA-initiated BYE requests the dialog is terminated (as in state) and destroyed (as in cleanup) once the BYE has been sent. For proxy-initiated BYE requests the dialog is terminated and destroyed once the final response to the BYE request is received, or when no response at all is received (because bye_reply_cb() is a callback to TMCB_LOCAL_COMPLETED which triggers for both UA and timeout-generated 408 final responses). Analogously, callbacks to DLGCB_TERMINATED are executed at respective times depending on the use case.
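Put as a small decision function, the behaviour described in the previous paragraph looks roughly like this. It is only a summary sketch; the enums and the function name are hypothetical and merely encode the description, not the module's code paths:

```c
/* Who generated the BYE (hypothetical names). */
enum bye_origin { BYE_FROM_UA, BYE_FROM_PROXY };

/* Points in time at which the dialog could be torn down. */
enum bye_event {
    EV_BYE_SENT,        /* the BYE request has been sent on */
    EV_FINAL_REPLY,     /* a final response from the UA arrived */
    EV_TIMEOUT_408      /* no response; a 408 was generated on timeout */
};

/*
 * Returns 1 if, per the description, the dialog is terminated and
 * destroyed (and the termination callbacks run) at this event, else 0.
 */
int dialog_destroyed_at(enum bye_origin origin, enum bye_event ev)
{
    if (origin == BYE_FROM_UA)
        return ev == EV_BYE_SENT;

    /* Proxy-initiated: wait for the final reply or the timeout. */
    return ev == EV_FINAL_REPLY || ev == EV_TIMEOUT_408;
}
```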
That means that destroying dialogs terminated by proxy-initiated BYE requests right after such requests have been sent cannot work that easily. One would have to move dialog destruction and callback execution into that part of the code where the BYE request is sent. The alternative is to keep the code where it is right now (bye_reply_cb()).
I'm not sure why dialog termination points in time differ for UA-/proxy-initiated BYE requests. Is there a rationale that proxies triggering dialog termination need to ensure that UAs respond or time-out before they can destroy the dialog? How does it differ from UA-initiated dialog terminations?
Maybe you have an idea on this; I will continue thinking about it. In any case, the "always decrement counter during BYE processing" isn't strictly required for the designated fix if the "deny proxy-initiated call termination for non-confirmed calls" rule is in effect. That's because confirmed dialogs should always transition to the terminated state on processing of the BYE request and thereby decrement the counter.
Too much thinking about dialog today. I need a break. :)
Cheers,
--Timo
2010/7/12 Timo Reimann timo.reimann@1und1.de:
I'm not sure why dialog termination points in time differ for UA-/proxy-initiated BYE requests. Is there a rationale that proxies triggering dialog termination need to ensure that UAs respond or time-out before they can destroy the dialog? How does it differ from UA-initiated dialog terminations?
I strongly believe (as RFC 3261 states) that a proxy SHOULD NOT generate in-dialog requests. In fact, the 'dialog' module is a bit of a hack ;)
IMHO when a proxy-generated BYE is sent (to both caller and callee) that should be enough to set the dialog in 'terminated' state. There is no reason to behave differently than when handling a UA-initiated BYE.
Maybe you have an idea on this; I will continue thinking about it. In any case, the "always decrement counter during BYE processing" isn't strictly required for the designated fix if the "deny proxy-initiated call termination for non-confirmed calls" rule is in effect. That's because confirmed dialogs should always transition to the terminated state on processing of the BYE request and thereby decrement the counter.
Too much thinking about dialog today. I need a break. :)
Ok, I also need a break to see the goal again and again XDDD
On Monday 12 July 2010, Iñaki Baz Castillo wrote:
IMHO when a proxy-generated BYE is sent (to both caller and callee) that should be enough to set the dialog in 'terminated' state. There is no reason to behave differently than when handling a UA-initiated BYE.
This indeed sounds reasonable.
Maybe you have an idea on this; I will continue thinking about it. In any case, the "always decrement counter during BYE processing" isn't strictly required for the designated fix if the "deny proxy-initiated call termination for non-confirmed calls" rule is in effect. That's because confirmed dialogs should always transition to the terminated state on processing of the BYE request and thereby decrement the counter.
If the "proxy deny" change is enough to fix the problem with the leaking dialogs in this particular case, then maybe we should only change this for now, especially as there is still at least one fix left (tm delete timer dlg) which hasn't been committed so far.
Too much thinking about dialog today. I need a break. :)
Ok, I also need a break to see the goal again and again XDDD
Hehe, you haven't had that many goals in the last games, so I understand. ;-)
Henning
2010/7/13 Henning Westerholt henning.westerholt@1und1.de:
Hehe, you haven't had that many goals in the last games, so I understand. ;-)
It's better 1-0, 1-0, 1-0 than 4-1, 4-0, 0-1.
Opsss, sorry... :)
On Tuesday 13 July 2010, Iñaki Baz Castillo wrote:
2010/7/13 Henning Westerholt henning.westerholt@1und1.de:
Hehe, you haven't had that many goals in the last games, so I understand. ;-)
It's better 1-0, 1-0, 1-0 than 4-1, 4-0, 0-1.
Sure, sure. :-) But it's ok, as you haven't won the cup before.. ;-) And against Holland it was well deserved in the end.
Henning
2010/7/13 Henning Westerholt henning.westerholt@1und1.de:
Sure, sure. :-) But it's ok, as you haven't won the cup before.. ;-)
Humm, the question would be: how many times have you celebrated it? (I mean being adult to drink beer) :)
On Tuesday 13 July 2010, Iñaki Baz Castillo wrote:
Humm, the question would be: how many times have you celebrated it? (I mean being adult to drink beer) :)
Hi Iñaki,
well, at least in the last weeks people (including me) had enough reasons to celebrate, if you happened to be in the city during a game. ;-)
Cheers,
Henning
2010/7/13 Henning Westerholt henning.westerholt@1und1.de:
Hi Iñaki,
well, at least in the last weeks people (including me) had enough reasons to celebrate, if you happened to be in the city during a game. ;-)
Germany will have its time (again) in Brazil 2014, I'm sure of that.
Iñaki Baz Castillo wrote:
Germany will have its time (again) in Brazil 2014, I'm sure of that.
There's a small tournament in 2012 which we are going to win too. :)
--Timo
2010/7/13 Timo Reimann timo.reimann@1und1.de:
Germany will have its time (again) in Brazil 2014, I'm sure of that.
There's a small tournament in 2012 which we are going to win too. :)
I'm sorry, but in 2012 all the Spanish players winning the world cup are still "available". You must wait a bit more ;)
SIP/2.0 503 Service Unavailable
From: sip:germany
To: sip:win-something
Retry-After: 4 years
Iñaki Baz Castillo wrote:
2010/7/13 Timo Reimann timo.reimann@1und1.de:
Germany will have its time (again) in Brazil 2014, I'm sure of that.
There's a small tournament in 2012 which we are going to win too. :)
I'm sorry, but in 2012 all the Spanish players winning the world cup are still "available". You must wait a bit more ;)
SIP/2.0 503 Service Unavailable
From: sip:germany
To: sip:win-something
Retry-After: 4 years
Haha, too good. I'll let you win for the moment and wait for the major Spanish players to turn ancient (i.e., older than 33).
Cheers,
--Timo
Hola,
Henning Westerholt wrote:
Maybe you have an idea on this; I will continue thinking about it. In any case, the "always decrement counter during BYE processing" isn't strictly required for the designated fix if the "deny proxy-initiated call termination for non-confirmed calls" rule is in effect. That's because confirmed dialogs should always transition to the terminated state on processing of the BYE request and thereby decrement the counter.
If the "proxy deny" change is enough to fix the problem with the leaking dialogs in this particular case, then maybe we should only change this for now, especially as there is still at least one fix left (tm delete timer dlg) which hasn't been committed so far.
Agreed. Putting it on the agenda for the new improved implementation should suffice for the moment. I will soon note down in the wiki dialog proposal what we have discussed so far.
Too much thinking about dialog today. I need a break. :)
Ok, I also need a break to see the goal again and again XDDD
Hehe, you haven't had that many goals in the last games, so I understand. ;-)
Iñaki's law: "As an sr-dev thread grows longer, the probability of him teasing others with the fact that the Spanish national football team won the 2010 world cup approaches 1." :)
¡Viva diálogo!
--Timo
2010/7/12 Timo Reimann timo.reimann@1und1.de:
Done with the forensics. This is why the leak happens for early dialogs:
(1) dlg_end_dlg triggers generation of a BYE request for the caller, which succeeds because the caller's Contact address was stored during processing of the initial INVITE.
(2) The reference counter is increased by one and is supposed to be decremented again during processing of the BYE response (in bye_reply_cb()) from the caller.
(3) The BYE request is sent to the caller. However, it replies with "481 Call Leg/Transaction Does Not Exist" because the request is missing the callee's To tag, which is not stored in the dialog module prior to the call's transition to the confirmed state.
(4) The caller's 481 response is dialog-handled in bye_reply_cb(). What this function basically does is, on reception of a final response, run the proxy-initiated BYE request (not the response) through the state machine and decrease the reference counter *if the new dialog state is "terminated" after completion of the state machine*. However, BYE requests in the early state do not trigger state transitions, because the dialog module cannot tell yet whether a single branch or the entire call was terminated. So the "early" state is maintained and, in consequence, the reference counter is *not decremented*.
(5) Generation of a BYE request for the callee is triggered. However, it fails because no callee Contact is available. The reference counter is not touched in this case either.
(6) When the UAC sends out a CANCEL request the call is torn down. However, the dialog structure will not be deleted because the reference counter cannot drop lower than one.
Great analysis.
However, the latter is something to be done in future dialog module. To prevent any kind of leakage in the current implementation, my proposed fix would be to
(1) Deny proxy-initiated dialog termination unless the call is in the "confirmed" state and
It makes sense as termination of early-dialog would involve terminating an existing INVITE transaction which would require a 408 to the UAC and CANCEL/BYE to the UAS. Perhaps a future feature ;)
(2) always decrement the reference counter during BYE response handling.
So, never rely on the response to the BYE and never expect that the BYE would get a response, am I right? If so I fully agree. A BYE initiated by the proxy should always work except when it's sent at the same time as an in-dialog request by an endpoint, in which case the CSeq of the BYE could be too small. To prevent this, the BYE generated by the proxy should ensure that its CSeq value is 5-10 times greater than the last CSeq value in the dialog (for each side). And once the BYE is sent, update the dialog status without waiting for the response. Do you agree?
Thanks a lot!
Heya,
Iñaki Baz Castillo wrote:
(2) always decrement the reference counter during BYE response handling.
So, never rely on the response to the BYE and never expect that the BYE would get a response, am I right?
Yes.
If so I fully agree. A BYE initiated by the proxy should always work except when it's sent at the same time as an in-dialog request by an endpoint, in which case the CSeq of the BYE could be too small. To prevent this, the BYE generated by the proxy should ensure that its CSeq value is 5-10 times greater than the last CSeq value in the dialog (for each side). And once the BYE is sent, update the dialog status without waiting for the response. Do you agree?
I didn't consider the case where an in-dialog request created by one of the UAs could shadow a proxy-induced BYE but it's worth remembering. Agreeing with you, I will update the wiki page on this aspect tomorrow.
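For the record, the CSeq idea could be sketched as below. This is just an illustration of the proposal, not existing module code: proxy_bye_cseq() and its factor parameter are hypothetical, and the only hard constraint used is RFC 3261's rule that a CSeq number must be less than 2**31.

```c
#include <stdint.h>

/* RFC 3261: the CSeq sequence number must be less than 2**31. */
#define CSEQ_MAX 2147483647u

/*
 * Pick a CSeq for a proxy-generated BYE that is well above the last CSeq
 * seen from that side of the dialog, so that a concurrent in-dialog
 * request from the endpoint is unlikely to overtake it.
 * 'factor' is the multiplier suggested in the thread (e.g. 5 to 10).
 */
uint32_t proxy_bye_cseq(uint32_t last_cseq, uint32_t factor)
{
    uint64_t candidate;

    if (factor < 1)
        factor = 1;

    /* 64-bit arithmetic so the multiplication cannot wrap around. */
    candidate = (uint64_t)last_cseq * factor;

    /* Factor 1 or a last CSeq of 0 would not increase the value. */
    if (candidate <= last_cseq)
        candidate = (uint64_t)last_cseq + 1;

    /* Clamp to the largest legal value; if last_cseq is already at the
     * maximum, no larger CSeq exists and the result equals last_cseq. */
    if (candidate > CSEQ_MAX)
        candidate = CSEQ_MAX;

    return (uint32_t)candidate;
}
```

For the dialog dumped earlier in this thread (caller_cseq 326) and a factor of 5, this yields 1630.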
Cheers,
--Timo