I've got a few of those zombie dialogs :-/
After doing some investigations, here's what I found:
On a normal call, the logs shows this when a BYE is received:
DEBUG: dialog [dlg_handlers.c:1116]: dlg_onroute(): route param is
'a1.e4b2' (len=7)
DEBUG: dialog [dlg_hash.c:639]: dlg_lookup(): ref dlg 0x7f66087522e8 with 1 -> 3
DEBUG: dialog [dlg_hash.c:641]: dlg_lookup(): dialog id=11086 found on entry 26
DEBUG: dialog [dlg_profile.c:406]: set_current_dialog(): setting
current dialog [26:11086]
DEBUG: dialog [dlg_hash.c:998]: next_state_dlg(): dialog
0x7f66087522e8 changed from state 4 to state 5, due event 7 (ref 3)
DEBUG: dialog [dlg_hash.c:786]: dlg_ref(): ref dlg 0x7f66087522e8 with 1 -> 4
DEBUG: dialog [dlg_handlers.c:1484]: dlg_run_event_route(): executing
event_route 1 on state 5
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x7f66087522e8
with 1 -> 3
DEBUG: dialog [dlg_hash.c:786]: dlg_ref(): ref dlg 0x7f66087522e8 with 1 -> 4
DEBUG: dialog [dlg_handlers.c:1224]: dlg_onroute(): BYE successfully processed
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x7f66087522e8
with 2 -> 2
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x7f66087522e8
with 1 -> 1
Then on a separate process:
DEBUG: dialog [dlg_hash.c:639]: dlg_lookup(): ref dlg 0x7f66087522e8 with 1 -> 2
DEBUG: dialog [dlg_hash.c:641]: dlg_lookup(): dialog id=11086 found on entry 26
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x7f66087522e8
with 1 -> 1
And then the timer process kills the dialog:
DEBUG: dialog [dlg_handlers.c:322]: dlg_iuid_sfree(): freeing dlg iuid
[26:11086] (0x7f6608753e20)
DEBUG: dialog [dlg_hash.c:639]: dlg_lookup(): ref dlg 0x7f66087522e8 with 1 -> 2
DEBUG: dialog [dlg_hash.c:641]: dlg_lookup(): dialog id=11086 found on entry 26
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x7f66087522e8
with 2 -> 0
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): ref <=0 for dialog 0x7f66087522e8
DEBUG: dialog [dlg_hash.c:348]: destroy_dlg(): destroying dialog
0x7f66087522e8 (ref 0)
DEBUG: dialog [dlg_hash.c:364]: destroy_dlg(): removed timer for dlg
0x7f66087522e8 [26:11086] with clid '8a13d838-eeebf118(a)192.168.2.97'
and tags '5c57adb8a4be1998o4' 'as6432ed47'
DEBUG: dialog [dlg_handlers.c:322]: dlg_iuid_sfree(): freeing dlg iuid
[26:11086] (0x7f66087563f8)
DEBUG: dialog [dlg_handlers.c:322]: dlg_iuid_sfree(): freeing dlg iuid
[26:11086] (0x7f6608755478)
And we are all happy!
Now, for the zombie dialog, when a BYE is received, I see this:
DEBUG: dialog [dlg_handlers.c:1116]: dlg_onroute(): route param is
'e5.e4d2' (len=7)
DEBUG: dialog [dlg_hash.c:639]: dlg_lookup(): ref dlg 0x2b20505468c0 with 1 -> 3
DEBUG: dialog [dlg_hash.c:641]: dlg_lookup(): dialog id=11598 found on entry 94
DEBUG: dialog [dlg_profile.c:406]: set_current_dialog(): setting
current dialog [94:11598]
DEBUG: dialog [dlg_hash.c:998]: next_state_dlg(): dialog
0x2b20505468c0 changed from state 4 to state 5, due event 7 (ref 3)
DEBUG: dialog [dlg_hash.c:786]: dlg_ref(): ref dlg 0x2b20505468c0 with 1 -> 4
DEBUG: dialog [dlg_handlers.c:1484]: dlg_run_event_route(): executing
event_route 1 on state 5
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x2b20505468c0
with 1 -> 3
Then here things starts to go weird:
DEBUG: dialog [dlg_handlers.c:322]: dlg_iuid_sfree(): freeing dlg iuid
[3343:8686] (0x2b20505109f8)
DEBUG: dialog [dlg_handlers.c:322]: dlg_iuid_sfree(): freeing dlg iuid
[3343:8686] (0x2b2050505278)
DEBUG: dialog [dlg_hash.c:786]: dlg_ref(): ref dlg 0x2b20505468c0 with 1 -> 4
DEBUG: dialog [dlg_handlers.c:1224]: dlg_onroute(): BYE successfully processed
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x2b20505468c0
with 2 -> 2
DEBUG: dialog [dlg_hash.c:804]: dlg_unref(): unref dlg 0x2b20505468c0
with 1 -> 1
And nothing else ....
Looking at dlg_onroute(), we see that we get to line 1201 and we run
the event route.
We know that we migrated from state 4 (DLG_STATE_CONFIRMED) to state 5
(DLG_STATE_DELETED) due to event 7 (DLG_EVENT_REQBYE).
Based on the above, we run inside the if at line 1205, we get to clone
the iuid (no errors in the logs) and then we register the tm callback.
What's strange is that the logs shows that dlg_iuid_sfree was executed
twice for the zombie case (the normal case doesn't show those lines).
After that we have the log on line 1224: BYE successfully processed.
Is it possible to run the tm callback as soon as it was registered (on
the same process)?
Regards,
Ovidiu Sas
On Mon, Jan 13, 2014 at 3:20 AM, Daniel-Constantin Mierla
<miconda(a)gmail.com> wrote:
Hello,
On 12/01/14 19:37, Alex Balashov wrote:
With 4.1, I've got a call scenario that looks like this:
1. INVITE -->
2. <-- 100
3. <-- 183
4. <-- 200 OK
5. <-- 200 OK again (no ACK from near end)
6. Finally an e2e ACK comes through -->
7. (38 seconds later) BYE -->
8. (no response to BYE) BYE -->
9. BYE -->
10. BYE -->
11. BYE -->
12. BYE -->
13. BYE -->
etc.
I've also got another call that looks very similarly, except that the BYE
comes from the callee instead of the caller. It also goes unanswered by the
near end.
I am tracking both dialogs with 'dialog', and, sooner or later, they end
up in DELETED state. However, they do not get deleted. I have one that has
been in there for 4 days and isn't going anywhere, and another that has been
stuck in DELETED state for 3. My dialog timeout is set to 4 hours, but the
timeout operation doesn't delete dialogs that are already DELETED.
Moreover, dlg_end_dlg is not a means of removing these dialogs, either:
[root@gw1 ~]# kamctl fifo dlg_end_dlg 154 6753
500 Operation failed
Any suggestions on what to do here? Should this be considered a bug?
yes.
What is reference counter you see when you list the dialog via mi?
Cheers,
Daniel
--
Daniel-Constantin Mierla -
http://www.asipto.com
http://twitter.com/#!/miconda -
http://www.linkedin.com/in/miconda
_______________________________________________
sr-dev mailing list
sr-dev(a)lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
--
VoIP Embedded, Inc.
http://www.voipembedded.com