One small clarification about the impact of the match mode: dlg_manage() forces the non-default mode (SEQ_MATCH_NO_ID), so everyone who calls dlg_manage() ends up using it.



int dlg_manage(sip_msg_t *msg)
...
                backup_mode = seq_match_mode;
                seq_match_mode = SEQ_MATCH_NO_ID;
                dlg_onroute(msg, NULL, NULL);
                seq_match_mode = backup_mode;
...


I just created a MR intended to prevent any mismatch caused by an empty to-tag:
https://github.com/kamailio/kamailio/pull/2484

On Fri, Sep 25, 2020 at 11:58 PM Henning Westerholt <hw@skalatan.de> wrote:
Thank you Julien for digging into it. If it does not affect the default match mode, that does indeed sound like the reason it was not found earlier.

Cheers,

Henning


--
Henning Westerholt - https://skalatan.de/blog/
Kamailio services - https://skalatan.de/services


From: sr-users <sr-users-bounces@lists.kamailio.org> on behalf of Julien Chavanton <jchavanton@gmail.com>
Sent: Saturday, September 26, 2020, 04:17
To: Daniel-Constantin Mierla
Cc: Kamailio (SER) - Users Mailing List
Subject: Re: [SR-Users] Dialog - timeout for dlg with CallID

It seems I found the problem and I have a fix.

The root cause is probably that the locally generated 408 is not updating the dialog to-tag.

However, always checking for a to-tag match before falling back to a match without the to-tag will fix any such issue.

I will prepare a merge request on Monday to start discussing the option of always matching the to-tag first.
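
To make the "to-tag first" idea concrete, here is a minimal sketch in C. It is not the dialog module's code: struct toy_dlg, same_call() and toy_lookup_totag_first() are hypothetical names for a simplified model in which a dialog is identified only by call-id, from-tag and to-tag, and an empty stored to-tag means "not known yet".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dialog record (not Kamailio's dlg_cell_t). */
struct toy_dlg {
    const char *callid;
    const char *ftag;
    const char *ttag;   /* "" while the to-tag is still unknown */
};

/* Call-ID and from-tag must match in both passes. */
static int same_call(const struct toy_dlg *d, const char *callid,
                     const char *ftag)
{
    return strcmp(d->callid, callid) == 0 && strcmp(d->ftag, ftag) == 0;
}

/*
 * Pass 1: look for an exact to-tag match.
 * Pass 2: only if pass 1 found nothing, accept a dialog whose stored
 *         to-tag is still empty.
 * Returns the index of the matching entry, or -1 if there is none.
 */
static int toy_lookup_totag_first(const struct toy_dlg *tbl, size_t n,
                                  const char *callid, const char *ftag,
                                  const char *ttag)
{
    size_t i;

    assert(tbl != NULL && callid != NULL && ftag != NULL && ttag != NULL);

    for (i = 0; i < n; i++)
        if (same_call(&tbl[i], callid, ftag)
                && strcmp(tbl[i].ttag, ttag) == 0)
            return (int)i;

    for (i = 0; i < n; i++)
        if (same_call(&tbl[i], callid, ftag) && tbl[i].ttag[0] == '\0')
            return (int)i;

    return -1;
}
```

With a table holding one entry with an empty to-tag followed by one entry with to-tag gK02b68836 for the same call-id and from-tag, a request carrying gK02b68836 resolves to the second entry even though the first one comes earlier in the table.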

On Fri, Sep 25, 2020 at 11:27 AM Julien Chavanton <jchavanton@gmail.com> wrote:
I did catch the logs, and after looking at the trace, it looks like a dialog mismatch in a serial forking scenario:

- log line 3 tells us that a no-ACK disconnection should be triggered
- log lines 1-2 tell us what happened when the ACK was received in dlg_onroute(): oddly enough, the state was 5 both before and after. Could it be a mismatch/confusion with the previous dialog? Looking in this direction ...

1: 2020-09-25T16:30:16.896: dialog [dlg_handlers.c:1273]: extra_ack_debug_info(): [ACK][1] state not changed >>> call-id[562419_125824138_2072238224] to-tag[<sip:+14019991904@anon.com>;tag=gK02b68836]
2: 2020-09-25T16:30:16.896: dialog [dlg_handlers.c:1440]: dlg_onroute(): [ACK] state not changed old[5]new[5]
...
3: 2020-09-25T16:32:22.674: dialog [dlg_hash.c:247]: dlg_clean_run(): dialog disconnection no-ACK call-id[562419_125824138_2072238224][1601051416]<[1601051542 - 60]


After looking at the pcap trace, call-id 562419_125824138_2072238224 was involved in serial forking:

call attempt #1

X >> INVITE >> Y   // no to-tag
X << 100
...
X << 408           // to-tag=594d50c3218065a60bb91fd47a70fbc1-59edef02 (locally generated)
X >> ACK           // to-tag=594d50c3218065a60bb91fd47a70fbc1-59edef02

call attempt #2

X >> INVITE >> Z   // no to-tag
X << 100
X << 200    << Z   // to-tag=gK02b68836
X >> ACK    >> Z   // to-tag=gK02b68836 (should be state old[3]new[4]; I wonder how it could possibly be old[5]new[5])



I looked at several occurrences, and there is always a locally generated 408 with its own to-tag beforehand. This seems like a good lead to investigate further.
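
The suspected mismatch in the trace above can be replayed with a small model in C. This is only a sketch under assumptions, not the module's implementation: the toy_* names are hypothetical, the state numbers are the ones shown in the log lines (the enum names are my own labels for them), and the model assumes that the dialog from call attempt #1 is still in the table with an empty to-tag and state 5 when the ACK of attempt #2 arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* State numbers as seen in the logs; the names are assumed labels. */
enum toy_state {
    TOY_CONFIRMED_NA = 3,   /* 200 OK seen, ACK still pending */
    TOY_CONFIRMED    = 4,   /* ACK received */
    TOY_DELETED      = 5    /* dialog terminated */
};

/* Hypothetical, simplified dialog record. */
struct toy_dlg {
    const char *callid;
    const char *ttag;       /* "" if the to-tag was never stored */
    int state;
};

/*
 * Naive single pass: an entry with an empty stored to-tag matches any
 * request to-tag, so the first such entry wins.
 */
static struct toy_dlg *toy_lookup_naive(struct toy_dlg *tbl, size_t n,
                                        const char *callid,
                                        const char *ttag)
{
    size_t i;

    assert(tbl != NULL && callid != NULL && ttag != NULL);

    for (i = 0; i < n; i++) {
        if (strcmp(tbl[i].callid, callid) != 0)
            continue;
        if (tbl[i].ttag[0] == '\0' || strcmp(tbl[i].ttag, ttag) == 0)
            return &tbl[i];
    }
    return NULL;
}

/*
 * ACK handling in this model: only state 3 moves forward to state 4,
 * every other state is left unchanged.
 * Returns the old state, or -1 if no dialog was given.
 */
static int toy_on_ack(struct toy_dlg *d)
{
    int old;

    if (d == NULL)
        return -1;
    old = d->state;
    if (old == TOY_CONFIRMED_NA)
        d->state = TOY_CONFIRMED;
    return old;
}
```

If the table holds the attempt #1 entry (empty to-tag, state 5) ahead of the attempt #2 entry (to-tag gK02b68836, state 3), the ACK carrying gK02b68836 lands on the first entry in this model. Its state stays at 5, which would correspond to old[5]new[5], while the second entry is left waiting in state 3 until the no-ACK cleanup fires.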