Hi Henning,
I am in fact using timeout_avp, but it's set to 2 hours (7200 sec).
However, to follow up on this, I have discovered that the caller alters the value of my RR cookie parameter: what comes back is a truncated version of the value my Kamailio instance injected.
This leads parse_dlg_rr_param() to complain about a malformed RR parameter.
At some point, a reinvite from the caller occurs, and the Route header in both the reinvite and the subsequent e2e ACK to the 200 OK contains an improper RR cookie. 154 seconds after the ACK, Kamailio spoofs the BYE in both directions.
Nevertheless, I am moved to ask: why is Kamailio so actively interventionist in this situation? I have the dlg_match_mode modparam set to 1 (cookie matching with fallback to SIP fundamentals), so presumably Kamailio is able to follow the changes to the dialog state from these in-dialog requests; otherwise it would see no need to end the call after a period so much shorter than the 7200 sec to which the timeout_avp is set. But if it tracks the change by falling back to SIP identifiers, why would it care that the cookie is broken, and why does this put the dialog in a state where the dialog module seems to apply some sort of timeout that leads to a spoofed BYE?
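To make the cookie mismatch described above concrete, here is a rough Python sketch for comparing the cookie in the Record-Route that Kamailio injected against the one in a later Route header taken from a trace. It assumes the cookie travels under the dialog module's default rr_param name, "did"; the header values and function names are made up for illustration and are not from the actual capture.

```python
import re

# Assumption: the dialog cookie is carried as an rr_param named "did"
# (the dialog module default). Adjust if rr_param is set differently.
DID_RE = re.compile(r";did=([^;>\s,]+)", re.IGNORECASE)


def extract_did(header_value):
    """Return the did cookie value from a Record-Route/Route header, or None."""
    match = DID_RE.search(header_value)
    return match.group(1) if match else None


def classify_cookie(injected_rr, received_route):
    """Compare the injected cookie with the one the caller sent back."""
    sent = extract_did(injected_rr)
    got = extract_did(received_route)
    if sent is None or got is None:
        return "missing"
    if got == sent:
        return "intact"
    if sent.startswith(got):
        return "truncated"
    return "altered"


# Hypothetical header values, not taken from the real trace.
injected = "<sip:192.0.2.10;lr;did=e3d.a5c2>"
from_reinvite = "<sip:192.0.2.10;lr;did=e3d>"
print(classify_cookie(injected, from_reinvite))  # truncated
```

Running this over the initial INVITE's Record-Route and the Route headers of the reinvite and the ACK should show whether every in-dialog request carries the same shortened value.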
-- Alex
On Thu, Aug 29, 2019 at 08:08:06AM +0000, Henning Westerholt wrote:
Hi Alex,
sounds strange indeed. 4.3.5 is from March 2016; a lot has happened since then.
One idea to investigate further: have you already tried to get more debugging information from the dialog module? If this is a loaded production instance, you could use the debugger module to restrict the debug output to dialog.
Another idea: are you using functionality like timeout_avp or the sst module to update/modify the dialog timeout?
Cheers,
Henning
On 27.08.19 at 22:31, Alex Balashov wrote:
Hi,
I am running 4.3.5:1b0c0a, which I am aware is an EOL'd release train, and have a problem with the dialog module I am baffled by.
On many calls - I can't find any correlation to particular kinds of endpoints - I see spoofed BYEs come out of Kamailio after a minute and a half, as if the call had hit a dialog timeout or the dialog module's dead peer detection (OPTIONS pinging) kicked it out.
The thing is, neither of those explanations seems to be borne out by the parameters of the calls under investigation. The dialog timeout on these calls is set to 7200, and the dialog keepalives aren't enabled in either direction. If they were, it'd be logged.
Calls that are terminated in this manner also see the following log message:
WARNING: dialog [dlg_req_within.c:214]: bye_reply_cb(): inconsitent dlg timer data on dlg 0x7f9997317e48 [3390:9644] with clid '1996679936_133218050@x.x.x.x' and tags '2bb17663-co4006-INS001' 'as7925780d'
But I don't think this is unusual for a dialog-spoofed BYE; presumably this is due to a 200 OK for the BYE from both ends.
So anyway, I am trying to track down anything else that could inadvertently cause dialog to hang up calls relatively quickly in that release, or inadvertently set the dialog timeout parameter to something lower than is apparent from the logging and the config.
If anyone has any pointers, that would not go unappreciated!
Many thanks,
-- Alex
-- Henning Westerholt - https://skalatan.de/blog/ Kamailio services - https://skalatan.de/services