Hello,
While I have done some work on the dialog module over time, it is not one of my favourite modules, apart from using it to enforce a maximum call duration (for which it should work fine). Also, I never ended up using it for CDR generation; I prefer the event-based accounting of the acc module, which can record more events for the same call.
That said, for limiting active calls I usually rely on other solutions built in the config file, leveraging htable or various backends. Also, for values that I need for the duration of a call, I use htable.
Anyhow, I find it strange that, after a restart, a request within the dialog does not match the record loaded in memory; the record is obviously there, since you say the dialog times out at some later point. Did you change the value of the hash_size modparam?
Have you captured the SIP traffic, and can you see the 'did' parameter in the Route headers of the BYE?
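If a packet capture is not at hand, a log line in the config can show the same thing. The fragment below is a minimal sketch, assuming the textops, siputils and xlog modules are loaded and that it is placed in the within-dialog handling before loose_route() is called; the log text itself is made up for illustration.

```
# Sketch only: log all Route headers of an in-dialog BYE so that the
# ";did=..." parameter added via Record-Route can be checked.
if (is_method("BYE") && has_totag()) {
    xlog("L_INFO", "BYE callid=$ci Route=$(hdr(Route)[*])\n");
}
```

If the Route URI pointing to this proxy carries no 'did' parameter, matching can only fall back to the SIP dialog identifiers (Call-ID and tags), which is what dlg_match_mode 1 allows.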
Cheers, Daniel
On 05.07.23 15:44, Benoît Panizzon wrote:
Hi Daniel
PS: Kamailio 5.5 in use so not on the edge yet.
Thank you for helping regarding that issue and maybe hinting how it could be improved.
> what is the purpose of dmq replication? To limit active calls?
Exactly. Our subscriptions contain a certain number of 'channels'. If they are all in use, the customer is busy.
So I use profile counters to track the used channel count per customer.
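The idea can be sketched roughly as follows. This is a simplified illustration, not our production config: the route name, the use of $fU as the customer key and $avp(channels) as the subscribed channel count are assumptions made for the example, and it assumes the dialog, sl, textops and siputils modules are loaded. The profile name is the one from the parameters listed further down.

```
# Sketch only: per-customer channel limit using a dialog profile.
route[CHANNEL_LIMIT] {
    # Only initial INVITEs open a new channel
    if (!is_method("INVITE") || has_totag()) {
        return;
    }
    # Count the dialogs currently in the profile for this customer
    get_profile_size("custprofilecounter", "$fU", "$var(used)");
    # $avp(channels) is hypothetical: channels in the subscription
    if ($var(used) >= $avp(channels)) {
        sl_send_reply("486", "Busy Here");
        exit;
    }
    # Track the call and add it to the customer's profile
    dlg_manage();
    set_dlg_profile("custprofilecounter", "$fU");
}
```

The check and the increment are not atomic, so two simultaneous INVITEs can both pass the limit; the sketch only shows how the counter is tied to the lifetime of the dialog.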
> What exactly happens? What means "corrupts" them? What data/fields become corrupted?
It looks like the/some dialogues just don't exist any more after a reload. Or they exist but are not being found.
Observed issues:
- Dialogue variables that were populated before the restart do not exist any more.
- When a call ends, the corresponding dialogue is not found, so the dialog module is unable to close the CDR. When the dialogue timeout hits, the CDR is then written with duration = timeout value, which is far longer than the actual duration.
- Profile counters for dialogues that were not found when the call ended are still present, so 'POTS' customers with 'one' channel stay 'busy' until the dialogue timeout hits.
- The database accumulates data from dialogues that do not exist any more.
The specific error I see when a dialogue should be ended and Kamailio can't find it any more after a restart is:
ERROR: dialog [dlg_dmq.c:289]: dlg_dmq_handle_msg(): dialog [838:15539] not found
If you could help, I could try to dig out the full log of a dialogue experiencing that issue.
Dialog Parameters used:
modparam("dialog", "send_bye", 1)
modparam("dialog", "timer_procs", 0)
modparam("dialog", "db_mode", 1 )
modparam("dialog", "db_url", DBLOCAL )
modparam("dialog", "dlg_flag", FLT_DLG )
modparam("dialog", "dlg_match_mode", 1)
modparam("dialog", "dlg_extra_hdrs", "Hint: Initiated by IMP Core Proxy\r\n")
modparam("dialog", "hash_size", 4096 )
# Do not send any keepalive messages in dialog
modparam("dialog", "ka_timer", 0)
modparam("dialog", "ka_interval", 30 )
modparam("dialog", "enable_stats", 1 )
modparam("dialog", "detect_spirals", 1 )
modparam("dialog", "bridge_controller", "sip:controller@imp.ch")
modparam("dialog", "default_timeout", 43200 )
modparam("dialog", "timeout_avp", "$avp(dlgtimeout)") # Needs to be same as sst timeout!
modparam("dialog", "profiles_no_value", "callcounter;total_sbcincoming");
modparam("dialog", "profiles_with_value", "dispatchout;sbcincoming;trunkincoming;cpeincoming;safariincoming;custprofilecounter;legcounter");
modparam("dialog", "enable_dmq", 1)
modparam("dialog", "h_id_start", -1) # Use server_id
modparam("dialog", "h_id_step", 2)
Each node uses a local database (defined as DBLOCAL); the nodes do not access our common 'remote' database, which holds, for example, the customer authentication information.
> Regarding reject of the calls for cooling down the instance for restart, check if the 305 Use Proxy is supported by origin of the calls, it might be more suitable.
Our registrar nodes run Kamailio too, so implementing that would be an option. Regarding our ICs to other TSPs and carriers, I would have to check. At the moment they are all connected via a commercial vendor SBC, so if that SBC can handle 305 on INVITE (it can on REGISTER), that would work.
But one of our goals is to eventually also get rid of that SBC, which has some limitations and a not very advantageous 'feature' licensing model, in favour of open source and flexibility, by using Kamailio and rtpengine for that task. But then we would have to check with every IC we have. I know that at the moment 503 is understood by all our switches connected to Kamailio, and our registrars also handle 503 by failing over to the other node in the dispatcher list.
--
Kind regards

-Benoît Panizzon- @ home office and reachable as usual

I m p r o W a r e   A G    -    Head of Commerce Customers
______________________________________________________

Zurlindenstrasse 29             Tel  +41 61 826 93 00
CH-4133 Pratteln                Fax  +41 61 826 93 01
Switzerland                     Web  http://www.imp.ch
______________________________________________________