### Description
I established a call between two extensions using TLS, and the presence state was replicated between two Kamailio servers using presence_dmq (no database backing). This all worked correctly except when it comes to sending a PUBLISH with a state terminated in the body, the presence_dmq sent the serialized data correctly, but the Kamailio server sends the wrong state and the caller is marked as busy on all subscribed phones until Kamailio is restarted. This occurs on version 5.6.1. In order generate this data, we use a separate script which responds to Asterisk events and sends the right PUBLISH at the right time to Kamailio, eg. if it sees a Hangup event, it sends a PUBLISH with the state in the body set to 'terminated' and the Expires: header set to 0. This appears to prompt Kamailio to send a NOTIFY with a state of early or confirmed to all subscribers.
### Troubleshooting
#### Reproduction
Install two Kamailio servers running Kamailio 5.6.1. I am using this installation command: `sudo apt install kamailio=5.6.1+bpo9 kamailio-dbg=5.6.1+bpo9 kamailio-lua-modules=5.6.1+bpo9 kamailio-memcached-modules=5.6.1+bpo9 kamailio-mysql-modules=5.6.1+bpo9 kamailio-presence-modules=5.6.1+bpo9 kamailio-redis-modules=5.6.1+bpo9 kamailio-tls-modules=5.6.1+bpo9 kamailio-utils-modules=5.6.1+bpo9`
Some configuration detail for /etc/kamailio/kamailio.cfg:
listen=tls:172.22.0.155:5061 # This ensures we have a UDP socket for sending HEP datagrams (we also send the plaintext UDP PUBLISH here) listen=udp:172.22.0.155:9060 port=5061 enable_tls=yes tcp_connection_lifetime=3605
#modules (this is not an exhaustive list) loadmodule "dmq.so" loadmodule "dialog.so" loadmodule "presence.so" loadmodule "presence_xml.so" loadmodule "presence_dialoginfo.so" loadmodule "pua.so" loadmodule "pua_dialoginfo.so" loadmodule "tls.so"
modparam("dmq", "server_address", "sip:172.22.0.155:9060") modparam("dmq", "notification_address", "sip:172.22.1.124:9060") modparam("dialog", "db_url", DBURL) modparam("dialog", "enable_stats", 1) modparam("dialog", "db_mode", 0) # 0 - NO_DB - the memory content is not flushed into DB modparam("dialog", "dlg_flag", 9) modparam("dialog", "enable_dmq", 1) modparam("pua", "db_url", DBURL) modparam("pua", "db_mode", 0) # high speed memory based storage modparam("pua", "outbound_proxy", "sips:172.22.0.155:5061;transport=tls") modparam("pua_dialoginfo", "include_localremote", 0) modparam("presence", "db_url", DBURL) modparam("presence", "server_address", "sips:this.host.example.org:5061;transport=tls") modparam("presence", "db_table_lock_type", 0) modparam("presence", "subs_db_mode", 0) # 0 - This disables database completely. modparam("presence", "enable_dmq", 1) modparam("presence", "publ_cache", 2) modparam("presence", "subs_remove_match", 1) modparam("presence", "timeout_rm_subs", 0) modparam("presence_dialoginfo", "force_dummy_dialog", 1) modparam("presence_dialoginfo", "force_single_dialog", 1) modparam("presence_xml", "force_active", 1) modparam("presence_xml", "force_presence_single_body", 1)
#### Log Messages
These log aren't directly linked but Kamailio reports a 200 OK in response to the PUBLISH, but the NOTIFY to the handset is _not_ terminated. Difficult to capture this because of TLS.
``` Aug 26 13:32:26 this.host.example.org /usr/sbin/kamailio[4187]: DEBUG: presence [presence_dmq.c:467]: pres_dmq_replicate_presentity(): sending serialized data {"action":1,"presentity":{"domain":"sip.prod.example.org","user":"XXXX469","etag":"a.1661495738.4187.2612.0","expires":3600,"recv":1661520746,"event":"dialog"},"t_new":1,"cur_etag":"","ruid":"pres-630869ce-105b-43a","body":"<?xml version=\"1.0\"?>\n<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="0" state="full" entity="sip:XXXX469@sip.prod.example.org">\n<dialog id="c95340dd-5266-450a-8c53-a5cd75c4bb67" call-id="c95340dd-5266-450a-8c53-a5cd75c4bb67" direction="recipient">\n<state>confirmed</state>\n</dialog>\n</dialog-info>\n"}
Aug 26 13:33:00 this.host.example.org /usr/sbin/kamailio[4187]: DEBUG: presence [presence_dmq.c:467]: pres_dmq_replicate_presentity(): sending serialized data {"action":1,"presentity":{"domain":"sip.prod.example.org","user":"XXXX469","etag":"a.1661495738.4187.2622.0","expires":0,"recv":1661520780,"event":"dialog"},"t_new":1,"cur_etag":"","ruid":"pres-630869ce-105b-e3a","body":"<?xml version=\"1.0\"?>\n<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="0" state="full" entity="sip:XXXX469@sip.prod.example.org">\n<dialog id="c95340dd-5266-450a-8c53-a5cd75c4bb67" call-id="c95340dd-5266-450a-8c53-a5cd75c4bb67" direction="recipient">\n<state>terminated</state>\n</dialog>\n</dialog-info>\n"} ```
This is a similar example:
``` Sep 1 12:37:14 this.host.example.org /usr/sbin/kamailio[5176]: INFO: presence [notify.c:1738]: send_notify_request(): NOTIFY sip:XXXX475@sip.prod.example.org via ÿÿÿÿÿÿÿÿ on behalf of sip:XXXX472@sip.prod.example.org for event dialog : 2_2191795053@172.31.29.18 Sep 1 12:37:14 this.host.example.org /usr/sbin/kamailio[5176]: DEBUG: presence [presence_dmq.c:400]: pres_dmq_replicate_presentity(): replicating presentity record - old etag a.1662035772.5176.17.0, new etag , ruid pres-6310a741-1438-11 Sep 1 12:37:14 this.host.example.org /usr/sbin/kamailio[5176]: DEBUG: presence [presence_dmq.c:467]: pres_dmq_replicate_presentity(): sending serialized data {"action":1,"presentity":{"domain":"sip.prod.example.org","user":"XXXX472","etag":"a.1662035772.5176.17.0","expires":0,"recv":1662035834,"event":"dialog"},"t_new":1,"cur_etag":"","ruid":"pres-6310a741-1438-11","body":"<?xml version=\"1.0\"?>\n<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="0" state="full" entity="sip:XXXX472@sip.prod.example.org">\n<dialog id="76024a0e-2d35-47d8-afd1-a1363acbf859" call-id="76024a0e-2d35-47d8-afd1-a1363acbf859" direction="recipient">\n<state>terminated</state>\n</dialog>\n</dialog-info>\n"} Sep 1 12:37:14 this.host.example.org /usr/sbin/kamailio[5176]: DEBUG: presence [presence_dmq.c:149]: pres_dmq_send(): sending dmq broadcast... ```
#### SIP Traffic
``` PUBLISH sip:XXXX469@sip.prod.example.org SIP/2.0 Via: SIP/2.0/UDP this.host.example.org;branch=656ebca4-5e71-4082-984a-7efa54bdd2cc To: sip:XXXX469@sip.prod.example.org From: sip:XXXX469@sip.prod.example.org;tag=a06f8d92-609b-4210-a1f0-f7042b871f07 CSeq: 12 PUBLISH Call-ID: d67c8bb5-f507-4ada-8f98-d7caa6d86721@this.host.example.org Content-Length: 323 User-Agent: Astertisk-Event-Bridge Max-Forwards: 70 Event: dialog Expires: 0 Content-Type: application/dialog-info+xml <?xml version="1.0"?> <dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="0" state="full" entity="sip:XXXX469@sip.prod.example.org"> <dialog id="9fa5bf3c-63f3-4d3f-82c5-cdff374c4e13" call-id="9fa5bf3c-63f3-4d3f-82c5-cdff374c4e13" direction="recipient"> <state>terminated</state> </dialog> </dialog-info> ```
### Possible Solutions
Downgrading to Kamailio version 5.5.4 resolves this problem.
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
``` $ kamailio -v version: kamailio 5.6.1 (x86_64/linux) flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: unknown compiled with gcc 6.3.0
$ dpkg -l | egrep '^ii' | grep kamailio | awk '{ print $2, $3 }' kamailio 5.6.1+bpo9 kamailio-dbg:amd64 5.6.1+bpo9 kamailio-lua-modules:amd64 5.6.1+bpo9 kamailio-memcached-modules:amd64 5.6.1+bpo9 kamailio-mysql-modules:amd64 5.6.1+bpo9 kamailio-presence-modules:amd64 5.6.1+bpo9 kamailio-redis-modules:amd64 5.6.1+bpo9 kamailio-tls-modules:amd64 5.6.1+bpo9 kamailio-utils-modules:amd64 5.6.1+bpo9 ```
* **Operating System**:
``` $ lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux 9.2 (stretch) Release: 9.2 Codename: stretch
$ uname -a Linux this.host.example.org 4.9.0-16-amd64 #1 SMP Debian 4.9.272-2 (2021-07-19) x86_64 GNU/Linux ```
Iirc, expires 0 is used to remove a state published before, not to publish new states.
Publishing a state that should be kept by presence server for 0 seconds makes no much sense, but maybe there are specs defining a specific behaviour in such case. If yes, point the RFC and the section. If not, then try with expires > 0 in publish and close here if all ok.
Hi Miconda. Unfortunately I'd tried to change the script to use Expires: 3600 before raising this ticket, based on reading RFC 3903 section 4.5 "Removing Event State", but the behaviour was the same. I was wondering about that, that perhaps the state might be removed (unpublished) and then then immediately expire and not be replicated using dmq.
So far I don't think this is a dmq problem, as the event does appear to be replicated, in as much as the serialized JSON is in the log. Additionally, I tried modifying the PUBLISH to be sent to both servers in the pair (HA) to try to rule this out, but the behaviour was the same.
I have narrowed it down to a difference between version 5.5.4 and 5.6.0, so I can rule 5.6.1 out. I read the diff between both revisions in src/core and couldn't see any smoking gun as yet. For the moment we are downgrading in production until we can figure out what is happening.
To be clear: is it ok with 5.5.4 but not with 5.6.0?
Yes, 5.5.4 success, 5.6.0 fails, 5.6.1 fails.
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
Closed #3229 as not planned.