When handling a PUBLISH we call handle_publish() and NOTIFYs are sent to all of the corresponding active_watchers (as expected). However, when those NOTIFYs time out (408), the corresponding entries in the active_watchers table are not deleted as we'd expect. Furthermore, we've noticed that NOTIFYs are being sent to active_watchers entries which are already expired (i.e. expires < UNIX_TIMESTAMP()), and when we run kamcmd presence.cleanup, no expired entries are removed from the active_watchers table.
We suspect that all of these things are related; the common theme is that records aren't deleted when expected.
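To illustrate what we mean by "expired", here is a minimal sketch (not part of our production config) that counts such rows from the routing script via the sqlops module. The connection name "pres", the placeholder database URL, and the route name are for illustration only; the column layout assumed is the stock active_watchers schema, where expires holds an absolute UNIX timestamp.
loadmodule "sqlops.so"
modparam("sqlops", "sqlcon", "pres=>mysql://kamailio:***@db-host/kamailio") # placeholder URL

route[CHECK_EXPIRED_WATCHERS] {
    # Count watcher rows whose absolute expiry is already in the past.
    sql_query("pres", "select count(*) from active_watchers where expires < UNIX_TIMESTAMP()", "ra");
    xlog("L_INFO", "expired active_watchers rows: $dbr(ra=>[0,0])\n");
    sql_result_free("ra");
}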
In our setup, we're using Kamailio as a "presence server" (via the presence, presence_dialoginfo, and presence_xml modules). We're using subs_db_mode 3 (the DB-only scheme) and we have multiple Kamailio instances connected to a shared database (MySQL 8.0.27).
Otherwise, everything seems to be working as expected. However, as stale entries accumulate in the active_watchers table, we find ourselves wasting more and more time sending NOTIFYs to black holes; we're generating a lot of traffic, and waiting for the timeouts to hit is causing memory issues and backlogs.
Here are the relevant portions of our kamailio.cfg file:
# ----- presence params -----
modparam("presence", "db_table_lock_type", 0) # Disable locking; MySQL has issues with this is enabled.
modparam("presence", "db_update_period", -1) # Disable synchronization.
modparam("presence", "db_url", PRESENCE_DB_URL)
modparam("presence", "expires_offset", 60) # Force the client to send an UPDATE before the old PUBLISH expires.
modparam("presence", "max_expires", 1800)
modparam("presence", "min_expires", 1700)
modparam("presence", "publ_cache", 0) # Disable the PUBLISH cache since the database is shared.
modparam("presence", "server_address", "sip:$CLUSTER_DOMAIN_NAME:5060") # This becomes the value of the Contact header.
modparam("presence", "sip_uri_match", 1) # Use case insensitive URI matching.
modparam("presence", "subs_db_mode", 3) # Database-only scheme; everything is stored in the database.
modparam("presence", "notifier_processes", 0) # Caution! Under load a race condition can cause CSeq's to be reused.
modparam("presence", "timeout_rm_subs", 1)
# ----- presence_dialoginfo params -----
modparam("presence_dialoginfo", "force_single_dialog", 1) # Maybe not all phones support multiple "dialog" elements?
modparam("presence_dialoginfo", "force_dummy_dialog", 1) # Maybe not all phones support a null body?
# ----- presence_xml params -----
modparam("presence_xml", "db_url", PRESENCE_DB_URL)
modparam("presence_xml", "force_active", 1) # Skip permission/XCAP checks.
modparam("presence_xml", "force_dummy_presence", 1) # Default to a simple "open" status when presentity info is unavailable.
# ...
route[PRESENCE] {
    if (!is_method("PUBLISH|SUBSCRIBE")) {
        return;
    }
    # Create the transaction so the presence module can reply statefully.
    if (!t_newtran()) {
        sl_reply_error();
        exit;
    }
    if (is_method("PUBLISH")) {
        handle_publish();
        t_release();
    } else if (is_method("SUBSCRIBE")) {
        handle_subscribe();
        t_release();
    }
    exit;
}
Here's a somewhat sanitized example NOTIFY (the message seems OK to us; however, the Subscription-State: terminated;reason=timeout does make us wonder: do we, as the sender, know that the client is terminated/timed out?):
2022/04/05 21:09:55.209846 10.21.3.12:5060 -> 10.31.0.226:6060
NOTIFY sip:SomeUser@192.168.86.24:54639;alias=123.21.125.232~54639~1 SIP/2.0
Via: SIP/2.0/UDP presence-w.staging.internal:5060;branch=z9hG4bK43ea.648a1952000000000000000000000000.0
To: <sip:SomeOtherUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946>;tag=2607596073
From: <sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946>;tag=69309ea83adcd977af8788878e9f31b3-42e32342
CSeq: 66 NOTIFY
Call-ID: 0_2607659559@192.168.86.24
Route: <sip:10.31.0.226:6060;r2=on;lr;ftag=2607596073>, <sip:55.8.122.110;r2=on;lr;ftag=2607596073>
Content-Length: 710
Max-Forwards: 70
Event: dialog
Contact: <sip:presence-w.staging.internal:5060>
Subscription-State: terminated;reason=timeout
Content-Type: application/dialog-info+xml
<?xml version="1.0"?>
<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="66" state="full" entity="sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946">
<dialog id="0_1364146118@192.168.1.244" call-id="0_1364146118@192.168.1.244" direction="initiator">
<state>confirmed</state>
<remote>
<identity>sip:4355558565@9bfadf66-a77b-6a69-25f3-02d96d4aa945:5060</identity>
<target uri="sip:4355558565@9bfadf66-a77b-6a69-25f3-02d96d4aa946:5060"/>
</remote>
<local>
<identity>sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946:5060</identity>
<target uri="sip:SomeUser@123.130.50.202:58872"/>
</local>
</dialog>
</dialog-info>
We didn't see any functions in the presence module that we could call directly to clean things up. One thought we had was to manually run some database commands from event_route[presence:notify-reply] (or in a reply_route); a rough sketch of that idea follows below. We've noticed that once the problematic entries are manually removed from the database, we no longer attempt to send NOTIFYs to the defunct destinations.
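To make that concrete, here is a rough, untested sketch of what we had in mind, reusing the hypothetical "pres" sqlops connection from the earlier sketch. It assumes that inside event_route[presence:notify-reply] the reply status and the NOTIFY's dialog identifiers are readable via $rs, $ci, $ft and $tt, that the internally generated 408 actually reaches this route, and that the stock active_watchers columns (callid, to_tag, from_tag) apply; all of that still needs to be verified.
event_route[presence:notify-reply] {
    if ($rs == 408) {
        # The server originates the NOTIFY, so its From-tag corresponds to the
        # to_tag column and its To-tag to the from_tag column of active_watchers.
        # {s.escape.common} escapes quotes before the values are placed in the query.
        sql_query("pres", "delete from active_watchers where callid='$(ci{s.escape.common})' and to_tag='$(ft{s.escape.common})' and from_tag='$(tt{s.escape.common})'");
    }
}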
$ kamailio -v
version: kamailio 5.5.4 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 10.2.1
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
$ uname -a
Linux ip-10-21-3-12 5.10.0-13-cloud-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux