### Description
When handling a `PUBLISH` we call
[handle_publish()](https://kamailio.org/docs/modules/5.5.x/modules/presence.…
and `NOTIFY`s are sent to all of the corresponding `active_watchers` (as expected).
However, when `NOTIFY`s time out (`408`), the corresponding entries
in the `active_watchers` table are _not_ deleted as expected. Furthermore,
`NOTIFY`s are being sent to `active_watchers` that have already expired (i.e.
`expires < UNIX_TIMESTAMP()`), and running `kamcmd presence.cleanup` removes no expired
entries from the `active_watchers` table.

We suspect that all of these things are related: the common theme is that records
aren't deleted when expected.
### Troubleshooting
#### Reproduction
In our setup, we're using Kamailio as a "presence server" (via the
[presence](https://kamailio.org/docs/modules/5.5.x/modules/presence.html),
[presence_dialoginfo](https://kamailio.org/docs/modules/5.5.x/modules/presen…,
and [presence_xml](https://kamailio.org/docs/modules/5.5.x/modules/presence_xml.…
modules). We're using
[`subs_db_mode`](https://kamailio.org/docs/modules/5.5.x/modules/presence.html#presence.p.subs_db_mode)
`3` (DB-only scheme) and we have multiple Kamailio instances connected to a shared
database (MySQL `8.0.27`).

Everything seems to be working as expected. However, as stale entries accumulate in
the `active_watchers` table, we waste more and more time sending `NOTIFY`s to
black holes. We generate a _lot_ of traffic, and waiting for the timeouts to hit is
causing memory issues and backups.

Here are the relevant portions of our `kamailio.cfg` file:
```
# ----- presence params -----
modparam("presence", "db_table_lock_type", 0) # Disable locking; MySQL has issues when this is enabled.
modparam("presence", "db_update_period", -1) # Disable synchronization.
modparam("presence", "db_url", PRESENCE_DB_URL)
modparam("presence", "expires_offset", 60) # Force the client to send an UPDATE before the old PUBLISH expires.
modparam("presence", "max_expires", 1800)
modparam("presence", "min_expires", 1700)
modparam("presence", "publ_cache", 0) # Disable the PUBLISH cache since the database is shared.
modparam("presence", "server_address", "sip:$CLUSTER_DOMAIN_NAME:5060") # This becomes the value of the Contact header.
modparam("presence", "sip_uri_match", 1) # Use case insensitive URI matching.
modparam("presence", "subs_db_mode", 3) # Database-only scheme; everything is stored in the database.
modparam("presence", "notifier_processes", 0) # Caution! Under load a race condition can cause CSeqs to be reused.
modparam("presence", "timeout_rm_subs", 1)

# ----- presence_dialoginfo params -----
modparam("presence_dialoginfo", "force_single_dialog", 1) # Maybe not all phones support multiple "dialog" elements?
modparam("presence_dialoginfo", "force_dummy_dialog", 1) # Maybe not all phones support a null body?

# ----- presence_xml params -----
modparam("presence_xml", "db_url", PRESENCE_DB_URL)
modparam("presence_xml", "force_active", 1) # Skip permission/XCAP checks.
modparam("presence_xml", "force_dummy_presence", 1) # Default to a simple "open" status when presentity info is unavailable.

# ...

route[PRESENCE] {
    if (!is_method("PUBLISH|SUBSCRIBE")) {
        return;
    }

    if (!t_newtran()) {
        sl_reply_error();
        exit;
    }

    if (is_method("PUBLISH")) {
        handle_publish();
        t_release();
    } else if (is_method("SUBSCRIBE")) {
        handle_subscribe();
        t_release();
    }

    exit;
}
```
#### SIP Traffic
Here's a somewhat sanitized example. The message seems OK to us; however, the
`Subscription-State: terminated;reason=timeout` header does make us wonder: do we, as
the sender, _know_ that the client is terminated/timed out?
```
2022/04/05 21:09:55.209846 10.21.3.12:5060 -> 10.31.0.226:6060
NOTIFY sip:SomeUser@192.168.86.24:54639;alias=123.21.125.232~54639~1 SIP/2.0
Via: SIP/2.0/UDP presence-w.staging.internal:5060;branch=z9hG4bK43ea.648a1952000000000000000000000000.0
To: <sip:SomeOtherUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946>;tag=2607596073
From: <sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946>;tag=69309ea83adcd977af8788878e9f31b3-42e32342
CSeq: 66 NOTIFY
Call-ID: 0_2607659559@192.168.86.24
Route: <sip:10.31.0.226:6060;r2=on;lr;ftag=2607596073>, <sip:55.8.122.110;r2=on;lr;ftag=2607596073>
Content-Length: 710
Max-Forwards: 70
Event: dialog
Contact: <sip:presence-w.staging.internal:5060>
Subscription-State: terminated;reason=timeout
Content-Type: application/dialog-info+xml

<?xml version="1.0"?>
<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info"
             version="66" state="full"
             entity="sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946">
  <dialog id="0_1364146118@192.168.1.244"
          call-id="0_1364146118@192.168.1.244" direction="initiator">
    <state>confirmed</state>
    <remote>
      <identity>sip:4355558565@9bfadf66-a77b-6a69-25f3-02d96d4aa945:5060</identity>
      <target uri="sip:4355558565@9bfadf66-a77b-6a69-25f3-02d96d4aa946:5060"/>
    </remote>
    <local>
      <identity>sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946:5060</identity>
      <target uri="sip:SomeUser@123.130.50.202:58872"/>
    </local>
  </dialog>
</dialog-info>
```
### Possible Solutions
We didn't see any functions in the `presence` module that we could call directly to
clean things up. One thought we had was to manually run some database commands from
`event_route[presence:notify-reply]` (or in a `reply_route`). We've noticed that once
the problematic entries are manually removed from the database, we no longer attempt
to send `NOTIFY`s to the defunct destinations.
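
A rough, untested sketch of that idea follows. It assumes the `sqlops` module is loaded and that the `$subs(...)` and `$notify_reply(...)` pseudo-variables of the `presence` module are available in `event_route[presence:notify-reply]`. The connection name `presdb` and the database URL are placeholders, not part of our actual config. It also assumes that the event route is executed for a locally generated `408`, which we have not verified.

```
# Assumption: sqlops is not yet loaded in this config.
loadmodule "sqlops.so"
# Placeholder URL; it should point at the same database as PRESENCE_DB_URL.
modparam("sqlops", "sqlcon", "presdb=>mysql://user:password@dbhost/kamailio")

event_route[presence:notify-reply] {
    # $notify_reply($rs) is the status code of the reply to the NOTIFY.
    if ($notify_reply($rs) == 408) {
        xlog("L_WARN", "NOTIFY got $notify_reply($rs); deleting watcher callid=$subs(callid)\n");

        # Identify the subscription dialog by Call-ID and tags; escape the
        # values because they originate from the network.
        $var(q) = "DELETE FROM active_watchers WHERE callid='"
            + $(subs(callid){s.escape.common})
            + "' AND to_tag='" + $(subs(to_tag){s.escape.common})
            + "' AND from_tag='" + $(subs(from_tag){s.escape.common}) + "'";
        sql_query("presdb", "$var(q)");
    }
}
```

This only works around the symptom: it would not remove entries that are already expired, and we would prefer that `timeout_rm_subs` and `presence.cleanup` handle both cases.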
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.5.4 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST,
DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY,
USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR,
USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535,
DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 10.2.1
```
* **Operating System**:
```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
$ uname -a
Linux ip-10-21-3-12 5.10.0-13-cloud-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
```
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/3074