Description

We are plotting the call success rate on each of our proxies.
Using the Kamailio snmpstats module, we try to obtain three counters in order to plot them on a graph:

Most of the time, everything works well and we obtain sensible values:
snmpwalk -c mycommunity -v 2c W.X.Y.Z 1.3.6.1.4.1.34352.3.1.3.1.3.2 -O n

.1.3.6.1.4.1.34352.3.1.3.1.3.2.1.0 = Gauge32: 309 -> Total Calls
.1.3.6.1.4.1.34352.3.1.3.1.3.2.2.0 = Gauge32: 95 -> Active Calls
.1.3.6.1.4.1.34352.3.1.3.1.3.2.3.0 = Gauge32: 214 -> Error calls

However, when our proxies are facing intense peaks of failed calls in a short timespan, sometimes, the "Error Calls" counter returns a value higher than "Total Calls". As a result, "Active Calls" suddenly reaches a value close to the maximum of a Gauge32 (because ${TotalCalls} - {ErrorCalls} < 0$).

snmpwalk -c mycommunity -v 2c W.X.Y.Z 1.3.6.1.4.1.34352.3.1.3.1.3.2 -O n

.1.3.6.1.4.1.34352.3.1.3.1.3.2.1.0 = Gauge32: 153 -> Total Calls
.1.3.6.1.4.1.34352.3.1.3.1.3.2.2.0 = Gauge32: 4294967226 -> Active Calls
.1.3.6.1.4.1.34352.3.1.3.1.3.2.3.0 = Gauge32: 223 -> Error calls

As a result, our graphs make no more sense.

Troubleshooting

There seems to be a bug causing Total calls to be lower than Error calls when a high number of failed calls are received in a short time window.
Moreover, there is an integer overflow resulting to the negative result of the substraction being converted to a Gauge32. Maybe setting it to 0 in case of a negative result would be safer ?

Reproduction

Using the following configuration for the snmpstats module of your Kamailio proxy:

# -------- snmpstats params -------
modparam("snmpstats", "snmpgetPath", "/usr/bin/")
modparam("snmpstats", "snmpCommunity", "mycommunity")
modparam("snmpstats", "sipEntityType", "proxyServer")

On intense peaks of failed calls, get the call success metrics using snmpwalk :

snmpwalk -c mycommunity -v 2c W.X.Y.Z 1.3.6.1.4.1.34352.3.1.3.1.3.2 -O n

.1.3.6.1.4.1.34352.3.1.3.1.3.2.1.0 = Gauge32: 153 -> Total Calls
.1.3.6.1.4.1.34352.3.1.3.1.3.2.2.0 = Gauge32: 4294967226 -> Active Calls
.1.3.6.1.4.1.34352.3.1.3.1.3.2.3.0 = Gauge32: 223-> Error calls

Additional Information

Kamailio Version:
kamailio -v

version: kamailio 5.6.4 (x86_64/linux) a004cf
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_ MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_S R_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, U SE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SI ZE 65535, DEFAULT PKG_SIZE 64MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: a004cf
compiled on 16:10:41 May 22 2023 with gcc 4.4.7

uname -r

2.6.32-279.el6.x86_64

lsb_release -a

LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.3 (Santiago)
Release: 6.3
Codename: Santiago


Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <kamailio/kamailio/issues/4006@github.com>