### Description
The Kamailio 5.4.x dialog profiles functionality can dead-lock in certain high-load scenarios.
The Kamailio dialog profiles are used to track parallel channels for about 200 outgoing PSTN carrier interconnections. During high-traffic periods (several thousand parallel calls), the Kamailio server frequently (e.g. hourly) goes into an endless loop while executing get_profile_size in the configuration script. As a result, the lock for the dialog profiles is never released and Kamailio stops serving traffic. Internal monitoring tools and RPC commands keep working as long as they do not touch the dialog functionality.
A similar (dedicated) Kamailio setup is used for tracking parallel channels for customers. There the dead-lock is observed less frequently, but some crashes apparently also happen at much longer intervals.
### Troubleshooting
After analysis of the back-traces with GDB, the get_profile_size() function was removed from the configuration script. After this change the crash did not happen again for several days.
#### Reproduction
Issue could not be reproduced so far.
#### Debugging Data
##### bt 1 (some data removed)
(gdb) bt
\# 0 0x00007f57cf3b00da in get_profile_size (profile=0x7f50ccbc7e80, value=0x7ffd9928f300) at dlg_profile.c:859
n = 364
i = 12
ph = 0x7f50d3e4b7d0
\# 1 0x00007f57cf419c67 in w_get_profile_size_helper (msg=0x7f57d699d418, profile=0x7f50ccbc7e80, value=0x7ffd9928f300, spd=0x7f57d6916960) at dialog.c:941
\# 2 0x00007f57cf41a459 in w_get_profile_size3 (msg=0x7f57d699d418, profile=0x7f50ccbc7e80, value=0x7f57d6935118, result=0x7f57d6916960) at dialog.c:982
\# 3 0x0000000000463fea in do_action (h=0x7ffd99293610, a=0x7f57d6936488, msg=0x7f57d699d418) at core/action.c:1094
\# 4 0x00000000004711ee in run_actions (h=0x7ffd99293610, a=0x7f57d6936488, msg=0x7f57d699d418) at core/action.c:1581
\# 5 0x000000000046058b in do_action (h=0x7ffd99293610, a=0x7f57d690fda8, msg=0x7f57d699d418) at core/action.c:700
The first back-trace was taken from a running process with gdb. The counter in f0 did not increase much during this time, probably due to an overflow of the loop counter.
##### bt2 (analysis with data structure with gdb scripts)
Here the loop counter in f0 showed a really high value. Expected size of dialog profiles hash table:
(gdb) p profile->entries[3]
$4 = {first = 0x7f9bfd4aad98, content = 2068}
(gdb) p profile->entries[7]
$3 = {first = 0x7f9c12079f70, content = 784}
(gdb) p profile->entries[12]
$6 = {first = 0x7f9c02be5d50, content = 7600}
(gdb) p profile->entries[14]
$2 = {first = 0x7f9bff636de8, content = 6764}
Hash table bucket 14 shows a lot of corruption and the loop never ends (carrier names and IPs replaced). The list for hash bucket 7 got linked into the list for hash bucket 14:
counter 6755: prev 0x7f9c0b9dcde0 - current 0x7f9c02e5b378 - next 0x7f9c0a5f9ba0 - value carrier1-XX.XX - hash 14
counter 6756: prev 0x7f9c02e5b378 - current 0x7f9c0a5f9ba0 - next 0x7f9c0860b968 - value carrier1-XX.XX▒▒▒▒ - hash 14
counter 6757: prev 0x7f9c0a5f9ba0 - current 0x7f9c0860b968 - next 0x7f9bfe3f3a78 - value carrier1-XX.XX▒▒▒▒ - hash 14
counter 6758: prev 0x7f9c0860b968 - current 0x7f9bfe3f3a78 - next 0x7f9c10d977f0 - value carrier1-XX.XX - hash 14
counter 6759: prev 0x7f9bfe3f3a78 - current 0x7f9c10d977f0 - next 0x7f9c0ae198b0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 6760: prev 0x7f9c10d977f0 - current 0x7f9c0ae198b0 - next 0x7f9c12079f70 - value carrier3-XX.XX - hash 7
counter 6761: prev 0x7f9c0ae198b0 - current 0x7f9c12079f70 - next 0x7f9c011f2540 - value-carrier2-XX.XX▒▒▒▒ - hash 7
counter 6762: prev 0x7f9c12079f70 - current 0x7f9c011f2540 - next 0x7f9bfff886f0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 6763: prev 0x7f9c011f2540 - current 0x7f9bfff886f0 - next 0x7f9c05db00a8 - value carrier3-XX.XX= - hash 7
[...]
counter 28270: prev 0x7f9c019d06e8 - current 0x7f9bfaf18290 - next 0x7f9c12c90680 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 28271: prev 0x7f9bfaf18290 - current 0x7f9c12c90680 - next 0x7f9c086a2b58 - value-carrier2-XX.XX▒▒▒▒ - hash 7
counter 28272: prev 0x7f9c12c90680 - current 0x7f9c086a2b58 - next 0x7f9c0b4f09e8 - value carrier2-XX.XX▒▒▒▒ - hash 7
[...]
Hash table bucket 7 is still consistent regarding the loop, but already shows initial signs of corruption. One item from the list for hash bucket 14 is visible:
counter 780: prev 0x7f9c0db57ac8 - current 0x7f9c02225700 - next 0x7f9bfbf7db08 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 781: prev 0x7f9c02225700 - current 0x7f9bfbf7db08 - next 0x7f9c10d977f0 - value carrier1-XX.XX- hash 14
counter 782: prev 0x7f9bfe3f3a78 - current 0x7f9c10d977f0 - next 0x7f9c0ae198b0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 783: prev 0x7f9c10d977f0 - current 0x7f9c0ae198b0 - next 0x7f9c12079f70 - value carrier3-XX.XX - hash 7
total size of hash table is 784
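The break between counters 781 and 782 (the `prev` pointer of entry 782 does not point back at entry 781, and entry 781 carries hash 14 inside the bucket 7 list) is the kind of invariant violation a list walker can flag automatically. The following is a minimal sketch in Python that models the ring with a hypothetical `Node` class; it is not the module's C data structure, and the names `Node`, `link_ring` and `check_bucket` are assumptions made for illustration only.

```python
class Node:
    """Hypothetical stand-in for one entry of a circular doubly linked bucket list."""

    def __init__(self, value, bucket):
        self.value = value
        self.bucket = bucket
        self.prev = self
        self.next = self


def link_ring(nodes):
    """Link the given nodes into a consistent ring and return its head."""
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % len(nodes)]
        node.prev = nodes[i - 1]
    return nodes[0]


def check_bucket(head, bucket, expected_len):
    """Walk exactly expected_len entries and collect (counter, problem) pairs."""
    problems = []
    node = head
    for counter in range(expected_len):
        if node.bucket != bucket:
            # Entry belongs to another bucket's list.
            problems.append((counter, "foreign bucket %d" % node.bucket))
        if node.next.prev is not node:
            # Forward and backward links disagree.
            problems.append((counter, "next.prev mismatch"))
        node = node.next
    if node is not head:
        # After expected_len steps a healthy ring is back at its head.
        problems.append((expected_len, "ring does not close at head"))
    return problems
```

On a healthy ring `check_bucket` returns an empty list. Bounding the walk by the expected length keeps the check itself from looping forever on a corrupted list.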
#### Log Messages
No special log messages observed.
#### SIP Traffic
SIP traffic looked ok during analysis of the core dumps.
### Possible Solutions
* adding additional safe-guards for the get_profile_size function to not access data from other hash buckets
* stopping the loop counter after some threshold
* finding and fixing the source of the internal data corruption (obviously)
* refactoring the dialog modules to use another approach for storing the dialog profile information
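
The first two safeguards listed above can be sketched together. This is a Python model under stated assumptions, not a patch for the C module: `Entry`, `make_ring` and `guarded_profile_size` are hypothetical names, and the sketch assumes each entry records the bucket it was inserted into and that the bucket's `content` counter can serve as the loop threshold.

```python
class Entry:
    """Hypothetical profile entry in a circular singly linked bucket list."""

    def __init__(self, value, bucket):
        self.value = value
        self.bucket = bucket
        self.next = self


def make_ring(values, bucket):
    """Build a consistent ring for one bucket and return its first entry."""
    nodes = [Entry(v, bucket) for v in values]
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % len(nodes)]
    return nodes[0]


def guarded_profile_size(first, bucket, value, content):
    """Count entries equal to value; return (count, ok).

    ok is False when the walk was aborted because it left the bucket
    or ran past the recorded number of entries (content).
    """
    count = 0
    steps = 0
    node = first
    while node is not None:
        if steps >= content or node.bucket != bucket:
            return count, False  # abort instead of looping forever
        if node.value == value:
            count += 1
        steps += 1
        node = node.next
        if node is first:
            return count, True  # ring closed normally
    return count, True  # empty bucket or open-ended list
```

Returning a flag next to the count lets the caller log the corruption and release the lock, while still leaving the root cause to be found separately.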
### Additional Information
* **Kamailio version**:
Kamailio 5.4.7, compiled from git repository
* **Operating System**:
CentOS 7.9
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2923
Module: kamailio
Branch: master
Commit: e792de60d24386cdd3816b67d4778f4eba33b0f0
URL: https://github.com/kamailio/kamailio/commit/e792de60d24386cdd3816b67d4778f4…
Author: Kamailio Dev <kamailio.dev(a)kamailio.org>
Committer: Kamailio Dev <kamailio.dev(a)kamailio.org>
Date: 2022-03-30T08:46:20+02:00
modules: readme files regenerated - evrexec ... [skip ci]
---
Modified: src/modules/evrexec/README
---
Diff: https://github.com/kamailio/kamailio/commit/e792de60d24386cdd3816b67d4778f4…
Patch: https://github.com/kamailio/kamailio/commit/e792de60d24386cdd3816b67d4778f4…
---
diff --git a/src/modules/evrexec/README b/src/modules/evrexec/README
index f7363c4c43..4d80d4c006 100644
--- a/src/modules/evrexec/README
+++ b/src/modules/evrexec/README
@@ -57,11 +57,12 @@ Chapter 1. Admin Guide
1. Overview
The module executes event route blocks or KEMI functions on dedicated
- processes at startup. The execution can be delayed for a specified
- interval of time.
+ processes at startup, upon an RPC command or data received on a custom
+ UDP socket.
- The actions in the event route should be a loop or other tasks that run
- forever.
+ For startup event route, the execution can be delayed for a specified
+ interval of time. The actions in the event route should be a loop or
+ other tasks that run forever.
2. Dependencies
@@ -87,7 +88,8 @@ Chapter 1. Admin Guide
The definition of an exec task. The value of the parameter must have
the following format:
- * "name=_string_;wait=_number_;workers=_number_"
+ * "name=_string_;wait=_number_;workers=_number_;sockaddr=_udp_socket_
+ "
The parameter can be set multiple times to get more exec tasks in same
configuration file.
@@ -97,15 +99,20 @@ Chapter 1. Admin Guide
will retrieve the index of the works in string format.
* workers - if set to 0 or 1 the task is executed in a dedicated
process. Any number > 1 will create more dedicated processes, each
- of them executing the task.
+ of them executing the startup task. For UDP data execution (when
+ 'sockaddr' is set), only 1 worker process is created.
* wait - timer interval in micro-seconds to wait inside the dedicated
process before executing the task.
+ * sockaddr - full UDP socket address in format 'udp:ip:port'
+ (example: 'udp:127.0.0.1:54321').
Default value is NULL.
Example 1.1. Set exec parameter
...
modparam("evrexec", "exec", "name=evrexec:timer;wait=1000;workers=1;")
+modparam("evrexec", "exec", "name=evrexec:udp;sockaddr=udp:127.0.0.1:4444;worker
+s=1;")
...
event_route[evrexec:timer] {
$var(x) = 0;
@@ -115,6 +122,11 @@ event_route[evrexec:timer] {
sleep("600");
}
}
+
+event_route[evrexec:udp] {
+ xinfo("udp socket data: [$evr(data)]\n");
+}
+
...
4. RPC Commands
Module: kamailio
Branch: master
Commit: a074608ca41e70b21de27e050869883e70e13033
URL: https://github.com/kamailio/kamailio/commit/a074608ca41e70b21de27e050869883…
Author: Daniel-Constantin Mierla <miconda(a)gmail.com>
Committer: Daniel-Constantin Mierla <miconda(a)gmail.com>
Date: 2022-03-30T08:40:45+02:00
evrexec: docs for sockaddr attribute
---
Modified: src/modules/evrexec/doc/evrexec_admin.xml
---
Diff: https://github.com/kamailio/kamailio/commit/a074608ca41e70b21de27e050869883…
Patch: https://github.com/kamailio/kamailio/commit/a074608ca41e70b21de27e050869883…
---
diff --git a/src/modules/evrexec/doc/evrexec_admin.xml b/src/modules/evrexec/doc/evrexec_admin.xml
index bcd6af5f49..565b33f560 100644
--- a/src/modules/evrexec/doc/evrexec_admin.xml
+++ b/src/modules/evrexec/doc/evrexec_admin.xml
@@ -17,12 +17,13 @@
<title>Overview</title>
<para>
The module executes event route blocks or KEMI functions on dedicated
- processes at startup. The execution can be delayed for a specified
- interval of time.
+ processes at startup, upon an RPC command or data received on a custom
+ UDP socket.
</para>
<para>
- The actions in the event route should be a loop or other tasks that
- run forever.
+ For startup event route, the execution can be delayed for a specified
+ interval of time. The actions in the event route should be a loop or
+ other tasks that run forever.
</para>
</section>
<section>
@@ -66,7 +67,7 @@
<itemizedlist>
<listitem>
<para>
- "name=_string_;wait=_number_;workers=_number_"
+ "name=_string_;wait=_number_;workers=_number_;sockaddr=_udp_socket_"
</para>
</listitem>
</itemizedlist>
@@ -88,7 +89,8 @@
<para>
<emphasis>workers</emphasis> - if set to 0 or 1 the task is executed
in a dedicated process. Any number > 1 will create more dedicated
- processes, each of them executing the task.
+ processes, each of them executing the startup task. For UDP data
+ execution (when 'sockaddr' is set), only 1 worker process is created.
</para>
</listitem>
<listitem>
@@ -97,6 +99,13 @@
inside the dedicated process before executing the task.
</para>
</listitem>
+ <listitem>
+ <para>
+ <emphasis>sockaddr</emphasis> - full UDP socket address in format
+ 'udp:ip:port' (example: 'udp:127.0.0.1:54321').
+ </para>
+ </listitem>
+
</itemizedlist>
<para>
<emphasis>
@@ -108,6 +117,7 @@
<programlisting format="linespecific">
...
modparam("evrexec", "exec", "name=evrexec:timer;wait=1000;workers=1;")
+modparam("evrexec", "exec", "name=evrexec:udp;sockaddr=udp:127.0.0.1:4444;workers=1;")
...
event_route[evrexec:timer] {
$var(x) = 0;
@@ -117,6 +127,11 @@ event_route[evrexec:timer] {
sleep("600");
}
}
+
+event_route[evrexec:udp] {
+ xinfo("udp socket data: [$evr(data)]\n");
+}
+
...
</programlisting>
</example>
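
The `sockaddr` task documented in the two commits above can be exercised with any UDP client. A minimal sketch in Python, assuming Kamailio runs locally with the example `modparam` line (`udp:127.0.0.1:4444`); the helper name `send_evrexec_data` is hypothetical and not part of Kamailio.

```python
import socket


def send_evrexec_data(payload, host="127.0.0.1", port=4444):
    """Send one UDP datagram and return the number of bytes sent.

    host and port default to the address used in the documentation example.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

Calling `send_evrexec_data(b"hello")` should make the `event_route[evrexec:udp]` block from the example run, with the payload available in `$evr(data)`. UDP gives no delivery confirmation, so the log output of `xinfo` is the place to verify that the data arrived.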
### Description
Hello guys, I am having an issue where Kamailio receives requests from SIPp (we are currently testing the platform and doing QA), but with more than 800 concurrent calls Kamailio starts failing and slowing down requests. The scenario is the following:
I am sending 5000 calls at 100 CPS over TCP, and I have this route in the configuration file:
```
# Handle the calls to the API
route[CALL_API] {
    xlog("L_NOTICE", " Call request $var(call_request) method: $rm \n");
    $var(loop_true) = 1;
    # Loop until $var(loop_true) is cleared (presumably from within the Lua function)
    while ($var(loop_true)) {
        if (!lua_run("call_request", "$var(call_request)", "$sht(token=>new_token)")) {
            xlog("L_NOTICE", "SCRIPT: failed to execute lua function!\n");
        }
        if ($var(loop_true)) {
            # sleep() blocks the worker process handling this request for 1 second
            sleep("1");
        }
    }
    xlog("L_NOTICE", "SCRIPT: Success to execute lua function!\n");
}
```
So I am executing a Lua script for the requests, and the API response tells the destination of the call.
At 800 concurrent calls, I notice slowness in the SIPp testing tool, and then I start receiving 408 timeouts.
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.3.8 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 8.3.0
```
* **Operating System**:
```
BUT KAMAILIO IS RUNNING IN A DOCKER CONTAINER
Linux ip-10-10-0-27.us-west-1.compute.internal 4.14.246-187.474.amzn2.x86_64 #1 SMP Tue Sep 7 21:48:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```
This looks like a load issue: I believe extra latency is added every time we hit the backend and wait for the response.
Any help will be appreciated.
Regards
Gio
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/3068
Message ID: <kamailio/kamailio/issues/3068(a)github.com>
- added vs_certsubject_pvname and vs_pptgrants_pvname config params
- adjusted log level of load/unload events
#### Pre-Submission Checklist
- [ ] Commit message has the format required by CONTRIBUTING guide
- [ ] Commits are split per component (core, individual modules, libs, utils, ...)
- [ ] Each component has a single commit (if not, squash them into one commit)
- [ ] No commits to README files for modules (changes must be done to docbook files
in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change
- [ ] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist:
- [ ] PR should be backported to stable branches
- [ ] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)
#### Description
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/3063
-- Commit Summary --
* stirshaken: Add PVs to allow access to x509 subject and ppt grants
-- File Changes --
M src/modules/stirshaken/doc/stirshaken_admin.xml (41)
M src/modules/stirshaken/stirshaken_mod.c (72)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/3063.patch
https://github.com/kamailio/kamailio/pull/3063.diff
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/pull/3063
Message ID: <kamailio/kamailio/pull/3063(a)github.com>