### Description
The Kamailio 5.4.x dialog profiles functionality can lead to dead-lock on certain high-load scenarios.
The Kamailio dialog profiles are used to track parallel channels for about 200 outgoing PSTN carrier interconnections. During high traffic times (like several thousands parallel calls) the Kamailio server will frequently (e.g. hourly) goes into an end-less loop while executing get_profile_size in the configuration script. This causes the locking for the dialog profiles never be released and Kamailio will stop serving traffic. Internal monitoring tools and RPC commands stay working, as long as they do not touch the dialog functionality.
A similar (dedicated) Kamailio setup is used for tracking parallel channels for customers. Here the dead-lock is not observed that frequently, but aparentely also some crashes happens in a much longer time interval.
### Troubleshooting
After analysis of the back-traces with GDB the get_profile_size() function was removed from the configuration script. After this change the crash did not happened anymore for several days.
#### Reproduction
Issue could not be reproduced so far.
#### Debugging Data
##### bt 1 (some data removed)
(gdb) bt
\# 0 0x00007f57cf3b00da in get_profile_size (profile=0x7f50ccbc7e80, value=0x7ffd9928f300) at dlg_profile.c:859
n = 364
i = 12
ph = 0x7f50d3e4b7d0
\# 1 0x00007f57cf419c67 in w_get_profile_size_helper (msg=0x7f57d699d418, profile=0x7f50ccbc7e80, value=0x7ffd9928f300, spd=0x7f57d6916960) at dialog.c:941
\# 2 0x00007f57cf41a459 in w_get_profile_size3 (msg=0x7f57d699d418, profile=0x7f50ccbc7e80, value=0x7f57d6935118, result=0x7f57d6916960) at dialog.c:982
\# 3 0x0000000000463fea in do_action (h=0x7ffd99293610, a=0x7f57d6936488, msg=0x7f57d699d418) at core/action.c:1094
\# 4 0x00000000004711ee in run_actions (h=0x7ffd99293610, a=0x7f57d6936488, msg=0x7f57d699d418) at core/action.c:1581
\# 5 0x000000000046058b in do_action (h=0x7ffd99293610, a=0x7f57d690fda8, msg=0x7f57d699d418) at core/action.c:700
The first back-trace was taking from a running process with gdb. The counter in f0 does not increased that much during this time, probably due the overflow of the loop counter.
##### bt2 (analysis with data structure with gdb scripts)
Here the loop counter in f0 showed a really high value. Expected size of dialog profiles hash table:
(gdb) p profile->entries[3]
$4 = {first = 0x7f9bfd4aad98, content = 2068}
(gdb) p profile->entries[7]
$3 = {first = 0x7f9c12079f70, content = 784}
(gdb) p profile->entries[12]
$6 = {first = 0x7f9c02be5d50, content = 7600}
(gdb) p profile->entries[14]
$2 = {first = 0x7f9bff636de8, content = 6764}
hash table bucket 14 shows a lot of corruption and the loop never ends (carrier names and IPs replaced). The list for hash bucket 7 got linked to the list for hash bucket 14:
counter 6755: prev 0x7f9c0b9dcde0 - current 0x7f9c02e5b378 - next 0x7f9c0a5f9ba0 - value carrier1-XX.XX - hash 14
counter 6756: prev 0x7f9c02e5b378 - current 0x7f9c0a5f9ba0 - next 0x7f9c0860b968 - value carrier1-XX.XX▒▒▒▒ - hash 14
counter 6757: prev 0x7f9c0a5f9ba0 - current 0x7f9c0860b968 - next 0x7f9bfe3f3a78 - value carrier1-XX.XX▒▒▒▒ - hash 14
counter 6758: prev 0x7f9c0860b968 - current 0x7f9bfe3f3a78 - next 0x7f9c10d977f0 - value carrier1-XX.XX - hash 14
counter 6759: prev 0x7f9bfe3f3a78 - current 0x7f9c10d977f0 - next 0x7f9c0ae198b0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 6760: prev 0x7f9c10d977f0 - current 0x7f9c0ae198b0 - next 0x7f9c12079f70 - value carrier3-XX.XX - hash 7
counter 6761: prev 0x7f9c0ae198b0 - current 0x7f9c12079f70 - next 0x7f9c011f2540 - value-carrier2-XX.XX▒▒▒▒ - hash 7
counter 6762: prev 0x7f9c12079f70 - current 0x7f9c011f2540 - next 0x7f9bfff886f0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 6763: prev 0x7f9c011f2540 - current 0x7f9bfff886f0 - next 0x7f9c05db00a8 - value carrier3-XX.XX= - hash 7
[...]
counter 28270: prev 0x7f9c019d06e8 - current 0x7f9bfaf18290 - next 0x7f9c12c90680 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 28271: prev 0x7f9bfaf18290 - current 0x7f9c12c90680 - next 0x7f9c086a2b58 - value-carrier2-XX.XX▒▒▒▒ - hash 7
counter 28272: prev 0x7f9c12c90680 - current 0x7f9c086a2b58 - next 0x7f9c0b4f09e8 - value carrier2-XX.XX▒▒▒▒ - hash 7
[...]
hash table bucket 7 is still consistent regarding the loop, but already shows initial sign of corruption. There is one item of the list for hash bucket 14 visible:
counter 780: prev 0x7f9c0db57ac8 - current 0x7f9c02225700 - next 0x7f9bfbf7db08 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 781: prev 0x7f9c02225700 - current 0x7f9bfbf7db08 - next 0x7f9c10d977f0 - value carrier1-XX.XX- hash 14
counter 782: prev 0x7f9bfe3f3a78 - current 0x7f9c10d977f0 - next 0x7f9c0ae198b0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 783: prev 0x7f9c10d977f0 - current 0x7f9c0ae198b0 - next 0x7f9c12079f70 - value carrier3-XX.XX - hash 7
total size of hash table is 784
#### Log Messages
No special log messages observed.
#### SIP Traffic
SIP traffic looked ok during analysis of the core dumps.
### Possible Solutions
* adding additional safe-guards for the get_profile_size function to not access data from other hash buckets
* stopping the loop counter after some threshold
* finding and fixing the source of the internal data corruption (obviously)
* refactoring the dialog modules to use another approach for storing the dialog profile information
### Additional Information
* **Kamailio version**:
Kamailio 5.4.7, compiled from git repository
* **Operating System**:
CentOS 7.9
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2923
THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.
A user has added themself to the list of users assigned to this task.
FS#100 - Assignment operators don't work
User who did this - Alex Hermann (axlh)
http://sip-router.org/tracker/index.php?do=details&task_id=100
You are receiving this message because you have requested it from the Flyspray bugtracking system. If you did not expect this message or don't want to receive mails in future, you can change your notification settings at the URL shown above.
Changes to example files for PCSCF in misc/examples/ims to make funcitional with current stable version.
<!--
IMPORTANT:
- for detailed contributing guidelines, read:
https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
- pull requests must be done to master branch, unless they are backports
of fixes from master branch to a stable branch
- backports to stable branches must be done with 'git cherry-pick -x ...'
- code is contributed under BSD for core and main components (tm, sl, auth, tls)
- code is contributed GPLv2 or a compatible license for the other components
- GPL code is contributed with OpenSSL licensing exception
-->
#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on sr-dev mailing list -->
- [x] Commit message has the format required by CONTRIBUTING guide
- [x] Commits are split per component (core, individual modules, libs, utils, ...)
- [x] Each component has a single commit (if not, squash them into one commit)
- [x] No commits to README files for modules (changes must be done to docbook files
in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change
- [x] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [x] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)
#### Description
Changes to example files for PCSCF in misc/examples/ims to make funcitional with current stable version.
Changes:
- Loading IPsec module prior to IMS Usrloc PCSCF (Now required)
- removed modparam("ims_usrloc_pcscf", "hashing_type", 2) from example (This parameter was removed some time ago)
- Fix to formatting of single MySQL connection to work in current version
- Bind to any IP by default
- Dispatcher parameters only loaded if required
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/2203
-- Commit Summary --
* misc: examples: IMS PCSCF kamailio.cfg update
* misc: examples: IMS PCSCF pcscf.cfg update
-- File Changes --
M misc/examples/ims/pcscf/kamailio.cfg (6)
M misc/examples/ims/pcscf/pcscf.cfg.sample (12)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/2203.patchhttps://github.com/kamailio/kamailio/pull/2203.diff
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/pull/2203
<!--
Kamailio Project uses GitHub Issues only for bugs in the code or feature requests. Please use this template only for bug reports.
If you have questions about using Kamailio or related to its configuration file, ask on sr-users mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
If you have questions about developing extensions to Kamailio or its existing C code, ask on sr-dev mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Please try to fill this template as much as possible for any issue. It helps the developers to troubleshoot the issue.
If there is no content to be filled in a section, the entire section can be removed.
You can delete the comments from the template sections when filling.
You can delete next line and everything above before submitting (it is a comment).
-->
### Description
Hi, I ran into an issue where the Kamailio service seemingly froze (didn't crash, but failed to receive and deliver calls) while running, I went through the logs a little bit, and saw the same log pattern (pasted below) recouring every few months or so, on an older installation (5.2.0). Usually, it ended up automatically restarting the service. But not this time, I had to reload it manually to make the service work again.
Is this a known issue?
How can I make sure that the service will be able to restart the next time it happens?
Edward
### Troubleshooting
#### Debugging Data
<!--
If you got a core dump, use gdb to extract troubleshooting data - full backtrace,
local variables and the list of the code at the issue location.
gdb /path/to/kamailio /path/to/
bt full
info locals
list
If you are familiar with gdb, feel free to attach more of what you consider to
be relevant.
-->
```
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/kamailio...done.
warning: exec file is newer than core file.
[New LWP 11413]
Core was generated by `kamailio'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f4e4a52e428 in ?? ()
(gdb) Quit
(gdb) bt full
#0 0x00007f4e4a52e428 in ?? ()
No symbol table info available.
#1 0x00007f4e4a53002a in ?? ()
No symbol table info available.
#2 0x0000000000000020 in ?? ()
No symbol table info available.
#3 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) info locals
No symbol table info available.
(gdb) list
1871 & now if we don't need it */
1872 #ifdef USE_SLOW_TIMER
1873 + 1 /* slow timer process */
1874 #endif
1875 #ifdef USE_TCP
1876 +((!tcp_disable)?( 1/* tcp main */ + tcp_listeners ):0)
1877 #endif
1878 #ifdef USE_SCTP
1879 +((!sctp_disable)?sctp_listeners:0)
1880 #endif
(gdb)
```
#### Log Messages
<!--
Check the syslog file and if there are relevant log messages printed by Kamailio, add them next, or attach to issue, or provide a link to download them (e.g., to a pastebin site).
-->
```
Jun 28 16:35:22 kamprodegres kamailio: INFO: <core> [core/sctp_core.c:74]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
Jun 28 16:35:22 kamprodegres kamailio: WARNING: <core> [core/socket_info.c:1394]: fix_hostname(): could not rev. resolve 70.36.25.87
Jun 28 16:35:22 kamprodegres kamailio: WARNING: <core> [core/socket_info.c:1394]: fix_hostname(): could not rev. resolve 70.36.25.87
Jun 28 16:35:22 kamprodegres kamailio: INFO: <core> [core/tcp_main.c:5042]: init_tcp(): using epoll_lt as the io watch method (auto detected)
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: jsonrpcs [jsonrpcs_sock.c:197]: jsonrpc_dgram_mod_init(): the socket /var/run/kamailio/kamailio_rpc.sock already exists, trying to delete it...
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: rr [../outbound/api.h:52]: ob_load_api(): unable to import bind_ob - maybe module is not loaded
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: rr [rr_mod.c:177]: mod_init(): outbound module not available
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: <core> [main.c:2779]: main(): processes (at least): 40 - shm size: 67108864 - pkg size: 8388608
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: <core> [core/udp_server.c:154]: probe_max_receive_buffer(): SO_RCVBUF is initially 212992
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: <core> [core/udp_server.c:206]: probe_max_receive_buffer(): SO_RCVBUF is finally 425984
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: <core> [core/udp_server.c:154]: probe_max_receive_buffer(): SO_RCVBUF is initially 212992
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: <core> [core/udp_server.c:206]: probe_max_receive_buffer(): SO_RCVBUF is finally 425984
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: <core> [core/udp_server.c:154]: probe_max_receive_buffer(): SO_RCVBUF is initially 212992
Jun 28 16:35:22 kamprodegres kamailio[8489]: INFO: <core> [core/udp_server.c:206]: probe_max_receive_buffer(): SO_RCVBUF is finally 425984
Jun 28 16:35:22 kamprodegres kamailio[8518]: INFO: jsonrpcs [jsonrpcs_sock.c:443]: jsonrpc_dgram_process(): a new child 0/8518
Jun 28 16:35:22 kamprodegres kamailio[8519]: INFO: ctl [io_listener.c:214]: io_listen_loop(): io_listen_loop: using epoll_lt io watch method (config)
Jun 29 18:13:37 kamprodegres kamailio[8513]: INFO: {1 581461 CANCEL 402954192_49936786(a)67.231.13.146} tm [t_reply.c:478]: _reply_light(): can't generate 487 reply when a final 404 was sent out
Jun 29 18:27:00 kamprodegres kamailio[8508]: INFO: {1 211236 CANCEL 405019408_127896329(a)67.231.13.146} tm [t_reply.c:478]: _reply_light(): can't generate 487 reply when a final 404 was sent out
Jun 29 18:51:10 kamprodegres /usr/local/sbin/kamailio[2269]: NOTICE: <core> [main.c:725]: handle_sigs(): Thank you for flying kamailio!!!
Jun 29 18:51:10 kamprodegres /usr/local/sbin/kamailio[2326]: INFO: <core> [main.c:847]: sig_usr(): signal 15 received
```
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.3.2 (x86_64/linux) 7ba545
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 7ba545
compiled on 16:37:32 Jun 14 2020 with gcc 5.4.0
```
* **Operating System**:
<!--
Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04, CentOS 7.1, ...), MacOS, xBSD, Solaris, ...;
Kernel details (output of `uname -a`)
-->
```
Ubuntu 16.04
```
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2380
version: kamailio 5.5.2 (arm6/linux) 55e232
raspberry pi 3b
I initially stated that I was running on 5.5.3 but I was mistaken. However I checked the 5.5.3 source and it appears to be the same.
I think that there is a bug in the tm module with respect to the "tm:local-response" event route. In the "t_reply.c" file, there is a static variable called '_tm_local_response_set_lookup'. This variable is initialized at load time to zero. It is checked in the '_reply_light()' routine and will initiate a local callback from the config script if "armed". The problem as I see it is that if the callback is readied, the variable is set to one. But it is never reset.
So the observed behavior is as follows. A REGISTER is received and the request_route arms the callback. The REGISTER requires an authorization (local database sqlite). After the 401 is sent back, the callback via the event route is called as expected. All good except that the event route script fragment is never executed again after the first call... ever, even if it's a new REGISTER request.
I would think that somewhere in the tm module, the variable should be reset to zero so that subsequent transactions can initiate the event route again. Or maybe the variable ought to live somewhere in the transaction cell???
. . .
t_on_reply ("MY_FRAG");
t_on_failure ("MY_FRAG");
. . .
event_route [tm:local-response] {
xlog ("L_NOTICE", IN tm:local-response\n");
my_function();
}
. . .
So 'my_function()' is only called once.
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/3064
You are receiving this because you are subscribed to this thread.
Message ID: <kamailio/kamailio/issues/3064(a)github.com>