<!--
Kamailio Project uses GitHub Issues only for bugs in the code or feature requests. Please use this template only for bug reports.
If you have questions about using Kamailio or related to its configuration file, ask on sr-users mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
If you have questions about developing extensions to Kamailio or its existing C code, ask on sr-dev mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev…
[View More]Please try to fill this template as much as possible for any issue. It helps the developers to troubleshoot the issue.
If there is no content to be filled in a section, the entire section can be removed.
You can delete the comments from the template sections when filling.
You can delete next line and everything above before submitting (it is a comment).
-->
### Description
Kamailio is configured to use the DMQ replication for dialog.
When running a load test Kamailio DMQ workers eventually becomes unresponsive and stops processing DMQ messages. As a side effect the shm memory is fully consumed then SIP processing stops happening properly and everything else stops working due to that.
### Troubleshooting
Investigation showed that the DMQ workers are locked in the dmq_send_all_dlgs.
Debugging and studying the function it was identified that it is prone to data inconsistency because it uses a copy of the (d_table->entries)[index] and not a pointer to it. This gives a small window where a change in the d_table for that entry can occur and changing the lock information before the copy can actually get the lock making the process to get stuck in the lock get.
There is a need for some expert to validate this conclusion.
#### Reproduction
Run a simple load test making calls at a rate of ~5 cps and keep around ~2000 calls connected all the time. A higher cps seems to make it easier to reproduce the problem.
Adding network degradation to the environment makes the problem happens, but when running a tool such as SIPp for the load test, the retransmission can be forced by simply killing the SIPp instance receiving calls which will then force Kamailio to retransmit.
#### Debugging Data
```
7236 # DMQ WORKER
#0 0x00007fa7bd760bf9 in syscall () from /lib64/libc.so.6
#1 0x00007fa793395484 in futex_get (lock=0x7ffd072acaa4) at ../../core/futexlock.h:121
#2 0x00007fa7933a123b in dmq_send_all_dlgs (dmq_node=0x0) at dlg_dmq.c:657
(gdb) p entry
$1 = {first = 0x7fa79f8540d0, last = 0x7fa79f8540d0, next_id = 86622, lock = {val = 2}, locker_pid = {val = 7211}, rec_lock_level = 0}
(gdb) p &entry->lock
$2 = (gen_lock_t *) 0x7ffd072acaa4
(gdb) p index
$3 = 187
(gdb) p (d_table->entries)[index]
$1 = {first = 0x0, last = 0x0, next_id = 86652, lock = {val = 0}, locker_pid = {val = 0}, rec_lock_level = 0}
(gdb) p entry
$2 = {first = 0x7fa79f8540d0, last = 0x7fa79f8540d0, next_id = 86622, lock = {val = 2}, locker_pid = {val = 7211}, rec_lock_level = 0}
#3 0x00007fa79339cfa3 in dlg_dmq_handle_msg (msg=0x7fa79fca15c8, resp=0x7ffd072aceb0, node=0x7fa79ee93cb8) at dlg_dmq.c:391
#4 0x00007fa793f3bb98 in worker_loop (id=0) at worker.c:113
#5 0x00007fa793f39314 in child_init (rank=0) at dmq.c:296
#6 0x00000000005990e8 in init_mod_child (m=0x7fa7bc735d98, rank=0) at core/sr_module.c:780
#7 0x0000000000598d5d in init_mod_child (m=0x7fa7bc736b00, rank=0) at core/sr_module.c:776
#8 0x0000000000598d5d in init_mod_child (m=0x7fa7bc737038, rank=0) at core/sr_module.c:776
#9 0x0000000000598d5d in init_mod_child (m=0x7fa7bc738eb8, rank=0) at core/sr_module.c:776
#10 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73a0e0, rank=0) at core/sr_module.c:776
#11 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73ab70, rank=0) at core/sr_module.c:776
#12 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73d8e0, rank=0) at core/sr_module.c:776
#13 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73e720, rank=0) at core/sr_module.c:776
#14 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73f1b8, rank=0) at core/sr_module.c:776
#15 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73fd08, rank=0) at core/sr_module.c:776
#16 0x0000000000598d5d in init_mod_child (m=0x7fa7bc740128, rank=0) at core/sr_module.c:776
#17 0x0000000000598d5d in init_mod_child (m=0x7fa7bc7409d0, rank=0) at core/sr_module.c:776
#18 0x0000000000598d5d in init_mod_child (m=0x7fa7bc7414d8, rank=0) at core/sr_module.c:776
#19 0x0000000000598d5d in init_mod_child (m=0x7fa7bc742cd8, rank=0) at core/sr_module.c:776
#20 0x0000000000598d5d in init_mod_child (m=0x7fa7bc742fc8, rank=0) at core/sr_module.c:776
#21 0x0000000000598d5d in init_mod_child (m=0x7fa7bc744ed0, rank=0) at core/sr_module.c:776
#22 0x0000000000598d5d in init_mod_child (m=0x7fa7bc745920, rank=0) at core/sr_module.c:776
#23 0x0000000000598d5d in init_mod_child (m=0x7fa7bc748d60, rank=0) at core/sr_module.c:776
#24 0x0000000000598d5d in init_mod_child (m=0x7fa7bc7498f8, rank=0) at core/sr_module.c:776
#25 0x0000000000599887 in init_child (rank=0) at core/sr_module.c:825
#26 0x000000000042ab2d in main_loop () at main.c:1763
#27 0x0000000000433a96 in main (argc=12, argv=0x7ffd072aeb48) at main.c:2856
7237 # DMQ WORKER
#0 0x00007fa7bd760bf9 in syscall () from /lib64/libc.so.6
#1 0x00007fa793395484 in futex_get (lock=0x7ffd072acaa4) at ../../core/futexlock.h:121
#2 0x00007fa7933a123b in dmq_send_all_dlgs (dmq_node=0x0) at dlg_dmq.c:657
(gdb) p entry
$1 = {first = 0x7fa79f38bdb8, last = 0x7fa79f38bdb8, next_id = 15672, lock = {val = 2}, locker_pid = {val = 7221}, rec_lock_level = 1}
(gdb) p &entry->lock
$2 = (gen_lock_t *) 0x7ffd072acaa4
(gdb) p index
$1 = 402
(gdb) p (d_table->entries)[index]
$1 = {first = 0x0, last = 0x0, next_id = 15702, lock = {val = 0}, locker_pid = {val = 0}, rec_lock_level = 0}
(gdb) p entry
$4 = {first = 0x7fa79f38bdb8, last = 0x7fa79f38bdb8, next_id = 15672, lock = {val = 2}, locker_pid = {val = 7221}, rec_lock_level = 1}
(gdb) p &(d_table->entries)[index]->lock
$5 = (gen_lock_t *) 0x7fa79ec44e2c
(gdb) p &entry->lock
$6 = (gen_lock_t *) 0x7ffd072acaa4
#3 0x00007fa79339cfa3 in dlg_dmq_handle_msg (msg=0x7fa79f9e4d90, resp=0x7ffd072aceb0, node=0x7fa79ee93cb8) at dlg_dmq.c:391
#4 0x00007fa793f3bb98 in worker_loop (id=1) at worker.c:113
#5 0x00007fa793f39314 in child_init (rank=0) at dmq.c:296
#6 0x00000000005990e8 in init_mod_child (m=0x7fa7bc735d98, rank=0) at core/sr_module.c:780
#7 0x0000000000598d5d in init_mod_child (m=0x7fa7bc736b00, rank=0) at core/sr_module.c:776
#8 0x0000000000598d5d in init_mod_child (m=0x7fa7bc737038, rank=0) at core/sr_module.c:776
#9 0x0000000000598d5d in init_mod_child (m=0x7fa7bc738eb8, rank=0) at core/sr_module.c:776
#10 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73a0e0, rank=0) at core/sr_module.c:776
#11 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73ab70, rank=0) at core/sr_module.c:776
#12 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73d8e0, rank=0) at core/sr_module.c:776
#13 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73e720, rank=0) at core/sr_module.c:776
#14 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73f1b8, rank=0) at core/sr_module.c:776
#15 0x0000000000598d5d in init_mod_child (m=0x7fa7bc73fd08, rank=0) at core/sr_module.c:776
#16 0x0000000000598d5d in init_mod_child (m=0x7fa7bc740128, rank=0) at core/sr_module.c:776
#17 0x0000000000598d5d in init_mod_child (m=0x7fa7bc7409d0, rank=0) at core/sr_module.c:776
#18 0x0000000000598d5d in init_mod_child (m=0x7fa7bc7414d8, rank=0) at core/sr_module.c:776
#19 0x0000000000598d5d in init_mod_child (m=0x7fa7bc742cd8, rank=0) at core/sr_module.c:776
#20 0x0000000000598d5d in init_mod_child (m=0x7fa7bc742fc8, rank=0) at core/sr_module.c:776
#21 0x0000000000598d5d in init_mod_child (m=0x7fa7bc744ed0, rank=0) at core/sr_module.c:776
#22 0x0000000000598d5d in init_mod_child (m=0x7fa7bc745920, rank=0) at core/sr_module.c:776
#23 0x0000000000598d5d in init_mod_child (m=0x7fa7bc748d60, rank=0) at core/sr_module.c:776
#24 0x0000000000598d5d in init_mod_child (m=0x7fa7bc7498f8, rank=0) at core/sr_module.c:776
#25 0x0000000000599887 in init_child (rank=0) at core/sr_module.c:825
#26 0x000000000042ab2d in main_loop () at main.c:1763
#27 0x0000000000433a96 in main (argc=12, argv=0x7ffd072aeb48) at main.c:2856
7211
(gdb) bt
#0 0x00007fa7bd7677d3 in __recvfrom_nocancel () from /lib64/libc.so.6
#1 0x00000000004d6a45 in udp_rcv_loop () at core/udp_server.c:457
#2 0x0000000000429d47 in main_loop () at main.c:1683
#3 0x0000000000433a96 in main (argc=12, argv=0x7ffd072aeb48) at main.c:2856
7221
#0 0x00007fa7bd7677d3 in __recvfrom_nocancel () from /lib64/libc.so.6
#1 0x00000000004d6a45 in udp_rcv_loop () at core/udp_server.c:457
#2 0x0000000000429d47 in main_loop () at main.c:1683
#3 0x0000000000433a96 in main (argc=12, argv=0x7ffd072aeb48) at main.c:2856
```
### Possible Solutions
Suggested patch (was not compiled and tested yet):
```
diff --git a/src/modules/dialog/dlg_dmq.c b/src/modules/dialog/dlg_dmq.c
index a5628f4..b681759 100644
--- a/src/modules/dialog/dlg_dmq.c
+++ b/src/modules/dialog/dlg_dmq.c
@@ -646,22 +646,22 @@
int dmq_send_all_dlgs(dmq_node_t* dmq_node) {
int index;
- dlg_entry_t entry;
+ dlg_entry_t *entry;
dlg_cell_t *dlg;
LM_DBG("sending all dialogs \n");
for(index = 0; index< d_table->size; index++){
/* lock the whole entry */
- entry = (d_table->entries)[index];
- dlg_lock( d_table, &entry);
+ entry = &(d_table->entries)[index];
+ dlg_lock( d_table, entry);
- for(dlg = entry.first; dlg != NULL; dlg = dlg->next){
+ for(dlg = entry->first; dlg != NULL; dlg = dlg->next){
dlg->dflags |= DLG_FLAG_CHANGED_PROF;
dlg_dmq_replicate_action(DLG_DMQ_UPDATE, dlg, 0, dmq_node);
}
- dlg_unlock( d_table, &entry);
+ dlg_unlock( d_table, entry);
}
return 0;
```
### Additional Information
kamailio 5.4.2 (x86_64/linux) 5bd72f
* **Operating System**:
```
Centos 7
3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
```
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2547
[View Less]
(from devmeeting #Fosdem) - adding new dependencies as core dependencies
- Curl
- libssl
- uuid
- libunistring
- libxml2
Any other candidates? They have to be libraries that are commonly installed on most Linux/Unix system.
This would mean that a lot of more modules will be built by default, like TLS, HTTP_CLIENT, WEBSOCKET.
We may also look over the names of the packages/groups for the build system and linux distros
--
You are receiving this because you are subscribed to this thread.
…
[View More]Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/969
[View Less]
### Description
>From what I experience dynamic routing (drouting) module is not supporting SRV resolution if NAPTR is turned off (dns_try_naptr=no). Is this intentional to not have SRV if NAPTR is turned off? Or some special format of entry is expected for SRV records in gateways table? I've tried with just **domain** and **_sip._udp.domain** but it it the same.
#### Reproduction
1. Use Dynamic routing module
2. set dns_try_naptr=no
3. insert gateway with SRV domain name in dynamic …
[View More]routing gateways table.
4. execute kamcmd drouting.reload
5. in the log of the kamailio is the error: ERROR: drouting [routing.c:337]: add_dst(): cannot resolve <domainexample>
### Possible Solutions
Upon reload of drouting tables, dynamic routing tries to add destinations and resolve them:
https://github.com/kamailio/kamailio/blob/08a9ed3d136c861f990e4f63c2c0544e0…
But, since port that is sent is NULL, SRV is being skipped later on. It seems that SRV would be executed if port would be 0 instead of NULL.
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.1.9 (x86_64/linux)
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 4.9.2
```
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2553
[View Less]