<!--
Kamailio Project uses GitHub Issues only for bugs in the code or feature requests. Please
use this template only for bug reports.
If you have questions about using Kamailio or related to its configuration file, ask on
sr-users mailing list:
*
https://lists.kamailio.org/mailman3/postorius/lists/sr-users.lists.kamailio…
If you have questions about developing extensions to Kamailio or its existing C code, ask
on sr-dev mailing list:
*
https://lists.kamailio.org/mailman3/postorius/lists/sr-dev.lists.kamailio.o…
Please try to fill this template as much as possible for any issue. It helps the
developers to troubleshoot the issue.
If there is no content to be filled in a section, the entire section can be removed.
You can delete the comments from the template sections when filling.
You can delete next line and everything above before submitting (it is a comment).
-->
### Description
I've been simulating calls while there's (faked) issues against the diameter
server using TCP.
Ro_CCR() uses t_suspend() to suspend the SIP transaction while waiting for a response from
the charging server.
A cdp transaction is also created with a callback to resume_on_initial_ccr() in the ims
charging module.
When a response to a charging request is received, this is handled in cdp/api_process.c
where cdp_take_trans() find the given transaction and returns it without a lock held. The
callback is then executed, and everything is fine.
The problem occurs when a CCR times out, and the callback is done by the cdp_trans_timer()
function. This hold a lock for the entire transaction list while executing callbacks for
every timed out entry. In my case, the charging module does t_continue and the execution
of that routing block is started again.
Here we reply a cause indicating it was an issue with the charging server. This will
trigger a new callback from the ims_dialog module which again will try to start a new cdp
transaction (CCR Stop) - still while the cdp_trans_timer function holds a lock.
Since the cdp timers now are blocked, no new connection attempts to the failed diameter
server is done either.
### Troubleshooting
#### Reproduction
<!--
If the issue can be reproduced, describe how it can be done.
-->
```
Ro_CCR("RO_CONTINUE_DISPATCHER", "orig", 30, "",
"")
this times out, and will do continue in the routing block specified (based on the cdp
transaction timeout timer)
route[RO_CONTINUE_DISPATCHER] {
send_reply("500", "Internal error");
exit;
}
```
### Debugging Data
<!--
If you got a core dump, use gdb to extract troubleshooting data - full backtrace,
local variables and the list of the code at the issue location.
gdb /path/to/kamailio /path/to/corefile
bt full
info locals
list
If you are familiar with gdb, feel free to attach more of what you consider to
be relevant.
-->
cdp_add_trans() on line 2 is blocked while waiting for lock held by cdp_trans_timer() on
line 26 to be freed. The line numbers are a bit off because of some debugging.
```
#0 0x00007f6d9a3e0e29 in syscall () from /lib64/libc.so.6
#1 0x00007f6d979622f6 in futex_get (lock=0x7f6d706622a0) at
../../core/mem/../futexlock.h:121
#2 0x00007f6d979632cd in cdp_add_trans (msg=0x7f6d70921280, cb=0x7f6d9084caf5
<resume_on_termination_ccr>, ptr=0x0, timeout=5, auto_drop=1) at transaction.c:126
#3 0x00007f6d97980740 in AAASendMessage (message=0x7f6d70921280,
callback_f=0x7f6d9084caf5 <resume_on_termination_ccr>, callback_param=0x0) at
diameter_comm.c:158
#4 0x00007f6d9084c71b in send_ccr_stop_with_param (ro_session=0x7f6d70921dc0, code=0,
reason=0x7ffe9b6cd400) at ims_ro.c:1183
#5 0x00007f6d90829f6d in dlg_terminated (dlg=0x7f6d7091f5d0, type=4, termcode=0,
reason=0x7f6d9086f998 "call failed", _params=0x7f6d90d20280 <params>) at
dialog.c:249
#6 0x00007f6d908216cd in dlg_callback_received (dlg=0x7f6d7091f5d0, type=4,
_params=0x7f6d90d20280 <params>) at dialog.c:29
#7 0x00007f6d90aed1b9 in run_dlg_callbacks (type=4, dlg=0x7f6d7091f5d0,
req=0x7f6d7091fb68, rpl=0xffffffffffffffff, dir=2, dlg_data=0x0) at dlg_cb.c:271
#8 0x00007f6d90abced4 in dlg_onreply (t=0x7f6d70931e10, type=1048576,
param=0x7ffe9b6cda80) at dlg_handlers.c:1363
#9 0x00007f6d94252f2c in run_trans_callbacks_internal (cb_lst=0x7f6d70931eb8,
type=1048576, trans=0x7f6d70931e10, params=0x7ffe9b6cda80) at t_hooks.c:236
#10 0x00007f6d942530e4 in run_trans_callbacks_with_buf (type=1048576, rbuf=0x7f6d70931f10,
req=0x7f6d7091fb68, repl=0xffffffffffffffff, flags=0) at t_hooks.c:272
#11 0x00007f6d941d2860 in _reply_light (trans=0x7f6d70931e10, buf=0x7f6d97d10768
"SIP/2.0 480 Temporarily Unavailable\r\nVia: SIP/2.0/UDP
192.168.10.15:5060;branch=z9hG4bK427d31b9\r\nFrom: \"sipp\"
<sip:+4749484948@192.168.10.15;user=phone>;tag=as22aa85b4\r\nTo:
<sip:+4749494848@192.168.10.20"..., len=386, code=480, to_tag=0x7f6d944bb5e0
<tm_tags> "b239b52f382f2203913409401a079c72-9b859c73", to_tag_len=41,
lock=0, bm=0x7ffe9b6ce5d0)
at t_reply.c:520
#12 0x00007f6d941d3e8e in _reply (trans=0x7f6d70931e10, p_msg=0x7f6d70946878, code=480,
reason=0x7ffe9b6ce750, lock=0) at t_reply.c:689
#13 0x00007f6d941dfdcf in t_reply_str_unsafe (t=0x7f6d70931e10, p_msg=0x7f6d70946878,
code=480, reason=0x7ffe9b6ce750) at t_reply.c:1753
#14 0x00007f6d941fd400 in ki_t_reply (msg=0x7f6d70946878, code=480, reason=0x7ffe9b6ce750)
at tm.c:1538
#15 0x00007f6d941fdcee in w_t_reply_wrp (msg=0x7f6d70946878, code=480, txt=0x7f6d97d106e8
"Temporarily Unavailable") at tm.c:1598
#16 0x00007f6d93d24c0e in send_reply (msg=0x7f6d70946878, code=480, reason=0x7ffe9b6ce880)
at sl.c:287
#17 0x00007f6d93d25728 in w_send_reply (msg=0x7f6d70946878, p1=0x7f6d97cee2e8
"x)\320\227m\177", p2=0x7f6d97cee388 "\350)\320\227m\177") at
sl.c:329
#18 0x0000000000466fb3 in do_action (h=0x7ffe9b6cf9b0, a=0x7f6d97d025d8,
msg=0x7f6d70946878) at core/action.c:1131
#19 0x0000000000474188 in run_actions (h=0x7ffe9b6cf9b0, a=0x7f6d97d02210,
msg=0x7f6d70946878) at core/action.c:1618
#20 0x000000000047028d in do_action (h=0x7ffe9b6cf9b0, a=0x7f6d97d02f00,
msg=0x7f6d70946878) at core/action.c:1241
#21 0x0000000000474188 in run_actions (h=0x7ffe9b6cf9b0, a=0x7f6d97cfdb68,
msg=0x7f6d70946878) at core/action.c:1618
#22 0x00000000004748cc in run_top_route (a=0x7f6d97cfdb68, msg=0x7f6d70946878, c=0x0) at
core/action.c:1701
#23 0x00007f6d9425802a in t_continue_helper (hash_index=39149, label=1245761144,
rtact=0x7f6d97cfdb68, cbname=0x0, cbparam=0x0, skip_timer=1) at t_suspend.c:307
#24 0x00007f6d9425c6d8 in t_continue (hash_index=39149, label=1245761144,
route=0x7f6d97cfdb68) at t_suspend.c:603
#25 0x00007f6d9085b372 in resume_on_initial_ccr (is_timeout=1, param=0x7f6d70921cc8,
cca=0x0, elapsed_msecs=1) at ims_ro.c:1786
#26 0x00007f6d979646d1 in cdp_trans_timer (now=1699966590, ptr=0x0) at transaction.c:223
#27 0x00007f6d9798e3b0 in timer_loop () at timer.c:117
#28 0x00007f6d9798f639 in timer_process (returns=0) at timer.c:217
#29 0x00007f6d97942cf8 in diameter_peer_start (blocking=0) at diameter_peer.c:350
#30 0x00007f6d97932bb2 in cdp_child_init (rank=0) at cdp_mod.c:272
```
#### Log Messages
<!--
Check the syslog file and if there are relevant log messages printed by Kamailio, add them
next, or attach to issue, or provide a link to download them (e.g., to a pastebin site).
-->
```
```
#### SIP Traffic
<!--
If the issue is exposed by processing specific SIP messages, grab them with ngrep or save
in a pcap file, then add them next, or attach to issue, or provide a link to download them
(e.g., to a pastebin site).
-->
```
```
### Possible Solutions
I'm considering changing cdp_trans_timer() to iterate through the list, remove expired
items while keeping them in a new list so all callbacks can be executed without the lock
held.
Could also be fixed within the ims_charging module. For example by not sending a CCR-T
when there's no answer to the CCR-I, but I feel that's more of a band aid rather
than a real fix.
Any feedback or suggestions would be nice.
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
master
```
* **Operating System**:
<!--
Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04,
CentOS 7.1, ...), MacOS, xBSD, Solaris, ...;
Kernel details (output of `lsb_release -a` and `uname -a`)
-->
```
centos 7.8
```
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/3641
You are receiving this because you are subscribed to this thread.
Message ID: <kamailio/kamailio/issues/3641(a)github.com>