Fixes for SEGV in dialog module.
<!-- Kamailio Pull Request Template -->
<!--
IMPORTANT:
- for detailed contributing guidelines, read:
https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
- pull requests must be done to master branch, unless they are backports
of fixes from master branch to a stable branch
- backports to stable branches must be done with 'git cherry-pick -x ...'
- code is contributed under BSD for core and main components (tm, sl, auth, tls)
- code is contributed GPLv2 or a compatible license for the other components
- GPL code is contributed with OpenSSL licensing exception
-->
#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on sr-dev mailing list -->
- [x] Commit message has the format required by CONTRIBUTING guide
- [x] Commits are split per component (core, individual modules, libs, utils, ...)
- [x] Each component has a single commit (if not, squash them into one commit)
- [x] No commits to README files for modules (changes must be done to docbook files
in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change
- [ ] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [ ] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)
#### Description
Hi !
The problem - periodically, 2-3 times in а month, kamailio dumps core with SEGV.
Different OS versions (Ubuntu 18,20,22), different kamailio versions from 5.7.1 up to 5.8.4,
high call load, python kemi, actively used dialog module.
Analyzing core files shows more or less the same picture - SEGV happends in dialog mododule
around the dialog vars, can be in rpc call (dlg.list), in destroy dlg, in accessing dlg variable.
Backtrace can look like:
```
Core was generated by `/usr/sbin/kamailio -P /run/kamailio/kamailio.pid -f /etc/kamailio/kamailio.cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236 dlg_var.c: No such file or directory.
(gdb) bt full
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:236
var = 0x8
var_list = <optimized out>
#1 0x00007fa2a2ce0397 in get_dlg_varref (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:297
var = 0x0
__func__ = "get_dlg_varref"
#2 0x00007fa2a2ccd429 in ki_dlg_var_get_mode (msg=<optimized out>, name=0x7ffe67990d68, rmode=0) at dialog.c:2462
dlg = 0x7fa2818849b8
pval = <optimized out>
#3 0x000055e2d5b87cbb in sr_kemi_exec_func (ket=ket@entry=0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>, msg=msg@entry=0x7fa2a34577c0, pno=pno@entry=1, vps=vps@entry=0x7ffe67990d60) at core/kemiexec.c:82
ret = <optimized out>
__func__ = "sr_kemi_exec_func"
#4 0x00007fa2a2930376 in sr_apy_kemi_exec_func_ex (ket=ket@entry=0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>, self=self@entry=0x7fa280ce2b80, args=args@entry=0x7fa2804ba3a0, idx=idx@entry=485) at apy_kemi.c:327
fname = <optimized out>
i = <optimized out>
ret = <optimized out>
vps = {{vtype = 2, v = {n = -2142122656, l = 140335914274144, s = {s = 0x7fa28051cd60 "rtpe_setid", len = 10}, dict = 0x7fa28051cd60}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {
n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0,
s = {s = 0x0, len = 0}, dict = 0x0}}}
env_P = <optimized out>
lmsg = 0x7fa2a34577c0
xret = <optimized out>
slen = 10
alen = 1
pobj = <optimized out>
__func__ = "sr_apy_kemi_exec_func_ex"
#5 0x00007fa2a2931f45 in sr_apy_kemi_exec_func (self=0x7fa280ce2b80, args=0x7fa2804ba3a0, idx=485) at apy_kemi.c:356
ket = 0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>
ret = 0x0
pstate = 0x0
pframe = 0x0
tvb = {tv_sec = 0, tv_usec = 0}
tve = {tv_sec = 0, tv_usec = 0}
tz = {tz_minuteswest = 2, tz_dsttime = 0}
tdiff = <optimized out>
__func__ = "sr_apy_kemi_exec_func"
#6 0x00007fa2a266f697 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
```
Inspecting dlg vars, the chain look like this:
```
(gdb) frame 0
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236 in dlg_var.c
(gdb) p *dlg.vars
$1 = {key = {s = 0x7fa2833ea040 "mos", len = 3}, value = {s = 0x7fa2833ea0b0 "4.3", len = 3}, vflags = 1, next = 0x7fa2825fa3d0}
(gdb) p *$.next
$2 = {key = {s = 0x7fa200332e34 <error: Cannot access memory at address 0x7fa200332e34>, len = -1061109568}, value = {s = 0xabcdefed <error: Cannot access memory at address 0xabcdefed>, len = 49}, vflags = 1, next = 0x8}
(gdb)
```
Additionaly, inspecting logs i found that this is happends in SIP race conditions, f.e. when caller and calle sends BYE at the same time and
kamailio processing two different transactions of same dialog in two different workers, so present some competition from identical scenarios.
Long time i can't reproduce this in lab, until insert an artificial delay into the scenario which process BYE request to increase competition.
This code quickly causes SEGV in race conditions with BYE:
```
#kemi get/set dlg var
while time.time()-start < 0.5:
for i in ('bbb','ccc'):
dlgvar=KSR.dialog.var_get(i)
KSR.dialog.var_sets(i,str(random.randint(1,500000)))
```
SEGV not fire if i change dialog var_get/set to kamscript variant:
```
while time.time()-start < 0.5:
for i in ('bbb','ccc'):
dlgvar=KSR.pv.get(f"$dlg_var({i})")
KSR.pv.sets(f"$dlg_var({i})", str(random.randint(1,500000)))
```
By analyzing source i found some places which are not properly covered by locks, and some dangerous places
where direct pointer to dialog var structure used, which can change in concurrent process,
so in first case i insert the locks, in second - it is better to use the clone of variable in private PV buffer.
Also found some mistakes with dialog flags, which can potentionaly affect DMQ and DB dialog behavior.
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/4151
-- Commit Summary --
* dialog: covering by locks accessing/changing variables in kemi functions
-- File Changes --
M src/modules/dialog/dialog.c (36)
M src/modules/dialog/dlg_hash.h (2)
M src/modules/dialog/dlg_var.c (2)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/4151.patchhttps://github.com/kamailio/kamailio/pull/4151.diff
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/pull/4151
You are receiving this because you are subscribed to this thread.
Message ID: <kamailio/kamailio/pull/4151(a)github.com>
Hi There,
We found a situation where topos seems to break, let me explain... Assuming the following scenario:
```
Caller ---- Callee
A: ------INVITE-----> Record-Route:A.A.A.A, Record-Route:B.B.B.B
B: <-----200 OK------
C: <-----INVITE------ Route: A.A.A.A, B.B.B.B
D: ------200 OK-----> Record-Route:B.B.B.B, Record-Route:A.A.A.A (reversed order)
E: <======INVITE===== Route: B.B.B.B, A.A.A.A (wrong)
```
A and B establish the connection from caller to callee, and topos works fine.
C (re-INVITE from callee) sends the Route header according to the Record-Routes from the original INVITE (A)
D is the 200 OK sent from the caller to the first re-INVITE (C) coming from the callee, with the Record-Route headers reversed, because is the order in which the callee received them; and according to the RFC it's working as intended:
```
When a UAS responds to a request with a response that establishes a
dialog (such as a 2xx to INVITE), the UAS MUST copy all Record-Route
header field values from the request into the response (including the
URIs, URI parameters, and any Record-Route header field parameters,
whether they are known or unknown to the UAS) and MUST maintain the
order of those values.
[...]
[When a UAC receives a response...]
The route set MUST be set to the list of URIs in the Record-Route
header field from the response, taken in reverse order and preserving
all URI parameters.
```
E takes the Route order from the last 200 OK ignoring they are in reversed order and assuming the top one is the first one, when it should be the other way around, sending to an IP address not reachable from the callee. And I think here is the issue, topos should not update the path on the Record-Routes from a 200 OK but if it does, it should take the reverse order
When disabling topos, everything works fine, or with topos enabled, by setting rr_update=0 works for us, but what if there is a real path update, rr_update=0 wouldn't work for us anymore. The Kamailio version is 5.8.0-rc0
Let me know if you need more information.
Thanks a lot,
Javi
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/3778
You are receiving this because you are subscribed to this thread.
Message ID: <kamailio/kamailio/issues/3778(a)github.com>
<!-- Kamailio Pull Request Template -->
<!--
IMPORTANT:
- for detailed contributing guidelines, read:
https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
- pull requests must be done to master branch, unless they are backports
of fixes from master branch to a stable branch
- backports to stable branches must be done with 'git cherry-pick -x ...'
- code is contributed under BSD for core and main components (tm, sl, auth, tls)
- code is contributed GPLv2 or a compatible license for the other components
- GPL code is contributed with OpenSSL licensing exception
-->
#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on sr-dev mailing list -->
- [X] Commit message has the format required by CONTRIBUTING guide
- [X] Commits are split per component (core, individual modules, libs, utils, ...)
- [X] Each component has a single commit (if not, squash them into one commit)
- [X] No commits to README files for modules (changes must be done to docbook files
in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change
- [ ] Small bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [X] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)
#### Description
<!-- Describe your changes in detail -->
Add new api function to dmq to get local socket. Use that in dmq_usrloc.
Sync dmq_usrloc UL_CONTACT_EXPIRE. Very useful for lost TCP connections, so contacts will be expired on all dmq nodes.
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/4134
-- Commit Summary --
* dmq: add new api function
* dmq_usrloc: new value for replicate_socket_info
* dmq_usrloc: add new modparam to sync UL_CONTACT_EXPIRE actions
-- File Changes --
M src/modules/dmq/bind_dmq.c (1)
M src/modules/dmq/bind_dmq.h (2)
M src/modules/dmq/dmq_funcs.c (5)
M src/modules/dmq/dmq_funcs.h (1)
M src/modules/dmq_usrloc/dmq_usrloc.c (3)
M src/modules/dmq_usrloc/doc/dmq_usrloc_admin.xml (26)
M src/modules/dmq_usrloc/usrloc_sync.c (22)
M src/modules/dmq_usrloc/usrloc_sync.h (1)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/4134.patchhttps://github.com/kamailio/kamailio/pull/4134.diff
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/pull/4134
You are receiving this because you are subscribed to this thread.
Message ID: <kamailio/kamailio/pull/4134(a)github.com>