Fixes for SEGV in dialog module.
Hi!
The problem: periodically, 2-3 times a month, kamailio dumps core with a SEGV.
This happens on different OS versions (Ubuntu 18, 20, 22) and different kamailio versions from 5.7.1 up to 5.8.4,
under high call load, with Python KEMI and an actively used dialog module.
Analysis of the core files shows more or less the same picture: the SEGV happens in the dialog module
around the dialog vars. It can be in an RPC call (dlg.list), in dialog destruction, or in accessing a dialog variable.
A backtrace can look like this:
Core was generated by `/usr/sbin/kamailio -P /run/kamailio/kamailio.pid -f /etc/kamailio/kamailio.cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236 dlg_var.c: No such file or directory.
(gdb) bt full
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:236
var = 0x8
var_list = <optimized out>
#1 0x00007fa2a2ce0397 in get_dlg_varref (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:297
var = 0x0
__func__ = "get_dlg_varref"
#2 0x00007fa2a2ccd429 in ki_dlg_var_get_mode (msg=<optimized out>, name=0x7ffe67990d68, rmode=0) at dialog.c:2462
dlg = 0x7fa2818849b8
pval = <optimized out>
#3 0x000055e2d5b87cbb in sr_kemi_exec_func (ket=ket@entry=0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>, msg=msg@entry=0x7fa2a34577c0, pno=pno@entry=1, vps=vps@entry=0x7ffe67990d60) at core/kemiexec.c:82
ret = <optimized out>
__func__ = "sr_kemi_exec_func"
#4 0x00007fa2a2930376 in sr_apy_kemi_exec_func_ex (ket=ket@entry=0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>, self=self@entry=0x7fa280ce2b80, args=args@entry=0x7fa2804ba3a0, idx=idx@entry=485) at apy_kemi.c:327
fname = <optimized out>
i = <optimized out>
ret = <optimized out>
vps = {{vtype = 2, v = {n = -2142122656, l = 140335914274144, s = {s = 0x7fa28051cd60 "rtpe_setid", len = 10}, dict = 0x7fa28051cd60}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {
n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0,
s = {s = 0x0, len = 0}, dict = 0x0}}}
env_P = <optimized out>
lmsg = 0x7fa2a34577c0
xret = <optimized out>
slen = 10
alen = 1
pobj = <optimized out>
__func__ = "sr_apy_kemi_exec_func_ex"
#5 0x00007fa2a2931f45 in sr_apy_kemi_exec_func (self=0x7fa280ce2b80, args=0x7fa2804ba3a0, idx=485) at apy_kemi.c:356
ket = 0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>
ret = 0x0
pstate = 0x0
pframe = 0x0
tvb = {tv_sec = 0, tv_usec = 0}
tve = {tv_sec = 0, tv_usec = 0}
tz = {tz_minuteswest = 2, tz_dsttime = 0}
tdiff = <optimized out>
__func__ = "sr_apy_kemi_exec_func"
#6 0x00007fa2a266f697 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
Inspecting the dlg vars, the chain looks like this:
(gdb) frame 0
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236 in dlg_var.c
(gdb) p *dlg.vars
$1 = {key = {s = 0x7fa2833ea040 "mos", len = 3}, value = {s = 0x7fa2833ea0b0 "4.3", len = 3}, vflags = 1, next = 0x7fa2825fa3d0}
(gdb) p *$.next
$2 = {key = {s = 0x7fa200332e34 <error: Cannot access memory at address 0x7fa200332e34>, len = -1061109568}, value = {s = 0xabcdefed <error: Cannot access memory at address 0xabcdefed>, len = 49}, vflags = 1, next = 0x8}
(gdb)
Additionally, inspecting the logs I found that this happens in SIP race conditions, e.g. when the caller and callee send BYE at the same time and
kamailio processes two different transactions of the same dialog in two different workers, so the two identical scenarios compete with each other.
For a long time I could not reproduce this in the lab, until I inserted an artificial delay into the scenario that processes the BYE request, to increase the contention.
This code quickly causes a SEGV in race conditions with BYE:
# kemi get/set dlg var
while time.time()-start < 0.5:
    for i in ('bbb','ccc'):
        dlgvar=KSR.dialog.var_get(i)
        KSR.dialog.var_sets(i,str(random.randint(1,500000)))
The SEGV does not occur if I change the dialog var_get/var_sets calls to the kamscript variant:
while time.time()-start < 0.5:
    for i in ('bbb','ccc'):
        dlgvar=KSR.pv.get(f"$dlg_var({i})")
        KSR.pv.sets(f"$dlg_var({i})", str(random.randint(1,500000)))
By analyzing the source I found some places that are not properly covered by locks, and some dangerous places
where a direct pointer to a dialog var structure is used, even though that structure can be changed by a concurrent process.
In the first case I added the locks; in the second it is better to use a clone of the variable in a private PV buffer.
I also found some mistakes with dialog flags, which can potentially affect DMQ and DB dialog behavior.
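
To illustrate the difference between handing out a direct pointer and handing out a clone taken under the lock, here is a minimal Python sketch. It is a toy model only, not the code in this PR: the class `DialogVars` and the methods `get_ref_unsafe` and `get_copy` are hypothetical names invented for the example, and a Python `bytearray` stands in for the shared-memory value buffer.

```python
import threading


class DialogVars:
    """Toy stand-in for a dialog's variable list (hypothetical, not Kamailio code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._vars = {}  # name -> bytearray, rewritten in place by writers

    def set(self, name, value):
        # Writers always hold the lock while touching the shared buffer.
        with self._lock:
            buf = self._vars.setdefault(name, bytearray())
            buf[:] = value.encode()

    def get_ref_unsafe(self, name):
        # Returns the shared buffer itself, without the lock:
        # a later set() changes the data under the caller.
        return self._vars.get(name)

    def get_copy(self, name):
        # Copies the value while holding the lock; the caller owns the result.
        with self._lock:
            buf = self._vars.get(name)
            return None if buf is None else bytes(buf)


dlg = DialogVars()
dlg.set("bbb", "111")
shared = dlg.get_ref_unsafe("bbb")   # direct reference to shared data
private = dlg.get_copy("bbb")        # private clone
dlg.set("bbb", "222")                # simulates the other worker handling its BYE
```

After the second `set`, `shared` reflects the new value because it aliases the shared buffer, while `private` still holds the value that was read. In C the aliasing case is worse than a stale read: the other worker may free or reuse the memory, which is consistent with the corrupted `next` pointer seen in the core file.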
https://github.com/kamailio/kamailio/pull/4151
(3 files)