Hello Péter,
On Wed, 24 Jan 2024 at 09:58, Dr. Barabás Péter <
dr.peter.barabas(a)gmail.com> wrote:
Indeed it looks very similar. One of the core dumps we have seen is
identical to one of those that you provided in your report.
Question: is $uac_req(evroute) set to 1 and do you handle uac:reply event
route?
In my case the crash only occurs when both of these are true. If I switch
evroute off, no crash occurs.
Yes, we have $uac_req(evroute) set to 1 and handle the uac:reply event
route. We tested removing those but it had a lot of side-effects we weren't
aware of in the rest of our system so we could enable it in our
staging environment.
What other versions of Kamailio have you tested? We have reverted back to
5.5 and so far it seems to be working, but we have had very little run time
on it so we are not convinced yet. The problem is that there is a feature
in the nathelper module (alias_name) that we would like to have, which only
exists from 5.6 onwards.
/Mattis
Péter Barabás
*From: *Mattis Lind via sr-users <sr-users(a)lists.kamailio.org>
*Date: *Wednesday, 2024. January 24. 9:46
*To: *Kamailio (SER) - Users Mailing List <sr-users(a)lists.kamailio.org>
*Cc: *Mattis Lind <mattislind(a)gmail.com>
*Subject: *[SR-Users] Kamailio 5.6 (and 5.7) core dumping.
Hello Kamailio list!
We have a scenario that uses the UAC module to send SIP MESSAGE requests,
and in some cases the Kamailio process core dumps some time after
processing the messages. I have been able to capture a core dump; its
backtrace is appended below. We are using Kamailio 5.6 retrieved
from the Kamailio repository:
http://deb.kamailio.org/kamailio56. We are
running Kamailio in a Docker container which runs on "5.10.0-25-cloud-amd64
#1 SMP Debian 5.10.191-1 (2023-08-16) x86_64 GNU/Linux"
We have previously tried to use Kamailio 5.7 but it gave the same type of
crashes.
We use the uac module a lot, but it is only in this scenario that we
get a core dump. The crash appears to be tied to this specific scenario, but
it can take anywhere from several seconds to 50 minutes after the last SIP
message of the scenario before the crash occurs. To me this sounds like some
kind of cleanup that is not handled properly. The backtrace indicates that
freeing shared memory could be the issue, but unfortunately I don't know
the code.
The last things we see in the log file are:
2024-01-23T13:18:08.828+01:00 Jan 23 12:18:08 /usr/sbin/kamailio[4789]:
INFO: <script>: Incoming SIP TCP request conid 21 call-id
un0rihsRLJLvP-grn6LO-A
2024-01-23T13:18:08.835+01:00 Jan 23 12:18:08 /usr/sbin/kamailio[4789]:
INFO: <script>: Incoming SIP TCP request conid 21 call-id
WQjhldpRJbxjZRYe7fWgbw
2024-01-23T13:18:08.860+01:00 Jan 23 12:18:08 /usr/sbin/kamailio[4795]:
CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 49
2024-01-23T13:18:09.486+01:00 Jan 23 12:18:09 /usr/sbin/kamailio[4751]:
ALERT: <core> [main.c:783]: handle_sigs(): child process 4777 exited by a
signal 11
2024-01-23T13:18:09.486+01:00 Jan 23 12:18:09 /usr/sbin/kamailio[4751]:
ALERT: <core> [main.c:787]: handle_sigs(): core was generated
2024-01-23T13:18:09.516+01:00 Jan 23 12:18:09 /usr/sbin/kamailio[4751]:
INFO: <core> [core/sctp_core.c:53]: sctp_core_destroy(): SCTP API not
initialized
2024-01-23T13:18:09.570+01:00 Started /root/sipconfig/startkamailio.sh
2024-01-23T13:18:09.570+01:00 info: :-) Starting Kamailio
Just before all the crashes we see the "CRITICAL: <core>
[core/pass_fd.c:281]: receive_fd(): EOF on 49" log line.
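To check whether the EOF line really precedes every crash, the log can be scanned mechanically. The following is only a rough sketch, assuming the log format shown above (an ISO timestamp at the start of each record, with long records wrapped over several lines); the function names are made up for illustration and are not part of Kamailio.

```python
import re
from datetime import datetime

# A record starts with a timestamp such as 2024-01-23T13:18:08.860+01:00
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2} "
)
EOF_MARK = "receive_fd(): EOF"
CRASH_MARK = re.compile(r"child process (\d+) exited by a\s+signal (\d+)")


def join_records(lines):
    """Merge wrapped continuation lines back into one string per record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def crashes_with_preceding_eof(lines):
    """Return (pid, signal, seconds_since_last_eof) for each crash record.

    seconds_since_last_eof is None if no EOF record was seen beforehand.
    """
    last_eof = None
    found = []
    for rec in join_records(lines):
        if not RECORD_START.match(rec):
            continue  # skip anything that does not begin with a timestamp
        stamp = datetime.fromisoformat(rec.split(" ", 1)[0])
        if EOF_MARK in rec:
            last_eof = stamp
        m = CRASH_MARK.search(rec)
        if m:
            gap = None
            if last_eof is not None:
                gap = (stamp - last_eof).total_seconds()
            found.append((int(m.group(1)), int(m.group(2)), gap))
    return found
```

Calling crashes_with_preceding_eof() on the lines of a log file gives one tuple per crash, so a crash without a preceding EOF line shows up with None in the third position.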
Best regards,
Mattis Lind
# gdb /usr/sbin/kamailio /core
*GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git*
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <
http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/kamailio...
(No debugging symbols found in /usr/sbin/kamailio)
warning: Can't open file /dev/zero (deleted) during file-backed mapping
note processing
[New LWP 4777]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/kamailio -DD -M 18 -m 192 -A
serverId=17173 -A sendTraceLocal="sip:10'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
77 ../sysdeps/x86_64/multiarch/strlen-evex.S: No such file or directory.
(gdb) bt full
#0 __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
No locals.
#1 0x00007f5425e34b78 in __vfprintf_internal (s=s@entry=0x55a06e969a60,
format=format@entry=0x55a06e558020 "%s: %.*s%s%s%sBUG: qm: fragm. %p
(address %p) beginning overwritten (%lx)! Memory allocator was called from
%s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n",
ap=ap@entry=0x7ffd6bebfb50, mode_flags=mode_flags@entry=0) at
vfprintf-internal.c:1647
len = <optimized out>
step0_jumps = {0, 1717, 1621, 3413, 3317, 3997, 2677, 2837, 3613,
1773, 4309, 4445, 3517, 4437, 4389, 2789, 4197, 3917, 3221, 2997, 1141,
1365, 1997, 1925, 1885, 733, 3709, 533, 533, 4101}
space = <optimized out>
is_short = <optimized out>
use_outdigits = 0
outc = <optimized out>
step1_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1773, 4309, 4445, 3517,
4437, 4389, 2789, 4197, 3917, 3221, 2997, 1141, 1365, 1997, 1925, 1885,
733, 3709, 533, 533, 0}
group = 0
prec = -1
step2_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4309, 4445, 3517,
4437, 4389, 2789, 4197, 3917, 3221, 2997, 1141, 1365, 1997, 1925, 1885,
733, 3709, 533, 533, 0}
string = 0x756d6f7266222c22 <error: Cannot access memory at
address 0x756d6f7266222c22>
left = 0
is_long_double = <optimized out>
width = 0
signed_number = <optimized out>
step3a_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4213, 0, 0, 0,
4389, 2789, 4197, 3917, 3221, 0, 0, 0, 0, 1925, 0, 0, 0, 0, 0, 0}
alt = <optimized out>
showsign = 0
is_long = 0
is_char = <optimized out>
pad = <optimized out>
step3b_jumps = {0 <repeats 11 times>, 3517, 0, 0, 4389, 2789,
4197, 3917, 3221, 2997, 1141, 1365, 1997, 1925, 1885, 733, 3709, 0, 0, 0}
step4_jumps = {0 <repeats 14 times>, 4389, 2789, 4197, 3917,
3221, 2997, 1141, 1365, 1997, 1925, 1885, 733, 3709, 0, 0, 0}
args_value = <optimized out>
is_negative = <optimized out>
number = {longlong = <optimized out>, word = <optimized out>}
base = <optimized out>
the_arg = {pa_wchar = 4777 L'\x12a9', pa_int = 4777, pa_long_int
= 4777, pa_long_long_int = 4777, pa_u_int = 4777, pa_u_long_int = 4777,
pa_u_long_long_int = 4777,
pa_double = 2.3601515901836347e-320, pa_long_double =
1.74131181638025811763e-4947, pa_float128 =
3.09319115455554459548860449034534676e-4962,
pa_string = 0x12a9 <error: Cannot access memory at address
0x12a9>, pa_wstring = 0x12a9 <error: Cannot access memory at address
0x12a9>, pa_pointer = 0x12a9, pa_user = 0x12a9}
spec = 115 's'
_buffer = {__routine = 0x4, __arg = 0xd, __canceltype =
1855363680, __prev = 0xe0}
_avail = <optimized out>
thousands_sep = 0x0
grouping = 0xffffffffffffffff <error: Cannot access memory at
address 0xffffffffffffffff>
done = 238
f = 0x55a06e5580a7 "s:%lu. Exec from %s:%u.\n"
lead_str_end = 0x55a06e558020 "%s: %.*s%s%s%sBUG: qm: fragm. %p
(address %p) beginning overwritten (%lx)! Memory allocator was called from
%s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n"
end_of_spec = <optimized out>
work_buffer =
"h\r\000\000\000\000\000\000\000\000\000\000\060\000\000\000\000\000\000\000\375\177\000\000
\372\353k\375\177\000\000\237MSn\n\000\000\000\000\000\000\000\240U",
'\000' <repeats 18 times>,
"P\225Un\240U\000\000\000\000\000\000[\214A\323\f\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\206\002",
'\000' <repeats 14 times>, "\004\000\000\000\000\000\000\000
\367\353k\375\177\000\000\301\225Un\240U\000\000\323\262\vn\240U\000\000\000\000\000\000\000\000\000\000s\374On\240U\000\000W\225Un\240U\000\000\350>\212\027T\177\000\000
\323\365$T\177\000\000\060\a\354k\375\177\000\000\220\371\227n\240U\000\000"...
workend = 0x7ffd6bebf9f8 ""
ap_save = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0x7ffd6bebfc30, reg_save_area = 0x7ffd6bebfb70}}
nspecs_done = 10
save_errno = 4
readonly_format = 0
do_longlong_number = <optimized out>
__result = <optimized out>
#2 0x00007f5425ec079f in __vsyslog_internal (pri=<optimized out>,
fmt=0x55a06e558020 "%s: %.*s%s%s%sBUG: qm: fragm. %p (address %p)
beginning overwritten (%lx)! Memory allocator was called from %s:%u.
Fragment marked by %s:%lu. Exec from %s:%u.\n",
ap=0x7ffd6bebfb50, mode_flags=0) at ../misc/syslog.c:233
now_tm = {tm_sec = 8, tm_min = 18, tm_hour = 12, tm_mday = 23,
tm_mon = 0, tm_year = 124, tm_wday = 2, tm_yday = 22, tm_isdst = 0,
tm_gmtoff = 0, tm_zone = 0x55a06e94c5e0 "UTC"}
now = 1706012288
fd = <optimized out>
f = 0x55a06e969a60
buf = 0x0
bufsize = 0
msgoff = 21
saved_errno = <optimized out>
failbuf =
"`\232\226n\240U\000\000\000\204\201\247[\214A\323`\374\353k\375\177\000\000\300p\371%T"
clarg = {buf = <optimized out>, oldaction = <optimized out>}
#3 0x00007f5425ec0c46 in __syslog (pri=<optimized out>, fmt=<optimized
out>) at ../misc/syslog.c:117
ap = {{gp_offset = 48, fp_offset = 48, overflow_arg_area =
0x7ffd6bebfc70, reg_save_area = 0x7ffd6bebfb70}}
#4 0x000055a06e3b7839 in ?? ()
No symbol table info available.
#5 0x000055a06e3bc039 in qm_free ()
No symbol table info available.
#6 0x000055a06e3c7c28 in qm_shm_free ()
No symbol table info available.
#7 0x00007f542325fb8e in uac_send_tm_callback () from
/usr/lib/x86_64-linux-gnu/kamailio/modules/uac.so
No symbol table info available.
#8 0x00007f5424a2f002 in run_trans_callbacks_internal () from
/usr/lib/x86_64-linux-gnu/kamailio/modules/tm.so
No symbol table info available.
#9 0x00007f5424a2f179 in run_trans_callbacks () from
/usr/lib/x86_64-linux-gnu/kamailio/modules/tm.so
No symbol table info available.
#10 0x00007f54249d5e8c in free_cell_helper () from
/usr/lib/x86_64-linux-gnu/kamailio/modules/tm.so
No symbol table info available.
#11 0x00007f5424aa8f82 in wait_handler () from
/usr/lib/x86_64-linux-gnu/kamailio/modules/tm.so
No symbol table info available.
#12 0x000055a06e37a263 in ?? ()
No symbol table info available.
#13 0x000055a06e37a79d in ?? ()
No symbol table info available.
#14 0x000055a06e37acc6 in timer_main ()
No symbol table info available.
#15 0x000055a06e0a5f62 in main_loop ()
No symbol table info available.
#16 0x000055a06e0b120c in main ()
No symbol table info available.
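One detail in frame #1 may be worth a closer look: the `string` pointer that strlen faulted on, 0x756d6f7266222c22, is made up entirely of printable ASCII bytes. The small sketch below decodes it, assuming x86_64 little-endian byte order; the helper name pointer_as_text is hypothetical. Text where a pointer should be would be consistent with string data having been written over the allocator's fragment bookkeeping, although this alone does not show which code did the writing.

```python
def pointer_as_text(value, width=8):
    """Render the bytes of a little-endian pointer value as ASCII.

    Non-printable bytes are shown as '.'.
    """
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# The bad 'string' argument from frame #1 of the backtrace
print(pointer_as_text(0x756d6f7266222c22))  # prints ","fromu
```

For comparison, the_arg value 0x12a9 (the PID 4777) decodes to no printable characters at all, so the ASCII content of the faulting pointer stands out.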