version: kamailio 5.5.6 (x86_64/linux) 21a9bc
Operating System: Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-22-amd64
We see this core; it has repeated several times on different days:

```
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fde2329f537 in __GI_abort () at abort.c:79
#2  0x000055d5fab45995 in sig_alarm_abort (signo=14) at main.c:699
#3  <signal handler called>
#4  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:37
#5  0x00007fde1c80a015 in futex_get (lock=0x7fdd9f9c9ca4) at ../../core/mem/../futexlock.h:108
#6  0x00007fde1c80c526 in destroy_linkers (linker=0x0) at dlg_profile.c:275
#7  0x00007fde1c7f87b3 in destroy_dlg (dlg=0x7fddad479ea0) at dlg_hash.c:377
#8  0x00007fde1c7f8ca0 in destroy_dlg_table () at dlg_hash.c:438
#9  0x00007fde1c790286 in mod_destroy () at dialog.c:809
#10 0x000055d5fad76dd8 in destroy_modules () at core/sr_module.c:842
#11 0x000055d5fab440e2 in cleanup (show_status=1) at main.c:575
#12 0x000055d5fab45d45 in shutdown_children (sig=15, show_status=1) at main.c:718
#13 0x000055d5fab49129 in handle_sigs () at main.c:816
#14 0x000055d5fab56959 in main_loop () at main.c:1903
#15 0x000055d5fab602e9 in main (argc=15, argv=0x7ffd9451bbb8) at main.c:3061

(gdb) bt full
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {8192, 0 <repeats 15 times>}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007fde2329f537 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x55d5faf6b088}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x000055d5fab45995 in sig_alarm_abort (signo=14) at main.c:699
        __func__ = "sig_alarm_abort"
#3  <signal handler called>
No locals.
#4  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:37
No locals.
#5  0x00007fde1c80a015 in futex_get (lock=0x7fdd9f9c9ca4) at ../../core/mem/../futexlock.h:108
        v = 2
        i = 1024
#6  0x00007fde1c80c526 in destroy_linkers (linker=0x0) at dlg_profile.c:275
        p_entry = 0x7fdd9f9c9d88
        l = 0x7fddd2aca430
        lh = 0x55d5faf81a03
        __func__ = "destroy_linkers"
#7  0x00007fde1c7f87b3 in destroy_dlg (dlg=0x7fddad479ea0) at dlg_hash.c:377
        ret = 1
        var = 0x7fde1c719a44 <mod_destroy+859>
        __func__ = "destroy_dlg"
#8  0x00007fde1c7f8ca0 in destroy_dlg_table () at dlg_hash.c:438
        dlg = 0x0
        l_dlg = 0x7fddad479ea0
        i = 0
        __func__ = "destroy_dlg_table"
#9  0x00007fde1c790286 in mod_destroy () at dialog.c:809
No locals.
#10 0x000055d5fad76dd8 in destroy_modules () at core/sr_module.c:842
        t = 0x7fde1f328018
        foo = 0x7fde1f327838
        __func__ = "destroy_modules"
#11 0x000055d5fab440e2 in cleanup (show_status=1) at main.c:575
        memlog = 0
        __func__ = "cleanup"
#12 0x000055d5fab45d45 in shutdown_children (sig=15, show_status=1) at main.c:718
        __func__ = "shutdown_children"
#13 0x000055d5fab49129 in handle_sigs () at main.c:816
        chld = 0
        chld_status = 139
        any_chld_stopped = 1
        memlog = 0
        __func__ = "handle_sigs"
#14 0x000055d5fab56959 in main_loop () at main.c:1903
        i = 14
        pid = 3845341
        si = 0x0
        si_desc = "udp receiver child=13 sock=87.237.87.28:5060\000\000\000\000\300\272Q\224\375\177\000\000\000\000\000\000\000\000\000\000\003\032\370\372\325U\000\000-\000\000\000\000\000\000\000\200\003\062\037\336\177\000\000F\034\067#\336\177\000\000\060\000\000\000\060\000\000\000x\266Q\224\375\177\000\000\220\265Q\224\375\177\000\000\000\230\026︾\314""
        nrprocs = 14
        woneinit = 1
        __func__ = "main_loop"
#15 0x000055d5fab602e9 in main (argc=15, argv=0x7ffd9451bbb8) at main.c:3061
        cfg_stream = 0x55d5fbd3f2e0
        c = -1
        r = 0
        tmp = 0x7ffd9451ce7c ""
        tmp_len = 832
        port = 832
        proto = 832
        ahost = 0x0
        aport = 0
        options = 0x55d5faf6e0b8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
        ret = -1
        seed = 4110196155
        rfd = 4
        debug_save = 0
        debug_flag = 0
        dont_fork_cnt = 0
        n_lst = 0x98000000980
        p = 0xc2 <error: Cannot access memory at address 0xc2>
        st = {st_dev = 23, st_ino = 946, st_nlink = 2, st_mode = 16832, st_uid = 0, st_gid = 998, __pad0 = 0, st_rdev = 0, st_size = 140, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1696021490, tv_nsec = 675255852}, st_mtim = {tv_sec = 1696418622, tv_nsec = 168794592}, st_ctim = {tv_sec = 1696418622, tv_nsec = 168794592}, __glibc_reserved = {0, 0, 0}}
        tbuf = "P\267Q\224\375\177\000\000\310e)#\336\177\000\000\020\204]#\336\177\000\000\000\000\000\000\000\000\000\000зQ\224\375\177\000\000\000\000\000\000\000\000\000\000зQ\224\375\177", '\000' <repeats 18 times>, "`g^#\336\177\000\000\350$a#\336\177\000\000\204i^#\336\177\000\000\060d^#\336\177\000\000H\020a#\336\177\000\000\000`^#\336\177\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000@"(#\336\177", '\000' <repeats 66 times>...
        option_index = 0
        long_options = {{name = 0x55d5faf70526 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x55d5faf6b51c "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x55d5faf7052b "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x55d5faf70531 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x55d5faf70537 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x55d5faf70540 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x55d5faf7054a "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x55d5faf70554 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x55d5faf7055f "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x55d5faf70568 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x55d5faf70573 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x55d5faf70579 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x55d5faf70583 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
        __func__ = "main"
(gdb)
```
Could it be related to some type of message that is making Kamailio crash?
Thanks a lot and regards, David
This is a backtrace from the shutdown process. You have to enable core files per pid/process, and if it is a runtime crash (not a shutdown/restart crash), then there should be more core files -- backtraces from all of them are needed.
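A minimal sketch of enabling per-process core files on Linux (the dump directory and file-name pattern below are illustrative choices, not required values):

```shell
# Allow core dumps of unlimited size in the current shell/session.
# For Kamailio running under systemd, set LimitCORE=infinity in the
# service unit instead.
ulimit -c unlimited

# Name core files with the executable name (%e) and PID (%p) so each
# crashing Kamailio process writes its own file instead of overwriting
# a single "core" file.
sysctl -w kernel.core_pattern=/tmp/core.%e.%p
```

With this pattern in place, a runtime crash in several children leaves one `core.kamailio.<pid>` file per process, and `gdb kamailio /tmp/core.kamailio.<pid>` can be run on each.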
On the other hand, the 5.5 series is out of maintenance; you should consider upgrading.
OK, thanks Daniel. I will activate the core files per pid, and we will look into upgrading to 5.6.
Hello Daniel, we have had another core:
```
#0  0x00007fa0db0799b0 in ?? ()
#1  0x00007fa10d780741 in run_trans_callbacks_internal (cb_lst=0x7fa0b54752f0, type=131072, trans=0x7fa0b5475278, params=0x7ffdfb6b4d90) at t_hooks.c:258
#2  0x00007fa10d780871 in run_trans_callbacks (type=131072, trans=0x7fa0b5475278, req=0x0, rpl=0x0, code=0) at t_hooks.c:285
#3  0x00007fa10d72a79c in free_cell_helper (dead_cell=0x7fa0b5475278, silent=0, fname=0x7fa10d83bd72 "timer.c", fline=648) at h_table.c:165
#4  0x00007fa10d787036 in wait_handler (ti=1120415570, wait_tl=0x7fa0b5475300, data=0x7fa0b5475278) at timer.c:648
#5  0x000056079079f21a in timer_list_expire (t=1120415570, h=0x7fa08dc4b360, slow_l=0x7fa08dc4e2d8, slow_mark=45780) at core/timer.c:857
#6  0x000056079079f724 in timer_handler () at core/timer.c:922
#7  0x000056079079fc27 in timer_main () at core/timer.c:961
#8  0x00005607904e76cf in main_loop () at main.c:1839
#9  0x00005607904f22e9 in main (argc=15, argv=0x7ffdfb6b57d8) at main.c:3061
```

[NOC-2123_bt.log](https://github.com/kamailio/kamailio/files/13115341/NOC-2123_bt.log)
It seems the process that hit the SIGSEGV was handling all the CANCEL requests. Could this be related to the memory management when releasing transactions?
Thanks a lot and regards, David
Hello all,
we have seen some cores on the same server, like this one:
```
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f02297a0537 in __GI_abort () at abort.c:79
#2  0x000055caeb47dc3d in qm_debug_check_frag (qm=0x7f00aba53000, f=0x7f019c02c760, file=0x7f0222d42d32 "dialog: dlg_hash.c", line=412, efile=0x55caeb60c079 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:158
#3  0x000055caeb48148a in qm_free (qmp=0x7f00aba53000, p=0x7f019c02c798, file=0x7f0222d42d32 "dialog: dlg_hash.c", func=0x7f0222d43338 <__func__.17> "destroy_dlg", line=412, mname=0x7f0222d42d2b "dialog") at core/mem/q_malloc.c:511
#4  0x000055caeb48c6a9 in qm_shm_free (qmp=0x7f00aba53000, p=0x7f019c02c798, file=0x7f0222d42d32 "dialog: dlg_hash.c", func=0x7f0222d43338 <__func__.17> "destroy_dlg", line=412, mname=0x7f0222d42d2b "dialog") at core/mem/q_malloc.c:1276
#5  0x00007f0222cf9bb8 in destroy_dlg (dlg=0x7f00cdec7d28) at dlg_hash.c:412
#6  0x00007f0222d03b8b in dlg_unref_helper (dlg=0x7f00cdec7d28, cnt=2, fname=0x7f0222d3ea61 "dlg_handlers.c", fline=431) at dlg_hash.c:1095
#7  0x00007f0222ce1146 in dlg_ontdestroy (t=0x7f01f1b17808, type=131072, param=0x7ffe98532bd0) at dlg_handlers.c:431
#8  0x00007f0225607741 in run_trans_callbacks_internal (cb_lst=0x7f01f1b17880, type=131072, trans=0x7f01f1b17808, params=0x7ffe98532bd0) at t_hooks.c:258
#9  0x00007f0225607871 in run_trans_callbacks (type=131072, trans=0x7f01f1b17808, req=0x0, rpl=0x0, code=0) at t_hooks.c:285
#10 0x00007f02255b179c in free_cell_helper (dead_cell=0x7f01f1b17808, silent=0, fname=0x7f02256c2d72 "timer.c", fline=648) at h_table.c:165
#11 0x00007f022560e036 in wait_handler (ti=1916150547, wait_tl=0x7f01f1b17890, data=0x7f01f1b17808) at timer.c:648
#12 0x000055caeb44121a in timer_list_expire (t=1916150547, h=0x7f00abad2360, slow_l=0x7f00abad6278, slow_mark=34766) at core/timer.c:857
#13 0x000055caeb441724 in timer_handler () at core/timer.c:922
#14 0x000055caeb441c27 in timer_main () at core/timer.c:961
#15 0x000055caeb1896cf in main_loop () at main.c:1839
#16 0x000055caeb1942e9 in main (argc=15, argv=0x7ffe98533618) at main.c:3061
```

On this server we have a very high call rate, more than 2000 calls per second.
This is the dialog module setup we have:

```
modparam("dialog", "db_url", DBURL_DIALOG)
modparam("dialog", "enable_stats", 1)
modparam("dialog", "hash_size", 16384)
modparam("dialog", "dlg_flag", 31)
modparam("dialog", "default_timeout", 14500)
modparam("dialog", "dlg_match_mode", 1)
modparam("dialog", "db_mode", 0)
modparam("dialog", "profiles_with_value", "dedalus")
modparam("dialog", "timer_procs", 1)
modparam("dialog", "db_skip_load", 1)
modparam("dialog", "db_update_period", 60)
modparam("dialog", "db_fetch_rows", 1000)
```
We are changing to this:

```
modparam("dialog", "db_url", DBURL_DIALOG)
modparam("dialog", "enable_stats", 1)
modparam("dialog", "hash_size", 32768)
modparam("dialog", "dlg_flag", 31)
modparam("dialog", "default_timeout", 14500)
modparam("dialog", "dlg_match_mode", 1)
modparam("dialog", "db_mode", 0)
modparam("dialog", "profiles_with_value", "dedalus_in ; dedalus_out")
modparam("dialog", "timer_procs", 1)
modparam("dialog", "db_skip_load", 1)
modparam("dialog", "db_update_period", 60)
modparam("dialog", "db_fetch_rows", 1000)
modparam("dialog", "wait_ack", 0)
```

As far as I understand, the cores are related to dialog end handling? Would increasing hash_size and setting wait_ack to 0 help performance?
Thanks a lot and regards, David
Hi descartin,
We had a similar problem not long ago - check what your current setting for SHM size is:
https://github.com/kamailio/kamailio-wiki/blob/main/docs/cookbooks/5.6.x/cor...
It's possible that the memory manager runs out of shared memory and raises `SIGABRT`, causing Kamailio to crash. In our case, we had multiple cores dumped per crash: one from the parent raising `SIGABRT` and 2-3 from child processes hitting a race condition between handling the `SIGABRT` from the parent and a `SIGSEGV` raised internally from a failed SHM operation.
We had to increase this setting significantly, but our use-case involves lots of `usrloc` data in memory. It is possible that for your use-case you didn't have to tune SHM size yet and hit the default limit just recently.
When tuning memory, you may also want to check out `-M` command line option: https://github.com/kamailio/kamailio-wiki/blob/main/docs/cookbooks/5.6.x/cor...
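A hedged example of setting both pools at startup (the config path and sizes below are placeholders to illustrate the flags, not recommendations):

```shell
# -m: total shared memory (SHM) pool in MB, shared by all Kamailio processes
# -M: private (PKG) memory pool in MB, allocated separately per process
kamailio -f /etc/kamailio/kamailio.cfg -m 4096 -M 32
```

Note that PKG memory is per process, so the effective total grows with the number of children.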
I haven't found a corresponding parameter to configure this inside the routing script, like the one for SHM, and we later had some problems with `pkg` memory running low as well.
You can check the current memory usage with RPC commands, e.g.:
https://www.kamailio.org/docs/modules/5.6.x/modules/kex.html#kex.r.core.shmm... https://www.kamailio.org/docs/modules/5.6.x/modules/kex.html#kex.r.pkg.stats
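For example, via `kamcmd` (assuming the `ctl` module is loaded so the RPC socket is available):

```shell
# Shared memory (SHM) totals for the whole instance:
# total, free, used, max used fragment, etc.
kamcmd core.shmmem

# Private (PKG) memory statistics, reported per process
kamcmd pkg.stats
```

Watching `free` versus `total` in `core.shmmem` under peak load should show whether the instance is approaching the SHM limit.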
Hope this helps :-)
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
Closed #3593 as not planned.