Description

Hi, I have run into a problem where the CDP module crashes very frequently. Screenshots of the logs and the core file details are below. I could not find the code path that triggers the fault, but my suspicion is as follows: the TCP connection is established correctly, but the peer is not initialized or an error is not handled properly, so a packet is parsed incorrectly and the connection is dropped shortly afterwards. Because the socket does not belong to a normal peer, the connection is re-established over and over, and CDP does not recognize or clean up these connections properly. As a result, the number of sockets keeps growing while the number of peers does not increase.

_20241011175120.png (view on web)
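
If descriptors really do accumulate as described, one general risk with a `select()`-based loop is that `FD_SET()` on a descriptor at or above `FD_SETSIZE` writes outside the `fd_set`, which is undefined behavior. In the core below `max` is 298, so that limit was not reached in this particular crash; the following is only a minimal sketch of a guard, not CDP's code, and the helper name `safe_fd_set` is hypothetical.

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical helper, not part of CDP: add fd to the set only if it fits.
 * Returns 0 on success, -1 if fd is negative or >= FD_SETSIZE. */
static int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1; /* FD_SET() would write outside the fd_set */
    FD_SET(fd, set);
    return 0;
}
```

A caller would close or reject the connection when `safe_fd_set()` returns -1 instead of adding the descriptor to the set.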

Troubleshooting

Reproduction

Debugging Data

Core was generated by `/usr/sbin/kamailio -f /etc/kamailio_dra/kamailio_dra.cfg -P /var/run/kamailio_d'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f9860fae336 in cc_acc_client_stateful_sm_process (s=0x7f9861bd8980, event=21874, msg=0x55720d57c702 <qm_free+8029>) at acctstatemachine.c:304
304	}
(gdb) bt
#0  0x00007f9860fae336 in cc_acc_client_stateful_sm_process (s=0x7f9861bd8980, event=21874, msg=0x55720d57c702 <qm_free+8029>) at acctstatemachine.c:304
#1  0x00007f9860fae391 in atomic_get_and_set_int (var=0x776f6e6b6e752072, v=32766) at ../../core/mem/../atomic/atomic_x86.h:242
#2  0x00007f9860fb16eb in disconnect_serviced_peer (sp=0x7f98e28287d0, locked=0) at receiver.c:232
#3  0x00007f9860fbd56b in receive_loop (original_peer=0x0) at receiver.c:942
#4  0x00007f9860fb4c02 in receiver_process (p=0x0) at receiver.c:488
#5  0x00007f9860f50c80 in diameter_peer_start (blocking=0) at diameter_peer.c:278
#6  0x00007f9860f413df in cdp_child_init (rank=0) at cdp_mod.c:274
#7  0x000055720d3df6c2 in init_mod_child (m=0x7f98e279b180, rank=0) at core/sr_module.c:920
#8  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279be50, rank=0) at core/sr_module.c:912
#9  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279c2c0, rank=0) at core/sr_module.c:912
#10 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279cc40, rank=0) at core/sr_module.c:912
#11 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d130, rank=0) at core/sr_module.c:912
#12 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d6c0, rank=0) at core/sr_module.c:912
#13 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279db10, rank=0) at core/sr_module.c:912
#14 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279dff0, rank=0) at core/sr_module.c:912
#15 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279fae0, rank=0) at core/sr_module.c:912
#16 0x000055720d3dfff0 in init_child (rank=0) at core/sr_module.c:999
#17 0x000055720d23d70a in main_loop () at main.c:1942
#18 0x000055720d2488f5 in main (argc=16, argv=0x7ffea4fde838) at main.c:3256
(gdb) bt full
#0  0x00007f9860fae336 in cc_acc_client_stateful_sm_process (s=0x7f9861bd8980, event=21874, msg=0x55720d57c702 <qm_free+8029>) at acctstatemachine.c:304
        x = 0x7f9861b97000
        ret = 441
        rc = 1627304660
        record_type = 32664
        __func__ = "cc_acc_client_stateful_sm_process"
#1  0x00007f9860fae391 in atomic_get_and_set_int (var=0x776f6e6b6e752072, v=32766) at ../../core/mem/../atomic/atomic_x86.h:242
No locals.
#2  0x00007f9860fb16eb in disconnect_serviced_peer (sp=0x7f98e28287d0, locked=0) at receiver.c:232
        __llevel = 0
        __func__ = "disconnect_serviced_peer"
#3  0x00007f9860fbd56b in receive_loop (original_peer=0x0) at receiver.c:942
        __llevel = -1526863824
        rfds = {__fds_bits = {0, 0, 0, 0, 1024, 0 <repeats 11 times>}}
        efds = {__fds_bits = {0 <repeats 16 times>}}
        tv = {tv_sec = 0, tv_usec = 883496}
        n = 1
        max = 298
        cnt = 0
        msg = 0x0
        sp = 0x7f98e28287d0
        sp2 = 0x7f98e2827e30
        p = 0x0
        fd = 295
        fd_exchange_pipe_local = 28
        __func__ = "receive_loop"
#4  0x00007f9860fb4c02 in receiver_process (p=0x0) at receiver.c:488
        __llevel = -990730168
        __func__ = "receiver_process"
#5  0x00007f9860f50c80 in diameter_peer_start (blocking=0) at diameter_peer.c:278
        pid = 0
        k = 1
        seed = 1112701621
        p = 0x0
        __func__ = "diameter_peer_start"
#6  0x00007f9860f413df in cdp_child_init (rank=0) at cdp_mod.c:274
        __llevel = 0
        __func__ = "cdp_child_init"
#7  0x000055720d3df6c2 in init_mod_child (m=0x7f98e279b180, rank=0) at core/sr_module.c:920
        ret = 0
        __func__ = "init_mod_child"
#8  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279be50, rank=0) at core/sr_module.c:912
        ret = 1
        __func__ = "init_mod_child"
#9  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279c2c0, rank=0) at core/sr_module.c:912
        ret = 0
        __func__ = "init_mod_child"
#10 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279cc40, rank=0) at core/sr_module.c:912
        ret = 0
        __func__ = "init_mod_child"
#11 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d130, rank=0) at core/sr_module.c:912
        ret = 0
        __func__ = "init_mod_child"
#12 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d6c0, rank=0) at core/sr_module.c:912
        ret = 0
        __func__ = "init_mod_child"
#13 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279db10, rank=0) at core/sr_module.c:912
        ret = 32766
        __func__ = "init_mod_child"
#14 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279dff0, rank=0) at core/sr_module.c:912
        ret = 21874
        __func__ = "init_mod_child"
#15 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279fae0, rank=0) at core/sr_module.c:912
        ret = 12
        __func__ = "init_mod_child"
#16 0x000055720d3dfff0 in init_child (rank=0) at core/sr_module.c:999
        ret = 0
        type = 0x55720d6ece7b "PROC_MAIN"
        __func__ = "init_child"
#17 0x000055720d23d70a in main_loop () at main.c:1942
        i = 1639542784
        pid = 50
        si = 0x0
        si_desc = "\240\233\"\rrU\000\000@\361\367a\230\177\000\000\000\343\375\244\376\177\000\000\327~2\rrU\000\000\000\343\375\244\376\177\000\000\025\t>\r\005\000\000\000\000\000\000\000\037\000\000\000\000;\213\222\213\017\r\202h\r\000\000\000\000\000\000\060\000\000\000\000\000\000\000\240\233\"\rrU\000\000\060\350\375\244\376\177", '\000' <repeats 18 times>, "\340\342\375\244\376\177\000\000 \351O\rrU\000"
        nrprocs = 21874
        woneinit = 0
        __func__ = "main_loop"
#18 0x000055720d2488f5 in main (argc=16, argv=0x7ffea4fde838) at main.c:3256
        cfg_stream = 0x55720ef75260
        c = -1
        r = 0
        tmp = 0x7ffea4fdfee3 ""
        tmp_len = 32766
        port = 5060
        proto = 0
        aproto = 0
        ahost = 0x0
        aport = 0
        options = 0x55720d6b3698 ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
        ret = -1
        seed = 2959679812
        rfd = 4
        debug_save = 0
        debug_flag = 0
        dont_fork_cnt = 2
        n_lst = 0x0
        p = 0x7f996220a3d0 ""
        st = {st_dev = 50, st_ino = 2995745, st_nlink = 1, st_mode = 16877, st_uid = 103, st_gid = 105, __pad0 = 0, st_rdev = 0, st_size = 4096, st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1726638407, tv_nsec = 558385502}, st_mtim = {tv_sec = 1727831142, 
            tv_nsec = 822744119}, st_ctim = {tv_sec = 1727831142, tv_nsec = 822744119}, __glibc_reserved = {0, 0, 0}}
        l1 = 2048
        tbuf = "pQ!b\231\177\000\000\070-\000b\231\177\000\000\020\347\375\244\376\177\000\000\367\344\376a\231\177", '\000' <repeats 18 times>, "\001\000\000\000\000\000\000\000(W!b\231\177\000\000\000Q!b\231\177\000\000\001\000\000\000\000\000\000\000\300\200 b\231\177\000\000\017Q\377a\231\177\000\000\020W!b\231\177", '\000' <repeats 19 times>, "S\376\244\376\177\000\000\300\212\225\001\000\000\000\000\207\026=a\231\177\000\000`\347\375\244\376\177\000\000\220Q\376\244\376\177\000\000\002\000\000\000\231\177\000\000\000\000\000\000\000\000\000\000\300\346\375\244\376\177\000\000\003\000\000\000\000\000\000\000\260\346\375\244\376\177\000\000\000\000\000\000\000\000\000\000"...
        option_index = 0
        long_options = {{name = 0x55720d6b59f6 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x55720d6b087c "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x55720d6b59fb "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x55720d6b5a01 "subst", has_arg = 1, 
            flag = 0x0, val = 1025}, {name = 0x55720d6b5a07 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x55720d6b5a10 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x55720d6b5a1a "server-id", has_arg = 1, flag = 0x0, val = 1028}, {
            name = 0x55720d6b5a24 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x55720d6b5a2f "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x55720d6b5a38 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x55720d6b5a43 "debug", has_arg = 1, 
            flag = 0x0, val = 1032}, {name = 0x55720d6b5a49 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x55720d6b5a53 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x55720d6b5a5a "all-errors", has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0, 
            has_arg = 0, flag = 0x0, val = 0}}
        __func__ = "main"
(gdb) 
(gdb)   info locals
cfg_stream = 0x55720ef75260
c = -1
r = 0
tmp = 0x7ffea4fdfee3 ""
tmp_len = 32766
port = 5060
proto = 0
aproto = 0
ahost = 0x0
aport = 0
options = 0x55720d6b3698 ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
ret = -1
seed = 2959679812
rfd = 4
debug_save = 0
debug_flag = 0
dont_fork_cnt = 2
n_lst = 0x0
p = 0x7f996220a3d0 ""
st = {st_dev = 50, st_ino = 2995745, st_nlink = 1, st_mode = 16877, st_uid = 103, st_gid = 105, __pad0 = 0, st_rdev = 0, st_size = 4096, st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1726638407, tv_nsec = 558385502}, st_mtim = {tv_sec = 1727831142, tv_nsec = 822744119}, 
  st_ctim = {tv_sec = 1727831142, tv_nsec = 822744119}, __glibc_reserved = {0, 0, 0}}
l1 = 2048
tbuf = "pQ!b\231\177\000\000\070-\000b\231\177\000\000\020\347\375\244\376\177\000\000\367\344\376a\231\177", '\000' <repeats 18 times>, "\001\000\000\000\000\000\000\000(W!b\231\177\000\000\000Q!b\231\177\000\000\001\000\000\000\000\000\000\000\300\200 b\231\177\000\000\017Q\377a\231\177\000\000\020W!b\231\177", '\000' <repeats 19 times>, "S\376\244\376\177\000\000\300\212\225\001\000\000\000\000\207\026=a\231\177\000\000`\347\375\244\376\177\000\000\220Q\376\244\376\177\000\000\002\000\000\000\231\177\000\000\000\000\000\000\000\000\000\000\300\346\375\244\376\177\000\000\003\000\000\000\000\000\000\000\260\346\375\244\376\177\000\000\000\000\000\000\000\000\000\000"...
option_index = 0
long_options = {{name = 0x55720d6b59f6 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x55720d6b087c "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x55720d6b59fb "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x55720d6b5a01 "subst", has_arg = 1, 
    flag = 0x0, val = 1025}, {name = 0x55720d6b5a07 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x55720d6b5a10 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x55720d6b5a1a "server-id", has_arg = 1, flag = 0x0, val = 1028}, {
    name = 0x55720d6b5a24 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x55720d6b5a2f "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x55720d6b5a38 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x55720d6b5a43 "debug", has_arg = 1, 
    flag = 0x0, val = 1032}, {name = 0x55720d6b5a49 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x55720d6b5a53 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x55720d6b5a5a "all-errors", has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0, has_arg = 0, 
    flag = 0x0, val = 0}}
__func__ = "main"
(gdb) list
299		if(s) {
300			AAASessionsUnlock(s->hash);
301		}
302	
303		return ret;
304	}
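
In frame #1, `var=0x776f6e6b6e752072` does not look like a plausible pointer next to the other addresses in the trace, so the top frames may not be reliable. One quick check for a suspicious value like this is to read its bytes as ASCII in little-endian (x86_64 memory) order. The sketch below does that; the helper name `u64_to_ascii_le` is hypothetical. For the value above it yields `r unknow`, which would be consistent with string data having overwritten part of the stack, although that is an interpretation and not a confirmed cause.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the 8 bytes of value into out in little-endian
 * order, replacing non-printable bytes with '.'. out must hold 9 bytes. */
static void u64_to_ascii_le(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(value >> (8 * i));
        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
    out[8] = '\0';
}
```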

Log Messages

2024-10-11T08:23:20.234947572Z 21(60) ERROR: cdp [receiver.c:783]: receive_loop(): select_recv(): Bad file descriptor
2024-10-11T08:23:24.906121404Z 21(60) ERROR: cdp [receiver.c:783]: receive_loop(): select_recv(): Bad file descriptor
2024-10-11T08:23:41.857946233Z 21(60) ERROR: cdp [receiver.c:783]: receive_loop(): select_recv(): Bad file descriptor
2024-10-11T08:25:18.095639136Z 25(64) WARNING: cdp [peermanager.c:337]: peer_timer(): Inactivity on peer [scscf32.ims.mnc011.mcc460.3gppnetwork.org] and no DWA, Closing peer...
2024-10-11T08:43:38.138243609Z 31(70) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 34
2024-10-11T08:43:47.476315811Z  0(39) ALERT: <core> [main.c:805]: handle_sigs(): child process 60 exited by a signal 11
2024-10-11T08:43:47.476380054Z  0(39) ALERT: <core> [main.c:809]: handle_sigs(): core was generated
2024-10-11T08:43:47.503780368Z  0(39) CRITICAL: cdp [diameter_peer.c:447]: diameter_peer_destroy(): destroy_diameter_peer(): Bye Bye from C Diameter Peer test
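
The repeated `Bad file descriptor` text in the `select_recv()` errors is the message for `EBADF`, which `select()` returns when one of the descriptors in its sets is no longer open. As a rough sketch only, and not CDP's actual code, a receive loop could react to that error by probing each descriptor in the set and dropping the closed ones. The helper name `prune_bad_fds` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper, not part of CDP: remove every descriptor from set
 * that is no longer open. Returns the number removed and stores the new
 * highest live descriptor (or -1 if none remain) in *max_fd. */
static int prune_bad_fds(fd_set *set, int *max_fd)
{
    int removed = 0;
    int new_max = -1;

    for (int fd = 0; fd <= *max_fd && fd < FD_SETSIZE; fd++) {
        if (!FD_ISSET(fd, set))
            continue;
        /* fcntl(F_GETFD) fails with EBADF when fd is not an open descriptor */
        if (fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
            FD_CLR(fd, set);
            removed++;
        } else {
            new_max = fd;
        }
    }
    *max_fd = new_max;
    return removed;
}
```

Calling this after `select()` fails with `EBADF` would show whether stale descriptors are left in the set, and logging the removed ones would show which connections they belonged to.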


version: kamailio 5.8.1 (x86_64/linux) 07b761
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_SEND_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 07b761
compiled on 10:00:57 Oct 9 2024 with gcc 7.5.0

