Submitting this on behalf of Jacob Greene, who posted a day or two ago:

https://lists.kamailio.org/pipermail/sr-users/2020-May/109315.html

Version in production is Debian 9 and Kamailio 5.3.4 from Debian packages:

ii  kamailio                        5.3.4+bpo9                        amd64        very fast, dynamic and configurable SIP server
ii  kamailio-postgres-modules:amd64 5.3.4+bpo9                        amd64        PostgreSQL database connectivity module for the Kamailio SIP server
ii  kamailio-presence-modules:amd64 5.3.4+bpo9                        amd64        SIP presence modules for Kamailio
ii  kamailio-python3-modules:amd64  5.3.4+bpo9                        amd64        The app_python3 module for the Kamailio SIP server
rc  kamailio-xml-modules:amd64      5.2.6+stretch                     amd64        XML based extensions for Kamailio's Management Interface

Anyway, a Kamailio child process periodically exits -- about ten times a day -- with a SIGABRT set off by a failed assertion somewhere inside the app_python3 bureaucracy:

(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f2c2f76042a in __GI_abort () at abort.c:89
#2  0x00007f2c2f757e67 in __assert_fail_base (fmt=<optimized out>, 
    assertion=assertion@entry=0x7f2c2892bda8 "PyBytes_Check(pframe->f_code->co_name)", 
    file=file@entry=0x7f2c2892bd9c "apy_kemi.c", line=line@entry=1819, 
    function=function@entry=0x7f2c2892d0b0 "sr_apy_kemi_exec_func") at assert.c:92
#3  0x00007f2c2f757f12 in __GI___assert_fail (assertion=0x7f2c2892bda8 "PyBytes_Check(pframe->f_code->co_name)", 
    file=0x7f2c2892bd9c "apy_kemi.c", line=1819, function=0x7f2c2892d0b0 "sr_apy_kemi_exec_func") at assert.c:101
#4  0x00007f2c2890bd57 in sr_apy_kemi_exec_func () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
#5  0x00007f2c288d525c in ?? () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
#6  0x00007f2c282ad259 in PyCFunction_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#7  0x00007f2c283de739 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#8  0x00007f2c283de6ad in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#9  0x00007f2c284a0074 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#10 0x00007f2c283dcadf in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#11 0x00007f2c283de6ad in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#12 0x00007f2c284a0074 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#13 0x00007f2c284a0153 in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#14 0x00007f2c28333558 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#15 0x00007f2c2840fa37 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#16 0x00007f2c2845b95c in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#17 0x00007f2c2840fa37 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#18 0x00007f2c2849eb77 in PyEval_CallObjectWithKeywords () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#19 0x00007f2c2891c136 in apy_exec () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
#20 0x00007f2c288dfe94 in sr_kemi_config_engine_python () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
#21 0x0000558ec29ba7b4 in sr_kemi_route ()
#22 0x0000558ec2ab7c03 in receive_msg ()
#23 0x0000558ec2997e4a in udp_rcv_loop ()
#24 0x0000558ec28de718 in main_loop ()
#25 0x0000558ec28e7a39 in main ()

The fuller backtrace for the top few frames:

Thread 1 (Thread 0x7f2c3031db40 (LWP 26211)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 140720998929208, 139827752272384, 139827751795724, 140724530937856, 94071669029552, 
            94071669029653, 94071669029552, 94071669029552, 94071669029661, 94071669029852, 94071669029552, 94071669029852, 
            0, 0, 0}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007f2c2f76042a in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x558ec469d990, sa_sigaction = 0x558ec469d990}, sa_mask = {__val = {
              139827751803746, 140720998929248, 109, 139827755099200, 1819, 139827635998896, 139827635638836, 
              139827752272384, 139827751811289, 139827755115808, 139827751803256, 139827752815328, 139827752801040, 
              94071641717939, 139827763892224, 139827635994024}}, sa_flags = 1819, sa_restorer = 0x7f2c2892d0b0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f2c2f757e67 in __assert_fail_base (fmt=<optimized out>, 
    assertion=assertion@entry=0x7f2c2892bda8 "PyBytes_Check(pframe->f_code->co_name)", 
    file=file@entry=0x7f2c2892bd9c "apy_kemi.c", line=line@entry=1819, 
    function=function@entry=0x7f2c2892d0b0 "sr_apy_kemi_exec_func") at assert.c:92
        str = 0x558ec466a760 ""
        total = 4096
#3  0x00007f2c2f757f12 in __GI___assert_fail (assertion=0x7f2c2892bda8 "PyBytes_Check(pframe->f_code->co_name)", 
    file=0x7f2c2892bd9c "apy_kemi.c", line=1819, function=0x7f2c2892d0b0 "sr_apy_kemi_exec_func") at assert.c:101
No locals.
#4  0x00007f2c2890bd57 in sr_apy_kemi_exec_func () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
No symbol table info available.
#5  0x00007f2c288d525c in ?? () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
No symbol table info available.
#6  0x00007f2c282ad259 in PyCFunction_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#7  0x00007f2c283de739 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#8  0x00007f2c283de6ad in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#9  0x00007f2c284a0074 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#10 0x00007f2c283dcadf in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#11 0x00007f2c283de6ad in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#12 0x00007f2c284a0074 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#13 0x00007f2c284a0153 in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#14 0x00007f2c28333558 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#15 0x00007f2c2840fa37 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#16 0x00007f2c2845b95c in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#17 0x00007f2c2840fa37 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#18 0x00007f2c2849eb77 in PyEval_CallObjectWithKeywords () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
No symbol table info available.
#19 0x00007f2c2891c136 in apy_exec () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
No symbol table info available.
#20 0x00007f2c288dfe94 in sr_kemi_config_engine_python () from /usr/lib/x86_64-linux-gnu/kamailio/modules/app_python3.so
No symbol table info available.
#21 0x0000558ec29ba7b4 in sr_kemi_route ()
No symbol table info available.
#22 0x0000558ec2ab7c03 in receive_msg ()
No symbol table info available.
#23 0x0000558ec2997e4a in udp_rcv_loop ()
No symbol table info available.
#24 0x0000558ec28de718 in main_loop ()
No symbol table info available.
#25 0x0000558ec28e7a39 in main ()
No symbol table info available.

For what it's worth, line 1819 in apy_kemi.c in the latest 5.3 HEAD (which I'm not sure this binary package reflects) is:

1801         if(unlikely(cfg_get(core, core_cfg, latency_limit_action)>0)
1802                         && is_printable(cfg_get(core, core_cfg, latency_log))) {
1803                 gettimeofday(&tve, &tz);
1804                 tdiff = (tve.tv_sec - tvb.tv_sec) * 1000000
1805                                    + (tve.tv_usec - tvb.tv_usec);
1806                 if(tdiff >= cfg_get(core, core_cfg, latency_limit_action)) {
1807                         pstate = PyThreadState_GET();
1808                         if (pstate != NULL && pstate->frame != NULL) {
1809                                 pframe = pstate->frame;
1810                         }
1811 
1812                         LOG(cfg_get(core, core_cfg, latency_log),
1813                                         "alert - action KSR.%s%s%s(...)"
1814                                         " took too long [%u ms] (file:%s func:%s line:%d)\n",
1815                                         (ket->mname.len>0)?ket->mname.s:"",
1816                                         (ket->mname.len>0)?".":"", ket->fname.s, tdiff,
1817                                         (pframe)?PyBytes_AS_STRING(pframe->f_code->co_filename):"",
1818                                         (pframe)?PyBytes_AS_STRING(pframe->f_code->co_name):"",
1819                                         (pframe)?PyCode_Addr2Line(pframe->f_code, pframe->f_lasti):0);
1820                 }
1821         }
1822 
1823         return ret;
1824 }
1825 

That's inside sr_apy_kemi_exec_func().

Some suspicion falls upon this Python function:

    def set_send_socket(self):
        KSR.info('[socket] forced_socket is: {}'.format(KSR.pv.get('$fs')))

        if KSR.pv.get('$fs'):
            KSR.info('[socket] socket already set')
            return
        elif not self.is_local():
            KSR.info('[socket] set socket to 5065')
            KSR.corex.set_send_socket(MY_IP + ':5065')
        else:
            KSR.info('[socket] set socket to 5060')
            KSR.corex.set_send_socket(MY_IP + ':5060')
        KSR.info('[socket] socket set')

... simply because the last message out of KEMI which appears prior to the crash is one of its log emissions ...

May 21 18:20:25 proxy1 /usr/sbin/kamailio[26210]: INFO: 1697311439-5060-88@BA.CFC.EH.IF - {NOTIFY}: <core> [core/kemi.c:104]: sr_kemi_core_info(): domain exists result: None
May 21 18:20:25 proxy1 /usr/sbin/kamailio[26211]: INFO: 1361696919-5060-39@BA.CFC.EH.BFC - {NOTIFY}: <core> [core/kemi.c:104]: sr_kemi_core_info(): [socket] forced_socket is: None
May 21 18:20:25 proxy1 /usr/sbin/kamailio[26213]: INFO: 813204231-5060-53@BA.BF.J.CBF - {NOTIFY}: <core> [core/receive.c:518]: receive_msg(): reply-route executed in: 260 usec
May 21 18:20:25 proxy1 /usr/sbin/kamailio[26212]: INFO: 1941306664-5060-17@BA.CFC.EH.CAG - {NOTIFY}: <core> [core/receive.c:518]: receive_msg(): reply-route executed in: 412 usec
May 21 18:20:25 proxy1 /usr/sbin/kamailio[26209]: ALERT: <core> [main.c:766]: handle_sigs(): child process 26210 exited by a signal 6
May 21 18:20:25 proxy1 /usr/sbin/kamailio[26209]: ALERT: <core> [main.c:769]: handle_sigs(): core was not generated
May 21 18:20:25 proxy1 /usr/sbin/kamailio[26209]: INFO: <core> [main.c:792]: handle_sigs(): terminating due to SIGCHLD

However, it's hard to troubleshoot without debug symbols and due to the third-party binary libraries, of course. Furthermore, the problem is not possible to reproduce on demand, though its incidence does seem to correlate to higher call volumes. Would building from source to provide more flexibility be recommended at this stage?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.