I'm returning to an old issue, where Kamailio stops processing incoming requests.
Now it has turned out that only requests over UDP are not processed; requests over TCP work fine. The debug log doesn't show anything about the requests coming over UDP, but wireshark sees them arriving. core.ps shows that the main (attendant) process and all UDP receiver processes are alive.
Any ideas what could cause Kamailio (5.2 version) to stop processing requests over UDP? What additional info should I try to get?
-- Juha
Juha Heinanen writes:
Any ideas what could cause Kamailio (5.2 version) to stop processing requests over UDP? What additional info should I try to get?
Is there some means to find out if a UDP receiver process is currently processing a request?
Forgot to mention that after about 10 minutes Kamailio recovers from the UDP freeze on its own.
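(One crude check might be to attach strace to one of the UDP receiver pids from core.ps and see whether it is sitting in recvfrom() waiting for traffic or blocked in some other syscall, e.g.

  strace -tt -p <udp receiver pid>

just a generic-tools sketch, nothing Kamailio specific.)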
-- Juha
Hi Juha,
Did you check whether UDP packets are perhaps being queued at the network level before reaching K?
You should be able to see that with netstat; not that this fixes the problem, but it might point in some direction...
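For example, something like this (Recv-Q is the column to watch for bytes queued in the kernel that Kamailio has not read yet):

  netstat -unlp | grep 5060
  # or the ss equivalent
  ss -nau 'sport = :5060'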
On Tue, Aug 27, 2019 at 9:29 AM Juha Heinanen jh@tutpro.com wrote:
Juha Heinanen writes:
Any ideas what could cause Kamailio (5.2 version) to stop processing requests over UDP? What additional info should I try to get?
Is there some means to find out if a UDP receiver process is currently processing a request?
Forgot to mention that after about 10 minutes Kamailio recovers from the UDP freeze on its own.
-- Juha
During the UDP freeze, ss shows:
udp   UNCONN   429312   0     x.x.x.x:5060   *:*
tcp   LISTEN   0        128   x.x.x.x:5060   *:*
i.e., there are lots of bytes in the UDP receive queue which K has not picked up.
I don't see anything special in the debug log when the UDP freeze started at about 08:31:15. Below is the debug log from when the last request over UDP was processed. After that, no requests over UDP were processed until about 10 minutes later. The OPTIONS request processed at 08:31:38 came over TCP.
What can keep UDP workers from picking up requests from the receive queue?
K is running here on a Debian Stretch virtual machine.
-- Juha
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/udp_server.c:491]: udp_rcv_loop(): received on udp socket: (106/100/516) [[SIP/2.0 480 Request Terminated 0D 0A Via: SIP/2.0/UDP x.x.x.x;branch=z9hG4bK77ac.4ef023a18a4e088ff54]]
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:620]: parse_msg(): SIP Reply (status):
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:622]: parse_msg(): version: <SIP/2.0>
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:624]: parse_msg(): status: <480>
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:626]: parse_msg(): reason: <Request Terminated>
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/parse_via.c:1303]: parse_via_param(): Found param type 232, <branch> = <z9hG4bK77ac.4ef023a18a4e088ff54d3011a49d2566.0>; state=16
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/parse_via.c:2639]: parse_via(): end of header reached, state=5
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:498]: parse_headers(): Via found, flags=2
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:500]: parse_headers(): this is the first via
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/parse_via.c:1303]: parse_via_param(): Found param type 232, <branch> = <z9hG4bKco9p4lg74s9h8t8ql3nrqeeb60>; state=16
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/parse_via.c:2639]: parse_via(): end of header reached, state=5
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:498]: parse_headers(): Via found, flags=78
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:511]: parse_headers(): this is the second via
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/parse_addr_spec.c:185]: parse_to_param(): add param: tag=mh8iDJcH5U8.
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/parse_addr_spec.c:864]: parse_addr_spec(): end of header reached, state=29
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:171]: get_hdr_field(): <To> [69]; uri=[sip:+nnnnnn@x.x.x.x:5060;user=phone]
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:174]: get_hdr_field(): to body [sip:+nnnnnn@x.x.x.x:5060;user=phone], to tag [mh8iDJcH5U8.]
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/parser/msg_parser.c:152]: get_hdr_field(): cseq <CSeq>: <1> <INVITE>
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/receive.c:241]: receive_msg(): --- received sip message - reply - call-id: [SDm882f01-3268f8b9028f4a395a9cb72379c4b423-ao92gd1] - cseq: [1 INVITE]
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1041]: t_check_msg(): msg (0x7fd7cd1278a0) id=7999/19674 global id=7998/19674 T start=(nil)
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:919]: t_reply_matching(): t_reply_matching: hash 51831 label 0 branch 0
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:974]: t_reply_matching(): reply (0x7fd7cd1278a0) matched an active transaction (T=0x7fd7b443ead0)!
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_hooks.c:258]: run_trans_callbacks_internal(): DBG: trans=0x7fd7b443ead0, callback type 2, id 0 entered
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: acc [acc_logic.c:681]: tmcb_func(): acc callback called for t(0x7fd7b443ead0) event type 2, reply code 480
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: acc [acc_logic.c:413]: should_acc_reply(): probing acc state - code: 480 flags: 0x240005c4
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: acc [acc_logic.c:430]: should_acc_reply(): failed acc is on
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1111]: t_check_msg(): msg (0x7fd7cd1278a0) id=7999/19674 global id=7999/19674 T end=0x7fd7b443ead0
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_reply.c:2243]: reply_received(): transaction found - T:0x7fd7b443ead0 branch:0
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19711]: DEBUG: <core> [core/io_wait.h:602]: io_watch_del(): DBG: io_watch_del (0x560ad2dc4100, 73, -1, 0x0) fd_no=56 called
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19711]: DEBUG: <core> [core/tcp_main.c:4196]: handle_tcpconn_ev(): sending to child, events 1
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19711]: DEBUG: <core> [core/tcp_main.c:3847]: send2child(): checking per-socket generic workers (19699/27..19706/34) [tcp:x.x.x.x:5060]
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19711]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker idx:0 proc:27 pid:19699 for activity on [tcp:x.x.x.x:5060], 0x7fd7b4715da0
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19699]: DEBUG: <core> [core/tcp_read.c:1759]: handle_io(): received n=8 con=0x7fd7b4715da0, fd=20
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19699]: DEBUG: <core> [core/tcp_read.c:1560]: tcp_read_req(): content-length=0
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19699]: DEBUG: <core> [core/parser/msg_parser.c:610]: parse_msg(): SIP Request:
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19699]: DEBUG: <core> [core/parser/msg_parser.c:612]: parse_msg(): method: <OPTIONS>
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19699]: DEBUG: <core> [core/parser/msg_parser.c:614]: parse_msg(): uri: sip:x.x.x.x
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19699]: DEBUG: <core> [core/parser/msg_parser.c:616]: parse_msg(): version: <SIP/2.0>
Aug 28 08:31:38 rox2 /usr/bin/sip-proxy[19699]: DEBUG: <core> [core/parser/parse_via.c:1303]: parse_via_param(): Found param type 232, <branch> = <z9hG4bK7ad16a0b>; state=16
Can you see what is the next debug message printed by pid 19674?
If you have access to the system when that blocks in this way during that interval, take the output of 'kamctl trap' or the backtrace with gdb for a sip udp worker process.
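For a single worker that would be roughly (pid of one UDP receiver taken from kamcmd core.ps):

  gdb -batch -ex 'bt full' -p <udp worker pid>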
Cheers, Daniel
On 29.08.19 08:50, Juha Heinanen wrote:
During the UDP freeze, ss shows:
udp   UNCONN   429312   0     x.x.x.x:5060   *:*
tcp   LISTEN   0        128   x.x.x.x:5060   *:*
i.e., there are lots of bytes in the UDP receive queue which K has not picked up.
I don't see anything special in the debug log when the UDP freeze started at about 08:31:15. Below is the debug log from when the last request over UDP was processed. After that, no requests over UDP were processed until about 10 minutes later. The OPTIONS request processed at 08:31:38 came over TCP.
What can keep UDP workers from picking up requests from the receive queue?
K is running here on a Debian Stretch virtual machine.
-- Juha
Daniel-Constantin Mierla writes:
Can you see what is the next debug message printed by pid 19674?
Below are a few log lines from the process before the freeze and a few after (this time the freeze lasted about 13 minutes). It looks like the process resumed processing of the same SIP message.
If you have access to the system when that blocks in this way during that interval, take the output of 'kamctl trap' or the backtrace with gdb for a sip udp worker process.
Will try during the next freeze.
-- Juha
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: <core> [core/receive.c:241]: receive_msg(): --- received sip message - reply - call-id: [SDm882f01-3268f8b9028f4a395a9cb72379c4b423-ao92gd1] - cseq: [1 INVITE]
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1041]: t_check_msg(): msg (0x7fd7cd1278a0) id=7999/19674 global id=7998/19674 T start=(nil)
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:919]: t_reply_matching(): t_reply_matching: hash 51831 label 0 branch 0
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:974]: t_reply_matching(): reply (0x7fd7cd1278a0) matched an active transaction (T=0x7fd7b443ead0)!
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_hooks.c:258]: run_trans_callbacks_internal(): DBG: trans=0x7fd7b443ead0, callback type 2, id 0 entered
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: acc [acc_logic.c:681]: tmcb_func(): acc callback called for t(0x7fd7b443ead0) event type 2, reply code 480
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: acc [acc_logic.c:413]: should_acc_reply(): probing acc state - code: 480 flags: 0x240005c4
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: acc [acc_logic.c:430]: should_acc_reply(): failed acc is on
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1111]: t_check_msg(): msg (0x7fd7cd1278a0) id=7999/19674 global id=7999/19674 T end=0x7fd7b443ead0
Aug 28 08:31:15 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_reply.c:2243]: reply_received(): transaction found - T:0x7fd7b443ead0 branch:0
.... freeze
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_reply.c:2256]: reply_received(): org. status uas=408, uac[0]=408 local=0 is_invite=1)
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1041]: t_check_msg(): msg (0x7fd7cd1278a0) id=7999/19674 global id=7999/19674 T start=0x7fd7b443ead0
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1116]: t_check_msg(): T (0x7fd7b443ead0) already found for msg (0x7fd7cd1278a0)!
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [tm.c:951]: t_check_status(): checked status is <480>
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1041]: t_check_msg(): msg (0x7fd7cd1278a0) id=7999/19674 global id=7999/19674 T start=0x7fd7b443ead0
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1116]: t_check_msg(): T (0x7fd7b443ead0) already found for msg (0x7fd7cd1278a0)!
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [tm.c:951]: t_check_status(): checked status is <480>
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1041]: t_check_msg(): msg (0x7fd7cd1278a0) id=7999/19674 global id=7999/19674 T start=0x7fd7b443ead0
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [t_lookup.c:1116]: t_check_msg(): T (0x7fd7b443ead0) already found for msg (0x7fd7cd1278a0)!
Aug 28 08:44:50 rox2 /usr/bin/sip-proxy[19674]: DEBUG: tm [tm.c:951]: t_check_status(): checked status is <480>
Since this is reply processing, one potential operation that can be slow is writing the acc record to the database. Is any external tool blocking the acc tables?
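For context, the reply ends up in the database because of an acc setup along these lines (a sketch with placeholder flag numbers and db_url, not necessarily your exact config):

  #!define FLT_ACC 1
  #!define FLT_ACCFAILED 3
  modparam("acc", "db_url", "mysql://kamailio:***@dbhost/kamailio")
  modparam("acc", "db_flag", FLT_ACC)
  modparam("acc", "failed_transaction_flag", FLT_ACCFAILED)

With the failed transaction flag set, a negative final reply like this 480 triggers the acc insert from within reply processing in the UDP worker, so if the insert blocks, the worker blocks.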
Cheers, Daniel
On 29.08.19 10:26, Juha Heinanen wrote:
Daniel-Constantin Mierla writes:
Can you see what is the next debug message printed by pid 19674?
Below are a few log lines from the process before the freeze and a few after (this time the freeze lasted about 13 minutes). It looks like the process resumed processing of the same SIP message.
If you have access to the system when that blocks in this way during that interval, take the output of 'kamctl trap' or the backtrace with gdb for a sip udp worker process.
Will try during the next freeze.
-- Juha
Daniel-Constantin Mierla writes:
Since this is reply processing, one potential operation that can be slow is writing the acc record to the database. Is any external tool blocking the acc tables?
I'm not sure about that. Unfortunately there were no SIP-over-TCP calls during the UDP freeze that would have produced accounting records.
Isn't there any timeout on writing accounting records to the database, i.e., if the write doesn't succeed within a short time, K would give up and write an error message to syslog?
-- Juha
Normally it should report an error (if the database supports it):
https://www.kamailio.org/docs/modules/5.2.x/modules/db_mysql.html#idm1051351...
Due to library limitations, this timeout is in the range of seconds.
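The relevant db_mysql parameters would be something like this (values shown are, if I remember the docs correctly, the defaults, in seconds):

  modparam("db_mysql", "timeout_interval", 2)   # connect/read/write timeout
  modparam("db_mysql", "ping_interval", 300)    # keepalive ping on the connection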
Cheers,
Henning
On 29.08.19 13:02, Juha Heinanen wrote:
Daniel-Constantin Mierla writes:
Since this is reply processing, one potential operation that can be slow is writing the acc record to the database. Is any external tool blocking the acc tables?
I'm not sure about that. Unfortunately there were no SIP-over-TCP calls during the UDP freeze that would have produced accounting records.
Isn't there any timeout on writing accounting records to the database, i.e., if the write doesn't succeed within a short time, K would give up and write an error message to syslog?
-- Juha
On 29.08.19 13:02, Juha Heinanen wrote:
Daniel-Constantin Mierla writes:
Since this is reply processing, one potential operation that can be slow is writing the acc record to the database. Is any external tool blocking the acc tables?
I'm not sure about that. Unfortunately there were no SIP-over-TCP calls during the UDP freeze that would have produced accounting records.
Isn't there any timeout on writing accounting records to the database, i.e., if the write doesn't succeed within a short time, K would give up and write an error message to syslog?
There is a timeout in the db_mysql module, which sets an option on libmysqlclient, but the docs on the mysql site (at least for some versions) say that the timeout might not apply in some cases, like disk full - I made a related comment at:
* https://github.com/kamailio/kamailio/issues/1895#issuecomment-501139692
Also, when I got access to a system that was reported as having memory leaks, I found the timer process blocked on a mysql query (and because the timer was blocked, expired transactions were not cleaned up) -- the mysql timeout didn't fire there:
* https://github.com/kamailio/kamailio/issues/1895#issuecomment-500895303
That's why a backtrace would help to see where the processes are stuck.
Cheers, Daniel
On 29.08.19 10:36, Juha Heinanen wrote:
Daniel-Constantin Mierla writes:
If you have access to the system when that blocks in this way during that interval, take the output of 'kamctl trap' or the backtrace with gdb for a sip udp worker process.
Didn't find 'kamctl trap', will try gdb.
What is the version? In practice, kamctl trap is just a wrapper around gdb batch commands that gets the backtrace of all kamailio processes.
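Roughly, it does the equivalent of this (a simplified sketch, not the exact script; here assuming the binary is your renamed /usr/bin/sip-proxy):

  for pid in $(pidof sip-proxy); do
      echo "---- pid $pid ----"
      gdb -batch -ex 'bt full' /usr/bin/sip-proxy "$pid"
  done > gdb_kamailio_$(date +%Y%m%d_%H%M%S)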
Cheers, Daniel
Daniel-Constantin Mierla writes:
If you have access to the system when that blocks in this way during that interval, take the output of 'kamctl trap' or the backtrace with gdb for a sip udp worker process.
Got trap of all processes. Placed it here:
https://box.tutpro.com/tmp/gdb_kamailio_20190902_072911
-- Juha
Hello,
Is it still blocked? Can you get a second trap cmd output? I want to compare it with another output taken later, if it is still blocked.
I see a process blocked on a mysql insert query in acc_onreply(), and other processes waiting for the same lock.
Cheers, Daniel
On 02.09.19 10:22, Juha Heinanen wrote:
Daniel-Constantin Mierla writes:
If you have access to the system when that blocks in this way during that interval, take the output of 'kamctl trap' or the backtrace with gdb for a sip udp worker process.
Got trap of all processes. Placed it here:
https://box.tutpro.com/tmp/gdb_kamailio_20190902_072911
-- Juha
Daniel-Constantin Mierla writes:
Is it still blocked? Can you get a second trap cmd output? I want to compare it with another output taken later, if it is still blocked.
Here are two traps taken about 8 minutes apart during the same freeze:
https://box.tutpro.com/tmp/gdb_kamailio_20190902_120249 https://box.tutpro.com/tmp/gdb_kamailio_20190902_121032
During the freeze, it was possible to make MySQL queries to the accounting table using the mysql cli command.
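(Something that could also be checked from the MySQL side during a freeze is whether the insert even reaches the server and what it is waiting on, e.g.

  mysql -e 'SHOW FULL PROCESSLIST'
  mysql -e 'SHOW ENGINE INNODB STATUS\G'
)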
-- Juha
Juha Heinanen writes:
Here are two traps taken about 8 minutes apart during the same freeze:
https://box.tutpro.com/tmp/gdb_kamailio_20190902_120249 https://box.tutpro.com/tmp/gdb_kamailio_20190902_121032
During the freeze, it was possible to make MySQL queries to the accounting table using the mysql cli command.
It looks like 8 UDP receivers are waiting for a futex lock and 8 are in udp_rcv_loop. Those that are waiting for the lock are processing the same reply. It is the same in both traps.
There is also a mysql ping going on in the timer process.
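(The two trap files can be compared directly to confirm that the stuck processes did not move between the snapshots, e.g.

  diff gdb_kamailio_20190902_120249 gdb_kamailio_20190902_121032 | less
)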
-- Juha
On 03.09.19 09:54, Juha Heinanen wrote:
Juha Heinanen writes:
Here are two traps taken about 8 minutes apart during the same freeze:
https://box.tutpro.com/tmp/gdb_kamailio_20190902_120249 https://box.tutpro.com/tmp/gdb_kamailio_20190902_121032
During the freeze, it was possible to make MySQL queries to the accounting table using the mysql cli command.
It looks like 8 UDP receivers are waiting for a futex lock and 8 are in udp_rcv_loop. Those that are waiting for the lock are processing the same reply. It is the same in both traps.
There is also a mysql ping going on in the timer process.
Yesterday I started a trip and didn't get time to look over the last two traces; I should be able to do it later today.
In the first trace, the ping-pong looked like it happens internally in the libmysqlclient library before doing the acc insert, probably to wake up / make sure the connection is alive, and that seemed to be stuck.
Cheers, Daniel
On 03.09.19 10:53, Daniel-Constantin Mierla wrote:
On 03.09.19 09:54, Juha Heinanen wrote:
Juha Heinanen writes:
Here are two traps taken about 8 minutes apart during the same freeze:
https://box.tutpro.com/tmp/gdb_kamailio_20190902_120249 https://box.tutpro.com/tmp/gdb_kamailio_20190902_121032
During the freeze, it was possible to make MySQL queries to the accounting table using the mysql cli command.
It looks like 8 UDP receivers are waiting for a futex lock and 8 are in udp_rcv_loop. Those that are waiting for the lock are processing the same reply. It is the same in both traps.
There is also a mysql ping going on in the timer process.
The mysql client library is stuck, waiting for a poll event to happen. It is the insert done for accounting on the SIP response.
It is hard to say what the reason is; maybe you can install the debug symbols for the mysql client library so that the backtrace shows details of what happens inside the library. The mysql module was not significantly touched for a long time, so it might be something specific to the mysql (mariadb) client library.
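Something along these lines should show which client library the db_mysql module actually links against, so the matching -dbgsym package can be installed (module path is the usual Debian one, adjust if needed):

  ldd /usr/lib/x86_64-linux-gnu/kamailio/modules/db_mysql.so | grep -iE 'mysql|mariadb'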
Cheers, Daniel
Daniel-Constantin Mierla writes:
The mysql client library is stuck, waiting for a poll event to happen. It is the insert done for accounting on the SIP response.
It is hard to say what the reason is; maybe you can install the debug symbols for the mysql client library so that the backtrace shows details of what happens inside the library. The mysql module was not significantly touched for a long time, so it might be something specific to the mysql (mariadb) client library.
I'll check whether the problem persists with the newer libmariadb3 in Debian Buster and/or without the cluster.
-- Juha