Hi,
I'm working with Kamailio 1.5 (rev 5968) and from time to time kamailio freezes and stops processing messages. Kamailio is working only in UDP. I really don't know in which conditions this problem is happening: it happens both in high and low load. Here is a backtrace of one of the UDP working processes when kamailio is frozen (all the other UDP process backtraces look similar) :
#0 0xb7fdf410 in ?? () #1 0xbfbfec28 in ?? () #2 0x00000001 in ?? () #3 0xa7392ad0 in ?? () #4 0xb7f3b4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7ba9463 in lock_hash (i=10361) at ../../mem/../fastlock.h:182 #6 0xb7bc4587 in t_lookup_request (p_msg=0x823fc70, leave_new_locked=1) at t_lookup.c:468 #7 0xb7bc54ae in t_newtran (p_msg=0x823fc70) at t_lookup.c:1124 #8 0xb7bb740c in t_relay_to (p_msg=0x823fc70, proxy=0x0, flags=1) at t_funcs.c:212 #9 0xb7bb8f6c in t_replicate (p_msg=0x823fc70, dst=0x817cdf8, flags=1) at t_fwd.c:763 #10 0xb7bc7fce in w_t_replicate (p_msg=0x823fc70, dst=0x817cdf8 "\230ê\030\b\016", flags=0x0) at tm.c:957 #11 0x0805301c in do_action (a=0x818eb20, msg=0x823fc70) at action.c:874 #12 0x080557aa in run_action_list (a=0x818eb20, msg=0x823fc70) at action.c:145 #13 0x08054491 in do_action (a=0x818ec78, msg=0x823fc70) at action.c:746 #14 0x080557aa in run_action_list (a=0x818c940, msg=0x823fc70) at action.c:145 #15 0x08053efb in do_action (a=0x81b38e0, msg=0x823fc70) at action.c:120 #16 0x080557aa in run_action_list (a=0x81b3878, msg=0x823fc70) at action.c:145 #17 0x08054491 in do_action (a=0x81b5f00, msg=0x823fc70) at action.c:746 #18 0x080557aa in run_action_list (a=0x81b5f00, msg=0x823fc70) at action.c:145 #19 0x08054f2d in do_action (a=0x81b5f68, msg=0x823fc70) at action.c:752 #20 0x080557aa in run_action_list (a=0x81b5f68, msg=0x823fc70) at action.c:145 #21 0x08054f2d in do_action (a=0x81b5fd0, msg=0x823fc70) at action.c:752 #22 0x080557aa in run_action_list (a=0x81aefd0, msg=0x823fc70) at action.c:145 #23 0x08053efb in do_action (a=0x818bc08, msg=0x823fc70) at action.c:120 #24 0x080557aa in run_action_list (a=0x8187910, msg=0x823fc70) at action.c:145 #25 0x08055b43 in run_top_route (a=0x8187910, msg=0x823fc70) at action.c:120 #26 0x0808c659 in receive_msg ( buf=0x8158040 "REGISTER sip:xxxxxxxxxxx.com SIP/2.0\r\nFrom: < sip:xxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxx@xxxxxxxxxxxx.com>;tag=as516fcce2\r\nTo: <sip:xxxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxxx@xxxxxxxxxxxx.com>\r\nCall-ID: 01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com\r\nCSeq: 383 REGISTER\r\nVia: SIP"..., len=794, rcv_info=0xbfc004d4) at receive.c:175 #27 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #28 0x0806e394 in main (argc=9, argv=0xbfc006b4) at main.c:774
Any idea of what might be happening?
Best regards,
Santi
Hello,
seems to be something related to t_replicate(). Can you reproduce it easily? If not, how often does it happen? Do you see any particular error messages in syslog, before frozen?
Cheers, Daniel
On 5/13/10 10:39 AM, Santiago Gimeno wrote:
Hi,
I'm working with Kamailio 1.5 (rev 5968) and from time to time kamailio freezes and stops processing messages. Kamailio is working only in UDP. I really don't know in which conditions this problem is happening: it happens both in high and low load. Here is a backtrace of one of the UDP working processes when kamailio is frozen (all the other UDP process backtraces look similar) :
#0 0xb7fdf410 in ?? () #1 0xbfbfec28 in ?? () #2 0x00000001 in ?? () #3 0xa7392ad0 in ?? () #4 0xb7f3b4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7ba9463 in lock_hash (i=10361) at ../../mem/../fastlock.h:182 #6 0xb7bc4587 in t_lookup_request (p_msg=0x823fc70, leave_new_locked=1) at t_lookup.c:468 #7 0xb7bc54ae in t_newtran (p_msg=0x823fc70) at t_lookup.c:1124 #8 0xb7bb740c in t_relay_to (p_msg=0x823fc70, proxy=0x0, flags=1) at t_funcs.c:212 #9 0xb7bb8f6c in t_replicate (p_msg=0x823fc70, dst=0x817cdf8, flags=1) at t_fwd.c:763 #10 0xb7bc7fce in w_t_replicate (p_msg=0x823fc70, dst=0x817cdf8 "\230ê\030\b\016", flags=0x0) at tm.c:957 #11 0x0805301c in do_action (a=0x818eb20, msg=0x823fc70) at action.c:874 #12 0x080557aa in run_action_list (a=0x818eb20, msg=0x823fc70) at action.c:145 #13 0x08054491 in do_action (a=0x818ec78, msg=0x823fc70) at action.c:746 #14 0x080557aa in run_action_list (a=0x818c940, msg=0x823fc70) at action.c:145 #15 0x08053efb in do_action (a=0x81b38e0, msg=0x823fc70) at action.c:120 #16 0x080557aa in run_action_list (a=0x81b3878, msg=0x823fc70) at action.c:145 #17 0x08054491 in do_action (a=0x81b5f00, msg=0x823fc70) at action.c:746 #18 0x080557aa in run_action_list (a=0x81b5f00, msg=0x823fc70) at action.c:145 #19 0x08054f2d in do_action (a=0x81b5f68, msg=0x823fc70) at action.c:752 #20 0x080557aa in run_action_list (a=0x81b5f68, msg=0x823fc70) at action.c:145 #21 0x08054f2d in do_action (a=0x81b5fd0, msg=0x823fc70) at action.c:752 #22 0x080557aa in run_action_list (a=0x81aefd0, msg=0x823fc70) at action.c:145 #23 0x08053efb in do_action (a=0x818bc08, msg=0x823fc70) at action.c:120 #24 0x080557aa in run_action_list (a=0x8187910, msg=0x823fc70) at action.c:145 #25 0x08055b43 in run_top_route (a=0x8187910, msg=0x823fc70) at action.c:120 #26 0x0808c659 in receive_msg ( buf=0x8158040 "REGISTER sip:xxxxxxxxxxx.com http://xxxxxxxxxxx.com SIP/2.0\r\nFrom: <sip:xxxxxx@xxxxxxxxxxxx.com mailto:sip%3Axxxxxx@xxxxxxxxxxxx.com>;tag=as516fcce2\r\nTo: <sip:xxxxxxx@xxxxxxxxxxxx.com mailto:sip%3Axxxxxxx@xxxxxxxxxxxx.com>\r\nCall-ID: 01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com mailto:01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com\r\nCSeq: 383 REGISTER\r\nVia: SIP"..., len=794, rcv_info=0xbfc004d4) at receive.c:175 #27 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #28 0x0806e394 in main (argc=9, argv=0xbfc006b4) at main.c:774
Any idea of what might be happening?
Best regards,
Santi
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
Hi,
It happens randomly, after some hours or some days, no particular pattern. There is no error message in the syslog. The debug level is set at 0. Would a higher debug level help to track the issue? We only use t_replicate to send the REGISTER to a backup proxy.
Thanks,
Santi
2010/5/13 Daniel-Constantin Mierla miconda@gmail.com
Hello,
seems to be something related to t_replicate(). Can you reproduce it easily? If not, how often does it happen? Do you see any particular error messages in syslog, before frozen?
Cheers, Daniel
On 5/13/10 10:39 AM, Santiago Gimeno wrote:
Hi,
I'm working with Kamailio 1.5 (rev 5968) and from time to time kamailio freezes and stops processing messages. Kamailio is working only in UDP. I really don't know in which conditions this problem is happening: it happens both in high and low load. Here is a backtrace of one of the UDP working processes when kamailio is frozen (all the other UDP process backtraces look similar) :
#0 0xb7fdf410 in ?? () #1 0xbfbfec28 in ?? () #2 0x00000001 in ?? () #3 0xa7392ad0 in ?? () #4 0xb7f3b4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7ba9463 in lock_hash (i=10361) at ../../mem/../fastlock.h:182 #6 0xb7bc4587 in t_lookup_request (p_msg=0x823fc70, leave_new_locked=1) at t_lookup.c:468 #7 0xb7bc54ae in t_newtran (p_msg=0x823fc70) at t_lookup.c:1124 #8 0xb7bb740c in t_relay_to (p_msg=0x823fc70, proxy=0x0, flags=1) at t_funcs.c:212 #9 0xb7bb8f6c in t_replicate (p_msg=0x823fc70, dst=0x817cdf8, flags=1) at t_fwd.c:763 #10 0xb7bc7fce in w_t_replicate (p_msg=0x823fc70, dst=0x817cdf8 "\230ê\030\b\016", flags=0x0) at tm.c:957 #11 0x0805301c in do_action (a=0x818eb20, msg=0x823fc70) at action.c:874 #12 0x080557aa in run_action_list (a=0x818eb20, msg=0x823fc70) at action.c:145 #13 0x08054491 in do_action (a=0x818ec78, msg=0x823fc70) at action.c:746 #14 0x080557aa in run_action_list (a=0x818c940, msg=0x823fc70) at action.c:145 #15 0x08053efb in do_action (a=0x81b38e0, msg=0x823fc70) at action.c:120 #16 0x080557aa in run_action_list (a=0x81b3878, msg=0x823fc70) at action.c:145 #17 0x08054491 in do_action (a=0x81b5f00, msg=0x823fc70) at action.c:746 #18 0x080557aa in run_action_list (a=0x81b5f00, msg=0x823fc70) at action.c:145 #19 0x08054f2d in do_action (a=0x81b5f68, msg=0x823fc70) at action.c:752 #20 0x080557aa in run_action_list (a=0x81b5f68, msg=0x823fc70) at action.c:145 #21 0x08054f2d in do_action (a=0x81b5fd0, msg=0x823fc70) at action.c:752 #22 0x080557aa in run_action_list (a=0x81aefd0, msg=0x823fc70) at action.c:145 #23 0x08053efb in do_action (a=0x818bc08, msg=0x823fc70) at action.c:120 #24 0x080557aa in run_action_list (a=0x8187910, msg=0x823fc70) at action.c:145 #25 0x08055b43 in run_top_route (a=0x8187910, msg=0x823fc70) at action.c:120 #26 0x0808c659 in receive_msg ( buf=0x8158040 "REGISTER sip:xxxxxxxxxxx.com SIP/2.0\r\nFrom: < sip:xxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxx@xxxxxxxxxxxx.com>;tag=as516fcce2\r\nTo: <sip:xxxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxxx@xxxxxxxxxxxx.com>\r\nCall-ID: 01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com\r\nCSeq: 383 REGISTER\r\nVia: SIP"..., len=794, rcv_info=0xbfc004d4) at receive.c:175 #27 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #28 0x0806e394 in main (argc=9, argv=0xbfc006b4) at main.c:774
Any idea of what might be happening?
Best regards,
Santi
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
-- Daniel-Constantin Mierla
Hello,
what are the parameters for t_replicate(), any flags? Do you create a transaction for register before t_replicate (e.g., by calling t_newtran in config)?
Cheers, Daniel
On 5/13/10 11:53 AM, Santiago Gimeno wrote:
Hi,
It happens randomly, after some hours or some days, no particular pattern. There is no error message in the syslog. The debug level is set at 0. Would a higher debug level help to track the issue? We only use t_replicate to send the REGISTER to a backup proxy.
Thanks,
Santi
2010/5/13 Daniel-Constantin Mierla <miconda@gmail.com mailto:miconda@gmail.com>
Hello, seems to be something related to t_replicate(). Can you reproduce it easily? If not, how often does it happen? Do you see any particular error messages in syslog, before frozen? Cheers, Daniel On 5/13/10 10:39 AM, Santiago Gimeno wrote:
Hi, I'm working with Kamailio 1.5 (rev 5968) and from time to time kamailio freezes and stops processing messages. Kamailio is working only in UDP. I really don't know in which conditions this problem is happening: it happens both in high and low load. Here is a backtrace of one of the UDP working processes when kamailio is frozen (all the other UDP process backtraces look similar) : #0 0xb7fdf410 in ?? () #1 0xbfbfec28 in ?? () #2 0x00000001 in ?? () #3 0xa7392ad0 in ?? () #4 0xb7f3b4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7ba9463 in lock_hash (i=10361) at ../../mem/../fastlock.h:182 #6 0xb7bc4587 in t_lookup_request (p_msg=0x823fc70, leave_new_locked=1) at t_lookup.c:468 #7 0xb7bc54ae in t_newtran (p_msg=0x823fc70) at t_lookup.c:1124 #8 0xb7bb740c in t_relay_to (p_msg=0x823fc70, proxy=0x0, flags=1) at t_funcs.c:212 #9 0xb7bb8f6c in t_replicate (p_msg=0x823fc70, dst=0x817cdf8, flags=1) at t_fwd.c:763 #10 0xb7bc7fce in w_t_replicate (p_msg=0x823fc70, dst=0x817cdf8 "\230ê\030\b\016", flags=0x0) at tm.c:957 #11 0x0805301c in do_action (a=0x818eb20, msg=0x823fc70) at action.c:874 #12 0x080557aa in run_action_list (a=0x818eb20, msg=0x823fc70) at action.c:145 #13 0x08054491 in do_action (a=0x818ec78, msg=0x823fc70) at action.c:746 #14 0x080557aa in run_action_list (a=0x818c940, msg=0x823fc70) at action.c:145 #15 0x08053efb in do_action (a=0x81b38e0, msg=0x823fc70) at action.c:120 #16 0x080557aa in run_action_list (a=0x81b3878, msg=0x823fc70) at action.c:145 #17 0x08054491 in do_action (a=0x81b5f00, msg=0x823fc70) at action.c:746 #18 0x080557aa in run_action_list (a=0x81b5f00, msg=0x823fc70) at action.c:145 #19 0x08054f2d in do_action (a=0x81b5f68, msg=0x823fc70) at action.c:752 #20 0x080557aa in run_action_list (a=0x81b5f68, msg=0x823fc70) at action.c:145 #21 0x08054f2d in do_action (a=0x81b5fd0, msg=0x823fc70) at action.c:752 #22 0x080557aa in run_action_list (a=0x81aefd0, msg=0x823fc70) at action.c:145 #23 0x08053efb in do_action (a=0x818bc08, msg=0x823fc70) at action.c:120 #24 0x080557aa in run_action_list (a=0x8187910, msg=0x823fc70) at action.c:145 #25 0x08055b43 in run_top_route (a=0x8187910, msg=0x823fc70) at action.c:120 #26 0x0808c659 in receive_msg ( buf=0x8158040 "REGISTER sip:xxxxxxxxxxx.com <http://xxxxxxxxxxx.com> SIP/2.0\r\nFrom: <sip:xxxxxx@xxxxxxxxxxxx.com <mailto:sip%3Axxxxxx@xxxxxxxxxxxx.com>>;tag=as516fcce2\r\nTo: <sip:xxxxxxx@xxxxxxxxxxxx.com <mailto:sip%3Axxxxxxx@xxxxxxxxxxxx.com>>\r\nCall-ID: 01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com <mailto:01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com>\r\nCSeq: 383 REGISTER\r\nVia: SIP"..., len=794, rcv_info=0xbfc004d4) at receive.c:175 #27 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #28 0x0806e394 in main (argc=9, argv=0xbfc006b4) at main.c:774 Any idea of what might be happening? Best regards, Santi _______________________________________________ SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org <mailto:sr-users@lists.sip-router.org> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
-- Daniel-Constantin Mierla *http://www.asipto.com/ *http://twitter.com/miconda *http://www.linkedin.com/in/danielconstantinmierla
Hi,
We don't use any flags in the t_replicate function, just the uri of the backup proxy. I think we are not creating any transaction before t_replicate.
Thanks,
Santi
2010/5/13 Daniel-Constantin Mierla miconda@gmail.com
Hello,
what are the parameters for t_replicate(), any flags? Do you create a transaction for register before t_replicate (e.g., by calling t_newtran in config)?
Cheers, Daniel
On 5/13/10 11:53 AM, Santiago Gimeno wrote:
Hi,
It happens randomly, after some hours or some days, no particular pattern. There is no error message in the syslog. The debug level is set at 0. Would a higher debug level help to track the issue? We only use t_replicate to send the REGISTER to a backup proxy.
Thanks,
Santi
2010/5/13 Daniel-Constantin Mierla miconda@gmail.com
Hello,
seems to be something related to t_replicate(). Can you reproduce it easily? If not, how often does it happen? Do you see any particular error messages in syslog, before frozen?
Cheers, Daniel
On 5/13/10 10:39 AM, Santiago Gimeno wrote:
Hi,
I'm working with Kamailio 1.5 (rev 5968) and from time to time kamailio freezes and stops processing messages. Kamailio is working only in UDP. I really don't know in which conditions this problem is happening: it happens both in high and low load. Here is a backtrace of one of the UDP working processes when kamailio is frozen (all the other UDP process backtraces look similar) :
#0 0xb7fdf410 in ?? () #1 0xbfbfec28 in ?? () #2 0x00000001 in ?? () #3 0xa7392ad0 in ?? () #4 0xb7f3b4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7ba9463 in lock_hash (i=10361) at ../../mem/../fastlock.h:182 #6 0xb7bc4587 in t_lookup_request (p_msg=0x823fc70, leave_new_locked=1) at t_lookup.c:468 #7 0xb7bc54ae in t_newtran (p_msg=0x823fc70) at t_lookup.c:1124 #8 0xb7bb740c in t_relay_to (p_msg=0x823fc70, proxy=0x0, flags=1) at t_funcs.c:212 #9 0xb7bb8f6c in t_replicate (p_msg=0x823fc70, dst=0x817cdf8, flags=1) at t_fwd.c:763 #10 0xb7bc7fce in w_t_replicate (p_msg=0x823fc70, dst=0x817cdf8 "\230ê\030\b\016", flags=0x0) at tm.c:957 #11 0x0805301c in do_action (a=0x818eb20, msg=0x823fc70) at action.c:874 #12 0x080557aa in run_action_list (a=0x818eb20, msg=0x823fc70) at action.c:145 #13 0x08054491 in do_action (a=0x818ec78, msg=0x823fc70) at action.c:746 #14 0x080557aa in run_action_list (a=0x818c940, msg=0x823fc70) at action.c:145 #15 0x08053efb in do_action (a=0x81b38e0, msg=0x823fc70) at action.c:120 #16 0x080557aa in run_action_list (a=0x81b3878, msg=0x823fc70) at action.c:145 #17 0x08054491 in do_action (a=0x81b5f00, msg=0x823fc70) at action.c:746 #18 0x080557aa in run_action_list (a=0x81b5f00, msg=0x823fc70) at action.c:145 #19 0x08054f2d in do_action (a=0x81b5f68, msg=0x823fc70) at action.c:752 #20 0x080557aa in run_action_list (a=0x81b5f68, msg=0x823fc70) at action.c:145 #21 0x08054f2d in do_action (a=0x81b5fd0, msg=0x823fc70) at action.c:752 #22 0x080557aa in run_action_list (a=0x81aefd0, msg=0x823fc70) at action.c:145 #23 0x08053efb in do_action (a=0x818bc08, msg=0x823fc70) at action.c:120 #24 0x080557aa in run_action_list (a=0x8187910, msg=0x823fc70) at action.c:145 #25 0x08055b43 in run_top_route (a=0x8187910, msg=0x823fc70) at action.c:120 #26 0x0808c659 in receive_msg ( buf=0x8158040 "REGISTER sip:xxxxxxxxxxx.com SIP/2.0\r\nFrom: < sip:xxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxx@xxxxxxxxxxxx.com>;tag=as516fcce2\r\nTo: <sip:xxxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxxx@xxxxxxxxxxxx.com>\r\nCall-ID: 01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com\r\nCSeq: 383 REGISTER\r\nVia: SIP"..., len=794, rcv_info=0xbfc004d4) at receive.c:175 #27 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #28 0x0806e394 in main (argc=9, argv=0xbfc006b4) at main.c:774
Any idea of what might be happening?
Best regards,
Santi
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
-- Daniel-Constantin Mierla
-- Daniel-Constantin Mierla Kamailio Advanced Training for Carriers June 21-23, 2010, Miami, Fl, USA
Hi,
I have experienced a new lock. I have seen that 4 of the UDP worker processes, but not all, have gone 100% cpu. Kamailio was still able to process messages through the rest of UDP worker processes. By checking the logs I can see that the 4 blocked processes have stopped handling messages at the same time.
I have attached to the blocked processes and the backtrace looks similar to the previous one except that it seems that the lock is not exactly related to t_replicate.
#0 0xb7f24410 in ?? () #1 0xbf9cdeb8 in ?? () #2 0x00000001 in ?? () #3 0xa72eb318 in ?? () #4 0xb7e804ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7aee463 in lock_hash (i=13692) at ../../mem/../fastlock.h:182 #6 0xb7b09587 in t_lookup_request (p_msg=0x81d9708, leave_new_locked=1) at t_lookup.c:468 #7 0xb7b0a4ae in t_newtran (p_msg=0x81d9708) at t_lookup.c:1124 #8 0xb7afc40c in t_relay_to (p_msg=0x81d9708, proxy=0x0, flags=8) at t_funcs.c:212 #9 0xb7b0fac7 in w_t_relay (p_msg=0x81d9708, proxy=0x0, flags=0x8 <Address 0x8 out of bounds>) at tm.c:1002 #10 0x0805301c in do_action (a=0x818c370, msg=0x81d9708) at action.c:874 #11 0x080557aa in run_action_list (a=0x818c370, msg=0x81d9708) at action.c:145 #12 0x0809c304 in eval_expr (e=0x818c3d8, msg=0x81d9708, val=0x0) at route.c:1171 #13 0x0809bd80 in eval_expr (e=0x818c400, msg=0x81d9708, val=0x0) at route.c:1488 #14 0x0809bd16 in eval_expr (e=0x818c428, msg=0x81d9708, val=0x0) at route.c:1493 #15 0x080527ed in do_action (a=0x818c740, msg=0x81d9708) at action.c:729 #16 0x080557aa in run_action_list (a=0x818be08, msg=0x81d9708) at action.c:145 #17 0x08053efb in do_action (a=0x81a6a48, msg=0x81d9708) at action.c:120 #18 0x080557aa in run_action_list (a=0x81a14e0, msg=0x81d9708) at action.c:145 #19 0x08053efb in do_action (a=0x81c29f0, msg=0x81d9708) at action.c:120 #20 0x080557aa in run_action_list (a=0x81c2698, msg=0x81d9708) at action.c:145 #21 0x08054491 in do_action (a=0x81c2b48, msg=0x81d9708) at action.c:746 #22 0x080557aa in run_action_list (a=0x81c2030, msg=0x81d9708) at action.c:145 #23 0x08053efb in do_action (a=0x8193278, msg=0x81d9708) at action.c:120 #24 0x080557aa in run_action_list (a=0x8193278, msg=0x81d9708) at action.c:145 #25 0x08054491 in do_action (a=0x8193418, msg=0x81d9708) at action.c:746 #26 0x080557aa in run_action_list (a=0x8193418, msg=0x81d9708) at action.c:145 #27 0x08054f2d in do_action (a=0x8193480, msg=0x81d9708) at action.c:752 #28 0x080557aa in run_action_list (a=0x81922f0, msg=0x81d9708) at action.c:145 #29 0x08054491 in do_action (a=0x81934e8, msg=0x81d9708) at action.c:746 #30 0x080557aa in run_action_list (a=0x818ee78, msg=0x81d9708) at action.c:145 #31 0x08053efb in do_action (a=0x81b3448, msg=0x81d9708) at action.c:120 #32 0x080557aa in run_action_list (a=0x81b0ed0, msg=0x81d9708) at action.c:145 #33 0x08054491 in do_action (a=0x81b5f68, msg=0x81d9708) at action.c:746 #34 0x080557aa in run_action_list (a=0x81b5f68, msg=0x81d9708) at action.c:145 #35 0x08054f2d in do_action (a=0x81b5fd0, msg=0x81d9708) at action.c:752 #36 0x080557aa in run_action_list (a=0x81aefd0, msg=0x81d9708) at action.c:145 #37 0x08053efb in do_action (a=0x818bc08, msg=0x81d9708) at action.c:120 #38 0x080557aa in run_action_list (a=0x8187910, msg=0x81d9708) at action.c:145 #39 0x08055b43 in run_top_route (a=0x8187910, msg=0x81d9708) at action.c:120 #40 0x0808c659 in receive_msg ( buf=0x8158040 "INVITE sip:xxxxxxxx@xxxxxxx.comsip%3Axxxxxxxx@xxxxxxx.comSIP/2.0\r\nRecord-Route: sip:10.100.2.252;lr=on;ftag=as6897d90b\r\nVia: SIP/2.0/UDP 10.100.2.252;branch=z9hG4bKc753.2c8885d4.0\r\nVia: SIP/2.0/UDP 10.100.2.253:5060;bra"..., len=896, rcv_info=0xbf9d0aa4) at receive.c:175 #41 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #42 0x0806e394 in main (argc=9, argv=0xbf9d0c84) at main.c:774
I hope this provides some new hints on what might be happening.
Thanks,
Santi
2010/5/13 Santiago Gimeno santiago.gimeno@gmail.com
Hi,
We don't use any flags in the t_replicate function, just the uri of the backup proxy. I think we are not creating any transaction before t_replicate.
Thanks,
Santi
2010/5/13 Daniel-Constantin Mierla miconda@gmail.com
Hello,
what are the parameters for t_replicate(), any flags? Do you create a transaction for register before t_replicate (e.g., by calling t_newtran in config)?
Cheers, Daniel
On 5/13/10 11:53 AM, Santiago Gimeno wrote:
Hi,
It happens randomly, after some hours or some days, no particular pattern. There is no error message in the syslog. The debug level is set at 0. Would a higher debug level help to track the issue? We only use t_replicate to send the REGISTER to a backup proxy.
Thanks,
Santi
2010/5/13 Daniel-Constantin Mierla miconda@gmail.com
Hello,
seems to be something related to t_replicate(). Can you reproduce it easily? If not, how often does it happen? Do you see any particular error messages in syslog, before frozen?
Cheers, Daniel
On 5/13/10 10:39 AM, Santiago Gimeno wrote:
Hi,
I'm working with Kamailio 1.5 (rev 5968) and from time to time kamailio freezes and stops processing messages. Kamailio is working only in UDP. I really don't know in which conditions this problem is happening: it happens both in high and low load. Here is a backtrace of one of the UDP working processes when kamailio is frozen (all the other UDP process backtraces look similar) :
#0 0xb7fdf410 in ?? () #1 0xbfbfec28 in ?? () #2 0x00000001 in ?? () #3 0xa7392ad0 in ?? () #4 0xb7f3b4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7ba9463 in lock_hash (i=10361) at ../../mem/../fastlock.h:182 #6 0xb7bc4587 in t_lookup_request (p_msg=0x823fc70, leave_new_locked=1) at t_lookup.c:468 #7 0xb7bc54ae in t_newtran (p_msg=0x823fc70) at t_lookup.c:1124 #8 0xb7bb740c in t_relay_to (p_msg=0x823fc70, proxy=0x0, flags=1) at t_funcs.c:212 #9 0xb7bb8f6c in t_replicate (p_msg=0x823fc70, dst=0x817cdf8, flags=1) at t_fwd.c:763 #10 0xb7bc7fce in w_t_replicate (p_msg=0x823fc70, dst=0x817cdf8 "\230ê\030\b\016", flags=0x0) at tm.c:957 #11 0x0805301c in do_action (a=0x818eb20, msg=0x823fc70) at action.c:874 #12 0x080557aa in run_action_list (a=0x818eb20, msg=0x823fc70) at action.c:145 #13 0x08054491 in do_action (a=0x818ec78, msg=0x823fc70) at action.c:746 #14 0x080557aa in run_action_list (a=0x818c940, msg=0x823fc70) at action.c:145 #15 0x08053efb in do_action (a=0x81b38e0, msg=0x823fc70) at action.c:120 #16 0x080557aa in run_action_list (a=0x81b3878, msg=0x823fc70) at action.c:145 #17 0x08054491 in do_action (a=0x81b5f00, msg=0x823fc70) at action.c:746 #18 0x080557aa in run_action_list (a=0x81b5f00, msg=0x823fc70) at action.c:145 #19 0x08054f2d in do_action (a=0x81b5f68, msg=0x823fc70) at action.c:752 #20 0x080557aa in run_action_list (a=0x81b5f68, msg=0x823fc70) at action.c:145 #21 0x08054f2d in do_action (a=0x81b5fd0, msg=0x823fc70) at action.c:752 #22 0x080557aa in run_action_list (a=0x81aefd0, msg=0x823fc70) at action.c:145 #23 0x08053efb in do_action (a=0x818bc08, msg=0x823fc70) at action.c:120 #24 0x080557aa in run_action_list (a=0x8187910, msg=0x823fc70) at action.c:145 #25 0x08055b43 in run_top_route (a=0x8187910, msg=0x823fc70) at action.c:120 #26 0x0808c659 in receive_msg ( buf=0x8158040 "REGISTER sip:xxxxxxxxxxx.com SIP/2.0\r\nFrom: < sip:xxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxx@xxxxxxxxxxxx.com>;tag=as516fcce2\r\nTo: <sip:xxxxxxx@xxxxxxxxxxxx.com sip%3Axxxxxxx@xxxxxxxxxxxx.com>\r\nCall-ID: 01e4ea8e185c85332ea68874543c810e@xxxxxxxxxxxxxx.com\r\nCSeq: 383 REGISTER\r\nVia: SIP"..., len=794, rcv_info=0xbfc004d4) at receive.c:175 #27 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #28 0x0806e394 in main (argc=9, argv=0xbfc006b4) at main.c:774
Any idea of what might be happening?
Best regards,
Santi
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
-- Daniel-Constantin Mierla
-- Daniel-Constantin Mierla Kamailio Advanced Training for Carriers June 21-23, 2010, Miami, Fl, USA
Santiago Gimeno wrote:
Hi,
I have experienced a new lock. I have seen that 4 of the UDP worker processes, but not all, have gone 100% cpu. Kamailio was still able to process messages through the rest of UDP worker processes. By checking the logs I can see that the 4 blocked processes have stopped handling messages at the same time.
I have attached to the blocked processes and the backtrace looks similar to the previous one except that it seems that the lock is not exactly related to t_replicate.
#0 0xb7f24410 in ?? () #1 0xbf9cdeb8 in ?? () #2 0x00000001 in ?? () #3 0xa72eb318 in ?? () #4 0xb7e804ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7aee463 in lock_hash (i=13692) at ../../mem/../fastlock.h:182 #6 0xb7b09587 in t_lookup_request (p_msg=0x81d9708, leave_new_locked=1) at t_lookup.c:468 #7 0xb7b0a4ae in t_newtran (p_msg=0x81d9708) at t_lookup.c:1124 #8 0xb7afc40c in t_relay_to (p_msg=0x81d9708, proxy=0x0, flags=8) at t_funcs.c:212 #9 0xb7b0fac7 in w_t_relay (p_msg=0x81d9708, proxy=0x0, flags=0x8
<Address 0x8 out of bounds>) at tm.c:1002 #10 0x0805301c in do_action (a=0x818c370, msg=0x81d9708) at action.c:874 #11 0x080557aa in run_action_list (a=0x818c370, msg=0x81d9708) at action.c:145 #12 0x0809c304 in eval_expr (e=0x818c3d8, msg=0x81d9708, val=0x0) at route.c:1171 #13 0x0809bd80 in eval_expr (e=0x818c400, msg=0x81d9708, val=0x0) at route.c:1488 #14 0x0809bd16 in eval_expr (e=0x818c428, msg=0x81d9708, val=0x0) at route.c:1493 #15 0x080527ed in do_action (a=0x818c740, msg=0x81d9708) at action.c:729 #16 0x080557aa in run_action_list (a=0x818be08, msg=0x81d9708) at action.c:145 #17 0x08053efb in do_action (a=0x81a6a48, msg=0x81d9708) at action.c:120 #18 0x080557aa in run_action_list (a=0x81a14e0, msg=0x81d9708) at action.c:145 #19 0x08053efb in do_action (a=0x81c29f0, msg=0x81d9708) at action.c:120 #20 0x080557aa in run_action_list (a=0x81c2698, msg=0x81d9708) at action.c:145 #21 0x08054491 in do_action (a=0x81c2b48, msg=0x81d9708) at action.c:746 #22 0x080557aa in run_action_list (a=0x81c2030, msg=0x81d9708) at action.c:145 #23 0x08053efb in do_action (a=0x8193278, msg=0x81d9708) at action.c:120 #24 0x080557aa in run_action_list (a=0x8193278, msg=0x81d9708) at action.c:145 #25 0x08054491 in do_action (a=0x8193418, msg=0x81d9708) at action.c:746 #26 0x080557aa in run_action_list (a=0x8193418, msg=0x81d9708) at action.c:145 #27 0x08054f2d in do_action (a=0x8193480, msg=0x81d9708) at action.c:752 #28 0x080557aa in run_action_list (a=0x81922f0, msg=0x81d9708) at action.c:145 #29 0x08054491 in do_action (a=0x81934e8, msg=0x81d9708) at action.c:746 #30 0x080557aa in run_action_list (a=0x818ee78, msg=0x81d9708) at action.c:145 #31 0x08053efb in do_action (a=0x81b3448, msg=0x81d9708) at action.c:120 #32 0x080557aa in run_action_list (a=0x81b0ed0, msg=0x81d9708) at action.c:145 #33 0x08054491 in do_action (a=0x81b5f68, msg=0x81d9708) at action.c:746 #34 0x080557aa in run_action_list (a=0x81b5f68, msg=0x81d9708) at action.c:145 #35 0x08054f2d in do_action (a=0x81b5fd0, msg=0x81d9708) at action.c:752 #36 0x080557aa in run_action_list (a=0x81aefd0, msg=0x81d9708) at action.c:145 #37 0x08053efb in do_action (a=0x818bc08, msg=0x81d9708) at action.c:120 #38 0x080557aa in run_action_list (a=0x8187910, msg=0x81d9708) at action.c:145 #39 0x08055b43 in run_top_route (a=0x8187910, msg=0x81d9708) at action.c:120 #40 0x0808c659 in receive_msg ( buf=0x8158040 "INVITE sip:xxxxxxxx@xxxxxxx.com <mailto:sip%3Axxxxxxxx@xxxxxxx.com> SIP/2.0\r\nRecord-Route: <sip:10.100.2.252;lr=on;ftag=as6897d90b>\r\nVia: SIP/2.0/UDP 10.100.2.252;branch=z9hG4bKc753.2c8885d4.0\r\nVia: SIP/2.0/UDP 10.100.2.253:5060;bra"..., len=896, rcv_info=0xbf9d0aa4) at receive.c:175 #41 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #42 0x0806e394 in main (argc=9, argv=0xbf9d0c84) at main.c:774
I hope this provides some new hints on what might be happening.
Hello,
It also might help if you can do a bt "full" Also try printing the whole message x/s 0x8158040 should do the trick.
If I recall this correctly you said something about only t_replicating REGISTERs. The message is an INVITE so it also might be that you have a cfg error somewhere
Cheers Marius
Hi,
2010/5/14 marius zbihlei marius.zbihlei@1and1.ro
Hello,
It also might help if you can do a bt "full" Also try printing the whole message x/s 0x8158040 should do the trick.
I have already restarted the server, I'll do that the next time it happens.
If I recall this correctly you said something about only t_replicating REGISTERs. The message is an INVITE so it also might be that you have a cfg error somewhere
Yes, I'm only t_replicating REGISTER's and this backtrace doesn't show any call to t_replicate. Am I missing something?
Cheers Marius
Thanks,
Santi
Santiago Gimeno wrote:
Hi,
2010/5/14 marius zbihlei <marius.zbihlei@1and1.ro mailto:marius.zbihlei@1and1.ro>
Hello, It also might help if you can do a bt "full" Also try printing the whole message x/s 0x8158040 should do the trick.
I have already restarted the server, I'll do that the next time it happens.
If I recall this correctly you said something about only t_replicating REGISTERs. The message is an INVITE so it also might be that you have a cfg error somewhere
Yes, I'm only t_replicating REGISTER's and this backtrace doesn't show any call to t_replicate. Am I missing something?
Hello
Sorry, my fault you are right of course.
Thanks for clearing this up.
Marius
Cheers Marius
Thanks,
Santi
Hi,
The problem happened again and I can provide some more info. 6 of the UDP worker processes got blocked. By checking the logs I can see that 4 of them, that seem to be related to the same INVITE request, got blocked at the same time. The other 2 got blocked some hours later and not at the same time.
From the 4 first processes, the backtrace of 3 of them is this:
#0 0xb7f6d410 in ?? () #1 0xbff60768 in ?? () #2 0x00000001 in ?? () #3 0xa7358180 in ?? () #4 0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182 #6 0xb7b52587 in t_lookup_request (p_msg=0x82403d0, leave_new_locked=1) at t_lookup.c:468 #7 0xb7b534ae in t_newtran (p_msg=0x82403d0) at t_lookup.c:1124 #8 0xb7b4540c in t_relay_to (p_msg=0x82403d0, proxy=0x0, flags=8) at t_funcs.c:212 #9 0xb7b58ac7 in w_t_relay (p_msg=0x82403d0, proxy=0x0, flags=0x8 <Address 0x8 out of bounds>) at tm.c:1002 #10 0x0805301c in do_action (a=0x818c370, msg=0x82403d0) at action.c:874 #11 0x080557aa in run_action_list (a=0x818c370, msg=0x82403d0) at action.c:145 #12 0x0809c304 in eval_expr (e=0x818c3d8, msg=0x82403d0, val=0x0) at route.c:1171 #13 0x0809bd80 in eval_expr (e=0x818c400, msg=0x82403d0, val=0x0) at route.c:1488 #14 0x0809bd16 in eval_expr (e=0x818c428, msg=0x82403d0, val=0x0) at route.c:1493 #15 0x080527ed in do_action (a=0x818c740, msg=0x82403d0) at action.c:729 #16 0x080557aa in run_action_list (a=0x818be08, msg=0x82403d0) at action.c:145 #17 0x08053efb in do_action (a=0x81a12e0, msg=0x82403d0) at action.c:120 #18 0x080557aa in run_action_list (a=0x818ee78, msg=0x82403d0) at action.c:145 #19 0x08053efb in do_action (a=0x81b3448, msg=0x82403d0) at action.c:120 #20 0x080557aa in run_action_list (a=0x81b0ed0, msg=0x82403d0) at action.c:145 #21 0x08054491 in do_action (a=0x81b5f68, msg=0x82403d0) at action.c:746 #22 0x080557aa in run_action_list (a=0x81b5f68, msg=0x82403d0) at action.c:145 #23 0x08054f2d in do_action (a=0x81b5fd0, msg=0x82403d0) at action.c:752 #24 0x080557aa in run_action_list (a=0x81aefd0, msg=0x82403d0) at action.c:145 #25 0x08053efb in do_action (a=0x818bc08, msg=0x82403d0) at action.c:120 #26 0x080557aa in run_action_list (a=0x8187910, msg=0x82403d0) at action.c:145 #27 0x08055b43 in run_top_route (a=0x8187910, msg=0x82403d0) at action.c:120 #28 0x0808c659 in receive_msg ( buf=0x8158040 "INVITE sip:xxxxx@xxxxxxxxxx.comsip%3Axxxxx@xxxxxxxxxx.comSIP/2.0\r\nRecord-Route: sip:10.100.29.7;lr=on;ftag=as60035314\r\nVia: SIP/2.0/UDP 10.100.29.7;branch=z9hG4bKb6d4.a49d7633.0\r\nVia: SIP/2.0/UDP 10.100.29.8:5060;branch=z9hG"..., len=926, rcv_info=0xbff62334) at receive.c:175 #29 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #30 0x0806e394 in main (argc=9, argv=0xbff62514) at main.c:774
The backtrace of the other is:
#0 0xb7f6d410 in ?? () #1 0xbff60138 in ?? () #2 0x00000001 in ?? () #3 0xa7358180 in ?? () #4 0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182 #6 0xb7b6ce01 in t_uac (method=0xbff60558, headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158) at uac.c:306 #7 0xb7b6e311 in request (m=0xbff60558, ruri=0x81d9adc, to=0x81d9adc, from=0x81d9adc, h=0x81e3108, b=0x81d9afb, oburi=0xb73564ac, cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158) at uac.c:503 #8 0xb7349641 in send_publish (publ=0x81d9aa8) at send_publish.c:552 #9 0xb73339bf in dialog_publish (state=0xb7335bb4 "Trying", entity=0xa7709f34, peer=0xa7709f3c, callid=0xa7709f2c, initiator=1, lifetime=300, localtag=0x0, remotetag=0x0, localtarget=0x0, remotetarget=0x0) at dialog_publish.c:347 #10 0xb73348ea in __dialog_created (dlg=0xa7709ef0, type=2, _params=0xb7a7cb9c) at pua_dialoginfo.c:343 #11 0xb7a586ff in run_create_callbacks (dlg=0xa7709ef0, msg=0x81f18a8) at dlg_cb.c:230 #12 0xb7a60d1f in dlg_new_dialog (msg=0x81f18a8, t=0xa75deeb0) at dlg_handlers.c:494 #13 0xb7a61f77 in dlg_onreq (t=0xa75deeb0, type=1, param=0xb7b785a8) at dlg_handlers.c:414 #14 0xb7b4a791 in run_reqin_callbacks (trans=0xa75deeb0, req=0x81f18a8, code=1) at t_hooks.c:272 #15 0xb7b376af in build_cell (p_msg=0x81f18a8) at h_table.c:284 #16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064 #17 0xb7b4540c in t_relay_to (p_msg=0x81f18a8, proxy=0x0, flags=8) at t_funcs.c:212 #18 0xb7b58ac7 in w_t_relay (p_msg=0x81f18a8, proxy=0x0, flags=0x8 <Address 0x8 out of bounds>) at tm.c:1002 #19 0x0805301c in do_action (a=0x818c370, msg=0x81f18a8) at action.c:874 #20 0x080557aa in run_action_list (a=0x818c370, msg=0x81f18a8) at action.c:145 #21 0x0809c304 in eval_expr (e=0x818c3d8, msg=0x81f18a8, val=0x0) at route.c:1171 #22 0x0809bd80 in eval_expr (e=0x818c400, msg=0x81f18a8, val=0x0) at route.c:1488 #23 0x0809bd16 in eval_expr (e=0x818c428, msg=0x81f18a8, val=0x0) at route.c:1493 #24 0x080527ed in do_action (a=0x818c740, msg=0x81f18a8) at action.c:729 #25 0x080557aa in run_action_list (a=0x818be08, msg=0x81f18a8) at action.c:145 #26 0x08053efb in do_action (a=0x81a12e0, msg=0x81f18a8) at action.c:120 #27 0x080557aa in run_action_list (a=0x818ee78, msg=0x81f18a8) at action.c:145 #28 0x08053efb in do_action (a=0x81b3448, msg=0x81f18a8) at action.c:120 #29 0x080557aa in run_action_list (a=0x81b0ed0, msg=0x81f18a8) at action.c:145 #30 0x08054491 in do_action (a=0x81b5f68, msg=0x81f18a8) at action.c:746 #31 0x080557aa in run_action_list (a=0x81b5f68, msg=0x81f18a8) at action.c:145 #32 0x08054f2d in do_action (a=0x81b5fd0, msg=0x81f18a8) at action.c:752 #33 0x080557aa in run_action_list (a=0x81aefd0, msg=0x81f18a8) at action.c:145 #34 0x08053efb in do_action (a=0x818bc08, msg=0x81f18a8) at action.c:120 #35 0x080557aa in run_action_list (a=0x8187910, msg=0x81f18a8) at action.c:145 #36 0x08055b43 in run_top_route (a=0x8187910, msg=0x81f18a8) at action.c:120 #37 0x0808c659 in receive_msg ( buf=0x8158040 "INVITE sip:xxxxx@xxxxxxxxxxxxxx.comsip%3Axxxxx@xxxxxxxxxxxxxx.comSIP/2.0\r\nRecord-Route: sip:10.100.29.7;lr=on;ftag=as60035314\r\nVia: SIP/2.0/UDP 10.100.29.7;branch=z9hG4bKb6d4.a49d7633.0\r\nVia: SIP/2.0/UDP 10.100.29.8:5060;branch=z9hG"..., len=926, rcv_info=0xbff62334) at receive.c:175 #38 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #39 0x0806e394 in main (argc=9, argv=0xbff62514) at main.c:774,
And finally the backtraces of the other two:
#0 0xb7f6d410 in ?? () #1 0xbff61848 in ?? () #2 0x00000001 in ?? () #3 0xa7358180 in ?? () #4 0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182 #6 0xb7b6ce01 in t_uac (method=0xbff61c68, headers=0x81e2e20, body=0x81f1ca3, dialog=0xa76e0478, cb=0xb734a622 <publ_cback_func>, cbp=0xa772bbc0) at uac.c:306 #7 0xb7b6e311 in request (m=0xbff61c68, ruri=0x81f1c84, to=0x81f1c84, from=0x81f1c84, h=0x81e2e20, b=0x81f1ca3, oburi=0xb73564ac, cb=0xb734a622 <publ_cback_func>, cbp=0xa772bbc0) at uac.c:503 #8 0xb7349641 in send_publish (publ=0x81f1c50) at send_publish.c:552 #9 0xb73339bf in dialog_publish (state=0xb7335bc6 "early", entity=0xbff61ea8, peer=0xa7630458, callid=0xa7630468, initiator=0, lifetime=300, localtag=0xbff61eb0, remotetag=0xa7630470, localtarget=0xbff61ea0, remotetarget=0xa7630480) at dialog_publish.c:347 #10 0xb7335183 in __dialog_sendpublish (dlg=0xa76d83e0, type=128, _params=0xb7a7cb9c) at pua_dialoginfo.c:256 #11 0xb7a58598 in run_dlg_callbacks (type=128, dlg=0xa76d83e0, msg=0x81d9aa8, dir=2, dlg_data=0x0) at dlg_cb.c:252 #12 0xb7a62475 in dlg_onreply (t=0xa75d0f68, type=128, param=0xb7b785a8) at dlg_handlers.c:274 #13 0xb7b4a942 in run_trans_callbacks (type=128, trans=0xa75d0f68, req=0xa75feaf8, rpl=0x81d9aa8, code=180) at t_hooks.c:240 #14 0xb7b6680d in relay_reply (t=0xa75d0f68, p_msg=0x81d9aa8, branch=<value optimized out>, msg_status=180, cancel_bitmap=0xbff62200) at t_reply.c:1254 #15 0xb7b672d8 in reply_received (p_msg=0x81d9aa8) at t_reply.c:1511 #16 0x0806596b in forward_reply (msg=0x81d9aa8) at forward.c:521 #17 0x0808c887 in receive_msg ( buf=0x8158040 "SIP/2.0 180 Ringing\r\nVia: SIP/2.0/UDP 10.100.0.2;branch=z9hG4bK874c.16937ae2.1\r\nVia: SIP/2.0/UDP 10.100.210.252;branch=z9hG4bK874c.3855deb5.0\r\nVia: SIP/2.0/UDP 10.100.210.253:5060;branch=z9hG4bK4489e1"..., len=782, rcv_info=0xbff62334) at receive.c:212 #18 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #19 0x0806e394 in main (argc=9, argv=0xbff62514) at main.c:774
and
#0 0xb7f6d410 in ?? () #1 0xbff60a88 in ?? () #2 0x00000001 in ?? () #3 0xa7358180 in ?? () #4 0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182 #6 0xb7b52587 in t_lookup_request (p_msg=0x81f18c0, leave_new_locked=1) at t_lookup.c:468 #7 0xb7b534ae in t_newtran (p_msg=0x81f18c0) at t_lookup.c:1124 #8 0xb7b4540c in t_relay_to (p_msg=0x81f18c0, proxy=0x0, flags=1) at t_funcs.c:212 #9 0xb7b46f6c in t_replicate (p_msg=0x81f18c0, dst=0x817cdf8, flags=1) at t_fwd.c:763 #10 0xb7b55fce in w_t_replicate (p_msg=0x81f18c0, dst=0x817cdf8 "\230ê\030\b\016", flags=0x0) at tm.c:957 #11 0x0805301c in do_action (a=0x818eb20, msg=0x81f18c0) at action.c:874 #12 0x080557aa in run_action_list (a=0x818eb20, msg=0x81f18c0) at action.c:145 #13 0x08054491 in do_action (a=0x818ec78, msg=0x81f18c0) at action.c:746 #14 0x080557aa in run_action_list (a=0x818c940, msg=0x81f18c0) at action.c:145 #15 0x08053efb in do_action (a=0x81b38e0, msg=0x81f18c0) at action.c:120 #16 0x080557aa in run_action_list (a=0x81b3878, msg=0x81f18c0) at action.c:145 #17 0x08054491 in do_action (a=0x81b5f00, msg=0x81f18c0) at action.c:746 #18 0x080557aa in run_action_list (a=0x81b5f00, msg=0x81f18c0) at action.c:145 #19 0x08054f2d in do_action (a=0x81b5f68, msg=0x81f18c0) at action.c:752 #20 0x080557aa in run_action_list (a=0x81b5f68, msg=0x81f18c0) at action.c:145 #21 0x08054f2d in do_action (a=0x81b5fd0, msg=0x81f18c0) at action.c:752 #22 0x080557aa in run_action_list (a=0x81aefd0, msg=0x81f18c0) at action.c:145 #23 0x08053efb in do_action (a=0x818bc08, msg=0x81f18c0) at action.c:120 #24 0x080557aa in run_action_list (a=0x8187910, msg=0x81f18c0) at action.c:145 #25 0x08055b43 in run_top_route (a=0x8187910, msg=0x81f18c0) at action.c:120 #26 0x0808c659 in receive_msg ( buf=0x8158040 "REGISTER sip:xxxxxxxxxxx.com SIP/2.0\r\nVia: SIP/2.0/UDP 10.100.210.252;branch=z9hG4bK76cf4c16d55a6407f\r\nVia: SIP/2.0/UDP 10.100.210.197:5060;branch=z9hG4bK76cf4c16d55a6407f\r\nMax-Forwards: 68\r\nFrom: <sip:"..., len=944, rcv_info=0xbff62334) at receive.c:175 #27 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #28 0x0806e394 in main (argc=9, argv=0xbff62514) at main.c:774
I can provide full backtraces (bt full) of all the processes but they are too large. Should I attach them on a file? Is there any other information that can be helpful to track this down?
Thanks in advance,
Santi
2010/5/14 Santiago Gimeno santiago.gimeno@gmail.com
Hi,
2010/5/14 marius zbihlei marius.zbihlei@1and1.ro
Hello,
It also might help if you can do a bt "full" Also try printing the whole message x/s 0x8158040 should do the trick.
I have already restarted the server, I'll do that the next time it happens.
If I recall this correctly you said something about only t_replicating REGISTERs. The message is an INVITE so it also might be that you have a cfg error somewhere
Yes, I'm only t_replicating REGISTER's and this backtrace doesn't show any call to t_replicate. Am I missing something?
Cheers Marius
Thanks,
Santi
Santiago Gimeno wrote:
Hi,
The problem happened again and I can provide some more info. 6 of the UDP worker processes got blocked. By checking the logs I can see that 4 of them, that seem to be related to the same INVITE request, got blocked at the same time. The other 2 got blocked some hours later and not at the same time.
From the 4 first processes, the backtrace of 3 of them is this:
#0 0xb7f6d410 in ?? () #1 0xbff60768 in ?? () #2 0x00000001 in ?? () #3 0xa7358180 in ?? () #4 0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182 #6 0xb7b52587 in t_lookup_request (p_msg=0x82403d0, leave_new_locked=1) at t_lookup.c:468 #7 0xb7b534ae in t_newtran (p_msg=0x82403d0) at t_lookup.c:1124
The backtrace of the other is:
#0 0xb7f6d410 in ?? () #1 0xbff60138 in ?? () #2 0x00000001 in ?? () #3 0xa7358180 in ?? () #4 0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182 #6 0xb7b6ce01 in t_uac (method=0xbff60558, headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158) at uac.c:306 #7 0xb7b6e311 in request (m=0xbff60558, ruri=0x81d9adc, to=0x81d9adc, from=0x81d9adc, h=0x81e3108, b=0x81d9afb, oburi=0xb73564ac, cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158) at uac.c:503 #8 0xb7349641 in send_publish (publ=0x81d9aa8) at send_publish.c:552 #9 0xb73339bf in dialog_publish (state=0xb7335bb4 "Trying", entity=0xa7709f34, peer=0xa7709f3c, callid=0xa7709f2c, initiator=1, lifetime=300, localtag=0x0, remotetag=0x0, localtarget=0x0, remotetarget=0x0) at dialog_publish.c:347 #10 0xb73348ea in __dialog_created (dlg=0xa7709ef0, type=2, _params=0xb7a7cb9c) at pua_dialoginfo.c:343 #11 0xb7a586ff in run_create_callbacks (dlg=0xa7709ef0, msg=0x81f18a8) at dlg_cb.c:230 #12 0xb7a60d1f in dlg_new_dialog (msg=0x81f18a8, t=0xa75deeb0) at dlg_handlers.c:494 #13 0xb7a61f77 in dlg_onreq (t=0xa75deeb0, type=1, param=0xb7b785a8) at dlg_handlers.c:414 #14 0xb7b4a791 in run_reqin_callbacks (trans=0xa75deeb0, req=0x81f18a8, code=1) at t_hooks.c:272 #15 0xb7b376af in build_cell (p_msg=0x81f18a8) at h_table.c:284 #16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064 #17 0xb7b4540c in t_relay_to (p_msg=0x81f18a8, proxy=0x0, flags=8) at t_funcs.c:212 #18 0xb7b58ac7 in w_t_relay (p_msg=0x81f18a8, proxy=0x0, flags=0x8
<Address 0x8 out of bounds>) at tm.c:1002 #19 0x0805301c in do_action (a=0x818c370, msg=0x81f18a8) at action.c:874 #20 0x080557aa in run_action_list (a=0x818c370, msg=0x81f18a8) at action.c:145 #21 0x0809c304 in eval_expr (e=0x818c3d8, msg=0x81f18a8, val=0x0) at route.c:1171 #22 0x0809bd80 in eval_expr (e=0x818c400, msg=0x81f18a8, val=0x0) at route.c:1488 #23 0x0809bd16 in eval_expr (e=0x818c428, msg=0x81f18a8, val=0x0) at route.c:1493 #24 0x080527ed in do_action (a=0x818c740, msg=0x81f18a8) at action.c:729 #25 0x080557aa in run_action_list (a=0x818be08, msg=0x81f18a8) at action.c:145 #26 0x08053efb in do_action (a=0x81a12e0, msg=0x81f18a8) at action.c:120 #27 0x080557aa in run_action_list (a=0x818ee78, msg=0x81f18a8) at action.c:145 #28 0x08053efb in do_action (a=0x81b3448, msg=0x81f18a8) at action.c:120 #29 0x080557aa in run_action_list (a=0x81b0ed0, msg=0x81f18a8) at action.c:145 #30 0x08054491 in do_action (a=0x81b5f68, msg=0x81f18a8) at action.c:746 #31 0x080557aa in run_action_list (a=0x81b5f68, msg=0x81f18a8) at action.c:145 #32 0x08054f2d in do_action (a=0x81b5fd0, msg=0x81f18a8) at action.c:752 #33 0x080557aa in run_action_list (a=0x81aefd0, msg=0x81f18a8) at action.c:145 #34 0x08053efb in do_action (a=0x818bc08, msg=0x81f18a8) at action.c:120 #35 0x080557aa in run_action_list (a=0x8187910, msg=0x81f18a8) at action.c:145 #36 0x08055b43 in run_top_route (a=0x8187910, msg=0x81f18a8) at action.c:120 #37 0x0808c659 in receive_msg ( buf=0x8158040 "INVITE sip:xxxxx@xxxxxxxxxxxxxx.com <mailto:sip%3Axxxxx@xxxxxxxxxxxxxx.com> SIP/2.0\r\nRecord-Route: <sip:10.100.29.7;lr=on;ftag=as60035314>\r\nVia: SIP/2.0/UDP 10.100.29.7;branch=z9hG4bKb6d4.a49d7633.0\r\nVia: SIP/2.0/UDP 10.100.29.8:5060;branch=z9hG"..., len=926, rcv_info=0xbff62334) at receive.c:175 #38 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449 #39 0x0806e394 in main (argc=9, argv=0xbff62514) at main.c:774,
Hello
I am a little busy atm, so before I dig into the code, I have a question for core devs. Is the LOCK_HASH() call recursive (being called again from the same process will not block) ? I ask this because in the 4th blocked INVITE the hash _might_ be blocked by both t_newtran(#16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064) and 6 t_uac (#6 0xb7b6ce01 in t_uac (method=0xbff60558, headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158)), thus causing a deadlock.
Thanks Marius