Still the issue is in the system memory, even the xmlrpc module was
compiled with pkg malloc manager.
The crash this time is in regexp functions from libc.
It could be the operations done with libxml2 inside the xmlrpc, because
they use the libxml2 native functions that rely on system malloc.
I checked the xmlrpc module commit log and last code changes were in
September 2014 - even those are not related to any xml ops. In the past,
have you used the xmlrpc commands for other similar deployments with an
older version than 4.4 of kamailio?
Did you remember if there was any update of the OS bringing a new
version of libxml2? No other ops should use system memory in xmlrpc
module than the ones on libxml2...
Cheers,
Daniel
On 03/05/16 10:46, Juha Heinanen wrote:
Daniel-Constantin Mierla writes:
Thinking more of it -- is the route[cleaner]
executed with rtimer on a
dedicated timer process?
I have two timer routes:
modparam("rtimer", "timer",
"name=stat_timer;interval=60;mode=1")
modparam("rtimer", "exec", "timer=stat_timer;route=stats")
modparam("rtimer", "timer",
"name=clean_timer;interval=3600")
modparam("rtimer", "exec",
"timer=clean_timer;route=cleaner")
and I remembered incorrectly: it is the stat_timer route that is
executed once per minute and that does two mysql selects and one
inserts.
i rebuild latest 4.4 kamailio where I have
# use system malloc
#DEFS+=-DXMLRPC_SYSTEM_MALLOC
in xmlrpc/Makefile.
Again I got a crash. Syslog has:
May 3 11:31:22 lohi /usr/bin/sip-proxy[7234]: CRITICAL: <core> [pass_fd.c:275]:
receive_fd(): EOF on 39
May 3 11:31:22 lohi /usr/bin/sip-proxy[7152]: ALERT: <core> [main.c:739]:
handle_sigs(): child process 7219 exited by a signal 6
May 3 11:31:22 lohi /usr/bin/sip-proxy[7152]: ALERT: <core> [main.c:742]:
handle_sigs(): core was generated
May 3 11:31:22 lohi /usr/bin/sip-proxy[7152]: INFO: <core> [main.c:754]:
handle_sigs(): terminating due to SIGCHLD
May 3 11:31:22 lohi /usr/bin/sip-proxy[7233]: INFO: <core> [main.c:809]:
sig_usr(): signal 15 received
May 3 11:31:22 lohi kernel: [115953.517429] sip-proxy[7232]: segfault at 2987008 ip
00007fb666045b45 sp 00007ffe0eea3c60 error 4 in libc-2.19.so[7fb665fcd000+1a2000]
May 3 11:31:22 lohi /usr/bin/sip-proxy[7229]: INFO: <core> [main.c:809]:
sig_usr(): signal 15 received
...
This one produced two core files:
47508 -rw------- 1 root root 60002304 May 3 11:31 core.sip-proxy.sig11.7232
47516 -rw------- 1 root root 60006400 May 3 11:31 core.sip-proxy.sig6.7219
gdbs are below.
-- juha
# gdb /usr/bin/sip-proxy /var/cores/core.sip-proxy.sig11.7232
...
(gdb) where
#0 malloc_consolidate (av=av@entry=0x7fb666372620 <main_arena>) at malloc.c:4151
#1 0x00007fb666046e98 in _int_malloc (av=av@entry=0x7fb666372620 <main_arena>,
bytes=bytes@entry=8192) at malloc.c:3423
#2 0x00007fb6660499dc in __libc_calloc (n=<optimized out>, elem_size=<optimized
out>)
at malloc.c:3219
#3 0x00007fb66603ed43 in __GI_open_memstream (bufloc=bufloc@entry=0x7ffe0eea3dd0,
sizeloc=sizeloc@entry=0x7ffe0eea3dd8) at memstream.c:85
#4 0x00007fb6660b107d in __GI___vsyslog_chk (pri=134, flag=flag@entry=-1,
fmt=0x6d9c80 "%s: %s%s(): signal %d received\n",
ap=ap@entry=0x7ffe0eea3ea8)
at ../misc/syslog.c:167
#5 0x00007fb6660b15bf in __syslog (pri=<optimized out>, fmt=<optimized
out>)
at ../misc/syslog.c:117
#6 0x0000000000502f39 in sig_usr (signo=1714890272) at main.c:809
#7 <signal handler called>
#8 0x00007fb6660b5e53 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#9 0x00000000005d6c01 in io_wait_loop_epoll (h=<optimized out>, t=<optimized
out>,
repeat=<optimized out>) at io_wait.h:1030
#10 tcp_receive_loop (unix_sock=14) at tcp_read.c:1781
#11 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#12 0x000000000050735f in main_loop () at main.c:1690
#13 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
(gdb) quit
# gdb /usr/bin/sip-proxy /var/cores/core.sip-proxy.sig6.7219
...
(gdb) where
#0 0x00007fb666002067 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fb666003448 in __GI_abort () at abort.c:89
#2 0x00007fb6660401b4 in __libc_message (do_abort=do_abort@entry=1,
fmt=fmt@entry=0x7fb666135530 "*** Error in `%s': %s: 0x%s ***\n")
at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007fb66604598e in malloc_printerr (action=1,
str=0x7fb6661358d0 "malloc(): memory corruption (fast)", ptr=<optimized
out>) at malloc.c:4996
#4 0x00007fb666047d2b in _int_malloc (av=0x7fb666372620 <main_arena>, bytes=4) at
malloc.c:3359
#5 0x00007fb666049020 in __GI___libc_malloc (bytes=4) at malloc.c:2891
#6 0x00007fb6660926c5 in re_node_set_init_copy (dest=0x296a2c8, src=0x7ffe0eea2ba0)
at regex_internal.c:1031
#7 0x00007fb6660929c3 in create_cd_newstate (hash=2, context=0, nodes=0x7ffe0eea2ba0,
dfa=0x292acb0) at regex_internal.c:1673
#8 re_acquire_state_context (err=err@entry=0x7ffe0eea2b80, dfa=dfa@entry=0x292acb0,
nodes=nodes@entry=0x7ffe0eea2ba0, context=context@entry=0) at regex_internal.c:1545
#9 0x00007fb665fee28b in build_trtable (dfa=<optimized out>,
state=state@entry=0x292be30)
at regexec.c:3439
#10 0x00007fb666098500 in transit_state (state=0x292be30, mctx=0x7ffe0eea2d30,
err=0x7ffe0eea2cf8)
at regexec.c:2322
#11 check_matching (p_match_first=0x7ffe0eea2cf4, fl_longest_match=1,
mctx=0x7ffe0eea2d30)
at regexec.c:1173
#12 re_search_internal (preg=preg@entry=0x7fb665e563a8, string=string@entry=0x2953558
"412",
length=3, start=<optimized out>, start@entry=0, range=3, stop=<optimized
out>,
nmatch=<optimized out>, pmatch=0x7ffe0eea2ee0, eflags=0) at regexec.c:844
#13 0x00007fb66609ce65 in __regexec (preg=0x7fb665e563a8, string=0x2953558
"412",
nmatch=<optimized out>, pmatch=0x7ffe0eea2ee0, eflags=<optimized out>) at
regexec.c:252
#14 0x00007fb6632f0234 in t_check_status (msg=0x7fb6657333e8, p1=0x7fb665e56328
"pY\247e\266\177",
foo=0x6 <error: Cannot access memory at address 0x6>) at tm.c:947
---Type <return> to continue, or q <return> to quit---
#15 0x000000000053c340 in do_action (h=0x7ffe0eea34a0, a=0x7fb665a75010,
msg=0x7fb6657333e8)
at action.c:1060
#16 0x000000000053ae05 in run_actions (h=0x1c33, h@entry=0x7ffe0eea34a0, a=0x1c33,
msg=0x6)
at action.c:1549
#17 0x00000000005477d7 in run_actions_safe (h=0x7ffe0eea3ec0, a=<optimized out>,
msg=<optimized out>) at action.c:1614
#18 0x0000000000424720 in rval_get_int (h=0x7ffe0eea3ec0, msg=0x1c33, i=0x7ffe0eea37d0,
rv=0x7fb666002067 <__GI_raise+55>, cache=0x3134323336393230) at rvalue.c:912
#19 0x000000000042a4ac in rval_expr_eval_int (h=0x7ffe0eea3ec0, msg=0x7fb6657333e8,
res=0x7ffe0eea37d0, rve=0x7fb665a75208) at rvalue.c:1910
#20 0x000000000053c817 in do_action (h=0x7ffe0eea3ec0, a=0x7fb665a789c8,
msg=0x7fb6657333e8)
at action.c:1030
#21 0x000000000053ae05 in run_actions (h=0x1c33, h@entry=0x7ffe0eea3ec0, a=0x1c33,
a@entry=0x7fb665a745f0, msg=0x6, msg@entry=0x7fb6657333e8) at action.c:1549
#22 0x0000000000547875 in run_top_route (a=0x7fb665a745f0, msg=0x7fb6657333e8,
c=<optimized out>)
at action.c:1635
#23 0x00007fb6632b7957 in reply_received (p_msg=0x7fb6657333e8) at t_reply.c:2323
#24 0x00000000004fbfe6 in do_forward_reply (msg=0x7fb6657333e8, mode=0) at forward.c:747
#25 0x0000000000549673 in receive_msg (buf=0x0, len=1702048744, rcv_info=0x7fb665770f50)
at receive.c:298
#26 0x00000000005cb9fa in tcp_read_req (con=0x7fb65def8fd8, bytes_read=0x7ffe0eea4404,
read_flags=0x7ffe0eea4408) at tcp_read.c:1418
#27 0x00000000005d0313 in handle_io (fm=0x7fb65def8fd8, events=7219, idx=-1) at
tcp_read.c:1611
#28 0x00000000005d6c4e in io_wait_loop_epoll (h=<optimized out>, t=<optimized
out>,
repeat=<optimized out>) at io_wait.h:1061
#29 tcp_receive_loop (unix_sock=7219) at tcp_read.c:1781
#30 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#31 0x000000000050735f in main_loop () at main.c:1690
#32 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
(gdb)