Andrei Pelinescu-Onciul writes:
If it happens again, could you try to attach with gdb to the processes
eating the cpu and send me some back traces and the output of print
pt[process_no]? You could try using a larger exit_timeout (e.g.
exit_timeout=1800), just to be sure you'll catch them.
andrei,
it did happen again. here is some gdb info. i noticed that after a
while the processes stopped consuming lots of cpu time, but still didn't
die.
-- juha
(gdb) where
#0 fm_status (qm=0x8230b20) at mem/f_malloc.c:614
#1 0x08088a63 in sig_usr (signo=15) at main.c:747
#2 <signal handler called>
#3 0xb7f4d424 in __kernel_vsyscall ()
#4 0xb7dee8ba in sigwaitinfo () from /lib/i686/cmov/libc.so.6
#5 0x08113f37 in slow_timer_main () at timer.c:1108
#6 0x08088475 in main_loop () at main.c:1435
#7 0x0808aaf7 in main (argc=Cannot access memory at address 0x0
) at main.c:2178
(gdb) print pt[22724]
$1 = {pid = 0, unix_sock = 0, idx = 0, desc = '\0' <repeats 127 times>}
(gdb)
another process gave this:
(gdb) where
#0 0x08121488 in fm_status (qm=0x8230b20) at mem/f_malloc.c:615
#1 0x08088a63 in sig_usr (signo=15) at main.c:747
#2 <signal handler called>
#3 0xb7f4d422 in __kernel_vsyscall ()
#4 0xb7ea3831 in recvfrom () from /lib/i686/cmov/libc.so.6
#5 0x0811aab7 in udp_rcv_loop () at udp_server.c:446
#6 0x08087e03 in main_loop () at main.c:1387
#7 0x0808aaf7 in main (argc=6, argv=0x821d700) at main.c:2178
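For anyone repeating this, the same information can be collected non-interactively from each stuck process. The following is only a rough sketch: the function name dump_proc is made up for illustration, and it assumes gdb is installed, the binary has debug symbols, and the process_no variable named in the request above is visible to gdb in the attached process. It prints process_no first, so the index actually used for pt[] shows up in the output.

```shell
#!/bin/sh
# Sketch: collect a backtrace and the process-table entry from one process.
# dump_proc is a hypothetical helper name, not part of any existing tool.
dump_proc() {
    pid="$1"
    # -p attaches to the running process, -batch exits (and detaches)
    # once all -ex commands have run.
    gdb -p "$pid" -batch \
        -ex "where" \
        -ex "print process_no" \
        -ex "print pt[process_no]"
}

# Pass the PIDs of the processes that are eating the cpu as arguments.
for pid in "$@"; do
    echo "=== pid $pid ==="
    dump_proc "$pid"
done
```

Saved as a script (the file name is up to you), it would be run with the PIDs as arguments while the processes are still alive, which is where the larger exit_timeout helps.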