Andrei Pelinescu-Onciul writes:
If it happens again, could you try to attach with gdb to the processes eating the cpu and send me some back traces and the output of print pt[process_no]? You could try using a larger exit_timeout (e.g. exit_timeout=1800), just to be sure you'll catch them.
andrei,
it did happen again. here is some gdb info. i noticed that after a while the processes stopped consuming lots of cpu time, but still didn't die.
-- juha
(gdb) where
#0  fm_status (qm=0x8230b20) at mem/f_malloc.c:614
#1  0x08088a63 in sig_usr (signo=15) at main.c:747
#2  <signal handler called>
#3  0xb7f4d424 in __kernel_vsyscall ()
#4  0xb7dee8ba in sigwaitinfo () from /lib/i686/cmov/libc.so.6
#5  0x08113f37 in slow_timer_main () at timer.c:1108
#6  0x08088475 in main_loop () at main.c:1435
#7  0x0808aaf7 in main (argc=Cannot access memory at address 0x0 ) at main.c:2178
(gdb) print pt[22724]
$1 = {pid = 0, unix_sock = 0, idx = 0, desc = '\0' <repeats 127 times>}
(gdb)
another process gave this:
(gdb) where
#0  0x08121488 in fm_status (qm=0x8230b20) at mem/f_malloc.c:615
#1  0x08088a63 in sig_usr (signo=15) at main.c:747
#2  <signal handler called>
#3  0xb7f4d422 in __kernel_vsyscall ()
#4  0xb7ea3831 in recvfrom () from /lib/i686/cmov/libc.so.6
#5  0x0811aab7 in udp_rcv_loop () at udp_server.c:446
#6  0x08087e03 in main_loop () at main.c:1387
#7  0x0808aaf7 in main (argc=6, argv=0x821d700) at main.c:2178