i have noticed that when i try to restart sr, quite often three of the processes won't die immediately. top shows that two of them use quite a lot of cpu time.
they go away after a minute or two.
any idea why this happens and how to make all sr processes die reliably?
-- juha
On Sep 23, 2009 at 21:06, Juha Heinanen jh@tutpro.com wrote:
> i have noticed that when i try to restart sr, quite often three of the processes won't die immediately. top shows that two of them use quite a lot of cpu time.
> they go away after a minute or two.
> any idea why this happens and how to make all sr processes die reliably?
Probably some process refuses to terminate (a bug). They die after a minute because that's when the shutdown timeout kicks in (if the processes and the cleanup together take longer than 1 minute, sr/ser will kill itself). You can change this timeout using exit_timeout in sr.cfg.
If it happens again, could you try to attach with gdb to the processes eating the cpu and send me some back traces and the output of print pt[process_no]? You could try using a larger exit_timeout (e.g. exit_timeout=1800), just to be sure you'll catch them.
Andrei
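A shutdown timeout like the one described above is commonly built on alarm(): the process arms a timer when it starts shutting down, and a SIGALRM handler forces an exit if the cleanup has not finished in time. The following is a minimal sketch of that general pattern, not sr's actual code. The names arm_exit_timeout and on_exit_timeout are hypothetical, and the 60-second value used in the usage note only mirrors the one-minute default mentioned in the reply.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: runs only if cleanup outlives the timeout. */
static void on_exit_timeout(int signo)
{
    (void)signo;
    _exit(1); /* async-signal-safe; skips any remaining cleanup */
}

/* Hypothetical helper: arm the forced-exit timer before cleanup starts.
 * Returns 0 on success, -1 if the handler could not be installed. */
int arm_exit_timeout(unsigned int seconds)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_exit_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;
    alarm(seconds); /* SIGALRM is delivered once `seconds` have elapsed */
    return 0;
}
```

A shutdown path would call arm_exit_timeout(60) before starting its cleanup. A process stuck in a loop during cleanup is then terminated by the handler, which would be consistent with the stuck processes disappearing after about a minute.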
Andrei Pelinescu-Onciul writes:
> If it happens again, could you try to attach with gdb to the processes eating the cpu and send me some back traces and the output of print pt[process_no]? You could try using a larger exit_timeout (e.g. exit_timeout=1800), just to be sure you'll catch them.
andrei,
it did happen again. here is some gdb info. i noticed that after a while the processes stopped consuming lots of cpu time, but still didn't die.
-- juha
(gdb) where
#0  fm_status (qm=0x8230b20) at mem/f_malloc.c:614
#1  0x08088a63 in sig_usr (signo=15) at main.c:747
#2  <signal handler called>
#3  0xb7f4d424 in __kernel_vsyscall ()
#4  0xb7dee8ba in sigwaitinfo () from /lib/i686/cmov/libc.so.6
#5  0x08113f37 in slow_timer_main () at timer.c:1108
#6  0x08088475 in main_loop () at main.c:1435
#7  0x0808aaf7 in main (argc=Cannot access memory at address 0x0 ) at main.c:2178
(gdb) print pt[22724]
$1 = {pid = 0, unix_sock = 0, idx = 0, desc = '\0' <repeats 127 times>}
(gdb)
another process gave this:
(gdb) where
#0  0x08121488 in fm_status (qm=0x8230b20) at mem/f_malloc.c:615
#1  0x08088a63 in sig_usr (signo=15) at main.c:747
#2  <signal handler called>
#3  0xb7f4d422 in __kernel_vsyscall ()
#4  0xb7ea3831 in recvfrom () from /lib/i686/cmov/libc.so.6
#5  0x0811aab7 in udp_rcv_loop () at udp_server.c:446
#6  0x08087e03 in main_loop () at main.c:1387
#7  0x0808aaf7 in main (argc=6, argv=0x821d700) at main.c:2178
On Sep 26, 2009 at 12:35, Juha Heinanen jh@tutpro.com wrote:
> Andrei Pelinescu-Onciul writes:
> > If it happens again, could you try to attach with gdb to the processes eating the cpu and send me some back traces and the output of print pt[process_no]? You could try using a larger exit_timeout (e.g. exit_timeout=1800), just to be sure you'll catch them.
> andrei,
> it did happen again. here is some gdb info. i noticed that after a while the processes stopped consuming lots of cpu time, but still didn't die.
It's strange. That's the pkg memory status (a memory dump at the end, for debugging). One possibility is that there is a lot to log (e.g. a memory leak?) and the syslog daemon slows things down. Another possibility is a nasty memory corruption bug that happens to create some kind of loop in the list of free fragments (e.g. someone writes more than was allocated, overwriting some malloc internal information). Did you change memlog in the .cfg? What was your debug level? Do you have a line in the log containing "fm_status"? If so, could you send me the output of grep "f_malloc.c" logfile?
Does the same happen if you compile with -DDBG_QM_MALLOC and without -DF_MALLOC (qm_malloc might catch a problem sooner)?
Andrei
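The "loop in the list of free fragments" scenario can be illustrated with a small sketch. If an out-of-bounds write clobbers a fragment's next pointer so that it points back into the list, a status walk that follows next until NULL never terminates, which would be consistent with a process spinning inside fm_status. The code below is an illustration only: struct frag_node and free_list_has_cycle are hypothetical and do not reflect the real data structures in mem/f_malloc.c. It uses Floyd's two-pointer cycle detection.

```c
#include <stddef.h>

/* Hypothetical, simplified free-list node (not the real f_malloc layout). */
struct frag_node {
    size_t size;
    struct frag_node *next;
};

/* Returns 1 if following `next` from `head` ever revisits a node,
 * 0 if the list ends in NULL. Runs in O(n) time and O(1) space. */
int free_list_has_cycle(const struct frag_node *head)
{
    const struct frag_node *slow = head;
    const struct frag_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* advances one node per step */
        fast = fast->next->next;  /* advances two nodes per step */
        if (slow == fast)
            return 1;             /* the pointers met inside a loop */
    }
    return 0;                     /* reached the end of the list */
}
```

A check of this kind, run in a debug build before walking the list, could help distinguish a corrupted list from one that is merely very long.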
Andrei Pelinescu-Onciul writes:
> It's strange. That's the pkg memory status (a memory dump at the end, for debugging).
andrei,
i don't have that kind of debugging on, so why is it trying to dump the memory?
> One possibility is that there is a lot to log (e.g. a memory leak?) and the syslog daemon slows things down.
this happens only once in a while. perhaps a memory leak would make it happen more often.
> Did you change memlog in the .cfg? What was your debug level?
i don't set memlog in my .cfg file so it is whatever the default value is. debug is 2.
> Do you have a line in the log containing "fm_status"? If so, could you send me the output of grep "f_malloc.c" logfile?
there is no such line in syslog.
> Does the same happen if you compile with -DDBG_QM_MALLOC and without -DF_MALLOC (qm_malloc might catch a problem sooner)?
i have these:
-DPKG_MALLOC \
-DSHM_MEM -DSHM_MMAP \
-DF_MALLOC \
i'll compile with -DDBG_QM_MALLOC to see if there is any difference. but again, why print memory at shutdown when DBG_QM_MALLOC is not enabled?
-- juha
On Sep 29, 2009 at 09:40, Juha Heinanen jh@tutpro.com wrote:
> Andrei Pelinescu-Onciul writes:
> > It's strange, That's the pkg memory status (memory dump at the end, for debugging).
> andrei,
> i don't have that kind of debugging on, so why is it trying to dump the memory?
I'll fix that. It should try to print it only if memlog<=debug. Right now it won't print anything if memlog>debug, but it will still iterate on all the memory fragments.
> > One possibility is that there is a lot to log (e.g. memory leak?) and the syslog daemon slows things down.
> this happens only once in a while. perhaps a memory leak would make it happen more often.
It's not the syslog. According to your debug value nothing will be printed to the syslog.
> > Did you change memlog in the .cfg? What was your debug level?
> i don't set memlog in my .cfg file so it is whatever the default value is. debug is 2.
memlog is 3 by default, so you won't get anything printed.
> > Do you have in the log, line containing: "fm_status"? If so could you send me the output of grep "f_malloc.c" logfile ?
> there is no such line in syslog.
> > Does the same happen if you compile with -DDBG_QM_MALLOC and without -DF_MALLOC (qm_malloc might catch a problem sooner)?
> i have these:
> -DPKG_MALLOC \
> -DSHM_MEM -DSHM_MMAP \
> -DF_MALLOC \
> i'll compile with -DDBG_QM_MALLOC to see if there is any difference. but again, why print memory at shutdown when DBG_QM_MALLOC is not enabled?
It should, if memlog <= debug (by default it is not). It's a good way to debug possible memleaks on production systems, and once fixed it will have no impact when not enabled from the .cfg or sercmd.
Andrei
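The fix discussed in this message amounts to checking the log levels before touching the fragment list, instead of walking every fragment and discarding the output. Below is a minimal sketch of that idea, assuming a simplified singly linked fragment list. struct mem_frag and dump_status_if_enabled are hypothetical names, not sr's real API, and the function returns the number of fragments visited only so that the effect of the guard is visible.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified fragment node (not the real f_malloc layout). */
struct mem_frag {
    size_t size;
    const struct mem_frag *next;
};

/* Dump the fragment list only when messages at level `memlog` would
 * actually be printed, i.e. when memlog <= debug. Returns the number of
 * fragments visited, which is 0 when the dump is skipped. */
size_t dump_status_if_enabled(const struct mem_frag *head, int memlog, int debug)
{
    const struct mem_frag *f;
    size_t visited = 0;

    if (memlog > debug)
        return 0; /* nothing would be logged, so skip the walk entirely */

    for (f = head; f != NULL; f = f->next) {
        fprintf(stderr, "frag %zu: size=%zu\n", visited, f->size);
        visited++;
    }
    return visited;
}
```

With the values from this thread (memlog 3, debug 2) the function returns immediately without walking the list.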