You haven't even confirmed whether it is a real deadlock (as I asked again over the past few days), since you didn't report whether there is full CPU usage or not. Again, blocking can happen for different reasons, many of them I/O related (e.g., writing to syslog, DB operations, DNS, ...).
Dumping the output of top to a file, as well as running gdb in batch mode to grab a backtrace into a file, does not cost more than a few seconds -- you can put the commands in the script for restarting.
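As a rough sketch of what such a pre-restart step could look like, here is a small Python helper. The names build_commands and capture, the /tmp/diag output directory, and the 10-second timeout are illustrative assumptions, not something taken from an existing script. It assumes a procps-style top (batch mode via -b -n 1) and a gdb that is allowed to attach to the given PIDs.

```python
import subprocess
import time
from pathlib import Path


def build_commands(pids):
    """Return (label, argv) pairs: one top snapshot plus one gdb backtrace per PID."""
    # top in batch mode, single iteration, so it prints once and exits
    commands = [("top", ["top", "-b", "-n", "1"])]
    for pid in pids:
        # gdb attaches to the PID, prints a full backtrace, then detaches and exits
        argv = ["gdb", "-p", str(pid), "-batch", "-ex", "bt full"]
        commands.append((f"gdb-{pid}", argv))
    return commands


def capture(pids, out_dir="/tmp/diag", timeout=10):
    """Run each command and write its output to a timestamped file.

    out_dir and timeout are hypothetical defaults; adjust them to the host.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for label, argv in build_commands(pids):
        path = target / f"{label}-{stamp}.txt"
        try:
            result = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                timeout=timeout,
                text=True,
            )
            output = result.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of delaying the restart
            output = f"command failed: {exc}\n"
        path.write_text(output)
        written.append(path)
    return written
```

Calling capture() with the PIDs of the blocked processes just before the restart command would leave one top file and one backtrace file per process; the timeout keeps a stuck gdb from holding up the restart for long.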
Daniel
On 10/06/15 18:29, Alex Balashov wrote:
Daniel,
I hear you, and don't disagree that reports without follow-up are not useful. The limitations on the information I've provided are a reflection of the constraints I face in trying to get it.
My goal was not to criticise, but to ask if there were any suggestions for technical means of gathering the necessary further details while causing negligible downtime after a crash or deadlock was discovered.
-- Alex