Hello Arif,
This sounds a lot like you've got a pathological database query or other source of I/O
wait which periodically deadlocks or hangs for an inordinate amount of time, blocking one
or more of Kamailio's worker processes.
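Such a condition can have knock-on effects which overwhelm the other workers: when replies
are delayed, the senders retransmit, and every retransmission is one more message that the
remaining workers have to process.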
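
To put a rough number on that, here is a small Python sketch of how often a single
unanswered INVITE gets resent over UDP. It assumes the sender follows the RFC 3261
defaults (T1 = 500 ms, interval doubling each time, transaction timeout at 64*T1); your
endpoints may be configured differently, and the function name is just mine for
illustration.

```python
def invite_retransmit_times(t1=0.5):
    """Seconds after the first send at which an unanswered INVITE is resent.

    Assumes RFC 3261 default behaviour over UDP: the first retransmission
    fires after T1, the interval doubles each time, and the transaction
    gives up at 64 * T1 (Timer B).
    """
    timer_b = 64 * t1      # transaction timeout
    times = []
    interval = t1          # wait before the next retransmission
    elapsed = t1           # time of the first retransmission
    while elapsed < timer_b:
        times.append(elapsed)
        interval *= 2
        elapsed += interval
    return times


print(invite_retransmit_times())  # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
```

So a worker that hangs for half a minute can turn each pending call attempt into seven
copies of the same request, all of which land in the same receive queue.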
If this hypothesis has merit (i.e. you have nontrivial database or other synchronous
I/O-bound interactions in your route script), the best thing to do is probably to audit
the external services you depend on. For instance, you can turn on the slow query log in
your database and see whether an exceptionally slow query or a deadlock shows up. That
sort of thing would be the "low-hanging fruit" here.
Failing that, you can run 'netstat --inet -n -l' and check the Recv-Q column for
Kamailio's listening sockets. Under normal conditions, this should be near zero, though it
may spike briefly under high load. If all of Kamailio's workers stop responding, however,
the queue won't drain; it will just keep growing.
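
If you'd rather watch that number over time than eyeball it, something like the following
Python sketch could pull Recv-Q out of the netstat output. It assumes the usual Linux
net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, ...) and that Kamailio
listens on port 5060; the function names are mine, so adjust to taste.

```python
import subprocess


def recv_queues(netstat_output, port="5060"):
    """Map (proto, local address) -> Recv-Q for sockets bound to `port`.

    Assumes net-tools style output from 'netstat --inet -n -l', where the
    columns are: Proto, Recv-Q, Send-Q, Local Address, ...
    """
    queues = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner and header lines; keep only tcp/udp socket rows.
        if len(fields) < 4 or not fields[0].startswith(("tcp", "udp")):
            continue
        if not fields[1].isdigit():
            continue
        proto, recv_q, local = fields[0], int(fields[1]), fields[3]
        if local.rsplit(":", 1)[-1] == port:
            queues[(proto, local)] = recv_q
    return queues


def live_recv_queues(port="5060"):
    """Run netstat on this host and return the current Recv-Q values."""
    out = subprocess.run(
        ["netstat", "--inet", "-n", "-l"],
        capture_output=True, text=True, check=True,
    ).stdout
    return recv_queues(out, port)
```

Calling live_recv_queues() every few seconds and logging the result should make it
obvious whether the queue drains between bursts or only ever climbs.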
-- Alex
--
Alex Balashov | Principal | Evariste Systems LLC
303 Perimeter Center North, Suite 300
Atlanta, GA 30346
United States
Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/