Asynchronous processing feature set - sr-dev

10 Jun 2015


      Hello,
I am looking for some recommendations on how to move forward with 
asynchronous call processing.
Our adoption and standardisation of async processing in >= 4.2 into our 
core product has been a disaster. I don't mean that to sound accusatory; 
it's open source, there's no reason to blame anyone. It's just a matter 
of fact.
1) We can't use async_task_route()/the standard async task worker 
approach in the 'async' module because of this problem, which both Olle 
and I have reported:
http://sr-dev.sip-router.narkive.com/5Sfc5cUU/async-module-cpu-load
Most users run Kamailio inside a VM and the problem shows up for ~50% of 
them.
2) Using t_suspend -> mqueue -> rtimer -> t_continue(), we continue to 
see deadlocks and occasional crashes. They are rare, and are most likely 
to happen in high-throughput, short-duration environments, but when they 
do happen, they're politically disastrous. We've had to roll back use of 
this method of async processing for pretty much all customers for whom 
it was enabled.
After the last time we visited this issue earlier this spring, the 
problem has shifted away from crashes and mostly toward deadlocks. 
Regardless, the customer's enthusiasm for pausing call processing long 
enough to attach a debugger and grab a backtrace or something like that 
is exactly 0.0%. I think most of them have more of an enthusiasm for 
firing us as a vendor than for doing any diagnostic work.
I know there's a way to invoke a process in such a way that when it 
crashes, gdb auto-attaches and pulls a backtrace, then restarts the 
process. I've written such a wrapper script before in the distant past. 
I just don't remember how to do it, especially with modern versions of 
GDB; any suggestions would be appreciated.
Otherwise, I don't really know what to do. We need async processing for 
higher-CPS systems, and would like to standardise upon it in principle, 
but so far it has, from a strictly functional point of view, been an 
enormous economic blunder.
I still prefer to be an "early adopter" of such novelties - when useful 
- in high-volume production systems in order to contribute the testing 
and feedback back to the project. But I have to strike some realistic 
balance here and not lose the customers. :-)
Thanks!
-- Alex
-- 
Alex Balashov | Principal | Evariste Systems LLC
303 Perimeter Center North, Suite 300
Atlanta, GA 30346
United States

Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/