Hi Jason,
over the weekend I pushed a patch that disables the use of the dedicated
mutex for t_continue(). The mutex can be re-enabled by defining
ENABLE_ASYNC_MUTEX.
While investigating some reports of crashes when removing from the timer, I
found a potential race when t_continue() is executed at the moment
the fr_timer for the suspended transaction elapses. The timer process will
get the transaction out and remove it from the timer under the reply lock,
while the worker doing t_continue() will get it out under the async lock.
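
To illustrate the point, here is a minimal sketch of the idea behind the
patch: if both the timer path and the t_continue() path take the same lock
before touching the suspended transaction, only one of them can take it out.
This is not the tm code. struct txn, txn_claim(), timer_path(),
continue_path() and run_both_paths() are hypothetical names, and pthreads
stand in for Kamailio's processes and shared-memory locks.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a suspended transaction (not the tm struct). */
struct txn {
    pthread_mutex_t reply_lock;  /* the single lock both paths agree on */
    bool suspended;              /* true until one path takes the transaction */
    int handled_by_timer;        /* written only by the timer path */
    int handled_by_continue;     /* written only by the continue path */
};

/* Returns true if the caller took the suspended transaction out. */
static bool txn_claim(struct txn *t)
{
    bool won = false;

    pthread_mutex_lock(&t->reply_lock);
    if (t->suspended) {
        t->suspended = false;
        won = true;
    }
    pthread_mutex_unlock(&t->reply_lock);
    return won;
}

/* Stand-in for the timer process handling an elapsed fr_timer. */
static void *timer_path(void *arg)
{
    struct txn *t = arg;

    if (txn_claim(t))
        t->handled_by_timer = 1;
    return NULL;
}

/* Stand-in for the worker executing t_continue(). */
static void *continue_path(void *arg)
{
    struct txn *t = arg;

    if (txn_claim(t))
        t->handled_by_continue = 1;
    return NULL;
}

/* Runs both paths concurrently; returns how many of them got the transaction. */
static int run_both_paths(void)
{
    struct txn t = { .suspended = true };
    pthread_t a, b;

    pthread_mutex_init(&t.reply_lock, NULL);
    pthread_create(&a, NULL, timer_path, &t);
    pthread_create(&b, NULL, continue_path, &t);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&t.reply_lock);

    return t.handled_by_timer + t.handled_by_continue;
}
```

If txn_claim() used a different mutex depending on the caller, the check and
the update of the suspended flag would no longer be serialized between the
two paths, which is the situation described above.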
I looked at the commit you did when introducing the dedicated async
mutex, the note being:
- "dedicated lock to prevent multiple invocations of suspend on
tz (reply lock used to be used)"
Perhaps tz is tx and stands for transmission - however, the reply lock
should be safe for this case as well. Moreover, t_continue() behaves as if
the suspended branch got a reply and the transaction continues processing,
which implies the reply lock is acquired (as with the execution of
failure_route, which can also happen if fr_timer elapses before
t_continue() is executed).
Given those, I no longer see a reason for a dedicated async mutex.
Also, removing it protects against the hazards of using two mutexes, which
can easily lead to deadlocks (e.g., one process acquires the reply lock and
tries to get the async lock while another one took the async lock first and
then wants the reply lock).
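
As a rough sketch of that deadlock, the snippet below replays the
interleaving in a single thread: the first two lock calls stand for two
different processes each holding one lock, and pthread_mutex_trylock() is
used for the second acquisition so the program reports the stall instead of
hanging. The function name blocked_second_acquisitions() is hypothetical,
and plain pthread mutexes stand in for Kamailio's own lock primitives.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t reply_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t async_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Replays the problematic interleaving and returns how many of the two
 * "second" acquisitions could not proceed. A result of 2 means each side
 * waits for the lock held by the other, i.e. a deadlock with blocking locks.
 */
static int blocked_second_acquisitions(void)
{
    int blocked = 0;

    /* Step 1: process A takes the reply lock, process B the async lock. */
    pthread_mutex_lock(&reply_lock);
    pthread_mutex_lock(&async_lock);

    /* Step 2: A now wants the async lock (held by B). */
    if (pthread_mutex_trylock(&async_lock) == EBUSY)
        blocked++;
    else
        pthread_mutex_unlock(&async_lock);

    /* Step 3: B now wants the reply lock (held by A). */
    if (pthread_mutex_trylock(&reply_lock) == EBUSY)
        blocked++;
    else
        pthread_mutex_unlock(&reply_lock);

    /* Release the locks taken in step 1 so the sketch can be rerun. */
    pthread_mutex_unlock(&async_lock);
    pthread_mutex_unlock(&reply_lock);

    return blocked;
}
```

With a single lock there is no acquisition order to get wrong, so this
interleaving cannot occur.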
For now I disabled the code with defines, as I wanted to discuss and be
sure I haven't overlooked any issue you tried to avoid with the
dedicated mutex. Let me know what you think.
Cheers,
Daniel
--
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio World Conference, May 27-29, 2015
Berlin, Germany - http://www.kamailioworld.com