Hi Jason,
the commit doesn't seem correct, because looks like you removed the
locking completely. It should still stay with locking on reply
mutex.
Can you revert the last commit? Then I can take care of removing the
unnecessary code -- note that the code before your commit was
already using the reply lock, so was like it should just be, with
dedicated mutex code being inside defines which were not enabled.
Cheers,
Daniel
On 13/05/15 08:56, Jason Penton wrote:
Hey Daniel,
You are correct this is no longer needed - we actually only
used this to be able to set a flag on the transaction that we
later used for branch picking. We made a fix a while back for
doing the correct branch picking which no longer required this
extra flag and therefore no longer required the mutex. I have
tested in our env and all looks good.
p.s. I have committed the cleanup.
Cheers
Jason
On Tue, 12 May 2015 at 10:34
Daniel-Constantin Mierla <
miconda@gmail.com>
wrote:
Hi Jason,
ok, would be great to sort it out properly, thanks for your
time!
Cheers,
Daniel
On 12/05/15 10:10, Jason Penton wrote:
Hey Daniel,
Okay great, let me look into this. It will be great if
we have one less mutex to worry about ;) - If not
required I will remove and commit.
Cheers
Jason
On Mon, 11 May 2015 at 09:55
Daniel-Constantin Mierla <
miconda@gmail.com>
wrote:
Hi
Jason,
over the weekend I pushed a patch that disables the
use of dedicated
mutex for t_continue(). It can be enabled by defining
ENABLE_ASYNC_MUTEX.
While investigated some reports of crash when removing
from time, I
found the potential of a race when t_coninue() is
executed at the time
the fr_timer for suspended transaction elapsed. The
timer process will
get the transaction out and remove it from timer under
the reply lock
and the worker doing t_continue() will get it out
under the async lock.
I looked at the commit you did when introducing the
dedicated async
mutex, the note being:
- "dedicated lock to prevent multiple
invocations of suspend on
tz (reply lock used to be used)"
Perhaps tz is tx and stands for transmission -
however, the reply lock
should be safe for this case as well. Moreover, the
continue is like the
suspended branch got a reply and transaction continues
processing, which
implies the reply lock is aquired (like execution of
failure_route,
which can also happen if fr_timer elapses before
t_continue() is executed).
Given those, I don't see anymore a reason for
dedicated async mutex.
Also, it protects to races of using two mutexes, which
can easily lead
to deadlocks (e.g., one process acquires the reply
lock and tries to get
the async lock while another one wanted first the
reply lock and later
the async lock).
For now I disabled the code with defines, as I wanted
to discuss and be
sure I haven't overlooked any issue you tried to avoid
with the
dedicated mutex. Let me know what you think about.
Cheers,
Daniel
--
Daniel-Constantin Mierla
http://twitter.com/#!/miconda -
http://www.linkedin.com/in/miconda
Kamailio World Conference, May 27-29, 2015
Berlin, Germany - http://www.kamailioworld.com
--
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio World Conference, May 27-29, 2015
Berlin, Germany - http://www.kamailioworld.com
--
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio World Conference, May 27-29, 2015
Berlin, Germany - http://www.kamailioworld.com