On May 25, 2010 at 17:10, marius zbihlei <marius.zbihlei(a)1and1.ro> wrote:
Andrei Pelinescu-Onciul wrote:
On May 20, 2010 at 13:53, marius zbihlei
<marius.zbihlei(a)1and1.ro> wrote:
Forwarded the message from sr-users to sr-dev
list
Cheers
Marius
[...]
Hello
I am a little busy atm, so before I dig into the code, I have a
question for core devs. Is the LOCK_HASH() call recursive (being
called again from the same process will not block) ? I ask this
because in the 4th blocked INVITE the hash _might_ be blocked by
both t_newtran(#16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at
t_lookup.c:1064)
No it's not recursive (it will deadlock if called twice for the same
entry in the same process). This is true for all *ser versions
(sip-router, kamailio < 3.0, ser *).
Hello Andrei
I have looked thru the code and it seems this is the case. While the
transaction is locked in t_newtrans(), the tm callbacks are called,
which call the pua_dialog callback, which in turn call a transaction
function that requires the same lock; so the mutex is locked twice
from the same process. Is there anyway we can prevent this from the
future? One way is to patch pua_dialog so it doesn't call t_uac()(?)
but this still leaves the possibility of another lock happening
somewhere else. I was thinking that a better approach was
implementing a recursive mutex.
I don't like it. It would induce a performance hit (depending on the
arch. and the locking method used). While the performance hit of a
single locking operation will be small, given the number of
locks/unlocks we do under heavy traffic it will have a measurable impact
on the overall performance.
Moreover it's only use would be to "hide" bugs in the code (IMHO locking
twice it's a bug).
If the recursive mutex solution is not desired, what can we do to
prevent these types of deadlocks ?
Well, there is no general solution and I'm not familiar with the dialog
code in question. Somehow locking twice should be avoided (unsafe
callback called from failure route?).
Does the same deadlock appears with 3.0/sip-router?
Marius
>>and 6 t_uac (#6 0xb7b6ce01 in t_uac (method=0xbff60558,
>>headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622
>><publ_cback_func>, cbp=0xa7715158)), thus causing a deadlock.
>>
>
Andrei