On May 25, 2010 at 17:10, marius zbihlei marius.zbihlei@1and1.ro wrote:
Andrei Pelinescu-Onciul wrote:
On May 20, 2010 at 13:53, marius zbihlei marius.zbihlei@1and1.ro wrote:
Forwarded the message from sr-users to sr-dev list
Cheers Marius
[...]
Hello
I am a little busy atm, so before I dig into the code, I have a question for core devs. Is the LOCK_HASH() call recursive (being called again from the same process will not block) ? I ask this because in the 4th blocked INVITE the hash _might_ be blocked by both t_newtran(#16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064)
No it's not recursive (it will deadlock if called twice for the same entry in the same process). This is true for all *ser versions (sip-router, kamailio < 3.0, ser *).
Hello Andrei
I have looked thru the code and it seems this is the case. While the transaction is locked in t_newtrans(), the tm callbacks are called, which call the pua_dialog callback, which in turn call a transaction function that requires the same lock; so the mutex is locked twice from the same process. Is there anyway we can prevent this from the future? One way is to patch pua_dialog so it doesn't call t_uac()(?) but this still leaves the possibility of another lock happening somewhere else. I was thinking that a better approach was implementing a recursive mutex.
I don't like it. It would induce a performance hit (depending on the arch. and the locking method used). While the performance hit of a single locking operation will be small, given the number of locks/unlocks we do under heavy traffic it will have a measurable impact on the overall performance. Moreover it's only use would be to "hide" bugs in the code (IMHO locking twice it's a bug).
If the recursive mutex solution is not desired, what can we do to prevent these types of deadlocks ?
Well, there is no general solution and I'm not familiar with the dialog code in question. Somehow locking twice should be avoided (unsafe callback called from failure route?).
Does the same deadlock appears with 3.0/sip-router?
Marius
and 6 t_uac (#6 0xb7b6ce01 in t_uac (method=0xbff60558, headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158)), thus causing a deadlock.
Andrei