The instance is official 4.3.1 with no custom modules; the only patch is to jsonrpc_io.c from #268, to avoid a crash when connecting to a jsonrpc server that is not up.
However, after some experiments, it looks like the crash is caused by executing t_continue() on a transaction that is not suspended.
I can avoid it by adjusting the script, but I do not know which component actually killed kamailio. I think it can be reproduced, though.
We have implemented the push join algorithm introduced by Danail at Kamailio World 2014 ("Asynchronous Processing in Kamailio Configuration File").
Let's say caller A wants to call callee B, who is offline:
1. A sends an INVITE to KAMAILIO.
2. KAMAILIO calls t_set_fr(60000, 4000) to simulate a max-wait-timeout for the suspended transaction (which is 4 seconds).
3. KAMAILIO suspends this transaction via t_suspend().
4. KAMAILIO stores the transaction id in an htable with a timeout of 10 seconds (the htable timeout is set via modparam).
5. B does NOT register within 4 seconds, so A times out.
6. The transaction goes to a failure_route that we armed before suspending.
7. In the failure route, A continues by dialing another phone number (say, a PSTN number): KAMAILIO sends an INVITE to another server and waits for a 180.
8. B registers after 4 seconds but before 10 seconds (the htable has not deleted the record yet).
9. So the script issues t_continue() on A's transaction, but nothing is actually sent to B. You will see kamailio complaining "script writer didn't release transaction" in syslog, and it looks like KAMAILIO runs some cleanup procedure.
10. When KAMAILIO receives a 180 or 183 response from the remote server, sooner or later kamailio crashes (in onsend_route or onreply_route, often while parsing the CSeq).
11. If A sends a CANCEL at this moment, kamailio generates a fake 487 response and swallows it; nothing is forwarded to the remote server, which keeps waiting for a 200 OK and is left with a blocked, dead channel.
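For reference, the steps above can be sketched in a config file roughly like this. It is a minimal sketch and not our actual script: the htable name `vtp`, the `join::` key prefix and the route names SUSPEND, JOIN, INVITE_RESUME and PUSH_TIMEOUT are made up for illustration, and the PSTN fallback is only hinted at in a comment.

```
# step 4: records expire 10 seconds after they are stored
modparam("htable", "htable", "vtp=>size=8;autoexpire=10;")

# steps 2-4: called for an INVITE whose callee B is offline
route[SUSPEND] {
    t_on_failure("PUSH_TIMEOUT");   # step 6: armed before suspending
    t_set_fr(60000, 4000);          # step 2: timer values from the report
    if (!t_suspend()) {
        xlog("L_ERR", "failed to suspend transaction for $rU\n");
        exit;
    }
    # step 4: remember index:label so a later REGISTER can resume it
    $sht(vtp=>join::$rU) = "" + $T(id_index) + ":" + $T(id_label);
}

# steps 8-9: called when B's REGISTER arrives
route[JOIN] {
    if ($sht(vtp=>join::$tU) == $null) {
        return;
    }
    $var(id_index) = $(sht(vtp=>join::$tU){s.select,0,:}{s.int});
    $var(id_label) = $(sht(vtp=>join::$tU){s.select,1,:}{s.int});
    # between 4 and 10 seconds the record still exists, although the
    # transaction has already gone to failure_route[PUSH_TIMEOUT]
    t_continue("$var(id_index)", "$var(id_label)", "INVITE_RESUME");
}

# resumed INVITE processing once B is registered
route[INVITE_RESUME] {
    lookup("location");
    t_relay();
}

failure_route[PUSH_TIMEOUT] {
    # step 7: fall back to the PSTN number here, then t_relay()
}
```

With this layout, a REGISTER that arrives in the window of step 8 still finds the record and ends up calling t_continue() on a transaction that is no longer suspended.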
Currently I delete the htable record after max-wait-timeout, and everything works fine.
Is it still possible for B to register and look up the htable after max-wait-timeout but before we delete the htable record in the timeout failure_route? (an edge-case race condition)
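The workaround looks roughly like this. Again, it is only a sketch under assumptions: the htable name `vtp`, the `join::` key prefix and the route names are hypothetical, and it assumes `$rU` in the failure route still holds B's user part.

```
failure_route[PUSH_TIMEOUT] {
    # drop the join record as soon as the max-wait timer fires, so a late
    # REGISTER no longer finds a transaction id to pass to t_continue()
    $sht(vtp=>join::$rU) = $null;   # assumes $rU still names B here
    # ... continue with the PSTN fallback ...
}

route[JOIN] {
    if ($sht(vtp=>join::$tU) == $null) {
        return;                     # nothing suspended for this user
    }
    $var(id_index) = $(sht(vtp=>join::$tU){s.select,0,:}{s.int});
    $var(id_label) = $(sht(vtp=>join::$tU){s.select,1,:}{s.int});
    # remove the record before resuming, so it cannot be used twice
    $sht(vtp=>join::$tU) = $null;
    t_continue("$var(id_index)", "$var(id_label)", "INVITE_RESUME");
}
```

Deleting the record in both places keeps the time in which a stale id can be looked up short, but it may not by itself rule out a REGISTER that arrives while the failure route is still running.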
Ha, I love the sentence "It should not crash in any case." It really should become a slogan of kamailio:
"kamailio - it never crashes"
Anyway, whether or not I can avoid the crash by adjusting the config file, I will grant you access if you need it. It looks like a hell of corefiles, though.
Please tell me which mail address I should send it to, and I'll prepare it.
For a better kamailio, cheers.