Hello,
We are experiencing an issue when having some blocking code in the dialog:end event route. If the code takes too long to execute then CDR information is not logged.
In our case we have a blocking socket trying to establish a connection and waiting 500ms before failing. When it timeouts, the accounting module will never log any CDR logs for that call. It appears that the callbacks for DLGCB_TERMINATED_CONFIRMED event are never called. That is because the function dlg_terminated_confirmed is never called (this function is registered as a callback to the TMCB_RESPONSE_OUT event from tm module).
After further investigation, the problem seems to be related to a race condition between two processes when dealing with ending a dialog:
From what we saw the dialog:end event route is treated when a BYE request is received. But if that takes too long, and that BYE request will be retransmitted (we are using SIPp to test this, and SIPp will automatically retransmit the message), it will then be treated by a different process.
When this happens, the first process has already modified the dialog state, so the second process will not execute code specific to dialog ending from the dialog module dlg_onroute function. It will not execute the event route, nor the DLGCB_TERMINATED event callbacks, and it will neither register the dlg_terminated_confirmed function to the TMCB_RESPONSE_OUT event. This means that the second process will actually finish processing the BYE faster then the first process. By the time the first process finishes with processing the dialog:end event route, the transaction corresponding to the BYE request is already created, and even though dlg_terminated_confirmed function is eventually registered for the TMCB_RESPONSE_OUT event, it is too late because this event was already treated, and all callbacks registered for it were called. This means that DLGCB_TERMINATED_CONFIRMED event will never be treated, and the registered callbacks for this event (including the accounting module functions) will not get called.
We managed to work around this behavior by decreasing the timeout of the socket so that the BYE can be processed before a retransmit occurs, but maybe there is a different solution to fix this issue. Do you have any suggestions on how to solve this problem?
Thank you, Claudiu Boriga.
Hello,
do you have proper retransmission detection in the kamailio.cfg? If you look at etc/kamailio.cfg with the latest master, it uses t_precheck_trans() to detect the retransmission in early phase and not allow parallel processing of the same request.
If routing of BYE doesn't depend on any response from your socket communication in event route dialog:end, maybe you can redesign a bit and do that in a dedicated process using mqueue+rtimer, so you avoid blocking from sip routing workers.
Cheers, Daniel
On 03.07.17 13:31, Paul Claudiu Boriga wrote:
Hello,
We are experiencing an issue when having some blocking code in the dialog:end event route. If the code takes too long to execute then CDR information is not logged.
In our case we have a blocking socket trying to establish a connection and waiting 500ms before failing. When it timeouts, the accounting module will never log any CDR logs for that call. It appears that the callbacks for DLGCB_TERMINATED_CONFIRMED event are never called. That is because the function dlg_terminated_confirmed is never called (this function is registered as a callback to the TMCB_RESPONSE_OUT event from tm module).
After further investigation, the problem seems to be related to a race condition between two processes when dealing with ending a dialog:
From what we saw the dialog:end event route is treated when a BYE request is received. But if that takes too long, and that BYE request will be retransmitted (we are using SIPp to test this, and SIPp will automatically retransmit the message), it will then be treated by a different process.
When this happens, the first process has already modified the dialog state, so the second process will not execute code specific to dialog ending from the dialog module dlg_onroute function. It will not execute the event route, nor the DLGCB_TERMINATED event callbacks, and it will neither register the dlg_terminated_confirmed function to the TMCB_RESPONSE_OUT event. This means that the second process will actually finish processing the BYE faster then the first process. By the time the first process finishes with processing the dialog:end event route, the transaction corresponding to the BYE request is already created, and even though dlg_terminated_confirmed function is eventually registered for the TMCB_RESPONSE_OUT event, it is too late because this event was already treated, and all callbacks registered for it were called. This means that DLGCB_TERMINATED_CONFIRMED event will never be treated, and the registered callbacks for this event (including the accounting module functions) will not get called.
We managed to work around this behavior by decreasing the timeout of the socket so that the BYE can be processed before a retransmit occurs, but maybe there is a different solution to fix this issue. Do you have any suggestions on how to solve this problem?
Thank you, Claudiu Boriga.
Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Hello,
Thank you very much for your help. I have tried using t_precheck_trans() to detect retransmission, and not allow parallel processing for BYE requests, and the issue is solved by this modification. Also using an mqueue+rtimer solution works, and no retransmits of the BYE request are happening because of it.
Regards,
Claudiu.
On 03.07.2017 14:58, Daniel-Constantin Mierla wrote:
Hello,
do you have proper retransmission detection in the kamailio.cfg? If you look at etc/kamailio.cfg with the latest master, it uses t_precheck_trans() to detect the retransmission in early phase and not allow parallel processing of the same request.
If routing of BYE doesn't depend on any response from your socket communication in event route dialog:end, maybe you can redesign a bit and do that in a dedicated process using mqueue+rtimer, so you avoid blocking from sip routing workers.
Cheers, Daniel
On 03.07.17 13:31, Paul Claudiu Boriga wrote:
Hello,
We are experiencing an issue when having some blocking code in the dialog:end event route. If the code takes too long to execute then CDR information is not logged.
In our case we have a blocking socket trying to establish a connection and waiting 500ms before failing. When it timeouts, the accounting module will never log any CDR logs for that call. It appears that the callbacks for DLGCB_TERMINATED_CONFIRMED event are never called. That is because the function dlg_terminated_confirmed is never called (this function is registered as a callback to the TMCB_RESPONSE_OUT event from tm module).
After further investigation, the problem seems to be related to a race condition between two processes when dealing with ending a dialog:
From what we saw the dialog:end event route is treated when a BYE request is received. But if that takes too long, and that BYE request will be retransmitted (we are using SIPp to test this, and SIPp will automatically retransmit the message), it will then be treated by a different process.
When this happens, the first process has already modified the dialog state, so the second process will not execute code specific to dialog ending from the dialog module dlg_onroute function. It will not execute the event route, nor the DLGCB_TERMINATED event callbacks, and it will neither register the dlg_terminated_confirmed function to the TMCB_RESPONSE_OUT event. This means that the second process will actually finish processing the BYE faster then the first process. By the time the first process finishes with processing the dialog:end event route, the transaction corresponding to the BYE request is already created, and even though dlg_terminated_confirmed function is eventually registered for the TMCB_RESPONSE_OUT event, it is too late because this event was already treated, and all callbacks registered for it were called. This means that DLGCB_TERMINATED_CONFIRMED event will never be treated, and the registered callbacks for this event (including the accounting module functions) will not get called.
We managed to work around this behavior by decreasing the timeout of the socket so that the BYE can be processed before a retransmit occurs, but maybe there is a different solution to fix this issue. Do you have any suggestions on how to solve this problem?
Thank you, Claudiu Boriga.
Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev