Hi Sean,
Yes, t_check() sets T to NULL if no transaction is matched, but reply_received() (which calls t_check()) will, if T was set to NULL, jump to the "not_found" label and reset T to T_UNDEFINED.
Do you agree with this? If so, we can start adding some more debug logs to see where the problem is.
Regards, Bogdan
Sean O'Donnell wrote:
Hi all,
I’m using openser as a call distributor/proxy between a soft-switch/SBC and a voicemail platform. I’m seeing a problem in that openser sometimes cancels an in-progress call (fr_inv_timer firing) because it didn’t match the 200/OK with the call.
After some investigation, I noticed that this was happening after a missing ACK on a previous call caused the voicemail platform to retransmit 200/OK responses beyond the TM wt_timer expiration, which in turn left several openser child processes (those that received a 200 after wt_timer expiration) in a state such that they might not properly match transactions on subsequent calls.
My setup: I have openser 1.2.0 operating on a linux box with two network interfaces, with one interface (call it the outside interface) taking incoming calls from the soft-switch, and the other (inside) connected to the VM platform. I have openser configured to use both interfaces (see config below) and the TM wt_timer set to 5 seconds (default). As this is a voicemail system, all of the call traffic is inbound from the soft-switch. Given the traffic flow, for the most part the openser child processes servicing the inside interface are handling responses (180,183,200) from the VM platform.
Call scenario: When an INVITE arrives from the soft-switch, openser forwards it to the VM platform. The VM platform responds with a 180 and then a 200. I've noticed several instances where the soft-switch did not respond with an ACK. This caused the VM platform to retransmit the 200 several times over a 10-second period. These were absorbed correctly by openser for the duration of wt_timer. After the timer expired, however, each openser child process that received a retransmitted 200 logged something like this:
  4(2715) DEBUG: t_reply_matching: hash 45870 label 727647196 branch 0
  4(2715) DEBUG: t_reply_matching: no matching transaction exists
  4(2715) DEBUG: t_reply_matching: failure to match a transaction
  4(2715) DEBUG: t_check: end=(nil)
When I look at the TM code, the static variable T in t_lookup.c is now NULL for this child process.
On a subsequent inbound call, the INVITE is passed to the VM correctly, and the 180 transaction matches (causing the fr_inv_timer to be armed). If the 200 is read by child proc 2715, I see:
  4(2715) DEBUG: t_check: start=(nil)
  4(2715) DEBUG: t_check: T previously sought and not found
The 200 is forwarded back to the soft-switch, which responds with an ACK. Both end-points think the call is up, but since openser never matched the 200 with the call, the fr_inv_timer fires and cancels the call. Basically, child proc 2715 won’t match any transaction after this unless it happens to process a request.
I think this problem is made worse by the fact that I’m using two network interfaces, and that the openser children on the inside interface handle (for the most part) only responses. This problem was touched on here: http://lists.openser.org/pipermail/users/2007-November/014188.html but I didn’t see any follow up. Also, I’ve checked openser 1.2.3 and 1.3.1 for fixes, but I don’t think this has been addressed.
I have a workaround, I think, by upping the wt_timer to something like 15 seconds, but I was wondering if there is any scenario in which leaving T=NULL is desirable.
Thanks in advance,
Sean