Hi all,
Im using openser as a call distributor/proxy between a soft-switch/SBC and
voicemail platform. Im seeing a problem with openser in that it is sometimes
cancels an in-progress call (fr_inv_timer firing) because it didnt match the
200/OK with the call.
After some investigation, I noticed that this was happening after a missing ACK
on a previous call caused the voicemail platform to retransmit 200/OK responses
beyond the TM wt_timer expiration, which in turn left several openser child
processes (those that received a 200 after wt_timer expiration) in a state such
that they might not properly match transactions on subsequent calls.
My setup:
I have openser 1.2.0 operating on a linux box with two network interfaces, with
one interface (call it the outside interface) taking incoming calls from the
soft-switch, and the other (inside) connected to the VM platform. I have
openser configured to use both interfaces (see config below) and the TM wt_timer
set to 5 seconds (default). As this is a voicemail system, all of the call
traffic is inbound from the soft-switch. Given the traffic flow, for the most
part the openser child processes servicing the inside interface are handling
responses (180,183,200) from the VM platform.
Call scenario:
When an INVITE arrives from the soft-switch, openser forwards it to the VM
platform. The VM platform responds with a 180 and then a 200. I've noticed
several instances where the soft-switch did not respond with an ACK. This
caused the VM platform to retransmit the 200 several times over a 10 second
period. These were absorbed correctly by openser for the duration of wt_timer.
After the timer expired, however, each openser child process that received a
retransmitted 200 logged something like this:
4(2715) DEBUG: t_reply_matching: hash 45870 label 727647196 branch 0
4(2715) DEBUG: t_reply_matching: no matching transaction exists
4(2715) DEBUG: t_reply_matching: failure to match a transaction
4(2715) DEBUG: t_check: end=(nil)
When I look at the TM code, the static variable T in t_lookup.c is now NULL for
this child process.
On a subsequent inbound call, the INVITE is passed to the VM correctly, and the
180 transaction matches (causing the fr_inv_timer to be armed). If the 200 is
read by child proc 2715, I see:
4(2715) DEBUG: t_check: start=(nil)
4(2715) DEBUG: t_check: T previously sought and not found
The 200 is forwarded back to the soft-switch, which responds with an ACK. Both
end-points think the call is up, but since openser never matched the 200 with
the call, the fr_inv_timer fires and cancels the call. Basically, child proc
2715 wont match any transaction after this unless it happens to process a
request.
I think this problem is made worse by the fact that Im using two network
interfaces, and that the openser children on the inside interface handle (for
the most part) only responses. This problem was touched on here:
http://lists.openser.org/pipermail/users/2007-November/014188.html but I
didnt see any follow up. Also, Ive checked openser 1.2.3 and 1.3.1 for
fixes, but I dont think this has been addressed.
I have a work around, I think, by upping the wt_timer to something like 15
seconds, but I was wondering if there is any scenario in which leaving T=NULL is
desirable.
Thanks in advance
Sean
==== CONFIG =====
#
# VMPROXY-PROXY - route between softswitch and media servers
#
# ----------- global configuration parameters ------------------------
check_via=yes # (cmd. line: -v)
dns=no # (cmd. line: -r)
rev_dns=no # (cmd. line: -R)
children=8
# multi-homed (i.e. listen on two or more interfaces)
mhomed=1
alias=sipproxy1
alias=sipproxy1.vmproxy.com
port=5060
#inside address
listen=192.168.1.100
#outside address
listen=10.55.2.100
disable_dns_blacklist=yes
# ------------------ module loading ----------------------------------
# Uncomment this if you want to use SQL database
#loadmodule "/usr/local/openser/lib/openser/modules/mysql.so"
mpath="/usr/local/openser/lib/openser/modules"
loadmodule "sl.so"
loadmodule "tm.so"
loadmodule "rr.so"
loadmodule "maxfwd.so"
loadmodule "usrloc.so"
loadmodule "registrar.so"
loadmodule "exec.so"
loadmodule "xlog.so"
loadmodule "textops.so"
loadmodule "options.so"
loadmodule "mi_fifo.so"
# Uncomment this if you want digest authentication
# mysql.so must be loaded !
#loadmodule "/usr/local/openser/lib/openser/modules/auth.so"
#loadmodule "/usr/local/openser/lib/openser/modules/auth_db.so"
# ----------------- setting module-specific parameters ---------------
# -- usrloc params --
# local memory
#modparam("usrloc", "db_mode", 0)
# -- rr params --
# add value to ;lr param to make some broken UAs happy
modparam("rr", "enable_full_lr", 1)
# uncomment this for multiple record routes
#modparam("rr", "enable_double_rr", 1)
# -- tm params --
modparam("tm", "fr_timer", 5)
modparam("tm", "fr_inv_timer", 60)
# -- mi_fifo parms
modparam("mi_fifo", "fifo_name", "/tmp/openser_fifo_proxy")
# ------------------------- request routing logic -------------------
# main routing logic
route{
# initial sanity checks -- messages with
# max_forwards==0, or excessively long requests
if (!mf_process_maxfwd_header("10")) {
sl_send_reply("483","Too Many Hops");
return;
};
if ( msg:len > max_len ) {
sl_send_reply("513", "Message too big");
return;
};
xlog("L_INFO","$Tf VMPROXY: <$rm> message From:<$fu>
To:<$tu>
srcip:<$si>\n");
#
# we record-route all messages -- to make sure that
# subsequent messages will go through our proxy; that's
# particularly good if upstream and downstream entities
# use different transport protocol
#
record_route();
#
# loose-route processing
# messages that already have routes on them are proxied here
#
if (loose_route()) {
t_relay();
return;
};
#
# message not for us, route it through
#
if ( !(uri==myself) ) {
if (!t_relay()) {
sl_reply_error();
};
return;
};
xlog("L_DBG","$Tf VMPROXY: <$rm> for me\n");
# handle options meant for me
if( method == "OPTIONS" ) {
options_reply();
return;
};
if( method == "REGISTER" || method == "PUBLISH" || method ==
"SUBSCRIBE"
|| method == "MESSAGE" ) {
sl_send_reply("405","Method Not Allowed");
return;
};
# route INVITE based on source
if( method == "INVITE" ) {
exec_dset("/usr/local/vmproxy/bin/routeINVITE.sh");
xlog("L_DBG","$Tf VMPROXY: <$rm> after routeINVITE:
new
destination <$ru>\n");
t_on_failure("1");
};
# route NOTIFY to softswitch
if( method == "NOTIFY" ) {
exec_dset("/usr/local/vmproxy/bin/routeNOTIFY.sh");
xlog("L_DBG","$Tf VMPROXY: <$rm> after routeNOTIFY:
new
destination <$ru>\n");
t_on_failure("2");
};
#
# NOTE: this logic is only executed if the request didn't have
# a route on it (tsk,tsk). If it did, it would have been handed via
# loose route function above.
if( method == "ACK" || method == "BYE" || method ==
"CANCEL" || method
== "PRACK" ) {
exec_dset("/usr/local/vmproxy/bin/routeACK.sh");
xlog("L_DBG","$Tf VMPROXY: <$rm> after routeACK: new
destination
<$ru>\n");
};
# routing script error?
if( uri=="sip:err@127.0.0.1" ) {
sl_reply_error();
return;
};
# forward to current uri
if (!t_relay()) {
sl_reply_error();
};
}
#
# Initial INVITE route failed, retry
#
failure_route[1] {
# endpoint canceled call
if( t_was_cancelled()) {
xlog("L_DBG", "$Tf VMPROXY: call <id:$ci>
cancelled\n");
return;
}
# don't retry on 40[0-7|5xx failures
if( !t_check_status("40[0-7]|420|487|488|5[0-9][0-9]")) {
xlog("L_ERR","$Tf VMPROXY: ERROR routing call <$rm>
from <$fu>
to <$ru>\n");
exec_dset("/usr/local/vmproxy/bin/routeINVITE2.sh");
xlog("L_ERR","$Tf VMPROXY: Rerouting call <$rm> to
<$ru>\n");
append_branch();
t_relay();
}
}
#
# Initial NOTIFY route failed, retry
#
failure_route[2] {
# endpoint canceled call
if( t_was_cancelled()) {
xlog("L_DBG", "$Tf VMPROXY: call <id:$ci>
cancelled\n");
return;
}
# don't retry on 40[0-7|5xx failures
if( !t_check_status("40[0-7]|5[0-9][0-9]")) {
xlog("L_ERR","$Tf VMPROXY: ERROR routing <$rm> from
<$fu> to
<$ru>\n");
exec_dset("/usr/local/vmproxy/bin/routeNOTIFY2.sh");
xlog("L_ERR","$Tf VMPROXY: Rerouting <$rm> to
<$ru>\n");
append_branch();
t_relay();
}
}