There is a commercial switch vendor who has recently
decided to make two changes that, combined, are quite evil:
1. They decided to deviate from RFC 3398: when a
given call's INVITE timer expires (any of the SS7 ones,
plus the SIP ones, including any limit specified
in the INVITE itself), instead of translating the
ISUP Release Cause 102 into a SIP Response Code of 504
(Gateway Timeout), they send out a 503 (Service
Unavailable).
(503 is commonly used by low-cost carriers, and the
commercial equipment they use, to trigger a
route-advance, where the call doesn't end but is
relaunched via another, slightly more expensive
route/carrier who perhaps can get the call to the
destination. Some of these low-cost outfits may
retry the same call with five or six carriers
before reluctantly sending it to the local access tandem,
which is usually the most expensive choice but
almost certain to work. This probably isn't what
the inventors of the 503 code had in mind, but that
is how it is used. A minority use 404 the same way.)
2. The same vendor implemented an automatic "Address Reach"
testing mechanism for outbound calls. When sending calls
to a distant switch, if a single call or a small number
of calls come back from that distant switch with a
SIP Response Code of 504, the sending switch will take
that destination switch completely out of route
("blacklist") and not send any calls at all to that
destination until the blacklist is manually reset. So a
result that applies to one call is fatal to all
subsequent calls.
This also means that if you set the INVITE Expires
timer to a value less than the number of seconds the
calling party is willing to let the called number
ring (called party doesn't have voice mail or it
doesn't pick up for several rings), you trigger a
504 and bingo, you have killed the route between those
two switches. So your own choice of how long to
wait on ring-back for a single call dictates how quickly
that call can kill the entire route.
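
The failure mode described in item 2 can be sketched in a few
lines of Python. This is only a model, not anything taken from
the vendor's switch or from SER: the names final_status,
SendingSwitch and BLACKLIST_THRESHOLD are made up for
illustration, and the threshold of one 504 is an assumption
(the observed behavior is "a single or small number").

```python
from dataclasses import dataclass, field

BLACKLIST_THRESHOLD = 1  # assumption: one 504 is enough to trip it


def final_status(invite_expires_s: int, answered_after_s: int) -> int:
    """Status an RFC 3398-style gateway reports for one call attempt."""
    if answered_after_s <= invite_expires_s:
        return 200  # called party picked up before the timer ran out
    return 504      # timer expired first: ISUP cause 102 -> 504


@dataclass
class SendingSwitch:
    """Hypothetical model of the sender's per-destination blacklist."""
    count_504: dict = field(default_factory=dict)
    blacklisted: set = field(default_factory=set)

    def record(self, destination: str, status: int) -> None:
        if status != 504:
            return  # other responses do not count against the destination
        self.count_504[destination] = self.count_504.get(destination, 0) + 1
        if self.count_504[destination] >= BLACKLIST_THRESHOLD:
            self.blacklisted.add(destination)  # stays until manually reset

    def can_route_to(self, destination: str) -> bool:
        return destination not in self.blacklisted


switch = SendingSwitch()
# Expires of 20 s, but the called party needs 30 s of ringing to answer.
switch.record("gw-b", final_status(invite_expires_s=20, answered_after_s=30))
print(switch.can_route_to("gw-b"))  # False
```

One unanswered call with a short Expires is enough to take
"gw-b" out of route for every later call in this model.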
Now, the two changes taken together are also somewhat
convenient, because this equipment maker just happened
to make it impossible for their own equipment to ever
return a 504, so it can never trip its own booby-trap.
So in a way these two changes effectively create a
not-our-hardware detector, triggering route failures
only when in contact with RFC-compliant equipment made
by somebody else. I think that behavior is anti-competitive,
and probably illegal in quite a few jurisdictions (notably
the EU), but that is what exists at the moment. This
creation started getting deployed in the past few months.
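
To make the "detector" effect concrete, here is a small Python
sketch comparing the two mappings. The RFC3398 table is only a
subset of the cause-code mapping in that RFC, VENDOR encodes the
deviation as described in this message, and trips_blacklist is
an assumed, simplified version of the Address Reach rule. All
of the names are hypothetical.

```python
# Subset of the RFC 3398 ISUP-cause -> SIP-status mapping.
RFC3398 = {1: 404, 17: 486, 34: 503, 102: 504}

# The vendor's deviation as described here: cause 102 becomes 503.
VENDOR = {**RFC3398, 102: 503}


def trips_blacklist(status: int) -> bool:
    """Assumed Address Reach rule: only a 504 takes the peer out of route."""
    return status == 504


def peer_gets_blacklisted(cause_to_status: dict, cause: int = 102) -> bool:
    """Would a peer using this mapping be blacklisted for the given cause?"""
    return trips_blacklist(cause_to_status[cause])


for name, table in (("rfc-compliant peer", RFC3398),
                    ("same-vendor peer", VENDOR)):
    print(name, "blacklisted on timer expiry:", peer_gets_blacklisted(table))
```

Under these assumptions only the RFC-compliant peer can ever be
taken out of route, since no entry in the vendor table yields
a 504.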
So, to get around this nonsense, I need a way in SER
to detect that a 504 has come in and change it to
something else before sending it on. I won't quibble
with the "proxies can't forward a 503" rule, so I won't
try that. I would like to turn the 504 into a 404,
which is better than having routes get pulled all over
the place.
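
For clarity, the translation wanted is just a rewrite of the
status line of the reply. The Python below is a sketch of that
transformation on a raw message, not SER configuration and not
the hard-coded patch mentioned in this message; rewrite_status
and the example headers are made up for illustration.

```python
def rewrite_status(raw: str, old: int = 504, new: int = 404,
                   reason: str = "Not Found") -> str:
    """Return raw with its status line rewritten if the code equals old."""
    status_line, sep, rest = raw.partition("\r\n")
    parts = status_line.split(" ", 2)  # ["SIP/2.0", "504", "Server Time-out"]
    if len(parts) < 2 or parts[0] != "SIP/2.0" or parts[1] != str(old):
        return raw  # a request, or a reply with a different code
    return f"SIP/2.0 {new} {reason}{sep}{rest}"


reply = (
    "SIP/2.0 504 Server Time-out\r\n"
    "Via: SIP/2.0/UDP proxy.example.com;branch=z9hG4bK776asdhds\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)
print(rewrite_status(reply).split("\r\n")[0])  # SIP/2.0 404 Not Found
```

Headers and body are passed through untouched; only a reply
whose code matches is changed, and requests are left alone.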
Now, the last time (a year or so ago) I asked about
trying to change a SIP Response Code in SER, the
suggested ser.cfg "fix" created all sorts of strange
warnings in the logs about timers expiring, and I was
told not to worry about that. Well, it seemed to cause
a lot more problems than just warnings (memory leaks,
or core dumps caused by something), so I ended up having
to hard-code the translation I wanted into the SER
source code and not use rules in ser.cfg to do this.
I'm writing now to (A) alert people about what this
major-brand switch maker is doing with regard to 504s
so you will recognize it if you encounter it, but
also (B) to see whether a more elegant way to
change SIP Responses via the ser.cfg file has emerged.
The obvious things like subst and subst_uri don't appear
to work; they seem to be busily editing the non-existent
INVITE message and ignoring the reply message.
Using a reply/drop combination to generate a new reply
and kill the real one doesn't seem to work either.
As I recall, it ends up sending the response specified
in the reply, and then passing the original response on
as well. "drop" apparently doesn't kill it completely,
and I suspect reply is really meant to be used only during
handling of the method messages, and not the response
messages.
Suggestions appreciated.