Hi,
As a starter for your exploration, a few key points:
(1) When we talk about stateful proxies, we (and the standards) mean that they are
transaction-stateful, not something like "dialog-stateful".
A transaction consists of a SIP request and 0 or more provisional replies (where
applicable) and a final dispositive reply (2xx - 6xx), although ACK is a little special.
This is what a stateful proxy has memory of, and, aside from conferring a slight
performance benefit[1], it is needed to implement things like failover timers. Can't
have a timeout if you aren't tracking something.
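To make the "can't have a timeout if you aren't tracking something" point concrete, here is a minimal Python sketch of what transaction memory amounts to. It is not how Kamailio's TM module is implemented; the class name, the (branch, method) key, and the injected clock are all assumptions made for illustration.

```python
import time


class TransactionTable:
    """Toy transaction store (hypothetical): remembers relayed requests."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.pending = {}  # (branch, method) -> time the request was relayed

    def request_relayed(self, branch, method):
        # Start tracking: this is the "memory" a stateful proxy keeps.
        self.pending[(branch, method)] = self.clock()

    def final_reply_seen(self, branch, method):
        # A final reply (2xx - 6xx) ends the transaction.
        # Returns False if the transaction is unknown to this table.
        return self.pending.pop((branch, method), None) is not None

    def expired(self):
        # Transactions still waiting after timeout_s are failover candidates.
        now = self.clock()
        return [key for key, started in self.pending.items()
                if now - started >= self.timeout_s]
```

A stateless relay has nothing equivalent to `pending`, so it has nothing to run `expired()` against.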
(2) Neither transaction state, nor dialog state, nor any other kind of state, is required
to route a SIP message, with the exception of a CANCEL (see below).
Thus, this formulation is actually quite incorrect: "then use stateful responses to
direct client back to same node for subsequent messages in a dialog."
You do not need state to route in-dialog requests to the correct place, as these are
routed via the Route/RR set in the headers of the SIP request itself. You do not need state to route
replies to the correct place, whether to in-dialog requests or any other kind of request,
because this is done through the 'Via' header stack.
So, everything that is needed to route SIP requests and replies within some sort of
context that persists for some amount of time (SIP calls this a dialog) can actually be
found in the content of SIP messages themselves.
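As a rough illustration, the routing decisions described above can be written as pure functions of the message, with no lookup table anywhere. This is a simplified Python sketch: messages are plain dicts, Via entries are reduced to "host:port" strings, loose routing is assumed, and the function names are made up for the example.

```python
def next_hop_for_reply(reply):
    """Drop the Via this proxy added; the next Via says where the reply goes."""
    vias = reply["Via"][1:]
    forwarded = dict(reply, Via=vias)
    return vias[0], forwarded


def next_hop_for_request(request):
    """In-dialog request: drop the Route entry naming this proxy, then
    follow the next Route entry, or the Request-URI if none is left."""
    routes = request.get("Route", [])[1:]
    forwarded = dict(request, Route=routes)
    target = routes[0] if routes else request["uri"]
    return target, forwarded
```

Neither function consults anything the proxy remembered from earlier messages, which is why a freshly restarted proxy can still deliver them.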
You can easily show this by using a stateful-only (i.e. TM) configuration and restarting
Kamailio in the middle of a pending or established call. Try it and watch the capture. You
will see that every message you expect to have delivered, whether request or reply, will
make it exactly where you think it should, even though Kamailio has lost all transaction
state[2].
A firm grasp of this is very important to any redundancy ruminations.
(3) The exception to this is a CANCEL, and that is because a CANCEL is a so-called
"hop-by-hop" request.
Whereas most requests and replies pass through the proxy, the proxy is actually an
independent party to CANCEL requests. That is to say, when a party CANCELs an INVITE, it
actually asks the proxy to CANCEL it, and the proxy asks any downstream branches to CANCEL
separately. This is to make the forking behaviour of proxies possible.
The consequence of this is that when a proxy receives a CANCEL request, it needs
transaction state in order to know which downstream branches to match it up to. If it is
lost, it won't know what to do with the CANCEL.
This is the primary obstacle to anycast setups, from my point of view. You can count on
any proxy to relay requests statelessly in a correct fashion, but only the proxy that
holds the INVITE transaction can process its CANCEL correctly. So, if a CANCEL goes to a
different place than the INVITE to which it corresponds, it'll be dropped on the floor.
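A toy Python model of two anycast nodes shows the problem. The class and method names are hypothetical, and the "drop" behaviour simply mirrors the description above rather than any particular Kamailio configuration.

```python
class ProxyNode:
    """Toy proxy (hypothetical) that remembers which branches an INVITE forked to."""

    def __init__(self, name):
        self.name = name
        self.branches = {}  # INVITE's Via branch -> targets this node forked to

    def on_invite(self, client_branch, targets):
        # Record the fork so a later CANCEL can be matched to it.
        self.branches[client_branch] = list(targets)
        return [("INVITE", t) for t in targets]

    def on_cancel(self, client_branch):
        # A CANCEL carries the same Via branch as the INVITE it cancels.
        targets = self.branches.get(client_branch)
        if targets is None:
            # No matching transaction on this node: nothing to cancel,
            # so in this sketch the CANCEL goes nowhere.
            return []
        # The proxy originates its own CANCEL towards each branch.
        return [("CANCEL", t) for t in targets]
```

If node "a" took the INVITE and anycast delivers the CANCEL to node "b", `b.on_cancel(...)` returns an empty list, while the same call on "a" yields one CANCEL per forked branch.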
...
Otherwise, and notwithstanding transparent approaches like anycast, the methods you're
contemplating are all variations of a common idea: the redundancy and failover is provided
on the client side, in principle. Actual choice of method here is usually dictated by what
the concrete clients in question support. For example, not all clients support DNS-based
failover, or may not implement it in the way you want. If you're offering a service to
many different kinds of clients or devices, you'll have to take that into account.
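For reference, client-side failover over SRV records boils down to something like the following Python sketch. It is deliberately simplified: records are ordered by priority only and the weight field is ignored, and `try_send` is a hypothetical callback standing in for "send the request and see whether anything answers". Real clients differ in exactly these details, which is the point.

```python
def send_with_failover(srv_records, try_send):
    """srv_records: list of (priority, weight, host, port) tuples.

    Tries targets in ascending priority order and returns the first
    (host, port) for which try_send reports success, or None.
    """
    for _priority, _weight, host, port in sorted(srv_records, key=lambda r: r[0]):
        if try_send(host, port):
            return host, port
    return None
```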
-- Alex
[1] At a memory cost, but this isn't really a factor in modern computing.
[2] Where it exists. An established call (200 OK + e2e ACK, no BYE yet) will actually not
have any transaction state, since all transactions involved in establishing it have been
terminated and no further transactions, e.g. to hang it up, have been created.
On Dec 14, 2022, at 6:56 PM, Jawaid Bazyar <bazyar(a)gmail.com> wrote:
Hi,
I am exploring different redundancy / load-balancing models for a Kamailio cluster.
When I say cluster, I mean, a number (N) of Kamailio nodes acting as stateful proxies.
Each node is configured the same as the others, and all have access to the same lookup
data to make routing decisions.
I would appreciate any advice or experience any of you can share on these different
models.
Overall model:
• Direct to proxies
• Redirect servers first, which redirect to proxies
Selecting the first node to talk to. Each model could use either type of selection.
• DNS-based (SRV or NAPTR, client makes call to dns name)
• Anycast with ECMP (equal-cost multi-path routing)
• Cluster with a mobile IP and service-down detection (this would just provide 1:1
protection)
Have clients make calls through the proxy using a DNS record containing an SRV record
for each node (or, alternatively, done with NAPTR). Would rely on the client to switch
nodes in the event of a node failure mid-call. (Is that even possible?)
Anycast would only work with UDP signaling. Use Anycast to find the first proxy, then
use stateful responses to direct client back to same node for subsequent messages in a
dialog.
So for anyone who has tried any of these methods, I would love to hear the pros and
cons..
Thanks in advance!
Jawaid
--
Alex Balashov | Principal | Evariste Systems LLC
Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/