[SR-Users] Re: Redundancy

15 Dec 2022

Hi,
As a starter for your exploration, a few key points:
(1) When we talk about stateful proxies, we (and the standards) mean that they are transaction-stateful, not something like "dialog-stateful".
A transaction consists of a SIP request and 0 or more provisional replies (where applicable) and a final dispositive reply (2xx - 6xx), although ACK is a little special.
This is what a stateful proxy has memory of, and, aside from conferring a slight performance benefit[1], it is needed to implement things like failover timers. Can't have a timeout if you aren't tracking something.
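To make the timer point concrete, here is a minimal Python sketch of what "tracking something" amounts to. It is not how Kamailio's TM module is implemented. The class name, method names and the injected clock are all assumptions made for illustration, and keying a transaction by Via branch plus method is a simplification.

```python
import time


class TransactionTable:
    """Toy transaction store (hypothetical; not Kamailio's TM internals)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the sketch can be tested
        self.pending = {}       # (branch, method) -> deadline

    def request_sent(self, branch, method):
        # Remember the request and when we give up waiting for a final reply.
        self.pending[(branch, method)] = self.clock() + self.timeout_s

    def final_reply(self, branch, method):
        # A final reply (2xx - 6xx) ends the transaction.
        # Returns False if we have no memory of the request.
        return self.pending.pop((branch, method), None) is not None

    def expired(self):
        # Transactions whose deadline has passed: candidates for failover.
        now = self.clock()
        timed_out = [k for k, deadline in self.pending.items() if deadline <= now]
        for key in timed_out:
            del self.pending[key]
        return timed_out
```

Without the `pending` table there is nothing for `expired()` to iterate over, which is the whole point: no state, no timeout, no failover.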
(2) Neither transaction state, nor dialog state, nor any other kind of state, is required to route a SIP message, with the exception of a CANCEL (see below).
Thus, this formulation is actually quite incorrect: "then use stateful responses to direct client back to same node for subsequent messages in a dialog."
You do not need state to route in-dialog requests to the correct place, as these are routed via the Route set (built from Record-Route) carried in the headers of the SIP request itself. You do not need state to route replies to the correct place, whether to in-dialog requests or any other kind of request, because this is done through the 'Via' header stack.
So, everything that is needed to route SIP requests and replies within some sort of context that persists for some amount of time (SIP calls this a dialog) can actually be found in the content of SIP messages themselves.
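A rough Python sketch of that idea follows: both functions work out the next hop from the message text alone, with no stored state. It is heavily simplified and the function names are made up for illustration. It assumes one header value per line, no compact header forms, no IPv6 literals, it ignores the received/rport Via parameters, and it assumes the topmost Via (for replies) and the topmost Route (for in-dialog requests) refer to this proxy.

```python
import re

# Pulls host and optional port out of a sip: or sips: URI.
URI_RE = re.compile(r"sips?:(?:[^@>;]*@)?([^:;>\s]+)(?::(\d+))?")


def split_message(raw):
    # Returns (start line, list of header lines); the body is ignored.
    head = raw.partition("\r\n\r\n")[0]
    lines = head.split("\r\n")
    return lines[0], lines[1:]


def host_port(text):
    m = URI_RE.search(text)
    if m is None:
        raise ValueError("no SIP URI found")
    return m.group(1), int(m.group(2) or 5060)


def next_hop_for_reply(raw):
    # Replies retrace the Via stack: skip our own Via, send to the next one.
    _, headers = split_message(raw)
    vias = [h for h in headers if h.lower().startswith("via:")]
    if len(vias) < 2:
        raise ValueError("no Via left to route on")
    sent_by = vias[1].split(":", 1)[1].split()[1].split(";")[0]
    host, _, port = sent_by.partition(":")
    return host, int(port or 5060)


def next_hop_for_in_dialog_request(raw):
    # Loose routing: skip our own Route, then use the next Route,
    # or the Request-URI if no Route remains.
    start, headers = split_message(raw)
    routes = [h for h in headers if h.lower().startswith("route:")]
    remaining = routes[1:]
    target = remaining[0] if remaining else start.split()[1]
    return host_port(target)
```

Nothing here consults a table of calls or transactions, which is why a restarted proxy can still deliver these messages.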
You can easily show this by using a stateful-only (i.e. TM) configuration and restarting Kamailio in the middle of a pending or established call. Try it and watch the capture. You will see that every message you expect to have delivered, whether request or reply, will make it exactly where you think it should, even though Kamailio has lost all transaction state[2].
A firm grasp of this is very important to any redundancy ruminations.
(3) The exception to this is a CANCEL, and that is because a CANCEL is a so-called "hop-by-hop" request.
Whereas most requests and replies pass through the proxy, the proxy is actually an independent party to CANCEL requests. That is to say, when a party CANCELs an INVITE, it actually asks the proxy to CANCEL it, and the proxy asks any downstream branches (the ones it forwarded the INVITE to) to CANCEL separately. This is to make the forking behaviour of proxies possible.
The consequence of this is that when a proxy receives a CANCEL request, it needs transaction state in order to know which downstream branches to match it up to. If that state is lost, it won't know what to do with the CANCEL.
This is the primary obstacle to anycast setups, from my point of view. You can count on any proxy to relay requests statelessly in a correct fashion, but you can't count on any proxy to process a CANCEL correctly. So, if a CANCEL goes to a different place than the INVITE to which it corresponds, it'll be dropped on the floor.
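As a toy illustration of the anycast problem, here is a small Python sketch with two independent nodes. The `ProxyNode` class and its methods are hypothetical, and matching on the Via branch alone is a simplification. The point is only that the node which never saw the INVITE has nothing to match the CANCEL against.

```python
class ProxyNode:
    """Toy stateful proxy (hypothetical names; not Kamailio code)."""

    def __init__(self, name):
        self.name = name
        # Branch of the incoming INVITE -> targets we forked it to.
        self.invite_branches = {}

    def on_invite(self, branch, targets):
        # Remember where this INVITE went so a later CANCEL can follow it.
        self.invite_branches[branch] = list(targets)
        return [("INVITE", t) for t in targets]

    def on_cancel(self, branch):
        # A CANCEL carries the same top Via branch as the INVITE it cancels.
        targets = self.invite_branches.get(branch)
        if targets is None:
            # This node never saw the INVITE: nothing to cancel.
            return []
        return [("CANCEL", t) for t in targets]


node_a = ProxyNode("a")
node_b = ProxyNode("b")
node_a.on_invite("z9hG4bKx", ["sip:u1@10.0.0.5", "sip:u2@10.0.0.6"])

sent_by_a = node_a.on_cancel("z9hG4bKx")   # one CANCEL per forked branch
sent_by_b = node_b.on_cancel("z9hG4bKx")   # empty: node b has no such state
```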
...
Otherwise, and notwithstanding transparent approaches like anycast, the methods you're contemplating are all variations of a common idea: the redundancy and failover is provided on the client side, in principle. Actual choice of method here is usually dictated by what the concrete clients in question support. For example, not all clients support DNS-based failover, or may not implement it in the way you want. If you're offering a service to many different kinds of clients or devices, you'll have to take that into account.
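For a sense of what client-side DNS failover asks of the client, here is a simplified Python sketch of ordering SRV targets: lowest priority first, weighted random choice within a priority, in the spirit of RFC 2782. The function name and record tuples are assumptions for illustration, and it does not reproduce the RFC's exact selection procedure. A real client also has to resolve the targets and decide when a node counts as failed, and that part is where implementations differ.

```python
import random


def srv_attempt_order(records, rng=random):
    """records: list of (priority, weight, port, target) tuples.

    Returns the records in the order a client would try them.
    """
    order = []
    for prio in sorted({r[0] for r in records}):
        group = [r for r in records if r[0] == prio]
        while group:
            total = sum(r[1] for r in group)
            if total == 0:
                # All weights zero: any member of the group will do.
                pick = rng.choice(group)
            else:
                # Weighted pick: walk the running sum until it covers the point.
                point = rng.uniform(0, total)
                running = 0
                for r in group:
                    running += r[1]
                    if point <= running:
                        pick = r
                        break
            group.remove(pick)
            order.append(pick)
    return order
```

A client that walks this list on timeout fails over between nodes for new transactions. Whether it does so mid-call depends on the client.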
-- Alex
[1] At a memory cost, but this isn't really a factor in modern computing.
[2] Where it exists. An established call (200 OK + e2e ACK, no BYE yet) will actually not have any transaction state, since all transactions involved in establishing it have been terminated and no further transactions, e.g. to hang it up, have been created.
...
On Dec 14, 2022, at 6:56 PM, Jawaid Bazyar bazyar@gmail.com wrote:
Hi,
 I am exploring different redundancy / load-balancing models for a Kamailio cluster.  When I say cluster, I mean, a number (N) of Kamailio nodes acting as stateful proxies.
 Each node is configured the same as the others, and all have access to the same lookup data to make routing decisions.
 I would appreciate any advice or experience any of you can share on these different models.
 Overall model:
• Direct to proxies
• Redirect servers first, which redirect to proxies

Selecting the first node to talk to. Each model could use either type of selection.
• DNS-based (SRV or NAPTR, client makes call to DNS name)
• Anycast with ECMP (equal-cost multi-path routing)
• Cluster with a mobile IP and service-down detection (this would just provide 1:1 protection)

Have clients make calls through the proxy using a DNS record containing an SRV record for each node (or, alternatively, done with NAPTR). Would rely on the client to switch nodes in the event of a node failure mid-call. (Is that even possible?)
 Anycast would only work with UDP signaling. Use Anycast to find the first proxy, then use stateful responses to direct client back to same node for subsequent messages in a dialog.
 So for anyone who has tried any of these methods, I would love to hear the pros and cons.
 Thanks in advance!
 Jawaid
-- 
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/