Jiri Kuthan wrote:
Dear SER users,
I would like to collect feedback on use of RADIUS with SIP.
There is a concern in the IETF that RADIUS is not good enough
for accounting due to its lack of reliability. On the other hand,
RADIUS support has been one of the most frequently requested
SER features.
Can people with hands-on experience of real deployments share it
with me? I'm interested in aspects like how the missing reliability
has been stressing your operation, how much you are interested
in fixing it, and what kind of fixes you would welcome (transition
to Diameter? adding fail-over capabilities?)
Since VoIP billing is our main product, we have quite significant
experience in using Radius for authorization and accounting, both for
H.323 and SIP. Reliability is actually a non-issue for our clients,
since usually both the radius clients (e.g. Cisco GWs, SIP server) and
the radius server are located in the same LAN, so there is guaranteed
bandwidth available and the communication channel is mostly lossless,
even for UDP. The retransmission mechanism in Radius deals with rare
losses just fine, so there is little or no incentive to switch to
another AAA protocol. At the same time, Radius is supported by virtually
any more or less serious VoIP hardware, which makes it easy to build
heterogeneous VoIP networks.
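
For readers unfamiliar with the retransmission behaviour referred to
above, here is a rough Python sketch of the idea: the client re-sends the
same request a few times until a reply arrives. The function names, the
3-second timeout, the retry count and the in-memory "lossy link" are all
assumptions made for illustration; they are not taken from any particular
RADIUS client.

```python
from typing import Callable, Optional


def send_with_retransmit(
    send: Callable[[bytes], None],
    wait_for_reply: Callable[[float], Optional[bytes]],
    packet: bytes,
    timeout: float = 3.0,   # assumed value, tune per deployment
    max_tries: int = 3,     # assumed value, tune per deployment
) -> Optional[bytes]:
    """Send `packet`, re-sending the same bytes until a reply arrives.

    Returns the reply, or None if every attempt timed out.
    """
    for _ in range(max_tries):
        send(packet)
        reply = wait_for_reply(timeout)
        if reply is not None:
            return reply
    return None


def make_lossy_link(drop_first: int):
    """Hypothetical in-memory link that drops the first `drop_first` packets."""
    state = {"sent": 0, "pending": None}

    def send(pkt: bytes) -> None:
        state["sent"] += 1
        # Only packets after the dropped ones get an answer.
        state["pending"] = b"ACK" if state["sent"] > drop_first else None

    def wait_for_reply(timeout: float) -> Optional[bytes]:
        reply, state["pending"] = state["pending"], None
        return reply

    return send, wait_for_reply, state
```

With one dropped packet the second attempt succeeds; with more drops than
tries the caller gets None and has to decide what to do with the
accounting record.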
-Maxim
-Jiri
There is a belief in the IETF that the lack of reliability in RADIUS
means this work should be dropped. (Authentication is a different
story, though.)
The point of this particular discussion is to understand whether
specification of RADIUS transport behavior might be helpful in that
regard, or whether we would have to go further (such as specifying
failover behavior).
According to RFC 2865, one of the reasons RADIUS was not made to run over
TCP in the first place was the desire for failover to occur before the
connection itself is declared failed. RFC 3539 handles this issue by
defining application-layer timers and heartbeats that allow the AAA
application to re-send an accounting packet over another connection before
tearing down a suspect one.
Via the heartbeat mechanism, the AAA client can determine whether the
issue is due to its immediate connection, or something downstream.
Failover only occurs if the immediate connection is found to be suspect,
so failover occurs on a hop-by-hop basis.
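
The hop-by-hop behaviour described above can be sketched as a small
per-connection state machine. This is a simplified illustration loosely
modelled on the watchdog idea in RFC 3539, not a faithful copy of its
algorithm: the class, method and log-entry names are made up here, and
reconnect handling after a connection is closed is left out.

```python
from enum import Enum


class State(Enum):
    OKAY = "okay"
    SUSPECT = "suspect"
    DOWN = "down"


class WatchdogPeer:
    """Liveness tracking for one immediate connection (hypothetical names)."""

    def __init__(self, name: str):
        self.name = name
        self.state = State.OKAY
        self.pending = False  # True while a heartbeat is unanswered
        self.log = []         # actions taken, kept for inspection

    def on_timer_expired(self) -> None:
        """Called each time the watchdog interval passes with no traffic."""
        if self.state is State.OKAY and not self.pending:
            # First quiet interval: probe the immediate peer.
            self.pending = True
            self.log.append("send-heartbeat")
        elif self.state is State.OKAY:
            # Probe unanswered: re-send pending requests elsewhere,
            # but keep this connection open for now.
            self.state = State.SUSPECT
            self.log.append("failover")
        elif self.state is State.SUSPECT:
            # Still silent: give up on this connection.
            self.state = State.DOWN
            self.log.append("close-connection")

    def on_traffic(self) -> None:
        """Called when any message, including a heartbeat reply, arrives."""
        self.pending = False
        if self.state is State.SUSPECT:
            self.state = State.OKAY
            self.log.append("failback")
```

Because the heartbeat is answered by the immediate peer itself, a silent
downstream server does not push this connection into SUSPECT, which is
the point of the hop-by-hop design.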
RFC 3539 cannot be applied to the RADIUS protocol as it stands because
RADIUS does not support a heartbeat. As a result, a RADIUS client can
failover even where its immediate proxy is healthy, because of a problem
on a downstream RADIUS server. Since RADIUS failover is
typically end-to-end, there may be no failover in proxies, even if a
server is not responding to requests proxied to it.
Even if one were to define appropriate RTT/RTO measurements, and use
traffic from the proxy as a demonstration of "liveness", inappropriate
failover can still happen in cases where the time between proxy traffic
is greater than the failover timer.
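
A small worked example of that last case, assuming liveness is inferred
only from observed proxy traffic. The function name, the timestamps and
the 30-second timer are invented for illustration.

```python
def spurious_failovers(traffic_times, failover_timer):
    """Return idle gaps between consecutive proxy messages that exceed the timer.

    traffic_times: sorted timestamps (seconds) at which traffic from the
    proxy was seen. Each returned (start, end) pair is a quiet period in
    which a traffic-only liveness check would have triggered failover,
    even though the proxy was healthy the whole time.
    """
    return [
        (prev, cur)
        for prev, cur in zip(traffic_times, traffic_times[1:])
        if cur - prev > failover_timer
    ]


# A healthy but lightly loaded proxy: one 38-second quiet period.
print(spurious_failovers([0, 5, 12, 50, 55], failover_timer=30))
# prints [(12, 50)]
```

A heartbeat would fill such quiet periods with traffic, which is why the
absence of one in RADIUS matters here.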
The end result is that to be able to apply RFC 3539, one would need to add
a heartbeat command to RADIUS. This is a fairly major step, since it
would require changing both RADIUS client and server implementations.
So the question is:
"If a major protocol change were to be made to RADIUS to improve
reliability, would such a change be deployed on RADIUS clients and
servers and would it be acceptable for the SIP accounting specification
to depend on such a change?"
If the answer is yes, then a RADIUS failover spec might be worth
discussing further. If no, then it seems to me that a RADIUS failover
spec might not be worth doing: it would not be considered important
enough to get over the deployment hurdles.
That raises the question of whether a dependency even on transport
behavior improvements is possible, since even this would still require
changes on RADIUS clients (though not servers).
Comments?
--
Jiri Kuthan
http://iptel.org/~jiri/