Dear SER users,
I would like to collect feedback on the use of RADIUS with SIP.
There is a concern in the IETF that RADIUS is not good enough
for accounting due to its lack of reliability. On the other hand,
RADIUS support has been one of the most frequently requested
SER features.
Can people with hands-on experience of real deployments share it
with me? I'm interested in aspects like how the missing reliability
has been stressing your operation, how interested you are
in fixing it, and what kind of fixes you would welcome (transition
to Diameter? adding fail-over capabilities?)
-Jiri
There is a belief in the IETF that the lack of reliability in RADIUS
means this work should be dropped. (Authentication is a different
story, though.)
The point of this particular discussion is to understand whether
specification of RADIUS transport behavior might be helpful in that
regard, or whether we'd have to go further (such as specifying failover
behavior).
According to RFC 2865, one of the reasons that RADIUS was not made to run
over TCP in the first place was the desire for failover to occur before a
connection failure would be detected. RFC 3539 handles this issue by
defining application-layer timers and heartbeats that allow the AAA
application to re-send an accounting packet over another connection before
tearing down a suspect one.
Via the heartbeat mechanism, the AAA client can determine whether the
issue is due to its immediate connection, or something downstream.
Failover only occurs if the immediate connection is found to be suspect,
so failover occurs on a hop-by-hop basis.
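
To make the hop-by-hop point concrete, here is a minimal sketch of a
heartbeat-driven failover decision. It is a simplification for
illustration, not the full RFC 3539 state machine; the class name, field
names, and action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """Hypothetical per-connection state for one AAA hop (illustrative only)."""
    tw: float                              # watchdog interval in seconds (assumed)
    last_rx: float = 0.0                   # time anything last arrived from the peer
    probe_sent_at: Optional[float] = None  # time an unanswered heartbeat was sent
    suspect: bool = False                  # True once this hop is considered suspect

    def on_receive(self, now: float) -> None:
        # Any message from the immediate peer, including a heartbeat
        # answer, proves that this hop is alive.
        self.last_rx = now
        self.probe_sent_at = None
        self.suspect = False

    def on_tick(self, now: float) -> str:
        """Return the action to take at time `now`."""
        if self.suspect:
            return "none"
        if self.probe_sent_at is not None:
            # A heartbeat is outstanding; fail over only if it stays
            # unanswered for a full watchdog interval.
            if now - self.probe_sent_at >= self.tw:
                self.suspect = True
                return "failover"
            return "none"
        if now - self.last_rx >= self.tw:
            # The link has been idle, so probe the immediate peer.
            self.probe_sent_at = now
            return "send_heartbeat"
        return "none"
```

Because the heartbeat is answered by the immediate peer itself, a slow or
dead server further downstream never makes this link look suspect.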
RFC 3539 cannot be applied to the RADIUS protocol as it stands because
RADIUS does not support a heartbeat. As a result, a RADIUS client can
fail over even where its immediate proxy is healthy, because of a problem
on a downstream RADIUS server. Since RADIUS failover is
typically end-to-end, there may be no failover in proxies, even if a
server is not responding to requests proxied to it.
Even if one were to define appropriate RTT/RTO measurements, and use traffic
from the proxy as a demonstration of "liveness", inappropriate failover
can still happen in cases where the time between proxy traffic is greater
than the failover timer.
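
A small sketch of that last case, assuming the client's only liveness
signal is traffic received from the proxy. The function name, arrival
times, and timer value are made up for illustration.

```python
def spurious_failover_times(arrivals, failover_timer):
    """Return the times at which a client would fail over away from a
    healthy proxy, given the sorted arrival times of traffic from that
    proxy and no heartbeat to fill the quiet periods (hypothetical model).
    """
    times = []
    for prev, nxt in zip(arrivals, arrivals[1:]):
        # A quiet period longer than the timer looks exactly like a
        # dead proxy when there is nothing to probe it with.
        if nxt - prev > failover_timer:
            times.append(prev + failover_timer)
    return times


# The proxy is healthy throughout, but goes quiet between t=9 and t=30.
arrivals = [0, 4, 9, 30, 33]
failovers = spurious_failover_times(arrivals, failover_timer=10)
```

With these numbers the client would fail over at t=19 even though the
proxy never stopped working.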
The end result is that to be able to apply RFC 3539, one would need to add
a heartbeat command to RADIUS. This is a fairly major step, since it
would require changing both RADIUS client and server implementations.
So the question is:
"If a major protocol change were to be made to RADIUS to improve
reliability, would such a change be deployed on RADIUS clients and
servers and would it be acceptable for the SIP accounting specification
to depend on such a change?"
If the answer is yes, then a RADIUS failover spec might be worth
discussing further. If no, then it seems to me that a RADIUS failover spec
might not be worth doing, because it is not considered important enough to
get over the deployment hurdles.
That raises the question of whether a dependency even on transport
behavior improvements is possible, since even this would still require
changes on RADIUS clients (though not servers).
Comments?