Yes, I understand your problem. Handling RADIUS retries demands a server
design made for it. I don't know if it is allowed, but wouldn't it be
better to reduce the timeout to 1 (or 2) seconds and retries to 0? I mean,
if you don't get a response within one second (depending on your network
setup), why wait or retry? I have never really understood the wait and
retry of RADIUS; we tend to fail over to a secondary or tertiary RADIUS
server as fast as possible. The only point I see in using long waits and
maybe 1 retry is if you are running auths across an (unstable) Internet
connection. I guess it's part of the legacy.
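
To make the fail-fast idea concrete, here is a minimal Python sketch (not
ser or radiusclient code). The names auth_with_failover, worst_case_wait
and the send callable are hypothetical, and it assumes that "retries"
counts re-sends in addition to the first attempt, which may not match how
a given RADIUS client library counts them.

```python
from typing import Callable, Optional, Sequence

# Hypothetical transport: sends one request to `server`, waits up to
# `timeout` seconds, and returns the reply, or None on timeout.
SendFn = Callable[[str, float], Optional[str]]


def auth_with_failover(servers: Sequence[str], send: SendFn,
                       timeout: float = 1.0, retries: int = 0) -> Optional[str]:
    """Return the first reply from any server, or None if all time out."""
    for server in servers:
        # Assumption: one initial attempt plus `retries` re-sends per server.
        for _ in range(retries + 1):
            reply = send(server, timeout)
            if reply is not None:
                return reply
    return None


def worst_case_wait(n_servers: int, timeout: float, retries: int) -> float:
    """Longest time a caller can block before giving up, in seconds."""
    return n_servers * (retries + 1) * timeout
```

Under that assumption, three servers with timeout 1 and retries 0 hold the
caller for at most 3 seconds in total, while a single server with timeout 3
and retries 1 can hold it for up to 6 seconds.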
g-)
Klaus Darilion wrote:
Greger V. Teigre wrote:
Hi Klaus,
Just a quick response to what you describe below:
We have a different scenario based on three facts:
- We have complete control and monitoring of all participating RADIUS
servers
- Each ser has a RADIUS server on the local LAN where the server
center is managed as a whole (i.e. individual components should not
be unavailable)
- We do not tolerate RADIUS downtime at all. Our 24x7 operations
center will immediately respond and correct the situation
Thus, we have never experienced the scenario below. However, if
something happens, it is actually more likely that we start to NAK
all requests as a default. This of course causes the clients to
re-register, but ser does not slow down.
As you proxy the requests, you probably have a re-send from the
RADIUS proxy to the other servers as well, in addition to ser's
resend.
We have disabled retransmissions at the radius proxy. In
radiusclient.conf we have:
radius_timeout 3
radius_retries 1
Now, our setup works, but it's not a good solution. The problem is
that an ongoing RADIUS request will block a thread completely. Thus,
having lots of clients (lots of REGISTERs) and a slow RADIUS backend
is like a DoS attack.
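
A rough way to see why blocking requests act like a DoS is the following
Python sketch. It is a back-of-the-envelope model, not a measurement: the
function names are hypothetical, and it assumes every request holds exactly
one worker for the full blocking time.

```python
def max_sustained_rate(workers: int, blocked_seconds: float) -> float:
    """Requests per second the server can absorb when each request
    blocks one worker for `blocked_seconds`."""
    if workers <= 0 or blocked_seconds <= 0:
        raise ValueError("workers and blocked_seconds must be positive")
    return workers / blocked_seconds


def is_saturated(arrival_rate: float, workers: int,
                 blocked_seconds: float) -> bool:
    """True when the average number of busy workers reaches the pool size."""
    # Little's law: average busy workers = arrival rate * time per request.
    return arrival_rate * blocked_seconds >= workers
```

For example, with a hypothetical pool of 8 workers, 5 REGISTERs per second
saturate the pool once each request blocks for 2 seconds, but not when the
block lasts only 1 second.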
regards,
klaus