Hello,
I noticed a couple of strange behaviors using this version:
kamailio-5.3.3-4.1.x86_64
I am using algorithm "0" (hash over Call-ID). The reason is that
I am just re-sending the message, using send_udp(), to a
destination in a group from the file dispatcher.list. Kamailio is
completely stateless; it does not need to remember anything. And
by hashing over Call-ID I can be sure that any other message
related to the same call will be sent to the same destination:
retransmissions of the INVITE, CANCEL, ACK, BYE, etc.
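To illustrate the idea, here is a minimal Python sketch of stateless
routing by Call-ID hash. It is not Kamailio code: zlib.crc32 is only a
stand-in for whatever hash the dispatcher module really uses, and
pick_destination() and the sample Call-ID are hypothetical names made
up for this example.

```python
import zlib

# Same five destinations as in the group from dispatcher.list.
DESTINATIONS = [
    "sip:server_1",
    "sip:server_2",
    "sip:server_3",
    "sip:server_4",
    "sip:server_5",
]

def pick_destination(call_id, destinations=DESTINATIONS):
    """Map a Call-ID to one destination without keeping any state.

    crc32 is an assumed stand-in hash, NOT the one Kamailio uses.
    """
    index = zlib.crc32(call_id.encode("utf-8")) % len(destinations)
    return destinations[index]

# INVITE, ACK, BYE, etc. of one call share the Call-ID, so they all
# map to the same destination as long as the list does not change.
invite_target = pick_destination("a84b4c76e66710@example.invalid")
bye_target = pick_destination("a84b4c76e66710@example.invalid")
```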
In general it works, but I detected two problems.
I have a group of 5 destinations, where every server is running at
80% of the nominal load. It's a 4 + 1 configuration, for
redundancy. I know it's not 100% evenly distributed using hash
over Call-ID, but in practice it's almost perfectly even.
However, if one node is down, instead of having the four alive
nodes running at 100%, I get three still running at 80% capacity,
and one is sent 160% of the load (which of course it can't
process). It seems that all traffic that was supposed to be sent to
the failed node is transferred to one single destination.
That implementation makes my solution for redundancy worthless.
Why didn't the implementation use, for instance, a re-hash over
the hash? That would allow the load to be redistributed "evenly"
over the remaining servers.
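The arithmetic behind those numbers can be sketched in a few lines of
Python. This only models the two policies as I understand them ("all
orphaned traffic goes to the next entry" versus "orphaned traffic is
spread over the survivors"); loads_after_failure() and the policy
names are hypothetical, and nothing here is taken from the dispatcher
source.

```python
def loads_after_failure(nodes, load_per_node, failed, policy):
    """Return {node: load in % of nominal} for the surviving nodes."""
    survivors = [n for n in nodes if n != failed]
    loads = {n: load_per_node for n in survivors}
    if policy == "next_in_list":
        # Assumed behavior: the entry right after the failed one
        # inherits all of its traffic.
        successor = nodes[(nodes.index(failed) + 1) % len(nodes)]
        loads[successor] += load_per_node
    elif policy == "rehash":
        # Idealized re-hash: orphaned traffic is split evenly.
        for n in survivors:
            loads[n] += load_per_node / len(survivors)
    else:
        raise ValueError("unknown policy: " + policy)
    return loads

NODES = ["server_1", "server_2", "server_3", "server_4", "server_5"]

# Every node at 80% of nominal load, server_2 goes down.
next_loads = loads_after_failure(NODES, 80.0, "server_2", "next_in_list")
rehash_loads = loads_after_failure(NODES, 80.0, "server_2", "rehash")
```

With the first policy server_3 ends up at 160% while the other three
stay at 80%; with the second all four survivors sit at 100%.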
I tried to solve this by repeating the entries in dispatcher.list,
as I noticed that kamailio doesn't check if different lines
contain duplicated uris. For instance, if I have two lines with
"destination A", and one line with "destination B", like this:
0 sip:server_A
0 sip:server_A
0 sip:server_B
I see that "destination A" receives twice as many SIP
OPTIONS as "destination B" receives.
And so, I used this dispatcher.list:
0 sip:server_1
0 sip:server_2
0 sip:server_3
0 sip:server_4
0 sip:server_5
0 sip:server_1
0 sip:server_3
0 sip:server_5
0 sip:server_2
0 sip:server_4
0 sip:server_1
0 sip:server_4
0 sip:server_2
0 sip:server_5
0 sip:server_3
0 sip:server_1
0 sip:server_5
0 sip:server_4
0 sip:server_3
0 sip:server_2
(It's not a random order. It follows a sequence.)
I thought: "what a genius I am. This way, if any node fails, and
Kamailio selects the "next" to send the traffic to, it will
distribute evenly over the rest".
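The ordering above can be reproduced and checked with a short Python
sketch. Two things are assumptions here: that the sequence is read as
"block k walks the servers with stride k", and that failover picks the
next live entry in the list (wrapping around at the end). The names
interleaved_entries() and successor_counts() are made up for this
example.

```python
from collections import Counter

def interleaved_entries(n):
    """Blocks of n entries; block k walks servers 1..n with stride k."""
    entries = []
    for stride in range(1, n):
        entries.extend((i * stride) % n + 1 for i in range(n))
    return entries

def successor_counts(entries, failed):
    """Count which server follows each entry of the failed server.

    Assumes failover means "next entry that is not the failed server",
    wrapping around, and that at least one other server is listed.
    """
    counts = Counter()
    total = len(entries)
    for pos, server in enumerate(entries):
        if server != failed:
            continue
        step = 1
        while entries[(pos + step) % total] == failed:
            step += 1
        counts[entries[(pos + step) % total]] += 1
    return counts

ENTRIES = interleaved_entries(5)  # same order as the 20-line list above
BALANCE = {failed: successor_counts(ENTRIES, failed) for failed in range(1, 6)}
```

Under those assumptions, each failed server hands exactly one of its
four entries to each of the other four servers, which is the even
spread I was hoping for.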
It doesn't work. I detected something that looks like a bug.
If one entry is repeated at least once, for instance :
0 sip:server_1
0 sip:server_1
0 sip:server_2
0 sip:server_3
0 sip:server_4
0 sip:server_5
if server_1 is down, Kamailio still sends some of the INVITEs to
it. Not all of them, but many. I can see the SIP OPTIONS being
sent to it, and the response "ICMP Destination Unreachable", so
Kamailio knows that the server is down. However, it still sends
INVITE requests to that failed node. Look at this trace:
You can see all the failed SIP OPTIONS. Still, Kamailio sends
traffic to that server.
It does not send traffic to a failed destination if it's listed
only once in dispatcher.list.
Any ideas?
Thanks in advance,
Luis
--
Luis Rojas
Software Architect
Sixbell
Los Leones 1200
Providencia
Santiago, Chile
Phone: (+56-2) 22001288
mailto:luis.rojas@sixbell.com
http://www.sixbell.com