Hello,
I noticed a couple of strange behaviors using this version:
kamailio-5.3.3-4.1.x86_64
I am using algorithm "0" (hash over Call-ID). The reason is that I am
just re-sending the message, using send_udp(), to a destination in a
group from the file dispatcher.list. Kamailio is completely stateless;
it does not need to remember anything. By using a hash over Call-ID I
can be sure that any other message related to the same call will be sent
to the same destination: retransmissions of the INVITE, CANCEL, ACK,
BYE, etc.
In general it works, but I detected two problems.
I have a group of 5 destinations, where every server is running at 80%
of the nominal load. It's a 4 + 1 configuration, for redundancy. I know
it's not 100% evenly distributed using hash over Call-ID, but in
practice it's almost perfectly even.
However, if one node is down, instead of the four surviving nodes each
running at 100%, three keep running at 80% of capacity and one is sent
160% of its nominal load (which of course it can't process). It seems
that all the traffic that was supposed to go to the failed node is
transferred to one single destination.
That implementation makes my redundancy solution worthless.
Why doesn't the implementation use, for instance, a re-hash over the
hash? That would allow the load of the failed node to be redistributed
"evenly" over the remaining servers.
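To make the difference concrete, here is a minimal Python sketch (not
Kamailio's code) that compares the two failover strategies. It assumes
that failover simply moves to the next entry in the group, which is what
the observed behavior suggests, and it uses SHA-256 as a stand-in for
whatever hash the dispatcher module really uses. The names
pick_next_alive, pick_rehash and load, and the made-up Call-IDs, are
only for illustration.

```python
import hashlib
from collections import Counter

SERVERS = ["server_1", "server_2", "server_3", "server_4", "server_5"]
CALLS = 50_000  # number of simulated Call-IDs


def slot(call_id, n, salt=b""):
    # Stand-in hash (assumption): SHA-256 of the Call-ID, reduced modulo n.
    digest = hashlib.sha256(salt + call_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def pick_next_alive(call_id, servers, down):
    # Assumed current behavior: start at the hashed slot and walk forward
    # (wrapping around) until a destination that is not down is found.
    start = slot(call_id, len(servers))
    for step in range(len(servers)):
        candidate = servers[(start + step) % len(servers)]
        if candidate not in down:
            return candidate
    return None  # every destination is down


def pick_rehash(call_id, servers, down):
    # Suggested alternative: if the hashed slot is down, hash the Call-ID
    # again (with a salt) over the live destinations only.
    first = servers[slot(call_id, len(servers))]
    if first not in down:
        return first
    alive = [s for s in servers if s not in down]
    if not alive:
        return None
    return alive[slot(call_id, len(alive), salt=b"rehash:")]


def load(picker, down, calls=CALLS):
    # Count how many simulated calls each destination would receive.
    ids = (f"call-{k}@example.invalid" for k in range(calls))
    return Counter(picker(cid, SERVERS, down) for cid in ids)


if __name__ == "__main__":
    down = {"server_3"}
    print("next alive:", sorted(load(pick_next_alive, down).items()))
    print("re-hash:   ", sorted(load(pick_rehash, down).items()))
```

With server_3 down, the first strategy should put roughly 40% of the
calls on server_4 and 20% on each of the others, while the second should
give each survivor roughly 25%. Both remain stateless: the choice
depends only on the Call-ID and on which destinations are currently
marked down.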
I tried to solve this by repeating the entries in dispatcher.list, as I
noticed that Kamailio doesn't check whether different lines contain
duplicated URIs. For instance, if I have two lines with "destination A"
and one line with "destination B", like this:
0 sip:server_A
0 sip:server_A
0 sip:server_B
I see that "destination A" receives twice the amount of SIP OPTIONS that
"destination B" receives.
And so, I used this dispatcher.list :
0 sip:server_1
0 sip:server_2
0 sip:server_3
0 sip:server_4
0 sip:server_5
0 sip:server_1
0 sip:server_3
0 sip:server_5
0 sip:server_2
0 sip:server_4
0 sip:server_1
0 sip:server_4
0 sip:server_2
0 sip:server_5
0 sip:server_3
0 sip:server_1
0 sip:server_5
0 sip:server_4
0 sip:server_3
0 sip:server_2
(it's not a random order. It follows a sequence)
I thought: "What a genius I am. This way, if any node fails and
Kamailio selects the 'next' entry to send the traffic to, it will
distribute the load evenly over the rest".
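The intended property of that ordering can be checked with a short
Python sketch. It assumes that Kamailio keeps the entries in file order
and, on failure, moves to the next entry that belongs to a live server,
wrapping at the end of the list; neither assumption is confirmed here.
ENTRIES and takeover are illustrative names.

```python
from collections import Counter

# The 20-line dispatcher.list above, as server numbers in file order.
ENTRIES = [1, 2, 3, 4, 5, 1, 3, 5, 2, 4, 1, 4, 2, 5, 3, 1, 5, 4, 3, 2]


def takeover(entries, failed):
    # For every slot owned by `failed`, find the next slot (wrapping
    # around) owned by another server, and count who inherits it.
    inherited = Counter()
    n = len(entries)
    for i, server in enumerate(entries):
        if server != failed:
            continue
        for step in range(1, n):
            successor = entries[(i + step) % n]
            if successor != failed:
                inherited[successor] += 1
                break
    return inherited


if __name__ == "__main__":
    for failed in sorted(set(ENTRIES)):
        print(f"server_{failed} down ->", dict(sorted(takeover(ENTRIES, failed).items())))
```

Under those assumptions, whichever server fails, each of the other four
inherits exactly one of its four slots, so on paper the ordering does
spread the load of a failed node evenly.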
It doesn't work. I detected something that looks like a bug.
If one entry is repeated at least once, for instance :
0 sip:server_1
0 sip:server_1
0 sip:server_2
0 sip:server_3
0 sip:server_4
0 sip:server_5
then, if server_1 is down, Kamailio still sends some of the INVITEs to
it. Not all of them, but many. I can see the SIP OPTIONS being sent to
it, and the response "ICMP Destination Unreachable", so Kamailio knows
that the server is down. However, it still sends INVITE requests to that
failed node.
Look at this trace:
You can see all the failed SIP OPTIONS. Still, Kamailio sends traffic
to that server.
It does not send traffic to a failed destination if that destination is
listed only once in dispatcher.list.
Any ideas?
Thanks in advance,
Luis
--
Luis Rojas
Software Architect
Sixbell
Los Leones 1200
Providencia
Santiago, Chile
Phone: (+56-2) 22001288
mailto:luis.rojas@sixbell.com
http://www.sixbell.com