Hello,

I noticed a couple of strange behaviors using this version: kamailio-5.3.3-4.1.x86_64

I am using algorithm "0" (hash over Call-ID). The reason is that I am just re-sending the message, using send_udp(), to a destination in a group from the dispatcher.list file. Kamailio is completely stateless; it does not need to remember anything. With a hash over Call-ID I can be sure that any other message related to the same call is sent to the same destination: retransmissions of the INVITE, CANCEL, ACK, BYE, etc.

In general it works, but I detected two problems.

I have a group of 5 destinations, where every server is running at 80% of the nominal load. It's a 4 + 1 configuration, for redundancy.  I know it's not 100% evenly distributed using hash over Call-ID, but in practice it's almost perfectly even.

However, if one node is down, instead of having the four surviving nodes running at 100%, I get three still running at 80% capacity and one that is sent 160% of the load (which of course it can't process). It seems that all the traffic that was supposed to go to the failed node is transferred to one single destination.

That implementation makes my solution for redundancy worthless.

Why doesn't the implementation use, for instance, a re-hash over the hash? That would allow the traffic of the failed node to be redistributed "evenly" over the remaining servers.
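To make the two failover strategies concrete, here is a minimal Python sketch of the comparison. It is only a simulation under stated assumptions, not Kamailio's code: the hash is MD5 as a stand-in (not the hash the dispatcher module uses), the Call-IDs are synthetic, and "send to the next active entry in list order" is the behavior I am assuming from what I observe. The names pick_next, pick_rehash and simulate are made up for the illustration.

```python
import hashlib
from collections import Counter

DESTS = ["server_1", "server_2", "server_3", "server_4", "server_5"]
DOWN = {"server_1"}  # hypothetical failed node


def h(key: str) -> int:
    """Stand-in hash (MD5); NOT the hash Kamailio actually uses."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def pick_next(call_id: str) -> str:
    """Assumed behavior: on failure, walk to the next active entry in list order."""
    start = h(call_id) % len(DESTS)
    for step in range(len(DESTS)):
        dest = DESTS[(start + step) % len(DESTS)]
        if dest not in DOWN:
            return dest
    raise RuntimeError("no active destination")


def pick_rehash(call_id: str) -> str:
    """Alternative: on failure, hash again with a modified key until an active entry is hit."""
    key = call_id
    for attempt in range(32):
        dest = DESTS[h(key) % len(DESTS)]
        if dest not in DOWN:
            return dest
        key = f"{key}#{attempt}"  # deterministic, so the choice stays stateless
    return pick_next(call_id)  # extremely unlikely fallback


def simulate(pick, n_calls: int = 50_000) -> Counter:
    """Count how many synthetic calls each destination receives."""
    return Counter(pick(f"call-{i}@example.invalid") for i in range(n_calls))


next_load = simulate(pick_next)
rehash_load = simulate(pick_rehash)
for name in DESTS[1:]:
    print(name, next_load[name], rehash_load[name])
```

In this model the "next entry" strategy gives server_2 its own share plus the whole share of server_1, while the re-hash strategy spreads the orphaned calls over the four survivors. Both remain stateless, because the result depends only on the Call-ID and on which destinations are down, and calls whose first hash lands on a live node keep their destination.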

I tried to solve this by repeating the entries in dispatcher.list, as I noticed that kamailio doesn't check whether different lines contain duplicated URIs. For instance, if I have two lines with "destination A" and one line with "destination B", like this:

0 sip:server_A
0 sip:server_A
0 sip:server_B

I see that "destination A" receives twice the amount of SIP OPTIONS that "destination B" receives.

And so, I used this dispatcher.list :

0 sip:server_1
0 sip:server_2
0 sip:server_3
0 sip:server_4
0 sip:server_5
0 sip:server_1
0 sip:server_3
0 sip:server_5
0 sip:server_2
0 sip:server_4
0 sip:server_1
0 sip:server_4
0 sip:server_2
0 sip:server_5
0 sip:server_3
0 sip:server_1
0 sip:server_5
0 sip:server_4
0 sip:server_3
0 sip:server_2

(It's not a random order; it follows a sequence.)

I thought: "What a genius I am. This way, if any node fails and kamailio selects the 'next' one to send the traffic to, it will distribute the traffic evenly over the rest."
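The property I was counting on can be checked with a small Python sketch. It assumes that failover goes to the entry that directly follows the failed one in the order written in the file (wrapping around at the end), which is my guess and may not match how kamailio orders the set internally. The helper name successors is made up for the illustration.

```python
from collections import defaultdict

# Order of the 20 entries in the dispatcher.list shown above.
ORDER = [1, 2, 3, 4, 5, 1, 3, 5, 2, 4, 1, 4, 2, 5, 3, 1, 5, 4, 3, 2]
ENTRIES = [f"sip:server_{n}" for n in ORDER]


def successors(entries):
    """Map each URI to the URIs that directly follow its entries (wrapping around)."""
    result = defaultdict(list)
    for i, uri in enumerate(entries):
        result[uri].append(entries[(i + 1) % len(entries)])
    return dict(result)


succ = successors(ENTRIES)
for uri in sorted(succ):
    # Each server should be followed by each of the other four exactly once.
    print(uri, "->", sorted(succ[uri]))
```

Under that assumption, every server is followed once by each of the other four, so the traffic of any single failed node would be split into four equal parts.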

It doesn't work. I detected something that looks like a bug.

If one entry is repeated at least once, for instance :

0 sip:server_1
0 sip:server_1
0 sip:server_2
0 sip:server_3
0 sip:server_4
0 sip:server_5

then, if server_1 is down, kamailio still sends some of the INVITEs to it. Not all of them, but many. I can see the SIP OPTIONS being sent to it and the "ICMP Destination Unreachable" responses, so kamailio knows that the server is down. However, it still sends INVITE requests to that failed node. Look at this trace:



You can see all the failed SIP OPTIONS. Still, kamailio sends traffic to that server.

It does not send traffic to a failed destination if that destination is listed only once in dispatcher.list.

Any ideas?

Thanks in advance,

Luis

-- 
Luis Rojas
Software Architect
Sixbell
Los Leones 1200
Providencia
Santiago, Chile
Phone: (+56-2) 22001288
mailto:luis.rojas@sixbell.com
http://www.sixbell.com