Failover with SRV records


Eric Hiller

29 Nov 2010

4:03 a.m.

Sorry all for the second question of the night as I go through working out bugs in my setup. I want to use SRV records to load-balance across hosts. This works great when all the hosts listed in the SRV are up. However, I want Kamailio to use the next host if the current host fails. I tried just setting up a new branch, but Kamailio keeps using the same SRV entry over and over on that transaction; only a new transaction seems to give Kamailio a chance to use a different SRV entry. Any ideas as to how I can force Kamailio to try the next SRV entry if the first one fails?

For example, I t_relay to 2.domain.com, which has a _sip._udp.2.domain.com entry for hostA and hostB. If hostB is down but Kamailio decides to send to hostB, it just keeps doing so; even though hostA is perfectly up, it doesn't try it. Is there a way to remove hostB from the Kamailio try list? Or, as an alternative, to do a manual SRV lookup on 2.domain.com, put the results in a variable, and go through them one by one manually?

Ideas? Thanks -Eric


marius zbihlei

29 Nov

9:58 a.m.

On 11/29/2010 06:03 AM, Eric Hiller wrote:

...

Sorry all for the second question of the night as I go through working out bugs in my setup. I want to use SRV records to load-balance across hosts. This works great when all the hosts listed in the SRV are up. However, I want Kamailio to use the next host if the current host fails. I tried just setting up a new branch, but Kamailio keeps using the same SRV entry over and over on that transaction; only a new transaction seems to give Kamailio a chance to use a different SRV entry. Any ideas as to how I can force Kamailio to try the next SRV entry if the first one fails?

For example, I t_relay to 2.domain.com, which has a _sip._udp.2.domain.com entry for hostA and hostB. If hostB is down but Kamailio decides to send to hostB, it just keeps doing so; even though hostA is perfectly up, it doesn't try it. Is there a way to remove hostB from the Kamailio try list? Or, as an alternative, to do a manual SRV lookup on 2.domain.com, put the results in a variable, and go through them one by one manually?

Ideas? Thanks -Eric

Hello,

Kamailio does not use ICMP replies to check whether a host is reachable on a UDP sendto(). This means that ICMP port unreachable errors are not handled by K, so the same host is retried on TM retransmits. Host selection for both A records and SRV records is based on the weights returned from the DNS query (so it is balanced).
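To make the weight-based selection concrete, here is a minimal Python sketch that loosely follows the selection rule described in RFC 2782. It is an illustration only, not Kamailio's resolver code; the SrvRecord type, the pick_srv function and the host names are assumptions made up for the example.

```python
import random
from typing import List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    """One SRV answer (hypothetical container, not a Kamailio structure)."""
    priority: int
    weight: int
    port: int
    target: str


def pick_srv(records: List[SrvRecord],
             rng: Optional[random.Random] = None) -> SrvRecord:
    """Pick one record from the lowest-priority group, roughly by weight."""
    if not records:
        raise ValueError("no SRV records to choose from")
    rng = rng or random.Random()
    lowest = min(r.priority for r in records)
    # Zero-weight records are placed first so they keep a small chance.
    group = sorted((r for r in records if r.priority == lowest),
                   key=lambda r: r.weight != 0)
    total = sum(r.weight for r in group)
    threshold = rng.randint(0, total)
    running = 0
    for record in group:
        running += record.weight
        if running >= threshold:
            return record
    return group[-1]  # safety net; the loop returns once running == total


# Assumed example data: two equal-weight hosts and a lower-preference backup.
records = [
    SrvRecord(10, 50, 5060, "hostA"),
    SrvRecord(10, 50, 5060, "hostB"),
    SrvRecord(20, 100, 5060, "hostC"),
]
chosen = pick_srv(records)
```

With equal weights, hostA and hostB each receive about half of the picks, and hostC is only considered once the whole priority-10 group has been ruled out, which this sketch leaves to the caller.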

For an INVITE you can set fr_timer to a very low value (5 seconds) and fr_inv_timer to a larger value. If no 100 is received, then fr_timer kicks in and, depending on the disable_dns_failover core parameter, another host is tried. The order of retry is this: regardless of the disable_dns_failover param, other A records are tried (unless the 0x4 flag is given to t_relay, at least in K 1.5); then, if dns_failover is active, other SRV records; then NAPTR.
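The retry order described above can be sketched as a small Python simulation. This is only a model of the description in this mail, not of the TM module itself: retry_order, relay and the answers callback are hypothetical names, the addresses are documentation-range placeholders, and the final NAPTR step is left out.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


def retry_order(srv_targets: List[str],
                a_records: Dict[str, List[str]],
                dns_failover: bool) -> Iterator[Tuple[str, str]]:
    """Yield (target, ip) candidates: all A records of the current SRV
    target first, then the next SRV target if DNS failover is enabled."""
    for index, target in enumerate(srv_targets):
        if index > 0 and not dns_failover:
            return  # without DNS failover only the first SRV target is used
        for ip in a_records.get(target, []):
            yield target, ip


def relay(srv_targets: List[str],
          a_records: Dict[str, List[str]],
          answers: Callable[[str], bool],
          dns_failover: bool = True) -> Tuple[Optional[str], List[str]]:
    """Try candidates in order; `answers(ip)` stands in for "a reply
    arrived before fr_timer fired". Returns (winning ip or None, tried)."""
    tried: List[str] = []
    for _target, ip in retry_order(srv_targets, a_records, dns_failover):
        tried.append(ip)
        if answers(ip):
            return ip, tried
    return None, tried  # every candidate timed out


# Assumed example: hostB was selected first but is down, hostA is up.
a_records = {"hostB": ["192.0.2.2"], "hostA": ["192.0.2.1"]}
winner, tried = relay(["hostB", "hostA"], a_records,
                      answers=lambda ip: ip == "192.0.2.1")
```

In this model, a short fr_timer simply shortens the time spent on each dead candidate before the next one is tried.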

Check the TM documentation for more info.

Marius

P.S. This is on K 1.5. I hope 3.0/3.1 still has this. I think DNS caching has changed, so you need to recheck the parameters I gave.

Iñaki Baz Castillo

12:16 p.m.

2010/11/29 marius zbihlei marius.zbihlei@1and1.ro:

...

Kamailio does not use ICMP replies to check if a host is reachable on a UDP sendto(). This means that ICMP port unreachable errors are not handled by K and so the same host is retried on TM retransmits.

AFAIR, checking ICMP notifications would be possible using raw sockets (not yet implemented, but possible as I remember from a thread with Andrei). IMHO this would be very useful, because if a UDP port is unreachable and there is an ICMP notification about it, the proxy should generate an internal 503 (transport error) rather than a 408 (fr_timer timeout).

-- Iñaki Baz Castillo ibc@aliax.net

marius zbihlei

1:14 p.m.

On 11/29/2010 02:16 PM, Iñaki Baz Castillo wrote:

...

2010/11/29 marius zbihlei marius.zbihlei@1and1.ro:

...
Kamailio does not use ICMP replies to check if a host is reachable on a UDP sendto(). This means that ICMP port unreachable errors are not handled by K and so the same host is retried on TM retransmits.

AFAIR using raw sockets checking ICMP notifications would be possible (not yet implemented, but possible as I remember from a thread with Andrei).

Possible, but not easy to implement, as ICMP host unreachable errors are delivered asynchronously by the kernel. Also, the current sendto() call does not guarantee delivery on all Unixes (Linux should be fine); connected UDP sockets would have to be used instead.
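As an illustration of the connected-socket idea, here is a small Python sketch. It assumes Linux-like behavior, where an ICMP port unreachable error for a connected UDP socket is reported asynchronously on a later send or receive call. The probe_udp function and its return labels are made up for the example, and this is not how Kamailio sends messages.

```python
import socket


def probe_udp(host: str, port: int, timeout: float = 1.0) -> str:
    """Send one datagram on a connected UDP socket and report what happens.

    Returns "reply" if any datagram comes back, "refused" if the kernel
    reports an ICMP port unreachable error, or "timeout" if nothing is
    heard before the timeout (for example when ICMP is filtered).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on UDP sends nothing; it only fixes the peer address so
        # the kernel can associate incoming ICMP errors with this socket.
        sock.connect((host, port))
        try:
            sock.send(b"\r\n")
            sock.recv(2048)  # a pending ICMP error surfaces here
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "timeout"
        return "reply"
```

In terms of this discussion, "refused" would be the case a proxy could turn into an immediate internal 503, while "timeout" corresponds to today's fr_timer expiry and 408. Whether and when the error is reported depends on the operating system and on any firewalls in the path.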

...

IMHO this would be very useful, because if a UDP port is unreachable and there is an ICMP notification about it, the proxy should generate an internal 503 (transport error) rather than a 408 (fr_timer timeout).

Well, this means that we should disable dns_failover (or equivalents) completely and handle ICMP errors in failure_route blocks (just test whether the transaction issued a 503). If I recall RFC 3263 correctly, this would mean another server discovery (as the new request generates a new transaction), so again there is the possibility that the broken host is selected. If we use this DNS fallback (IMHO this is a nice feature; I personally rely on it), how do we decide to generate a 503?

If the host is already an IP address, then it would be OK to send a 503, as no DNS failover is possible.

Ideas?

Marius

Iñaki Baz Castillo

1:48 p.m.

2010/11/29 marius zbihlei marius.zbihlei@1and1.ro:

...

...
AFAIR using raw sockets checking ICMP notifications would be possible (not yet implemented, but possible as I remember from a thread with Andrei).

Possible, but not easy to implement, as ICMP host unreachable errors are delivered asynchronously by the kernel. Also, the current sendto() call does not guarantee delivery on all Unixes (Linux should be fine); connected UDP sockets would have to be used instead.

...
IMHO this would be very useful, because if a UDP port is unreachable and there is an ICMP notification about it, the proxy should generate an internal 503 (transport error) rather than a 408 (fr_timer timeout).

Well, this means that we should disable dns_failover (or equivalents) completely and handle ICMP errors in failure_route blocks (just test whether the transaction issued a 503).

Hmm, I would expect that when discovering the destination (DNS SRV), N branches are generated in serial forking fashion in case the received response contains various priorities. Am I wrong?

...

If I recall RFC 3263 correctly, this would mean another server discovery (as the new request generates a new transaction), so again there is the possibility that the broken host is selected. If we use this DNS fallback (IMHO this is a nice feature; I personally rely on it), how do we decide to generate a 503?

503 should be the final winning response in case all the branches fail.

...

If the host is already an IP address, then it would be OK to send a 503, as no DNS failover is possible.

Yes.

...

Ideas?

I think that what I've proposed in this mail requires a big change, so... not sure if it's feasible right now.

-- Iñaki Baz Castillo ibc@aliax.net

Eric Hiller

30 Nov

2:35 a.m.

Once I commented out the 3 lines below, failover works fine.

failure_route[1] {
    xlog(" FAILED FAILURE_ROUTE[1]\n");
    if (t_any_timeout()) {
        xlog(" TIMEOUT!\n");
        # append_branch();
        # xlog(" host is now $rd; all is $ru\n");
        # route(1);
    }
}

Just so I understand this correctly: there is no need for a failure route in my case, correct? The only reason I am going to keep it is so that the TIMEOUT! message can notify me of a host failure, and then I can remove the record from DNS so Kamailio doesn't keep trying it.

One curious thing: say the first INVITE gets a 401 Unauthorized from the second server (the one that is online). When the client responds with its second INVITE carrying the authorization info, Kamailio 3.1.0 retries the dead host again, whereas 1.5.5 does not retry the dead host a second time. The 1.5.5 behavior could have been a fluke: it completely requeries DNS for each INVITE, so it may just have been luck that in 10 out of 10 tests it randomly chose the live host for the second INVITE.

Either way this appears to work now.

One question that did result from these tests is that a typical transaction looks like:

0(2856) ERROR: <script>: [Mon Nov 29 20:30:34 2010] INVITE
0(2856) ERROR: <script>: FAILED FAILURE_ROUTE[1]
0(2856) ERROR: <script>: TIMEOUT!
0(2856) ERROR: <script>: [Mon Nov 29 20:30:35 2010] ACK
0(2856) ERROR: <script>: [Mon Nov 29 20:30:35 2010] INVITE

There is no FAILED FAILURE_ROUTE[1] or TIMEOUT! on the second INVITE, even though it did time out. As I am typing this, the idea came to me that the reason it didn't fail the second time around is that it did not receive a 401 the second time. If this is the case, then what happens when the client isn't unauthorized? Or will the server always reply with a 401 the first time?

-Eric

