Hi Igor.

Well, indeed I've already built solutions with heartbeat, but the main goal now is to minimize the downtime of the SIP server.

Heartbeat needs time (how much depends on your configuration) to understand that the primary is down. E.g. if the dead interval is set to 10 seconds and no activity is noticed during this period, the node is considered dead. But if you set this interval lower, e.g. 2-3 seconds, you risk getting flaps (e.g. there is a delay on the IP route from the slave to the primary node, so the slave brings up the shared IP and starts to process calls, while the real master is in fact working fine and is able to communicate).
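To make the trade-off concrete, here is a minimal Python sketch. It is not heartbeat's real implementation; the function name and the sample timestamps are made up for illustration. It only checks whether the gap between two received heartbeats exceeds the dead interval, which is the moment a standby would take over.

```python
def detect_failovers(arrival_times, dead_interval):
    """Return the moments at which a standby would declare the primary dead.

    arrival_times: sorted timestamps (seconds) of heartbeats that were received.
    dead_interval: silence (seconds) after which the peer is considered dead.
    """
    failovers = []
    for previous, current in zip(arrival_times, arrival_times[1:]):
        # A gap longer than the dead interval triggers a takeover,
        # even if the primary was alive and only the network was slow.
        if current - previous > dead_interval:
            failovers.append(previous + dead_interval)
    return failovers


# Hypothetical trace: the primary never died, but one heartbeat was delayed
# and arrived 3.5 s after the previous one.
arrivals = [0.0, 1.0, 2.0, 5.5, 6.5, 7.5]

print(detect_failovers(arrivals, 10.0))  # long interval: no takeover
print(detect_failovers(arrivals, 2.5))   # short interval: a false takeover (flap)
```

With the 10 second interval the delay is ignored, but real outages are also noticed 10 seconds late. With 2.5 seconds the same trace produces a takeover although the primary was healthy.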

So as for heartbeat, I decided to use it only inside the same physical domain, where ucast/bcast packets will reach the other node without any problems.

As for my actual question, I've moved further and now think the following scheme will work fine:

1. NAPTR records for every transport protocol (e.g. pointing to "_sip._udp.domain.org").

2. SRV records for every NAPTR record (e.g. with targets kamailio1.domain.org and kamailio2.domain.org) with the same priority/weight for both of them, to balance half of the INVITEs to the first one and half to the second one.

3. A records for every domain name (e.g. kamailio1.domain.org - 10.0.0.1, 10.0.0.2, where the second one is actually kamailio2; and the same for the FQDN kamailio2.domain.org - 10.0.0.2, 10.0.0.1).
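The half-and-half split from item 2 can be illustrated with a small Python simulation. This is a simplified sketch of SRV target selection (lowest priority first, then a weighted random choice), not the exact RFC 2782 procedure and not what any particular UA implements; the function name, the seed and the number of trials are assumptions for the example.

```python
import random

# (priority, weight, port, target), as in the SRV records of this scheme.
SRV_RECORDS = [
    (10, 1, 5060, "kamailio1.domain.org"),
    (10, 1, 5060, "kamailio2.domain.org"),
]


def pick_srv_target(records, rng):
    """Pick one SRV record: lowest priority wins, ties are split by weight."""
    lowest = min(record[0] for record in records)
    candidates = [record for record in records if record[0] == lowest]
    weights = [record[1] for record in candidates]
    if sum(weights) == 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(2017)  # fixed seed so the run is repeatable
counts = {}
for _ in range(10000):
    target = pick_srv_target(SRV_RECORDS, rng)[3]
    counts[target] = counts.get(target, 0) + 1

print(counts)  # roughly half of the initial requests per server
```

Note that this only describes initial requests. In-dialog requests follow the Record-Route set and are not re-balanced this way.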

So the sequence of dialog actions will be:

1. The INVITE from the UAC is balanced to kamailio1;

2. The dialog is established and the media stream is up;

3. Then kamailio1 goes down;

4. The BYE message tries to reach the host that was set in the rr hf (kamailio1), but kamailio1 (10.0.0.1) is down, so the BYE will be sent to 10.0.0.2 (kamailio2), because 10.0.0.2 is assigned to the kamailio1 FQDN as its second IP.

5. The message will be processed by kamailio2, thanks to the common dialog/usrloc DB.
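Steps 3 and 4 can be sketched in Python as follows. This is only a model of the expected client behaviour, under two assumptions that need to be verified on real UAs: the client tries the A records in the listed order, and it moves on to the next address when the first one does not answer. DNS servers and resolvers may reorder A records, so the order is not guaranteed. The names `A_RECORDS` and `pick_target` are made up for the example.

```python
# A records of the scheme: each FQDN lists its own IP first, the peer second.
A_RECORDS = {
    "kamailio1.domain.org": ["10.0.0.1", "10.0.0.2"],
    "kamailio2.domain.org": ["10.0.0.2", "10.0.0.1"],
}


def pick_target(route_host, a_records, is_reachable):
    """Return the first reachable IP for the Record-Route host, or None.

    is_reachable is a callable(ip) -> bool standing in for a real
    transport-level failure (timeout, ICMP error and so on).
    """
    for ip in a_records[route_host]:
        if is_reachable(ip):
            return ip
    return None


down = {"10.0.0.1"}  # kamailio1 has gone down after the dialog was set up

# The BYE is routed to kamailio1.domain.org but ends up on kamailio2's IP.
print(pick_target("kamailio1.domain.org", A_RECORDS, lambda ip: ip not in down))
```

While both nodes are up the same lookup returns 10.0.0.1, so in-dialog requests stay on the proxy that record-routed the dialog.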

I will try to set it up next week.

In case of success, I will write a short report here.

2017-08-25 17:26 GMT+03:00 Donat Zenichev <donat.zenichev@gmail.com>:

I've searched through the sr-users list and found a few discussions on this topic.

So the approach that (as I think) is most relevant for kamailio failover is a solution with DNS: NAPTR -> SRV records.

Like:

NAPTR record:
"IN NAPTR 10 10 "s" "SIP+D2U" "" _sip._udp.domain.org"

SRV records:
"_sip._udp.domain.org SRV 10 1 5060 kamailio1.domain.org"
"_sip._udp.domain.org SRV 10 1 5060 kamailio2.domain.org"

A records:
"kamailio1 IN A 10.0.0.1"
"kamailio2 IN A 10.0.0.2"

So each kamailio will add an rr with its own hostname, e.g. kamailio1.domain.org.
That way the client will send in-dialog requests to the route with the FQDN kamailio1.domain.org.
And I can't put sip.domain.org into the rr, because then every new request (whether initial or in-dialog) would be sent to either one of the kamailio servers, but I need in-dialog requests to go to the same kamailio.

So for the goal of failover, I need to have more A records, like:
"kamailio1 IN A 10.0.0.1"
"kamailio1 IN A 10.0.0.2"
"kamailio2 IN A 10.0.0.2"
"kamailio2 IN A 10.0.0.1"

And in case kamailio1 goes down, the UAC will have two destination IPs to send the request to: 10.0.0.1 and 10.0.0.2 (where the second one is in fact kamailio2).
So as a result I will have one database for the user location and dialog modules, and load balancing based on the SRV priority/weight fields.

And as failover, A records that make it possible to send requests first to 10.0.0.1 and then to 10.0.0.2 (if the rr was bound to kamailio1).
And otherwise, if the rr was defined as kamailio2, the request first tries 10.0.0.2 (kamailio2) and then 10.0.0.1 (kamailio1).

Am I right at this point?

2017-08-22 21:57 GMT+03:00 Donat Zenichev <donat.zenichev@gmail.com>:
Hi.

I came up with the idea of setting up a test stand with two kamailio servers and one b2bua server (for routing).

The idea is failover for dialogs and transactions.
So if one of the kamailio nodes is down, the other one is able to pick up the dialog and let the users properly end the session.

To explain it better, I will try to describe the idea step by step:
1. The UAC invites the UAS, they complete the three-way handshake, and the media stream is up.
2. The kamailio that processed this dialog goes down.
3. The users decide to end the session with a BYE, but the proxy that processed their three-way handshake is down, so one of the UAs sends the BYE to the destination route that contains a domain name (which both kamailio nodes serve), and the BYE reaches the second kamailio, which should properly end the dialog.
But there is a big "but": this second kamailio has never known about this dialog, it holds no transaction state for it, and furthermore it knows nothing about this call-id.

So the solution, as I think, lies in db mode for user location (columns that contain call-ids, branches etc.).
But I need to be sure that I'm on the right track.
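The shared-state idea can be sketched with a toy Python model. The table and function names below are hypothetical and this is not Kamailio's real dialog or usrloc schema; the in-memory SQLite database merely stands in for a database that both proxies can reach. It shows why a node that never saw the INVITE can still answer the BYE if the dialog was stored centrally.

```python
import sqlite3


def open_shared_store():
    """Create the toy shared store (stand-in for a DB both proxies use)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dialogs ("
        "call_id TEXT PRIMARY KEY, from_tag TEXT, to_tag TEXT, state TEXT)"
    )
    return conn


def record_dialog(conn, call_id, from_tag, to_tag):
    """Called by the node that handled the INVITE once the dialog is confirmed."""
    conn.execute(
        "INSERT INTO dialogs VALUES (?, ?, ?, 'confirmed')",
        (call_id, from_tag, to_tag),
    )
    conn.commit()


def handle_bye(conn, call_id):
    """Called by whichever node receives the BYE, possibly not the first one."""
    row = conn.execute(
        "SELECT state FROM dialogs WHERE call_id = ?", (call_id,)
    ).fetchone()
    if row is None:
        # No shared record: the node cannot match the BYE to a dialog.
        return "481 Call/Transaction Does Not Exist"
    conn.execute(
        "UPDATE dialogs SET state = 'terminated' WHERE call_id = ?", (call_id,)
    )
    conn.commit()
    return "200 OK"


store = open_shared_store()
record_dialog(store, "abc123@uac", "tag-a", "tag-b")  # done by the first node
print(handle_bye(store, "abc123@uac"))                # BYE lands on the second node
print(handle_bye(store, "unknown@uac"))               # dialog was never stored
```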

For the case where one ip is served by two nodes, I have two solutions:

- The first one: I want to create a heartbeat cluster with two kamailio nodes that have one shared ip address, so when one node goes down, the other one brings up the shared ip interface and performs the same actions that the master did.

- Another method is to assign several ip addresses to one domain name (the ip addresses of the different kamailio proxies).

So the goal looks simple. If someone has ever done something like that, I will be glad to read your ideas.

--
--
BR, Donat Zenichev
Wnet VoIP team
Tel: +380(44) 5-900-808
http://wnet.ua
