Hi Igor.
Well, I have indeed already built solutions with heartbeat, but the main goal now is to minimize the time the SIP server is unavailable.
Heartbeat needs time (how much depends on your configuration) to detect that the primary is down. For example, with the dead interval set to 10 seconds, a node is considered dead if no activity is noticed from it during that period. But if you set this interval lower, e.g. 2-3 seconds, you risk flapping: for example, a delay on the IP route from the slave to the primary node makes the slave bring up the shared IP and start processing calls, while the real master is in fact working fine and is still able to communicate.
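To illustrate the trade-off, here is a minimal sketch in Python. It is a simplified model, not heartbeat's actual logic: the function name and the sample gaps are hypothetical, and it only assumes that a node is declared dead once the gap between received heartbeats exceeds the dead interval.

```python
def declared_dead(heartbeat_gaps, dead_interval):
    """Return True if any gap between received heartbeats (in seconds)
    exceeds the configured dead interval. Simplified model only."""
    return any(gap > dead_interval for gap in heartbeat_gaps)


# Hypothetical gaps between heartbeats as seen by the slave.
# The 4-second gap models a transient routing delay, not a real outage.
gaps = [1.0, 1.0, 4.0, 1.0]

# Short interval: the transient delay is mistaken for a dead primary (flap).
flap_with_short_interval = declared_dead(gaps, dead_interval=3)

# Long interval: the same delay is tolerated, at the cost of slower failover.
flap_with_long_interval = declared_dead(gaps, dead_interval=10)

print(flap_with_short_interval, flap_with_long_interval)
```

With the 3-second interval the slave would take over although the primary is alive; with 10 seconds it would not, but a real outage would also take up to 10 seconds to detect.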
So, as for heartbeat, I decided to use it only inside the same physical domain, where ucast/bcast packets reach the other node without any problems.
As for my actual question, I've moved further and now think the following scheme will work fine:
1. NAPTR records for every transport protocol (e.g. pointing to "_sip._udp.domain.org").
2. SRV records for every NAPTR record (e.g. targeting kamailio1.domain.org and kamailio2.domain.org) with the same priority and weight for both, so that half of the INVITEs go to the first one and half to the second one.
3. A records for every host name (e.g. kamailio1.domain.org - 10.0.0.1, 10.0.0.2, where the second address actually belongs to kamailio2).
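The three record levels above can be sketched as plain Python data plus a tiny resolver. This is only an illustration under assumptions: the priority (10), weight (50), port (5060) and the mirrored A records for kamailio2 are made up for the example, and the weighted pick is a simplification of the selection procedure described in RFC 2782.

```python
import random

# Hypothetical zone data mirroring the scheme above.
NAPTR = {"domain.org": ["_sip._udp.domain.org"]}
SRV = {
    "_sip._udp.domain.org": [
        # (priority, weight, port, target) -- values are assumptions
        (10, 50, 5060, "kamailio1.domain.org"),
        (10, 50, 5060, "kamailio2.domain.org"),
    ]
}
A = {
    "kamailio1.domain.org": ["10.0.0.1", "10.0.0.2"],
    # Assumed symmetric entry for the second host.
    "kamailio2.domain.org": ["10.0.0.2", "10.0.0.1"],
}


def pick_srv_target(srv_records, rng):
    """Pick one record among those with the lowest priority value,
    weighted by the 'weight' field (simplified selection)."""
    lowest = min(rec[0] for rec in srv_records)
    candidates = [rec for rec in srv_records if rec[0] == lowest]
    weights = [rec[1] for rec in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def resolve(domain, rng):
    """Follow NAPTR -> SRV -> A and return (target host, [(ip, port), ...])."""
    srv_name = NAPTR[domain][0]
    _prio, _weight, port, target = pick_srv_target(SRV[srv_name], rng)
    return target, [(ip, port) for ip in A[target]]


rng = random.Random(0)
picks = [resolve("domain.org", rng)[0] for _ in range(1000)]
share_kamailio1 = picks.count("kamailio1.domain.org") / len(picks)
print(f"share of requests sent to kamailio1: {share_kamailio1:.2f}")
```

With equal priority and weight, roughly half of the simulated requests land on each host, which is the balancing effect the scheme relies on.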
So the sequence of dialog actions will be:
1. The INVITE from the UAC is balanced to kamailio1;
2. The dialog is established and the media stream is up;
3. Then kamailio1 goes down;
4. The BYE message tries to reach the host that was set in the Record-Route header field (kamailio1), but kamailio1 (10.0.0.1) is down, so the BYE is sent to 10.0.0.2 (kamailio2) instead, because 10.0.0.2 is assigned to the kamailio1 FQDN as its second IP;
5. The message is processed by kamailio2, thanks to the common dialog/usrloc DB.
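The failover in step 4 can be sketched like this. It is a simplified model with hypothetical function names: it assumes the UAC tries the A records of the Record-Route host in the listed order and moves on to the next address when one does not answer. Whether a real client does this depends on its DNS failover implementation, and a DNS server may also rotate the order of the addresses.

```python
def pick_reachable_address(route_host, a_records, is_up):
    """Try each address of the Record-Route host in order and
    return the first reachable one, or None if all are down."""
    for ip in a_records[route_host]:
        if is_up(ip):
            return ip
    return None


# A records from the example: the second address belongs to kamailio2.
a_records = {"kamailio1.domain.org": ["10.0.0.1", "10.0.0.2"]}

# kamailio1 has gone down after the dialog was established.
down = {"10.0.0.1"}

bye_target = pick_reachable_address(
    "kamailio1.domain.org", a_records, lambda ip: ip not in down
)
print(bye_target)
```

While kamailio1 is alive the same call returns 10.0.0.1, so in-dialog requests keep going to the original proxy until it actually fails.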
I will try to set it up next week.
In case of success, I will write a short report here.