Description

I've recently been experiencing a loop of node removals and additions that leads to "ghost nodes".
Suppose there are three servers: A, B, and C.
Server C goes down uncleanly, so DMQ doesn't notify the other nodes. Server A is the first to send its ping, with a nodelist that includes node C. After fr_timer, the transaction for the message to node C times out and node C is removed from node A's nodelist.
Then node B sends its ping with a nodelist that still includes node C (which B still considers alive). Node A sees node C as a new node and adds it back to its nodelist. Node B then reaches its own fr_timer timeout and removes node C, until node A's next ping advertises it again, and so on. This does not occur if the delta between node A's and node B's pings is less than fr_timer.
What I propose here is that, upon a failed ping, the failing node is put in a disabled state, and it is removed from the nodelist only after a second failed ping. This should prevent dead nodes from coming back.
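
To make the idea concrete, here is a minimal standalone sketch of the two-step removal. It is not the code from this patch and does not use the DMQ module's real types: `nodelist_t`, `node_state_t`, `on_peer_advertises`, `on_ping_failed` and `on_ping_ok` are all hypothetical names. It also assumes that a node already present in the list, even if disabled, is left untouched when a peer advertises it again.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8
#define NAME_LEN 16

/* Hypothetical states and types for illustration only;
 * these are NOT the DMQ module's actual definitions. */
typedef enum { NODE_ACTIVE, NODE_DISABLED } node_state_t;

typedef struct {
    char name[NAME_LEN];
    node_state_t state;
} node_t;

typedef struct {
    node_t nodes[MAX_NODES];
    int count;
} nodelist_t;

/* Return the index of the named node, or -1 if it is not in the list. */
int find_node(const nodelist_t *list, const char *name)
{
    assert(list != NULL && name != NULL);
    for (int i = 0; i < list->count; i++) {
        if (strcmp(list->nodes[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* A peer's ping lists `name`. Only unknown nodes are added.
 * Assumption: a known node (active or disabled) is left as-is,
 * so a disabled node is not revived by a peer's stale nodelist.
 * Returns 1 if the node was added, 0 otherwise. */
int on_peer_advertises(nodelist_t *list, const char *name)
{
    if (find_node(list, name) >= 0 || list->count >= MAX_NODES)
        return 0;
    node_t *n = &list->nodes[list->count++];
    strncpy(n->name, name, NAME_LEN - 1);
    n->name[NAME_LEN - 1] = '\0';
    n->state = NODE_ACTIVE;
    return 1;
}

/* The ping transaction to `name` timed out.
 * First failure: mark the node disabled but keep it in the list.
 * Second failure: remove it. Returns 1 if the node was removed. */
int on_ping_failed(nodelist_t *list, const char *name)
{
    int i = find_node(list, name);
    if (i < 0)
        return 0;
    if (list->nodes[i].state == NODE_ACTIVE) {
        list->nodes[i].state = NODE_DISABLED;
        return 0;
    }
    /* Already disabled: drop it by moving the last entry into its slot. */
    list->nodes[i] = list->nodes[list->count - 1];
    list->count--;
    return 1;
}

/* Assumption: a successful ping re-activates a disabled node.
 * Returns 1 if the node is known, 0 otherwise. */
int on_ping_ok(nodelist_t *list, const char *name)
{
    int i = find_node(list, name);
    if (i < 0)
        return 0;
    list->nodes[i].state = NODE_ACTIVE;
    return 1;
}
```

In the A/B/C scenario, node A's first timeout on C only disables it. When node B's ping then lists C, `on_peer_advertises` finds C already in the list and does nothing, so C is not re-added as a new active node. Only A's second timeout removes it.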


You can view, comment on, or merge this pull request online at:

  https://github.com/kamailio/kamailio/pull/1840
