Hi Daniel,
Thanks for the explanation. I've been doing some testing and I've come across the following situation:
ds_probing_threshold = 1
ds_probing_mode = 0
In the failure route (when a timeout occurs) I do:
ds_mark_dst("ip")
The state changes from active to inactive and the mode is set to probing, which is correct. The dispatcher then sends 3 ping messages to the destination in probing mode, receives no response to the probes, and sets the destination inactive. It looks like probing in the inactive state also honours ds_probing_threshold.
If I wanted to keep pinging the destination while it's down, how would I achieve that? For example, I have a destination in active state and it goes down for some reason. I mark the destination as inactive but want to keep probing it until it comes back. In this case I would always be sending a probe to the destination until it comes back, at which point I receive a 200 OK and dispatcher sets the state back to active.
Currently, what happens is: the destination is active, it crashes, I set state and mode to inactive probing, a probe goes out to the destination, it times out, and dispatcher sets the state to inactive with no probing. Therefore the destination will never be selected again unless it is manually set to active/trying via a fifo command once the gateway is back alive.
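The setup described above could be sketched roughly as follows. This is a minimal illustration, not the exact config under test: the route names RTF_DISPATCH and RELAY and the ds_ping_interval value are assumptions, and the failure conditions are only one plausible way to detect a timeout.

```cfg
# Dispatcher settings from the test above.
modparam("dispatcher", "ds_probing_threshold", 1)
modparam("dispatcher", "ds_probing_mode", 0)
# Assumed value: keepalives are only sent when a ping interval is set.
modparam("dispatcher", "ds_ping_interval", 30)

# Hypothetical failure route, armed earlier with t_on_failure("RTF_DISPATCH").
failure_route[RTF_DISPATCH] {
    if (t_is_canceled()) {
        exit;
    }
    # Local timeout with no reply received from the selected gateway.
    if (t_branch_timeout() and !t_branch_replied()) {
        # Mark the last used destination inactive, with probing mode on.
        ds_mark_dst("ip");
        # Try the next destination in the set, if there is one.
        if (ds_next_dst()) {
            t_on_failure("RTF_DISPATCH");
            route(RELAY);   # RELAY is an assumed relay route
            exit;
        }
    }
}
```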
The kamailio version I was testing with is:
# ./kamailio -V
version: kamailio 3.3.0-dev1 (i386/linux) 0b8f2e
flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 4MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 0b8f2e
compiled on 10:24:56 Oct 28 2011 with gcc 4.1.2
On 27/10/2011 17:49, Daniel-Constantin Mierla wrote:
Hello,
On 10/27/11 5:30 PM, Asgaroth wrote:
Hi Daniel,
[...]
Since with 3.2 it seemed that the capability to go inactive after a certain number of failures (ds_probing_threshold) was lost, there is a new state, 'trying', that can be used for it. This means you can set a destination to the trying state a couple of times before it becomes inactive. In 3.1 this relied on a confusing mechanism based on probing mode.
Can you explain this trying state and "lost capability to go inactive after certain number of failures" a little more please and how it relates to the new trying->inactive states. I would like to understand how these states relate so that I can test better.
I was not using the feature in the past, but from the source code I could see that there was a way not to go directly into probing mode (which in the past meant the gateway was no longer selected), but just to count failures until a threshold was reached and then set probing.
So if threshold was 3 and there were (in 3.1.x-):
ds_mark_dst(p) => state still active (no probing, gateway still selected)
ds_mark_dst(p) => state still active (no probing, gateway still selected)
ds_mark_dst(p) => state goes to probing (inactive, gateway not selected)
Now (3.3.x+), since probing can be always on, even for active destinations (to detect when they go down), you can get behavior like the previous one with the trying state:
ds_mark_dst(t) => state trying (gateway still selected)
ds_mark_dst(t) => state trying (gateway still selected)
ds_mark_dst(t) => state goes to inactive (gateway not selected)
The default failure counter threshold is 1, so the destination goes to inactive as soon as you set trying, but you can change it via the ds_probing_threshold parameter.
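As a rough sketch of the trying-state sequence above with a threshold of 3: the route name RTF_DISPATCH and the failure condition are assumptions for illustration, and the exact counting is as described in this message rather than checked against the source.

```cfg
# Allow three failures before a destination in trying state goes inactive.
modparam("dispatcher", "ds_probing_threshold", 3)

# Hypothetical failure route, armed earlier with t_on_failure("RTF_DISPATCH").
failure_route[RTF_DISPATCH] {
    if (t_is_canceled()) {
        exit;
    }
    if (t_branch_timeout() and !t_branch_replied()) {
        # 1st and 2nd call: state trying, gateway still selected.
        # 3rd call: state goes to inactive, gateway no longer selected.
        ds_mark_dst("t");
    }
}
```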
So right now there are states: active, inactive, trying and disabled, plus modes: probing, not-probing. A destination can be selected only if it is active or trying. It will not be selected if it is inactive or disabled. Probing mode specifies whether keepalives should be sent to destinations; it can be set per address or globally with the module parameter ds_probing_mode. If a keepalive gets no reply, the address is marked as trying first and later becomes inactive if it stays non-responsive.
OK, so if I understand the above paragraph correctly, if I have ds_probing_mode = 0, then I need to set the mode manually to probing for a gateway that has failed "ds_probing_threshold" times? If a server times out and I set state/mode to "ip", then I assume probing will commence. In this case the server will not respond to probe requests (as it has crashed). Does this mean that the state will then change to "trying" because no probe response was received from the destination?
Probing is no longer a gw selection state, but a mode switch to send keepalives or not to a gateway. So if you want these keepalives and ds_probing_mode=0, you have to set 'p' in any of the states in which you want keepalives. Depending on the reply code to the keepalives, the state in probing mode is changed to active if it is 200 OK or a reply code configured in the module parameter, or to trying if it is a failure (which may end up in inactive when the failure threshold is met). ds_probing_mode also controls whether a keepalive reply will maintain the probing mode or not.
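The two ways of enabling keepalives mentioned above might be sketched like this. The ds_ping_interval value is an assumption, and whether keepalives continue once a destination has gone inactive in a given build is not confirmed here.

```cfg
# Option 1 (global): send keepalives to all destinations, whatever their state.
modparam("dispatcher", "ds_ping_interval", 30)   # assumed interval, in seconds
modparam("dispatcher", "ds_probing_mode", 1)

# Option 2 (per destination, with ds_probing_mode=0): add 'p' when marking
# the state, for example in a failure route:
#   ds_mark_dst("ip");   # inactive, keepalives on
#   ds_mark_dst("tp");   # trying, keepalives on
```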
Cheers, Daniel