### Description
I have two JSON-RPC servers configured with different priorities. For testing, I have the servers configured to always delay the response to any request by more than the module's timeout setting.
The (initial) request is sent to the first server. As this one times out, I would expect a retry to go to the second server, but instead all retries are sent to the same server; the backup server is never contacted. This makes the whole "priority" system seem rather useless.
### Troubleshooting
#### Reproduction
```
modparam("janssonrpcc", "server", "conn=test;addr=pc1;port=8081;priority=5;weight=10")
modparam("janssonrpcc", "server", "conn=test;addr=pc1;port=8082;priority=5;weight=10")
```
```
janssonrpc_request("test", "Test.Timeout", '[ { "Timout": 1000} ]', "route=JSONRPC_RESPONSE;retry=10;timeout=1000");
```
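For the failover case from the description, where the two servers have distinct priorities, the configuration would presumably look like the following (the concrete priority values here are illustrative only, meant as two distinct priority levels):
```
modparam("janssonrpcc", "server", "conn=test;addr=pc1;port=8081;priority=10;weight=10")
modparam("janssonrpcc", "server", "conn=test;addr=pc1;port=8082;priority=5;weight=10")
```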
#### Log Messages
Kamailio produces no useful log messages beyond the ones below; I verified the described behavior on the JSON-RPC server itself.
```
2023-02-23T16:59:34.585346+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:361]: bev_connect(): Connecting to server pc1:8081 for conn rating.
2023-02-23T16:59:34.585420+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:361]: bev_connect(): Connecting to server pc1:8082 for conn rating.
2023-02-23T16:59:34.585446+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:290]: bev_connect_cb(): Connected to host pc1:8081
2023-02-23T16:59:34.585462+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:290]: bev_connect_cb(): Connected to host pc1:8082
2023-02-23T17:05:10.903398+01:00 pc1 proxy1[340870]: WARNING: janssonrpcc [janssonrpc_request.c:247]: schedule_retry(): Number of retries exceeded. Failing request.
```
### Possible Solutions
Combining retries with a timeout and priorities is tricky: which server should be tried when? Retrying only within the first priority level makes the lower-priority servers completely useless, while moving to the next priority level on every retry skips possibly still-useful high-priority servers and may exhaust the list of candidate servers very quickly.
The best solution, IMHO, would be to first try every server in the highest priority level before moving on to the next one, without (exponential) backoff between these steps.
If there are still retries remaining after that, wrap around to the highest priority level and apply the exponential backoff delay.
With the above, failover considers all servers, failover between servers is fast, and no single server is overloaded.
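A rough sketch of that retry order, assuming the server list is kept sorted from most- to least-preferred priority (the struct and function names here are hypothetical, not the module's actual internals):
```c
#include <stddef.h>

/* Hypothetical server entry; illustrative only, not janssonrpcc's
 * actual data structure. The list is assumed to be sorted from
 * most- to least-preferred priority. */
typedef struct server {
    int priority;
    struct server *next;
} server_t;

/* Pick the server for the next retry attempt: walk the whole list once,
 * so every server of one priority level is tried before falling through
 * to the next level. Only when the list is exhausted do we wrap around
 * to the head again, and only that wrap-around enables the exponential
 * backoff delay. */
server_t *next_retry_server(server_t *list, server_t *last_tried, int *backoff)
{
    server_t *s = (last_tried != NULL) ? last_tried->next : list;
    if (s != NULL) {
        *backoff = 0;   /* still in the current pass: retry immediately */
        return s;
    }
    *backoff = 1;       /* all servers tried once: wrap around with backoff */
    return list;
}
```
Within one priority level, the weight-based selection could still apply, as long as a server that was already tried in the current pass is skipped.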
BTW, if I configure multiple servers with the same priority, it seems to randomly select one of them for every (re)try; it never selects one from the next priority level.
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
5.6.1
```
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
Closed #3378 as completed.
Reopened #3378.
Closed #3378 as completed.
Reopened #3378.
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
Well, IMHO this is just rude. Reporting bugs takes a lot of time, and threatening to close them without any interaction is an insult to the users trying to help improve the product.
@gaaf: you have to be patient, because this is a process that we are trying to put in place in order to optimise/automate some devel-related tasks, while a couple of people are meeting right now in Dusseldorf at the Kamailio Developers Meeting.
If someone needs to keep an issue or pull request open, they can do so; the message says to remove the stale label.
For the record, the bot apparently removes the stale flag automatically now when there is activity.
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
Closed #3378 as not planned.