<!-- Kamailio Pull Request Template -->
<!--
IMPORTANT:
  - for detailed contributing guidelines, read:
    https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
  - pull requests must be done to master branch, unless they are backports
    of fixes from master branch to a stable branch
  - backports to stable branches must be done with 'git cherry-pick -x ...'
  - code is contributed under BSD for core and main components (tm, sl, auth, tls)
  - code is contributed GPLv2 or a compatible license for the other components
  - GPL code is contributed with OpenSSL licensing exception
-->
#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on sr-dev mailing list -->
- [X] Commit message has the format required by CONTRIBUTING guide
- [X] Commits are split per component (core, individual modules, libs, utils, ...)
- [X] Each component has a single commit (if not, squash them into one commit)
- [X] No commits to README files for modules (changes must be done to docbook files in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change
- [ ] Small bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [X] Tested changes locally
- [X] Related to issue #3297
#### Description
<!-- Describe your changes in detail -->
The setup is a Kamailio instance with no available rtpengine (e.g. it has a list of rtpengine URLs, but none of them is reachable).

I recently had a crash similar to #3297, but on an outdated Kamailio version (5.5.7). I tried to reproduce it on 5.5.7 and 5.8.3 but could not crash either of them. However, I managed to get logs similar to the ones seen right before the crash happened:
```
2024-10-04T17:39:58.026206+03:00 kamailio[450237]: ERROR: {1 1 INVITE 203-450290@192.168.100.93} <core> [core/action.c:1595]: run_actions(): alert - action [corefunc (16)] cfg [/home/stefan/kamailio.cfg:625] took too long [29310906 us]
```
...directly related to how long the rtpengine_manage() function took to execute (about 29.3 seconds in the log above). So SIP routing is delayed by that amount.
I tracked this in the code down to where this lock is acquired when the "aggressive_redetection" modparam is enabled: https://github.com/kamailio/kamailio/blob/66fe6eb71e58a02222d1a2fb00f9a0cdb8...
I will double-check that part of the code, since I don't think getting the lock is necessary. It only updates a value inside a node in the list of nodes and does not change the list links at all. (This will be in another PR.)
For now I propose disabling the aggressive_redetection mechanism by default, since it delays the SIP routing logic when no rtpengines are available (and in some cases leads to crashes in the transaction module).

You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/3992
-- Commit Summary --
* rtpengine: disable aggressive redetection by default
-- File Changes --
M src/modules/rtpengine/config.c (2)
M src/modules/rtpengine/doc/rtpengine_admin.xml (6)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/3992.patch
https://github.com/kamailio/kamailio/pull/3992.diff
I did a few more tests with a large list of 16 rtpengines, all disabled. SIP routing is still delayed even without the above lock_get(). The delay depends on the number of Kamailio processes used and on the "rtpengine_disable_tout" modparam.
So I think this is yet another reason to disable "aggressive_redetection" by default.
Seems like a smart change to me.
> I will double-check that part of the code, since I don't think getting the lock is necessary. It only updates a value inside a node in the list of nodes and does not change the list links at all. (This will be in another PR.)
I think you're right. However, you would need a memory fence to safely access the values in shared memory without a lock. Also, without the lock, you might get multiple processes doing the health check at the same time, but that's not a big deal. If you want to be fancy, you could make it so that other processes skip the instance while some process is busy doing the health check, and only use it (or continue skipping it) after the health check is done.
Merged #3992 into master.