Hi,
I've identified a lock contention issue in the rtpengine module's ping timer that can significantly impact call processing under certain conditions.
**Problem:** The `rtpengine_ping_check_timer()` function, triggered periodically by `ping_interval`, holds locks on `rtpp_set_list` for the entire duration of pinging all rtpengine nodes.
Normally this works fine, but when rtpengine instances become unreachable, the lock is held for a long time. For example:
- 2 unreachable rtpengine nodes
- 1000 ms command timeout
- 5 retries per node
- locks held for ~10 seconds (2 nodes × 5 retries × 1000 ms)
**Proposed Solution:** Split the ping operation into three phases:
1. **Phase 1 (with lock):** Create a snapshot of node pointers
2. **Phase 2 (without lock):** Perform the actual ping operations on the snapshot
3. **Phase 3 (with lock):** Update node state
Since rtpengine nodes are never freed during runtime, the pointers in the snapshot remain valid throughout the ping cycle, so no lock is required during Phase 2.
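
To make the idea concrete, here is a minimal sketch of the three phases. It is not the actual patch: `struct node`, `struct node_list`, `ping_node()`, `ping_cycle()` and `MAX_NODES` are hypothetical names, a plain `pthread_mutex_t` stands in for the module's own locking, and the real code walks sets of nodes rather than a single flat list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_NODES 64 /* hypothetical cap on the snapshot size */

/* Hypothetical stand-in for an rtpengine node entry. */
struct node {
    int disabled;      /* node state, protected by the list lock */
    int reachable;     /* demo-only flag consulted by ping_node() */
    struct node *next;
};

/* Hypothetical stand-in for the shared node list and its lock. */
struct node_list {
    pthread_mutex_t lock;
    struct node *head;
};

/* Placeholder for the blocking ping (timeouts and retries happen here). */
static int ping_node(const struct node *n)
{
    return n->reachable;
}

/* Runs one ping cycle and returns the number of nodes pinged. */
static size_t ping_cycle(struct node_list *list)
{
    struct node *snap[MAX_NODES];
    int alive[MAX_NODES];
    size_t n = 0, i;
    struct node *p;

    /* Phase 1 (with lock): copy the node pointers into a local snapshot. */
    pthread_mutex_lock(&list->lock);
    for (p = list->head; p != NULL && n < MAX_NODES; p = p->next)
        snap[n++] = p;
    pthread_mutex_unlock(&list->lock);
    assert(n <= MAX_NODES);

    /* Phase 2 (without lock): the slow part. This relies on nodes never
     * being freed at runtime, so the snapshot pointers stay valid. */
    for (i = 0; i < n; i++)
        alive[i] = ping_node(snap[i]);

    /* Phase 3 (with lock): write the results back in one short section. */
    pthread_mutex_lock(&list->lock);
    for (i = 0; i < n; i++)
        snap[i]->disabled = !alive[i];
    pthread_mutex_unlock(&list->lock);

    return n;
}
```

With this shape the lock is held only for two short list walks, so call processing no longer waits on the ping timeouts. Nodes added to the list after Phase 1 are simply picked up on the next timer run.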
I have implemented and tested this change in my deployment. Should I submit a pull request, or is there a preferred alternative approach?
Regards,
Rajneesh