Hi,
I've identified a lock contention issue in the rtpengine module's ping timer that can significantly impact call processing under certain conditions.
**Problem:**
The `rtpengine_ping_check_timer()` function, triggered periodically by `ping_interval`, holds locks on `rtpp_set_list` for the entire duration of pinging all rtpengine nodes.
Normally, this works fine, but when rtpengine instances become unreachable, the lock is held for a long time. For example:
- 2 unreachable rtpengine nodes
- 1000ms command timeout
- 5 retries per node
- locks held for ~10 seconds (2 nodes × 5 retries × 1000ms)
**Proposed Solution:**
Split the ping operation into three phases:
1. **Phase 1 (with lock):** Create a snapshot of node pointers
2. **Phase 2 (without lock):** Perform actual ping operations on the snapshot
3. **Phase 3 (with lock):** Update node state
Since rtpengine nodes are never freed during runtime, the pointers in the snapshot remain valid throughout the ping cycle, so no lock is required while the pings are in progress.
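To illustrate the idea, here is a minimal standalone sketch of the three phases. It is not the module's code: `struct node`, `struct node_list`, `ping_node()`, `ping_all()` and `MAX_NODES` are hypothetical stand-ins, a plain pthread mutex replaces the module's own locking, and the ping is a stub that treats any URL starting with `!` as unreachable. It also assumes, as stated above, that nodes are never freed while the snapshot is in use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16 /* hypothetical upper bound for the snapshot */

/* Hypothetical stand-in for an rtpengine node entry. */
struct node {
    const char *url;   /* immutable after creation in this sketch */
    bool disabled;     /* shared state, protected by the list lock */
    struct node *next;
};

/* Hypothetical stand-in for the locked node list. */
struct node_list {
    pthread_mutex_t lock;
    struct node *head;
};

/* Stub for the blocking ping. A real ping may block for
 * timeout * retries on an unreachable node. */
static bool ping_node(const struct node *n)
{
    return n->url[0] != '!';
}

/* Pings every node while holding the lock only for the short
 * snapshot and update steps. Returns the number of nodes pinged. */
static size_t ping_all(struct node_list *list)
{
    struct node *snap[MAX_NODES];
    bool alive[MAX_NODES];
    size_t count = 0, i;

    /* Phase 1 (with lock): copy node pointers into a local snapshot. */
    pthread_mutex_lock(&list->lock);
    for (struct node *n = list->head; n && count < MAX_NODES; n = n->next)
        snap[count++] = n;
    pthread_mutex_unlock(&list->lock);
    assert(count <= MAX_NODES);

    /* Phase 2 (without lock): slow network I/O happens here, so
     * other processes are not blocked while a node times out. */
    for (i = 0; i < count; i++)
        alive[i] = ping_node(snap[i]);

    /* Phase 3 (with lock): write the results back to shared state. */
    pthread_mutex_lock(&list->lock);
    for (i = 0; i < count; i++)
        snap[i]->disabled = !alive[i];
    pthread_mutex_unlock(&list->lock);

    return count;
}
```

With this structure the lock is held only for two short list walks per cycle, regardless of how long the pings themselves take. Nodes added to the list after Phase 1 are simply picked up on the next timer run.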
I have implemented and tested this change in my deployment. Should I submit a pull request, or is there a preferred alternative approach?
Regards,
Rajneesh