vingarzan left a comment (kamailio/kamailio#4191)
Hey Morten!
a seg-fault is indeed better. From my perspective it gives the core-dump/back-trace to be able to fix the code, immediately when the abuse potentially happens. IMHO the peer pointer should always be reset on lock release, because any use beyond that is dangerous (something might free the memory while we're not holding the lock). The optimization of not having to call `get_peer_by_fqdn()` again and re-lock the proper way is not worth the risk. That's why I'm advocating for hiding `p->lock` and instead having a `unlock_peer(peer **p)` which forces `*p = 0` inside.
I was thinking that problems are most likely to happen around timers, simply because otherwise the probability of 2+ other operations happening in parallel is much lower, normally due to normal networking delay. One way to accelerate it then for testing is to reduce the interval at which the timer runs, even to something ridiculously low. Good code should never block or crash.
Re sticky peers by session, I wasn't sure why that would be needed really. Normally I would see a production topology having some DRAs (>1 for redundancy) in-between Kamailio and some Charging Servers, with the message paths being redundant and just a matter of random distribution. The `cdp` stack task is just to pick the next hop of those paths.
But if you don't have a DRA, then I could see why you'd want to have stickiness. Yet I would've implemented it differently maybe. I guess what you'd want and would be well aligned to the Diameter and 3GPP specs would be to set the Destination-Host AVP in the requests (response are routed differently, so not affected). Then let the `cdp` handle routing to peers on its own: would see the Destination-Host as a hint to check if that's an immediate next-hop and prefer it before actually trying to use the realm/app-id routes to decide on another egress connection. Or in short: I don't think that the `cdp` message routing routines need to care about sessions at all, just leave the standard "marker" in the message for them to follow.
I'm not using myself this code in production, so I'm mostly helping here the community a bit. I'd love to overhaul the `cdp` stack a bit though, since I think some parts right now need quite a bit of cleaning to keep them safe. That's why I was offering a call, to maybe see if we could hatch a plan. But to be direct, I'm lacking also a bit the time to spend on it.
Cheers, -Dragos