I'm not sure if I can participate, but one issue that could be discussed is how to distribute load across many worker processes when one Kamailio forwards requests to another over TCP or TLS.
If I have understood correctly, by default the sending Kamailio reuses an existing TCP or TLS connection, and on the receiving Kamailio that connection is handled by a single worker process.
If so, one solution could be a new core function similar to set_forward_no_connect(), e.g., set_forward_connect(<number>), where <number> tells how many parallel connections are desired to the destination. t_relay() would then set up a new connection if <number> had not yet been reached.
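To make the proposed behaviour concrete, here is a minimal C sketch of the selection logic t_relay() might apply per destination. It is purely illustrative: struct conn_pool, open_connection() and select_connection() are hypothetical names, not existing Kamailio code, and connections are simulated as integer ids. It assumes, as the post implies, that each newly opened connection may end up on a different worker at the receiving side. Until <number> connections exist, a new one is opened; after that, existing ones are reused round-robin.

```c
#include <assert.h>

/* Hypothetical sketch only: none of these names exist in Kamailio. */
#define MAX_POOL 16

struct conn_pool {
    int max_conns;          /* <number> from the proposed set_forward_connect() */
    int count;              /* connections opened so far to this destination */
    int conn_ids[MAX_POOL]; /* stand-ins for real TCP/TLS connection handles */
    int next;               /* round-robin cursor over existing connections */
};

static int next_conn_id = 1;

/* Stand-in for opening a new TCP/TLS connection; returns a fake id. */
static int open_connection(void)
{
    return next_conn_id++;
}

/* Pick the connection a relayed request should use. */
static int select_connection(struct conn_pool *p)
{
    int limit = p->max_conns;

    /* Clamp the requested parallelism to a sane range. */
    if (limit < 1)
        limit = 1;
    if (limit > MAX_POOL)
        limit = MAX_POOL;

    /* Limit not reached yet: open another parallel connection. */
    if (p->count < limit) {
        int id = open_connection();
        p->conn_ids[p->count++] = id;
        return id;
    }

    /* Limit reached: reuse existing connections in turn. */
    assert(p->count > 0);
    int id = p->conn_ids[p->next];
    p->next = (p->next + 1) % p->count;
    return id;
}
```

With max_conns set to 3, the first three requests open connections 1, 2 and 3, and later requests cycle through them again. Round-robin is just one possible reuse policy; picking the least loaded connection would be another.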
I'm sure there are other solutions too.
-- Juha