(Apologies for the long wall of text)
Ideas for developer meeting 2: Rearchitect TLS
Background: Kamailio uses OpenSSL across fork()ed worker processes for load balancing (TLS followed by SIP). The TLS state must live in shared memory so that each worker can pick up where the previous worker left off.
Over the years this has caused much friction: only OpenSSL 3 and wolfSSL can be used in this manner, because they provide hooks for redirecting memory management.
Problems with the current architecture: the original OpenSSL 3+ has gone heavily down the path of using pthreads, especially thread-local variables.
OpenSSL lookalikes (boringSSL, libreSSL, AWS-LC) do not support redirecting memory management functions to shared memory pools.
A pure GPL library like GnuTLS also does not provide memory management hooks.
Today Kamailio is "fighting" its primary TLS provider, namely OpenSSL 3.
We require pthread symbol overrides (which is a code smell) and have to resort to thread-local tricks so that the workers have a clean state.
With OpenSSL 3, Kamailio users have an unhappy relationship with memory management. Structures (e.g., SSL_CTX) are duplicated per worker to avoid unexpected failures when an SSL object created in worker A is used in worker B.
Using a single SSL_CTX in the main process - in an attempt to conserve memory - has not been solved yet (my assessment is that this problem can be solved within the current architecture).
History: it seems the Kamailio code base has vestiges of a time when TLS was handled in core (possibly in the TCP manager process?).
Proposal 2: the spirit of this proposal is that we should not fight the libraries we use, especially when the library is a key component of many distributions/containers, with many eyes on the way we use and abuse OpenSSL.
The Zen of OpenSSL (a term I just made up) would suggest we embrace their path forward and have Kamailio work within that boundary.
The GPL license of Kamailio would also suggest that we validate this approach against the ability to use something like GnuTLS.
This proposal is to perform TLS in a thread pool in the TCP manager, so that all TLS-related operations are confined to a single process.
When the TCP manager terminates or initiates TLS, it should perform the TLS work over a hairpin socketpair (managed by a thread pool). The socket fd exchanged with the worker via sendmsg/recvmsg is this internal proxied socket, but it carries metadata about the original connection.
In other words: the Kamailio TCP manager internally implements a TLS/TCP bridge, just like Nginx/HAProxy do with HTTPS/HTTP or TLS/TCP. These proxies send headers so the worker is informed about the nature of the original connection.
Benefits
- use OpenSSL/AWS-LC/boringSSL/libreSSL in the way they are intended to be used; even GnuTLS
- no pthreads hackery
- a well-defined boundary for TLS operations, certificate and key management: allowing for easier scrutiny and audit
The offloading of both TLS and SIP to the workers dates from the days when threading was poor on Linux. Today that is no longer the case, so the TLS manager could handle all TLS while the workers handle SIP/TCP.
In fact this proposal was inspired by the recent work on UDP enhancement.
Optional riff: we may be able to offload TLS entirely to HAProxy (or any other TLS/TCP bridge). Instead of handling TLS ourselves, Kamailio creates a new type of listener/speaker, haproxy-in and haproxy-out. Kamailio "knows" that these sockets are not SIP-over-TLS traffic but internal hairpins that decrypt/encrypt the streams via HAProxy et al. Think of Kamailio as using HAProxy as a filter object: sendmsg/recvmsg will use the haproxy-in socket (which is what the worker will see: TCP only). See the sidenote below.
t_relay would have to be taught that it doesn't need to look for "real" TLS sockets; a proxy socket should suffice instead.
Sidenote:
Today TLS can already be handled with external TLS/TCP bridges. The main issue with this approach is that the config file occasionally needs to force TCP, otherwise Kamailio will look for a non-existent TLS socket, so it is not an entirely happy experience for users.
Richard (Shih-Ping)
Hello,
this has been considered for a while; your work with tls_threads_mode pushed it out of focus, but it probably needs to be addressed anyhow. I am not sure whether the time at the developers meeting is enough (usually 2 days), unless many people work on it. But if there is interest and other people want to contribute, we can start earlier with some online meetings (e.g., via Jitsi Meet; some of us host their own instance) to discuss and eventually code/test together.
Cheers, Daniel
On 28.06.25 03:22, Richard Chan via sr-dev wrote:
Hello,
some updates: during the past several days I added a new mode controlled by the new parameter tcp_main_threads. If set to 1, the tcp main process creates a thread for each Kamailio process, and the TLS read and encode operations are done by these threads. Each Kamailio process, when it needs to do a TLS read or encode, passes the task to the corresponding tcp main thread via in-memory IPC (pipes and shared memory).
At first sight, the TLS read and encode operations seemed to be the ones sensitive to libssl; the TLS connection close/cleanup seems to be executed by the tcp main process anyhow (or by the main process at instance shutdown).
I tested locally with sipp 3.6.x compiled locally (as 3.7.x from the deb package fails to work for TLS), with tcp_main_threads set to both 0 (old style) and 1 (multi-threaded style), sending 3600 registrations with limit 1000 and rate 500 (sipp -m 3600 -l 1000 -r 500 ...), spiralling once back to Kamailio on different ports (registrations received on 5061, then spiralled once to one of the ports in the range 5070-5077). The execution time was more or less the same. In mode 0, a couple of Kamailio processes used 15-20% CPU; in mode 1, the Kamailio main tcp process (now multi-threaded) used about 50% CPU. Cumulatively, CPU usage seemed to be the same in both cases.
Memory functions for libssl are still set to shm wrappers, since Kamailio processes may need to access various attributes (e.g., connection certificates), but the encryption/decryption operations should happen inside a single process (the tcp main process).
Help with testing different scenarios with tcp_main_threads=1 would be appreciated, as well as with reviewing whether other TLS API callbacks (tls hooks) should be executed via the tcp main process threads. This should validate whether this solution is suitable for libssl3 and its multi-thread-oriented design.
Cheers, Daniel
On 01.07.25 14:31, Daniel-Constantin Mierla wrote:
-- Daniel-Constantin Mierla (@ asipto.com) twitter.com/miconda -- linkedin.com/in/miconda Kamailio Consultancy, Training and Development Services -- asipto.com
Hi Daniel - great work.
On Tue, 8 Jul 2025, 19:20 Daniel-Constantin Mierla, miconda@gmail.com wrote:
Thanks Daniel! Just to be clear: you are converting TLS function calls into RPC-style calls, where the 'remote' actually does all the work?
The remote in this case is a multi-threaded service that handles the actual OpenSSL function calls (IOW, a TLS-offloading service).
Regards Richard
Hello,
On 08.07.25 15:55, Richard Chan wrote:
yes, it is rpc-style, passing a pointer to a structure in shared memory to the tcp-main-process threads via in-memory pipes, so nothing goes over the net.
I think it can be optimized more over time, by allocating the tls-encoding input buffers in shared memory at startup and reusing them, and maybe other operations ... but for the PoC I didn't want to change much in the existing code, and the initial tests do not show any noticeable difference between running classic style vs. the new style.
The spiralling over 8 connections back to Kamailio was done to also simulate trunk-style connections, with many messages on the same connection. It probably needs to be double-checked, but the read/encode TLS callbacks should run under the tcp-connection read/write locks, which should make them thread safe; if not, some pthread mutexes have to be added, but they would still pair with the tcp-connection read/write locks, so it should not make a big difference compared with the old style.
Cheers, Daniel