### Description References: #3635
Early initialization of tls in OpenSSL 3 in rank 0 results in the use of shared state (pointers to the same shared memory location). Note: this is not related to shared memory allocation contention as this protected by a (multi-process)futex. Under heavy traffic the workers will corrupt each others state (race condition). This is only visible under heavy loading and is the reason for the intermittent appearance in #3635.
Ping @miconda https://github.com/kamailio/kamailio/commit/1a9b0b63617afebcee2aecb3b2240d76... Since qm/fm are already protected by a multi-process futex this commit is redundant (it puts a pthread mutex around the futex). I have been able to reproduce OpenSSL 3 crashes with heavy loading with this commit.
Example of shared object is the error state (SSL_get_error). It should be per worker but
``` CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring ```
shows that multiple workers are accessing the same object (in OpenSSL this is `ERR_STATE *`).
### Troubleshooting N/A
#### Reproduction * create heaving loading of TLS clients * observe shm logging errors
#### Debugging Data
Dump `ERR_STATE *` from two different processes: observe that these are identical meaning both workers are using the same struct. ``` # !!!!IMPORTANT!!!! 2 == OpenSSL thread local key for ERR_STATE * (gdb) p err_thread_local $3 = 2
# this is worker 1, process_no 7 Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". 0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout); (gdb) p pthread_getspecific(2) $1 = (void *) 0x7f1d700a65a0
# this is worker 2, process_no 8, rank 4 Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/stun.so... Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". --Type <RET> for more, q to quit, c to continue without paging--c 0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout); (gdb) p pthread_getspecific(2) $2 = (void *) 0x7f1d700a65a0 ``` #### Log Messages ``` CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring ```
#### SIP Traffic
N/A
### Possible Solutions
- avoid doing TLS initialization in rank 0; this will require cooperation from all modules that use OpenSSL 3 themselves. If an OpenSSL-using module initializes state before `fork()` then all workers will reuse global state
### Additional Information
- OpenSSL 3 has initialize-once and initialize-once-per-thread states. An example of this is `ERR_STATE *` - this is of the type initialize-once-per-thread.
- when kamailio does OpenSSL initialization in rank 0, the workers inherit all "global" objects. If these objects are in shared memory then the worker processes will contend for the same state leading to corruption
- due to the design of the OpenSSL 3 alot of this state (static variables, one time initialization) cannot be reinitialized after `fork()`. "initialize-once-per-thread" can be reinitialized if the child were to spawn threads.
RFC: OpenSSL 3 POC branch: https://github.com/kamailio/kamailio/tree/space88man/openssl3-poc
What about OpenSSL 1.1.1?
This is also affected by this bug — but 1.1.1 does much less dynamic memory management in `crypto/err/err.c` so attempted double-free is a much lesser occurrence.
Thank you for the detailed analysis and also the POC. I remember from a custom module that I maintained for some year that using threads in Kamailio can bring some challenges due to its multi-process architectures. So its probably need to be discussed more thoroughly. The other option you mentioned would be to just don't do the TLS initialization in rank 0, right? If we need to touch all openssl using modules anyway, maybe this is an easier and less intrusive way?
...just don't do the TLS initialization in rank 0, right? If we need to touch all openssl using modules anyway, maybe this is an easier and less intrusive way?
One solution is to have each module declare a `mod_init_openssl()`; then have a helper function to run `mod_init_openssl` in a transient thread
``` pthread_create(... mod_init_openssl, ) // do OpenSSL stuff here pthread_join(...) ```
Then this thread will disappear after `mod_init` in rank 0—all the OpenSSL thread-local varables in rank 0(thread#1) will be "clean".
BTW this study explains why even OpenSSL 1.1.1 is so odd - per child replicated `SSL_CTX*`, and RNG replacement with `RAND_set_rand_method`. The root cause is the same: there are thread-local variables in rank 0(thread#1) that are replicated in the workers—after `fork()` OpenSSL doesn't properly reinitialize these states.
I have also gone back to look at the OpenSSL 1.1.1 implementation - by putting all initialization (`SSL_CTX_new`, `tls_fix_domains_cfg` etc) into a transient thread none of the workarounds are necessary any more(!) - in particular the `tls_rand.c` stuff is not needed.
To be clear, the dlsym-pthreads stuff(`src/main.c`) is still needed to handle multi-process locks.
@space88man: thanks for digging deep into this one!
I am fine to try the proposed approach in the PR #3696 and see how it goes, the only question would be about the impact of other libs that link behind with libssl, like libcurl or libmysqlclient. Does a similar approach needs for the modules linked with such libraries?
Regarding multi-threading in Kamailio, there are couple of modules already using threads, like `lwsc`, `secsipid/_proc`. Issues with multi-threading and multi-processing is when there are active threads at `fork()` time, but if I got right the PR #3696, the process with rank 0 waits for the created libssl-initialization thread to terminate. Thats why secsipid_proc was split out of secsipid, to bind to it after fork(), which then initialize the golang library (that creates internally threads for memory management). lwsc creates threads only inside child processes, which is after fork().
I have mentioned this before as an issue and it was a driving factor behind creating the API to the curl module. Having multiple modules using Curl that initiates OpenSSL (or non-curl modules initiating OpenSSL) will lead to problems. I remember that Kevin Fleming while working with Asterisk wrote a wrapper library that initialised OpenSSL once only for all modules. I have no idea if that still exists, but if it does, it could be an inspiration.
The tls module initializes very early the libssl, which was ok for libssl 0.9.x, 1.0.x. With libssl 1.1.x we had to import random number generator to go around some of the libssl-specific multi-threading. With libssl 3.x seems to be more impact, somehow related to what was introduced in libssl 1.1.x, but expanded to other globals.
In other words, I don't think that a wrapper library done long time ago, pre libssl 1.1.x/3.x can help nowadays.
Ok thanks guys for your feedback: I am going to proceed with a series of commits to master. These can then be reverted easilyt. To each commit message I will also add the label thread-local to facilitate review and search.
@miconda you are correct regarding libssl-initialization threads in rank 0; they are run-and-done type are will complete before `fork()`.
@space88man: one remark regarding the commit message format in the PR #3696, do not make first line like:
``` tls: POC for OpenSSL 3 ```
Where I assume POC stands for `proof of concept`. Make it more explicit about what the commit does, maybe like:
``` tls: init libssl in separate thread for v3.+ ```
The initial set of commits are in. Verification: OpenSSL 3: - after kamailio is started, gdb attach rank 0 PID, verify that
(gdb) p pthread_getspecific(err_thread_local) # should be 0x0
OpenSSL 1.1.1: - after kamailio is started, gdb attach rank 0 PID and a worker PID, verify that
# in rank 0 and worker rank these pointers should be different (gdb) p ERR_get_state() # in rank 0 these two values should be 0x0 (gdb) p pthread_getspecific(public_drbg) (gdb) p pthread_getspecific(private_drbg)
Closing now with commits on master—reopen with separate issues for OpenSSL 1.1.1/OpenSSL 3.
Closed #3695 as completed.