### Description
References: #3635
Early initialization of TLS in OpenSSL 3 in rank 0 results in the use of shared state
(pointers to the same shared memory location). Note: this is not related to shared memory
allocation contention, as that is protected by a (multi-process) futex. Under heavy traffic the
workers will corrupt each other's state (race condition). This is only visible under heavy
load and is the reason for the intermittent appearance in #3635.
Ping @miconda
https://github.com/kamailio/kamailio/commit/1a9b0b63617afebcee2aecb3b2240d7…
Since qm/fm are already protected by a multi-process futex, this commit is redundant (it
puts a pthread mutex around the futex). I have been able to reproduce OpenSSL 3 crashes
under heavy load with this commit applied.
An example of a shared object is the error state (`SSL_get_error`). It should be per worker, but the log message
```
CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed
pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls:
tls_init.c: ser_free(535) - ignoring
```
shows that multiple workers are accessing the same object (in OpenSSL this is `ERR_STATE
*`).
### Troubleshooting
N/A
#### Reproduction
* create a heavy load of TLS clients
* observe shm logging errors
#### Debugging Data
Dump `ERR_STATE *` from two different processes: observe that these are identical, meaning
both workers are using the same struct.
```
# !!!!IMPORTANT!!!! 2 == OpenSSL thread local key for ERR_STATE *
(gdb) p err_thread_local
$3 = 2
# this is worker 1, process_no 7
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1,
timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$1 = (void *) 0x7f1d700a65a0
# this is worker 2, process_no 8, rank 4
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/stun.so...
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--c
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1,
timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$2 = (void *) 0x7f1d700a65a0
```
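The following is a minimal sketch of the mechanism, not Kamailio or OpenSSL code. It assumes plain POSIX threads, uses an anonymous `MAP_SHARED` mapping as a stand-in for Kamailio shm, and uses a hypothetical `struct fake_err_state` in place of `ERR_STATE`. A thread-local slot populated before `fork()` keeps its value in the child, and because the pointer targets shared memory, a write in the child is visible to the parent.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for OpenSSL's per-thread ERR_STATE. */
struct fake_err_state {
    int last_error;
};

/*
 * Populates a thread-local slot before fork(), lets the child write
 * through the inherited pointer, and returns the value the parent
 * reads back afterwards (-1 on any setup failure).
 */
int demo_shared_tls_after_fork(void)
{
    pthread_key_t key;
    struct fake_err_state *st;
    pid_t pid;
    int status = 0;
    int seen = -1;

    /* Stand-in for shm: anonymous memory that stays shared across fork(). */
    st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    st->last_error = 0;

    if (pthread_key_create(&key, NULL) != 0) {
        munmap(st, sizeof(*st));
        return -1;
    }

    /* "Rank 0" style initialization: the slot is set before fork(). */
    if (pthread_setspecific(key, st) == 0 && (pid = fork()) >= 0) {
        if (pid == 0) {
            /* Worker: the key and its value were inherited from the parent. */
            struct fake_err_state *mine = pthread_getspecific(key);
            assert(mine == st); /* same pointer value as in the parent */
            mine->last_error = 42;
            _exit(0);
        }
        if (waitpid(pid, &status, 0) == pid
                && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            seen = st->last_error; /* the child's write, seen by the parent */
    }

    pthread_key_delete(key);
    munmap(st, sizeof(*st));
    return seen;
}
```

Identical pointer values in two forked processes are expected on their own; the problem arises only because the pointed-to memory is shared rather than copy-on-write.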
#### Log Messages
```
CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed
pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls:
tls_init.c: ser_free(535) - ignoring
```
#### SIP Traffic
N/A
### Possible Solutions
- avoid doing TLS initialization in rank 0; this will require cooperation from all modules
that use OpenSSL 3 themselves. If an OpenSSL-using module initializes state before
`fork()`, then all workers will reuse the same global state
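A rough sketch of the idea in generic POSIX C, under the same assumptions (an anonymous `MAP_SHARED` arena standing in for shm, a hypothetical `struct worker_state`, and `N_WORKERS` chosen arbitrarily). Nothing is stored in the thread-local slot before `fork()`, so each worker starts with an empty slot and sets up its own state even though all state lives in the shared arena.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N_WORKERS 4

/* Hypothetical per-worker state kept in a shared arena. */
struct worker_state {
    int owner_rank; /* rank + 1 of the worker that initialized it */
    int writes;     /* number of writes made to this entry */
};

/*
 * Forks N_WORKERS children that each initialize their thread-local slot
 * only after fork(). Returns how many arena entries were touched by
 * exactly one worker, the expected one (-1 on failure).
 */
int count_private_states(void)
{
    pthread_key_t key;
    struct worker_state *arena;
    int rank, spawned = 0, failed = 0, ok = 0;

    /* Anonymous mappings are zero-filled, so all entries start at 0. */
    arena = mmap(NULL, N_WORKERS * sizeof(*arena), PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED)
        return -1;

    /* Creating the key early is harmless: no value is stored yet. */
    if (pthread_key_create(&key, NULL) != 0) {
        munmap(arena, N_WORKERS * sizeof(*arena));
        return -1;
    }

    for (rank = 0; rank < N_WORKERS; rank++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {
            struct worker_state *mine;
            /* Worker init runs after fork(): the slot is still empty. */
            assert(pthread_getspecific(key) == NULL);
            if (pthread_setspecific(key, &arena[rank]) != 0)
                _exit(1);
            mine = pthread_getspecific(key);
            mine->owner_rank = rank + 1;
            mine->writes++;
            _exit(0);
        }
        spawned++;
    }

    while (spawned-- > 0) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status)
                || WEXITSTATUS(status) != 0)
            failed = 1;
    }

    for (rank = 0; rank < N_WORKERS; rank++)
        if (arena[rank].owner_rank == rank + 1 && arena[rank].writes == 1)
            ok++;

    pthread_key_delete(key);
    munmap(arena, N_WORKERS * sizeof(*arena));
    return failed ? -1 : ok;
}
```

This only illustrates the ordering (populate per-worker state after `fork()`); it says nothing about which OpenSSL calls are safe to defer.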
### Additional Information
- OpenSSL 3 has initialize-once and initialize-once-per-thread states. An example of
this is `ERR_STATE *` - this is of the type initialize-once-per-thread.
- when Kamailio does OpenSSL initialization in rank 0, the workers inherit all
"global" objects. If these objects are in shared memory, then the worker
processes will contend for the same state, leading to corruption
- due to the design of OpenSSL 3, a lot of this state (static variables, one-time
initialization) cannot be reinitialized after `fork()`.
"initialize-once-per-thread" state can be reinitialized if the child were to spawn
threads.
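The last point relies on a POSIX guarantee: a newly created thread starts with `NULL` in every thread-specific slot, regardless of what the creating thread holds. A small sketch with plain pthreads (the `struct probe` helper and function names are made up for illustration, and `fork()` is left out for brevity; the same rule applies to threads spawned in a forked child):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper passed to the probing thread. */
struct probe {
    pthread_key_t key;
    int slot_was_empty;
};

/* Thread body: record whether this thread's slot starts out empty. */
static void *probe_slot(void *arg)
{
    struct probe *p = arg;
    p->slot_was_empty = (pthread_getspecific(p->key) == NULL);
    return NULL;
}

/*
 * Returns 1 if a newly spawned thread sees an empty slot while the
 * spawning thread's slot is populated, 0 if not, -1 on failure.
 */
int new_thread_starts_with_empty_slot(void)
{
    static int inherited_state; /* stand-in for state set up before fork() */
    struct probe p = { .slot_was_empty = 0 };
    pthread_t tid;
    int rc = -1;

    if (pthread_key_create(&p.key, NULL) != 0)
        return -1;

    if (pthread_setspecific(p.key, &inherited_state) == 0
            && pthread_create(&tid, NULL, probe_slot, &p) == 0
            && pthread_join(tid, NULL) == 0) {
        /* The spawning thread still holds the inherited pointer. */
        assert(pthread_getspecific(p.key) == &inherited_state);
        rc = p.slot_was_empty;
    }

    pthread_key_delete(p.key);
    return rc;
}
```

A library that lazily allocates per-thread state on first use would therefore allocate a fresh object in such a thread instead of reusing the inherited one.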