References: #3635
Early initialization of tls in OpenSSL 3 in rank 0 results in the use of shared state (pointers to the same shared memory location). Note: this is not related to shared memory allocation contention, as that is protected by a (multi-process) futex. Under heavy traffic the workers will corrupt each other's state (race condition). This is only visible under heavy load and is the reason for the intermittent appearance in #3635.
Ping @miconda
1a9b0b6
Since qm/fm are already protected by a multi-process futex, this commit is redundant (it puts a pthread mutex around the futex). I have been able to reproduce OpenSSL 3 crashes under heavy load with this commit.
An example of a shared object is the error state (SSL_get_error). It should be per worker, but
CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring
shows that multiple workers are accessing the same object (in OpenSSL this is ERR_STATE *).
N/A
Dump ERR_STATE * from two different processes: observe that these are identical, meaning both workers are using the same struct.
# !!!!IMPORTANT!!!! 2 == OpenSSL thread local key for ERR_STATE *
(gdb) p err_thread_local
$3 = 2
# this is worker 1, process_no 7
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$1 = (void *) 0x7f1d700a65a0
# this is worker 2, process_no 8, rank 4
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/stun.so...
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--c
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$2 = (void *) 0x7f1d700a65a0
CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring
N/A
If OpenSSL is initialized before fork(), then all workers will reuse global state.
OpenSSL 3 has initialize-once and initialize-once-per-thread states. ERR_STATE * is an example of the initialize-once-per-thread type.
When kamailio does OpenSSL initialization in rank 0, the workers inherit all "global" objects. If these objects are in shared memory, the worker processes will contend for the same state, leading to corruption.
Due to the design of OpenSSL 3, a lot of this state (static variables, one-time initialization) cannot be reinitialized after fork(). "initialize-once-per-thread" state can be reinitialized if the child were to spawn threads.
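The inheritance problem can be reproduced outside kamailio and OpenSSL. The following is a minimal sketch, not the actual kamailio or OpenSSL code: struct fake_err_state and child_shares_parent_tls_state() are hypothetical names, and an anonymous MAP_SHARED mapping stands in for kamailio shared memory. It assumes Linux/glibc, where a forked child keeps the thread-specific data of the thread that called fork(). The parent fills a pthread key before forking, and the child finds the same pointer in its own slot and writes through it into memory the parent also sees.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for OpenSSL's per-thread ERR_STATE. */
struct fake_err_state {
    int last_writer; /* pid of the last process that touched the state */
};

/*
 * Returns 1 if a forked child sees the parent's thread-local pointer and
 * its write through that pointer is visible to the parent, 0 if not,
 * and -1 on setup failure.
 */
int child_shares_parent_tls_state(void)
{
    pthread_key_t key;
    if (pthread_key_create(&key, NULL) != 0)
        return -1;

    /* Assumption: anonymous shared mapping models kamailio shm. */
    struct fake_err_state *st = mmap(NULL, sizeof(*st),
                                     PROT_READ | PROT_WRITE,
                                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    st->last_writer = 0;

    /* "rank 0" fills the thread-local slot before any worker exists. */
    if (pthread_setspecific(key, st) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Worker: the slot is inherited, no per-worker init has run. */
        struct fake_err_state *mine = pthread_getspecific(key);
        if (mine == st)
            mine->last_writer = (int)getpid();
        _exit(mine == st ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    int same_ptr = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int write_visible = (st->last_writer == (int)pid);

    munmap(st, sizeof(*st));
    pthread_key_delete(key);
    return same_ptr && write_visible;
}
```

Calling child_shares_parent_tls_state() from a small main() and checking for a return value of 1 mirrors the gdb session above, where pthread_getspecific(2) returns the same address in two workers. With several workers writing concurrently instead of one, the shared struct becomes a race.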