Description

References: #3635

Early initialization of tls in OpenSSL 3 in rank 0 results in the use of shared state (pointers to the same shared memory location). Note: this is not related to shared memory allocation contention as this protected by a (multi-process)futex. Under heavy traffic the workers will corrupt each others state (race condition). This is only visible under heavy loading and is the reason for the intermittent appearance in #3635.

Ping @miconda
1a9b0b6
Since qm/fm are already protected by a multi-process futex this commit is redundant (it puts a pthread mutex around the futex). I have been able to reproduce OpenSSL 3 crashes with heavy loading with this commit.

Example of shared object is the error state (SSL_get_error). It should be per worker but

CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring

shows that multiple workers are accessing the same object (in OpenSSL this is ERR_STATE *).

Troubleshooting

N/A

Reproduction

Debugging Data

Dump ERR_STATE * from two different processes: observe that these are identical meaning both workers are using the same struct.

# !!!!IMPORTANT!!!! 2 == OpenSSL thread local key for ERR_STATE *
(gdb) p err_thread_local
$3 = 2

# this is worker 1, process_no 7
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$1 = (void *) 0x7f1d700a65a0

# this is worker 2, process_no 8, rank 4
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/stun.so...
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--c
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$2 = (void *) 0x7f1d700a65a0

Log Messages

CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring

SIP Traffic

N/A

Possible Solutions

Additional Information


Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <kamailio/kamailio/issues/3695@github.com>