Hi there,
We are encountering consistent segfaults after restarting our Kamailio instance while it is receiving incoming traffic, specifically with Kamailio 5.7.4. We believe this issue did not occur with version 5.7.2, so it seems to have been introduced in either 5.7.3 or 5.7.4.
Due to limited team bandwidth and the potential impact on production traffic, we do not want to spend time trying to reproduce the issue, so we have decided to downgrade to 5.6.4, which we have confirmed to be stable. (5.7.2 probably would be too, but we did not try it.)
Unfortunately, our logging was only set to WARNING level, and we did not capture a core dump, so we cannot provide additional details beyond the following logs:
This was with tcp_reuse_ports=yes:
2024-05-17T15:42:55.582475541Z Listening on
2024-05-17T15:42:55.582512370Z [redacted]
2024-05-17T15:42:55.582538161Z tls: 10.X.X.X:5061 advertise Y.Y.Y:5061
2024-05-17T15:42:55.582543750Z Aliases:
2024-05-17T15:42:55.582549081Z tls: [redacted]:5061
2024-05-17T15:42:55.582574890Z
2024-05-17T15:42:55.587876630Z 0(1) WARNING: tls [tls_init.c:978]: tls_h_mod_init_f(): openssl bug #1491 (crash/mem leaks on low memory) workaround enabled (on low memory tls operations will fail preemptively) with free memory thresholds 18874368 and 9437184 bytes
2024-05-17T15:42:55.703927049Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 23
2024-05-17T15:42:55.703972029Z 0(1) ALERT: <core> [main.c:791]: handle_sigs(): child process 15 exited by a signal 11
2024-05-17T15:42:55.703978409Z 0(1) ALERT: <core> [main.c:795]: handle_sigs(): core was generated
2024-05-17T15:42:55.705049839Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 17
2024-05-17T15:42:55.705074209Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 21
2024-05-17T15:42:55.705081209Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 22
2024-05-17T15:42:55.705085879Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 20
2024-05-17T15:42:55.705090319Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 18
2024-05-17T15:42:55.705094649Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 19
2024-05-17T15:42:55.705098879Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 16
2024-05-17T15:42:55.705207399Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 15
2024-05-17T15:42:55.705459439Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 27
Without tcp_reuse_ports=yes, the segfault was always preceded by the following line whenever existing TLS connections were stuck in TIME_WAIT:
2024-05-16T19:18:51.654447639Z 9(14) WARNING: {1 1 INVITE XXX@0.0.0.0} <core> [core/tcp_main.c:1301]: find_listening_sock_info(): binding to source address 10.X.X>X:5061 failed: Address already in use [98]
2024-05-16T19:18:51.746994728Z 0(1) ALERT: <core> [main.c:791]: handle_sigs(): child process 14 exited by a signal 11
When the server was not handling any traffic, the issue did not occur, even on 5.7.4.
Does anyone have any insights or suggestions on how to address this issue?
Kind regards,
Stefan