### Description

During a sipp test with a TLS caller and a TLS callee, running `tls.reload` repeatedly causes Kamailio to disconnect the sipp TLS connection. `tls.reload` then starts reporting an error and does not recover until Kamailio is restarted.
### Troubleshooting
#### Reproduction

1. Write a simple `request_route` in kamailio.cfg, as follows:

```
request_route {
    add_local_rport();
    $du = "sip:192.168.131.190:1001;transport=tls";
    forward();
}
```

2. Start a sipp caller against Kamailio on tls:0.0.0.0:5061; Kamailio relays the request to the callee.
3. At the same time, in another console, run the following commands:

```
((i = 0)); while true; do /opt/kamailio/sbin/kamcmd tls.reload && echo $i && ((i = i+1)); done;
```
#### Debugging Data

None
#### Log Messages

`tls.reload` reports the following error and does not work again until Kamailio is restarted:

```
error: 500 - Error while loading TLS configuration file (consult server log)
```

At the same time, Kamailio disconnects the sipp TLS connection, and the following errors appear in the Kamailio log:

```
Dec 14 10:59:54 localhost sipproxy[64723]: ERROR: tls [tls_server.c:1330]: tls_h_read_f(): protocol level error
Dec 14 10:59:54 localhost sipproxy[64723]: ERROR: tls [tls_server.c:1334]: tls_h_read_f(): src addr: 192.168.131.190:59545
Dec 14 10:59:54 localhost sipproxy[64723]: ERROR: tls [tls_server.c:1337]: tls_h_read_f(): dst addr: 192.168.131.190:5061
Dec 14 10:59:54 localhost sipproxy[64723]: ERROR: <core> [core/tcp_read.c:1478]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7f3f67c89ba8 r: 0x7f3f67c89cd0 (-1)
```

#### SIP Traffic
### Possible Solutions
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.6.2 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled on 10:03:56 Dec 14 2022 with gcc 4.9.4
```
* **Operating System**:
```
Linux localhost.localdomain 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
```
Hello, thanks for the report. The short answer is: don't do that. If you only reload, e.g., every few months for a certificate update, it should be fine. The module could be extended to enforce a reload timer limit, similar to other modules, so this can be considered a feature request. Please note that if nobody creates a pull request, it will be closed after some time. If somebody wants to work on it, some inspiration can be found, e.g., in the permissions module's `reload_delta` parameter.
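For illustration, a reload interval limit of the kind suggested above could look roughly like the sketch below. This is not code from the permissions or tls modules; `reload_guard_t`, `reload_allowed` and `min_delta` are hypothetical names, and a real implementation in a multi-process server would presumably need to keep the timestamp in shared memory and protect it with a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical guard state (illustration only, not Kamailio code).
 * In a multi-process server this would live in shared memory under a lock. */
typedef struct reload_guard {
    time_t last_reload; /* time of the last accepted reload, 0 if none yet */
    int min_delta;      /* minimum seconds between reloads, 0 disables the limit */
} reload_guard_t;

/* Returns true and records `now` if a reload may run at time `now`;
 * returns false, leaving the state untouched, if the last accepted
 * reload was less than `min_delta` seconds ago. */
static bool reload_allowed(reload_guard_t *g, time_t now)
{
    assert(g != NULL);
    if (g->min_delta > 0 && g->last_reload != 0
            && now - g->last_reload < g->min_delta) {
        return false; /* too soon: reject this reload request */
    }
    g->last_reload = now;
    return true;
}
```

An RPC handler would call `reload_allowed(&guard, time(NULL))` first and return an error to the client, without touching the TLS configuration, when it yields `false`.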
Hi, thanks for the reply. I tried loading the permissions module (permissions.so) with the default `reload_delta` of 5, and the issue can still be reproduced: `error: 500 - Error while loading TLS configuration file (consult server log)`. It does not recover until a restart. Frequent `tls.reload` on its own, without the sipp call test, works fine; the issue only appears when the reloads are mixed with the call test.
Actually, in our production environment, a single `tls.reload` per month is enough to cause random crashes.
Regarding the permissions module, there seems to be a misunderstanding. I was referring to a similar mechanism, which is not implemented right now for the TLS module. If you are also seeing this in "normal" use, that is another story; this is of course more serious and should be investigated.
We actually hit this issue on 4.4.7: in our environment `tls.reload` is executed once a month, and some hosts crash while others do not. So we tried to reproduce the issue on the latest 5.6.2. To speed up reproduction, we used sipp together with repeated `tls.reload`. On 5.6.2 the crash does not happen, but the issue described above does.
Is there any fix between 4.4.7 and 5.6.2 that addresses this crash?
Ok, thanks for the clarification. There were many fixes from 4.4.x to 5.6.x regarding the TLS module. You should consider upgrading the Kamailio server, as it will also fix many other bugs.
Ok, we will discuss upgrading later, thank you very much.
We will use the latest 5.6.2 version. We want to try to import this repeated tls.reload issue; is there any advice?
Sorry, I do not understand the question. If you have general questions regarding the usage of the TLS module, we should discuss them on our sr-users mailing list.
`tls.reload` may cause a crash. Are there any suggestions for a modification? We have not been able to find the cause.
We are currently using the latest Kamailio v5.6.2. In our test, following your advice, we increased the interval between `tls.reload` calls to 5 seconds to avoid the reload issue, and then ran the sipp relay test. This time there is no crash, but a very strange log appears in the worker process:

```
Dec 22 05:37:58 localhost sipproxy[63374]: ERROR: tls [tls_server.c:1330]: tls_h_read_f(): protocol level error
Dec 22 05:37:58 localhost sipproxy[63374]: ERROR: tls [tls_server.c:1334]: tls_h_read_f(): src addr: 192.168.131.190:49260
Dec 22 05:37:58 localhost sipproxy[63374]: ERROR: tls [tls_server.c:1337]: tls_h_read_f(): dst addr: 192.168.131.190:5061
Dec 22 05:37:58 localhost sipproxy[63374]: ERROR: <core> [core/tcp_read.c:1478]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7f8a6d2d3f48 r: 0x7f8a6d2d4070 (-1)
```

Do you know the reason? Is it serious, and how can we avoid this error? Thank you very much!
You should only do a TLS reload when you change the certificate. Usually this does not happen often, like once a month or so.
1. Although the reload is a low-frequency operation, we found a serious problem in the way the code uses the OpenSSL API, and this is what causes the issue.
2. Reason: the main process initializes the TLS module first, which initializes the OpenSSL error queue, and only then forks. The child processes do not initialize their own error queue, so multiple processes share the same error memory, and the OpenSSL API calls in multiple child processes may cause a double free when the error queue is freed.
TLS initialization in main.c:

```
#ifdef USE_TCP
#ifdef USE_TLS
    if (!tls_disable){
        if (!tls_loaded()){
            LM_WARN("tls support enabled, but no tls engine "
                    " available (forgot to load the tls module?)\n");
            LM_WARN("disabling tls...\n");
            tls_disable=1;
        } else {
            if (pre_init_tls()<0){
                LM_CRIT("could not pre-initialize tls, exiting...\n");
                goto error;
            }
        }
    }
#endif /* USE_TLS */
#endif /* USE_TCP */
```

In OpenSSL's err.c, multiple child processes share this state, and a double free may occur when the error queue is freed:

```
ERR_STATE *ERR_get_state(void)
{
    ERR_STATE *state;
    int saveerrno = get_last_sys_error();

    if (!OPENSSL_init_crypto(OPENSSL_INIT_BASE_ONLY, NULL))
        return NULL;

    if (!RUN_ONCE(&err_init, err_do_init))
        return NULL;

    state = CRYPTO_THREAD_get_local(&err_thread_local);
    if (state == (ERR_STATE*)-1)
        return NULL;

    if (state == NULL) {
        if (!CRYPTO_THREAD_set_local(&err_thread_local, (ERR_STATE*)-1))
            return NULL;

        if ((state = OPENSSL_zalloc(sizeof(*state))) == NULL) {
            CRYPTO_THREAD_set_local(&err_thread_local, NULL);
            return NULL;
        }

        if (!ossl_init_thread_start(OPENSSL_INIT_THREAD_ERR_STATE)
                || !CRYPTO_THREAD_set_local(&err_thread_local, state)) {
            ERR_STATE_free(state);
            CRYPTO_THREAD_set_local(&err_thread_local, NULL);
            return NULL;
        }

        /* Ignore failures from these */
        OPENSSL_init_crypto(OPENSSL_INIT_LOAD_CRYPTO_STRINGS, NULL);
    }

    set_sys_error(saveerrno);
    return state;
}
```

Best regards,
Song Wei
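The fork scenario described in the analysis above can be illustrated with a small standalone sketch. It assumes, as that comment states, that the error state ends up in memory shared between the processes; the sketch models this with an anonymous `MAP_SHARED` mapping. `demo_err_state_t`, `get_state` and `run_fork_demo` are hypothetical names and none of this is OpenSSL or Kamailio code. It only shows that a lazily created state object, once created before `fork()`, is found and released by every child instead of each child creating its own.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a per-thread error state; NOT the real OpenSSL ERR_STATE. */
typedef struct demo_err_state {
    int release_count; /* how many processes released this single object */
} demo_err_state_t;

/* Plays the role of the thread-local slot: its value is inherited on fork(). */
static demo_err_state_t *cached_state;

/* Lazily creates the state in shared memory. A child that inherits a
 * non-NULL pointer never creates its own copy. */
static demo_err_state_t *get_state(void)
{
    if (cached_state == NULL) {
        void *p = mmap(NULL, sizeof(*cached_state), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        cached_state = p; /* anonymous mappings start zero-filled */
    }
    return cached_state;
}

/* Creates the state before forking, lets each child "release" it, and
 * returns how many releases the single object received (-1 on error). */
static int run_fork_demo(int nchildren)
{
    assert(nchildren >= 0);
    demo_err_state_t *st = get_state(); /* initialized BEFORE fork() */
    if (st == NULL)
        return -1;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: sees the inherited pointer and releases the shared object. */
            demo_err_state_t *mine = get_state();
            __atomic_add_fetch(&mine->release_count, 1, __ATOMIC_SEQ_CST);
            _exit(0);
        }
    }
    while (wait(NULL) > 0) {
        /* reap all children */
    }
    return st->release_count;
}
```

Calling `run_fork_demo(3)` from a fresh process yields one release per child for a single allocation. With a real allocator, the second and later releases would be double frees, which is the hazard the analysis describes.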
This is implemented now with the help of the new `rpc_exec_delta` core parameter.
The issue with openssl error queue initialization will be tracked on #3319.
Closed #3305 as completed.