Hello,
We have a weird bug in our Kamailio boxes.
We use the DNS cache (needed for DNS failover to work properly), and on some servers the cache breaks after some time.
The expires value is stuck and does not decrement over time, both on the entry and on its records:
```
kamcmd> dns.lookup A freeswitch
{
	name: freeswitch
	type: A
	size_bytes: 200
	reference_counter: 2
	permanent: no
	expires: 2
	last_used: 0
	negative_entry: no
	records: {
		{
			rr_idx: 0
			rr_ip: 10.0.0.1
			rr_permanent: no
			rr_expires: 2
		}
		{
			rr_idx: 1
			rr_ip: 10.0.0.2
			rr_permanent: no
			rr_expires: 2
		}
		{
			rr_idx: 2
			rr_ip: 10.0.0.3
			rr_permanent: no
			rr_expires: 2
		}
	}
}
```
Expires values are computed each time like this:
```c
now=get_ticks_raw();
expires = (s_ticks_t)(e->expire-now)<0?-1:
	TICKS_TO_S(e->expire-now);
```
So it would mean either:
- TICKS_TO_S(get_ticks_raw()) == 0 when the entry is created
- TICKS_TO_S(get_ticks_raw()) == 0 when the value is printed, i.e. when we execute a dns.lookup through RPC.
Details:
- Kamailio: 5.2.4 (x86_64/linux)
- OS: Debian 9.9
- Running in Docker on a Kubernetes cluster
I cannot say how to reproduce it; it happens occasionally after starting the binary :/
Do you have any idea ?
Kind regards, Mathieu Bodjikian
Little update :
- The problem happens at startup.
- The consequence: get_ticks_raw() always returns the same value (the one set at startup).
So it seems that, sometimes, Kamailio won't boot up correctly, and the timers are not working properly.
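
To make the link between a frozen tick counter and the stuck expires value concrete, here is a minimal sketch of the same expression with the clock passed in as an argument. The typedefs, the tick rate of 16 per second and both function names are assumptions made for this illustration, not copied from the Kamailio sources.

```c
#include <stdio.h>

/* Assumed stand-ins for Kamailio's timer types; the real definitions
 * live in the core timer headers and may differ. */
typedef unsigned int ticks_t;
typedef int s_ticks_t;

#define TICKS_PER_SECOND 16U /* assumed tick rate, for illustration only */
#define TICKS_TO_S(t) ((t) / TICKS_PER_SECOND)

/* Same expression as in the snippet quoted earlier in the thread,
 * with the current tick value passed in instead of read from the timer. */
int remaining_seconds(ticks_t expire, ticks_t now)
{
	return ((s_ticks_t)(expire - now) < 0) ? -1
			: (int)TICKS_TO_S(expire - now);
}

/* Print the value at three successive polls.
 * `step` is how many ticks the clock advances between two polls. */
void poll_three_times(ticks_t expire, ticks_t start, ticks_t step)
{
	for (int i = 0; i < 3; i++)
		printf("poll %d: expires=%d\n", i,
				remaining_seconds(expire, start + (ticks_t)i * step));
}
```

Under these assumptions, calling `poll_three_times(1032, 1000, 16)` counts down through 2, 1, 0, while `poll_three_times(1032, 1000, 0)` (a clock that never advances) reports 2 on every poll, which matches the dns.lookup output above.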
We keep digging to try to find why.
Kind regards, Mathieu Bodjikian
Can you get the output of `kamctl trap` when the issue is exposed? If the timer doesn't work properly, then a lot of other components should be affected.
Hello,
I think we found the bug.
It is related to the libssl 1.1 mutex issue.
We load our LCR rules from a PostgreSQL server over a TLS connection, and I think the mtree module got stuck.
We are trying the custom library « openssl_mutex_shared » from the TLS directory, preloading it before starting Kamailio.
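
As I understand it, the underlying problem is that libssl 1.1 creates its internal locks as ordinary pthread mutexes, which are private to one process by default, while Kamailio shares TLS state across several processes. The sketch below shows the general technique of marking a mutex as process-shared. It is an illustration under that assumption, not the code of the preload library, and both helper names are hypothetical.

```c
#include <pthread.h>

/* Hypothetical helper: prepare a mutex attribute marked process-shared. */
int make_process_shared_attr(pthread_mutexattr_t *attr)
{
	int rc = pthread_mutexattr_init(attr);
	if (rc != 0)
		return rc;
	rc = pthread_mutexattr_setpshared(attr, PTHREAD_PROCESS_SHARED);
	if (rc != 0)
		pthread_mutexattr_destroy(attr);
	return rc;
}

/* Hypothetical helper: initialise a mutex so that it may be locked from
 * several processes, provided the mutex itself lives in memory that is
 * shared between them. */
int init_process_shared_mutex(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int rc = make_process_shared_attr(&attr);
	if (rc != 0)
		return rc;
	rc = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr); /* the mutex keeps its own copy */
	return rc;
}
```

A mutex initialised without this attribute and then used from another process has undefined behaviour under POSIX, which would be consistent with the random freezes described here.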
For now it seems that we do not have freezes anymore. I'll close the issue at the beginning of next week if we have no further problems over the weekend.
This issue is creating wild and random consequences ! What is the status of this regarding libressl development team ?
Kind regards, Mathieu Bodjikian
Interesting - I think we also had reports of a similar issue with the db_mysql module. Today the workaround for the libssl 1.1 problem was added to the core of the 5.3-pre version. It will also be backported to the stable branches after a testing period.
@bodji How does it work out for you, can this be closed? About the question related to the libressl team, I am not aware of any news here. For 5.3.0(-pre) the library workaround has been added to the core.
Hello,
Yes, all is working fine, either with openssl_mutex_shared.so preloaded at start or with the upcoming 5.3.0 release.
I am closing the issue.
Thanks !
Closed #2063.
Thanks for the feedback, I am going to backport to the stable branches.