Hi Daniel,

Thanks a lot for looking at it.

$ ldd /usr/lib/x86_64-linux-gnu/kamailio/modules/tls.so
        linux-vdso.so.1 (0x00007fff997dd000)
        libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007fe40b53c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe40b19d000)
        libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007fe40ad03000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe40aaff000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe40a8e2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe40ba4a000)

$ /usr/sbin/kamailio -I
Print out of kamailio internals
  Version: kamailio 5.3.1 (x86_64/linux)
  Default config: /etc/kamailio/kamailio.cfg
  Default paths to modules: /usr/lib/x86_64-linux-gnu/kamailio/modules
  Compile flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
  MAX_RECV_BUFFER_SIZE=262144
  MAX_URI_SIZE=1024
  BUF_SIZE=65535
  DEFAULT PKG_SIZE=8MB
  DEFAULT SHM_SIZE=64MB
  ADAPTIVE_WAIT_LOOPS=1024
  TCP poll methods: poll, epoll_lt, epoll_et, sigio_rt, select
  Source code revision ID: unknown
  Compiled with: gcc 6.3.0
  Compiled architecture: x86_64
  Compiled on:
Thank you for flying kamailio!

Additional note:
I have tried to better understand the pike module, and after reading the end of the module documentation,
I now have a better understanding of the "Tree of IP" and the settings.

In the pike documentation, the description of each setting should refer to the section "Chapter 3. Developer Guide";
otherwise, the parameters cannot be understood. Also, in my opinion, it is not possible to work out the real time
it takes to remove an IP from the tree (removing it entirely, or only the last node of the IP).

Looking again at my statistics, I feel the first graph is definitely showing an issue. This graph shows
"$stat(location-users)" and "$stat(location-contacts)". During the 10 hours, many users were banned, unregistered, etc.,
so it is really not expected that the number of registered users stays constant. From what I understand, the fact
that the stats went down when the deadlock disappeared obviously means the kamailio threads were in a bad state for the
last 10 hours...

https://www.antisip.com/sip-antisip-com-register/status2.htm  

If you need more information, let me know...
Regards
Aymeric

On Mon, 16 Dec 2019 at 08:22, Daniel-Constantin Mierla <miconda@gmail.com> wrote:

Hello,

can you provide output of ldd for tls.so and output of "kamailio -I" (that's an uppercase i)?

Cheers,
Daniel

On 13.12.19 16:39, Aymeric Moizard wrote:
Hi List,

History:
* In the past, I had a deadlock which was, most probably, related to ssl1.1.
  We discussed this issue, and a fix was supposed to work around the problem that was detected.
* With the latest 5.2.X, I experienced ONCE a similar behavior, with TCP and TLS being mostly stuck. I have not been using this version much, but the fix was supposed to be in the core of kamailio.

The status of the server last night:
* Today I'm running version: kamailio 5.3.1 (x86_64/linux)
* Installed on stretch using the http://deb.kamailio.org/kamailio53 repository.
* This version uses libssl1.1
* A user reported that he can't connect with TCP
* An average of 5000 IPs per 10 minutes are being banned by the pike module
   (the same IP could be counted twice)
Yesterday/Today:
* At the end of the outage, I had 2479 IPs in my ipban htable (which is consistent with my statistics showing 2 bans/IP every 10 minutes = 5000).
* Looking at my logs, it appears that most (ALL?) of the IPs being banned are my regular users.
* Looking at my logs, I can't understand why pike would block them.
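
The arithmetic behind the "2479 IPs is roughly 5000 bans" remark can be checked with a few lines of Python. This is only a sketch: the figure of 2 bans per IP per 10 minutes is the rough estimate quoted above, and the variable names are made up for illustration.

```python
# Rough cross-check of the ban statistics quoted above.
# Assumption: each banned IP is counted about twice per 10-minute window.
banned_ips_in_htable = 2479        # size of the ipban htable at the end of the outage
bans_per_ip_per_window = 2         # estimate taken from the statistics graph

estimated_bans_per_window = banned_ips_in_htable * bans_per_ip_per_window
reported_bans_per_window = 5000    # average reported by the statistics

difference = reported_bans_per_window - estimated_bans_per_window
print(estimated_bans_per_window, difference)
```

The two figures differ by well under 1%, so the htable size and the statistics are at least consistent with each other.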

This is a graph for statistics on my service for the last 24 hours:
https://www.antisip.com/sip-antisip-com-register/status2.html  

Yesterday, at 22:18:39, kamailio started to BAN some IPs. 52 IPs were banned in a period of 10 minutes. I can confirm this from my logs.

My pike configuration is this one:

modparam("pike", "sampling_time_unit", 2)
modparam("pike", "reqs_density_per_unit", 64)
modparam("pike", "remove_latency", 4)

When detecting the issue, this morning, I typed:

$> sudo kamctl stats
$> sudo kamcmd htable.dump ipban
//FAILURE (answer too large...)
$> sudo kamctl trap

Then, I started an agent with TCP and it worked...???
Then, a few seconds, maybe a minute, later:

$> sudo kamcmd htable.dump ipban
//SUCCESS and shows 2479 banned IPs.

and... everything was back to normal within a few minutes.

I haven't restarted kamailio, and all statistics are as expected, as usual.

Thus, it looks like "sudo kamctl trap" has triggered something. I already
experienced a similar behavior when testing my ssl1.1 deadlock last year.

2 questions:
1/ I believe my "pike" configuration should not ban users. Is my pike configuration wrong?
As an example, pike has banned an IP sending one message/second. I believe my configuration should accept that?

2/ Could there still be a TLS issue with libssl1.1?
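
As a rough check for question 1, the sketch below treats pike as a flat per-IP counter: requests seen in one sampling_time_unit window are compared against reqs_density_per_unit. This is a simplifying assumption. The module actually counts hits per node of its IP tree (see the Developer Guide chapter), so real behaviour can differ. The function names and the strict ">" comparison are illustrative only, not taken from the pike source.

```python
# Naive reading of the pike settings quoted above (flat per-IP counter).
SAMPLING_TIME_UNIT_S = 2       # modparam("pike", "sampling_time_unit", 2)
REQS_DENSITY_PER_UNIT = 64     # modparam("pike", "reqs_density_per_unit", 64)


def requests_per_window(rate_per_second, window_s=SAMPLING_TIME_UNIT_S):
    """Number of requests one source sends during a single sampling window."""
    return rate_per_second * window_s


def exceeds_threshold(rate_per_second):
    """True if a steady rate would pass the configured density.

    Assumption: a plain comparison against reqs_density_per_unit; the
    per-node thresholds of the real IP tree are not modelled here.
    """
    return requests_per_window(rate_per_second) > REQS_DENSITY_PER_UNIT


# Highest steady rate that stays at or below the threshold in this model.
max_rate_per_second = REQS_DENSITY_PER_UNIT / SAMPLING_TIME_UNIT_S

print(requests_per_window(1), exceeds_threshold(1), max_rate_per_second)
```

Under this simplified model, one message per second gives 2 requests per window against a limit of 64, so such a source would not be expected to trip the threshold on its own.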

This is the result of the "kamctl trap":


Sorry for the long story & hoping to find a long term solution or at least a workaround!

Regards
Aymeric

--

_______________________________________________
Kamailio (SER) - Users Mailing List
sr-users@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - April 27-29, 2020, in Berlin -- www.kamailioworld.com

