Have you done a test with tools such as sipp, or was this happening
after a while, with usual phones registering?
Can you list the tcp connections and see if they are listed?
kamctl tcp core.tcp_list
Cheers,
Daniel
On 25.03.19 08:03, Kristijan Vrban wrote:
The solution
here is to use set_reply_no_connect()
implemented it. Now the issue has shifted
to:
ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum
number of connections exceeded: 2048/2048
But not a single TCP connection is active between Kamailio and any
device. Seems this counter for maximum number of connections
now has an issue?
Kristijan
Am Mi., 20. März 2019 um 15:07 Uhr schrieb Daniel-Constantin Mierla
<miconda(a)gmail.com>om>:
> Hello,
>
> based on the trap output I think I could figure out what happened there.
>
> You have tcp_children to very low value (1 or so), the problem is not
> actually that one, but the fact that the connection to upstream (the
> device/app sending the request) was closed after receiving the request
> and routing of the reply gets stuck in the way of:
>
> - a reply is received and has to be forwarded
> - connection was lost, so Kamailio tries to establish a new one, but
> takes time till fails because the upstream is behind nat or so based on
> the via header:
>
> Via: SIP/2.0/TLS
>
10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
>
> - the reply is retransmitted and gets to another worker, which tries
> to forward it again, but discovers a connection structure for that
> destination exists (created by previous reply worker) and now waits for
> the connection to be released (or better said, for the mutex on writing
> buffer to be unlocked)
>
> - as the second reply waits, there can be other retransmissions of the
> reply ending up in other workers stuck on waiting for the mutex of the
> connection write buffer
>
> The solution here is to use set_reply_no_connect() -- you can put it
> first in request_route block. I think this would be a good addition to
> the default configuration file as well, IMO, the sip server should not
> connect for sending replies and should do it also for requests that go
> behind nat.
>
> Cheers,
> Daniel
>
> On 19.03.19 10:53, Kristijan Vrban wrote:
>> So i had again the situation. But this time, incoming udp was
>> affected. Kamailio was sending out OPTIONS (via dispatcher module) to
>> a group of asterisk machines
>> but the 200 OK reply to the OPTIONS where not processed, so the
>> dispatcher module set all asterisk to inactive, even though they
>> replied 200 OK
>>
>> Attached the output of kamctl trap during the situation. Hope there is
>> any useful in it. Because after "kamctl trap" it was working again
>> without kamailio restart.
>>
>> Best
>> Kristijan
>>
>> Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla
>> <miconda(a)gmail.com>om>:
>>> Hello,
>>>
>>> setting tcp_children=1 is not a god option for scallability, practically
>>> you set kamailio to process a single tcp message at one time, on high
>>> traffic, that won't work well.
>>>
>>> Maybe try to set tcp_children to 2 or 4, that should make an eventual
>>> race appear faster.
>>>
>>> Regarding the pid, if it is an outgoing connection, then it can be
>>> created by any worker process, including a UDP worker, if that was the
>>> one receiving the sip message over udp and sends it out via tcp.
>>>
>>> Cheers,
>>> Daniel
>>>
>>> On 18.03.19 10:09, Kristijan Vrban wrote:
>>>> Hi Daniel,
>>>>
>>>> for testing, i now had set: "tcp_children=1" and so far this
issue did not occur
>>>> ever since. So now value to provide for "kamctl trap" yet.
>>>>
>>>> "kamctl ps" show this two process to handle tcp:
>>>>
>>>> ...
>>>> }, {
>>>> "IDX": 25,
>>>> "PID": 71929,
>>>> "DSC": "tcp receiver (generic) child=0"
>>>> }, {
>>>> "IDX": 26,
>>>> "PID": 71933,
>>>> "DSC": "tcp main process"
>>>> }
>>>> ...
>>>>
>>>>
>>>> Ok, but then is was wondering to see a TCP connection on a udp receiver
child:
>>>>
>>>>
>>>> netstat -ntp |grep 5061
>>>>
>>>> ...
>>>> tcp 0 0 172.17.217.10:5061 195.70.114.125:18252
>>>> ESTABLISHED 71895/kamailio
>>>> ...
>>>>
>>>> An pid 71895 is:
>>>>
>>>> }, {
>>>> "IDX": 3,
>>>> "PID": 71895,
>>>> "DSC": "udp receiver child=2
sock=127.0.0.1:5060"
>>>> }, {
>>>>
>>>>
>>>>
>>>> And if i look into it via "lsof -p 71895" (the udp receiver
child)
>>>>
>>>> ...
>>>> kamailio 71895 kamailio 14u sock 0,9 0t0
>>>> 8856085 protocol: TCP
>>>> kamailio 71895 kamailio 15u sock 0,9 0t0
>>>> 8886886 protocol: TCP
>>>> kamailio 71895 kamailio 16u sock 0,9 0t0
>>>> 8854886 protocol: TCP
>>>> kamailio 71895 kamailio 17u sock 0,9 0t0
>>>> 8828915 protocol: TCP
>>>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0
>>>> 1680314 type=DGRAM
>>>> kamailio 71895 kamailio 19u IPv4 1846523 0t0
>>>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
>>>> kamailio 71895 kamailio 20u sock 0,9 0t0
>>>> 8887192 protocol: TCP
>>>> kamailio 71895 kamailio 21u sock 0,9 0t0
>>>> 8813634 protocol: TCP
>>>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0
>>>> 1681407 type=STREAM
>>>> kamailio 71895 kamailio 23u sock 0,9 0t0
>>>> 8850488 protocol: TCP
>>>> ...
>>>>
>>>> Not only the ESTABLISHED TCP session. But also this empty sockets
>>>> "protocol: TCP"
>>>> What are they doing there in the udp receiver? Is that how it's
supposed to be?
>>>>
>>>> Kristijan
>>>>
>>>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla
>>>> <miconda(a)gmail.com>om>:
>>>>> Can you get file written by `kamctl trap`? It should have the
backtrace
>>>>> for all kamailio processes. You need latest kamailio 5.2.
>>>>>
>>>>> Also, get the output for: kamctl ps
>>>>>
>>>>> Cheers,
>>>>> Daniel
>>>>>
>>>>> On 14.03.19 13:52, Kristijan Vrban wrote:
>>>>>> When i attach via gdb to one of the tcp worker, i see this:
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized
out>,
>>>>>> expected=1, futex_word=0x7fdaeca92f8c) at
>>>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>> #1 futex_wait_simple (private=<optimized out>,
expected=1,
>>>>>> futex_word=0x7fdaeca92f8c) at
../sysdeps/nptl/futex-internal.h:135
>>>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at
>>>>>> pthread_rwlock_wrlock.c:67
>>>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from
>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>> #4 0x00007fdaf08e1c08 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>> #5 0x00007fdaf08a6f69 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from
>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>> #7 0x00007fdaf0c31144 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>> #8 0x00007fdaf0c2bddb in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>> #9 0x00007fdaf0c22858 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from
>>>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98,
>>>>>> error=0x7ffffe2a2df0) at tls_server.c:422
>>>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98,
>>>>>> flags=0x7ffffe2c318c) at tls_server.c:1116
>>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98,
>>>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
>>>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98,
>>>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at
>>>>>> core/tcp_read.c:1496
>>>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98,
events=1,
>>>>>> idx=-1) at core/tcp_read.c:1862
>>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0
<io_w>,
>>>>>> t=2, repeat=0) at core/io_wait.h:1065
>>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at
>>>>>> core/tcp_read.c:1974
>>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at
core/tcp_main.c:4853
>>>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
>>>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at
main.c:2675
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban
>>>>>> <vrban.lkml(a)gmail.com>om>:
>>>>>>> Hi, with full debug is see this in log for every incoming TCP
SIP request:
>>>>>>>
>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]:
DEBUG:
>>>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING:
no free tcp
>>>>>>> receiver, connection passed to the least busy one (105)
>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]:
DEBUG:
>>>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected
tcp worker 2
>>>>>>> 27(17937) for activity on [tls:172.17.217.10:5061],
0x7fdaeda8f928
>>>>>>>
>>>>>>> So the Kamailio TCP process is working, and received TCP
traffic. But
>>>>>>> the tcp workers are somehow busy.
>>>>>>>
>>>>>>> When i attach via strace to the TCP worker, i do not see any
activity. Just:
>>>>>>>
>>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>>>>>>
>>>>>>> and nothing, even when i see the main tcp process choose this
worker process.
>>>>>>>
>>>>>>> Kristijan
>>>>>>>
>>>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban
>>>>>>> <vrban.lkml(a)gmail.com>om>:
>>>>>>>> first of all thanks for the feedback. i prepared our
system now to run
>>>>>>>> with debug=3
>>>>>>>> I hope to see more then then.
>>>>>>>>
>>>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan
Vrban
>>>>>>>> <vrban.lkml(a)gmail.com>om>:
>>>>>>>>> Hi kamailios,
>>>>>>>>>
>>>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio.
After a day or
>>>>>>>>> so, Kamailio stop to process incoming SIP traffic via
TCP. The
>>>>>>>>> incoming TCP network packages get TCP-ACK from the OS
(Debian 9,
>>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show
any processing for
>>>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing.
While traffic via
>>>>>>>>> UDP is working just totally fine.
>>>>>>>>>
>>>>>>>>> When i look via command "netstat -ntp" is
see, that the Recv-Q get
>>>>>>>>> bigger and bigger. e.g.:
>>>>>>>>>
>>>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address
State PID/Program
>>>>>>>>> name tcp 4566 0 172.17.217.12:5060
xxx.xxx.xxx.xxx:57252 ESTABLISHED
>>>>>>>>> 31347/kamailio
>>>>>>>>>
>>>>>>>>> After Kamailio restart, all is working fine again for
a day. We have
>>>>>>>>> maybe 10-20 devices online via TCP and low call
volume (1-2 call per
>>>>>>>>> minute). The only settings for tcp we have is
"tcp_delayed_ack=no"
>>>>>>>>>
>>>>>>>>> How to could we debug this situation? Again, no
error, no warings in
>>>>>>>>> the log. Just nothing.
>>>>>>>>>
>>>>>>>>> Kristijan
>>>>>> _______________________________________________
>>>>>> Kamailio (SER) - Users Mailing List
>>>>>> sr-users(a)lists.kamailio.org
>>>>>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>>>> --
>>>>> Daniel-Constantin Mierla --
www.asipto.com
>>>>>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
>>>>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
>>>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA
--
www.asipto.com
>>>>>
>>>> _______________________________________________
>>>> Kamailio (SER) - Users Mailing List
>>>> sr-users(a)lists.kamailio.org
>>>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>> --
>>> Daniel-Constantin Mierla --
www.asipto.com
>>>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
>>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA --
www.asipto.com
>>>
> --
> Daniel-Constantin Mierla --
www.asipto.com
>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA --
www.asipto.com
>