And again one more kamctl trap file where
set_reply_no_connect was set.
Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
<vrban.lkml(a)gmail.com>om>:
Attached also the output of kamctl trap
Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
<vrban.lkml(a)gmail.com>om>:
>
> > Have you done a test with tools such as sipp, or was this happening
> > after a while, with usual phones registering?
>
> Usual variety of devices registering via TLS. But i can not exclude
> that some devices displaying behavioural problems.
>
> > Can you list the tcp connections and see if they are listed?
> > kamctl tcp core.tcp_list
>
> Need Kex module for that? So i can deliver next time. But when i do
> "lsof -u kamailio |grep TCP"
> i get a long list of more then 2000 lines with:
>
> ...
> kamailio 37561 kamailio 2105u sock 0,9 0t0
> 27856287 protocol: TCP
> kamailio 37561 kamailio 2106u sock 0,9 0t0
> 27856305 protocol: TCP
> kamailio 37561 kamailio 2107u sock 0,9 0t0
> 27856306 protocol: TCP
> kamailio 37561 kamailio 2108u sock 0,9 0t0
> 27856914 protocol: TCP
> ...
>
> So about the time Kamailio created a lot of socket in the TCP domain,
> but which are not bound to any port (eg via connect(2) or listen(2) or
> bind(2))
> Until we get to the maximum number of 2048 connections.
>
> Best
> Kristijan
>
> Am Mo., 25. März 2019 um 14:27 Uhr schrieb Daniel-Constantin Mierla
> <miconda(a)gmail.com>om>:
> >
> > Have you done a test with tools such as sipp, or was this happening
> > after a while, with usual phones registering?
> >
> > Can you list the tcp connections and see if they are listed?
> >
> > kamctl tcp core.tcp_list
> >
> > Cheers,
> > Daniel
> >
> > On 25.03.19 08:03, Kristijan Vrban wrote:
> > >> The solution here is to use set_reply_no_connect()
> > > implemented it. Now the issue has shifted to:
> > >
> > > ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum
> > > number of connections exceeded: 2048/2048
> > >
> > > But not a single TCP connection is active between Kamailio and any
> > > device. Seems this counter for maximum number of connections
> > > now has an issue?
> > >
> > > Kristijan
> > >
> > > Am Mi., 20. März 2019 um 15:07 Uhr schrieb Daniel-Constantin Mierla
> > > <miconda(a)gmail.com>om>:
> > >> Hello,
> > >>
> > >> based on the trap output I think I could figure out what happened
there.
> > >>
> > >> You have tcp_children to very low value (1 or so), the problem is not
> > >> actually that one, but the fact that the connection to upstream (the
> > >> device/app sending the request) was closed after receiving the
request
> > >> and routing of the reply gets stuck in the way of:
> > >>
> > >> - a reply is received and has to be forwarded
> > >> - connection was lost, so Kamailio tries to establish a new one,
but
> > >> takes time till fails because the upstream is behind nat or so based
on
> > >> the via header:
> > >>
> > >> Via: SIP/2.0/TLS
> > >>
10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
> > >>
> > >> - the reply is retransmitted and gets to another worker, which
tries
> > >> to forward it again, but discovers a connection structure for that
> > >> destination exists (created by previous reply worker) and now waits
for
> > >> the connection to be released (or better said, for the mutex on
writing
> > >> buffer to be unlocked)
> > >>
> > >> - as the second reply waits, there can be other retransmissions of
the
> > >> reply ending up in other workers stuck on waiting for the mutex of
the
> > >> connection write buffer
> > >>
> > >> The solution here is to use set_reply_no_connect() -- you can put it
> > >> first in request_route block. I think this would be a good addition
to
> > >> the default configuration file as well, IMO, the sip server should
not
> > >> connect for sending replies and should do it also for requests that
go
> > >> behind nat.
> > >>
> > >> Cheers,
> > >> Daniel
> > >>
> > >> On 19.03.19 10:53, Kristijan Vrban wrote:
> > >>> So i had again the situation. But this time, incoming udp was
> > >>> affected. Kamailio was sending out OPTIONS (via dispatcher module)
to
> > >>> a group of asterisk machines
> > >>> but the 200 OK reply to the OPTIONS where not processed, so the
> > >>> dispatcher module set all asterisk to inactive, even though they
> > >>> replied 200 OK
> > >>>
> > >>> Attached the output of kamctl trap during the situation. Hope
there is
> > >>> any useful in it. Because after "kamctl trap" it was
working again
> > >>> without kamailio restart.
> > >>>
> > >>> Best
> > >>> Kristijan
> > >>>
> > >>> Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin
Mierla
> > >>> <miconda(a)gmail.com>om>:
> > >>>> Hello,
> > >>>>
> > >>>> setting tcp_children=1 is not a god option for scallability,
practically
> > >>>> you set kamailio to process a single tcp message at one time,
on high
> > >>>> traffic, that won't work well.
> > >>>>
> > >>>> Maybe try to set tcp_children to 2 or 4, that should make an
eventual
> > >>>> race appear faster.
> > >>>>
> > >>>> Regarding the pid, if it is an outgoing connection, then it
can be
> > >>>> created by any worker process, including a UDP worker, if that
was the
> > >>>> one receiving the sip message over udp and sends it out via
tcp.
> > >>>>
> > >>>> Cheers,
> > >>>> Daniel
> > >>>>
> > >>>> On 18.03.19 10:09, Kristijan Vrban wrote:
> > >>>>> Hi Daniel,
> > >>>>>
> > >>>>> for testing, i now had set: "tcp_children=1" and
so far this issue did not occur
> > >>>>> ever since. So now value to provide for "kamctl
trap" yet.
> > >>>>>
> > >>>>> "kamctl ps" show this two process to handle
tcp:
> > >>>>>
> > >>>>> ...
> > >>>>> }, {
> > >>>>> "IDX": 25,
> > >>>>> "PID": 71929,
> > >>>>> "DSC": "tcp receiver (generic)
child=0"
> > >>>>> }, {
> > >>>>> "IDX": 26,
> > >>>>> "PID": 71933,
> > >>>>> "DSC": "tcp main process"
> > >>>>> }
> > >>>>> ...
> > >>>>>
> > >>>>>
> > >>>>> Ok, but then is was wondering to see a TCP connection on a
udp receiver child:
> > >>>>>
> > >>>>>
> > >>>>> netstat -ntp |grep 5061
> > >>>>>
> > >>>>> ...
> > >>>>> tcp 0 0 172.17.217.10:5061
195.70.114.125:18252
> > >>>>> ESTABLISHED 71895/kamailio
> > >>>>> ...
> > >>>>>
> > >>>>> An pid 71895 is:
> > >>>>>
> > >>>>> }, {
> > >>>>> "IDX": 3,
> > >>>>> "PID": 71895,
> > >>>>> "DSC": "udp receiver child=2
sock=127.0.0.1:5060"
> > >>>>> }, {
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> And if i look into it via "lsof -p 71895" (the
udp receiver child)
> > >>>>>
> > >>>>> ...
> > >>>>> kamailio 71895 kamailio 14u sock 0,9
0t0
> > >>>>> 8856085 protocol: TCP
> > >>>>> kamailio 71895 kamailio 15u sock 0,9
0t0
> > >>>>> 8886886 protocol: TCP
> > >>>>> kamailio 71895 kamailio 16u sock 0,9
0t0
> > >>>>> 8854886 protocol: TCP
> > >>>>> kamailio 71895 kamailio 17u sock 0,9
0t0
> > >>>>> 8828915 protocol: TCP
> > >>>>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91
0t0
> > >>>>> 1680314 type=DGRAM
> > >>>>> kamailio 71895 kamailio 19u IPv4 1846523
0t0
> > >>>>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
> > >>>>> kamailio 71895 kamailio 20u sock 0,9
0t0
> > >>>>> 8887192 protocol: TCP
> > >>>>> kamailio 71895 kamailio 21u sock 0,9
0t0
> > >>>>> 8813634 protocol: TCP
> > >>>>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102
0t0
> > >>>>> 1681407 type=STREAM
> > >>>>> kamailio 71895 kamailio 23u sock 0,9
0t0
> > >>>>> 8850488 protocol: TCP
> > >>>>> ...
> > >>>>>
> > >>>>> Not only the ESTABLISHED TCP session. But also this empty
sockets
> > >>>>> "protocol: TCP"
> > >>>>> What are they doing there in the udp receiver? Is that how
it's supposed to be?
> > >>>>>
> > >>>>> Kristijan
> > >>>>>
> > >>>>> Am Do., 14. März 2019 um 14:48 Uhr schrieb
Daniel-Constantin Mierla
> > >>>>> <miconda(a)gmail.com>om>:
> > >>>>>> Can you get file written by `kamctl trap`? It should
have the backtrace
> > >>>>>> for all kamailio processes. You need latest kamailio
5.2.
> > >>>>>>
> > >>>>>> Also, get the output for: kamctl ps
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Daniel
> > >>>>>>
> > >>>>>> On 14.03.19 13:52, Kristijan Vrban wrote:
> > >>>>>>> When i attach via gdb to one of the tcp worker, i
see this:
> > >>>>>>>
> > >>>>>>> (gdb) bt
> > >>>>>>> #0 0x00007fdaf4d14470 in futex_wait
(private=<optimized out>,
> > >>>>>>> expected=1, futex_word=0x7fdaeca92f8c) at
> > >>>>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > >>>>>>> #1 futex_wait_simple (private=<optimized
out>, expected=1,
> > >>>>>>> futex_word=0x7fdaeca92f8c) at
../sysdeps/nptl/futex-internal.h:135
> > >>>>>>> #2 __pthread_rwlock_wrlock_slow
(rwlock=0x7fdaeca92f80) at
> > >>>>>>> pthread_rwlock_wrlock.c:67
> > >>>>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock
() from
> > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #4 0x00007fdaf08e1c08 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #5 0x00007fdaf08a6f69 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl ()
from
> > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #7 0x00007fdaf0c31144 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #8 0x00007fdaf0c2bddb in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #9 0x00007fdaf0c22858 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake ()
from
> > >>>>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #11 0x00007fdaf0e8d31b in tls_accept
(c=0x7fdaed26fa98,
> > >>>>>>> error=0x7ffffe2a2df0) at tls_server.c:422
> > >>>>>>> #12 0x00007fdaf0e96a1b in tls_read_f
(c=0x7fdaed26fa98,
> > >>>>>>> flags=0x7ffffe2c318c) at tls_server.c:1116
> > >>>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers
(c=0x7fdaed26fa98,
> > >>>>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
> > >>>>>>> #14 0x0000556ead5ef9cb in tcp_read_req
(con=0x7fdaed26fa98,
> > >>>>>>> bytes_read=0x7ffffe2c3184,
read_flags=0x7ffffe2c318c) at
> > >>>>>>> core/tcp_read.c:1496
> > >>>>>>> #15 0x0000556ead5f575f in handle_io
(fm=0x7fdaf597aa98, events=1,
> > >>>>>>> idx=-1) at core/tcp_read.c:1862
> > >>>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll
(h=0x556eadaaeec0 <io_w>,
> > >>>>>>> t=2, repeat=0) at core/io_wait.h:1065
> > >>>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop
(unix_sock=49) at
> > >>>>>>> core/tcp_read.c:1974
> > >>>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at
core/tcp_main.c:4853
> > >>>>>>> #19 0x0000556ead3c352a in main_loop () at
main.c:1735
> > >>>>>>> #20 0x0000556ead3ca5f8 in main (argc=13,
argv=0x7ffffe2c3828) at main.c:2675
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb
Kristijan Vrban
> > >>>>>>> <vrban.lkml(a)gmail.com>om>:
> > >>>>>>>> Hi, with full debug is see this in log for
every incoming TCP SIP request:
> > >>>>>>>>
> > >>>>>>>> Mar 14 12:10:15 kamailio-preview
/usr/sbin/kamailio[17940]: DEBUG:
> > >>>>>>>> <core> [core/tcp_main.c:3871]:
send2child(): WARNING: no free tcp
> > >>>>>>>> receiver, connection passed to the least busy
one (105)
> > >>>>>>>> Mar 14 12:10:15 kamailio-preview
/usr/sbin/kamailio[17940]: DEBUG:
> > >>>>>>>> <core> [core/tcp_main.c:3875]:
send2child(): selected tcp worker 2
> > >>>>>>>> 27(17937) for activity on
[tls:172.17.217.10:5061], 0x7fdaeda8f928
> > >>>>>>>>
> > >>>>>>>> So the Kamailio TCP process is working, and
received TCP traffic. But
> > >>>>>>>> the tcp workers are somehow busy.
> > >>>>>>>>
> > >>>>>>>> When i attach via strace to the TCP worker, i
do not see any activity. Just:
> > >>>>>>>>
> > >>>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1,
NULL
> > >>>>>>>>
> > >>>>>>>> and nothing, even when i see the main tcp
process choose this worker process.
> > >>>>>>>>
> > >>>>>>>> Kristijan
> > >>>>>>>>
> > >>>>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb
Kristijan Vrban
> > >>>>>>>> <vrban.lkml(a)gmail.com>om>:
> > >>>>>>>>> first of all thanks for the feedback. i
prepared our system now to run
> > >>>>>>>>> with debug=3
> > >>>>>>>>> I hope to see more then then.
> > >>>>>>>>>
> > >>>>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb
Kristijan Vrban
> > >>>>>>>>> <vrban.lkml(a)gmail.com>om>:
> > >>>>>>>>>> Hi kamailios,
> > >>>>>>>>>>
> > >>>>>>>>>> i have a creepy situation with v5.2.1
stable Kamilio. After a day or
> > >>>>>>>>>> so, Kamailio stop to process incoming
SIP traffic via TCP. The
> > >>>>>>>>>> incoming TCP network packages get
TCP-ACK from the OS (Debian 9,
> > >>>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio
does not show any processing for
> > >>>>>>>>>> the SIP-Traffic incoming via TCP. No
logs, nothing. While traffic via
> > >>>>>>>>>> UDP is working just totally fine.
> > >>>>>>>>>>
> > >>>>>>>>>> When i look via command "netstat
-ntp" is see, that the Recv-Q get
> > >>>>>>>>>> bigger and bigger. e.g.:
> > >>>>>>>>>>
> > >>>>>>>>>> Proto Recv-Q Send-Q Local Address
Foreign Address State PID/Program
> > >>>>>>>>>> name tcp 4566 0 172.17.217.12:5060
xxx.xxx.xxx.xxx:57252 ESTABLISHED
> > >>>>>>>>>> 31347/kamailio
> > >>>>>>>>>>
> > >>>>>>>>>> After Kamailio restart, all is working
fine again for a day. We have
> > >>>>>>>>>> maybe 10-20 devices online via TCP and
low call volume (1-2 call per
> > >>>>>>>>>> minute). The only settings for tcp we
have is "tcp_delayed_ack=no"
> > >>>>>>>>>>
> > >>>>>>>>>> How to could we debug this situation?
Again, no error, no warings in
> > >>>>>>>>>> the log. Just nothing.
> > >>>>>>>>>>
> > >>>>>>>>>> Kristijan
> > >>>>>>> _______________________________________________
> > >>>>>>> Kamailio (SER) - Users Mailing List
> > >>>>>>> sr-users(a)lists.kamailio.org
> > >>>>>>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> > >>>>>> --
> > >>>>>> Daniel-Constantin Mierla --
www.asipto.com
> > >>>>>>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > >>>>>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > >>>>>> Kamailio Advanced Training - Mar 25-27, 2019, in
Washington, DC, USA --
www.asipto.com
> > >>>>>>
> > >>>>> _______________________________________________
> > >>>>> Kamailio (SER) - Users Mailing List
> > >>>>> sr-users(a)lists.kamailio.org
> > >>>>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> > >>>> --
> > >>>> Daniel-Constantin Mierla --
www.asipto.com
> > >>>>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > >>>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > >>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington,
DC, USA --
www.asipto.com
> > >>>>
> > >> --
> > >> Daniel-Constantin Mierla --
www.asipto.com
> > >>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > >> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA
--
www.asipto.com
> > >>
> > --
> > Daniel-Constantin Mierla --
www.asipto.com
> >
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA --
www.asipto.com
> >