Do you have pure tcp traffic and facing this issue, or there are
actually tls connections?
What are the values for core parameters related to tcp connect and tcp
send timeouts?
As for restart taking long, see exit_timeout parameter:
*
As for tls with libssl1.1/libcrypto1.1, I think I discover what the
issue is. With v1.1 they use their own internal locking functions, not
exposing any api to set them from outside. Before, kamailio was
initializing the libray telling to use Kamailio locks, giving one lock
per connection. As i could get from some gdb traces I received, with
libssl 1.1, the same internal lock is used for when attempting to
connect to different addresses as well as when trying to write to
different connections. If one operation is slow for what so ever reason,
the others are waiting for the lock to be lifted by the slow operation.
I am digging in the source code of libssl1.1 to figure out a solution,
it can still take a bit because I am travelling for several days with no
much spare time.
Among the tunnings would be lower timeouts to connect and send, do not
attempt to connect unless you are sure the target expects new
connections (e.g., sending to a gateway/sip server accepting traffic via
tls, but don't do it even for the requests routed via lookup(location)
as the registration is using a connection with an ephemeral source port
and trying to connect back to it will fail). If still a major issue for
what so ever reason, using a version compiled with libssl1.0 would be
something to go for it.
Cheers,
Daniel
On 21.03.19 19:17, Aymeric Moizard wrote:
Hi List,
I want to share that I also met this issue last week with my kamailio
5.2.2.
As far as I was able to see, SIP application were able to "connect()"
with TCP, but my logs wasn't reporting any of the SIP message received
with TCP.
I have an pike right before an xlog showing every incoming request.
However
I suspect the issue was not related to pike module. The log didn't
showed unusual
number of blocked traffic.
I'm almost sure I haven't reached any ulimit restrictions.
I have many TCP, UDP childreen...
Server was not under high load
Nothing unusual.
I'm running the default build for debian stretch from here:
http://deb.kamailio.org/kamailio52 stretch
And unfortunatly, I had some tiny pressure to restart the service so I was
not able to get deeper into the issue.
If I'm correct, I will certainly improve much things by using
"set_reply_no_connect()".
I have added it and restarted!
(Tks Daniel for this tip!)
I have been looking at issue reported here:
"Kamailio 5.0.2 suddenly stops processing traffic, then generates a
core when restarting."
https://github.com/kamailio/kamailio/issues/1172
I have to say that I do have libssl1.1.
And I do have crash when I restart my kamailio. (even when I simply
restart after a configuration modification)
Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]:
sig_usr(): signal 15 received
Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]:
mod_destroy(): ERROR: ctl: could not delete unix socket
/var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute without nothing...]
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs
[jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed:
Permission denied
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs
[jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed:
Permission denied
Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]:
sig_alarm_abort(): shutdown timeout triggered, dying...
As the 1172 issue is closed, should I expect kamailio to still have
trouble with libssl1.1?
I just restarted again my service (to see if it restart better after
30 minutes only instead of a week)
Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]:
sig_usr(): signal 15 received
Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]:
mod_destroy(): ERROR: ctl: could not delete unix socket
/var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute without nothing...]
Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]:
sig_alarm_abort(): shutdown timeout triggered, dying...
Still not able to restart in a clean way!
Tks!
Regards
Aymeric
Le mer. 20 mars 2019 à 15:08, Daniel-Constantin Mierla
<miconda(a)gmail.com <mailto:miconda@gmail.com>> a écrit :
Hello,
based on the trap output I think I could figure out what happened
there.
You have tcp_children to very low value (1 or so), the problem is not
actually that one, but the fact that the connection to upstream (the
device/app sending the request) was closed after receiving the request
and routing of the reply gets stuck in the way of:
- a reply is received and has to be forwarded
- connection was lost, so Kamailio tries to establish a new one, but
takes time till fails because the upstream is behind nat or so
based on
the via header:
Via: SIP/2.0/TLS
10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries
to forward it again, but discovers a connection structure for that
destination exists (created by previous reply worker) and now
waits for
the connection to be released (or better said, for the mutex on
writing
buffer to be unlocked)
- as the second reply waits, there can be other retransmissions
of the
reply ending up in other workers stuck on waiting for the mutex of the
connection write buffer
The solution here is to use set_reply_no_connect() -- you can put it
first in request_route block. I think this would be a good addition to
the default configuration file as well, IMO, the sip server should not
connect for sending replies and should do it also for requests that go
behind nat.
Cheers,
Daniel
On 19.03.19 10:53, Kristijan Vrban wrote:
So i had again the situation. But this time,
incoming udp was
affected. Kamailio was sending out OPTIONS (via dispatcher
module) to
a group of asterisk machines
but the 200 OK reply to the OPTIONS where not processed, so the
dispatcher module set all asterisk to inactive, even though they
replied 200 OK
Attached the output of kamctl trap during the situation. Hope
there is
any useful in it. Because after "kamctl
trap" it was working again
without kamailio restart.
Best
Kristijan
Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla
<miconda(a)gmail.com <mailto:miconda@gmail.com>>:
> Hello,
>
> setting tcp_children=1 is not a god option for scallability,
practically
> you set kamailio to process a single tcp
message at one time,
on high
> traffic, that won't work well.
>
> Maybe try to set tcp_children to 2 or 4, that should make an
eventual
> race appear faster.
>
> Regarding the pid, if it is an outgoing connection, then it can be
> created by any worker process, including a UDP worker, if that
was the
> one receiving the sip message over udp and
sends it out via tcp.
>
> Cheers,
> Daniel
>
> On 18.03.19 10:09, Kristijan Vrban wrote:
>> Hi Daniel,
>>
>> for testing, i now had set: "tcp_children=1" and so far this
issue did not occur
>> ever since. So now value to provide for
"kamctl trap" yet.
>>
>> "kamctl ps" show this two process to handle tcp:
>>
>> ...
>> }, {
>> "IDX": 25,
>> "PID": 71929,
>> "DSC": "tcp receiver (generic) child=0"
>> }, {
>> "IDX": 26,
>> "PID": 71933,
>> "DSC": "tcp main process"
>> }
>> ...
>>
>>
>> Ok, but then is was wondering to see a TCP connection on a udp
receiver child:
>>
>>
>> netstat -ntp |grep 5061
>>
>> ...
>> tcp 0 0 172.17.217.10:5061
<http://172.17.217.10:5061> 195.70.114.125:18252
<http://195.70.114.125:18252>
>> ESTABLISHED 71895/kamailio
>> ...
>>
>> An pid 71895 is:
>>
>> }, {
>> "IDX": 3,
>> "PID": 71895,
>> "DSC": "udp receiver child=2 sock=127.0.0.1:5060
<http://127.0.0.1:5060>"
>> }, {
>>
>>
>>
>> And if i look into it via "lsof -p 71895" (the udp receiver child)
>>
>> ...
>> kamailio 71895 kamailio 14u sock 0,9 0t0
>> 8856085 protocol: TCP
>> kamailio 71895 kamailio 15u sock 0,9 0t0
>> 8886886 protocol: TCP
>> kamailio 71895 kamailio 16u sock 0,9 0t0
>> 8854886 protocol: TCP
>> kamailio 71895 kamailio 17u sock 0,9 0t0
>> 8828915 protocol: TCP
>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0
>> 1680314 type=DGRAM
>> kamailio 71895 kamailio 19u IPv4 1846523 0t0
>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
>> kamailio 71895 kamailio 20u sock 0,9 0t0
>> 8887192 protocol: TCP
>> kamailio 71895 kamailio 21u sock 0,9 0t0
>> 8813634 protocol: TCP
>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0
>> 1681407 type=STREAM
>> kamailio 71895 kamailio 23u sock 0,9 0t0
>> 8850488 protocol: TCP
>> ...
>>
>> Not only the ESTABLISHED TCP session. But also this empty sockets
>> "protocol: TCP"
>> What are they doing there in the udp receiver? Is that how
it's
supposed to be?
>>
>> Kristijan
>>
>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin
Mierla
>> <miconda(a)gmail.com
<mailto:miconda@gmail.com>>:
>>> Can you get file written by `kamctl trap`? It should have the
backtrace
>>> for all kamailio processes. You need
latest kamailio 5.2.
>>>
>>> Also, get the output for: kamctl ps
>>>
>>> Cheers,
>>> Daniel
>>>
>>> On 14.03.19 13:52, Kristijan Vrban wrote:
>>>> When i attach via gdb to one of the tcp worker, i see this:
>>>>
>>>> (gdb) bt
>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>,
>>>> expected=1, futex_word=0x7fdaeca92f8c) at
>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>> #1 futex_wait_simple (private=<optimized out>, expected=1,
>>>> futex_word=0x7fdaeca92f8c) at
../sysdeps/nptl/futex-internal.h:135
>>>> #2 __pthread_rwlock_wrlock_slow
(rwlock=0x7fdaeca92f80) at
>>>> pthread_rwlock_wrlock.c:67
>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from
>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #4 0x00007fdaf08e1c08 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #5 0x00007fdaf08a6f69 in ?? ()
from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #6 0x00007fdaf08b36c7 in
EVP_CIPHER_CTX_ctrl () from
>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #7 0x00007fdaf0c31144 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #8 0x00007fdaf0c2bddb in ?? ()
from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #9 0x00007fdaf0c22858 in ?? ()
from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #10 0x00007fdaf0c1af61 in
SSL_do_handshake () from
>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98,
>>>> error=0x7ffffe2a2df0) at tls_server.c:422
>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98,
>>>> flags=0x7ffffe2c318c) at tls_server.c:1116
>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98,
>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98,
>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at
>>>> core/tcp_read.c:1496
>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98,
events=1,
>>>> idx=-1) at core/tcp_read.c:1862
>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll
(h=0x556eadaaeec0
<io_w>,
>>>> t=2, repeat=0) at
core/io_wait.h:1065
>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at
>>>> core/tcp_read.c:1974
>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at
core/tcp_main.c:4853
>>>> #19 0x0000556ead3c352a in
main_loop () at main.c:1735
>>>> #20 0x0000556ead3ca5f8 in main (argc=13,
argv=0x7ffffe2c3828)
at main.c:2675
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban
>>>> <vrban.lkml(a)gmail.com <mailto:vrban.lkml@gmail.com>>:
>>>>> Hi, with full debug is see this in log for every incoming
TCP SIP request:
>>>>>
>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]:
DEBUG:
>>>>> <core>
[core/tcp_main.c:3871]: send2child(): WARNING: no
free tcp
>>>>> receiver, connection passed
to the least busy one (105)
>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]:
DEBUG:
>>>>> <core>
[core/tcp_main.c:3875]: send2child(): selected tcp
worker 2
>>>>> 27(17937) for activity on
[tls:172.17.217.10:5061
<http://172.17.217.10:5061>], 0x7fdaeda8f928
>>>>>
>>>>> So the Kamailio TCP process is working, and received TCP
traffic. But
>>>>> the tcp workers are somehow
busy.
>>>>>
>>>>> When i attach via strace to the TCP worker, i do not see
any activity. Just:
>>>>>
>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>>>>
>>>>> and nothing, even when i see the main tcp process choose
this worker process.
>>>>>
>>>>> Kristijan
>>>>>
>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban
>>>>> <vrban.lkml(a)gmail.com <mailto:vrban.lkml@gmail.com>>:
>>>>>> first of all thanks for the feedback. i prepared our
system now to run
>>>>>> with debug=3
>>>>>> I hope to see more then then.
>>>>>>
>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban
>>>>>> <vrban.lkml(a)gmail.com
<mailto:vrban.lkml@gmail.com>>:
>>>>>>> Hi kamailios,
>>>>>>>
>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio.
After a day or
>>>>>>> so, Kamailio stop to
process incoming SIP traffic via
TCP. The
>>>>>>> incoming TCP network
packages get TCP-ACK from the OS
(Debian 9,
>>>>>>>
4.18.0-15-generic-Linux) but Kamailio does not show any
processing for
>>>>>>> the SIP-Traffic
incoming via TCP. No logs, nothing. While
traffic via
>>>>>>> UDP is working just
totally fine.
>>>>>>>
>>>>>>> When i look via command "netstat -ntp" is see, that
the
Recv-Q get
>>>>>>> bigger and bigger.
e.g.:
>>>>>>>
>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State
PID/Program
>>>>>>> name tcp 4566 0
172.17.217.12:5060
<http://172.17.217.12:5060> xxx.xxx.xxx.xxx:57252
ESTABLISHED
>>>>>>> 31347/kamailio
>>>>>>>
>>>>>>> After Kamailio restart, all is working fine again for a
day. We have
>>>>>>> maybe 10-20 devices
online via TCP and low call volume
(1-2 call per
>>>>>>> minute). The only
settings for tcp we have is
"tcp_delayed_ack=no"
>>>>>>>
>>>>>>> How to could we debug this situation? Again, no error, no
warings in
>>>>>>> the log. Just
nothing.
>>>>>>>
>>>>>>> Kristijan
>>>> _______________________________________________
>>>> Kamailio (SER) - Users Mailing List
>>>> sr-users(a)lists.kamailio.org <mailto:sr-users@lists.kamailio.org>
>>>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>> --
>>> Daniel-Constantin Mierla --
www.asipto.com
<http://www.asipto.com>
>>>
www.twitter.com/miconda
<http://www.twitter.com/miconda> --
www.linkedin.com/in/miconda
<http://www.linkedin.com/in/miconda>
>>> Kamailio World Conference - May 6-8,
2019 --
www.kamailioworld.com <http://www.kamailioworld.com>
>>> Kamailio Advanced Training - Mar
25-27, 2019, in Washington,
DC, USA --
www.asipto.com
<http://www.asipto.com>
>>>
>> _______________________________________________
>> Kamailio (SER) - Users Mailing List
>> sr-users(a)lists.kamailio.org <mailto:sr-users@lists.kamailio.org>
>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> --
> Daniel-Constantin Mierla --
www.asipto.com <http://www.asipto.com>
>
www.twitter.com/miconda <http://www.twitter.com/miconda> --
www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda>
> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com <http://www.kamailioworld.com>
> Kamailio Advanced Training - Mar 25-27, 2019,
in Washington,
DC, USA --
www.asipto.com <http://www.asipto.com>
>
--
Daniel-Constantin Mierla --
www.asipto.com <http://www.asipto.com>
www.twitter.com/miconda <http://www.twitter.com/miconda> --
www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda>
Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
<http://www.kamailioworld.com>
Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC,
USA --
www.asipto.com <http://www.asipto.com>
_______________________________________________
Kamailio (SER) - Users Mailing List
sr-users(a)lists.kamailio.org <mailto:sr-users@lists.kamailio.org>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
--
Antisip -
http://www.antisip.com