Hi everybody,

regarding our TCP/TLS stability problems, we have now decided to run a test with Kamailio 1.5.1.
Nevertheless, it would be interesting to know whether there is a chance to get rid of these problems.
Is anybody using TLS?

Used modules: SNMP, MySQL

Summary of problems

Errors may be related to the following log file entries:
Jun 17 09:01:27 si-…. /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
Jun 17 08:54:52 si-…. /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-… /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket

And a few of these also (7613 times):

Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
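
To compare how often each error class shows up per day, the log can be tallied by message tag. The following is only a minimal sketch in C; the function name count_matching_lines is made up for illustration and is not part of openser. It counts the lines of a text buffer that contain a given substring such as "ERROR:core:tcpconn_new:".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of openser): count how many lines of
 * `text` contain the substring `needle`. `text` is a NUL-terminated
 * buffer whose lines are separated by '\n'. A line is counted once,
 * even if the substring occurs in it several times. */
size_t count_matching_lines(const char *text, const char *needle)
{
    size_t hits = 0;
    size_t nlen = strlen(needle);
    const char *line = text;

    if (nlen == 0)
        return 0;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        size_t i;

        /* naive substring scan restricted to the current line */
        for (i = 0; i + nlen <= len; i++) {
            if (memcmp(line + i, needle, nlen) == 0) {
                hits++;
                break;
            }
        }

        line += len;
        if (*line == '\n')
            line++;
    }
    return hits;
}
```

Called once per tag on the day's log, this gives per-day counts that can be compared between a normal day and a day on which the shared memory errors occurred.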

Shared memory consumption

Shared memory usage is continuously increasing (the pool is currently set to 1024 MB).
PKG_MEM is 1 MB.

High CPU load for some openser processes

Normally, after some days we get a high CPU load (50-90%) for a small number of the openser processes.
It looks like an endless loop and requires a restart of openser.
There may be an endless loop in pass_fd.c:

    again:
        ret=sendmsg(unix_socket, &msg, 0);
        if (ret<0){
            if (errno==EINTR) goto again;
            LM_CRIT("sendmsg failed on %d: %s\n", unix_socket,
                    strerror(errno));
        }

Any comments on that?
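
The snippet above only jumps back on EINTR, so it can spin only if sendmsg keeps being interrupted. One way to test that theory would be to cap the retries and log when the cap is reached. The following is a minimal sketch, not the openser code: send_bounded and MAX_EINTR_RETRIES are hypothetical names, it uses fprintf instead of LM_CRIT, and it sends plain bytes instead of whatever the real msg carries.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical cap for illustration; this is not an openser setting. */
#define MAX_EINTR_RETRIES 16

/* Send `len` bytes from `buf` over `unix_socket`.
 * Retries only when sendmsg() is interrupted (EINTR), and at most
 * MAX_EINTR_RETRIES times, so the loop always terminates.
 * Returns the sendmsg() result, or -1 with errno preserved. */
ssize_t send_bounded(int unix_socket, const void *buf, size_t len)
{
    struct msghdr msg;
    struct iovec iov;
    ssize_t ret;
    int retries = 0;
    int saved_errno;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = (void *)buf;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    for (;;) {
        ret = sendmsg(unix_socket, &msg, 0);
        if (ret >= 0)
            return ret;

        if (errno == EINTR && retries < MAX_EINTR_RETRIES) {
            retries++;
            continue;
        }

        /* fprintf may clobber errno, so keep a copy for the caller */
        saved_errno = errno;
        fprintf(stderr, "sendmsg failed on %d after %d retries: %s\n",
                unix_socket, retries, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }
}
```

If the retry count in the log never gets close to the cap while a process is at high CPU load, the busy loop is probably somewhere else.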

Best regards

Albert Munder

Robert Bosch GmbH
IT Systems Engineering (CI/ISE)
Postfach 30 02 20
70442 Stuttgart
GERMANY
www.bosch.com

Tel. +49 711 811-40562
Fax +49 711 811-5113333
Albert.Munder@de.bosch.com

On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
> [..]
> We are running OpenSER in a pilot project and unfortunately have some
> stability problems.

Hello Albert,

> * Appr. 5000 subscriber accounts
> * Appr. 1200 simultaneously registered users
> * Signalling encrypted with TLS
> * Media data encrypted with SRTP
> * Clients: softphones and hardphones
> * Re-registration time for clients: 3600 sec

I don't have that much experience with TCP, but I don't think that these numbers should be a problem in a setup like this.

> OpenSER configuration
> * Works as stateful SIP proxy
> * MySQL database
> * Version 1.3.4-TLS
> * Tcp_children: 100 --> is it recommended to increase this number?

That is quite a lot of children, but ok.

> * Udp_children: 20
> * Tcp_connection_timeout: 3600
> * Shared memory:
>   * -m 512 when error occurred
>   * Now set to 1024

How much PKG_MEM do you use? The default value?

> Problems
> * Shared memory consumption
>   Shared memory usage is permanently increasing (about 50 MB per day)
>   Application already crashed twice

This could be a memory leak. Which modules do you use? And do you use any proprietary modules? You could use the memory debugging to investigate this further:
http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
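
As a rough cross-check of the reported numbers, the time until the shared memory pool runs out can be estimated from the pool size and the growth rate. The sketch below is only an estimate under stated assumptions: growth is taken as steady at the reported 50 MB per day, the function name days_until_exhausted is made up for illustration, and the baseline usage right after start is not given in the thread, so it is a parameter.

```c
/* Hypothetical helper: estimate the number of days until a shared
 * memory pool of `pool_mb` megabytes is exhausted, starting from
 * `used_mb` megabytes in use and growing by a steady
 * `growth_mb_per_day`.
 * Returns -1.0 if usage is not growing, 0.0 if the pool is already
 * full. */
double days_until_exhausted(double pool_mb, double used_mb,
                            double growth_mb_per_day)
{
    if (growth_mb_per_day <= 0.0)
        return -1.0;
    if (used_mb >= pool_mb)
        return 0.0;
    return (pool_mb - used_mb) / growth_mb_per_day;
}
```

With -m 512 and an assumed baseline of zero this gives 512 / 50 = 10.24 days, and with -m 1024 it gives 20.48 days. Any real baseline usage shortens both figures, so doubling the pool only postpones the failure if the growth is a leak.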

> First messages were these, repeated thousands of times (5915 times):
> Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
> Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
> And a few of these also (7613 times):
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure

These are caused by insufficient memory conditions. I can't comment on the TCP and TLS errors. But before really starting to investigate this problem, would it be possible for you to use a more recent version, e.g. Kamailio 1.5.1, for testing?

> * TCP errors, lost SIP messages
>
> Examples from error messages:
> 14,100 times in log file from 17.06.09
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
> Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed
>
> Appears at least 20,000 times; and on the day of the last shared memory
> errors, it was 225,794 times in the log file (note that the number in
> parentheses is usually 1 or 2, but on that day it reached 6):
> Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
> Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
>
> * Certificate validation problems
>   TCP traffic is currently significantly increased by some (appr. 70)
>   clients which failed to validate the TLS certificate. Registration is
>   repeated every 5 sec.
>
> Circa 30 thousand per day (on that day, it was 37,162 times in log)
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca

Best regards,
Henning