Hi everybody,
 
Regarding our TCP/TLS stability problems, we have now decided to run tests with Kamailio 1.5.1.
Nevertheless, it would be interesting to know whether there is a chance to get rid of these problems.
 
Is anybody using TLS?
 
Used modules: SNMP, mySQL
 
Summary of problems
Errors may be related to the following log file entries

Jun 17 09:01:27 si-…. /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)

Jun 17 08:54:52 si-…. /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure

Jun 17 08:54:52 si-… /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket

And a few of these also (7613 times):

Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:

Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
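
The hex number in the tls_print_errstack lines is a packed OpenSSL error code. As a minimal sketch, assuming the OpenSSL 0.9.x/1.x layout (8 bits library, 12 bits function, 12 bits reason; later major versions pack the code differently), it can be split as shown here. The struct and function names are hypothetical and not part of OpenSER or OpenSSL.

```c
#include <stdio.h>

/* Fields of a packed OpenSSL error code, assuming the 0.9.x/1.x layout:
 * bits 31..24 = library, bits 23..12 = function, bits 11..0 = reason. */
struct ssl_err_fields {
    unsigned lib;
    unsigned func;
    unsigned reason;
};

/* Hypothetical helper: split a code such as 0x1409C041 into its fields. */
struct ssl_err_fields decode_ssl_err(unsigned long code)
{
    struct ssl_err_fields f;

    f.lib    = (unsigned)((code >> 24) & 0xffUL);
    f.func   = (unsigned)((code >> 12) & 0xfffUL);
    f.reason = (unsigned)(code & 0xfffUL);
    return f;
}

/* Hypothetical helper: print the fields in decimal for lookup in the headers. */
void print_ssl_err(unsigned long code)
{
    struct ssl_err_fields f = decode_ssl_err(code);

    printf("error:%08lX lib=%u func=%u reason=%u\n",
           code, f.lib, f.func, f.reason);
}
```

Under that assumption, 1409C041 splits into library 20 (the SSL library), function 156 and reason 65, which matches the generic malloc failure text in the log line, so the message points at memory exhaustion rather than at a protocol problem.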

Shared memory consumption
Shared memory usage is continuously increasing (pool size set to 1024 MB)
PKG_MEM is 1 MB
 
High CPU load for some openser processes
Normally, after some days, we get a high CPU load (50-90%) for a small number of the openser processes.
It looks like an endless loop and requires a restart of openser.
There may be an endless loop in

pass_fd.c:

    again:
        ret=sendmsg(unix_socket, &msg, 0);
        if (ret<0){
            if (errno==EINTR) goto again;
            LM_CRIT("sendmsg failed on %d: %s\n", unix_socket, strerror(errno));
        }

any comments on that?
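
For discussion: the loop quoted above only repeats while sendmsg() fails with EINTR, so on its own it can only spin if the call keeps being interrupted by signals. Whether that is what causes the CPU load is not established. One way to rule it out would be to cap the number of retries. The following is a minimal sketch, not the OpenSER code: the function name and the MAX_EINTR_RETRIES value are hypothetical, and it sends plain data instead of passing a file descriptor.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_EINTR_RETRIES 16  /* hypothetical cap, chosen for illustration */

/* Send len bytes over unix_socket, retrying on EINTR at most
 * MAX_EINTR_RETRIES times. Returns the number of bytes sent, or -1. */
ssize_t send_with_bounded_retry(int unix_socket, const void *data, size_t len)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t ret;
    int retries = 0;

    iov.iov_base = (void *)data;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    for (;;) {
        ret = sendmsg(unix_socket, &msg, 0);
        if (ret >= 0)
            return ret;
        /* Retry only on EINTR, and only a bounded number of times. */
        if (errno == EINTR && retries++ < MAX_EINTR_RETRIES)
            continue;
        fprintf(stderr, "sendmsg failed on %d after %d retries: %s\n",
                unix_socket, retries, strerror(errno));
        return -1;
    }
}
```

If the log message of such a variant never shows a non-zero retry count, the EINTR path is probably not the source of the busy processes and the loop has to be looked for elsewhere.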

Best regards
Albert Munder
Robert Bosch GmbH
IT Systems Engineering (CI/ISE)
Postfach 30 02 20
70442 Stuttgart
GERMANY
www.bosch.com

Tel. +49 711 811-40562
Fax +49 711 811-5113333
Albert.Munder@de.bosch.com

Robert Bosch GmbH, registered office: Stuttgart, registration court: Amtsgericht Stuttgart, HRB 14000
Chairman of the Supervisory Board: Hermann Scholl; Managing Directors: Franz Fehrenbach, Siegfried Dais;
Bernd Bohr, Wolfgang Chur, Rudolf Colm, Gerhard Kümmel, Wolfgang Malchow, Peter Marks;
Volkmar Denner, Peter Tyroller.

 


From: Henning Westerholt [mailto:henning.westerholt@1und1.de]
Sent: Tuesday, 30 June 2009 17:25
To: users@lists.kamailio.org
Cc: Munder Albert (CI/ISE)
Subject: Re: [Kamailio-Users] OpenSER stability problems in pilot project

On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
> [..]
> We are running OpenSER in a pilot project and
> unfortunately have some stability problems.


Hello Albert,


> * Appr. 5000 subscriber accounts
> * Appr. 1200 simultaneously registered users
> * Signalling encrypted with TLS
> * Media data encrypted with SRTP
> * Clients: softphones and hardphones
> * Re-registration time for clients: 3600 sec


I don't have that much experience with TCP, but I don't think these numbers should be a problem in a setup like this.


> OpenSER configuration
> * Works as stateful SIP proxy
> * mySQL database
> * Version 1.3.4.-TLS
> * Tcp_children: 100 --> is it recommended to increase this number?


That is quite a lot of children, but OK.


> * Udp_children: 20
> * Tcp_connection_timeout: 3600
> * Shared memory:
>   * -m 512 when error occurred
>   * Now set to 1024


How much PKG_MEM do you use? The default value?


> Problems
> * Shared memory consumption
> Shared memory usage is permanently increasing (about 50 MB per day)
> Application already crashed twice


This could be a memory leak. What modules do you use? Do you use any proprietary modules? You could use the memory debugging to investigate this further: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
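
As a rough plausibility check of the figures quoted above, here is a small sketch that estimates how long a shared memory pool lasts. It assumes strictly linear growth and a pool that starts out nearly empty; both are simplifications, and the function name is hypothetical.

```c
/* Days until a shared memory pool of pool_mb is used up, assuming linear
 * growth of growth_mb_per_day starting from used_mb.
 * Returns -1.0 if there is no growth, 0.0 if the pool is already full. */
double days_until_exhausted(double pool_mb, double used_mb,
                            double growth_mb_per_day)
{
    if (growth_mb_per_day <= 0.0)
        return -1.0;
    if (used_mb >= pool_mb)
        return 0.0;
    return (pool_mb - used_mb) / growth_mb_per_day;
}
```

With the reported growth of about 50 MB per day, a pool of 512 MB would last roughly 10 days and a pool of 1024 MB roughly 20 days under these assumptions. Increasing -m therefore only postpones the problem if there really is a leak.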


> The first messages were these, repeated thousands of times (5915 times):
> Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
> Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
> And a few of these also (7613 times):
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure


These are caused by insufficient memory. I can't comment on the TCP and TLS errors. But before really starting to investigate this problem, would it be possible for you to use a more recent version, e.g. Kamailio 1.5.1, for testing?


> * TCP errors, lost SIP messages
>
> Examples from error messages:
> 14,100 times in the log file from 17 June 2009
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
> Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed
>
> Appears at least 20,000 times; on the day of the last shared memory
> errors, it was in the log file 225,794 times (note that the number in
> parentheses is usually 1 or 2, but on that day it reached 6):
> Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
> Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
>
> * Certificate validation problems
> TCP traffic is currently significantly increased by some ( appr. 70)
> clients which failed to validate the TLS certificate. Registration is
> repeated every 5 sec.
>
> About 30 thousand per day (on that day, it was in the log 37,162 times)
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
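
To compare the per-message counts quoted above between days or between versions, the log can be tallied per error signature. The following is a minimal sketch that counts the lines of an in-memory log buffer containing a given substring. The function name is hypothetical, and the signature strings are simply taken from the log lines above.

```c
#include <stddef.h>
#include <string.h>

/* Count the lines in the NUL-terminated buffer log that contain needle.
 * A match only counts if it lies completely within one line. */
size_t count_matching_lines(const char *log, const char *needle)
{
    size_t count = 0;
    size_t needle_len = strlen(needle);
    const char *line = log;

    if (needle_len == 0)
        return 0;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *hit = strstr(line, needle);

        /* strstr searches to the end of the buffer, so check that the
         * match ends before the end of the current line. */
        if (hit != NULL && (size_t)(hit - line) + needle_len <= len)
            count++;

        line += len;
        if (*line == '\n')
            line++;
    }
    return count;
}
```

Counting "core:send2child" and "core:tcpconn_new" separately in this way would show whether the "no free tcp receiver" warnings start before or after the first shared memory allocation failures.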


Best regards,


Henning