Henning Westerholt wrote:
On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
[..]
> We are running OpenSER in a pilot project and
> unfortunately have some stability problems.
Hello Albert,
> * Approx. 5000 subscriber accounts
> * Approx. 1200 simultaneously registered users
> * Signalling encrypted with TLS
> * Media data encrypted with SRTP
> * Clients: softphones and hardphones
> * Re-registration time for clients: 3600 sec
What does the network topology look like? Are there NATs or firewalls
between the phones and the SIP proxy?
If yes (which means that the SIP proxy cannot establish TCP connections
to the SIP phones), you should have NAT keepalive activated in the
clients. Also make sure that the SIP proxy does not close idle TCP
connections; see:
http://www.kamailio.net/docs/modules/1.5.x/registrar.html#id2477171
I don't have that much experience with TCP, but I don't think that these
numbers should be a problem in a setup like this.
> OpenSER configuration
> * Works as stateful SIP proxy
> * MySQL database
> * Version 1.3.4.-TLS
Why do you use an old (unmaintained) version? Update ...
> * Tcp_children: 100 --> is it recommended to increase this number?
That is quite a lot of children, but OK.
It depends on how much memory you have. :-)
I always had problems with more than 30 children. I think it is not
necessary to have more than 10 children.
> * Udp_children: 20
Same here.
> * Tcp_connection_timeout: 3600
Much too high. This can block a process for up to 1 hour. Set it to 1 or 2.
Also set tcp_send_timeout to 1 or 2.
> * Shared memory:
>   * -m 512 when the error occurred
>   * Now set to 1024
How much PKG_MEM do you use? The default value?
> Problems
> * Shared memory consumption
>   Shared memory usage is steadily increasing (about 50 MB per day).
>   The application has already crashed twice.
This could be a memory leak. Which modules do you use? Do you use any
proprietary modules? You could use memory debugging to investigate this
further:
http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
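As a rough cross-check of the figures quoted above, here is a minimal
Python sketch of how long a shared memory pool lasts if usage grows at a
steady 50 MB per day. The function name and the assumption that the pool
starts empty are illustrative and not from this thread; real usage starts
above zero, so the actual time to exhaustion would be shorter.

```python
def days_until_exhausted(pool_mb: float, baseline_mb: float,
                         growth_mb_per_day: float) -> float:
    """Days until a pool of pool_mb is full, starting from baseline_mb in use.

    Assumes linear growth, which is a simplification of a real leak.
    """
    if growth_mb_per_day <= 0:
        raise ValueError("growth_mb_per_day must be positive")
    free_mb = max(pool_mb - baseline_mb, 0.0)
    return free_mb / growth_mb_per_day


# Pool sizes taken from the "-m" values mentioned in the thread.
# The baseline of 0 MB is a hypothetical starting point.
for pool_mb in (512, 1024):
    days = days_until_exhausted(pool_mb, 0, 50)
    print(f"-m {pool_mb}: about {days:.1f} days until exhaustion")
```

Under these assumptions, raising -m from 512 to 1024 only doubles the
time until the allocation failures return; it does not address the
growth itself.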
> The first messages were these, repeated thousands of times (5915 times):
> Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
> Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
> And a few of these also (7613 times):
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
These are caused by insufficient memory. I can't comment on the TCP and
TLS errors. But before really starting to investigate this problem, would
it be possible for you to use a more recent version, e.g. kamailio 1.5.1,
for testing?
> * TCP errors, lost SIP messages
> Examples from error messages:
> 14,100 times in the log file from 17 June 2009:
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
> Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed
> Appears at least 20,000 times; on the day of the last shared memory
> errors, it was 225,794 times in the log file (note that the number in
> parentheses is usually 1 or 2, but on that day it reached 6):
> Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
> Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
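Per-message counts like the ones quoted above can be obtained by
tallying the "LEVEL:module:function:" tag of each log line. The
following is a minimal Python sketch, not a tool from this thread: the
regular expression, the function name tally and the sample list are
illustrative assumptions, and the sample lines are copied from the log
excerpts quoted above.

```python
import re
from collections import Counter

# Matches the "LEVEL:module:function:" tag seen in the quoted log lines,
# e.g. "ERROR:core:tcp_blocking_connect:".
TAG_RE = re.compile(r"\b(ERROR|WARNING):(\w+):(\w+):")


def tally(lines):
    """Count log lines per LEVEL:module:function tag."""
    counts = Counter()
    for line in lines:
        match = TAG_RE.search(line)
        if match:
            counts[":".join(match.groups())] += 1
    return counts


# Sample input: lines as quoted in this thread (hostname elided as in the mail).
sample = [
    "Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: "
    "ERROR:core:tcp_blocking_connect: poll error: flags 18",
    "Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: "
    "ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) "
    "Connection refused",
    "Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: "
    "ERROR:tm:msg_send: tcp_send failed",
    "Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: "
    "WARNING:core:send2child: no free tcp receiver, connection passed to "
    "the leastbusy one (6)",
]

for tag, count in tally(sample).most_common():
    print(f"{count:6d}  {tag}")
```

To run this against a real log, pass an open file object to tally()
instead of the sample list. Comparing the counts per day shows which
message types grow together.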
You should know that TCP/TLS is blocking in OpenSER: during TCP sending
or TCP connection setup, the process handling the SIP message is blocked.
So, to avoid long blocking, reduce the timers as suggested.
Also make sure that TCP connections are stable (don't close them). They
should stay open all the time, and they should be established by the
clients.
Which phones do you use?
Use fix_nated_contact and fix_nated_register so that the proxy sends
replies and in-dialog requests via the existing TCP connection.
One more thing: sip-router has lots of TCP improvements compared to the
OpenSER core. E.g. this feature is useful if the clients are behind NAT/FW
and the proxy should not even try to establish a TCP/TLS connection to the
clients:
http://sip-router.org/wiki/cookbooks/core-cookbook/devel#tcp_no_connect
>
> * Certificate validation problems
> TCP traffic is currently significantly increased by some (approx. 70)
> clients which fail to validate the TLS certificate. Registration is
> repeated every 5 sec.
>
> About 30 thousand per day (on that day, it was 37,162 times in the log):
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
Seems to be a problem in the clients. Import the root CA certificate
into the clients.
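To tell the certificate errors apart from the memory-related SSL errors
in the same log, the OpenSSL error string can be split into its fields.
This is a minimal Python sketch that assumes the
"error:code:library:function:reason" layout seen in the log lines quoted
in this thread; the names SslError and parse_ssl_error are illustrative
and not part of OpenSER or OpenSSL.

```python
from typing import NamedTuple


class SslError(NamedTuple):
    code: str      # hexadecimal error code, e.g. "14094418"
    library: str   # e.g. "SSL routines"
    function: str  # e.g. "SSL3_READ_BYTES"
    reason: str    # e.g. "tlsv1 alert unknown ca"


def parse_ssl_error(text: str) -> SslError:
    """Split an OpenSSL error string found in a log line into its fields."""
    marker = "error:"  # lowercase, so the "ERROR:" log tag is not matched
    start = text.find(marker)
    if start < 0:
        raise ValueError("no OpenSSL error string found")
    # At most 3 splits, so a reason that contains ':' stays in one piece.
    fields = text[start + len(marker):].strip().split(":", 3)
    if len(fields) != 4:
        raise ValueError("unexpected OpenSSL error string layout")
    return SslError(*fields)


# Both strings are taken from the log excerpts quoted in this thread.
for line in (
    "ERROR:core:tls_print_errstack: "
    "error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca",
    "ERROR:core:tls_print_errstack: "
    "error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure",
):
    err = parse_ssl_error(line)
    print(f"{err.function}: {err.reason}")
```

The reason field is the useful part here: "tlsv1 alert unknown ca" is an
alert sent by the peer, which fits clients that do not trust the CA,
while "malloc failure" belongs to the memory problem discussed earlier.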
regards
klaus