The subject is of interest to me as well.
The topology is Kamailio acting as an edge proxy to a farm of Asterisk boxes. The proxy handles far-end NAT traversal and transport conversion, from UDP/TCP/TLS on the WAN side to UDP only on the LAN side towards Asterisk. Keep-alive for endpoints on the Internet is achieved using Asterisk's qualify OPTIONS messages.
The TCP error shows up rarely in the logs. When it does, Asterisk's qualify OPTIONS messages stop being delivered to the far endpoint, and shortly afterwards the peer goes OFFLINE. What's weird is that out of, say, 60 IP desk phones behind the same far-end NAT, only some experience the problem, and only occasionally (1-3 times a day).
What can be done to mitigate this problem? Tuning the OS/Kamailio TCP settings, or lowering the UAC registration interval?
Current TCP settings:
tcp_connection_lifetime=3605
tcp_send_timeout=3
tcp_connect_timeout=5
tcp_max_connections=4096
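For reference, the kind of Kamailio-side tuning I have in mind is sketched below. These are core TCP keepalive parameters that are not set in my config today; the values are illustrative assumptions only, untested in this setup, and I don't know whether they would actually address the error.

```
# Sketch only: candidate additions to kamailio.cfg, values are guesses.
tcp_keepalive=yes   # enable SO_KEEPALIVE on TCP connections
tcp_keepidle=30     # seconds of idle time before the first keepalive probe (Linux only)
tcp_keepintvl=10    # seconds between keepalive probes
tcp_keepcnt=3       # unanswered probes before the connection is dropped
tcp_crlf_ping=yes   # answer CRLFCRLF pings from endpoints that send them
```

The idea would be to keep the NAT binding for each TCP connection alive from the proxy side, rather than relying only on the OPTIONS messages relayed from Asterisk.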
Thanks.