You should look at what the UDP worker processes are doing when that happens. You have to take the execution backtraces with gdb; one option is to use:

kamctl trap

which grabs the backtrace for all kamailio processes, saving the data in a local file.
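
If kamctl trap is not available, a rough manual equivalent (a sketch, assuming gdb is installed and <pid> is a placeholder for the PID of one UDP receiver process, e.g. taken from kamcmd ps) would be:

# list the kamailio processes with their PIDs and roles (udp receiver, tcp receiver, timer, ...)
kamcmd ps
# dump the full backtrace of one UDP worker; gdb attaches, prints it and detaches
gdb -batch -p <pid> -ex "bt full"

Repeating that for each UDP receiver shows whether they are all blocked in the same place.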

By looking at the backtraces, one can figure out what each process is executing at that moment.

Cheers,
Daniel

On 30.08.23 15:30, Ihor Olkhovskyi wrote:
They are increasing, actually

# ss -l -u -m src X.X.X.X/Y
State       Recv-Q Send-Q                                               Local Address:Port                                                                Peer Address:Port                
UNCONN      25167616 0                                                   X.X.X.X:sip                                                                            *:*                    
         skmem:(r25167616,rb25165824,t0,tb65536,f2304,w0,o0,bl0,d514894)

On Wed, Aug 30, 2023 at 3:04 PM, Bastian Triller <bastian.triller@gmail.com> wrote:
Are drops increasing on that socket while it is happening?
ss -l src <local_interface> -u sport 5060 -m
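
For example, to watch the queue and the drop counter (the d field in skmem) evolve while the issue is ongoing, something like this should work (the interface is a placeholder):

# refresh every second; the last skmem field (d...) is the per-socket drop counter
watch -n 1 'ss -l -u -m src <local_interface> sport = :5060'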

On Tue, Aug 29, 2023 at 3:26 PM Ihor Olkhovskyi <igorolhovskiy@gmail.com> wrote:
Just to add some info

netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
...
udp   25167616      0 <local_interface>:5060     0.0.0.0:*                           211759/kamailio
...

So I see a huge Receive Queue on UDP for Kamailio which is not clearing.
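
As a cross-check (an assumption on my side, not part of the original report), the same drop counter can also be read from /proc/net/udp, where the local port appears in hex (5060 = 0x13C4) and the last column is drops:

# 13C4 is port 5060 in hex; the last field of the matching line is the drops counter
grep -i ':13C4' /proc/net/udp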

On Tue, Aug 29, 2023 at 2:29 PM, Ihor Olkhovskyi <igorolhovskiy@gmail.com> wrote:
Hello,

I've faced a bit of a strange issue, but first a bit of preface. I have Kamailio as a proxy (TLS/WS <-> UDP) and a second Kamailio as a presence server. At some point the presence server accepts around 5K PUBLISH requests within 1 minute and sends around the same amount of NOTIFYs to the proxy Kamailio.

The proxy is "transforming" the protocol to TLS, but at some point I start to get these types of errors:

tm [../../core/forward.h:292]: msg_send_buffer(): tcp_send failed
tm [t_fwd.c:1588]: t_send_branch(): sending request on branch 0 failed
<script>: [RELAY] Relay to <sip:X.X.X.X:51571;transport=tls> failed!
tm [../../core/forward.h:292]: msg_send_buffer(): tcp_send failed
tm [t_fwd.c:1588]: t_send_branch(): sending request on branch 0 failed

Some of those messages are 100% valid, as a client can go away or similar. Some are not, because I'm sure the client is alive and connected.

But the problem comes later. At some moment the proxy Kamailio just stops accepting UDP traffic on this interface (where it also accepts all the NOTIFYs); at the start of this "stop accepting" phase, Kamailio still sends OPTIONS via DISPATCHER but is not able to receive the 200 OK.
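
As a side note (my addition, assuming the dispatcher module's standard RPC is available), the destination state during such an episode can be checked with:

# show dispatcher destination sets and their current flags/state
kamcmd dispatcher.list

If the UDP destinations show up as inactive there while TLS traffic keeps working, that is consistent with the UDP receive path being stuck.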

Over TLS on the same interface all is OK. On the other (loopback) interface UDP is processed fine, so I don't suspect a limit on open files here.

Only a restart of the Kamailio proxy process helps in this case.

I've tuned net.core.rmem_max and net.core.rmem_default to 25 MB, so in theory the receive buffer should not be the issue.

Is there some internal "interface buffer" in Kamailio that is not freed upon a failed send, or maybe I've missed something?
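
One thing worth double-checking here (my assumption, not something confirmed in this thread): the kernel limits alone may not be enough, because Kamailio caps the SO_RCVBUF it requests for its UDP sockets with the core parameter maxbuffer (around 256 KB by default, if I remember correctly). A minimal sketch for kamailio.cfg, assuming that parameter applies to these sockets:

# let the UDP receive buffer auto-probing go up to ~24 MB instead of the default cap
maxbuffer=25165824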

Kamailio 5.6.4

fork=yes
children=12
tcp_children=12

enable_tls=yes

tcp_accept_no_cl=yes
tcp_max_connections=63536
tls_max_connections=63536
tcp_accept_aliases=no
tcp_async=yes
tcp_connect_timeout=10
tcp_conn_wq_max=63536
tcp_crlf_ping=yes
tcp_delayed_ack=yes
tcp_fd_cache=yes
tcp_keepalive=yes
tcp_keepcnt=3
tcp_keepidle=30
tcp_keepintvl=10
tcp_linger2=30
tcp_rd_buf_size=80000
tcp_send_timeout=10
tcp_wq_blk_size=2100
tcp_wq_max=10485760
open_files_limit=63536

Sysctl

# To increase the amount of memory available for socket input/output queues
net.ipv4.tcp_rmem = 4096 25165824 25165824
net.core.rmem_max = 25165824
net.core.rmem_default = 25165824
net.ipv4.tcp_wmem = 4096 65536 25165824
net.core.wmem_max = 25165824
net.core.wmem_default = 65536
net.core.optmem_max = 25165824

# To limit the maximum number of requests queued to a listen socket
net.core.somaxconn = 128

# Tells TCP to make decisions that prefer lower latency over higher throughput.
net.ipv4.tcp_low_latency=1

# Optional (it will increase performance)
net.core.netdev_max_backlog = 1000
net.ipv4.tcp_max_syn_backlog = 128

# Flush the routing table to make changes happen instantly.
net.ipv4.route.flush=1
--
Best regards,
Ihor (Igor)


__________________________________________________________
Kamailio - Users Mailing List - Non Commercial Discussions
To unsubscribe send an email to sr-users-leave@lists.kamailio.org
Important: keep the mailing list in the recipients, do not reply only to the sender!
Edit mailing list options or unsubscribe:


-- 
Daniel-Constantin Mierla (@ asipto.com)
twitter.com/miconda -- linkedin.com/in/miconda
Kamailio Consultancy - Training Services -- asipto.com
Kamailio World Conference - kamailioworld.com