Hi,
I’ve run into stubbornly persistent problems with packets not reaching WSS clients until
several seconds (5-6s or more!) after they were supposedly sent. The symptom is that
Kamailio logs the message as sent at, say, 20:00:00, including in the onsend_route, but in
reality it goes out at the OS network stack level quite a bit later. This happens
intermittently but across the board, with no discernible pattern tied to particular
endpoints, networks, etc.
For a long time, I just assumed this was a receive-side delay, since Kamailio logged a
suitably prompt relay time. I figured it was an application or browser execution delay
and had nothing to do with networking. Based on what we know about the capriciousness of
browsers and their internal task scheduling, this seemed rather plausible. However, I’ve
since been able to ascertain that this is not the right understanding of the problem,
because a parallel and unrelated WS keepalive, going from a different backend to the same
application/browser tab/event loop, is consistently replied to in a timely manner.
Anyway, I’m curious what else can be done to debug and/or fix this.
I did turn up the size of my write buffers (tcp_conn_wq_max and the global tcp_wq_max)
quite a lot, some time ago and for unrelated reasons. I wonder if this might actually make
the problem worse, since it allows more bytes to be queued for an endpoint on the other
side that could, conceivably, not be reading them fast enough. But I also wonder if this
is tied up in some low-level TLS or WS parameters unrelated to more general TCP
configuration.
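One check I can think of for the "not reading fast enough" theory is to sample the
kernel's per-socket send queue on the Kamailio host while a delay is in progress. Below
is a minimal sketch, assuming Linux and the usual column layout of `ss -tn` (State,
Recv-Q, Send-Q, local address, peer address). The names parse_send_queues and snapshot
and the min_bytes threshold are hypothetical helpers of mine, not anything provided by
Kamailio.

```python
import subprocess


def parse_send_queues(ss_output, min_bytes=1):
    """Return (local, peer, send_q) for ESTAB sockets whose Send-Q >= min_bytes.

    Assumes the column order printed by `ss -tn`:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port [Process]
    """
    backlog = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line, blank lines, and non-established sockets.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        try:
            send_q = int(fields[2])
        except ValueError:
            continue
        if send_q >= min_bytes:
            backlog.append((fields[3], fields[4], send_q))
    # Largest backlog first, so stuck connections stand out.
    return sorted(backlog, key=lambda row: row[2], reverse=True)


def snapshot(min_bytes=1):
    """Run `ss -tn` once and return the sockets with a send backlog."""
    result = subprocess.run(
        ["ss", "-tn"], capture_output=True, text=True, check=True
    )
    return parse_send_queues(result.stdout, min_bytes)
```

Calling snapshot() in a loop around the time of a delayed message would show whether
bytes are sitting in the kernel toward that client. As far as I understand it, an empty
Send-Q during a delay would instead suggest the data is still held above the kernel (for
example in Kamailio's own write queue or the TLS layer), since the kernel counter only
covers what has already been handed to the socket.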
Thanks much!
— Alex