On Thursday 10 June 2010, Andrei Pelinescu-Onciul wrote:
Is the performance gain really worth the effort? I wonder why
constructing the IP/UDP headers in sr is faster than having it done
in the kernel.
It's a workaround for performance/scalability problems on bigger
multi-cpu/core machines running Linux. It has nothing to do with the
construction of the headers, but with locking inside the kernel when
sending on the same udp socket (or raw socket w/o IP_HDRINCL). If you
are trying to send on the same socket at the same time from multiple
cores you'll hit this problem.
Some of us have seen symptoms which I believe are related to this
problem.
On an 8 cpu machine running an older kernel (IIRC 2.6.22),
I got an improvement of between 18% and 28% in _stateless_ forwarding
just by distributing the traffic on 8 different sockets instead of one.
I believe it would be even better with the raw socket support, but we'll
see when the code is ready for testing.
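
To illustrate what "raw socket with IP_HDRINCL" implies for the sender, here is
a minimal sketch in Python (not the sr code). The names build_udp_packet,
inet_checksum and send_raw are hypothetical helpers made up for this example.
It assumes IPv4 without options and leaves the UDP checksum at 0, which is
permitted for UDP over IPv4. On Linux, raw(7) documents that the kernel fills
in the IP checksum and total length itself even with IP_HDRINCL set, so the
checksum step is shown only for completeness.

```python
import socket
import struct

def inet_checksum(data):
    """Standard Internet checksum: one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_udp_packet(src_ip, src_port, dst_ip, dst_port, payload):
    """Build IPv4 header + UDP header + payload in user space."""
    udp_len = 8 + len(payload)
    # UDP checksum 0 means "not computed" (allowed over IPv4).
    udp_hdr = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    ip_hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                  # version 4, header length 5 * 4 = 20 bytes
        0,                     # TOS
        20 + udp_len,          # total length
        0,                     # identification
        0,                     # flags / fragment offset
        64,                    # TTL
        socket.IPPROTO_UDP,    # protocol
        0,                     # header checksum placeholder
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    csum = inet_checksum(ip_hdr)
    ip_hdr = ip_hdr[:10] + struct.pack("!H", csum) + ip_hdr[12:]
    return ip_hdr + udp_hdr + payload

def send_raw(packet, dst_ip):
    """Send a prebuilt packet; needs root or CAP_NET_RAW on Linux."""
    s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
    try:
        s.sendto(packet, (dst_ip, 0))
    finally:
        s.close()
```

Because the source address and port are written by the application, a sender
built this way can keep using the port the server listens on, whichever
process does the sending.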
Hello Andrei,
the statistics look promising, thanks. It would indeed be interesting to know
how it performs with raw sockets then. I also looked around a bit; apparently
this is also a known problem for some other services, e.g. memcached:
"We discovered that under load on Linux, UDP performance was downright
horrible. This is caused by considerable lock contention on the UDP socket
lock when transmitting through a single socket from multiple threads. Fixing
the kernel by breaking up the lock is not easy. Instead, we used separate UDP
sockets for transmitting replies (with one of these reply sockets per thread).
With this change, we were able to deploy UDP without compromising performance
on the backend."
(http://www.facebook.com/note.php?note_id=39391378919)
So it might be a good idea to evaluate how big the actual improvement with
raw sockets is compared to multiple sockets, and whether it makes sense from a
maintenance POV to go with this solution (I'm not sure how complicated the
actual implementation will be).
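
For reference, the "one socket per sender" layout described in the memcached
note could be sketched roughly like this in Python. The function names
(make_send_sockets, send_distributed) are made up for the example, and this is
not how sr or memcached implement it. Python threads only show the layout here;
any real gain would come from processes or native threads sending in parallel.
The sketch also ignores that each socket gets its own source port, which
matters for a SIP server.

```python
import socket
import threading

def make_send_sockets(n_workers, bind_ip="127.0.0.1"):
    """Create one UDP socket per worker so workers never share a socket."""
    socks = []
    for _ in range(n_workers):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((bind_ip, 0))  # port 0: the kernel picks a free source port
        socks.append(s)
    return socks

def worker(sock, dest, payloads):
    """Send this worker's share of the traffic on its private socket."""
    for p in payloads:
        sock.sendto(p, dest)

def send_distributed(dest, payloads, n_workers=8):
    """Spread payloads round-robin over n_workers sockets and send them."""
    socks = make_send_sockets(n_workers)
    threads = [
        threading.Thread(target=worker,
                         args=(socks[i], dest, payloads[i::n_workers]))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ports = [s.getsockname()[1] for s in socks]
    for s in socks:
        s.close()
    return ports  # the distinct source ports that were used
```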
Henning