So, I have a platform that handles on the order of 2500+ call attempts per second. Two servers were built to handle this traffic, "old" and "new". The "old" design is fairly heavy: multiple external IO calls to redis, a lot of "work" done on each INVITE before passing it along, etc. The "new" design is extremely lightweight: just a couple of htable comparisons and a dispatcher check on each INVITE before passing it along. The two servers run different versions of Kamailio, but both are in the 5.x train, and I've found nothing in the changelogs that I believe would explain a significant (let alone detrimental) difference in how child process CPU scheduling behaves. I'm more than happy to be proven wrong about that, though!
The "old" system is running 5.4.8: This system is running 4 x 8 core CPUs (32 cores) with no hyperthreading (4 x E5-4627 v2 to be exact) and has multiple external IO interactions with things like redis, lots of "work" being done on each invite checking various things before allowing it to proceed. This is currently running 32 child processes and again, has no issues handling the requests whatsoever and has been running at this level and higher for literal years.
The "new" system is running 5.8.3: This system is running 2 x 8 core CPUs (16 cores) *with* hyperthreading (2 x E5-2667 v4 to be exact) and has no external IO interactions with anything, and only does a couple of checks against some hash table values and dispatcher before allowing it to proceed. This is also currently running 32 child processes, however I am experiencing spikes in context switching and odd memory access issues (e.g. backtraces show things like "<error: Cannot access memory at address 0x14944ec60>" in udp_rvc_loop) with similar loads.
What I'm looking for here is empirical knowledge, specifically about context switching and whether hyperthreading is detrimental for very high volume, low latency work (i.e. no waiting on external replies, etc.). Is it possible that hyperthreading, and the concepts behind it, introduce additional problems when the individual demands for CPU resources are so "quick" that they overwhelm the scheduler, with sibling threads contending for the same physical core and causing congestion in the CPU's own thread scheduling?
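To make that concrete, this is the kind of topology check I have in mind: a sketch (standard Linux sysfs paths, nothing Kamailio-specific) that groups logical CPUs by physical core, so it's obvious which pairs of the 32 hardware threads are actually sharing one core's execution units:

```python
#!/usr/bin/env python3
"""Group logical CPUs by physical core using sysfs thread_siblings_list."""
import glob

core_map = {}  # siblings string (e.g. "0,16") -> list of logical CPU ids
for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
    cpu_id = int(path.split("/")[5][3:])   # ".../cpu7/..." -> 7
    with open(path) as f:
        core_map.setdefault(f.read().strip(), []).append(cpu_id)

for siblings, cpus in sorted(core_map.items(), key=lambda kv: min(kv[1])):
    label = "SMT pair" if len(cpus) > 1 else "single thread"
    print(f"{label}: logical CPUs {sorted(cpus)} map to one physical core")
```

If the pairs come out the way I expect on this box (e.g. 0/16, 1/17, ...), one experiment would be to drop to 16 children, or pin the children to one logical CPU per physical core with taskset, and compare the context switch numbers; that's part of what I'm hoping someone here has already tried.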
Have you run into anything like this? Have you found that very low latency call processing can incur more overhead from hyperthreading than benefit? Do you have data I may be missing that would help me justify to the C-suite changing the system architecture to one with a higher physical core count? If so, can you share it?
Thanks!