It's probably worth asking if you tried the "old" config on the "new"
system? Given your descriptions it sounds like there's a large
difference between the two configs, and it's probably good to isolate
against that - alternately, you could try the "new" config on the old
system as well. I don't want to jump to "it's a config issue", but
consider that your new configuration will use a lot more shared memory
if you're storing data in htable instead of redis. Your new host might
have tons of memory, but if you're not allocating it to Kamailio's
shared memory pool, it won't really matter.
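For what it's worth, this is roughly what I mean - the names and
numbers below are placeholders, not recommendations, so adjust them
for your own deployment:

  # Debian-style /etc/default/kamailio (or pass -m/-M on the command line)
  SHM_MEMORY=4096    # shared memory for all processes, in MB (kamailio -m 4096)
  PKG_MEMORY=32      # private memory per child process, in MB (kamailio -M 32)

  # kamailio.cfg: size= is a power-of-two exponent for the hash slots,
  # so size=16 means 2^16 = 65536 slots; "callmap" is just an example name
  modparam("htable", "htable", "callmap=>size=16;autoexpire=3600;")

If the htable ends up holding a lot of entries, that shm figure can
climb quickly, since htable lives entirely in shared memory.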
To be clear, the "new" system is currently handling ~1000-1500 CPS with
no issues at all; it only starts exhibiting these "drops" when we push
it with the rest of the traffic that's currently on the "old" system.
It runs fine for a while, but when the traffic level gets up to a
certain point it starts tossing things like "ERROR: tm [tm.c:1846]:
_w_t_relay_to(): t_forward_noack failed" into the system journal, and
those memory access errors show up in a trap backtrace.
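For reference, the next time it tips over I'll try to grab a
shared-memory snapshot alongside the journal errors, in case this is
shm pressure rather than scheduling - something like:

  # snapshot of the shared memory pool (total/free/used/real_used/max_used)
  kamcmd core.shmmem

  # per-module shared memory breakdown (tm, htable, etc.)
  kamcmd mod.stats all shm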
While I get that there are more "things" than one might think, the
disparity is kind of what I'm looking at. 8 or 9 redis lookups, plus
tons of function calls to evaluate message parameters, plus dispatcher
lookups, plus .... that all adds up to a lot more "stuff" getting put
into the pipe to be worked on, and for the CPU scheduler to handle.
The new one, by contrast, has drastically fewer of these, so it's going
to "churn" a lot more since each child worker is doing less overall.
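For anyone wanting to look at the same thing, per-worker context
switches are easy enough to watch with something along these lines
(assuming sysstat and perf are installed):

  # per-process context switches at a 1-second interval
  # (cswch/s = voluntary, nvcswch/s = involuntary/preempted)
  pidstat -w -p $(pgrep -d, -f kamailio) 1

  # or a system-wide view over a 10-second window
  perf stat -e context-switches,cache-misses -a sleep 10

A high involuntary rate on the lightweight box would at least be
consistent with the "churn" theory.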
I'm mainly trying to figure out if the addition of hyperthreading could
be part of the reason we're having so much trouble reaching the higher
levels of traffic that this thing should be able to handle, as opposed
to the old one.
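Related question: would simply switching SMT off at runtime and
re-running the load be a fair A/B test here? On reasonably recent
kernels (worth confirming on yours) that should just be:

  # check whether SMT is currently active
  lscpu | grep -i 'thread(s) per core'
  cat /sys/devices/system/cpu/smt/active

  # disable SMT until the next reboot (as root), then re-test
  echo off > /sys/devices/system/cpu/smt/control

With SMT off that box is back to 16 real cores under 32 children, so
I'd probably drop the child count for that test as well.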
Thanks for the insight!
On 5/29/2025 10:10 AM, Alex Balashov via sr-users wrote:
> The counterpoint would be:
>
> Kamailio may not wait a lot for outside I/O, but even a pure
> in-memory config's code path largely rakes any incoming request over
> a bunch of kernel and system-call interactions, which are, in effect,
> waits.
>
> That doesn't have much in common with a truly pure-computational
> workload.
>
> -- Alex
>
>> On May 29, 2025, at 1:06 PM, Alex Balashov <abalashov@evaristesys.com> wrote:
>>
>> I don't have a great deal of empirical experience with this.
>> However, proceeding from first principles in a rather general way,
>> hyperthreading probably does more harm than good to more purely
>> CPU-bound workloads, and is more beneficial in I/O-bound workloads
>> where there is more of the kind of waiting on external responses
>> that you ascribe to the "old" system.
>>
>> The basic idea of hyperthreading, as best as I understand it, is a
>> kind of oversubscription of a physical core across two logical
>> threads. That means all the core's resources are shared (e.g.
>> caches, memory bandwidth) and the OS has to manage more threads on
>> the same physical core, increasing scheduler overhead.
>>
>> I imagine this means that hyperthreading works best in cases where
>> there are execution or memory access patterns that naturally "yield"
>> to the other thread on the same core a lot, with I/O waiting of
>> course coming to mind first and foremost. The "hyper" part I guess
>> comes from the idea that the physical cores aren't sitting idle as
>> much as when a single thread of execution waits on something. If
>> there's not much of that kind of waiting, HyperThreading probably
>> isn't of much help, and may even hurt.
>>
>> The real unknown is whether a seemingly compute-bound Kamailio
>> config has the kinds of execution patterns that create a lot of
>> natural yielding on the given architecture. It's hard to say
>> whether, under the hood, htable lookups and dispatcher leave a lot
>> of these yield "gaps". My guess is they don't; it doesn't seem to me
>> to map onto stuff like complex arithmetic, wildly varying memory
>> allocations, cache misses and branch mispredictions and other things
>> that could lead a thread of execution to stall. However, without
>> forensically combing through the underlying machine code--which I of
>> course cannot do--it's hard to say. There are attempts throughout
>> Kamailio's core code to communicate branch-prediction hints to the
>> compiler, e.g. through lots of use of unlikely()[1]:
>>
>> /* likely/unlikely */
>> #if __GNUC__ >= 3
>>
>> #define likely(expr) __builtin_expect(!!(expr), 1)
>> #define unlikely(expr) __builtin_expect(!!(expr), 0)
>>
>> I don't know how you'd profile the real-world effects of this,
>> though.
>>
>> My guess is that this is, like most things of the kind, splitting
>> very thin hairs, and doesn't have much effect on your ability to
>> process 2500 CPS one way or another.
>>
>> -- Alex
>>
>> [1] https://stackoverflow.com/questions/7346929/what-is-the-advantage-of-gccs-builtin-expect-in-if-else-statements
>>
>>> On May 29, 2025, at 11:57 AM, Brooks Bridges via sr-users <sr-users@lists.kamailio.org> wrote:
>>>
>>> So, I have a platform that handles on the order of 2500 or more
>>> call attempts per second. There are two servers built to handle
>>> this traffic, "old" and "new". The "old" design is a bit heavy, has
>>> multiple redis calls to external IO, does a lot of "work" on each
>>> invite before passing it along, etc. The "new" design is extremely
>>> lightweight, only a couple of htable comparisons and a dispatcher
>>> check on each invite before passing it along. The two servers are
>>> running different versions of Kamailio, however they're both in the
>>> 5.x train and I've found nothing in the changelogs that I believe
>>> would explain a significant difference in how child process CPU
>>> scheduling is performed, especially a detrimental change. I'm more
>>> than happy to be proven wrong about that though!
>>>
>>> The "old" system is running 5.4.8:
>>> This system is running 4 x 8 core CPUs (32 cores)
with no hyperthreading (4 x E5-4627 v2 to be exact) and has
multiple external IO interactions with things like redis, lots
of "work" being done on each invite checking various things
before allowing it to proceed. This is currently running 32
child processes and again, has no issues handling the requests
whatsoever and has been running at this level and higher for
literal years.
>>>
>>> The "new" system is running 5.8.3:
>>> This system is running 2 x 8 core CPUs (16 cores)
*with* hyperthreading (2 x E5-2667 v4 to be exact) and has no
external IO interactions with anything, and only does a couple
of checks against some hash table values and dispatcher before
allowing it to proceed. This is also currently running 32 child
processes, however I am experiencing spikes in context switching
and odd memory access issues (e.g. backtraces show things like
"<error: Cannot access memory at address 0x14944ec60>" in
udp_rvc_loop) with similar loads.
>>>
>>> What I'm looking for here is empirical knowledge about things
>>> specifically like context switching, and whether hyperthreading is
>>> detrimental in cases where I'm doing very high volume, low latency
>>> (e.g. no waiting on external replies, etc) work. Is it possible
>>> there are additional issues with the way hyperthreading works and
>>> the concepts behind it if the individual calls for CPU resources
>>> are so "quick" as to overwhelm the scheduler and cause congestion
>>> in the CPU's thread scheduler itself?
>>>
>>> Have you run into anything like this?
>>> Have you discovered that very low latency call processing can incur
>>> more hyperthreading overhead than the benefit you get from it?
>>> Do you have additional data I may be missing to help me justify to
>>> the C suite to change out the system architecture to provide a
>>> higher physical core count?
>>> Can you share it?
>>>
>>> Thanks!
>>>
>> --
>> Alex Balashov
>> Principal Consultant
>> Evariste Systems LLC
>> Web: https://evaristesys.com
>> Tel: +1-706-510-6800
>>
__________________________________________________________
Kamailio - Users Mailing List - Non Commercial Discussions -- sr-users@lists.kamailio.org
To unsubscribe send an email to sr-users-leave@lists.kamailio.org
Important: keep the mailing list in the recipients, do not reply only to the sender!