Andrei,
Thanks for your update on this.
As the conditions for the bug to appear are quite erratic (it seems to take
several INVITEs sent to the gateway in a row, which make it send a "482
Loop Detected" that leads to the crashing ACK), we have not managed to
reproduce it.
I have never seen this on our 0.9.7pre1 lab version, but that one is nearly
idle and relays only a few test calls, compared to our production 0.9.5
one serving several thousand end users.
Is there any explanation for how we can end up with an <out of bound memory
address> for the local_totag of the 100-ed tm entry?
Is there any way to check that the memory address is OK before passing it
to memcmp?
This code has not been modified in t_lookup.c in either v0.9.7 or v2.0 (attached).
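To illustrate the kind of check asked about above, here is a minimal sketch in C. It assumes the stored tag is supposed to live inside one contiguous memory pool whose base and size are known. The names pool_range, ptr_in_pool and safe_tag_match are hypothetical and are not taken from SER or t_lookup.c. Such a check can only reject pointers that fall outside the pool; it cannot detect a stale or corrupted pointer that still points inside it, so it would at best turn the crash into a logged error rather than fix the underlying bug.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a contiguous memory pool (not a SER type). */
struct pool_range {
    const void *base;  /* first byte of the pool */
    size_t size;       /* pool length in bytes */
};

/* Returns 1 if the len bytes starting at p lie entirely inside the pool,
 * 0 otherwise. Addresses are compared as integers to avoid relational
 * comparison of unrelated pointers. */
static int ptr_in_pool(const struct pool_range *pool, const void *p, size_t len)
{
    uintptr_t start = (uintptr_t)pool->base;
    uintptr_t addr = (uintptr_t)p;

    if (p == NULL || addr < start)
        return 0;
    if (addr - start > pool->size)
        return 0;
    return len <= pool->size - (addr - start);
}

/* Compares a stored tag with a received one only if the stored pointer
 * passes the range check.
 * Returns 1 on match, 0 on mismatch, -1 if the stored pointer is suspect. */
static int safe_tag_match(const struct pool_range *pool,
                          const char *stored, size_t stored_len,
                          const char *received, size_t received_len)
{
    if (!ptr_in_pool(pool, stored, stored_len))
        return -1;  /* do not dereference; the caller can log and drop */
    if (stored_len != received_len)
        return 0;
    return memcmp(stored, received, stored_len) == 0;
}
```

A caller would treat the -1 result as "transaction entry looks corrupted" and skip the match instead of dereferencing the pointer.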
Sincerely,
Nicolas
On Wednesday, 16 September 2009 at 12:49 +0200, Andrei Pelinescu-Onciul
wrote:
On Sep 16, 2009 at 10:47, inge <inge(a)legos.fr> wrote:
Hi Andrei,
I'm Nicolas and I'm working with Adrien on crashes experienced on our
SER server over the last few months.
We had 4 crashes, on 11 Jun 2009, 13 Aug 2009, 11 Sep 2009 and 12 Sep
2009.
Each of these crashes has a similar call flow, as seen in the one
attached: SER crashes when trying to process an ACK from the CPE for the
previously relayed "482 Loop Detected" from the gateway.
From coredump analysis, the crash occurs when trying to match the ACK
totag with the out-of-bounds local_totag from the corresponding tm
entry (see attached coredump analysis).
Yes, I saw the same thing.
It seems to me that there is a bug, and I didn't find any patch for
it, even in the latest 2.0 versions.
Yes, it's a bug, but things changed a lot between versions.
It might be fixed even in 0.9.7.
Do you have any idea about this problem?
No.
Is this bug already known?
No.
If you can reproduce it easily, try it with 0.9.7 (it will work with the
same config as 0.9.4, you don't have to change anything).
If you can still see it, try compiling with debug support
(make proper; make mode=debug all; also don't forget to recompile
with mode=debug any other module you might be using that is not covered
by make all). After this the coredumps should be "better" (more info,
no variables optimized into registers).
Hopefully 0.9.7 will solve your problems. If it doesn't, then send me
some backtraces and/or the coredump + binaries again (unfortunately the code
is very old and I'm no longer familiar with it).
Andrei