On Sep 16, 2009 at 16:11, inge <inge(a)legos.fr> wrote:
> Andrei,
> Thanks for your update on this.
> As the conditions for the bug to appear are quite erratic (it seems to be
> several INVITEs sent to the gateway in a row, which make it send a "482
> Loop Detected" leading to the crashing ACK), we have not managed to
> reproduce this.
> I never saw this on our 0.9.7pre1 lab version, but this one is quite
> empty and relays only a few test calls compared to our production 0.9.5
> one serving several thousand end users.
> Is there any explanation of how we can have an <out of bound memory
> address> for the local_totag of the 100-ed tm entry ?
Either some module corrupts shared memory somehow (very hard to find
out), or a deleted transaction is somehow used (e.g. race condition, the
transaction is found in the list, but deleted immediately before being
accessed).
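
To make the second scenario concrete, here is a minimal single-threaded sketch that replays the interleaving by hand. Everything in it (toy_cell, toy_table, TOY_POISON and the toy_* functions) is hypothetical and is not the real tm code; it only assumes that a deleted entry has its fields overwritten, which is how a stale pointer could end up reading an out-of-bound address:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker written into a cell when it is deleted. */
#define TOY_POISON ((const char *)(uintptr_t)0xdeadbeef)

/* Hypothetical, heavily simplified stand-in for a transaction cell. */
struct toy_cell {
    int in_use;
    const char *totag;   /* valid only while the cell is alive */
    int totag_len;
};

static struct toy_cell toy_table[4];

static struct toy_cell *toy_insert(int slot, const char *tag, int len)
{
    toy_table[slot].in_use = 1;
    toy_table[slot].totag = tag;
    toy_table[slot].totag_len = len;
    return &toy_table[slot];
}

/* Returns the cell if it is still listed, NULL otherwise. */
static struct toy_cell *toy_lookup(int slot)
{
    return toy_table[slot].in_use ? &toy_table[slot] : NULL;
}

/* Deleting overwrites the fields, as reused memory eventually would. */
static void toy_delete(int slot)
{
    toy_table[slot].in_use = 0;
    toy_table[slot].totag = TOY_POISON;
    toy_table[slot].totag_len = -1;
}

static int toy_totag_is_poisoned(const struct toy_cell *c)
{
    return c->totag == TOY_POISON;
}

/* Replays "found in the list, deleted, then accessed" in one thread.
 * Returns 1 if the pointer obtained before the delete now sees garbage. */
static int toy_replay_race(void)
{
    struct toy_cell *seen;

    toy_insert(0, "abc", 3);
    seen = toy_lookup(0);   /* process A finds the cell */
    toy_delete(0);          /* process B deletes it right after */
    return seen != NULL && toy_totag_is_poisoned(seen);
}
```

The lookup itself succeeded, so the later crash site looks innocent; the damage was done between the lookup and the access.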
> Is there any way to check if the memory address is OK before passing it
> to memcmp ?
Yes, you could check that, but it wouldn't help. That bad address means
that things are very wrong. You could avoid the memcmp, but you'll most
likely only delay the crash.
What you could do is add some LOG() statements and log all the fields of
the transaction when the address is wrong (but the information is about
the same that you would get from a coredump when ser is compiled with
debugging).
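
A rough sketch of such a check, assuming the valid range is a single contiguous pool whose base and size are known. The struct layout, in_pool() and check_and_log_totag() are made up for illustration, and fprintf() only stands in for LOG() so that the fragment compiles on its own:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical simplified types; the real tm structures differ. */
struct toy_str {
    const char *s;
    int len;
};

struct toy_cell {
    unsigned int hash_index;
    unsigned int label;
    struct toy_str local_totag;
};

/* Returns 1 if [p, p+len) lies entirely inside [base, base+size). */
static int in_pool(const void *base, size_t size, const void *p, size_t len)
{
    uintptr_t b = (uintptr_t)base;
    uintptr_t a = (uintptr_t)p;

    return a >= b && len <= size && a - b <= size - len;
}

/* Returns 1 if local_totag looks sane; otherwise dumps the cell fields
 * to 'out' and returns 0 so the caller can skip the comparison. */
static int check_and_log_totag(const struct toy_cell *t,
                               const void *pool, size_t pool_size,
                               FILE *out)
{
    if (t->local_totag.len >= 0 &&
        in_pool(pool, pool_size, t->local_totag.s,
                (size_t)t->local_totag.len))
        return 1;

    /* In ser this would be a LOG() call instead of fprintf(). */
    fprintf(out, "BUG: bad local_totag: cell=%p hash_index=%u label=%u "
                 "s=%p len=%d\n",
            (const void *)t, t->hash_index, t->label,
            (const void *)t->local_totag.s, t->local_totag.len);
    return 0;
}
```

This only records the state at the moment the bad address is noticed; it does not say who corrupted it.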
> This has not been modified in t_lookup.c in either v0.9.7
> or v2.0 (attached)
Yes, but the problem is not in the place where it crashes, it's somewhere
else.
Both 0.9.7 and 0.9.4 are very old and I don't really remember all the
fixes that went in (the cvs log is helpful, but does not tell the whole
story). I wouldn't want to spend a lot of time debugging 0.9.4, just to
find out in the end that the bug was fixed in 0.9.7, especially since
there were non-trivial tm changes between the two.
Andrei