On Sep 16, 2009 at 16:11, inge inge@legos.fr wrote:
Andrei,
Thanks for your update on this. As the conditions for the bug to appear are quite erratic (it seems to take several INVITEs sent to the gateway in a row, which make it send a "482 Loop Detected" leading to the crashing ACK), we have not managed to reproduce it.
I never saw this on our 0.9.7pre1 lab version, but that one is nearly idle and relays only a few test calls, compared to our production 0.9.5 one serving several thousand end users.
Is there any explanation of how we can end up with an <out of bound memory address> for the local_totag of the 100-ed tm entry?
Either some module corrupts shared memory somehow (very hard to find out), or a deleted transaction is somehow used (e.g. race condition, the transaction is found in the list, but deleted immediately before being accessed).
Is there any way to check if the memory address is OK before passing it to memcmp?
Yes, you could check that, but it wouldn't help. That bad address means that things are very wrong. You could avoid the memcmp, but you'll most likely only delay the crash. What you could do is add some LOG() statements and log all the fields of the transaction when the address is wrong (but the information is about the same as what you would get from a coredump when ser is compiled with debugging).
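As a rough illustration of the "check the address and log the transaction" idea, here is a minimal standalone C sketch. It is not ser code: `str_t`, `struct cell_sketch`, `pool`, `in_pool()` and `totag_matches()` are hypothetical stand-ins (the real tm cell has many more fields), and `fprintf(stderr, ...)` stands in for LOG(). It assumes the valid address range of the memory pool is known.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only, not the real ser types. */
typedef struct { const char *s; int len; } str_t;
struct cell_sketch {
    unsigned hash_index;
    unsigned label;
    str_t local_totag;
};

/* Stands in for the shared memory pool that transactions live in. */
static char pool[256];

/* Return 1 if [p, p+len) lies entirely inside the pool, 0 otherwise. */
static int in_pool(const void *p, size_t len)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = lo + sizeof pool;
    return a >= lo && a <= hi && len <= (size_t)(hi - a);
}

/* Compare tag with the transaction's local to-tag.
 * Returns 1 on match, 0 on mismatch, and -1 (after logging the
 * transaction fields) if the stored pointer looks corrupted. */
static int totag_matches(const struct cell_sketch *t, const str_t *tag)
{
    assert(t != NULL && tag != NULL);
    if (t->local_totag.len < 0 ||
        !in_pool(t->local_totag.s, (size_t)t->local_totag.len)) {
        /* Log everything we know; this is what helps find the real bug. */
        fprintf(stderr,
                "BUG: bad local_totag: cell=%p hash_index=%u label=%u "
                "totag.s=%p totag.len=%d\n",
                (const void *)t, t->hash_index, t->label,
                (const void *)t->local_totag.s, t->local_totag.len);
        return -1;
    }
    if (t->local_totag.len != tag->len)
        return 0;
    return memcmp(t->local_totag.s, tag->s, (size_t)tag->len) == 0;
}
```

Note that returning -1 here only skips the bad comparison; the transaction is still corrupted, so the logged fields are the useful part.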
This has not been modified in t_lookup.c in either v0.9.7 or v2.0 (attached).
Yes, but the problem is not in the place where it crashes, it's somewhere else. Both 0.9.7 and 0.9.4 are very old and I don't really remember all the fixes that went in (the cvs log is helpful, but does not tell the whole story). I wouldn't want to spend a lot of time debugging 0.9.4, just to find out in the end that the bug was fixed in 0.9.7, especially since there were non-trivial tm changes between the two.
Andrei