Hi,
yes, you're totally right. We got the core on another server, and I thought
the fix was included in the code we compiled on this server, but it wasn't.
My fault.
Now a very recent copy of the 3.1 git branch is running, with Daniel's patch
included. I'll keep you informed, but it should go fine.
Thanks, and sorry for the misunderstanding,
Regards,
Anton
2011/5/13 Timo Reimann <timo.reimann(a)1und1.de>
Hey,
On 13.05.2011 11:11, Timo Reimann wrote:
On 12.05.2011 15:55, Anton Roman wrote:
> my answer is inline:
>
> 2011/5/12 Timo Reimann <timo.reimann(a)1und1.de>
>
> As to the reason of the segfault, the dialog structure or hash table
> may already be gone when unref_dlg() is called. Can you go to stack #0
> and tell us what the value of each of the following data structures is
> (use "p <data structure>" in gdb):
>
> *dlg
> d_table
> d_table->entries
Here you have:
(gdb) p *dlg
$1 = {ref = 793790803, next = 0xa0d4b4f20303032, prev =
0x504953203a616956, h_id = 808333871, h_entry = 1346655535, state =
775174432,
lifetime = 841888562, start_ts = 892219952, dflags = 808794678, sflags
= 1648046134, toroute = 1668178290, toroute_name = {
s = 0x62344768397a3d68 <Address 0x62344768397a3d68 out of bounds>,
len = 946221643}, from_rr_nb = 1886534457, tl = {
next = 0x72460a0d30363035, prev = 0x6f6e4122203a6d6f, timeout =
1869445486}, callid = {
s = 0x6f6e613a7069733c <Address 0x6f6e613a7069733c out of bounds>,
len = 1869445486}, from_uri = {
s = 0x3230322e33322e34 <Address 0x3230322e33322e34 out of bounds>,
len = 1043739950}, to_uri = {
[...]
As I suspected, your dialog seems outdated already: The reference count
is 793790803, and the Call-ID is supposed to be roughly 2 billion
characters long. That's what I call unique. :)
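A side note on reading values like these: garbage that large often turns
out to be text. The sketch below reinterprets a few of the numbers from the
dump above as raw bytes. It assumes a 64-bit little-endian host with 4-byte
ints and 8-byte pointers, which the thread does not state, and the helper
name as_text is made up for illustration.

```python
def as_text(value, size):
    """Reinterpret an integer as `size` little-endian bytes, shown as ASCII.

    Non-printable bytes (such as CR and LF) are rendered as '.'.
    """
    raw = value.to_bytes(size, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# (field name, value as printed by gdb, assumed size in bytes)
fields = [
    ("ref", 793790803, 4),
    ("next", 0x0a0d4b4f20303032, 8),
    ("prev", 0x504953203a616956, 8),
    ("h_id", 808333871, 4),
    ("h_entry", 1346655535, 4),
    ("callid.s", 0x6f6e613a7069733c, 8),
]

decoded = {name: as_text(value, size) for name, value, size in fields}
for name, text in decoded.items():
    print(f"{name:9} {text!r}")
```

Under that assumption the fields read as "SIP/", "200 OK..", "Via: SIP",
"/2.0", "/UDP" and "<sip:ano", i.e. fragments of a SIP message. That would
fit with the dialog's memory having been reused for something else by the
time unref_dlg() ran.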
I could ask you for more details on the dump but it'd probably be
easiest if I could take a direct (gdb-)look at it. Would you mind
sending it to me in private (i.e., no CC to the mailing list) to the
address I am writing from?
I (and Marius -- credits!) dug through your coredump and found a few
curiosities. Before I bug you with the details, let me just say this:
There might be something wrong with the dialog reference counter that
determines when a dialog is to be removed from the hash table. In
fact, your call stack indicates that an unreference operation was
attempted on a hash table which looks empty:
(gdb) frame 0
#0 unref_dlg (dlg=0x7f08a9f67da8, cnt=1) at dlg_hash.c:598
598 dlg_lock( d_table, d_entry);
(gdb) p *d_table->entries
$53 = {first = 0x0, last = 0x0, next_id = 1124074261, lock_idx = 0}
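To illustrate the kind of problem meant here, the following is a toy model
of a reference counter that decides when a dialog leaves the table. It is
not the code in dlg_hash.c and does not claim to show what commit 2c28a251a
does; the class and method names are hypothetical.

```python
class DialogTable:
    """Toy model: maps a dialog id to its current reference count."""

    def __init__(self):
        self.refs = {}

    def link(self, dlg_id, cnt=1):
        # Inserting a dialog into the table takes the initial reference(s).
        self.refs[dlg_id] = cnt

    def ref(self, dlg_id, cnt=1):
        self.refs[dlg_id] += cnt

    def unref(self, dlg_id, cnt=1):
        """Drop cnt references; return False if the request is bogus."""
        current = self.refs.get(dlg_id)
        if current is None or cnt > current:
            # Dialog already removed, or more unrefs than refs: refuse
            # rather than drive the counter below zero.
            return False
        if current == cnt:
            del self.refs[dlg_id]  # last reference gone: remove the dialog
        else:
            self.refs[dlg_id] = current - cnt
        return True


table = DialogTable()
table.link("dlg-1")  # count is 1
table.ref("dlg-1")   # count is 2
results = [table.unref("dlg-1") for _ in range(3)]
print(results)       # the third unref targets an already removed dialog
```

In C there is no such safety net by default: one unref too many on a dialog
that has already been removed operates on memory that may belong to
something else by then.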
Looking through the mailing-list archive, I noticed you brought
attention to another reference counter-related bug which Daniel provided
a fix for with commit 2c28a251a. Since you reported that no more issues
appeared with that fixed version, I just backported the patch into 3.1.
However, I can see from your core dump that you are not using a Kamailio
version that includes the fix.
Before we continue with any bug hunting, could you try a version of
Kamailio that comes with Daniel's "safer unref of terminated dialogs"
patch? This can be a copy of the master branch or a recent copy of the
3.1 git branch. I'd suggest the latter so we can ensure that no
bleeding-edge features added to the dialog module distort our analysis.
Thanks and
Cheers,
--Timo