Hey,
On 13.05.2011 11:11, Timo Reimann wrote:
On 12.05.2011 15:55, Anton Roman wrote:
my answer is inline:
2011/5/12 Timo Reimann <timo.reimann(a)1und1.de>
As to the reason of the segfault, the dialog structure or hash table may
already be gone when unref_dlg() is called. Can you go to stack #0 and
tell us what the value of each of the following data structures is (use
"p <data structure>" in gdb):
*dlg
d_table
d_table->entries
Here you have:
(gdb) p *dlg
$1 = {ref = 793790803, next = 0xa0d4b4f20303032, prev =
0x504953203a616956, h_id = 808333871, h_entry = 1346655535, state =
775174432,
lifetime = 841888562, start_ts = 892219952, dflags = 808794678, sflags
= 1648046134, toroute = 1668178290, toroute_name = {
s = 0x62344768397a3d68 <Address 0x62344768397a3d68 out of bounds>,
len = 946221643}, from_rr_nb = 1886534457, tl = {
next = 0x72460a0d30363035, prev = 0x6f6e4122203a6d6f, timeout =
1869445486}, callid = {
s = 0x6f6e613a7069733c <Address 0x6f6e613a7069733c out of bounds>,
len = 1869445486}, from_uri = {
s = 0x3230322e33322e34 <Address 0x3230322e33322e34 out of bounds>,
len = 1043739950}, to_uri = {
[...]
As I suspected, your dialog structure seems to be stale already: the
reference count is 793790803, and the Call-ID is supposedly roughly 2
billion characters long. That's what I call unique. :)
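A quick way to see what the stale structure now holds is to read its fields as raw bytes. The sketch below is Python and not part of Kamailio. It assumes a little-endian 64-bit host (an assumption based on the addresses in the dump), and decode_le is a hypothetical helper name. The values are copied from the "p *dlg" output above.

```python
def decode_le(value: int, width: int = 8) -> str:
    """Render an integer as the text its little-endian bytes spell out.

    Assumption: the core dump comes from a little-endian machine, so the
    lowest byte of each value is the first byte in memory.
    """
    return value.to_bytes(width, "little").decode("latin-1")


# Pointer-sized (8-byte) fields taken from the "p *dlg" output.
pointer_fields = {
    "next": 0x0A0D4B4F20303032,
    "prev": 0x504953203A616956,
    "toroute_name.s": 0x62344768397A3D68,
    "tl.next": 0x72460A0D30363035,
    "tl.prev": 0x6F6E4122203A6D6F,
    "callid.s": 0x6F6E613A7069733C,
    "from_uri.s": 0x3230322E33322E34,
}

for name, value in pointer_fields.items():
    # repr() keeps control characters such as CR/LF visible.
    print(f"{name:15s} {decode_le(value)!r}")

# The 4-byte reference counter can be read the same way.
print(f"{'ref':15s} {decode_le(793790803, width=4)!r}")
```

Under that assumption, prev reads "Via: SIP", callid.s reads "<sip:ano" and ref reads "SIP/". That would be consistent with the memory having been reused for SIP message text after the dialog was freed.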
I could ask you for more details on the dump but it'd probably be
easiest if I could take a direct (gdb-)look at it. Would you mind
sending it to me in private (i.e., no CC to the mailing list) to the
address I am writing from?
I (and Marius -- credits!) dug through your coredump and found a few
curiosities. Before I bug you with the details, let me just say this:
There might be something wrong with the dialog reference counter that
determines when a dialog is to be removed from the hash table. In
fact, your call stack indicates that an unreference operation was
attempted on a hash table which looks empty:
(gdb) frame 0
#0 unref_dlg (dlg=0x7f08a9f67da8, cnt=1) at dlg_hash.c:598
598 dlg_lock( d_table, d_entry);
(gdb) p *d_table->entries
$53 = {first = 0x0, last = 0x0, next_id = 1124074261, lock_idx = 0}
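To illustrate the kind of bookkeeping at stake, here is a toy model in Python. It is not the dialog module's code: Dialog, DialogTable, link, ref and unref are hypothetical names, and the rule "unlink when the count drops to zero" is an assumption about how such a counter typically works. It only shows why one unref too many ends up operating on a dialog that has already left the table.

```python
from dataclasses import dataclass, field


@dataclass
class Dialog:
    call_id: str
    ref: int = 0          # number of outstanding references
    linked: bool = False  # True while the dialog sits in the table


@dataclass
class DialogTable:
    entries: dict = field(default_factory=dict)

    def link(self, dlg: Dialog, refs: int = 1) -> None:
        """Insert the dialog and take the initial reference(s)."""
        dlg.ref += refs
        dlg.linked = True
        self.entries[dlg.call_id] = dlg

    def ref(self, dlg: Dialog, cnt: int = 1) -> None:
        """Take additional references on a linked dialog."""
        dlg.ref += cnt

    def unref(self, dlg: Dialog, cnt: int = 1) -> None:
        """Drop references; unlink the dialog once none are left."""
        if not dlg.linked:
            # In C there is no such guard by default: the caller would
            # simply touch memory that may already have been reused.
            raise RuntimeError(f"unref on unlinked dialog {dlg.call_id!r}")
        dlg.ref -= cnt
        if dlg.ref <= 0:
            del self.entries[dlg.call_id]
            dlg.linked = False


table = DialogTable()
dlg = Dialog("example-call-id")
table.link(dlg)    # one reference held, dialog is in the table
table.unref(dlg)   # count reaches zero, dialog is removed
try:
    table.unref(dlg)   # one unref too many
except RuntimeError as exc:
    print("caught:", exc)
```

In the toy model the extra unref is caught by the explicit check. Without such a check, the same sequence would operate on a dialog that is no longer in the table.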
Looking through the mailing-list archive, I noticed you brought
attention to another reference counter-related bug which Daniel provided
a fix for with commit 2c28a251a. Since you reported that no more issues
appeared with that fixed version, I just backported the patch into 3.1.
However, I can see from your core dump that you are not using a Kamailio
version that includes the fix.
Before we continue with any bug hunting, could you try a version of
Kamailio that comes with Daniel's "safer unref of terminated dialogs"
patch? This can be a copy of the master branch or a recent copy of the 3.1
git branch. I'd suggest the latter so we can ensure that no bleeding-edge
features added to the dialog module distort our analysis.
Thanks and
Cheers,
--Timo