Hi Andrei,
I'm Nicolas, and I'm working with Adrien on the crashes our SER server
has experienced over the last few months.
We had 4 crashes, on 11 Jun 2009, 13 Aug 2009, 11 Sep 2009 and 12 Sep
2009.
Each of these crashes has a similar call flow, as seen in the one
attached: SER crashes when trying to process an ACK from the CPE for the
previously relayed "482 Loop Detected" from the gateway.
The crash occurs while comparing the ACK's to-tag with the out-of-bounds
local_totag from the corresponding tm entry (see attached coredump
analysis).
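
The comparison described above can be illustrated with a minimal sketch.
This is not the actual tm code: str_t, MAX_TAG_LEN and totag_matches()
are hypothetical names, modelled only on the (s, len) pairs visible in
the gdb commands further down the thread. It shows how a length check
placed before memcmp() would reject an obviously corrupted local to-tag
instead of reading out of bounds.

```c
#include <stddef.h>
#include <string.h>

/* Counted string: a stand-in for the (s, len) pairs seen in the gdb
 * output (tag_value.s / tag_value.len). Hypothetical type, not SER's. */
typedef struct {
    const char *s;
    int len;
} str_t;

/* Hypothetical sanity limit for a to-tag length. */
#define MAX_TAG_LEN 256

/* Returns 1 when both tags are plausible and byte-identical, 0 otherwise.
 * memcmp() is only reached once both pointers and lengths look sane. */
static int totag_matches(const str_t *ack_tag, const str_t *local_tag)
{
    if (ack_tag == NULL || local_tag == NULL)
        return 0;
    if (ack_tag->s == NULL || local_tag->s == NULL)
        return 0;
    if (ack_tag->len <= 0 || ack_tag->len > MAX_TAG_LEN)
        return 0;
    if (local_tag->len != ack_tag->len)
        return 0;
    return memcmp(ack_tag->s, local_tag->s, (size_t)ack_tag->len) == 0;
}
```

Such a check can only catch a length that is visibly wrong. If the tm
entry has already been freed, the pointer itself may be dangling, and
the real fix has to be in the transaction lifetime handling.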
It seems to me that there is a bug, and I could not find any patch for
it, even in the latest 2.0 versions.
Do you have any idea what causes this problem?
Is this bug already known?
Sincerely,
Nicolas LEROY
On Wednesday, 9 September 2009 at 12:26 +0200, Andrei Pelinescu-Onciul
wrote:
On Aug 20, 2009 at 10:40, inge <inge(a)legos.fr> wrote:
Hi Andrei,
As I understand it, this changelog only applies to the tm module.
Are there any clues that this module caused the crash we experienced?
Yes, according to the backtrace it crashed in tm. It looks like the tag
value was corrupted (one possible explanation is that matching against a
deleted transaction was attempted). It's also possible, but much less
likely, that despite the backtrace info the crash is not related to tm
(e.g. some other module corrupting shared memory).
We would like to determine which of the known and corrected bugs could
have caused the crash, in order to find a short-term workaround that
gives us some time to deploy an upgrade to the latest release in the
0.9.0 branch.
That would be quite hard since we don't know yet if the crash is really
fixed in the latest 0.9.x
If you can reproduce the crash, then you could try a test installation of
the latest 0.9.x and see if the crash is fixed.
It's very easy to upgrade between 0.9.x versions. There are no config or
db changes, the only differences are bug fixes.
If it still crashes with the latest 0.9.x, then the next step would be
to compile it with debugging info, in an attempt to get more meaningful
backtraces.
Andrei
>
> On Tuesday, 18 August 2009 at 09:00 +0200, Andrei Pelinescu-Onciul wrote:
> > On Aug 17, 2009 at 14:42, inge <inge(a)legos.fr> wrote:
> > > Hi Andrei,
> > >
> > > Hope you are fine.
> > > Do you have any update on our crash?
> > > Is there anything we can do to find the cause of the segmentation
> > > fault, perhaps a well-known bug, without bothering you?
> >
> >
> > There are lots of changes between 0.9.5-pre and the latest 0.9.x
> > version.
> > You should try updating to the latest code on the rel_0_9_0 branch and
> > see if you run into this problem again.
> > To get the latest 0.9.x code either get the latest snapshot from
> > http://ftp.iptel.org/pub/ser/daily-snapshots/stable/ , use cvs to
> > get the rel_0_9_0 branch
> > (CVSROOT=:pserver:anonymous@cvs.berlios.de:/cvsroot/ser ;
> > export CVSROOT ; cvs co -r rel_0_9_0 sip_router ), or use git and the
> > ser repository (see http://sip-router.org/wiki/git/ser-repository).
> >
> > Here's a short changelog for tm, between 0.9.5 and 0.9.7+
> > (git log --oneline v_0_9_5..origin/rel_0_9_0 modules/tm):
> > - tm: fix delete_cell() when the transaction is referenced
> > - variable timer fix: variable timers (avps) won't be exteneded anymore
> > - fix for free_rdata_list() which used to access the "next" pointer af
> > - deadlock when t_relay-ing a message from the failure_route fixed (e2e
> > - added sems specific patch. This patch is present in the ser version ship
> > - added diversion and rpid header cloning
> > - bug fix: tm insert_timer used to eat too much cpu, decreasing dramatic
> > - fixed misplaced set_avp list, courtesy of cesc.santa(a)gmail.com
> > - int2reverse_hex/reverse_hex2int fixes (tm with large "labels" was aff
> > - fix of local ACK matching provided by cesc.santa(a)gmail.com
> > - avp race condition fix (backported from HEAD)
> > - CANCEL terminates retransmission timers properly (backported)
> >
> >
> > Andrei
> >
> >
> > >
> > > On Friday, 14 August 2009 at 17:03 +0200, inge wrote:
> > > > Please find the requested information attached.
> > > >
> > > > I'm aware of the need for an update. It's on the list of tasks to be
> > > > done; however, the priority is to troubleshoot the problem and maybe
> > > > find a workaround.
> > > >
> > > > Regards,
> > > >
> > > > Adrien
> > > >
> > > > On Friday, 14 August 2009 at 16:34 +0200, Andrei Pelinescu-Onciul
> > > > wrote:
> > > > > On Aug 14, 2009 at 15:01, inge <inge(a)legos.fr> wrote:
> > > > > > Hi Andrei,
> > > > > >
> > > > > > Thanks for your reply.
> > > > > >
> > > > > > I use ser 0.9.5-pre4.
> > > > > >
> > > > > > I don't really understand the bug you have identified. Where can I
> > > > > > find a description?
> > > > >
> > > > > Sorry, I was wrong (that bug was in RR and appears only in newer
> > > > > code).
> > > > >
> > > > > Could you run gdb on the core again, type "frame 0" and then send
> > > > > me the output of the following commands:
> > > > >
> > > > > print p_cell
> > > > > print p_msg
> > > > > print p_msg->buf
> > > > > print p_cell->uas.local_totag.len
> > > > > print p_cell->uas.local_totag.s
> > > > > print p_msg->to
> > > > > print p_msg->to->parsed
> > > > > print *((struct to_body*)(p_msg->to->parsed))
> > > > > print ((struct to_body*)(p_msg->to->parsed))->tag_value.len
> > > > > print ((struct to_body*)(p_msg->to->parsed))->tag_value.s
> > > > >
> > > > >
> > > > > Andrei
> > > > > P.S.: you could also try upgrading to ser 2.0, 2.1 or sip-router.
> > > > >
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Adrien
> > > > > >
> > > > > > > On Friday, 14 August 2009 at 14:45 +0200, Andrei Pelinescu-Onciul
> > > > > > > wrote:
> > > > > > > > On Aug 13, 2009 at 15:32, inge <inge(a)legos.fr> wrote:
> > > > > > > > Hi Klaus,
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > I have attached the gdb output.
> > > > > > > >
> > > > > > > > I hope someone can decipher this. Thank you.
> > > > > > >
> > > > > > >
> > > > > > > If you are using ser 2.1/latest cvs or sip-router then just update to
> > > > > > > the latest cvs or git. It's a known fixed bug (sip router
> > > > > > > git 6fcd5e or ser 2.1 commit starting with "rr: fix from header
> > > > > > > access").
> > > > > > >
> > > > > > > If you are using another version then tell me which one (ser -V)
> > > > > > > and I'll fix it.
> > > > > > >
> > > > > > > Andrei
> > > > > > >
> > > > > > > >
> > > > > > > > > On Thursday, 13 August 2009 at 13:53 +0200, Klaus Darilion wrote:
> > > > > > > > > > locate the core file (either in the working dir or /tmp or /)
> > > > > > > > > > then execute:
> > > > > > > > >
> > > > > > > > > gdb /usr/local/sbin/ser /path/to/core
> > > > > > > > > (gdb) bt
> > > > > > > > >
> > > > > > > > > regards
> > > > > > > > > klaus
> > > > > > > > >
> > > > > > > > > > inge wrote:
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > > My SER process crashed today with the following logs
> > > > > > > > > > > in /var/log/messages :
> > > > > > > > > > >
> > > > > > > > > > > ser[378]: child process 418 exited by a signal 11
> > > > > > > > > > > ser[378]: core was generated
> > > > > > > > > > > ser[378]: INFO: terminating due to SIGCHLD
> > > > > > > > > > ser[421]: INFO: signal 15 received
> > > > > > > > > > ...
> > > > > > > > > >
> > > > > > > > > > > Can someone help me determine what kind of problem this is? I
> > > > > > > > > > > think I need to use gdb to extract some information from the
> > > > > > > > > > > core dump. How can I use it to extract the useful information?
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > Adrien
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
_______________________________________________
> > > > > > > > > > sr-dev mailing list
> > > > > > > > > > sr-dev(a)lists.sip-router.org
> > > > > > > > > >
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
> > > > > > >
> > > > > > > > #0 0x00e964d3 in matching_3261 (p_msg=0x81647e8, trans=0xbff74f38, skip_method=4294967294) at t_lookup.c:222
> > > > > > > > 222 if (memcmp(get_to(ack)->tag_value.s,p_cell->uas.local_totag.s,
> > > > > > > > (gdb) bt
> > > > > > > > #0 0x00e964d3 in matching_3261 (p_msg=0x81647e8, trans=0xbff74f38, skip_method=4294967294) at t_lookup.c:222
> > > > > > > > #1 0x00e96aff in t_lookup_request (p_msg=0x81647e8, leave_new_locked=1) at t_lookup.c:421
> > > > > > > > #2 0x00e992a0 in t_newtran (p_msg=0x81647e8) at t_lookup.c:1085
> > > > > > > > #3 0x00e9116a in t_relay_to (p_msg=0x81647e8, proxy=0x0, proto=0, replicate=0) at t_funcs.c:224
> > > > > > > > #4 0x00e9c410 in w_t_relay (p_msg=0x81647e8, _foo=0x0, _bar=0x0) at tm.c:889
> > > > > > > > #5 0x0804fc81 in do_action (a=0x8117818, msg=0x81647e8) at action.c:610
> > > > > > > > #6 0x0805099d in run_actions (a=0x8117818, msg=0x81647e8) at action.c:718
> > > > > > > > #7 0x08073f08 in eval_elem (e=0x8117840, msg=0x81647e8) at route.c:605
> > > > > > > > #8 0x08074392 in eval_expr (e=0x8117840, msg=0x81647e8) at route.c:654
> > > > > > > > #9 0x080743ce in eval_expr (e=0x8117860, msg=0x81647e8) at route.c:670
> > > > > > > > #10 0x0804ec95 in do_action (a=0x8117bc8, msg=0x81647e8) at action.c:586
> > > > > > > > #11 0x0805099d in run_actions (a=0x8117630, msg=0x81647e8) at action.c:718
> > > > > > > > #12 0x0804ffdf in do_action (a=0x8114f70, msg=0x81647e8) at action.c:375
> > > > > > > > #13 0x0805099d in run_actions (a=0x8114f70, msg=0x81647e8) at action.c:718
> > > > > > > > #14 0x0804ecd3 in do_action (a=0x8114fc0, msg=0x81647e8) at action.c:603
> > > > > > > > #15 0x0805099d in run_actions (a=0x8114fc0, msg=0x81647e8) at action.c:718
> > > > > > > > #16 0x0804ecd3 in do_action (a=0x8114fe8, msg=0x81647e8) at action.c:603
> > > > > > > > #17 0x0805099d in run_actions (a=0x8114fe8, msg=0x81647e8) at action.c:718
> > > > > > > > #18 0x0804ecd3 in do_action (a=0x8115010, msg=0x81647e8) at action.c:603
> > > > > > > > #19 0x0805099d in run_actions (a=0x8115010, msg=0x81647e8) at action.c:718
> > > > > > > > #20 0x0804ecd3 in do_action (a=0x8115038, msg=0x81647e8) at action.c:603
> > > > > > > > #21 0x0805099d in run_actions (a=0x8115038, msg=0x81647e8) at action.c:718
> > > > > > > > #22 0x0804ecd3 in do_action (a=0x8115060, msg=0x81647e8) at action.c:603
> > > > > > > > #23 0x0805099d in run_actions (a=0x810fe88, msg=0x81647e8) at action.c:718
> > > > > > > > #24 0x0806d062 in receive_msg (
> > > > > > > > buf=0x80d61e0 "ACK sip:0389719641@domain.tld:5060 SIP/2.0\r\nMax-Forwards: 16\r\nContent-Length: 0\r\nVia: SIP/2.0/UDP 10.0.140.147:5060;branch=z9hG4bK4f1b8571c\r\nCall-ID: bf85c76a5e2066256679e3945f6b4e36(a)10.0.140.147\r\nF"..., len=592, rcv_info=0xbff76340) at receive.c:165
> > > > > > > > #25 0x080843cc in udp_rcv_loop () at udp_server.c:472
> > > > > > > > #26 0x0805cdaf in main_loop () at main.c:1056
> > > > > > > > #27 0x0805e40b in main (argc=1, argv=0xbff76504) at main.c:1592
> > > > > > > >
> > > > > > >
> > > > > > >