Hello,
I have the backtraces.
This is the output of 'openserctl ps'
 
Process::  ID=0 PID=934 Type=attendant
Process::  ID=1 PID=949 Type=SIP receiver udp:10.45.8.234:5060
Process::  ID=2 PID=950 Type=SIP receiver udp:10.45.8.234:5060
Process::  ID=3 PID=951 Type=SIP receiver udp:10.45.8.234:5060
Process::  ID=4 PID=952 Type=SIP receiver udp:10.45.8.234:5060
Process::  ID=5 PID=957 Type=timer
Process::  ID=6 PID=958 Type=MI FIFO
Process::  ID=7 PID=959 Type=TCP receiver
Process::  ID=8 PID=962 Type=TCP receiver
Process::  ID=9 PID=963 Type=TCP receiver
Process::  ID=10 PID=964 Type=TCP receiver
Process::  ID=11 PID=965 Type=TCP main
 
The process 952 is on loop, you can see below the output of ps command.
This time gdb gives more info on process in loop.
 
This is the backtrace of each process:
 
934 [attendant]
(gdb) bt
#0  0x4004ccd0 in pause () from /usr/local/lib2/libpthread.so.0
 
949 [udp receiver]
(gdb) bt
#0  0x4004c0f0 in read () from /usr/local/lib2/libpthread.so.0
 
950 [udp receiver]
(gdb) bt
#0  0x4004c0f0 in read () from /usr/local/lib2/libpthread.so.0
 
951 [udp receiver]
(gdb) bt
#0  0x4004c0f0 in read () from /usr/local/lib2/libpthread.so.0
 

952 [udp receiver in loop]
(gdb) bt
#0  0x403a2dc4 in t_retransmit_reply () from /usr/local/ser/tm.so
#1  0x40397734 in t_newtran () from /usr/local/ser/tm.so
#2  0x4038c178 in t_relay_to () from /usr/local/ser/tm.so
#3  0x4039bba0 in load_tm () from /usr/local/ser/tm.so
 
957 [Timer]
(gdb) bt
#0  0x401459d8 in select () from /usr/local/lib2/libc.so.6
 

958 [MI FIFO]
(gdb) bt
#0  0x4013f5f0 in read () from /usr/local/lib2/libc.so.6
#1  0x00400f8c in mem_pool ()
#2  0x00400f8c in mem_pool ()
 
 
959 [TCP receiver]
(gdb) bt
#0  0x400c672c in __libc_allocate_rtsig_private () from /usr/local/lib2/libc.so.6
#1  0x400c67a0 in sigtimedwait () from /usr/local/lib2/libc.so.6
#2  0x0006ba2c in force_tcp_conn_lifetime ()
#3  0x400080d0 in _dl_rtld_di_serinfo () from /usr/local/lib2/ld-linux.so.2
#4  0x000f1658 in tcp_children ()
 
962 [TCP receiver]
(gdb) bt
#0  0x400c672c in __libc_allocate_rtsig_private () from /usr/local/lib2/libc.so.6
#1  0x400c67a0 in sigtimedwait () from /usr/local/lib2/libc.so.6
#2  0x0006ba2c in force_tcp_conn_lifetime ()
#3  0x400080d0 in _dl_rtld_di_serinfo () from /usr/local/lib2/ld-linux.so.2
#4  0x000f1658 in tcp_children ()
 

963 [TCP receiver]
(gdb) bt
#0  0x400c672c in __libc_allocate_rtsig_private () from /usr/local/lib2/libc.so.6
#1  0x400c67a0 in sigtimedwait () from /usr/local/lib2/libc.so.6
#2  0x0006ba2c in force_tcp_conn_lifetime ()
#3  0x400080d0 in _dl_rtld_di_serinfo () from /usr/local/lib2/ld-linux.so.2
#4  0x000f1658 in tcp_children ()
 

964 [TCP receiver]
(gdb) bt
#0  0x400c672c in __libc_allocate_rtsig_private () from /usr/local/lib2/libc.so.6
#1  0x400c67a0 in sigtimedwait () from /usr/local/lib2/libc.so.6
#2  0x0006ba2c in force_tcp_conn_lifetime ()
#3  0x400080d0 in _dl_rtld_di_serinfo () from /usr/local/lib2/ld-linux.so.2
#4  0x000f1658 in tcp_children ()
 
965 [TCP main]
(gdb) bt
#0  0x400c672c in __libc_allocate_rtsig_private () from /usr/local/lib2/libc.so.6
#1  0x400c67a0 in sigtimedwait () from /usr/local/lib2/libc.so.6
#2  0x00063bcc in tcpconn_put ()
 
 
> ps axu
 
root       934  0.0  2.7 31464 7068 ?        S    15:10   0:01 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
 
root       949  7.8  3.0 31588 7724 ?        S    15:10   3:05 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
root       950  0.0  3.0 31584 7740 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
root       951  0.0  3.1 31588 8112 ?        S    15:10   0:01 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
 
root       952 20.6  3.0 31584 7780 ?        R    15:10   8:09 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
 
root       957  0.0  2.9 31464 7664 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
 
root       958  0.0  2.7 31468 7084 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
 
root       959  0.0  2.7 31464 7140 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
root       962  0.0  2.7 31464 7140 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
root       963  0.0  2.7 31464 7140 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
root       964  0.0  2.7 31464 7140 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
 
root       965  0.0  2.7 31464 7124 ?        S    15:10   0:00 openser -m 16 -P /var/run/ser.pid -f /etc/sam/ser/ser_IR.cfg -l udp:10.45.8.234 5060 tcp:10.45.8.234:5060
 
daniele
-----Messaggio originale-----
Da: Daniel-Constantin Mierla [mailto:miconda@gmail.com]
Inviato: lunedì 29 marzo 2010 11.01
A: Zappasodi Daniele
Cc: sr-dev; users@lists.kamailio.org
Oggetto: Re: [Kamailio-Users] R: R: Fast lock loop with arm

Hello,

On 3/29/10 8:53 AM, Zappasodi Daniele wrote:

Hello,

I have tried a lot of times with different processes, but backtrace shows always only this.


this is really strange. do you use mi_fifo? if yes, when you start openser run 'openserctl ps'

Spot a udp worker process, the mi fifo process, the main process and the timer process. when the locking happens, attach to each of them and get the backtrace.

Thanks,
Daniel

Daniele
-----Messaggio originale-----
Da: Daniel-Constantin Mierla [mailto:miconda@gmail.com]
Inviato: venerdì 26 marzo 2010 19.40
A: Zappasodi Daniele
Cc: users@lists.kamailio.org; sr-dev
Oggetto: Re: R: [Kamailio-Users] Fast lock loop with arm

Hello,

On 3/26/10 4:13 PM, Zappasodi Daniele wrote:

Hello,

this is what I get with gdb:

(gdb) bt full

#0  0x40137e54 in sched_yield () from /usr/local/lib2/libc.so.6

No symbol table info available.

I don’t think that it can help, but I am not able to load the symbol table for openser on the server.

hmm, strange. Did you try with many processes? Sam result in the backtrace?

Cheers,
Daniel

 

thanks,

Daniele

-----Messaggio originale-----
Da: Daniel-Constantin Mierla [mailto:miconda@gmail.com]
Inviato: mercoledì 24 marzo 2010 16.47
A: Zappasodi Daniele
Cc: users@lists.kamailio.org; sr-dev
Oggetto: Re: [Kamailio-Users] Fast lock loop with arm

Hello,

what version of kamailio do you use?

Can you grab a backtrace with gdb once you get this high cpu usage? Spot one of the processes, take the pid and do:

gdb  /path/to/kamailio  _pid_

Then:

bt

You should see the bactrace of executed function getting to deadloc. Make sure you get at least one SIP worker process backtrace (do kamctl ps to see the type of kamailio process).

Cheers,
Daniel

On 3/24/10 4:29 PM, Zappasodi Daniele wrote:

Hi,
I have a big problem with fast lock mutex on arm CPU:
sometimes I find one or more children that go in an infinite loop, in the while loop of get_lock function.
They remain in Run for long time (sometimes hours) and openser keeps 100% CPU.

Now I have changed the functions get_lock and tsl in order to obtain more info and
I observe that  (if and) when the process finish the cycle, it have done a big amount of cycles. 

This the log with my added info:
Mar 26 20:29:08 SAM-IP openser[1645]: NOTICE:tm:_lock: ret 1 (cycle)
Mar 26 20:29:51 SAM-IP openser[1645]: NOTICE:tm:_lock: ret 1 (cycle)
Mar 26 20:29:55 SAM-IP openser[1645]: NOTICE:tm:_lock: ret 1 (cycle)

Mar 26 20:29:55 SAM-IP openser[1645]: NOTICE:tm:t_retransmit_reply: MYTM lock
[process in loop]
Mar 26 20:33:46 SAM-IP openser[1645]: NOTICE:tm:_lock: ret 59374917 (cycle)
[after 4 minutes and 59374917 cycles, this is an example with a "short" loop]
Mar 26 20:33:46 SAM-IP openser[1645]: NOTICE:tm:t_retransmit_reply: MYTM lock done

I’m not able to recognize the call flow that generates the loop (I know only that it always happens when t_retransmit_reply calls LOCK_REPLIES),

but I urgently need a work around to escape from the loop.

This is the get_lock function with my counter j:

inline static int get_lock(fl_lock_t* lock)
{
#ifdef ADAPTIVE_WAIT
        int i=ADAPTIVE_WAIT_LOOPS;
        int j=1;                                        /***** my change ****/
#endif
       
        while(tsl(lock)){
#ifdef BUSY_WAIT
#elif defined ADAPTIVE_WAIT
                j++;                            /***** my change ****/
                if (i>0) i--;
                else sched_yield();
#else
                sched_yield();
#endif
        }
        return(j);
}

Can I break the lock when my counter reaches a big value?
Should I call the unlock function after the break?
which value can be considered too big?

Thanks,
Daniele

********************************************************************** The information in this message is confidential and may be legally privileged. It is intended solely for the addressee. Access to this message by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, or distribution of the message, or any action or omission taken by you in reliance on it, is prohibited and may be unlawful. Please immediately contact the sender if you have received this message in error. **********************************************************************