Hi,
We have found the root cause of the reported problem in the async module
(see the mail below for details).
A brief description follows:
- The config calls async_route("RESUME", "1"), i.e. the route is resumed
after 1 second.
- At time t, the async records are stored in slot 't+1' of the async
list.
- Every second, the async records from the async list are processed.
- In function 'async_timer_exec', slot = ticks % ASYNC_RING_SIZE.
Since the slot is derived from the ticks, if the previous invocation of
'async_timer_exec' does not finish in time, the next tick is missed.
The subsequent call to 'async_timer_exec' then arrives with t+2, so slot
t+1 is skipped and is processed only during the next cycle. That means
the async records stay in the list for the next 100 seconds before they are
actually processed. By then the TM module has dropped those timed-out
transactions, hence the error "t_continue: transaction not found".
We have now modified the code so that the slot is incremented
sequentially, irrespective of the ticks value passed to
'async_timer_exec'. With this change we no longer see any call failures and
all our load runs are successful.
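The missed-tick behaviour described above can be reproduced with a small
stand-alone model. This is only a sketch and not the async module source:
ring_t, ring_store, exec_by_ticks, exec_sequential and simulate are
hypothetical names, and RING_SIZE is assumed to be 100 because the records
were observed to wait about 100 seconds.

```c
#include <assert.h>
#include <string.h>

#define RING_SIZE 100 /* assumption: one slot per second, 100-second cycle */

/* Hypothetical stand-in for the async list: pending record count per slot. */
typedef struct {
    int pending[RING_SIZE];
    unsigned int next_slot; /* used only by the sequential variant */
} ring_t;

/* At tick t, records that should resume after 1 second go to slot t+1. */
static void ring_store(ring_t *r, unsigned int t, int count)
{
    r->pending[(t + 1) % RING_SIZE] += count;
}

/* Behaviour as described in the mail: the slot is derived from the ticks. */
static int exec_by_ticks(ring_t *r, unsigned int ticks)
{
    unsigned int slot = ticks % RING_SIZE;
    int n = r->pending[slot];
    r->pending[slot] = 0;
    return n;
}

/* Modified behaviour: the slot advances by one per call, ignoring ticks. */
static int exec_sequential(ring_t *r)
{
    unsigned int slot = r->next_slot;
    int n = r->pending[slot];
    r->pending[slot] = 0;
    r->next_slot = (slot + 1) % RING_SIZE;
    return n;
}

/* Run total_ticks seconds with cps new records per second and skip the
 * timer call at tick `missed`. Returns how many records are still waiting
 * in the slot that the skipped call should have processed. */
static int simulate(int cps, unsigned int total_ticks, unsigned int missed,
                    int sequential)
{
    ring_t r;
    unsigned int t;

    assert(total_ticks < RING_SIZE); /* stay within a single ring cycle */
    memset(&r, 0, sizeof(r));
    for (t = 0; t < total_ticks; t++) {
        if (t != missed) {
            if (sequential)
                exec_sequential(&r);
            else
                exec_by_ticks(&r, t);
        }
        ring_store(&r, t, cps); /* traffic keeps arriving every second */
    }
    return r.pending[missed % RING_SIZE];
}
```

In this model simulate(20, 10, 5, 0) leaves 20 records behind in the skipped
slot, while simulate(20, 10, 5, 1) leaves none. Note that the sequential
variant in this sketch runs one slot behind after every missed call; whether
the real change catches up is not something this model tries to show.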
If someone is interested we can share the code as well.
Regards,
Shankar
From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Thursday, January 23, 2014 12:18 PM
To: 'Jason Penton'; 'SIP Router - Kamailio (OpenSER) and SIP Express Router
(SER) - Users Mailing List'
Subject: RE: [SR-Users] FW: Regd. t_suspend() and t_continue()
Hi,
From our repeated load tests we can conclude that, irrespective of the
number of simultaneous calls, the error "t_continue: transaction not found"
always occurs.
If I run, say, 20 cps, then after 5000 calls we observe exactly 20 calls
failing with the above error. We suspect that something happens at a
particular point in time (for one second) which impacts the saving of those
new transactions into shared memory.
For a 10 cps run, we observe exactly 10 call failures. We repeated the test
with different cps values and found that the number of errors is exactly
equal to the cps being run.
Is there any configuration we are missing? Can anyone help?
Regards,
Shankar
From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Tuesday, January 21, 2014 3:09 PM
To: 'Jason Penton'; 'SIP Router - Kamailio (OpenSER) and SIP Express Router
(SER) - Users Mailing List'
Subject: FW: [SR-Users] FW: Regd. t_suspend() and t_continue()
Hi Jason,
Below is our config,
route[LOCATION] {
    if(is_method("INVITE"))
    {
        if(!route(FROMCSCF))
        {
            setflag(FLT_ACC);
            setflag(FLT_ACCFAILED);
            dlg_manage();
            dlg_setflag("4");
            async_route("RESUME", "1");
            exit;
        }
    }
}
route[RESUME]
{
    route(TO_LOCATION); // here t_relay to REGISTRAR is done for user lookup.
    exit;
}
Regards,
Shankar
Date: Tue, 21 Jan 2014 11:14:21 +0200
From: Jason Penton <jason.penton(a)smilecoms.com>
To: "Kamailio (SER) - Users Mailing List"
<sr-users(a)lists.sip-router.org>
Subject: Re: [SR-Users] FW: Regd. t_suspend() and t_continue()
We use it heavily, but not using the async module - we use it directly from
the IMS code.
Can you please provide your config file (or a relevant snippet) so I can see
what exactly you are testing/trying to do?
Cheers
jason
From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Tuesday, January 21, 2014 2:25 PM
To: 'SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users
Mailing List'
Subject: RE: [SR-Users] FW: Regd. t_suspend() and t_continue()
Hi,
Can anyone who has used t_suspend() and t_continue() share performance
details?
I tried the async module with a one-second sleep time. I tried only 5 calls
per second, but it still was not successful. After some time I see the logs
below:
Jan 21 13:51:55 PLT-RA-RD-W167A PCscf[16520]: ERROR: tm [t_suspend.c:128]:
t_continue(): ERROR: t_continue: transaction not found
Jan 21 13:52:49 PLT-RA-RD-W167A last message repeated 15 times
Jan 21 13:59:38 PLT-RA-RD-W167A last message repeated 12 times
Jan 21 14:13:03 PLT-RA-RD-W167A last message repeated 5 times
Can any configuration changes help here?
Regards,
Shankar
From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Wednesday, January 15, 2014 1:26 PM
To: 'Jason Penton'
Cc: 'SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users
Mailing List'
Subject: RE: [SR-Users] FW: Regd. t_suspend() and t_continue()
Hi Jason,
I am using 4.0.2
Regards,
Shankar
From: Jason Penton [mailto:jason.penton@smilecoms.com]
Sent: Wednesday, January 15, 2014 1:21 PM
To: Shankar
Cc: SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users
Mailing List
Subject: Re: [SR-Users] FW: Regd. t_suspend() and t_continue()
Hi Shankar,
What version of Kamailio are you running (kamailio -V)?
Cheers
Jason
On Wed, Jan 15, 2014 at 6:58 AM, Shankar <shankar.rk(a)plintron.com> wrote:
Hi Jason,
Please find below my response inline,
I have some questions for you as we have used suspend/continue quite a lot
in the IMS code and don't have any leaks.
Firstly, why are you using pkg_mem for your hash_id and label? Remember that
you will be in 2 different processes in the suspend and continue portions of
the code... so pkg_mem will not work - you should use shm_mem instead.
[Shankar] We use pkg_mem because we are invoking t_continue from the same
process (using a thread).
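The point about private versus shared memory across processes can be
illustrated outside Kamailio. The sketch below uses plain POSIX calls only:
malloc stands in for per-process (pkg-like) memory and an anonymous shared
mapping stands in for shm-like memory. It does not use the Kamailio
allocators, and child_write_visible is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a value written by a child process is visible to the parent,
 * 0 if it is not, and -1 on allocation or fork failure.
 * use_shared != 0: memory from a shared anonymous mapping (shm-like).
 * use_shared == 0: memory from malloc (private to each process, pkg-like). */
static int child_write_visible(int use_shared)
{
    int *cell;
    int status = 0;
    int visible;
    pid_t pid;

    if (use_shared) {
        cell = mmap(NULL, sizeof(*cell), PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (cell == MAP_FAILED)
            return -1;
    } else {
        cell = malloc(sizeof(*cell));
        if (cell == NULL)
            return -1;
    }
    *cell = 0;

    pid = fork();
    if (pid == 0) {
        *cell = 42; /* the child writes into its own view of the memory */
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);
    visible = (pid > 0) ? (*cell == 42) : -1;

    if (use_shared)
        munmap(cell, sizeof(*cell));
    else
        free(cell);
    return visible;
}
```

With the shared mapping the parent sees the value written by the child; with
malloc it does not, because each process works on its own copy after fork().
A thread inside the same process, as in the reply above, shares the heap, so
this limitation does not apply to that case.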
Secondly, how are you using top to tell that you have a leak? Kamailio's
memory is internally managed.
[Shankar] After running for, say, 20 minutes, we get an out-of-shared-memory
error. Also, in the top output we observed a steady increase in the shared
memory usage of the process.
Cheers
Jason
On Mon, Jan 13, 2014 at 1:29 PM, Shankar <shankar.rk(a)plintron.com> wrote:
Re-sending without the attachment.
From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Monday, January 13, 2014 4:57 PM
To: 'sr-users(a)lists.sip-router.org'
Subject: Regd. t_suspend() and t_continue()
Hi,
We are trying out t_suspend() and t_continue() in our test setup.
We are facing a memory leak (both shm and pkg, as per the top command
results).
Please find the scenario below:
1) Do a t_newtran()
2) Allocate pkg memory for hashid and label.
3) Call t_suspend()
4) Do t_continue() when the async result is available
5) De-allocate the pkg memory reserved for hashid and label
6) Do a t_relay(), which forwards the SIP message to another SIP node.
In step (6) above, we see that t_newtran() allocates shared memory one more
time for the same transaction.
We tried t_release() after step (4) to release the transaction, as
t_relay() allocates new shared memory anyway. Nothing helped.
Please let me know which logs you would require to debug this.
I am attaching the syslog for this run.
Regards,
Shankar
--
Jason Penton
Senior Manager: Applications and Services
Smile Communications Pty (Ltd)