On Tuesday 04 November 2014, Daniel-Constantin Mierla wrote:
On 03/11/14 17:05, Alex Hermann wrote:
> On Monday 03 November 2014, Daniel-Constantin Mierla wrote:
True. But even
then refcounting shouldn't be bypassed. For each (cleanup)
situation the expected refcount should be compared to the actual refcount
before deleting.
Not sure what would be expected reference counter value in each of the
states.
I think it is necessary to know it to be able to securely cleanup dialogs.
In theory, the dialog cannot last for ever in initial
state, so cleaning
it up make sense to stay safe of memory leaks at runtime -- we can print
log messages to alert, but keeping it for ever brings more troubles
(e.g., besides memory leaks, the call limit can be affected).
I agree those must be cleaned, even though it is not technically a memory
leak, as the dialog pointer is still used in the dialog hash.
As I said, for operations that are long term, I
consider better to keep
dlg index and label (two integers) than increasing the reference count
and storing the pointer (on 64b it is the same size). Then lookup again
the dialog when needed.
Who is talking about "long term" operations? The duration is not of any
relevance here.
Almost none of the dialog related operations apart from getting the dlg
pointer are protected by locks but by the refcounter. Suppose process 1
obtains a dlg pointer (from the h_entry, and h_id), and upped the refcounter
as a result. It can get scheduled away at any random time, even directly after
obtaining the dlg pointer. If the timer process get scheduled at that moment
and frees the memory of the dlg (disregarding the fact that is is still in
use), how is process 1 to know that?
Finding the reason of what determined you to do
this is important, not to push kind of workaround that we don't fully
understand its purpose and benefits.
IMO, it is not right to keep a dialog in proxy as long as caller/callee
don't have it anymore.
I agree, just not with the method you used to get rid of the dialog.
You can eventually make the behavior of the patch
configurable, so you
can enable it via modparam, in the current form I don't find it ok.
I'm not going to make a known segfault path configurable, so i'll revert it if
you insist. But may i kindly suggest you revert your code too which introduced
this code in the first place as, IMHO, it is fundamentally wrong by deleting
an object without checking if it is in use.
(i.e., also sip
protocol relies on a time interval for not receiving ACK).
This specific situation is not about missing ACK. That would be
DLG_STATE_CONFIRMED_NA.
It was a quick example that sip relies on time intervals for changing
states.
My goal
was to figure out what was the situation, see if it is something
predictable that can be fixed in source code -- dialogs staying too long
in a state that shouldn't take too long are susceptible to issues and it
is better if we know what was the reason.
It is weird that the TMCB_DESTROY callback didn't cleanup the dialog. I
haven't investigated the cause as i could fix the segfault pretty easily.
So far sounds like fixing something that we don't know what was and the
side effects are not good. I would like to get to a proper solution,
knowing what was the issue.
I'm sorry, but IMHO this (and earlier) argument applies more to your initial
cleanup routine for the stray dialogs than to my fix for the segfault.
--
Greetings,
Alex Hermann