So, poking around the code for the dialog module... I'm not sure what I'm missing here.
get_profile_size doesn't care about the state of a dialog... so you get ALL dialogs that are in the hash table (which is interesting if you want to use the dialog module to enforce channel limits, etc.).
So you go... OK... Kamailio only expects to have "ACTIVE" dialogs in the hash table... kewl. Let's assume that to be the case.
But then in dlg_db_handler.c, load_dialog_info_from_db loads all dialogs from the DB, regardless of state. So all dialogs in the DB (ones that didn't get deleted yet, but were in state 5) get re-created in Kamailio upon startup.
What this means is (assuming we start with an empty DB):
I start Kamailio and make some calls... they get synced to the DB. I end the calls, and Kamailio removes them from the dialog module's internal hash table, but the sync to the DB hasn't happened yet.
I kill Kamailio (or it crashes, whatever), restart it, and it re-loads all those dialogs and thinks they are still active calls.
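A minimal C++ sketch of the effect described above, under loudly stated assumptions: the `ProfileDialog` struct, both function names, and modelling a profile as a plain vector are hypothetical, not the dialog module's real data structures; only the state number 5 (terminated) comes from this thread. It contrasts a count that ignores state with one that skips terminated dialogs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model; only the value 5 (terminated) is taken from the thread.
constexpr int kProfileStateTerminated = 5;

struct ProfileDialog {
    std::string callid;
    int state;  // dialog state number; 5 = terminated, per the thread
};

// Counts every dialog present in the profile, like the behaviour described
// for get_profile_size: terminated dialogs reloaded from the DB after a
// restart are counted as if they were live calls.
std::size_t count_all_dialogs(const std::vector<ProfileDialog>& profile) {
    return profile.size();
}

// State-aware count: terminated dialogs do not use up a channel.
std::size_t count_live_dialogs(const std::vector<ProfileDialog>& profile) {
    std::size_t live = 0;
    for (const ProfileDialog& d : profile) {
        if (d.state != kProfileStateTerminated) {
            ++live;
        }
    }
    return live;
}
```

With one live call and two terminated ones reloaded after a restart, `count_all_dialogs` gives 3 while `count_live_dialogs` gives 1, so a channel limit of 2 would wrongly reject the next call.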
I'm SURE I'm missing something here, because it seems to be VERY common to use dialogs for channel limiting (maybe not so much with a Cassandra DB behind the scenes). But so far I have yet to find anything that makes me think this is db_cassandra misbehaving.
If I'm wrong, please point me in the right direction.
Jay
On 24 February 2014 17:54, jay binks jaybinks@gmail.com wrote:
I would suggest that you change the DBMS to something a little less complicated than Cassandra (MySQL, for example), run your tests again, and see if you can reproduce this.
If you can't, and everything works with the DBMS you chose, it would mean that you have found a bug in the Cassandra module.
I have personally been experimenting with db_cassandra; it works quite well for some scenarios and not at all for others. Also, take into account that you can't really maintain the Kamailio tables using the built-in scripts (kamctl) when Cassandra is the backend. That doesn't work because Cassandra uses CQL, which resembles SQL only syntactically and has very limited functionality.
Regards,
On Mon, Feb 24, 2014 at 7:19 AM, jay binks jaybinks@gmail.com wrote:
Hello,
I pushed some patches to the master branch to remove a dialog from its associated profiles when it enters the terminated state. I encountered such an issue (not that) recently, but hadn't found the time to get to it before.
The second patch avoids adding dialogs to profiles when they are loaded from the database in the terminated state (5).
Here are the links to the patches:
- http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=edf61acb...
- http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=9b88eb7e...
They should be straightforward to cherry-pick to 4.1 (even 4.0, I expect). If you test them and all goes fine, I will backport them; I had no time for real testing here.
I also plan not to add terminated dialogs to memory at all, but to destroy them at DB load time. That needs a bit of review, though, to make sure all necessary callbacks are executed.
On the other hand, if the dialogs are not removed from the DB, it might be an issue with the database driver (Cassandra in this case, which is a rather new module). Do you get any syslog errors from Kamailio or the database server? I expect people would have reported such an issue by now for other database engines. Still, it might be an issue that just went unnoticed...
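To make the two load-time behaviours concrete, here is a minimal C++ sketch. `DbDialogRow`, `LoadedDialogs`, and `load_dialog_rows` are hypothetical names, not functions from dlg_db_handler.c, and the callback handling mentioned above is deliberately left out. With `drop_terminated` set to false it mimics the second patch (terminated rows are recreated in memory but not added to profiles); set to true it mimics the planned change (terminated rows are not kept at all).

```cpp
#include <string>
#include <vector>

// Hypothetical row as read back from the dialog table.
struct DbDialogRow {
    std::string callid;
    int state;  // 5 = terminated, per the thread
};

// Hypothetical result of the load: what ends up in memory and in profiles.
struct LoadedDialogs {
    std::vector<std::string> in_memory;   // dialogs recreated in the hash table
    std::vector<std::string> in_profile;  // dialogs counted by the profile
};

constexpr int kRowStateTerminated = 5;

LoadedDialogs load_dialog_rows(const std::vector<DbDialogRow>& rows,
                               bool drop_terminated) {
    LoadedDialogs out;
    for (const DbDialogRow& r : rows) {
        if (r.state == kRowStateTerminated) {
            // Second patch: keep the dialog in memory, but never in a profile.
            // Planned change: do not keep it at all. A real implementation
            // would also have to run the dialog's cleanup callbacks here.
            if (!drop_terminated) {
                out.in_memory.push_back(r.callid);
            }
            continue;
        }
        out.in_memory.push_back(r.callid);
        out.in_profile.push_back(r.callid);
    }
    return out;
}
```

In both modes the profile only sees non-terminated dialogs, which is what keeps get_profile_size-based channel limits accurate after a restart.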
Cheers, Daniel
On 24/02/14 11:19, jay binks wrote:
Hi All,
So I've done what Carlos suggested and switched my dialog DB to MySQL instead of Cassandra. Everything worked 100% as you would expect.
Right, so the issue is db_cassandra.
I started testing and going through the code.
I found I had these lines on my console, which was interesting & concerning:
    update_dialog_dbinfo_unsafe(): could not add another dialog to db
I had been ignoring them, because the dialog was in the DB and I figured I would come back and figure that out later.
But this seems to have been key to this whole thing.
It turns out that in dlg_db_handler.c, dialog_dbf.insert was getting a 1 back from db_cassandra on the insert, but a 0 back from MySQL... WTF... OK.
So I traced into db_cassa_insert, which calls db_cassa_modify. Around line 1210 I can see this:
    CON_CASSA(_h)->con->batch_mutate(CFMap, oac::ConsistencyLevel::ONE);
    return 1;
wrapped in a try/catch block. It seems db_cassandra returns 1 for success, but Kamailio (or at least the dialog module) expects 0 for success.
So I changed that to return 0 and re-tested. Everything works as expected: "could not add another dialog to db" stops coming up on my console, and dialogs are removed when calls hang up.
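A minimal C++ sketch of the corrected convention, assuming a hypothetical `cassa_modify_sketch` wrapper: the throwing `BatchWrite` callable stands in for the `batch_mutate` call above, and `std::exception` stands in for whatever exception types the real driver catches. Success returns 0, as the devel guide requires; in this sketch an error returns -1.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for the Cassandra batch write; throws on failure.
using BatchWrite = std::function<void()>;

int cassa_modify_sketch(const BatchWrite& write) {
    try {
        write();
    } catch (const std::exception& e) {
        std::cerr << "batch write failed: " << e.what() << '\n';
        return -1;  // error: negative in this sketch
    }
    return 0;  // success must be 0; the old code returned 1 here
}
```

For example, `cassa_modify_sketch([] {})` returns 0, while a callable that throws `std::runtime_error` makes it log the error and return -1.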
It seems this one thing is enough to break dialogs with Cassandra (and who knows what else). This is the reason for my email, though: if we simply change that to 0, what else may break?!
However, http://www.asipto.com/pub/kamailio-devel-guide/#c09f_insert clearly states that insert returns "0 if everything is OK", so this is clearly a bug that needs fixing.
Can someone with more experience test this for me and possibly apply the attached patch?
Jay
On 25 February 2014 05:58, Daniel-Constantin Mierla miconda@gmail.com wrote:
...mentioning anyway.
So... I run Kamailio, make calls, and see dialogs in the DB. I can use "kamctl mi dlg_list" and see that dialogs go away when I hang up a call.
When I query the DB backend, I still see the records, but they have a state of 5.
I initially thought this was a bug, but it seems dialogs in state 5 get cleaned up after a period.
Running "kamctl mi dlg_list" again shows all my dialogs from the DB. They DO show as "State 5",
but for some reason these dialogs appear to stick around for a long time, and the bigger issue for me is that my channel limiting (using get_profile_size) seems to count these dialogs (in state 5) as active calls.
-- Sincerely
Jay
Just noticed the same thing in db_cassa_delete; the patch is updated to fix both.
Jay
On 5 March 2014 12:52, jay binks jaybinks@gmail.com wrote:
Hello,
Can you make the patch for the master branch? I just backported two patches that were in master but not yet in 4.1.
While you're at it, can you review whether the other 'return 1' statements expose the same issue? I noticed others in db_cassa_delete and db_cassa_query.
Thanks, Daniel
On 05/03/14 03:57, jay binks wrote:
Hi Jay,
I tried the module as-is for the location service, and it worked fine. Given that, why was that part working if usrloc uses the same DB API, bound to the same buggy implementation? It makes me suspicious.
Did you make sure the records were actually inserted on the cluster?
Regards, Carlos
On Wed, Mar 5, 2014 at 8:42 AM, Daniel-Constantin Mierla miconda@gmail.com wrote:
Records were inserted, that's for sure.
I have not checked usrloc, but I suspect it doesn't rely on the return code of the insert anywhere near as much.
On 5 March 2014 23:42, Carlos Ruiz Díaz carlos.ruizdiaz@gmail.com wrote:
The dialog module checks for != 0, while usrloc checks for < 0. Returning 1 makes no sense for the Cassandra functions, since it doesn't signal any special kind of successful result. It should be 0.
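That difference explains why usrloc looked fine while dialogs broke. The two predicates below are a hypothetical C++ sketch of each style of check, not code copied from either module: a success value of 1 fails the dialog-style `!= 0` test but passes the usrloc-style `< 0` test.

```cpp
// Hypothetical predicates modelling how each caller interprets the
// return code of the DB API insert.

// Dialog module style: anything other than 0 is treated as a failure,
// which is why the old "return 1" produced "could not add another dialog to db".
bool dialog_style_insert_ok(int rc) {
    return rc == 0;
}

// usrloc style: only negative values are treated as failures,
// so a success value of 1 slipped through unnoticed.
bool usrloc_style_insert_ok(int rc) {
    return rc >= 0;
}
```

Both predicates agree on 0 and on negative values; they only disagree on positive return codes, which is exactly the case the Cassandra driver produced.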
Cheers, Daniel
On 05/03/14 22:46, jay binks wrote:
I have fixed the remaining 'return 1's, and also added a little more logging (with the retry count) for when a query retries the connection and fails.
Find attached the latest patch file (tested to apply against master HEAD). Hopefully this is good enough for you to commit.
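The patch itself is not reproduced in this thread. As a rough, hypothetical illustration of "retry the connection and log the retry count", the C++ sketch below uses a made-up `query_with_retry` helper whose `try_query` and `reconnect` callables stand in for the driver's real query and reconnect calls; the log format and the retry limit are assumptions.

```cpp
#include <functional>
#include <iostream>

// Hypothetical helper: try_query returns true on success, reconnect
// re-establishes the connection. Returns 0 on success, -1 once retries run out.
int query_with_retry(const std::function<bool()>& try_query,
                     const std::function<void()>& reconnect,
                     int max_retries) {
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        if (try_query()) {
            return 0;
        }
        std::cerr << "query failed (attempt " << attempt + 1 << " of "
                  << max_retries + 1 << ")\n";
        if (attempt < max_retries) {
            reconnect();  // only reconnect if another attempt will follow
        }
    }
    std::cerr << "giving up after " << max_retries << " retries\n";
    return -1;
}
```

Logging the attempt number on every failure makes it obvious from the console whether a query eventually succeeded after reconnecting or exhausted its retries.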
On 5 March 2014 21:42, Daniel-Constantin Mierla miconda@gmail.com wrote:
I applied the patch to master and backported it to the 4.1 branch.
Cheers, Daniel
On 06/03/14 09:38, jay binks wrote: