Hello,

On 1/30/12 6:35 PM, Peter Dunkley wrote:
Hi,

Any retransmission will cause the problem, so anyone using UDP over the Internet to a Kamailio presence server will see it whenever there is occasional packet loss. It just happened to be noticed here first under heavy load.

By creating a new transaction and absorbing the retransmissions, do you mean calling t_newtran()/t_release() when the SUBSCRIBE is received?

Yes, as in the default config: call t_newtran() before handling the SUBSCRIBE. t_check_trans() checks whether a transaction already exists for the request and absorbs the request if it is a retransmission.

I am not sure t_release() is explicitly needed anymore; IIRC Andrei did some work in this area a long time ago. It is harmless if used, though, so it is still in the default config.
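
For reference, the pattern looks roughly like the sketch below. It is modelled on the presence handling in the default kamailio.cfg of the 3.x series; the route name PRESENCE and the exact layout are assumptions and may differ in your version, so treat it as an illustration rather than a drop-in config.

```cfg
route {
	# ... sanity checks and other processing omitted ...

	# if tm already has a transaction for this request, it is a
	# retransmission: tm resends the last reply and script execution ends
	t_check_trans();

	route(PRESENCE);
}

route[PRESENCE] {
	if (!is_method("PUBLISH|SUBSCRIBE"))
		return;

	# create the transaction before calling the presence module, so that
	# later retransmissions are matched by tm and never reach it
	if (!t_newtran()) {
		sl_reply_error();
		exit;
	}

	if (is_method("PUBLISH")) {
		handle_publish();
	} else {
		handle_subscribe();
	}

	# may no longer be strictly required, but it is harmless
	t_release();
	exit;
}
```

With this in place, a retransmitted SUBSCRIBE that arrives while the transaction still exists is answered by tm and handle_subscribe() is not run a second time. Retransmissions that arrive after the transaction has expired, or that land on another server, would still reach the presence module.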

Cheers,
Daniel


If so I didn't think of that.  It'd make sense to do that too.  I think the presence module should cope with retransmissions (especially as we need it to cope in a multi-server environment with load-balancers/fail-over and a shared database).  But if using t_newtran()/t_release() will handle retransmissions in the normal case it should help reduce the load on the database.

Thanks,

Peter


On Mon, 2012-01-30 at 18:26 +0100, Daniel-Constantin Mierla wrote:
Hello,

It can be held for the next minor release so that it gets more testing, if you feel that is better (we have to include something there as well :-) ). From the commit log I understood that it usually happens under heavy RLS load with retransmissions. Doesn't it help to create the transaction and absorb the retransmissions with tm?

Cheers,
Daniel

On 1/30/12 6:19 PM, Peter Dunkley wrote:
Hello,

I believe that this bug also affects the 3.2 branch, but the change is quite big and with the next release of 3.2 due tomorrow I thought it best to hold off "cherry-pick"ing it until after the release.  That is, unless anyone else thinks it should go in there?

Regards,

Peter

On Mon, 2012-01-30 at 18:16 +0100, Peter Dunkley wrote:
Module: sip-router
Branch: master
Commit: e6a50c5c0957a5ad3e08e57ede5be775a41ac24f
URL:    http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=e6a50c5c0957a5ad3e08e57ede5be775a41ac24f

Author: pd <peter.dunkley@crocodile-rcs.com>
Committer: pd <peter.dunkley@crocodile-rcs.com>
Date:   Mon Jan 30 17:06:42 2012 +0000

modules_k/presence: Improved handling of retransmitted SUBSCRIBE requests

- handle_subscribe() doesn't handle retransmitted SUBSCRIBEs properly. This was
  noticed with back-end SUBSCRIBEs from RLS under heavy load (TCP was also
  tried here, but under load it caused a different set of problems with
  buffer sizes and buffers taking too long to process).
- Although this was originally observed with RLS back-end SUBSCRIBEs it
  appears to be a general issue when UDP is used.
- There were two main problems:
  1) On an un-SUBSCRIBE the record in the hash-table or DB will be removed.  If
     the un-SUBSCRIBE is retransmitted there is no record to be found and
     handle_subscribe() fails.
  2) After fixing 1, and on re-SUBSCRIBE, remote CSeq's with lower than
     expected values cause failures.  This can also happen when there are
     retransmissions.
- The fix was to catch both these cases and treat them as a special class of
  error.  In these two cases and when the protocol is UDP, a correct-looking
  2XX response is sent, but no further processing (database updates, sending
  NOTIFY, and so on) is performed on the SUBSCRIBE request.
- Also modified the query in get_database_info() to just use Call-ID, To-tag,
  and From-tag for dialog matching (so it duplicates the query from
  get_stored_info()) as the query that was there looked a little aggressive.



-- 
Daniel-Constantin Mierla -- http://www.asipto.com
http://linkedin.com/in/miconda -- http://twitter.com/miconda