Folks,
I've been having a bit of a battle with a concurrency issue.
If we have a reasonable number of contacts in an RLS resource list
(around 50 does it on my test server), we get the following error
message thrown up between 2 and 6 times whenever the client logs in.
ERROR: rls [resource_notify.c:663]: no presence dialog record for
non-TERMINATED state uri pres_uri = sip:0033@lab8.croc.internal
watcher_uri = sip:ernie@lab8.croc.internal
(I've extended the debug here to include the URIs, so I can see what is
not being found)
It is not always the same URIs that go missing, nor is it always the
same number of faults.
On investigation this turns out to be a race condition.
subs_cback_func (pua/send_subscribe.c) locks the presentity hash table
and inserts a dialog entry when it receives a 200 to the subscribe.
rls_handle_notify (rls/resource_notify.c) calls pua_get_record_id
(pua/hash.c get_record_id()), which also locks the presentity hash table
and looks up the dialog.
It seems that in some cases the NOTIFY is getting the lock before the
200 to the SUBSCRIBE. Thus the NOTIFY handler is looking for the dialog
before the 200 handler has inserted it.
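To make the ordering problem concrete, here is a minimal sketch of the two handlers replayed in a fixed order on a single thread. It is not the real pua code: dialog_table, on_subscribe_200, on_notify and replay are hypothetical names standing in for the presentity hash table, subs_cback_func and the lookup done from rls_handle_notify. It only shows that the lock makes the two handlers mutually exclusive, but does nothing to decide which of them runs first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIALOGS 8

/* Hypothetical stand-in for the pua presentity hash table:
 * a single lock guarding a flat array of presentity URIs. */
struct dialog_table {
    pthread_mutex_t lock;
    const char *pres_uri[MAX_DIALOGS];
    size_t count;
};

void table_init(struct dialog_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->count = 0;
}

/* Models the 200 OK handler: take the lock, insert the dialog. */
void on_subscribe_200(struct dialog_table *t, const char *pres_uri)
{
    pthread_mutex_lock(&t->lock);
    assert(t->count < MAX_DIALOGS);
    t->pres_uri[t->count++] = pres_uri;
    pthread_mutex_unlock(&t->lock);
}

/* Models the NOTIFY handler: take the same lock, look the dialog up.
 * Returns 1 if the dialog was found, 0 otherwise. */
int on_notify(struct dialog_table *t, const char *pres_uri)
{
    int found = 0;
    size_t i;

    pthread_mutex_lock(&t->lock);
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->pres_uri[i], pres_uri) == 0) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}

/* Replays both handlers in the given order and reports whether
 * the NOTIFY handler found its dialog. */
int replay(int notify_first)
{
    struct dialog_table t;
    const char *uri = "sip:0033@lab8.croc.internal";
    int found;

    table_init(&t);
    if (notify_first) {
        found = on_notify(&t, uri);   /* lookup runs before the insert */
        on_subscribe_200(&t, uri);
    } else {
        on_subscribe_200(&t, uri);
        found = on_notify(&t, uri);   /* lookup runs after the insert */
    }
    pthread_mutex_destroy(&t.lock);
    return found;
}
```

In this sketch replay(0) returns 1 (the 200 is handled first and the dialog is found), while replay(1) returns 0, which corresponds to the "no presence dialog record" case above.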
I attempted to insert a dialog entry in the hash table on sending the
SUBSCRIBE, but unfortunately this did not cure the problem.
Has anyone any suggestions for the cleanest and easiest method to ensure
that the 200 is handled before the NOTIFY?
Andy Miller
Crocodile RCS