pres_htable_restore(): Remove check of expired presentity entries when initially filling the hash table to represent the number of expired + valid entries. Because hash.c::delete_phtable() decrements publ_count on removal of _every_ expired DB entry. get_p_notify_body(): Compensate the fix on hash table restore by checking for the expires time on building the NOTIFY. This also fixes a problem when a SUBSCRIBE is received for an expired entry before the cleaner can remove the entry from the DB. You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/809
-- Commit Summary --
* Presence: Fix startup inconsistency in presentity hash table
-- File Changes --
M modules/presence/notify.c (17) M modules/presence/presentity.c (3)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/809.patch https://github.com/kamailio/kamailio/pull/809.diff
@Chocolatbuddha pushed 1 commit.
29f6ba4 Fix variable name
Can you provide more details about what is the inconsistency? More information about how the problem you tried to fix is exposed will help to understand properly the patch.
The problem we experienced was that the kamailio would sometimes send empty NOTIFYs after start. This problem was persistent until the kamailio was restarted again.
We found out that this would occur in case the presentity table contained more or equal expired entries than valid ones. Lets assume we have only one presentity entry on startup and it is expired. This entry is ignored when filling the hash table. A PUBLISH generates a new valid entry in the DB and the hash table _before_ the first cleanup run resulting in an expired and a valid entry in the DB and a publ_count of 1 for the URI in the hash table. On first run of the cleaner it removes the expired entry from the DB and reduces the publ_count to 0 which means removing it from the hash table despite the fact that there is still a valid DB entry. Since the following PUBLISHes are subsequent ones only the DB is updated but not the hash table. The effect are hash table misses resulting in the omission of fetching the XML body from the presentity table for every following lookup.
This patch primarily tries to make the usage of publ_count consistently represent the number of presentity entries in the DB. And to make up for the short times an expired entry is in the hash table (after startup and for recently expired ones during runtime) the entry's expires time is checked on fetching the body for a NOTIFY.
Double checking here, to save time digging the code -- are the expired entry in DB still deleted properly if they are not loaded anymore?
The DB was updated correctly and showed no problems whatsoever in our tests. The problems were only in the hash table.
Before merging, maybe other people using presence modules, like @phil-lavin or @lazedo, can have a look and comment if they find something inconsistent here, because I don't have the time right now to go in deeper review and analysis of the code for side effects.
@Chocolatbuddha please, make this change with one commit. Just squash those two commits in one and force push to your branch in order to get a pull request with a single commit
@linuxmaniac I think that can be done from the "Merge pull request" button here -- it has the right arrow down that gives more options one of them is "Squash and merge" -- never tried, but from your description it looks like what you want.
We know there are issues restarting Kamailio with presentity entries in place. The old entries cause problems until they expire. We think we have narrowed this down to the pua db_mode - we can't store PUA entries in the database because that logic is broken in quite a few ways.
That said, I'll give this patch a try and see if it fixes any of our restart issues and/or introduces strange side effects.
Commits squashed.
(Thanks, @linuxmaniac, for the tip. I didn't know you could do that in Git.)
@phil-lavin wondering if you got any chance to test this one in order to merge or not.
If none else has comments, probably I will merge this patch in the next few days.
It's been running here for a while and it has had no side effects. That said, it doesn't resolve any problems that we are facing.
@phil-lavin: thanks for the feedback, I am going to merge this one.
Merged #809.
this commit broke presence for us. we use NSQ + presence, after a phone reboots and we send a NOTIFY with confirmed state, presence module sends a NOTIFY with terminated state. please revert this commit. see the log and SIP packet below:
``` Nov 3 09:23:52 /usr/local/sbin/kamailio[26288]: INFO: <script>: park+6001@troubleshoot.testdomain.com|log|payload {"Call-ID":"park+6001@troubleshoot.testdomain.com","Event-Category":"presence","Event-Name":"update","Event-Package":"dialog","Expires":"1","From":"sip:park+6001@troubleshoot.testdomain.com","From-User":"park+6001","From-Realm":"troubleshoot.testdomain.com","To":"sip:testuser@troubleshoot.testdomain.com","To-User":"testuser","To-Realm":"troubleshoot.testdomain.com","State":"confirmed"} Nov 3 09:23:52 /usr/local/sbin/kamailio[26288]: INFO: <script>: park+6001@troubleshoot.testdomain.com|log|received dialog update for sip:park+6001@troubleshoot.testdomain.com state confirmed Nov 3 09:23:53 /usr/local/sbin/kamailio[26284]: INFO: presence [notify.c:1634]: send_notify_request(): NOTIFY sip:testuser@troubleshoot.testdomain.com via sip:testuser@8.1.2.90:25279;transport=tcp on behalf of sip:park+6001@troubleshoot.testdomain.com for event dialog : 46ed33b5954833366fb4eda1d08f6bb7 ```
```
|NOTIFY sip:testuser@8.1.2.90:25279;transport=tcp SIP/2.0 6.7.8.219:5055 8.1.2.90:2527 |Via: SIP/2.0/TCP 6.7.8.219:5055;branch=z9hG4bKb0d.16516795000000000000000000000000.0 ----------+--------- ----------+---------|To: sip:testuser@troubleshoot.testdomain.com;tag=EBF86DE1-F194FBC6 | NOTIFY | |From: sip:park+6001@troubleshoot.testdomain.com;tag=3d72448f6ec66ad72d4fe275f33a4e85-bf81 09:23:53.007063 | --------------------------> | |CSeq: 3 NOTIFY +0.007681 | 200 OK | |Call-ID: 46ed33b5954833366fb4eda1d08f6bb7 09:23:53.014744 | <-------------------------- | |Content-Length: 285 | | |User-Agent: kamailio (5.0.0-dev7 (x86_64/linux)) | | |Max-Forwards: 70 | | |Event: dialog | | |Contact: sip:6.7.8.219:5055;transport=tcp | | |Subscription-State: active;expires=3491 | | |Content-Type: application/dialog-info+xml | | | | | |<?xml version="1.0"?> | | |<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="3" state="full" entity="sip:park+6001@troubleshoot.testdomain.com"> | | | <dialog id="615293b33c62dec073e05d9421e9f48b" direction="recipient"> | | | <state>terminated</state> | | | </dialog> | | |</dialog-info> ```
@eschmidbauer - is it affecting the runtime behaviour, because it should have been targeting only the startup procedure?
@miconda yes, we are seeing this issue during runtime. not on startup we reverted the commit and problem went away
the problem with this commit is for expired column = 0 which is usually set on "early" state
@lazedo you can see that we sending expires=1 in our NSQ message and in nsq_pua.c ```C struct json_object *ExpiresObj = nsq_json_get_object(json_obj, BLF_JSON_EXPIRES); if (ExpiresObj != NULL) { expires = json_object_get_int(ExpiresObj); if (expires > 0) expires += (int)time(NULL); } ``` you can see expires will be set to time + 1
@eschmidbauer yep, but that's wrong by definition, we had the same problem. try sending `expires = 3600` , it will work. i'm ok with this commit, it could be improved if we were able to ignore the `0` which we interpret as `do not expire`
it's not entirely clear to me what this PR fixes. What about adding a module parameter (default = enabled) to enable this code?
I am fine adding a parameter to control the changes done by the patch -- being it enabled or disabled by default, it's up to the people here what they prefer more.