This was tested with 100K contacts and 4 servers; when restarting a server, we get a full sync from the other 3 nodes in a few seconds:
```
show registrations: first  x "usrloc:location-contacts = 100000"
show registrations: second x "usrloc:location-contacts = 100000"
show registrations: third  x "usrloc:location-contacts = 100000"
show registrations: fourth x "usrloc:location-contacts = 100000"
postgres database: 100000
```
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/1054
-- Commit Summary --
* dmq_usrloc: sync with multi contacts per message
-- File Changes --
M src/modules/dmq_usrloc/README (24)
M src/modules/dmq_usrloc/dmq_usrloc.c (7)
M src/modules/dmq_usrloc/usrloc_sync.c (264)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/1054.patch
https://github.com/kamailio/kamailio/pull/1054.diff
I think I should add a check for the maximum packet length, to make sure we never try to send a datagram larger than 60K (keeping a safety margin under the 65,535-byte UDP datagram limit). As a next step, we could add compression :)
I am currently reviewing the patch, but as for limiting to 60KB, I'm not sure it is necessary for this application. It will be difficult to be 100% accurate anyway, since you only have control over the body; the only option, therefore, is to assume a sensible size for the rest of the message and set that aside in your calculation.
I think Kamailio handles fragmentation just fine, and since this is really just a few packets on startup/initial sync, I don't think we need to be concerned about it (although it may be worth a mention in the README). Others may have a different opinion, of course.
This is now deployed on a cluster running on AWS; fragmentation is taking place and everything is performing much better, with no more transaction storms.
OK, I will update the README, changing the example to 50 contacts and adding a comment about budgeting 1024 bytes per contact in order to stay below 65,536 bytes per UDP send.
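As a quick sanity check on those numbers: 50 contacts × 1024 bytes = 51,200 bytes of body per message, which leaves comfortable headroom below the 65,535-byte UDP datagram limit even once the SIP and DMQ headers are counted.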
@jchavanton pushed 1 commit.
1226c0f dmq_usrloc: readme batch_msg_contacts
@jchavanton: the documentation has to be edited in the DocBook XML files located in the `doc/` subfolder of each module. The README file must not be edited directly; it is generated on the server from those files.
OK, I corrected the documentation and squashed to avoid too many commits.
The overhead per contact is 188 chars, which is quite a lot; we may select an alternate format later. `1,:{"action":,"aor":"","ruid":"","c":"","received":"","path":"","callid":"","user_agent":"","instance":"","expires":,"cseq":,"flags":,"cflags":,"q":,"last_modified":,"methods":,"reg_id":},`
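Note that those 188 chars are only the fixed keys and punctuation; the variable values (AoR, RUID, Call-ID, contact/received/path URIs, user agent, instance, etc.) come on top of that, which is why the 1024-bytes-per-contact budget mentioned above is a deliberately conservative estimate rather than a measured size.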
I think we could validate the size of the packet while we are building it, so that we automatically send it once it grows beyond 60,000 bytes, for example.
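A minimal sketch of that approach (all names hypothetical, not the module's actual code), assuming each contact is serialized to JSON first and the batch is flushed whenever the next contact would push the body past the limit:

```c
/* Sketch: append serialized contacts to a batch buffer and flush it
 * before it would exceed the configured maximum message size. */
#include <stdio.h>
#include <string.h>

#define MAX_MSG_SIZE 60000 /* stay safely below the 65535-byte UDP limit */

struct batch {
	char buf[MAX_MSG_SIZE];
	size_t len;
};

/* stand-in for broadcasting the JSON body to the other DMQ nodes */
static void send_batch(const char *body, size_t len)
{
	printf("sending %zu bytes\n", len);
	(void)body;
}

static void batch_add(struct batch *b, const char *contact_json, size_t clen)
{
	if (clen > sizeof(b->buf))
		return; /* single oversized contact: real code would log and skip */
	if (b->len + clen > sizeof(b->buf)) {
		send_batch(b->buf, b->len); /* flush before overflowing */
		b->len = 0;
	}
	memcpy(b->buf + b->len, contact_json, clen);
	b->len += clen;
}

int main(void)
{
	struct batch b = { .len = 0 };
	const char *contact = "{\"aor\":\"alice\",\"expires\":3600}";
	for (int i = 0; i < 10000; i++) /* ~310KB total: forces several flushes */
		batch_add(&b, contact, strlen(contact));
	if (b.len > 0)
		send_batch(b.buf, b.len); /* flush the final partial batch */
	return 0;
}
```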
Are there other changes you'd like to make to this PR?
Hi Charles, I am implementing the size check. I think it would be nightmarish not to know whether contacts could sometimes be missing because of oversized messages; this will simplify things for anyone using this feature.
I am verifying that the calculation matches and I will test and commit later today.
Thanks for your review so far
@jchavanton pushed 1 commit.
8ed3190 dmq_usrloc: param batch_max_msg_size
I added `dmq_usrloc_batch_msg_size` to make sure we never try to send messages that are too large, and to simplify the configuration. I ran a bunch of tests again.
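For reference, usage in kamailio.cfg would presumably be along the lines of `modparam("dmq_usrloc", "batch_max_msg_size", 60000)`, with the parameter name taken from commit 8ed3190 above; the exact name and default value are whatever the updated README documents.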
Hi @charlesrchance, I am not planning any other modifications to this PR; the sync performance is now acceptable for our needs at this point.
One thing that could help would be to increase the retransmission timeout only for the sync traffic, but I did not see any obvious way to do this. I guess this is off-topic; I was planning to discuss it on the mailing list.
@charlesrchance I guess this is ready to be merged if you are fine with everything so far.
Returning after an extended Easter break, so I have not tested the recent changes, but they look OK from the source. If they have been tested already and everyone else is happy, then it can be merged. Otherwise, I will do it tomorrow.
I did test the latest version in my lab, not in prod yet.
The lab test I did was load testing with 4 nodes; before going to prod we do integration testing, and in this case I see no reason why it would fail. The only reason this is not in prod yet is that, instead of backporting to 4.5, we will start using 5.x.
Merged #1054.