### Description

We have three registrar servers storing location information in memory only. Registrations are replicated between the nodes using DMQ/dmq_usrloc: when one of the registrars processes a registration, it is saved locally and then replicated to the remaining two nodes.
When NAT pinging is enabled, all three registrars attempt to send OPTIONS pings to the endpoints. Ideally, only the registrar that serviced the registration should send the ping.
I see that there is a core parameter to set the server_id for each server, and I also see that nathelper has a "filter_server_id" parameter; however, this appears to work only in database mode, not in in-memory mode. Would it be possible to extend it to work for in-memory mode too?
### Troubleshooting

#### Module definitions
registrar

```
modparam("registrar", "method_filtering", 1)
modparam("registrar", "case_sensitive", 1)
modparam("registrar", "append_branches", 0)
modparam("registrar", "use_path", 1)
modparam("registrar", "path_mode", 0)
modparam("registrar", "path_use_received", 1)
modparam("registrar", "path_check_local", 1)
modparam("registrar", "max_contacts", 1)
```

usrloc

```
modparam("usrloc", "db_mode", 0)
modparam("usrloc", "use_domain", 1)
modparam("usrloc", "timer_interval", 60)
modparam("usrloc", "timer_procs", 4)
modparam("usrloc", "nat_bflag", 6)
```

nathelper (NOTE: force_socket is set to match each registrar server)

```
modparam("nathelper", "natping_interval", 20)
modparam("nathelper", "natping_processes", 4)
modparam("nathelper", "ping_nated_only", 0)
modparam("nathelper", "sipping_from", "sip:keepalive@example.com")
modparam("nathelper", "sipping_method", "OPTIONS")
modparam("nathelper", "sipping_bflag", 6)
modparam("nathelper", "force_socket", "10.7.0.189:5060")
modparam("nathelper", "udpping_from_path", 1)
```

Kamailio listen directives (NOTE: these are set to match the interfaces on each registrar)

```
listen=udp:10.6.0.189:5060
listen=udp:10.7.0.189:5060
listen=tcp:10.6.0.189:80
```

dmq

```
modparam("dmq", "server_address", DMQ_ADDRESS)
modparam("dmq", "notification_address", DMQ_NOTIFY_ADDRESS)
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 4)
```

dmq_usrloc

```
modparam("dmq_usrloc", "enable", 1)
```
#### Reproduction
Using the settings above, when location information is replicated to a server, the servers that receive the replica should not send OPTIONS messages for that AOR; only the registrar that serviced the registration should send NAT keepalives.
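For completeness, here is a minimal sketch of the kind of REGISTER handling implied by the modparams above; the route name and the NAT test flags are assumptions for illustration, not the exact production config:

```
# Illustrative REGISTER handling only - route name and NAT test flags
# are assumptions; bflag 6 matches nat_bflag/sipping_bflag above.
route[REGISTRAR] {
    if (is_method("REGISTER")) {
        if (nat_uac_test("19")) {
            # rewrite Contact/received for NATed clients and mark the
            # contact so nathelper keeps it alive with OPTIONS pings
            fix_nated_register();
            setbflag(6);
        }
        if (!save("location")) {
            sl_reply_error();
        }
        exit;
    }
}
```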
#### Log Messages
#### SIP Traffic
Here is the traffic from the registrar that serviced the registration; this is expected and working:

```
U 2017/11/02 07:50:54.191717 10.7.0.190:5060 -> 10.7.0.186:5062
OPTIONS sip:example_user@212.2.172.228:39808;rinstance=d74acdb581467154;transport=UDP SIP/2.0. Via: SIP/2.0/UDP 10.7.0.190:5060;branch=z9hG4bK5526436. Route: sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:39808. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. To: sip:example_user@212.2.172.228:39808;rinstance=d74acdb581467154;transport=UDP. Call-ID: d9993f61-a1fd4752-73cdc76@10.7.0.190. CSeq: 1 OPTIONS. Content-Length: 0. .
U 2017/11/02 07:50:54.192896 194.213.29.33:5062 -> 212.2.172.228:39808
OPTIONS sip:example_user@212.2.172.228:39808;rinstance=d74acdb581467154;transport=UDP SIP/2.0. Max-Forwards: 10. Record-Route: sip:194.213.29.33:5062;r2=on;lr;ftag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Record-Route: sip:10.7.0.186:5062;r2=on;lr;ftag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Via: SIP/2.0/UDP 194.213.29.33:5062;branch=z9hG4bK13ab.9ccb0733fcecc331893d95f2e09485ee.0. Via: SIP/2.0/UDP 10.7.0.190:5060;rport=5060;branch=z9hG4bK5526436. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. To: sip:example_user@212.2.172.228:39808;rinstance=d74acdb581467154;transport=UDP. Call-ID: d9993f61-a1fd4752-73cdc76@10.7.0.190. CSeq: 1 OPTIONS. Content-Length: 0. .
U 2017/11/02 07:50:54.248234 212.2.172.228:39808 -> 194.213.29.33:5062
SIP/2.0 200 OK. Via: SIP/2.0/UDP 194.213.29.33:5062;branch=z9hG4bK13ab.9ccb0733fcecc331893d95f2e09485ee.0. Via: SIP/2.0/UDP 10.7.0.190:5060;rport=5060;branch=z9hG4bK5526436. Record-Route: sip:194.213.29.33:5062;r2=on;lr;ftag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Record-Route: sip:10.7.0.186:5062;r2=on;lr;ftag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Contact: sip:192.168.1.64:39808. To: sip:example_user@212.2.172.228:39808;rinstance=d74acdb581467154;transport=UDP;tag=726ffa30. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Call-ID: d9993f61-a1fd4752-73cdc76@10.7.0.190. CSeq: 1 OPTIONS. Accept: application/sdp, application/sdp. Accept-Language: en. Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, MESSAGE, OPTIONS, INFO, SUBSCRIBE. Supported: replaces, norefersub, extended-refer, timer, outbound, path, X-cisco-serviceuri. User-Agent: Z 3.15.40006 rv2.8.20. Allow-Events: presence, kpml, talk. Content-Length: 0. .
U 2017/11/02 07:50:54.248922 10.7.0.186:5062 -> 10.7.0.190:5060
SIP/2.0 200 OK. Via: SIP/2.0/UDP 10.7.0.190:5060;rport=5060;branch=z9hG4bK5526436. Record-Route: sip:194.213.29.33:5062;r2=on;lr;ftag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Record-Route: sip:10.7.0.186:5062;r2=on;lr;ftag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Contact: sip:192.168.1.64:39808. To: sip:example_user@212.2.172.228:39808;rinstance=d74acdb581467154;transport=UDP;tag=726ffa30. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-711-3-9968b2da-c2c36781. Call-ID: d9993f61-a1fd4752-73cdc76@10.7.0.190. CSeq: 1 OPTIONS. Accept: application/sdp, application/sdp. Accept-Language: en. Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, MESSAGE, OPTIONS, INFO, SUBSCRIBE. Supported: replaces, norefersub, extended-refer, timer, outbound, path, X-cisco-serviceuri. User-Agent: Z 3.15.40006 rv2.8.20. Allow-Events: presence, kpml, talk. Content-Length: 0. . ```
Here is a ping attempt from a server that was replicated to (we should not be pinging from this registrar):

```
U 2017/11/02 08:37:59.426608 10.6.0.189:5060 -> 10.7.0.186:5062
OPTIONS sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP SIP/2.0. Via: SIP/2.0/UDP 10.6.0.189:5060;branch=z9hG4bK8416926. Route: sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:39808. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-714-17-9968b2da-13812ff4. To: sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP. Call-ID: c0cec5f7-555ac383-68bb313@10.6.0.189. CSeq: 1 OPTIONS. Content-Length: 0. .
U 2017/11/02 08:38:19.431937 10.6.0.189:5060 -> 10.7.0.186:5062
OPTIONS sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP SIP/2.0. Via: SIP/2.0/UDP 10.6.0.189:5060;branch=z9hG4bK8345318. Route: sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:39808. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-714-17-9968b2da-23812ff4. To: sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP. Call-ID: c0cec5f7-655ac383-a9bb313@10.6.0.189. CSeq: 1 OPTIONS. Content-Length: 0. . ```
Here is a ping attempt from the last server replicated to (we should not be pinging from this registrar):

```
U 2017/11/02 08:53:06.927374 10.6.0.191:5060 -> 10.7.0.186:5062
OPTIONS sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP SIP/2.0. Via: SIP/2.0/UDP 10.6.0.191:5060;branch=z9hG4bK8539212. Route: sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:39808. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-714-17-9968b2da-a25dfc84. To: sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP. Call-ID: 89dde755-495a3694-04ae655@10.6.0.191. CSeq: 1 OPTIONS. Content-Length: 0. .
U 2017/11/02 08:53:26.931991 10.6.0.191:5060 -> 10.7.0.186:5062
OPTIONS sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP SIP/2.0. Via: SIP/2.0/UDP 10.6.0.191:5060;branch=z9hG4bK7581592. Route: sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:39808. From: sip:keepalive@example.com;tag=uloc-2-59fa1f9d-714-17-9968b2da-b25dfc84. To: sip:example_user@212.2.172.228:39808;rinstance=ed8aa63e90f53e97;transport=UDP. Call-ID: 89dde755-595a3694-45ae655@10.6.0.191. CSeq: 1 OPTIONS. Content-Length: 0. . ```
### Possible Solutions
Unknown
### Additional Information
* **Kamailio Version** - output of `kamailio -v`

```
version: kamailio 5.0.4 (x86_64/linux)
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled on 10:57:22 Oct 26 2017 with gcc 4.8.5
```
* **Operating System**:

```
CentOS Linux release 7.4.1708 (Core)
Linux localhost 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
```
I haven't developed the dmq_usrloc module, so I am not sure what the developer had in mind (or needed), but the idea of full replication is that each node can function independently of the others, with all features enabled, so if the first node goes down, the clients still get the keepalives. By replicating to two other nodes, you do indeed get the keepalives three times.
Again, I am speaking only from a conceptual point of view, and this is just my opinion.
Maybe others can comment on this from their expectation. I know @charlesrchance does a lot of dmq, so I am mentioning him here, but others are welcome to join the discussion.
Of course, there can be an enhancement to add an option (mod param or similar) to achieve what you want.
Thanks for the info, just a quick question: if the same situation were in place when running three registrars in db-only mode, could we use server_id and filter_server_id so that only one registrar pings the contact? At least, that is how I interpret the module documentation. Is that correct?
I think that if you don't have the server_id and filter_server_id parameters set for nathelper, then all three database-only registrars will also ping the contact? Is that correct? Or does nathelper/registrar use the AOR "socket" parameter to decide whether it needs to send the ping from the local registrar? What I mean is: if the socket is set and is a local socket, send the ping; if no socket is set, or a socket is set but is not local, don't send the keepalive message.
How does nathelper currently decide if it needs to send a keep-alive message from the local registrar?
Currently, I'm lucky that only one keepalive message actually makes it through to the endpoint; however, if Kamailio used the force_socket parameter correctly then all three keepalives would make it through to the endpoint, which, I'm guessing, is not desired behaviour - or am I wrong with that assumption?
I don't think this is a DMQ issue specifically, but more a question of what should happen in *any* replicated situation (including shared DB).
Perhaps the solution is, indeed, for nathelper to compare the received socket on the contact against the local socket(s) and either send or not send the keepalive accordingly. I'm not sure if this is already the case since I haven't checked the code, but I assume not given the observed behaviour.
Alternatively, given that functionality already exists in nathelper for filtering on server_id in DB mode, we should also add the server_id to the contact in memory and include it in the replicated copy.
The question remains, however, in the event of a node going down, which node is going to take over the keepalives and how will it know to do so? Again, this applies to shared DB as much as it does in-memory/DMQ.
Either way, I'm happy to add any necessary mods to dmq/dmq_usrloc modules.
With regard to dmq/dmq_usrloc and in-memory replication, doesn't dmq know when a node is down/removed from the bus? If it does, then maybe a background thread could fire to (s)elect one of the remaining nodes to take over from the node that is down and manipulate/override the AORs of the downed host on the (s)elected registrar. I don't think this would work for shared-db mode, though.
It's a nice idea, but to implement would be quite complex/involved. Firstly, there would likely need to be the concept of a master node along with some mechanism for voting/election. Also, there would need to be changes across all 4 modules (dmq, dmq_usrloc, usrloc and nathelper) to make it work. On top of that, we'd need to consider what happens when a node is restored and takes back its IP.
In any case, to select the correct node to take over keepalives, we'd need to know where the IP had been (re)assigned. In your current setup, how do you decide which host takes over the IP (if at all) in the event of a failure?
Currently our three registrar servers are not configured for fail-over at all. When we have a planned/unplanned outage we rely on the proxies' dispatcher set to route the message to an available registrar. We also shorten the registration interval to account for the fact that NAT pings won't be available for endpoints that don't re-register on an available registrar, although this method isn't completely foolproof.
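For illustration only, the proxy-side dispatching is along these lines; the set id, algorithm and route names here are placeholders rather than the actual config:

```
# Placeholder sketch of proxy-side REGISTER dispatching to the
# registrar set; set id "1" and algorithm "4" (round-robin) are
# assumptions, not the real values.
modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")

route[TO_REGISTRARS] {
    if (!ds_select_dst("1", "4")) {
        send_reply("503", "No registrar available");
        exit;
    }
    t_on_failure("REG_FAILOVER");
    if (!t_relay()) {
        sl_reply_error();
    }
}

failure_route[REG_FAILOVER] {
    if (t_is_canceled()) exit;
    # fail over to the next registrar in the set
    if (ds_next_dst()) {
        t_on_failure("REG_FAILOVER");
        t_relay();
    }
}
```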
Once the "failed" registrar is brought back into service dmq_usrloc replicates state back to it when it comes back and we are in service again.
On the other hand, we do have an HA setup for the proxies involving corosync/pacemaker, where floating addresses are assigned to each Kamailio instance, but we don't run the HA setup for the registrars.
Would it be possible to have dmq_usrloc updated to reflect what is currently done in shared-db mode? Shared-db mode still has some flaws in the scenarios identified in the posts above, but for this particular issue it should, hopefully, resolve the problem at hand, so that only one registrar services the NAT keepalives for a given endpoint.
Sure - are you referring to server_id filtering?
Hi, yes, server_id filtering is what I was referring to.
It looks like this is the current way the shared-db scheme prevents all registrars from pinging an available AOR. I cannot see, from the module docs, any other way to overcome this issue.
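For reference, this is roughly how I read the shared-db arrangement from the docs; the values here are examples only, not settings from our setup above:

```
# Example of the existing shared-DB mechanism (illustrative values):
# each registrar gets a distinct core server_id...
server_id=1    # 2 and 3 on the other registrars

# ...writes through to the same location database...
modparam("usrloc", "db_mode", 3)
modparam("usrloc", "db_url", "mysql://kamailio:secret@dbhost/kamailio")

# ...and nathelper only pings contacts whose stored server_id matches
modparam("nathelper", "filter_server_id", 1)
```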
dmq_usrloc - server_id replication: 684059ca
Thanks for the addition. Do I need to recompile with this patch and test it out, or are there other elements that need to be updated before I can test?
Yes, it is ready to use - just set the core parameter server_id on each server and enable server_id filtering in nathelper and it will work in the same way as with shared DB.
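In other words, something along these lines on each node (only the server_id value differs per registrar; everything else stays as in your existing modparams):

```
# Registrar 1 (use server_id=2 and server_id=3 on the other nodes)
server_id=1

# location data stays in memory and is replicated over DMQ as before
modparam("usrloc", "db_mode", 0)
modparam("dmq_usrloc", "enable", 1)

# only send keepalives for contacts whose replicated server_id
# matches this node's server_id
modparam("nathelper", "filter_server_id", 1)
```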
Thanks again, I will recompile with this patch and update this issue with testing.
Hi,
I'm trying to apply this patch against the 5.0.4 source release and I'm getting the following errors:
`patch -p1 < dmq_usrloc.patch`

```
patching file src/modules/dmq_usrloc/usrloc_sync.c
Hunk #1 FAILED at 323.
Hunk #2 succeeded at 412 (offset 37 lines).
Hunk #3 succeeded at 434 (offset 38 lines).
Hunk #4 FAILED at 649.
Hunk #5 FAILED at 686.
Hunk #6 succeeded at 551 (offset -181 lines).
```

I downloaded the commit with ".patch" appended to the URL and saved it as the dmq_usrloc.patch file mentioned above.
I must be missing something here, I just cannot see what it is. Any thoughts?
The patch is made against the current master (development) branch - there have been some other changes to the same file since the 5.0 branch which will prevent the patch being applied directly to the old version.
It can be backported eventually but in the meantime, are you able to test with the master?
Hi, at the moment we cannot test with the master or 5.1.x branches as they are not "released" yet, which is a requirement here unfortunately.
I see that Daniel mentioned in an email on the mailing lists that 5.1.x should be released soon (in a couple of weeks' time); will this patch make it into that release? If it does, I can test against the 5.1.0 release when it is made available.
Actually, I had some time on the train to modify the patch for 5.0 (attached) - please test first and I will commit later assuming no issues.
[dmq_usrloc_patch_5_0.txt](https://github.com/kamailio/kamailio/files/1451081/dmq_usrloc_patch_5_0.txt)
Thanks Charles, I'll try to build in the morning and get back to you tomorrow with some results.
I've just updated my test registrars to 5.0.4 with the above patch; I've not enabled server_id filtering in nathelper yet. Is there a way I can check whether the server_id is being replicated with the AOR? Will I be able to see it in the usrloc database (kamctl ul show)?
Which DB mode are you using? Can you see the server_id using `kamcmd ul.dump`?
I'm using memory-only mode with dmq/dmq_usrloc replication. I don't see the server_id in the AOR dump using either kamctl or kamcmd.
Did you clear the records after setting server_id? It seems usrloc only stores server_id on initial insert of a contact, not on subsequent updates/refreshes.
No, I did not - let me try that quickly.
Hmm, after running this for a while longer I still see some pings on registrars that were replicated to, and I have the nathelper server_id filtering enabled now too. So there is still something amiss. nathelper parameter:

```
modparam("nathelper", "filter_server_id", 1)
```
Ping being sent from a registrar that was replicated to:

```
U 2017/11/09 10:16:50.143477 10.6.0.189:5060 -> 10.7.0.186:5062
OPTIONS sip:example_user@78.143.152.30:59947 SIP/2.0. Via: SIP/2.0/UDP 10.6.0.189:5060;branch=z9hG4bK1016964. Route: sip:10.7.0.186:5062;lr;received=sip:78.143.152.30:59947. From: sip:keepalive@example.com;tag=uloc-1-5a042966-4d55-1-6821c98a-e1450563. To: sip:example_user@78.143.152.30:59947. Call-ID: ad921322-9fd17d71-2782516@10.6.0.189. CSeq: 1 OPTIONS. Content-Length: 0. . ```
Could this be down to the server_id not showing up in the usrloc dump?
I suspect, given that the server_id is not showing in the usrloc dump, this is exactly why the server_id filtering is not working.
We need to understand first why server_id is not present, before moving on to the keepalives.
I have run a test locally with two registrar nodes (server_ids 1 and 2) and two subscribers, each registering to a different node.
Output of ul.dump on one of the nodes:
```
{
    Domains: {
        Domain: {
            Domain: location
            Size: 1024
            AoRs: {
                Info: {
                    AoR: 123456
                    HashID: 924664970
                    Contacts: {
                        Contact: {
                            Address: sip:123456@10.8.0.151:5066
                            Expires: 33
                            Q: -1.000000
                            Call-ID: 863771482-5066-1@BA.I.A.BFB
                            CSeq: 21177
                            User-Agent: Grandstream GXP2160 1.0.5.33
                            Received: [not set]
                            Path: [not set]
                            State: CS_NEW
                            Flags: 0
                            CFlags: 0
                            Socket: [not set]
                            Methods: 7135
                            Ruid: uloc-1-5a007f47-206b-11
                            Instance: urn:uuid:00000000-0000-1000-8000-000B825C66C2
                            Reg-Id: 4
                            Server-Id: 1
                            Tcpconn-Id: -1
                            Keepalive: 0
                            Last-Keepalive: 1510225981
                            Last-Modified: 1510225981
                        }
                    }
                }
                Info: {
                    AoR: 123457
                    HashID: 924664971
                    Contacts: {
                        Contact: {
                            Address: sip:123457@10.8.0.135:5066
                            Expires: 37
                            Q: -1.000000
                            Call-ID: 1296753078-5066-1@BA.I.A.BDF
                            CSeq: 2000
                            User-Agent: Grandstream GXP2100 1.0.8.4
                            Received: [not set]
                            Path: [not set]
                            State: CS_NEW
                            Flags: 0
                            CFlags: 0
                            Socket: udp:10.28.0.21:5060
                            Methods: 7135
                            Ruid: uloc-2-5a007f22-17c4-1
                            Instance: urn:uuid:00000000-0000-1000-8000-000B823BB51D
                            Reg-Id: 4
                            Server-Id: 2
                            Tcpconn-Id: -1
                            Keepalive: 0
                            Last-Keepalive: 1510225985
                            Last-Modified: 1510225985
                        }
                    }
                }
            }
            Stats: {
                Records: 2
                Max-Slots: 1
            }
        }
    }
}
```
As you can see, server_id is present in both contacts and showing the correct value(s).
Can you paste the output (or a snippet) of the same command on one of your nodes?
Sure, here is an example of an AOR replicated to registrar_1:

```
{
  "jsonrpc": "2.0",
  "result": {
    "AoR": "example_user@example.com",
    "Contacts": [{
      "Contact": {
        "Address": "sip:example_user@212.2.172.228:43356;rinstance=4dc262b9af47682f;transport=UDP",
        "Expires": 98,
        "Q": -1,
        "Call-ID": "4HEiFls2hkky2XR6hL8Nrg..",
        "CSeq": 30,
        "User-Agent": "Z 3.15.40006 rv2.8.20",
        "Received": "sip:212.2.172.228:43356",
        "Path": "sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:43356",
        "State": "CS_NEW",
        "Flags": 2,
        "CFlags": 64,
        "Socket": "[not set]",
        "Methods": -1,
        "Ruid": "uloc-2-5a04296d-4ed0-e1",
        "Instance": "[not set]",
        "Reg-Id": 0,
        "Last-Keepalive": 1510227057,
        "Last-Modified": 1510227057
      }
    }]
  },
  "id": 20184
}
```

This AOR is being serviced by registrar_2, and I can see the server_id being sent in the KDMQ message to registrar_1; server_id is clearly set to 2:

```
KDMQ sip:usrloc@10.6.0.189:5060 SIP/2.0. Via: SIP/2.0/UDP 10.6.0.190;branch=z9hG4bK5568.b7ee8e11000000000000000000000000.0. To: sip:usrloc@10.6.0.189:5060. From: sip:usrloc@10.6.0.190:5060;tag=21afb82da9b0dd18e43e09ed8956ffc8-9f49. CSeq: 10 KDMQ. Call-ID: 1a806e9948d055d0-20402@10.6.0.190. Content-Length: 514. User-Agent: kamailio (Registrar 2). Max-Forwards: 1. Content-Type: application/json. . {"action":1,"aor":"example_user@example.com","ruid":"uloc-2-5a04296d-4ed0-e1","c":"sip:example_user@212.2.172.228:43356;rinstance=4dc262b9af47682f;transport=UDP","received":"sip:212.2.172.228:43356","path":"sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:43356","callid":"4HEiFls2hkky2XR6hL8Nrg..","user_agent":"Z 3.15.40006 rv2.8.20","instance":"","expires":1510227474,"cseq":34,"flags":0,"cflags":64,"q":-1,"last_modified":1510227361,"methods":4294967295,"reg_id":0,"server_id":2}
```
Ok - there were some attributes missing in the usrloc rpc output in 5.0 - updated in master, here: https://github.com/kamailio/kamailio/commit/a57465ff47d46fb5d64c692a72f42767...
I have tested and it is fine to apply this patch to 5.0 - please apply and post again the output afterwards.
OK, I'll rebuild with the patch and come back here with an update.
ok, now I can see the server_id in the usrloc dump.
```
{
  "Info": {
    "AoR": "example_user@example.com",
    "HashID": -1389656423,
    "Contacts": [{
      "Contact": {
        "Address": "sip:example_user@212.2.172.228:43356;rinstance=4dc262b9af47682f;transport=UDP",
        "Expires": 111,
        "Q": -1,
        "Call-ID": "4HEiFls2hkky2XR6hL8Nrg..",
        "CSeq": 86,
        "User-Agent": "Z 3.15.40006 rv2.8.20",
        "Received": "sip:212.2.172.228:43356",
        "Path": "sip:10.7.0.186:5062;lr;received=sip:212.2.172.228:43356",
        "State": "CS_NEW",
        "Flags": 2,
        "CFlags": 64,
        "Socket": "[not set]",
        "Methods": -1,
        "Ruid": "uloc-2-5a04296d-4ed0-e1",
        "Instance": "[not set]",
        "Reg-Id": 0,
        "Server-Id": 2,
        "Tcpconn-Id": -1,
        "Keepalive": 1,
        "Last-Keepalive": 1510231303,
        "Last-Modified": 1510231303
      }
    }]
  }
}
```
Let me run this for a while longer and see if I can spot any OPTIONS messages being sent from the "wrong" registrar(s). After the update and restart, I currently don't see any "incorrect" OPTIONS messages being sent, which indicates, for now, that nathelper's filtering is working as expected.
On a side note, is the AOR server_id attribute exposed to the $ulc pseudo-variable? If I wanted to see which server_id an AOR was loaded on, I could do a reg_fetch_contacts and check the server_id attribute, and if an AOR expired, I should be able to see the server_id in the "usrloc:contact-expired" event route by calling, for example, "aor: $ulc(exp=>aor), server_id: $ulc(exp=>server_id)".
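For clarity, this is the kind of usage I have in mind (purely illustrative, and assuming the attribute ends up exposed under a server_id key):

```
# Illustration only - assumes $ulc exposes a server_id key once patched
route[SHOW_OWNER] {
    if (reg_fetch_contacts("location", "$fu", "caller")) {
        # which registrar "owns" the first contact of this AOR
        xlog("aor: $ulc(caller=>aor), server_id: $(ulc(caller=>server_id)[0])\n");
        reg_free_contacts("caller");
    }
}
```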
Hmm, just seen one happen now.

```
U 2017/11/09 13:25:56.263541 10.6.0.189:5060 -> 10.7.0.186:5062
OPTIONS sip:example_user@78.143.152.30:59947 SIP/2.0. Via: SIP/2.0/UDP 10.6.0.189:5060;branch=z9hG4bK5844925. Route: sip:10.7.0.186:5062;lr;received=sip:78.143.152.30:59947. From: sip:keepalive@example.com;tag=uloc-1-5a042966-4d55-1-6821c98a-af4da324. To: sip:example_user@78.143.152.30:59947. Call-ID: 8ad70ab7-8e92721-ffcfa07@10.6.0.189. CSeq: 1 OPTIONS. Content-Length: 0. . ```
What looks to have happened here is that the server_id is set to 1, but the socket is not set, so nathelper picks up the default interface from the OS and sends the ping out of the wrong interface.
Here's the AOR on registrar_1:

```
{
  "jsonrpc": "2.0",
  "result": {
    "AoR": "example_user@example.com",
    "Contacts": [{
      "Contact": {
        "Address": "sip:example_user@78.143.152.30:59947",
        "Expires": 58,
        "Q": -1,
        "Call-ID": "1af6284f-59888f215eb1ff0e91850080f0808080@KX-HDV430X",
        "CSeq": 1288,
        "User-Agent": "BF/IE/KX-HDV430X/06.001/BCC3422AAF2C",
        "Received": "sip:78.143.152.30:59947",
        "Path": "sip:10.7.0.186:5062;lr;received=sip:78.143.152.30:59947",
        "State": "CS_NEW",
        "Flags": 2,
        "CFlags": 64,
        "Socket": "[not set]",
        "Methods": 8095,
        "Ruid": "uloc-1-5a042966-4d55-1",
        "Instance": "[not set]",
        "Reg-Id": 0,
        "Server-Id": 1,
        "Tcpconn-Id": -1,
        "Keepalive": 1,
        "Last-Keepalive": 1510233939,
        "Last-Modified": 1510233939
      }
    }]
  },
  "id": 20508
}
```
This contact is serviced by registrar_2 and it looks like it has its socket set to its local interface, but the server_id is still that of registrar_1 so registrar_1 is trying to ping it.
Here's the AOR on registrar_2 (this one "saved" the location, so why is its server_id still 1?):

```
{
  "jsonrpc": "2.0",
  "result": {
    "AoR": "example_user@example.com",
    "Contacts": [{
      "Contact": {
        "Address": "sip:example_user@78.143.152.30:59947",
        "Expires": 68,
        "Q": -1,
        "Call-ID": "1af6284f-59888f215eb1ff0e91850080f0808080@KX-HDV430X",
        "CSeq": 1290,
        "User-Agent": "BF/IE/KX-HDV430X/06.001/BCC3422AAF2C",
        "Received": "sip:78.143.152.30:59947",
        "Path": "sip:10.7.0.186:5062;lr;received=sip:78.143.152.30:59947",
        "State": "CS_NEW",
        "Flags": 0,
        "CFlags": 64,
        "Socket": "udp:10.7.0.190:5060",
        "Methods": 8095,
        "Ruid": "uloc-1-5a042966-4d55-1",
        "Instance": "[not set]",
        "Reg-Id": 0,
        "Server-Id": 1,
        "Tcpconn-Id": -1,
        "Keepalive": 1,
        "Last-Keepalive": 1510234052,
        "Last-Modified": 1510234052
      }
    }]
  },
  "id": 20758
}
```
This is because usrloc only stores the server_id on initial insert of a contact, not on subsequent updates/refreshes.
The following patch should fix it:
[usrloc_5_0_patch.txt](https://github.com/kamailio/kamailio/files/1457907/usrloc_5_0_patch.txt)
Charles thanks, I'll apply the patch and come back to you.
I just want to confirm that those three patches applied to the v5.0.4 release seem to have done the trick. I don't see any rogue keepalive messages being sent now. Thank you for looking into this issue, Charles - much appreciated.
On the question of exposing the server_id attribute to the $ulc pseudo-variable, can we do this as part of this ticket, or shall I create a new issue for that?
Thanks for testing and confirming. Two out of the three patches are in the 5.0 branch now. I expect the third to be in today or early next week.
As for the $ulc pv, I’ll take a look later today and report back here.
Please try with the following:
[registrar_5_0_patch.txt](https://github.com/kamailio/kamailio/files/1461467/registrar_5_0_patch.txt)
I have the $ulc patch active now and am testing it with the following routing block:
```
event_route[usrloc:contact-expired] {
    xlog("expired contact for $ulc(exp=>aor) server_id: $ulc(exp=>server_id)\n");
}
```
I'll leave it running for some time and report back here.
I can confirm that I am able to access the server_id attribute in the ulc pvar now. Thanks for sorting that out too :)
No problem, thanks for confirming. Will close this now - feel free to reopen if necessary.