### Description
When using topos, Kamailio corrupts the Via header in the 200 OK it sends in response to an INVITE.
For example, the following is observed in the 200 OK:

```
Via: SIP/2.0/UDP 1.2.3.4:5065;rport=5065;branch=abc,SIP/2.0/UDP 2.3.4.5:5060;branch=bcd
```

The expected behaviour is to insert two separate Via headers, as in the original INVITE:

```
Via: SIP/2.0/UDP 1.2.3.4:5065;rport=5065;branch=abc
Via: SIP/2.0/UDP 2.3.4.5:5060;branch=bcd
```
### Troubleshooting
When topos.so is not loaded, the 200 OK is passed through as expected.
#### Reproduction
With topos.so loaded, place a call in a dialog where multiple Via headers are present. It is reproducible every time.
### Possible Solutions
topos.so should reconstruct the Via headers properly by inserting a separate Via header for each comma-separated value it stores in the database.
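The proposed fix boils down to splitting the stored comma-separated value back into one Via header per entry. A minimal Python sketch of that splitting step, assuming the stored value uses plain commas as separators (the function names are illustrative, not part of topos). Commas inside double-quoted strings are skipped; backslash-escaped quotes are not handled.

```python
def split_header_values(value: str) -> list[str]:
    """Split a combined SIP header value on commas that are outside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def expand_via(header_line: str) -> list[str]:
    """Turn 'Via: a,b' into ['Via: a', 'Via: b']."""
    name, _, value = header_line.partition(":")  # split on the first colon only
    return [f"{name.strip()}: {v}" for v in split_header_values(value)]


combined = ("Via: SIP/2.0/UDP 1.2.3.4:5065;rport=5065;branch=abc,"
            "SIP/2.0/UDP 2.3.4.5:5060;branch=bcd")
for line in expand_via(combined):
    print(line)
```

Run on the header from the description, this prints the two separate Via lines shown as the expected behaviour.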
### Additional Information
```
version: kamailio 4.4.5 (x86_64/linux) f98162
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: f98162
compiled on 16:48:08 Jan 18 2017 with gcc 4.4.7
```
SIP RFC 3261 allows multiple values of the same header field to be combined into one header, separated by commas; see:
* https://tools.ietf.org/html/rfc3261#section-7.3

So this is not a bug from the spec's point of view. Are you encountering problems with any SIP device/app? If yes, which ones?
Yes, devices behind NAT are not having replies processed correctly. Kamailio forgets they're behind NAT. Without topos loaded, it's able to remember that and handles the reply correctly.
I've got several IP Phones and Asterisk behind NAT that exhibit the problem. On the other side is a Metaswitch.
This was the first major difference I spotted, but I've since uncovered further unexpected behaviour with INVITEs. It appears the initial INVITE is forwarded with the modified headers, and then a second INVITE is forwarded that is missing the Contact header but is otherwise identical.
I don't think this is a problem of having the two Via bodies in a single header.
Can you attach a pcap with a call that works (without topos) and another one that fails (with topos)?
I cannot provide a pcap as there's too much private information, but I have prepared a graphical representation of the flow in both cases and a sanitized text dump of the SIP dialog in both cases. I hope this sufficiently illustrates the problem.
So there are two things:

* a) a duplicated INVITE (the second one is missing the Contact: header)
* b) the ACK and BYE are not relayed to the correct IP address
As mentioned I can reproduce this with certainty any time. Endpoints that are not behind NAT don't seem affected.
data:image/s3,"s3://crabby-images/05a49/05a4942aadcac43da367afcbf310443a6b310f83" alt="with topos"
data:image/s3,"s3://crabby-images/b2758/b2758c8ebb0e16c2674f4b9fd2e9ccd0a7fcc005" alt="without topos"
[badcall-sanitized.txt](https://github.com/kamailio/kamailio/files/791821/badcall-sanitized.txt) [goodcall-sanitized.txt](https://github.com/kamailio/kamailio/files/791823/goodcall-sanitized.txt)
Without seeing the sip traffic, it is hard to figure out what is going wrong. Instead of the pcap file, you can grab the sip traffic with ngrep, like:
```
ngrep -d any -qt -W byline port 5060
```
Then replace the IP addresses and other sensitive tokens with generic values, so you can share it here.
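Sanitizing by hand is error-prone on long traces. A small Python filter, offered as a sketch, can replace every IPv4 address with a placeholder from the 192.0.2.0/24 documentation range (RFC 5737). Each distinct address always maps to the same placeholder, so the call flow stays readable. The script and its file name are hypothetical. It assumes the trace does not already contain 192.0.2.x addresses and has fewer than 255 distinct IPs. Version strings shaped like dotted quads would also be rewritten, and user parts and tags are left alone.

```python
import re
import sys

# Dotted-quad IPv4 addresses; ports after ':' are left untouched.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize(text: str) -> str:
    """Replace each distinct IPv4 address with a stable 192.0.2.N placeholder."""
    mapping = {}

    def repl(m: re.Match) -> str:
        ip = m.group(0)
        if ip not in mapping:
            mapping[ip] = f"192.0.2.{len(mapping) + 1}"
        return mapping[ip]

    return IPV4.sub(repl, text)


if __name__ == "__main__":
    sys.stdout.write(sanitize(sys.stdin.read()))
```

Saved as, say, `sanitize_ips.py`, it can be run on a saved ngrep capture with `python3 sanitize_ips.py < trace.txt > trace-sanitized.txt`.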
I did upload the entire SIP dialog for both examples. The links are below the images.
I missed those attachments. Anyhow, next time use ngrep, because the output from Wireshark adds too much noise. You can use ngrep to dump the text format from a pcap (which can be exported from Wireshark), like:

```
ngrep -qt -W byline -I file.pcap
```
The issue is not with Via headers at all. The replies are routed properly; there is an ACK following the 200 OK.
The problem is routing the ACK. Are you using set_contact_alias() or add_contact_alias()? You should use the first one. If you already do, then there might be some logic error in kamailio.cfg.
I'll provide cleaner SIP traces in future. Meanwhile if you can bear with the existing examples I'll use the Frame numbers to point a couple of things out.
The only difference in the config file between the working and non-working setups is commenting out these two lines:

```
loadmodule "topos.so"
modparam("topos", "db_url", DBURL)
```

As for the NAT handling, I've tested with a minimally modified default 4.4.5 kamailio.cfg, which does use set_contact_alias().
In badcall-sanitized.txt, Frame 12 is expected, however Frame 13 is a duplicate that's missing the Contact: header. That seems like a bug. This doesn't happen in goodcall-sanitized.txt.
Without knowing how modules interact with the core of Kamailio, I'm wondering if topos.so and nathelper.so might be working on the packet in the wrong order causing the NAT processes to see the encoded headers rather than decoded and thus not act as it should.
Like I mentioned, everything goes to the right place without topos loaded.
Again, it doesn't have to do anything with Via, but with the Contact header.
Can you paste here the records from database tables topos_t and topos_d for such a call?
I'm working on capturing the SIP dialog and the database queries and will try to report back later today.
Meanwhile, I have observed the following which seems to intermittently happen when hanging up on an established call:
```
-> Request: BYE sip:btpsh-58af2eee-8ef-1@1.2.3.180
<- Request: BYE sip:1.2.3.177:5065;lr=on
```
When this happens, the far end replies 404 and keeps the session open until its session timer fires and the reINVITE fails.
When it works, I see this:
```
-> BYE sip:atpsh-58af2eee-8f6-2@1.2.3.180
<- BYE sip:250xxxxxxx@4.5.6.224:5060
```
These two calls were placed back to back within a few minutes of each other, NO changes to configuration (topos loaded for both).
I can provide the full SIP dialog for these examples if you'd like but I don't have the database entries for them.
Provide the ngrep traces for the two calls. It seems that the Contact of one call is messed up, as if by an intermediary proxy acting as a strict router.
[bye bad.txt](https://github.com/kamailio/kamailio/files/800563/bye.bad.txt) [bye good.txt](https://github.com/kamailio/kamailio/files/800562/bye.good.txt)
Steps to reproduce:
1. Start with the kamailio.cfg that ships in kamailio-4.4.5_src.tar.gz.
2. In my environment, I used MySQL and configured DBURL accordingly.
3. I set up alias= and listen= lines as applicable, and configured rtpproxy.
4. I added these defines:
   ```
   #!define WITH_MYSQL
   #!define WITH_AUTH
   #!define WITH_USRLOCDB
   #!define WITH_BLOCK3XX
   #!define WITH_NAT
   ```
5. Create a user in your auth database so that you can register a SIP device of some kind.
6. Start Kamailio.
7. Configure a SIP device that's behind NAT to register to Kamailio.
8. Place a call to itself.
**Right now, everything works fine. A call can be set up and works normally.**
9. Add these lines to kamailio.cfg:
   ```
   loadmodule "topos.so"
   modparam("topos", "db_url", DBURL)
   ```
10. Restart Kamailio.
11. Place the same test call again.
Now the call cannot be set up, because Kamailio does not send the ACK to the right place.
Comment out those two lines and it works again.
Repeatable 100%.
The bye_good.txt in the comment:
* https://github.com/kamailio/kamailio/issues/1005#issuecomment-282418620
is also with a config using topos, right? Everything looks OK there. You mention that things break when the re-INVITE appears. In the next comment you say that the issue happens every time. Is it a different scenario?
There are a few scenarios where it breaks. bye_good.txt is **without** topos loaded, with everything else identical.
In bye_good.txt, the forwarded invite is:
```
U 2017/02/23 13:14:01.100741 10.10.0.180:5060 -> 10.10.0.177:5065
INVITE sip:2500991234@10.10.0.177:5065 SIP/2.0.
Via: SIP/2.0/UDP 10.10.0.180;branch=z9hG4bK52c8.d018d7a9db4e110618f9ef76a92218bc.0.
Max-Forwards: 69.
To: sip:2500991234@mtl.dryvoip.ca;transport=UDP.
From: "Trev"sip:2500991234@mtl.dryvoip.ca;tag=b870f52f.
Call-ID: xd68Ayt-2PjL7Bc5A2K-oA...
CSeq: 2 INVITE.
Content-Type: application/sdp.
User-Agent: Z 3.15.40006 rv2.8.20.
Allow-Events: presence, kpml, talk.
Content-Length: 241.
X-Custom-Header: 1.
X-Route-Via: upstream.
Contact: sip:btpsh-58af2eee-8f6-1@10.10.0.180.
...
```
The contact is from topos. Is the trace mixed?
10.10.0.180 is the machine attempting to run topos. 10.10.0.177 is the PSTN gateway (another Kamailio instance outside the scope of this, not running topos)
The captures were taken from 10.10.0.180's network interface so it captured everything at the network layer between the UA, itself, and the upstream proxy.
Sorry, I've done so much testing I've lost track of what I had posted here.
You were correct earlier. The posted captures are both with topos loaded. The difference is call length. The bad one shows the reINVITE getting messed up. If the call is ended quickly, the dialog is normal.
There are a few primary ways I've been bumping into this:

* On long calls, things go haywire once the session timer fires and the INVITE is sent.
* On calls longer than a few minutes, the BYE goes haywire.
* On endpoints behind NAT, the ACK and BYE packets towards the UA aren't sent to the observed public IP; they're sent to the LAN IP (without topos, they go to the right place).
The final example can be observed with my post about Steps to Reproduce. The prior examples are in the various dialogs & sip traces provided.
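The NAT symptom above (ACK and BYE going to the LAN IP) is about which address a request to the UA's Contact is sent to. The Python sketch below only illustrates that choice; it does not claim to describe what topos does internally. It is loosely modeled on the `alias` Contact parameter written by nathelper's set_contact_alias() (value in `ip~port~proto` form, recording the source address the proxy actually received the message from). The numeric protocol codes and all addresses are assumptions for illustration. When the alias survives, the request goes to the public address. Without it, the only address left is the one the UA wrote into its Contact, i.e. its LAN IP.

```python
# Assumed numeric transport codes in the alias value (illustrative only).
PROTO_NAMES = {"1": "udp", "2": "tcp", "3": "tls"}


def uri_params(uri: str) -> dict[str, str]:
    """Return URI parameters, e.g. 'sip:a@h;alias=x;transport=UDP' -> {'alias': 'x', 'transport': 'UDP'}."""
    params = {}
    for item in uri.split(";")[1:]:
        key, _, val = item.partition("=")
        params[key.strip().lower()] = val.strip()
    return params


def next_hop(contact_uri: str) -> tuple[str, int, str]:
    """Pick (ip, port, transport) for a request sent to contact_uri.

    Prefer the address recorded in an alias parameter; otherwise fall back to
    the host:port in the URI itself, which for a NATed UA is its LAN address.
    Bare URIs only: no angle brackets, no IPv6 literals.
    """
    params = uri_params(contact_uri)
    alias = params.get("alias")
    if alias:
        ip, port, proto = alias.split("~")
        return ip, int(port), PROTO_NAMES.get(proto, proto)
    base = contact_uri.split(";")[0]   # drop URI parameters
    rest = base.split(":", 1)[1]       # drop the "sip:" scheme
    hostport = rest.split("@")[-1]     # drop the user part, if any
    host, _, port = hostport.partition(":")
    transport = params.get("transport", "udp").lower()
    return host, int(port) if port else 5060, transport


print(next_hop("sip:alice@192.168.1.20:5060;alias=203.0.113.7~40123~1"))
print(next_hop("sip:alice@192.168.1.20:5060"))
```

The first call picks the public address recorded in the alias; the second, with no alias, falls back to the LAN address.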
Can you try with master branch or with the patch from the commit referenced above? It should fix the issue with re-INVITE.
For the scenario with the NAT, if still a problem, I would need a sip trace to investigate.
Thanks. I applied the patch to 4.4.5 and re-tested.
It didn't fix the NAT problem or the BYE problem when there's no NAT, and to be honest I'm too shot right now to be able to go through this to see if the reINVITE is acting properly, but here are the new traces.
[mar1-nat.txt](https://github.com/kamailio/kamailio/files/812406/mar1-nat.txt) [mar1-reinvite.txt](https://github.com/kamailio/kamailio/files/812408/mar1-reinvite.txt)
As an added data point, I updated to 5.0.0 and grabbed the latest topos from GitHub, and there was no improvement.
I am happy to put up a small bounty if you or anyone else is able to dedicate some time to resolve these two use cases in a timely manner.
Further investigation shows that the database loses b_contact when the called side sends the session timer reINVITE.
At beginning of the call:
```
*************************** 1. row ***************************
         id: 762
    rectime: 2017-03-08 09:29:13
   s_method: INVITE
     s_cseq: 2
   a_callid: DaDWA2PfEHrzY2yHo-7UmQ..
     a_uuid: atpsh-58bf061d-69c4-4
     b_uuid: btpsh-58bf061d-69c4-4
  a_contact: sip:xxxxxxxxx@10.50.50.50:32709;transport=UDP
  b_contact: sip:2500991234@10.20.0.224:5060
 as_contact: sip:atpsh-58bf061d-69c4-4@10.10.0.180
 bs_contact: sip:btpsh-58bf061d-69c4-4@10.10.0.180
      a_tag: 976c020a
      b_tag: as22aeaf5f
       a_rr:
       b_rr: sip:10.10.0.177:5065;lr=on
       s_rr: sip:10.10.0.180;lr;ftag=976c020a;dv=aae.5ab1
     iflags: 2
      a_uri:
      b_uri:
      r_uri:
  a_srcaddr:
  b_srcaddr:
   a_socket:
   b_socket:
```
If the call is ended at this point, the BYE is sent correctly.
However, when the called side's session timer expires and it sends a reINVITE, the database changes to this:
```
*************************** 1. row ***************************
         id: 762
    rectime: 2017-03-08 09:29:13
   s_method: INVITE
     s_cseq: 2
   a_callid: DaDWA2PfEHrzY2yHo-7UmQ..
     a_uuid: atpsh-58bf061d-69c4-4
     b_uuid: btpsh-58bf061d-69c4-4
  a_contact: sip:xxxxxxxxx@10.50.50.50:32709;transport=UDP
  b_contact:              <--- this is now blank
 as_contact: sip:atpsh-58bf061d-69c4-4@10.10.0.180
 bs_contact: sip:btpsh-58bf061d-69c4-4@10.10.0.180
      a_tag: 976c020a
      b_tag: as22aeaf5f
       a_rr:
       b_rr: sip:10.10.0.177:5065;lr=on
       s_rr: sip:10.10.0.180;lr;ftag=976c020a;dv=aae.5ab1
     iflags: 2
      a_uri:
      b_uri:
      r_uri:
  a_srcaddr:
  b_srcaddr:
   a_socket:
   b_socket:
```
If the call is ended by the caller after this, it sends `BYE sip:10.10.0.177:5065;lr=on SIP/2.0` instead of `BYE sip:2500991234@10.20.0.224:5060 SIP/2.0` as expected.
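Why a blank b_contact produces exactly that BYE can be seen from how an in-dialog Request-URI is built. Per RFC 3261 section 12.2.1.1, if the first route URI carries `lr`, the Request-URI is the remote target (here the stored contact); only a strict (non-`lr`) first route becomes the Request-URI itself. The Python sketch below encodes that rule. The final fallback, using the route URI when the remote target is empty, is an assumption chosen to mimic the observed symptom, not code taken from topos. The function names are hypothetical; the values are copied from the two database rows above.

```python
def uri_param_names(uri: str) -> set[str]:
    """Names of the URI parameters, e.g. 'sip:h;lr=on' -> {'lr'}."""
    return {p.split("=", 1)[0].strip().lower() for p in uri.split(";")[1:]}


def build_request_uri(remote_target: str, route_set: list[str]) -> str:
    """Choose the Request-URI for an in-dialog request (RFC 3261, 12.2.1.1).

    The last fallback is a hypothetical model of the symptom, not topos code.
    """
    if route_set and "lr" not in uri_param_names(route_set[0]):
        # Strict routing: the first route URI becomes the Request-URI.
        return route_set[0]
    if remote_target:
        # Loose routing: the Request-URI is the remote target (stored contact).
        return remote_target
    # Remote target lost: the route URI is all that is left.
    return route_set[0] if route_set else ""


before = {"b_contact": "sip:2500991234@10.20.0.224:5060", "b_rr": "sip:10.10.0.177:5065;lr=on"}
after = dict(before, b_contact="")  # state after the session-timer reINVITE

for row in (before, after):
    print(build_request_uri(row["b_contact"], [row["b_rr"]]))
```

With the first row it yields the expected `sip:2500991234@10.20.0.224:5060`; with the blanked row it yields `sip:10.10.0.177:5065;lr=on`, the Request-URI seen in the failing BYE.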
Thanks for troubleshooting further. The new details gave me a direction to investigate, and I think I found where the issue resides. I will try to come up with a patch for it.
I will change things in the module as per your requirements.
Can you try with latest master or applying also the patch referenced above?
Looks good so far, thanks!
Doing some more in-depth testing to make sure there aren't any negative side effects.
I haven't come across any regressions and all tests are now working as expected.
Closed #1005.
OK, thanks for testing. Out of curiosity, have you stress-tested for performance, or just the typical call flows?
Just tested with a few concurrent calls with our setup to make sure various scenarios were handled correctly (i.e., call hold, session timers, hang-ups, call refusals). Didn't do any stress tests or simulate things like database failures, etc.