Hello,
I'm setting up a Kamailio instance behind a TCP load balancer (with proxy
protocol and NAT routing: meaning Kam stays in the flow all the time).
I've managed to get working almost everything we need for our service,
except for one thing, and that is for Kam to use existing connections for
subsequent transactions:
Following this example:
[image: image.png]
EXT & INT represent the external and internal interface of a LB between the
UAC and Kamailio, using TLS on both legs and proxy protocol.
Transaction 1: INVITE, 100, 180, 183, 200 OK
UAC 1.1.1.1:1111 -> 2.2.2.2:443 (EXT) 3.3.3.3:3333 (INT) -> 7.7.7.7:5060
(Kamailio)
Transaction 2: ACK
UAC 1.1.1.1:1112 -> 2.2.2.2:443 (EXT) 4.4.4.4:4444 (INT) -> 7.7.7.7:5060
(Kamailio)
Transaction 3: BYE
Kam 7.7.7.7:5060 -> 3.3.3.3:3333 (INT) 2.2.2.2:443 (EXT) -> 1.1.1.1:1111 UAC
My problem is with Transaction 3. In this case the BYE is originated by the
callee, and Kam has to send it to the caller. As the TCP load balancer is
between Kam and the UAC, Kam has to send it to the LB so then the LB can
forward it back to the UAC. This works well for msgs that belong to the
same transaction (INVITE, 100, 180, 183, 200 OK) but it fails when they
don't belong to the same transaction.
Thanks to the newly added $tcp(c_si) and $tcp(c_sp) pseudovars, I can save
the internal IP:Port of the LB, so I can send stuff later to it, my problem
is that Kam doesn't seem to allow this?
On the original INVITE, I use the following to save where I have to reach
the UAC:
add_contact_alias("$tcp(c_si)", "$tcp(c_sp)", "tls");
Then, handle_ruri_alias() will take care of setting $du to the correct
(internal LB) IP:Port so I can reach the UAC, this works.
My problem is that Kamailio doesn't identify that there is a valid existing
TLS connection still up (from the INVITE), and tries to create a new one
(and this obviously doesn't gives all sorts of problems).
So when I run handle_ruri_alias(), and $du is set to 3.3.3.3:3333 (from the
example above), instead of using the existing connection, Kamailio tries to
create a new one.
I have a log statement right before with the result of tcp_conid_state(1)
(the connid is 1 for this connection) and the $rc is 1 (Connection is OK),
but when I tell Kamailio it has to use it I get this in the logs:
DEBUG: {1 11726467 BYE gqR1qqNK8B} <core> [core\/tcp_main.c:2060]:
tcp_send(): no open tcp connection found, opening new one"}
And then the problems begin...
I have tried playing around with:
tcp_reuse_port
tcp_connection_match
But no luck..!
I also thought it could be a problem of the connection being created on one
worker, and a different worker handling BYE transaction, so tested with
children=1 and tcp_children=1, but still same problem.
A more detailed log:
In blue my log statement checking for the status of conid "1", in red Kam
not being able to find it, although it exists (as validated
by tcp_conid_state(), and even in netstat I can see the connection
established). In this log, 35.191.0.66:60271 would be the equivalent of
3.3.3.3:3333 and 104.175.176.242:28157 would be 1.1.1.1:1111 from the
example above.
...
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} <core>
[core\/tcp_main.c:1657]: _tcpconn_find(): found connection by id: 1"}
"message":" NOTICE: {1 11727734 BYE 5-LX4GdI9X} <script>: JOEL TEST
New
request - M=BYE TCP STATUS:1 ID=5-LX4GdI9X"}
...
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} nathelper
[nathelper.c:1144]: handle_ruri_alias(): setting dst_uri to
<sip:35.191.0.66:60271;transport=tls>"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} nathelper
[nathelper.c:1166]: handle_ruri_alias(): rewriting r-uri to
<sip:linphone@104.175.176.242:28157;transport=tls>"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} tm [t_lookup.c:1328]:
t_newtran(): msg (0x7f3c884259d0) id=534\/18664 global id=532\/18664 T
start=0xffffffffffffffff"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} tm [t_lookup.c:497]:
t_lookup_request(): start searching: hash=63496, isACK=0"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} tm [t_lookup.c:455]:
matching_3261(): RFC3261 transaction matching failed - via branch
[z9hG4bK808f.eee2444f92a02cb33e1b7a21f20bc6bb.0]"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} tm [t_lookup.c:675]:
t_lookup_request(): no transaction found"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} tm [t_hooks.c:336]:
run_reqin_callbacks_internal(): trans=0x7f3c83b8c598, callback type 1, id 0
entered"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} <core>
[core\/crypto\/md5utils.c:67]: MD5StringArray(): MD5 calculated:
3071029feb05962b26b53a9664a14210"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} siputils [checks.c:123]:
has_totag(): totag found"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} rr [loose.c:1095]:
check_route_param(): route params checking against
[;r2=on;lr=on;ftag=Eb~TbdfTA;did=cab.01e2;nat=yes] (orig:
[r2=on;lr=on;ftag=Eb~TbdfTA;did=cab.01e2;nat=yes])"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} rr [loose.c:1101]:
check_route_param(): params are
<;r2=on;lr=on;ftag=Eb~TbdfTA;did=cab.01e2;nat=yes>"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} siputils [checks.c:123]:
has_totag(): totag found"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} <core>
[core\/msg_translator.c:161]: check_via_address(): (198.1.54.228,
198.1.54.228, 0)"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} <core>
[core\/tcp_main.c:2060]: tcp_send(): no open tcp connection found, opening
new one"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} <core>
[core\/ip_addr.c:229]: print_ip(): tcpconn_new: new tcp connection:
35.191.0.66"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} <core>
[core\/tcp_main.c:1242]: tcpconn_new(): on port 60271, type 3"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} <core>
[core\/tcp_main.c:1561]: tcpconn_add(): hashes: 1446:2350:0, 5"}
"message":" DEBUG: {1 11727734 BYE 5-LX4GdI9X} tls [tls_server.c:199]:
tls_complete_init(): completing tls connection initialization"}
...
So time to seek help from the community, any suggestions/ideas/comments?
Sorry if all this sounds confusing, I've tried my best to put in text the
whole scenario in and "understandable" way...
Is this even doable?
Thanks,
Joel.