Hi.
Recently I ran into a TCP connection problem. The topology is as follows:
DNS SRV load balancer - two Kamailio proxy servers - one routing server.
The client looks up a NAPTR record such as sip.domain.com, so DNS returns one of the proxy servers to the client (depending on weight/priority). At the moment both Kamailio servers have the same priority and weight (the goal is load balancing).
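For readers unfamiliar with how priority and weight drive the choice of proxy, here is a minimal sketch of SRV target selection in the spirit of RFC 2782. It is a simplification, not what any particular resolver or SIP stack actually runs; the function name and the kamailio1/kamailio2 host names are made up for illustration.

```python
import random

def pick_srv_target(records, rng=random):
    """Pick one target from (priority, weight, target) tuples.

    Simplified RFC 2782-style selection: only records with the lowest
    priority value are considered, and among those the choice is random
    in proportion to weight.
    """
    lowest = min(priority for priority, _, _ in records)
    candidates = [(w, t) for p, w, t in records if p == lowest]
    total = sum(w for w, _ in candidates)
    if total == 0:
        # All weights are zero: treat the candidates as equally likely.
        return rng.choice(candidates)[1]
    threshold = rng.random() * total   # uniform in [0, total)
    running = 0
    for weight, target in candidates:
        running += weight
        if running >= threshold:
            return target
    return candidates[-1][1]

# Hypothetical records: same priority, same weight, as in the setup above.
records = [
    (10, 50, "kamailio1.domain.com"),
    (10, 50, "kamailio2.domain.com"),
]
print(pick_srv_target(records))
```

With equal priority and weight, each lookup is effectively a coin flip, which is why an in-dialog request can land on a different proxy than the one that set up the dialog.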
The routing server (currently Asterisk) runs chan_pjsip.so, which supports NAPTR/SRV records. It can resolve Record-Route / Route headers whose value is sip.domain.com (the proxy servers add this value to the Record-Route headers when relaying requests to it). This topology is meant to keep existing dialogs working even if the proxy that last processed them is dead.
But the problem appears when the routing server (Asterisk) sends in-dialog requests to the proxy that was not used to establish the dialog. For example, the routing server receives a 200 OK from the endpoint (relayed to it by kamailio1) and sends back an ACK, not to kamailio1 but to kamailio2 (because it resolves the NAPTR record sip.domain.com and gets the IP of the second Kamailio). Kamailio2 processes the request as usual, because both Kamailio servers share the same DB for the dialog module, but when it tries to relay the request to the endpoint, it gets the error: ERROR: <core> [tcp_main.c:4070]: handle_tcpconn_ev(): connect XXX.XXX.XXX.XXX:52185 failed
The port that kamailio2 tries to connect to when relaying the ACK is the port the endpoint used to establish the dialog with kamailio1, and the endpoint's TCP connection is in fact established with kamailio1. So kamailio2 tries to use the same port and gets the error.
And I think this is proper behavior.
There is no problem with UDP transport.
Has anyone seen a similar problem? Strictly speaking, it is not a problem but proper behavior.
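The failure described in this message can be reproduced on loopback with plain sockets. This is only a sketch of the mechanism, not Kamailio code: "proxy1", "proxy2" and "endpoint" are stand-in names, and there is no NAT involved. The endpoint's source port belongs to its connection with proxy1 and nothing listens on it, so a fresh connect from a second host is expected to be refused (or to time out behind NAT or a firewall).

```python
import socket

def tcp_ephemeral_port_demo():
    """Return (endpoint_port, error) after a second 'proxy' tries to
    connect to the source port of an existing client connection."""
    # Stand-in for kamailio1: a listening TCP socket on an OS-chosen port.
    proxy1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    proxy1.bind(("127.0.0.1", 0))
    proxy1.listen(1)

    # Stand-in for the endpoint: connects out from an ephemeral source port.
    endpoint = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    endpoint.connect(proxy1.getsockname())
    conn, peer = proxy1.accept()  # peer is the endpoint's (ip, source port)

    # Stand-in for kamailio2: it knows the endpoint's address from shared
    # state but has no connection of its own, so it must open a new one.
    proxy2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    proxy2.bind(("127.0.0.1", 0))  # explicit bind: source port differs from target
    proxy2.settimeout(2)
    error = None
    try:
        proxy2.connect(peer)       # nothing is listening on that port
    except OSError as exc:
        error = exc
    finally:
        for s in (proxy2, conn, endpoint, proxy1):
            s.close()
    return peer[1], error

port, err = tcp_ephemeral_port_demo()
print(f"connect to endpoint port {port} failed: {err!r}")
```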
On Thu, Sep 07, 2017 at 11:03:49AM +0300, Donat Zenichev wrote: [snip]
> ERROR: <core> [tcp_main.c:4070]: handle_tcpconn_ev(): connect XXX.XXX.XXX.XXX:52185 failed
> The port that kamailio2 tries to use to relay the ACK, is port that endpoint used to establish the dialog with kamailio1 and actually his TCP connection is now established with kamailio1. So kamailio2 tries to use the same port and gets the error.
> And this is proper behavior I think.
> There is no problem with UDP transport.
This problem also exists with UDP when NAT is involved. I don't think there is anything you could do to solve this problem with TCP/TLS connections, especially with NAT.
Having a similar setup with failover for the loadbalancers, I take for granted that TCP/TLS will fail in case of a failover (but UDP will keep working after failover due to the stateless nature of it). Luckily kamailio is rock solid and the only reason the TCP sockets fail is a restart of kamailio on config change.
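To illustrate why UDP survives where TCP does not, here is a loopback sketch with plain sockets. It assumes no NAT between the proxies and the endpoint (with NAT, as noted above, the problem exists for UDP too), and the names "proxy1", "proxy2" and "endpoint" are stand-ins, not real hosts. An unconnected UDP socket accepts datagrams from any sender, so a second proxy that has never talked to the endpoint can still reach it.

```python
import socket

def udp_any_sender_demo():
    """Return (payload, came_from_proxy2) for a datagram sent to the
    endpoint by a proxy it has never contacted."""
    def udp_socket():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 0))
        s.settimeout(2)
        return s

    endpoint, proxy1, proxy2 = udp_socket(), udp_socket(), udp_socket()
    proxy2_addr = proxy2.getsockname()
    try:
        # The endpoint talks to proxy1 only; proxy1 learns its address.
        endpoint.sendto(b"REGISTER", proxy1.getsockname())
        _, endpoint_addr = proxy1.recvfrom(1024)

        # proxy2 reuses that address (think: shared DB) and just sends.
        proxy2.sendto(b"ACK", endpoint_addr)
        payload, sender = endpoint.recvfrom(1024)
    finally:
        for s in (endpoint, proxy1, proxy2):
            s.close()
    return payload, sender == proxy2_addr

print(udp_any_sender_demo())
```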
> Having a similar setup with failover for the loadbalancers, I take for granted that TCP/TLS will fail in case of a failover (but UDP will keep working after failover due to the stateless nature of it).
Well, do your routing servers use PJSIP for NAPTR resolution? If so, how did you get it working?
I mean, did you find a solution for TCP connections?
--
BR, Donat Zenichev Wnet VoIP team Tel: +380(44) 5-900-808 http://wnet.ua
On Thu, Sep 07, 2017 at 01:05:02PM +0300, Donat Zenichev wrote:
> Having a similar setup with failover for the loadbalancers, I take for granted that TCP/TLS will fail in case of a failover (but UDP will keep working after failover due to the stateless nature of it).
> Well, your routing servers use PJSIP for NAPTR resolving? If so, how have you made it working?
I use an all-Kamailio solution, routing purely on dialog headers (new dialogs to UACs are routed on the Path header added during REGISTER).
> I mean, did you find a solution for TCP connections?
But the issue with TCP is the same. On failover/restart (or completely down, in your case), the TCP session is lost and is not recoverable, assuming NAT. AFAIK there is no solution.
> But the issue with TCP is the same. On failover/restart (or completely down, in your case), the TCP session is lost and is not recoverable, assuming NAT. AFAIK there is no solution.
It's a pity. My common sense tells me there is no clean way to solve it without ugly "crutches". But I still hope there is a solution; I don't want to give up on my idea.
Guys, the question is still open. If anyone can suggest a solution, I will be glad to read it.