Hello,
IPv6 routers never fragment packets. Rather, they drop a packet that is too large for a (local) MTU and send back ICMPv6 "Packet Too Big". This seems to cause loss of larger SIP messages when an ISP tunnels their IPv6 at the expense of the MTU.
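For illustration, here is a minimal C sketch of what such a Packet Too Big message carries, following the layout in RFC 4443, section 3.2. In practice the kernel consumes these messages, and an ordinary UDP application never parses them itself. parse_packet_too_big() is a hypothetical helper and is not part of Kamailio.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>      /* ntohl() */
#include <netinet/icmp6.h>  /* ICMP6_PACKET_TOO_BIG (type 2) */

/* Layout of an ICMPv6 Packet Too Big message (RFC 4443, section 3.2):
 *   byte 0      Type (2)
 *   byte 1      Code (sent as 0, ignored by the receiver)
 *   bytes 2-3   Checksum
 *   bytes 4-7   MTU of the next-hop link, network byte order
 *   bytes 8-    as much of the invoking packet as fits
 * Stores the MTU and returns 1 if buf holds such a message, else 0.
 * The checksum is not verified in this sketch. */
int parse_packet_too_big(const uint8_t *buf, size_t len, uint32_t *mtu)
{
    uint32_t be_mtu;

    if (buf == NULL || len < 8 || buf[0] != ICMP6_PACKET_TOO_BIG)
        return 0;
    memcpy(&be_mtu, buf + 4, sizeof be_mtu);  /* avoids unaligned reads */
    *mtu = ntohl(be_mtu);
    return 1;
}
```

IPv6 requires every link to support at least 1280 octets (RFC 8200), so a path MTU estimate learned this way should never drop below that value.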
The pmtu_discovery flag sets Don't Fragment on IPv4 traffic; in IPv6 this is an implied property. Does Kamailio learn a lower MTU from any "Packet Too Big" for IPv6 even if pmtu_discovery is not set? Future resends could then be fragmented appropriately.
The udp_mtu setting diverts larger messages to another protocol, but it would have to be set as low as the worst peer requires, impacting all peers. That would be a weird struggle for a telco serving many. PMTU would be better to rely on, but how does it work in Kamailio?
Details: https://www.rfc-editor.org/rfc/rfc3542#section-11.3 and https://stackoverflow.com/questions/38817837/how-does-mtu-retransmission-wor...
Thanks, -Rick
(sr-dev on CC)
Hello,
I am not aware of a special handling of MTU discovery regarding IPv6 UDP traffic in Kamailio core. But of course, we have a lot of code.
You can find the implementation of the MTU handling in the src/core/udp_server.c file. It's just setting the appropriate socket option right now.
Cheers,
Henning
On 12/05/2022 07.31, Rick van Rein wrote:
Hello,
IPv6 routers never fragment packets. Rather, they drop a packet that is too large for a (local) MTU and send back ICMPv6 "Packet Too Big". This seems to cause loss of larger SIP messages when an ISP tunnels their IPv6 at the expense of the MTU.
The pmtu_discovery flag sets Don't Fragment on IPv4 traffic; in IPv6 this is an implied property. Does Kamailio learn a lower MTU from any "Packet Too Big" for IPv6 even if pmtu_discovery is not set? Future resends could then be fragmented appropriately.
The udp_mtu setting diverts larger messages to another protocol, but it would have to be set as low as the worst peer requires, impacting all peers. That would be a weird struggle for a telco serving many. PMTU would be better to rely on, but how does it work in Kamailio?
I haven't looked at the Kamailio code either, but in general this is handled by the network stack directly (e.g. the Linux kernel), transparent to the application (Kamailio).
1. The application wants to send a packet and uses the appropriate API (e.g. the send() system call).
2. The kernel takes care of actually sending the packet out to its destination.
3. The packet then hits an MTU barrier along its path. The router at that barrier discards the packet and sends an ICMPv6 "Packet Too Big" message back to the originator.
4. The kernel receives this ICMPv6 message and learns from it that the path MTU to that destination is lower. The application generally is not notified about this, and no automatic retransmission happens either.
5. The application wants to send another packet to the same destination (e.g. in Kamailio's case probably a retransmission of the first one, as that packet was never acknowledged).
6. The application does exactly the same thing as in step 1.
7. The kernel now knows about the smaller PMTU to that packet's destination and will therefore fragment the packet appropriately before sending the fragments out.
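Seen from the application side, step 7 can be observed on Linux once a UDP socket is connect()ed. The following minimal sketch reads back the kernel's current path MTU estimate for that peer with the IP_MTU (IPv4) or IPV6_MTU (IPv6) socket options documented in ip(7) and ipv6(7). The helper name connected_path_mtu() is hypothetical.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>

/* Return the kernel's current path MTU estimate for the peer that a UDP
 * socket is connected to, or -1 on error.  Both options are Linux
 * specific and only valid via getsockopt(); on an unconnected socket the
 * call fails (ENOTCONN on Linux). */
int connected_path_mtu(int fd, int family)
{
    int mtu = 0;
    socklen_t len = sizeof mtu;
    int rc;

    if (family == AF_INET)
        rc = getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len);
    else if (family == AF_INET6)
        rc = getsockopt(fd, IPPROTO_IPV6, IPV6_MTU, &mtu, &len);
    else
        return -1;  /* not an IP socket */
    return rc == 0 ? mtu : -1;
}
```

An unconnected socket, such as the one Kamailio listens on, has no single peer, so there is no per-socket value to report in that case.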
Cheers
Hello Henning and Richard,
Henning Westerholt helped me focus on the code:
You can find the implementation of the MTU handling in the src/core/udp_server.c file. It's just setting the appropriate socket option right now.
I think I found a few bugs, centering around https://github.com/kamailio/kamailio/blob/master/src/core/udp_server.c#L331-...
The file clearly shows how the option is processed:
(pmtu_discovery) ? IP_PMTUDISC_DO : IP_PMTUDISC_DONT
This is IPv4-only, and it looks like a bug that no check on the address family is done before this is set. Note that Linux defines:
/usr/include/linux/in6.h: #define IPV6_MTU_DISCOVER 23
/usr/include/linux/in.h: #define IP_MTU_DISCOVER 10
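The missing family check could look roughly like this. The sketch assumes the socket's address family is known at the point where the option is set. The helper names pmtu_opt_for() and set_pmtu_discovery() are hypothetical. Reusing the same DO/DONT choice for IPv6 is an assumption about the intended semantics, not existing Kamailio behavior.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>

/* Socket option triple for the requested PMTU discovery mode. */
struct pmtu_opt {
    int level;
    int optname;
    int optval;
};

/* Pick the option matching the socket's address family: IP_MTU_DISCOVER
 * at IPPROTO_IP level only affects IPv4, while IPv6 sockets need
 * IPV6_MTU_DISCOVER at IPPROTO_IPV6 level.
 * Returns 0 on success, -1 for an unsupported family. */
int pmtu_opt_for(int family, int pmtu_discovery, struct pmtu_opt *out)
{
    switch (family) {
    case AF_INET:
        out->level = IPPROTO_IP;
        out->optname = IP_MTU_DISCOVER;
        out->optval = pmtu_discovery ? IP_PMTUDISC_DO : IP_PMTUDISC_DONT;
        return 0;
    case AF_INET6:
        out->level = IPPROTO_IPV6;
        out->optname = IPV6_MTU_DISCOVER;
        out->optval = pmtu_discovery ? IPV6_PMTUDISC_DO : IPV6_PMTUDISC_DONT;
        return 0;
    default:
        return -1;
    }
}

/* Apply the family-appropriate option to an already created socket. */
int set_pmtu_discovery(int fd, int family, int pmtu_discovery)
{
    struct pmtu_opt o;

    if (pmtu_opt_for(family, pmtu_discovery, &o) != 0)
        return -1;
    return setsockopt(fd, o.level, o.optname, &o.optval, sizeof o.optval);
}
```

Keeping the selection in a pure function makes it easy to check both families without opening sockets.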
In general, Path MTU discovery only applies to connected sockets, which is not what happens in udp_server.c. The IPv4 version sets the DF flag, which made me wonder whether that actually gets handled at all. udp_server.c does set the IP_RECVERR option described in ip(7), which is intended for exactly this kind of connectionless MTU handling. For IPv6, there is an IPV6_RECVERR:
/usr/include/linux/in6.h: #define IPV6_RECVERR 25
/usr/include/linux/in.h: #define IP_RECVERR 11
The IPv6 variant is absent from udp_server.c, which would be another bug. (FYI, I use an IPv6-only setup, which is probably why this turns up.)
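Enabling the error queue for both families might look like this. This is Linux-specific, and enable_recverr() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask the kernel to queue extended errors (such as ICMP "Fragmentation
 * Needed" for IPv4 or ICMPv6 "Packet Too Big") on an unconnected UDP
 * socket.  IP_RECVERR only covers IPv4; IPv6 sockets need IPV6_RECVERR.
 * Returns the setsockopt() result, or -1 for an unsupported family. */
int enable_recverr(int fd, int family)
{
    int on = 1;

    if (family == AF_INET)
        return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof on);
    if (family == AF_INET6)
        return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &on, sizeof on);
    return -1;
}
```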
Since this is the mechanism for handling MTU discovery on unconnected sockets, I read ip(7). It mentions a flag MSG_ERRQUEUE to be used with recvmsg() for reading the queued errors. I could not find this flag in Kamailio, so I suspect that this handling was never completed after the IP_RECVERR flag was added.
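Here is a minimal sketch of reading that error queue. It assumes Linux and that IP_RECVERR or IPV6_RECVERR has been enabled on the socket. Pending entries are typically signalled by POLLERR from poll(2). For an EMSGSIZE error, the ee_info field of struct sock_extended_err carries the reported MTU. The helper read_errqueue_mtu(), and how it would hook into Kamailio's receive loop, are assumptions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/errqueue.h>  /* struct sock_extended_err */

/* Read one entry from the socket's error queue.  Returns the MTU reported
 * with an EMSGSIZE error, 0 if the queue was empty or the entry was some
 * other error, and -1 if recvmsg() failed for another reason. */
long read_errqueue_mtu(int fd)
{
    char data[2048];  /* receives (part of) the datagram that failed */
    union {           /* keeps the control buffer suitably aligned */
        char buf[512];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof data };
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    /* An empty error queue makes recvmsg() fail with EAGAIN. */
    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

    for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
        int is_v4 = cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == IP_RECVERR;
        int is_v6 = cm->cmsg_level == IPPROTO_IPV6 && cm->cmsg_type == IPV6_RECVERR;
        struct sock_extended_err ee;

        if (!is_v4 && !is_v6)
            continue;
        memcpy(&ee, CMSG_DATA(cm), sizeof ee);
        if (ee.ee_errno == (unsigned)EMSGSIZE)
            return (long)ee.ee_info;
    }
    return 0;
}
```

According to ip(7), msg_name (if supplied) receives the original destination of the datagram that caused the error. That address would be needed to remember the smaller MTU per destination, but this sketch does not request it.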
An approach that would always be safe, AFAIK, is to handle a destination that hits this kind of error through a connected socket, set the lower MTU on that socket, and then continue sending. Connecting over UDP is kind of free, and it avoids relying on another protocol in the peer. The expense would be grabbing an extra socket, which is why it may be better to wait for a Path MTU failure before doing so.
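The connected-socket approach could be sketched as follows, assuming Linux. connect() on a UDP socket sends nothing. It only fixes the peer so the kernel can track the path MTU for that socket. On IPv6, the IPV6_MTU option then caps the MTU used for the socket (see ipv6(7); nonzero values below 1280 are rejected). open_peer_socket() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket dedicated to one peer.  For IPv6, a positive mtu is
 * applied with IPV6_MTU; IPv4 has no settable counterpart, so mtu is
 * ignored there.  Returns the connected fd, or -1 on failure. */
int open_peer_socket(const struct sockaddr *peer, socklen_t peer_len, int mtu)
{
    int fd = socket(peer->sa_family, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, peer, peer_len) != 0)
        goto fail;
    if (peer->sa_family == AF_INET6 && mtu > 0 &&
        setsockopt(fd, IPPROTO_IPV6, IPV6_MTU, &mtu, sizeof mtu) != 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}
```

A real implementation would probably also need to bind the new socket to the same local address and port as the listening socket (for example with SO_REUSEPORT), so that the peer keeps seeing the same source. That part is omitted here.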
Richard Fuchs explained in detail what happens:
- The application wants to send another packet to the same destination (e.g. in Kamailio's case probably a retransmission of the first one, as that packet was never acknowledged).
- The application does exactly the same thing as in step 1.
- The kernel now knows about the smaller PMTU to that packet's destination and will therefore fragment the packet appropriately before sending the fragments out.
These last steps, however, only apply to a _connected_ UDP socket. I looked for that in the given file but did not find it.
I suppose there are also problems with Linux using the MTU as an implied MRU: it means you cannot be conservative in what you send and liberal in what you accept, which would have been a useful OS-level strategy. Lacking that, I suppose it is an application problem :'-(
This in general feels like it is outside my reach. I can understand it, but cannot fix it. Have I hereby submitted a bug, or is an issue on GitHub the proper path?
Thanks,
Rick van Rein
Hello Rick,
thanks for looking into it.
You already opened an issue about that, which is a good way to keep track of it.
Cheers,
Henning