I got one thing wrong, and correcting it saves a lot of work. Here is the comment from my experimental code:

/*
 * Confusingly, ip(7) states
 *
 * IP_MTU (since Linux 2.2)
 *    Retrieve the current known path MTU of the current socket.
 *    Returns an integer.  IP_MTU  is valid only for getsockopt(2) and
 *    can be employed only when the socket has been connected.
 *
 * Similarly, ipv6(7) states
 *
 * IPV6_MTU
 *     getsockopt(): Retrieve the current known path MTU of the current
 *     socket.  Valid only when the socket has been connected.  Returns
 *     an integer.
 *     
 *     setsockopt():  Set  the  MTU to be used for the socket.  The MTU
 *     is limited by the device MTU or the path MTU when path MTU
 *     discovery is enabled.  Argument is a pointer to integer.
 *
 * This suggests that IP_MTU is a socket property.  However, it makes
 * more sense as a shared global property, which indeed seems to apply:
 *
 * The ipv6(7) entry for IPV6_MTU_DISCOVER references IP_MTU_DISCOVER;
 * the ip(7) entry for IP_MTU_DISCOVER states
 *
 * IP_MTU_DISCOVER (since Linux 2.2)
 *    When PMTU discovery is enabled, the kernel automatically keeps track
 *    of the path MTU  per destination host.  When it is connected to a
 *    specific peer with connect(2), the currently known path MTU can be
 *    retrieved conveniently using the IP_MTU socket option (e.g.,  after
 *    an  EMSGSIZE  error  occurred).   The  path MTU may change over time.
 *    For connectionless sockets with many destinations, the new MTU for a
 *    given destination can also be  accessed using  the  error  queue (see
 *    IP_RECVERR).  A new error will be queued for every incoming MTU update.
 *
 *    While MTU discovery is in progress, initial packets from datagram
 *    sockets may be dropped.  Applications  using  UDP  should  be aware
 *    of this and not take it into account for their packet retransmit strategy.
 *
 * Retransmission is common in UDP applications.  Ideally, IP_RECVERR or
 * IPV6_RECVERR is used to resend immediately, without waiting for timers to
 * expire, and without limiting the number of Path MTU lessons learnt to the
 * number of timer rounds.
 *
 * For IPv6, where fragmentation is required to accommodate the Path MTU, and
 * for unconnected applications, the lessons from Path MTU discovery have a
 * major impact on behaviour; we should always let the socket fragment
 * frames when so desired, so:
 *
 * IP_MTU_DISCOVER (since Linux 2.2)
 *    IP_PMTUDISC_WANT will fragment a datagram if needed according to the
 *    path MTU, [IPv4-only: or will set the don't-fragment flag otherwise].
 *
 *    Path MTU discovery value   Meaning
 *    IP_PMTUDISC_WANT           Use per-route settings.
 *    IP_PMTUDISC_DONT           Never do Path MTU Discovery.
 *    IP_PMTUDISC_DO             Always do Path MTU Discovery.
 *    IP_PMTUDISC_PROBE          Set DF but ignore Path MTU.
 *
 */
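The setting that the comment arrives at can be sketched as follows. This is only a minimal sketch, assuming Linux with glibc headers; the helper names `pmtud_want_v6` and `pmtud_mode_v6` are mine and are not part of Kamailio.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to use per-route Path MTU
 * knowledge on an IPv6 datagram socket, fragmenting when needed.
 * Returns 0 on success, -1 on error with errno set by setsockopt(2). */
int pmtud_want_v6(int fd)
{
    int val = IPV6_PMTUDISC_WANT;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_MTU_DISCOVER, &val, sizeof(val));
}

/* Hypothetical helper: read the current setting back, or -1 on error. */
int pmtud_mode_v6(int fd)
{
    int val = -1;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_IPV6, IPV6_MTU_DISCOVER, &val, &len) < 0)
        return -1;
    return val;
}
```

Calling `pmtud_want_v6()` once on each IPv6 UDP listening socket, right after socket(2), should be enough, since the option is a property of the socket and not of the destination.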

I'm documenting it here so that the knowledge is not lost to the project. This is difficult stuff.

It would seem that Path MTU knowledge is not maintained per socket (which would benefit locality and proper cleanup of the knowledge) but as a global kernel property of the route (which benefits reuse of the knowledge; in other words, a useful form of caching).
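If that reading is right, the kernel's knowledge for a destination can be inspected with a throwaway connected socket, without touching the shared unconnected one. A minimal sketch, assuming Linux; `known_path_mtu_v6` is a hypothetical name of mine:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the Path MTU that the kernel currently
 * knows for dst, or -1 on error.  Nothing is sent; connect(2) on a UDP
 * socket only selects a route, and IPV6_MTU then reports that route's
 * MTU as described in ipv6(7). */
int known_path_mtu_v6(const struct sockaddr_in6 *dst)
{
    int mtu = -1;
    socklen_t len = sizeof(mtu);
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0 ||
        getsockopt(fd, IPPROTO_IPV6, IPV6_MTU, &mtu, &len) < 0)
        mtu = -1;
    close(fd);
    return mtu;
}
```

The value is only a snapshot; as the manual page says, the path MTU may change over time.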

Conclusions for Kamailio on IPv6

  1. The idea to set different MTU values for two sockets failed for unconnected sockets. And to have multiple MTUs you need unconnected sockets.
  2. This means that the idea of a secondary socket is not going to work in Kamailio either.
  3. It does seem to be true that the kernel keeps track of Path MTU if asked.
  4. For IPv6, not learning from Path MTU feedback (ICMPv6 Packet Too Big) always has the same effect: once a frame is dropped, it is dropped again on every resend. Kamailio then comes across as unstable, especially because SIP message sizes vary, so some things work while others fail.
  5. Note that enabling Path MTU discovery for IPv6 never causes packet drops by itself; it only gives the sender a reason to fragment, which is at most an efficiency issue. Note that IPv6 has no "Don't Fragment" option; this behaviour is always active.
  6. So enabling Path MTU discovery for IPv6 can only add value. Even if sysctl() could make such a setting, Kamailio stability demands this for IPv6, AFAIK.
  7. Path MTU discovery for IPv4 continues to be an option and a matter of taste, unlike for IPv6.
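To illustrate why varying SIP message sizes make some things work while others fail, here is the size check in a small sketch. It assumes a plain IPv6 header (40 bytes) plus UDP header (8 bytes) and no extension headers; the names are hypothetical.

```c
#include <stddef.h>

enum { V6_HDR_LEN = 40, UDP_HDR_LEN = 8 };

/* Hypothetical helper: 1 if a UDP payload of sip_len bytes fits into a
 * single unfragmented IPv6 packet on a path with the given MTU, else 0.
 * Assumes no IPv6 extension headers. */
int fits_unfragmented_v6(size_t sip_len, int path_mtu)
{
    if (path_mtu < V6_HDR_LEN + UDP_HDR_LEN)
        return 0;
    return sip_len <= (size_t)(path_mtu - V6_HDR_LEN - UDP_HDR_LEN);
}
```

On a path with the IPv6 minimum MTU of 1280, a SIP message of up to 1232 bytes passes unfragmented, and anything larger depends on the sender fragmenting it.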

Perfection for Kamailio over IPv6

  1. The first packet to an IPv6 host may be dropped and answered with an ICMPv6 Packet Too Big message. This may happen again whenever the kernel drops its knowledge. Some SIP processing steps are an hour apart, which may be long enough for that to happen.
  2. Use of IPV6_RECVERR enables immediate resending with improved Path MTU knowledge. This involves an extra polling mechanism and links into the tm logic, both of which are beyond my reach. For sl replies there will probably be a second round if Path MTU problems arise, because the reply was sent and then forgotten, so it has to wait for another round.
