On 2018/10/05 18:38, Alexander Bluhm wrote:
> IPv6 Source selection is a mess!
> 
> > ICMP6 messages
> > are generated with a source of, I think, the local address associated with
> > the route to the recipient,
> 
> It is not that simple.  Look at in6_ifawithscope() in sys/netinet6/in6.c.

I know that's used for newly generated packets, but I'm not sure it applies
to icmp6. I haven't tried modifying the kernel to confirm that this is the
code generating the address in this case, but it seems likely. From
icmp6.c (lines 1111-1123):

        /*
         * If the incoming packet was addressed directly to us (i.e. unicast),
         * use dst as the src for the reply.
         * The IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED case would be VERY rare,
         * but is possible (for example) when we encounter an error while
         * forwarding procedure destined to a duplicated address of ours.
         */
        rt = rtalloc(sin6tosa(&sa6_dst), 0, rtableid);
        if (rtisvalid(rt) && ISSET(rt->rt_flags, RTF_LOCAL) &&
            !ISSET(ifatoia6(rt->rt_ifa)->ia6_flags,
            IN6_IFF_ANYCAST|IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED)) {
                src = &t;
        }
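To make the branch above concrete, here is a small userland model of the decision it takes. It is a sketch, not kernel code: struct sk_route, the SK_IFF_* flags and sk_icmp6_reply_src() are hypothetical stand-ins for rtisvalid(), RTF_LOCAL and the IN6_IFF_* flags. Assuming sa6_dst holds the offending packet's original destination, as the comment suggests, a transit traceroute probe (destined for www.google.com) would not take this branch, so src would presumably stay unset and be chosen by the generic selection code instead.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch only; the kernel's real
 * IN6_IFF_* constants are defined elsewhere. */
enum {
	SK_IFF_ANYCAST    = 0x01,
	SK_IFF_TENTATIVE  = 0x02,
	SK_IFF_DUPLICATED = 0x04,
};

/* What the route lookup on the offending packet's destination found. */
struct sk_route {
	bool     valid;    /* stands in for rtisvalid(rt) */
	bool     local;    /* stands in for ISSET(rt->rt_flags, RTF_LOCAL) */
	unsigned ifaflags; /* stands in for ifatoia6(rt->rt_ifa)->ia6_flags */
};

enum sk_src_choice {
	SK_SRC_ORIG_DST, /* reuse the packet's destination (src = &t) */
	SK_SRC_SELECT,   /* leave src unset; other code picks it later */
};

/* Mirrors the condition in the excerpt: reuse the destination only if it
 * is a usable local address (not anycast, tentative or duplicated). */
enum sk_src_choice
sk_icmp6_reply_src(const struct sk_route *rt)
{
	if (rt->valid && rt->local &&
	    !(rt->ifaflags &
	    (SK_IFF_ANYCAST | SK_IFF_TENTATIVE | SK_IFF_DUPLICATED)))
		return SK_SRC_ORIG_DST;
	return SK_SRC_SELECT;
}
```

For a hop 1 or hop 2 time-exceeded reply the route is valid but not local, so the sketch returns SK_SRC_SELECT.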

> Could you provide your ifconfig and route output?  So we could
> figure out into which path of this algorithm you are running.

The host running traceroute has a handful of global scope addresses on
loopback interfaces, plus a global scope address on a vlan interface
facing the next router, all advertised into ospf.

The default source address is one of the loopbacks,
2001:67c:15f4:a423::26, so the only route back from the rest of the
network to this address is via link-locals all the way.

BGP routes have changed since, so here is a fresh traceroute using the
default source address, to keep all of the copied output below consistent:

$ traceroute6 -n www.google.com
traceroute6 to www.google.com (2a00:1450:4009:80b::2004), 64 hops max, 60 byte packets
 1  fe80::5606:33d8:d784:cd2f%vlan701  0.494 ms  0.362 ms  0.373 ms
 2  * * *
 3  * * *
 4  2001:7f8:17::1b1b:1  7.272 ms  7.332 ms  6.938 ms
 5  2001:7f8:17::3b41:1  6.699 ms  6.342 ms  6.453 ms
[...]

From the first hop router:

gr1$ route -n get -inet6 2001:67c:15f4:a423::26
   route to: 2001:67c:15f4:a423::26
destination: 2001:67c:15f4:a423::26
       mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
    gateway: fe80::e648:4970:e85f:5e13%vlan701
  interface: vlan701
 if address: fe80::5606:33d8:d784:cd2f%vlan701
   priority: 32 (ospf)
      flags: <UP,GATEWAY,HOST,DONE>
     use       mtu    expire
29464060         0         0 

From the second hop router:

gr5$ route -n get -inet6 2001:67c:15f4:a423::26
   route to: 2001:67c:15f4:a423::26
destination: 2001:67c:15f4:a423::26
       mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
    gateway: fe80::d05a:f0e8:e5e:a30a%vlan740
  interface: vlan740
 if address: fe80::b769:5751:d87b:44b7%vlan740
   priority: 32 (ospf)
      flags: <UP,GATEWAY,HOST,DONE>
     use       mtu    expire
20369017         0         0 

If I instead source packets from the vlan interface's address (the one
directly connected to the next router), I get this:

$ traceroute6 -n -s 2a03:8920:1:52bd::184 www.google.com
traceroute6 to www.google.com (2a00:1450:4009:80b::2004) from 2a03:8920:1:52bd::184, 64 hops max, 60 byte packets
 1  2a03:8920:1:52bd::181  1.769 ms  0.382 ms  0.377 ms
 2  * * *
 3  * * *
 4  2001:7f8:17::1b1b:1  6.931 ms  6.999 ms  7.115 ms
 5  2001:7f8:17::3b41:1  6.466 ms  6.568 ms  6.416 ms

In this case, the routes on the hop 1 and hop 2 routers are:

gr1$ route -n get -inet6 2a03:8920:1:52bd::184
   route to: 2a03:8920:1:52bd::184
destination: 2a03:8920:1:52bd::184
       mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
  interface: vlan701
 if address: 2a03:8920:1:52bd::181
   priority: 3 ()
      flags: <UP,HOST,DONE,LLINFO,CLONED>
     use       mtu    expire
   22811         0         3 

gr5$ route -n get -inet6 2a03:8920:1:52bd::184
   route to: 2a03:8920:1:52bd::184
destination: 2a03:8920:1:52bd::
       mask: ffff:ffff:ffff:ffff::
    gateway: fe80::d05a:f0e8:e5e:a30a%vlan740
  interface: vlan740
 if address: fe80::b769:5751:d87b:44b7%vlan740
   priority: 32 (ospf)
      flags: <UP,GATEWAY,DONE>
     use       mtu    expire
     234         0         0 

So the first hop's reply comes from a "normal" global address, while the
second hop's reply (as seen in tcpdump) is sourced from the link-local
fe80::b769:5751:d87b:44b7 and, of course, never makes it back to the host
running traceroute.
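A quick way to classify the hop addresses from the traceroutes above: link-local addresses (fe80::/10) are only meaningful on a single link, and RFC 4291 says routers must not forward packets with a link-local source or destination to other links, which is why the second hop's reply dies on the way back. A minimal sketch using the standard inet_pton() and IN6_IS_ADDR_LINKLOCAL(); is_link_local_text() is a hypothetical helper name. Note that inet_pton() does not accept a %zone suffix such as %vlan701, so strip that first.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Returns 1 if text is a link-local IPv6 address (fe80::/10), 0 if it is
 * any other valid IPv6 address, and -1 if it does not parse (including
 * addresses still carrying a %zone suffix). */
int
is_link_local_text(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return -1;
	return IN6_IS_ADDR_LINKLOCAL(&a) ? 1 : 0;
}
```

Fed the hop 1 and hop 2 reply sources, this reports 2a03:8920:1:52bd::181 as routable and fe80::b769:5751:d87b:44b7 as link-local.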

> At one point I added the following rule from a newer RFC.  It
> makes things better in many cases, but not with OSPF6.  There you
> may have an interface with only link-local addresses.  Then this
> link-local is used instead of another better scoped one.

All the interfaces involved have global addresses; none of them has
only link-local addresses.

>                         /* RFC 3484 5. Rule 5: Prefer outgoing interface */
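The quoted Rule 5 compares two candidate source addresses pairwise. Below is a minimal sketch of just that comparison, with a hypothetical struct cand and rule5_prefer_outgoing(). In RFC 3484 (and its successor RFC 6724) this rule is only consulted when Rules 1 to 4, including Rule 2 (prefer appropriate scope), end in a tie, so it is not a complete selection algorithm. It does show how, whenever Rule 5 is what decides, a link-local address on the outgoing interface beats a global address configured elsewhere, such as on a loopback.

```c
/* Hypothetical candidate record for this sketch. */
struct cand {
	const char *addr;    /* textual address, for illustration only */
	unsigned    ifindex; /* interface the address is configured on */
};

/* RFC 3484 Rule 5, "Prefer outgoing interface": returns <0 to prefer a,
 * >0 to prefer b, and 0 if this rule cannot decide (the caller would
 * then move on to the next rule). */
int
rule5_prefer_outgoing(const struct cand *a, const struct cand *b,
    unsigned out_ifindex)
{
	int a_out = a->ifindex == out_ifindex;
	int b_out = b->ifindex == out_ifindex;

	if (a_out && !b_out)
		return -1;
	if (b_out && !a_out)
		return 1;
	return 0;
}
```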
> 
> >  4  2001:728:0:5000::55  7.843 ms  8.236 ms  7.391 ms
> 
> How can this work?  Does your AS-Boundary Router do NAT?

The source address of my traceroutes is a global scope address in all these
cases, so my upstream or peer knows how to route back to it.

> > What's anyone else doing? Just living with it or has anyone figured a way
> > to make it nicer? I'd like to reply with either a global scope address for
> > the interface, or a loopback address,
> 
> We have implemented, more or less, a very old RFC.  There are two
> newer RFCs with different algorithms.  There is a recommendation to
> store policies from userland in the kernel for address selection.
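For an idea of what such a policy table looks like: RFC 6724 (which obsoletes RFC 3484) defines a default table in Section 2.1, matched by longest prefix against an address to yield a precedence and a label. A minimal userland sketch of that lookup; struct policy and policy_lookup() are hypothetical names, and a real implementation would let userland replace the table rather than hard-code it.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

struct policy {
	const char *prefix;
	int plen;
	int precedence;
	int label;
};

/* Default policy table from RFC 6724, Section 2.1. */
static const struct policy default_policy[] = {
	{ "::1",        128, 50,  0 },
	{ "::",           0, 40,  1 },
	{ "::ffff:0:0",  96, 35,  4 },
	{ "2002::",      16, 30,  2 },
	{ "2001::",      32,  5,  5 },
	{ "fc00::",       7,  3, 13 },
	{ "::",          96,  1,  3 },
	{ "fec0::",      10,  1, 11 },
	{ "3ffe::",      16,  1, 12 },
};

/* True if the first plen bits of a and p are equal. */
static int
prefix_match(const struct in6_addr *a, const struct in6_addr *p, int plen)
{
	int bytes = plen / 8, bits = plen % 8;

	if (memcmp(a->s6_addr, p->s6_addr, bytes) != 0)
		return 0;
	if (bits == 0)
		return 1;
	unsigned char mask = (unsigned char)(0xff << (8 - bits));
	return (a->s6_addr[bytes] & mask) == (p->s6_addr[bytes] & mask);
}

/* Longest-prefix match of addr against the table; NULL if addr is invalid. */
const struct policy *
policy_lookup(const char *addr)
{
	struct in6_addr a, p;
	const struct policy *best = NULL;
	size_t n = sizeof(default_policy) / sizeof(default_policy[0]);

	if (inet_pton(AF_INET6, addr, &a) != 1)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		const struct policy *e = &default_policy[i];

		if (inet_pton(AF_INET6, e->prefix, &p) != 1)
			continue;
		if (prefix_match(&a, &p, e->plen) &&
		    (best == NULL || e->plen > best->plen))
			best = e;
	}
	return best;
}
```

With the default table, both 2001:67c:15f4:a423::26 and 2a03:8920:1:52bd::184 fall through to ::/0 (precedence 40, label 1).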
> 
> I have just looked at FreeBSD's in6_ifawithifp(); it is quite simple.
> Perhaps this is a way to go.

The corresponding code in FreeBSD's icmp6.c calls in6ifa_ifwithaddr():
https://svnweb.freebsd.org/base/head/sys/netinet6/icmp6.c?revision=338831&view=markup#l2113
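The behaviour asked for earlier in the thread (reply with a global scope address of the interface, or else a loopback address) could be sketched as a simple ordered search. This is a hypothetical userland illustration only: struct ifaddr6 and pick_reply_src() are made up, and it is neither FreeBSD's in6_ifawithifp() nor any existing kernel function. "Routable" here just means not link-local, loopback, unspecified or multicast.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical view of one configured address for this sketch. */
struct ifaddr6 {
	const char *addr;           /* textual IPv6 address, no %zone suffix */
	int         is_loopback_if; /* configured on a loopback interface */
	unsigned    ifindex;        /* interface it is configured on */
};

/* Usable as a reply source off-link: parses, and is not link-local,
 * loopback, unspecified or multicast. */
static int
is_routable_text(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return 0;
	return !IN6_IS_ADDR_LINKLOCAL(&a) && !IN6_IS_ADDR_LOOPBACK(&a) &&
	    !IN6_IS_ADDR_UNSPECIFIED(&a) && !IN6_IS_ADDR_MULTICAST(&a);
}

/* Returns the index of the chosen address: first a routable address on
 * the outgoing interface, then a routable address on a loopback, else -1
 * (meaning: keep whatever the default selection picked). */
long
pick_reply_src(const struct ifaddr6 *addrs, size_t n, unsigned out_ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (addrs[i].ifindex == out_ifindex &&
		    is_routable_text(addrs[i].addr))
			return (long)i;
	for (size_t i = 0; i < n; i++)
		if (addrs[i].is_loopback_if && is_routable_text(addrs[i].addr))
			return (long)i;
	return -1;
}
```

On a router whose outgoing interface has only a link-local address, this falls back to the loopback's global address instead of the link-local one.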

> > I didn't get anywhere with PF
> > translation though.
> 
> pf with IPv6 link-local addresses does not work properly.  I think
> it cannot parse the %if suffixes.  The KAME hack scope id is not
> handled.
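On the "KAME hack": KAME-derived kernels keep link-local addresses internally with the scope (the interface index) embedded in the second 16-bit word, i.e. bytes 2 and 3 of the address, and that index must be cleared before the address goes on the wire or is compared with an address that does not carry it. A minimal byte-level sketch of the convention; kame_embed_scope() and kame_clear_scope() are hypothetical names, and this only illustrates the idea rather than how pf or the kernel implement it.

```c
#include <stdint.h>

/* True for fe80::/10 (link-local unicast). */
static int
is_link_local(const uint8_t a[16])
{
	return a[0] == 0xfe && (a[1] & 0xc0) == 0x80;
}

/* Embed ifindex in bytes 2-3 (network byte order) of a link-local
 * address; other addresses are left untouched. */
void
kame_embed_scope(uint8_t a[16], uint16_t ifindex)
{
	if (is_link_local(a)) {
		a[2] = (uint8_t)(ifindex >> 8);
		a[3] = (uint8_t)(ifindex & 0xff);
	}
}

/* Recover the embedded index and clear it, restoring the wire form.
 * Returns 0 for non-link-local addresses. */
uint16_t
kame_clear_scope(uint8_t a[16])
{
	uint16_t idx = 0;

	if (is_link_local(a)) {
		idx = (uint16_t)((a[2] << 8) | a[3]);
		a[2] = 0;
		a[3] = 0;
	}
	return idx;
}
```

Code that compares the embedded form against a plain fe80:: address without clearing the index first would see two different addresses.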

Thank you.
