On 2018/10/05 18:38, Alexander Bluhm wrote:
> IPv6 Source selection is a mess!
>
> > ICMP6 messages
> > are generated with a source of, I think, the local address associated with
> > the route to the recipient,
>
> It is not that simple.  Look at in6_ifawithscope() in sys/netinet6/in6.c.

I know that's used for newly generated packets, but I'm not sure it's
the case for icmp6. I haven't tried modifying the kernel to confirm
that this is the code generating the address in this case, but it
seems likely. In icmp6.c:

    /*
     * If the incoming packet was addressed directly to us (i.e. unicast),
     * use dst as the src for the reply.
     * The IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED case would be VERY rare,
     * but is possible (for example) when we encounter an error while
     * forwarding procedure destined to a duplicated address of ours.
     */
    rt = rtalloc(sin6tosa(&sa6_dst), 0, rtableid);
    if (rtisvalid(rt) && ISSET(rt->rt_flags, RTF_LOCAL) &&
        !ISSET(ifatoia6(rt->rt_ifa)->ia6_flags,
        IN6_IFF_ANYCAST|IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED)) {
            src = &t;
    }

> Could you provide your ifconfig and route output?  So we could
> figure out into which path of this algorithm you are running.

The host running traceroute has a handful of global scope addresses on
loopback interfaces, plus a global scope address on a vlan interface
facing the next router, all advertised into ospf. The default source
address is one of the loopbacks, 2001:67c:15f4:a423::26, so the only
route back from the rest of the network to this address is via
link-locals all the way.

BGP routes changed, so I'll include a traceroute using the default
source address again so that all the newly copied output is consistent:

$ traceroute6 -n www.google.com
traceroute6 to www.google.com (2a00:1450:4009:80b::2004), 64 hops max, 60 byte packets
 1  fe80::5606:33d8:d784:cd2f%vlan701  0.494 ms  0.362 ms  0.373 ms
 2  * * *
 3  * * *
 4  2001:7f8:17::1b1b:1  7.272 ms  7.332 ms  6.938 ms
 5  2001:7f8:17::3b41:1  6.699 ms  6.342 ms  6.453 ms
[...]
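
The hop 1 reply above carries a link-local (fe80::/10) source, which is only meaningful on the link it came from. As a way to check reply sources seen in traceroute or tcpdump output, here is a minimal userland sketch, not kernel code. The `classify` helper and the `addr_scope` enum are hypothetical names made up for illustration; only `inet_pton` and the `IN6_IS_ADDR_*` macros are standard.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical coarse scope labels; not taken from any kernel header. */
enum addr_scope { SCOPE_INVALID, SCOPE_LOOPBACK, SCOPE_LINKLOCAL, SCOPE_OTHER };

/*
 * Classify a textual IPv6 address. A trailing "%ifname" zone suffix
 * (as printed by traceroute6 and route) is stripped before parsing.
 */
static enum addr_scope
classify(const char *text)
{
	struct in6_addr a;
	char buf[INET6_ADDRSTRLEN];
	const char *pct;
	size_t len;

	assert(text != NULL);

	/* Copy everything before the '%' (or the whole string). */
	pct = strchr(text, '%');
	len = pct ? (size_t)(pct - text) : strlen(text);
	if (len >= sizeof(buf))
		return SCOPE_INVALID;
	memcpy(buf, text, len);
	buf[len] = '\0';

	if (inet_pton(AF_INET6, buf, &a) != 1)
		return SCOPE_INVALID;
	if (IN6_IS_ADDR_LOOPBACK(&a))
		return SCOPE_LOOPBACK;
	if (IN6_IS_ADDR_LINKLOCAL(&a))
		return SCOPE_LINKLOCAL;
	/* Everything else (global, ULA, multicast, ...) is lumped together. */
	return SCOPE_OTHER;
}
```

Calling `classify("fe80::5606:33d8:d784:cd2f%vlan701")` yields `SCOPE_LINKLOCAL`, while the hop 4 address `2001:7f8:17::1b1b:1` falls into `SCOPE_OTHER`. The sketch deliberately does not distinguish global from unique-local or multicast addresses.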

From the first hop router,

gr1$ route -n get -inet6 2001:67c:15f4:a423::26
   route to: 2001:67c:15f4:a423::26
destination: 2001:67c:15f4:a423::26
       mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
    gateway: fe80::e648:4970:e85f:5e13%vlan701
  interface: vlan701
 if address: fe80::5606:33d8:d784:cd2f%vlan701
   priority: 32 (ospf)
      flags: <UP,GATEWAY,HOST,DONE>
     use       mtu    expire
29464060         0         0

From the second hop router,

gr5$ route -n get -inet6 2001:67c:15f4:a423::26
   route to: 2001:67c:15f4:a423::26
destination: 2001:67c:15f4:a423::26
       mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
    gateway: fe80::d05a:f0e8:e5e:a30a%vlan740
  interface: vlan740
 if address: fe80::b769:5751:d87b:44b7%vlan740
   priority: 32 (ospf)
      flags: <UP,GATEWAY,HOST,DONE>
     use       mtu    expire
20369017         0         0

If I instead source packets from the vlan interface (directly connected
to the next router), I get this:

$ traceroute6 -n -s 2a03:8920:1:52bd::184 www.google.com
traceroute6 to www.google.com (2a00:1450:4009:80b::2004) from 2a03:8920:1:52bd::184, 64 hops max, 60 byte packets
 1  2a03:8920:1:52bd::181  1.769 ms  0.382 ms  0.377 ms
 2  * * *
 3  * * *
 4  2001:7f8:17::1b1b:1  6.931 ms  6.999 ms  7.115 ms
 5  2001:7f8:17::3b41:1  6.466 ms  6.568 ms  6.416 ms

The routes in this case from the hop 1 and 2 routers are

gr1$ route -n get -inet6 2a03:8920:1:52bd::184
   route to: 2a03:8920:1:52bd::184
destination: 2a03:8920:1:52bd::184
       mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
  interface: vlan701
 if address: 2a03:8920:1:52bd::181
   priority: 3 ()
      flags: <UP,HOST,DONE,LLINFO,CLONED>
     use       mtu    expire
   22811         0         3

gr5$ route -n get -inet6 2a03:8920:1:52bd::184
   route to: 2a03:8920:1:52bd::184
destination: 2a03:8920:1:52bd::
       mask: ffff:ffff:ffff:ffff::
    gateway: fe80::d05a:f0e8:e5e:a30a%vlan740
  interface: vlan740
 if address: fe80::b769:5751:d87b:44b7%vlan740
   priority: 32 (ospf)
      flags: <UP,GATEWAY,DONE>
     use       mtu    expire
     234         0         0

So the first hop packet is returned from a "normal" address, and the
second (from tcpdump) is returned from fe80::b769:5751:d87b:44b7
and of course doesn't make it all the way back to the host running
traceroute.

> Once I have added the following rule from a newer RFC.  It
> makes things better for many cases, but not with OSPF6.  There you
> may have an interface with only link-local addresses.  Then this
> link-local is used instead of another better scoped one.

I have global addresses on all the interfaces involved; none of them
has only link-local addresses.

>       /* RFC 3484 5. Rule 5: Prefer outgoing interface */
>
> > 4 2001:728:0:5000::55 7.843 ms 8.236 ms 7.391 ms
>
> How can this work?  Does your AS-Boundary Router do NAT?

The source address on my traceroutes is a global scope address in all
these cases, so my upstream or peer knows how to route back to that.

> > What's anyone else doing? Just living with it or has anyone figured a way
> > to make it nicer? I'd like to reply with either a global scope address for
> > the interface, or a loopback address,
>
> We have implemented more or less a very old RFC.  There are two
> newer RFCs with different algorithms.  There is recommendation to
> store policies from user-land into the kernel for address selection.
>
> I have just looked at FreeBSD in6_ifawithifp(), it is quite simple.
> Perhaps this is a way to go.

The code in FreeBSD's icmp6.c matching the above calls in6ifa_ifwithaddr:
https://svnweb.freebsd.org/base/head/sys/netinet6/icmp6.c?revision=338831&view=markup#l2113

> > I didn't get anywhere with PF
> > translation though.
>
> pf with IPv6 link-local addresses does not work properly.  I think
> it cannot parse the %if suffixes.  The KAME hack scope id is not
> handled.

Thank you.
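
For reference, the preference asked about earlier in the thread (reply from a global scope address on the interface, or else a loopback address, rather than a link-local one) can be written down as a small userland model. This is only a sketch of that ordering under stated assumptions; it is not what OpenBSD or FreeBSD implement, and it is not RFC 3484 or RFC 6724 source selection. `struct cand`, `is_linklocal` and `pick_reply_src` are hypothetical names, and the 2001:db8:: addresses used with it are documentation-prefix placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical model of one configured address; not a kernel structure. */
struct cand {
	const char *ifname;	/* interface the address lives on */
	const char *addr;	/* textual IPv6 address, no %zone suffix */
};

/* Returns 1 for fe80::/10, 0 for any other valid address, -1 if unparsable. */
static int
is_linklocal(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return -1;
	return IN6_IS_ADDR_LINKLOCAL(&a) ? 1 : 0;
}

/*
 * Pick a reply source for a packet leaving via outif:
 *   1. a non-link-local address on the outgoing interface,
 *   2. a non-link-local address on any interface (e.g. a loopback),
 *   3. a link-local address on the outgoing interface.
 * Returns NULL when nothing matches.
 */
static const char *
pick_reply_src(const struct cand *c, size_t n, const char *outif)
{
	size_t i;

	assert(c != NULL && outif != NULL);

	for (i = 0; i < n; i++)
		if (strcmp(c[i].ifname, outif) == 0 &&
		    is_linklocal(c[i].addr) == 0)
			return c[i].addr;
	for (i = 0; i < n; i++)
		if (is_linklocal(c[i].addr) == 0)
			return c[i].addr;
	for (i = 0; i < n; i++)
		if (strcmp(c[i].ifname, outif) == 0 &&
		    is_linklocal(c[i].addr) == 1)
			return c[i].addr;
	return NULL;
}
```

Given a candidate list holding fe80::b769:5751:d87b:44b7 on vlan740 and a placeholder 2001:db8::1 on lo1, `pick_reply_src(list, 2, "vlan740")` returns the loopback address instead of the link-local one. The model treats every non-link-local address alike, so a real implementation would need finer scope and address-state checks (anycast, tentative, duplicated).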