On Sun, 29 Nov 2020 13:54:16 +0100 Guillaume Nault wrote: > On Sat, Nov 28, 2020 at 01:17:16PM -0800, Jakub Kicinski wrote: > > On Sat, 28 Nov 2020 10:03:42 -0700 David Ahern wrote: > > > On 11/26/20 11:09 AM, Guillaume Nault wrote: > > > > When inet_rtm_getroute() was converted to use the RCU variants of > > > > ip_route_input() and ip_route_output_key(), the TOS parameters > > > > stopped being masked with IPTOS_RT_MASK before doing the route lookup. > > > > > > > > As a result, "ip route get" can return a different route than what > > > > would be used when sending real packets. > > > > > > > > For example: > > > > > > > > $ ip route add 192.0.2.11/32 dev eth0 > > > > $ ip route add unreachable 192.0.2.11/32 tos 2 > > > > $ ip route get 192.0.2.11 tos 2 > > > > RTNETLINK answers: No route to host > > > > > > > > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would > > > > actually be routed using the first route: > > > > > > > > $ ping -c 1 -Q 2 192.0.2.11 > > > > PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. > > > > 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms > > > > > > > > --- 192.0.2.11 ping statistics --- > > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > > > > rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms > > > > > > > > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to > > > > return results consistent with real route lookups. > > > > > > > > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu > > > > versions of route lookup") > > > > Signed-off-by: Guillaume Nault <gna...@redhat.com> > > > > > > Reviewed-by: David Ahern <dsah...@kernel.org> > > > > Applied, thanks! > > > > Should the discrepancy between the behavior of ip_route_input_rcu() and > > ip_route_input() be addressed, possibly? > > Do you mean masking TOS with IPTOS_RT_MASK directly in > ip_route_input_rcu(), instead of in the callers? > > After this patch, all callers apply IPTOS_RT_MASK before calling > ip_route_input_rcu(). So, yes, that could be easily consolidated there, > and I'll do that after net merges into net-next. > > More generally, my long term plan is indeed to do mask the TOS in > central places, to get consistent behaviour across the networking > stack. However, generally speaking, I need to be careful not to break > any established behaviour. > > I'm mostly worried about the ECN bits. I guess that any caller that > doesn't mask these bits has a bug (as that may break ECN, which is > there since a long time). However, there are many code paths to audit > before we can be sure. > > The end goal is to fully support DSCP. Once we'll be sure that no > code path can possibly intreprete an ECN bit as TOS, we'll can safely > drop all those obsolete TOS* masks and macros from the kernel code and > simply mask out the ECN bits (thus preserving the whole DSCP space).
Sounds great! > Please note that this is background work for me. Expect slow (but > hopefully regular) progress from me.