On 07.03.2018 14:53, Kirill Tkhai wrote: > On 07.03.2018 06:58, David Ahern wrote: >> ipv6_chk_addr_and_flags determines if an address is a local address. It >> is called by ip6_route_info_create to validate a gateway address is not a >> local address. It currently does not consider L3 domains and as a result >> does not allow a route to be added in one VRF if the nexthop points to >> an address in a second VRF. e.g., >> >> $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23 >> Error: Invalid gateway address. >> >> where 2001:db8:102::23 is an address on an interface in vrf r1. >> >> Resolve by comparing the l3mdev for the passed in device and requiring an >> l3mdev match with the device containing an address. The intent of checking >> for an address on the specified device versus any device in the domain is >> mantained by a new argument to skip the check between the passed in device >> and the device with the address. >> >> Update the handful of users of ipv6_chk_addr with a NULL dev argument: >> - anycast to call ipv6_chk_addr_and_flags. If the device is given by the >> user, look for the given address across the L3 domain. If the index is >> not given, the default table is presumed so only addresses on devices >> not enslaved are considered. >> >> - ip6_tnl_rcv_ctl - local address must exist on device, remote address >> can not exist in L3 domain; only remote check needs to be updated but >> do both for consistency. >> >> ip6_validate_gw needs to handle 2 cases - one where the device is given >> as part of the nexthop spec and the other where the device is resolved. >> There is at least 1 VRF case where deferring the check to only after >> the route lookup has resolved the device fails with an unintuitive error >> "RTNETLINK answers: No route to host" as opposed to the preferred >> "Error: Gateway can not be a local address." The 'no route to host' >> error is because of the fallback to a full lookup. >> >> Signed-off-by: David Ahern <dsah...@gmail.com> >> --- >> include/net/addrconf.h | 4 ++-- >> net/ipv6/addrconf.c | 26 ++++++++++++++++++++++---- >> net/ipv6/anycast.c | 9 ++++++--- >> net/ipv6/datagram.c | 5 +++-- >> net/ipv6/ip6_tunnel.c | 12 ++++++++---- >> net/ipv6/ndisc.c | 2 +- >> net/ipv6/route.c | 37 ++++++++++++++++++++++++++++--------- >> 7 files changed, 70 insertions(+), 25 deletions(-) >> >> diff --git a/include/net/addrconf.h b/include/net/addrconf.h >> index c4185a7b0e90..132e5b95167a 100644 >> --- a/include/net/addrconf.h >> +++ b/include/net/addrconf.h >> @@ -69,8 +69,8 @@ int addrconf_set_dstaddr(struct net *net, void __user >> *arg); >> int ipv6_chk_addr(struct net *net, const struct in6_addr *addr, >> const struct net_device *dev, int strict); >> int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr, >> - const struct net_device *dev, int strict, >> - u32 banned_flags); >> + const struct net_device *dev, bool skip_dev_check, >> + int strict, u32 banned_flags); > > This function already has 5 arguments, while this patch adds one more. > Can't we use new flags argument for both of them?
Them are skip_dev_check and strict. > Also, the name of function and input parameters are already so big, that they > don't fit a single line already, while your patch adds more parameters. > Can't we make it more slim? Something like ipv6_chk_addr_fl() instead of > current > name. > >> #if defined(CONFIG_IPV6_MIP6) || defined(CONFIG_IPV6_MIP6_MODULE) >> int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr); >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c >> index b5fd116c046a..17d5d3f42d21 100644 >> --- a/net/ipv6/addrconf.c >> +++ b/net/ipv6/addrconf.c >> @@ -1851,22 +1851,40 @@ static int ipv6_count_addresses(const struct >> inet6_dev *idev) >> int ipv6_chk_addr(struct net *net, const struct in6_addr *addr, >> const struct net_device *dev, int strict) >> { >> - return ipv6_chk_addr_and_flags(net, addr, dev, strict, IFA_F_TENTATIVE); >> + return ipv6_chk_addr_and_flags(net, addr, dev, !dev, >> + strict, IFA_F_TENTATIVE); >> } >> EXPORT_SYMBOL(ipv6_chk_addr); > > This function was not introduced by this commit, but since the commit > modifies it, > and the function is pretty simple, we could declare it as static inline in > header > in separate patch. > >> >> +/* device argument is used to find the L3 domain of interest. If >> + * skip_dev_check is set, then the ifp device is not checked against >> + * the passed in dev argument. So the 2 cases for addresses checks are: >> + * 1. does the address exist in the L3 domain that dev is part of >> + * (skip_dev_check = true), or >> + * >> + * 2. does the address exist on the specific device >> + * (skip_dev_check = false) >> + */ >> int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr, >> - const struct net_device *dev, int strict, >> - u32 banned_flags) >> + const struct net_device *dev, bool skip_dev_check, >> + int strict, u32 banned_flags) >> { >> unsigned int hash = inet6_addr_hash(net, addr); >> + const struct net_device *l3mdev; >> struct inet6_ifaddr *ifp; >> u32 ifp_flags; >> >> rcu_read_lock(); >> + >> + l3mdev = l3mdev_master_dev_rcu(dev); >> + >> hlist_for_each_entry_rcu(ifp, &inet6_addr_lst[hash], addr_lst) { >> if (!net_eq(dev_net(ifp->idev->dev), net)) >> continue; >> + >> + if (l3mdev_master_dev_rcu(ifp->idev->dev) != l3mdev) >> + continue; >> + >> /* Decouple optimistic from tentative for evaluation here. >> * Ban optimistic addresses explicitly, when required. >> */ >> @@ -1875,7 +1893,7 @@ int ipv6_chk_addr_and_flags(struct net *net, const >> struct in6_addr *addr, >> : ifp->flags; >> if (ipv6_addr_equal(&ifp->addr, addr) && >> !(ifp_flags&banned_flags) && >> - (!dev || ifp->idev->dev == dev || >> + (skip_dev_check || ifp->idev->dev == dev || >> !(ifp->scope&(IFA_LINK|IFA_HOST) || strict))) { >> rcu_read_unlock(); >> return 1; > > There are two logical pieces in changes of this function: > > 1)You become always pass not NULL dev and add skip_dev_check argument. > 2)l3mdev_master_dev_rcu() check is introduced. > > They should go in separate patches. > >> diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c >> index c61718dba2e6..d580d4d456a5 100644 >> --- a/net/ipv6/anycast.c >> +++ b/net/ipv6/anycast.c >> @@ -66,7 +66,11 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const >> struct in6_addr *addr) >> return -EPERM; >> if (ipv6_addr_is_multicast(addr)) >> return -EINVAL; >> - if (ipv6_chk_addr(net, addr, NULL, 0)) >> + >> + if (ifindex) >> + dev = __dev_get_by_index(net, ifindex); >> + >> + if (ipv6_chk_addr_and_flags(net, addr, dev, true, 0, IFA_F_TENTATIVE)) >> return -EINVAL; >> >> pac = sock_kmalloc(sk, sizeof(struct ipv6_ac_socklist), GFP_KERNEL); >> @@ -90,8 +94,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const >> struct in6_addr *addr) >> dev = __dev_get_by_flags(net, IFF_UP, >> IFF_UP | IFF_LOOPBACK); >> } >> - } else >> - dev = __dev_get_by_index(net, ifindex); >> + } > > The hunk moving __dev_get_by_index() dereference may go as separate change, > as it's a refactoring. > This will make the review easier. > >> >> if (!dev) { >> err = -ENODEV; >> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c >> index fbf08ce3f5ab..b27333d7b099 100644 >> --- a/net/ipv6/datagram.c >> +++ b/net/ipv6/datagram.c >> @@ -801,8 +801,9 @@ int ip6_datagram_send_ctl(struct net *net, struct sock >> *sk, >> if (addr_type != IPV6_ADDR_ANY) { >> int strict = __ipv6_addr_src_scope(addr_type) >> <= IPV6_ADDR_SCOPE_LINKLOCAL; >> if (!(inet_sk(sk)->freebind || >> inet_sk(sk)->transparent) && >> - !ipv6_chk_addr(net, &src_info->ipi6_addr, >> - strict ? dev : NULL, 0) && >> + !ipv6_chk_addr_and_flags(net, >> &src_info->ipi6_addr, >> + dev, !strict, 0, >> + IFA_F_TENTATIVE) && >> !ipv6_chk_acast_addr_src(net, dev, >> >> &src_info->ipi6_addr)) >> err = -EINVAL; >> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c >> index 56c4967f1868..1ce8244e8aee 100644 >> --- a/net/ipv6/ip6_tunnel.c >> +++ b/net/ipv6/ip6_tunnel.c >> @@ -758,9 +758,11 @@ int ip6_tnl_rcv_ctl(struct ip6_tnl *t, >> ldev = dev_get_by_index_rcu(net, p->link); >> >> if ((ipv6_addr_is_multicast(laddr) || >> - likely(ipv6_chk_addr(net, laddr, ldev, 0))) && >> + likely(ipv6_chk_addr_and_flags(net, laddr, ldev, false, >> + 0, IFA_F_TENTATIVE))) && >> ((p->flags & IP6_TNL_F_ALLOW_LOCAL_REMOTE) || >> - likely(!ipv6_chk_addr(net, raddr, NULL, 0)))) >> + likely(!ipv6_chk_addr_and_flags(net, raddr, ldev, true, >> + 0, IFA_F_TENTATIVE)))) >> ret = 1; >> } >> return ret; >> @@ -990,12 +992,14 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t, >> if (p->link) >> ldev = dev_get_by_index_rcu(net, p->link); >> >> - if (unlikely(!ipv6_chk_addr(net, laddr, ldev, 0))) >> + if (unlikely(!ipv6_chk_addr_and_flags(net, laddr, ldev, false, >> + 0, IFA_F_TENTATIVE))) >> pr_warn("%s xmit: Local address not yet configured!\n", >> p->name); >> else if (!(p->flags & IP6_TNL_F_ALLOW_LOCAL_REMOTE) && >> !ipv6_addr_is_multicast(raddr) && >> - unlikely(ipv6_chk_addr(net, raddr, NULL, 0))) >> + unlikely(ipv6_chk_addr_and_flags(net, raddr, ldev, >> + true, 0, >> IFA_F_TENTATIVE))) >> pr_warn("%s xmit: Routing loop! Remote address found on >> this node!\n", >> p->name); >> else >> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c >> index 0a19ce3a6f7f..13bf775c7f1a 100644 >> --- a/net/ipv6/ndisc.c >> +++ b/net/ipv6/ndisc.c >> @@ -707,7 +707,7 @@ static void ndisc_solicit(struct neighbour *neigh, >> struct sk_buff *skb) >> int probes = atomic_read(&neigh->probes); >> >> if (skb && ipv6_chk_addr_and_flags(dev_net(dev), &ipv6_hdr(skb)->saddr, >> - dev, 1, >> + dev, false, 1, >> IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) >> saddr = &ipv6_hdr(skb)->saddr; >> probes -= NEIGH_VAR(neigh->parms, UCAST_PROBES); >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c >> index 3851c3ccfd7a..bbc62799eb3b 100644 >> --- a/net/ipv6/route.c >> +++ b/net/ipv6/route.c >> @@ -2633,18 +2633,25 @@ static int ip6_validate_gw(struct net *net, struct >> fib6_config *cfg, >> const struct in6_addr *gw_addr = &cfg->fc_gateway; >> int gwa_type = ipv6_addr_type(gw_addr); >> const struct net_device *dev = *_dev; >> + bool need_local_addr_check = !dev; >> int err = -EINVAL; >> >> - /* if gw_addr is local we will fail to detect this in case >> - * address is still TENTATIVE (DAD in progress). rt6_lookup() >> - * will return already-added prefix route via interface that >> - * prefix route was assigned to, which might be non-loopback. >> + /* if route spec contains the device, check if gateway address >> + * is a local address in the same L3 domain >> */ >> - if (ipv6_chk_addr_and_flags(net, gw_addr, >> - gwa_type & IPV6_ADDR_LINKLOCAL ? >> - dev : NULL, 0, 0)) { >> - NL_SET_ERR_MSG(extack, "Invalid gateway address"); >> - goto out; >> + if (dev) { >> + /* if gw_addr is local we will fail to detect this in case >> + * address is still TENTATIVE (DAD in progress). rt6_lookup() >> + * will return already-added prefix route via interface that >> + * prefix route was assigned to, which might be non-loopback. >> + */ >> + if (ipv6_chk_addr_and_flags(net, gw_addr, dev, >> + gwa_type & IPV6_ADDR_LINKLOCAL ? >> + false : true, 0, 0)) { >> + NL_SET_ERR_MSG(extack, >> + "Gateway can not be a local address"); >> + goto out; >> + } > > Why do these two "if" go as tree, not as single "if (a && b)"? > >> } >> >> if (gwa_type != (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_UNICAST)) { >> @@ -2683,6 +2690,18 @@ static int ip6_validate_gw(struct net *net, struct >> fib6_config *cfg, >> "Egress device can not be loopback device for >> this route"); >> goto out; >> } >> + >> + /* if we did not check gw_addr above, do so now that the >> + * egress device has been resolved. >> + */ >> + if (need_local_addr_check && > > Do we really need a variable with so long name? Can't we use "local_check" or > something like this? > >> + ipv6_chk_addr_and_flags(net, gw_addr, dev, >> + gwa_type & IPV6_ADDR_LINKLOCAL ? >> + false : true, 0, 0)) { > > Second time there is "gwa_type & IPV6_ADDR_LINKLOCAL ? false : true". > gwa_type is constant, > it doesn't change. Repeating constant expressions have to be be cached into > local variable > for improving readability. > >> + NL_SET_ERR_MSG(extack, "Gateway can not be a local address"); >> + goto out; >> + } >> + >> err = 0; >> out: >> return err; >> > > Thanks, > Kirill >