Hello,

On Thu, 31 Mar 2016, David Ahern wrote:

> To maintain backward compatibility use of the neighbor information is
> based on a new sysctl, fib_multipath_use_neigh.
> 
> Signed-off-by: David Ahern <d...@cumulusnetworks.com>
> ---
> v2
> - use rcu locking to avoid refcnts per Eric's suggestion
> - only consider neighbor info for nh_scope == RT_SCOPE_LINK per Julian's
>   comment
> - drop the 'state == NUD_REACHABLE' from the state check since it is
>   part of NUD_VALID (comment from Julian)
> - wrapped the use of the neigh in a sysctl
> 
>  Documentation/networking/ip-sysctl.txt | 10 ++++++++++
>  include/net/netns/ipv4.h               |  3 +++
>  net/ipv4/fib_semantics.c               | 32 ++++++++++++++++++++++++++++++--
>  net/ipv4/sysctl_net_ipv4.c             | 11 +++++++++++
>  4 files changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index b183e2b606c8..5b316d33a23f 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
>       fwmark of the packet they are replying to.
>       Default: 0
>  
> +fib_multipath_use_neigh - BOOLEAN
> +     Use status of existing neighbor entry when determining nexthop for
> +     multipath routes. If disabled neighbor information is not used then
> +     packets could be directed to a dead nexthop. Only valid for kernels

        Some may associate "dead" with RTNH_F_DEAD while
in context of nexthop "failed" or "unreachable" can be
more suitable? Can we use something like this?:

        If disabled<COMMA_ALLOWED_HERE?> neighbor information is not used
and packets could be directed to a failed nexthop.

> +     built with CONFIG_IP_ROUTE_MULTIPATH enabled.
> +     Default: 0 (disabled)
> +     Possible values:
> +     0 - disabled
> +     1 - enabled
> +
>  route/max_size - INTEGER
>       Maximum number of routes allowed in the kernel.  Increase
>       this when using large numbers of interfaces and/or routes.
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index a69cde3ce460..d061ffeb1e71 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -133,6 +133,9 @@ struct netns_ipv4 {
>       struct fib_rules_ops    *mr_rules_ops;
>  #endif
>  #endif
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +     int sysctl_fib_multipath_use_neigh;
> +#endif
>       atomic_t        rt_genid;
>  };
>  #endif
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index d97268e8ff10..6d423faff0ce 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -1559,17 +1559,45 @@ int fib_sync_up(struct net_device *dev, unsigned int 
> nh_flags)
>  }
>  
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
> +static bool fib_good_nh(const struct fib_nh *nh, struct net_device *dev)
> +{
> +     struct neighbour *n = NULL;
> +     int state = NUD_NONE;

        Looks like we can do it even better.
If we use NUD_REACHABLE here...

> +
> +     if (nh->nh_scope == RT_SCOPE_LINK) {
> +             rcu_read_lock_bh();
> +
> +             n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, dev);
> +             if (n)
> +                     state = n->nud_state;
> +
> +             rcu_read_unlock_bh();
> +     }
> +
> +     /* outside of rcu locking using n only as a boolean
> +      * on whether a neighbor entry existed
> +      */
> +     if (!n || (state & NUD_VALID))

        then check for '!n' is not needed. For bool type
'return state & NUD_VALID;' should work.

> +             return true;
> +
> +     return false;
> +}
>  
>  void fib_select_multipath(struct fib_result *res, int hash)
>  {
>       struct fib_info *fi = res->fi;
> +     struct net_device *dev = fi->fib_dev;
> +     struct net *net = fi->fib_net;
>  
>       for_nexthops(fi) {
>               if (hash > atomic_read(&nh->nh_upper_bound))
>                       continue;
>  
> -             res->nh_sel = nhsel;
> -             return;
> +             if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
> +                 fib_good_nh(nh, dev)) {

        This dev is from first nexthop. Better fib_good_nh
to use nh->nh_dev instead, it is present for all nexthops
in multipath route. We should not copy the bugs from
fib_detect_death.

> +                     res->nh_sel = nhsel;
> +                     return;
> +             }
>       } endfor_nexthops(fi);

        So, you dropped the idea to give full chance for
fallback? Now if last nexthop fails we do not fallback at all.
We promised to prefer reachable nexthops.

Regards

--
Julian Anastasov <j...@ssi.bg>

Reply via email to