On Tue, Jun 04, 2019 at 02:36:27PM -0700, Wei Wang wrote: > On Tue, Jun 4, 2019 at 2:13 PM David Ahern <dsah...@gmail.com> wrote: > > > > On 6/4/19 3:06 PM, Martin Lau wrote: > > > On Tue, Jun 04, 2019 at 02:17:28PM -0600, David Ahern wrote: > > >> On 6/3/19 11:29 PM, Martin Lau wrote: > > >>> On Mon, Jun 03, 2019 at 07:36:06PM -0600, David Ahern wrote: > > >>>> On 6/3/19 6:58 PM, Martin Lau wrote: > > >>>>> I have concern on calling ip6_create_rt_rcu() in general which seems > > >>>>> to trace back to this commit > > >>>>> dec9b0e295f6 ("net/ipv6: Add rt6_info create function for > > >>>>> ip6_pol_route_lookup") > > >>>>> > > >>>>> This rt is not tracked in pcpu_rt, rt6_uncached_list or exception > > >>>>> bucket. > > >>>>> In particular, how to react to NETDEV_UNREGISTER/DOWN like > > >>>>> the rt6_uncached_list_flush_dev() does and calls dev_put()? > > >>>>> > > >>>>> The existing callers seem to do dst_release() immediately without > > >>>>> caching it, but still concerning. > > >>>> > > >>>> those are the callers that don't care about the dst_entry, but are > > >>>> forced to deal with it. Removing the tie between fib lookups an > > >>>> dst_entry is again the right solution. > > >>> Great to know that there will be a solution. It would be great > > >>> if there is patch (or repo) to show how that may look like on > > >>> those rt6_lookup() callers. > > >> > > >> Not 'will be', 'there is' a solution now. Someone just needs to do the > > >> conversions and devise the tests for the impacted users. > > > I don't think everyone will convert to the new nexthop solution > > > immediately. > > > > > > How about ensuring the existing usage stays solid first? > > > > Use of nexthop objects has nothing to do with separating fib lookups > > from dst_entries, but with the addition of fib6_result it Just Works. > > > > Wei converted ipv6 to use exception caches instead of adding them to the > > FIB. > > > > I converted ipv6 to use separate data structures for fib entries, added > > direct fib6 lookup functions and added fib6_result. See the > > net/core/filter.c. > > > > The stage is set for converting users. > > > > For example, ip6_nh_lookup_table does not care about the dst entry, only > > the fib entry. This converts it: > > > > static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, > > const struct in6_addr *gw_addr, u32 tbid, > > int flags, struct fib6_result *res) > > { > > struct flowi6 fl6 = { > > .flowi6_oif = cfg->fc_ifindex, > > .daddr = *gw_addr, > > .saddr = cfg->fc_prefsrc, > > }; > > struct fib6_table *table; > > struct rt6_info *rt; > > > > table = fib6_get_table(net, tbid); > > if (!table) > > return -EINVAL; > > > > if (!ipv6_addr_any(&cfg->fc_prefsrc)) > > flags |= RT6_LOOKUP_F_HAS_SADDR; > > > > flags |= RT6_LOOKUP_F_IGNORE_LINKSTATE; > > > > fib6_table_lookup(net, table, cfg->fc_ifindex, fl6, res, flags); > > if (res.f6i == net->ipv6.fib6_null_entry) > > return -ENETUNREACH; > > > > fib6_select_path(net, &res, fl6, oif, false, NULL, flags); > > > > return 0; > > } > > I do agree with Martin that ip6_create_rt_rcu() seems to be dangerous > as the dst cache created by this func does not get tracked anywhere > and it is up to the user to not cache it for too long. IMO, ip6_create_rt_rcu(), which returns untracked rt, was a mistake and removing it has been overdue. Tracking down the unregister dev bug is not easy.
> But I think David, what you are suggesting is: > instead of trying to convert ip6_create_rt_rcu() to use the pcpu_dst > logic, completely get rid of the calling to ip6_create_rt_rcu(), and > directly return f6i in those cases to the caller. Is that correct? I am fine with either of these two ways to remove ip6_create_rt_rcu(). Further depending on ip6_create_rt_rcu() in this patch even in ip6_pol_route_lookup() is arguably neither of these two ways...