On Fri, Jan 05, 2018 at 05:38:35PM -0800, Wei Wang wrote: > From: Wei Wang <wei...@google.com> > > In the current code, when creating a new fib6 table, tb6_root.leaf gets > initialized to net->ipv6.ip6_null_entry. > If a default route is being added with rt->rt6i_metric = 0xffffffff, > fib6_add() will add this route after net->ipv6.ip6_null_entry. As > null_entry is shared, it could cause problem. > > In order to fix it, set fn->leaf to NULL before calling > fib6_add_rt2node() when trying to add the first default route. > And reset fn->leaf to null_entry when adding fails or when deleting the > last default route. > > syzkaller reported the following issue which is fixed by this commit: > ============================= > WARNING: suspicious RCU usage > 4.15.0-rc5+ #171 Not tainted > ----------------------------- > net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! > > other info that might help us debug this: > > rcu_scheduler_active = 2, debug_locks = 1 > 4 locks held by swapper/0/0: > #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] > lockdep_copy_map include/linux/lockdep.h:178 [inline] > #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] > call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 > #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] > spin_lock_bh include/linux/spinlock.h:315 [inline] > #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] > fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 > #2: (rcu_read_lock){....}, at: [<0000000091db762d>] > __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 > #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh > include/linux/spinlock.h:315 [inline] > #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] > __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 > > stack backtrace: > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:53 > lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 > fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 > fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 > fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 > fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 > fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 > __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 > fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] > fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 > fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 > call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 > expire_timers kernel/time/timer.c:1357 [inline] > __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 > run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 > __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 > invoke_softirq kernel/softirq.c:365 [inline] > irq_exit+0x1cc/0x200 kernel/softirq.c:405 > exiting_irq arch/x86/include/asm/apic.h:540 [inline] > smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 > apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 > </IRQ> > > Reported-by: syzbot <syzkal...@googlegroups.com> > Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in > fib6_table") > Signed-off-by: Wei Wang <wei...@google.com> > --- > net/ipv6/ip6_fib.c | 45 +++++++++++++++++++++++++++++++++++---------- > 1 file changed, 35 insertions(+), 10 deletions(-) > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index d11a5578e4f8..37cb4ad1ea29 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -640,6 +640,11 @@ static struct fib6_node *fib6_add_1(struct net *net, > if (!(fn->fn_flags & RTN_RTINFO)) { > RCU_INIT_POINTER(fn->leaf, NULL); > rt6_release(leaf); > + /* remove null_entry in the root node */ > + } else if (fn->fn_flags & RTN_TL_ROOT && > + rcu_access_pointer(fn->leaf) == > + net->ipv6.ip6_null_entry) { > + RCU_INIT_POINTER(fn->leaf, NULL); It seems the reader side could see tb6_root.leaf == NULL after this change and I think it should be fine? If it is, instead of switching betwen NULL and ip6_null_entry, would it be simpler to always set tb6_root.leaf to NULL until a legit default route is added?
> } > > return fn; > @@ -1270,14 +1275,27 @@ int fib6_add(struct fib6_node *root, struct rt6_info > *rt, > return err; > > failure: > - /* fn->leaf could be NULL if fn is an intermediate node and we > - * failed to add the new route to it in both subtree creation > - * failure and fib6_add_rt2node() failure case. > - * In both cases, fib6_repair_tree() should be called to fix > + /* fn->leaf could be NULL if: > + * 1. fn is the root node in the table and we fail to add the default > + * route to it. > + * In this case, we put fn->leaf back to net->ipv6.ip6_null_entry as > + * the way the table was created. > + * 2. fn is an intermediate node and we failed to add the new > + * route to it in both subtree creation failure and fib6_add_rt2node() > + * failure case. > + * In this case, fib6_repair_tree() should be called to fix > * fn->leaf. > */ > - if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) > - fib6_repair_tree(info->nl_net, table, fn); > + if (fn) { > + if (fn->fn_flags & RTN_TL_ROOT) { > + if (!rcu_access_pointer(fn->leaf)) { > + rcu_assign_pointer(fn->leaf, > + info->nl_net->ipv6.ip6_null_entry); > + } > + } else if (!(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) { > + fib6_repair_tree(info->nl_net, table, fn); > + } > + } > /* Always release dst as dst->__refcnt is guaranteed > * to be taken before entering this function > */ > @@ -1685,11 +1703,18 @@ static void fib6_del_route(struct fib6_table *table, > struct fib6_node *fn, > } > read_unlock(&net->ipv6.fib6_walker_lock); > > - /* If it was last route, expunge its radix tree node */ > + /* If it was last route: > + * 1. For root node, put back null_entry as how the table was created. > + * 2. For other nodes, expunge its radix tree node. > + */ > if (!rcu_access_pointer(fn->leaf)) { > - fn->fn_flags &= ~RTN_RTINFO; > - net->ipv6.rt6_stats->fib_route_nodes--; > - fn = fib6_repair_tree(net, table, fn); > + if (fn->fn_flags & RTN_TL_ROOT) { > + rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry); > + } else { > + fn->fn_flags &= ~RTN_RTINFO; > + net->ipv6.rt6_stats->fib_route_nodes--; > + fn = fib6_repair_tree(net, table, fn); > + } > } > > fib6_purge_rt(rt, fn, net); > -- > 2.16.0.rc0.223.g4a4ac83678-goog >