On Mon, Mar 6, 2017 at 6:31 PM, David Ahern <d...@cumulusnetworks.com> wrote:
> On 3/4/17 1:15 PM, Eric Dumazet wrote:
>> On Sat, 2017-03-04 at 19:57 +0100, Dmitry Vyukov wrote:
>>> On Fri, Mar 3, 2017 at 8:12 PM, David Ahern <d...@cumulusnetworks.com>
>>> wrote:
>>>> On 3/3/17 6:39 AM, Dmitry Vyukov wrote:
>>>>> I am getting heap out-of-bounds reports in
>>>>> fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone while running
>>>>> syzkaller fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. They all
>>>>> follow the same pattern: an object of size 216 is allocated from
>>>>> ip_dst_cache slab, and then accessed at offset 272/276 within
>>>>> fib6_walk. Looks like type confusion. Unfortunately this is not
>>>>> reproducible.
>>>>
>>>> I'll take a look this weekend or Monday at the latest.
>>>
>>>
>>> I've got some additional useful info on this. I think this is
>>> use-after-free rather than out-of-bounds. I've collected stack where
>>> the route was disposed with call_rcu, see the last "Disposed" stack.
>>> The crash happens when cmpxchg in rt_cache_route replaces an existing
>>> route. And that route seems to have some existing pointers to it
>>> (rt->dst.rt6_next) which fib6_walk uses to get to it after its
>>> deletion.
>>
>> rt_cache_route() deals with IPv4 routes.
>>
>> We somehow mix IPv4 and IPv6 dsts in IPv6 tree.
>>
>> We need to add type safety at IPV6 route insertions to catch the
>> offender.
>>
>
> I've seen something like this before -- a rt was on the gc list but
> still linked in the tables because of some reference.
>
> Dmitry: you seem to have reproduced this a few times. Can you share how
> to run whatever tests you are using?
We have hit it several thousand times, but we get only a few dozen
crashes per day across ~80 VMs. So if you try to reproduce it on a
single machine, it can take days to get a single crash.

If you are ready to go that route, here are instructions on setting up
syzkaller:
https://github.com/google/syzkaller
You also need a kernel built with CONFIG_KASAN. I am ready to help with
resolving any issues.

Another option: if you give me a patch with some additional WARNINGs, I
can deploy it to the bots and collect stacks.
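To make the "type safety at IPv6 route insertions" idea a bit more
concrete, here is a minimal user-space model of an insertion-time family
check. It is only a sketch under assumptions: model_dst_ops, model_dst
and model_fib6_insert are hypothetical stand-ins, not kernel structures
or APIs. The idea it illustrates is that every IPv6 dst points at ops
whose family is AF_INET6, so an entry whose ops report another family
(e.g. an IPv4 route) can be reported with a WARNING and rejected before
it is linked into the tree.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical, simplified stand-ins for dst ops / dst entries. */
struct model_dst_ops {
    unsigned short family;
};

struct model_dst {
    const struct model_dst_ops *ops;
    struct model_dst *next; /* models the tree linkage */
};

static const struct model_dst_ops v4_ops = { AF_INET };
static const struct model_dst_ops v6_ops = { AF_INET6 };

/*
 * Hypothetical insertion helper for the IPv6 "tree" (a list here).
 * Rejects and reports any entry whose ops are not IPv6, which is the
 * kind of check that would catch an IPv4 dst sneaking into fib6.
 */
static int model_fib6_insert(struct model_dst **head, struct model_dst *d)
{
    assert(head != NULL && d != NULL && d->ops != NULL);

    if (d->ops->family != AF_INET6) {
        fprintf(stderr,
                "WARNING: non-IPv6 dst (family %u) inserted into IPv6 tree\n",
                (unsigned)d->ops->family);
        return -1;
    }

    /* Link the IPv6 entry at the head of the list. */
    d->next = *head;
    *head = d;
    return 0;
}
```

In an actual kernel debug patch the equivalent would presumably be a
WARN_ON() on the fib6 insertion path, which produces exactly the kind of
stack trace the bots could collect; where precisely to place it is left
open here.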