Hello, Dmirty, Thank you for the explanation of how syzkaller/syzbot works in this and other emails. I understand that is it a complicated task to determine and categorize bugs based on just crash dump and messages, and syzkaller does a great job of doing so.
> Re __hw_addr_add_ex bug, as Alex noted the crash was detected _after_ > the fixing commit went in. So it's something new and different and > can't be fixed by the older commit. Indeed, you're right, there is another issue with tun/tap devices which leads to this bug. I've posted a patch (https://lkml.org/lkml/2018/9/26/416) to fix it. I hope I did not do much damage, reporting previous fix as a fix for this bug, as syzkaller will probably create another "KMSAN: uninit-value in <...>" report. Best regards, Vladis Dronov | Red Hat, Inc. | Product Security Engineer ----- Original Message ----- > From: "Dmitry Vyukov" <dvyu...@google.com> > To: "Alexander Potapenko" <gli...@google.com> > Cc: "Vladis Dronov" <vdro...@redhat.com>, "syzbot" > <syzbot+d3402c47f680ff24b...@syzkaller.appspotmail.com>, > "syzkaller-bugs" <syzkaller-b...@googlegroups.com>, "David Miller" > <da...@davemloft.net>, "Eric Dumazet" > <eduma...@google.com>, "LKML" <linux-ker...@vger.kernel.org>, "Networking" > <netdev@vger.kernel.org>, "sunlianwen" > <sunlw.f...@cn.fujitsu.com> > Sent: Monday, September 24, 2018 11:39:08 AM > Subject: Re: KMSAN: uninit-value in memcmp (2) > > On Mon, Sep 24, 2018 at 8:53 AM, Alexander Potapenko <gli...@google.com> > wrote: > > On Mon, Sep 24, 2018 at 12:09 AM Vladis Dronov <vdro...@redhat.com> wrote: > >> > >> Hello, Dmirty, > >> > >> Thank you for the reply. Can we please, discuss this further? > > Hi Vladis, > >> > You can see on dashboard that the last crash > >> > for the second version (2) happened just few days ago. So this is a > >> > different bug. > > FWIW I've just double-checked that the reproducer provided by > > syzkaller in the original message still triggers the report from the > > original message in the latest KMSAN tree (which already contains the > > __hw_addr_add_ex() fix from April). > >> Well... yes and no. When I was looking at this bug (bug?id=088efeac32fd) I > >> was looking > >> at the report at "2018/05/09 18:55" > >> (https://syzkaller.appspot.com/text?tag=CrashReport&x=141b707b800000), > >> since it was the only report with a reproducer. This was my error. > >> > >> The error and the call trace in this report are: > >> > >> >>> > >> BUG: KMSAN: uninit-value in memcmp+0x119/0x180 lib/string.c:861 > >> CPU: 0 PID: 38 Comm: kworker/0:1 Not tainted 4.17.0-rc3+ #88 > >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > >> Google 01/01/2011 > >> Workqueue: ipv6_addrconf addrconf_dad_work > >> Call Trace: > >> __dump_stack lib/dump_stack.c:77 [inline] > >> dump_stack+0x185/0x1d0 lib/dump_stack.c:113 > >> kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 > >> __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 > >> memcmp+0x119/0x180 lib/string.c:861 > >> __hw_addr_add_ex net/core/dev_addr_lists.c:61 [inline] > >> __dev_mc_add+0x1fc/0x900 net/core/dev_addr_lists.c:670 > >> dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687 > >> igmp6_group_added+0x2db/0xa00 net/ipv6/mcast.c:662 > >> ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914 > >> addrconf_join_solict net/ipv6/addrconf.c:2103 [inline] > >> addrconf_dad_begin net/ipv6/addrconf.c:3853 [inline] > >> addrconf_dad_work+0x462/0x2a20 net/ipv6/addrconf.c:3979 > >> process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145 > >> worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279 > >> kthread+0x539/0x720 kernel/kthread.c:239 > >> ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412 > >> > >> Local variable description: ----buf@igmp6_group_added > >> Variable was created at: > >> igmp6_group_added+0x4a/0xa00 net/ipv6/mcast.c:650 > >> ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914 > >> <<< > >> > >> It is the same like in bug?id=3887c0d99aecb27d085180c5222d245d08a30806 > >> which, after some more test, made me believe these bugs are duplicate > >> and are fixed by the same commit. > >> > >> But let's look at another report at "2018/09/12 21:00" > >> (https://syzkaller.appspot.com/text?tag=CrashReport&x=14f99b71400000) > >> at the bug (bug?id=088efeac32fd), the one you've mentioned as > >> "the last crash for the second version (2) happened just few days ago". > >> > >> Its error and the call trace are completely different: > >> > >> >>> > >> BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863 > >> CPU: 0 PID: 6107 Comm: syz-executor4 Not tainted 4.19.0-rc3+ #45 > >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > >> Google 01/01/2011 > >> Call Trace: > >> __dump_stack lib/dump_stack.c:77 [inline] > >> dump_stack+0x14b/0x190 lib/dump_stack.c:113 > >> kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956 > >> __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645 > >> memcmp+0x11d/0x180 lib/string.c:863 > >> dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464 > >> ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline] > >> rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558 > >> rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715 > >> netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454 > >> rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733 > >> netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] > >> netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343 > >> netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908 > >> sock_sendmsg_nosec net/socket.c:621 [inline] > >> sock_sendmsg net/socket.c:631 [inline] > >> ... > >> Uninit was created at: > >> ... > >> slab_post_alloc_hook mm/slab.h:446 [inline] > >> slab_alloc_node mm/slub.c:2718 [inline] > >> __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351 > >> __kmalloc_reserve net/core/skbuff.c:138 [inline] > >> __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206 > >> alloc_skb include/linux/skbuff.h:996 [inline] > >> netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline] > >> netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883 > >> sock_sendmsg_nosec net/socket.c:621 [inline] > >> sock_sendmsg net/socket.c:631 [inline] > >> ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114 > >> <<< > >> > >> This is a different bug. How come these 2 different reports for 2 > >> different > >> bugs have ended in the same syzkaller report (bug?id=088efeac32fd) ? > > > > I suspect this is because syzbot used the top stack frame as the > > report signature. > > There's a mechanism to ignore frames like memcmp() in the reports, not > > sure why didn't it work in this case (maybe it just wasn't in place at > > the time the report happened). > >> One bug is fixed by the "net: fix uninit-value in __hw_addr_add_ex()" > >> commit, > >> the second one is not, but they are still in the same syzkaller report. > >> > >> This was the reason of my confusion. I'm not sure how to fix this. If it > >> is possible, > >> probably we need to cancel/revoke "#syz fix: net: fix uninit-value in > >> __hw_addr_add_ex()" > >> for this syzkaller report (bug?id=088efeac32fd). And then "split" it into > >> 2 or > >> more different reports, but I'm not sure if this is possible. > >> > >> Probably, syzkaller needs to look deeper into the KMSAN reports to > >> differentiate > >> KMSAN errors happening because of different reasons. > >> > >> > On Sun, Sep 23, 2018 at 6:02 PM, Vladis Dronov <vdro...@redhat.com> > >> > wrote: > >> > > #syz fix: net: fix uninit-value in __hw_addr_add_ex() > >> > > >> > Hi Vladis, > >> > > >> > This can be fixed with "net: fix uninit-value in __hw_addr_add_ex()". > >> > That commit landed in April, syzbot waited till the commit reached all > >> > tested trees, and then closed the bug. > >> > But the similar bug continued to happen, so syzbot created second > >> > version of this bug (2). You can see on dashboard that the last crash > >> > for the second version (2) happened just few days ago. So this is a > >> > different bug. > > > Precisely discriminating bugs (root causes) bases on crash text is > generally undecidable problem, even for humans. We even can have > literally equal crash texts, which are still different bugs. And we > can have significantly differently looking crash texts, which are > actually caused by the same root cause. syzbot extracts some > "identity" string for each crash and than uses that string to > discriminate crashes and sort them into bins. This identity string is > what you see in email subject and bug title on dashboard. This method > can have both false positives and false negatives, but works > reasonably well in most cases and looks like the best practical > option. > > For this exact instance (memcmp) we actually improved the analysis > logic recently: > https://github.com/google/syzkaller/commit/0e29942f77715486995d996f80f82742812d75a2#diff-abe1515f011fad2659ff218f9eea9ae1 > But this crash was analyzed and reported before the change. So if this > crash happens again it should be reported as "in __hw_addr_add_ex" > now. > > Re __hw_addr_add_ex bug, as Alex noted the crash was detected _after_ > the fixing commit went in. So it's something new and different and > can't be fixed by the older commit. > > There are no general, single guideline as to what to do when several > different bugs glued together into a single bug. Fixing at least one > of them (any) in the context of the bug is good, fixing both is good > too. When/if a bug is closed, new occurrences of similar crashes (the > same identity string) will lead to creation of a new bug. So if we fix > only one and close the bug, eventually the second one will lead to a > new bug (won't be lost), now dedicated to this second crash. > > Now syzbot thinks that this bug is fixed/closed: > https://syzkaller.appspot.com/bug?extid=d3402c47f680ff24b29c > There is specifically no "undo" functionality, because it's inherently > racy with creation of a new version of this bugs by new crashes. So if > of these crashes will happen again, syzbot will open new bugs (now > with better discriminated titles). We can wait for that. Or we can > submit new fixes without waiting for new syzbot bugs (adding > Reported-by to new commits referencing this bug should not do any > harm). > > Hope this clarifies things a bit. > > Thanks