[Bug c/108552] New: Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

            Bug ID: 108552
           Summary: Linux i386 kernel 5.14 memory corruption for
                    pre_compound_page() when gcov is enabled
           Product: gcc
           Version: 11.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: feng.tang at intel dot com
  Target Milestone: ---

Created attachment 54345
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54345&action=edit
objdump of prep_compound_page()

0Day found an i386 Linux kernel boot issue, and bisection shows the first bad
commit is 7118fc2906e29 ("hugetlb: address ref count racing in
prep_compound_gigantic_page"). It happens 94 times out of 999 runs.

Details and some debug analysis from Linus/Vlastimil and us can be found at
the following link:
https://lore.kernel.org/lkml/202301170941.49728982-oliver.s...@intel.com/t/

Debugging shows the issue is related to one function, prep_compound_page() in
mm/page_alloc.c:

* If we use '#pragma GCC optimize ("O1")' for that function (the kernel
  normally uses O2), the issue is gone
* If we disable GCOV for page_alloc.c, the issue cannot be reproduced
* If we disable UBSAN for page_alloc.c, the issue cannot be reproduced
* The issue is not reproducible with an x86_64 build

It seems to be a loop corruption. The pseudo code is:

        for (i = 1; i < nr_pages; i++)
                set_meta_data(page[i]);

The loop should run for page[1]...page[nr_pages - 1], but the memory dump
suggests that set_meta_data() is also called on one more page,
page[nr_pages].
https://lore.kernel.org/all/202212312021.bc1efe86-oliver.s...@intel.com/t/

The kernel log, i386 config and the objdump of prep_compound_page() from the
first bad commit are attached. Please let me know if you need more info,
thanks!
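The suspected overshoot can be pictured with a small user-space sketch. This is not the kernel code: `struct fake_page`, `META_SET`, `prep_tail_pages()` and `guard_page_intact()` are hypothetical names invented for illustration, and `set_meta_data()` is only the placeholder from the pseudo code above. The sketch runs the intended loop over an array that has one extra guard slot at index nr_pages and checks that the guard was not written, which is the property the memory dump suggests is violated in the miscompiled kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; only a marker field is modeled. */
struct fake_page {
	unsigned int meta;
};

#define META_SET  0xC0FFEEu	/* arbitrary marker value, an assumption */
#define MAX_PAGES 64		/* arbitrary upper bound for this sketch */

/* Placeholder for the per-tail-page setup named in the pseudo code. */
static void set_meta_data(struct fake_page *p)
{
	p->meta = META_SET;
}

/* The intended loop: touches page[1] .. page[nr_pages - 1] only. */
static void prep_tail_pages(struct fake_page *page, size_t nr_pages)
{
	size_t i;

	for (i = 1; i < nr_pages; i++)
		set_meta_data(&page[i]);
}

/*
 * Returns 1 if the head page and the guard slot page[nr_pages] were left
 * untouched and every tail page was set, 0 if not, -1 on invalid input.
 */
static int guard_page_intact(size_t nr_pages)
{
	struct fake_page pages[MAX_PAGES + 1];	/* +1 for the guard slot */
	size_t i;

	if (nr_pages == 0 || nr_pages > MAX_PAGES)
		return -1;

	memset(pages, 0, sizeof(pages));
	prep_tail_pages(pages, nr_pages);

	if (pages[0].meta != 0 || pages[nr_pages].meta != 0)
		return 0;	/* head or guard slot was overwritten */
	for (i = 1; i < nr_pages; i++)
		if (pages[i].meta != META_SET)
			return 0;	/* a tail page was skipped */
	return 1;
}
```

With a correct compiler, `guard_page_intact(n)` is expected to return 1 for any n from 1 to 64; a return of 0 would correspond to the extra write to page[nr_pages] described in the report.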
[Bug c/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #1 from Tang, Feng ---
Created attachment 54346
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54346&action=edit
kernel log with error message
[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #4 from Tang, Feng ---
(In reply to Andrew Pinski from comment #3)
> Do you have the preprocessed source that is used generate the bad object
> file?
> How about the exact command line?

Thanks for the prompt response!

The error was originally reported by 0Day (a kernel test automation robot),
and I can reproduce it locally with slight differences.

Sorry for my poor knowledge of gcc. Do you want me to provide the output of
"make ARCH=i386 mm/page_alloc.s"? Or could you give me the command to
generate it? Thanks
[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

Tang, Feng changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #54345|0                           |1
        is obsolete|                            |

--- Comment #6 from Tang, Feng ---
Created attachment 54348
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54348&action=edit
objdump of prep_compound_page()
[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #7 from Tang, Feng ---
Created attachment 54349
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54349&action=edit
original job-script from Oliver (0Day)
[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #8 from Tang, Feng ---
Created attachment 54350
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54350&action=edit
i386 kernel config

In https://lore.kernel.org/lkml/202301170941.49728982-oliver.s...@intel.com/t/
Oliver Sang provided a reproducer:

To reproduce:

        # build kernel
        cd linux
        cp config-5.13.0-00219-g7118fc2906e2 .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH= modules_install
        cd
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #9 from Tang, Feng ---
The original report,
https://lore.kernel.org/lkml/202301170941.49728982-oliver.s...@intel.com/t/,
came from Sang Oliver of the 0Day team, but I failed to add him to cc
(probably because he is not registered in this bugzilla system), so I will
try to gather the info myself (some from Oliver's report, some from my local
system where Oliver's report does not have it).

gcc version:
        gcc-11 (Debian 11.3.0-8) 11.3.0
        gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Platform: QEMU

Preprocessed file: page_alloc.i (attached)

gcc options, from page_alloc.s (generated by 'make ARCH=i386 mm/page_alloc.s'):

# GNU C89 (Ubuntu 11.3.0-1ubuntu1~22.04) version 11.3.0 (x86_64-linux-gnu)
# compiled by GNU C version 11.3.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m32 -msoft-float -mregparm=3 -mpreferred-stack-boundary=2 -march=i686 -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard -mindirect-branch=thunk-extern -mindirect-branch-register -O2 -std=gnu90 -fno-strict-aliasing -fno-common -fshort-wchar -fcf-protection=none -freg-struct-return -fno-pic -ffreestanding -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fno-reorder-blocks -fno-ipa-cp-clone -fno-partial-inlining -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-stack-clash-protection -fno-inline-functions-called-once -fno-strict-overflow -fstack-check=no -fconserve-stack -fprofile-arcs -ftest-coverage -fno-tree-loop-im -fsanitize=bounds -fsanitize=shift -fsanitize=unreachable
[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #10 from Tang, Feng ---
Created attachment 54352
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54352&action=edit
page_alloc.i.xz
[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #36 from Tang, Feng ---
(In reply to Vladimir Makarov from comment #35)
> (In reply to Jakub Jelinek from comment #34)
> > Seems right now DECL_NONALIASED is only used on these coverage vars and on
> > Fortran caf tokens, so perhaps a quick workaround would be on the LRA side
> > never reread stuff from MEMs with VAR_P && DECL_NONALIASED MEM_EXPRs. CCing
> > Vlad on that.
>
> The following patch can do this:
>
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc

Thanks for the patch!

Since the bug is against 11.3, I cloned the gcc git repository, checked out
the origin/releases/gcc-11 branch, and compiled gcc (TBH, it was my first
time):

* Built gcc-11, compiled the i386 kernel, and ran my local reproducer (QEMU
  booting that kernel in a loop); the error reproduced at a rate of about
  once every 20 boots.
* Manually applied Vladimir's patch (the original patch seems to be against
  the 'master' branch).
* Rebuilt gcc, ran make clean and recompiled the i386 kernel; the error was
  NOT seen in 350 runs so far.

I will also attach the page_alloc.i and the objdump of prep_compound_page()
produced with the newly patched gcc-11.
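As a rough sanity check on how much weight 350 clean boots carry, the chance of seeing that many clean runs by luck can be estimated. The sketch below assumes boots are independent and that an unpatched build fails at roughly one boot in 20 (the approximate rate reported above); both are simplifying assumptions, and `prob_all_clean()` is a hypothetical helper name, not part of any tool used in this thread.

```c
#include <assert.h>

/*
 * Probability that `runs` independent boots all succeed when each boot
 * fails with probability `fail_rate`: (1 - fail_rate)^runs.
 * Computed with a plain loop so that libm is not needed.
 */
static double prob_all_clean(double fail_rate, int runs)
{
	double p = 1.0;
	int i;

	assert(fail_rate >= 0.0 && fail_rate <= 1.0);
	for (i = 0; i < runs; i++)
		p *= 1.0 - fail_rate;
	return p;
}
```

Under these assumptions, `prob_all_clean(0.05, 350)` is on the order of 1e-8, so 350 clean boots would be very unlikely if the patched compiler still produced the failing code at the original rate.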
[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #37 from Tang, Feng ---
Created attachment 54367
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54367&action=edit
page_alloc.i with patch in comment 35
[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #38 from Tang, Feng ---
Created attachment 54368
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54368&action=edit
objdump of prep_compound_page() with patch in comment 35