[Bug c/70604] New: switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 Bug ID: 70604 Summary: switch statement optimization creates dead code Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com Target Milestone: --- Created attachment 38226 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38226&action=edit iscsi_target.i.gz The linux kernel has a new tool named "objtool" which follows all possible code paths for every .o file, looking for abnormalities. In rare cases it has stumbled on a gcc optimization issue related to switch statements, which leaves some dead code (unreachable instructions) laying around, resulting in increased code size and confusing assembler code. Note: For some reason the likelihood of seeing this problem seems to have diminished from gcc 5.3.1 to gcc 6.0. We've seen the problem in three separate object files in the kernel with gcc 5.3.1, but with gcc 6.0 there's only one known occurrence of this issue. Further, even in that one case, the size of the dead code has decreased from several instructions in 5.3.1 to only a single unreachable instruction in 6.0. Here are the relevant assembler code excerpts, using gcc 6.0: iscsit_handle_task_mgt_cmd: [...snip...] jmp *.L6212+64(%rip)# .section.rodata .align 8 .align 4 .L6212: .quad .L6211 .quad .L6070 .quad .L6214 .quad .L6214 .quad .L6214 .quad .L6214 .quad .L6082 .quad .L6089 .quad .L6096 .text [...snip...] jmp .L6196 # .L6211: movl$8, %esi#, _315 .L6213: movq$.LC48, %rdi#, There's an indirect jump instruction in .text, along with a jump table in .rodata, which is a common pattern for switch statements. But this one's a little different than the normal pattern: the indirect jump is hard-coded to use a single entry in the jump table. The other jump table entries are unused. Further, the code referenced in one of the entries (.L6211) is dead code and completely unreachable from anywhere in the function. Note that the -fsanitize=unreachable option is enabled, but this seems unrelated: __builtin_unreachable() doesn't appear to have been used for this code path, and the unreachable code block doesn't have a call to __ubsan_handle_builtin_unreachable. So to summarize, the issues are: 1) dead code in .text (in this case, the function only has one unreachable instruction) 2) mostly unused (and completely unnecessary) jump table in .rodata 3) unnecessary indirect jump (a direct jump could be used instead) $ gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.0.0/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 6.0.0 20160311 (Red Hat 6.0.0-0.17) (GCC) $ gcc -Wp,-MD,drivers/target/iscsi/.iscsi_target.o.d -nostdinc -I./arch/x86/include -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments -Wdeclaration-after-statement -Wno-pointer-sign -fno-st
[Bug c/70604] switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 --- Comment #1 from Josh Poimboeuf --- Created attachment 38227 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38227&action=edit Linux .config
[Bug tree-optimization/70604] switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 --- Comment #3 from Josh Poimboeuf --- Hi Richard, thanks for looking at it! (In reply to Richard Biener from comment #2) > Are the cases you still see indirect jumps to only one active case or > is it just that if()s with multiple cases would have avoided the dead > code? I don't quite understand the second part of your question, but maybe this information will answer it. With GCC 6 I've only seen one occurrence of this issue, which is the one for which I posted the assembler and .i file above. It has an indirect jump which is hard-coded to use only one entry in the jump table, as shown in the assembler code above. Note that the C code has two switch statements, which seem to correspond to the two "normal" cases of indirect jump table patterns. But then there's the third unusual indirect jump, shown above, which also corresponds to one of the two switch statements -- so there are two jump tables for a single switch statement, where one of the tables appears to be optimized for a single case. Hopefully I'm making sense :-) With GCC 5, the other occurrences of this issue were very similar, with switch statements and indirect jumps to a single entry in the jump table.
[Bug c/70646] Corrupt truncated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 --- Comment #2 from Josh Poimboeuf --- $ gcc -Wp,-MD,drivers/scsi/qla2xxx/.qla_attr.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/5.3.1/include -I./arch/x86/include -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Os -Wno-maybe-uninitialized --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -DCC_HAVE_ASM_GOTO -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(qla_attr)" -D"KBUILD_MODNAME=KBUILD_STR(qla2xxx)" -c -o drivers/scsi/qla2xxx/qla_attr.o drivers/scsi/qla2xxx/qla_attr.c $ gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-default-libstdcxx-abi=gcc4-compatible --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)
[Bug tree-optimization/70604] switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 Josh Poimboeuf changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #4 from Josh Poimboeuf --- Richard, just realized you weren't on CC for my response to your question.
[Bug c/70646] Corrupt truncated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 --- Comment #1 from Josh Poimboeuf --- Created attachment 38256 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38256&action=edit Linux kernel config
[Bug c/70646] New: Corrupt truncated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 Bug ID: 70646 Summary: Corrupt truncated function Product: gcc Version: 5.3.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com Target Milestone: --- Created attachment 38255 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38255&action=edit qla_attr.i.gz The linux kernel has a new tool named "objtool" which follows all possible code paths for every .o file, looking for abnormalities. In one rare case it has discovered a corrupt truncated function. From the disassembly of drivers/scsi/qla2xxx/qla_attr.o: 2f53 : 2f53: 55 push %rbp 2f54: 48 89 e5mov%rsp,%rbp 2f57 : 2f57: 55 push %rbp 2f58: b9 e8 00 00 00 mov$0xe8,%ecx 2f5d: 48 89 e5mov%rsp,%rbp ... Note that qla2x00_get_host_fabric_name() is inexplicably truncated after setting up the frame pointer. It falls through to the next function, which is very bad. I can recreate it with gcc 5.3.1 or gcc 6.0 on the upstream Linux kernel at tag v4.6-rc3. The call chain which appears to trigger the problem is: qla2x00_get_host_fabric_name() wwn_to_u64() get_unaligned_be64() be64_to_cpup() __be64_to_cpup() It occurs with the combination of the following two recent Linux commits: - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations") - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access") The gzipped .i file is attached. I'll also attach the kernel .config file.
[Bug ipa/70646] [4.9/5/6 Regression] Corrupt truncated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 --- Comment #13 from Josh Poimboeuf --- So if I understand correctly, some reachable code is incorrectly getting marked unreachable and then getting discarded. Interestingly, the function's epilogue (frame pointer restore) and return instruction are also getting discarded. Can you tell if that will always be the case when this bug triggers? If so, that should make it possible for objtool to reliably detect the bug.
[Bug ipa/70646] [4.9/5/6 Regression] Corrupt truncated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 --- Comment #17 from Josh Poimboeuf --- (In reply to Richard Biener from comment #14) > (In reply to Josh Poimboeuf from comment #13) > > Interestingly, the function's epilogue (frame pointer restore) and return > > instruction are also getting discarded. Can you tell if that will always be > > the case when this bug triggers? > > I don't think we can rely on that. The path could simply fall thru to > a random instruction which is still inside the function. Say, if it was > > if (x) > ... > else > ... > foo (); > > then the arm marked unreachable would simply disappear. I tried doing that by putting the offending wwn_to_u64() call in an if statement, but then the bug disappeared. In fact, adding *any* code before the call seems to make the bug disappear.
[Bug ipa/70646] [4.9/5/6 Regression] Corrupt truncated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 --- Comment #20 from Josh Poimboeuf --- Thanks very much to everyone who has looked into this so far. It would be very helpful to get answers to the following questions, so we can understand the impact to the kernel: 1) Is there a reliable way to avoid the bug, either in code or with a gcc flag? 2) Is there a reliable way to detect the bug by looking at the object code?
[Bug ipa/70646] [4.9/5/6/7 Regression] Corrupt truncated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 --- Comment #24 from Josh Poimboeuf --- (In reply to Martin Jambor from comment #23) > (In reply to Josh Poimboeuf from comment #20) > > Thanks very much to everyone who has looked into this so far. It would be > > very helpful to get answers to the following questions, so we can understand > > the impact to the kernel: > > > > 1) Is there a reliable way to avoid the bug, either in code or with a gcc > > flag? > > If you mean in general, then unfortunately no, except for disabling > inlining altogether and removing all always_inline's. In addition to > the main bug, I found out that I check --param ipa-max-agg-items only > after incrementing it, so even setting that to zero does not help. > I'll prepare a patch for that too. Yes, I'm looking for a general way to either prevent or try to detect potential other cases of the bug throughout the entire kernel. Can it only occur with the use of __builtin_constant_p(exp) by an inline function (where exp is a constant)?
[Bug c/81813] New: Inefficient stack pointer adjustment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813 Bug ID: 81813 Summary: Inefficient stack pointer adjustment Product: gcc Version: 7.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com Target Milestone: --- Created attachment 41968 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41968&action=edit fs_pin.i With GCC 7.1, the kernel's object code static analysis tool (objtool) found this unusual method of adjusting the stack pointer in the kernel: 2cc: 48 8d 4c 24 08 lea0x8(%rsp),%rcx 2d1: 48 89 ccmov%rcx,%rsp The value in %rcx was never used afterwards. It would be faster and more straightforward to just add 8 to %rsp. To recreate: gcc -mno-sse -mpreferred-stack-boundary=3 -O2 -c -o fs_pin.o fs_pin.i
[Bug c/82221] New: internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 Bug ID: 82221 Summary: internal compiler error: in print_reg, at config/i386/i386.c:17656 Product: gcc Version: 7.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com Target Milestone: --- When compiling a patch for the Linux kernel, GCC crashes with the following error: gcc -Wp,-MD,arch/x86/events/.core.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mpreferred-stack-boundary=2 -march=i686 -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-int-in-bool-context -O2 --param=allow-store-data-races=0 -DCC_HAVE_ASM_GOTO -fno-reorder-blocks -fno-ipa-cp-clone -fno-partial-inlining -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -pg -mfentry -DCC_USING_FENTRY -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init-DKBUILD_BASENAME='"core"' -DKBUILD_MODNAME='"core"' -c -o arch/x86/events/core.o arch/x86/events/core.c arch/x86/events/core.c: In function ‘x86_perf_event_update’: arch/x86/events/core.c:108:1: internal compiler error: in print_reg, at config/i386/i386.c:17656 } ^ Please submit a full bug report, with preprocessed source if appropriate. See <http://bugzilla.redhat.com/bugzilla> for instructions. {standard input}: Assembler messages: {standard input}: Warning: end of file not at end of a line; newline inserted {standard input}:1573: Error: unbalanced parenthesis in operand 2. Preprocessed source stored into /tmp/ccydezQe.out file, please attach this to your bugreport. The code being compiled is at: git://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git clang-sp-clobber I will also attach the kernel .config file which triggers this bug.
[Bug c/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #1 from Josh Poimboeuf --- Created attachment 42179 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42179&action=edit preprocessed source
[Bug c/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #2 from Josh Poimboeuf --- Created attachment 42180 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42180&action=edit kernel config file
[Bug c/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #3 from Josh Poimboeuf --- This has been seen on Fedora GCC 7.1.1 and Debian GCC 6.2.0-3. This bug can be recreated with the above kernel git branch and the attached .config by typing "make ARCH=i386 arch/x86/events/core.o". I've also attached the preprocessed source which GCC produced when it reported the error.
[Bug target/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #7 from Josh Poimboeuf --- Putting "sp" in the clobbers list is something that was suggested to me on the GCC mailing list a while back. And, other than this rare bug, it seems to do exactly what we want, which is, force GCC to save the frame pointer before inserting the inline asm. We need that to happen when we put a "call" instruction inside the inline asm, so that we can get a reliable stack trace from the called function. I know that putting "sp" in the clobbers list is an undocumented "feature", so maybe it is user error. However it would be nice to have something like this as a real feature. Either with "sp", or maybe a new clobbers keyword like "frame". Would that be feasible?
[Bug target/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #9 from Josh Poimboeuf --- I might be misunderstanding, but I don't think we need DRAP for our use case. We just need assurance that the frame pointer (RBP) has been set up before the inline asm is inserted. If clobbering "sp" isn't the right approach, maybe a new syntax, hopefully a documented one, would be better. Like a special "frame" clobber, for example.
[Bug target/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #12 from Josh Poimboeuf --- While I know about DRAP, I'm otherwise not very familiar with GCC internals, so I don't quite understand your points. I would like to clarify that most of the time, when we use "sp" in the clobbers list, the stack does *not* need to be realigned. And in those cases, we would *not* want DRAP, because it's not needed. We would instead just want the normal frame pointer setup. For cases where the stack pointer *does* need to be realigned, DRAP is fine. I can try out your patch next week to see what changes it makes to the generated code. Thanks!
[Bug target/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #14 from Josh Poimboeuf --- (In reply to H.J. Lu from comment #13) > (In reply to Josh Poimboeuf from comment #12) > > I would like to clarify that most of the time, when we use "sp" in the > > clobbers list, the stack does *not* need to be realigned. And in those > > Yes, stack alignment may be needed due to: > > typedef struct { > u64 __attribute__((aligned(8))) counter; > } atomic64_t; > > with -mpreferred-stack-boundary=2. Yes, in the above example that is true. However there are many more cases in the kernel where we want to use the "sp" clobbers, where stack alignment is not needed.
[Bug target/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #16 from Josh Poimboeuf --- Ok. I thought you were saying that, with your new patch, putting "sp" in clobbers would *always* force DRAP, even in cases where the function doesn't need realignment. If you're *not* saying that, then it sounds good :-) Either way I'll try the patch out next week and see what it does.
[Bug target/82221] internal compiler error: in print_reg, at config/i386/i386.c:17656
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221 --- Comment #18 from Josh Poimboeuf --- (In reply to H.J. Lu from comment #17) > (In reply to Josh Poimboeuf from comment #7) > > Putting "sp" in the clobbers list is something that was suggested to me on > > the GCC mailing list a while back. And, other than this rare bug, it seems > > to do exactly what we want, which is, force GCC to save the frame pointer > > before inserting the inline asm. We need that to happen when we put a > > "call" instruction inside the inline asm, so that we can get a reliable > > stack trace from the called function. > > > > I know that putting "sp" in the clobbers list is an undocumented "feature", > > so maybe it is user error. However it would be nice to have something like > > this as a real feature. Either with "sp", or maybe a new clobbers keyword > > like "frame". > > > > Would that be feasible? > > Can you use something like > > unsigned long paravirt_read_pmc(void) > { > register char *frame __asm__("ebp"); > unsigned long __eax; > asm volatile("# foo" : "=a" (__eax) >: "r" (frame) >: "memory", "cc"); > return __eax; > } > > That is your asm statement needs frame pointer. That works, with frame pointers enabled (-fno-omit-frame-pointer). But with -fomit-frame-pointer, it causes GCC to save rbp on the stack, which is not ideal. There's something similar, which does exactly what we need: make "rsp" an input/output constraint: register void *__sp asm("rsp"); static inline function foo(void) { asm volatile("call bar" : "+r" (__sp)); } That forces a frame pointer when frame pointers are enabled, and does nothing when frame pointers are disabled. So I think we'll go with that solution for now. (Note that "__sp" is a global variable due to a compatibility issue with the clang compiler.)
[Bug c/77966] New: Corrupt function with -fsanitize-coverage=trace-pc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77966 Bug ID: 77966 Summary: Corrupt function with -fsanitize-coverage=trace-pc Product: gcc Version: 6.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com Target Milestone: --- In the Linux kernel, we found another case (other than bug 70646) where a couple of functions are getting corrupted. Arnd Bergmann reduced it down to a simple test case, and I've reduced it slightly further: $ cat test.c typedef int spinlock_t; extern unsigned int ioread32(void *); struct vnic_wq_ctrl { unsigned int error_status; }; struct vnic_wq { struct vnic_wq_ctrl *ctrl; } mempool_t; struct snic { unsigned int wq_count; __attribute__ ((__aligned__)) struct vnic_wq wq[1]; spinlock_t wq_lock[1]; }; unsigned int snic_log_q_error_err_status; void snic_log_q_error(struct snic *snic) { unsigned int i; for (i = 0; i < snic->wq_count; i++) snic_log_q_error_err_status = ioread32(&snic->wq[i].ctrl->error_status); } $ gcc -O2 -fno-reorder-blocks -fsanitize-coverage=trace-pc -c test.c -o test.o $ objdump -dr test.o test.o: file format elf64-x86-64 Disassembly of section .text: : 0: 53 push %rbx 1: 48 89 fbmov%rdi,%rbx 4: e8 00 00 00 00 callq 9 5: R_X86_64_PC32__sanitizer_cov_trace_pc-0x4 9: 8b 03 mov(%rbx),%eax b: 85 c0 test %eax,%eax d: 75 09 jne18 f: 5b pop%rbx 10: e9 00 00 00 00 jmpq 15 11: R_X86_64_PC32 __sanitizer_cov_trace_pc-0x4 15: 0f 1f 00nopl (%rax) 18: e8 00 00 00 00 callq 1d 19: R_X86_64_PC32 __sanitizer_cov_trace_pc-0x4 1d: 48 8b 7b 10 mov0x10(%rbx),%rdi 21: e8 00 00 00 00 callq 26 22: R_X86_64_PC32 ioread32-0x4 26: 83 3b 01cmpl $0x1,(%rbx) 29: 89 05 00 00 00 00 mov%eax,0x0(%rip)# 2f 2b: R_X86_64_PC32 snic_log_q_error_err_status-0x4 2f: 76 de jbef 31: e8 00 00 00 00 callq 36 32: R_X86_64_PC32 __sanitizer_cov_trace_pc-0x4 Notice how the function ends unexpectedly after the last call to __sanitizer_cov_trace_pc(). $ gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.2.1/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC)
[Bug target/77966] Corrupt function with -fsanitize-coverage=trace-pc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77966 --- Comment #6 from Josh Poimboeuf --- (In reply to Arnd Bergmann from comment #5) > I checked the test case using "-fsanitize=unreachable" and that avoids the > problem. > > Josh, should we set that whenever we enable objtool in the kernel? In theory, adding -fsanitize=unreachable might be a workable option for allowing objtool to detect such unreachable blocks. However, in practice, that option doesn't seem to work as advertised. It seems to change the control flow unexpectedly. When adding it to the test case, it doesn't add a __ubsan_handle_builtin_unreachable() call to the unreachable block. Instead, it treats it as a normal loop, and removes the assumption that the loop can only run one time. Here's the same test case from comment #1, with -fsanitize-unreachable added: : 0: 55 push %rbp 1: 53 push %rbx 2: 48 89 fdmov%rdi,%rbp 5: 31 db xor%ebx,%ebx 7: 48 83 ec 08 sub$0x8,%rsp b: e8 00 00 00 00 callq 10 c: R_X86_64_PC32__sanitizer_cov_trace_pc-0x4 10: 8b 45 00mov0x0(%rbp),%eax 13: 85 c0 test %eax,%eax 15: 75 11 jne28 17: 48 83 c4 08 add$0x8,%rsp 1b: 5b pop%rbx 1c: 5d pop%rbp 1d: e9 00 00 00 00 jmpq 22 1e: R_X86_64_PC32 __sanitizer_cov_trace_pc-0x4 22: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 28: e8 00 00 00 00 callq 2d 29: R_X86_64_PC32 __sanitizer_cov_trace_pc-0x4 2d: 89 d8 mov%ebx,%eax 2f: 83 c3 01add$0x1,%ebx 32: 48 8b 7c c5 08 mov0x8(%rbp,%rax,8),%rdi 37: e8 00 00 00 00 callq 3c 38: R_X86_64_PC32 ioread32-0x4 3c: 39 5d 00cmp%ebx,0x0(%rbp) 3f: 77 e7 ja 28 41: eb d4 jmp17
[Bug c/61958] function arbitrarily placed in .text.unlikely section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61958 --- Comment #1 from Josh Poimboeuf --- Created attachment 33207 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33207&action=edit gzipped .i file for the good case
[Bug c/61958] New: function arbitrarily placed in .text.unlikely section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61958 Bug ID: 61958 Summary: function arbitrarily placed in .text.unlikely section Product: gcc Version: 4.9.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com Created attachment 33206 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33206&action=edit gzipped .i file for the bad case When making changes to a certain source file in the Linux kernel, gcc is unexpectedly moving two functions from the .text section to the .text.unlikely section. When compiling the original version of the source file, the two functions are placed in the .text section: $ cd linux $ git describe v3.16-rc7-7-g31dab71 $ make net/ipv4/fib_trie.o CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CALLscripts/checksyscalls.sh CC net/ipv4/fib_trie.o $ eu-readelf -s net/ipv4/fib_trie.o |grep 'insert_leaf_info\|leaf_info_new' 6: 136 FUNCLOCAL DEFAULT1 insert_leaf_info 37: 0dc0 88 FUNCLOCAL DEFAULT1 leaf_info_new $ eu-readelf -S net/ipv4/fib_trie.o |grep '\[ 1\]' [ 1] .textPROGBITS 0040 38c0 0 AX 0 0 16 However, when I apply the following patch, the two functions get unexpectedly moved to .text.unlikely: $ cat /tmp/fib_trie.patch diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 5afeb5a..cc5e3d3 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1187,6 +1187,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg) key = key & mask; +printk("foo\n"); fi = fib_create_info(cfg); if (IS_ERR(fi)) { err = PTR_ERR(fi); $ patch -p1 < /tmp/fib_trie.patch patching file net/ipv4/fib_trie.c $ make net/ipv4/fib_trie.o CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CALLscripts/checksyscalls.sh CC net/ipv4/fib_trie.o $ eu-readelf -s net/ipv4/fib_trie.o |grep 'insert_leaf_info\|leaf_info_new' 6: 111 FUNCLOCAL DEFAULT5 insert_leaf_info 28: 006f 82 FUNCLOCAL DEFAULT5 leaf_info_new $ eu-readelf -S net/ipv4/fib_trie.o |grep '\[ 5\]' [ 5] .text.unlikely PROGBITS 3770 00c1 0 AX 0 0 1 Both of the moved functions are called by fib_insert_node(), which is inlined by fib_table_insert(), which was modified by the patch. As far as I can tell, it seems likely that the moved functions would be called, especially insert_leaf_info() which is called in the main control path of the function. Also it seems odd that adding a call to printk would change the likelihood of a function being called, since it doesn't change control flow. Using gcc from Fedora rawhide: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.9.1/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.9.1-20140717/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.9.1-20140717/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 4.9.1 20140717 (Red Hat 4.9.1-2) (GCC) Full gcc cmdline: gcc -Wp,-MD,net/ipv4/.fib_trie.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.9.1/include -I./arch/x86/include -Iarch/x86/include/generated -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -m64 -mno-mmx -mno-sse -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe -Wno-sign-compare -
[Bug middle-end/61958] functions arbitrarily placed in .text.unlikely section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61958 --- Comment #2 from Josh Poimboeuf --- I see a similar issue with another patch to a different kernel file: diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index b10cd43a..40c275f 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -797,6 +797,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg) #endif err = -ENOBUFS; + printk("foo\n"); if (fib_info_cnt >= fib_info_hash_size) { unsigned int new_size = fib_info_hash_size << 1; struct hlist_head *new_info_hash; This results in fib_info_hashfn() no longer being inlined (despite its static inline directive), and being placed in .text.unlikely. Other functions are also moved to .text.unlikely: fib_info_hash_free(), kzalloc.constprop.19(), and fib_info_hash_alloc() I can also provide the data for this error if needed, but it looks like the same issue.
[Bug c/119279] New: Specifying frame pointer dependency in inline asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119279 Bug ID: 119279 Summary: Specifying frame pointer dependency in inline asm Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com CC: ak at gcc dot gnu.org, hjl.tools at gmail dot com, hpa at zytor dot com, jakub at gcc dot gnu.org, peterz at infradead dot org, pinskia at gcc dot gnu.org, rguenth at gcc dot gnu.org, torva...@linux-foundation.org, ubizjak at gmail dot com Target Milestone: --- Target: x86_64-*-* Created attachment 60749 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60749&action=edit Reduced test case (gcc -O2 -fno-omit-frame-pointer -fno-optimize-sibling-calls -c mmap.i -o mmap.o) [ This was discussed previously in bug 117311, which was a documentation request for existing (but unsupported) behavior. This bug here is not for documenting existing behavior, but rather a request to define a supported solution for specifying a frame pointer dependency in inline asm, whatever that looks like. ] In the Linux kernel on x86 it's common for inline asm to have a "call" to a thunk which saves registers before calling out to another function. If the inline asm is in a leaf function, or gets emitted before the function prologue, the frame pointer doesn't get set up before the call. The current unsupported workaround used throughout the kernel is to make the stack pointer an in/out constraint: register unsigned long current_stack_pointer asm("%rsp"); #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer) We've also experimented with making __builtin_frame_address(0) an input constraint, which also seems to work: #define ASM_CALL_CONSTRAINT "r" (__builtin_frame_address(0)) Unfortunately these are unsupported hacks which just happen to work. It would be much better to have a supported solution. The attached test case results in the following: : 0: e8 00 00 00 00 call 51: R_X86_64_PLT32 __SCT__preempt_schedule-0x4 5: 48 83 3d 00 00 00 00 00 cmpq $0x0,0x0(%rip)# d 8: R_X86_64_PC32 pfn_modify_allowed_pfn-0x5 d: 75 01 jne10 f: c3 ret 10: 55 push %rbp 11: 31 c0 xor%eax,%eax 13: 48 89 e5mov%rsp,%rbp 16: e8 00 00 00 00 call 1b 17: R_X86_64_PLT32 capable-0x4 1b: 5d pop%rbp 1c: c3 ret