[Bug target/96050] New: PDP-11: 32-bit MOV from offset(Rn) overrides Rn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96050

            Bug ID: 96050
           Summary: PDP-11: 32-bit MOV from offset(Rn) overrides Rn
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: imachug at gmail dot com
  Target Milestone: ---

Consider the following code:

    struct {
        unsigned long a, b;
    } structure;

    void calc() {
        unsigned long x = structure.a;
        unsigned long y = structure.b;
        asm volatile(""::"r"(x), "r"(y));
    }

("asm volatile" is only there to stop GCC from removing x and y entirely.)

When this source is compiled with "-Os -S", GCC erroneously generates the following assembly to load the structure members into registers:

    mov $_structure,r0
    mov (r0),r2
    mov 02(r0),r3
    mov 04(r0),r0
    mov 06(r0),r1

"mov 04(r0),r0" overwrites r0, but the next instruction assumes r0 still holds the original value (the address of structure).

I think this has to do with early clobber being disabled on the movsi insn. However, adding "&" on lines 529 and 536 of pdp11.md (i.e. changing "=r,r,g,g" to "=&r,r,g,g" in "[(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,g,g")") didn't fix the bug for me.

$ ./tools/bin/pdp11-aout-gcc -v
Using built-in specs.
COLLECT_GCC=./tools/bin/pdp11-aout-gcc-10.1.0
COLLECT_LTO_WRAPPER=/[redacted]/tools/libexec/gcc/pdp11-aout/10.1.0/lto-wrapper
Target: pdp11-aout
Configured with: ../configure --prefix /[redacted]/tools --target pdp11-aout --enable-languages=c --with-gnu-as --with-gnu-ld --without-headers --disable-libssp
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (GCC)

$ uname -a
Linux [redacted] 5.3.0-59-generic #53-Ubuntu SMP Wed Jun 3 15:52:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
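For illustration only, the following C++ model replays the two loads for `y` against a toy register file. It uses hypothetical names and is not GCC or pdp11.md code. It assumes `structure` sits at address 0 and treats each `mov off(rs),rd` as a plain word load. The model shows why the emitted order reads `06(r0)` from the wrong address. It also shows that loading first the half whose destination is not the base register, so that the base is overwritten last, keeps both words intact. This sketches the general overlap rule for multi-word moves. It does not claim to describe how the pdp11 backend handles, or eventually fixed, this case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model: eight 16-bit PDP-11 registers plus a tiny memory.
struct Machine {
    std::uint16_t r[8] = {};
    std::uint16_t mem[16] = {};  // mem[i] holds the word at byte address 2*i
};

// Emulates "mov off(rs),rd": rd <- word at byte address r[rs] + off.
void mov_indexed(Machine &m, std::uint16_t off, int rs, int rd) {
    assert(rs >= 0 && rs < 8 && rd >= 0 && rd < 8);
    std::uint16_t addr = static_cast<std::uint16_t>(m.r[rs] + off);
    m.r[rd] = m.mem[(addr / 2) % 16];  // wrap to stay inside the toy memory
}

// Order from the report: the destination pair r0:r1 overlaps the base
// register r0, and r0 is written before the second load uses it.
void load_y_as_reported(Machine &m) {
    mov_indexed(m, 4, 0, 0);  // mov 04(r0),r0  (base register clobbered)
    mov_indexed(m, 6, 0, 1);  // mov 06(r0),r1  (reads through the new r0)
}

// Overlap-safe order: load the half whose destination is not the base
// register first, and overwrite the base register last.
void load_y_overlap_safe(Machine &m) {
    mov_indexed(m, 6, 0, 1);  // mov 06(r0),r1
    mov_indexed(m, 4, 0, 0);  // mov 04(r0),r0
}

// Assumed layout: structure at byte address 0, y at offsets 04 and 06.
Machine make_machine() {
    Machine m;
    m.r[0] = 0;         // mov $_structure,r0
    m.mem[2] = 0x0012;  // word at 04(r0)
    m.mem[3] = 0x0034;  // word at 06(r0)
    return m;
}
```

With these values, the reported order makes the second load read byte address 0x12 + 6, which is a zero word in the toy memory, instead of address 06. The safe order yields r0 = 0x12 and r1 = 0x34.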
[Bug target/103696] New: Lambda functions are not inlined under certain optimization pragmas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103696

            Bug ID: 103696
           Summary: Lambda functions are not inlined under certain optimization pragmas
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: imachug at gmail dot com
  Target Milestone: ---

This seems like a very weird bug to me, and I'm not even sure how to label it, so please fix the component if needed.

Testcase (C++):

    #pragma GCC optimize("finite-math-only")
    #pragma GCC target("sse3")

    void fn() {
    }

    int global_var;

    int solve() {
        auto nested = []() {
            return global_var;
        };
        return nested();
    }

When compiling this code via `g++ test.cpp -c -O2 -std=c++17`, I get the following assembly:

    $ objdump -d test.o
    ...
    <_ZZ5solvevENKUlvE_clEv.constprop.0>:
       0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <_ZZ5solvevENKUlvE_clEv.constprop.0+0x6>
       6:   c3                      retq
       7:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
       e:   00 00
    ...
    0020 <_Z5solvev>:
      20:   f3 0f 1e fa             endbr64
      24:   e8 d7 ff ff ff          callq  0 <_ZZ5solvevENKUlvE_clEv.constprop.0>
      29:   c3                      retq

As you can see, the nested() lambda call was not inlined into solve(). However, if I do any of the following, the lambda is inlined as expected:

- Remove the `fn` definition
- Move the `fn` definition below `solve`
- Replace the read of `global_var` with a constant
- Make `nested` a global function
- Remove either of the two pragmas (or both)
- Add -ffinite-math-only, -msse3, or both to the command line (regardless of whether the pragmas are still there)

I have absolutely no idea why a floating-point optimization affects inlining, or how a pragma differs from a command-line option with respect to this bug.
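One item from the workaround list ("Make `nested` a global function") can be written out as code. The sketch below keeps the testcase's `solve` and adds hypothetical `nested_fn`, `solve_with_global_fn`, and `checked_solve` helpers. The pragmas and `fn` are left out so that it compiles on any target. It only checks that both variants return the same value, because the report concerns missed inlining, not wrong results. To compare the generated code (for example with `objdump -d`), restore the pragmas and build on an x86 target.

```cpp
#include <cassert>

int global_var;

// Shape from the testcase: the global is read through a local lambda.
int solve() {
    auto nested = []() { return global_var; };
    return nested();
}

// Workaround from the report's list: `nested` as a namespace-scope function.
// The names below are hypothetical.
int nested_fn() { return global_var; }

int solve_with_global_fn() { return nested_fn(); }

// Both variants must observe the same value; only inlining should differ.
int checked_solve() {
    int value = solve();
    assert(value == solve_with_global_fn());
    return value;
}
```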
[Bug target/103696] Lambda functions are not inlined under certain optimization pragmas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103696

--- Comment #1 from Ivan Machugovskiy ---
Obligatory info dump. I managed to reproduce this on G++ 9.3.0 and G++ 10.3.0 locally, and on G++ trunk on Godbolt (see https://godbolt.org/z/Y5Kr3KfjW). This is probably a longstanding bug.

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

$ g++-10 -v
Using built-in specs.
COLLECT_GCC=g++-10
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 10.3.0-1ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-S4I5Pr/gcc-10-10.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-S4I5Pr/gcc-10-10.3.0/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1~20.04)
[Bug tree-optimization/116768] New: Strict aliasing breaks autovectorization with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116768

            Bug ID: 116768
           Summary: Strict aliasing breaks autovectorization with -O3
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: imachug at gmail dot com
  Target Milestone: ---

This returns 0 (wrong) with strict aliasing enabled and 1 (correct) with strict aliasing disabled. It looks like a bug to me: there are no casts, the sanitizers are silent, and the example is a minimization of an std::bitset-based reproducer. -O3 -mavx is required to trigger the bug.

I believe this is a bug in TBAA, because defining Parent as Child or replacing `y_child->` with `y->child.` fixes the miscompilation. A quick check with Godbolt shows that the last tree pass reduces the code to 'return 0', so I'm tentatively labeling this tree-optimization.

This can be reproduced from 11.2 up to trunk.

https://godbolt.org/z/1v16bPdfv

```
typedef struct {
    unsigned long words[2];
} Child;

typedef struct {
    Child child;
} Parent;

Parent my_or(Parent x, const Parent *y) {
    const Child *y_child = &y->child;
    for (int i = 0; i < 2; i++) {
        x.child.words[i] |= y_child->words[i];
    }
    return x;
}

int main() {
    Parent bs[4];
    __builtin_memset(bs, 0, sizeof(bs));
    bs[0].child.words[0] = 1;
    for (int i = 1; i <= 3; i++) {
        bs[i] = my_or(bs[i], &bs[i - 1]);
    }
    return bs[2].child.words[0];
}
```

Here's -v for my local compiler if you find it useful.

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.2.1 20240805 (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/cc1 -E -quiet -v test.c -mavx -mtune=generic -march=x86-64 -O3 -fpch-preprocess -o a-test.i
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/include
 /usr/local/include
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/cc1 -fpreprocessed a-test.i -quiet -dumpdir a- -dumpbase test.c -dumpbase-ext .c -mavx -mtune=generic -march=x86-64 -O3 -version -o a-test.s
GNU C17 (GCC) version 14.2.1 20240805 (x86_64-pc-linux-gnu)
        compiled by GNU C version 14.2.1 20240805, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version isl-0.26-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: faa3163d33b78b77071c76eebeab3034
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-'
 as -v --64 -o a-test.o a-test.s
GNU assembler version 2.43.0 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.43.0
COMPILER_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a.'
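As a cross-check for the testcase, here is a sketch of the variant that the reporter says avoids the miscompilation. `my_or_direct` (a hypothetical name) reads `y->child.words[i]` directly instead of going through `y_child`. The driver `run_driver` (also hypothetical) mirrors `main` and asserts after each step that the bit set in `bs[0]` has been OR-ed forward, so the correct result is 1. The sketch is written in C++ like the other sketches here. The report does not establish that this rewrite avoids the bug in every GCC version or with every flag combination.

```cpp
#include <cassert>
#include <cstring>

typedef struct {
    unsigned long words[2];
} Child;

typedef struct {
    Child child;
} Parent;

// Reported to avoid the miscompilation: access y's words through
// y->child. directly rather than through a separate const Child pointer.
Parent my_or_direct(Parent x, const Parent *y) {
    for (int i = 0; i < 2; i++) {
        x.child.words[i] |= y->child.words[i];
    }
    return x;
}

// Same steps as the testcase's main(); returns what main() would return.
unsigned long run_driver() {
    Parent bs[4];
    std::memset(bs, 0, sizeof(bs));
    bs[0].child.words[0] = 1;
    for (int i = 1; i <= 3; i++) {
        bs[i] = my_or_direct(bs[i], &bs[i - 1]);
        assert(bs[i].child.words[0] == 1);  // the bit must propagate each step
    }
    return bs[2].child.words[0];
}
```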
[Bug tree-optimization/116768] [12/13/14/15 regression] Strict aliasing breaks autovectorization with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116768

Alisa Sireneva changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|11.4.0                      |11.1.0

--- Comment #4 from Alisa Sireneva ---
With the new reproducer, this doesn't work on 11.4.