[Bug target/96050] New: PDP-11: 32-bit MOV from offset(Rn) overrides Rn

2020-07-03 Thread imachug at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96050

            Bug ID: 96050
           Summary: PDP-11: 32-bit MOV from offset(Rn) overrides Rn
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: imachug at gmail dot com
  Target Milestone: ---

Consider the following code:


struct {
    unsigned long a, b;
} structure;

void calc() {
    unsigned long x = structure.a;
    unsigned long y = structure.b;
    asm volatile(""::"r"(x), "r"(y));
}


("asm volatile" is just to stop GCC from removing x and y completely)

When this source is compiled with "-Os -S", GCC erroneously generates the
following assembly to load structure members to registers:


mov $_structure,r0
mov (r0),r2
mov 02(r0),r3
mov 04(r0),r0
mov 06(r0),r1


"mov 04(r0),r0" overwrites r0, but the next instruction, "mov 06(r0),r1", still
expects r0 to hold the address of the structure.
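
To make the failure concrete, here is a minimal sketch in C that replays the five
instructions on a toy register/memory model. It is not a PDP-11 emulator; the
helper names (mov_off, load_word), the memory size, the structure address and
the word values are all made up for illustration. The second function swaps the
last two loads so that r0 is overwritten last, which is one ordering that avoids
the overlap and is not necessarily what a fixed GCC would emit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not a real PDP-11 emulator: byte-addressed memory holding
   16-bit words, plus registers r0..r3. All names here are hypothetical. */
enum { MEM_BYTES = 128, STRUCT_ADDR = 16 };

static uint16_t mem[MEM_BYTES / 2];
static uint16_t reg[4];

/* Read the word at an even byte address (wrapping inside the toy memory). */
static uint16_t load_word(uint16_t addr) {
    assert(addr % 2 == 0);
    return mem[(addr % MEM_BYTES) / 2];
}

/* mov off(rs),rd: the address uses whatever rs holds when the insn runs. */
static void mov_off(uint16_t off, int rs, int rd) {
    reg[rd] = load_word((uint16_t)(reg[rs] + off));
}

static void reset(void) {
    memset(mem, 0, sizeof mem);
    memset(reg, 0, sizeof reg);
    /* Four words of the structure: a occupies +0,+2 and b occupies +4,+6. */
    mem[STRUCT_ADDR / 2 + 0] = 0x1110;
    mem[STRUCT_ADDR / 2 + 1] = 0x2220;
    mem[STRUCT_ADDR / 2 + 2] = 0x3330;
    mem[STRUCT_ADDR / 2 + 3] = 0x4440;
    reg[0] = STRUCT_ADDR;            /* mov $_structure,r0 */
}

/* The sequence from the report; returns the value that ends up in r1. */
uint16_t r1_after_reported_sequence(void) {
    reset();
    mov_off(0, 0, 2);
    mov_off(2, 0, 3);
    mov_off(4, 0, 0);   /* r0 now holds data, not the base address */
    mov_off(6, 0, 1);   /* so this reads from (data + 6) */
    return reg[1];
}

/* Same loads with the last two swapped, so r0 is overwritten last. */
uint16_t r1_after_reordered_sequence(void) {
    reset();
    mov_off(0, 0, 2);
    mov_off(2, 0, 3);
    mov_off(6, 0, 1);
    mov_off(4, 0, 0);
    return reg[1];
}
```

In this model the reported sequence leaves r1 with whatever sits at the bogus
address, while the reordered sequence leaves r1 with the last word of the
structure.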

I think this is related to the missing earlyclobber on the movsi insn, but
adding "&" on lines 529 and 536 of pdp11.md (i.e. changing "=r,r,g,g" to
"=&r,r,g,g" in
"[(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,g,g")") didn't fix the
bug for me.
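
For readers unfamiliar with the "&" modifier: it marks an output operand as
earlyclobber, meaning it may be written before all inputs have been read, so it
must not share a register with any input. The same modifier exists in GCC's
extended asm. The sketch below shows it in C rather than in a machine
description; the function name is hypothetical, the assembly is x86-only (with
a plain C fallback elsewhere), and it illustrates the modifier only, not a fix
for this bug.

```c
/* Adds a and b. On x86 with GCC-style inline asm, the output is written by
   the first instruction while b is still needed by the second, so the output
   is marked earlyclobber ("=&r") to keep it out of the inputs' registers. */
int add_with_earlyclobber(int a, int b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int sum;
    __asm__("movl %1, %0\n\t"   /* sum = a  (output written early)    */
            "addl %2, %0"       /* sum += b (input b read afterwards) */
            : "=&r"(sum)
            : "r"(a), "r"(b)
            : "cc");
    return sum;
#else
    return a + b;               /* fallback for other targets/compilers */
#endif
}
```

Without the "&", the compiler would be allowed to place sum and b in the same
register, and the first instruction could destroy b before it is used.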


$ ./tools/bin/pdp11-aout-gcc -v
Using built-in specs.
COLLECT_GCC=./tools/bin/pdp11-aout-gcc-10.1.0
COLLECT_LTO_WRAPPER=/[redacted]/tools/libexec/gcc/pdp11-aout/10.1.0/lto-wrapper
Target: pdp11-aout
Configured with: ../configure --prefix /[redacted]/tools --target pdp11-aout
--enable-languages=c --with-gnu-as --with-gnu-ld --without-headers
--disable-libssp
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (GCC) 

$ uname -a
Linux [redacted] 5.3.0-59-generic #53-Ubuntu SMP Wed Jun 3 15:52:15 UTC 2020
x86_64 x86_64 x86_64 GNU/Linux

[Bug target/103696] New: Lambda functions are not inlined under certain optimization pragmas

2021-12-13 Thread imachug at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103696

            Bug ID: 103696
           Summary: Lambda functions are not inlined under certain
                    optimization pragmas
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: imachug at gmail dot com
  Target Milestone: ---

This seems like a very weird bug to me and I'm not even sure how to label it,
so please fix the component if needed.

Testcase (C++):


#pragma GCC optimize("finite-math-only")
#pragma GCC target("sse3")

void fn() {
}

int global_var;

int solve() {
    auto nested = []() {
        return global_var;
    };
    return nested();
}


When compiling this code via `g++ test.cpp -c -O2 -std=c++17`, I get the
following assembly:


$ objdump -d test.o
...
0000000000000000 <_ZZ5solvevENKUlvE_clEv.constprop.0>:
   0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <_ZZ5solvevENKUlvE_clEv.constprop.0+0x6>
   6:   c3                      retq
   7:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
   e:   00 00
...
0000000000000020 <_Z5solvev>:
  20:   f3 0f 1e fa             endbr64
  24:   e8 d7 ff ff ff          callq  0 <_ZZ5solvevENKUlvE_clEv.constprop.0>
  29:   c3                      retq


As you can see, the nested() lambda call was not inlined into solve().

However, if I do any of the following, the lambda is inlined as expected:

- Remove `fn` definition
- Move `fn` definition under `solve`
- Replace reading `global_var` with a constant
- Make `nested` a global function
- Remove either of the two pragmas (or both)
- Add -ffinite-math-only or -msse3 or both to the compilation line (regardless
of whether the pragmas are still there)

I have absolutely no idea why a floating-point optimization affects inlining,
or how a pragma differs from a command-line option with respect to this bug.

[Bug target/103696] Lambda functions are not inlined under certain optimization pragmas

2021-12-13 Thread imachug at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103696

--- Comment #1 from Ivan Machugovskiy ---
Obligatory info dump. I managed to reproduce this on G++ 9.3.0 and G++ 10.3.0
locally, and on G++ trunk on Godbolt (see https://godbolt.org/z/Y5Kr3KfjW).
This is probably a longstanding bug.


$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-9
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

$ g++-10 -v
Using built-in specs.
COLLECT_GCC=g++-10
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
10.3.0-1ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-10-S4I5Pr/gcc-10-10.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-S4I5Pr/gcc-10-10.3.0/debian/tmp-gcn/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1~20.04)

[Bug tree-optimization/116768] New: Strict aliasing breaks autovectorization with -O3

2024-09-18 Thread imachug at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116768

            Bug ID: 116768
           Summary: Strict aliasing breaks autovectorization with -O3
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: imachug at gmail dot com
  Target Milestone: ---

The program below returns 0 (wrong) with strict aliasing enabled and 1
(correct) with strict aliasing disabled. This looks like a bug to me: there are
no casts, sanitizers are silent, and the example is a minimization of an
std::bitset-based reproducer. -O3 -mavx is required to trigger the bug.

I believe this is a bug in TBAA, because defining Parent as an alias of Child
or replacing `y_child->` with `y->child.` fixes the miscompilation. A quick
check with Godbolt shows the code is reduced to 'return 0' by the last tree
pass, so I'm tentatively labeling this tree-optimization.

This can be reproduced starting with 11.2 up to trunk.
https://godbolt.org/z/1v16bPdfv

```
typedef struct {
  unsigned long words[2];
} Child;

typedef struct {
  Child child;
} Parent;

Parent my_or(Parent x, const Parent *y) {
  const Child *y_child = &y->child;
  for (int i = 0; i < 2; i++) {
    x.child.words[i] |= y_child->words[i];
  }
  return x;
}

int main() {
  Parent bs[4];
  __builtin_memset(bs, 0, sizeof(bs));

  bs[0].child.words[0] = 1;
  for (int i = 1; i <= 3; i++) {
    bs[i] = my_or(bs[i], &bs[i - 1]);
  }
  return bs[2].child.words[0];
}
```
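
For comparison, here is a sketch of the `y->child.` variant mentioned above,
which I observed to avoid the miscompilation. The names my_or_direct and
run_direct_variant are made up for this sketch, and memset from <string.h>
stands in for __builtin_memset; everything else mirrors the reproducer.

```c
#include <string.h>

typedef struct {
  unsigned long words[2];
} Child;

typedef struct {
  Child child;
} Parent;

/* Same as my_or, but accesses the words through y->child directly
   instead of going through an intermediate const Child pointer. */
Parent my_or_direct(Parent x, const Parent *y) {
  for (int i = 0; i < 2; i++) {
    x.child.words[i] |= y->child.words[i];
  }
  return x;
}

/* Mirrors main() from the reproducer; returns bs[2].child.words[0]. */
int run_direct_variant(void) {
  Parent bs[4];
  memset(bs, 0, sizeof(bs));

  bs[0].child.words[0] = 1;
  for (int i = 1; i <= 3; i++) {
    bs[i] = my_or_direct(bs[i], &bs[i - 1]);
  }
  return (int)bs[2].child.words[0];
}
```

The set bit propagates from bs[0] to bs[1] and then to bs[2], so the value read
back at the end should be 1 at any optimization level.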

Here's -v for my local compiler if you find it useful.

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure
--enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust
--enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues
--with-build-config=bootstrap-lto --with-linker-hash-style=gnu
--with-system-zlib --enable-__cxa_atexit --enable-cet=auto
--enable-checking=release --enable-clocale=gnu --enable-default-pie
--enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object
--enable-libstdcxx-backtrace --enable-link-serialization=1
--enable-linker-build-id --enable-lto --enable-multilib --enable-plugin
--enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.2.1 20240805 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic'
'-march=x86-64' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/cc1 -E -quiet -v test.c -mavx
-mtune=generic -march=x86-64 -O3 -fpch-preprocess -o a-test.i
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/include
 /usr/local/include
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic'
'-march=x86-64' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/cc1 -fpreprocessed a-test.i -quiet
-dumpdir a- -dumpbase test.c -dumpbase-ext .c -mavx -mtune=generic
-march=x86-64 -O3 -version -o a-test.s
GNU C17 (GCC) version 14.2.1 20240805 (x86_64-pc-linux-gnu)
compiled by GNU C version 14.2.1 20240805, GMP version 6.3.0, MPFR
version 4.2.1, MPC version 1.3.1, isl version isl-0.26-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: faa3163d33b78b77071c76eebeab3034
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic'
'-march=x86-64' '-dumpdir' 'a-'
 as -v --64 -o a-test.o a-test.s
GNU assembler version 2.43.0 (x86_64-pc-linux-gnu) using BFD version (GNU
Binutils) 2.43.0
COMPILER_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx' '-mtune=generic'
'-march=x86-64' '-dumpdir' 'a.'
 

[Bug tree-optimization/116768] [12/13/14/15 regression] Strict aliasing breaks autovectorization with -O3

2024-09-18 Thread imachug at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116768

Alisa Sireneva changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|11.4.0                      |11.1.0

--- Comment #4 from Alisa Sireneva ---
With the new reproducer, this doesn't work on 11.4.