https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118891

--- Comment #20 from marcus at mc dot pp.se ---
Hi.

Do you need more help to fix the broken autovectorization?

Again, you don't need ARM hardware; here is how I could reproduce the
miscompilation on a PPC64 host:

---8<---
wget https://distfiles.gentoo.org/releases/arm64/autobuilds/20241012T164823Z/stage3-aarch64_be-openrc-20241012T164823Z.tar.xz
mkdir /tmp/gcc14_sysroot
tar -x -J -C /tmp/gcc14_sysroot --exclude './dev/*' -f \
  stage3-aarch64_be-openrc-20241012T164823Z.tar.xz

git clone --depth=10 -b binutils-2_44 git://sourceware.org/git/binutils-gdb.git \
  binutils244
cd binutils244
mkdir build
cd build
../configure --target=aarch64_be-unknown-linux-gnu --prefix=/tmp/gcc14_install
make -j40
make install
cd ../..

export PATH=/tmp/gcc14_install/bin:$PATH

git clone --depth=10 -b releases/gcc-14 git://gcc.gnu.org/git/gcc.git gcc14
cd gcc14
mkdir build
cd build
../configure --target=aarch64_be-unknown-linux-gnu --prefix=/tmp/gcc14_install \
  --with-sysroot=/tmp/gcc14_sysroot --with-build-sysroot=/tmp/gcc14_sysroot
make -j40
make install


cat > testcase.c <<'EOF'
#include <stdio.h>
#define N 16
__attribute__ ((noinline, noclone)) void main1 (int X)
{
  int arr[N]; int k = 3; int m, i=0;
  do {
    m = k + 5;
    arr[i] = m;
    k = k + X;
    i++;
  } while (i < N);
  for (i = 0; i < N; i++) printf("%d ", arr[i]);
  printf("\n");
}
int main() { main1(2); return 0; }
EOF


/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -static -O2 -o testcase \
  testcase.c
qemu-aarch64_be ./testcase  # wrong result

/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -static -O2 \
  -fno-tree-vectorize -o testcase testcase.c
qemu-aarch64_be ./testcase  # correct result
---8<---
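
For reference, the loop in the testcase stores k + 5 with k starting at 3 and
growing by X each iteration, so arr[i] should be 8 + i*X; with X = 2 the
correct output is 8 10 12 ... 38.  Below is a minimal sketch for checking a
result against that.  The helper names (expected_value, reference_loop,
count_mismatches) are made up for illustration and are not part of the
testcase or of GCC:

```c
#define N 16

/* Closed form of the value the test loop stores at index i:
   k starts at 3 and grows by X per iteration, and m = k + 5.  */
static int expected_value (int i, int X)
{
  return 3 + 5 + i * X;
}

/* Scalar re-implementation of the loop from testcase.c that fills
   out[] instead of printing (hypothetical helper).  */
static void reference_loop (int X, int out[N])
{
  int k = 3;
  for (int i = 0; i < N; i++)
    {
      out[i] = k + 5;
      k += X;
    }
}

/* Number of indices at which got[] differs from the closed form.  */
static int count_mismatches (const int got[N], int X)
{
  int bad = 0;
  for (int i = 0; i < N; i++)
    if (got[i] != expected_value (i, X))
      bad++;
  return bad;
}
```

Calling reference_loop (2, out) and then count_mismatches (out, 2) should give
0; any nonzero count on the values printed by a compiled testcase would point
at a miscompilation.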

Looking at the generated code, it seems the two source register operands of
zip1 have simply been emitted in the wrong order.  If I swap them, the test
program prints the correct output:

---8<---
/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -O2 -S -o testcase.s \
  testcase.c
sed -i -e '/zip1/s/v31.4s, v30.4s$/v30.4s, v31.4s/' testcase.s
/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -static -o testcase \
  testcase.s
qemu-aarch64_be ./testcase  # correct result
---8<---

(The resulting binary also produces the correct result on ARM hardware.)
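
To show why the operand order matters, here is a scalar model of
"zip1 vd.4s, vn.4s, vm.4s" in architectural lane numbering: it interleaves the
low two lanes of vn and vm.  This is only a sketch of the instruction itself;
it does not model GCC's big-endian lane handling or the code GCC actually
generated, and the function name and the sample vectors are made up:

```c
#include <stdint.h>

/* Scalar model of AArch64 "zip1 vd.4s, vn.4s, vm.4s":
   vd = { vn[0], vm[0], vn[1], vm[1] } (architectural lane order).
   A temporary is used so vd may alias vn or vm.  */
static void zip1_4s (int32_t vd[4], const int32_t vn[4], const int32_t vm[4])
{
  int32_t r[4] = { vn[0], vm[0], vn[1], vm[1] };
  for (int i = 0; i < 4; i++)
    vd[i] = r[i];
}
```

With the hypothetical inputs a = {0, 1, 2, 3} and b = {10, 11, 12, 13},
zip1_4s (d, a, b) gives {0, 10, 1, 11}, while zip1_4s (d, b, a) gives
{10, 0, 11, 1}, so swapped operands put each pair of elements in the wrong
order.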
