https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118891
--- Comment #20 from marcus at mc dot pp.se ---
Hi.

Do you need more help to fix the broken autovectorization?

Again, you don't need ARM hardware; here is how I could reproduce the
miscompilation on a PPC64 host:

---8<---
wget https://distfiles.gentoo.org/releases/arm64/autobuilds/20241012T164823Z/stage3-aarch64_be-openrc-20241012T164823Z.tar.xz
mkdir /tmp/gcc14_sysroot
tar -x -J -C /tmp/gcc14_sysroot --exclude './dev/*' -f stage3-aarch64_be-openrc-20241012T164823Z.tar.xz
git clone --depth=10 -b binutils-2_44 git://sourceware.org/git/binutils-gdb.git binutils244
cd binutils244
mkdir build
cd build
../configure --target=aarch64_be-unknown-linux-gnu --prefix=/tmp/gcc14_install
make -j40
make install
cd ../..
export PATH=/tmp/gcc14_install/bin:$PATH
git clone --depth=10 -b releases/gcc-14 git://gcc.gnu.org/git/gcc.git gcc14
cd gcc14
mkdir build
cd build
../configure --target=aarch64_be-unknown-linux-gnu --prefix=/tmp/gcc14_install --with-sysroot=/tmp/gcc14_sysroot --with-build-sysroot=/tmp/gcc14_sysroot
make -j40
make install
cat > testcase.c
#include <stdio.h>

#define N 16

__attribute__ ((noinline, noclone))
void main1 (int X)
{
  int arr[N];
  int k = 3;
  int m, i=0;

  do {
    m = k + 5;
    arr[i] = m;
    k = k + X;
    i++;
  } while (i < N);

  for (i = 0; i < N; i++)
    printf("%d ", arr[i]);
  printf("\n");
}

int main()
{
  main1(2);
  return 0;
}
/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -static -O2 -o testcase testcase.c
qemu-aarch64_be ./testcase # wrong result
/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -static -O2 -fno-tree-vectorize -o testcase testcase.c
qemu-aarch64_be ./testcase # correct result
---8<---

Looking at the generated code, it just seems like it has messed up the order
of the register arguments to zip1.
If I reverse them I get the correct output from the test program:

---8<---
/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -O2 -S -o testcase.s testcase.c
sed -i -e '/zip1/s/v31.4s, v30.4s$/v30.4s, v31.4s/' testcase.s
/tmp/gcc14_install/bin/aarch64_be-unknown-linux-gnu-gcc -static -o testcase testcase.s
qemu-aarch64_be ./testcase # correct result
---8<---

(The resulting binary also produces the correct result on ARM hardware.)
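
To illustrate why the operand order of zip1 matters, here is a minimal scalar
sketch of the architectural "ZIP1 Vd.4S, Vn.4S, Vm.4S" operation. The function
name zip1_4s and the array representation are my own, chosen for illustration;
the sketch models only the instruction's lane interleaving and says nothing
about how GCC numbers lanes internally on big-endian targets, or about which
values v30 and v31 actually hold in the generated code.

```c
#include <stdint.h>

/* Hypothetical scalar model of "ZIP1 Vd.4S, Vn.4S, Vm.4S":
   the low two elements of Vn and Vm are interleaved, Vn first.
   Index 0 stands for architectural lane 0 (least significant). */
void zip1_4s(int32_t vd[4], const int32_t vn[4], const int32_t vm[4])
{
    int32_t tmp[4];   /* temporary, so vd may alias vn or vm */
    int i;

    tmp[0] = vn[0];
    tmp[1] = vm[0];
    tmp[2] = vn[1];
    tmp[3] = vm[1];

    for (i = 0; i < 4; i++)
        vd[i] = tmp[i];
}
```

With vn = {1, 2, 3, 4} and vm = {10, 20, 30, 40} this model gives
{1, 10, 2, 20}; with the two source operands swapped it gives {10, 1, 20, 2}.
So swapped operands exchange every even/odd pair of result lanes, which is
the kind of change the sed substitution above makes to the instruction.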