https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117675
Bug ID: 117675
Summary: ARM Cortex-A7 ldrd register overlap
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: fhunl...@troodon-software.com
Target Milestone: ---

Created attachment 59633
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59633&action=edit
Long example source code

This was found when compiling the Erlang runtime, https://github.com/erlang/otp, for an ARM Cortex-A7 processor. A much smaller code example is at https://godbolt.org/z/eKGzTrWTM. The smaller example appears to reproduce the issue, but its register choice is different, so I'm including both just in case.

In both cases, an `ldrd` instruction was emitted whose address register overlaps a transfer register (e.g., in the listing below `r3` is the index register and is also implicitly loaded, since `ldrd r2` writes the register pair `r2`/`r3`):

```
002de77c <ethr_dw_atomic_read_acqb>:
  2de77c:       e2003007        and     r3, r0, #7
  2de780:       e18020d3        ldrd    r2, [r0, r3]
  2de784:       e1c120f0        strd    r2, [r1]
  2de788:       f57ff05d        dmb     ld
  2de78c:       e12fff1e        bx      lr
```

The assembler emitted warnings:

```
obj/arm-buildroot-linux-gnueabihf/opt/r/ethr_atomics.s: Assembler messages:
obj/arm-buildroot-linux-gnueabihf/opt/r/ethr_atomics.s:1377: Warning: index register overlaps transfer register
...
```

At runtime, this resulted in a SIGILL on an Allwinner H3. Reproduction details follow.

## Long example

See the ethr_atomics.i attachment.

```
armv7-nerves-linux-gnueabihf-gcc -DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT -DPOSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -Werror=undef -Werror=implicit -Werror=return-type -fno-strict-aliasing -fno-common -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g2 -D_FORTIFY_SOURCE=1 -I/home/fhunleth/nerves/nerves_system_br/o/condor/build/erlang-27.1.2/erts/arm-buildroot-linux-gnueabihf -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -Wall -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Wdeclaration-after-statement -DHAVE_CONFIG_H -I../include -I../include/arm-buildroot-linux-gnueabihf -I../include/internal -I../include/internal/arm-buildroot-linux-gnueabihf -I../emulator/beam -I../emulator/sys/unix -c common/ethr_atomics.c -o obj/arm-buildroot-linux-gnueabihf/opt/r/ethr_atomics.o
```

`armv7-nerves-linux-gnueabihf-gcc` is a GCC 14.2.0 cross compiler built with Crosstool-NG. Here are the specs:

```
Using built-in specs.
COLLECT_GCC=/home/fhunleth/nerves/nerves_system_br/o/condor/host/opt/ext-toolchain/bin/armv7-nerves-linux-gnueabihf-gcc
COLLECT_LTO_WRAPPER=/home/fhunleth/nerves/nerves_system_br/o/condor/host/opt/ext-toolchain/bin/../libexec/gcc/armv7-nerves-linux-gnueabihf/14.2.0/lto-wrapper
Target: armv7-nerves-linux-gnueabihf
Configured with: /tmp/ctng-work/armv7-nerves-linux-gnueabihf/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=armv7-nerves-linux-gnueabihf --prefix=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf --exec_prefix=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf --with-sysroot=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf/armv7-nerves-linux-gnueabihf/sysroot --enable-languages=c,c++,fortran --with-cpu=generic-armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-pkgversion='crosstool-NG 1.26.0.107_5595edc' --enable-__cxa_atexit --disable-libmudflap --enable-libgomp --disable-libssp --disable-libquadmath --disable-libquadmath-support --disable-libsanitizer --disable-libmpx --with-gmp=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools --with-mpfr=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools --with-mpc=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools --with-isl=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools --enable-lto --without-zstd --enable-threads=posix --enable-target-optspace --enable-linker-build-id --disable-plugin --disable-nls --disable-multilib --with-local-prefix=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf/armv7-nerves-linux-gnueabihf/sysroot --enable-long-long
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.2.0 (crosstool-NG 1.26.0.107_5595edc)
```

I tried the Bootlin armv7-eabihf gcc 14.2.0 cross-compiler (https://toolchains.bootlin.com/) and it had the same behavior.

The C code that causes the issue is the inlined function:

```
static __inline__ ethr_sint64_t
ethr_native_su_dw_atomic_read(ethr_native_dw_atomic_t *var)
{
    ethr_native_dw_ptr_t p = (ethr_native_dw_ptr_t) (&(var)->c[(int) ((ethr_uint_t) &(var)->c[0]) & 0x7]);
    ;
    return __atomic_load_n(p, 0);
}
```

GCC 13.2.0 does not produce assembler with the issue.
Here's what it looks like:

```
002dc0b0 <ethr_dw_atomic_read_acqb>:
  2dc0b0:       e2003007        and     r3, r0, #7
  2dc0b4:       e0833000        add     r3, r3, r0
  2dc0b8:       e1c320d0        ldrd    r2, [r3]
  2dc0bc:       e1c120f0        strd    r2, [r1]
  2dc0c0:       f57ff05f        dmb     sy
  2dc0c4:       e12fff1e        bx      lr
```

## Short example

The Godbolt example at https://godbolt.org/z/eKGzTrWTM tries to replicate the issue. Here's the C:

```c
#include <stdint.h>

#define ETHR_DW_NATMC_ALIGN_MASK__ 0x7
#define ETHR_DW_NATMC_MEM__(VAR) \
   (&(VAR)->c[(int) ((uint32_t) &(VAR)->c[0]) & ETHR_DW_NATMC_ALIGN_MASK__])

typedef union {
    volatile int64_t dw_sint;
    volatile int32_t sint[3];
    volatile char c[4*3];
} ethr_native_dw_atomic_t;

int64_t test(ethr_native_dw_atomic_t *x)
{
    uint64_t *p = (uint64_t *) ETHR_DW_NATMC_MEM__(x);
    return __atomic_load_n(p, __ATOMIC_RELAXED);
}
```

With GCC 14.2.0 (flags=-marm -O3 -mlibarch=armv7ve+simd -march=armv7ve+simd -mcpu=cortex-a7):

```
test:
        and     r3, r0, #7
        ldrd    r0, r1, [r0, r3]
        bx      lr
```

Switch to GCC 14.1.0 with the same flags and the following assembler is produced:

```
test:
        and     r3, r0, #7
        add     r3, r3, r0
        ldrd    r0, r1, [r3]
        bx      lr
```

Looking at the LDRD (register) description at https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Instruction-Details/Alphabetical-list-of-instructions/LDRD--register-?lang=en, the `r0` overlap in the short example might be ok (an overlap between Rt and Rn is allowed in some cases). The long example's overlap between Rt2 and Rm (both `r3`) hits one of the UNPREDICTABLE conditions.

Interestingly enough, if you remove `-marm` with GCC 14.2.0, the assembler is also ok. I didn't expect that, so perhaps that's a hint.

Looking through the differences between GCC 14.1.0 and GCC 14.2.0, the commit at https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=83332e3f808b146ca06dbc6a91d15bd3e5650658 looks like the only one that might affect `ldrd`, but I'm really not sure.
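
To make the register-overlap reasoning concrete, here is a minimal sketch in C that decodes an A32 `LDRD (register)` encoding and applies the UNPREDICTABLE checks as I read them in the ARMv7-A manual page linked above. The function names are hypothetical and not part of GCC or binutils. The encoding `0xe18000d3` for the short example's `ldrd r0, r1, [r0, r3]` was assembled by hand and is an assumption; the other encodings come from the objdump listings in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A32 LDRD (register), encoding A1:
 *   cond 000 P U 0 W 0 Rn Rt 0000 1101 Rm
 * Bits [11:8] are should-be-zero and are not checked in this sketch. */
bool is_ldrd_register(uint32_t insn)
{
    if ((insn >> 28) == 0xfu)       /* cond 1111 is a different encoding space */
        return false;
    /* Fixed bits: [27:25]=000, [22]=0, [20]=0, [7:4]=1101 */
    return (insn & 0x0e5000f0u) == 0x000000d0u;
}

/* Returns true if the manual's decode checks (as I understand them) mark
 * this LDRD (register) encoding UNPREDICTABLE. Call only when
 * is_ldrd_register(insn) is true. */
bool ldrd_register_is_unpredictable(uint32_t insn)
{
    unsigned p  = (insn >> 24) & 1u;
    unsigned w  = (insn >> 21) & 1u;
    unsigned n  = (insn >> 16) & 0xfu;  /* Rn: base register */
    unsigned t  = (insn >> 12) & 0xfu;  /* Rt: first transfer register */
    unsigned m  = insn & 0xfu;          /* Rm: index register */
    unsigned t2 = t + 1u;               /* Rt2 is implicitly Rt + 1 */
    bool wback  = (p == 0u) || (w == 1u);

    if (t & 1u)                                         /* Rt must be even */
        return true;
    if (p == 0u && w == 1u)
        return true;
    if (t2 == 15u || m == 15u || m == t || m == t2)     /* index overlaps */
        return true;
    if (wback && (n == 15u || n == t || n == t2))       /* base overlaps */
        return true;
    return false;
}

/* Checks the encodings discussed in this report. */
void check_report_encodings(void)
{
    /* Long example, GCC 14.2.0: ldrd r2, [r0, r3]; Rm == Rt2 == r3 */
    assert(is_ldrd_register(0xe18020d3u));
    assert(ldrd_register_is_unpredictable(0xe18020d3u));

    /* Long example, GCC 13.2.0: ldrd r2, [r3] is the immediate form */
    assert(!is_ldrd_register(0xe1c320d0u));

    /* Short example, GCC 14.2.0: ldrd r0, r1, [r0, r3] (hand-assembled);
     * Rn == Rt without writeback passes these checks */
    assert(is_ldrd_register(0xe18000d3u));
    assert(!ldrd_register_is_unpredictable(0xe18000d3u));
}
```

Calling `check_report_encodings()` from a small `main` should pass silently if the decode logic above matches the manual. Under that assumption, only the long example's instruction is flagged, which is consistent with the assembler warning appearing for `ethr_atomics.s`.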