https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117675

            Bug ID: 117675
           Summary: ARM Cortex-A7 ldrd register overlap
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fhunl...@troodon-software.com
  Target Milestone: ---

Created attachment 59633
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59633&action=edit
Long example source code

This was found when compiling the Erlang runtime,
https://github.com/erlang/otp, for an ARM Cortex-A7 processor. A much smaller
code example is at https://godbolt.org/z/eKGzTrWTM.

The much smaller code example looks like it reproduces the issue, but its
register choice is different, so I'm including both just in case.

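In both cases, an `ldrd` instruction was emitted whose address registers
overlap its transfer registers. In the long example's listing below,
`ldrd r2, [r0, r3]` loads into the pair `r2`/`r3` (the second register is
implicit), so `r3` is both the index register and a destination: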

```
002de77c <ethr_dw_atomic_read_acqb>:
  2de77c:       e2003007        and     r3, r0, #7
  2de780:       e18020d3        ldrd    r2, [r0, r3]
  2de784:       e1c120f0        strd    r2, [r1]
  2de788:       f57ff05d        dmb     ld
  2de78c:       e12fff1e        bx      lr
```

Warnings were emitted:

```
obj/arm-buildroot-linux-gnueabihf/opt/r/ethr_atomics.s: Assembler messages:
obj/arm-buildroot-linux-gnueabihf/opt/r/ethr_atomics.s:1377: Warning: index register overlaps transfer register
...
```

At runtime, this resulted in a SIGILL on an Allwinner H3.

Reproduction details to follow.

## Long example

See the ethr_atomics.i attachment.

```
armv7-nerves-linux-gnueabihf-gcc -DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT \
  -DPOSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -Werror=undef -Werror=implicit \
  -Werror=return-type -fno-strict-aliasing -fno-common -D_LARGEFILE_SOURCE \
  -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g2 -D_FORTIFY_SOURCE=1 \
  -I/home/fhunleth/nerves/nerves_system_br/o/condor/build/erlang-27.1.2/erts/arm-buildroot-linux-gnueabihf \
  -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE \
  -Wall -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes \
  -Wdeclaration-after-statement -DHAVE_CONFIG_H -I../include \
  -I../include/arm-buildroot-linux-gnueabihf -I../include/internal \
  -I../include/internal/arm-buildroot-linux-gnueabihf -I../emulator/beam \
  -I../emulator/sys/unix -c common/ethr_atomics.c \
  -o obj/arm-buildroot-linux-gnueabihf/opt/r/ethr_atomics.o
```

`armv7-nerves-linux-gnueabihf-gcc` is a GCC 14.2.0 cross compiler built with
Crosstool-NG. Here are the specs:

```
Using built-in specs.
COLLECT_GCC=/home/fhunleth/nerves/nerves_system_br/o/condor/host/opt/ext-toolchain/bin/armv7-nerves-linux-gnueabihf-gcc
COLLECT_LTO_WRAPPER=/home/fhunleth/nerves/nerves_system_br/o/condor/host/opt/ext-toolchain/bin/../libexec/gcc/armv7-nerves-linux-gnueabihf/14.2.0/lto-wrapper
Target: armv7-nerves-linux-gnueabihf
Configured with: /tmp/ctng-work/armv7-nerves-linux-gnueabihf/src/gcc/configure
--build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu
--target=armv7-nerves-linux-gnueabihf
--prefix=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf
--exec_prefix=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf
--with-sysroot=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf/armv7-nerves-linux-gnueabihf/sysroot
--enable-languages=c,c++,fortran --with-cpu=generic-armv7-a
--with-fpu=vfpv3-d16 --with-float=hard --with-pkgversion='crosstool-NG
1.26.0.107_5595edc' --enable-__cxa_atexit --disable-libmudflap --enable-libgomp
--disable-libssp --disable-libquadmath --disable-libquadmath-support
--disable-libsanitizer --disable-libmpx
--with-gmp=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools
--with-mpfr=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools
--with-mpc=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools
--with-isl=/tmp/ctng-work/armv7-nerves-linux-gnueabihf/buildtools --enable-lto
--without-zstd --enable-threads=posix --enable-target-optspace
--enable-linker-build-id --disable-plugin --disable-nls --disable-multilib
--with-local-prefix=/home/nerves/build/nerves_toolchain_armv7_nerves_linux_gnueabihf/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-linux_x86_64-14.2.0/x-tools/armv7-nerves-linux-gnueabihf/armv7-nerves-linux-gnueabihf/sysroot
--enable-long-long
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.2.0 (crosstool-NG 1.26.0.107_5595edc)
``` 

I tried the Bootlin armv7-eabihf gcc 14.2.0 cross-compiler
(https://toolchains.bootlin.com/) and it had the same behavior. 

The C code that causes the issue is the inlined function:

```
static __inline__ ethr_sint64_t
ethr_native_su_dw_atomic_read(ethr_native_dw_atomic_t *var)
{
    /* Index c[] by the low three address bits: p = &c[0] + (&c[0] & 0x7). */
    ethr_native_dw_ptr_t p = (ethr_native_dw_ptr_t)
        (&(var)->c[(int) ((ethr_uint_t) &(var)->c[0]) & 0x7]);
    return __atomic_load_n(p, 0); /* 0 is __ATOMIC_RELAXED */
}
```
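
To make the pointer arithmetic explicit, here is a minimal sketch of the address that expression computes. `dw_aligned_addr` is a hypothetical helper for illustration only, not something from the Erlang sources, and it works on a plain integer rather than a real pointer. It assumes the union is at least 4-byte aligned, so the offset is either 0 or 4.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of erts): mirrors the index arithmetic of
 * ethr_native_su_dw_atomic_read(), where &c[addr & 0x7] is addr + (addr & 0x7).
 * Assumption: base is at least 4-byte aligned, so the offset is 0 or 4 and
 * the result is 8-byte aligned.
 */
uintptr_t dw_aligned_addr(uintptr_t base)
{
    uintptr_t offset = base & 0x7; /* corresponds to `and r3, r0, #7` */
    return base + offset;          /* the add, or the [r0, r3] addressing mode */
}
```

This base-plus-masked-base sum is what the compiler has to turn into either a separate `add` or a register-offset addressing mode on the load.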

GCC 13.2.0 does not have the issue. It computes the address with a separate
`add` and then uses a plain `ldrd r2, [r3]`:

```
002dc0b0 <ethr_dw_atomic_read_acqb>:
  2dc0b0:       e2003007        and     r3, r0, #7
  2dc0b4:       e0833000        add     r3, r3, r0
  2dc0b8:       e1c320d0        ldrd    r2, [r3]
  2dc0bc:       e1c120f0        strd    r2, [r1]
  2dc0c0:       f57ff05f        dmb     sy
  2dc0c4:       e12fff1e        bx      lr
```

## Short example

The Godbolt example at https://godbolt.org/z/eKGzTrWTM tries to replicate the
issue. Here's the C:

```c
#include <stdint.h>

#define ETHR_DW_NATMC_ALIGN_MASK__ 0x7
#define ETHR_DW_NATMC_MEM__(VAR) \
   (&(VAR)->c[(int) ((uint32_t) &(VAR)->c[0]) & ETHR_DW_NATMC_ALIGN_MASK__])
typedef union {
    volatile int64_t dw_sint;
    volatile int32_t sint[3];
    volatile char c[4*3];
} ethr_native_dw_atomic_t;

int64_t test(ethr_native_dw_atomic_t *x)
{
    uint64_t *p = (uint64_t *) ETHR_DW_NATMC_MEM__(x);
    return __atomic_load_n(p, __ATOMIC_RELAXED);
}
```

With GCC 14.2.0 (flags=-marm -O3 -mlibarch=armv7ve+simd -march=armv7ve+simd
-mcpu=cortex-a7):

```
test:
        and     r3, r0, #7
        ldrd    r0, r1, [r0, r3]
        bx      lr
```

Switch to GCC 14.1.0 with the same flags and the following assembler is
produced:

```
test:
        and     r3, r0, #7
        add     r3, r3, r0
        ldrd    r0, r1, [r3]
        bx      lr
```

Looking at
https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Instruction-Details/Alphabetical-list-of-instructions/LDRD--register-?lang=en,
the short example's use of `r0` as both Rt and Rn looks OK: that overlap is
only listed as UNPREDICTABLE when writeback is used, and there is none here.
The long example's overlap of Rt2 and Rm (both `r3`) hits one of the
UNPREDICTABLE conditions.
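
For reference, here is a rough sketch of the operand checks as I read the encoding A1 pseudocode for LDRD (register) on that page. The struct and function names are made up for illustration, and the ARMv6-and-earlier condition is left out since the target here is ARMv7.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an A32 LDRD (register) instruction, encoding A1.
 * Illustrative type, not taken from any existing tool. */
struct ldrd_reg {
    unsigned rn; /* base register */
    unsigned rt; /* first transfer register; Rt2 is implicitly rt + 1 */
    unsigned rm; /* index register */
    bool p, w;   /* P (index) and W (writeback) bits */
};

/* Returns true and fills *out if insn matches the LDRD (register) pattern. */
bool ldrd_reg_decode(uint32_t insn, struct ldrd_reg *out)
{
    /* Fixed bits: [27:25]=000, [22]=0, [20]=0, [11:8]=0000, [7:4]=1101. */
    if ((insn & 0x0e500ff0u) != 0x000000d0u)
        return false;
    out->p  = (insn >> 24) & 1u;
    out->w  = (insn >> 21) & 1u;
    out->rn = (insn >> 16) & 0xfu;
    out->rt = (insn >> 12) & 0xfu;
    out->rm = insn & 0xfu;
    return true;
}

/* Applies the UNPREDICTABLE operand conditions of encoding A1
 * (the ArchVersion() < 6 condition is omitted). */
bool ldrd_reg_unpredictable(const struct ldrd_reg *d)
{
    unsigned t = d->rt, t2 = d->rt + 1, n = d->rn, m = d->rm;
    bool wback = !d->p || d->w;

    if (t & 1u)                                   /* Rt must be even */
        return true;
    if (!d->p && d->w)
        return true;
    if (t2 == 15 || m == 15 || m == t || m == t2) /* index overlaps transfer */
        return true;
    if (wback && (n == 15 || n == t || n == t2))  /* base overlap, writeback only */
        return true;
    return false;
}
```

Fed the word from the long example's disassembly, `0xe18020d3`, this decodes to Rn=`r0`, Rt=`r2`, Rm=`r3` and reports UNPREDICTABLE because Rm equals Rt2. The short example's `ldrd r0, r1, [r0, r3]`, hand-encoded here as `0xe18000d3` (an assumption, not taken from a disassembly), passes the checks.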

Interestingly enough, if you remove `-marm` with GCC 14.2.0, the assembler is
also ok. I didn't expect that, so perhaps that's a hint.

Looking through the differences between GCC 14.1.0 and GCC 14.2.0, the commit
at
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=83332e3f808b146ca06dbc6a91d15bd3e5650658
looks like the only one that might affect `ldrd`, but I'm really not sure.
