This patch adds the ability to fold the address computation into the addressing mode for LDAPR instructions using LDAPUR when RCPC2 is available.
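As a rough illustration of which accesses could benefit (this is not part of the patch): LDAPUR takes a signed 9-bit unscaled byte offset, so only offsets in [-256, 255] can be folded into the addressing mode. The sketch below uses hypothetical helper names (fits_ldapur_offset, load_acquire_at, ldapur_offset_demo) to show that range next to acquire loads at constant indices. Whether GCC actually folds a given offset depends on the patterns and predicates in the patch itself, which are not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): true when a byte offset fits the
// signed 9-bit unscaled immediate of LDAPUR, i.e. the range [-256, 255].
constexpr bool fits_ldapur_offset(std::ptrdiff_t byte_offset) {
    return byte_offset >= -256 && byte_offset <= 255;
}

// Offsets of x[1] and x[31] for 8-byte elements are 8 and 248: both in range.
static_assert(fits_ldapur_offset(1 * 8), "x[1] is in range");
static_assert(fits_ldapur_offset(31 * 8), "x[31] is in range");
// x[32] would need offset 256, which no longer fits the immediate.
static_assert(!fits_ldapur_offset(32 * 8), "x[32] is out of range");
static_assert(fits_ldapur_offset(-256), "lower bound is inclusive");
static_assert(!fits_ldapur_offset(-257), "below the lower bound");

// Acquire-load the element at a compile-time index. With a constant index the
// byte offset is known to the compiler, which is the case the patch targets.
template <std::ptrdiff_t Index>
std::uint64_t load_acquire_at(std::atomic<std::uint64_t>* x) {
    return x[Index].load(std::memory_order_acquire);
}

// Small functional check: the loads return the stored values whatever
// instruction sequence the compiler selects.
inline void ldapur_offset_demo() {
    std::atomic<std::uint64_t> data[4];
    for (std::uint64_t i = 0; i < 4; ++i)
        data[i].store(10 * i, std::memory_order_relaxed);
    assert(load_acquire_at<1>(data) == 10);
    assert(load_acquire_at<3>(data) == 30);
}
```

Calling ldapur_offset_demo() from a test driver only checks the loaded values. To see the instruction selection, the generated assembly has to be inspected on an AArch64 target with RCPC2 enabled.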
LDAPUR emission is controlled by the tune flag enable_ldapur, so that it can be enabled on a per-core basis.

Earlier, the following code:

uint64_t foo (std::atomic<uint64_t> *x)
{
  return x[1].load(std::memory_order_acquire);
}

would generate:

foo(std::atomic<unsigned long>*):
        add     x0, x0, 8
        ldapr   x0, [x0]
        ret

but now generates:

foo(std::atomic<unsigned long>*):
        ldapur  x0, [x0, 8]
        ret

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soum...@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
	Add the enable_ldapur flag to control LDAPUR emission.
	* config/aarch64/aarch64.h (TARGET_ENABLE_LDAPUR): Use new flag.
	* config/aarch64/aarch64.md (any): Add ldapur_enable attribute.
	* config/aarch64/atomics.md (aarch64_atomic_load<mode>_rcpc): Modify
	to emit LDAPUR for cores with RCPC2 when enable_ldapur is set.
	(*aarch64_atomic_load<ALLX:mode>_rcpc_zext): Likewise.
	(*aarch64_atomic_load<ALLX:mode>_rcpc_sext): Modify to emit LDAPURS
	for addressing with offsets.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ldapur.c: New test.
0001-aarch64-Enable-selective-LDAPUR-generation-for-cores.patch