This patch adds the ability to fold the address computation for LDAPR into the
addressing mode by emitting LDAPUR instead, when RCPC2 is available.

LDAPUR emission is controlled by the new tuning flag enable_ldapur, so that it
can be enabled on a per-core basis. Previously, the following code:

#include <atomic>
#include <cstdint>

uint64_t
foo (std::atomic<uint64_t> *x)
{
  return x[1].load(std::memory_order_acquire);
}

would generate:

foo(std::atomic<unsigned long>*):
        add     x0, x0, 8
        ldapr   x0, [x0]
        ret

but now, on a core with RCPC2 whose tuning sets enable_ldapur, generates:

foo(std::atomic<unsigned long>*):
        ldapur  x0, [x0, 8]
        ret

The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?

Signed-off-by: Soumya AR <soum...@nvidia.com>

gcc/ChangeLog:

        * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
        Add the enable_ldapur flag to control LDAPUR emission.
        * config/aarch64/aarch64.h (TARGET_ENABLE_LDAPUR): Use new flag.
        * config/aarch64/aarch64.md (any): Add ldapur_enable attribute.
        * config/aarch64/atomics.md (aarch64_atomic_load<mode>_rcpc): Modify
        to emit LDAPUR for cores with RCPC2 when enable_ldapur is set.
        (*aarch64_atomic_load<ALLX:mode>_rcpc_zext): Likewise.
        (*aarch64_atomic_load<ALLX:mode>_rcpc_sext): Modify to emit LDAPURS
        for addresses with offsets.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/ldapur.c: New test.

Attachment: 0001-aarch64-Enable-selective-LDAPUR-generation-for-cores.patch