On 29/09/2021 12:20, Kyrylo Tkachov via Gcc-patches wrote:
Hi all,

Similar to my previous patch for setmem, this one does the same for the
cpymem expansion.
When optimising for size, we count the number of operations emitted and
compare it against the alternative of just calling the library function.
For the code:
void
cpy_127 (char *out, char *in)
{
   __builtin_memcpy (out, in, 127);
}

void
cpy_128 (char *out, char *in)
{
   __builtin_memcpy (out, in, 128);
}

we now emit a call to memcpy (with an extra MOV-immediate instruction for the 
size) instead of:
cpy_127(char*, char*):
         ldp     q0, q1, [x1]
         stp     q0, q1, [x0]
         ldp     q0, q1, [x1, 32]
         stp     q0, q1, [x0, 32]
         ldp     q0, q1, [x1, 64]
         stp     q0, q1, [x0, 64]
         ldr     q0, [x1, 96]
         str     q0, [x0, 96]
         ldr     q0, [x1, 111]
         str     q0, [x0, 111]
         ret
cpy_128(char*, char*):
         ldp     q0, q1, [x1]
         stp     q0, q1, [x0]
         ldp     q0, q1, [x1, 32]
         stp     q0, q1, [x0, 32]
         ldp     q0, q1, [x1, 64]
         stp     q0, q1, [x0, 64]
         ldp     q0, q1, [x1, 96]
         stp     q0, q1, [x0, 96]
         ret

which is a clear code size win. Speed optimisation heuristics remain unchanged.
Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
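
As a rough illustration of the heuristic described above (not the code in
aarch64_expand_cpymem itself), the comparison can be modelled as below. The
function names, the chunk-selection rule and the LIBCALL_INSNS constant are
all hypothetical; the cost of 2 is only taken from the "MOV-immediate plus
call" case in the examples above, and the real expander may weigh things
differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a libcall costs one MOV-immediate for the size plus the
   call itself, as in the cpy_127/cpy_128 examples.  */
#define LIBCALL_INSNS 2u

/* Count load/store pairs for an inline copy of N bytes.  This is a
   simplified model, not the real expander logic.  */
static unsigned
inline_copy_ops (size_t n)
{
  const size_t total = n;
  unsigned ops = 0;

  /* Bulk of the copy: one LDP/STP of two Q registers per 32 bytes.  */
  for (; n >= 32; n -= 32)
    ops++;

  while (n > 0)
    {
      /* Largest power-of-two access (at most 16 bytes) that fits in N.  */
      size_t chunk = 16;
      while (chunk > n)
        chunk /= 2;

      /* If a leftover would remain, one access of twice the width that
         overlaps bytes already copied can finish the job, provided it
         stays within 16 bytes and within the buffer.  */
      if (chunk < n && chunk * 2 <= 16 && chunk * 2 <= total)
        n = 0;
      else
        n -= chunk;
      ops++;
    }
  return ops;
}

/* Each op is a load plus a store, so the inline cost is 2 * ops.  */
static bool
prefer_libcall_for_size (size_t n)
{
  return 2u * inline_copy_ops (n) > LIBCALL_INSNS;
}
```

Under these assumptions inline_copy_ops (127) is 5 and inline_copy_ops (128)
is 4, i.e. 10 and 8 load/store instructions against 2 for the call, which
matches the listings above.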

Thanks,
Kyrill

2021-09-29  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

        * config/aarch64/aarch64.c (aarch64_expand_cpymem): Count number of
        emitted operations and adjust heuristic for code size.

2021-09-29  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

        * gcc.target/aarch64/cpymem-size.c: New test.


Hi Kyrill,

Just to mention that the new test fails with -mabi=ilp32...

Thanks,
Christophe