https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121315

--- Comment #3 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Here is a reduced testcase (compile with -O3 -mcpu=neoverse-v2):

void copyReverseGeneric(int *dst, int *src) {
  for (int i = 0; i < 10000; ++i)
    dst[i] = __builtin_bswap32(src[i]);
}

Of course, using LDP/STP here would cost an extra add over the current
codegen (even auto-inc LDP/STP doesn't come for free), but it may still be
worthwhile.  I will look into it.
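
For illustration only, here is a source-level sketch of the access pattern
that pairing would rely on: two adjacent elements handled per iteration.
copyReversePairs is a hypothetical name, not something from the bug report,
and this says nothing about what GCC actually emits for either function.
As I understand it, the extra add comes from LDP/STP having only
immediate-offset and pre/post-index addressing forms, with no register-offset
form, so a single shared index register cannot address both streams.

```c
/* Reduced testcase from the comment, unchanged. */
void copyReverseGeneric(int *dst, int *src) {
  for (int i = 0; i < 10000; ++i)
    dst[i] = __builtin_bswap32(src[i]);
}

/* Hypothetical variant (illustration only): unrolled by two so that each
   iteration touches src[i], src[i + 1] and dst[i], dst[i + 1], i.e. one
   adjacent pair per stream.  The statement order matches the original
   loop, so the result is identical even if dst and src overlap.
   Assumes the trip count (10000) is even, which it is here. */
void copyReversePairs(int *dst, int *src) {
  for (int i = 0; i < 10000; i += 2) {
    dst[i] = __builtin_bswap32(src[i]);
    dst[i + 1] = __builtin_bswap32(src[i + 1]);
  }
}
```

Both functions compute the same result; the second only makes the adjacent
accesses explicit at the source level.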
