https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115102

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I believe this might be the middle-end using bswap32 (you can try to confirm
for SH by looking at the dump generated by -fdump-tree-optimized).

For x86_64 we get

uint32_t bswap8 (uint32_t val)
{
  unsigned int _1;
  unsigned int bswapdst_4;
  uint32_t _8;
  unsigned int _10;
  unsigned int bswapmaskdst_11;

  <bb 2> [local count: 1073741824]:
  _1 = val_7(D) & 4294901760;
  bswapdst_4 = __builtin_bswap32 (val_7(D));
  bswapmaskdst_11 = bswapdst_4 & 4294901760;
  _10 = bswapmaskdst_11 r>> 16;
  _8 = _1 | _10;
  return _8;

}

and similar assembly:

bswap8:
.LFB0:
        .cfi_startproc
        movl    %edi, %eax
        xorw    %di, %di
        bswap   %eax
        shrl    $16, %eax
        orl     %edi, %eax
        ret

though on x86 there's no high-word-preserving swap of the lower 2 bytes.
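
To make the dump easier to read, here is a rough C sketch of what it computes. The source of bswap8 is not quoted in this comment, so swap_low_bytes below is only an assumed shape for it (swap the two low bytes, keep the high 16 bits). Both function names are made up for illustration. The second function mirrors the GIMPLE statement by statement, with 4294901760 written as 0xffff0000 and the `r>> 16` rotate spelled out as two shifts.

```c
#include <stdint.h>

/* Assumed shape of the original source (not quoted in this comment):
   swap the two low bytes and keep the high 16 bits unchanged.  */
static uint32_t swap_low_bytes (uint32_t val)
{
  return (val & 0xffff0000u)
         | ((val & 0x0000ff00u) >> 8)
         | ((val & 0x000000ffu) << 8);
}

/* The same computation written the way the optimized dump does it.
   The comments name the corresponding SSA names from the dump.  */
static uint32_t swap_low_bytes_via_bswap32 (uint32_t val)
{
  uint32_t high = val & 0xffff0000u;               /* _1 */
  uint32_t swapped = __builtin_bswap32 (val);      /* bswapdst_4 */
  uint32_t masked = swapped & 0xffff0000u;         /* bswapmaskdst_11 */
  uint32_t low = (masked >> 16) | (masked << 16);  /* _10, rotate right by 16 */
  return high | low;                               /* _8 */
}
```

For example, with val = 0x11223344 the byte swap gives 0x44332211, the mask keeps 0x44330000, the rotate moves that to 0x00004433, and OR-ing in the preserved high half 0x11220000 yields 0x11224433, which is what the shift-and-mask version returns as well.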
