https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96573

            Bug ID: 96573
           Summary: [Regression] Regression in optimization on x86-64 with
                    -O3 from GCC 9 to 10
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: remi.andruccioli at gmail dot com
  Target Milestone: ---

Created attachment 49046
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49046&action=edit
This file contains the source code of the function described in the report.

Hello,

I'd like to describe here what seems to be a regression/missed optimization
since GCC 10.

The host and target architecture is x86-64.
I'm using Fedora 32 with Linux 5.7, but I could reproduce it on many other
Linux platforms.
I'm using GCC 10.2, but I could reproduce it with any GCC 10 minor version.


Here is a function, written in pure ISO C99:

#include <stdlib.h>
#include <stdint.h>

void *
ReverseBytesOfPointer(void * const pointer)
{
  const size_t maxIndex = sizeof(pointer) - 1;
  const uint8_t * const oldPointerPointer = (uint8_t*)&pointer;
  void *newPointer;
  uint8_t * const newPointerPointer = (uint8_t *)&newPointer;
  uint8_t i;

  for (i = 0; i <= maxIndex; ++i) {
    newPointerPointer[maxIndex - i] = oldPointerPointer[i];
  }

  return newPointer;
}


This function simply reverses all the bytes of a pointer. It is written in
pure C99 and is very portable: it works unchanged on machines from 16-bit to
64-bit. (I wrote it for embedded development, where I use it, and I'm happy
with it.)
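
To make the intended behaviour concrete, here is a minimal check, offered as a
sketch rather than as the unit tests from the compiler-explorer snippet. The
helper name CheckReverseBytesOfPointer is hypothetical. It compares byte
patterns with memcpy/memcmp instead of dereferencing the reversed pointer, and
it assumes the platform has no trap representations for pointer values:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The function from the report, reproduced unchanged. */
void *
ReverseBytesOfPointer(void * const pointer)
{
  const size_t maxIndex = sizeof(pointer) - 1;
  const uint8_t * const oldPointerPointer = (uint8_t*)&pointer;
  void *newPointer;
  uint8_t * const newPointerPointer = (uint8_t *)&newPointer;
  uint8_t i;

  for (i = 0; i <= maxIndex; ++i) {
    newPointerPointer[maxIndex - i] = oldPointerPointer[i];
  }

  return newPointer;
}

/* Hypothetical check helper (not part of the report). Returns 1 on success. */
int
CheckReverseBytesOfPointer(void)
{
  int object = 0;
  void *original = &object;
  void *reversed = ReverseBytesOfPointer(original);
  void *roundTrip = ReverseBytesOfPointer(reversed);
  uint8_t in[sizeof(void *)];
  uint8_t out[sizeof(void *)];
  size_t i;

  memcpy(in, &original, sizeof(in));
  memcpy(out, &reversed, sizeof(out));

  /* Byte i of the input must appear at the mirrored position of the output. */
  for (i = 0; i < sizeof(in); ++i) {
    assert(out[sizeof(in) - 1 - i] == in[i]);
  }

  /* Reversing twice must restore the original bit pattern. */
  assert(memcmp(&roundTrip, &original, sizeof(original)) == 0);

  return 1;
}
```

Because the checks depend only on sizeof(void *), the same helper should hold
on any pointer width the function supports.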

What makes it magical is that, when compiled with -O3 (and -std=c99 -Wall
-Werror -Wextra), GCC 9 (yes, 9) is clever enough to deduce the intent of this
function and compiles the whole body down to:

ReverseBytesOfPointer:
        mov     rax, rdi
        bswap   rax
        ret
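
In C terms, the three instructions above amount to a single 64-bit byte swap.
The following sketch spells that out with the __builtin_bswap64 builtin that
GCC and Clang provide. It is not portable C99 and not a replacement for the
function in this report: the name ReverseBytesOfPointerBswap64 is hypothetical,
and the code assumes 64-bit pointers (checked at compile time by the typedef).

```c
#include <stdint.h>

/* Compile-time guard: the array size is negative, and compilation fails,
   unless pointers are exactly 8 bytes wide. */
typedef char AssertPointerIs64Bit[sizeof(void *) == 8 ? 1 : -1];

/* Sketch only: relies on the GCC/Clang builtin __builtin_bswap64. */
void *
ReverseBytesOfPointerBswap64(void * const pointer)
{
  /* Go through uintptr_t so the pointer/integer conversions are explicit. */
  const uint64_t bits = (uint64_t)(uintptr_t)pointer;

  return (void *)(uintptr_t)__builtin_bswap64(bits);
}
```

For example, on x86-64 a pointer with the bit pattern 0x0102030405060708 maps
to 0x0807060504030201.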


However, since GCC 10, the magic seems to have disappeared. This is the
assembly that is generated now, with the exact same command line invocation
(this listing is in AT&T syntax, while the one above is in Intel syntax):

ReverseBytesOfPointer:
        movq    %rdi, %rax
        movb    %dil, -1(%rsp)
        movzbl  %ah, %edx
        shrq    $56, %rax
        movb    %dl, -2(%rsp)
        movq    %rdi, %rdx
        shrq    $16, %rdx
        movb    %al, -8(%rsp)
        movb    %dl, -3(%rsp)
        movq    %rdi, %rdx
        shrq    $24, %rdx
        movb    %dl, -4(%rsp)
        movq    %rdi, %rdx
        shrq    $32, %rdx
        movb    %dl, -5(%rsp)
        movq    %rdi, %rdx
        shrq    $40, %rdx
        movb    %dl, -6(%rsp)
        movq    %rdi, %rdx
        shrq    $48, %rdx
        movb    %dl, -7(%rsp)
        movq    -8(%rsp), %rax
        ret


For your convenience, I include here a link to a snippet on compiler-explorer
to show the comparison between versions 9.3 and 10.2. This snippet also
includes a few unit tests:

https://godbolt.org/z/YG1KPf


Regards,

Remi Andruccioli
