https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113651

            Bug ID: 113651
           Summary: The GCC optimizer performs poorly on a very simple
                    code snippet.
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cuking998244353 at gmail dot com
  Target Milestone: ---

I would like to report an issue with the GCC optimizer related to a specific
code snippet. The optimizer exhibits suboptimal behavior when handling a very
simple code segment. Below is the core code snippet:


#include <stdint.h>

constexpr uint32_t M = 0x04c11db7;

uint32_t calc_CRC1(const uint8_t *data, int len, uint32_t init_value = 0)
{
    uint32_t r = init_value;
    for (int i = 0; i < len; i++)
    {
        for (int j = 0; j < 8; j++)
        {
            // Issue in Code1:
            r = (r << 1) ^ (data[i] >> (7 - j) & 1) ^ (r >> 31 ? M : 0);

            // The following Code2 works as expected:
            // uint32_t flag = r >> 31 ? M : 0;
            // r = (r << 1) ^ (data[i] >> (7 - j) & 1);
            // r ^= flag;
        }
    }
    for (int i = 0; i < 32; i++)
    {
        r = (r << 1) ^ (r >> 31 ? M : 0);
    }
    return r;
}


When I replace the segment with Code2 instead of Code1, there is a noticeable
improvement in performance.

When I read the assembly code, I noticed that in the first approach, GCC
chooses to compute the result of (r << 1) ^ (data[i] >> (7 - j) & 1) and then
XOR this result with M, obtaining the desired value through cmov.

However, it is evident that there is no dependency among the three
sub-expressions here. Each sub-expression can be computed independently and
then XORed together. This optimization approach, on the contrary, results in a
lengthening of the dependency chain.

If the second code snippet is used, this issue does not arise. Such a simple
modification leads to significant differences in results, and when I switch to
compiling with Clang, there is no noticeable performance difference between the
two. I believe this may indicate the presence of a bug in the GCC optimizer.

GCC version:
gcc.exe (Rev2, Built by MSYS2 project) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

system:
windows11 (it can still be reproduced on Linux.)

cmd:
-O2 -Wall -Wextra

You can view the issue directly through this
link:https://godbolt.org/z/PG6b7xveo

Reply via email to