https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118657

            Bug ID: 118657
           Summary: Missed optimization (unreachable branch could be
                    pruned after taking into account the possible values
                    of a constexpr lookup table)
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nicula.iccc at gmail dot com
  Target Milestone: ---

For the following code, GCC (14.2 and the current trunk) generates suboptimal
assembly:

    #include <cstddef>
    #include <array>
    #include <cstdint>
    #include <cassert>

    using u8 = uint8_t;

    constexpr size_t DATA_SIZE = 1024;
    constexpr std::array<size_t, 256> TO_DATA_INDEX = {};

    bool foo(int* data, u8 first_idx)
    {
        size_t second_idx = TO_DATA_INDEX[first_idx];
        if (second_idx >= DATA_SIZE)
            return false;

        data[second_idx] = 1;
        return true;
    }

Assembly from godbolt.org with -O3 (https://godbolt.org/z/5f17onGYT):

    foo(int*, unsigned char):
            movzx   esi, sil
            xor     eax, eax
            mov     rdx, QWORD PTR TO_DATA_INDEX[0+rsi*8]
            cmp     rdx, 1023
            ja      .L1
            mov     DWORD PTR [rdi+rdx*4], 1
            mov     eax, 1
    .L1:
            ret
    TO_DATA_INDEX:
            .zero   2048

GCC misses an optimization because it doesn't realize that the 'cmp rax, 1023'
check is redundant, because the only possible value that 'second_idx' can have
is 0.

Clang, on the other hand, sees this, and removes the dead branch. Clang's
assembly with -O3 (same link):

    foo(int*, unsigned char):
            mov     dword ptr [rdi], 1
            mov     al, 1
            ret

Those checks/branches could become significant, especially if you have multiple
layers through which you run this kind of transformation (first_idx ->
second_idx -> third_idx -> ... -> last_idx), and with bounds checking after
each layer. Given that in some contexts the value of 'last_index' is derivable
at compile-time (like in the example I gave above), it would be great if GCC
applied this optimization.

Reply via email to