https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125557

            Bug ID: 125557
           Summary: Missed if-conversion of load addresses
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

The reduced testcase is:
#include <stdint.h>
#include <stddef.h>

const uint8_t *
advance_loop (const uint8_t *ip, size_t tag, const uint8_t *end)
{
  while (ip < end)
    {
      size_t tag_type = tag & 3;
      if (tag_type == 0)
        {
          size_t nlt = (tag >> 2) + 1;
          tag = ip[nlt];
          ip += nlt + 1;
        }
      else
        {
          tag = ip[tag_type];
          ip += tag_type + 1;
        }
    }
  return ip;
}

For aarch64 at -O3 LLVM manages to if-convert it into:
.LBB0_1:
        add     x8, x0, x1, lsr #2
        ands    x9, x1, #0x3
        add     x9, x0, x9
        add     x10, x8, #2
        csinc   x8, x9, x8, ne
        ldrb    w1, [x8]
        csinc   x0, x10, x9, eq
        cmp     x0, x2
        b.lo    .LBB0_1

whereas GCC keeps the conditional branches. From what I can tell this is because
both arms of the diamond contain a memory load, and GCC doesn't figure out that
it can conditionally select the address and then do a single unconditional load.

This loop is hot in the Snappy compression library, where it leads to GCC being
behind LLVM on aarch64 machines.
Perhaps the problem is that phiopt does not transform PHI <*P, *Q> into
P' = PHI <P, Q>; result = *P'?
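
For illustration, here is a hand if-converted version of the testcase, written
as a sketch of the source-level equivalent of selecting the address before the
load. The name advance_loop_ifcvt and the variable off are made up for this
example; this is not what any GCC pass currently produces. Since both arms load
from ip plus an offset, selecting the offset is equivalent to selecting the
address, and the single load only ever touches a location that the original
code would have loaded from on that iteration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical hand-transformed variant of advance_loop: the two loads
   in the arms of the diamond are replaced by one select of the offset
   followed by a single unconditional load.  */
const uint8_t *
advance_loop_ifcvt (const uint8_t *ip, size_t tag, const uint8_t *end)
{
  while (ip < end)
    {
      size_t tag_type = tag & 3;
      size_t nlt = (tag >> 2) + 1;
      /* Select the offset (and thus the address), not the loaded value.  */
      size_t off = tag_type == 0 ? nlt : tag_type;
      tag = ip[off];
      ip += off + 1;
    }
  return ip;
}
```

The select on off and the update of ip are the kind of operations that can map
to conditional-select instructions such as the csinc pair in the LLVM output
above.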
