https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84547

            Bug ID: 84547
           Summary: Suboptimal code for masked shifts (ARM64)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nruslan_devel at yahoo dot com
  Target Milestone: ---

Partially related to Bug 84431 (see the description of the problem there), but
observed on ARM64 instead of x86/x86-64. (Not sure about ARM32.)

Test example:

__uint128_t func(__uint128_t a, unsigned shift)
{
    return a << (shift & 63);
}

aarch64-linux-gnu-gcc-7 -Wall -O2 -S test.c

GCC generates:
func:
    and w2, w2, 63
    mov w4, 63
    sub w5, w4, w2
    lsr x4, x0, 1
    sub w3, w2, #64
    lsl x1, x1, x2
    cmp w3, 0
    lsr x4, x4, x5
    orr x1, x4, x1
    lsl x4, x0, x3
    lsl x0, x0, x2
    csel    x1, x4, x1, ge
    csel    x0, x0, xzr, lt
    ret


While clang/llvm generates better code:

func:                                   // @func
// BB#0:
    and w8, w2, #0x3f
    lsr x9, x0, #1
    eor x11, x8, #0x3f
    lsl x10, x1, x8
    lsr x9, x9, x11
    orr     x1, x10, x9
    lsl x0, x0, x8
    ret
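
For reference, here is a minimal C sketch (not part of the original report) of
what the clang sequence above computes, assuming the AAPCS64 convention that
x0/x1 carry the low/high halves of the __uint128_t. Because the shift count is
masked to [0, 63], the 128-bit left shift can be expanded without any
conditional selects; the (lo >> 1) >> (s ^ 63) form keeps every 64-bit shift
count in range even when s == 0 (note s ^ 63 == 63 - s for s in [0, 63],
matching the eor instruction):

#include <stdint.h>

/* Illustrative expansion of a << (shift & 63) on 64-bit halves. */
static void shl128_masked(uint64_t lo, uint64_t hi, unsigned shift,
                          uint64_t *rlo, uint64_t *rhi)
{
    unsigned s = shift & 63;                     /* s is in [0, 63]          */
    *rhi = (hi << s) | ((lo >> 1) >> (s ^ 63));  /* high half, no csel needed */
    *rlo = lo << s;                              /* low half                  */
}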


Another interesting case is when __builtin_unreachable() is used:

__uint128_t func(__uint128_t a, unsigned shift)
{
    if (shift > 63)
        __builtin_unreachable();
    return a << shift;
}

But in this case, neither clang/llvm nor gcc seems to be able to optimize the
code well.
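
A possible source-level workaround (hypothetical, not from the original report,
and not verified to produce optimal code on any particular compiler version) is
to apply the same expansion by hand while keeping the __builtin_unreachable()
range hint:

static inline __uint128_t shl128(__uint128_t a, unsigned shift)
{
    if (shift > 63)
        __builtin_unreachable();   /* compiler may assume shift <= 63 */
    unsigned long long lo = (unsigned long long)a;
    unsigned long long hi = (unsigned long long)(a >> 64);
    unsigned long long rhi = (hi << shift) | ((lo >> 1) >> (shift ^ 63));
    unsigned long long rlo = lo << shift;
    return ((__uint128_t)rhi << 64) | rlo;
}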
