[Bug rtl-optimization/46514] New: 128-bit shifts on x86_64 generate silly code unless the shift amount is constant

luto at mit dot edu Tue, 16 Nov 2010 17:37:41 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46514


           Summary: 128-bit shifts on x86_64 generate silly code unless
                    the shift amount is constant
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: l...@mit.edu


Created attachment 22428
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22428
Preprocessed source

I'm using GCC 4.5.1 (Fedora 14) with -O3; -O2 generates the same code.

This really easy case, where the shift amount is known to be exactly 31:

#include <stdint.h>

uint64_t shift_test_31(__uint128_t x, uint32_t shift)
{
  if (shift != 31)
    __builtin_unreachable();

  return (uint64_t)(x >> shift);
}

generates:

0000000000000050 <shift_test_31>:
  50:   48 89 f8                mov    %rdi,%rax
  53:   48 0f ac f0 1f          shrd   $0x1f,%rsi,%rax
  58:   c3                      retq   
  59:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

which is entirely sensible.  But this, where the shift amount is only known to be less than 32:

uint64_t shift_test_le_31(__uint128_t x, uint32_t shift)
{
  if (shift >= 32)
    __builtin_unreachable();

  return (uint64_t)(x >> shift);
}

generates this:

0000000000000060 <shift_test_le_31>:
  60:   89 d1                   mov    %edx,%ecx
  62:   48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
  67:   48 89 f5                mov    %rsi,%rbp
  6a:   48 0f ad f7             shrd   %cl,%rsi,%rdi
  6e:   48 d3 ed                shr    %cl,%rbp
  71:   f6 c2 40                test   $0x40,%dl
  74:   48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
  79:   48 0f 45 fd             cmovne %rbp,%rdi
  7d:   48 8b 5c 24 f0          mov    -0x10(%rsp),%rbx
  82:   48 8b 6c 24 f8          mov    -0x8(%rsp),%rbp
  87:   48 89 f8                mov    %rdi,%rax
  8a:   c3                      retq   

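which contains a pointless shr, test, and cmovne: test $0x40 checks whether
the shift amount is 64 or more, which cannot happen here, so the cmovne never
fires and the shr result is never used.  (Even if I change the
__builtin_unreachable() into a real branch, I get the same code.)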
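To make the complaint concrete, here is a small C model of the two instruction sequences above. It is a sketch, not GCC's actual lowering: the helper names (shrd64, generic_low, narrow_low, models_match_native) are hypothetical, it assumes a GCC or Clang 64-bit target with __uint128_t, and it models the x86 rule that 64-bit shift counts are masked to 6 bits. generic_low mirrors the shrd/shr/test/cmovne sequence from shift_test_le_31, while narrow_low is the single shrd from shift_test_31, which suffices whenever the count is below 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of x86-64 SHRD r64: shift dst right by (count & 63) and fill the
   vacated high bits from src, as the hardware does. */
static uint64_t shrd64(uint64_t dst, uint64_t src, unsigned count)
{
    count &= 63;
    if (count == 0)
        return dst;              /* hardware leaves dst unchanged; C forbids src << 64 */
    return (dst >> count) | (src << (64 - count));
}

/* Hypothetical model of the general sequence emitted for shift_test_le_31:
   low 64 bits of (hi:lo) >> shift, valid for any shift in 0..127. */
uint64_t generic_low(uint64_t lo, uint64_t hi, unsigned shift)
{
    uint64_t low  = shrd64(lo, hi, shift);  /* shrd %cl,%rsi,%rdi */
    uint64_t high = hi >> (shift & 63);     /* shr  %cl,%rbp      */
    if (shift & 64)                         /* test $0x40,%dl     */
        low = high;                         /* cmovne %rbp,%rdi   */
    return low;
}

/* Hypothetical reduced form: with shift < 64 (the report only needs < 32),
   bit 6 of the count is clear, so the shrd alone gives the answer. */
uint64_t narrow_low(uint64_t lo, uint64_t hi, unsigned shift)
{
    assert(shift < 64);
    return shrd64(lo, hi, shift);
}

/* Compare both models against the compiler's own __uint128_t shift. */
bool models_match_native(void)
{
    const __uint128_t x =
        ((__uint128_t)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
    const uint64_t lo = (uint64_t)x;
    const uint64_t hi = (uint64_t)(x >> 64);

    for (unsigned s = 0; s < 128; s++) {
        if (generic_low(lo, hi, s) != (uint64_t)(x >> s))
            return false;
        /* For counts below 32 the shr/test/cmovne tail must not matter. */
        if (s < 32 && narrow_low(lo, hi, s) != generic_low(lo, hi, s))
            return false;
    }
    return true;
}
```

Calling models_match_native() from a main should return true. The second check inside the loop is the point of the report: for every count below 32, the shr/test/cmovne tail leaves the shrd result unchanged, so the value-range information from __builtin_unreachable() would be enough to drop it.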
