
peter at cordes dot ca Tue, 19 Sep 2017 10:09:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261


            Bug ID: 82261
           Summary: x86: missing peephole for SHLD / SHRD
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

unsigned shld(unsigned a, unsigned b, unsigned n){
        //n=13;
        a <<= n;        // assumes 0 < n < 32: n == 0 would make the next shift 32 (UB in C)
        b >>= (32-n); //&31;
        return a|b;
}
// https://godbolt.org/g/3jbgbR

g++ (GCC-Explorer-Build) 8.0.0 20170919 -O3 -march=haswell
        movl    $32, %eax
        subl    %edx, %eax          # missed optimization: NEG would work
        shrx    %eax, %esi, %eax
        shlx    %edx, %edi, %esi
        orl     %esi, %eax
        ret

Intel has efficient SHLD/SHRD, so this should be compiled similarly to what clang
does:

        movl    %edx, %ecx
        movl    %edi, %eax           # move first so we overwrite a mov-elimination result right away
        shldl   %cl, %esi, %eax
        retq
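As a cross-check of why this pattern maps onto SHLD, here is a minimal C model of the 32-bit `shldl %cl, %esi, %eax`. It assumes the documented semantics: the count is masked to 5 bits, the destination is shifted left and filled from the high bits of the source, and a masked count of 0 leaves the destination unchanged. The names `shld_ref`, `shld_model` and `check_all` are hypothetical. `shld_ref` is the report's function, renamed. The loop compares the two for every n in 1..31, the range where the C source is well-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* The report's test case, renamed. Defined only for 0 < n < 32:
   n == 0 would shift b by 32, which is undefined behaviour in C. */
static uint32_t shld_ref(uint32_t a, uint32_t b, unsigned n) {
    a <<= n;
    b >>= (32 - n);
    return a | b;
}

/* Hypothetical model of 32-bit SHLD dst, src, cl: concatenate dst:src
   into 64 bits, shift left by (cl & 31), keep the high 32 bits.
   A masked count of 0 returns dst unchanged. */
static uint32_t shld_model(uint32_t dst, uint32_t src, unsigned cl) {
    uint64_t wide = ((uint64_t)dst << 32) | src;
    return (uint32_t)((wide << (cl & 31u)) >> 32);
}

/* Returns the number of (a, b, n) samples where the two disagree. */
int check_all(void) {
    const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xdeadbeefu,
                                0x12345678u, 0xffffffffu};
    const size_t count = sizeof samples / sizeof samples[0];
    int mismatches = 0;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < count; j++)
            for (unsigned n = 1; n < 32u; n++)
                if (shld_ref(samples[i], samples[j], n) !=
                    shld_model(samples[i], samples[j], n))
                    mismatches++;
    return mismatches;
}
```

`check_all()` should return 0. The 64-bit concatenation is only one way to express the model. It is used here because it is defined for every count, including 0.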

Without SHLD, there's another missed optimization: shifts mask their count to
5 bits, and 32 & 31 is 0, so (32 - n) & 31 == (-n) & 31 and we could just NEG
instead of setting up a constant 32.

        shlx    %edx, %edi, %eax
        neg     %edx
        shrx    %edx, %esi, %esi
        orl     %esi, %eax
        ret
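The identity behind the NEG suggestion can be checked in plain C. For any unsigned n, (32 - n) and (0 - n) differ by exactly 32, so their low 5 bits match. The sketch below writes the right-shift count as `(0u - n) & 31u`, mirroring the masking that SHRX applies to its count. It then compares this form against the report's original expression for 1 <= n <= 31. The helper names are hypothetical. Note that the masked form is also defined for n == 0, where it returns a | b, whereas SHLD would return a.

```c
#include <stdint.h>

/* x86 32-bit shifts (including SHLX/SHRX) use only the low 5 bits of
   the count, so 32 - n and -n select the same shift amount. */
static int counts_agree(unsigned n) {
    return ((32u - n) & 31u) == ((0u - n) & 31u);
}

/* The report's expression; valid for 0 < n < 32. */
static uint32_t or_shifts_sub(uint32_t a, uint32_t b, unsigned n) {
    return (a << n) | (b >> (32u - n));
}

/* Hypothetical NEG-style variant: the right-shift count is -n masked to
   5 bits, matching what the shlx / neg / shrx sequence above computes. */
static uint32_t or_shifts_neg(uint32_t a, uint32_t b, unsigned n) {
    return (a << (n & 31u)) | (b >> ((0u - n) & 31u));
}

/* Returns the number of counts or results that disagree. */
int check_neg_trick(void) {
    int mismatches = 0;
    for (unsigned n = 0; n < 1000u; n++)
        if (!counts_agree(n))
            mismatches++;
    for (unsigned n = 1; n < 32u; n++)
        if (or_shifts_sub(0xdeadbeefu, 0x12345678u, n) !=
            or_shifts_neg(0xdeadbeefu, 0x12345678u, n))
            mismatches++;
    return mismatches;
}
```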

This *might* be worth it on AMD, where SHLD is 7 uops with one-per-3-clocks
throughput and 3-cycle latency.  Without BMI2, though, it may be better to just
use SHLD anyway.

There are various inefficiencies (extra copying of the shift count) in the
non-BMI2 output, but this bug report is supposed to be about the SHRD/SHLD
peephole.  (I didn't check for SHRD.)

[Bug target/82261] New: x86: missing peephole for SHLD / SHRD
