https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

            Bug ID: 82261
           Summary: x86: missing peephole for SHLD / SHRD
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

unsigned shld(unsigned a, unsigned b, unsigned n){
    //n=13;
    a <<= n;
    b >>= (32-n); //&31;
    return a|b;
}
// https://godbolt.org/g/3jbgbR

g++ (GCC-Explorer-Build) 8.0.0 20170919  -O3 -march=haswell

        movl    $32, %eax
        subl    %edx, %eax       # missed optimization: NEG would work
        shrx    %eax, %esi, %eax
        shlx    %edx, %edi, %esi
        orl     %esi, %eax
        ret

Intel has efficient SHLD/SHRD, so this should be compiled similar to what
clang does:

        movl    %edx, %ecx
        movl    %edi, %eax    # move first so we overwrite a mov-elimination result right away
        shldl   %cl, %esi, %eax
        retq

Without SHLD, there's another missed optimization: shifts mask their count,
and 32 & 31 is 0, so we could just NEG instead of setting up a constant 32.

        shlx    %edx, %edi, %eax
        neg     %edx
        shrx    %edx, %esi, %esi
        orl     %esi, %eax
        ret

This *might* be worth it on AMD, where SHLD is 7 uops and one per 3 clock
throughput/latency.  Without BMI2, though, it may be good to just use SHLD
anyway.

There are various inefficiencies (extra copying of the shift count) in the
non-BMI2 output, but this bug report is supposed to be about the SHRD/SHLD
peephole.  (I didn't check for SHRD).
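
The NEG observation in the report can be checked in portable C. The sketch below is illustrative only and is not part of the original report: the helper names (shld_sub32, shld_neg, shld_ref, check_shift_counts, check_all_counts) are hypothetical. It assumes a count in the range 1..31, because the function as written has undefined behaviour for n == 0 (a shift by 32), and it applies the `& 31` mask explicitly to mimic what the hardware does to 32-bit shift counts.

```c
#include <assert.h>
#include <stdint.h>

/* Count computed as 32 - n, as in the reported GCC output.
   Assumption: only called with 1 <= n <= 31. */
static uint32_t shld_sub32(uint32_t a, uint32_t b, unsigned n) {
    return (a << n) | (b >> (32u - n));
}

/* Count computed by negation. The explicit "& 31" stands in for the
   masking that x86 applies to 32-bit shift counts. */
static uint32_t shld_neg(uint32_t a, uint32_t b, unsigned n) {
    return (a << n) | (b >> ((0u - n) & 31u));
}

/* Reference double shift: high 32 bits of the 64-bit value a:b
   shifted left by n. */
static uint32_t shld_ref(uint32_t a, uint32_t b, unsigned n) {
    uint64_t wide = ((uint64_t)a << 32) | b;
    return (uint32_t)((wide << n) >> 32);
}

/* 32 & 31 is 0, so (32 - n) and (-n) agree once masked. */
static void check_shift_counts(void) {
    for (unsigned n = 1; n < 32; n++)
        assert(((32u - n) & 31u) == ((0u - n) & 31u));
}

/* Returns 1 if both formulations match the reference for every
   count in 1..31, otherwise 0. */
static int check_all_counts(uint32_t a, uint32_t b) {
    for (unsigned n = 1; n < 32; n++) {
        if (shld_sub32(a, b, n) != shld_ref(a, b, n)) return 0;
        if (shld_neg(a, b, n) != shld_ref(a, b, n)) return 0;
    }
    return 1;
}
```

Calling check_shift_counts() and then check_all_counts() with a few arbitrary operand pairs is enough to exercise the identity. A count of 0 is deliberately left out: besides the undefined behaviour in C, a masked count of 0 would OR in all of b, whereas SHLD with a zero count leaves the destination unchanged.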