https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704
--- Comment #3 from Jens Seifert <jens.seifert at de dot ibm.com> ---
GCC 8.3 generates:

_Z3shloy:
.LFB0:
        .cfi_startproc
        addi 9,5,-64
        cmpwi 7,9,0
        blt 7,.L2
        sld 4,3,9
        li 3,0
        blr
        .p2align 4,,15
.L2:
        srdi 9,3,1
        subfic 10,5,63
        sld 4,4,5
        srd 9,9,10
        sld 3,3,5
        or 4,9,4
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
        .cfi_endproc

8 instructions if taking L2.

The branch-free code I proposed:

_Z15shl_branch_lessoy:
.LFB1:
        .cfi_startproc
        rldicl 5,5,0,32
        subfic 9,5,64
        addi 10,5,-64
        sld 10,3,10
        srd 9,3,9
        sld 4,4,5
        or 9,9,10
        or 4,9,4
        sld 3,3,5
        blr

8 instructions, no branch. Almost everything can be executed in parallel.

rldicl 5,5,0,32 gets added by gcc, which is not necessary.
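
For readers who want to check the branch-free sequence off-target, here is a minimal C++ sketch of what it computes. It is not the source that produced the listing above. It assumes the 128-bit value is held as two 64-bit halves (low half in r3, high half in r4, as the listings suggest), and it models the Power `sld`/`srd` behaviour of using the low 7 bits of the count and returning 0 for counts 64..127. The type `U128` and the helpers `sld`, `srd`, `shl_reference` and `matches_reference` are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper type: a 128-bit value as two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Model of Power `sld`: only the low 7 bits of the count are used, and
// counts 64..127 produce 0. The ternary exists only in this portable model;
// on Power this is a single instruction.
inline std::uint64_t sld(std::uint64_t x, std::uint64_t n) {
    n &= 127;
    return n < 64 ? x << n : 0;
}

// Model of Power `srd`, with the same count handling as `sld`.
inline std::uint64_t srd(std::uint64_t x, std::uint64_t n) {
    n &= 127;
    return n < 64 ? x >> n : 0;
}

// Mirrors the branch-free instruction sequence; intended for n in 0..127.
inline U128 shl_branch_less(U128 v, std::uint64_t n) {
    std::uint64_t from_lo_small = srd(v.lo, 64 - n);  // nonzero only for n in 1..64
    std::uint64_t from_lo_large = sld(v.lo, n - 64);  // nonzero only for n in 64..127
    std::uint64_t hi = sld(v.hi, n) | from_lo_small | from_lo_large;
    std::uint64_t lo = sld(v.lo, n);
    return {lo, hi};
}

// Straightforward branching reference, used only for comparison.
inline U128 shl_reference(U128 v, unsigned n) {
    if (n == 0) return v;
    if (n >= 64) return {0, v.lo << (n - 64)};
    return {v.lo << n, (v.hi << n) | (v.lo >> (64 - n))};
}

// Compares both versions for every shift count from 0 to 127.
inline bool matches_reference(U128 v) {
    for (unsigned n = 0; n < 128; ++n) {
        U128 a = shl_branch_less(v, n);
        U128 b = shl_reference(v, n);
        if (a.lo != b.lo || a.hi != b.hi) return false;
    }
    return true;
}
```

At n = 64 both contributions from the low half equal the unshifted low half, so OR-ing them is harmless. For every other count at most one of them is nonzero, which is why no compare or branch is needed.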