https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704

--- Comment #3 from Jens Seifert <jens.seifert at de dot ibm.com> ---
GCC 8.3 generates:
_Z3shloy:
.LFB0:
        .cfi_startproc
        addi 9,5,-64
        cmpwi 7,9,0
        blt 7,.L2
        sld 4,3,9
        li 3,0
        blr
        .p2align 4,,15
.L2:
        srdi 9,3,1
        subfic 10,5,63
        sld 4,4,5
        srd 9,9,10
        sld 3,3,5
        or 4,9,4
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
        .cfi_endproc
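
To make the GCC 8.3 listing easier to follow, here is a rough C model of it, instruction by instruction. It is a sketch, not the original C++ source from the report. The names `u128_parts`, `shl_branchy_model` and `check_branchy_model` are hypothetical. The model assumes r3 holds the low doubleword, r4 the high doubleword and r5 the shift count, with the count in 0..127. It relies on the GCC/Clang `unsigned __int128` extension as the reference result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value split across two GPRs (r3 = lo, r4 = hi). */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Model of the GCC 8.3 sequence; only valid for shift counts 0..127. */
u128_parts shl_branchy_model(uint64_t lo, uint64_t hi, unsigned s) {
    u128_parts r;
    if (s >= 64) {                     /* addi 9,5,-64 ; cmpwi 7,9,0 ; blt 7,.L2 not taken */
        r.hi = lo << (s - 64);         /* sld 4,3,9 */
        r.lo = 0;                      /* li 3,0 */
    } else {                           /* .L2 */
        /* srdi 9,3,1 ; subfic 10,5,63 ; srd 9,9,10:
           (lo >> 1) >> (63 - s) equals lo >> (64 - s) for s >= 1 and is 0 for
           s == 0, so it never needs a shift by 64. */
        uint64_t carry = (lo >> 1) >> (63 - s);
        r.hi = (hi << s) | carry;      /* sld 4,4,5 ; or 4,9,4 */
        r.lo = lo << s;                /* sld 3,3,5 */
    }
    return r;
}

/* Compare the model against the native unsigned __int128 shift for every count 0..127. */
void check_branchy_model(uint64_t lo, uint64_t hi) {
    unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;
    for (unsigned s = 0; s < 128; ++s) {
        unsigned __int128 e = v << s;
        u128_parts r = shl_branchy_model(lo, hi, s);
        assert(r.lo == (uint64_t)e && r.hi == (uint64_t)(e >> 64));
    }
}
```

The `(lo >> 1) >> (63 - s)` split is what lets the .L2 path handle `s == 0` without a separate branch.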

8 instructions if the branch to .L2 is taken. The branch-free code I proposed:

_Z15shl_branch_lessoy:
.LFB1:
        .cfi_startproc
        rldicl 5,5,0,32
        subfic 9,5,64
        addi 10,5,-64
        sld 10,3,10
        srd 9,3,9
        sld 4,4,5
        or 9,9,10
        or 4,9,4
        sld 3,3,5
        blr

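8 instructions, no branch. Almost everything can be executed in parallel: the four shifts are independent of each other, and only the two or instructions combine their results.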
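
Why the branch-free sequence is correct depends on PowerPC shift semantics. sld and srd read only the low 7 bits (bits 57:63) of the count register, and any count from 64 to 127 yields 0. C leaves shifts by 64 or more undefined, so this sketch models those semantics explicitly. The helper names (`ppc_sld`, `ppc_srd`, `shl_branch_less_model`, `check_branch_less_model`) are hypothetical, and the `rldicl` is left out. It assumes the same register roles as the listing (r3 = lo, r4 = hi, r5 = count) and uses GCC/Clang `unsigned __int128` only as a reference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value split across two GPRs (r3 = lo, r4 = hi). */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Model of sld: use the low 7 bits of the count; counts 64..127 give 0. */
static uint64_t ppc_sld(uint64_t rs, uint64_t rb) {
    unsigned n = (unsigned)(rb & 0x7F);
    return n < 64 ? rs << n : 0;
}

/* Model of srd: same count handling as sld, shifting right. */
static uint64_t ppc_srd(uint64_t rs, uint64_t rb) {
    unsigned n = (unsigned)(rb & 0x7F);
    return n < 64 ? rs >> n : 0;
}

/* Model of the branch-free sequence. t1, t2, t3 and the new lo depend only on
   the inputs, not on each other. */
u128_parts shl_branch_less_model(uint64_t lo, uint64_t hi, uint64_t s) {
    uint64_t t1 = ppc_srd(lo, 64 - s);  /* subfic 9,5,64 ; srd 9,3,9   (0 when s == 0 or s > 64) */
    uint64_t t2 = ppc_sld(lo, s - 64);  /* addi 10,5,-64 ; sld 10,3,10 (0 when s < 64) */
    uint64_t t3 = ppc_sld(hi, s);       /* sld 4,4,5                   (0 when s >= 64) */
    u128_parts r;
    r.hi = t1 | t2 | t3;                /* or 9,9,10 ; or 4,9,4 */
    r.lo = ppc_sld(lo, s);              /* sld 3,3,5 */
    return r;
}

/* Compare the model against the native unsigned __int128 shift for every count 0..127. */
void check_branch_less_model(uint64_t lo, uint64_t hi) {
    unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;
    for (unsigned s = 0; s < 128; ++s) {
        unsigned __int128 e = v << s;
        u128_parts r = shl_branch_less_model(lo, hi, s);
        assert(r.lo == (uint64_t)e && r.hi == (uint64_t)(e >> 64));
    }
}
```

Each of the three terms OR-ed into the high doubleword is forced to zero by the hardware outside its own range of s. That is why no comparison or branch is needed.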

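GCC adds the rldicl 5,5,0,32 (clearing the upper 32 bits of the shift amount), which is not necessary: sld and srd only look at the low 7 bits of the shift amount.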
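
This small C check illustrates that argument. Every count the branch-free sequence passes to sld/srd is r5, 64 - r5 or r5 - 64, and in modular arithmetic the low 7 bits of these depend only on the low 7 bits of r5. Zero-extending r5 first therefore changes nothing the shifts can see. The function names are hypothetical, and `rldicl_0_32` is only a model of `rldicl 5,5,0,32` as "rotate by 0, keep the low 32 bits".

```c
#include <assert.h>
#include <stdint.h>

/* Model of rldicl 5,5,0,32: rotate left by 0, then clear the upper 32 bits. */
uint64_t rldicl_0_32(uint64_t x) {
    return x & UINT64_C(0xFFFFFFFF);
}

/* Returns 1 if the low 7 bits of each shift count (r5, 64 - r5, r5 - 64)
   are the same with and without the zero-extension. */
int shift_counts_unaffected(uint64_t r5) {
    uint64_t z = rldicl_0_32(r5);
    return (r5 & 0x7F) == (z & 0x7F)
        && ((64 - r5) & 0x7F) == ((64 - z) & 0x7F)
        && ((r5 - 64) & 0x7F) == ((z - 64) & 0x7F);
}

/* Spot-check some counts, including ones with upper bits set. */
void check_rldicl_redundant(void) {
    const uint64_t samples[] = { 0, 1, 63, 64, 127, UINT64_C(0xFFFFFFFF00000000),
                                 UINT64_C(0x1234567800000041), UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(shift_counts_unaffected(samples[i]));
}
```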
