https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119912

            Bug ID: 119912
           Summary: PPC: Inefficient vector immediate shifts
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jens.seifert at de dot ibm.com
  Target Milestone: ---

Shifts by <element bit width>-1 should use a 0xFF..FF constant as the shift
amount: PPC vector shifts take the amount modulo the element bit width, and
generating 0xFF..FF requires just 1 instruction.
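
A minimal scalar sketch of why this works for 32-bit elements. The two
function names are hypothetical and only model one lane of vslw and of
"vspltisw vt,-1"; this is not GCC or altivec.h code:

```c
#include <stdint.h>

/* Scalar model of one vslw lane: only the low 5 bits of the shift
   operand are used, i.e. the amount is taken modulo 32. */
uint32_t model_vslw_lane(uint32_t value, uint32_t shift_operand)
{
    return value << (shift_operand & 31u);
}

/* Scalar model of one lane produced by "vspltisw vt,-1": all bits set. */
uint32_t model_splat_minus_one(void)
{
    return UINT32_C(0xFFFFFFFF);
}
```

Under this model, shifting by the all-ones lane gives the same result as
shifting by 31, because 0xFFFFFFFF & 31 == 31.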

On Power9, always use a byte mask for the shift amount so that xxspltib can be
used.
On Power8, use vspltisb for the following shift amounts:
- value range 0..15 and 16..31 for int
- value range 0..15 for short
- value range 0..7 for byte
- value range 0..15 and 48..63 for long long
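
The ranges above can be derived with a small sketch. It assumes that
vspltisb takes a 5-bit signed immediate (-16..15) and that each vector shift
uses only the low log2(element bits) bits of the shift operand. The function
name is hypothetical:

```c
#include <stdint.h>

/* Returns a 64-bit mask whose bit n is set when a shift by n is reachable
   with a single vspltisb for elements of elem_bits bits (8, 16, 32 or 64). */
uint64_t vspltisb_reachable_shifts(unsigned elem_bits)
{
    uint64_t reachable = 0;
    for (int imm = -16; imm <= 15; ++imm) {
        /* Byte that vspltisb replicates into every byte of the register. */
        uint8_t splat_byte = (uint8_t)imm;
        /* The shift keeps only the low bits of the element's low byte. */
        unsigned shift = splat_byte & (elem_bits - 1u);
        reachable |= UINT64_C(1) << shift;
    }
    return reachable;
}
```

For 64-bit elements the negative immediates produce bytes 0xF0..0xFF, whose
low 6 bits are 48..63, which is where the second range for long long comes
from.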

A shift left by 1 implemented as an add is already done by gcc.

Sample:
#include <altivec.h>

vector unsigned int shl31(vector unsigned int in)
{
    return vec_sl(in, (vector unsigned int)vec_splats((unsigned char)31));
}

Today on Power8/9:
shl31(unsigned int __vector(4)):
.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        addis 9,2,.LC0@toc@ha
        addi 9,9,.LC0@toc@l
        lxv 32,0(9)
        vslw 2,2,0
        blr

Should be done by:
Power8:
        vspltisw 0,-1
        vslw 2,2,0
Power9:
        xxspltib 32,31
        vslw 2,2,0
