On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
>> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
>> <wschm...@linux.vnet.ibm.com> wrote:
>> > Hi,
>> >
>> > The following simple test fails when attempting to convert a vector
>> > shift-by-scalar into a vector shift-by-vector.
>> >
>> >   typedef unsigned char v16ui __attribute__((vector_size(16)));
>> >
>> >   v16ui vslb(v16ui v, unsigned char i)
>> >   {
>> >     return v << i;
>> >   }
>> >
>> > When this code is gimplified, the shift amount gets expanded to an
>> > unsigned int:
>> >
>> >   vslb (v16ui v, unsigned char i)
>> >   {
>> >     v16ui D.2300;
>> >     unsigned int D.2301;
>> >
>> >     D.2301 = (unsigned int) i;
>> >     D.2300 = v << D.2301;
>> >     return D.2300;
>> >   }
>> >
>> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector
>> > using expand_vector_broadcast, which produces the following rtx to be
>> > used to initialize a V16QI vector:
>> >
>> > (parallel:V16QI [
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >     ])
>> >
>> > The back end eventually chokes trying to generate a copy of the SImode
>> > expression into a QImode memory slot.
>> >
>> > This patch fixes this problem by ensuring that the shift amount is
>> > truncated to the inner mode of the vector when necessary. I've added a
>> > test case verifying correct PowerPC code generation in this case.
>> >
>> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> > regressions. Is this ok for trunk?
>> >
>> > Thanks,
>> > Bill
>> >
>> >
>> > [gcc]
>> >
>> > 2015-08-31 Bill Schmidt <wschm...@linux.vnet.ibm.com>
>> >
>> >         * optabs.c (expand_binop): Don't create a broadcast vector with a
>> >         source element wider than the inner mode.
>> >
>> > [gcc/testsuite]
>> >
>> > 2015-08-31 Bill Schmidt <wschm...@linux.vnet.ibm.com>
>> >
>> >         * gcc.target/powerpc/vec-shift.c: New test.
>> >
>> >
>> > Index: gcc/optabs.c
>> > ===================================================================
>> > --- gcc/optabs.c (revision 227353)
>> > +++ gcc/optabs.c (working copy)
>> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>> >
>> >        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>> >         {
>> > +         /* The scalar may have been extended to be too wide. Truncate
>> > +            it back to the proper size to fit in the broadcast vector. */
>> > +         machine_mode inner_mode = GET_MODE_INNER (mode);
>> > +         if (GET_MODE_BITSIZE (inner_mode)
>> > +             < GET_MODE_BITSIZE (GET_MODE (op1)))
>>
>> Does that work for modeless constants? Btw, what do other targets do
>> here? Do they also choke or do they cope with the wide operand?
>
> Good question. This works by serendipity more than by design. Because
> a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
> won't be generated. It would be better for me to put in an explicit
> check for CONST_INT rather than relying on this, though. I'll fix that.
>
> I am not sure what other targets do here; I can check. However, do you
> think that's relevant?
> I'm concerned that
>
> (parallel:V16QI [
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>     ])
>
> is a nonsensical expression and shouldn't be produced by common code, in
> my view. It seems best to make this explicitly correct. Please let me
> know if that's off-base.

No, the above indeed looks fishy, though other backends' vec_init_optab
might just handle it fine. OTOH if a conversion is required it would be
nice to CSE it, thus force the result to a register (not sure if the
targets handle invalid RTL sharing in vec_init_optab).

> Thanks,
> Bill
>
>>
>> > +           op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
>> > +                                     GET_MODE (op1));
>> >           rtx vop1 = expand_vector_broadcast (mode, op1);
>> >           if (vop1)
>> >             {
>> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
>> > ===================================================================
>> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0)
>> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy)
>> > @@ -0,0 +1,20 @@
>> > +/* { dg-do compile { target { powerpc*-*-* } } } */
>> > +/* { dg-require-effective-target powerpc_altivec_ok } */
>> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
>> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
>> > +/* { dg-options "-mcpu=power7 -O2" } */
>> > +
>> > +/* This used to ICE. During gimplification, "i" is widened to an unsigned
>> > +   int. We used to fail at expand time as we tried to cram an SImode item
>> > +   into a QImode memory slot. This has been fixed to properly truncate the
>> > +   shift amount when splatting it into a vector. */
>> > +
>> > +typedef unsigned char v16ui __attribute__((vector_size(16)));
>> > +
>> > +v16ui vslb(v16ui v, unsigned char i)
>> > +{
>> > +  return v << i;
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler "vspltb" } } */
>> > +/* { dg-final { scan-assembler "vslb" } } */
>> >
>> >
>> >
>> >
>
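
For readers following the thread, here is a rough user-level sketch of the equivalence the expander relies on: a vector shift-by-scalar should give the same result as a shift-by-vector whose lanes each hold the amount truncated to the element width. This is not the optabs.c code and does not touch RTL. It only uses the GCC vector extension from the test case, and the helper names (shift_by_scalar, shift_by_vector, same_result, check_shift_amounts) are made up for illustration. It assumes a compiler that accepts vector_size and vector subscripting, and it only tries amounts 0..7, since larger shifts of 8-bit lanes are undefined.

```c
#include <assert.h>
#include <string.h>

/* 16 unsigned bytes in one vector, as in the test case from the thread. */
typedef unsigned char v16ui __attribute__((vector_size(16)));

/* Shift-by-scalar: the form written in the source. */
static v16ui shift_by_scalar(v16ui v, unsigned char i)
{
    return v << i;
}

/* Shift-by-vector: splat the amount into every lane by hand.  The amount
   arrives as a wide unsigned int (as after gimplification) and is narrowed
   to 8 bits per lane, mirroring the truncation to the inner mode. */
static v16ui shift_by_vector(v16ui v, unsigned int wide_amount)
{
    v16ui amounts;
    for (int lane = 0; lane < 16; lane++)
        amounts[lane] = (unsigned char) wide_amount;
    return v << amounts;
}

/* Returns 1 when both forms agree for shift amount i (hypothetical helper;
   only meaningful for i in 0..7). */
static int same_result(unsigned char i)
{
    v16ui v;
    for (int lane = 0; lane < 16; lane++)
        v[lane] = (unsigned char) (lane * 17 + 3);

    v16ui a = shift_by_scalar(v, i);
    v16ui b = shift_by_vector(v, (unsigned int) i);
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Checks every well-defined shift amount for 8-bit lanes. */
static void check_shift_amounts(void)
{
    for (unsigned char i = 0; i < 8; i++)
        assert(same_result(i));
}
```

Calling check_shift_amounts() from a small main is enough to exercise it. The narrowing cast in shift_by_vector stands in for the TRUNCATE in the patch: without it, each lane would be asked to hold a 32-bit value, which is the mismatch the (parallel:V16QI [...]) dump above shows.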