Hi,

The following simple test fails when attempting to convert a vector
shift-by-scalar into a vector shift-by-vector.

typedef unsigned char v16ui __attribute__((vector_size(16)));

v16ui vslb(v16ui v, unsigned char i)
{
  return v << i;
}

When this code is gimplified, the shift amount gets expanded to an
unsigned int:

vslb (v16ui v, unsigned char i)
{
  v16ui D.2300;
  unsigned int D.2301;

  D.2301 = (unsigned int) i;
  D.2300 = v << D.2301;
  return D.2300;
}

In expand_binop, the shift-by-scalar is converted into a shift-by-vector
using expand_vector_broadcast, which produces the following rtx to be
used to initialize a V16QI vector:

(parallel:V16QI [
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
    ])

The back end eventually chokes trying to generate a copy of the SImode
expression into a QImode memory slot.

This patch fixes this problem by ensuring that the shift amount is
truncated to the inner mode of the vector when necessary.  I've added a
test case verifying correct PowerPC code generation in this case.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2015-08-31  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        * optabs.c (expand_binop): Don't create a broadcast vector with a
        source element wider than the inner mode.

[gcc/testsuite]

2015-08-31  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        * gcc.target/powerpc/vec-shift.c: New test.

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c (revision 227353)
+++ gcc/optabs.c (working copy)
@@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
       if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
         {
+          /* The scalar may have been extended to be too wide.  Truncate
+             it back to the proper size to fit in the broadcast vector.  */
+          machine_mode inner_mode = GET_MODE_INNER (mode);
+          if (GET_MODE_BITSIZE (inner_mode)
+              < GET_MODE_BITSIZE (GET_MODE (op1)))
+            op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
+                                      GET_MODE (op1));
           rtx vop1 = expand_vector_broadcast (mode, op1);
           if (vop1)
             {
Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+/* This used to ICE.  During gimplification, "i" is widened to an unsigned
+   int.  We used to fail at expand time as we tried to cram an SImode item
+   into a QImode memory slot.  This has been fixed to properly truncate the
+   shift amount when splatting it into a vector.  */
+
+typedef unsigned char v16ui __attribute__((vector_size(16)));
+
+v16ui vslb(v16ui v, unsigned char i)
+{
+  return v << i;
+}
+
+/* { dg-final { scan-assembler "vspltb" } } */
+/* { dg-final { scan-assembler "vslb" } } */
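
For reference, here is a source-level sketch of what the fix does at the
RTL level: narrow the widened shift amount back to the element type,
splat it into every lane, and shift by the resulting vector.  It is only
an illustration, not part of the patch.  The helper names
(broadcast_truncated, shift_by_vector, same_result) are made up for this
example, and it assumes a compiler that supports the GNU vector
extensions, including vector subscripting.

```c
#include <string.h>

typedef unsigned char v16ui __attribute__((vector_size(16)));

/* Shift-by-scalar, exactly as in the test case.  */
static v16ui shift_by_scalar(v16ui v, unsigned char i)
{
  return v << i;
}

/* Hypothetical helper: truncate the (possibly widened) shift amount to
   the element type, then copy it into every lane.  This mirrors the
   TRUNCATE followed by the broadcast.  */
static v16ui broadcast_truncated(unsigned int amount)
{
  unsigned char narrow = (unsigned char) amount;
  v16ui r;
  for (int k = 0; k < 16; k++)
    r[k] = narrow;
  return r;
}

/* Shift-by-vector using the broadcast shift amount.  */
static v16ui shift_by_vector(v16ui v, unsigned int amount)
{
  return v << broadcast_truncated(amount);
}

/* Returns nonzero when both forms give the same 16 bytes.  Only
   meaningful for shift amounts below the element width (8 bits).  */
static int same_result(v16ui v, unsigned char i)
{
  v16ui a = shift_by_scalar(v, i);
  v16ui b = shift_by_vector(v, (unsigned int) i);
  return memcmp(&a, &b, sizeof a) == 0;
}
```

Calling same_result with any in-range shift amount should report that
the two forms agree, which is the equivalence expand_binop relies on
when it rewrites one into the other.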