Hi,

The GCC vectorizer can't vectorize the following loop even though the
target supports 2-lane SIMD left shift.

short a[256], b[256];
foo ()
{
  int i;
  for (i=0; i<256; i++)
    {
      a[i] = b[i] << 4;
    }
}

The reason seems to be that GCC promotes the source from short to int,
performs the left shift on the int type, and finally demotes the result
to convert it back to short. Below is the related tree dump:

  _2 = (intD.1) _1;
  # RANGE [-524288, 524272] NONZERO 4294967280
  _3 = _2 << 4;
  # RANGE [-32768, 32767] NONZERO 65520
  _4 = (short intD.10) _3;
  # .MEM_8 = VDEF <.MEM_14>
  aD.1888[i_13] = _4;

I checked tree-vect-patterns.c and found that there is already a pattern
recognizer, "vect_recog_over_widening_pattern", for such sequences.
However, vect_operation_fits_smaller_type only recognizes a sequence
when the promoted type is 4 times wider than the original type. The
reason seems to be that the original proposal at:

  https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01472.html

was meant to handle the following sequence, where three types are
involved and the widths satisfy T_PROMOTED = 2 * T_INTER = 4 * T_ORIG.

  T_ORIG a;
  T_PROMOTED b, c;
  T_INTER d;

  b = (T_PROMOTED) a;
  c = b << 2;
  d = (T_INTER) c;

We could also handle the following sequence, where only two types are
involved and T_PROMOTED = 2 * T_ORIG.

  T_ORIG a;
  T_PROMOTED b, c, d;

  b = (T_PROMOTED) a;
  c = b << 2;
  d = (T_ORIG) c;

Performing the left shift on T_ORIG directly should be equivalent to
performing it on T_PROMOTED and then converting back to T_ORIG, because
the conversion keeps only the low-order bits and a left shift never
moves higher-order bits into them.

x86-64/AArch64/PPC64 bootstrap OK (finished on gcc farms) and no
regression on check-gcc/g++.

gcc/
2017-09-21  Jon Beniston  <j...@beniston.com>

        * tree-vect-patterns.c (vect_operation_fits_smaller_type): Allow
        half_type for LSHIFT_EXPR.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index cdad261..0abf37c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1318,7 +1318,12 @@ vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
         break;

       case LSHIFT_EXPR:
-        /* Try intermediate type - HALF_TYPE is not enough for sure.  */
+        /* Try half_type.  */
+        if (TYPE_PRECISION (type) == TYPE_PRECISION (half_type) * 2
+            && vect_supportable_shift (code, half_type))
+          break;
+
+        /* Try intermediate type.  */
         if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
           return false;
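
For what it's worth, the equivalence claimed above (shifting in T_ORIG
gives the same bits as shifting in T_PROMOTED and truncating) can be
checked exhaustively for the 16-bit case. The following is only a
minimal sketch and is not part of the patch. The function names are made
up for illustration, and it uses unsigned types so that the C code itself
has no signed-shift undefined behaviour; the resulting bit patterns are
the same as for a two's-complement short. Because C promotes uint16_t
operands to int before shifting, the "narrow" shift is emulated one bit
at a time so that no intermediate value ever needs more than 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the GIMPLE sequence: promote, shift, then demote.  */
uint16_t shift_via_promoted(uint16_t x, unsigned n)
{
    assert(n < 16);
    uint32_t wide = (uint32_t) x;   /* b = (T_PROMOTED) a;  */
    wide <<= n;                     /* c = b << n;          */
    return (uint16_t) wide;         /* d = (T_ORIG) c;      */
}

/* Emulates a shift carried out entirely in a 16-bit lane: each step
   drops the top bit first, so the value always fits in 16 bits.  */
uint16_t shift_in_orig(uint16_t x, unsigned n)
{
    assert(n < 16);
    while (n-- > 0)
        x = (uint16_t) ((x & 0x7FFFu) << 1);
    return x;
}

/* Counts the 16-bit inputs for which the two strategies disagree.  */
unsigned count_mismatches(unsigned n)
{
    unsigned bad = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        if (shift_via_promoted((uint16_t) v, n)
            != shift_in_orig((uint16_t) v, n))
            bad++;
    return bad;
}
```

Calling count_mismatches (4) corresponds to the "<< 4" in the loop at the
top of this mail and should return 0, as it should for every shift
amount from 0 to 15.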