https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119640
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---

The issue is that we have an invariant shift, 1<<mask_nbits, in the loop. vectorizable_shift runs into the was_scalar_shift_arg == true, scalar_shift_arg == false, incompatible_op1_vectype_p == true case, which makes vectorizable_shift handle the required conversions itself. But that leaves SLP scheduling with no idea where to schedule the invariant shift, since it does not know that vectorizable_shift will later insert code in the preheader.

Arguably scheduling should at most lift code to the preheader, consistent with a NULL gsi from vect_init_vector, but that's difficult because that path inserts on edge (immediate), while with a gsi we insert after it. But it's not the time to mess with this.

Testing a patch.
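For readers without the testcase at hand, here is a minimal sketch of the kind of loop being described: a shift whose operands do not change across iterations. This is not the testcase from the PR and is not claimed to reproduce the failure. The function name `apply_mask` and its parameters are hypothetical; only `mask_nbits` is taken from the comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only, not the PR's testcase.
   Keeps the low mask_nbits bits of every element of in[].  */
void
apply_mask (uint32_t *out, const uint32_t *in, int n, int mask_nbits)
{
  /* Shifting a 32-bit value by 32 or more is undefined in C.  */
  assert (mask_nbits >= 0 && mask_nbits < 32);

  for (int i = 0; i < n; i++)
    {
      /* 1 << mask_nbits does not depend on i, so the shift is loop
         invariant; the shift amount is an int while the shifted value
         is a uint32_t.  */
      out[i] = in[i] & ((UINT32_C (1) << mask_nbits) - 1);
    }
}
```

In a loop of this shape the shift can be computed once before the loop, and the preheader is the natural place for that.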