https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107359
Spencer Abson <sabson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sabson at gcc dot gnu.org

--- Comment #3 from Spencer Abson <sabson at gcc dot gnu.org> ---
Analysis is failing with auto-vec mode VNx2QI because we generally don't
support reductions with partial SVE vector modes (VNx2SI in this case). In
particular, we're hitting:

  /* For double reductions, and for SLP reductions with a neutral value,
     we construct a variable-length initial vector by loading a vector
     full of the neutral value and then shift-and-inserting the start
     values into the low-numbered elements.  */
  if ((double_reduc || neutral_op)
      && !nunits_out.is_constant ()
      && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
                                          vectype_out, OPTIMIZE_FOR_SPEED))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction on variable-length vectors requires"
                         " target support for a vector-shift-and-insert"
                         " operation.\n");
      return false;
    }

(tree-vect-loop.cc)

where neutral_op is an INTEGER_CST with value 0. This check might be too
conservative, since we don't actually use VEC_SHL_INSERT in
get_initial_defs_for_reduction when group_size == 1. But, as already
suggested, we'd fail later on anyway, since we don't support REDUC_PLUS with
VNx2SI.

After a bit of hacking, it seems that the current choice (vectorizing with
mode VNx4QI) is preferred over what we can produce using VNx2QI for
generic_armv8_a (and others), but the missing support and the current codegen
are worth considering either way.

FWIW, trunk is also producing:

        movi    v31.2d, #0
        fmov    w0, s31
        ret

which looks like a side effect of bbro touching the return sequence a bit too
late (https://godbolt.org/z/nxa653T5a).

Spencer
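To make the quoted vectorizer comment concrete, here is a scalar C model (not GCC code) of the construction it describes. A "variable-length" vector is modelled as an array whose length n is only known at run time, standing in for a mode like VNx2SI. The shift-and-insert step follows the documented semantics of the vec_shl_insert_m pattern: shift every element one place away from element 0 and put the scalar in element 0. All function names are hypothetical, and inserting the start values in reverse order (so that start[0] lands in element 0) is an assumption made for illustration. The model also shows why the check may be too conservative for group_size == 1. With a single start value, one shift-insert into an all-neutral vector gives the same result as "neutral everywhere, start value in element 0", and that result needs no shifting at all. A plain element sum stands in for REDUC_PLUS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of VEC_SHL_INSERT on an n-element vector: move each
   element one position away from element 0 (the last element is dropped)
   and place SCALAR in element 0.  */
static void vec_shl_insert (int32_t *v, size_t n, int32_t scalar)
{
  if (n == 0)
    return;
  for (size_t i = n - 1; i > 0; i--)
    v[i] = v[i - 1];
  v[0] = scalar;
}

/* Build the initial reduction vector the way the quoted comment describes:
   fill it with the neutral value, then shift-and-insert the start values
   (in reverse order, an assumption) so start[k] ends up in element k.  */
static void build_initial_def (int32_t *v, size_t n, int32_t neutral,
                               const int32_t *start, size_t group_size)
{
  assert (group_size <= n);  /* every start value must fit in the vector */
  for (size_t i = 0; i < n; i++)
    v[i] = neutral;
  for (size_t k = group_size; k > 0; k--)
    vec_shl_insert (v, n, start[k - 1]);
}

/* Single start value (group_size == 1): the same vector can be produced
   directly by selecting the start value for element 0 only, with no
   shift-and-insert operation.  */
static void build_initial_def_single (int32_t *v, size_t n, int32_t neutral,
                                      int32_t start)
{
  for (size_t i = 0; i < n; i++)
    v[i] = (i == 0) ? start : neutral;
}

/* Hypothetical model of REDUC_PLUS: sum all elements into one scalar.  */
static int32_t reduc_plus (const int32_t *v, size_t n)
{
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum;
}
```

For example, with a four-element vector, a neutral value of 0 and a start value of 7, both build_initial_def (with group_size 1) and build_initial_def_single produce {7, 0, 0, 0}, and reduc_plus of that vector is 7. With start values {5, 9}, build_initial_def produces {5, 9, 0, 0}.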