[Bug target/107359] [aarch64] should avoid the punpklo/punpkhi compare to llvm

sabson at gcc dot gnu.org via Gcc-bugs Wed, 23 Jul 2025 05:04:20 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107359


Spencer Abson <sabson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sabson at gcc dot gnu.org

--- Comment #3 from Spencer Abson <sabson at gcc dot gnu.org> ---
Analysis is failing with auto-vec mode VNx2QI because we generally don't
support reductions with partial SVE vector modes (VNx2SI in this case).  In
particular, we're hitting:

  /* For double reductions, and for SLP reductions with a neutral value,
     we construct a variable-length initial vector by loading a vector
     full of the neutral value and then shift-and-inserting the start
     values into the low-numbered elements.  */
  if ((double_reduc || neutral_op)
      && !nunits_out.is_constant ()
      && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
                                          vectype_out, OPTIMIZE_FOR_SPEED))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction on variable-length vectors requires"
                         " target support for a vector-shift-and-insert"
                         " operation.\n");
      return false;
    }

(tree-vect-loop.cc)

Here neutral_op is an INTEGER_CST with value 0.  This check might be too
conservative, since we don't actually use VEC_SHL_INSERT in
get_initial_defs_for_reduction when group_size == 1.  But, as already
suggested, we'd fail later on anyway since we don't support REDUC_PLUS with
VNx2SI.
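
To make the group_size point concrete, here is a toy C++ model of the check
quoted above.  It is not GCC code: the struct, its field names and both
functions are made up for illustration, and the "relaxed" variant is only an
assumption about what a less conservative check could look like, not a
proposed patch.

```cpp
// Toy model of the tree-vect-loop.cc check.  NOT GCC code: ReductionQuery,
// its fields, and both functions are hypothetical names for illustration.
struct ReductionQuery
{
  bool double_reduc;        // models double_reduc
  bool has_neutral_op;      // models neutral_op being non-null
  bool nunits_is_constant;  // models nunits_out.is_constant ()
  bool has_vec_shl_insert;  // models target support for IFN_VEC_SHL_INSERT
  unsigned group_size;      // number of reductions handled together
};

// Mirrors the existing condition: true means analysis gives up here.
constexpr bool
rejected_by_current_check (const ReductionQuery &q)
{
  return (q.double_reduc || q.has_neutral_op)
         && !q.nunits_is_constant
         && !q.has_vec_shl_insert;
}

// Assumed relaxation: only insist on shift-and-insert support when
// group_size != 1, since the initial vector is not built with
// VEC_SHL_INSERT in the group_size == 1 case.
constexpr bool
rejected_by_relaxed_check (const ReductionQuery &q)
{
  return rejected_by_current_check (q) && q.group_size != 1;
}
```

A query such as {false, true, false, false, 1} (roughly this case, assuming it
is not a double reduction) is rejected by the first function and accepted by
the second.  That only moves the failure: as noted above, REDUC_PLUS with
VNx2SI is still unsupported.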

After a bit of hacking, it seems that the current choice (vectorizing with mode
VNx4QI) is preferred over what we can produce using VNx2QI for generic_armv8_a
(and others).  Still, the missing support and the current codegen are
something to consider either way.

FWIW, trunk is also producing:

        movi    v31.2d, #0
        fmov    w0, s31
        ret

Which looks like a side-effect of bbro touching the return sequence a bit too
late (https://godbolt.org/z/nxa653T5a).

Spencer

[Bug target/107359] [aarch64] should avoid the punpklo/punpkhi compare to llvm
