On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> "H.J. Lu" <hjl.to...@gmail.com> writes:
> >> > 1. Update vec_duplicate to allow to fail so that backend can only allow
> >> > broadcasting an integer constant to a vector when broadcast instruction
> >> > is available.  This can be used by memset expander to avoid vec_duplicate
> >> > when loading from constant pool is more efficient.
> >>
> >> I don't see any changes in target-independent code though, other than
> >> the doc update.  It's still the case that (existing) uses of
> >> vec_duplicate_optab do not allow it to fail.
> >
> > I have a followup patch set on
> >
> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
> >
> > to use it to expand memset with vector broadcast:
> >
> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
> >
> > For SSE2 which doesn't have vector broadcast, the constant vector broadcast
> > expander returns FAIL and load from constant pool will be used.
>
> Hmm, but as Jeff and I mentioned in the earlier replies,
> vec_duplicate_optab shouldn't be used for constants.  Constants
> should go via the move expanders instead.
>
> In a previous message I suggested:
>
>   … would it work to change:
>
>         /* Try using vec_duplicate_optab for uniform vectors.  */
>         if (!TREE_SIDE_EFFECTS (exp)
>             && VECTOR_MODE_P (mode)
>             && eltmode == GET_MODE_INNER (mode)
>             && ((icode = optab_handler (vec_duplicate_optab, mode))
>                 != CODE_FOR_nothing)
>             && (elt = uniform_vector_p (exp)))
>
>   to something like:
>
>         /* Try using vec_duplicate_optab for uniform vectors.  */
>         if (!TREE_SIDE_EFFECTS (exp)
>             && VECTOR_MODE_P (mode)
>             && eltmode == GET_MODE_INNER (mode)
>             && (elt = uniform_vector_p (exp)))
>           {
>             if (TREE_CODE (elt) == INTEGER_CST
>                 || TREE_CODE (elt) == POLY_INT_CST
>                 || TREE_CODE (elt) == REAL_CST
>                 || TREE_CODE (elt) == FIXED_CST)
>               {
>                 rtx src = gen_const_vec_duplicate (mode, expand_normal 
> (node));
>                 emit_move_insn (target, src);
>                 break;
>               }
>             …
>           }
>
> if that code was the source of the constant operand.  If we're adding a
> new use of vec_duplicate_optab then that should be similarly protected
> against constant operands.
>

Your comments apply to my initial vec_duplicate patch that caused the
gcc.dg/pr100239.c failure.  It has been fixed by

commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
Author: Richard Biener <rguent...@suse.de>
Date:   Mon Jun 7 20:08:13 2021 +0200

    middle-end/100951 - make sure to generate VECTOR_CST in lowering

    When vector lowering creates piecewise ops make sure to create
    VECTOR_CSTs instead of CONSTRUCTORs when possible.

The problem I am running into now is in my memset vector broadcast
patch.  In order to optimize vector broadcast for memset, I need to
generate a pseudo register for

 __builtin_memset (ops, 3, 38);

only when vector broadcast is available:

  rtx target = nullptr;

  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
  machine_mode vector_mode;
  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
    gcc_unreachable ();

  enum insn_code icode = optab_handler (vec_duplicate_optab,
                                        vector_mode);
  if (icode != CODE_FOR_nothing)
    {
      rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
      class expand_operand ops[2];
      create_output_operand (&ops[0], reg, vector_mode);
      create_input_operand (&ops[1], data, QImode);
      if (maybe_expand_insn (icode, 2, ops))
        {
          if (!rtx_equal_p (reg, ops[0].value))
            emit_move_insn (reg, ops[0].value);
          target = lowpart_subreg (mode, reg, vector_mode);
        }
    }

  return target;  <<< Return nullptr to load from constant pool.

-- 
H.J.

Reply via email to