"H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> On Mon, Jun 28, 2021 at 5:36 AM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>> > On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >>
>> >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> >> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
>> >> > <richard.sandif...@arm.com> wrote:
>> >> >>
>> >> >> "H.J. Lu" <hjl.to...@gmail.com> writes:
>> >> >> > 1. Update vec_duplicate to allow to fail so that backend can only 
>> >> >> > allow
>> >> >> > broadcasting an integer constant to a vector when broadcast 
>> >> >> > instruction
>> >> >> > is available.  This can be used by memset expander to avoid 
>> >> >> > vec_duplicate
>> >> >> > when loading from constant pool is more efficient.
>> >> >>
>> >> >> I don't see any changes in target-independent code though, other than
>> >> >> the doc update.  It's still the case that (existing) uses of
>> >> >> vec_duplicate_optab do not allow it to fail.
>> >> >
>> >> > I have a followup patch set on
>> >> >
>> >> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
>> >> >
>> >> > to use it to expand memset with vector broadcast:
>> >> >
>> >> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
>> >> >
>> >> > For SSE2 which doesn't have vector broadcast, the constant vector 
>> >> > broadcast
>> >> > expander returns FAIL and load from constant pool will be used.
>> >>
>> >> Hmm, but as Jeff and I mentioned in the earlier replies,
>> >> vec_duplicate_optab shouldn't be used for constants.  Constants
>> >> should go via the move expanders instead.
>> >>
>> >> In a previous message I suggested:
>> >>
>> >>   … would it work to change:
>> >>
>> >>         /* Try using vec_duplicate_optab for uniform vectors.  */
>> >>         if (!TREE_SIDE_EFFECTS (exp)
>> >>             && VECTOR_MODE_P (mode)
>> >>             && eltmode == GET_MODE_INNER (mode)
>> >>             && ((icode = optab_handler (vec_duplicate_optab, mode))
>> >>                 != CODE_FOR_nothing)
>> >>             && (elt = uniform_vector_p (exp)))
>> >>
>> >>   to something like:
>> >>
>> >>         /* Try using vec_duplicate_optab for uniform vectors.  */
>> >>         if (!TREE_SIDE_EFFECTS (exp)
>> >>             && VECTOR_MODE_P (mode)
>> >>             && eltmode == GET_MODE_INNER (mode)
>> >>             && (elt = uniform_vector_p (exp)))
>> >>           {
>> >>             if (TREE_CODE (elt) == INTEGER_CST
>> >>                 || TREE_CODE (elt) == POLY_INT_CST
>> >>                 || TREE_CODE (elt) == REAL_CST
>> >>                 || TREE_CODE (elt) == FIXED_CST)
>> >>               {
>> >>                 rtx src = gen_const_vec_duplicate (mode, expand_normal 
>> >> (node));
>> >>                 emit_move_insn (target, src);
>> >>                 break;
>> >>               }
>> >>             …
>> >>           }
>> >>
>> >> if that code was the source of the constant operand.  If we're adding a
>> >> new use of vec_duplicate_optab then that should be similarly protected
>> >> against constant operands.
>> >>
>> >
>> > Your comments apply to my initial vec_duplicate patch that caused the
>> > gcc.dg/pr100239.c failure.  It has been fixed by
>> >
>> > commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
>> > Author: Richard Biener <rguent...@suse.de>
>> > Date:   Mon Jun 7 20:08:13 2021 +0200
>> >
>> >     middle-end/100951 - make sure to generate VECTOR_CST in lowering
>> >
>> >     When vector lowering creates piecewise ops make sure to create
>> >     VECTOR_CSTs instead of CONSTRUCTORs when possible.
>> >
>> > The problem I am running into now is in my memset vector broadcast
>> > patch.  In order to optimize vector broadcast for memset, I need to
>> > generate a pseudo register for
>> >
>> >  __builtin_memset (ops, 3, 38);
>> >
>> > only when vector broadcast is available:
>> >
>> >   rtx target = nullptr;
>> >
>> >   unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
>> >   machine_mode vector_mode;
>> >   if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
>> >     gcc_unreachable ();
>> >
>> >   enum insn_code icode = optab_handler (vec_duplicate_optab,
>> >                                         vector_mode);
>> >   if (icode != CODE_FOR_nothing)
>> >     {
>> >       rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
>> >       class expand_operand ops[2];
>> >       create_output_operand (&ops[0], reg, vector_mode);
>> >       create_input_operand (&ops[1], data, QImode);
>> >       if (maybe_expand_insn (icode, 2, ops))
>> >         {
>> >           if (!rtx_equal_p (reg, ops[0].value))
>> >             emit_move_insn (reg, ops[0].value);
>> >           target = lowpart_subreg (mode, reg, vector_mode);
>> >         }
>> >     }
>> >
>> >   return target;  <<< Return nullptr to load from constant pool.
>>
>> I don't think this is a correct use of vec_duplicate_optab.  If the
>> scalar operand is a constant then the move should always go through
>> the move expanders instead, as a move from a CONST_VECTOR.
>
> Like this?
>
>   enum insn_code icode = optab_handler (vec_duplicate_optab,
>                                         vector_mode);
>   if (icode != CODE_FOR_nothing)
>     {
>       rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
>       if (CONST_INT_P (data))
>         {
>           /* Use the move expander with CONST_VECTOR.  */
>           rtvec v = rtvec_alloc (nunits);
>           for (unsigned int i = 0; i < nunits; i++)
>             RTVEC_ELT (v, i) = data;
>           rtx const_vec = gen_rtx_CONST_VECTOR (vector_mode, v);

FWIW, this is:

        rtx const_vec = gen_const_vec_duplicate (vector_mode, data);

>           emit_move_insn (reg, const_vec);
>         }
>       else
>         {
>
>           class expand_operand ops[2];
>           create_output_operand (&ops[0], reg, vector_mode);
>           create_input_operand (&ops[1], data, QImode);
>           expand_insn (icode, 2, ops);
>           if (!rtx_equal_p (reg, ops[0].value))
>             emit_move_insn (reg, ops[0].value);
>         }
>       target = lowpart_subreg (mode, reg, vector_mode);
>     }

I guess what I don't understand here is why we want to move a
“vector_mode” constant and take the “mode” subreg of the result.
Won't that get folded to a “mode” move anyway?  It seems more natural
to emit it as a “mode” move from the start.

Also, IMO it looks odd to be guarding the constant case with the
optab check.

Perhaps this is more obvious in the context of the wider patch though.

But yeah, the use of the optab itself looks good.

Thanks,
Richard

Reply via email to