On Mon, Jun 28, 2021 at 5:36 AM Richard Sandiford
<[email protected]> wrote:
>
> "H.J. Lu" <[email protected]> writes:
> > On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
> > <[email protected]> wrote:
> >>
> >> "H.J. Lu via Gcc-patches" <[email protected]> writes:
> >> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
> >> > <[email protected]> wrote:
> >> >>
> >> >> "H.J. Lu" <[email protected]> writes:
> >> >> > 1. Update vec_duplicate to allow to fail so that backend can only
> >> >> > allow
> >> >> > broadcasting an integer constant to a vector when broadcast
> >> >> > instruction
> >> >> > is available. This can be used by memset expander to avoid
> >> >> > vec_duplicate
> >> >> > when loading from constant pool is more efficient.
> >> >>
> >> >> I don't see any changes in target-independent code though, other than
> >> >> the doc update. It's still the case that (existing) uses of
> >> >> vec_duplicate_optab do not allow it to fail.
> >> >
> >> > I have a followup patch set on
> >> >
> >> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
> >> >
> >> > to use it to expand memset with vector broadcast:
> >> >
> >> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
> >> >
> >> > For SSE2 which doesn't have vector broadcast, the constant vector
> >> > broadcast
> >> > expander returns FAIL and load from constant pool will be used.
> >>
> >> Hmm, but as Jeff and I mentioned in the earlier replies,
> >> vec_duplicate_optab shouldn't be used for constants. Constants
> >> should go via the move expanders instead.
> >>
> >> In a previous message I suggested:
> >>
> >> … would it work to change:
> >>
> >> /* Try using vec_duplicate_optab for uniform vectors. */
> >> if (!TREE_SIDE_EFFECTS (exp)
> >> && VECTOR_MODE_P (mode)
> >> && eltmode == GET_MODE_INNER (mode)
> >> && ((icode = optab_handler (vec_duplicate_optab, mode))
> >> != CODE_FOR_nothing)
> >> && (elt = uniform_vector_p (exp)))
> >>
> >> to something like:
> >>
> >> /* Try using vec_duplicate_optab for uniform vectors. */
> >> if (!TREE_SIDE_EFFECTS (exp)
> >> && VECTOR_MODE_P (mode)
> >> && eltmode == GET_MODE_INNER (mode)
> >> && (elt = uniform_vector_p (exp)))
> >> {
> >> if (TREE_CODE (elt) == INTEGER_CST
> >> || TREE_CODE (elt) == POLY_INT_CST
> >> || TREE_CODE (elt) == REAL_CST
> >> || TREE_CODE (elt) == FIXED_CST)
> >> {
> >> rtx src = gen_const_vec_duplicate (mode, expand_normal
> >> (node));
> >> emit_move_insn (target, src);
> >> break;
> >> }
> >> …
> >> }
> >>
> >> if that code was the source of the constant operand. If we're adding a
> >> new use of vec_duplicate_optab then that should be similarly protected
> >> against constant operands.
> >>
> >
> > Your comments apply to my initial vec_duplicate patch that caused the
> > gcc.dg/pr100239.c failure. It has been fixed by
> >
> > commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
> > Author: Richard Biener <[email protected]>
> > Date: Mon Jun 7 20:08:13 2021 +0200
> >
> > middle-end/100951 - make sure to generate VECTOR_CST in lowering
> >
> > When vector lowering creates piecewise ops make sure to create
> > VECTOR_CSTs instead of CONSTRUCTORs when possible.
> >
> > The problem I am running into now is in my memset vector broadcast
> > patch. In order to optimize vector broadcast for memset, I need to
> > generate a pseudo register for
> >
> > __builtin_memset (ops, 3, 38);
> >
> > only when vector broadcast is available:
> >
> > rtx target = nullptr;
> >
> > unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
> > machine_mode vector_mode;
> > if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
> > gcc_unreachable ();
> >
> > enum insn_code icode = optab_handler (vec_duplicate_optab,
> > vector_mode);
> > if (icode != CODE_FOR_nothing)
> > {
> > rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
> > class expand_operand ops[2];
> > create_output_operand (&ops[0], reg, vector_mode);
> > create_input_operand (&ops[1], data, QImode);
> > if (maybe_expand_insn (icode, 2, ops))
> > {
> > if (!rtx_equal_p (reg, ops[0].value))
> > emit_move_insn (reg, ops[0].value);
> > target = lowpart_subreg (mode, reg, vector_mode);
> > }
> > }
> >
> > return target; <<< Return nullptr to load from constant pool.
>
> I don't think this is a correct use of vec_duplicate_optab. If the
> scalar operand is a constant then the move should always go through
> the move expanders instead, as a move from a CONST_VECTOR.
Like this?
enum insn_code icode = optab_handler (vec_duplicate_optab,
vector_mode);
if (icode != CODE_FOR_nothing)
{
rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
if (CONST_INT_P (data))
{
/* Use the move expander with CONST_VECTOR. */
rtvec v = rtvec_alloc (nunits);
for (unsigned int i = 0; i < nunits; i++)
RTVEC_ELT (v, i) = data;
rtx const_vec = gen_rtx_CONST_VECTOR (vector_mode, v);
emit_move_insn (reg, const_vec);
}
else
{
class expand_operand ops[2];
create_output_operand (&ops[0], reg, vector_mode);
create_input_operand (&ops[1], data, QImode);
expand_insn (icode, 2, ops);
if (!rtx_equal_p (reg, ops[0].value))
emit_move_insn (reg, ops[0].value);
}
target = lowpart_subreg (mode, reg, vector_mode);
}
--
H.J.