On Mon, Jul 26, 2021 at 3:56 PM H.J. Lu <[email protected]> wrote:
>
> On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford
> <[email protected]> wrote:
> >
> > "H.J. Lu via Gcc-patches" <[email protected]> writes:
> > > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford
> > > <[email protected]> wrote:
> > >>
> > >> "H.J. Lu via Gcc-patches" <[email protected]> writes:
> > >> > +to avoid stack realignment when expanding memset. The default is
> > >> > +@code{gen_reg_rtx}.
> > >> > +@end deftypefn
> > >> > +
> > >> > @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned
> > >> > @var{nunroll}, class loop *@var{loop})
> > >> > This target hook returns a new value for the number of times
> > >> > @var{loop}
> > >> > should be unrolled. The parameter @var{nunroll} is the number of times
> > >> > […]
> > >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
> > >> > max_size = STORE_MAX_PIECES + 1;
> > >> > while (max_size > 1 && l > 0)
> > >> > {
> > >> > - scalar_int_mode mode = widest_int_mode_for_size (max_size);
> > >> > + /* Since this can be called before virtual registers are ready
> > >> > + to use, avoid QI vector mode here. */
> > >> > + fixed_size_mode mode
> > >> > + = widest_fixed_size_mode_for_size (max_size, false);
> > >>
> > >> I think I might have asked this before, sorry, but: when is that true
> > >> and why does it matter?
> > >
> > > can_store_by_pieces may be called:
> > >
> > > value-prof.c: if (!can_store_by_pieces (val, builtin_memset_read_str,
> > > value-prof.c: if (!can_store_by_pieces (val, builtin_memset_read_str,
> > >
> > > before virtual registers can be used. When true is passed to
> > > widest_fixed_size_mode_for_size, virtual registers may be used
> > > to expand memset to broadcast, which leads to ICE. Since for the
> > > purpose of can_store_by_pieces, we don't need to expand memset
> > > to broadcast and pass false here can avoid ICE.
> >
> > Ah, I see, thanks.
> >
> > That sounds like a problem in the way that the memset const function is
> > written though. can_store_by_pieces is just a query function, so I don't
> > think it should be trying to create new registers for can_store_by_pieces,
> > even if it could. At the same time, can_store_by_pieces should make the
> > same choices as the real expander would.
> >
> > I think this means that:
> >
> > - gen_memset_broadcast should be inlined into its callers, with the
> > builtin_memset_read_str getting the CONST_INT_P case and
> > builtin_memset_gen_str getting the variable case.
> >
> > - builtin_memset_read_str should then stop at and return the
> > gen_const_vec_duplicate when the prev argument is null.
This doesn't work since can_store_by_pieces has
cst = (*constfun) (constfundata, nullptr, offset, mode);
if (!targetm.legitimate_constant_p (mode, cst))
ix86_legitimate_constant_p only allows 0 or -1 for CONST_VECTOR.
can_store_by_pieces doesn't work well for vector modes.
> > Only when prev is nonnull should it go on to call the hook
> > and copy the constant to the register that the hook returns.
>
> How about keeping gen_memset_broadcast and passing PREV to it:
>
> rtx target;
> if (CONST_INT_P (data))
> {
> rtx const_vec = gen_const_vec_duplicate (mode, data);
> if (prev == NULL)
> /* Return CONST_VECTOR when called by a query function. */
> target = const_vec;
> else
> {
> /* Use the move expander with CONST_VECTOR. */
> target = targetm.gen_memset_scratch_rtx (mode);
> emit_move_insn (target, const_vec);
> }
> }
> else
> {
> target = targetm.gen_memset_scratch_rtx (mode);
> class expand_operand ops[2];
> create_output_operand (&ops[0], target, mode);
> create_input_operand (&ops[1], data, QImode);
> expand_insn (icode, 2, ops);
> if (!rtx_equal_p (target, ops[0].value))
> emit_move_insn (target, ops[0].value);
> }
>
> > I admit that's uglier than the current version, but it looks like that's
> > what the current interface expects.
> >
> > Thanks,
> > Richard
>
>
>
> --
> H.J.
--
H.J.