"H.J. Lu" <hjl.to...@gmail.com> writes:
> On Mon, May 10, 2021 at 6:59 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>> On Mon, May 10, 2021 at 3:29 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>> >
>> > On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> > >
>> > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > > > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
>> > > > <gcc-patches@gcc.gnu.org> wrote:
>> > > >>
>> > > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> > > >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>> > > >> >>
>> > > >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
>> > > >> >> <richard.sandif...@arm.com> wrote:
>> > > >> >> >
>> > > >> >> > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> > > >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
>> > > >> >> > > <richard.sandif...@arm.com> wrote:
>> > > >> >> > >>
>> > > >> >> > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> > > >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers so that
>> > > >> >> > >> > associated hard registers can be properly spilled onto stack.  But there
>> > > >> >> > >> > are cases where associated hard registers will never be spilled onto
>> > > >> >> > >> > stack.  gen_reg_rtx is changed to take an argument for register alignment
>> > > >> >> > >> > so that stack realignment can be avoided when not needed.
>> > > >> >> > >>
>> > > >> >> > >> How is it guaranteed that they will never be spilled though?
>> > > >> >> > >> I don't think that that guarantee exists for any kind of pseudo,
>> > > >> >> > >> except perhaps for the temporary pseudos that the RA creates to
>> > > >> >> > >> replace (match_scratch …)es.
>> > > >> >> > >>
>> > > >> >> > >
>> > > >> >> > > The caller of creating pseudo registers with specific alignment must
>> > > >> >> > > guarantee that they will never be spilled.   I am only using it in
>> > > >> >> > >
>> > > >> >> > >   /* Make operand1 a register if it isn't already.  */
>> > > >> >> > >   if (can_create_pseudo_p ()
>> > > >> >> > >       && !register_operand (op0, mode)
>> > > >> >> > >       && !register_operand (op1, mode))
>> > > >> >> > >     {
>> > > >> >> > >       /* NB: Don't increase stack alignment requirement when forcing
>> > > >> >> > >          operand1 into a pseudo register to copy data from one memory
>> > > >> >> > >          location to another since it doesn't require a spill.  */
>> > > >> >> > >       emit_move_insn (op0,
>> > > >> >> > >                       force_reg (GET_MODE (op0), op1,
>> > > >> >> > >                                  (UNITS_PER_WORD * BITS_PER_UNIT)));
>> > > >> >> > >       return;
>> > > >> >> > >     }
>> > > >> >> > >
>> > > >> >> > > for vector moves.  RA shouldn't spill it.
>> > > >> >> >
>> > > >> >> > But this is the point: it's a case of hoping that the RA won't spill it,
>> > > >> >> > rather than having a guarantee that it won't.
>> > > >> >> >
>> > > >> >> > Even if the moves start out adjacent, they could be separated by later
>> > > >> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA scheduling
>> > > >> >> > isn't enabled by default for x86, but it can still be enabled explicitly.)
>> > > >> >> > Or if the same data is being copied to two locations, we might reuse
>> > > >> >> > values loaded by the first copy for the second copy as well.
>> > > >> >
>> > > >> > There are cases where pseudo vector registers are created as pure
>> > > >> > temporary registers in the backend and they shouldn't ever be spilled
>> > > >> > to stack.   They will be spilled to stack only if there is other
>> > > >> > non-temporary vector register usage, in which case the stack will be
>> > > >> > properly re-aligned.  The caller creating pseudo registers with specific
>> > > >> > alignment guarantees that they are used only as pure temporary registers.
>> > > >>
>> > > >> I don't think there's really a distinct category of pure temporary
>> > > >> registers though.  The things I mentioned above can happen for any
>> > > >> kind of pseudo register.
>> > > >
>> > > > I wonder if for the cases HJ thinks of it is appropriate to use hardregs?
>> > > > Do we generally handle those well?  That is, are they again subject
>> > > > to be allocated by RA when no longer live?
>> > >
>> > > Yeah, using hard registers should work.  Of course, any given fixed choice
>> > > of hard register has the potential to be suboptimal in some situation,
>> > > but it should be safe.
>> >
>> > I tried hard registers.  The generated code isn't as good as pseudo registers.
>> > But I want to avoid aligning the stack when YMM registers are only used to
>> > inline memcpy/memset.  Any suggestions?
>>
>> I wonder if we can mark pseudos with a new reg flag, like 'nospill' and
>> enforce this in LRA or ICE if we can't?  That said, we should be able
>> to verify our assumption holds.  Now, we then of course need to avoid
>> CSE re-using such pseudo in ways that could lead to spilling
>> (not sure how that could happen, but ...).
>
> Spill should be rare.  It is up to backends to decide if unaligned spill
> should be used when spill does happen.
>
>> Did you investigate closer what made the hardreg case generate worse
>> code?  Can we hide the copies behind UNSPECs and split them late
>
> I chose XMM7 for memcpy/memset.   Only XMM7 is used for memcpy
> vs XMM0/XMM1/.....

Could you show the kind of code you'd like to generate with multiple
registers?  Also, why doesn't register renaming hide the difference?

One option might be to:

(a) have a pass that:
    - determines which pseudos P might force stack realignment
    - tries to do very simple early RA for P, replacing the pseudos
      with hard registers
    - bails out if it can't handle all P this way, or if something
      else forces stack realignment anyway

(b) only force stack realignment for pseudos after this pass has run

E.g. the pass could be restricted to pseudos that are never live
across block boundaries.
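
To make (a) a bit more concrete, here is a toy C++ model of the bail-out
logic.  It is only a sketch and not GCC code: the Pseudo struct, the
simple_early_ra function and the way liveness is represented (one
instruction interval inside one block) are all hypothetical and exist
purely for illustration.  It assigns each candidate pseudo the lowest
hard register that no overlapping candidate already uses, and gives up
as soon as any assumption fails.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of one pseudo in P (a pseudo that might force
// stack realignment).  Liveness is simplified to a single closed
// interval of instruction numbers within one basic block.
struct Pseudo {
  int block;                    // basic block containing all references
  int first_insn;               // first instruction where it is live
  int last_insn;                // last instruction where it is live
  bool crosses_block_boundary;  // true if live across a block boundary
};

// Two pseudos conflict if they are in the same block and their live
// intervals intersect.
static bool overlaps(const Pseudo& a, const Pseudo& b) {
  return a.block == b.block
         && a.first_insn <= b.last_insn
         && b.first_insn <= a.last_insn;
}

// Very simple early RA for the pseudos in P.  Returns one hard register
// number per pseudo, or std::nullopt to signal "bail out", in which
// case the caller would fall back to forcing stack realignment.
// Assumes num_hard_regs >= 0.
std::optional<std::vector<int>>
simple_early_ra(const std::vector<Pseudo>& pseudos, int num_hard_regs,
                bool realign_forced_elsewhere) {
  // If something else forces realignment anyway, there is nothing to gain.
  if (realign_forced_elsewhere)
    return std::nullopt;

  std::vector<int> assignment;
  assignment.reserve(pseudos.size());
  for (std::size_t i = 0; i < pseudos.size(); ++i) {
    // Restrict the pass to pseudos that never live across blocks.
    if (pseudos[i].crosses_block_boundary)
      return std::nullopt;

    // Mark hard registers used by earlier, overlapping pseudos.
    std::vector<bool> taken(num_hard_regs, false);
    for (std::size_t j = 0; j < i; ++j)
      if (overlaps(pseudos[i], pseudos[j]))
        taken[assignment[j]] = true;

    // Pick the lowest free hard register, if any.
    int chosen = -1;
    for (int r = 0; r < num_hard_regs; ++r)
      if (!taken[r]) {
        chosen = r;
        break;
      }
    if (chosen < 0)
      return std::nullopt;  // cannot handle all of P: bail out
    assignment.push_back(chosen);
  }
  return assignment;
}
```

Step (b) then corresponds to the caller: stack realignment would be
forced for P only when this returns std::nullopt.  A real pass would
of course need proper liveness information and would have to respect
register classes and constraints, which this model ignores.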

This might help in other situations, not just the memcpy one.

Thanks,
Richard
