On Mon, May 10, 2021 at 4:12 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Mon, May 10, 2021 at 6:59 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Mon, May 10, 2021 at 3:29 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > >
> > > On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
> > > <richard.sandif...@arm.com> wrote:
> > > >
> > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >>
> > > > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> > > > >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > >> >>
> > > > >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
> > > > >> >> <richard.sandif...@arm.com> wrote:
> > > > >> >> >
> > > > >> >> > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> > > > >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
> > > > >> >> > > <richard.sandif...@arm.com> wrote:
> > > > >> >> > >>
> > > > >> >> > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> > > > >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers so that
> > > > >> >> > >> > associated hard registers can be properly spilled onto stack. But there
> > > > >> >> > >> > are cases where associated hard registers will never be spilled onto
> > > > >> >> > >> > stack. gen_reg_rtx is changed to take an argument for register alignment
> > > > >> >> > >> > so that stack realignment can be avoided when not needed.
> > > > >> >> > >>
> > > > >> >> > >> How is it guaranteed that they will never be spilled though?
> > > > >> >> > >> I don't think that that guarantee exists for any kind of pseudo,
> > > > >> >> > >> except perhaps for the temporary pseudos that the RA creates to
> > > > >> >> > >> replace (match_scratch …)es.
> > > > >> >> > >>
> > > > >> >> > >
> > > > >> >> > > The caller of creating pseudo registers with specific alignment must
> > > > >> >> > > guarantee that they will never be spilled. I am only using it in
> > > > >> >> > >
> > > > >> >> > >   /* Make operand1 a register if it isn't already. */
> > > > >> >> > >   if (can_create_pseudo_p ()
> > > > >> >> > >       && !register_operand (op0, mode)
> > > > >> >> > >       && !register_operand (op1, mode))
> > > > >> >> > >     {
> > > > >> >> > >       /* NB: Don't increase stack alignment requirement when forcing
> > > > >> >> > >          operand1 into a pseudo register to copy data from one memory
> > > > >> >> > >          location to another since it doesn't require a spill. */
> > > > >> >> > >       emit_move_insn (op0,
> > > > >> >> > >                       force_reg (GET_MODE (op0), op1,
> > > > >> >> > >                                  (UNITS_PER_WORD * BITS_PER_UNIT)));
> > > > >> >> > >       return;
> > > > >> >> > >     }
> > > > >> >> > >
> > > > >> >> > > for vector moves. RA shouldn't spill it.
> > > > >> >> >
> > > > >> >> > But this is the point: it's a case of hoping that the RA won't spill it,
> > > > >> >> > rather than having a guarantee that it won't.
> > > > >> >> >
> > > > >> >> > Even if the moves start out adjacent, they could be separated by later
> > > > >> >> > RTL optimisations, particularly scheduling. (I realise pre-RA scheduling
> > > > >> >> > isn't enabled by default for x86, but it can still be enabled explicitly.)
> > > > >> >> > Or if the same data is being copied to two locations, we might reuse
> > > > >> >> > values loaded by the first copy for the second copy as well.
> > > > >> >
> > > > >> > There are cases where pseudo vector registers are created as pure
> > > > >> > temporary registers in the backend and they shouldn't ever be spilled
> > > > >> > to stack. They will be spilled to stack only if there are other non-temporary
> > > > >> > vector register usage in which case stack will be properly re-aligned.
> > > > >> > Caller of creating pseudo registers with specific alignment guarantees
> > > > >> > that they are used only as pure temporary registers.
> > > > >>
> > > > >> I don't think there's really a distinct category of pure temporary
> > > > >> registers though. The things I mentioned above can happen for any
> > > > >> kind of pseudo register.
> > > > >
> > > > > I wonder if for the cases HJ thinks of it is appropriate to use hardregs?
> > > > > Do we generally handle those well? That is, are they again subject
> > > > > to be allocated by RA when no longer live?
> > > >
> > > > Yeah, using hard registers should work. Of course, any given fixed choice
> > > > of hard register has the potential to be suboptimal in some situation,
> > > > but it should be safe.
> > >
> > > I tried hard registers. The generated code isn't as good as pseudo registers.
> > > But I want to avoid aligning the stack when YMM registers are only used to
> > > inline memcpy/memset. Any suggestions?
> >
> > I wonder if we can mark pseudos with a new reg flag, like 'nospill' and
> > enforce this in LRA or ICE if we can't? That said, we should be able
> > to verify our assumption holds. Now, we then of course need to avoid
> > CSE re-using such pseudo in ways that could lead to spilling
> > (not sure how that could happen, but ...).
>
> Spill should be rare. It is up to backends to decide if unaligned spill
> should be used when spill does happen.

Can we transparently decide this somehow? Thus, when we didn't do stack
re-alignment, force unaligned spills?

> > Did you investigate closer what made the hardreg case generate worse
> > code? Can we hide the copies behind UNSPECs and split them late
>
> I chose XMM7 for memcpy/memset. Only XMM7 is used for memcpy
> vs XMM0/XMM1/.....
>
> > after reload? Or is that too awkward to support when generating the
> > sequence from the middle-end (I suppose it's not going via the optabs?)
>
> That is correct.
>
> > Richard.
> >
> > > Thanks.
> > >
> > > --
> > > H.J.
> >
>
> --
> H.J.
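
To make the "decide this transparently" question above concrete, here is a
minimal sketch in C of the rule being asked about: if the stack was not
re-aligned and the slot cannot be assumed to have the alignment the mode
wants, fall back to an unaligned spill. This is not GCC code. The names
`spill_kind`, `choose_spill_kind`, and the parameters are hypothetical, and
the sketch assumes that a re-aligned stack always satisfies the requested
alignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical illustration only; none of these names exist in GCC.  */
enum spill_kind { SPILL_ALIGNED, SPILL_UNALIGNED };

/* Return true if X is a nonzero power of two.  */
static bool
is_pow2 (unsigned x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Pick the spill flavour for a value that wants MODE_ALIGN_BITS of
   alignment when the incoming stack guarantees STACK_ALIGN_BITS.
   Assumption: if STACK_REALIGNED is true, the prologue re-aligned the
   stack far enough for this value, so an aligned spill is safe.  */
static enum spill_kind
choose_spill_kind (unsigned mode_align_bits, unsigned stack_align_bits,
                   bool stack_realigned)
{
  assert (is_pow2 (mode_align_bits) && is_pow2 (stack_align_bits));

  if (stack_realigned || stack_align_bits >= mode_align_bits)
    return SPILL_ALIGNED;

  /* No re-alignment was done and the slot may be under-aligned, so
     the spill has to use an unaligned store/load.  */
  return SPILL_UNALIGNED;
}
```

With a 256-bit YMM value and a 128-bit stack, this picks an unaligned spill
unless the stack was re-aligned; whether a backend can always emit such a
spill is exactly the open point in the discussion.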