Re: [RFC] split pseudos during loop unrolling in RTL unroller

Jeff Law via Gcc-patches Thu, 23 Apr 2020 07:41:26 -0700

On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
> > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > But being stuck with something means no progress...  I know
> > > > > very well it's 100 times harder to get rid of something than to
> > > > > add something new ontop.
> > > > 
> > > > Well, what progress do you expect to make?  After expand that is :-)
> > > 
> > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > no CSE, ...
> > 
> > RTL CSE for example is very much required to get any good code.  It
> > needs to CSE stuff that wasn't there before expand.
> 
> Sure, but then we should fix that!
Exactly.  Its purpose largely becomes dealing with the redundancies exposed by
expansion, i.e., address arithmetic and the like.  A lot of its path following
code should be throttled back.
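
As a rough illustration of the kind of redundancy meant here (plain C standing
in for RTL, not actual compiler output): two field accesses to the same array
element each spell out their own base + index * size + offset computation once
the addressing is made explicit, and the scaled index is the part worth
commoning.  The struct and function names below are hypothetical and exist only
for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, used only for this illustration. */
struct pair { int32_t a; int32_t b; };

/* Roughly what expansion exposes: each field access carries its own
   base + i * size + offset computation, so the scaled index appears twice. */
static int32_t sum_as_expanded(const struct pair *p, size_t n, size_t i)
{
    assert(i < n);
    const char *base = (const char *)p;
    const int32_t *pa = (const int32_t *)(base + i * sizeof(struct pair)
                                          + offsetof(struct pair, a));
    const int32_t *pb = (const int32_t *)(base + i * sizeof(struct pair)
                                          + offsetof(struct pair, b));
    return *pa + *pb;
}

/* What a CSE-style cleanup is after: compute the scaled index once and
   reuse the resulting element address for both accesses. */
static int32_t sum_after_cse(const struct pair *p, size_t n, size_t i)
{
    assert(i < n);
    const char *elt = (const char *)p + i * sizeof(struct pair);
    const int32_t *pa = (const int32_t *)(elt + offsetof(struct pair, a));
    const int32_t *pb = (const int32_t *)(elt + offsetof(struct pair, b));
    return *pa + *pb;
}
```

Both functions return the same value; the second just has one multiply instead
of two, which is the sort of cleanup that only becomes visible after the
addresses are written out.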


> 
> But valid RTL is instructions that are recognized.  Which means
> when the target doesn't support an SImode add we may not create
> one.  That's instruction selection ;)
That's always a point of tension.  But I think that in general continuing to have
targets claim to support things they do not (such as double-wordsize arithmetic,
logicals, moves, etc) is a mistake.  It made sense at one time, but I think we've
got better mechanisms in place to deal with this stuff now.
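
For a concrete picture of what "double-wordsize arithmetic" decomposes into, a
minimal sketch assuming a 32-bit word: a 64-bit add is really two word-sized
adds plus a carry.  This is plain C for illustration, not a description of how
any particular target or GCC pass does the lowering; the type and helper names
are hypothetical.

```c
#include <stdint.h>

/* Hypothetical double-word value held as two 32-bit words. */
struct dword { uint32_t lo; uint32_t hi; };

/* Split a 64-bit value into its low and high words. */
static struct dword dword_split(uint64_t v)
{
    struct dword d = { (uint32_t)v, (uint32_t)(v >> 32) };
    return d;
}

/* Reassemble the 64-bit value from its two words. */
static uint64_t dword_value(struct dword d)
{
    return ((uint64_t)d.hi << 32) | d.lo;
}

/* Double-word add using only word-sized operations. */
static struct dword dword_add(struct dword x, struct dword y)
{
    struct dword r;
    r.lo = x.lo + y.lo;               /* low words, wraps modulo 2^32 */
    uint32_t carry = r.lo < x.lo;     /* wrapped iff the sum is below an operand */
    r.hi = x.hi + y.hi + carry;       /* high words plus the carry */
    return r;
}
```

Once the operation is written in this form, the individual word-sized pieces
are visible to later optimizations rather than hidden inside one instruction
the hardware does not actually have.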

> 
> > Is there something particular in postreload-gcse that is bad?  To me it
> > always is just one of those passes that doesn't do anything :-)  That
> > can and should be cleaned up, sure :-)
> 
> postreload-gcse is ad-hoc, it uses full blown gcse tools that easily
> blow up (compute_transp) when it doesn't really require it
> (I've fixed things up a bit in dc91c65378cd0e6c0).  But I wonder why,
> if we want to do PRE of loads, we don't simply schedule another
> gcse pass rather than implementing a new one.  IIRC what the pass
> does could be done with much more local dataflow.  Both
> postreload gcse and cse are major time-hogs on "bad" testcases :/
I think the biggest reason is that the existing gcse bits inherently assume they
can create new registers.  It's deeply baked into gcse.c.  There are ways around
that, but it's likely a lot of work.

> 
> > Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> > there.  But for low-level, close-to-the-machine stuff, RTL is much
> > better suited.  And we *do* want to optimise at that level as well, and
> > much more than just peepholes.
> 
> Well, everything that requires costing (unrolling, vectorization,
> IV selection to name a few) _is_ close-to-the-machine.  We're
> just saying they are not because GIMPLE is so much easier to
> work with here (not sure why exactly...).
The primary motivation behind discouraging target costing and the like from
gimple was to make it easier to implement and predict the behavior of the gimple
optimizers.   We've relaxed that somewhat, particularly for vectorization, but I
think the principle is still solid.

But I think there is a place for adding target dependencies -- and that's at the
end of the current gimple pipeline.

Jeff
