Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

Jeff Law via Gcc-patches Tue, 19 Oct 2021 16:06:42 -0700



On 10/19/2021 1:33 AM, Aldy Hernandez wrote:

On Tue, Oct 19, 2021 at 8:52 AM Richard Biener
<[email protected]> wrote:

On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez <[email protected]> wrote:



On 10/18/21 3:41 PM, Aldy Hernandez wrote:

I've been experimenting with reducing the total number of threading
passes, and I'd like to see if there's consensus/stomach for altering
the pipeline.  Note that the goal is to remove forward threader clients,
not the other way around.  So, we should prefer to remove a VRP threader
instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully-resolving mode in
the *.thread passes immediately preceding VRP, we can remove the VRP
threading passes altogether, thus removing 2 threading passes (and
forward threading passes at that!).

It occurs to me that we could also remove the threading before VRP
passes, and enable a fully-resolving backward threader after VRP.  I
haven't played with this scenario, but it should be just as good.  That
being said, I don't know the intricacies of why we had both pre and post
VRP threading passes, and if one is ideally better than the other.

It was done because they were different threaders.  Since the new threader
uses built-in VRP it shouldn't really matter whether it's before or after
VRP _for the threading_, but it might be that if threading runs before VRP
then VRP itself can do a better job on cleaning up the IL.

Good point.

FWIW, earlier this season I played with replacing the VRPs with evrp
instances (which fold far more conditionals) and I found that the
threaders can actually find FEWER opportunities after *vrp folds things
away.  I don't know if this is a good or a bad thing.  Perhaps we
should benchmark three alternatives:

This is expected. VRP and DOM will sometimes find conditionals that they can fully optimize away. If those conditionals are left in the IL, the threaders would sometimes pick them up.

So as we fold more in VRP/DOM, I'm not surprised there's fewer things for the threaders to find. In general, if a conditional can be removed by VRP/DOM, that's the preference.
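
To make the distinction concrete, here is a minimal sketch in C. The function names are made up for illustration and are not from the GCC sources or testsuite, and whether a particular GCC version folds or threads them exactly as the comments describe depends on the pass pipeline in use.

```c
/* Hypothetical illustration; not taken from the GCC sources or testsuite.  */

/* Case 1: a conditional that range analysis can fold outright.
   After the clamp, x is known to be in [0, 100], so the second
   test is always false and can simply be removed.  */
int clamp_then_test (unsigned x)
{
  if (x > 100)
    x = 100;
  if (x > 200)
    return -1;
  return (int) x;
}

/* Case 2: a conditional that cannot be folded globally, but whose
   outcome is known on each incoming path.  A jump threader can
   duplicate the path so that the x > 10 arm branches straight to
   "return 2" without re-testing flag.  */
int classify (int x)
{
  int flag;
  if (x > 10)
    flag = 1;
  else
    flag = 0;
  if (flag)
    return 2;
  return 3;
}
```

In the first function nothing is left to thread once the dead test is gone, which is the preferred outcome; the second is the kind of case where a threader still has work to do.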


1. Mainline
2. Fully resolving threader -> VRP -> No threading.
3. No threading -> VRP -> Fully resolving threader.

...and see what the actual effect is, regardless of number of threaded paths.

Speaking of which, what's the blessed way of benchmarking performance
nowadays?  I've seen some PRs fly by that measure some more lightweight
benchmarks (open source?) than a full-blown SPEC.

Can't speak for anyone else, but I benchmark these days with spec in a cycle-approximate simulator :-) I'm isolated from the whims of turbo modes and the like. Of course it takes considerably longer than running on real hardware :-)

+      /* ?? Is this still needed.  ?? */
        /* Threading can leave many const/copy propagations in the IL.
          Clean them up.  Instead of just copy_prop, we use ccp to
          compute alignment and nonzero bits.  */

Yes, it's still needed, but not for the stated reason: the VRP
substitution and folding stage should deal with copy/constant propagation,
but we replaced the former copy propagation with CCP to re-compute
nonzero bits & alignment, so I'd change the comment to

    /* Run CCP to compute alignment and nonzero bits.  */

The threaders (really the copiers) don't make much, if any, attempt to clean up the IL. So, for example, they'll often leave degenerate PHIs in the IL. We need to clean that crap up or we'll get false positives in the middle-end diagnostics.


Jeff
