On 10/6/2021 4:22 AM, Aldy Hernandez wrote:
[Here go the bits by Richi, tested on x86-64 Linux, and ongoing tests
on aarch64 and ppc64le.]

There is a lot of fall-out from this patch, as there were many threading
tests that assumed the restrictions introduced by this patch were valid.
Some tests have merely shifted the threading to after loop
optimizations, but others ended up with no threading opportunities at
all.  Surprisingly some tests ended up with more total threads.  It was
a crapshoot all around.

On a postive note, there are 6 tests that no longer XFAIL, and one
guality test which now passes.

I would appreciate someone reviewing the test changes.  I am unsure
whether some of the tests should be removed, XFAILed, or some other
thing.

I felt a bit queasy about such a fundamental change wrt threading, so I
ran it through my callgrind test harness (.ii files from a bootstrap).
There was no change in overall compilation, DOM, or the VRP threaders.

However, there was a slight increase of 1.63% in the backward threader.
I'm pretty sure we could reduce this if we incorporated the restrictions
into their profitability code.  This way we could stop the search when
we ran into one of these restrictions.  Not sure it's worth it at this
point.

Note, that this ad-hoc testing is not meant to replace a more thorough
SPEC, etc test.

Tested on x86-64 Linux.

OK pending tests on aarch64 and ppc64le?

Co-authored-by: Richard Biener <rguent...@suse.de>

0001-Disallow-loop-rotation-and-loop-header-crossing-in-j.patch

 From 5c0fb55b45733ec38918f5d7a576719da32e4a6c Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <al...@redhat.com>
Date: Mon, 4 Oct 2021 09:47:02 +0200
Subject: [PATCH] Disallow loop rotation and loop header crossing in jump
  threaders.

There is a lot of fall-out from this patch, as there are many threading
tests that assumed the restrictions introduced by this patch were valid.
Some tests have merely shifted the threading to after loop
optimizations, but others ended up with no threading opportunities at
all.  Surprisingly some tests ended up with more total threads.  It was
a crapshoot all around.

On a postive note, there are 6 tests that no longer XFAIL, and one
guality test which now passes.

I would appreciate someone reviewing the test changes.  I am unsure
whether some of the tests should be removed, XFAILed, or some other
thing.

I felt a bit queasy about such a fundamental change wrt threading, so I
ran it through my callgrind test harness (.ii files from a bootstrap).
There was no change in overall compilation, DOM, or the VRP threaders.

However, there was a slight increase of 1.63% in the backward threader.
I'm pretty sure we could reduce this if we incorporated the restrictions
into their profitability code.  This way we could stop the search when
we ran into one of these restrictions.  Not sure it's worth it at this
point.

Note, that this ad-hoc testing is not meant to replace a more thorough
SPEC, etc test.

Tested on x86-64 Linux.

OK pending tests on aarch64 and ppc64le?

Co-authored-by: Richard Biener <rguent...@suse.de>

gcc/ChangeLog:

        * tree-ssa-threadupdate.c (cancel_thread): Dump threading reason
        on the same line as the threading cancellation.
        (jt_path_registry::cancel_invalid_paths): Avoid rotating loops.
        Avoid threading through loop headers where the path remains in the
        loop.

libgomp/ChangeLog:

        * testsuite/libgomp.graphite/force-parallel-5.c: Remove xfail.

gcc/testsuite/ChangeLog:

        * gcc.dg/Warray-bounds-87.c: Remove xfail.
        * gcc.dg/analyzer/pr94851-2.c: Remove xfail.
        * gcc.dg/graphite/pr69728.c: Remove xfail.
        * gcc.dg/graphite/scop-dsyr2k.c: Remove xfail.
        * gcc.dg/graphite/scop-dsyrk.c: Remove xfail.
        * gcc.dg/shrink-wrap-loop.c: Remove xfail.
        * gcc.dg/loop-8.c: Adjust for new threading restrictions.
        * gcc.dg/tree-ssa/ifc-20040816-1.c: Same.
        * gcc.dg/tree-ssa/pr21559.c: Same.
        * gcc.dg/tree-ssa/pr59597.c: Same.
        * gcc.dg/tree-ssa/pr71437.c: Same.
        * gcc.dg/tree-ssa/pr77445-2.c: Same.
        * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
        * gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Same.
        * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Same.
        * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
        * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
        * gcc.dg/vect/bb-slp-16.c: Same.
        * gcc.dg/tree-ssa/ssa-thread-invalid.c: New test.
So there is some light fallout on our friend visium.  I must say that having a port which fails to link if there's a call to abort () left in the program is awful helpful :-)    I reviewed the half-dozen or so tests that fail after this change, they all look like any jump threads are for loop rotation.  So I'm inclined to ignore those and just generate new baselines for the visium port.

Some additional thoughts on the tests below.  I didn't see anything that's worth an objection, but at least in a couple cases there's ways to save the test.  In others I might have made a slightly different decision, but I can live with yours.

I think once we reach a consensus on the tests, this will be good to go.


diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
index 90ea1c45524..66318fc08dc 100644
--- a/gcc/testsuite/gcc.dg/loop-8.c
+++ b/gcc/testsuite/gcc.dg/loop-8.c
@@ -24,5 +24,9 @@ f (int *a, int *b)
/* Load of 42 is moved out of the loop, introducing a new pseudo register. */
  /* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
-/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" 
"loop2_invariant" } } */
+
+
+/* ?? The expected behavior below depends on threading the 2->3->5 path
+   in DOM2, but this is invalid since it would rotate the loop.  */
+/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" 
"loop2_invariant" { xfail *-*-* } } } */
So maybe the thing to do here since I guess we want to keep the test would be to manually rotate the loop in the source.  In theory that should restore the test to validating what we want it to validate (specifically the behavior of LICM).

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 0246ebf3c63..f83cefd8d89 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -22,6 +22,8 @@
All the cases are picked up by VRP1 as jump threads. */ -/* There used to be 6 jump threads found by thread1, but they all
-   depended on threading through distinct loops in ethread.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp-thread1" } } */
+/* This test should be obsoleted.  We used to catch 2 total threads in
+   vrp-thread1, but after adding loop rotating restrictions, we get
+   none.  Interestingly, on x86-64 we now get 1 in DOM2, 5 in DOM3,
+   and 1 in vrp-thread2.  */
+/* { dg-final { scan-tree-dump-not "Threaded" "vrp-thread1" } } */
I think that testing nothing was threaded in vrp1 is probably best. Though I wouldn't lose any sleep if this just went away.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
index 8f0a12c12ee..68808bd09fc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
@@ -1,10 +1,9 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-stats -fdump-tree-dom2-stats" } */
+/* { dg-options "-O2 -fdump-statistics" } */
void bla(); -/* In the following case, we should be able to thread edge through
-   the loop header.  */
+/* No one should thread through the loop header.  */
void thread_entry_through_header (void)
  {
@@ -14,8 +13,4 @@ void thread_entry_through_header (void)
      bla ();
  }
-/* There's a single jump thread that should be handled by the VRP
-   jump threading pass.  */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp-thread1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp-thread1"} } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
+/* { dg-final { scan-tree-dump-not "Jumps threaded" "statistics"} } */
Similarly.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index b0a7d423475..24de9d57d50 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,8 +1,12 @@
  /* { dg-do compile } */
  /* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" 
} */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering jump" 1 "thread3" } } */
+/* ?? We should obsolete this test.  All the threads in thread1 and
+   thread3 we used to get cross the loop header but does not exit the
+   loop, so they have been deemed invalid.  */
+
+/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread3" } } */
int sum0, sum1, sum2, sum3;
  int foo (char *s, char **ret)
Similarly.

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index e68a9b62535..fc3adab3fc3 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -65,5 +65,9 @@ int main (void)
    return 0;
  }
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
+/* ?? The check below depends on jump threading.  There are no a
+   couple threaded paths that are being rejected because they either
+   rotate a loop or cross the loop header without exiting the
+   loop.  */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" { xfail 
*-*-* } } } */
So we could manually rotate the loop in the source which in turn would allow this test to continue to validate SLP's behavior.

Reply via email to