[Bug tree-optimization/114057] [14 Regression] 435.gromacs fails verification with -Ofast -march={znver2,znver4} and PGO after r14-7272-g57f611604e8bab

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 26 Mar 2024 08:32:30 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114057


--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the ref output is

-3.22397e+05
3.07684e+02
1.06621e+10

and before the change we get

-3.22205e+05
3.05161e+02
1.06660e+10

while after it is

-3.22401e+05
3.11606e+02
1.06579e+10

Vectorization differences show up in innerf.o, bondfree.o, clincs.o, coupling.o,
disre.o and update.o; all but innerf.o only show less vectorization.
Using only the "bad" version of innerf.o gets us

-3.23378e+05
3.08348e+02
1.06697e+10

which should still PASS.  Replacing all above TUs with the bad objects
reproduces the bad output.

Replacing update.o, disre.o, coupling.o or clincs.o with the GOOD version
doesn't change the output, so it's only innerf.o and bondfree.o making
a difference.  Using only BAD bondfree.o gives

-3.22265e+05
3.07882e+02
1.06644e+10

That would have been OK as well.
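
To put the outputs above side by side, here is a minimal sketch that computes
the per-value relative deviation from the ref output for each run.  The run
labels and helper names are made up for illustration, and the tolerance the
SPEC verification actually applies is not stated here, so the script only
reports deviations and does not decide PASS/FAIL.

```python
# Values copied from the outputs quoted above; labels are illustrative only.
REF = (-3.22397e+05, 3.07684e+02, 1.06621e+10)

RUNS = {
    "before": (-3.22205e+05, 3.05161e+02, 1.06660e+10),
    "after": (-3.22401e+05, 3.11606e+02, 1.06579e+10),
    "bad innerf.o only": (-3.23378e+05, 3.08348e+02, 1.06697e+10),
    "bad bondfree.o only": (-3.22265e+05, 3.07882e+02, 1.06644e+10),
}


def rel_devs(values, ref=REF):
    """Relative deviation of each value from the matching ref value."""
    return [abs(v - r) / abs(r) for v, r in zip(values, ref)]


# Largest relative deviation per run (hypothetical summary metric).
MAX_DEV = {name: max(rel_devs(vals)) for name, vals in RUNS.items()}

if __name__ == "__main__":
    for name, dev in MAX_DEV.items():
        print(f"{name:20s} max relative deviation {dev:.2e}")
```

With these numbers the "after" run deviates most, and it does so in the second
value; each single-object replacement stays closer to ref than the "before"
build does.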

The bondfree.o change is small:

@@ -2720,9 +2588,6 @@
 vec.h:252:8: optimized: basic block part vectorized using 8 byte vectors
 vec.h:419:8: optimized: basic block part vectorized using 8 byte vectors
 vec.h:419:8: optimized: basic block part vectorized using 8 byte vectors
-vec.h:378:9: optimized: basic block part vectorized using 16 byte vectors
-vec.h:379:9: optimized: basic block part vectorized using 16 byte vectors
-vec.h:380:9: optimized: basic block part vectorized using 16 byte vectors
 bondfree.c:806:16: optimized: basic block part vectorized using 8 byte vectors
 vec.h:239:8: optimized: basic block part vectorized using 8 byte vectors
 vec.h:265:8: optimized: basic block part vectorized using 8 byte vectors

while the innerf.o changes are many (but possibly similar).

I will try to understand the bondfree change first.  That's the following
change in the function idihs:

 vec.h:380:9: note: Cost model analysis for part in loop 1:
-  Vector cost: 624
-  Scalar cost: 700
-vec.h:380:9: note: Basic block will be vectorized using SLP
-vec.h:252:8: optimized: basic block part vectorized using 8 byte vectors
-vec.h:252:8: optimized: basic block part vectorized using 8 byte vectors
-vec.h:252:8: optimized: basic block part vectorized using 8 byte vectors
-vec.h:419:8: optimized: basic block part vectorized using 8 byte vectors
-vec.h:419:8: optimized: basic block part vectorized using 8 byte vectors
-vec.h:378:9: optimized: basic block part vectorized using 16 byte vectors
-vec.h:379:9: optimized: basic block part vectorized using 16 byte vectors
-vec.h:380:9: optimized: basic block part vectorized using 16 byte vectors
-vec.h:380:9: note: Vectorizing SLP tree:
-vec.h:380:9: note: node 0x345f188 (max_nunits=2, refcnt=1) vector(2) float
+  Vector cost: 640
+  Scalar cost: 532
+vec.h:380:9: missed: not vectorized: vectorization is not profitable.

where it basically changes what nodes we think are live.  Note this is
a larger graph with multiple instances so we might suffer from
what I noted in PR114413.
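
As a rough illustration of why the dump flips from "will be vectorized" to
"not profitable", the sketch below compares the two cost pairs quoted above.
It assumes the decision reduces to "vector cost must be lower than scalar
cost"; the real check in the vectorizer is more detailed, and the function
name is hypothetical.

```python
def slp_part_profitable(vector_cost, scalar_cost):
    """Simplified assumption: vectorize only if the vector code is cheaper."""
    return vector_cost < scalar_cost


# (vector cost, scalar cost) pairs from the idihs dump quoted above.
GOOD = (624, 700)
BAD = (640, 532)

# Change in each cost between the good and the bad compiler.
VECTOR_DELTA = BAD[0] - GOOD[0]
SCALAR_DELTA = BAD[1] - GOOD[1]

if __name__ == "__main__":
    print("good:", slp_part_profitable(*GOOD))
    print("bad: ", slp_part_profitable(*BAD))
    print("vector cost delta:", VECTOR_DELTA)
    print("scalar cost delta:", SCALAR_DELTA)
```

The vector cost barely moves (+16) while the scalar cost drops by 168, so the
flip comes almost entirely from the scalar side of the comparison.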

The IL has all but the call to do_dih_fup inlined into idihs.
