https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114057

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so I think the change is that we get to "correctly" notice

-vec.h:380:9: note: node (external) 0x6a2e9d8 (max_nunits=2, refcnt=1) vector(2) float
-vec.h:380:9: note:     stmt 0 _164 = MEM[(const real *)_27 + 8B];
-vec.h:380:9: note:     stmt 1 _158 = MEM[(const real *)_27];
+vec.h:380:9: note: node (external) 0x5a823a8 (max_nunits=2, refcnt=1) vector(2) float
+vec.h:380:9: note:     [l] stmt 0 _164 = MEM[(const real *)_27 + 8B];
+vec.h:380:9: note:     [l] stmt 1 _158 = MEM[(const real *)_27];

that these loads are live: they are the loads we do not handle because of
gaps and that were therefore promoted to external.  That leads to extra costs.

But also

+vec.h:380:9: note: node 0x5a81770 (max_nunits=2, refcnt=2) vector(2) float
 vec.h:380:9: note: op template: x_160 = _158 - _159;
 vec.h:380:9: note:     stmt 0 x_160 = _158 - _159;
-vec.h:380:9: note:     [l] stmt 1 y_163 = _161 - _162;
+vec.h:380:9: note:     stmt 1 y_163 = _161 - _162;

so y_163 isn't considered live for some reason.  We find

_123 = _117 * y_163;

is vectorized as part of a reduction.  On the costing side we then see

-_161 - _162 1 times scalar_stmt costs 12 in body
-MEM[(const real *)_27 + 4B] 1 times scalar_load costs 12 in body
-MEM[(const real *)_24 + 4B] 1 times scalar_load costs 12 in body

which are the live (and dependent) stmts that are no longer costed on the
scalar side, but also

+MEM[(const real *)_27 + 8B] 1 times vec_to_scalar costs 4 in epilogue
+MEM[(const real *)_24 + 8B] 1 times vec_to_scalar costs 4 in epilogue

costed in the vector epilogue.  This is because we're conservative, as we
don't really know whether we'll be able to code-generate the live
operation.  The costing side here is also not in sync, as can be seen
from the removed _161 - _162 op.
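
As a rough illustration of the magnitude, here is a sketch that only sums
the dump lines quoted above.  The struct and function names are made up for
this example and are not GCC internals, and it assumes these five lines are
the only costing differences, which the dumps do not guarantee.

```cpp
// Hypothetical cost record mirroring one "N times <kind> costs C" dump line.
struct CostLine {
    int count;      // the "N times" part
    int total_cost; // the "costs C" part (already the total for that line)
};

// Sum the total cost over a fixed-size list of dump lines.
template <int N>
constexpr int sum_costs(const CostLine (&lines)[N]) {
    int sum = 0;
    for (int i = 0; i < N; ++i)
        sum += lines[i].total_cost;
    return sum;
}

// Lines that disappeared from the scalar body cost:
// one scalar_stmt (12) and two scalar_loads (12 each).
constexpr CostLine removed_scalar[] = {{1, 12}, {1, 12}, {1, 12}};

// Lines that appeared in the vector epilogue cost:
// two vec_to_scalar extracts (4 each).
constexpr CostLine added_vector[] = {{1, 4}, {1, 4}};

constexpr int scalar_cost_removed() { return sum_costs(removed_scalar); }
constexpr int vector_epilogue_cost_added() { return sum_costs(added_vector); }

// How far the vector-vs-scalar comparison moves against vectorization.
constexpr int net_shift_against_vectorization() {
    return scalar_cost_removed() + vector_epilogue_cost_added();
}

static_assert(net_shift_against_vectorization() == 44,
              "36 less on the scalar side plus 8 more on the vector side");
```

So with only these lines considered, the scalar estimate drops by 36 while
the vector estimate grows by 8.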

I should also note that PURE_SLP is set a bit too early, before we
analyze operations and eventually throw away instances or prune them by
promoting ops to external.

For reductions we also falsely claim all root stmts are vectorized - we
do have remaining ops.  Fixing this restores the LIVE on them and in some
way restores vectorization.

I'm going to test this as a fix for now.
