https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118125

--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Redirecting the call to operator delete[](void*) to
__builtin_unreachable(), which seems the correct thing to do, leads to
one more SLP vectorization in the function experiencing the slow-down.
Comparing the -fopt-info-optimized output gives:

@@ -188,6 +188,7 @@
 /home/mjambor/gcc/mine/inst/include/c++/15.0.1/bits/stl_tree.h:206:27:
optimized: basic block part vectorized using 16 byte vectors
 include/lac/vector.h:990:12: optimized: basic block part vectorized using 8
byte vectors
 include/lac/vector.h:979:31: optimized: basic block part vectorized using 8
byte vectors
+include/lac/solver_gmres.h:498:14: optimized: basic block part vectorized
using 16 byte vectors
 include/lac/vector.h:949:3: optimized: basic block part vectorized using 8
byte vectors
 /home/mjambor/gcc/mine/inst/include/c++/15.0.1/bits/stl_tree.h:206:27:
optimized: basic block part vectorized using 16 byte vectors
 include/lac/vector.h:990:12: optimized: basic block part vectorized using 8
byte vectors

I have managed to avoid that one particular SLP vectorization using
-fdbg-cnt=ipa_update_vr:2175-2175:3013-3013,vect_slp:1-61:63-9999 and
was able to get the original performance back.  Unfortunately, when I
then looked at SLP vectorization with all IPA-VR propagations
allowed again, this particular case was not there (but there were
plenty of others).

If I can read the SLP dump correctly (which is a big if), the vectorization
produced the following change:

@@ -30911,15 +26466,16 @@
   # DEBUG this => NULL
   # DEBUG i => NULL
   c_790 = *_789;
+  _272 = {c_790, c_790};
   # DEBUG c => c_790
   # DEBUG this => D#400
   # DEBUG i => D#396
   _792 = _229 + _785;
   # DEBUG this => NULL
   # DEBUG i => NULL
-  # DEBUG dummy => D__lsm0.1125_879
-  _794 = c_790 * D__lsm0.1125_879;
-  _795 = i_758 + 1;
+  # DEBUG dummy => D__lsm0.1125_214
+  _794 = D__lsm0.1125_214 * c_790;
+  _795 = i_754 + 1;
   # DEBUG D#397 => (unsigned int) _795
   # DEBUG this => D#400
   # DEBUG i => D#397
@@ -30929,13 +26485,16 @@
   # DEBUG this => NULL
   # DEBUG i => NULL
   _799 = *_798;
+  _273 = {D__lsm0.1125_214, _799};
   _800 = s_787 * _799;
   _801 = _794 + _800;
-  *_792 = _801;
-  _1111 = s_787 * D__lsm0.1125_879;
+  _1110 = D__lsm0.1125_214 * s_787;
+  _252 = {_800, _1110};
+  vect__2237.1174_737 = .VEC_FMSUBADD (_273, _272, _252);
   _805 = c_790 * _799;
-  _41 = _805 - _1111;
-  *_798 = _41;
+  _41 = _805 - _1110;
+  vectp.1176_781 = _792;
+  MEM <vector(2) double> [(double &)vectp.1176_781] = vect__2237.1174_737;
   # DEBUG i => _795
   if (_298 > _795)
     goto <bb 405>; [89.00%]
