https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115438

--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> 
> This might in the end be fallout of different sinking?!
> 
> One difference between SLP and non-SLP is that with SLP we use the
> incoming value as the initial reduction value, while with non-SLP we
> use zero as the initial reduction value and compensate in the epilogue:
> 
>   _1615 = {tmp_111, 0.0, 0.0, 0.0};
>   # _1619 = PHI <_1618(116), _1615(119)>
> ...
>   _1623 = .REDUC_PLUS (vect_tmp_1505.835_1621);
> 
> vs.
> 
>   # _1346 = PHI <_1345(98), { 0.0, 0.0, 0.0, 0.0 }(94)>
> ...
>   _1385 = .REDUC_PLUS (vect_tmp_1268.744_1383);
>   _1386 = tmp_710 + _1385;
> 
> so while the profile clearly shows a difference between GCC 14.2 and trunk,
> I can't yet pinpoint what makes the difference.

I guess the non-SLP case happens to break the critical path for .REDUC_PLUS
between the main loop and the epilogue loop, which enables more parallelism
(like partial sums).
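
To illustrate the guess above, here is a rough C sketch of the two shapes.
It is only an illustration under assumptions: the function names are
hypothetical, the four vector lanes are emulated with a scalar array, and it
is not the code GCC actually emits for this testcase. In the first function
the accumulator is seeded with the incoming scalar, so the whole loop sits on
the dependency chain of that scalar. In the second the loop starts from zero
and the incoming scalar is only needed by the final add, so the loop can run
as an independent partial sum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not GCC output: 4 lanes emulated with scalars.
   Both functions assume n is a multiple of 4.  */

/* SLP-like shape: `init` seeds lane 0 of the accumulator, so every
   loop iteration depends on `init` already being computed.  */
float reduce_seeded(float init, const float *a, size_t n)
{
    assert(n % 4 == 0);
    float acc[4] = { init, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; i += 4)
        for (int l = 0; l < 4; l++)
            acc[l] += a[i + l];
    /* Stand-in for the horizontal .REDUC_PLUS step.  */
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}

/* non-SLP-like shape: the accumulator starts at zero, so the loop does
   not depend on `init`; only the final scalar add does.  */
float reduce_zero_start(float init, const float *a, size_t n)
{
    assert(n % 4 == 0);
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; i += 4)
        for (int l = 0; l < 4; l++)
            acc[l] += a[i + l];
    /* Partial sum, independent of `init`.  */
    float partial = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    /* Compensation after the reduction.  */
    return init + partial;
}
```

Both compute the same value when the additions are exact. The difference
is only in when `init` has to be available, which matters if `init` is itself
the result of a previous loop's reduction.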
