https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
We unroll the loop completely, but our basic-block vectorization capabilities
do not include reductions, so the resulting straight-line code is not
vectorized. After unrolling we see the following:
  <bb 2> [local count: 357878154]:
  temp_33 = bytes[0];
  _34 = temp_33 >> 32;
  temp_35 = temp_33 + _34;
  _36 = temp_35 >> 16;
  temp_37 = temp_35 + _36;
  _38 = temp_37 >> 8;
  temp_44 = bytes[1];
  _45 = temp_44 >> 32;
  temp_46 = temp_44 + _45;
  _47 = temp_46 >> 16;
  temp_48 = temp_46 + _47;
  _40 = temp_37 + temp_48;
  _49 = temp_48 >> 8;
  _51 = _38 + _40;
  result_29 = _49 + _51;
  _20 = (unsigned char) result_29;
  b ={v} {CLOBBER};
  return _20;
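
For reference, the dump above is consistent with a source loop of roughly the
following shape. This is only a sketch reconstructed from the GIMPLE, not the
testcase attached to the PR: the function name fold_bytes, the uint64_t
element type, and the exact form of the loop body are assumptions. It is meant
to show why this counts as a reduction: both iterations accumulate into the
single scalar result.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the loop behind the dump above.
   The name, the element type and the parameter are assumptions made
   for illustration; only the shift/add pattern comes from the GIMPLE.  */
static unsigned char fold_bytes(const uint64_t bytes[2])
{
    uint64_t result = 0;

    /* Two iterations, which the compiler unrolls completely.  */
    for (int i = 0; i < 2; i++) {
        uint64_t temp = bytes[i];
        temp += temp >> 32;           /* temp_35 / temp_46 in the dump */
        temp += temp >> 16;           /* temp_37 / temp_48 in the dump */
        result += temp + (temp >> 8); /* accumulation into result_29 */
    }

    /* Only the low byte is returned, matching _20 in the dump.  */
    return (unsigned char) result;
}
```

After complete unrolling, the two iterations become the straight-line code in
<bb 2>, ending in the chain of additions that produces result_29. That chain
is the reduction the basic-block vectorizer would have to recognize.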