https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #6 from Alexander Nesterovskiy <alexander.nesterovskiy at intel dot com> ---
Thanks!
I see a performance gain on 648.exchange2_s (~6% on Broadwell and ~3% on Skylake-X), which brings performance back to the r255266 level (the Skylake-X regression was ~3%).
Loops with 2 and 3 iterations are now unrolled as well.
It's surely fixed.