https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117088
            Bug ID: 117088
           Summary: [15 regression] 548.exchange_r regressed by 10% with -O2 -march=x86-64-v3 after enhance O2 vectorization
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

548.exchange2_r      -11.54%

The only regression is in 548.exchange2_r: vectorizing the inner loop in each
layer of the 9-layer loop nest increases register pressure and causes more
spills.

- block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
  - block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
    .....
      - block(rnext:9, 9, i9) = block(rnext:9, 9, i9) + 10
    ...
  - block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
- block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10

It looks like aarch64 doesn't have the issue because aarch64 has 32 GPRs,
whereas x86 has only 16.

I have an extra patch that prevents loop vectorization in deeply nested loops
for the x86 backend, which brings the performance back.
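
For readers without the SPEC sources, the shape of the loop nest can be sketched
in C. This is only an illustration and not the benchmark code: the names
`bump`, `visit`, and `leaves`, the 0-based indices, and the reduction from nine
layers to three are all assumptions made for brevity. Each layer performs a
short contiguous update (the stand-in for the `block(rnext:9, k, ik)` array
section) before and after descending into the next layer, so the loop counters
of every enclosing layer stay live while the inner update runs.

```c
#include <assert.h>

#define N 9   /* rows, columns and digits, as in the 9-layer nest */

/* block[i][col][row]: the last index is contiguous, standing in for the
   first (fastest-varying) index of the Fortran array block(row, col, i). */
int block[N][N][N];

/* C stand-in for the array-section statement
   block(rnext:9, col, i) = block(rnext:9, col, i) + 10,
   using 0-based indices.  This short contiguous loop is the kind of
   inner loop a vectorizer may turn into vector loads and stores. */
void bump(int rnext, int col, int i)
{
    assert(rnext >= 0 && rnext < N);
    for (int r = rnext; r < N; r++)
        block[i][col][r] += 10;
}

/* Three layers instead of nine (hypothetical reduction for brevity).
   Each layer updates its own column before and after descending, and
   i1, i2, i3, rnext and the leaf counter all stay live across the
   inner layers. */
long visit(int rnext)
{
    long leaves = 0;
    for (int i1 = 0; i1 < N; i1++) {
        bump(rnext, 0, i1);
        for (int i2 = 0; i2 < N; i2++) {
            bump(rnext, 1, i2);
            for (int i3 = 0; i3 < N; i3++) {
                bump(rnext, 2, i3);
                leaves++;              /* innermost work happens here */
                bump(rnext, 2, i3);
            }
            bump(rnext, 1, i2);
        }
        bump(rnext, 0, i1);
    }
    return leaves;
}
```

Compiling a file like this (say, a hypothetical `nest.c`) with
`gcc -O2 -march=x86-64-v3 -fopt-info-vec -S nest.c` and again with
`-fno-tree-vectorize` added is one way to compare the generated code for the
inner updates. Whether GCC vectorizes these particular loops depends on the
compiler version and cost model, and this reduced sketch is not claimed to
reproduce the reported regression.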