https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90270
--- Comment #4 from bin cheng <amker at gcc dot gnu.org> ---
On AArch64, ivopts generates the following code:

  <bb 3> [local count: 954449108]:
  # crc_20 = PHI <crc_7(D)(2), crc_12(5)>
  # ivtmp.5_18 = PHI <1(2), ivtmp.5_17(5)>
  _19 = &final_counts + 18446744073709551612;
  _1 = MEM[base: _19, index: ivtmp.5_18, step: 4, offset: 0B];
  crc_10 = crcu32 (_1, crc_20);
  _5 = &track_counts + 18446744073709551612;
  _2 = MEM[base: _5, index: ivtmp.5_18, step: 4, offset: 0B];
  crc_12 = crcu32 (_2, crc_10);
  ivtmp.5_17 = ivtmp.5_18 + 1;
  if (ivtmp.5_17 != 9)
    goto <bb 5>; [87.50%]
  else
    goto <bb 4>; [12.50%]

This looks optimal to me if _19/_5 can be hoisted out of the loop, and they are intended to be hoisted by rtl liv. (TREE liv doesn't help much, but that's another story.)

The problem is in the dom3 pass: in cprop_operand, _19/_5 is propagated into the memory access even though that results in an invalid addressing mode on AArch64:

  [&MEM[(void *)&final_counts + -4B], &MEM[(void *)&final_counts + -4B]]  EQUIVALENCES: { _19 } (1 elements)
  Optimizing statement _1 = MEM[base: _19, index: ivtmp.5_18, step: 4, offset: 0B];
    Replaced '_19' with constant '&MEM[(void *)&final_counts + -4B]'
    Folded to: _1 = MEM[symbol: final_counts, index: ivtmp.5_18, step: 4, offset: -4B];
  LKUP STMT _1 = MEM[symbol: final_counts, index: ivtmp.5_18, step: 4, offset: -4B] with .MEM_22
  2>>> STMT _1 = MEM[symbol: final_counts, index: ivtmp.5_18, step: 4, offset: -4B] with .MEM_22

The access is kept in this form to the end of GIMPLE, then badly legitimized. So ivopts worked hard to get the addressing mode and the invariant expression correct in this case; we need to avoid premature transformations afterwards.

BTW, with dom disabled by -fno-tree-dominator-opts, vrp2 does the same transformation too, so -fno-tree-vrp is also necessary to get the optimal code.

Well, you can argue [base + iv << 2] is sub-optimal compared to [base + iv], but that's hard to tune.
Also, a bias toward the original IV is generally preferred, for reasons like smaller setup code, better debug info, and even performance in complicated loops.
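
To make the address arithmetic in the dump concrete, here is a minimal C sketch, not taken from the PR's test case. It assumes 4-byte elements (suggested by "step: 4"), a source loop that indexes the array from 0, and a 64-bit target; the names N, addr_plain, addr_biased and forms_agree are hypothetical. It only checks that the invariant base _19 (the array address minus 4, printed as 18446744073709551612 because the offset is unsigned) plus iv * 4, with iv starting at 1, reaches the same addresses as a plain final_counts[i].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8 /* assumed trip count: ivtmp runs 1..8, the loop exits at 9 */

static uint32_t final_counts[N]; /* element type is an assumption */

/* Address a plain array access would use: &final_counts[i]. */
static uintptr_t addr_plain(size_t i)
{
    return (uintptr_t)final_counts + i * sizeof(uint32_t);
}

/* Address in the ivopts form: invariant base (_19 in the dump)
   plus index * step, i.e. MEM[base: _19, index: iv, step: 4]. */
static uintptr_t addr_biased(uintptr_t iv)
{
    /* &final_counts + 18446744073709551612 is &final_counts - 4
       in 64-bit unsigned arithmetic. Done on integers here, since
       forming a pointer before the array start is undefined in C. */
    uintptr_t base = (uintptr_t)final_counts - sizeof(uint32_t);
    return base + iv * 4;
}

/* Returns 1 if both forms agree on every iteration of the loop. */
static int forms_agree(void)
{
    for (uintptr_t iv = 1; iv != N + 1; iv++)
        if (addr_biased(iv) != addr_plain(iv - 1))
            return 0;
    return 1;
}
```

Calling assert(forms_agree()) from a test driver is one way to use it. The base computation is the loop-invariant part that the comment expects to be hoisted; once it is folded into the access as a symbol plus a -4 offset, there is no separate invariant left to hoist.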