https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
--- Comment #10 from Jiong Wang <jiwang at gcc dot gnu.org> ---
Finished a further investigation. It looks like the simplest fix to generate
optimized code for case A is to add one more optimization case in
"eliminate_regs_in_insn".

Currently we only optimize "eliminate_reg + const_offset":

  if (plus_src && CONST_INT_P (XEXP (plus_src, 1)))

But for those architectures, like AArch64, MIPS and ARM, which support a
base + offset addressing mode, the following pattern, which normally computes
an array element address such as A[I]:

  reg T <- eliminate_reg + reg I        (reg I holds the value of I)
  reg D <- MEM(reg T, offset)

should be eliminated into (folding the two constant offsets immediately):

  reg T <- reg_after_eliminate + reg I
  reg D <- MEM(reg T, offset + eliminate_offset)

instead of:

  reg S <- reg_after_eliminate + eliminate_offset
  reg T <- reg S + reg I
  reg D <- MEM(reg T, offset)

Because D depends directly on T, the MEM insn normally immediately follows the
address computation, so we just need to check NEXT_INSN when doing the
elimination to detect this pattern.

I'll try this approach, which is quite clean; hopefully AArch64, ARM and MIPS
can all be fixed by it.
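
For reference, below is a minimal C sketch (not taken from the report; the
function name and array size are made up) of the kind of source that produces
the pattern above: a stack array indexed by a runtime value, where the array
base starts out as eliminate_reg + eliminate_offset before elimination.

  /* Hypothetical example of the A[I] pattern discussed above.  */
  int
  read_elem (int i)
  {
    int a[64];

    /* Fill the array so it is not optimized away.  */
    for (int k = 0; k < 64; k++)
      a[k] = k;

    /* The address of a[i] is formed roughly as:
         reg T <- eliminate_reg + reg I (scaled)
         reg D <- MEM(reg T, offset of 'a' within the frame)
       After elimination the constant part of the frame address should be
       folded into the MEM offset rather than materialized in an extra add.  */
    return a[i] + a[i + 1];
  }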