https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
--- Comment #10 from Jiong Wang <jiwang at gcc dot gnu.org> ---
I finished a further investigation; it looks like the simplest way to generate
optimized code for case A is to add one more optimization case in
"eliminate_regs_in_insn".
Currently we only optimize "eliminate_reg + const_offset", guarded by:

  if (plus_src
      && CONST_INT_P (XEXP (plus_src, 1)))
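For reference, the case that existing path already covers is a plain scalar
local whose address is just the eliminated register plus a constant; this
reduced C example is my own illustration, not taken from the PR:

  extern void use (int *);

  int
  g (void)
  {
    int x = 0;   /* address is (plus (reg frame) (const_int C))       */
    use (&x);    /* both parts are constants, so the elimination      */
    return x;    /* offset folds straight into C today                */
  }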
However, for architectures like AArch64, ARM, and MIPS, which support a
base + offset addressing mode, the following pattern, which typically
computes an array element address such as A[I]:
reg T <- eliminate_reg + reg I (which holds the value I)
reg D <- MEM(reg T, offset)
should be eliminated into (folding the two constant offsets immediately):
reg T <- reg_after_eliminate + reg I
reg D <- MEM(reg T, offset + eliminate_offset)
instead of
reg S <- reg_after_eliminate + eliminate_offset
reg T <- reg S + reg I
reg D <- MEM(reg T, offset)
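At the source level this corresponds to an indexed access to a stack object.
Here is a reduced example of my own (an assumption about the shape of case A,
not the actual test case attached to this PR):

  extern void init (int *);

  int
  f (long i)
  {
    int a[64];     /* stack object, addressed via the eliminated register */
    init (a);
    return a[i];   /* address = eliminated base + elim_offset + i*4; the
                      elim_offset should fold into the load's displacement
                      rather than cost an extra add                        */
  }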
Because D depends directly on T, we only need to check NEXT_INSN while doing
the elimination to detect whether such a pattern is present.
I'll try this approach, which is quite clean; hopefully AArch64, ARM, and MIPS
can all be fixed by it.
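Roughly, the NEXT_INSN check could look like the sketch below. This is only an
illustration of the idea, not actual reload.c code: the helper name, its
placement, and the exact types are my own, and the check that the combined
displacement is still a legitimate address on the target is omitted.

  /* Sketch only: return true if the insn after INSN uses DEST purely as
     the base of a reg+displacement MEM, so the elimination offset can be
     folded into that displacement instead of needing a separate add.  */
  static bool
  next_insn_is_base_plus_disp_mem (rtx_insn *insn, rtx dest)
  {
    rtx_insn *next = NEXT_INSN (insn);
    if (!next || !NONJUMP_INSN_P (next))
      return false;

    rtx set = single_set (next);
    if (!set)
      return false;

    /* Accept either a load  D <- MEM (T + disp)
       or a store            MEM (T + disp) <- D.  */
    rtx mem = MEM_P (SET_SRC (set)) ? SET_SRC (set) : SET_DEST (set);
    if (!MEM_P (mem))
      return false;

    rtx addr = XEXP (mem, 0);
    if (GET_CODE (addr) == PLUS
        && rtx_equal_p (XEXP (addr, 0), dest)
        && CONST_INT_P (XEXP (addr, 1)))
      /* Caller would add the elimination offset to this displacement,
         subject to the target's addressing constraints.  */
      return true;

    /* A plain MEM (T) works too: the new displacement is just the
       elimination offset.  */
    return REG_P (addr) && rtx_equal_p (addr, dest);
  }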