https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
--- Comment #7 from Jiong Wang <jiwang at gcc dot gnu.org> ---
(In reply to bin.cheng from comment #6)
> Em, is offset valid for [reg+offset] addressing mode? if it is, why don't we
> transform "reg+reg+offset" into "regX <- reg + reg; [regX + offset];"?
That's because, for a local char array A, when we want to address one of its
elements, like A[I],
first we get the base address of array A, which is
(plus virtual_stack_vars_rtx, offset),
then we add the index offset I which is in register B:
(plus (plus virtual_stack_vars_rtx, offset), B)
From my experiment, however, the above is canonicalized into:
(plus (plus virtual_stack_vars_rtx, B), offset)
And for any target that defines FRAME_GROWS_DOWNWARD as 1, virtual_stack_vars_rtx
will be eliminated into (plus frame_pointer, offset1), instead of (plus
frame_pointer, const_0), which only happens when FRAME_GROWS_DOWNWARD is 0.
So transforming "reg+reg+offset" into "regX <- reg + reg; [regX + offset];" will
cause some trouble for GCC's RTL optimization, because it is finally split
into:
regA <- frame - offset0
regA <- regA + regB
regA <- regA + offset1
and somehow, later RTL optimization can't fold offset0 and offset1, because
virtual_stack_vars_rtx elimination happens at a quite late stage, in LRA.
So, if we find "virtual_stack_vars_rtx + reg + offset", it's better to
associate the constant offset with virtual_stack_vars_rtx, which means
transforming it into
regA <- virtual_stack_vars_rtx + offset
regA <- regA + regB
thus the elimination offset will be merged into the array offset automatically
in LRA.
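
To make the difference concrete, here is a small toy model in Python (not GCC
code). It only counts add instructions before and after a simulated
elimination step. The names "vars", "fp", eliminate() and run(), and the
offsets -32 and 16, are all made up for illustration; real LRA elimination is
far more involved.

```python
# Toy model: each insn is (dst, src, addend), meaning dst <- src + addend.
# "vars" stands in for virtual_stack_vars_rtx, "fp" for the frame pointer.

def eliminate(insns, elim_offset):
    """Rewrite 'vars' as fp + elim_offset, folding only into an immediate."""
    out = []
    for dst, src, addend in insns:
        if src == "vars" and isinstance(addend, int):
            # vars + const: the elimination offset merges into the constant.
            out.append((dst, "fp", addend + elim_offset))
        elif src == "vars":
            # vars + reg: fp + elim_offset must be materialized first.
            out.append((dst, "fp", elim_offset))
            out.append((dst, dst, addend))
        else:
            out.append((dst, src, addend))
    return out

def run(insns, regs):
    """Execute the adds on a copy of regs and return the final register file."""
    regs = dict(regs)
    for dst, src, addend in insns:
        rhs = addend if isinstance(addend, int) else regs[addend]
        regs[dst] = regs[src] + rhs
    return regs

ARRAY_OFFSET = 16   # hypothetical offset of A from virtual_stack_vars_rtx
ELIM_OFFSET = -32   # hypothetical elimination offset (frame grows downward)

# "regX <- reg + reg; [regX + offset]" order
reg_first = [("regA", "vars", "regB"), ("regA", "regA", ARRAY_OFFSET)]
# constant associated with virtual_stack_vars_rtx first
const_first = [("regA", "vars", ARRAY_OFFSET), ("regA", "regA", "regB")]

reg_first_final = eliminate(reg_first, ELIM_OFFSET)
const_first_final = eliminate(const_first, ELIM_OFFSET)

print(len(reg_first_final), reg_first_final)
print(len(const_first_final), const_first_final)
```

In this model the reg-first order ends up with three adds and the
constant-first order with two, while both compute the same address.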
I verified that if we add such a transform in aarch64's LEGITIMIZE_ADDRESS
hook, we do generate optimized code for Pinski's sample code:
bar:
        stp     x29, x30, [sp, -48]!
        add     x29, sp, 0
        stp     x19, x20, [sp, 16]
        add     x19, x29, 32
        mov     w20, w0
        mov     x0, x19
        bl      g
        ldrb    w0, [x19, w20, sxtw]
        bl      f
        ldp     x19, x20, [sp, 16]
        ldp     x29, x30, [sp], 48
        ret