[Bug target/62173] [5.0 regression] [AArch64] Performance regression due to r213488

jiwang at gcc dot gnu.org Mon, 24 Nov 2014 04:15:53 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173


--- Comment #7 from Jiong Wang <jiwang at gcc dot gnu.org> ---
(In reply to bin.cheng from comment #6)
> Em, is offset valid for [reg+offset] addressing mode? if it is, why don't we
> transform "reg+reg+offset" into "regX <- reg + reg; [regX + offset];"?

that's because for a local char array A, if we want to address one of its
elements, like A[I],

first we get the base address of array A, which is

  (plus virtual_stack_vars_rtx, offset),

then we add the index offset I which is in register B:

  (plus (plus virtual_stack_vars_rtx, offset), B)

but from my experiments, the above gets canonicalized into:

  (plus (plus virtual_stack_vars_rtx, B), offset)


and for any target that defines FRAME_GROWS_DOWNWARD as 1, virtual_stack_vars_rtx
will be eliminated into (plus frame_pointer, offset1), instead of (plus
frame_pointer, const_0), which only happens when FRAME_GROWS_DOWNWARD is 0.

so, transforming "reg+reg+offset" into "regX <- reg + reg; [regX + offset];" will
cause some trouble for gcc's rtl optimizations, because it finally gets split
into:

regA <- frame - offset0

regA <- regA + regB

regA <- regA + offset1

and somehow, later rtl optimizations can't fold offset0 and offset1, because
virtual_stack_vars_rtx elimination happens at quite a late stage, in LRA.

so, if we find "virtual_stack_vars_rtx + reg + offset", it's better to
associate the constant offset with it, which means transforming it into

regA <- virtual_stack_vars_rtx + offset
regA <- regA + regB

thus the elimination offset will be merged into the array offset automatically
in LRA.
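
To make the folding argument concrete, here is a toy model in C. It is not
GCC code: the step list, the helper names (fold_adjacent_consts,
steps_index_first, steps_const_first) and the sample numbers are all made up
for illustration, and the "only adjacent constants get folded" rule is a
simplifying assumption standing in for the late elimination in LRA. It just
shows that both orderings compute the same address, but only the
constant-first ordering lets the elimination offset merge with the array
offset.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not GCC internals: an address is the frame pointer plus a
   short list of "add" steps. */
enum step_kind { ADD_CONST, ADD_REG };

struct step {
    enum step_kind kind;
    int64_t value;   /* the constant, or the runtime value of the register */
};

/* Merge neighbouring ADD_CONST steps in place and return the new length.
   Assumption: constants are only folded when they sit next to each other. */
static int fold_adjacent_consts(struct step *s, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 && s[i].kind == ADD_CONST && s[out - 1].kind == ADD_CONST)
            s[out - 1].value += s[i].value;
        else
            s[out++] = s[i];
    }
    return out;
}

/* Evaluate the final address for a given frame pointer value. */
static int64_t eval(int64_t frame, const struct step *s, int n)
{
    int64_t addr = frame;
    for (int i = 0; i < n; i++)
        addr += s[i].value;
    return addr;
}

/* (plus (plus vars, B), offset): the index register separates the
   elimination offset from the array offset. */
static int steps_index_first(struct step *s, int64_t elim,
                             int64_t index, int64_t offset)
{
    s[0] = (struct step){ ADD_CONST, elim };
    s[1] = (struct step){ ADD_REG, index };
    s[2] = (struct step){ ADD_CONST, offset };
    return fold_adjacent_consts(s, 3);
}

/* (plus (plus vars, offset), B): both constants are adjacent and fold. */
static int steps_const_first(struct step *s, int64_t elim,
                             int64_t index, int64_t offset)
{
    s[0] = (struct step){ ADD_CONST, elim };
    s[1] = (struct step){ ADD_CONST, offset };
    s[2] = (struct step){ ADD_REG, index };
    return fold_adjacent_consts(s, 3);
}
```

With made-up values such as elim = -48, offset = 32 and index = 5, the
index-first form keeps three steps, while the constant-first form shrinks to
two (one constant add of -16, then the register add).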

I verified that if we add such a transform in aarch64's LEGITIMIZE_ADDRESS hook,
then we do generate optimized code for Pinski's sample code:

bar:
        stp     x29, x30, [sp, -48]!
        add     x29, sp, 0
        stp     x19, x20, [sp, 16]
        add     x19, x29, 32
        mov     w20, w0
        mov     x0, x19
        bl      g
        ldrb    w0, [x19, w20, sxtw]
        bl      f
        ldp     x19, x20, [sp, 16]
        ldp     x29, x30, [sp], 48 
        ret
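
For readers without the testcase at hand, a C function of roughly the
following shape would match the assembly above. This is a guess inferred from
the assembly, not necessarily the exact sample from the bug report: the array
size is an assumption, and g and f are given stand-in bodies here only so the
snippet is self-contained (with real external g and f the compiler could not
inline them, which is what the assembly shows).

```c
#include <assert.h>

/* Records what f() was called with, so the sketch can be checked. */
static int seen_val;

/* Stand-in for an external function that fills the buffer (hypothetical body). */
void g(char *p)
{
    for (int i = 0; i < 10; i++)
        p[i] = (char)('a' + i);
}

/* Stand-in for an external consumer of one element (hypothetical body). */
void f(int c)
{
    seen_val = c;
}

/* Local char array indexed by a register value: the address is
   virtual_stack_vars_rtx + offset + i, the pattern discussed above. */
void bar(int i)
{
    char A[10];   /* size is an assumption */
    g(A);
    f(A[i]);
}
```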
