Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

Wilco Dijkstra Mon, 08 Jan 2018 05:28:02 -0800

Segher Boessenkool wrote:
> On Fri, Jan 05, 2018 at 12:22:44PM +0000, Wilco Dijkstra wrote:
>> An example epilog in a shrinkwrapped function before:
>> 
>> ldp    x21, x22, [sp,#16]
>> ldr    x23, [sp,#32]
>> ldr    x24, [sp,#40]
>> ldp    x25, x26, [sp,#48]
>> ldr    x27, [sp,#64]
>> ldr    x28, [sp,#72]
>> ldr    x30, [sp,#80]
>> ldr    d8, [sp,#88]
>> ldp    x19, x20, [sp],#96
>> ret
>
> In this example, the compiler already can make a ldp for both x23/x24 and
> x27/x28 just fine (if not in emit_epilogue_components, then simply in a
> peephole); why did that not work?  Or is this not the actual generated
> machine code (and there are labels between the insns, for example)?

This block originally had a label in it: two blocks emitted identical restores and
then branched to the final epilogue. The final epilogue was then duplicated, so
we end up with two almost identical epilogues of 10 instructions each ("almost"
because there were one or two unrelated instructions in both blocks).

Peepholing is very conservative about instructions that use SP and won't touch
anything frame-related. If it worked better, the backend could simply emit
single loads and stores and let the peephole pass form LDP/STP.
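As a rough illustration of what such a pairing step has to check, here is a C sketch of the legality test for merging two single stack loads (e.g. "ldr x23, [sp,#32]" and "ldr x24, [sp,#40]") into one LDP. The stack_load struct and can_fuse_ldp function are hypothetical and do not model GCC's RTL; the sketch also ignores the frame-related/CFI bookkeeping that makes the real peephole pass refuse such insns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a single-register stack load such as
   "ldr x23, [sp,#32]".  Not GCC's internal representation. */
enum reg_class { REG_X, REG_D };

struct stack_load {
    enum reg_class cls;  /* X (64-bit integer) or D (64-bit FP) register */
    int regno;           /* register number within its class */
    int base;            /* base register number; 31 stands for SP here */
    int64_t offset;      /* byte offset from the base register */
};

/* Return true if loads A and B (A at the lower address) could become a
   single "ldp Ra, Rb, [base,#offset]".  Checks: same register class,
   same base, adjacent 8-byte slots, distinct destinations, and an offset
   that fits the scaled signed 7-bit LDP immediate (-512..504). */
static bool can_fuse_ldp(const struct stack_load *a, const struct stack_load *b)
{
    if (a->cls != b->cls || a->base != b->base)
        return false;              /* LDP needs one class and one base */
    if (b->offset != a->offset + 8)
        return false;              /* slots must be adjacent */
    if (a->regno == b->regno)
        return false;              /* destinations must differ */
    if (a->offset % 8 != 0 || a->offset < -512 || a->offset > 504)
        return false;              /* out of LDP immediate range */
    return true;
}
```

With the example epilogue, x23/x24 and x27/x28 pass this check, while x30 at #80 and d8 at #88 do not, since they are in different register classes.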

However, this is not the real issue. In the worst case the current code
emits only LDR and STR. If there are multiple callee-saves in a block, we
want to use LDP/STP, and if there is an odd number of registers, we want
to add a callee-save from an inner block so the leftover one can still be
paired.
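A minimal C sketch of the grouping described above, under stated assumptions: the registers saved in one block are given sorted so that neighbours occupy adjacent stack slots, and the inner block offers one candidate register (inner_reg, or -1 if none) whose slot is adjacent to the leftover one. The save_insn struct and pair_saves function are hypothetical, not part of the patch or of GCC.

```c
#include <stdbool.h>

/* Hypothetical save/restore instruction covering one or two registers. */
struct save_insn {
    int first;
    int second;   /* -1 means a single STR/LDR */
};

/* Group the N callee-saved registers REGS of one block into STP/LDP pairs.
   If N is odd and INNER_REG >= 0, pull that register's save out of the
   inner block so the leftover register can still be paired; *BORROWED
   reports whether this happened.  Returns the number of instructions
   written to OUT (OUT must have room for (N + 1) / 2 entries). */
static int pair_saves(const int *regs, int n, int inner_reg,
                      bool *borrowed, struct save_insn *out)
{
    int count = 0;
    *borrowed = false;
    for (int i = 0; i + 1 < n; i += 2) {
        out[count].first = regs[i];
        out[count].second = regs[i + 1];
        count++;
    }
    if (n % 2 != 0) {
        out[count].first = regs[n - 1];
        if (inner_reg >= 0) {
            out[count].second = inner_reg;  /* borrowed from inner block */
            *borrowed = true;
        } else {
            out[count].second = -1;         /* unavoidable single STR/LDR */
        }
        count++;
    }
    return count;
}
```

For a block saving x23, x24 and x27 with x28 available from an inner block, this yields two pair instructions (x23/x24 and x27/x28) instead of one pair plus a single store. The borrowed save then executes on more paths, but it costs no extra instruction.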

Another issue is that after the pro_and_epilogue pass I see several blocks
restoring the same registers and then branching to the same block. We should
try to avoid this unnecessary duplication.
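One possible way to avoid the duplication, similar in spirit to GCC's cross-jumping, is to find the restores that predecessor blocks share verbatim at their tails and emit them once in the common successor. The C sketch below only computes that shared tail length, with instructions modelled as strings for illustration; common_tail is a hypothetical helper, and it does not handle restores interleaved with the unrelated instructions mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Return how many trailing instructions blocks A and B share verbatim.
   Those instructions could be sunk into the shared successor block. */
static size_t common_tail(const char *const *a, size_t na,
                          const char *const *b, size_t nb)
{
    size_t k = 0;
    while (k < na && k < nb && strcmp(a[na - 1 - k], b[nb - 1 - k]) == 0)
        k++;
    return k;
}
```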

Wilco
