Segher Boessenkool wrote:
> On Fri, Jan 05, 2018 at 12:22:44PM +0000, Wilco Dijkstra wrote:
>> An example epilog in a shrinkwrapped function before:
>>
>>     ldp x21, x22, [sp,#16]
>>     ldr x23, [sp,#32]
>>     ldr x24, [sp,#40]
>>     ldp x25, x26, [sp,#48]
>>     ldr x27, [sp,#64]
>>     ldr x28, [sp,#72]
>>     ldr x30, [sp,#80]
>>     ldr d8, [sp,#88]
>>     ldp x19, x20, [sp],#96
>>     ret
>
> In this example, the compiler already can make a ldp for both x23/x24 and
> x27/x28 just fine (if not in emit_epilogue_components, then simply in a
> peephole); why did that not work? Or is this not the actual generated
> machine code (and there are labels between the insns, for example)?
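To illustrate the kind of peephole Segher refers to, here is a minimal Python sketch (not GCC code; `Ldr`, `Ldp`, `can_pair` and `fuse_adjacent_ldr` are hypothetical names). It merges two adjacent `ldr` instructions into one `ldp` when they use the same base register and same register class and their offsets are 8 bytes apart within the LDP immediate range. It deliberately ignores what makes the real case hard, such as frame-related notes and CFI; any other instruction is treated as an opaque string.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Ldr:
    rt: str      # destination register, e.g. "x23" or "d8"
    base: str    # base register, e.g. "sp"
    offset: int  # byte offset from base


@dataclass(frozen=True)
class Ldp:
    rt: str
    rt2: str
    base: str
    offset: int


Insn = Union[Ldr, Ldp, str]  # str stands in for any other instruction


def reg_class(reg: str) -> str:
    # 'x' = 64-bit general register, 'd' = 64-bit FP/SIMD register.
    return reg[0]


def can_pair(a: Ldr, b: Ldr) -> bool:
    return (
        a.base == b.base
        and reg_class(a.rt) == reg_class(b.rt)  # LDP needs one register class
        and a.rt != b.rt                        # same destination twice is invalid
        and a.rt != a.base                      # first load must not clobber the base
        and b.offset == a.offset + 8            # adjacent 8-byte slots
        and a.offset % 8 == 0
        and -512 <= a.offset <= 504             # scaled signed 7-bit immediate
    )


def fuse_adjacent_ldr(insns: List[Insn]) -> List[Insn]:
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt: Optional[Insn] = insns[i + 1] if i + 1 < len(insns) else None
        if isinstance(cur, Ldr) and isinstance(nxt, Ldr) and can_pair(cur, nxt):
            out.append(Ldp(cur.rt, nxt.rt, cur.base, cur.offset))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


# The epilog quoted above, with the post-indexed ldp and ret kept opaque.
epilogue: List[Insn] = [
    Ldp("x21", "x22", "sp", 16),
    Ldr("x23", "sp", 32),
    Ldr("x24", "sp", 40),
    Ldp("x25", "x26", "sp", 48),
    Ldr("x27", "sp", 64),
    Ldr("x28", "sp", 72),
    Ldr("x30", "sp", 80),
    Ldr("d8", "sp", 88),
    "ldp x19, x20, [sp],#96",
    "ret",
]
fused = fuse_adjacent_ldr(epilogue)
```

On this input x23/x24 and x27/x28 become LDPs, while x30 and d8 stay as single LDRs because they belong to different register classes.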
This block originally had a label in it: two blocks emitted identical restores and then branched to the final epilog. The final epilog was then duplicated, so we ended up with two almost identical epilogs of 10 instructions each (almost, because there were one or two unrelated instructions in both blocks).

Peepholing is very conservative about instructions that use SP and won't touch anything frame-related. If it worked better, the backend could just emit single loads/stores and let peepholing generate LDP/STP.

However, this is not the real issue. In the worst case, the current code may emit only LDR and STR. If there are multiple callee-saves in a block, we want to use LDP/STP, and if there is an odd number of registers, we want to add a callee-save from an inner block.

Another issue is that after the pro_and_epilogue pass I see multiple restores of the same registers, followed by a branch to the same block. We should try to avoid this unnecessary duplication.

Wilco
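A rough Python sketch of the pairing rule described above (not GCC's implementation): `plan_restores`, `reg_class`, `reg_num` and the "borrow the lowest-numbered candidate" heuristic are all hypothetical. It assumes the frame layout can place each chosen pair in adjacent 8-byte slots. Borrowing is safe in principle because saving a callee-saved register earlier (and restoring it later) than strictly needed only costs some shrink-wrapping precision.

```python
from typing import Dict, List, Set, Tuple


def reg_class(reg: str) -> str:
    # 'x' = 64-bit general register, 'd' = 64-bit FP/SIMD register;
    # LDP/STP can only pair registers of the same class.
    return reg[0]


def reg_num(reg: str) -> int:
    return int(reg[1:])


def plan_restores(
    block_regs: Set[str], inner_regs: Set[str]
) -> Tuple[List[Tuple[str, ...]], Set[str]]:
    """Group a block's callee-saved registers into LDP pairs and single LDRs.

    If a register class has an odd count, borrow one register of the same
    class that an inner block would otherwise save, so the class pairs up.
    Returns (groups, borrowed): 2-tuples map to LDP/STP, 1-tuples to LDR/STR.
    """
    by_class: Dict[str, List[str]] = {}
    for r in block_regs:
        by_class.setdefault(reg_class(r), []).append(r)

    borrowed: Set[str] = set()
    groups: List[Tuple[str, ...]] = []
    for cls in sorted(by_class):
        regs = sorted(by_class[cls], key=reg_num)
        if len(regs) % 2 == 1:
            candidates = sorted(
                (r for r in inner_regs if reg_class(r) == cls and r not in block_regs),
                key=reg_num,
            )
            if candidates:
                # Hypothetical heuristic: take the lowest-numbered candidate.
                borrowed.add(candidates[0])
                regs = sorted(regs + [candidates[0]], key=reg_num)
        for i in range(0, len(regs) - 1, 2):
            groups.append((regs[i], regs[i + 1]))
        if len(regs) % 2 == 1:
            groups.append((regs[-1],))  # no partner available: single LDR/STR
    return groups, borrowed


# Three x-registers in this block; x22 is borrowed from an inner block.
groups, borrowed = plan_restores({"x19", "x20", "x21"}, {"x22", "x23"})
# groups == [("x19", "x20"), ("x21", "x22")], borrowed == {"x22"}
```

The inner block that originally saved the borrowed register would then drop it from its own save/restore set.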