Steve Ellcey <sell...@marvell.com> writes: > This is a patch to fix the register allocation in SIMD functions. In > normal functions registers V16 to V31 are all caller saved. In SIMD > functions V16 to V23 are callee saved and V24 to V31 are caller saved. > This means that SIMD functions should use V24 to V31 before V16 to V23 > in order to avoid some saves and restores. > > My fix for this is to define REG_ALLOC_ORDER. Right now it is not > defined on Aarch64, so I just defined it as all the registers in order > except for putting V24 to V31 before V16 to V23. This fixes the > register allocation in SIMD functions. It also changes the register > allocation order in regular functions but since all the registers (V16 > to V31) are caller saved in that case, it doesn't matter. We could use > ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see > any reason to do that.
REG_ALLOC_ORDER shouldn't really be needed for testcases like the ones in the PR. Like you say, we don't currently need it to handle the equivalent situation for the standard ABI. I think the problem is that the RA is still using the register sets for the default ABI when evaluating the prologue/epilogue cost of using a hard register. E.g. see calculate_saved_nregs. Maybe one fix would be to add an equivalent of call_used_reg_set to rtl_data. By default it would be the same as call_used_reg_set, but the target could have an opportunity to change it. Then code like calculate_saved_nregs should use the new set to find out what registers the current function can use without spilling. This would also be useful for targets that implement interrupt handler attributes. It would be good to add the testcase in the PR to the testsuite, with a scan-assembler to check for spills. > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index 7bd3bf5..d3723ff 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version; > ZR zero register, encoded as X/R31 elsewhere > > 32 x 128-bit floating-point/vector registers > - V16-V31 Caller-saved (temporary) registers > + V24-V31 Caller-saved (temporary) registers > + V16-V23 Caller-saved (temporary) registers in most functions; > + Callee-saved in SIMD functions. > V8-V15 Callee-saved registers > V0-V7 Parameter/result registers Probably better as s/SIMD/vector PCS/. The hunk above is OK with that change, independently of the rest. Thanks, Richard