Richard, I don't necessarily disagree with anything in your comments and long term I think that is the right direction, but I wonder if that level of change is appropriate for GCC Stage 4 which is where we are now. Your changes would require fixes in shared code, whereas setting REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change. I am not sure how long it would take me to implement something along the lines of what you are suggesting.
Steve Ellcey sell...@marvell.com On Sat, 2019-03-09 at 08:03 +0000, Richard Sandiford wrote: > Steve Ellcey <sell...@marvell.com> writes: > > This is a patch to fix the register allocation in SIMD functions. In > > normal functions registers V16 to V31 are all caller saved. In SIMD > > functions V16 to V23 are callee saved and V24 to V31 are caller saved. > > This means that SIMD functions should use V24 to V31 before V16 to V23 > > in order to avoid some saves and restores. > > > > My fix for this is to define REG_ALLOC_ORDER. Right now it is not > > defined on Aarch64, so I just defined it as all the registers in order > > except for putting V24 to V31 before V16 to V23. This fixes the > > register allocation in SIMD functions. It also changes the register > > allocation order in regular functions but since all the registers (V16 > > to V31) are caller saved in that case, it doesn't matter. We could use > > ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see > > any reason to do that. > > REG_ALLOC_ORDER shouldn't really be needed for testcases like the ones > in the PR. Like you say, we don't currently need it to handle the > equivalent situation for the standard ABI. > > I think the problem is that the RA is still using the register sets > for the default ABI when evaluating the prologue/epilogue cost of using > a hard register. E.g. see calculate_saved_nregs. > > Maybe one fix would be to add an equivalent of call_used_reg_set to > rtl_data. By default it would be the same as call_used_reg_set, > but the target could have an opportunity to change it. Then code like > calculate_saved_nregs should use the new set to find out what registers > the current function can use without spilling. > > This would also be useful for targets that implement interrupt handler > attributes. > > It would be good to add the testcase in the PR to the testsuite, > with a scan-assembler to check for spills. > > > diff --git a/gcc/config/aarch64/aarch64.h > > b/gcc/config/aarch64/aarch64.h > > index 7bd3bf5..d3723ff 100644 > > --- a/gcc/config/aarch64/aarch64.h > > +++ b/gcc/config/aarch64/aarch64.h > > @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version; > > ZR zero register, encoded as X/R31 elsewhere > > > > 32 x 128-bit floating-point/vector registers > > - V16-V31 Caller-saved (temporary) registers > > + V24-V31 Caller-saved (temporary) registers > > + V16-V23 Caller-saved (temporary) registers in most functions; > > + Callee-saved in SIMD functions. > > V8-V15 Callee-saved registers > > V0-V7 Parameter/result registers > > Probably better as s/SIMD/vector PCS/. The hunk above is OK with > that > change, independently of the rest. > > Thanks, > Richard