On Mon, Mar 25, 2019 at 6:48 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Biener <richard.guent...@gmail.com> writes: > > On Tue, Mar 12, 2019 at 10:50 AM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Steve Ellcey <sell...@marvell.com> writes: > >> > Richard, > >> > > >> > I don't necessarily disagree with anything in your comments and long > >> > term I think that is the right direction, but I wonder if that level of > >> > change is appropriate for GCC Stage 4 which is where we are now. Your > >> > changes would require fixes in shared code, whereas setting > >> > REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change. > >> > >> I guess it's weighing invasiveness in terms of lines of code/location of > >> code vs. invasiveness in terms of the effect the code has. Changing > >> REG_ALLOC_ORDER affects all functinos that use significant numbers of > >> SIMD registers, which could have unexpected knock-on effects like > >> performance regressions or exposing latent bugs. It would also make > >> the RA try V24 before V16 for callers of vector PCS functions, > >> even though they should prefer V16. > >> > >> Keeping the change specific to callees that use the new vector PCS > >> feature seems more conservative from that point of view. > >> > >> > I am not sure how long it would take me to implement something along > >> > the lines of what you are suggesting. > >> > >> OK, in that case, I thought I'd give a go. > >> > >> This patch adds a new target hook that says which registers a function > >> can use without saving them in the prologue and restoring them in the > >> epilogue. By default this is call_used_reg_set. > >> > >> The patch only makes IRA use the hook, and in that context it's just > >> an (important) optimisation. But I think we need the same thing for > >> passes like regrename, where it'll be a correctness issue. I'll follow > >> on with patches there if this one is OK. > >> > >> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu. > >> OK to install? > > > > So why is call_used_reg_set not updated appropriately? As you say > > if you introduce sth else it becomes a correctness issue to use that "else" > > rather than call_used_reg_set which was OK up until now. I wonder how > > other targets with a per-function different ABI handle this situation. Or > > are > > you the first and others at most affect calling conventions here? That is, > > it looks like making (parts of?) this_taret_hard_regs per function or > > re-computed at set_cfun time. There's alrady a hook for that btw, > > targetm.set_current_function (). Why not change call_used_reg_set there? > > The problem is that call_used_reg_set is a global that describes the > base ABI. It's used both for what the current function can use without > saving and restoring, and for what the target of a call insn might clobber. > Most uses are for the latter. > > The global set still needs to describe the base ABI, since the default > assumption is that a call insn uses the base ABI unless something says > otherwise. TARGET_HARD_REGNO_CALL_PART_CLOBBERED can be used for call > insns that use other ABIs, so it handles the case in which the current > function is the caller. But there's no hook for handling other ABIs > when the current function is the callee. > > So the purpose of the new hook is to split out what the current function > needs to save and restore as a separate bit of data that doesn't affect > call insns. Since it's an rtl-level thing, I thought it'd be better > to put it in crtl (i.e. the rtl info about the current function). > That way we only need to set it once at the start of the rtl passes, > rather than every time we change cfun via targetm.set_current_function. > (Well, longer-term, we might want to set it after RA too, so that the > scratch registers only include what the prologue and epilogue actually > save & restore.) > > Targets can already use targetm.hard_regno_scratch_ok and HARD_REGNO_RENAME_OK > to stop the post-RA passes using a particular register. I guess that's > been enough for what targets have needed till now, and we could use them > for the correctness thing I mentioned above. But that wouldn't solve the > RA problem. Making the information directly available seems better than > adding more hacks around it.
Fair enough. I'm not too deep into this but it seems to me this is an area that needs proper documentation and a well-designed extension - both don't seem like a good fit for stage 4. Richard. > Thanks, > Richard