https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #32 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Created attachment 52102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52102&action=edit
Alternative patch

This patch is a squash of several ira tweaks that together recover the
pre-GCC11 exchange2 performance on aarch64.  It isn't ready for trunk
yet (hence lack of comments and changelog).  It would be great to hear
whether/how it works on other targets though.

The patch bootstraps on aarch64-linux-gnu and x86_64-linux-gnu,
but there are some new scan-assembler failures that need looking at.

Quoting from the covering message:

The main changes are:

(1) Add ira_loop_border_costs to simplify spill cost calculations
    (NFC intended)
(2) Avoid freeing updated costs until the loop node has been fully
    allocated.  This in turn allows:
(3) Make improve_allocation work exclusively on updated costs,
    rather than using a mixture of updated and original costs.
    One reason this matters is that the register costs only make sense
    relative to the memory costs, so in some cases, a common register
    is subtracted from the updated memory cost instead of being
    added to each individual updated register cost.
(4) If a child allocno has a hard register conflict, allow the parent
    allocno to handle the conflict by spilling to memory throughout
    the child allocno's loop.  This carries the child allocno's full
    memory cost plus the cost of spilling to memory on entry to the
    loop and restoring it on exit, but this can still be cheaper than
    spilling the entire parent allocno.  In particular, it helps for
    allocnos that are live across a loop but not referenced within it,
    since the child allocno's memory cost is 0 in that case.
(5) Extend (4) to cases in which the child allocno is live across
    a call.  The parent then has a free choice between spilling
    call-clobbered registers around each call (as normal) or
    spilling them on entry to the loop, keeping the allocno in memory
    throughout the loop, and restoring them on exit from the loop.
(6) Detect “soft conflicts” in which:
    - one allocno (A1) is a cap whose (transitive) “real” allocno
      is A1'
    - A1' occurs in loop L1'
    - the other allocno (A2) is a non-cap allocno
    - the equivalent of A2 is live across L1' (hence the conflict)
      but has no references in L1'
    In this case we can spill A2 around L1' (or perhaps some parent
    loop) and reuse the same register for A1'.  A1 and A2 can then
    use the same hard register, provided that we make sure not to
    propagate A1's allocation to A1'.
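
To make the trade-off in (4) and (5) concrete, here is a rough sketch of
the cost comparison in Python.  It is not taken from the patch: the
class, the function names and the cost numbers are all made up for
illustration, and the real logic in ira works on allocno cost vectors
rather than single integers.

```python
from dataclasses import dataclass


@dataclass
class LoopAllocno:
    """Hypothetical stand-in for a child allocno within one loop."""
    memory_cost: int         # cost of living in memory inside the loop
    spill_on_entry: int      # cost of storing the value on entry to the loop
    restore_on_exit: int     # cost of reloading the value on exit from the loop
    call_save_cost: int = 0  # cost of saving/restoring around each call


def spill_around_loop_cost(child):
    """(4): the child's full memory cost plus the loop border costs."""
    return child.memory_cost + child.spill_on_entry + child.restore_on_exit


def cheapest_conflict_resolution(parent_memory_cost, child):
    """Spill the whole parent, or only spill throughout the child's loop."""
    local = spill_around_loop_cost(child)
    if local < parent_memory_cost:
        return ("spill_around_loop", local)
    return ("spill_parent", parent_memory_cost)


def cheapest_call_handling(child):
    """(5): save around each call, or stay in memory for the whole loop."""
    local = spill_around_loop_cost(child)
    if local < child.call_save_cost:
        return ("spill_around_loop", local)
    return ("save_around_calls", child.call_save_cost)


# Live across the loop but not referenced in it, so the memory cost is 0.
# All numbers here are invented.
unreferenced = LoopAllocno(memory_cost=0, spill_on_entry=2,
                           restore_on_exit=2, call_save_cost=30)
choice = cheapest_conflict_resolution(parent_memory_cost=40,
                                      child=unreferenced)
```

With those invented numbers the border costs are the only price paid for
the loop-local spill, which is the case that (4) says benefits most.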