https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rsandifo at gcc dot gnu.org

--- Comment #32 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Created attachment 52102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52102&action=edit
Alternative patch

This patch is a squash of several ira tweaks that together recover the
pre-GCC11 exchange2 performance on aarch64.  It isn't ready for trunk
yet (hence the lack of comments and a changelog).  It would be great to hear
whether/how it works on other targets though.

The patch bootstraps on aarch64-linux-gnu and x86_64-linux-gnu,
but there are some new scan-assembler failures that need looking at.

Quoting from the covering message:

The main changes are:

(1) Add ira_loop_border_costs to simplify spill cost calculations
    (NFC intended)

(2) Avoid freeing updated costs until the loop node has been fully
    allocated.  This in turn allows:

(3) Make improve_allocation work exclusively on updated costs,
    rather than using a mixture of updated and original costs.
    One reason this matters is that the register costs only make
    sense relative to the memory costs, so in some cases,
    a common register cost is subtracted from the updated memory cost
    instead of being added to each individual updated register cost.

(4) If a child allocno has a hard register conflict, allow the parent
    allocno to handle the conflict by spilling to memory throughout
    the child allocno's loop.  This carries the child allocno's full
    memory cost plus the cost of spilling to memory on entry to the
    loop and restoring it on exit, but this can still be cheaper
    than spilling the entire parent allocno.  In particular, it helps
    for allocnos that are live across a loop but not referenced
    within it, since the child allocno's memory cost is 0 in
    that case.

(5) Extend (4) to cases in which the child allocno is live across
    a call.  The parent then has a free choice between spilling
    call-clobbered registers around each call (as normal) or
    spilling them on entry to the loop, keeping the allocno in memory
    throughout the loop, and restoring them on exit from the loop.

(6) Detect “soft conflicts” in which:

    - one allocno (A1) is a cap whose (transitive) “real” allocno
      is A1'

    - A1' occurs in loop L1'

    - the other allocno (A2) is a non-cap allocno

    - the equivalent of A2 is live across L1' (hence the conflict)
      but has no references in L1'

    In this case we can spill A2 around L1' (or perhaps some parent
    loop) and reuse the same register for A1'.  A1 and A2 can then
    use the same hard register, provided that we make sure not to
    propagate A1's allocation to A1'.
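
A minimal sketch of the kind of quantities a helper like ira_loop_border_costs in (1) might bundle, written as a toy Python model rather than GCC's C++. The class name LoopBorderCosts, its fields and its three methods are hypothetical, and the model assumes a border cost is simply an edge frequency times the cost of one load, store or register move.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopBorderCosts:
    """Hypothetical per-allocno costs at the borders of one loop.

    entry_freq / exit_freq: execution frequencies of the loop's entry and
    exit edges.  load_cost / store_cost: cost of one memory load / store in
    the allocno's mode.  reg_move_cost: cost of one register-to-register move.
    """
    entry_freq: int
    exit_freq: int
    load_cost: int
    store_cost: int
    reg_move_cost: int

    def spill_outside_loop_cost(self) -> int:
        # Register inside the loop, memory outside it:
        # load on every entry, store on every exit.
        return self.entry_freq * self.load_cost + self.exit_freq * self.store_cost

    def spill_inside_loop_cost(self) -> int:
        # Register outside the loop, memory inside it:
        # store on every entry, load on every exit.
        return self.entry_freq * self.store_cost + self.exit_freq * self.load_cost

    def move_between_loops_cost(self) -> int:
        # Different registers inside and outside the loop:
        # one register move on every entry and every exit.
        return (self.entry_freq + self.exit_freq) * self.reg_move_cost
```

Callers then ask for a named quantity such as spill_inside_loop_cost() instead of repeating the frequency-times-cost products at each use site.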
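
The point in (3), that register costs only make sense relative to the memory cost, can be checked with a toy model. The sketch below uses hypothetical names and plain integer costs, not IRA's cost vectors. It shows that adding a common cost to every register cost and subtracting it from the memory cost give the same register-versus-memory gains, whereas comparing updated register costs against an original memory cost does not.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy model: a register cost table plus one memory cost.
Costs = Tuple[Dict[str, int], int]


def add_to_each_reg(regs: Dict[str, int], mem: int, common: int) -> Costs:
    # Form A: add the common cost to every register cost.
    return {r: c + common for r, c in regs.items()}, mem


def subtract_from_mem(regs: Dict[str, int], mem: int, common: int) -> Costs:
    # Form B: subtract the same common cost from the memory cost instead.
    return dict(regs), mem - common


def gains(regs: Dict[str, int], mem: int) -> Dict[str, int]:
    # What actually drives the choice: how much cheaper each register is
    # than memory.  Forms A and B yield identical gains.
    return {r: mem - c for r, c in regs.items()}


def choose(regs: Dict[str, int], mem: int) -> Optional[str]:
    # Cheapest register, or None if memory is at least as cheap.
    best = min(regs, key=regs.__getitem__)
    return best if regs[best] < mem else None
```

For example, with register costs {"r0": 10, "r1": 12}, a memory cost of 14 and a common cost of 5, both forms prefer memory, but pairing the form-B register costs with the original memory cost of 14 picks r0.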
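
The trade-off in (4) can be sketched as a comparison of two totals. This is a toy model under stated assumptions, not IRA's code: each plan is reduced to a single number, the loop-border spill is one store per loop entry plus one load per loop exit, and all function and parameter names are hypothetical.

```python
def spill_whole_parent_cost(parent_mem_cost: int) -> int:
    # Plan A (hypothetical): the parent allocno lives in memory everywhere.
    return parent_mem_cost


def spill_in_child_loop_cost(outside_reg_cost: int, child_mem_cost: int,
                             entry_freq: int, exit_freq: int,
                             store_cost: int, load_cost: int) -> int:
    # Plan B (hypothetical): register outside the child's loop, memory inside.
    # Pays the child's full memory cost (0 if the loop never references the
    # value), a store on every loop entry and a load on every loop exit.
    return (outside_reg_cost + child_mem_cost
            + entry_freq * store_cost + exit_freq * load_cost)


def cheaper_plan(parent_mem_cost: int, outside_reg_cost: int,
                 child_mem_cost: int, entry_freq: int, exit_freq: int,
                 store_cost: int, load_cost: int) -> str:
    a = spill_whole_parent_cost(parent_mem_cost)
    b = spill_in_child_loop_cost(outside_reg_cost, child_mem_cost,
                                 entry_freq, exit_freq, store_cost, load_cost)
    return "spill-in-child-loop" if b < a else "spill-whole-parent"
```

With parent_mem_cost=200, outside_reg_cost=50, an unreferenced child (child_mem_cost=0), entry and exit frequencies of 10 and load/store costs of 4, plan B costs 130 and wins. Raising both border frequencies to 100 makes plan B cost 850, so spilling the whole parent becomes cheaper.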
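
For (5), a similar sketch of the choice between saving a call-clobbered register around each call and spilling once around the whole loop. Again this is a hypothetical toy model: call_freqs lists the execution frequencies of the calls inside the loop, and each save/restore pair is assumed to cost one store plus one load.

```python
from typing import Sequence


def save_around_calls_cost(child_reg_cost: int, call_freqs: Sequence[int],
                           store_cost: int, load_cost: int) -> int:
    # Keep the value in a call-clobbered register inside the loop and
    # save/restore it around every call (one store + one load per call).
    return child_reg_cost + sum(call_freqs) * (store_cost + load_cost)


def spill_across_loop_cost(child_mem_cost: int, entry_freq: int,
                           exit_freq: int, store_cost: int,
                           load_cost: int) -> int:
    # Store once on loop entry, keep the value in memory throughout the
    # loop, and load it again on loop exit.
    return child_mem_cost + entry_freq * store_cost + exit_freq * load_cost


def choose_call_strategy(child_reg_cost: int, child_mem_cost: int,
                         call_freqs: Sequence[int], entry_freq: int,
                         exit_freq: int, store_cost: int,
                         load_cost: int) -> str:
    per_call = save_around_calls_cost(child_reg_cost, call_freqs,
                                      store_cost, load_cost)
    per_loop = spill_across_loop_cost(child_mem_cost, entry_freq, exit_freq,
                                      store_cost, load_cost)
    return "spill-across-loop" if per_loop < per_call else "save-around-calls"
```

A value that is live across a hot call but unreferenced in the loop favours spilling across the loop; a value that the loop references heavily, around a rarely executed call, favours the usual per-call save/restore.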
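
The soft-conflict test in (6) can be written as a predicate over a simplified allocno model. The Allocno class, its fields and both functions are hypothetical. cap_member stands for the allocno that a cap represents, and live_across/referenced_in record, for the pseudo behind A2, which subloops it is live across and which subloops reference it; this stands in for "the equivalent of A2" in L1'.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(eq=False)
class Allocno:
    """Hypothetical, heavily simplified allocno."""
    name: str
    loop: str                               # loop this allocno belongs to
    cap_member: Optional[Allocno] = None    # set iff this allocno is a cap
    # For the pseudo behind this allocno: subloops it is live across, and
    # subloops that contain references to it.
    live_across: Set[str] = field(default_factory=set)
    referenced_in: Set[str] = field(default_factory=set)

    @property
    def is_cap(self) -> bool:
        return self.cap_member is not None


def real_allocno(a: Allocno) -> Allocno:
    # Follow the (possibly transitive) chain of caps to a non-cap allocno.
    while a.cap_member is not None:
        a = a.cap_member
    return a


def is_soft_conflict(a1: Allocno, a2: Allocno) -> bool:
    # A1 must be a cap and A2 must not be.
    if not a1.is_cap or a2.is_cap:
        return False
    # L1' is the loop containing A1's real allocno A1'.
    inner_loop = real_allocno(a1).loop
    # A2's pseudo is live across L1' but never referenced inside it.
    return inner_loop in a2.live_across and inner_loop not in a2.referenced_in
```

A True result only says that A1 and A2 may share a hard register; as (6) notes, A1's allocation must then not be propagated to A1'.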
