[Bug rtl-optimization/118611] LRA inserts unneeded reload on FMA chain

rsandifo at gcc dot gnu.org via Gcc-bugs Fri, 24 Jan 2025 05:00:55 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118611


Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-01-24
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #4 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
I think the problem is IRA rather than LRA.

As a result of the quoted instruction, IRA realises that r101 and r103 should
be tied.  It therefore forms a thread for them:

        Forming thread by copy 3:a1r101-a3r103 (freq=1000):
          Result (freq=4000): a1r101(2000) a3r103(2000)

But it happens to allocate r101 first, even though r103's allocation is more
constrained:

      Popping a7(r106,l0)  --         assign reg 63
      Popping a1(r101,l0)  --         assign reg 63
      Popping a3(r103,l0)  --         assign reg 62

So IRA picks 63 for both r101 and r106.  But r103 and r106 are live at the same
time, so it has to fall back on 62 for r103.
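
As a rough illustration of why the popping order matters here, the following
toy model replays the three assignments from the dump.  It is only a sketch:
the greedy loop, the PREFERENCE list and the CONFLICTS table are hypothetical
stand-ins, not IRA's real data structures or cost model.  The only facts taken
from the dump are the popping order, the resulting registers, and that r103
and r106 are live at the same time.

```python
# Toy model only: not IRA's real data structures or cost model.
PREFERENCE = [63, 62]  # hypothetical: hard registers are tried in this order

# Conflicts as described above: r103 and r106 are live at the same time.
# r101 is assumed to conflict with neither, since it shares 63 with r106.
CONFLICTS = {
    "r101": set(),
    "r103": {"r106"},
    "r106": {"r103"},
}

def allocate(order):
    """Give each pseudo the first register that no conflicting pseudo holds."""
    assignment = {}
    for pseudo in order:
        taken = {assignment[other]
                 for other in CONFLICTS[pseudo] if other in assignment}
        assignment[pseudo] = next(reg for reg in PREFERENCE if reg not in taken)
    return assignment

# Popping order from the dump: a7 (r106), a1 (r101), a3 (r103).
result = allocate(["r106", "r101", "r103"])

# The r103->r101 copy can only be eliminated if both get the same register.
copy_survives = result["r101"] != result["r103"]
```

In this model r101 takes 63 simply because nothing stops it, and by the time
r103 is popped, 63 is already unavailable to it.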

I don't think allocating r101 and then r103 is necessarily the wrong order.
There could be other cases where the current order gives the best result and
the opposite order wouldn't.  Instead, it seems like the cost of allocating 63
to r101 doesn't fully reflect the r103→r101 copy that we would fail to
eliminate (or, alternatively, that the cost of allocating 62 doesn't fully
reflect the saving of eliminating the copy).
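
One way to picture what "fully reflecting" the copy could mean is the toy cost
table below.  This is a sketch under stated assumptions, not a proposed patch:
the function cost_for_r101, the zero BASE_COST values and the count_copy switch
are all hypothetical.  Only the copy frequency (1000) and the fact that r106
already holds 63 come from the dump.

```python
# Toy cost model; hypothetical, not IRA's actual cost computation.
COPY_FREQ = 1000                 # freq of copy 3 (a1r101-a3r103) in the dump
BASE_COST = {63: 0, 62: 0}       # assumption: both registers otherwise equal
ALREADY_ASSIGNED = {"r106": 63}  # r106 was popped first and got 63
R103_CONFLICTS = {"r106"}        # r103 and r106 are live at the same time

def cost_for_r101(reg, count_copy):
    """Cost of giving `reg` to r101, optionally charging for a surviving copy."""
    # Registers that r103 can no longer use because a conflicting pseudo has them.
    blocked_for_r103 = {ALREADY_ASSIGNED[p]
                        for p in R103_CONFLICTS if p in ALREADY_ASSIGNED}
    # If r103 cannot follow r101 into `reg`, the r103->r101 copy stays.
    penalty = COPY_FREQ if (count_copy and reg in blocked_for_r103) else 0
    return BASE_COST[reg] + penalty

# Without the copy term the two registers look equally good for r101.
naive = {reg: cost_for_r101(reg, count_copy=False) for reg in (63, 62)}
# With the copy term, 63 carries the cost of the copy that cannot be removed.
aware = {reg: cost_for_r101(reg, count_copy=True) for reg in (63, 62)}
```

With the copy term included, 62 becomes the cheaper choice for r101 in this
model, and r103 could then share it.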
