[Bug rtl-optimization/91981] Speed degradation because of inlining a register clobbering function

segher at gcc dot gnu.org Fri, 04 Oct 2019 08:00:39 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981


--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Attempting shrink-wrapping optimization.
Block 2 needs the prologue.

(That is already the entry block.)  And it does in fact need the prologue,
because it contains

        movq    %rdi, %rbx      # 2     [c=4 l=3]  *movdi_internal/3

This was already decided by IRA:

(insn 2 87 3 2 (set (reg/v/f:DI 105 [ v ])
        (reg:DI 115)) "91981.c":46:30 66 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 115)
        (nil)))

and IRA picked

   16:r82  l0     1   17:r83  l0     0    8:r88  l0     1    6:r89  l0     6
   12:r92  l0    40    4:r93  l0    41    5:r94  l0    40    7:r95  l0     5
   27:r97  l0     2    2:r100 l0     6   10:r103 l0     0    9:r104 l0     0
    0:r105 l0     3   18:r106 l0     0   15:r107 l0     1   14:r109 l0     1
   13:r111 l0     0    3:r113 l0    40    1:r114 l0     6   19:r115 l0     5
   11:r116 l0     0

(105 gets bx, 115 gets di).

Ideally IRA would choose registers better, and not use non-volatile registers
early in the function.  But shrink-wrapping could try to correct for that;
that has been on my to-do list for a long time now, but it is hard to come up
with good heuristics.

There are three mechanisms that can be used:

1) Rename registers.  Sometimes you can shuffle the registers a bit such
that the one you care about gets a volatile register.
2) More feasibly, you can create register copies to move things around.
Sometimes later passes can even get rid of those copies.
3) You can copy the code using those non-volatile registers to all successor
blocks.  Or just the code that sets the register.  And you have to be careful
that the inputs to the code you copy are still live at the new position(s),
etc.
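
To make the shape of the problem concrete, here is a minimal sketch at the
source level.  It is not the test case from this bug; every function name in
it is hypothetical, and it assumes GCC (or Clang) for the noinline attribute.
In lookup(), v arrives in a volatile register and is live across a call on
the slow path only, so an allocator may copy it into a non-volatile register
right at function entry, which puts the prologue on the fast path as well.
lookup_split() is a hand-written analogue of sinking that copy past the
branch: the value that must survive the call lives only in the cold function.
Whether a given compiler version actually emits the save/restore where the
comments suggest is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical out-of-line helper.  Any real call clobbers the volatile
   (caller-saved) registers, so values live across it need a non-volatile
   register or a stack slot.  */
__attribute__((noinline)) static long slow_path(const long *v)
{
    return v[0] * 2 + 1;
}

/* v is needed after the call to slow_path(), but only on the slow path.
   If the allocator moves v to a non-volatile register in the entry block,
   the fast path pays for the save/restore too.  */
long lookup(const long *v, long key)
{
    if (v[0] == key)            /* fast path: no call is made here */
        return 0;
    long r = slow_path(v);      /* v is live across this call */
    return r + v[1];
}

/* Cold part split out by hand.  The value that is live across the call
   now exists only in this function.  */
__attribute__((noinline)) static long lookup_cold(const long *v)
{
    long r = slow_path(v);
    return r + v[1];
}

/* Same result as lookup(); the entry block has nothing live across a call,
   so it has no need for a non-volatile register of its own.  */
long lookup_split(const long *v, long key)
{
    if (v[0] == key)
        return 0;
    return lookup_cold(v);
}
```

Comparing the output of "gcc -O2 -S" for the two functions is one way to see
whether the push/pop of the non-volatile register has moved off the fast
path on a particular compiler version.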

Often you cannot get rid of *all* non-volatile registers, even in the entry
block.  Deciding which to get rid of, where, and how, is quite a big problem.

But maybe there is some simple heuristic that works well that I just fail
to see :-)
