[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled

prathamesh3492 at gcc dot gnu.org Wed, 11 Oct 2017 12:05:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155


--- Comment #33 from prathamesh3492 at gcc dot gnu.org ---
Created attachment 42341
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42341&action=edit
Test-case to reproduce regression with cortex-m7

I have attached an artificial test-case that is fairly representative of the
regression we are seeing in a benchmark. The test-case mimics a deterministic
finite automaton. With code hoisting there is an additional spill of r5 near
the beginning of the function.

Looking at the loop from the attached test-case:
for (; *a && b != 'z'; a++)
  {
    next = *a;
    if (next == ',')
      {
        a++;
        break;
      }
    switch (b) { ... }
  }

The for loop has the same computation, a++, in two sibling basic blocks,
and that computation gets hoisted.
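
To make the two occurrences of a++ concrete, here is a minimal sketch of a loop with the same shape, next to a source-level picture of what hoisting does to it. This is not the attached test-case: the function signatures, the step() helper and its transition rules are all hypothetical and only stand in for the elided switch body.

```c
#include <assert.h>

/* Hypothetical transition function; stands in for the elided switch.  */
static int
step (int b, int next)
{
  switch (b)
    {
    case 'a': return next == 'x' ? 'b' : 'a';
    case 'b': return next == 'y' ? 'z' : 'a';
    default:  return b;
    }
}

/* Shape of the original loop: a++ appears on the early-exit path and
   again as the loop increment.  */
int
scan (const char **s, const char *a)
{
  int b = 'a';
  assert (s && a);
  for (; *a && b != 'z'; a++)
    {
      int next = *a;
      if (next == ',')
        {
          a++;
          break;
        }
      b = step (b, next);
    }
  *s = a;
  return b;
}

/* Hand-hoisted equivalent: a + 1 is computed once, before the
   comparison, and held in a second variable.  */
int
scan_hoisted (const char **s, const char *a)
{
  int b = 'a';
  assert (s && a);
  while (*a && b != 'z')
    {
      int next = *a;
      const char *a1 = a + 1;   /* corresponds to _44 = a_55 + 1 */
      if (next == ',')
        {
          a = a1;
          break;
        }
      b = step (b, next);
      a = a1;
    }
  *s = a;
  return b;
}
```

Both functions compute the same result; the second one just makes explicit that a and a + 1 are separate values at the point of the comparison.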

From PRE dump with code-hoisting:
  <bb 26> [23.80%] [count: INV]:
  # _25 = PHI <_151(25), _23(2)>
  # b_50 = PHI <b_152(25), 97(2)>
  # a_55 = PHI <a_153(25), a_28(2)>
  next_29 = (int) _25;
  _44 = a_55 + 1;
  if (next_29 == 44)
    goto <bb 27>; [5.00%] [count: INV]
  else
    goto <bb 12>; [95.00%] [count: INV]

(a + 1) seems to get hoisted into bb 26:
_44 = a_55 + 1
just before
if (next_29 == 44), which corresponds to the if (next == ',') condition.

The issue, I think, is that there is a use of 'a' near the end of the function:
*s = a;
which possibly results in register pressure forcing the compiler to spill r5.
Commenting out the assignment removes the spill.

Looking at register allocation with code-hoisting, it seems r2 is used
to hold the hoisted value (a + 1):

r0 = s
r1 = tab
r3 = a
r4 = b
r5 = *a
r2 = r3 + 1 (holding the hoisted value)

And without code-hoisting, it seems only r3 is assigned to 'a'.
r0 = s
r1 = tab
r2 = b
r3 = a
r4 = *a


This is evident from asm differences for the early-exit code-path:
if (next == ',')
  {
    a++;
    break;
  }

<breaks to>:
  *s = a;
  return b;


Without code-hoisting:
.L2:
        cmp     r4, #44
        beq     .L4

.L4:
        adds    r3, r3, #1
        ldr     r4, [sp], #4
        str     r3, [r0]
        mov     r0, r2
        bx      lr

With code-hoisting:
.L2:
        cmp     r5, #44
        add     r2, r3, #1
        beq     .L3

.L3:
        str     r2, [r0]
        mov     r0, r4
        pop     {r4, r5}
        bx      lr

Without code hoisting it reuses r3 to store a + 1, whereas with code
hoisting it uses the extra register r2 to hold the value of the hoisted
expression a + 1.

Would it be a good idea to somehow "limit" the distance (in terms of number of
basic blocks, maybe?) between the definition of a hoisted variable and its
furthest use during PRE? If that distance exceeds a certain threshold, then PRE
should choose not to hoist that expression. The threshold could be a param that
can be set by backends.
Does this analysis look reasonable?
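
As a rough illustration of the distance limit suggested above, the following sketch measures the distance from a hoist point to its furthest use on a toy CFG and compares it against a threshold. It is only one possible reading of "distance" (shortest path in CFG edges) and does not use GCC's internal data structures; struct cfg, furthest_use_distance, hoist_allowed_p and the threshold argument are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16
#define MAX_SUCCS 4

/* Toy CFG for illustration; not GCC's basic_block representation.  */
struct cfg
{
  int n_blocks;
  int n_succs[MAX_BLOCKS];
  int succs[MAX_BLOCKS][MAX_SUCCS];
};

/* Shortest distance, in CFG edges, from block DEF to the furthest
   block in USES that is reachable from it; -1 if none is reachable.  */
int
furthest_use_distance (const struct cfg *g, int def,
                       const int *uses, int n_uses)
{
  int dist[MAX_BLOCKS], queue[MAX_BLOCKS];
  int head = 0, tail = 0, furthest = -1;

  assert (g->n_blocks <= MAX_BLOCKS && def >= 0 && def < g->n_blocks);
  for (int i = 0; i < g->n_blocks; i++)
    dist[i] = -1;

  /* Breadth-first walk from the hoist point.  */
  dist[def] = 0;
  queue[tail++] = def;
  while (head < tail)
    {
      int bb = queue[head++];
      for (int i = 0; i < g->n_succs[bb]; i++)
        {
          int succ = g->succs[bb][i];
          if (dist[succ] < 0)
            {
              dist[succ] = dist[bb] + 1;
              queue[tail++] = succ;
            }
        }
    }

  for (int i = 0; i < n_uses; i++)
    if (dist[uses[i]] > furthest)
      furthest = dist[uses[i]];
  return furthest;
}

/* Allow hoisting only if the furthest use is within THRESHOLD edges.  */
bool
hoist_allowed_p (const struct cfg *g, int def,
                 const int *uses, int n_uses, int threshold)
{
  return furthest_use_distance (g, def, uses, n_uses) <= threshold;
}
```

A real implementation would also have to decide how to treat loops and uses that feed PHI nodes; the sketch ignores both.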

Thanks,
Prathamesh

[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled
