https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173

--- Comment #23 from amker at gcc dot gnu.org ---
Now I am less convinced that this is a tree ivopts issue.  The tree optimizer
has no knowledge of the stack frame layout for local array variables.  With the
original test, on 32-bit targets, tree ivopts happens to choose an IV on the
base address, like below:

  <bb 16>:
  # ivtmp.11_82 = PHI <ivtmp.11_84(15), ivtmp.11_83(17)>
  _87 = (void *) ivtmp.11_82;
  _13 = MEM[base: _87, offset: 0B];
  ivtmp.11_83 = ivtmp.11_82 - 1;
  _14 = (int) _13;
  foo (_14);
  if (ivtmp.11_83 != _88)
    goto <bb 17>;
  else
    goto <bb 3>;

  <bb 17>:
  goto <bb 16>;

IMHO, we shouldn't depend on this.  It's much clearer to consider the exact
multiple-use example from the previous comment, but this time in a loop context:

void foo(int);

void bar(int i) {
  char A[10];
  char B[10];
  char C[10];
  int d = 0;
  while (i > 0)
    A[d++] = i--;

  while (d > 0)
  {
    foo(A[d]);
    foo(B[d]);
    foo(C[d]);
    d--;
  }
}

Tree ivopts has no knowledge that the references to A/B/C in the loop share a
common sub-expression.  Most likely (as suggested by previous comments), GCC
will generate the IR below:

  <bb 15>:
  _93 = (sizetype) i_6(D);
  _94 = &A + _93;
  ivtmp.11_92 = (unsigned int) _94;
  _98 = (sizetype) i_6(D);
  _99 = _98 + 1;
  _100 = &B + _99;
  ivtmp.15_97 = (unsigned int) _100;
  _104 = (sizetype) i_6(D);
  _105 = _104 + 1;
  _106 = &C + _105;
  ivtmp.20_103 = (unsigned int) _106;
  _110 = (unsigned int) &A;

  <bb 16>:
  # ivtmp.11_90 = PHI <ivtmp.11_92(15), ivtmp.11_91(17)>
  # ivtmp.15_95 = PHI <ivtmp.15_97(15), ivtmp.15_96(17)>
  # ivtmp.20_101 = PHI <ivtmp.20_103(15), ivtmp.20_102(17)>
  _107 = (void *) ivtmp.11_90;
  _13 = MEM[base: _107, offset: 0B];
  ivtmp.11_91 = ivtmp.11_90 - 1;
  _14 = (int) _13;
  foo (_14);
  ivtmp.15_96 = ivtmp.15_95 - 1;
  _108 = (void *) ivtmp.15_96;
  _16 = MEM[base: _108, offset: 0B];
  _17 = (int) _16;
  foo (_17);
  ivtmp.20_102 = ivtmp.20_101 - 1;
  _109 = (void *) ivtmp.20_102;
  _19 = MEM[base: _109, offset: 0B];
  _20 = (int) _19;
  foo (_20);
  if (ivtmp.11_91 != _110)
    goto <bb 17>;
  else
    goto <bb 3>;

  <bb 17>:
  goto <bb 16>;

This IR is not good, at least on targets without auto-increment addressing
modes.  Even on ARM/AArch64, which have auto-increment addressing modes, it's
not always practical because of the bloated loop setup or the register pressure
caused by the duplicated induction variables.
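
To make the cost concrete, here is a rough C rendering of the shape of the dump
above, next to the source-level loop.  It is only a sketch: the recorder foo(),
the function names and the pointer parameters are made up for illustration, and
the real IR works on integer-typed ivtmps rather than C pointers.

```c
#include <assert.h>

/* Hypothetical recorder standing in for foo(), so both walks can be compared. */
static int trace[64];
static int ntrace;
static void foo(int v) { assert(ntrace < 64); trace[ntrace++] = v; }

/* Source-level loop: one counter d, three indexed loads. */
static void walk_indexed(const char *A, const char *B, const char *C, int d) {
  while (d > 0) {
    foo(A[d]);
    foo(B[d]);
    foo(C[d]);
    d--;
  }
}

/* Shape of the dump: three pointer IVs, each set up and stepped separately. */
static void walk_three_ivs(const char *A, const char *B, const char *C, int d) {
  if (d <= 0)
    return;
  const char *pa = A + d;      /* like ivtmp.11: load, then step */
  const char *pb = B + d + 1;  /* like ivtmp.15: step, then load */
  const char *pc = C + d + 1;  /* like ivtmp.20: step, then load */
  do {
    foo(*pa);
    pa--;
    pb--;
    foo(*pb);
    pc--;
    foo(*pc);
  } while (pa != A);           /* exit test uses only the A pointer */
}
```

Both functions visit the same elements in the same order; the second one simply
keeps three pointers live across the loop instead of one counter.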

In this case, we should associate "virtual_frame + offset1" and hoist it out of
the loop, while inside the loop we should choose only one induction variable
(the biv) and refer to A/B/C using offset addressing modes, like below:

  <bb 15>
    base_1 = virtual_frame + off1

  <bb 16>:
  # d_26 = PHI <init, d_13>
    base_2 = base_1 + d_26
    d_13 = d_26 - 1;
    foo (MEM[base_2, off_A])
    foo (MEM[base_2, off_B])
    foo (MEM[base_2, off_C])
    goto <bb 16>
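
The same idea can be sketched in C.  This is an illustration under assumptions,
not what GCC emits: struct frame is a hypothetical stand-in for the stack frame
(so A, B and C sit at fixed offsets from one base), and foo() is a made-up
recorder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the stack frame holding the three arrays. */
struct frame {
  char A[10];
  char B[10];
  char C[10];
};

/* Hypothetical recorder standing in for foo(). */
static int trace[64];
static int ntrace;
static void foo(int v) { assert(ntrace < 64); trace[ntrace++] = v; }

/* Baseline: index each array separately, as in the source loop. */
static void walk_indexed(const struct frame *f, int d) {
  while (d > 0) {
    foo(f->A[d]);
    foo(f->B[d]);
    foo(f->C[d]);
    d--;
  }
}

/* Proposed shape: one IV (d), one address base_1 + d per iteration,
   and three loads at constant offsets from that address. */
static void walk_single_iv(const struct frame *f, int d) {
  const char *base_1 = (const char *)f;  /* loop invariant, hoisted */
  while (d > 0) {
    const char *base_2 = base_1 + d;
    foo(base_2[offsetof(struct frame, A)]);
    foo(base_2[offsetof(struct frame, B)]);
    foo(base_2[offsetof(struct frame, C)]);
    d--;
  }
}
```

Only d changes per iteration here, so on a target with base + offset addressing
the three loads can share one address register.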


So maybe this is an RTL issue.  First, we should reassociate
"virtual_frame + reg + offset" into "(virtual_frame + offset) + reg".  As for
the missed CSE opportunities in address expressions, maybe they can be fixed by
an additional strength reduction pass on RTL, just like the
gimple-strength-reduction pass.  The RTL pass is necessary simply because the
gimple one has no knowledge of stack frame information, just like tree IVOPT.
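
As a small sketch of the reassociation itself, written as plain unsigned address
arithmetic (the function names are made up, and this models only the arithmetic,
not any RTL pass):

```c
#include <assert.h>
#include <stdint.h>

/* Address as originally formed: (virtual_frame + reg) + offset.
   reg varies per iteration, so no invariant part is exposed. */
static uintptr_t addr_as_written(uintptr_t virtual_frame, uintptr_t reg,
                                 uintptr_t offset) {
  return (virtual_frame + reg) + offset;
}

/* Reassociated form: (virtual_frame + offset) is computed once outside the
   loop, and each iteration only adds reg to it. */
static void check_reassociation(uintptr_t virtual_frame, uintptr_t offset,
                                uintptr_t n) {
  uintptr_t hoisted = virtual_frame + offset;  /* loop invariant */
  for (uintptr_t reg = 0; reg < n; reg++)
    assert(addr_as_written(virtual_frame, reg, offset) == hoisted + reg);
}
```

Unsigned addition is associative, so the two forms always agree; the point of
the rewrite is only that the hoisted sum becomes visible as a common base.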
