[Bug tree-optimization/91934] Performance regression on 8.3.0 with -O3 and avx

rguenth at gcc dot gnu.org Tue, 01 Oct 2019 03:58:49 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934


--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the difference between good and bad is the data-ref access analysis,
which detects only single-element interleaving in GCC 8 but nicer
interleaving in GCC 9, where I rewrote parts of that analysis:

t.c:15:9: note:   === vect_analyze_data_ref_accesses ===
t.c:15:9: note:   Detected interleaving load _6->i and _6->q
t.c:15:9: note:   Detected interleaving load _8->i and _8->q
t.c:15:9: note:   Detected interleaving load _34->i and _34->q
t.c:15:9: note:   Detected interleaving load _32->i and _32->q
t.c:15:9: note:   Detected interleaving load _3->i and _37->i
t.c:15:9: note:   Queuing group with duplicate access for fixup
t.c:15:9: note:   Detected interleaving load _3->i and _3->q
t.c:15:9: note:   Detected interleaving load _3->i and _37->q
t.c:15:9: note:   Detected interleaving store _3->i and _37->i
t.c:15:9: note:   Queuing group with duplicate access for fixup
t.c:15:9: note:   Detected interleaving store _3->i and _3->q
t.c:15:9: note:   Detected interleaving store _3->i and _37->q

See the 'Queuing group with duplicate access' lines: this is a new
feature that deals a bit better with interleaving exposed by unrolling.
In particular we have redundancies that the old code simply gives up on:

  <bb 3> [local count: 66409497]:
  # j_40 = PHI <0(5), j_75(21)>
  # ivtmp_28 = PHI <200(5), ivtmp_44(21)>
  idx_22 = _1 + j_40;
  _2 = j_40 * 8;
  _3 = dst_23(D) + _2;
  _4 = _3->i;
...
  _38 = j_40 * 8;
  _37 = dst_23(D) + _38;
  _36 = _37->i;

while the new code simply leaves them in place, vectorizing them.
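
For illustration only, here is a minimal C sketch of the kind of access
pattern discussed above. It is not the testcase from this PR: the type
cplx, the function accumulate2 and its parameters are hypothetical names,
and the only details taken from the dump are the field names i and q and
the 8-byte element size implied by j * 8. Each iteration reads and writes
dst[j].i and dst[j].q twice through the same address, which is roughly
what _3 and _37 look like in the GIMPLE. Whether the duplicate accesses
survive until the vectorizer depends on earlier passes and on aliasing.

```c
#include <stddef.h>

/* Hypothetical element type: two 4-byte fields, so one element is
   8 bytes, matching the "j * 8" address computation in the dump.  */
typedef struct { float i; float q; } cplx;

/* Hypothetical kernel (an assumption, not the PR's code): dst[j] is
   updated twice per iteration, so dst[j].i and dst[j].q are each
   loaded and stored twice at source level.  */
void accumulate2(cplx *dst, const cplx *a, const cplx *b,
                 const cplx *c, const cplx *d, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        /* First update of both fields.  */
        dst[j].i += a[j].i * b[j].i;
        dst[j].q += a[j].q * b[j].q;
        /* Second update of the same element through the same address.  */
        dst[j].i += c[j].i * d[j].i;
        dst[j].q += c[j].q * d[j].q;
    }
}
```

Compiling something like this with -O3 -mavx -fdump-tree-vect-details
should produce a vect dump in which the "Detected interleaving" notes can
be compared between compiler versions; the exact notes will differ from
the ones quoted above.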

So for GCC 9 the fix for PR87105 (specifically r265457) fixed this.
