https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.L8:
        ldr     q4, [x9, x8]
        cmgt    v2.4s, v6.4s, v0.4s
        ldr     q3, [x10, x8]
        add     w12, w12, 1
        ldr     q1, [x2, x8]
        add     v0.4s, v0.4s, v5.4s
        add     v3.4s, v3.4s, v4.4s << this one
        add     v1.4s, v1.4s, v4.4s  << this one
        bit     v1.16b, v3.16b, v2.16b
        str     q1, [x9, x8]
        add     x8, x8, 16
        cmp     w7, w12
        bhi     .L8

This is trunk on aarch64-linux-gnu.  Range splitting is not done here, but
more could be done even without range splitting: there is one redundant add.

PRE produces:
  <bb 4>:
  _2 = b[i_18];
  _3 = _2 + pretmp_14;
  goto <bb 6>;

  <bb 5>:
  _5 = c[i_18];
  _6 = _5 + pretmp_14;

  <bb 6>:
  # cstore_17 = PHI <_3(4), _6(5)>

But we could do better and do:
  <bb 4>:
  _2 = b[i_18];
  goto <bb 6>;

  <bb 5>:
  _5 = c[i_18];

  <bb 6>:
  # _N = PHI <_2(4), _5(5)>
  cstore_17 = _N + pretmp_14;


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
