https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
             Blocks|        |53947

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.L8:
        ldr     q4, [x9, x8]
        cmgt    v2.4s, v6.4s, v0.4s
        ldr     q3, [x10, x8]
        add     w12, w12, 1
        ldr     q1, [x2, x8]
        add     v0.4s, v0.4s, v5.4s
        add     v3.4s, v3.4s, v4.4s   << this one
        add     v1.4s, v1.4s, v4.4s   << this one
        bit     v1.16b, v3.16b, v2.16b
        str     q1, [x9, x8]
        add     x8, x8, 16
        cmp     w7, w12
        bhi     .L8

This is trunk on aarch64-linux-gnu. Range splitting is not done here, but even without range splitting there is more that can be done: there is one extra add. PRE produces:

  <bb 4>:
  _2 = b[i_18];
  _3 = _2 + pretmp_14;
  goto <bb 6>;

  <bb 5>:
  _5 = c[i_18];
  _6 = _5 + pretmp_14;

  <bb 6>:
  # cstore_17 = PHI <_3(4), _6(5)>

But we could do better and sink the common add through the PHI:

  <bb 4>:
  _2 = b[i_18];
  goto <bb 6>;

  <bb 5>:
  _5 = c[i_18];

  <bb 6>:
  # _N = PHI <_2(4), _5(5)>
  cstore_17 = _N + pretmp_14;

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
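For illustration, a minimal C loop that could produce the GIMPLE pattern above (a sketch only; the function name `f`, the condition, and the variable names are assumptions, not the PR's actual testcase). Both arms of the select load a different array element and add the same loop-invariant `k`, which is the add that PRE duplicates into both predecessors of the PHI:

```c
/* Hypothetical reduced testcase: each arm of the conditional adds
   the same loop-invariant k, so the add appears once per arm in the
   PRE output instead of once after the PHI merge. */
void f(int *a, const int *b, const int *c, int k, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = (a[i] > 0 ? b[i] : c[i]) + k;
}
```

The suggested transform corresponds to rewriting the body as `int t = a[i] > 0 ? b[i] : c[i]; a[i] = t + k;`, i.e. selecting first and adding once, which is what sinking the add past the PHI achieves.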