https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46008
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |5.0

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed:
.L2:
        ldr     q0, [x1, x4]
        ldr     q3, [x5, x4]
        ldr     q1, [x2, x4]
        ldr     q2, [x3, x4]
        fadd    v0.2d, v0.2d, v3.2d
        fmla    v0.2d, v2.2d, v1.2d
        fcmlt   v1.2d, v0.2d, 0
        bit     v0.16b, v4.16b, v1.16b
        str     q0, [x0, x4]
        add     x4, x4, 16
        cmp     x4, 8192
        bne     .L2

5.4.0 produces something slightly worse but still vectorized:
.L2:
        ldr     q0, [x1, x4]
        ldr     q3, [x5, x4]
        ldr     q1, [x2, x4]
        ldr     q2, [x3, x4]
        fadd    v0.2d, v0.2d, v3.2d
        fmla    v0.2d, v2.2d, v1.2d
        fcmlt   v1.2d, v0.2d, 0
        bit     v0.16b, v4.16b, v1.16b
        str     q0, [x4, x0]
        add     x4, x4, 16
        cmp     x4, 8192
        bne     .L2
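
For readers who want a scalar picture of the vectorized loop body in the comment, here is a rough C sketch of what the instruction sequence appears to compute. It is a reading of the assembly only, not the test case attached to the bug: the function name, parameter names, and the replacement value k (the loop-invariant held in v4) are all hypothetical, and N = 1024 is inferred from the 8192-byte loop bound with 8-byte doubles.

```c
#include <stddef.h>

/* Hypothetical reconstruction from the assembly; none of these names
   come from the bug report. N is inferred as 8192 bytes / sizeof(double). */
enum { N = 1024 };

void clamp_fma(double *restrict out,
               const double *restrict a, const double *restrict b,
               const double *restrict c, const double *restrict d,
               double k)
{
    for (size_t i = 0; i < N; i++) {
        /* fadd adds two loaded vectors, fmla accumulates a product on top */
        double t = (a[i] + b[i]) + c[i] * d[i];
        /* fcmlt builds a mask for t < 0, bit inserts k where the mask is set */
        out[i] = (t < 0.0) ? k : t;
    }
}
```

Each iteration of the vector loop handles two doubles (the .2d arrangement), so the 16-byte step in x4 corresponds to two iterations of this scalar loop.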