https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101506
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |testsuite
             Target|                            |aarch64
   Target Milestone|---                         |12.0

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see that the :9 and :19 loops (maxv_f32 and minv_f32) use epilogue
vectorization:

/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:9:13: optimized: loop vectorized using 16 byte vectors
/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:9:13: optimized: loop vectorized using 8 byte vectors
/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:19:13: optimized: loop vectorized using 16 byte vectors
/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:19:13: optimized: loop vectorized using 8 byte vectors

and so have two different vector sizes. I have no idea what 'fminnmv' or
'fmaxnmv' are, but the vectorizer behaves as intended here. Somebody
familiar with aarch64 needs to look. A cc1 cross-compiler produces

maxv_f32:
.LFB0:
        .cfi_startproc
        ldr     q1, [x0, 4]
        ld1r    {v3.4s}, [x0]
        ldr     d2, [x0, 20]
        ldr     s0, [x0, 28]
        fmaxnm  v1.4s, v1.4s, v3.4s
        dup     d3, v1.d[1]
        fmaxnm  v1.2s, v1.2s, v3.2s
        fmaxnm  v1.2s, v1.2s, v2.2s
        fmaxnmp s1, v1.2s
        fmaxnm  s0, s0, s1
        ret

and

minv_f32:
.LFB1:
        .cfi_startproc
        ldr     q3, [x0, 4]
        ld1r    {v0.4s}, [x0]
        ldr     q2, [x0, 20]
        fminnm  v3.4s, v3.4s, v0.4s
        ldr     d5, [x0, 52]
        ldr     q1, [x0, 36]
        ldr     s4, [x0, 60]
        fminnm  v2.4s, v2.4s, v3.4s
        fminnm  v1.4s, v1.4s, v2.4s
        dup     d0, v1.d[1]
        fminnm  v0.2s, v1.2s, v0.2s
        fminnm  v0.2s, v0.2s, v5.2s
        fminnmp s0, v0.2s
        fminnm  s0, s4, s0
        ret

Not sure what is "wrong" here. I'd say it's a testsuite or target (costing)
issue, neither of which I can really resolve.
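For readers unfamiliar with epilogue vectorization, the following plain C sketch mirrors the shape of the maxv_f32 listing so the load offsets can be checked by hand. The main loop uses 16-byte vectors (four floats) and the epilogue uses 8-byte vectors (two floats). These assumptions are inferred from the load offsets and are not taken from vect-fmaxv-fminv.x: the loop is a max reduction seeded with a[0] over a[1..n-1], with n = 8 for maxv_f32 and n = 16 for minv_f32. The names max_num, maxv_scalar and maxv_vec_shape are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper modelling what fmaxnm computes (IEEE 754-2008
   maxNum): if exactly one operand is a NaN, the other is returned.
   Signed-zero ordering is ignored in this sketch.  */
static float max_num(float x, float y)
{
  if (x != x)  /* x is a NaN */
    return y;
  if (y != y)  /* y is a NaN */
    return x;
  return x > y ? x : y;
}

/* Scalar reference: ASSUMED shape of the maxv_f32 loop, a max reduction
   seeded with a[0] over a[1..n-1].  Not copied from the testcase.  */
static float maxv_scalar(const float *a, int n)
{
  assert(n >= 1);
  float s = a[0];
  for (int i = 1; i < n; i++)
    s = max_num(s, a[i]);
  return s;
}

/* Same reduction, written to follow the structure of the listing:
   a 4-lane main loop, a 2-lane epilogue, then a scalar tail.  */
static float maxv_vec_shape(const float *a, int n)
{
  assert(n >= 1);
  /* Broadcast a[0] into all four lanes, like ld1r {v3.4s}, [x0].  */
  float acc4[4] = { a[0], a[0], a[0], a[0] };
  int i = 1;

  /* Main loop: 16-byte vectors, four floats per iteration.  */
  for (; i + 4 <= n; i += 4)
    for (int l = 0; l < 4; l++)
      acc4[l] = max_num(acc4[l], a[i + l]);

  /* Fold the high half onto the low half, like dup d3, v1.d[1]
     followed by fmaxnm v1.2s, v1.2s, v3.2s.  */
  float acc2[2] = { max_num(acc4[0], acc4[2]), max_num(acc4[1], acc4[3]) };

  /* Epilogue: 8-byte vectors, two floats per iteration.  */
  for (; i + 2 <= n; i += 2)
    for (int l = 0; l < 2; l++)
      acc2[l] = max_num(acc2[l], a[i + l]);

  /* Pairwise reduction of the two lanes, like fmaxnmp s1, v1.2s.  */
  float s = max_num(acc2[0], acc2[1]);

  /* Scalar tail for the remaining elements, like fmaxnm s0, s0, s1.  */
  for (; i < n; i++)
    s = max_num(s, a[i]);
  return s;
}
```

For n = 8 this sketch processes a[1..4] in the 4-lane loop, a[5..6] in the 2-lane epilogue, and a[7] in the scalar tail. That matches ldr q1, [x0, 4], ldr d2, [x0, 20] and ldr s0, [x0, 28]. For n = 16 it matches the q loads at offsets 4, 20 and 36, the d load at 52 and the s load at 60. minv_f32 has the same shape, with fminnm in place of fmaxnm.

One possible reading, not confirmed in this report, concerns the final reduction. It happens on 2-lane vectors, so it ends in fmaxnmp/fminnmp rather than an across-lanes fmaxnmv/fminnmv, because A64 defines the single-precision across-lanes form only for .4s. A scan for fmaxnmv/fminnmv would then not match.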