https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101506

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |testsuite
             Target|                            |aarch64
   Target Milestone|---                         |12.0

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see that the :9 and :19 loops (maxv_f32 and minv_f32) use epilogue
vectorization:

/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:9:13:
optimized: loop vectorized using 16 byte vectors
/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:9:13:
optimized: loop vectorized using 8 byte vectors
/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:19:13:
optimized: loop vectorized using 16 byte vectors
/home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x:19:13:
optimized: loop vectorized using 8 byte vectors

and so each loop ends up with two different vector sizes.  I have no idea
what 'fminnmv' or 'fmaxnmv' are, but the vectorizer behaves as intended
here.  Somebody familiar with aarch64 needs to look.  A cc1 cross produces

maxv_f32:
.LFB0:
        .cfi_startproc
        ldr     q1, [x0, 4]
        ld1r    {v3.4s}, [x0]
        ldr     d2, [x0, 20]
        ldr     s0, [x0, 28]
        fmaxnm  v1.4s, v1.4s, v3.4s
        dup     d3, v1.d[1]
        fmaxnm  v1.2s, v1.2s, v3.2s
        fmaxnm  v1.2s, v1.2s, v2.2s
        fmaxnmp s1, v1.2s
        fmaxnm  s0, s0, s1
        ret

and

minv_f32:
.LFB1:
        .cfi_startproc
        ldr     q3, [x0, 4]
        ld1r    {v0.4s}, [x0]
        ldr     q2, [x0, 20]
        fminnm  v3.4s, v3.4s, v0.4s
        ldr     d5, [x0, 52]
        ldr     q1, [x0, 36]
        ldr     s4, [x0, 60]
        fminnm  v2.4s, v2.4s, v3.4s
        fminnm  v1.4s, v1.4s, v2.4s
        dup     d0, v1.d[1]
        fminnm  v0.2s, v1.2s, v0.2s
        fminnm  v0.2s, v0.2s, v5.2s
        fminnmp s0, v0.2s
        fminnm  s0, s4, s0
        ret

I'm not sure what is "wrong" here.  I'd say it's a testsuite or target
(costing) issue, neither of which I can really resolve.
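
For readers unfamiliar with epilogue vectorization, the maxv_f32 assembly
above can be modelled in plain C roughly as follows.  This is only a sketch
under assumptions: the loop shape (a max reduction over a[0..7] seeded with
a[0]) is inferred from the load offsets in the assembly, not copied from the
testcase, and the names max_num, maxv_ref, maxv_model and self_check are
hypothetical.  max_num stands in for fmaxnm, assuming it returns the
non-NaN operand when exactly one input is NaN.

```c
#include <assert.h>

/* Stand-in for fmaxnm: if exactly one operand is NaN, return the other.  */
static float max_num(float x, float y)
{
    return (x > y || y != y) ? x : y;
}

/* Assumed scalar form of the reduction: max over a[0..7], seeded with a[0].  */
static float maxv_ref(const float *a)
{
    float s = a[0];
    for (int i = 1; i < 8; i++)
        s = max_num(s, a[i]);
    return s;
}

/* Model of the generated code: one 16-byte (4-lane) step, one 8-byte
   (2-lane) epilogue step, then a scalar tail.  */
static float maxv_model(const float *a)
{
    float v4[4], v2[2];

    /* ld1r + fmaxnm .4s: broadcast a[0], combine with a[1..4].  */
    for (int l = 0; l < 4; l++)
        v4[l] = max_num(a[1 + l], a[0]);

    /* dup + fmaxnm .2s: fold the high half into the low half.  */
    for (int l = 0; l < 2; l++)
        v2[l] = max_num(v4[l], v4[2 + l]);

    /* Epilogue, fmaxnm .2s: combine with a[5..6].  */
    for (int l = 0; l < 2; l++)
        v2[l] = max_num(v2[l], a[5 + l]);

    /* fmaxnmp reduces the two lanes; scalar fmaxnm handles a[7].  */
    return max_num(a[7], max_num(v2[0], v2[1]));
}

/* Small consistency check between the model and the scalar reference.  */
static void self_check(void)
{
    float a[8] = { 3.0f, -1.0f, 7.5f, 2.0f, 9.0f, 0.5f, 8.0f, 4.0f };
    assert(maxv_model(a) == maxv_ref(a));
}
```

In this model the reduction is finished with a pairwise step and scalar
operations, which matches the fmaxnmp/fmaxnm sequence in the dump and
contains no across-vector fmaxnmv-style reduction.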
