http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

mgretton at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mgretton at gcc dot gnu.org

--- Comment #7 from mgretton at gcc dot gnu.org ---
Testing the testcase in #4 with a recent trunk (gcc version 4.9.0 20130528
(experimental) (GCC)) gives the following results:

arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
sqrlen4D_16u8:
        vmov    d18, r0, r1  @ v16qi
        vmov    d19, r2, r3
        vld1.64 {d16-d17}, [sp:64]
        vabd.u8 q8, q9, q8
        vmull.u8        q9, d16, d16
        vmull.u8        q8, d17, d17
        vuzp.32 q9, q8
        vpaddl.u16      q9, q9
        vmov    q10, q9  @ v4si
        vpadal.u16      q10, q8
        vmov    r0, r1, d20  @ v4si
        vmov    r2, r3, d21
        bx      lr


arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=hard -O2 -mthumb:
sqrlen4D_16u8:
        vabd.u8 q1, q0, q1
        vmull.u8        q0, d2, d2
        vmull.u8        q8, d3, d3
        vuzp.32 q0, q8
        vpaddl.u16      q0, q0
        vpadal.u16      q0, q8
        bx      lr

So code generation seems to be OK for the hard-float ABI, but the softfp version
still has an extra vmov (vmov q10, q9) between the vpaddl and the vpadal.
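
For anyone not familiar with the two instructions, here is a rough scalar
sketch in C of what vpaddl.u16 and vpadal.u16 do on a q register. It is only a
model: the function names are made up for illustration and it is not the
testcase from #4. It shows why the copy looks unnecessary: vpadal accumulates
into its destination, so as long as q9 is not read again afterwards (it is not
in the softfp output above), accumulating straight into q9 gives the same
lanes as copying to q10 first.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "vpaddl.u16 qd, qm": add adjacent u16 lanes, widen to u32. */
static void vpaddl_u16_model(uint32_t d[4], const uint16_t m[8])
{
    for (int i = 0; i < 4; i++)
        d[i] = (uint32_t)m[2 * i] + m[2 * i + 1];
}

/* Model of "vpadal.u16 qd, qm": same pairwise sums, accumulated into d. */
static void vpadal_u16_model(uint32_t d[4], const uint16_t m[8])
{
    for (int i = 0; i < 4; i++)
        d[i] += (uint32_t)m[2 * i] + m[2 * i + 1];
}

/* Hypothetical helper: compare the hard-float shape (accumulate in place)
   with the softfp shape (copy, then accumulate into the copy). */
static void check_vmov_is_redundant(const uint16_t a[8], const uint16_t b[8])
{
    uint32_t in_place[4], tmp[4], copied[4];

    vpaddl_u16_model(in_place, a);      /* vpaddl.u16 q0, q0   */
    vpadal_u16_model(in_place, b);      /* vpadal.u16 q0, q8   */

    vpaddl_u16_model(tmp, a);           /* vpaddl.u16 q9, q9   */
    for (int i = 0; i < 4; i++)
        copied[i] = tmp[i];             /* vmov       q10, q9  */
    vpadal_u16_model(copied, b);        /* vpadal.u16 q10, q8  */

    for (int i = 0; i < 4; i++)
        assert(in_place[i] == copied[i]);
}
```

The model says nothing about why the register allocator emits the copy in the
softfp case; it only illustrates that the copy is not needed for the result.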
