https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #1 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> 
---
Interesting idea!  But I think the ideal thing here would be
to do the 8*step after the store:

.L2:
        add     v29.4s, v31.4s, v28.4s  // += 4*step
        stp     q31, q29, [x0]
        add     v31.4s, v31.4s, v27.4s  // += 8*step
        add     x0, x0, 32
        cmp     x1, x0
        bne     .L2

This has the advantage that the loop-carried dependency
is only one ADD instruction deep, rather than 2 ADDs deep.
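
To make the dependency argument concrete, here is a rough scalar model of
that loop in C. It is only a sketch under assumptions: the function name
fill_series, its parameters, and the use of 4-element arrays to stand in
for the 128-bit vector registers are all hypothetical and not taken from
the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the loop above.  Each 4-element array
   stands in for one vector register holding four 32-bit lanes.  */
void fill_series(int32_t *out, size_t n_vec_pairs, int32_t start, int32_t step)
{
    assert(out != NULL || n_vec_pairs == 0);

    int32_t iv[4];      /* models v31: the only loop-carried vector */
    int32_t second[4];  /* models v29: recomputed each iteration, not carried */

    for (int lane = 0; lane < 4; lane++)
        iv[lane] = start + lane * step;

    const int32_t step4 = 4 * step;  /* models every lane of v28 */
    const int32_t step8 = 8 * step;  /* models every lane of v27 */

    for (size_t i = 0; i < n_vec_pairs; i++) {
        for (int lane = 0; lane < 4; lane++)
            second[lane] = iv[lane] + step4;    /* add v29, v31, v28 */

        for (int lane = 0; lane < 4; lane++) {  /* stp q31, q29, [x0] */
            out[lane] = iv[lane];
            out[4 + lane] = second[lane];
        }

        /* The next iteration's iv depends on a single add of the current
           iv, so the loop-carried chain is one ADD deep.  Deriving it as
           second + step4 instead would make the chain two ADDs deep.  */
        for (int lane = 0; lane < 4; lane++)
            iv[lane] += step8;                  /* add v31, v31, v27 */

        out += 8;                               /* add x0, x0, 32 */
    }
}
```

Both forms store the same values; the difference is only in how many
dependent ADDs separate one iteration's v31 from the next.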

I haven't looked at how easy it would be to do, though…
