https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109072

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-03-09
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #0)
> The SLP costs went from:
> 
>   Vector cost: 2
>   Scalar cost: 4
> 
> to:
> 
>   Vector cost: 12
>   Scalar cost: 4
> 
> it looks like it's no longer costing it as a duplicate but instead 4 vec
> inserts.
We do cost it as a duplicate, but we only try to vectorise up to
the stores, rather than all the way to the load that reads the
data back.  So we're costing the difference between:

        fmov    s1, s0
        stp     s1, s1, [x0]
        stp     s1, s1, [x0, 8]

(no idea why we have an fmov, pretend we don't) and:

        fmov    s1, s0
        dup     v1.4s, v1.s[0]
        str     q1, [x0]

If we want the latter as a general principle, the PR is
easy to fix.  But if we don't, we'd need to make the
vectoriser start at the load or (alternatively) fold
to a constructor independently of vectorisation.
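
For readers without the testcase to hand, the shape under discussion can be
sketched in C as below.  This is an illustrative assumption, not the testcase
from the PR: the names vec4f and splat4 are hypothetical, and the memcpy
stands in for whatever vector load reads the temporary back.  The four scalar
stores are the region the SLP vectoriser currently considers and costs; the
read-back at the end is the load it does not start from.

```c
#include <string.h>

/* Hypothetical 16-byte aggregate standing in for a 4 x float vector. */
typedef struct { float lane[4]; } vec4f;

/* Hypothetical example: splat one scalar through a temporary array. */
vec4f splat4(float x)
{
    float tmp[4];

    /* Four stores of the same scalar: the group that gets costed
       either as scalar stores or as a duplicate plus one vector store. */
    tmp[0] = x;
    tmp[1] = x;
    tmp[2] = x;
    tmp[3] = x;

    /* The load back of all four lanes, outside the costed region. */
    vec4f out;
    memcpy(&out, tmp, sizeof out);
    return out;
}
```

Starting the vectoriser at the load, or folding the stores and load to a
constructor beforehand, would in principle let the temporary disappear
entirely rather than being costed store by store.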
