[Bug target/109072] [12/13 Regression] SLP costs for vec duplicate too high since g:4963079769c99c4073adfd799885410ad484cbbe

rsandifo at gcc dot gnu.org via Gcc-bugs Thu, 09 Mar 2023 06:13:44 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109072


rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-03-09
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #0)
> The SLP costs went from:
> 
>   Vector cost: 2
>   Scalar cost: 4
> 
> to:
> 
>   Vector cost: 12
>   Scalar cost: 4
> 
> it looks like it's no longer costing it as a duplicate but instead 4 vec
> inserts.
We do cost it as a duplicate, but we only try to vectorise up to
the stores, rather than up to the load that reads the values back.
So we're costing the difference between:

        fmov    s1, s0
        stp     s1, s1, [x0]
        stp     s1, s1, [x0, 8]

(no idea why we have an fmov, pretend we don't) and:

        fmov    s1, s0
        dup     v1.4s, v1.s[0]
        str     q1, [x0]

If we want the latter as a general principle, the PR is
easy to fix.  But if we don't, we'd need to make the
vectoriser start at the load or (alternatively) fold
to a constructor independently of vectorisation.
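
For illustration only, here is a minimal C sketch of the two source shapes
involved. It assumes the test case from comment #0 (not reproduced in this
message) looks roughly like four scalar stores of one value followed by a
vector load of the same memory. The function names, the temporary array and
the use of the GCC vector extension are hypothetical and are not taken from
the PR.

```c
#include <string.h>

/* GCC/Clang vector extension: four floats in one 16-byte vector.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Assumed shape of the problem: four scalar stores of the same value,
   followed by a vector load of the memory just written.  SLP that
   starts at the stores only sees the first half of this function.  */
v4sf
splat_via_memory (float x)
{
  float tmp[4];
  tmp[0] = x;
  tmp[1] = x;
  tmp[2] = x;
  tmp[3] = x;

  v4sf v;
  memcpy (&v, tmp, sizeof v);   /* the "load back" */
  return v;
}

/* The same value written directly as a vector constructor, which is
   the form that folding independently of vectorisation would aim for.  */
v4sf
splat_via_constructor (float x)
{
  return (v4sf) { x, x, x, x };
}
```

Both functions return the same vector. The difference is only in how much
of the store/load sequence the compiler has to look at before it can see
that the result is a single duplicate.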

