[Bug tree-optimization/18438] vectorizer failed for vector matrix multiplication

pinskia at gcc dot gnu.org Sat, 28 Jan 2017 00:38:00 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438


--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #12) 
> You are making an orthogonal point to this bug report: whether or not to
> vectorize such a loop.  But if loop is vectorized, then on any
> microarchitecture it is better to have "st2" vs "umov; st1; str".

Yes, but thinking about the problem some more, I do think there are some vector
cost model issues in the aarch64 backend, where we don't model int vs floating
point cost differences.  For example, ^ on a scalar int might be one cycle while
the vector form is 4 cycles, but floating point scalar addition is 4 cycles
and floating point vector addition is still just 4 cycles.  The cost structure
has only one cost for any scalar operation and one for any vector operation:

struct cpu_vector_cost
{
  const int scalar_stmt_cost;            /* Cost of any scalar operation,
                                            excluding load and store.  */
...

  const int vec_stmt_cost;               /* Cost of any vector operation,
                                            excluding load, store, permute,
                                            vector-to-scalar and
                                            scalar-to-vector operation.  */
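
To make the int vs floating point point concrete, here is a minimal sketch of
a cost table split by type.  It is not GCC code: struct split_vector_cost, its
fields, and vector_stmt_profitable are hypothetical names, and the
profitability test (one vector statement replaces VF scalar statements) is a
simplifying assumption.  The cycle counts are the ones from the example above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost table split by type; NOT the layout of GCC's
   struct cpu_vector_cost.  */
struct split_vector_cost
{
  int scalar_int_stmt_cost;   /* Cost of a scalar integer operation.  */
  int scalar_fp_stmt_cost;    /* Cost of a scalar floating point operation.  */
  int vec_int_stmt_cost;      /* Cost of a vector integer operation.  */
  int vec_fp_stmt_cost;       /* Cost of a vector floating point operation.  */
};

/* Cycle counts from the example in the comment: int 1 (scalar) vs 4 (vector),
   floating point 4 (scalar) vs 4 (vector).  */
static const struct split_vector_cost example_costs = { 1, 4, 4, 4 };

/* Assumption: one vector statement replaces VF scalar statements, so
   vectorizing pays off only if it is strictly cheaper than those VF
   scalar statements together.  */
static bool
vector_stmt_profitable (const struct split_vector_cost *c, bool is_fp, int vf)
{
  int scalar = is_fp ? c->scalar_fp_stmt_cost : c->scalar_int_stmt_cost;
  int vec = is_fp ? c->vec_fp_stmt_cost : c->vec_int_stmt_cost;
  return vec < vf * scalar;
}

/* Usage example: with VF = 2 the same loop shape gets opposite answers
   depending on the element type.  */
static void
check_example_costs (void)
{
  assert (!vector_stmt_profitable (&example_costs, false, 2)); /* 4 < 2 */
  assert (vector_stmt_profitable (&example_costs, true, 2));   /* 4 < 8 */
}
```

With a single scalar_stmt_cost and a single vec_stmt_cost, both cases get the
same answer, which is the modelling gap described above.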


Anyway, I filed PR 79262 for the regression.
