While tuning the cost model for ThunderX 1, I noticed that the cost I
had set for unaligned vector loads and stores was too high.  This patch
lowers both costs from 10 to 5; with that change the loops in linpack
are vectorized and perform best.
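
To illustrate why a lower unaligned load/store cost can change the
vectorization decision, here is a toy sketch.  It is not the code GCC
uses: the struct, the function names and the profitability test are
hypothetical, and the scalar and vec_stmt costs are assumed values that
do not appear in this patch.  Only the unaligned costs (10 before, 5
after) come from the diff.

```c
#include <stdbool.h>

/* Toy cost table.  Field names mirror the comments in the patch, but
   this struct is a hypothetical simplification, not GCC's own type.  */
struct toy_vector_cost
{
  int scalar_stmt_cost;
  int scalar_load_cost;
  int scalar_store_cost;
  int vec_stmt_cost;
  int vec_unalign_load_cost;
  int vec_unalign_store_cost;
};

/* Operation counts for one iteration of a loop body.  */
struct toy_loop_body
{
  int loads;
  int stores;
  int stmts;
};

/* Cost of running VF scalar iterations of the body.  */
int
toy_scalar_cost (const struct toy_vector_cost *c,
                 const struct toy_loop_body *b, int vf)
{
  return vf * (b->loads * c->scalar_load_cost
               + b->stores * c->scalar_store_cost
               + b->stmts * c->scalar_stmt_cost);
}

/* Cost of one vector iteration, assuming every memory access is
   unaligned.  */
int
toy_vector_body_cost (const struct toy_vector_cost *c,
                      const struct toy_loop_body *b)
{
  return (b->loads * c->vec_unalign_load_cost
          + b->stores * c->vec_unalign_store_cost
          + b->stmts * c->vec_stmt_cost);
}

/* Simplified test: vectorize only if one vector iteration is cheaper
   than the VF scalar iterations it replaces.  */
bool
toy_vectorization_profitable (const struct toy_vector_cost *c,
                              const struct toy_loop_body *b, int vf)
{
  return toy_vector_body_cost (c, b) < toy_scalar_cost (c, b, vf);
}
```

For an axpy-style body (two loads, one store, one arithmetic
statement), four lanes, and assumed costs of scalar_stmt 1,
scalar_load 3, scalar_store 1 and vec_stmt 3, the scalar side costs 32.
The vector side costs 33 with the old unaligned costs and 18 with the
new ones, so this toy comparison flips.  The real vectorizer also
accounts for things this sketch ignores, such as prologue and epilogue
costs.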

I also tested SPEC CPU 2006 to make sure vectorization does not regress there.

Committed as obvious after a bootstrap/test on aarch64-linux-gnu with
no regressions.

Thanks,
Andrew Pinski

	* config/aarch64/aarch64.c (thunderx_vector_cost): Decrease cost of
	vec_unalign_load_cost and vec_unalign_store_cost.

Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c    (revision 250592)
+++ config/aarch64/aarch64.c    (working copy)
@@ -363,8 +363,8 @@ static const struct cpu_vector_cost thun
   2, /* vec_to_scalar_cost  */
   2, /* scalar_to_vec_cost  */
   3, /* vec_align_load_cost  */
-  10, /* vec_unalign_load_cost  */
-  10, /* vec_unalign_store_cost  */
+  5, /* vec_unalign_load_cost  */
+  5, /* vec_unalign_store_cost  */
   1, /* vec_store_cost  */
   3, /* cond_taken_branch_cost  */
   3 /* cond_not_taken_branch_cost  */
