On Fri, 5 May 2017, Christophe Lyon wrote:
> Hi Richard,
>
>
> On 3 May 2017 at 10:19, Richard Biener <[email protected]> wrote:
> >
> > The following extends the very simplistic cost modeling I added somewhen
> > late in the release process to, for all unknown misaligned refs, also
> > apply this model for loops containing stores.
> >
> > The model basically says it's useless to peel for alignment if there's
> > only a single DR that is affected or if, in case we'll end up using
> > hw-supported misaligned loads, the cost of misaligned loads is the same
> > as of aligned ones. Previously we'd usually align one of the stores
> > with the theory that this improves (precious) store-bandwith.
> >
> > Note this is only a so slightly conservative (aka less peeling). We'll
> > still apply peeling for alignment if you make the testcase use +=
> > because then we'll align both the load and the store from v1.
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 2017-05-03 Richard Biener <[email protected]>
> >
> > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
> > When all DRs have unknown misaligned do not always peel
> > when there is a store but apply the same costing model as if
> > there were only loads.
> >
> > * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase.
> >
>
> This patch (r247544) caused regressions on aarch64 and arm:
> - PASS now FAIL [PASS => FAIL]:
>
> Executed from: gcc.dg/vect/vect.exp
> gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Vectorizing an unaligned access" 2
> gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of
> access forced using peeling" 1
> gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an
> unaligned access" 2
> gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Vectorizing an unaligned access" 2
> gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of
> access forced using peeling" 1
> gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an
> unaligned access" 2
Ok, so the reason is that we no longer peel for alignment for
for (i = 0; i < N; i++)
{
pa[i] = pb[i] * pc[i];
}
which is probably good. This is because the generic aarch64 cost model
(and probaby also arm) has
1, /* vec_align_load_cost */
1, /* vec_unalign_load_cost */
1, /* vec_unalign_store_cost */
1, /* vec_store_cost */
so there's no benefit in aligning. x86 generic tuning has
1, /* vec_align_load_cost. */
2, /* vec_unalign_load_cost. */
1, /* vec_store_cost. */
and vec_unalign_store_cost sharing with vec_unalign_load_cost.
That makes us still apply peeling.
Fixing this with vect_ testsuite conditions is going to be tricky
so the easiest is to simply disable peeling here.
Tested on aarch64 and x86_64, committed.
Richard.
2017-05-09 Richard Biener <[email protected]>
* gcc.dg/vect/vect-44.c: Add --param vect-max-peeling-for-alignment=0
and adjust.
* gcc.dg/vect/vect-50.c: Likewise.
Index: gcc/testsuite/gcc.dg/vect/vect-44.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-44.c (revision 247782)
+++ gcc/testsuite/gcc.dg/vect/vect-44.c (working copy)
@@ -1,6 +1,7 @@
+/* { dg-do compile } */
/* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
-#include <stdarg.h>
#include "tree-vect.h"
#define N 256
@@ -65,7 +66,7 @@ int main (void)
two loads to be aligned). */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2
"vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using
peeling" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {!
vector_alignment_reachable} } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3
"vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using
peeling" 0 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {!
vector_alignment_reachable} } } } } */
/* { dg-final { scan-tree-dump-times "Alignment of access forced using
versioning." 3 "vect" { target { vect_no_align && { ! vect_hw_misalign } } } }
} */
/* { dg-final { scan-tree-dump-times "Alignment of access forced using
versioning." 1 "vect" { target { {! vector_alignment_reachable} && {{!
vect_no_align} && {! vect_hw_misalign} } } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-50.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-50.c (revision 247782)
+++ gcc/testsuite/gcc.dg/vect/vect-50.c (working copy)
@@ -1,6 +1,7 @@
+/* { dg-do compile } */
/* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
-#include <stdarg.h>
#include "tree-vect.h"
#define N 256
@@ -61,8 +62,8 @@ int main (void)
align the store will not force the two loads to be aligned). */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2
"vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2
"vect" { target vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using
peeling" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {!
vector_alignment_reachable} } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3
"vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3
"vect" { target vect_hw_misalign } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using
peeling" 0 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {!
vector_alignment_reachable} } } } } */
/* { dg-final { scan-tree-dump-times "Alignment of access forced using
versioning." 3 "vect" { target { vect_no_align && { ! vect_hw_misalign } } } }
} */
/* { dg-final { scan-tree-dump-times "Alignment of access forced using
versioning." 1 "vect" { target { {! vector_alignment_reachable} && { {!
vect_no_align } && {! vect_hw_misalign } } } } } } */