r241959 included code to stop us increasing the alignment of a "user-aligned" variable. This wasn't the main purpose of the patch, and I think it was just there to make the testcase work.

The documentation for the aligned attribute says:

  This attribute specifies a minimum alignment for the variable or
  structure field, measured in bytes.

The DECL_USER_ALIGN code seemed to be treating it as a sort of maximum instead, but there's not really such a thing as a maximum here: the variable might still end up at the start of a section that has a higher alignment, or might end up by chance on a "very aligned" boundary at link or load time.

I think people who add alignment attributes want to ensure that accesses to that variable are fast, so it seems counter-intuitive for the attribute to make the access slower. The vect-align-4.c test is an example of this: for targets with 128-bit vectors, we get better code without the aligned attribute than we do with it.

Tested on aarch64-linux-gnu so far, will test more widely if OK.

Thanks,
Richard


2018-01-03  Richard Sandiford  <richard.sandif...@linaro.org>

gcc/
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
	punt for user-aligned variables.

gcc/testsuite/
	* gcc.dg/vect/vect-align-4.c: New test.
	* gcc.dg/vect/vect-nb-iter-ub-2.c (cc): Remove alignment attribute
	and redefine as a structure with an unaligned member "b".
	(foo): Update accordingly.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-01-03 15:03:14.301330558 +0000
+++ gcc/tree-vect-data-refs.c	2018-01-03 15:03:14.454324422 +0000
@@ -920,19 +920,6 @@ vect_compute_data_ref_alignment (struct
       return true;
     }
 
-  if (DECL_USER_ALIGN (base))
-    {
-      if (dump_enabled_p ())
-        {
-          dump_printf_loc (MSG_NOTE, vect_location,
-                           "not forcing alignment of user-aligned "
-                           "variable: ");
-          dump_generic_expr (MSG_NOTE, TDF_SLIM, base);
-          dump_printf (MSG_NOTE, "\n");
-        }
-      return true;
-    }
-
   /* Force the alignment of the decl.
      NOTE: This is the only change to the code we make during
      the analysis phase, before deciding to vectorize the loop.  */
Index: gcc/testsuite/gcc.dg/vect/vect-align-4.c
===================================================================
--- /dev/null	2018-01-03 08:32:43.873058927 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-align-4.c	2018-01-03 15:03:14.453324462 +0000
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-add-options bind_pic_locally } */
+
+__attribute__((aligned (8))) int a[2048] = {};
+
+void
+f1 (void)
+{
+  for (int i = 0; i < 2048; i++)
+    a[i]++;
+}
+
+/* { dg-final { scan-tree-dump-not "Vectorizing an unaligned access" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c	2018-01-03 15:03:14.301330558 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c	2018-01-03 15:03:14.454324422 +0000
@@ -3,18 +3,19 @@
 #include "tree-vect.h"
 
 int ii[32];
-char cc[66] __attribute__((aligned(1))) =
+struct { char a; char b[66]; } cc = { 0,
   { 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0,
     10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0,
     20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0,
-    30, 0, 31, 0 };
+    30, 0, 31, 0 }
+};
 
 void __attribute__((noinline,noclone))
 foo (int s)
 {
   int i;
   for (i = 0; i < s; i++)
-    ii[i] = (int) cc[i*2];
+    ii[i] = (int) cc.b[i*2];
 }
 
 int main (int argc, const char **argv)
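
For illustration only, here is a minimal standalone sketch of the two points the patch relies on: that aligned (8) is a lower bound on the alignment of "a", and that the member "b" of the new "cc" sits at an odd offset from the start of the structure. The helper names (actual_alignment, a_meets_minimum, cc_b_offset) are hypothetical and not part of GCC or its testsuite, and the sketch assumes a GNU-compatible compiler on a target where converting a pointer to uintptr_t yields the plain address.

```c
#include <stddef.h>
#include <stdint.h>

/* Same attribute as in vect-align-4.c: a request for at least
   8-byte alignment, not exactly 8.  */
__attribute__((aligned (8))) int a[2048] = { 0 };

/* Same layout as the new "cc" in vect-nb-iter-ub-2.c (initializer
   omitted here).  */
struct { char a; char b[66]; } cc;

/* Hypothetical helper: return the largest power of two that divides
   the address P, i.e. the alignment the object actually ended up with.
   Assumes the pointer-to-integer conversion gives the plain address.  */
static uintptr_t
actual_alignment (const void *p)
{
  uintptr_t addr = (uintptr_t) p;
  return addr & -addr;
}

/* Hypothetical helper: nonzero if "a" honours the requested minimum.  */
static int
a_meets_minimum (void)
{
  return actual_alignment (a) >= 8;
}

/* Hypothetical helper: byte offset of cc.b from the start of cc.  */
static size_t
cc_b_offset (void)
{
  return (size_t) ((char *) cc.b - (char *) &cc);
}
```

Printing actual_alignment (a) from a small main will often show something larger than 8, depending on where the linker and loader place the array, which is why treating the attribute as a ceiling does not buy anything. cc_b_offset () returns 1, so "b" should stay misaligned relative to "cc" whenever "cc" itself is given an alignment of 2 bytes or more; that is what lets vect-nb-iter-ub-2.c keep exercising an unaligned access without relying on aligned(1).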