On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote: > Hello, > > PR tree-optimization/53636 is a crash due to an invalid unaligned access > generated by the vectorizer. > > The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO > as computed by the default data-ref analysis to decide whether an access > is sufficiently aligned for the vectorizer. > > However, this analysis computes the scalar evolution relative to the > innermost loop in which the access takes place; DR_ALIGNED_TO only > reflects the known alignmnent of the *base* address according to > that evolution. > > Now, if we're actually about to vectorize this particular loop, then > just checking the alignment of the base is exactly what we need to do > (subsequent accesses will usually be misaligned, but that's OK since > we're transforming those into a vector access). > > However, if we're actually currently vectorizing something else, this > test is not sufficient. The code currently already checks for the > case where we're performing outer-loop vectorization. In this case, > we need to check alignment of the access on *every* pass through the > inner loop, as the comment states: > > /* In case the dataref is in an inner-loop of the loop that is being > vectorized (LOOP), we use the base and misalignment information > relative to the outer-loop (LOOP). This is ok only if the misalignment > stays the same throughout the execution of the inner-loop, which is why > we have to check that the stride of the dataref in the inner-loop evenly > divides by the vector size. */ > > However, there is a second case where we need to check every pass: if > we're not actually vectorizing any loop, but are performing basic-block > SLP. In this case, it would appear that we need the same check as > described in the comment above, i.e. to verify that the stride is a > multiple of the vector size. > > The patch below adds this check, and this indeed fixes the invalid access > I was seeing in the test case (in the final assembler, we now get a vld1.16 > instead of vldr). > > Tested on arm-linux-gnueabi with no regressions. > > OK for mainline?
Ok. Thanks, Richard. > Bye, > Ulrich > > > ChangeLog: > > gcc/ > PR tree-optimization/53636 > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify > stride when doing basic-block vectorization. > > gcc/testsuite/ > PR tree-optimization/53636 > * gcc.target/arm/pr53636.c: New test. > > > === added file 'gcc/testsuite/gcc.target/arm/pr53636.c' > --- gcc/testsuite/gcc.target/arm/pr53636.c 1970-01-01 00:00:00 +0000 > +++ gcc/testsuite/gcc.target/arm/pr53636.c 2012-06-11 17:31:41 +0000 > @@ -0,0 +1,48 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target arm_neon_hw } */ > +/* { dg-options "-O -ftree-vectorize" } */ > +/* { dg-add-options arm_neon } */ > + > +void fill (short *buf) __attribute__ ((noinline)); > +void fill (short *buf) > +{ > + int i; > + > + for (i = 0; i < 11 * 8; i++) > + buf[i] = i; > +} > + > +void test (unsigned char *dst) __attribute__ ((noinline)); > +void test (unsigned char *dst) > +{ > + short tmp[11 * 8], *tptr; > + int i; > + > + fill (tmp); > + > + tptr = tmp; > + for (i = 0; i < 8; i++) > + { > + dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) > >> 7; > + dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) > >> 7; > + dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) > >> 7; > + dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) > >> 7; > + dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) > >> 7; > + dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) > >> 7; > + dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) > >> 7; > + dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) > >> 7; > + > + dst += 8; > + tptr += 11; > + } > +} > + > +int main (void) > +{ > + char buf [8 * 8]; > + > + test (buf); > + > + return 0; > +} > + > > === modified file 'gcc/tree-vect-data-refs.c' > --- gcc/tree-vect-data-refs.c 2012-05-31 08:46:10 +0000 > +++ gcc/tree-vect-data-refs.c 2012-06-11 17:31:41 +0000 > @@ -845,6 +845,24 @@ > } > } > > + /* Similarly, if we're doing basic-block vectorization, we can only use > + base and misalignment information relative to an innermost loop if the > + misalignment stays the same throughout the execution of the loop. > + As above, this is the case if the stride of the dataref evenly divides > + by the vector size. */ > + if (!loop) > + { > + tree step = DR_STEP (dr); > + HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step); > + > + if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0) > + { > + if (vect_print_dump_info (REPORT_ALIGNMENT)) > + fprintf (vect_dump, "SLP: step doesn't divide the vector-size."); > + misalign = NULL_TREE; > + } > + } > + > base = build_fold_indirect_ref (base_addr); > alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT); > > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > ulrich.weig...@de.ibm.com >