On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote: > Hmm. I'm still thinking that we should handle this during the regular > transform step.
I wonder if it can't be done instead just in vectorizable_load, if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is invariant, just emit the (broadcasted) load not inside of the loop, but on the loop preheader edge. > *** gcc/tree-vect-data-refs.c (revision 205435) > --- gcc/tree-vect-data-refs.c (working copy) > *************** again: > *** 3668,3673 **** > --- 3668,3682 ---- > } > STMT_VINFO_STRIDE_LOAD_P (stmt_info) = true; > } > + else if (loop_vinfo > + && integer_zerop (DR_STEP (dr))) > + { > + /* All loads from a non-varying address will be disambiguated > + by data-ref analysis or via a runtime alias check and thus > + they will become invariant. Force them to be vectorized > + as external. */ > + STMT_VINFO_DEF_TYPE (stmt_info) = vect_external_def; > + } I think this is unsafe for simd loops. I'd say: int a[1024], b[1024]; int foo (void) { int i; #pragma omp simd safelen(8) for (i = 0; i < 1024; i++) { a[i] = i; b[i] = a[0]; } } is valid testcase, the loop behaves the same if you execute it sequentially, or vectorize using SIMD (hardware or emulated) instructions with vectorization factor of 2, 4 or 8, as long as you do all memory operations (either using scalar insns or simd instructions) in the order they were written, which I believe the vectorizer right now handles correctly, but the hoisting this patch wants to perform is not fine, unless data ref analysis would prove that it can't alias. For non-simd loops we of course perform that data ref analysis and either version for alias, or prove that the drs can't alias, but for simd loops we take as given that the loop is safe to be vectorized. It is, but not for hoisting. So, the above say with emulated SIMD is safe to be executed as: for (i = 0; i < 1024; i += 8) { int tmp; for (tmp = i; tmp < i + 8; tmp++) a[tmp] = tmp; for (tmp = i; tmp < i + 8; tmp++) b[tmp] = a[0]; } but not as: int tempa[8], tmp; /* Hoisted HW or emulated load + splat. */ for (tmp = 0; tmp < 8; tmp++) tempa[tmp] = a[0]; for (i = 0; i < 1024; i += 8) { for (tmp = i; tmp < i + 8; tmp++) a[tmp] = tmp; for (tmp = i; tmp < i + 8; tmp++) b[tmp] = tempa[tmp]; } Jakub