On Wed, 27 Nov 2013, Jakub Jelinek wrote:
> On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote:
> > Hmm. I'm still thinking that we should handle this during the regular
> > transform step.
>
> I wonder if it can't be done instead just in vectorizable_load,
> if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is
> invariant, just emit the (broadcasted) load not inside of the loop, but on
> the loop preheader edge.
It is safe even for !LOOP_REQUIRES_VERSIONING_FOR_ALIAS. It's just
a missed optimization I even noted when originally implementing
support for invariant loads ...
> > *** gcc/tree-vect-data-refs.c (revision 205435)
> > --- gcc/tree-vect-data-refs.c (working copy)
> > *************** again:
> > *** 3668,3673 ****
> > --- 3668,3682 ----
> > }
> > STMT_VINFO_STRIDE_LOAD_P (stmt_info) = true;
> > }
> > + else if (loop_vinfo
> > + && integer_zerop (DR_STEP (dr)))
> > + {
> > + /* All loads from a non-varying address will be disambiguated
> > + by data-ref analysis or via a runtime alias check and thus
> > + they will become invariant. Force them to be vectorized
> > + as external. */
> > + STMT_VINFO_DEF_TYPE (stmt_info) = vect_external_def;
> > + }
>
> I think this is unsafe for simd loops.
> I'd say:
> int a[1024], b[1024];
>
> int foo (void)
> {
> int i;
> #pragma omp simd safelen(8)
> for (i = 0; i < 1024; i++)
> {
> a[i] = i;
> b[i] = a[0];
> }
> }
>
> is valid testcase, the loop behaves the same if you execute it
> sequentially, or vectorize using SIMD (hardware or emulated) instructions
> with vectorization factor of 2, 4 or 8, as long as you do all memory
> operations (either using scalar insns or simd instructions) in the order
> they were written, which I believe the vectorizer right now handles
> correctly, but the hoisting this patch wants to perform is not fine,
> unless data ref analysis would prove that it can't alias. For non-simd
> loops we of course perform that data ref analysis and either version for
> alias, or prove that the drs can't alias, but for simd loops we take as
> given that the loop is safe to be vectorized. It is, but not for hoisting.
Ick. I hate this behind-the-back stuff - so safelen doesn't mean
that a[i] and a[0] do not alias. Note that this will break with
SLP stuff at least as that will re-order reads/writes. Not sure
how safelen applies to SLP though. That is
a[i] = i;
b[i] = a[0];
a[i+1] = i+1;
b[i+1] = a[1];
will eventually end up re-ordering reads/writes in non-obvious
ways.
> So, the above say with emulated SIMD is safe to be executed as:
> for (i = 0; i < 1024; i += 8)
> {
> int tmp;
> for (tmp = i; tmp < i + 8; tmp++)
> a[tmp] = tmp;
> for (tmp = i; tmp < i + 8; tmp++)
> b[tmp] = a[0];
> }
> but not as:
> int tempa[8], tmp;
> /* Hoisted HW or emulated load + splat. */
> for (tmp = 0; tmp < 8; tmp++)
> tempa[tmp] = a[0];
> for (i = 0; i < 1024; i += 8)
> {
> for (tmp = i; tmp < i + 8; tmp++)
> a[tmp] = tmp;
> for (tmp = i; tmp < i + 8; tmp++)
> b[tmp] = tempa[tmp];
> }
>
> Jakub
>
>
--
Richard Biener <[email protected]>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer