On Wed, Nov 27, 2013 at 12:54:14PM +0100, Richard Biener wrote: > On Wed, 27 Nov 2013, Jakub Jelinek wrote: > > > On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote: > > > Hmm. I'm still thinking that we should handle this during the regular > > > transform step. > > > > I wonder if it can't be done instead just in vectorizable_load, > > if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is > > invariant, just emit the (broadcasted) load not inside of the loop, but on > > the loop preheader edge. > > It is safe even for !LOOP_REQUIRES_VERSIONING_FOR_ALIAS. It's just > a missed optimization I even noted when originally implementing > support for invariant loads ...
True, but only for non-simd loops, or if we proved it by looking at all relevant LOOP_VINFO_DDRSs. But, if it is not a simd loop, and not !LOOP_REQUIRES_VERSIONING_FOR_ALIAS, wouldn't previous optimizations hoist the load before the loop already? > Ick. I hate this behind-the-back stuff - so safelen doesn't mean > that a[i] and a[0] do not alias. My initial understanding of the SIMD loops was also that it allows the the up to safelen consecutive iterations to be randomly reordered or intermixed without affecting valid programs, but further mails from Tobias and others on this topic plus testcases changed my understanding of it. Note that we don't purge LOOP_VINFO_DDRSs in any way for loop->safelen, just don't add versioning for aliasor punt if there is some possible (or proven) aliasing. Perhaps we could add a bool flag to loop_vinfo which would tell us whether the loop has no data dependencies at all (i.e. either for non-safelen is !LOOP_REQUIRES_VERSIONING_FOR_ALIAS, or with safelen non-zero would be !LOOP_REQUIRES_VERSIONING_FOR_ALIAS). Then we could hoist if that flag is set or LOOP_REQUIRES_VERSIONING_FOR_ALIAS (because then the runtime test checks the dependency). > Note that this will break with > SLP stuff at least as that will re-order reads/writes. Not sure > how safelen applies to SLP though. That is > > a[i] = i; > b[i] = a[0]; > a[i+1] = i+1; > b[i+1] = a[1]; > > will eventually end up re-ordering reads/writes in non-obvious > ways. You mean SLP inside of loop vectorization, right? Because for normal SLP outside of loop vectorizer simdlen is ignored and normal data ref is performed without any bypassing. Jakub