On Fri, 23 Aug 2019, Andre Vieira (lists) wrote: > Hi, > > This patch is an improvement on my last RFC. As you pointed out, we can do > the vectorization analysis of the epilogues before doing the transformation, > using the same approach as used by openmp simd. I have not yet incorporated > the cost tweaks for vectorizing the epilogue, I would like to do this in a > subsequent patch, to make it easier to test the differences. > > I currently disable the vectorization of epilogues when versioning for > iterations. This is simply because I do not completely understand how the > assumptions are created and couldn't determine whether using skip_vectors with > this would work. If you don't think it is a problem or have a testcase to > show it work I would gladly look at it.
I don't think there's any problem. Basically the versioning condition is if (can_we_compute_niter), most of the time it is an extra condition from niter analysis under which niter is for example zero. This should also be the same with all vector sizes. - delete loop_vinfo; + { + /* Set versioning threshold of the original LOOP_VINFO based + on the last vectorization of the epilog. */ + LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo) + = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo); + delete loop_vinfo; + } I'm not sure this works reliably since the order we process vector sizes is under target control and not necessarily decreasing. I think you want to keep track of the minimum here? Preferably separately I guess. >From what I see vect_analyze_loop_2 doesn't need vect_epilogues_nomask and thus it doesn't change throughout the iteration. else - delete loop_vinfo; + { + /* Disable epilog vectorization if we can't determine the epilogs can + be vectorized. */ + *vect_epilogues_nomask &= vectorized_loops > 1; + delete loop_vinfo; + } and this is a bit premature and instead it should be done just before returning success? Maybe also storing the epilogue vector sizes we can handle in the loop_vinfo, thereby representing !vect_epilogues_nomask if there are no such sizes which also means that @@ -1013,8 +1015,13 @@ try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab, /* Epilogue of vectorized loop must be vectorized too. */ if (new_loop) - ret |= try_vectorize_loop_1 (simduid_to_vf_htab, num_vectorized_loops, - new_loop, loop_vinfo, NULL, NULL); + { + /* Don't include vectorized epilogues in the "vectorized loops" count. + */ + unsigned dont_count = *num_vectorized_loops; + ret |= try_vectorize_loop_1 (simduid_to_vf_htab, &dont_count, + new_loop, loop_vinfo, NULL, NULL); + } can be optimized to not re-check all smaller sizes (but even assert re-analysis succeeds to the original result for the actual transform). Otherwise this looks reasonable to me. Thanks, Richard. > > Bootstrapped this and the next patch on x86_64 and aarch64-unknown-linux-gnu, > with no regressions (after test changes in next patch). > > gcc/ChangeLog: > 2019-08-23 Andre Vieira <andre.simoesdiasvie...@arm.com> > > PR 88915 > * gentype.c (main): Add poly_uint64 type to generator. > * tree-vect-loop.c (vect_analyze_loop_2): Make it determine > whether we vectorize epilogue loops. > (vect_analyze_loop): Idem. > (vect_transform_loop): Pass decision to vectorize epilogues > to vect_do_peeling. > * tree-vect-loop-manip.c (vect_do_peeling): Enable skip-vectors > when doing loop versioning if we decided to vectorize epilogues. > (vect-loop_versioning): Moved decision to check_profitability > based on cost model. > * tree-vectorizer.h (vect_loop_versioning, vect_do_peeling, > vect_analyze_loop, vect_transform_loop): Update declarations. > * tree-vectorizer.c: Include params.h > (try_vectorize_loop_1): Initialize vect_epilogues_nomask > to PARAM_VECT_EPILOGUES_NOMASK and pass it to vect_analyze_loop > and vect_transform_loop. Also make sure vectorizing epilogues > does not count towards number of vectorized loops. > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 247165 (AG München)