On Thu, Jun 16, 2016 at 8:00 AM, Jeff Law <l...@redhat.com> wrote:
> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch introduces changes required to run vectorizer on loop epilogue.
>> This also enables epilogue vectorization using a vector of smaller size.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  <ilya.enkov...@intel.com>
>>
>>         * tree-if-conv.c (tree_if_conversion): Make public.
>>         * tree-if-conv.h: New file.
>>         * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>>         try to enhance alignment for epilogues.
>>         * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>>         created loop.
>>         * tree-vect-loop.c: include tree-if-conv.h.
>>         (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>>         loop->aux.
>>         (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>>         loop->aux.
>>         (vect_analyze_loop): Reset loop->aux.
>>         (vect_transform_loop): Check if created epilogue should be returned
>>         for further vectorization.  If-convert epilogue if required.
>>         * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>>         process and insert vectorized loop epilogues into this queue.
>>         * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created
>>         loop.
>>         (vect_transform_loop): Return created loop.
>
> As Richi noted, the additional calls into the if-converter are unfortunate.
> I'm not sure how else to avoid them though.  It looks like we can run
> if-conversion on just the epilogue, so maybe that's not too bad.

We could use the if-converted loop as source when doing the loop copy for the
epilogue... (and do it similar to if-conversion when it inserts a
__builtin_vectorized_loop () check, that is, create two versions for the
epilogue).

>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
>>    destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>    loop_vinfo->scalar_cost_vec.release ();
>>
>> +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>    free (loop_vinfo);
>> -  loop->aux = NULL;
>>  }
>
> Hmm, there seems to be a level of indirection I'm missing here.  We're
> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
> the original loop to the vectorized epilogue.  What am I missing?  Rather
> than smuggling around in the aux field, is there some inherent reason why we
> can't just copy the info from the original loop directly into
> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>
>> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
>> +     Don't try to vectorize epilogue because it will require
>> +     additional alias checks.  */
>
> Are the alias checks here redundant with the ones done for the original
> loop?  If so won't DOM eliminate them?

They are too complex for this.  But the epilogue could be annotated with
ivdep pragma / safelen in some way?

> And something just occurred to me -- is there some inherent reason why SLP
> doesn't vectorize the epilogue, particularly for the cases where we can
> vectorize the epilogue using smaller vectors?  Sorry if you've already
> answered this somewhere or it's a dumb question.

It usually can but only if we unroll the epilogue later (and thus when the
number of iterations is known at compile-time).

>> +      /* Add new loop to a processing queue.  To make it easier
>> +         to match loop and its epilogue vectorization in dumps
>> +         put new loop as the next loop to process.  */
>> +      if (new_loop)
>> +        {
>> +          loops.safe_insert (i + 1, new_loop->num);
>> +          vect_loops_num = number_of_loops (cfun);
>> +        }
>> +
>
> So just to be clear, the only reason to do this is for dumps -- other than
> processing the loop before its epilogue, there's no other inherently
> necessary ordering of the loops, right?
>
>
> Jeff
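
For anyone following the queue hunk quoted above, the processing order it
produces can be sketched outside of GCC.  This is only an illustration and not
the patch's code: it uses std::vector in place of GCC's vec<>, and
transform_loop, process_loops and the choice to number epilogues from 100 are
hypothetical names and assumptions invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vect_transform_loop: returns the number of a
// newly created epilogue loop, or -1 if no epilogue was created.
// Assumption for illustration only: loops numbered below 100 get an
// epilogue, while epilogues themselves (numbered from 100) do not.
static int transform_loop(int loop_num, int &next_free_num) {
  if (loop_num < 100)
    return next_free_num++;
  return -1;
}

// Walks the work list and returns the order in which loops were processed.
// A new epilogue is inserted at position i + 1, so it is handled right
// after the loop it belongs to.
std::vector<int> process_loops(std::vector<int> loops) {
  std::vector<int> order;
  int next_free_num = 100;
  for (std::size_t i = 0; i < loops.size(); ++i) {
    int num = loops[i];
    order.push_back(num);
    int epilogue = transform_loop(num, next_free_num);
    if (epilogue >= 0) {
      assert(i + 1 <= loops.size());  // insertion point is within the list
      loops.insert(loops.begin() + i + 1, epilogue);
    }
  }
  return order;
}
```

With the input list {1, 2, 3} the sketch visits the loops as
1, 100, 2, 101, 3, 102, so each loop is followed immediately by its own
epilogue.  Appending the epilogue at the end of the list instead would process
the same loops but separate each loop from its epilogue in the visiting order.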