On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Thanks Richard for your comments. > Your proposed to handle epilogue loop just like normal short-trip loop > but this is not exactly truth since e.g. epilogue must not be peeled > for alignment. Yes there must be some differences, my motivation is to minimize that so we don't need to specially check normal/epilogue loops at too many places. Of course it's just my feeling when going through the patch set, and could be wrong.
Thanks, bin > > It is not clear for me what are my next steps? Should I re-design the > patch completely or simply decompose the whole patch to different > parts? But it means that we must start review process from beginning > but release is closed to its end. > Note also that i work for Intel till the end of year and have not idea > who will continue working on this project. > > Any help will be appreciated. > > Thanks. > Yuri. > > 2016-11-09 13:37 GMT+03:00 Bin.Cheng <amker.ch...@gmail.com>: >> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Hi All, >>> >>> I re-send all patches sent by Ilya earlier for review which support >>> vectorization of loop epilogues and loops with low trip count. We >>> assume that the only patch - vec-tails-07-combine-tail.patch - was not >>> approved by Jeff. >>> >>> I did re-base of all patches and performed bootstrapping and >>> regression testing that did not show any new failures. Also all >>> changes related to new vect_do_peeling algorithm have been changed >>> accordingly. >>> >>> Is it OK for trunk? >> >> Hi, >> I can't approve patches, but had some comments after going through the >> implementation. >> >> One confusing part is cost model change, as well as the way it's used >> to decide how epilogue loop should be vectorized. Given vect-tail is >> disabled at the moment and the cost change needs further tuning, is it >> reasonable to split this part out and get vectorization part >> reviewed/submitted independently? For example, let user specified >> parameters make the decision for now. Cost and target dependent >> changes should go in at last, this could make the patch easier to >> read. >> >> The implementation computes/shares quite amount information from main >> loop to epilogue loop vectorization. Furthermore, variables/fields >> for such information are somehow named in a misleading way. For >> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the >> flag controlling whether epilogue loop should be vectorized with >> masking. However, it's actually controlled by exactly the same flag >> as whether epilogue loop should be combined into the main loop with >> masking: >> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) >> >> slpeel_make_loop_iterate_ntimes (loop, niters_vector); >> >> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) >> + vect_combine_loop_epilogue (loop_vinfo); >> + >> /* Reduce loop iterations by the vectorization factor. */ >> scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >> expected_iterations / vf); >> >> IMHO, we should decouple main loop vectorization and epilogue >> vectorization as much as possible by sharing as few information as we >> can. The general idea is to handle epilogue loop just like normal >> short-trip loop. For example, we can rename >> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something >> else), and we don't differentiate its meaning between main and >> epilogue(short-trip) loop. It only indicates the current loop should >> be vectorized with masking no matter it's a main loop or epilogue >> loop, and it works just like the current implementation. >> >> After this change, we can refine vectorization and make it more >> general for normal loop and epilogue(short trip) loop. For example, >> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue >> loop and use it to control how it should be vectorized: >> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) >> + { >> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; >> + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; >> + } >> + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) >> + && min_profitable_combine_iters >= 0) >> + { >> >> This works, but not that good for understanding or maintaining. >> >> Thanks, >> bin