Richard,

It looks like a misunderstanding: we assumed that for GCC 6 the simple
remainder scheme would be used, by introducing a new IV:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
Is that true, or did we miss something?
Now we are testing vectorization of loops with a small non-constant trip count.

Yuri.

2015-11-03 14:47 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Wed, Oct 28, 2015 at 11:45 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Hi All,
>>
>> Here is a preliminary patch to combine a vectorized loop with its scalar
>> remainder, a draft of which was proposed by Kirill Yukhin a month ago:
>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>> It was tested with the '-mavx2' option to run on a Haswell processor.
>> The main goal is to improve performance of vectorized loops for AVX512.
>> Note that only loads/stores and simple reductions with binary operations
>> are converted to masked form, e.g. load --> masked load, and a reduction
>> like r1 = f <op> r2 --> t = f <op> r2; r1 = m ? t : r2. Masking is
>> performed by creating a new vector induction variable initialized with
>> the consecutive values 0..VF-1, a new constant vector upper bound which
>> contains the number of iterations, and the result of their comparison,
>> which is used as the mask vector.
>> This implementation has several restrictions:
>>
>> 1. Multiple types are not supported.
>> 2. SLP is not supported.
>> 3. Gathers/scatters are also not supported.
>> 4. Vectorization of loops with a low trip count is not implemented yet,
>> since it requires additional design and tuning.
>>
>> We are planning to eliminate all these restrictions in GCC 7.
>>
>> This patch will be extended to include a cost model to reject
>> unprofitable transformations, e.g. the new vector body cost will be
>> evaluated through a new target hook which estimates the cost of masking
>> different vector statements. A new threshold parameter will be
>> introduced which determines the permissible cost increase; it will be
>> tuned on an AVX512 machine.
>> This patch is not in sync with Ilya Enkovich's changes for AVX512 masked
>> load/store support, since only part of them is in the trunk compiler.
>>
>> Any comments will be appreciated.
>
> As stated in the previous discussion, I don't think the extra mask IV
> is a good idea, and we should instead have a masked final iteration for
> the epilogue (yes, that's not really "combined" then). This is because
> in the end we'd want not only AVX512 to benefit from this work but also
> other ISAs that can do unaligned or masked operations (we can overlap
> the epilogue work with the vectorized work or use masked loads/stores
> available with AVX). Note that the same applies to the alignment
> prologue if present; I can't see how you can handle that with the
> in-loop approach.
>
> Richard.
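
To make the two schemes discussed in this thread concrete, here is a small emulation in plain C. It is only a sketch under assumptions, not code from the patch: the function names, the fixed VF of 8, the example loop (c[i] = a[i] + b[i] with a sum reduction), and the lane-by-lane emulation of vector operations are all made up for illustration. sum_mask_iv models the in-loop approach, where a vector IV holding i..i+VF-1 is compared against the trip count to form a mask in every iteration. sum_masked_epilogue models the alternative: an unmasked main loop followed by a single masked final iteration.

```c
#include <assert.h>

#define VF 8 /* assumed vectorization factor, e.g. 8 x 32-bit lanes */

/* Scalar reference loop: c[i] = a[i] + b[i], plus a sum reduction. */
int sum_scalar(const int *a, const int *b, int *c, int n)
{
    assert(n >= 0);
    int r = 0;
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
        r += c[i];
    }
    return r;
}

/* One emulated vector iteration starting at element i.
   If masked is nonzero, lane k is active only when i + k < n. */
static void vec_iter(const int *a, const int *b, int *c,
                     int i, int n, int masked, int acc[VF])
{
    for (int k = 0; k < VF; k++) {
        int iv = i + k;                 /* lane k of the vector IV */
        int m = masked ? (iv < n) : 1;  /* mask = (IV < upper bound) */
        int x = m ? a[iv] + b[iv] : 0;  /* masked loads: inactive lanes are not read */
        if (m)
            c[iv] = x;                  /* masked store */
        int t = acc[k] + x;             /* t = f <op> r2 */
        acc[k] = m ? t : acc[k];        /* r1 = m ? t : r2 */
    }
}

/* Horizontal sum of the per-lane partial reductions. */
static int hsum(const int acc[VF])
{
    int r = 0;
    for (int k = 0; k < VF; k++)
        r += acc[k];
    return r;
}

/* In-loop scheme: every vector iteration is masked. */
int sum_mask_iv(const int *a, const int *b, int *c, int n)
{
    assert(n >= 0);
    int acc[VF] = {0};
    for (int i = 0; i < n; i += VF)
        vec_iter(a, b, c, i, n, 1, acc);
    return hsum(acc);
}

/* Epilogue scheme: unmasked main loop, then one masked final iteration. */
int sum_masked_epilogue(const int *a, const int *b, int *c, int n)
{
    assert(n >= 0);
    int acc[VF] = {0};
    int i = 0;
    for (; i + VF <= n; i += VF)
        vec_iter(a, b, c, i, n, 0, acc);
    if (i < n)
        vec_iter(a, b, c, i, n, 1, acc);
    return hsum(acc);
}
```

Both variants should produce the same stores and the same reduction result as sum_scalar for any n >= 0 (ignoring integer overflow). The difference is where the masking cost is paid: in every iteration for the first, and only in the final partial iteration for the second.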