Hi All,

Here is a preliminary patch to combine a vectorized loop with its scalar remainder, a draft of which was proposed by Kirill Yukhin a month ago:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html

It was tested with the '-mavx2' option on a Haswell processor. Its main goal is to improve the performance of vectorized loops for AVX512.

Note that only loads/stores and simple reductions with binary operations are converted to masked form, e.g. load --> masked load, and a reduction like r1 = f <op> r2 --> t = f <op> r2; r1 = m ? t : r2.

Masking is performed by creating a new vector induction variable initialized with the consecutive values 0 .. VF-1 and a new constant vector upper bound that contains the number of iterations; the result of comparing the two is used as the mask vector.

This implementation has several restrictions:
1. Multiple types are not supported.
2. SLP is not supported.
3. Gathers/scatters are also not supported.
4. Vectorization of loops with a low trip count is not implemented yet since it requires additional design and tuning.

We are planning to eliminate all these restrictions in GCC 7.

This patch will be extended to include a cost model that rejects unprofitable transformations, e.g. the cost of the new vector body will be evaluated through a new target hook which estimates the cost of masking different vector statements. A new threshold parameter will be introduced to determine the permissible cost increase; it will be tuned on an AVX512 machine.

This patch is not in sync with Ilya Enkovich's changes for AVX512 masked load/store support, since only part of them is in the trunk compiler.

Any comments will be appreciated.
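
For readers who want to see the masking scheme in concrete terms, here is a rough scalar sketch in C of what the combined loop computes. It is an illustration only, not code from the patch: the function name masked_sum_and_copy, the choice VF = 8, and the loop body (a sum reduction plus a store of a[i] + 1) are all assumptions made for the example. Each inner loop over lanes stands in for one vector statement.

```c
#define VF 8  /* assumed vectorization factor, chosen only for this sketch */

/* Hypothetical example: sums a[0..n-1] and stores b[i] = a[i] + 1 using a
   single masked "vector" loop, with no separate scalar remainder loop. */
static int masked_sum_and_copy(const int *a, int *b, int n)
{
    int iv[VF];    /* vector induction variable, starts as {0, 1, ..., VF-1} */
    int acc[VF];   /* vector accumulator for the reduction */
    int mask[VF];  /* mask vector: 1 where the lane is a valid iteration */

    for (int l = 0; l < VF; l++) {
        iv[l] = l;
        acc[l] = 0;
    }

    for (int base = 0; base < n; base += VF) {
        /* Compare the induction vector with the upper bound {n, n, ..., n}. */
        for (int l = 0; l < VF; l++)
            mask[l] = iv[l] < n;

        for (int l = 0; l < VF; l++) {
            /* load --> masked load: inactive lanes touch no memory */
            int f = mask[l] ? a[iv[l]] : 0;

            /* r1 = f <op> r2  -->  t = f <op> r2; r1 = m ? t : r2 */
            int t = f + acc[l];
            acc[l] = mask[l] ? t : acc[l];

            /* store --> masked store */
            if (mask[l])
                b[iv[l]] = f + 1;
        }

        /* Advance the induction vector by VF. */
        for (int l = 0; l < VF; l++)
            iv[l] += VF;
    }

    /* Final horizontal reduction of the accumulator lanes. */
    int sum = 0;
    for (int l = 0; l < VF; l++)
        sum += acc[l];
    return sum;
}
```

With n = 11, for example, the first iteration runs with all eight lanes active and the second with only lanes 0..2 active, so the three leftover elements are handled by the same loop body rather than by a scalar remainder.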
remainder.patch.1
Description: Binary data