Hi All,

Here is a preliminary patch that combines a vectorized loop with its scalar
remainder; a draft of it was proposed by Kirill Yukhin a month ago:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
It was tested with the '-mavx2' option on a Haswell processor.
The main goal is to improve the performance of vectorized loops for AVX512.
Note that only loads/stores and simple reductions with binary operations are
converted to masked form, e.g. load --> masked load, and a reduction like
r1 = f <op> r2 --> t = f <op> r2; r1 = m ? t : r2. Masking is performed by
creating a new vector induction variable initialized with the consecutive
values 0..VF-1 and a new constant vector upper bound that holds the number of
iterations; the result of comparing the two is used as the mask vector.
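
A scalar C model of the masking scheme described above may help reviewers.
It is a sketch under stated assumptions, not code from the patch: VF = 8 is
assumed (8 x 32-bit lanes, as in a 256-bit AVX2 vector), plain arrays stand in
for vector registers, and the names build_mask, masked_add and masked_sum are
hypothetical. Real generated code would use masked load/store instructions
(e.g. AVX2 vpmaskmovd) instead of the per-lane branches.

```c
/* Assumed vectorization factor: 8 x 32-bit lanes (256-bit AVX2 vector).
   Arrays of VF ints model vector registers; names are hypothetical. */
#define VF 8

/* Mask for one vector iteration: lane l is active iff the vector
   induction variable iv[l] is below the splatted upper bound n. */
static void build_mask(const int iv[VF], int n, int mask[VF])
{
    for (int l = 0; l < VF; l++)
        mask[l] = iv[l] < n;
}

/* c[i] = a[i] + b[i] with loads/stores turned into masked loads/stores,
   so the last partial vector needs no scalar remainder loop. */
void masked_add(int *c, const int *a, const int *b, int n)
{
    int iv[VF], mask[VF];
    for (int l = 0; l < VF; l++)
        iv[l] = l;                      /* iv = {0, 1, ..., VF-1} */
    for (int i = 0; i < n; i += VF) {   /* ceil(n / VF) vector iterations */
        build_mask(iv, n, mask);
        for (int l = 0; l < VF; l++) {
            if (mask[l]) {              /* inactive lanes touch no memory */
                int va = a[i + l];      /* masked load */
                int vb = b[i + l];      /* masked load */
                c[i + l] = va + vb;     /* masked store */
            }
            iv[l] += VF;                /* advance the vector IV */
        }
    }
}

/* Reduction r1 = f + r2 rewritten as t = f + r2; r1 = m ? t : r2. */
int masked_sum(const int *a, int n)
{
    int iv[VF], mask[VF], r[VF];
    for (int l = 0; l < VF; l++) {
        iv[l] = l;
        r[l] = 0;
    }
    for (int i = 0; i < n; i += VF) {
        build_mask(iv, n, mask);
        for (int l = 0; l < VF; l++) {
            int f = mask[l] ? a[i + l] : 0; /* masked load; inactive lanes get 0 here */
            int t = f + r[l];               /* unconditional vector add */
            r[l] = mask[l] ? t : r[l];      /* select keeps r2 in inactive lanes */
            iv[l] += VF;
        }
    }
    int sum = 0;
    for (int l = 0; l < VF; l++)
        sum += r[l];                        /* final horizontal reduction */
    return sum;
}
```

With n = 11, the second vector iteration runs with only lanes 0..2 active, so
neither function reads or writes past element 10.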
This implementation has several restrictions:

1. Multiple types are not supported.
2. SLP is not supported.
3. Gathers/scatters are not supported either.
4. Vectorization of loops with a low trip count is not implemented yet since
   it requires additional design and tuning.

We plan to eliminate all these restrictions in GCC 7.

This patch will be extended to include a cost model that rejects unprofitable
transformations, e.g. the cost of the new vector body will be evaluated through
a new target hook that estimates the cost of masking different vector
statements. A new threshold parameter will be introduced to determine the
permissible cost increase; it will be tuned on an AVX512 machine.
This patch is not in sync with Ilya Enkovich's changes for AVX512 masked
load/store support, since only part of them is in the trunk compiler.

Any comments will be appreciated.

Attachment: remainder.patch.1
