Re: [RFC] Combine vectorized loops with its scalar remainder.

Yuri Rumyantsev Tue, 03 Nov 2015 04:09:08 -0800

Richard,

It looks like a misunderstanding: we assumed that for GCCv6 the simple
remainder scheme, which introduces a new IV, would be used:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html


Is that right, or did we miss something?
We are now testing vectorization of loops with a small non-constant trip count.
Yuri.

2015-11-03 14:47 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Wed, Oct 28, 2015 at 11:45 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Hi All,
>>
>> Here is a preliminary patch to combine a vectorized loop with its scalar
>> remainder, a draft of which was proposed by Kirill Yukhin a month ago:
>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>> It was tested with the '-mavx2' option to run on a Haswell processor.
>> Its main goal is to improve performance of vectorized loops for AVX512.
>> Note that only loads/stores and simple reductions with binary operations are
>> converted to masked form, e.g. load --> masked load and a reduction like
>> r1 = f <op> r2 --> t = f <op> r2; r1 = m ? t : r2. Masking is performed
>> through creation of a new vector induction variable initialized with
>> consecutive values 0 .. VF-1, a new constant vector upper bound which
>> contains the number of iterations, and the result of their comparison,
>> which is used as the mask vector.
>> This implementation has several restrictions:
>>
>> 1. Multiple types are not supported.
>> 2. SLP is not supported.
>> 3. Gathers/scatters are also not supported.
>> 4. Vectorization of loops with a low trip count is not implemented yet,
>>    since it requires additional design and tuning.
>>
>> We are planning to eliminate all these restrictions in GCCv7.
>>
>> This patch will be extended to include a cost model to reject unprofitable
>> transformations, e.g. the new vector body cost will be evaluated through a
>> new target hook which estimates the cost of masking different vector
>> statements. A new threshold parameter will be introduced which determines
>> the permissible cost increase; it will be tuned on an AVX512 machine.
>> This patch is not in sync with Ilya Enkovich's changes for AVX512 masked
>> load/store support, since only part of them is in the trunk compiler.
>>
>> Any comments will be appreciated.
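
[Illustration, not part of the patch] The in-loop masking scheme described
above can be sketched in plain C by emulating each vector as an array of VF
lanes. This is only a rough model under stated assumptions: VF = 8, the
function name masked_loop, and the example loop body (c[i] = a[i] + b[i] plus
a sum reduction over a[]) are all hypothetical and do not come from the patch.

```c
#include <stddef.h>

#define VF 8  /* assumed vectorization factor; hypothetical value */

/* Scalar emulation of the combined loop: every iteration, including the
   remainder, runs under a per-lane mask computed from a vector IV. */
int masked_loop(const int *a, const int *b, int *c, size_t n)
{
    size_t iv[VF];      /* vector IV, initialized to {0, 1, ..., VF-1} */
    int acc[VF] = {0};  /* vector accumulator for the reduction */

    for (size_t l = 0; l < VF; l++)
        iv[l] = l;

    for (size_t base = 0; base < n; base += VF) {
        for (size_t l = 0; l < VF; l++) {
            int m = iv[l] < n;           /* mask = (IV < upper bound) */
            int va = m ? a[iv[l]] : 0;   /* masked load */
            int vb = m ? b[iv[l]] : 0;   /* masked load */
            if (m)
                c[iv[l]] = va + vb;      /* masked store */
            int t = va + acc[l];         /* t = f <op> r2 */
            acc[l] = m ? t : acc[l];     /* r1 = m ? t : r2 */
            iv[l] += VF;                 /* step the vector IV */
        }
    }

    int sum = 0;
    for (size_t l = 0; l < VF; l++)      /* final horizontal reduction */
        sum += acc[l];
    return sum;
}
```

Note that the mask comparison and the IV update are executed on every
iteration, not only on the last one, which is the cost the proposed cost
model would have to account for.
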
>
> As stated in the previous discussion I don't think the extra mask IV is a
> good idea and we instead should have a masked final iteration for the
> epilogue (yes, that's not really "combined" then).  This is because in the
> end we'd not only want AVX512 to benefit from this work but also other ISAs
> that can do unaligned or masked operations (we can overlap the epilogue work
> with the vectorized work or use masked loads/stores available with AVX).
> Note that the same applies to the alignment prologue if present, I can't see
> how you can handle that with the in-loop approach.
>
> Richard.
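
[Illustration, not part of the thread] The alternative suggested above, an
unmasked main vector loop followed by one masked final iteration for the
epilogue, might look roughly like the sketch below. As before, vectors are
emulated as arrays of VF lanes; VF = 8, the function name masked_epilogue, and
the example loop body are assumptions made only for illustration.

```c
#include <stddef.h>

#define VF 8  /* assumed vectorization factor; hypothetical value */

/* Scalar emulation: full vectors run unmasked, and the n % VF tail is
   handled by a single masked vector iteration instead of a scalar loop. */
int masked_epilogue(const int *a, const int *b, int *c, size_t n)
{
    int acc[VF] = {0};  /* vector accumulator for the reduction */
    size_t i = 0;

    /* Main vector loop: only full vectors, so no mask and no mask IV. */
    for (; i + VF <= n; i += VF)
        for (size_t l = 0; l < VF; l++) {
            c[i + l] = a[i + l] + b[i + l];
            acc[l] += a[i + l];
        }

    /* Epilogue: one masked iteration covers the remaining elements. */
    if (i < n)
        for (size_t l = 0; l < VF; l++) {
            int m = (i + l) < n;             /* lane is within bounds */
            int va = m ? a[i + l] : 0;       /* masked load */
            int vb = m ? b[i + l] : 0;       /* masked load */
            if (m)
                c[i + l] = va + vb;          /* masked store */
            acc[l] = m ? acc[l] + va : acc[l];
        }

    int sum = 0;
    for (size_t l = 0; l < VF; l++)          /* final horizontal reduction */
        sum += acc[l];
    return sum;
}
```

Here the mask is computed once, outside the hot loop, so the main loop body
stays identical to the ordinary vectorized loop.
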
