On Tue, Apr 26, 2016 at 2:29 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
> 2016-04-22 10:13 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <enkovich....@gmail.com> 
>> wrote:
>>> Hi,
>>>
>>> Currently, when a loop is vectorized, we adjust its nb_iterations_upper_bound
>>> by dividing it by VF.  This is incorrect since nb_iterations_upper_bound
>>> is an upper bound for (<number of loop iterations> - 1), and therefore simply
>>> dividing it by VF in many cases gives us a bound greater than the real one.
>>> The correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
>>
>> Yeah, that seems correct.
>>
>>> Also, the decrement due to peeling for gaps should happen before we scale
>>> by VF, because peeling applies to the scalar loop, not the vectorized one.
>>
>> That's not true - PEELING_FOR_GAPS means the last _vector_ iteration
>> is peeled as scalar operations.  We do not account for the amount
>> of known prologue peeling (if peeling for alignment and the misalignment
>> is known at compile time) - that would be peeling of scalar iterations.
>
> My initial patch didn't change anything for PEELING_FOR_GAP, and it caused
> a runtime failure in one of the SPEC2006 benchmarks.  My investigation showed
> that the number-of-vector-iterations calculation doesn't match the
> nb_iterations_upper_bound adjustment in the way PEELING_FOR_GAP is accounted for.
>
> Looking into vect_generate_tmps_on_preheader I see:
>
> /* If epilogue loop is required because of data accesses with gaps, we
>    subtract one iteration from the total number of iterations here for
>    correct calculation of RATIO.  */
>
> And then we decrement the loop counter before dividing it by VF to compute
> ratio and ratio_mult_vf.  This doesn't match the nb_iterations_upper_bound
> update, and that's why I fixed it.  This resolved the runtime failure for me.
>
> Thus the ratio_mult_vf computation conflicts with your statement that we peel
> a vector iteration.

Hum.  I stand corrected.  So yes, we remove the last vector iteration if
there are not already epilogue iterations.

>>
>> But it would be interesting to know why we need the != 0 check - static
>> cost modelling should have disabled vectorization if the vectorized body
>> isn't run.
>>
>>> This patch modifies nb_iterations_upper_bound computation to resolve
>>> these issues.
>>
>> You do not adjust the ->nb_iterations_estimate accordingly.
>>
>>> Running regression testing, I got one failure due to an optimized loop.
>>> Here is the loop:
>>>
>>> foo (signed char s)
>>> {
>>>   signed char i;
>>>   for (i = 0; i < s; i++)
>>>     yy[i] = (signed int) i;
>>> }
>>>
>>> Here we vectorize for AVX512 using VF=64.  The original loop has at most
>>> 127 iterations, and therefore the vectorized loop may be executed only once.
>>> With the patch applied, the compiler detects this and transforms the loop
>>> into a BB with just stores of constant vectors into yy.  The test was
>>> adjusted to increase the number of possible iterations.  A copy of the test
>>> was added to check that we can optimize out the original loop.
>>>
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?
>>
>> I'd like to see testcases covering the corner-cases - have them have
>> upper bound estimates by adjusting known array sizes and also cover
>> the case of peeling for gaps.
>
> OK, I'll make more tests.

Thanks,
Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>
