On Tue, Apr 26, 2016 at 2:29 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
> 2016-04-22 10:13 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>>> Hi,
>>>
>>> Currently when a loop is vectorized we adjust its nb_iterations_upper_bound
>>> by dividing it by VF.  This is incorrect since nb_iterations_upper_bound
>>> is an upper bound for (<number of loop iterations> - 1), and therefore
>>> simply dividing it by VF in many cases gives a bound greater than the
>>> real one.  The correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
>>
>> Yeah, that seems correct.
>>
>>> Also the decrement due to peeling for gaps should happen before we scale
>>> by VF, because peeling applies to the scalar loop, not the vectorized one.
>>
>> That's not true - PEELING_FOR_GAPS is so that the last _vector_ iteration
>> is peeled as scalar operations.  We do not account for the amount
>> of known prologue peeling (if peeling for alignment and the misalignment
>> is known at compile-time) - that would be peeling of scalar iterations.
>
> My initial patch didn't change anything for PEELING_FOR_GAPS, and it caused
> a run failure for one of the SPEC2006 benchmarks.  My investigation showed
> the calculation of the number of vector iterations doesn't match the
> nb_iterations_upper_bound adjustment in the way PEELING_FOR_GAPS is
> accounted for.
>
> Looking into vect_generate_tmps_on_preheader I see:
>
>   /* If epilogue loop is required because of data accesses with gaps, we
>      subtract one iteration from the total number of iterations here for
>      correct calculation of RATIO.  */
>
> And then we decrement the loop counter before dividing it by VF to compute
> ratio and ratio_mult_vf.  This doesn't match the nb_iterations_upper_bound
> update, and that's why I fixed it.  This resolved the run failure for me.
>
> Thus the ratio_mult_vf computation conflicts with your statement that we
> peel a vector iteration.
Hum.  I stand corrected.  So yes, we remove the last vector iteration if
there are not already epilogue iterations.

>>
>> But it would be interesting to know why we need the != 0 check - static
>> cost modelling should have disabled vectorization if the vectorized body
>> isn't run.
>>
>>> This patch modifies the nb_iterations_upper_bound computation to resolve
>>> these issues.
>>
>> You do not adjust ->nb_iterations_estimate accordingly.
>>
>>> Running regression testing I got one fail due to an optimized loop.  Here
>>> is the loop:
>>>
>>>   foo (signed char s)
>>>   {
>>>     signed char i;
>>>     for (i = 0; i < s; i++)
>>>       yy[i] = (signed int) i;
>>>   }
>>>
>>> Here we vectorize for AVX512 using VF=64.  The original loop has at most
>>> 127 iterations, and therefore the vectorized loop may be executed only
>>> once.  With the patch applied the compiler detects this and transforms
>>> the loop into a BB with just stores of constant vectors into yy.  The
>>> test was adjusted to increase the number of possible iterations.  A copy
>>> of the test was added to check that we can optimize out the original loop.
>>>
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?
>>
>> I'd like to see testcases covering the corner cases - have them have
>> upper bound estimates by adjusting known array sizes, and also cover
>> the case of peeling for gaps.
>
> OK, I'll make more tests.  Thanks, Richard.
>
> Thanks,
> Ilya
>
>>
>> Richard.
>>