2016-04-22 10:13 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: > On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <enkovich....@gmail.com> wrote: >> Hi, >> >> Currently when loop is vectorized we adjust its nb_iterations_upper_bound >> by dividing it by VF. This is incorrect since nb_iterations_upper_bound >> is upper bound for (<number of loop iterations> - 1) and therefore simple >> dividing it by VF in many cases gives us bounds greater than a real one. >> Correct value would be ((nb_iterations_upper_bound + 1) / VF - 1). > > Yeah, that seems correct. > >> Also decrement due to peeling for gaps should happen before we scale it >> by VF because peeling applies to a scalar loop, not vectorized one. > > That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration > is peeled as scalar operations. We do not account for the amount > of known prologue peeling (if peeling for alignment and the misalignment > is known at compile-time) - that would be peeling of scalar iterations.
My initial patch didn't change anything for PEELING_FOR_GAP and it caused a runfail for one of SPEC2006 benchmarks. My investigation showed number of vector iterations calculation doesn't match nb_iterations_upper_bound adjustment in a way PEELING_FOR_GAP is accounted. Looking into vect_generate_tmps_on_preheader I see: /* If epilogue loop is required because of data accesses with gaps, we subtract one iteration from the total number of iterations here for correct calculation of RATIO. */ And then we decrement loop counter before dividing it by VF to compute ratio and ratio_mult_vf. This doesn't match nb_iterations_upper_bound update and that's why I fixed it. This resolved runfail for me. Thus ratio_mult_vf computation conflicts with your statement we peel a vector iteration. > > But it would be interesting to know why we need the != 0 check - static > cost modelling should have disabled vectorization if the vectorized body > isn't run. > >> This patch modifies nb_iterations_upper_bound computation to resolve >> these issues. > > You do not adjust the ->nb_iterations_estimate accordingly. > >> Running regression testing I got one fail due to optimized loop. Heres >> is a loop: >> >> foo (signed char s) >> { >> signed char i; >> for (i = 0; i < s; i++) >> yy[i] = (signed int) i; >> } >> >> Here we vectorize for AVX512 using VF=64. Original loop has max 127 >> iterations and therefore vectorized loop may be executed only once. >> With the patch applied compiler detects it and transforms loop into >> BB with just stores of constants vectors into yy. Test was adjusted >> to increase number of possible iterations. A copy of test was added >> to check we can optimize out the original loop. >> >> Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk? > > I'd like to see testcases covering the corner-cases - have them have > upper bound estimates by adjusting known array sizes and also cover > the case of peeling for gaps. OK, I'll make more tests. Thanks, Ilya > > Richard. >