On Fri, Nov 18, 2016 at 4:52 PM, Michael Matz <m...@suse.de> wrote:
> Hi,
>
> On Thu, 17 Nov 2016, Bin.Cheng wrote:
>
>> B) Depending on ilp, I think below test strings fail for long time with
>> haswell:
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } }
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 2 "pcom" { target ia32 } } }
>> Because vectorizer choose vf==4 in this case, and there is no
>> predictive commoning opportunities at all.
>> Also the newly added test string fails in this case too because the
>> prolog peeled iterates more than 1 times.
>
> Btw, this probably means that on haswell (or other archs with vf==4) mgrid
> is slower than necessary.  On mgrid you really really want predictive
> commoning to happen.  Vectorization isn't that interesting there.
Interesting, I will check whether there is a difference between vf 2 and
vf 4.  We do have cases where a smaller vf is better and should be chosen,
though for different reasons.
Thanks,
bin
>
>
> Ciao,
> Michael.