Hi,

On Thu, 17 Nov 2016, Bin.Cheng wrote:

> B) Depending on ilp, I think below test strings fail for long time with
> haswell:
> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
> without unrolling" 1 "pcom" { target lp64 } } }
> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
> without unrolling" 2 "pcom" { target ia32 } } }
> Because vectorizer choose vf==4 in this case, and there is no
> predictive commoning opportunities at all.
> Also the newly added test string fails in this case too because the
> prolog peeled iterates more than 1 times.

Btw, this probably means that on haswell (or other archs with vf==4)
mgrid is slower than necessary. On mgrid you really really want
predictive commoning to happen. Vectorization isn't that interesting
there.


Ciao,
Michael.