On Tue, Mar 23, 2021 at 4:33 AM guojiufu <guoji...@imap.linux.ibm.com> wrote:
>
> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc wrote:
> >> Better than doing loop versioning is to enhance SCEV (and thus also
> >> dependence analysis) to track the extra conditions they need to handle
> >> such cases, similarly to how niter analysis computes its 'assumptions'
> >> condition. That allows the versioning to be done when there's an
> >> actual beneficial transform (like vectorization) rather than just
> >> upfront on the off chance that there will be one. Ideally such a
> >> transform would then choose IVs in its transformed copy that
> >> are analyzable without repeating such a versioning exercise for the
> >> next transform.
> >
> > And it might be beneficial to perform some type promotion/demotion
> > pass, either early during vectorization or separately before
> > vectorization, on a loop copy guarded with the ifns that e.g. ifconv
> > uses too.
> > Find out what type sizes the loop uses, first try to demote computations
> > to narrower types in the vectorized loop candidate (e.g. if something
> > is computed in a wider type only to have the result demoted to a
> > narrower type), then pick the widest type size still in use in the loop
> > (ok, this assumes we don't mix multiple vector sizes in the loop, but
> > currently our vectorizer doesn't do that) and try to promote
> > computations that could be promoted to that type size. We partially
> > do something like that during vect patterns for bool types, but not
> > for other types, I think.
> >
> > Jakub
>
> Thanks for the suggestions!
>
> Enhancing SCEV could help other optimizations and improve performance in
> some cases.
> That said, one direct motivation for using a 64-bit type is to eliminate
> conversions, even in cases that are not easy to optimize through
> ifconv/vectorization, for example:
>
>   unsigned int i = 0;
>   while (a[i]>1e-3)
>     i++;
>
>   unsigned int i = 0;
>   while (p1[i] == p2[i] && p1[i] != '\0')
>     i++;
>
> Or should we only version on the IV type for this kind of loop? Any
> suggestions?
But the "optimization" resulting from such versioning is hard to determine
upfront, which means we'll pay quite a big code-size cost for an unknown,
questionable gain. What's the particular optimization in the above cases?

Note that, for example, for

  unsigned int i = 0;
  while (a[i]>1e-3)
    i++;

you know that when 'i' wraps, the loop will not terminate (it would revisit
a[0], a[1], ..., which already compared greater than 1e-3). There's the
address computation, i * sizeof (T), which is done in a larger type to avoid
overflow, so we have &a + zext (i) * 8. Is that the operation that is 'slow'
for you?

Richard.

> BR.
> Jiufu Guo.
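A minimal source-level sketch of the two loops from the thread, assuming 'a' points to double (so sizeof (T) is 8) and an LP64 target. With the unsigned int index, each a[i] access conceptually needs i zero-extended to 64 bits before it is scaled by 8 and added to &a; with a size_t index no extension is needed. The function names scan_u32, scan_size_t and common_prefix_u32 are hypothetical, and the size_t variant only matches the original if the loop exits before i would wrap at 2^32.

```c
#include <stddef.h>

/* Hypothetical helper: the first loop as written, with a 32-bit index.
   On an LP64 target, a[i] is conceptually &a + zext (i) * 8.  */
unsigned int scan_u32(const double *a)
{
    unsigned int i = 0;
    while (a[i] > 1e-3)
        i++;
    return i;
}

/* Hypothetical helper: the same loop with a pointer-width index, so the
   address computation &a + i * 8 needs no zero-extension.  Equivalent to
   scan_u32 only if the loop exits before i would reach 2^32.  */
size_t scan_size_t(const double *a)
{
    size_t i = 0;
    while (a[i] > 1e-3)
        i++;
    return i;
}

/* Hypothetical helper: the second (strcmp-like) loop from the thread,
   returning the length of the common prefix of two strings.  */
unsigned int common_prefix_u32(const char *p1, const char *p2)
{
    unsigned int i = 0;
    while (p1[i] == p2[i] && p1[i] != '\0')
        i++;
    return i;
}
```

For example, with a[] = { 0.5, 0.25, 0.125, 1e-4, 0.9 } both scan functions return 3, and common_prefix_u32 ("loop", "long") returns 2. Whether a separate extension instruction actually shows up in the generated code for the unsigned int version depends on the target and optimization level; the sketch only shows where the zext appears conceptually.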