On Wed, Mar 24, 2021 at 3:55 AM guojiufu <guoji...@linux.ibm.com> wrote:
>
> On 2021-03-23 16:25, Richard Biener via Gcc wrote:
> > On Tue, Mar 23, 2021 at 4:33 AM guojiufu <guoji...@imap.linux.ibm.com>
> > wrote:
> >>
> >> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
> >> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc wrote:
> >> >> Better than doing loop versioning is to enhance SCEV (and thus also
> >> >> dependence analysis) to track extra conditions they need to handle
> >> >> cases, similar to how niter analysis computes its 'assumptions'
> >> >> condition.  That allows the versioning to be done when there's an
> >> >> actual beneficial transform (like vectorization) rather than just
> >> >> upfront for the eventual chance that there'll be any.  Ideally such
> >> >> a transform would then choose IVs in its transformed copy that
> >> >> are analyzable w/o repeating such a versioning exercise for the next
> >> >> transform.
> >> >
> >> > And it might be beneficial to perform some type promotion/demotion
> >> > pass, either early during vectorization or separately before
> >> > vectorization on a loop copy guarded with the ifns e.g. ifconv uses
> >> > too.  Find out what type sizes the loop uses, first try to demote
> >> > computations to narrower types in the vectorized loop candidate
> >> > (e.g. if something is computed in a wider type only to have the
> >> > result demoted to a narrower type), then pick the widest type size
> >> > still in use in the loop (ok, this assumes we don't mix multiple
> >> > vector sizes in the loop, but currently our vectorizer doesn't do
> >> > that) and try to promote computations that could be promoted to that
> >> > type size.  We do partially something like that during vect patterns
> >> > for bool types, but not for other types, I think.
> >> >
> >> >       Jakub
> >>
> >> Thanks for the suggestions!
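The demotion case Jakub describes can be sketched in plain C (hypothetical function names; this is an illustration of the idea, not the pass itself). When a value is computed in a wider type only to be truncated on store, the computation itself can be demoted to the narrower type, which lets the vectorizer use twice as many lanes per vector:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wide form: each sum is computed in 32 bits, then truncated to 16
   bits on store.  Only the low 16 bits of the result survive.  */
void add_wide(const uint16_t *a, const uint16_t *b, uint16_t *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    c[i] = (uint16_t)((uint32_t)a[i] + (uint32_t)b[i]);  /* 32-bit add */
}

/* Demoted form: the low 16 bits of a sum depend only on the low 16
   bits of the operands, so performing the addition modulo 2^16 gives
   an identical result while keeping everything 16 bits wide.  */
void add_narrow(const uint16_t *a, const uint16_t *b, uint16_t *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    c[i] = (uint16_t)(a[i] + b[i]);  /* effectively a 16-bit add */
}
```

The two functions produce the same output for all inputs, including sums that overflow 16 bits, which is what makes the demotion legal.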
> >>
> >> Enhancing SCEV could help other optimizations and improve performance
> >> in some cases.
> >> While one of the direct ideas of using a '64-bit type' is to eliminate
> >> conversions, even for some cases which are not easy to optimize through
> >> ifconv/vectorization, for example:
> >>
> >>   unsigned int i = 0;
> >>   while (a[i] > 1e-3)
> >>     i++;
> >>
> >>   unsigned int i = 0;
> >>   while (p1[i] == p2[i] && p1[i] != '\0')
> >>     i++;
> >>
> >> Or should we only do versioning on the type for this kind of loop?
> >> Any suggestions?
> >
> > But the "optimization" resulting from such versioning is hard to
> > determine upfront, which means we'll pay quite a big code-size cost
> > for an unknown, questionable gain.  What's the particular optimization
>
> Right.  The code-size increase is a big pain for large loops.  If the
> gain is not significant, this optimization may not be a win.
>
> > in the above cases?  Note that for example for
> >
> >   unsigned int i = 0;
> >   while (a[i] > 1e-3)
> >     i++;
> >
> > you know that when 'i' wraps then the loop will not terminate.  There's
>
> Thanks :)  The code would be "while (a[i] > 1e-3 && i < n)", so the
> upper bound is checkable.  Otherwise, the optimization to avoid the
> zext is not applicable.
>
> > the address computation that is i * sizeof (T), which is done in a
> > larger type to avoid overflow, so we have &a + zext (i) * 8 - is that
> > the operation that is 'slow' for you?
>
> This is the point: "zext (i)" is the instruction that I want to
> eliminate; that is the direct goal of the optimization.
>
> Whether the gain from eliminating the 'zext' is visible, and whether
> the code-size increase is small enough, is a question that needs a
> trade-off.  It may only be acceptable if the loop is very small: then
> eliminating the 'zext' would help save runtime, and the code-size
> increase may not be big.
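The transformation under discussion can be sketched in plain C (hypothetical function names; 'n' is the bound mentioned above that guards against wrap). With a 32-bit unsigned IV on an LP64 target, each access needs the zero-extension feeding the address computation; once the bound proves the IV cannot wrap, a 64-bit IV computes the same thing without it:

```c
#include <assert.h>
#include <stddef.h>

/* Original form: 'i' is 32-bit unsigned, so on an LP64 target each
   a[i] access involves the chain &a + zext (i) * 8 discussed above.  */
size_t count_above_u32(const double *a, unsigned int n)
{
  unsigned int i = 0;
  while (i < n && a[i] > 1e-3)
    i++;
  return i;
}

/* Transformed form: the 'i < n' guard proves 'i' never wraps, so it
   is safe to use a 64-bit IV, and the per-iteration zero-extend
   disappears from the address computation.  */
size_t count_above_u64(const double *a, unsigned int n)
{
  size_t i = 0;
  while (i < n && a[i] > 1e-3)
    i++;
  return i;
}
```

Versioning would emit both forms plus a runtime guard; the thread's point is that the second form is only worth the code-size cost when a later transform (such as vectorization) actually benefits.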
OK, so I indeed think that the desire to micro-optimize a 'zext' doesn't
make versioning a good trade-off.  The micro-architecture had better not
make that totally slow (I'd expect an extra latency comparable to the
multiply or add on the &a + zext (i) * 8 instruction chain).

OTOH, making SCEV analysis not give up, but instead record the
constraints under which its solution is valid, is a very good and useful
thing to do.

Richard.

> Thanks again for your very helpful comments!
>
> BR.
> Jiufu Guo.
>
> > Richard.
> >
> >> BR.
> >> Jiufu Guo.