On Tue, Mar 23, 2021 at 4:33 AM guojiufu <guoji...@imap.linux.ibm.com> wrote:
>
> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc wrote:
> >> Better than doing loop versioning is to enhance SCEV (and thus also
> >> dependence analysis) to track the extra conditions they need to handle
> >> such cases, similar to how niter analysis computes its 'assumptions'
> >> condition.  That allows the versioning to be done when there's an
> >> actual beneficial transform (like vectorization) rather than just
> >> upfront for the eventual chance that there'll be any.  Ideally such
> >> transform would then choose IVs in their transformed copy that
> >> are analyzable w/o repeating such versioning exercise for the next
> >> transform.
> >
> > And it might be beneficial to perform some type promotion/demotion
> > pass, either early during vectorization or separately before
> > vectorization, on a loop copy guarded with the ifns that e.g. ifconv
> > uses too.  Find out what type sizes the loop uses, first try to demote
> > computations to narrower types in the vectorized loop candidate (e.g.
> > if something is computed in a wider type only to have the result
> > demoted to a narrower type), then pick the widest type size still in
> > use in the loop (ok, this assumes we don't mix multiple vector sizes
> > in the loop, but currently our vectorizer doesn't do that) and try to
> > promote computations that could be promoted to that type size.  We
> > partially do something like that during vect patterns for bool types,
> > but not for other types, I think.
> >
> >       Jakub
>
> Thanks for the suggestions!
>
> Enhancing SCEV could help other optimizations and improve performance in
> some cases.  However, one direct motivation for using the '64bit type' is
> to eliminate conversions, even in cases that are not easy to optimize
> through ifconv/vectorization, for example:
>
>    unsigned int i = 0;
>    while (a[i]>1e-3)
>      i++;
>
>    unsigned int i = 0;
>    while (p1[i] == p2[i] && p1[i] != '\0')
>      i++;
>
> Or should we only do versioning on the type for this kind of loop?  Any
> suggestions?

But the benefit of the "optimization" resulting from such versioning is
hard to determine upfront, which means we'll pay quite a big code size cost
for an unknown, questionable gain.  What's the particular optimization
in the above cases?  Note that, for example, for

    unsigned int i = 0;
    while (a[i]>1e-3)
      i++;

you know that when 'i' wraps then the loop will not terminate.  There's
the address computation, i * sizeof (T), which is done in a larger type
to avoid overflow, so we have &a + zext (i) * 8.  Is that the operation
that is 'slow' for you?
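
To make the question concrete, here is a minimal sketch of the two index
widths and of the address computation spelled out by hand.  It is only an
illustration, not code from any patch: the names count_u32, count_u64 and
elem_addr are hypothetical, the element type is assumed to be double
(matching the factor 8 above), and a 64-bit target is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop as written in the thread: 32-bit unsigned index.  Each access
   needs the index widened before it is scaled and added to the base.  */
size_t
count_u32 (const double *a)
{
  unsigned int i = 0;
  while (a[i] > 1e-3)
    i++;
  return i;
}

/* Hand-written variant with a pointer-width index (assumption: this is
   roughly what a "64-bit IV" version of the loop would look like).  It
   matches count_u32 only as long as the 32-bit index never wraps.  */
size_t
count_u64 (const double *a)
{
  size_t i = 0;
  while (a[i] > 1e-3)
    i++;
  return i;
}

/* The address computation spelled out: a + zext (i) * sizeof (double).
   The cast to uint64_t is the zero-extension in question.  */
const double *
elem_addr (const double *a, unsigned int i)
{
  return (const double *) ((const char *) a
                           + (uint64_t) i * sizeof (double));
}
```

On an array whose terminating element is reached long before the index
could wrap, both loops return the same count, and elem_addr (a, i) is the
same address as &a[i]; the only difference is where the widening happens.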

Richard.

> BR.
> Jiufu Guo.
