On Wed, 24 Mar 2021, guojiufu wrote:
> On 2021-03-24 15:55, Richard Biener wrote:
> > On Wed, Mar 24, 2021 at 3:55 AM guojiufu <[email protected]> wrote:
> >>
> >> On 2021-03-23 16:25, Richard Biener via Gcc wrote:
> >> > On Tue, Mar 23, 2021 at 4:33 AM guojiufu <[email protected]>
> >> > wrote:
> >> >>
> >> >> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
> >> >> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc
> >> >> > wrote:
> >> >> >> Better than doing loop versioning is to enhance SCEV (and thus also
> >> >> >> dependence analysis) to track the extra conditions they need to handle
> >> >> >> such cases, similar to how niter analysis computes its 'assumptions'
> >> >> >> condition.  That allows the versioning to be done when there's an
> >> >> >> actual beneficial transform (like vectorization) rather than just
> >> >> >> upfront for the eventual chance that there'll be any.  Ideally such a
> >> >> >> transform would then choose IVs in its transformed copy that are
> >> >> >> analyzable w/o repeating such a versioning exercise for the next
> >> >> >> transform.
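> >> >> >>
> >> >> >> To illustrate with a made-up example:
> >> >> >>
> >> >> >>   void f (double *a, unsigned int start, unsigned long n)
> >> >> >>   {
> >> >> >>     /* The 64bit address IV for a[i] is derived from the 32bit 'i';
> >> >> >>        its evolution is affine only under the assumption that 'i'
> >> >> >>        does not wrap (roughly, n <= UINT_MAX here).  Recording that
> >> >> >>        condition instead of giving up would let a later transform
> >> >> >>        version the loop only when it actually pays off.  */
> >> >> >>     for (unsigned int i = start; i < n; ++i)
> >> >> >>       a[i] = 0.;
> >> >> >>   }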
> >> >> >
> >> >> > And it might be beneficial to perform some type promotion/demotion
> >> >> > pass, either early during vectorization or separately before
> >> >> > vectorization on a loop copy guarded with the ifns e.g. ifconv uses too.
> >> >> > Find out what type sizes the loop uses, first try to demote
> >> >> > computations to narrower types in the vectorized loop candidate
> >> >> > (e.g. if something is computed in a wider type only to have the result
> >> >> > demoted to a narrower type), then pick the widest type size still in
> >> >> > use in the loop (ok, this assumes we don't mix multiple vector sizes
> >> >> > in the loop, but currently our vectorizer doesn't do that) and try to
> >> >> > promote computations that could be promoted to that type size.  We
> >> >> > already do something like that partially during vect patterns for bool
> >> >> > types, but not for other types, I think.
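> >> >> >
> >> >> > As a rough illustration (made-up source), the demotion case is e.g.
> >> >> >
> >> >> >   void f (short *d, short *s1, short *s2, int n)
> >> >> >   {
> >> >> >     for (int i = 0; i < n; i++)
> >> >> >       /* The addition is computed in int due to promotion, but only the
> >> >> >          narrow result is stored, so the vectorized loop candidate could
> >> >> >          do it in short.  */
> >> >> >       d[i] = s1[i] + s2[i];
> >> >> >   }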
> >> >> >
> >> >> > Jakub
> >> >>
> >> >> Thanks for the suggestions!
> >> >>
> >> >> Enhancing SCEV could help other optimizations and improve performance
> >> >> in some cases.  And one of the direct aims of using a '64bit type' is
> >> >> to eliminate the conversions, even in cases which are not easy to
> >> >> optimize through ifconv/vectorization, for example:
> >> >>
> >> >>   unsigned int i = 0;
> >> >>   while (a[i] > 1e-3)
> >> >>     i++;
> >> >>
> >> >>   unsigned int i = 0;
> >> >>   while (p1[i] == p2[i] && p1[i] != '\0')
> >> >>     i++;
> >> >>
> >> >> Or should we only do versioning on the type for this kind of loop?
> >> >> Any suggestions?
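> >> >>
> >> >> For instance (a rough sketch, assuming a hypothetical upper bound n),
> >> >> the versioned form I imagine would look something like:
> >> >>
> >> >>   #include <limits.h>
> >> >>
> >> >>   unsigned int
> >> >>   foo (double *a, unsigned int n)
> >> >>   {
> >> >>     if (n < UINT_MAX)  /* condition making the widened copy safe */
> >> >>       {
> >> >>         /* versioned copy: 64bit IV, so a[i] needs no zero-extension */
> >> >>         unsigned long i = 0;
> >> >>         while (a[i] > 1e-3 && i < n)
> >> >>           i++;
> >> >>         return i;
> >> >>       }
> >> >>     else
> >> >>       {
> >> >>         /* original loop kept as the fallback */
> >> >>         unsigned int i = 0;
> >> >>         while (a[i] > 1e-3 && i < n)
> >> >>           i++;
> >> >>         return i;
> >> >>       }
> >> >>   }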
> >> >
> >> > But the "optimization" resulting from such versioning is hard to
> >> > determine upfront, which means we'll pay quite a big code size cost
> >> > for an unknown, questionable gain.  What's the particular optimization
> >>
> >> Right.  The code size increase is a big pain for large loops.  If the gain
> >> is not significant, this optimization may not be profitable.
> >>
> >> > in the above cases? Note that for example for
> >> >
> >> >   unsigned int i = 0;
> >> >   while (a[i] > 1e-3)
> >> >     i++;
> >> >
> >> > you know that when 'i' wraps then the loop will not terminate. There's
> >>
> >> Thanks :)  The code would be "while (a[i] > 1e-3 && i < n)", so the upper
> >> bound is checkable.  Otherwise, the optimization to avoid the zext is not
> >> applicable.
> >>
> >> > the address computation i * sizeof (T) which is done in a larger type
> >> > to avoid overflow, so we have &a + zext (i) * 8 - is that the operation
> >> > that is 'slow' for you?
> >>
> >> This is the point: the "zext(i)" is the instruction that I want to
> >> eliminate; that is the direct goal of the optimization.
> >>
> >> Whether the gain of eliminating the 'zext' is visible, and whether the
> >> code size increase is small enough, is a trade-off that needs to be
> >> evaluated.  It may only be acceptable if the loop is very small; then
> >> eliminating the 'zext' would help save runtime, and the code size
> >> increase would not be big.
> >
> > OK, so I indeed think that the desire to micro-optimize a 'zext' doesn't
> > make versioning a good trade-off.  The micro-architecture had better not
> > make that overly slow (I'd expect an extra latency comparable to the
> > multiply or add on the &a + zext(i) * 8 instruction chain).
>
> Agreed, I understand your point.  The concern is that some
> micro-architectures do not handle this very well yet.  I tested the above
> example code:
>
>   unsigned i = 0;
>   while (a[i] > 1e-3 && i < n)
>     i++;
>
> and there is a ~30% performance improvement when using "long i" instead of
> "unsigned i" on ppc64le and x86.  It seems those instructions are not
> optimized that well on some platforms.  So, I'm wondering whether we need
> to do this in GCC.
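>
> (I.e. the variant I compared against is simply:
>
>   long i = 0;
>   while (a[i] > 1e-3 && i < n)
>     i++;
>
> which needs no zero-extension in the a[i] address computation.)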
On x86 I see indexed addressing modes being used, which should be fine.
Compilable testcase:
unsigned foo (double *a, unsigned n)
{
  unsigned i = 0;
  while (a[i] > 1e-3 && i < n)
    i++;
  return i;
}
ppc64le seems to do some odd unrolling/peeling or whatnot; I have a hard
time following its assembly ... ah, -fno-unroll-loops "helps" and produces
.L5:
        lfd %f0,0(%r9)
        addi %r3,%r3,1
        addi %r9,%r9,8
        rldicl %r3,%r3,0,32
        fcmpu %cr0,%f0,%f12
        bnglr %cr0
        bdnz .L5
which looks pretty good to me.  I suppose the rldicl is the
zero-extension, but the IVs are already 64-bit and the
zero-extension should be sunk to the loop exit instead.
> >
> > OTOH making SCEV analysis not give up but instead record the constraints
> > under which its solution is valid is a very good and useful thing to do.
>
> Thanks!  Enhancing SCEV could help in a few cases, especially when other
> optimizations are enabled.
>
> Thanks again for your suggestions!
>
> BR.
> Jiufu Guo.
>
> >
> > Richard.
> >
> >> Thanks again for your very helpful comments!
> >>
> >> BR.
> >> Jiufu Guo.
> >>
> >> >
> >> > Richard.
> >> >
> >> >> BR.
> >> >> Jiufu Guo.
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)