On Thu, 11 Jul 2024, Tamar Christina wrote: > -----Original Message----- > > From: Richard Biener <rguent...@suse.de> > > Sent: Thursday, July 11, 2024 12:39 PM > > To: Tamar Christina <tamar.christ...@arm.com> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; bin.ch...@linux.alibaba.com > > Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing > > modes > > known not to overflow. [PR114932] > > > > On Wed, 10 Jul 2024, Tamar Christina wrote: > > > > > > -----Original Message----- > > > > From: Richard Biener <rguent...@suse.de> > > > > Sent: Thursday, June 20, 2024 8:55 AM > > > > To: Tamar Christina <tamar.christ...@arm.com> > > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; > > bin.ch...@linux.alibaba.com > > > > Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing > > modes > > > > known not to overflow. [PR114932] > > > > > > > > On Wed, 19 Jun 2024, Tamar Christina wrote: > > > > > > > > > > -----Original Message----- > > > > > > From: Richard Biener <rguent...@suse.de> > > > > > > Sent: Wednesday, June 19, 2024 1:14 PM > > > > > > To: Tamar Christina <tamar.christ...@arm.com> > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; > > > > bin.ch...@linux.alibaba.com > > > > > > Subject: Re: [PATCH][ivopts]: perform affine fold on unsigned > > > > > > addressing > > > > modes > > > > > > known not to overflow. [PR114932] > > > > > > > > > > > > On Fri, 14 Jun 2024, Tamar Christina wrote: > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > When the patch for PR114074 was applied we saw a good boost in > > > > exchange2. > > > > > > > > > > > > > > This boost was partially caused by a simplification of the > > > > > > > addressing > > modes. 
> > > > > > > With the patch applied IV opts saw the following form for the
> > > > > > > base addressing:
> > > > > > >
> > > > > > > Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long)
> > > > > > > l0_19(D) * 324) + 36)
> > > > > > >
> > > > > > > vs what we normally get:
> > > > > > >
> > > > > > > Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8))
> > > > > > > l0_19(D) * 81) + 9) * 4
> > > > > > >
> > > > > > > This is because the patch promoted multiplies where one operand
> > > > > > > is a constant from a signed multiply to an unsigned one, to
> > > > > > > attempt to fold away the constant.
> > > > > > >
> > > > > > > This patch attempts the same, but due to the various problems
> > > > > > > with SCEV and niters not being able to analyze the resulting
> > > > > > > forms (i.e. PR114322) we can't do it during SCEV or in the
> > > > > > > general form in fold-const like extract_muldiv attempts.
> > > > > > >
> > > > > > > Instead this applies the simplification during IVopts
> > > > > > > initialization when we create the IV.  Essentially, when we know
> > > > > > > the IV won't overflow with regard to niters, we perform an
> > > > > > > affine fold, which gets it to simplify the internal computation,
> > > > > > > even if this is signed, because we know that for IVOPTs' uses
> > > > > > > the IV won't ever overflow.  This allows IV opts to see the
> > > > > > > simplified form without influencing the rest of the compiler.
> > > > > > >
> > > > > > > As mentioned in PR114074 it would be good to fix the missed
> > > > > > > optimization in the other passes so we can perform this in
> > > > > > > general.
> > > > > > > The reason this has a big impact on Fortran code is that
> > > > > > > Fortran doesn't seem to have unsigned integer types.  As such
> > > > > > > all of its address computations are created with signed types
> > > > > > > and folding does not happen on them due to the possible
> > > > > > > overflow.
> > > > > > >
> > > > > > > Concretely, on AArch64 this changes the generated code from:
> > > > > > >
> > > > > > >         mov     x27, -108
> > > > > > >         mov     x24, -72
> > > > > > >         mov     x23, -36
> > > > > > >         add     x21, x1, x0, lsl 2
> > > > > > >         add     x19, x20, x22
> > > > > > > .L5:
> > > > > > >         add     x0, x22, x19
> > > > > > >         add     x19, x19, 324
> > > > > > >         ldr     d1, [x0, x27]
> > > > > > >         add     v1.2s, v1.2s, v15.2s
> > > > > > >         str     d1, [x20, 216]
> > > > > > >         ldr     d0, [x0, x24]
> > > > > > >         add     v0.2s, v0.2s, v15.2s
> > > > > > >         str     d0, [x20, 252]
> > > > > > >         ldr     d31, [x0, x23]
> > > > > > >         add     v31.2s, v31.2s, v15.2s
> > > > > > >         str     d31, [x20, 288]
> > > > > > >         bl      digits_20_
> > > > > > >         cmp     x21, x19
> > > > > > >         bne     .L5
> > > > > > >
> > > > > > > into:
> > > > > > >
> > > > > > > .L5:
> > > > > > >         ldr     d1, [x19, -108]
> > > > > > >         add     v1.2s, v1.2s, v15.2s
> > > > > > >         str     d1, [x20, 216]
> > > > > > >         ldr     d0, [x19, -72]
> > > > > > >         add     v0.2s, v0.2s, v15.2s
> > > > > > >         str     d0, [x20, 252]
> > > > > > >         ldr     d31, [x19, -36]
> > > > > > >         add     x19, x19, 324
> > > > > > >         add     v31.2s, v31.2s, v15.2s
> > > > > > >         str     d31, [x20, 288]
> > > > > > >         bl      digits_20_
> > > > > > >         cmp     x21, x19
> > > > > > >         bne     .L5
> > > > > > >
> > > > > > > The two patches together result in a 10% performance increase
> > > > > > > in exchange2 in SPECCPU 2017, a 4% reduction in binary size,
> > > > > > > and a 5% improvement in compile time.  There's also a 5%
> > > > > > > performance improvement in fotonik3d and a similar reduction
> > > > > > > in binary size.
> > > > > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > > > > > > > Ok for master? > > > > > > > > > > > > > > Thanks, > > > > > > > Tamar > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > PR tree-optimization/114932 > > > > > > > * tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned > > > > > > > fold. > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > PR tree-optimization/114932 > > > > > > > * gfortran.dg/addressing-modes_1.f90: New test. > > > > > > > > > > > > > > --- > > > > > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 > > > > > > b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 > > > > > > > new file mode 100644 > > > > > > > index > > > > > > > > > > > > 0000000000000000000000000000000000000000..334d5bc47a16e53e9168b > > > > > > b1f90dfeff584b4e494 > > > > > > > --- /dev/null > > > > > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 > > > > > > > @@ -0,0 +1,37 @@ > > > > > > > +! { dg-do compile { target aarch64-*-* } } > > > > > > > +! 
{ dg-additional-options "-w -Ofast" } > > > > > > > + > > > > > > > + module brute_force > > > > > > > + integer, parameter :: r=9 > > > > > > > + integer block(r, r, 0) > > > > > > > + contains > > > > > > > + subroutine brute > > > > > > > + do > > > > > > > + do > > > > > > > + do > > > > > > > + do > > > > > > > + do > > > > > > > + do > > > > > > > + do i7 = l0, 1 > > > > > > > + select case(1 ) > > > > > > > + case(1) > > > > > > > + block(:2, 7:, 1) = block(:2, 7:, i7) > > > > > > > - 1 > > > > > > > + end select > > > > > > > + do i8 = 1, 1 > > > > > > > + do i9 = 1, 1 > > > > > > > + if(1 == 1) then > > > > > > > + call digits_20 > > > > > > > + end if > > > > > > > + end do > > > > > > > + end do > > > > > > > + end do > > > > > > > + end do > > > > > > > + end do > > > > > > > + end do > > > > > > > + end do > > > > > > > + end do > > > > > > > + end do > > > > > > > + end > > > > > > > + end > > > > > > > + > > > > > > > +! { dg-final { scan-assembler-not {ldr\s+d([0-9]+),\s+\[x[0-9]+, > > > > > > > x[0-9]+\]} > > } } > > > > > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc > > > > > > > b/gcc/tree-ssa-loop-ivopts.cc > > > > > > > index > > > > > > > > > > > > 4338d7b64a6c2df6404b8d5e51c7f62c23006e72..f621e4ee924b930e1e1d68e3 > > > > > > 5f3d3a0d52470811 100644 > > > > > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > > > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > > > > > @@ -1216,6 +1216,18 @@ alloc_iv (struct ivopts_data *data, tree > > > > > > > base, > > tree > > > > > > step, > > > > > > > base = fold_convert (TREE_TYPE (base), > > > > > > > aff_combination_to_tree > > > > (&comb)); > > > > > > > } > > > > > > > > > > > > > > + /* If we know the IV won't overflow wrt niters and the type is > > > > > > > an > > unsigned > > > > > > > + type then fold using affine unsigned arithmetic to allow > > > > > > > more folding of > > > > > > > + constants. 
*/
> > > > > > > +  if (no_overflow
> > > > > > > +      && TYPE_UNSIGNED (TREE_TYPE (expr)))
> > > > > > > +    {
> > > > > > > +      aff_tree comb;
> > > > > > > +      tree utype = unsigned_type_for (TREE_TYPE (expr));
> > > > > > > +      tree_to_aff_combination (expr, utype, &comb);
> > > > > > > +      base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
> > > > > > > +    }
> > > > > > > +
> > > > > >
> > > > > > So right above we already do
> > > > > >
> > > > > >   /* Lower address expression in base except ones with DECL_P as operand.
> > > > > >      By doing this:
> > > > > >        1) More accurate cost can be computed for address expressions;
> > > > > >        2) Duplicate candidates won't be created for bases in different
> > > > > >           forms, like &a[0] and &a.  */
> > > > > >   STRIP_NOPS (expr);
> > > > > >   if ((TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
> > > > > >       || contain_complex_addr_expr (expr))
> > > > > >     {
> > > > > >       aff_tree comb;
> > > > > >       tree_to_aff_combination (expr, TREE_TYPE (expr), &comb);
> > > > > >       base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
> > > > > >     }
> > > > > >
> > > > > > and if I read correctly 'expr' is
> > > > > >
> > > > > >   (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) l0_19(D)
> > > > > >   * 81) + 9) * 4
> > > > > >
> > > > > > in your interesting case which means it doesn't satisfy
> > > > > > contain_complex_addr_expr.
> > > > > >
> > > > > > I don't quite get why rewriting the base into (T)(unsigned)... is
> > > > > > only valid when no_overflow - no_overflow is about {base, +, step},
> > > > > > not about any overflow contained in 'base'.
> > > > >
> > > > > Right, so you're saying overflow only applies to step and its final
> > > > > evolution wrt niters?  Ok, makes sense.  I had thought it referred
> > > > > to that the final possible address doesn't overflow.
> > > > > So offset + step * niters.
> > > >
> > > > It refers to the IV itself not wrapping, but 'base' is the initial
> > > > value computation; whether that computation itself wraps or not (it
> > > > would wrap the same for each iteration) isn't covered.
> > > >
> > > > > > I wonder if we maybe want to record an "original" iv->base
> > > > > > to be used for code generation (and there only the unexpanded form)
> > > > > > and a variant used for the various sorts of canonicalization/compare
> > > > > > (I see we eventually add/subtract step and then compare against
> > > > > > sth else).  And then apply this normalization always to the not
> > > > > > "original" form.
> > > > >
> > > > > That's certainly possible, but it'll make the code a bit harder to
> > > > > follow in the cases where the comparisons are done between the split
> > > > > address base + step and then a new IV built up from it.
> > > > >
> > > > > In those cases we'd have to split both expressions but use different
> > > > > ones in the right places.  Where it gets messy is that sometimes it
> > > > > updates the value and then compares.
> > > > >
> > > > > So if we can avoid it it's likely cleaner.
> > > >
> > > > Sure.
> > > >
> > > > > > The above STRIP_NOPS (expr) + expand might turn an unsigned
> > > > > > affine combination into a signed one which might be problematic.
> > > > > > So what happens if you change the above to simply always
> > > > > > unsigned expand?
> > > > >
> > > > > Yes, that's what's currently happening.  It's stripping (signed)
> > > > > (a + b) into a + b.
> > > > >
> > > > > To be clear, you mean change the affine fold I added to always be
> > > > > unsigned, and thus removing the contain_complex_addr_expr chunk?
> > > >
> > > > Yes.
> > > >
> > > > > That works and was my first patch locally (though I need to fully
> > > > > reg-test it).
> > > >
> > > > I'm curious what happens.
> > >
> > > Ok, we cannot fold unconditionally to unsigned; however, this I
> > > believe is correct:
> > >
> > > The patch folds every IV to unsigned to canonicalize them.  At the
> > > end of the pass match.pd will then remove unneeded conversions.
> > >
> > > Note that we cannot force everything to unsigned; IVopts requires
> > > that array address expressions remain as such.  Folding them results
> > > in them becoming pointer expressions for which some optimizations in
> > > IVopts do not run.
> >
> > Isn't that what contain_complex_addr_expr tries to assert?  That there
> > isn't a DECL in the expression?
>
> Yes, but in this case, specifically for POINTER_PLUS_EXPR, it's
> perfectly fine for the base to be a DECL as we won't be folding it.
> Though looking at this, perhaps the same condition should be applied to
> PLUS_EXPR and MINUS_EXPR.
>
> So I think contain_complex_addr_expr should maybe not check operand0.
I think we want to lower &a[i] + b (a complex addr-expr in operand0).
Likewise ptr + (&a[i] - &b[i]).  As originally said, since we do not only
lower the complex ADDR_EXPRs but also the whole expression containing
them, the question is why not lower all of them.  IIRC you tested that, or
did you only test with converting to unsigned?

Another issue is the limitation of tree-affine to 8 components, which
means we handle large expressions only partially.

I still think that while doing this is beneficial for cost and IV re-use,
for code generation IVOPTs should see to use the original base.

With simply rewriting all addresses like

  STRIP_NOPS (expr);
  aff_tree comb;
  tree_to_aff_combination (expr, TREE_TYPE (expr), &comb);
  base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));

I see for your fortran testcase on x86_64 changes like

-  Base: (unsigned long) ((sizetype) l0_19(D) * 324) + (unsigned long) &block
+  Base: (unsigned long) l0_19(D) * 324 + (unsigned long) &block

and while the IVs are the same there's

 <Invariant Expressions>:
 inv_expr 1:     (unsigned long) l0_19(D) * 324
+inv_expr 2:     (signed int) l0_19(D) - (signed int) ((sizetype) l0_19(D) + 1)

and so this common subexpression is code generated only once and outside
of the loop.

Note the stripping of nops and folding back to the type looks somewhat
redundant.  I suppose it's impossible to prove that the above is
equivalent to

  aff_tree comb;
  tree_to_aff_combination (expr, TREE_TYPE (expr), &comb);
  base = aff_combination_to_tree (&comb);

I'm currently giving the first variant above a full test on x86_64; I
think it should be a good incremental improvement, as shown by the
invariant newly discovered this way.  It doesn't affect PR64705 or have
problems with bitint-49.c.

Testing shows it causes gcc.dg/torture/pr111336.c to fail execution,
which I think means there's a correctness bug in tree-affine involving
integer overflow rules ... it's avoided when using the 2nd simpler
variant though (but the latent problem is there).
-  Base: (unsigned int) (((int) c.1_22 + 773233762) * 5 + 773163185)
+  Base: (unsigned int) ((int) c.1_22 * 5 + 344364699)

the 2nd variant instead does

-  Base: (unsigned int) (((int) c.1_22 + 773233762) * 5 + 773163185)
+  Base: (unsigned int) c.1_22 * 5 + 344364699

but doesn't avoid the similarly invalid

-  Base: ((int) c.1_22 + 773233762) * 5 + 773163185
+  Base: (int) c.1_22 * 5 + 344364699

The degenerate case is c.1_22 == -773233762, or any other value where
c.1_22 * 5 overflows but (c.1_22 + 773233762) * 5 does not.  Usually
IVOPTs is careful to rewrite everything to unsigned, and it does so in
this case, so the failure mode seems different and needs analysis.  Still,
the above shows the affine round-trip alters expressions in an invalid
way.

Doing

  aff_tree comb;
  STRIP_NOPS (expr);
  expr = fold_convert (unsigned_type_for (TREE_TYPE (expr)), expr);
  tree_to_aff_combination (expr, TREE_TYPE (expr), &comb);
  base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));

avoids this, but this indeed now FAILs gcc.dg/torture/bitint-49.c
execution (the pr64705 FAIL is a testism).  For bitint-49.c IVOPTs and
.optimized look OK; the asm shows differences in stack slot allocation,
so I guess the stack-slot sharing issue is re-surfacing here and being
confused by

-  ivtmp.28_49 = (unsigned long) &MEM <_BitInt(6384)> [(void *)&y + 8B];
-  _48 = (unsigned long) &y;
-  _40 = _48 + 792;
+  _50 = (unsigned long) &y;
+  ivtmp.28_49 = _50 + 8;
+  _40 = _50 + 792;

It seems to me the task is to work through the FAILs, fix the latent
issues, and go for this kind of "simplification".

Richard.

> > > If we fold array expressions it starts treating them as normal
> > > integers and leads to incorrect code (as shown by
> > > testsuite/gcc.dg/torture/bitint-49.c).
> > >
> > > And if we fold POINTER_PLUS_EXPR then this leads to inefficient
> > > codegen, as is shown by testsuite/gcc.target/i386/pr115462.c.  So for
> > > POINTER_PLUS_EXPR we can fold the offset but not the base.
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > x86_64-pc-linux-gnu -m32, -m64 and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned fold. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * gcc.dg/tree-ssa/pr64705.c: Update dump file scan. > > > * gfortran.dg/addressing-modes_1.f90: New test. > > > > > > -- inline copy of patch -- > > > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c > > > b/gcc/testsuite/gcc.dg/tree- > > ssa/pr64705.c > > > index > > fd24e38a53e9f3a4659dece5af4d71fbc0ce2c18..3c9c2e5deed1ba755d1fae15a6 > > 553d68b6ad0098 100644 > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c > > > @@ -23,4 +23,4 @@ int foo(char *flags, long len, long i, long steps) > > > > > > /* Don't expand iv {base+step, step}_loop into {base+x+y, step}_loop > > > even if "step == x + y". */ > > > -/* { dg-final { scan-tree-dump "Base:\\tstep_\[0-9\]* \\+ > > > iter|Base:\\titer_\[0- > > 9\]* \\+ step" "ivopts"} } */ > > > +/* { dg-final { scan-tree-dump {Base:\t\(unsigned ?.*\) step_[0-9]* \+ > > \(unsigned ?.*\) iter|Base:\t\(unsigned ?.*\) iter_[0-9]* \+ \(unsigned > > ?.*\) step} > > "ivopts"} } */ > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 > > b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..334d5bc47a16e53e9168b > > b1f90dfeff584b4e494 > > > --- /dev/null > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 > > > @@ -0,0 +1,37 @@ > > > +! { dg-do compile { target aarch64-*-* } } > > > +! 
{ dg-additional-options "-w -Ofast" } > > > + > > > + module brute_force > > > + integer, parameter :: r=9 > > > + integer block(r, r, 0) > > > + contains > > > + subroutine brute > > > + do > > > + do > > > + do > > > + do > > > + do > > > + do > > > + do i7 = l0, 1 > > > + select case(1 ) > > > + case(1) > > > + block(:2, 7:, 1) = block(:2, 7:, i7) - 1 > > > + end select > > > + do i8 = 1, 1 > > > + do i9 = 1, 1 > > > + if(1 == 1) then > > > + call digits_20 > > > + end if > > > + end do > > > + end do > > > + end do > > > + end do > > > + end do > > > + end do > > > + end do > > > + end do > > > + end do > > > + end > > > + end > > > + > > > +! { dg-final { scan-assembler-not {ldr\s+d([0-9]+),\s+\[x[0-9]+, > > > x[0-9]+\]} } } > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > > index > > b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f..5fc188ae3f8207943a9503e2d > > c51b067c3f85739 100644 > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > @@ -1184,23 +1184,41 @@ static struct iv * > > > alloc_iv (struct ivopts_data *data, tree base, tree step, > > > bool no_overflow = false) > > > { > > > - tree expr = base; > > > struct iv *iv = (struct iv*) obstack_alloc (&data->iv_obstack, > > > sizeof (struct iv)); > > > gcc_assert (step != NULL_TREE); > > > > > > - /* Lower address expression in base except ones with DECL_P as operand. > > > - By doing this: > > > + /* Lower address expression in base by folding if possible while > > > maintaining > > > + the original type of the expression. Any ADDR_EXPR should not be > > > folded as > > > + we will end up folding this into an integer and losing the fact > > > it's an > > > + address. > > > + By doing this folding we gain: > > > 1) More accurate cost can be computed for address expressions; > > > 2) Duplicate candidates won't be created for bases in different > > > forms, like &a[0] and &a. 
*/
> > > +  tree expr = base;
> > >    STRIP_NOPS (expr);
> > > -  if ((TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
> > > -      || contain_complex_addr_expr (expr))
> > > +  if (TREE_CODE (expr) != ADDR_EXPR)
> >
> > So this runs the conversion for some additional cases but not for some
> > cases it ran before.
>
> Yeah, those cases should probably be folded as they were before.  So I
> think this should be kept as an else.
>
> > IIRC the original purpose is to "expand" &a[i] to &a + CST * i and
> > the guard is to avoid churn on trees otherwise.
> >
> > Your goal is to convert '(unsigned)i' to 'i'.
>
> Don't you mean the opposite?  Fold 'i' into '(unsigned)i'?
>
> > But definitely TREE_CODE (expr) != ADDR_EXPR is odd since I don't
> > see the difference between &x and &x + 1.
>
> The issue isn't this, but that folding &a[i] into (unsigned)&a + i leads
> to incorrect code.  The original fold never changed the sign.
>
> > >      {
> > >        aff_tree comb;
> > > -      tree_to_aff_combination (expr, TREE_TYPE (expr), &comb);
> > > -      base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
> > > +      tree addr = NULL_TREE;
> > > +      /* Contrary to the comment above, IVopts does not deal well with folding
> > > +         address the pointers.
> >
> > "folding address the pointers"?
> >
> > >         We avoid folding ADDR_EXPR but when we have a
> > > +         POINTER_PLUS_EXPR we want to keep the base the same but still fold the
> > > +         offset.  */
> >
> > But oddly we did this folding before?  Btw, your patch removes the
> > only use of contain_complex_addr_expr but not the function itself.
>
> No, the original fold preserves the same sign of the expression, which
> is fine.  This now always folds to unsigned.  So I think we want that if
> we can't fold to unsigned, we should still fold using the original sign
> to canonicalize these address expressions?  That should cover both
> cases.
> > So you're talking about the case where 'base' has a pointer-to-integer
> > or integer-to-pointer conversion which STRIP_NOPS strips?
> >
> > > +      if (TREE_CODE (expr) == POINTER_PLUS_EXPR)
> > > +        {
> > > +          addr = TREE_OPERAND (expr, 0);
> > > +          expr = TREE_OPERAND (expr, 1);
> >
> > but expr should be unsigned already, see expr_to_aff_combination
>
> So I definitely see cases where expr is unsigned, but has not been
> folded as unsigned.  I.e. the simplification of the expression as
> unsigned was never performed, only a cast added.  I don't see the other
> conversion to unsigned happening to an affine transform.
>
> I only see calls to fold_convert.
>
> > > +        }
> > > +
> > > +      tree utype = unsigned_type_for (TREE_TYPE (expr));
> > > +      tree_to_aff_combination (expr, utype, &comb);
> >
> > I don't think tree_to_aff_combination is supposed to "convert" on the
> > fly.  There is aff_combination_convert for this.
>
> Hmm, I thought the actual conversion happens in aff_combination_to_tree,
> with tree_to_aff_combination just setting the desired type.
>
> It looks like aff_combination_convert does the same, except it
> round-trips it back to an aff_tree, but I don't need the round-tripping.
> So I can use it, but it seems more expensive...
>
> Thanks for the review,
> Tamar
>
> > > +      expr = aff_combination_to_tree (&comb);
> > > +      if (addr)
> > > +        expr = fold_build_pointer_plus (addr, expr);
> > > +
> > > +      base = fold_convert (TREE_TYPE (base), expr);
> > >      }
> > >
> > >    iv->base = base;
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)