RE: [PATCH]Construct canonical scaled address expression in IVOPT

bin.cheng Thu, 26 Sep 2013 05:10:42 -0700

Sorry for missing the patch.

Thanks.
bin


> -----Original Message-----
> From: [email protected] [mailto:gcc-patches-
> [email protected]] On Behalf Of bin.cheng
> Sent: Thursday, September 26, 2013 8:05 PM
> To: 'Richard Biener'; Bin.Cheng
> Cc: GCC Patches; Richard Earnshaw
> Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT
> 
> 
> 
> > -----Original Message-----
> > From: Richard Biener [mailto:[email protected]]
> > Sent: Tuesday, September 24, 2013 7:58 PM
> > To: Bin.Cheng
> > Cc: Bin Cheng; GCC Patches; Richard Earnshaw
> > Subject: Re: [PATCH]Construct canonical scaled address expression in
> > IVOPT
> >
> > On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng <[email protected]>
> > wrote:
> > > On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
> > > <[email protected]> wrote:
> > >> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <[email protected]>
> wrote:
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>
> > >> Or even [reg*scale] (not sure about that).  But yes, at least
> > >> reg*scale + offset and reg*scale + reg.
> > >>
> > >>>   Apparently it's infeasible to check every possibility for each
> > >>> architecture, is it ok we at least check "index", then "addr" if
> > >>> "index" is failed?  By "any kind of addressing modes", I mean
> > >>> modes supported by function get_address_cost,  i.e., in form of
> > >>> "[base] + [off] + [var] + (reg|reg*scale)".
> > >>
> > >> I suppose so, but it of course depends on what IVOPTs uses the
> > >> answer for in the end.  Appearantly it doesn't distinguish between
> > >> the various cases even though TARGET_MEM_REF does support all the
> > >> variants in question (reg * scale, reg * scale + reg, reg * scale +
> > >> const, reg * scale + reg + const).
> > >>
> > >> So the better answer may be to teach the costs about the differences?
> > > Ideally, IVOPT should be aware whether scaling is allowed in every
> > > kind of addressing modes and account cost of multiplier accordingly.
> > > For current code, there are two scenarios here
> > > 1) If target supports reg*scale+reg, but not reg*scale, in this
> > > case, IVOPT considers multiplier is not allowed in any addressing
> > > mode and account multiplier with high cost.  This is the problem arm
> having.
> > > 2) If target supports reg*scale, but not some kind of addressing
> > > mode (saying reg*scale+reg), in this case, IVOPT still constructs
> > > various scaled addressing mode in get_address_cost and depends on
> > > address_cost to compute correct cost for that addressing expression.
> > > I think this happens to work even IVOPT doesn't know "reg*scale+reg"
> > > is actually not supported.
> > >
> > >>
> > >>>> The above also builds more RTX waste which you can fix by
> > >>>> re-using the
> > >>> PLUS
> > >>>> by building it up-front similar to the multiplication.  You also
> > >>>> miss the
> > >>> Yes, this can be fixed.
> > >>>
> > >>>> opportunity to have scale == 1 denote as to whether reg1 + reg2
> > >>>> is
> valid.
> > >>> I
> > >>>> would expect that many targets support reg1 * scale +
> > >>>> constant-offset but not many reg1 * scale + reg2.
> > >>> I thought scale==1 is unnecessary because the addressing mode
> > >>> degrades into "reg" or "reg+reg".  Moreover, calls of
> > >>> multiplier_allowed_in_address_p in both get_address_cost and
> > get_computation_cost_at have scale other than 1.
> > >>
> > >> Ok.
> > >>
> > >>>>
> > >>>> So no, the helper now checks sth completely different.  What's
> > >>>> the problem with arm supporting reg1 * scale?  Why shouldn't it
> > >>>> being able to handle
> > >>> the
> > >>>> implicit zero offset?
> > >>>
> > >>> As Richard clarified, ARM does not support scaled addressing mode
> > >>> without base register.
> > >>
> > >> I see.
> > >>
> > > Also from the newer comments:
> > >
> > >> Btw, it should be reasonably possible to compute the whole
> > >> multiplier_allowed_in_address_p table for all primary and secondary
> > >> archs (simply build cross-cc1) and compare the results before /
> > >> after a patch candidate.  Querying both reg * scale and reg + reg *
> > >> scale if the first fails sounds like a good solution to me.
> > > I take this as we should do minimal change by checking reg + reg *
> > > scale if reg * scale is failed,  right?
> >
> > Yes, you can share a single RTL expression for all this and I think
> querying reg
> > + reg * scale first makes sense (then fallback to reg * scale for
> compatibility).
> >
> I updated the patch as discussed.  Please review.
> It bootstraps and passes regression test on x86/x86_64, but fails tree-
> ssa/scev-4.c on arm cortex-a15.
> Hi Richard, I investigated the failure and found out it reveals two other
> problems in IVOPT we have discussed.
> For scev-4.c like:
> 
> typedef struct {
>       int x;
>       int y;
> } S;
> int *a_p;
> S a[1000];
> f(int k)
> {
>       int i;
> 
>       for (i=k; i<1000; i+=k) {
>               a_p = &a[i].y;
>               *a_p = 100;
>         }
> }
> 
> The iv candidates and uses are like:
> 
> use 0
>   generic
>   in statement a_p.0_5 = &a[k_11].y;
> 
>   at position
>   type int *
>   base (int *) &a + ((sizetype) k_3(D) * 8 + 4)
>   step (sizetype) k_3(D) * 8
>   base object (void *) &a
>   related candidates
> use 1
>   address
>   in statement MEM[(int *)&a][k_11].y = 100;
> 
>   at position MEM[(int *)&a][k_11].y
>   type int *
>   base &MEM[(int *)&a][k_3(D)].y
>   step (sizetype) k_3(D) * 8
>   base object (void *) &a
>   related candidates
> 
> candidate 4 (important)
>   depends on 1
>   original biv
>   type int
>   base k_3(D)
>   step k_3(D)
> 
> candidate 7
>   depends on 1
>   var_before ivtmp.12
>   var_after ivtmp.12
>   incremented before exit test
>   type unsigned int
>   base (unsigned int) ((int *) &a + (sizetype) k_3(D) * 8)
>   step (unsigned int) ((sizetype) k_3(D) * 8)
>   base object (void *) &a
> Candidate 7 is related to use 0
> 
> Problem 1) When computing cost of use 1 using cand 4, the call of
> strip_offset (&MEM[(int *)&a][k_3(D)].y, &off1) computes off1 == 0, which
> is actually 4.
> This can be fixed my previous patch which still in re-working.
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01749.html
> 
> Problem 2) It allocates iv with not simplified base like &MEM[(int
> *)&a][k_3(D)].y,  while cost computation functions like
ptr_difference_const
> can't handle such complex address properly (also force_expr_to_var_cost
> etc.).  In this case, it can't understand that "((int *) &a + (sizetype)
> k_3(D) * 8)" and "&MEM[(int *)&a][k_3(D)]" are actually equal.
> I think it would substantially simplify computation of cost in IVOPT if we
can
> allocate iv with simplified base expression.  I have another patch for
this and
> will send it for review.
> 
> BTW, the failure can be fixed by the offset fixation patch.
> 
> So is it OK to applied this updated patch?
> 
> Thanks.
> bin
> 
> 
>

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c  (revision 202599)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -3134,16 +3134,26 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT rat
     {
       enum machine_mode address_mode = targetm.addr_space.address_mode (as);
       rtx reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
-      rtx addr;
+      rtx addr, index;
       HOST_WIDE_INT i;
 
       valid_mult = sbitmap_alloc (2 * MAX_RATIO + 1);
       bitmap_clear (valid_mult);
-      addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
+      /* Construct address expressions like "index*scale+base" and
+        "index*scale", then call memory_address_addr_space_p to see
+        whether multiplier is allowed by target processors.  We query
+        "index*scan+base" first and fallback to "index*scale".  */
+      index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
+      addr = gen_rtx_fmt_ee (PLUS, address_mode, NULL_RTX, reg1);
       for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
        {
-         XEXP (addr, 1) = gen_int_mode (i, address_mode);
-         if (memory_address_addr_space_p (mode, addr, as))
+         if (i == 1)
+           continue;
+
+         XEXP (index, 1) = gen_int_mode (i, address_mode);
+         XEXP (addr, 0) = index;
+         if (memory_address_addr_space_p (mode, addr, as)
+             || memory_address_addr_space_p (mode, index, as))
            bitmap_set_bit (valid_mult, i + MAX_RATIO);
        }

RE: [PATCH]Construct canonical scaled address expression in IVOPT

Reply via email to