On Wed, 27 Jan 2021, Jakub Jelinek wrote:

> On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote:
> > The following avoids repeatedly turning VALUE RTXen into
> > something useful and re-applying a constant offset through get_addr
> > via DSE check_mem_read_rtx.  Instead perform this once for
> > all stores to be visited in check_mem_read_rtx.  This avoids
> > allocating 1.6GB of garbage PLUS RTXen on the PR80960
> > testcase, fixing the memory usage regression from old GCC.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> > 
> > Thanks,
> > Richard.
> > 
> > 2021-01-27  Richard Biener  <rguent...@suse.de>
> > 
> >     PR rtl-optimization/80960
> >     * dse.c (check_mem_read_rtx): Call get_addr on the
> >     offsetted address.
> > ---
> >  gcc/dse.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/gcc/dse.c b/gcc/dse.c
> > index c88587e7d94..da0df54a2dd 100644
> > --- a/gcc/dse.c
> > +++ b/gcc/dse.c
> > @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
> >      }
> >    if (maybe_ne (offset, 0))
> >      mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
> > +  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
> > +     which will over and over re-create proper RTL and re-apply the
> > +     offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
> > +     RTXen that way.  */
> > +  mem_addr = get_addr (mem_addr);
> >  
> >    if (group_id >= 0)
> >      {
> 
> Does that result in any change in how much DSE optimizes?
> I mean, if you do 2 bootstraps/regtests, one with this patch and another one
> without it, and at the end of rest_of_handle_dse dump
> locally_deleted, globally_deleted
> for each CU/function, do you get the same counts except perhaps for dse.c?

So I see no difference for stage2-gcc/*.o dse1/dse2 with/without the
patch but counts are _extremely_ small.  Statistics:

  70148 dse: local deletions = 0, global deletions = 0
     32 dse: local deletions = 0, global deletions = 1
      9 dse: local deletions = 0, global deletions = 2
      7 dse: local deletions = 0, global deletions = 3
      2 dse: local deletions = 0, global deletions = 4
      2 dse: local deletions = 0, global deletions = 5
      3 dse: local deletions = 0, global deletions = 7
     67 dse: local deletions = 1, global deletions = 0
      1 dse: local deletions = 1, global deletions = 2
     12 dse: local deletions = 2, global deletions = 0
      1 dse: local deletions = 24, global deletions = 1
      2 dse: local deletions = 3, global deletions = 0
      4 dse: local deletions = 4, global deletions = 0
      4 dse: local deletions = 6, global deletions = 0
      1 dse: local deletions = 7, global deletions = 0
      1 dse: local deletions = 8, global deletions = 0

so not sure how much confidence this brings over the analytical
reasoning that it shouldn't make a difference ...
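
As a rough illustration of that analytical reasoning, here is a toy model
in Python. It is not GCC code: resolve, conflicts and the tuple encoding of
addresses are all hypothetical stand-ins, and the model simply assumes that
resolving an address is deterministic and allocates one fresh expression
per call on an unresolved input. Under that assumption, resolving once
before the loop over stores gives the same answers with far fewer
allocations.

```python
# Toy model only: none of these names exist in GCC.
allocations = 0


def resolve(addr):
    """Hypothetical stand-in for an address rewrite that allocates a
    fresh expression each time it is given an unresolved address."""
    global allocations
    if addr[0] != "value":
        return addr  # already resolved, nothing to allocate
    allocations += 1
    _, base, offset = addr
    return ("plus", base, offset)


def conflicts(read_addr, store_addr):
    """Hypothetical per-store dependence check that resolves its input."""
    return resolve(read_addr) == store_addr


def check_read_per_store(read_addr, stores):
    # The unresolved address is handed to every check, so it is
    # resolved again for each store.
    return [conflicts(read_addr, s) for s in stores]


def check_read_hoisted(read_addr, stores):
    # Resolve once up front; the per-store checks then see an
    # already-resolved address and allocate nothing.
    resolved = resolve(read_addr)
    return [conflicts(resolved, s) for s in stores]


stores = [("plus", "sp", 8 * i) for i in range(1000)]
read = ("value", "sp", 16)

allocations = 0
before = check_read_per_store(read, stores)
allocs_before = allocations

allocations = 0
after = check_read_hoisted(read, stores)
allocs_after = allocations

print(before == after, allocs_before, allocs_after)  # True 1000 1
```

The answers match because the hoisted call does exactly the work the
per-store calls would have repeated; whether the real pass behaves this
way is what the statistics above try to confirm.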

stats on just dse2 are even more depressing (given its cost)

  35123 dse: local deletions = 0, global deletions = 0
      2 dse: local deletions = 0, global deletions = 1
     20 dse: local deletions = 1, global deletions = 0
      1 dse: local deletions = 2, global deletions = 0
      1 dse: local deletions = 3, global deletions = 0
      1 dse: local deletions = 4, global deletions = 0
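
For anyone wanting to repeat the comparison, a tally like the ones above
could be produced with a small script along these lines. This is only a
sketch: the line format is copied from the listings above, but where those
lines end up (dump file, stderr, build log) is an assumption, and tally
and format_tally are made-up helper names.

```python
import re
from collections import Counter

# Matches the per-function lines shown in the listings above.
LINE_RE = re.compile(
    r"dse: local deletions = (\d+), global deletions = (\d+)")


def tally(lines):
    """Count how often each (local, global) deletion pair occurs."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


def format_tally(counts):
    """Render the tally in the same shape as the listings, sorted
    numerically by (local, global)."""
    return [
        f"{n:7d} dse: local deletions = {loc}, global deletions = {glob}"
        for (loc, glob), n in sorted(counts.items())
    ]


# Hypothetical sample input; real input would be the collected log lines.
sample = [
    "dse: local deletions = 0, global deletions = 0",
    "dse: local deletions = 1, global deletions = 0",
    "dse: local deletions = 0, global deletions = 0",
    "unrelated line",
    "dse: local deletions = 24, global deletions = 1",
]

for row in format_tally(tally(sample)):
    print(row)
```

Running it once on the logs from each build and diffing the two outputs
gives the with/without comparison. Note the numeric sort places the
"local deletions = 24" row last, unlike the textual ordering in the
listing above.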

Richard.
