On Thu, 2 Apr 2020, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, if !ACCUMULATE_OUTGOING_ARGS, large functions
> can have hundreds of thousands of stack pointer adjustments, and cselib
> creates a new VALUE after each sp adjustment.  These form extremely deep
> VALUE chains, which is very harmful e.g. for find_base_term.
> E.g. if we have
> sp -= 4
> sp -= 4
> sp += 4
> sp += 4
> sp -= 4
> sp += 4
> that means 7 VALUEs: one for sp at the beginning (val1), then val2 = val1 -
> 4, then val3 = val2 - 4, then val4 = val3 + 4, then val5 = val4 + 4, then
> val6 = val5 - 4, then val7 = val6 + 4.
> This patch tweaks cselib, so that it is smarter about sp adjustments.
> When cselib_lookup (stack_pointer_rtx, Pmode, 1, VOIDmode) and we know
> nothing about sp yet (this happens at the start of the function, for
> non-var-tracking also after cselib_reset_table and for var-tracking after
> processing fp_setter insn where we forget about former sp values because
> that is now hfp related while everything after it is sp related), we
> look it up normally, but in addition to what we have been doing before
> we mark the VALUE as SP_DERIVED_VALUE_P.  Further lookups of sp + offset
> are then special-cased, so that they are canonicalized to that
> SP_DERIVED_VALUE_P VALUE + CONST_INT (if possible).  So, for the above,
> we get val1 with SP_DERIVED_VALUE_P set, then val2 = val1 - 4, val3 = val1 -
> 8 (note, no longer val2 - 4!), then we get val2 again, val1 again, val2
> again, val1 again.
> In the statistics of find_base_term calls with visited_vals.length () > 100,
> gathered during a combined x86_64-linux and i686-linux bootstrap+regtest
> cycle, without the patch I see:
>                       find_base_term > 100
>                       returning NULL  returning non-NULL
> 32-bit compilations   4229178         407
> 64-bit compilations   217523          0
> with largest visited_vals.length () when returning non-NULL being 206.
> With the patch the same numbers are:
> 32-bit compilations   1249588         135
> 64-bit compilations   3510            0
> with largest visited_vals.length () when returning non-NULL being 173.
> This shows significant reduction of the deep VALUE chains.
> On powerpc64{,le}-linux, these stats didn't change at all; we have
>                       1008            0
> for all of -m32, -m64 and little-endian -m64, coming just from the
> gcc.dg/pr85180.c and gcc.dg/pr87985.c testcases, which are unrelated to sp.
> 
> My earlier version of the patch, which contained just the rtl.h and cselib.c
> changes, regressed some tests:
> gcc.dg/guality/{pr36728-{1,3},pr68860-{1,2}}.c
> gcc.target/i386/{pr88416,sse-{13,23,24,25,26}}.c
> The problem with the former tests was worse debug info: with -m32,
> where arg7 was passed in a stack slot, we thought a later push might have
> invalidated it when it couldn't.  This is something I've solved with the
> var-tracking.c (vt_initialize) changes.  In those problematic functions, we
> create a cfa_base VALUE (argp) and want to record that at the start of
> the function the argp VALUE is sp + off and also record that current sp
> VALUE is argp's VALUE - off.  The second permanent equivalence didn't make
> it after the patch though, because cselib_add_permanent_equiv will
> cselib_lookup the value of the expression it wants to add as the equivalence
> and if it is the same VALUE as we are calling it on, it doesn't do anything;
> and due to the cselib changes for sp based accesses that is exactly what
> happened.  By reversing the order of the cselib_add_permanent_equiv calls,
> though, we get both equivalences and thus are able to canonicalize the sp
> based accesses in var-tracking to the cfa_base value + offset.

I think this warrants a comment in the code.

> The i386 FAILs were all ICEs, where we had a pushf instruction pushing flags
> and then a pop into a pseudo reading that value again.  With the cselib
> changes, cselib during RTL DSE is able to see through the sp adjustment and
> wanted to replace_read what the pushf had stored, by moving the flags
> register into a pseudo and replacing the memory read in the pop with that
> pseudo.  That is
> wrong for two reasons: one is that the backend doesn't have an instruction
> to move the flags hard register into some other register, but replace_read
> has been validating just the mem -> pseudo replacement and not the insns
> emitted by copy_to_mode_reg.  And the second issue is that it is obviously
> wrong to replace a stack pop which contains stack post-increment by a copy
> of pseudo into destination.  dse.c has some code to handle RTX_AUTOINC, but
> only uses it when actually removing stores and only when there is REG_INC
> note (stack RTX_AUTOINC does not have those), in check_for_inc_dec* where
> it emits the reg adjustment(s) before the insn that is going to be deleted.
> replace_read doesn't remove the insn, so if it e.g. contained REG_INC note,
> it would be kept there and we might have the RTX_AUTOINC not just in *loc,
> but other spots.
> So, the dse.c changes try to validate the added insns and punt on all
> RTX_AUTOINC in *loc.  Furthermore, it seems that with the cselib.c changes
> on the gfortran.dg/pr87360.f90 and gcc.target/i386/pr88416.c testcases
> check_for_inc_dec{,_1} happily throws stack pointer autoinc on the floor,
> which is also wrong.  While we could perhaps do the for_each_inc_dec
> call regardless of whether we have REG_INC note or not, we aren't prepared
> to handle e.g. REG_ARGS_SIZE distribution and thus could end up with wrong
> unwind info or ICEs during dwarf2cfi.c.  So the patch also punts on those,
> after all, if we'd in theory managed to try to optimize such pushes before,
> we'd create wrong-code.
> 
> On x86_64-linux and i686-linux, the patch has some minor debug info coverage
> differences, but it doesn't appear very significant to me.
> The https://github.com/pmachata/dwlocstat tool gives the following, where
> "before" is vanilla trunk plus just the rtl.h hunk (without the
> {cselib,var-tracking,dse}.c hunks), bootstrapped with
> --enable-checking=yes,rtl,extra, after which the {cselib,var-tracking,dse}.c
> hunks were applied and cc1plus rebuilt with make; "after" is trunk with the
> whole patch applied.
> 
> 64-bit cc1plus
> before
> cov%  samples cumul
> 0..10 1232756/48%     1232756/48%
> 11..20        31089/1%        1263845/49%
> 21..30        39172/1%        1303017/51%
> 31..40        38853/1%        1341870/52%
> 41..50        47473/1%        1389343/54%
> 51..60        45171/1%        1434514/56%
> 61..70        69393/2%        1503907/59%
> 71..80        61988/2%        1565895/61%
> 81..90        104528/4%       1670423/65%
> 91..100       875402/34%      2545825/100%
> after
> cov%  samples cumul
> 0..10 1233238/48%     1233238/48%
> 11..20        31086/1%        1264324/49%
> 21..30        39157/1%        1303481/51%
> 31..40        38819/1%        1342300/52%
> 41..50        47447/1%        1389747/54%
> 51..60        45151/1%        1434898/56%
> 61..70        69379/2%        1504277/59%
> 71..80        61946/2%        1566223/61%
> 81..90        104508/4%       1670731/65%
> 91..100       875094/34%      2545825/100%
> 
> 32-bit cc1plus
> before
> cov%  samples cumul
> 0..10 1231221/48%     1231221/48%
> 11..20        30992/1%        1262213/49%
> 21..30        36422/1%        1298635/51%
> 31..40        35793/1%        1334428/52%
> 41..50        47102/1%        1381530/54%
> 51..60        41201/1%        1422731/56%
> 61..70        65467/2%        1488198/58%
> 71..80        59560/2%        1547758/61%
> 81..90        104076/4%       1651834/65%
> 91..100       881879/34%      2533713/100%
> after
> cov%  samples cumul
> 0..10 1230469/48%     1230469/48%
> 11..20        30390/1%        1260859/49%
> 21..30        36362/1%        1297221/51%
> 31..40        36042/1%        1333263/52%
> 41..50        47619/1%        1380882/54%
> 51..60        41674/1%        1422556/56%
> 61..70        65849/2%        1488405/58%
> 71..80        59857/2%        1548262/61%
> 81..90        104178/4%       1652440/65%
> 91..100       881273/34%      2533713/100%
> 
> Bootstrapped/regtested on {x86_64,i686,powerpc64{,-le}}-linux, ok for trunk?

OK.

> For the PR in question, my proposal would be to also lower
> -param=max-find-base-term-values=
> default from 2000 to 200 after this, at least in the above 4
> bootstraps/regtests there is nothing that would ever result in
> find_base_term returning non-NULL with more than 200 VALUEs being processed.

Sounds good to me.

Thanks,
Richard.

> 2020-04-02  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR rtl-optimization/92264
>       * rtl.h (struct rtx_def): Mention that call bit is used as
>       SP_DERIVED_VALUE_P in cselib.c.
>       * cselib.c (SP_DERIVED_VALUE_P): Define.
>       (PRESERVED_VALUE_P, SP_BASED_VALUE_P): Move definitions earlier.
>       (cselib_hasher::equal): Handle equality between SP_DERIVED_VALUE_P
>       val_rtx and sp based expression where offsets cancel each other.
>       (preserve_constants_and_equivs): Formatting fix.
>       (cselib_reset_table): Add reverse op loc to SP_DERIVED_VALUE_P
>       locs list for cfa_base_preserved_val if needed.  Formatting fix.
>       (autoinc_split): If the to be returned value is a REG, MEM or
>       VALUE which has SP_DERIVED_VALUE_P + CONST_INT as one of its
>       locs, return the SP_DERIVED_VALUE_P VALUE and adjust *off.
>       (rtx_equal_for_cselib_1): Call autoinc_split even if both
>       expressions are PLUS in Pmode with CONST_INT second operands.
>       Handle SP_DERIVED_VALUE_P cases.
>       (cselib_hash_plus_const_int): New function.
>       (cselib_hash_rtx): Use it for PLUS in Pmode with CONST_INT
>       second operand, as well as for PRE_DEC etc. that ought to be
>       hashed the same way.
>       (cselib_subst_to_values): Substitute PLUS with Pmode and
>       CONST_INT operand if the first operand is a VALUE which has
>       SP_DERIVED_VALUE_P + CONST_INT as one of its locs for the
>       SP_DERIVED_VALUE_P + adjusted offset.
>       (cselib_lookup_1): When creating a new VALUE for stack_pointer_rtx,
>       set SP_DERIVED_VALUE_P on it.  Set PRESERVED_VALUE_P when adding
>       SP_DERIVED_VALUE_P PRESERVED_VALUE_P subseted VALUE location.
>       * var-tracking.c (vt_initialize): Call cselib_add_permanent_equiv
>       on the sp value before calling cselib_add_permanent_equiv on the
>       cfa_base value.
>       * dse.c (check_for_inc_dec_1, check_for_inc_dec): Punt on RTX_AUTOINC
>       in the insn without REG_INC note.
>       (replace_read): Punt on RTX_AUTOINC in the *loc being replaced.
>       Punt on invalid insns added by copy_to_mode_reg.  Formatting fixes.
> 
> --- gcc/rtl.h.jj      2020-04-01 09:43:45.346628694 +0200
> +++ gcc/rtl.h 2020-04-01 17:46:15.160396309 +0200
> @@ -331,6 +331,7 @@ struct GTY((desc("0"), tag("0"),
>       1 in a MEM if it cannot trap.
>       1 in a CALL_INSN logically equivalent to
>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
> +     1 in a VALUE is SP_DERIVED_VALUE_P in cselib.c.
>       Dumped as "/c" in RTL dumps.  */
>    unsigned int call : 1;
>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
> --- gcc/cselib.c.jj   2020-04-01 09:43:45.305629293 +0200
> +++ gcc/cselib.c      2020-04-01 17:46:15.161396295 +0200
> @@ -58,6 +58,16 @@ static void cselib_invalidate_regno (uns
>  static void cselib_invalidate_mem (rtx);
>  static void cselib_record_set (rtx, cselib_val *, cselib_val *);
>  static void cselib_record_sets (rtx_insn *);
> +static rtx autoinc_split (rtx, rtx *, machine_mode);
> +
> +#define PRESERVED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
> +
> +#define SP_BASED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
> +
> +#define SP_DERIVED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("SP_DERIVED_VALUE_P", (RTX), VALUE)->call)
>  
>  struct expand_value_data
>  {
> @@ -122,6 +132,13 @@ cselib_hasher::equal (const cselib_val *
>    if (GET_CODE (x) == VALUE)
>      return x == v->val_rtx;
>  
> +  if (SP_DERIVED_VALUE_P (v->val_rtx) && GET_MODE (x) == Pmode)
> +    {
> +      rtx xoff = NULL;
> +      if (autoinc_split (x, &xoff, memmode) == v->val_rtx && xoff == NULL_RTX)
> +     return true;
> +    }
> +
>    /* We don't guarantee that distinct rtx's have different hash values,
>       so we need to do a comparison.  */
>    for (l = v->locs; l; l = l->next)
> @@ -256,12 +273,6 @@ void (*cselib_discard_hook) (cselib_val
>  void (*cselib_record_sets_hook) (rtx_insn *insn, struct cselib_set *sets,
>                                int n_sets);
>  
> -#define PRESERVED_VALUE_P(RTX) \
> -  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
> -
> -#define SP_BASED_VALUE_P(RTX) \
> -  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
> -
>  
>  
>  /* Allocate a struct elt_list and fill in its two elements with the
> @@ -494,7 +505,7 @@ preserve_constants_and_equivs (cselib_va
>        };
>        cselib_val **slot
>       = cselib_preserved_hash_table->find_slot_with_hash (&lookup,
> -                                                        v->hash, INSERT);
> +                                                         v->hash, INSERT);
>        gcc_assert (!*slot);
>        *slot = v;
>      }
> @@ -532,6 +543,28 @@ cselib_reset_table (unsigned int num)
>        max_value_regs
>       = hard_regno_nregs (regno,
>                           GET_MODE (cfa_base_preserved_val->locs->loc));
> +
> +      /* If cfa_base is sp + const_int, need to preserve also the
> +      SP_DERIVED_VALUE_P value.  */
> +      for (struct elt_loc_list *l = cfa_base_preserved_val->locs;
> +        l; l = l->next)
> +     if (GET_CODE (l->loc) == PLUS
> +         && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +         && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +         && CONST_INT_P (XEXP (l->loc, 1)))
> +       {
> +         if (! invariant_or_equiv_p (CSELIB_VAL_PTR (XEXP (l->loc, 0))))
> +           {
> +             rtx val = cfa_base_preserved_val->val_rtx;
> +             rtx_insn *save_cselib_current_insn = cselib_current_insn;
> +             cselib_current_insn = l->setting_insn;
> +             new_elt_loc_list (CSELIB_VAL_PTR (XEXP (l->loc, 0)),
> +                               plus_constant (Pmode, val,
> +                                              -UINTVAL (XEXP (l->loc, 1))));
> +             cselib_current_insn = save_cselib_current_insn;
> +           }
> +         break;
> +       }
>      }
>    else
>      {
> @@ -541,8 +574,7 @@ cselib_reset_table (unsigned int num)
>      }
>  
>    if (cselib_preserve_constants)
> -    cselib_hash_table->traverse <void *, preserve_constants_and_equivs>
> -      (NULL);
> +    cselib_hash_table->traverse <void *, preserve_constants_and_equivs> (NULL);
>    else
>      {
>        cselib_hash_table->empty ();
> @@ -799,33 +831,66 @@ autoinc_split (rtx x, rtx *off, machine_
>      {
>      case PLUS:
>        *off = XEXP (x, 1);
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      case PRE_DEC:
>        if (memmode == VOIDmode)
>       return x;
>  
>        *off = gen_int_mode (-GET_MODE_SIZE (memmode), GET_MODE (x));
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      case PRE_INC:
>        if (memmode == VOIDmode)
>       return x;
>  
>        *off = gen_int_mode (GET_MODE_SIZE (memmode), GET_MODE (x));
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      case PRE_MODIFY:
> -      return XEXP (x, 1);
> +      x = XEXP (x, 1);
> +      break;
>  
>      case POST_DEC:
>      case POST_INC:
>      case POST_MODIFY:
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      default:
> -      return x;
> +      break;
>      }
> +
> +  if (GET_MODE (x) == Pmode
> +      && (REG_P (x) || MEM_P (x) || GET_CODE (x) == VALUE)
> +      && (*off == NULL_RTX || CONST_INT_P (*off)))
> +    {
> +      cselib_val *e;
> +      if (GET_CODE (x) == VALUE)
> +     e = CSELIB_VAL_PTR (x);
> +      else
> +     e = cselib_lookup (x, GET_MODE (x), 0, memmode);
> +      if (e)
> +     for (struct elt_loc_list *l = e->locs; l; l = l->next)
> +       if (GET_CODE (l->loc) == PLUS
> +           && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +           && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +           && CONST_INT_P (XEXP (l->loc, 1)))
> +         {
> +           if (*off == NULL_RTX)
> +             *off = XEXP (l->loc, 1);
> +           else
> +             *off = plus_constant (Pmode, *off,
> +                                   INTVAL (XEXP (l->loc, 1)));
> +           if (*off == const0_rtx)
> +             *off = NULL_RTX;
> +           return XEXP (l->loc, 0);
> +         }
> +    }
> +  return x;
>  }
>  
>  /* Return nonzero if we can prove that X and Y contain the same value,
> @@ -868,6 +933,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        if (GET_CODE (y) == VALUE)
>       return e == canonical_cselib_val (CSELIB_VAL_PTR (y));
>  
> +      if ((SP_DERIVED_VALUE_P (x)
> +        || SP_DERIVED_VALUE_P (e->val_rtx))
> +       && GET_MODE (y) == Pmode)
> +     {
> +       rtx yoff = NULL;
> +       rtx yr = autoinc_split (y, &yoff, memmode);
> +       if ((yr == x || yr == e->val_rtx) && yoff == NULL_RTX)
> +         return 1;
> +     }
> +
>        if (depth == 128)
>       return 0;
>  
> @@ -891,6 +966,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        cselib_val *e = canonical_cselib_val (CSELIB_VAL_PTR (y));
>        struct elt_loc_list *l;
>  
> +      if ((SP_DERIVED_VALUE_P (y)
> +        || SP_DERIVED_VALUE_P (e->val_rtx))
> +       && GET_MODE (x) == Pmode)
> +     {
> +       rtx xoff = NULL;
> +       rtx xr = autoinc_split (x, &xoff, memmode);
> +       if ((xr == y || xr == e->val_rtx) && xoff == NULL_RTX)
> +         return 1;
> +     }
> +
>        if (depth == 128)
>       return 0;
>  
> @@ -910,7 +995,11 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>    if (GET_MODE (x) != GET_MODE (y))
>      return 0;
>  
> -  if (GET_CODE (x) != GET_CODE (y))
> +  if (GET_CODE (x) != GET_CODE (y)
> +      || (GET_CODE (x) == PLUS
> +       && GET_MODE (x) == Pmode
> +       && CONST_INT_P (XEXP (x, 1))
> +       && CONST_INT_P (XEXP (y, 1))))
>      {
>        rtx xorig = x, yorig = y;
>        rtx xoff = NULL, yoff = NULL;
> @@ -918,17 +1007,20 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        x = autoinc_split (x, &xoff, memmode);
>        y = autoinc_split (y, &yoff, memmode);
>  
> -      if (!xoff != !yoff)
> -     return 0;
> -
> -      if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
> -     return 0;
> -
>        /* Don't recurse if nothing changed.  */
>        if (x != xorig || y != yorig)
> -     return rtx_equal_for_cselib_1 (x, y, memmode, depth);
> +     {
> +       if (!xoff != !yoff)
> +         return 0;
>  
> -      return 0;
> +       if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
> +         return 0;
> +
> +       return rtx_equal_for_cselib_1 (x, y, memmode, depth);
> +     }
> +
> +      if (GET_CODE (xorig) != GET_CODE (yorig))
> +     return 0;
>      }
>  
>    /* These won't be handled correctly by the code below.  */
> @@ -1042,6 +1134,41 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>    return 1;
>  }
>  
> +/* Helper function for cselib_hash_rtx.  Arguments like for cselib_hash_rtx,
> +   except that it hashes (plus:P x c).  */
> +
> +static unsigned int
> +cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int create,
> +                         machine_mode memmode)
> +{
> +  cselib_val *e = cselib_lookup (x, GET_MODE (x), create, memmode);
> +  if (! e)
> +    return 0;
> +
> +  if (! SP_DERIVED_VALUE_P (e->val_rtx))
> +    for (struct elt_loc_list *l = e->locs; l; l = l->next)
> +      if (GET_CODE (l->loc) == PLUS
> +       && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +       && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +       && CONST_INT_P (XEXP (l->loc, 1)))
> +     {
> +       e = CSELIB_VAL_PTR (XEXP (l->loc, 0));
> +       c = trunc_int_for_mode (c + UINTVAL (XEXP (l->loc, 1)), Pmode);
> +       break;
> +     }
> +  if (c == 0)
> +    return e->hash;
> +
> +  unsigned hash = (unsigned) PLUS + (unsigned) GET_MODE (x);
> +  hash += e->hash;
> +  unsigned int tem_hash = (unsigned) CONST_INT + (unsigned) VOIDmode;
> +  tem_hash += ((unsigned) CONST_INT << 7) + (unsigned HOST_WIDE_INT) c;
> +  if (tem_hash == 0)
> +    tem_hash = (unsigned int) CONST_INT;
> +  hash += tem_hash;
> +  return hash ? hash : 1 + (unsigned int) PLUS;
> +}
> +
>  /* Hash an rtx.  Return 0 if we couldn't hash the rtx.
>     For registers and memory locations, we look up their cselib_val structure
>     and return its VALUE element.
> @@ -1209,10 +1336,21 @@ cselib_hash_rtx (rtx x, int create, mach
>       offset = -offset;
>        /* Adjust the hash so that (mem:MEMMODE (pre_* (reg))) hashes
>        like (mem:MEMMODE (plus (reg) (const_int I))).  */
> -      hash += (unsigned) PLUS - (unsigned)code
> -     + cselib_hash_rtx (XEXP (x, 0), create, memmode)
> -     + cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
> -                        create, memmode);
> +      if (GET_MODE (x) == Pmode
> +       && (REG_P (XEXP (x, 0))
> +           || MEM_P (XEXP (x, 0))
> +           || GET_CODE (XEXP (x, 0)) == VALUE))
> +     {
> +       HOST_WIDE_INT c;
> +       if (offset.is_constant (&c))
> +         return cselib_hash_plus_const_int (XEXP (x, 0),
> +                                            trunc_int_for_mode (c, Pmode),
> +                                            create, memmode);
> +     }
> +      hash = ((unsigned) PLUS + (unsigned) GET_MODE (x)
> +           + cselib_hash_rtx (XEXP (x, 0), create, memmode)
> +           + cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
> +                              create, memmode));
>        return hash ? hash : 1 + (unsigned) PLUS;
>  
>      case PRE_MODIFY:
> @@ -1237,6 +1375,16 @@ cselib_hash_rtx (rtx x, int create, mach
>  
>        break;
>  
> +    case PLUS:
> +      if (GET_MODE (x) == Pmode
> +       && (REG_P (XEXP (x, 0))
> +           || MEM_P (XEXP (x, 0))
> +           || GET_CODE (XEXP (x, 0)) == VALUE)
> +       && CONST_INT_P (XEXP (x, 1)))
> +     return cselib_hash_plus_const_int (XEXP (x, 0), INTVAL (XEXP (x, 1)),
> +                                        create, memmode);
> +      break;
> +
>      default:
>        break;
>      }
> @@ -1927,6 +2075,26 @@ cselib_subst_to_values (rtx x, machine_m
>        gcc_assert (memmode != VOIDmode);
>        return cselib_subst_to_values (XEXP (x, 0), memmode);
>  
> +    case PLUS:
> +      if (GET_MODE (x) == Pmode && CONST_INT_P (XEXP (x, 1)))
> +     {
> +       rtx t = cselib_subst_to_values (XEXP (x, 0), memmode);
> +       if (GET_CODE (t) == VALUE)
> +         for (struct elt_loc_list *l = CSELIB_VAL_PTR (t)->locs;
> +              l; l = l->next)
> +           if (GET_CODE (l->loc) == PLUS
> +               && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +               && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +               && CONST_INT_P (XEXP (l->loc, 1)))
> +             return plus_constant (Pmode, l->loc, INTVAL (XEXP (x, 1)));
> +       if (t != XEXP (x, 0))
> +         {
> +           copy = shallow_copy_rtx (x);
> +           XEXP (copy, 0) = t;
> +         }
> +       return copy;
> +     }
> +
>      default:
>        break;
>      }
> @@ -2030,6 +2198,8 @@ cselib_lookup_1 (rtx x, machine_mode mod
>       }
>  
>        e = new_cselib_val (next_uid, GET_MODE (x), x);
> +      if (GET_MODE (x) == Pmode && x == stack_pointer_rtx)
> +     SP_DERIVED_VALUE_P (e->val_rtx) = 1;
>        new_elt_loc_list (e, x);
>  
>        scalar_int_mode int_mode;
> @@ -2107,7 +2277,14 @@ cselib_lookup_1 (rtx x, machine_mode mod
>       the hash table is inconsistent until we do so, and
>       cselib_subst_to_values will need to do lookups.  */
>    *slot = e;
> -  new_elt_loc_list (e, cselib_subst_to_values (x, memmode));
> +  rtx v = cselib_subst_to_values (x, memmode);
> +
> +  /* If cselib_preserve_constants, we might get a SP_DERIVED_VALUE_P
> +     VALUE that isn't in the hash tables anymore.  */
> +  if (GET_CODE (v) == VALUE && SP_DERIVED_VALUE_P (v) && PRESERVED_VALUE_P (v))
> +    PRESERVED_VALUE_P (e->val_rtx) = 1;
> +
> +  new_elt_loc_list (e, v);
>    return e;
>  }
>  
> --- gcc/var-tracking.c.jj     2020-03-26 09:15:29.825500110 +0100
> +++ gcc/var-tracking.c        2020-04-01 19:09:30.145421270 +0200
> @@ -10045,19 +10045,20 @@ vt_initialize (void)
>        preserve_value (val);
>        if (reg != hard_frame_pointer_rtx && fixed_regs[REGNO (reg)])
>       cselib_preserve_cfa_base_value (val, REGNO (reg));
> -      expr = plus_constant (GET_MODE (stack_pointer_rtx),
> -                         stack_pointer_rtx, -ofst);
> -      cselib_add_permanent_equiv (val, expr, get_insns ());
> -
>        if (ofst)
>       {
> -       val = cselib_lookup_from_insn (stack_pointer_rtx,
> -                                      GET_MODE (stack_pointer_rtx), 1,
> -                                      VOIDmode, get_insns ());
> -       preserve_value (val);
> +       cselib_val *valsp
> +         = cselib_lookup_from_insn (stack_pointer_rtx,
> +                                    GET_MODE (stack_pointer_rtx), 1,
> +                                    VOIDmode, get_insns ());
> +       preserve_value (valsp);
>         expr = plus_constant (GET_MODE (reg), reg, ofst);
> -       cselib_add_permanent_equiv (val, expr, get_insns ());
> +       cselib_add_permanent_equiv (valsp, expr, get_insns ());
>       }
> +
> +      expr = plus_constant (GET_MODE (stack_pointer_rtx),
> +                         stack_pointer_rtx, -ofst);
> +      cselib_add_permanent_equiv (val, expr, get_insns ());
>      }
>  
>    /* In order to factor out the adjustments made to the stack pointer or to
> --- gcc/dse.c.jj      2020-03-03 11:04:46.365821937 +0100
> +++ gcc/dse.c 2020-04-02 10:48:08.042260842 +0200
> @@ -846,6 +846,17 @@ check_for_inc_dec_1 (insn_info_t insn_in
>    if (note)
>      return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
>                            insn_info) == 0;
> +
> +  /* Punt on stack pushes, those don't have REG_INC notes and we are
> +     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +     return false;
> +    }
> +
>    return true;
>  }
>  
> @@ -866,6 +877,17 @@ check_for_inc_dec (rtx_insn *insn)
>    if (note)
>      return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
>                            &insn_info) == 0;
> +
> +  /* Punt on stack pushes, those don't have REG_INC notes and we are
> +     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +     return false;
> +    }
> +
>    return true;
>  }
>  
> @@ -1980,16 +2002,30 @@ replace_read (store_info *store_info, in
>        point.  This does occasionally happen, see PR 37922.  */
>        bitmap regs_set = BITMAP_ALLOC (&reg_obstack);
>  
> -      for (this_insn = insns; this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
> -     note_stores (this_insn, look_for_hardregs, regs_set);
> +      for (this_insn = insns;
> +        this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
> +     {
> +       if (insn_invalid_p (this_insn, false))
> +         {
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, " -- replacing the loaded MEM with ");
> +               print_simple_rtl (dump_file, read_reg);
> +               fprintf (dump_file, " led to an invalid instruction\n");
> +             }
> +           BITMAP_FREE (regs_set);
> +           return false;
> +         }
> +       note_stores (this_insn, look_for_hardregs, regs_set);
> +     }
>  
>        bitmap_and_into (regs_set, regs_live);
>        if (!bitmap_empty_p (regs_set))
>       {
>         if (dump_file && (dump_flags & TDF_DETAILS))
>           {
> -           fprintf (dump_file,
> -                    "abandoning replacement because sequence clobbers live 
> hardregs:");
> +           fprintf (dump_file, "abandoning replacement because sequence "
> +                               "clobbers live hardregs:");
>             df_print_regset (dump_file, regs_set);
>           }
>  
> @@ -1999,6 +2035,19 @@ replace_read (store_info *store_info, in
>        BITMAP_FREE (regs_set);
>      }
>  
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +     {
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +         fprintf (dump_file, " -- replacing the MEM failed due to address "
> +                             "side-effects\n");
> +       return false;
> +     }
> +    }
> +
>    if (validate_change (read_insn->insn, loc, read_reg, 0))
>      {
>        deferred_change *change = deferred_change_pool.allocate ();
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
