On Thu, 2 Apr 2020, Jakub Jelinek wrote:

> Hi!
>
> As discussed in the PR, if !ACCUMULATE_OUTGOING_ARGS, on large functions we
> can have hundreds of thousands of stack pointer adjustments, and cselib
> creates a new VALUE after each sp adjustment; these form extremely deep
> VALUE chains, which is very harmful e.g. for find_base_term.
> E.g. if we have
>   sp -= 4
>   sp -= 4
>   sp += 4
>   sp += 4
>   sp -= 4
>   sp += 4
> that means 7 VALUEs: one for the sp at the beginning (val1), then
> val2 = val1 - 4, then val3 = val2 - 4, then val4 = val3 + 4, then
> val5 = val4 + 4, then val6 = val5 - 4, then val7 = val6 + 4.
> This patch tweaks cselib so that it is smarter about sp adjustments.
> When cselib_lookup (stack_pointer_rtx, Pmode, 1, VOIDmode) is called and we
> know nothing about sp yet (this happens at the start of the function; for
> non-var-tracking also after cselib_reset_table, and for var-tracking after
> processing the fp_setter insn, where we forget about the former sp values
> because those are now hfp related while everything after it is sp related),
> we look it up normally, but in addition to what we have been doing before
> we mark the VALUE as SP_DERIVED_VALUE_P.  Further lookups of sp + offset
> are then special-cased, so that they are canonicalized to that
> SP_DERIVED_VALUE_P VALUE + CONST_INT (if possible).  So, for the above,
> we get val1 with SP_DERIVED_VALUE_P set, then val2 = val1 - 4,
> val3 = val1 - 8 (note, no longer val2 - 4!), then we get val2 again, val1
> again, val2 again, val1 again.
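To restate the invariant for the archive, a minimal sketch of my reading
(the cselib entry points are the real ones, the local names v1/v2/v3 are
invented; this is not code from the patch itself):

  /* The first lookup of sp creates the base VALUE and marks it.  */
  cselib_val *v1 = cselib_lookup (stack_pointer_rtx, Pmode, 1, VOIDmode);
  /* SP_DERIVED_VALUE_P (v1->val_rtx) is now set.  */

  /* Later sp +/- const lookups canonicalize against v1 directly...  */
  cselib_val *v2
    = cselib_lookup (plus_constant (Pmode, stack_pointer_rtx, -4),
                     Pmode, 1, VOIDmode);
  cselib_val *v3
    = cselib_lookup (plus_constant (Pmode, stack_pointer_rtx, -8),
                     Pmode, 1, VOIDmode);
  /* ... so v3 gets a (plus v1 (const_int -8)) loc rather than a
     (plus v2 (const_int -4)) loc, and the chains stay one level deep,
     which is what keeps find_base_term cheap.  */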
> In the statistics of find_base_term calls with visited_vals.length () > 100,
> gathered during a combined x86_64-linux and i686-linux bootstrap+regtest
> cycle, without the patch I see:
>
>                        returning NULL  returning non-NULL
>   32-bit compilations         4229178                 407
>   64-bit compilations          217523                   0
>
> with the largest visited_vals.length () when returning non-NULL being 206.
> With the patch the same numbers are:
>
>                        returning NULL  returning non-NULL
>   32-bit compilations         1249588                 135
>   64-bit compilations            3510                   0
>
> with the largest visited_vals.length () when returning non-NULL being 173.
> This shows a significant reduction of the deep VALUE chains.
> On powerpc64{,le}-linux these stats didn't change at all; we have 1008
> returning NULL and 0 returning non-NULL for all of -m32, -m64 and
> little-endian -m64, coming just from the gcc.dg/pr85180.c and
> gcc.dg/pr87985.c testcases, which are unrelated to sp.
>
> My earlier version of the patch, which contained just the rtl.h and cselib.c
> changes, regressed some tests:
>   gcc.dg/guality/{pr36728-{1,3},pr68860-{1,2}}.c
>   gcc.target/i386/{pr88416,sse-{13,23,24,25,26}}.c
> The problem with the former tests was worse debug info: with -m32, where
> arg7 was passed in a stack slot, we thought a later push might have
> invalidated it when it couldn't.  This is something I've solved with the
> var-tracking.c (vt_initialize) changes.  In those problematic functions, we
> create a cfa_base VALUE (argp) and want to record that at the start of the
> function the argp VALUE is sp + off, and also record that the current sp
> VALUE is argp's VALUE - off.  The second permanent equivalence didn't make
> it after the patch though, because cselib_add_permanent_equiv will
> cselib_lookup the value of the expression it wants to add as the
> equivalence, and if that is the same VALUE as the one it is called on, it
> doesn't do anything; due to the cselib changes for sp based accesses, that
> is exactly what happened.  By reversing the order of the
> cselib_add_permanent_equiv calls we get both equivalences though, and thus
> are able to canonicalize the sp based accesses in var-tracking to the
> cfa_base value + offset.
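The ordering subtlety took me a moment, so a sketch of my reading (names as
in the vt_initialize hunk below; the patch spells the mode as
GET_MODE (stack_pointer_rtx), which I abbreviate to Pmode here):

  /* Record sp == argp + ofst first, while looking up sp still yields a
     plain SP_DERIVED_VALUE_P VALUE...  */
  cselib_val *valsp
    = cselib_lookup_from_insn (stack_pointer_rtx, Pmode, 1, VOIDmode,
                               get_insns ());
  cselib_add_permanent_equiv (valsp, plus_constant (Pmode, reg, ofst),
                              get_insns ());
  /* ... and only then argp == sp - ofst.  In the opposite order, the
     lookup of (plus sp (-ofst)) done inside cselib_add_permanent_equiv
     already canonicalizes back to val itself, so that equivalence is
     dropped as a no-op.  */
  cselib_add_permanent_equiv (val, plus_constant (Pmode, stack_pointer_rtx,
                                                  -ofst),
                              get_insns ());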
I think this warrants a comment in the code.

> The i386 FAILs were all ICEs, where we had a pushf instruction pushing the
> flags and then a pop of a pseudo reading that value back.  With the cselib
> changes, cselib during RTL DSE is able to see through the sp adjustment and
> wanted to replace_read what the pushf had stored, by moving the flags
> register into a pseudo and replacing the memory read in the pop with that
> pseudo.  That is wrong for two reasons: one is that the backend doesn't
> have an instruction to move the flags hard register into some other
> register, but replace_read has been validating just the mem -> pseudo
> replacement and not the insns emitted by copy_to_mode_reg.  And the second
> issue is that it is obviously wrong to replace a stack pop, which contains
> a stack post-increment, by a copy of the pseudo into the destination.
> dse.c has some code to handle RTX_AUTOINC, but it only uses it when
> actually removing stores, and only when there is a REG_INC note (stack
> RTX_AUTOINC does not have those), in check_for_inc_dec*, where it emits
> the reg adjustment(s) before the insn that is going to be deleted.
> replace_read doesn't remove the insn, so if it e.g. contained a REG_INC
> note, that note would be kept there and we might have the RTX_AUTOINC not
> just in *loc, but in other spots too.
> So, the dse.c changes try to validate the added insns and punt on all
> RTX_AUTOINC in *loc.  Furthermore, it seems that with the cselib.c changes,
> on the gfortran.dg/pr87360.f90 and gcc.target/i386/pr88416.c testcases
> check_for_inc_dec{,_1} happily throws stack pointer autoinc on the floor,
> which is also wrong.  While we could perhaps do the for_each_inc_dec call
> regardless of whether we have a REG_INC note or not, we aren't prepared to
> handle e.g. REG_ARGS_SIZE distribution and thus could end up with wrong
> unwind info or ICEs in dwarf2cfi.c.  So the patch also punts on those;
> after all, if we had in theory managed to optimize such pushes before, we
> would have created wrong code.
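To illustrate the pushf/pop case for the archive (hand-written RTL with
modes and register names simplified; my reconstruction, not taken from the
testcases):

  ;; pushf: push the flags register
  (set (mem:SI (pre_dec:SI (reg:SI sp))) (reg:SI flags))
  ;; pop of a pseudo: reload the pushed value
  (set (reg:SI pseudo) (mem:SI (post_inc:SI (reg:SI sp))))

  ;; replace_read would have rewritten that to roughly
  (set (reg:SI tmp) (reg:SI flags))     ;; no such insn exists on i386
  (set (reg:SI pseudo) (reg:SI tmp))    ;; and the sp post_inc is lost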
> On x86_64-linux and i686-linux, the patch has some minor debug info
> coverage differences, but they don't appear very significant to me.
> The https://github.com/pmachata/dwlocstat tool gives the numbers below
> (where "before" is vanilla trunk plus only the rtl.h hunk, bootstrapped
> with --enable-checking=yes,rtl,extra, then the {cselib,var-tracking,dse}.c
> hunks applied and cc1plus rebuilt with make, while "after" is trunk with
> the whole patch applied).
>
> 64-bit cc1plus
> before
>   cov%     samples      cumul
>   0..10    1232756/48%  1232756/48%
>   11..20   31089/1%     1263845/49%
>   21..30   39172/1%     1303017/51%
>   31..40   38853/1%     1341870/52%
>   41..50   47473/1%     1389343/54%
>   51..60   45171/1%     1434514/56%
>   61..70   69393/2%     1503907/59%
>   71..80   61988/2%     1565895/61%
>   81..90   104528/4%    1670423/65%
>   91..100  875402/34%   2545825/100%
> after
>   cov%     samples      cumul
>   0..10    1233238/48%  1233238/48%
>   11..20   31086/1%     1264324/49%
>   21..30   39157/1%     1303481/51%
>   31..40   38819/1%     1342300/52%
>   41..50   47447/1%     1389747/54%
>   51..60   45151/1%     1434898/56%
>   61..70   69379/2%     1504277/59%
>   71..80   61946/2%     1566223/61%
>   81..90   104508/4%    1670731/65%
>   91..100  875094/34%   2545825/100%
>
> 32-bit cc1plus
> before
>   cov%     samples      cumul
>   0..10    1231221/48%  1231221/48%
>   11..20   30992/1%     1262213/49%
>   21..30   36422/1%     1298635/51%
>   31..40   35793/1%     1334428/52%
>   41..50   47102/1%     1381530/54%
>   51..60   41201/1%     1422731/56%
>   61..70   65467/2%     1488198/58%
>   71..80   59560/2%     1547758/61%
>   81..90   104076/4%    1651834/65%
>   91..100  881879/34%   2533713/100%
> after
>   cov%     samples      cumul
>   0..10    1230469/48%  1230469/48%
>   11..20   30390/1%     1260859/49%
>   21..30   36362/1%     1297221/51%
>   31..40   36042/1%     1333263/52%
>   41..50   47619/1%     1380882/54%
>   51..60   41674/1%     1422556/56%
>   61..70   65849/2%     1488405/58%
>   71..80   59857/2%     1548262/61%
>   81..90   104178/4%    1652440/65%
>   91..100  881273/34%   2533713/100%
>
> Bootstrapped/regtested on {x86_64,i686,powerpc64{,-le}}-linux, ok for trunk?

OK.

> For the PR in question, my proposal would be to also lower the
> --param=max-find-base-term-values= default from 2000 to 200 after this; at
> least in the above 4 bootstraps/regtests there is nothing that would ever
> result in find_base_term returning non-NULL with more than 200 VALUEs being
> processed.

Sounds good to me.

Thanks,
Richard.

> 2020-04-02  Jakub Jelinek  <ja...@redhat.com>
>
>         PR rtl-optimization/92264
>         * rtl.h (struct rtx_def): Mention that call bit is used as
>         SP_DERIVED_VALUE_P in cselib.c.
>         * cselib.c (SP_DERIVED_VALUE_P): Define.
>         (PRESERVED_VALUE_P, SP_BASED_VALUE_P): Move definitions earlier.
>         (cselib_hasher::equal): Handle equality between SP_DERIVED_VALUE_P
>         val_rtx and sp based expression where offsets cancel each other.
>         (preserve_constants_and_equivs): Formatting fix.
>         (cselib_reset_table): Add reverse op loc to SP_DERIVED_VALUE_P
>         locs list for cfa_base_preserved_val if needed.  Formatting fix.
>         (autoinc_split): If the to be returned value is a REG, MEM or
>         VALUE which has SP_DERIVED_VALUE_P + CONST_INT as one of its
>         locs, return the SP_DERIVED_VALUE_P VALUE and adjust *off.
>         (rtx_equal_for_cselib_1): Call autoinc_split even if both
>         expressions are PLUS in Pmode with CONST_INT second operands.
>         Handle SP_DERIVED_VALUE_P cases.
>         (cselib_hash_plus_const_int): New function.
>         (cselib_hash_rtx): Use it for PLUS in Pmode with CONST_INT
>         second operand, as well as for PRE_DEC etc. that ought to be
>         hashed the same way.
>         (cselib_subst_to_values): Substitute PLUS with Pmode and
>         CONST_INT operand if the first operand is a VALUE which has
>         SP_DERIVED_VALUE_P + CONST_INT as one of its locs for the
>         SP_DERIVED_VALUE_P + adjusted offset.
>         (cselib_lookup_1): When creating a new VALUE for stack_pointer_rtx,
>         set SP_DERIVED_VALUE_P on it.  Set PRESERVED_VALUE_P when adding
>         SP_DERIVED_VALUE_P PRESERVED_VALUE_P subseted VALUE location.
>         * var-tracking.c (vt_initialize): Call cselib_add_permanent_equiv
>         on the sp value before calling cselib_add_permanent_equiv on the
>         cfa_base value.
>         * dse.c (check_for_inc_dec_1, check_for_inc_dec): Punt on RTX_AUTOINC
>         in the insn without REG_INC note.
>         (replace_read): Punt on RTX_AUTOINC in the *loc being replaced.
>         Punt on invalid insns added by copy_to_mode_reg.  Formatting fixes.
>
> --- gcc/rtl.h.jj	2020-04-01 09:43:45.346628694 +0200
> +++ gcc/rtl.h	2020-04-01 17:46:15.160396309 +0200
> @@ -331,6 +331,7 @@ struct GTY((desc("0"), tag("0"),
>       1 in a MEM if it cannot trap.
>       1 in a CALL_INSN logically equivalent to
>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
> +     1 in a VALUE is SP_DERIVED_VALUE_P in cselib.c.
>       Dumped as "/c" in RTL dumps.  */
>    unsigned int call : 1;
>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
> --- gcc/cselib.c.jj	2020-04-01 09:43:45.305629293 +0200
> +++ gcc/cselib.c	2020-04-01 17:46:15.161396295 +0200
> @@ -58,6 +58,16 @@ static void cselib_invalidate_regno (uns
>  static void cselib_invalidate_mem (rtx);
>  static void cselib_record_set (rtx, cselib_val *, cselib_val *);
>  static void cselib_record_sets (rtx_insn *);
> +static rtx autoinc_split (rtx, rtx *, machine_mode);
> +
> +#define PRESERVED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
> +
> +#define SP_BASED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
> +
> +#define SP_DERIVED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("SP_DERIVED_VALUE_P", (RTX), VALUE)->call)
>
>  struct expand_value_data
>  {
> @@ -122,6 +132,13 @@ cselib_hasher::equal (const cselib_val *
>    if (GET_CODE (x) == VALUE)
>      return x == v->val_rtx;
>
> +  if (SP_DERIVED_VALUE_P (v->val_rtx) && GET_MODE (x) == Pmode)
> +    {
> +      rtx xoff = NULL;
> +      if (autoinc_split (x, &xoff, memmode) == v->val_rtx && xoff == NULL_RTX)
> +	return true;
> +    }
> +
>    /* We don't guarantee that distinct rtx's have different hash values,
>       so we need to do a comparison.  */
>    for (l = v->locs; l; l = l->next)
> @@ -256,12 +273,6 @@ void (*cselib_discard_hook) (cselib_val
>  void (*cselib_record_sets_hook) (rtx_insn *insn, struct cselib_set *sets,
>  				 int n_sets);
>
> -#define PRESERVED_VALUE_P(RTX) \
> -  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
> -
> -#define SP_BASED_VALUE_P(RTX) \
> -  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
> -
>
>
>  /* Allocate a struct elt_list and fill in its two elements with the
> @@ -494,7 +505,7 @@ preserve_constants_and_equivs (cselib_va
>  	};
>        cselib_val **slot
>  	= cselib_preserved_hash_table->find_slot_with_hash (&lookup,
> -							  v->hash, INSERT);
> +							    v->hash, INSERT);
>        gcc_assert (!*slot);
>        *slot = v;
>      }
> @@ -532,6 +543,28 @@ cselib_reset_table (unsigned int num)
>  	max_value_regs
>  	  = hard_regno_nregs (regno,
>  			      GET_MODE (cfa_base_preserved_val->locs->loc));
> +
> +      /* If cfa_base is sp + const_int, need to preserve also the
> +	 SP_DERIVED_VALUE_P value.  */
> +      for (struct elt_loc_list *l = cfa_base_preserved_val->locs;
> +	   l; l = l->next)
> +	if (GET_CODE (l->loc) == PLUS
> +	    && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +	    && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +	    && CONST_INT_P (XEXP (l->loc, 1)))
> +	  {
> +	    if (! invariant_or_equiv_p (CSELIB_VAL_PTR (XEXP (l->loc, 0))))
> +	      {
> +		rtx val = cfa_base_preserved_val->val_rtx;
> +		rtx_insn *save_cselib_current_insn = cselib_current_insn;
> +		cselib_current_insn = l->setting_insn;
> +		new_elt_loc_list (CSELIB_VAL_PTR (XEXP (l->loc, 0)),
> +				  plus_constant (Pmode, val,
> +						 -UINTVAL (XEXP (l->loc, 1))));
> +		cselib_current_insn = save_cselib_current_insn;
> +	      }
> +	    break;
> +	  }
>      }
>    else
>      {
> @@ -541,8 +574,7 @@ cselib_reset_table (unsigned int num)
>      }
>
>    if (cselib_preserve_constants)
> -    cselib_hash_table->traverse <void *, preserve_constants_and_equivs>
> -      (NULL);
> +    cselib_hash_table->traverse <void *, preserve_constants_and_equivs> (NULL);
>    else
>      {
>        cselib_hash_table->empty ();
> @@ -799,33 +831,66 @@ autoinc_split (rtx x, rtx *off, machine_
>      {
>      case PLUS:
>        *off = XEXP (x, 1);
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>
>      case PRE_DEC:
>        if (memmode == VOIDmode)
>  	return x;
>
>        *off = gen_int_mode (-GET_MODE_SIZE (memmode), GET_MODE (x));
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>
>      case PRE_INC:
>        if (memmode == VOIDmode)
>  	return x;
>
>        *off = gen_int_mode (GET_MODE_SIZE (memmode), GET_MODE (x));
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>
>      case PRE_MODIFY:
> -      return XEXP (x, 1);
> +      x = XEXP (x, 1);
> +      break;
>
>      case POST_DEC:
>      case POST_INC:
>      case POST_MODIFY:
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>
>      default:
> -      return x;
> +      break;
>      }
> +
> +  if (GET_MODE (x) == Pmode
> +      && (REG_P (x) || MEM_P (x) || GET_CODE (x) == VALUE)
> +      && (*off == NULL_RTX || CONST_INT_P (*off)))
> +    {
> +      cselib_val *e;
> +      if (GET_CODE (x) == VALUE)
> +	e = CSELIB_VAL_PTR (x);
> +      else
> +	e = cselib_lookup (x, GET_MODE (x), 0, memmode);
> +      if (e)
> +	for (struct elt_loc_list *l = e->locs; l; l = l->next)
> +	  if (GET_CODE (l->loc) == PLUS
> +	      && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +	      && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +	      && CONST_INT_P (XEXP (l->loc, 1)))
> +	    {
> +	      if (*off == NULL_RTX)
> +		*off = XEXP (l->loc, 1);
> +	      else
> +		*off = plus_constant (Pmode, *off,
> +				      INTVAL (XEXP (l->loc, 1)));
> +	      if (*off == const0_rtx)
> +		*off = NULL_RTX;
> +	      return XEXP (l->loc, 0);
> +	    }
> +    }
> +  return x;
>  }
>
>  /* Return nonzero if we can prove that X and Y contain the same value,
> @@ -868,6 +933,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        if (GET_CODE (y) == VALUE)
>  	return e == canonical_cselib_val (CSELIB_VAL_PTR (y));
>
> +      if ((SP_DERIVED_VALUE_P (x)
> +	   || SP_DERIVED_VALUE_P (e->val_rtx))
> +	  && GET_MODE (y) == Pmode)
> +	{
> +	  rtx yoff = NULL;
> +	  rtx yr = autoinc_split (y, &yoff, memmode);
> +	  if ((yr == x || yr == e->val_rtx) && yoff == NULL_RTX)
> +	    return 1;
> +	}
> +
>        if (depth == 128)
>  	return 0;
> @@ -891,6 +966,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        cselib_val *e = canonical_cselib_val (CSELIB_VAL_PTR (y));
>        struct elt_loc_list *l;
>
> +      if ((SP_DERIVED_VALUE_P (y)
> +	   || SP_DERIVED_VALUE_P (e->val_rtx))
> +	  && GET_MODE (x) == Pmode)
> +	{
> +	  rtx xoff = NULL;
> +	  rtx xr = autoinc_split (x, &xoff, memmode);
> +	  if ((xr == y || xr == e->val_rtx) && xoff == NULL_RTX)
> +	    return 1;
> +	}
> +
>        if (depth == 128)
>  	return 0;
> @@ -910,7 +995,11 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>    if (GET_MODE (x) != GET_MODE (y))
>      return 0;
>
> -  if (GET_CODE (x) != GET_CODE (y))
> +  if (GET_CODE (x) != GET_CODE (y)
> +      || (GET_CODE (x) == PLUS
> +	  && GET_MODE (x) == Pmode
> +	  && CONST_INT_P (XEXP (x, 1))
> +	  && CONST_INT_P (XEXP (y, 1))))
>      {
>        rtx xorig = x, yorig = y;
>        rtx xoff = NULL, yoff = NULL;
> @@ -918,17 +1007,20 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        x = autoinc_split (x, &xoff, memmode);
>        y = autoinc_split (y, &yoff, memmode);
>
> -      if (!xoff != !yoff)
> -	return 0;
> -
> -      if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
> -	return 0;
> -
>        /* Don't recurse if nothing changed.  */
>        if (x != xorig || y != yorig)
> -	return rtx_equal_for_cselib_1 (x, y, memmode, depth);
> +	{
> +	  if (!xoff != !yoff)
> +	    return 0;
>
> -      return 0;
> +	  if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
> +	    return 0;
> +
> +	  return rtx_equal_for_cselib_1 (x, y, memmode, depth);
> +	}
> +
> +      if (GET_CODE (xorig) != GET_CODE (yorig))
> +	return 0;
>      }
>
>    /* These won't be handled correctly by the code below.  */
> @@ -1042,6 +1134,41 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>    return 1;
>  }
>
> +/* Helper function for cselib_hash_rtx.  Arguments like for cselib_hash_rtx,
> +   except that it hashes (plus:P x c).  */
> +
> +static unsigned int
> +cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int create,
> +			    machine_mode memmode)
> +{
> +  cselib_val *e = cselib_lookup (x, GET_MODE (x), create, memmode);
> +  if (! e)
> +    return 0;
> +
> +  if (! SP_DERIVED_VALUE_P (e->val_rtx))
> +    for (struct elt_loc_list *l = e->locs; l; l = l->next)
> +      if (GET_CODE (l->loc) == PLUS
> +	  && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +	  && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +	  && CONST_INT_P (XEXP (l->loc, 1)))
> +	{
> +	  e = CSELIB_VAL_PTR (XEXP (l->loc, 0));
> +	  c = trunc_int_for_mode (c + UINTVAL (XEXP (l->loc, 1)), Pmode);
> +	  break;
> +	}
> +  if (c == 0)
> +    return e->hash;
> +
> +  unsigned hash = (unsigned) PLUS + (unsigned) GET_MODE (x);
> +  hash += e->hash;
> +  unsigned int tem_hash = (unsigned) CONST_INT + (unsigned) VOIDmode;
> +  tem_hash += ((unsigned) CONST_INT << 7) + (unsigned HOST_WIDE_INT) c;
> +  if (tem_hash == 0)
> +    tem_hash = (unsigned int) CONST_INT;
> +  hash += tem_hash;
> +  return hash ? hash : 1 + (unsigned int) PLUS;
> +}
> +
>  /* Hash an rtx.  Return 0 if we couldn't hash the rtx.
>     For registers and memory locations, we look up their cselib_val structure
>     and return its VALUE element.
> @@ -1209,10 +1336,21 @@ cselib_hash_rtx (rtx x, int create, mach
>  	offset = -offset;
>        /* Adjust the hash so that (mem:MEMMODE (pre_* (reg))) hashes
>  	 like (mem:MEMMODE (plus (reg) (const_int I))).  */
> -      hash += (unsigned) PLUS - (unsigned)code
> -	+ cselib_hash_rtx (XEXP (x, 0), create, memmode)
> -	+ cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
> -			   create, memmode);
> +      if (GET_MODE (x) == Pmode
> +	  && (REG_P (XEXP (x, 0))
> +	      || MEM_P (XEXP (x, 0))
> +	      || GET_CODE (XEXP (x, 0)) == VALUE))
> +	{
> +	  HOST_WIDE_INT c;
> +	  if (offset.is_constant (&c))
> +	    return cselib_hash_plus_const_int (XEXP (x, 0),
> +					       trunc_int_for_mode (c, Pmode),
> +					       create, memmode);
> +	}
> +      hash = ((unsigned) PLUS + (unsigned) GET_MODE (x)
> +	      + cselib_hash_rtx (XEXP (x, 0), create, memmode)
> +	      + cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
> +				 create, memmode));
>        return hash ? hash : 1 + (unsigned) PLUS;
>
>      case PRE_MODIFY:
> @@ -1237,6 +1375,16 @@ cselib_hash_rtx (rtx x, int create, mach
>
>        break;
>
> +    case PLUS:
> +      if (GET_MODE (x) == Pmode
> +	  && (REG_P (XEXP (x, 0))
> +	      || MEM_P (XEXP (x, 0))
> +	      || GET_CODE (XEXP (x, 0)) == VALUE)
> +	  && CONST_INT_P (XEXP (x, 1)))
> +	return cselib_hash_plus_const_int (XEXP (x, 0), INTVAL (XEXP (x, 1)),
> +					   create, memmode);
> +      break;
> +
>      default:
>        break;
>      }
> @@ -1927,6 +2075,26 @@ cselib_subst_to_values (rtx x, machine_m
>        gcc_assert (memmode != VOIDmode);
>        return cselib_subst_to_values (XEXP (x, 0), memmode);
>
> +    case PLUS:
> +      if (GET_MODE (x) == Pmode && CONST_INT_P (XEXP (x, 1)))
> +	{
> +	  rtx t = cselib_subst_to_values (XEXP (x, 0), memmode);
> +	  if (GET_CODE (t) == VALUE)
> +	    for (struct elt_loc_list *l = CSELIB_VAL_PTR (t)->locs;
> +		 l; l = l->next)
> +	      if (GET_CODE (l->loc) == PLUS
> +		  && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +		  && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +		  && CONST_INT_P (XEXP (l->loc, 1)))
> +		return plus_constant (Pmode, l->loc, INTVAL (XEXP (x, 1)));
> +	  if (t != XEXP (x, 0))
> +	    {
> +	      copy = shallow_copy_rtx (x);
> +	      XEXP (copy, 0) = t;
> +	    }
> +	  return copy;
> +	}
> +
>      default:
>        break;
>      }
> @@ -2030,6 +2198,8 @@ cselib_lookup_1 (rtx x, machine_mode mod
>  	}
>
>        e = new_cselib_val (next_uid, GET_MODE (x), x);
> +      if (GET_MODE (x) == Pmode && x == stack_pointer_rtx)
> +	SP_DERIVED_VALUE_P (e->val_rtx) = 1;
>        new_elt_loc_list (e, x);
>
>        scalar_int_mode int_mode;
> @@ -2107,7 +2277,14 @@ cselib_lookup_1 (rtx x, machine_mode mod
>  	 the hash table is inconsistent until we do so, and
>  	 cselib_subst_to_values will need to do lookups.  */
>        *slot = e;
> -      new_elt_loc_list (e, cselib_subst_to_values (x, memmode));
> +      rtx v = cselib_subst_to_values (x, memmode);
> +
> +      /* If cselib_preserve_constants, we might get a SP_DERIVED_VALUE_P
> +	 VALUE that isn't in the hash tables anymore.  */
> +      if (GET_CODE (v) == VALUE && SP_DERIVED_VALUE_P (v) && PRESERVED_VALUE_P (v))
> +	PRESERVED_VALUE_P (e->val_rtx) = 1;
> +
> +      new_elt_loc_list (e, v);
>        return e;
>      }
>
> --- gcc/var-tracking.c.jj	2020-03-26 09:15:29.825500110 +0100
> +++ gcc/var-tracking.c	2020-04-01 19:09:30.145421270 +0200
> @@ -10045,19 +10045,20 @@ vt_initialize (void)
>        preserve_value (val);
>        if (reg != hard_frame_pointer_rtx && fixed_regs[REGNO (reg)])
>  	cselib_preserve_cfa_base_value (val, REGNO (reg));
> -      expr = plus_constant (GET_MODE (stack_pointer_rtx),
> -			    stack_pointer_rtx, -ofst);
> -      cselib_add_permanent_equiv (val, expr, get_insns ());
> -
>        if (ofst)
>  	{
> -	  val = cselib_lookup_from_insn (stack_pointer_rtx,
> -					 GET_MODE (stack_pointer_rtx), 1,
> -					 VOIDmode, get_insns ());
> -	  preserve_value (val);
> +	  cselib_val *valsp
> +	    = cselib_lookup_from_insn (stack_pointer_rtx,
> +				       GET_MODE (stack_pointer_rtx), 1,
> +				       VOIDmode, get_insns ());
> +	  preserve_value (valsp);
>  	  expr = plus_constant (GET_MODE (reg), reg, ofst);
> -	  cselib_add_permanent_equiv (val, expr, get_insns ());
> +	  cselib_add_permanent_equiv (valsp, expr, get_insns ());
>  	}
> +
> +      expr = plus_constant (GET_MODE (stack_pointer_rtx),
> +			    stack_pointer_rtx, -ofst);
> +      cselib_add_permanent_equiv (val, expr, get_insns ());
>      }
>
>    /* In order to factor out the adjustments made to the stack pointer or to
> --- gcc/dse.c.jj	2020-03-03 11:04:46.365821937 +0100
> +++ gcc/dse.c	2020-04-02 10:48:08.042260842 +0200
> @@ -846,6 +846,17 @@ check_for_inc_dec_1 (insn_info_t insn_in
>    if (note)
>      return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
>  			     insn_info) == 0;
> +
> +  /* Punt on stack pushes, those don't have REG_INC notes and we are
> +     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +	return false;
> +    }
> +
>    return true;
>  }
> @@ -866,6 +877,17 @@ check_for_inc_dec (rtx_insn *insn)
>    if (note)
>      return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
>  			     &insn_info) == 0;
> +
> +  /* Punt on stack pushes, those don't have REG_INC notes and we are
> +     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +	return false;
> +    }
> +
>    return true;
>  }
> @@ -1980,16 +2002,30 @@ replace_read (store_info *store_info, in
>       point.  This does occasionally happen, see PR 37922.  */
>    bitmap regs_set = BITMAP_ALLOC (&reg_obstack);
>
> -  for (this_insn = insns; this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
> -    note_stores (this_insn, look_for_hardregs, regs_set);
> +  for (this_insn = insns;
> +       this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
> +    {
> +      if (insn_invalid_p (this_insn, false))
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    {
> +	      fprintf (dump_file, " -- replacing the loaded MEM with ");
> +	      print_simple_rtl (dump_file, read_reg);
> +	      fprintf (dump_file, " led to an invalid instruction\n");
> +	    }
> +	  BITMAP_FREE (regs_set);
> +	  return false;
> +	}
> +      note_stores (this_insn, look_for_hardregs, regs_set);
> +    }
>
>    bitmap_and_into (regs_set, regs_live);
>    if (!bitmap_empty_p (regs_set))
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	{
> -	  fprintf (dump_file,
> -		   "abandoning replacement because sequence clobbers live hardregs:");
> +	  fprintf (dump_file, "abandoning replacement because sequence "
> +			      "clobbers live hardregs:");
>  	  df_print_regset (dump_file, regs_set);
>  	}
> @@ -1999,6 +2035,19 @@ replace_read (store_info *store_info, in
>        BITMAP_FREE (regs_set);
>      }
>
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    fprintf (dump_file, " -- replacing the MEM failed due to address "
> +				"side-effects\n");
> +	  return false;
> +	}
> +    }
> +
>    if (validate_change (read_insn->insn, loc, read_reg, 0))
>      {
>        deferred_change *change = deferred_change_pool.allocate ();
>
> 	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)