On Tue, 4 Feb 2025, Jakub Jelinek wrote: > On Tue, Feb 04, 2025 at 11:53:21AM +0100, Richard Biener wrote: > > > + if (sp_derived_base) > > > + if (cselib_val *v3 > > > + = cselib_lookup_1 (stack_pointer_rtx, Pmode, 0, VOIDmode)) > > > + { > > > + bool ok = false; > > > + HOST_WIDE_INT off = 0; > > > + if (v3->val_rtx == sp_derived_base) > > > + ok = true; > > > + else > > > + for (struct elt_loc_list *l = v3->locs; l; l = l->next) > > > > isn't this invariant on the whole workset? You are looking up > > the CSELIB val for stack_pointer_rtx and traversing it's locations. > > The way it is written it isn't invariant, but you're right it might be a > good idea to tweak it so that it is invariant and just compute it lazily the > first time we need it during the single cselib_invalidate_memory call > (with SP_DERIVED_VALUE_P tests instead of == sp_derived_base) and then just > compare that base against sp_derived_base (perhaps with better names for the > 2 pairs of variables, VALUE and corresponding offset). > > > It looks like you want to know whether there's an offsetted stack > > location from sp_derived_base (a VALUE). There might be multiple > > ones, but you stop at the first? But still use that random offset? > > There shouldn't be multiple ones by construction. > The first time we try to look up stack_pointer_rtx, we create a VALUE > with SP_DERIVED_VALUE_P. If stack_pointer_rtx or (plus:P (sp) (const_int N)) > is looked up later, with/without any stack_pointer_rtx adjustments by > constant in between, we get either that SP_DERIVED_VALUE_P VALUE or a VALUE > with that (plus:P SP_DERIVED_VALUE_P (const_int N)) as one of its locs. > This was done in r10-7665 to make sure we don't construct arbitrarily deep > chains of VALUEs when sp keeps changing a lot. > > The point of the patch is that (unless proven otherwise) I believe that > these sometimes above sometimes below the stack pointer stores with > exception of epilogue freeing it completely are solely for outgoing argument > slots and that normally one should be able to track those to be stack > pointer related (even on ia64 and the likes), so they should have the > SP_DERIVED_VALUE_P or PLUS SP_DERIVED_VALUE_P CONST_INT somewhere. > > The code only attempts to optimize based on offsets if both the MEM's > address and stack_pointer_rtx VALUEs are derived from the same > SP_DERIVED_VALUE_P VALUE (the likely case, to get a different such > VALUE I think one needs to not be in var-tracking and reset the table > but then even the MEMs are invalidated, or in var-tracking across the sp -> > fp setting), and in that case those aren't random offsets, those are > 2 offsets from the same base. So x_addr is known to be > sp_derived_base + sp_derived_off, and current sp is known to be > sp_derived_base + off. So one can just compare the offsets to know > what is below the stack pointer or offsets with x mem_size for the PA case. > > > > + if (sp_derived_base == NULL_RTX) > > > > So, this seems to take SP_DERIVED_VALUE_P conservative in the wrong way? > > It seems CSELIB sets this if it is sure the value is SP derived but > > for correctness reasons you have to assume a value is SP derived if > > you can't prove it is not? > > I don't want to cause significant performance and debugging quality > regressions without a proof that it is really needed otherwise. Don't > remember any PRs about this until this one, so it can't be that common. > If I needed to differentiate between any stack pointer based MEMs vs. > something clearly guaranteed to be outside of the current stack frame > (+ perhaps the incoming arguments area or the like), then sure, I need to be > conservative, any addressable automatic var in the current stack frame can > have its address taken, go through some pointer arithmetics and escape > somewhere etc. > But I can't imagine how addresses of outgoing argument slots could be > escaping somewhere or not have the stack pointer visible in the address > computation. > If one passes C++ non-PODs as arguments/return values, those are passed > by invisible reference, the objects are actually normal automatic VAR_DECLs, > so those aren't in the outgoing stack arguments area. If I use say large > POD aggregate argument passed on the stack, even if what I pass there is > address taken and perhaps self-referential, when it is copied into the > stack slot it still shouldn't refer to addresses within the outgoing > argument slot, there is just a bitwise copy. > Even in the case where way say use memcpy to do the bitwise copy into the > outgoing argument slot and therefore need to have the address loaded in > an argument, I think cselib should see it was stack pointer based. And if > say CSE sees the same is used in another memcpy, it still should see the > original was sp derived.
OK, fair enough. I think there should be a comment indicating this isn't a full conservative approach but handling a certain class of accesses only, with the hope that this is all that's actually needed. > > Or is your argument that no such proof > > is necessary because the stack slot would have its address taken and > > would then necessarily be not a spill slot? This is all about spill > > slots, right? IIRC we do have spill slot MEMs specially marked > > with MEM_EXPR equal to get_spill_slot_decl, there's also may_be_sp_based_p > > using the somewhat broken RTL alias analysis (and there's > > static_reg_base_value with the various stack area kinds, not sure if > > helpful here). > > Stack spills again go above the outgoing argument area, so should be in the > current frame from the prologue to the epilogue, not affected by volatility > of sp decrements and increments for particular calls when not really > accumulating outgoing args. > > I'm a little bit worried about aarch64 SVE and RISC-V, dunno what happens > on poly-int sized outgoing argument pushes because right now the > SP_DERIVED_VALUE_P handling stuff in cselib.cc is about CONST_INT offsets. > But then, both those targets are unconditional ACCUMULATE_OUTGOING_ARGS. > > And the testcase for the PR isn't miscompiled with additional > -maccumulate-outgoing-args or say -mtune=intel which implies that. > So, perhaps I could restrict that cselib_invalidate_mem (callmem[1]); > to if (!ACCUMULATE_OUTGOING_ARGS). I guess that then makes sense. Richard. > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)