On Tue, 4 Feb 2025, Jakub Jelinek wrote:

> On Tue, Feb 04, 2025 at 11:53:21AM +0100, Richard Biener wrote:
> > > +       if (sp_derived_base)
> > > +         if (cselib_val *v3
> > > +             = cselib_lookup_1 (stack_pointer_rtx, Pmode, 0, VOIDmode))
> > > +           {
> > > +             bool ok = false;
> > > +             HOST_WIDE_INT off = 0;
> > > +             if (v3->val_rtx == sp_derived_base)
> > > +               ok = true;
> > > +             else
> > > +               for (struct elt_loc_list *l = v3->locs; l; l = l->next)
> > 
> > isn't this invariant on the whole workset?  You are looking up
> > the CSELIB val for stack_pointer_rtx and traversing it's locations.
> 
> The way it is written it isn't invariant, but you're right it might be a
> good idea to tweak it so that it is invariant and just compute it lazily the
> first time we need it during the single cselib_invalidate_memory call
> (with SP_DERIVED_VALUE_P tests instead of == sp_derived_base) and then just
> compare that base against sp_derived_base (perhaps with better names for the
> 2 pairs of variables, VALUE and corresponding offset).
> 
> > It looks like you want to know whether there's an offsetted stack
> > location from sp_derived_base (a VALUE).  There might be multiple
> > ones, but you stop at the first?  But still use that random offset?
> 
> There shouldn't be multiple ones by construction.
> The first time we try to look up stack_pointer_rtx, we create a VALUE
> with SP_DERIVED_VALUE_P.  If stack_pointer_rtx or (plus:P (sp) (const_int N))
> is looked up later, with/without any stack_pointer_rtx adjustments by
> constant in between, we get either that SP_DERIVED_VALUE_P VALUE or a VALUE
> with that (plus:P SP_DERIVED_VALUE_P (const_int N)) as one of its locs.
> This was done in r10-7665 to make sure we don't construct arbitrarily deep
> chains of VALUEs when sp keeps changing a lot.
> 
> The point of the patch is that (unless proven otherwise) I believe that
> these sometimes above sometimes below the stack pointer stores with
> exception of epilogue freeing it completely are solely for outgoing argument
> slots and that normally one should be able to track those to be stack
> pointer related (even on ia64 and the likes), so they should have the
> SP_DERIVED_VALUE_P or PLUS SP_DERIVED_VALUE_P CONST_INT somewhere.
> 
> The code only attempts to optimize based on offsets if both the MEM's
> address and stack_pointer_rtx VALUEs are derived from the same
> SP_DERIVED_VALUE_P VALUE (the likely case, to get a different such
> VALUE I think one needs to not be in var-tracking and reset the table
> but then even the MEMs are invalidated, or in var-tracking across the sp ->
> fp setting), and in that case those aren't random offsets, those are
> 2 offsets from the same base.  So x_addr is known to be
> sp_derived_base + sp_derived_off, and current sp is known to be
> sp_derived_base + off.  So one can just compare the offsets to know
> what is below the stack pointer or offsets with x mem_size for the PA case.
> 
> > > +       if (sp_derived_base == NULL_RTX)
> > 
> > So, this seems to take SP_DERIVED_VALUE_P conservative in the wrong way?
> > It seems CSELIB sets this if it is sure the value is SP derived but
> > for correctness reasons you have to assume a value is SP derived if
> > you can't prove it is not?
> 
> I don't want to cause significant performance and debugging quality
> regressions without a proof that it is really needed otherwise.  Don't
> remember any PRs about this until this one, so it can't be that common.
> If I needed to differentiate between any stack pointer based MEMs vs.
> something clearly guaranteed to be outside of the current stack frame
> (+ perhaps the incoming arguments area or the like), then sure, I need to be
> conservative, any addressable automatic var in the current stack frame can
> have its address taken, go through some pointer arithmetics and escape
> somewhere etc.
> But I can't imagine how addresses of outgoing argument slots could be
> escaping somewhere or not have the stack pointer visible in the address
> computation.
> If one passes C++ non-PODs as arguments/return values, those are passed
> by invisible reference, the objects are actually normal automatic VAR_DECLs,
> so those aren't in the outgoing stack arguments area.  If I use say large
> POD aggregate argument passed on the stack, even if what I pass there is
> address taken and perhaps self-referential, when it is copied into the
> stack slot it still shouldn't refer to addresses within the outgoing
> argument slot, there is just a bitwise copy.
> Even in the case where way say use memcpy to do the bitwise copy into the
> outgoing argument slot and therefore need to have the address loaded in
> an argument, I think cselib should see it was stack pointer based.  And if
> say CSE sees the same is used in another memcpy, it still should see the
> original was sp derived.

OK, fair enough.  I think there should be a comment indicating this
isn't a full conservative approach but handling a certain class of
accesses only, with the hope that this is all that's actually needed.

> >  Or is your argument that no such proof
> > is necessary because the stack slot would have its address taken and
> > would then necessarily be not a spill slot?  This is all about spill
> > slots, right?  IIRC we do have spill slot MEMs specially marked
> > with MEM_EXPR equal to get_spill_slot_decl, there's also may_be_sp_based_p
> > using the somewhat broken RTL alias analysis (and there's
> > static_reg_base_value with the various stack area kinds, not sure if
> > helpful here).
> 
> Stack spills again go above the outgoing argument area, so should be in the
> current frame from the prologue to the epilogue, not affected by volatility
> of sp decrements and increments for particular calls when not really
> accumulating outgoing args.
> 
> I'm a little bit worried about aarch64 SVE and RISC-V, dunno what happens
> on poly-int sized outgoing argument pushes because right now the
> SP_DERIVED_VALUE_P handling stuff in cselib.cc is about CONST_INT offsets.
> But then, both those targets are unconditional ACCUMULATE_OUTGOING_ARGS.
> 
> And the testcase for the PR isn't miscompiled with additional
> -maccumulate-outgoing-args or say -mtune=intel which implies that.
> So, perhaps I could restrict that cselib_invalidate_mem (callmem[1]);
> to if (!ACCUMULATE_OUTGOING_ARGS).

I guess that then makes sense.

Richard.

> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to