On Tue, Feb 04, 2025 at 11:53:21AM +0100, Richard Biener wrote:
> > +         if (sp_derived_base)
> > +           if (cselib_val *v3
> > +               = cselib_lookup_1 (stack_pointer_rtx, Pmode, 0, VOIDmode))
> > +             {
> > +               bool ok = false;
> > +               HOST_WIDE_INT off = 0;
> > +               if (v3->val_rtx == sp_derived_base)
> > +                 ok = true;
> > +               else
> > +                 for (struct elt_loc_list *l = v3->locs; l; l = l->next)
> 
> isn't this invariant on the whole workset?  You are looking up
> the CSELIB val for stack_pointer_rtx and traversing it's locations.

The way it is written it isn't invariant, but you're right, it might be a
good idea to tweak it so that it is invariant: compute it lazily the first
time we need it during the single cselib_invalidate_memory call (with
SP_DERIVED_VALUE_P tests instead of == sp_derived_base) and then just
compare that base against sp_derived_base (perhaps with better names for
the two pairs of variables, the VALUE and the corresponding offset).
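
Something along these lines, as a rough sketch only; the sp_base,
sp_base_off and sp_base_computed names are just made up here for
illustration:

  /* Hypothetical locals, filled in lazily for the first MEM that needs
     them and then reused for the rest of the workset.  */
  rtx sp_base = NULL_RTX;
  HOST_WIDE_INT sp_base_off = 0;
  bool sp_base_computed = false;
  ...
  if (!sp_base_computed)
    {
      sp_base_computed = true;
      if (cselib_val *v3
          = cselib_lookup_1 (stack_pointer_rtx, Pmode, 0, VOIDmode))
        {
          if (SP_DERIVED_VALUE_P (v3->val_rtx))
            sp_base = v3->val_rtx;
          else
            for (struct elt_loc_list *l = v3->locs; l; l = l->next)
              if (GET_CODE (l->loc) == PLUS
                  && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
                  && CONST_INT_P (XEXP (l->loc, 1)))
                {
                  sp_base = XEXP (l->loc, 0);
                  sp_base_off = INTVAL (XEXP (l->loc, 1));
                  break;
                }
        }
    }
  /* The per-MEM test then reduces to sp_base == sp_derived_base plus a
     comparison of sp_derived_off against sp_base_off.  */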

> It looks like you want to know whether there's an offsetted stack
> location from sp_derived_base (a VALUE).  There might be multiple
> ones, but you stop at the first?  But still use that random offset?

There shouldn't be multiple ones by construction.
The first time we try to look up stack_pointer_rtx, we create a VALUE
with SP_DERIVED_VALUE_P.  If stack_pointer_rtx or (plus:P (sp) (const_int N))
is looked up later, with or without constant stack_pointer_rtx adjustments
in between, we get either that SP_DERIVED_VALUE_P VALUE or a VALUE with
a (plus:P SP_DERIVED_VALUE_P (const_int N)) as one of its locs.
This was done in r10-7665 to make sure we don't construct arbitrarily deep
chains of VALUEs when sp keeps changing a lot.
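Roughly, with made-up RTL and offsets (V1 and V2 are just names I'm using
here for the VALUEs):

  ;; First lookup of sp creates V1 with SP_DERIVED_VALUE_P set.
  (set (reg sp) (plus (reg sp) (const_int -32)))
  ;; A later lookup of sp gives a VALUE V2 whose locs include
  ;; (plus:P V1 (const_int -32)), not a chain V2 -> V1 -> ...
  ;; Looking up (plus (reg sp) (const_int 16)) afterwards gives a VALUE
  ;; whose locs include (plus:P V1 (const_int -16)).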

The point of the patch is that (unless proven otherwise) I believe that
these stores, sometimes above and sometimes below the stack pointer (with
the exception of the epilogue freeing the frame completely), are solely for
outgoing argument slots and that normally one should be able to track those
as stack pointer related (even on ia64 and the like), so they should have
the SP_DERIVED_VALUE_P VALUE or a PLUS of SP_DERIVED_VALUE_P and CONST_INT
somewhere in the address.

The code only attempts to optimize based on offsets if both the MEM's
address and the stack_pointer_rtx VALUE are derived from the same
SP_DERIVED_VALUE_P VALUE (the likely case; to get a different such VALUE
I think one would need to either not be in var-tracking and reset the
table, but then even the MEMs are invalidated, or be in var-tracking
across the sp -> fp setting).  In that case those aren't random offsets,
they are two offsets from the same base: x_addr is known to be
sp_derived_base + sp_derived_off and the current sp is known to be
sp_derived_base + off.  So one can just compare the offsets to know what
is below the stack pointer, or compare offsets adjusted by x's mem_size
for the PA case.
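Spelling the arithmetic out (mem_size being the size of the store):

  x_addr = sp_derived_base + sp_derived_off
  sp     = sp_derived_base + off
  =>  x_addr - sp = sp_derived_off - off

so e.g. on a target with a downward-growing stack the whole store
[x_addr, x_addr + mem_size) lies below the current sp exactly when
sp_derived_off + mem_size <= off; the PA case then uses the analogous
comparison involving x's mem_size as mentioned above.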

> > +         if (sp_derived_base == NULL_RTX)
> 
> So, this seems to take SP_DERIVED_VALUE_P conservative in the wrong way?
> It seems CSELIB sets this if it is sure the value is SP derived but
> for correctness reasons you have to assume a value is SP derived if
> you can't prove it is not?

I don't want to cause significant performance and debugging quality
regressions without proof that it is really needed.  I don't remember any
PRs about this until this one, so it can't be that common.
If I needed to differentiate between any stack pointer based MEMs vs.
something clearly guaranteed to be outside of the current stack frame
(+ perhaps the incoming arguments area or the like), then sure, I would
need to be conservative: any addressable automatic variable in the current
stack frame can have its address taken, go through some pointer arithmetic
and escape somewhere, etc.
But I can't imagine how addresses of outgoing argument slots could be
escaping somewhere or not have the stack pointer visible in the address
computation.
If one passes C++ non-PODs as arguments/return values, those are passed
by invisible reference; the objects are actually normal automatic VAR_DECLs,
so they aren't in the outgoing stack arguments area.  If I use, say, a large
POD aggregate argument passed on the stack, even if what I pass there has
its address taken and is perhaps self-referential, when it is copied into
the stack slot it still shouldn't refer to addresses within the outgoing
argument slot; there is just a bitwise copy.
Even in the case where we, say, use memcpy to do the bitwise copy into the
outgoing argument slot and therefore need to have the address loaded into
an argument, I think cselib should see it was stack pointer based.  And if,
say, CSE sees the same address is used in another memcpy, it still should
see that the original was sp derived.

>  Or is your argument that no such proof
> is necessary because the stack slot would have its address taken and
> would then necessarily be not a spill slot?  This is all about spill
> slots, right?  IIRC we do have spill slot MEMs specially marked
> with MEM_EXPR equal to get_spill_slot_decl, there's also may_be_sp_based_p
> using the somewhat broken RTL alias analysis (and there's
> static_reg_base_value with the various stack area kinds, not sure if
> helpful here).

Stack spills again go above the outgoing argument area, so they should be
within the current frame from the prologue to the epilogue and not affected
by the sp decrements and increments done for particular calls when not
really accumulating outgoing args.

I'm a little worried about aarch64 SVE and RISC-V; I don't know what
happens on poly-int sized outgoing argument pushes, because right now the
SP_DERIVED_VALUE_P handling in cselib.cc only deals with CONST_INT offsets.
But then, both those targets are unconditionally ACCUMULATE_OUTGOING_ARGS.

And the testcase for the PR isn't miscompiled with additional
-maccumulate-outgoing-args or, say, -mtune=intel which implies it.
So perhaps I could restrict that cselib_invalidate_mem (callmem[1]);
call to the !ACCUMULATE_OUTGOING_ARGS case.
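I.e. something along these lines (callmem[1] being the extra memory the
patch invalidates at call insns):

  /* Only do the extra invalidation when the target does not accumulate
     outgoing args, i.e. when sp actually moves around calls.  */
  if (!ACCUMULATE_OUTGOING_ARGS)
    cselib_invalidate_mem (callmem[1]);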

        Jakub
