On Tue, 4 Feb 2025, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled on x86_64 during postreload.
> After reload (with IPA-RA figuring out that the calls don't modify any
> registers except %rax for the return value) postreload sees
> (insn 14 12 15 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
>                 (const_int 16 [0x10])) [0  S8 A64])
>         (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":18:7 95 {*movdi_internal}
>      (nil))
> (call_insn/i 15 14 16 2 (set (reg:SI 0 ax)
>         (call (mem:QI (symbol_ref:DI ("baz") [flags 0x3]  <function_decl 0x7ffb2e2bdf00 r>) [0 baz S1 A8])
>             (const_int 24 [0x18]))) "pr117239.c":18:7 1476 {*call_value}
>      (expr_list:REG_CALL_DECL (symbol_ref:DI ("baz") [flags 0x3]  <function_decl 0x7ffb2e2bdf00 baz>)
>         (expr_list:REG_EH_REGION (const_int 0 [0])
>             (nil)))
>     (nil))
> (insn 16 15 18 2 (parallel [
>             (set (reg/f:DI 7 sp)
>                 (plus:DI (reg/f:DI 7 sp)
>                     (const_int 24 [0x18])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr117239.c":18:7 285 {*adddi_1}
>      (expr_list:REG_ARGS_SIZE (const_int 0 [0])
>         (nil)))
> ...
> (call_insn/i 19 18 21 2 (set (reg:SI 0 ax)
>         (call (mem:QI (symbol_ref:DI ("foo") [flags 0x3]  <function_decl 0x7ffb2e2bdb00 l>) [0 foo S1 A8])
>             (const_int 0 [0]))) "pr117239.c":19:3 1476 {*call_value}
>      (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x3]  <function_decl 0x7ffb2e2bdb00 foo>)
>         (expr_list:REG_EH_REGION (const_int 0 [0])
>             (nil)))
>     (nil))
> (insn 21 19 26 2 (parallel [
>             (set (reg/f:DI 7 sp)
>                 (plus:DI (reg/f:DI 7 sp)
>                     (const_int -24 [0xffffffffffffffe8])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr117239.c":19:3 discrim 1 285 {*adddi_1}
>      (expr_list:REG_ARGS_SIZE (const_int 24 [0x18])
>         (nil)))
> (insn 26 21 24 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
>                 (const_int 16 [0x10])) [0  S8 A64])
>         (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":19:3 discrim 1 95 {*movdi_internal}
>      (nil))
> i.e.
>         movq    %rdx, 16(%rsp)
>         call    baz
>         addq    $24, %rsp
> ...
>         call    foo
>         subq    $24, %rsp
>         movq    %rdx, 16(%rsp)
> Now, postreload uses cselib, and cselib remembered that the %rdx value has
> been stored into 16(%rsp).  Both baz and foo are pure calls.  If they
> weren't, when processing those CALL_INSNs cselib would invalidate all MEMs:
>       if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>           || !(RTL_CONST_OR_PURE_CALL_P (insn)))
>         cselib_invalidate_mem (callmem);
> where callmem is (mem:BLK (scratch)).  But they are pure, so instead the
> code just invalidates the argument slots from CALL_INSN_FUNCTION_USAGE.
> The calls actually clobber more than that: even const/pure calls clobber
> all memory below the stack pointer.  And that is something that hasn't been
> invalidated.  In this failing testcase, the call to baz is not a big deal;
> we don't have anything remembered in memory below %rsp at that call.
> But then we increment %rsp by 24, so %rsp+16 is now 8 bytes below the stack
> pointer, and do the call to foo.  And that call now actually, not just in
> theory, clobbers the memory below the stack pointer (in particular it
> overwrites it with the return value).  But cselib does not invalidate it.
> Then %rsp is decremented again (in preparation for another call, to bar)
> and cselib processes the store of %rdx (which IPA-RA says has not been
> modified by either the baz or the foo call) to %rsp + 16; it sees the
> memory already holds that value, so it considers the store useless and
> removes it.  But the store is not useless: the call to foo has changed the
> memory, so the value needs to be stored again.
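
For reference, the offset arithmetic in the paragraph above as a tiny
standalone sketch (the constants 16 and 24 are taken from the insns quoted
earlier; the variable names are just illustrative):

#include <cassert>

int
main ()
{
  long slot_off = 16;   /* movq %rdx, 16(%rsp) stores at %rsp + 16.  */
  long sp_adjust = 24;  /* addq $24, %rsp executed before the call to foo.  */
  /* Relative to the adjusted stack pointer the remembered slot sits at
     16 - 24 = -8, i.e. 8 bytes below %rsp, exactly in the area that a
     call such as foo is allowed to clobber.  */
  assert (slot_off - sp_adjust == -8);
  return 0;
}
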
> 
> The following patch adds targeted invalidation of memory below the stack
> pointer (or, on SPARC, memory below stack pointer + 2047 when stack bias
> is used, or, on PA, memory above the stack pointer instead).
> Now, memory below the stack pointer is special: except for functions using
> alloca/VLAs, I believe no addressable memory should be there, it should be
> purely the outgoing function argument area, and if we take the address of
> some automatic variable, it should live all the time above the outgoing
> function argument area.  So, on top of just trying to flush memory below
> the stack pointer (represented by %rsp - PTRDIFF_MAX with PTRDIFF_MAX size
> on most arches), the patch tries to optimize and only invalidates memory
> whose address is clearly derived from the stack pointer (memory with other
> bases is not invalidated), and if we can prove the memory is at or above
> the current stack pointer (we see the same SP_DERIVED_VALUE_P base in both
> VALUEs), it also doesn't call canon_anti_dependence, which might just give
> up in certain cases.
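
As a minimal standalone sketch of the core comparison for the
STACK_GROWS_DOWNWARD case (this is not the patch code; the function name is
made up, sp_derived_off/off follow the naming used in the patch below, and
the SPARC stack-pointer-bias adjustment to off is left out):

#include <cstdint>

/* Assume both the MEM address and the current stack pointer are expressed
   as the same SP_DERIVED_VALUE_P base plus a constant:
     MEM address = base + sp_derived_off,  current SP = base + off.
   With a downward-growing stack, sp_derived_off >= off means the MEM is at
   or above the current stack pointer, so a const/pure call cannot clobber
   it through the area below the stack pointer and it need not be
   invalidated.  */
static bool
mem_at_or_above_sp_p (int64_t sp_derived_off, int64_t off)
{
  return sp_derived_off >= off;
}
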
> 
> I've gathered statistics from x86_64-linux and i686-linux
> bootstraps/regtests.  During the -m64 compilations from those, there were
> 3718396 + 42634 + 27761 cases of processing MEMs in cselib_invalidate_mem
> (callmem[1]) calls.  The first number is the number of MEMs not invalidated
> because of the optimization, i.e.
> +             if (sp_derived_base == NULL_RTX)
> +               {
> +                 has_mem = true;
> +                 num_mems++;
> +                 p = &(*p)->next;
> +                 continue;
> +               }
> in the patch, the second number is the number of MEMs not invalidated
> because canon_anti_dependence returned false, and the last number is the
> number of MEMs actually invalidated (so that is what hasn't been
> invalidated before this patch).  During the -m32 compilations the numbers
> were 1422412 + 39354 + 16509, with the same meaning.
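
In relative terms (simple arithmetic on the -m64 numbers above; the program
below is just a worked example, not part of the patch):

#include <cstdio>

int
main ()
{
  /* The -m64 counts quoted above.  */
  double skipped_by_opt = 3718396;   /* sp_derived_base == NULL_RTX path  */
  double kept_by_canon = 42634;      /* canon_anti_dependence said false  */
  double invalidated = 27761;        /* newly invalidated MEMs            */
  double total = skipped_by_opt + kept_by_canon + invalidated;
  /* Prints roughly 98.1%, 1.1% and 0.7% respectively.  */
  printf ("%.1f%% %.1f%% %.1f%%\n", 100 * skipped_by_opt / total,
          100 * kept_by_canon / total, 100 * invalidated / total);
  return 0;
}
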
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Note, when there is no red zone, in theory even the sp = sp + incr
> instruction invalidates memory below the new stack pointer, as a signal
> can come in and overwrite the memory.  So maybe we should be invalidating
> something at those instructions as well.  But in leaf functions we
> certainly can have even addressable automatic vars in the red zone (which
> would make it harder to distinguish); on the other hand, we normally
> aren't storing anything below the red zone, and in non-leaf functions it
> should normally be just the outgoing arguments area.
> 
> 2025-02-04  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR rtl-optimization/117239
>       * cselib.cc (callmem): Change type from rtx to rtx[2].
>       (cselib_preserve_only_values): Use callmem[0] rather than callmem.
>       (cselib_invalidate_mem): Optimize and don't try to invalidate
>       for the mem_rtx == callmem[1] case MEMs which clearly can't be
>       below the stack pointer.
>       (cselib_process_insn): Likewise.  For const/pure calls also
>       call cselib_invalidate_mem (callmem[1]).
>       (cselib_init): Initialize callmem[0] rather than callmem and also
>       initialize callmem[1].
> 
>       * gcc.dg/pr117239.c: New test.
> 
> --- gcc/cselib.cc.jj  2025-02-01 00:46:53.073275225 +0100
> +++ gcc/cselib.cc     2025-02-03 13:16:16.381772989 +0100
> @@ -248,8 +248,9 @@ static unsigned int *used_regs;
>  static unsigned int n_used_regs;
>  
>  /* We pass this to cselib_invalidate_mem to invalidate all of
> -   memory for a non-const call instruction.  */
> -static GTY(()) rtx callmem;
> +   memory for a non-const call instruction and memory below stack pointer
> +   for const/pure calls.  */
> +static GTY(()) rtx callmem[2];
>  
>  /* Set by discard_useless_locs if it deleted the last location of any
>     value.  */
> @@ -808,7 +809,7 @@ cselib_preserve_only_values (void)
>    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>      cselib_invalidate_regno (i, reg_raw_mode[i]);
>  
> -  cselib_invalidate_mem (callmem);
> +  cselib_invalidate_mem (callmem[0]);
>  
>    remove_useless_values ();
>  
> @@ -2644,6 +2645,97 @@ cselib_invalidate_mem (rtx mem_rtx)
>             p = &(*p)->next;
>             continue;
>           }
> +
> +       /* When invalidating memory below the stack pointer for const/pure
> +          calls and alloca/VLAs aren't used, attempt to optimize.  Values
> +          stored into area below the stack pointer shouldn't be addressable
> +          and should be stored just through stack pointer derived
> +          expressions, so don't invalidate MEMs not using stack derived
> +          addresses, or if the MEMs clearly aren't below the stack
> +          pointer.  */
> +       if (mem_rtx == callmem[1]
> +           && num_mems < param_max_cselib_memory_locations
> +           && GET_CODE (XEXP (x, 0)) == VALUE
> +           && !cfun->calls_alloca)
> +         {
> +           cselib_val *v2 = CSELIB_VAL_PTR (XEXP (x, 0));
> +           rtx sp_derived_base = NULL_RTX;
> +           HOST_WIDE_INT sp_derived_off = 0;
> +           if (SP_DERIVED_VALUE_P (v2->val_rtx))
> +             sp_derived_base = v2->val_rtx;
> +           else
> +             for (struct elt_loc_list *l = v2->locs; l; l = l->next)
> +               if (GET_CODE (l->loc) == PLUS
> +                   && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +                   && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +                   && CONST_INT_P (XEXP (l->loc, 1)))
> +                 {
> +                   sp_derived_base = XEXP (l->loc, 0);
> +                   sp_derived_off = INTVAL (XEXP (l->loc, 1));
> +                   break;
> +                 }
> +           if (sp_derived_base)
> +             if (cselib_val *v3
> +                 = cselib_lookup_1 (stack_pointer_rtx, Pmode, 0, VOIDmode))
> +               {
> +                 bool ok = false;
> +                 HOST_WIDE_INT off = 0;
> +                 if (v3->val_rtx == sp_derived_base)
> +                   ok = true;
> +                 else
> +                   for (struct elt_loc_list *l = v3->locs; l; l = l->next)

Isn't this invariant across the whole workset?  You are looking up
the cselib VALUE for stack_pointer_rtx and traversing its locations.
It looks like you want to know whether there's an offsetted stack
location from sp_derived_base (a VALUE).  There might be multiple
ones, but you stop at the first?  And still use that arbitrary offset?

> +                     if (GET_CODE (l->loc) == PLUS
> +                         && XEXP (l->loc, 0) == sp_derived_base
> +                         && CONST_INT_P (XEXP (l->loc, 1)))
> +                       {

So what did we actually do here?  This needs a comment, especially
regarding 'off' vs. 'sp_derived_off'.

> +                         ok = true;
> +                         off = INTVAL (XEXP (l->loc, 1));
> +                         break;
> +                       }
> +                 if (ok)
> +                   {
> +                     if (STACK_GROWS_DOWNWARD)
> +                       {
> +#ifdef STACK_ADDRESS_OFFSET
> +                         /* On SPARC take stack pointer bias into account as
> +                            well.  */
> +                         off
> +                           += (STACK_ADDRESS_OFFSET
> +                               - FIRST_PARM_OFFSET (current_function_decl));
> +#endif
> +                         if (sp_derived_off >= off)
> +                           /* x is at or above the current stack pointer,
> +                              no need to invalidate it.  */
> +                           sp_derived_base = NULL_RTX;
> +                       }
> +                     else
> +                       {
> +                         HOST_WIDE_INT sz;
> +                         enum machine_mode mode = GET_MODE (x);
> +                         if ((MEM_SIZE_KNOWN_P (x)
> +                              && MEM_SIZE (x).is_constant (&sz))
> +                             || (mode != BLKmode
> +                                 && GET_MODE_SIZE (mode).is_constant (&sz)))
> +                           if (sp_derived_off < off
> +                               && ((HOST_WIDE_INT)
> +                                   ((unsigned HOST_WIDE_INT) sp_derived_off
> +                                    + sz) <= off))
> +                             /* x's end is below or at the current stack
> +                                pointer in !STACK_GROWS_DOWNWARD target,
> +                                no need to invalidate it.  */
> +                             sp_derived_base = NULL_RTX;
> +                       }
> +                   }
> +               }
> +           if (sp_derived_base == NULL_RTX)

So, this seems to treat SP_DERIVED_VALUE_P as conservative in the wrong
direction?  It seems cselib sets this flag only when it is sure the value
is SP derived, but for correctness reasons you have to assume a value is
SP derived unless you can prove it is not?  Or is your argument that no
such proof is necessary because the stack slot would have its address
taken and would then necessarily not be a spill slot?  This is all about
spill slots, right?  IIRC we do have spill slot MEMs specially marked
with MEM_EXPR equal to get_spill_slot_decl; there's also may_be_sp_based_p
using the somewhat broken RTL alias analysis (and there's
static_reg_base_value with the various stack area kinds, not sure if
that's helpful here).

> +             {
> +               has_mem = true;
> +               num_mems++;
> +               p = &(*p)->next;
> +               continue;
> +             }
> +         }
> +
>         if (num_mems < param_max_cselib_memory_locations
>             && ! canon_anti_dependence (x, false, mem_rtx,
>                                         GET_MODE (mem_rtx), mem_addr))
> @@ -3196,14 +3288,20 @@ cselib_process_insn (rtx_insn *insn)
>        as if they were regular functions.  */
>        if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>         || !(RTL_CONST_OR_PURE_CALL_P (insn)))
> -     cselib_invalidate_mem (callmem);
> +     cselib_invalidate_mem (callmem[0]);
>        else
> -     /* For const/pure calls, invalidate any argument slots because
> -        they are owned by the callee.  */
> -     for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
> -       if (GET_CODE (XEXP (x, 0)) == USE
> -           && MEM_P (XEXP (XEXP (x, 0), 0)))
> -         cselib_invalidate_mem (XEXP (XEXP (x, 0), 0));
> +     {
> +       /* For const/pure calls, invalidate any argument slots because
> +          they are owned by the callee.  */
> +       for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
> +         if (GET_CODE (XEXP (x, 0)) == USE
> +             && MEM_P (XEXP (XEXP (x, 0), 0)))
> +           cselib_invalidate_mem (XEXP (XEXP (x, 0), 0));
> +       /* And invalidate memory below the stack (or above for
> +          !STACK_GROWS_DOWNWARD), as even const/pure call can invalidate
> +          that.  */
> +       cselib_invalidate_mem (callmem[1]);
> +     }
>      }
>  
>    cselib_record_sets (insn);
> @@ -3256,8 +3354,27 @@ cselib_init (int record_what)
>  
>    /* (mem:BLK (scratch)) is a special mechanism to conflict with everything,
>       see canon_true_dependence.  This is only created once.  */
> -  if (! callmem)
> -    callmem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
> +  if (! callmem[0])
> +    {
> +      callmem[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
> +      /* Similarly create a MEM representing roughly everything below
> +      the stack for STACK_GROWS_DOWNWARD targets or everything above
> +      it otherwise.  */
> +      if (STACK_GROWS_DOWNWARD)
> +     {
> +       unsigned HOST_WIDE_INT off = -(GET_MODE_MASK (Pmode) >> 1);
> +#ifdef STACK_ADDRESS_OFFSET
> +       /* On SPARC take stack pointer bias into account as well.  */
> +       off += (STACK_ADDRESS_OFFSET
> +               - FIRST_PARM_OFFSET (current_function_decl));
> +#endif
> +       callmem[1] = plus_constant (Pmode, stack_pointer_rtx, off);
> +     }
> +      else
> +     callmem[1] = stack_pointer_rtx;
> +      callmem[1] = gen_rtx_MEM (BLKmode, callmem[1]);
> +      set_mem_size (callmem[1], GET_MODE_MASK (Pmode) >> 1);
> +    }
>  
>    cselib_nregs = max_reg_num ();
>  
> --- gcc/testsuite/gcc.dg/pr117239.c.jj        2025-02-03 11:13:59.399159640 +0100
> +++ gcc/testsuite/gcc.dg/pr117239.c   2025-02-03 11:13:59.399159640 +0100
> @@ -0,0 +1,42 @@
> +/* PR rtl-optimization/117239 */
> +/* { dg-do run } */
> +/* { dg-options "-fno-inline -O2" } */
> +/* { dg-additional-options "-fschedule-insns" { target i?86-*-* x86_64-*-* } } */
> +
> +int a, b, c = 1, d;
> +
> +int
> +foo (void)
> +{
> +  return a;
> +}
> +
> +struct A {
> +  int e, f, g, h;
> +  short i;
> +  int j;
> +};
> +
> +void
> +bar (int x, struct A y)
> +{
> +  if (y.j == 1)
> +    c = 0;
> +}
> +
> +int
> +baz (struct A x)
> +{
> +  return b;
> +}
> +
> +int
> +main ()
> +{
> +  struct A k = { 0, 0, 0, 0, 0, 1 };
> +  d = baz (k);
> +  bar (foo (), k);
> +  if (c != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
