Re: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]

Richard Biener Wed, 20 Dec 2023 11:01:08 -0800

> Am 20.12.2023 um 19:30 schrieb Richard Sandiford <richard.sandif...@arm.com>:
> 
> If cse sees:
> 
>  (set (reg R) (const_vector [A B ...]))
> 
> it creates fake sets of the form:
> 
>  (set R[0] A)
>  (set R[1] B)
>  ...
> 
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
> 
>  (set (reg R2) A)
>  ...(reg R2)...
> 
> to try to use R[0] instead of (reg R2).
> 
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
> 
>  (set (reg T) A)
> 
> the pass would go to considerable effort trying to work out whether:
> 
>  (set R[0] A)
> 
> or:
> 
>  (set R[0] (reg T))
> 
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
> 
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
> 
> The problem in the PR occurred if we had:
> 
> (1) something that establishes an equivalence between a vector V1 of
>    M-bit scalar integers and a hard register H
> 
> (2) something that establishes an equivalence between a vector V2 of
>    N-bit scalar integers, where N<M and where V2 contains at least 2
>    instances of V1[0]
> 
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
> 
>      /* See if we have a CONST_INT that is already in a register in a
>     wider mode.  */
> 
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
> 
>      /* At this point, ELT, if nonzero, points to a class of expressions
>         equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>     and SRC_RELATED, if nonzero, each contain additional equivalent
>     expressions.  Prune these latter expressions by deleting expressions
>     already in the equivalence class.
> 
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
> 
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
> 
> We could avoid the double registration by adding:
> 
>  for (elt = classp; elt; elt = elt->next_same_value)
>    if (rtx_equal_p (elt->exp, x))
>      return elt;
> 
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
> 
> The handling of fake sets therefore seems like the more fundamental bug.
> 
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.

Agreed and OK

> gcc/
>    PR rtl-optimization/111702
>    * cse.cc (set::mode): Move earlier.
>    (set::src_in_memory, set::src_volatile): Convert to bitfields.
>    (set::is_fake_set): New member variable.
>    (add_to_set): Add an is_fake_set parameter.
>    (find_sets_in_insn): Update calls accordingly.
>    (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
>    try to optimize them either, or validate changes to them.
> 
> gcc/
>    PR rtl-optimization/111702
>    * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
> gcc/cse.cc                                  | 38 +++++++++++-------
> gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +++++++++++++++++++++
> 2 files changed, 67 insertions(+), 14 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> 
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>   unsigned dest_hash;
>   /* The SET_DEST, with SUBREG, etc., stripped.  */
>   rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>   /* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>   /* Nonzero if the SET_SRC contains something
>      whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artifical set that has been created to describe
> +     part of an insn's effect.  Zero means that RTL appears directly in
> +     the insn pattern.  */
> +  unsigned int is_fake_set : 1;
>   /* Hash value of constant equivalent for SET_SRC.  */
>   unsigned src_const_hash;
>   /* A constant equivalent for SET_SRC, if any.  */
> @@ -4229,12 +4233,15 @@ try_back_substitute_reg (rtx set, rtx_insn *insn)
>     }
> }
> 
> -/* Add an entry containing RTL X into SETS.  */
> +/* Add an entry containing RTL X into SETS.  IS_FAKE_SET is true if X is
> +   an artifical set that has been created to describe part of an insn's
> +   effect.  */
> static inline void
> -add_to_set (vec<struct set> *sets, rtx x)
> +add_to_set (vec<struct set> *sets, rtx x, bool is_fake_set)
> {
>   struct set entry = {};
>   entry.rtl = x;
> +  entry.is_fake_set = is_fake_set;
>   sets->safe_push (entry);
> }
> 
> @@ -4271,7 +4278,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> 
> *psets)
>            && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
>    {
>      /* First register the vector itself.  */
> -      add_to_set (psets, x);
> +      add_to_set (psets, x, false);
>      rtx src = SET_SRC (x);
>      /* Go over the constants of the CONST_VECTOR in forward order, to
>         put them in the same order in the SETS array.  */
> @@ -4281,11 +4288,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> 
> *psets)
>         used to tell CSE how to get to a particular constant.  */
>          rtx y = simplify_gen_vec_select (SET_DEST (x), i);
>          gcc_assert (y);
> -          add_to_set (psets, gen_rtx_SET (y, CONST_VECTOR_ELT (src, i)));
> +          rtx set = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
> +          add_to_set (psets, set, true);
>        }
>    }
>       else
> -    add_to_set (psets, x);
> +    add_to_set (psets, x, false);
>     }
>   else if (GET_CODE (x) == PARALLEL)
>     {
> @@ -4306,7 +4314,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> 
> *psets)
>          else if (GET_CODE (SET_SRC (y)) == CALL)
>        ;
>          else
> -        add_to_set (psets, y);
> +        add_to_set (psets, y, false);
>        }
>    }
>     }
> @@ -4616,6 +4624,7 @@ cse_insn (rtx_insn *insn)
>       int src_related_regcost = MAX_COST;
>       int src_elt_regcost = MAX_COST;
>       scalar_int_mode int_mode;
> +      bool is_fake_set = sets[i].is_fake_set;
> 
>       dest = SET_DEST (sets[i].rtl);
>       src = SET_SRC (sets[i].rtl);
> @@ -4627,7 +4636,7 @@ cse_insn (rtx_insn *insn)
>       mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src);
>       sets[i].mode = mode;
> 
> -      if (src_eqv)
> +      if (!is_fake_set && src_eqv)
>    {
>      machine_mode eqvmode = mode;
>      if (GET_CODE (dest) == STRICT_LOW_PART)
> @@ -4648,7 +4657,7 @@ cse_insn (rtx_insn *insn)
>       /* If this is a STRICT_LOW_PART assignment, src_eqv corresponds to the
>     value of the INNER register, not the destination.  So it is not
>     a valid substitution for the source.  But save it for later.  */
> -      if (GET_CODE (dest) == STRICT_LOW_PART)
> +      if (is_fake_set || GET_CODE (dest) == STRICT_LOW_PART)
>    src_eqv_here = 0;
>       else
>    src_eqv_here = src_eqv;
> @@ -5158,7 +5167,7 @@ cse_insn (rtx_insn *insn)
> 
>       /* Terminate loop when replacement made.  This must terminate since
>          the current contents will be tested and will always be valid.  */
> -      while (1)
> +      while (!is_fake_set)
>    {
>      rtx trial;
> 
> @@ -5425,7 +5434,8 @@ cse_insn (rtx_insn *insn)
>     with the head of the class.  If we do not do this, we will have
>     both registers live over a portion of the basic block.  This way,
>     their lifetimes will likely abut instead of overlapping.  */
> -      if (REG_P (dest)
> +      if (!is_fake_set
> +      && REG_P (dest)
>      && REGNO_QTY_VALID_P (REGNO (dest)))
>    {
>      int dest_q = REG_QTY (REGNO (dest));
> diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c 
> b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> new file mode 100644
> index 00000000000..8af2c54de3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target aarch64*-*-* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int data[];
> +
> +void __RTL (startwith ("vregs")) foo ()
> +{
> +  (function "foo"
> +    (insn-chain
> +      (block 2
> +    (edge-from entry (flags "FALLTHRU"))
> +    (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +    (insn 4 (set (reg:V16QI <0>)
> +             (const_vector:V16QI [(const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)])))
> +    (insn 5 (set (reg:V2SI v0)
> +             (const_vector:V2SI [(const_int 1) (const_int 0)])))
> +    (insn 6 (set (reg:V16QI v1)
> +             (const_vector:V16QI [(const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)])))
> +        (insn 7 (set (reg:QI x0) (subreg:QI (reg:V16QI <0>) 0))
> +        (expr_list:REG_EQUAL (const_int 1) (nil)))
> +        (insn 8 (use (reg:V16QI <0>)))
> +        (insn 9 (use (reg:V2SI v0)))
> +        (insn 10 (use (reg:V16QI v1)))
> +        (insn 11 (use (reg:QI x0)))
> +    (edge-to exit (flags "FALLTHRU"))
> +      ) ;; block 2
> +    ) ;; insn-chain
> +  ) ;; function
> +}
> --
> 2.25.1
>
Re: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]

Reply via email to