On Thu, 21 Dec 2023 at 00:00, Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> If cse sees:
>
>   (set (reg R) (const_vector [A B ...]))
>
> it creates fake sets of the form:
>
>   (set R[0] A)
>   (set R[1] B)
>   ...
>
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
>
>   (set (reg R2) A)
>   ...(reg R2)...
>
> to try to use R[0] instead of (reg R2).
>
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
>
>   (set (reg T) A)
>
> the pass would go to considerable effort trying to work out whether:
>
>   (set R[0] A)
>
> or:
>
>   (set R[0] (reg T))
>
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
>
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
>
> The problem in the PR occurred if we had:
>
> (1) something that establishes an equivalence between a vector V1 of
>     M-bit scalar integers and a hard register H
>
> (2) something that establishes an equivalence between a vector V2 of
>     N-bit scalar integers, where N<M and where V2 contains at least 2
>     instances of V1[0]
>
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
>
>       /* See if we have a CONST_INT that is already in a register in a
>          wider mode.  */
>
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
>
>       /* At this point, ELT, if nonzero, points to a class of expressions
>          equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>          and SRC_RELATED, if nonzero, each contain additional equivalent
>          expressions.  Prune these latter expressions by deleting expressions
>          already in the equivalence class.
>
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
>
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
>
> We could avoid the double registration by adding:
>
>   for (elt = classp; elt; elt = elt->next_same_value)
>     if (rtx_equal_p (elt->exp, x))
>       return elt;
>
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
>
> The handling of fake sets therefore seems like the more fundamental bug.
>
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.
Hi Richard,
Thanks for the detailed explanation and fix!

Thanks,
Prathamesh
>
> gcc/
>         PR rtl-optimization/111702
>         * cse.cc (set::mode): Move earlier.
>         (set::src_in_memory, set::src_volatile): Convert to bitfields.
>         (set::is_fake_set): New member variable.
>         (add_to_set): Add an is_fake_set parameter.
>         (find_sets_in_insn): Update calls accordingly.
>         (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
>         try to optimize them either, or validate changes to them.
>
> gcc/
>         PR rtl-optimization/111702
>         * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
>  gcc/cse.cc                                  | 38 +++++++++++-------
>  gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +++++++++++++++++++++
>  2 files changed, 67 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>    unsigned dest_hash;
>    /* The SET_DEST, with SUBREG, etc., stripped.  */
>    rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>    /* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>    /* Nonzero if the SET_SRC contains something
>       whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artifical set that has been created to describe
> +     part of an insn's effect.  Zero means that RTL appears directly in
> +     the insn pattern.  */
> +  unsigned int is_fake_set : 1;
>    /* Hash value of constant equivalent for SET_SRC.  */
>    unsigned src_const_hash;
>    /* A constant equivalent for SET_SRC, if any.  */
> @@ -4229,12 +4233,15 @@ try_back_substitute_reg (rtx set, rtx_insn *insn)
>      }
>  }
>
> -/* Add an entry containing RTL X into SETS.  */
> +/* Add an entry containing RTL X into SETS.  IS_FAKE_SET is true if X is
> +   an artifical set that has been created to describe part of an insn's
> +   effect.  */
>  static inline void
> -add_to_set (vec<struct set> *sets, rtx x)
> +add_to_set (vec<struct set> *sets, rtx x, bool is_fake_set)
>  {
>    struct set entry = {};
>    entry.rtl = x;
> +  entry.is_fake_set = is_fake_set;
>    sets->safe_push (entry);
>  }
>
> @@ -4271,7 +4278,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> 
> *psets)
>                     && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 
> 1)))
>         {
>           /* First register the vector itself.  */
> -         add_to_set (psets, x);
> +         add_to_set (psets, x, false);
>           rtx src = SET_SRC (x);
>           /* Go over the constants of the CONST_VECTOR in forward order, to
>              put them in the same order in the SETS array.  */
> @@ -4281,11 +4288,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> 
> *psets)
>                  used to tell CSE how to get to a particular constant.  */
>               rtx y = simplify_gen_vec_select (SET_DEST (x), i);
>               gcc_assert (y);
> -             add_to_set (psets, gen_rtx_SET (y, CONST_VECTOR_ELT (src, i)));
> +             rtx set = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
> +             add_to_set (psets, set, true);
>             }
>         }
>        else
> -       add_to_set (psets, x);
> +       add_to_set (psets, x, false);
>      }
>    else if (GET_CODE (x) == PARALLEL)
>      {
> @@ -4306,7 +4314,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> 
> *psets)
>               else if (GET_CODE (SET_SRC (y)) == CALL)
>                 ;
>               else
> -               add_to_set (psets, y);
> +               add_to_set (psets, y, false);
>             }
>         }
>      }
> @@ -4616,6 +4624,7 @@ cse_insn (rtx_insn *insn)
>        int src_related_regcost = MAX_COST;
>        int src_elt_regcost = MAX_COST;
>        scalar_int_mode int_mode;
> +      bool is_fake_set = sets[i].is_fake_set;
>
>        dest = SET_DEST (sets[i].rtl);
>        src = SET_SRC (sets[i].rtl);
> @@ -4627,7 +4636,7 @@ cse_insn (rtx_insn *insn)
>        mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src);
>        sets[i].mode = mode;
>
> -      if (src_eqv)
> +      if (!is_fake_set && src_eqv)
>         {
>           machine_mode eqvmode = mode;
>           if (GET_CODE (dest) == STRICT_LOW_PART)
> @@ -4648,7 +4657,7 @@ cse_insn (rtx_insn *insn)
>        /* If this is a STRICT_LOW_PART assignment, src_eqv corresponds to the
>          value of the INNER register, not the destination.  So it is not
>          a valid substitution for the source.  But save it for later.  */
> -      if (GET_CODE (dest) == STRICT_LOW_PART)
> +      if (is_fake_set || GET_CODE (dest) == STRICT_LOW_PART)
>         src_eqv_here = 0;
>        else
>         src_eqv_here = src_eqv;
> @@ -5158,7 +5167,7 @@ cse_insn (rtx_insn *insn)
>
>        /* Terminate loop when replacement made.  This must terminate since
>           the current contents will be tested and will always be valid.  */
> -      while (1)
> +      while (!is_fake_set)
>         {
>           rtx trial;
>
> @@ -5425,7 +5434,8 @@ cse_insn (rtx_insn *insn)
>          with the head of the class.  If we do not do this, we will have
>          both registers live over a portion of the basic block.  This way,
>          their lifetimes will likely abut instead of overlapping.  */
> -      if (REG_P (dest)
> +      if (!is_fake_set
> +         && REG_P (dest)
>           && REGNO_QTY_VALID_P (REGNO (dest)))
>         {
>           int dest_q = REG_QTY (REGNO (dest));
> diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c 
> b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> new file mode 100644
> index 00000000000..8af2c54de3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target aarch64*-*-* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int data[];
> +
> +void __RTL (startwith ("vregs")) foo ()
> +{
> +  (function "foo"
> +    (insn-chain
> +      (block 2
> +       (edge-from entry (flags "FALLTHRU"))
> +       (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +       (insn 4 (set (reg:V16QI <0>)
> +                    (const_vector:V16QI [(const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)])))
> +       (insn 5 (set (reg:V2SI v0)
> +                    (const_vector:V2SI [(const_int 1) (const_int 0)])))
> +       (insn 6 (set (reg:V16QI v1)
> +                    (const_vector:V16QI [(const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)])))
> +        (insn 7 (set (reg:QI x0) (subreg:QI (reg:V16QI <0>) 0))
> +               (expr_list:REG_EQUAL (const_int 1) (nil)))
> +        (insn 8 (use (reg:V16QI <0>)))
> +        (insn 9 (use (reg:V2SI v0)))
> +        (insn 10 (use (reg:V16QI v1)))
> +        (insn 11 (use (reg:QI x0)))
> +       (edge-to exit (flags "FALLTHRU"))
> +      ) ;; block 2
> +    ) ;; insn-chain
> +  ) ;; function
> +}
> --
> 2.25.1
>

Reply via email to