> Am 20.12.2023 um 19:30 schrieb Richard Sandiford <richard.sandif...@arm.com>:
>
> If cse sees:
>
> (set (reg R) (const_vector [A B ...]))
>
> it creates fake sets of the form:
>
> (set R[0] A)
> (set R[1] B)
> ...
>
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets. This allows a sequence like:
>
> (set (reg R2) A)
> ...(reg R2)...
>
> to try to use R[0] instead of (reg R2).
>
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs. That is, if there was an earlier:
>
> (set (reg T) A)
>
> the pass would go to considerable effort trying to work out whether:
>
> (set R[0] A)
>
> or:
>
> (set R[0] (reg T))
>
> was more profitable. This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
>
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0]. We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
>
> The problem in the PR occurred if we had:
>
> (1) something that establishes an equivalence between a vector V1 of
> M-bit scalar integers and a hard register H
>
> (2) something that establishes an equivalence between a vector V2 of
> N-bit scalar integers, where N<M and where V2 contains at least 2
> instances of V1[0]
>
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
>
> /* See if we have a CONST_INT that is already in a register in a
> wider mode. */
>
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table. It therefore survived
> the purge in:
>
> /* At this point, ELT, if nonzero, points to a class of expressions
> equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
> and SRC_RELATED, if nonzero, each contain additional equivalent
> expressions. Prune these latter expressions by deleting expressions
> already in the equivalence class.
>
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
>
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
>
> We could avoid the double registration by adding:
>
> for (elt = classp; elt; elt = elt->next_same_value)
> if (rtx_equal_p (elt->exp, x))
> return elt;
>
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence. The latter seems more
> appealing from a compile-time perspective. But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
>
> The handling of fake sets therefore seems like the more fundamental bug.
>
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets. They only describe the "real" (first) set.
Agreed and OK
> gcc/
> PR rtl-optimization/111702
> * cse.cc (set::mode): Move earlier.
> (set::src_in_memory, set::src_volatile): Convert to bitfields.
> (set::is_fake_set): New member variable.
> (add_to_set): Add an is_fake_set parameter.
> (find_sets_in_insn): Update calls accordingly.
> (cse_insn): Do not apply REG_EQUAL notes to fake sets. Do not
> try to optimize them either, or validate changes to them.
>
> gcc/
> PR rtl-optimization/111702
> * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
> gcc/cse.cc | 38 +++++++++++-------
> gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +++++++++++++++++++++
> 2 files changed, 67 insertions(+), 14 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
> unsigned dest_hash;
> /* The SET_DEST, with SUBREG, etc., stripped. */
> rtx inner_dest;
> + /* Original machine mode, in case it becomes a CONST_INT. */
> + ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> /* Nonzero if the SET_SRC is in memory. */
> - char src_in_memory;
> + unsigned int src_in_memory : 1;
> /* Nonzero if the SET_SRC contains something
> whose value cannot be predicted and understood. */
> - char src_volatile;
> - /* Original machine mode, in case it becomes a CONST_INT. */
> - ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> + unsigned int src_volatile : 1;
> + /* Nonzero if RTL is an artifical set that has been created to describe
> + part of an insn's effect. Zero means that RTL appears directly in
> + the insn pattern. */
> + unsigned int is_fake_set : 1;
> /* Hash value of constant equivalent for SET_SRC. */
> unsigned src_const_hash;
> /* A constant equivalent for SET_SRC, if any. */
> @@ -4229,12 +4233,15 @@ try_back_substitute_reg (rtx set, rtx_insn *insn)
> }
> }
>
> -/* Add an entry containing RTL X into SETS. */
> +/* Add an entry containing RTL X into SETS. IS_FAKE_SET is true if X is
> + an artifical set that has been created to describe part of an insn's
> + effect. */
> static inline void
> -add_to_set (vec<struct set> *sets, rtx x)
> +add_to_set (vec<struct set> *sets, rtx x, bool is_fake_set)
> {
> struct set entry = {};
> entry.rtl = x;
> + entry.is_fake_set = is_fake_set;
> sets->safe_push (entry);
> }
>
> @@ -4271,7 +4278,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set>
> *psets)
> && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
> {
> /* First register the vector itself. */
> - add_to_set (psets, x);
> + add_to_set (psets, x, false);
> rtx src = SET_SRC (x);
> /* Go over the constants of the CONST_VECTOR in forward order, to
> put them in the same order in the SETS array. */
> @@ -4281,11 +4288,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set>
> *psets)
> used to tell CSE how to get to a particular constant. */
> rtx y = simplify_gen_vec_select (SET_DEST (x), i);
> gcc_assert (y);
> - add_to_set (psets, gen_rtx_SET (y, CONST_VECTOR_ELT (src, i)));
> + rtx set = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
> + add_to_set (psets, set, true);
> }
> }
> else
> - add_to_set (psets, x);
> + add_to_set (psets, x, false);
> }
> else if (GET_CODE (x) == PARALLEL)
> {
> @@ -4306,7 +4314,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set>
> *psets)
> else if (GET_CODE (SET_SRC (y)) == CALL)
> ;
> else
> - add_to_set (psets, y);
> + add_to_set (psets, y, false);
> }
> }
> }
> @@ -4616,6 +4624,7 @@ cse_insn (rtx_insn *insn)
> int src_related_regcost = MAX_COST;
> int src_elt_regcost = MAX_COST;
> scalar_int_mode int_mode;
> + bool is_fake_set = sets[i].is_fake_set;
>
> dest = SET_DEST (sets[i].rtl);
> src = SET_SRC (sets[i].rtl);
> @@ -4627,7 +4636,7 @@ cse_insn (rtx_insn *insn)
> mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src);
> sets[i].mode = mode;
>
> - if (src_eqv)
> + if (!is_fake_set && src_eqv)
> {
> machine_mode eqvmode = mode;
> if (GET_CODE (dest) == STRICT_LOW_PART)
> @@ -4648,7 +4657,7 @@ cse_insn (rtx_insn *insn)
> /* If this is a STRICT_LOW_PART assignment, src_eqv corresponds to the
> value of the INNER register, not the destination. So it is not
> a valid substitution for the source. But save it for later. */
> - if (GET_CODE (dest) == STRICT_LOW_PART)
> + if (is_fake_set || GET_CODE (dest) == STRICT_LOW_PART)
> src_eqv_here = 0;
> else
> src_eqv_here = src_eqv;
> @@ -5158,7 +5167,7 @@ cse_insn (rtx_insn *insn)
>
> /* Terminate loop when replacement made. This must terminate since
> the current contents will be tested and will always be valid. */
> - while (1)
> + while (!is_fake_set)
> {
> rtx trial;
>
> @@ -5425,7 +5434,8 @@ cse_insn (rtx_insn *insn)
> with the head of the class. If we do not do this, we will have
> both registers live over a portion of the basic block. This way,
> their lifetimes will likely abut instead of overlapping. */
> - if (REG_P (dest)
> + if (!is_fake_set
> + && REG_P (dest)
> && REGNO_QTY_VALID_P (REGNO (dest)))
> {
> int dest_q = REG_QTY (REGNO (dest));
> diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> new file mode 100644
> index 00000000000..8af2c54de3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target aarch64*-*-* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int data[];
> +
> +void __RTL (startwith ("vregs")) foo ()
> +{
> + (function "foo"
> + (insn-chain
> + (block 2
> + (edge-from entry (flags "FALLTHRU"))
> + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> + (insn 4 (set (reg:V16QI <0>)
> + (const_vector:V16QI [(const_int 1) (const_int 1)
> + (const_int 0) (const_int 0)
> + (const_int 1) (const_int 1)
> + (const_int 0) (const_int 0)
> + (const_int 1) (const_int 1)
> + (const_int 0) (const_int 0)
> + (const_int 1) (const_int 1)
> + (const_int 0) (const_int 0)])))
> + (insn 5 (set (reg:V2SI v0)
> + (const_vector:V2SI [(const_int 1) (const_int 0)])))
> + (insn 6 (set (reg:V16QI v1)
> + (const_vector:V16QI [(const_int 0) (const_int 0)
> + (const_int 1) (const_int 1)
> + (const_int 0) (const_int 0)
> + (const_int 1) (const_int 1)
> + (const_int 0) (const_int 0)
> + (const_int 1) (const_int 1)
> + (const_int 0) (const_int 0)
> + (const_int 1) (const_int 1)])))
> + (insn 7 (set (reg:QI x0) (subreg:QI (reg:V16QI <0>) 0))
> + (expr_list:REG_EQUAL (const_int 1) (nil)))
> + (insn 8 (use (reg:V16QI <0>)))
> + (insn 9 (use (reg:V2SI v0)))
> + (insn 10 (use (reg:V16QI v1)))
> + (insn 11 (use (reg:QI x0)))
> + (edge-to exit (flags "FALLTHRU"))
> + ) ;; block 2
> + ) ;; insn-chain
> + ) ;; function
> +}
> --
> 2.25.1
>