https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for the CSE issue we go through the equivalence chain and find

(gdb) p debug_rtx (p->exp)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4
A32])

5076              if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
5077                src_related = dest;
...
5121              if (rtx_equal_p (src_related, dest))
5122                src_related_cost = src_related_regcost = -1;

the first issue is that we're recording dest as src_related here losing the
opportunity to do a validity check later on.  If we fix that we can do the
usual validity check, copied from DSE:

diff --git a/gcc/cse.c b/gcc/cse.c
index 3e8724b3fed..f07bbdbebad 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -5074,7 +5074,7 @@ cse_insn (rtx_insn *insn)
             to prefer it.  Copy it to src_related.  The code below will
             then give it a negative cost.  */
          if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
-           src_related = dest;
+           src_related = p->exp;
        }

       /* Find the cheapest valid equivalent, trying all the available
@@ -5332,7 +5332,16 @@ cse_insn (rtx_insn *insn)
                   && rtx_equal_p (trial, dest)
                   && !side_effects_p (dest)
                   && (cfun->can_delete_dead_exceptions
-                      || insn_nothrow_p (insn)))
+                      || insn_nothrow_p (insn))
+                  /* We can only remove the later store if the earlier aliases
+                     at least all accesses the later one.  */
+                  && (!MEM_P (trial)
+                      || ((MEM_ALIAS_SET (dest) == MEM_ALIAS_SET (trial)
+                           || alias_set_subset_of (MEM_ALIAS_SET (dest),
+                                                   MEM_ALIAS_SET (trial)))
+                           && (!MEM_EXPR (trial)
+                               || refs_same_for_tbaa_p (MEM_EXPR (trial),
+                                                        MEM_EXPR (dest))))))
            {
              SET_SRC (sets[i].rtl) = trial;
              noop_insn = true;

that gives us for the testcase

foo:
        movi    r2, 1
        stw     r2, 0(r4)
        stw     zero, 0(r5)
        ldw     r2, 0(r4)
        stw     zero, 4(r5)
        ret

which looks correct to me.  I'm going to test the above on x86_64-linux.

Reply via email to