https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, now looking myself. RTL expansion creates
(insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32])
        (reg:SI 49)) "t.c":12:13 -1
     (nil))
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32])
        (const_int 0 [0])) "t.c":13:12 -1
     (nil))
(insn 10 9 11 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ])
                (const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4 A32])
        (const_int 0 [0])) "t.c":13:12 -1
     (nil))
(insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32])
        (const_int 0 [0])) "t.c":14:12 -1
     (nil))
(insn 12 11 13 2 (set (reg:SI 51)
        (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32])) "t.c":15:17 -1
     (nil))
where insn 11 is the important one. Somehow on nios2 CSE1 removes that store:
deferring deletion of insn with uid = 11.
and we end up with
(insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32])
        (reg:SI 49)) "t.c":12:13 5 {movsi_internal}
     (expr_list:REG_DEAD (reg:SI 49)
        (nil)))
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32])
        (const_int 0 [0])) "t.c":13:12 5 {movsi_internal}
     (nil))
(insn 10 9 12 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ])
                (const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4 A32])
        (const_int 0 [0])) "t.c":13:12 5 {movsi_internal}
     (nil))
(insn 12 10 13 2 (set (reg:SI 51 [ bv_3(D)->b.u.f ])
        (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32])) "t.c":15:17 5 {movsi_internal}
     (expr_list:REG_DEAD (reg/v/f:SI 47 [ bv ])
        (nil)))
where there indeed is no scheduling barrier anymore.
I didn't know that CSE removes stores, nor why this only triggers on nios2;
it looks like some DF thing? Backtrace of the "DSE":
#0 delete_insn (insn=0x7ffff6bc3400)
at /space/rguenther/src/gcc/gcc/cfgrtl.c:135
#1 0x0000000000b0bfa5 in delete_insn_and_edges (insn=0x7ffff6bc3400)
at /space/rguenther/src/gcc/gcc/cfgrtl.c:237
#2 0x0000000001a9d8eb in cse_insn (insn=0x7ffff6bc3400)
at /space/rguenther/src/gcc/gcc/cse.c:5571
#3 0x0000000001aa0b76 in cse_extended_basic_block (ebb_data=0x7fffffffdc90)
at /space/rguenther/src/gcc/gcc/cse.c:6614
#4 0x0000000001aa10a5 in cse_main (f=0x7ffff6cce310, nregs=52)
at /space/rguenther/src/gcc/gcc/cse.c:6793
that's
  /* Similarly for no-op moves. */
  else if (noop_insn)
    {
      if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
        cse_cfg_altered = true;
      cse_cfg_altered |= delete_insn_and_edges (insn);
      /* No more processing for this set. */
      sets[i].rtl = 0;
so apparently it does redundant store removal as well. The flag is set here:
  /* Similarly, lots of targets don't allow no-op
     (set (mem x) (mem x)) moves. Even (set (reg x) (reg x))
     might be impossible for certain registers (like CC registers). */
  else if (n_sets == 1
           && !CALL_P (insn)
           && (MEM_P (trial) || REG_P (trial))
           && rtx_equal_p (trial, dest)
           && !side_effects_p (dest)
           && (cfun->can_delete_dead_exceptions
               || insn_nothrow_p (insn)))
    {
      SET_SRC (sets[i].rtl) = trial;
      noop_insn = true;
      break;
    }
where
(gdb) p debug_rtx (insn)
(insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32])
        (const_int 0 [0])) "t.c":14:12 5 {movsi_internal}
     (expr_list:REG_DEAD (reg/v/f:SI 48 [ ptr ])
        (nil)))
(gdb) p debug_rtx (trial)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32])
$4 = void
(gdb) p debug_rtx (dest)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32])
$6 = void
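To make the concern concrete, here is a toy model in C of a value-only
redundancy test like the rtx_equal_p (trial, dest) check above. None of it
is GCC code: toy_store, tbaa_tag and both predicates are hypothetical names,
the tag numbers are made up, and tbaa_tag merely stands in for whatever
type-based aliasing information an access carries. It assumes the store in
insn 11 looks like a no-op because insn 9 already left 0 at the same address.

```c
#include <stdbool.h>

/* Hypothetical toy model of a constant store; not GCC's rtx.  */
struct toy_store
{
  int base_reg;   /* pseudo holding the address, e.g. 48 for ptr */
  int offset;     /* byte offset from base_reg */
  int tbaa_tag;   /* stand-in for the access's type-based alias info */
  long value;     /* constant being stored */
};

static bool
toy_same_location (const struct toy_store *a, const struct toy_store *b)
{
  return a->base_reg == b->base_reg && a->offset == b->offset;
}

/* Value-only test: LATER counts as redundant if EARLIER already left the
   same value at the same address.  The TBAA tag is deliberately ignored.  */
static bool
toy_value_redundant (const struct toy_store *earlier,
                     const struct toy_store *later)
{
  return toy_same_location (earlier, later) && earlier->value == later->value;
}

/* Stricter test: additionally require matching TBAA tags, so a store that
   changes how later loads may be disambiguated is kept.  */
static bool
toy_tbaa_redundant (const struct toy_store *earlier,
                    const struct toy_store *later)
{
  return toy_value_redundant (earlier, later)
         && earlier->tbaa_tag == later->tbaa_tag;
}

/* Stores modelled on insns 9 and 11 from the dump; tags are invented.  */
static const struct toy_store toy_insn9 = { 48, 0, 1, 0 };
static const struct toy_store toy_insn11 = { 48, 0, 2, 0 };
```

With these inputs the value-only test calls the second store redundant,
while the stricter one keeps it. Whether a tag comparison is the right
condition for cse.c is not something this sketch settles.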
so it might be that the trigger is a target where sizeof(long long) == 2 *
sizeof(long) _and_ we split stores to the larger type.
(I tried to pick a set of types where sizeof is the same but the
alias sets are different; otherwise I'd have to cater for big- vs.
little-endian.)
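
For reference, a C source shape that would be consistent with the MEM_EXPRs
in the dumps above. This is a reconstruction from the RTL, not a copy of
t.c: the type layout, the value stored through bv (insn 8 only shows
reg:SI 49) and the helper long_long_is_two_longs are assumptions.

```c
#include <stdbool.h>

union U { long long i; long f; };
struct a { union U u; };
struct aa { struct a a; };
struct b { union U u; };
struct bb { struct b b; };

/* Hypothetical helper: true on targets matching the suspected trigger,
   where a long long store would be split into two long-sized stores.  */
static bool
long_long_is_two_longs (void)
{
  return sizeof (long long) == 2 * sizeof (long);
}

long
foo (struct bb *bv, void *ptr)
{
  struct aa *a = ptr;
  struct bb *b = ptr;
  bv->b.u.f = 1;     /* insn 8; the stored value is an assumption */
  a->a.u.i = 0;      /* insns 9 and 10 when the store is split */
  b->b.u.f = 0;      /* insn 11, the store CSE1 deletes */
  return bv->b.u.f;  /* insn 12 */
}
```

When bv and ptr point into the same object, foo is expected to return 0;
with insn 11 gone, nothing stops the load in insn 12 from being scheduled
across the remaining stores.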