Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes: > Hi Richard, > > Thanks for the feedback! We have sent a new version that uses > lowpart_subreg > (https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682835.html). > We had tried that before, but we were mishandling the case where there > are multiple stores with the same offset as the load.
Thanks, I'll have a look. > As for `it->offset`, that's actually the offset difference between the > store and the load (we're trying to find the store with the same > offset as the load), so the endianness should be irrelevant in that > case. But I thought the code was allowing multiple stores to be forwarded to a single (wider) load. E.g. 4 individual byte stores at address X, X+1, X+2 and X+3 could be forwarded to a 4-byte load at address X. And the code I mentioned is handling the least significant byte by zero-extending it. For big-endian targets, the least significant byte should come from address X+3 rather than address X. The byte at address X (i.e. the byte with the equal offset) should instead go in the most significant byte, typically using a shift left. Richard > > Konstantinos > > On Tue, Apr 29, 2025 at 8:48 PM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes: >> > During the base register initialization, when we are eliminating the load >> > instruction, we were calling `emit_move_insn` on registers of the same >> > size but of different mode in some cases, causing an ICE. >> > >> > This patch fixes this, by adding a check for the modes to match before >> > calling `emit_move_insn`. >> > >> > Bootstrapped/regtested on AArch64 and x86_64. >> > >> > PR rtl-optimization/119884 >> > >> > gcc/ChangeLog: >> > >> > * avoid-store-forwarding.cc (process_store_forwarding): >> > Added check to ensure that the register modes match >> > before calling `emit_move_insn`. >> > >> > gcc/testsuite/ChangeLog: >> > >> > * gcc.target/i386/pr119884.c: New test. >> > --- >> > gcc/avoid-store-forwarding.cc | 3 ++- >> > gcc/testsuite/gcc.target/i386/pr119884.c | 13 +++++++++++++ >> > 2 files changed, 15 insertions(+), 1 deletion(-) >> > create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c >> > >> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc >> > index ded8d7e596e0..aec05c22ac37 100644 >> > --- a/gcc/avoid-store-forwarding.cc >> > +++ b/gcc/avoid-store-forwarding.cc >> > @@ -244,7 +244,8 @@ process_store_forwarding (vec<store_fwd_info> &stores, >> > rtx_insn *load_insn, >> > GET_MODE_BITSIZE (GET_MODE (it->mov_reg)))) >> > base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg); >> > >> > - if (base_reg) >> > + /* Generate a move instruction, only when the modes match. */ >> > + if (base_reg && dest_mode == GET_MODE (base_reg)) >> > { >> > rtx_insn *move0 = emit_move_insn (dest, base_reg); >> > if (recog_memoized (move0) >= 0) >> >> Is this a complete fix though? It looks like: >> >> rtx base_reg = it->mov_reg; >> if (known_gt (GET_MODE_BITSIZE (dest_mode), >> GET_MODE_BITSIZE (GET_MODE (it->mov_reg)))) >> base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg); >> >> implicitly assumes that dest_mode and GET_MODE (it->mov_reg) are both >> integer modes. >> >> Given that this code only triggers if we're going to eliminate the load, >> and thus presumably define all the other bytes of DEST_MODE later, >> couldn't we just use lowpart_subreg? That way we can cope with different >> like-sized modes and with "extensions" from one mode to another. Of course, >> there might be cases where doing this for different modes would trigger >> unwanted cross-file register transfers, but that seems like a general >> risk with forwarding. >> >> Not related to this PR, but on the same piece of code: >> >> /* If we're eliminating the load then find the store with zero offset >> and use it as the base register to avoid a bit insert if possible. >> */ >> if (load_elim && it->offset == 0) >> >> Doesn't this offset check need to take endianness into account? >> Byte 0 would normally be the msb for big-endian targets. >> >> Thanks, >> Richard >> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr119884.c >> > b/gcc/testsuite/gcc.target/i386/pr119884.c >> > new file mode 100644 >> > index 000000000000..34d5b689244d >> > --- /dev/null >> > +++ b/gcc/testsuite/gcc.target/i386/pr119884.c >> > @@ -0,0 +1,13 @@ >> > +/* { dg-do compile } */ >> > +/* { dg-options "-O2 -fno-dse -favoid-store-forwarding" } */ >> > + >> > +typedef __attribute__((__vector_size__(64))) char V; >> > +char c; >> > +V v; >> > + >> > +char >> > +foo() >> > +{ >> > + v *= c; >> > + return v[0]; >> > +} >> > \ No newline at end of file