Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes:
> Hi Richard,
>
> Thanks for the feedback! We have sent a new version that uses
> lowpart_subreg 
> (https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682835.html).
> We had tried that before, but we were mishandling the case where there
> are multiple stores with the same offset as the load.

Thanks, I'll have a look.

> As for `it->offset`, that's actually the offset difference between the
> store and the load (we're trying to find the store with the same
> offset as the load), so the endianness should be irrelevant in that
> case.

But I thought the code was allowing multiple stores to be forwarded to
a single (wider) load.  E.g. 4 individual byte stores at address X, X+1,
X+2 and X+3 could be forwarded to a 4-byte load at address X.  And the code
I mentioned is handling the least significant byte by zero-extending it.

For big-endian targets, the least significant byte should come from
address X+3 rather than address X.  The byte at address X (i.e. the
byte with the equal offset) should instead go in the most significant
byte, typically using a shift left.

Richard

>
> Konstantinos
>
> On Tue, Apr 29, 2025 at 8:48 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes:
>> > During the base register initialization, when we are eliminating the load
>> > instruction, we were calling `emit_move_insn` on registers of the same
>> > size but of different mode in some cases, causing an ICE.
>> >
>> > This patch fixes this, by adding a check for the modes to match before
>> > calling `emit_move_insn`.
>> >
>> > Bootstrapped/regtested on AArch64 and x86_64.
>> >
>> >       PR rtl-optimization/119884
>> >
>> > gcc/ChangeLog:
>> >
>> >       * avoid-store-forwarding.cc (process_store_forwarding):
>> >       Added check to ensure that the register modes match
>> >       before calling `emit_move_insn`.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >       * gcc.target/i386/pr119884.c: New test.
>> > ---
>> >  gcc/avoid-store-forwarding.cc            |  3 ++-
>> >  gcc/testsuite/gcc.target/i386/pr119884.c | 13 +++++++++++++
>> >  2 files changed, 15 insertions(+), 1 deletion(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c
>> >
>> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
>> > index ded8d7e596e0..aec05c22ac37 100644
>> > --- a/gcc/avoid-store-forwarding.cc
>> > +++ b/gcc/avoid-store-forwarding.cc
>> > @@ -244,7 +244,8 @@ process_store_forwarding (vec<store_fwd_info> &stores, 
>> > rtx_insn *load_insn,
>> >                       GET_MODE_BITSIZE (GET_MODE (it->mov_reg))))
>> >           base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg);
>> >
>> > -       if (base_reg)
>> > +       /* Generate a move instruction, only when the modes match.  */
>> > +       if (base_reg && dest_mode == GET_MODE (base_reg))
>> >           {
>> >             rtx_insn *move0 = emit_move_insn (dest, base_reg);
>> >             if (recog_memoized (move0) >= 0)
>>
>> Is this a complete fix though?  It looks like:
>>
>>           rtx base_reg = it->mov_reg;
>>           if (known_gt (GET_MODE_BITSIZE (dest_mode),
>>                         GET_MODE_BITSIZE (GET_MODE (it->mov_reg))))
>>             base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg);
>>
>> implicitly assumes that dest_mode and GET_MODE (it->mov_reg) are both
>> integer modes.
>>
>> Given that this code only triggers if we're going to eliminate the load,
>> and thus presumably define all the other bytes of DEST_MODE later,
>> couldn't we just use lowpart_subreg?  That way we can cope with different
>> like-sized modes and with "extensions" from one mode to another.  Of course,
>> there might be cases where doing this for different modes would trigger
>> unwanted cross-file register transfers, but that seems like a general
>> risk with forwarding.
>>
>> Not related to this PR, but on the same piece of code:
>>
>>       /* If we're eliminating the load then find the store with zero offset
>>          and use it as the base register to avoid a bit insert if possible.  
>> */
>>       if (load_elim && it->offset == 0)
>>
>> Doesn't this offset check need to take endianness into account?
>> Byte 0 would normally be the msb for big-endian targets.
>>
>> Thanks,
>> Richard
>>
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr119884.c 
>> > b/gcc/testsuite/gcc.target/i386/pr119884.c
>> > new file mode 100644
>> > index 000000000000..34d5b689244d
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr119884.c
>> > @@ -0,0 +1,13 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fno-dse -favoid-store-forwarding" } */
>> > +
>> > +typedef __attribute__((__vector_size__(64))) char V;
>> > +char c;
>> > +V v;
>> > +
>> > +char
>> > +foo()
>> > +{
>> > +  v *= c;
>> > +  return v[0];
>> > +}
>> > \ No newline at end of file

Reply via email to