Hi, thanks for the feedback!

It turned out to be an endianness issue and we needed to treat the
call to `store_bit_field` differently for machines with
BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN, like H8.
I've sent a new version
(https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668214.html).

Thanks,
Konstantinos.



On Sat, Oct 26, 2024 at 5:54 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 10/23/24 8:27 AM, Konstantinos Eleftheriou wrote:
> > From: kelefth <konstantinos.elefther...@vrull.eu>
> >
> > This pass detects cases of expensive store forwarding and tries to avoid 
> > them
> > by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> >
> >       strb    w2, [x1, 1]
> >       ldr     x0, [x1]      # Expensive store forwarding to larger load.
> >
> > To:
> >
> >       ldr     x0, [x1]
> >       strb    w2, [x1]
> >       bfi     x0, x2, 0, 8
> >
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following 
> > speedups
> > have been observed.
> >
> >    Neoverse-N1:      +29.4%
> >    Intel Coffeelake: +13.1%
> >    AMD 5950X:        +17.5%
> [ ... ]
> Seems to still have some correctness issues.  H8 reports this when the
> pass is enabled by default:
>
> > Tests that now fail, but worked before (8 tests):
> >
> > h8300-sim/-mh/-mint32: gcc: gcc.c-torture/execute/pr63843.c   -O2  
> > execution test
> [ ... ]
>
> It looks like we miss setting the high half of the register.  The good
> sequence looks like:
>
> > !       mov.b   @er2,r3l
> > !       mov.b   r3l,@(2,er7)
> >         mov.b   @(1,er2),r2l
> >         mov.b   r2l,@(3,er7)
> > !       mov.w   @(2,er7),r0
>
> Note the word (16 bit) move at the end of the sequence that sets both
> halves of the r0 register.
>
> The broken sequence looks like this:
>
> > !       mov.b   @er2,r0l
> >         mov.b   @(1,er2),r2l
> > +       mov.b   r0l,@(2,er7)
> >         mov.b   r2l,@(3,er7)
> > !       mov.b   r2l,r0l
>
> Note how all the assignments are byte sized and that nothing sets r0h.
> We get whatever value happened to be lying around in the high half of
> the register.
>
> You should be able to see this with an H8 cross compiler and shouldn't
> need a full toolchain to test.  Compile with -O2 -mh -mint32.  There is
> only one opportunity for SFB avoidance in pr63843.
>
> There's also a failure for bfin-elf, but I suspect it's ultimately the
> same underlying issue.  Obviously I'll retest once there's a fix.
>
> jeff
>
>

Reply via email to