Hi, thanks for the feedback! It turned out to be an endianness issue and we needed to treat the call to `store_bit_field` differently for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN, like H8. I've sent a new version (https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668214.html).
Thanks, Konstantinos. On Sat, Oct 26, 2024 at 5:54 PM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 10/23/24 8:27 AM, Konstantinos Eleftheriou wrote: > > From: kelefth <konstantinos.elefther...@vrull.eu> > > > > This pass detects cases of expensive store forwarding and tries to avoid > > them > > by reordering the stores and using suitable bit insertion sequences. > > For example it can transform this: > > > > strb w2, [x1, 1] > > ldr x0, [x1] # Expensive store forwarding to larger load. > > > > To: > > > > ldr x0, [x1] > > strb w2, [x1] > > bfi x0, x2, 0, 8 > > > > Assembly like this can appear with bitfields or type punning / unions. > > On stress-ng when running the cpu-union microbenchmark the following > > speedups > > have been observed. > > > > Neoverse-N1: +29.4% > > Intel Coffeelake: +13.1% > > AMD 5950X: +17.5% > [ ... ] > Seems to still have some correctness issues. H8 reports this when the > pass is enabled by default: > > > Tests that now fail, but worked before (8 tests): > > > > h8300-sim/-mh/-mint32: gcc: gcc.c-torture/execute/pr63843.c -O2 > > execution test > [ ... ] > > It looks like we miss setting the high half of the register. The good > sequence looks like: > > > ! mov.b @er2,r3l > > ! mov.b r3l,@(2,er7) > > mov.b @(1,er2),r2l > > mov.b r2l,@(3,er7) > > ! mov.w @(2,er7),r0 > > Note the word (16 bit) move at the end of the sequence that sets both > halves of the r0 register. > > The broken sequence looks like this: > > > ! mov.b @er2,r0l > > mov.b @(1,er2),r2l > > + mov.b r0l,@(2,er7) > > mov.b r2l,@(3,er7) > > ! mov.b r2l,r0l > > Note how all the assignments are byte sized and that nothing sets r0h. > We get whatever value happened to be lying around in the high half of > the register. > > You should be able to see this with an H8 cross compiler and shouldn't > need a full toolchain to test. Compile with -O2 -mh -mint32. There is > only one opportunity for SFB avoidance in pr63843. > > There's also a failure for bfin-elf, but I suspect it's ultimately the > same underlying issue. Obviously I'll retest once there's a fix. > > jeff > >