[Bug target/64345] [SH] Improve single bit extraction

cvs-commit at gcc dot gnu.org via Gcc-bugs Mon, 27 Oct 2025 05:59:55 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64345


--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:e0f27973e6eedf247e53d797b6ed166cba711de4

commit r16-4651-ge0f27973e6eedf247e53d797b6ed166cba711de4
Author: Jeff Law <[email protected]>
Date:   Mon Oct 27 06:58:09 2025 -0600

    [RISC-V][PR target/64345][PR tree-optimization/80770] Improve simple bit extractions on RISC-V

    Whee.  So what got me wandering down this path was looking for a good bug for
    Shreya or Austin and concluding this one would be dreadful for both.

    We're basically looking at single bit extractions where there's a bit-not
    somewhere in the sequence.

    A few examples from the motivating PR64345.  They were written for SH, but
    aren't handled well on RISC-V either.

    > unsigned int test0 (unsigned int x)
    > {
    >   return ((x >> 4) ^ 1) & 1;
    > }
    >
    > unsigned int test1 (unsigned int x)
    > {
    >   return ((x >> 4) & 1) ^ 1;
    > }
    >
    > unsigned int test2 (unsigned int x)
    > {
    >   return ~(x >> 4) & 1;
    > }
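    All three functions return the complement of bit 4 of x.  A quick C sanity
    check (a sketch, not part of the patch or of the new pr64345.c test;
    flip_then_extract and count_mismatches are hypothetical helpers) compares
    them against a "flip bit 4, then extract bit 4" form:

```c
#include <assert.h>
#include <stdint.h>

/* The three PR64345 test cases, as quoted above.  */
unsigned int test0 (unsigned int x) { return ((x >> 4) ^ 1) & 1; }
unsigned int test1 (unsigned int x) { return ((x >> 4) & 1) ^ 1; }
unsigned int test2 (unsigned int x) { return ~(x >> 4) & 1; }

/* Hypothetical helper: flip bit 4 first, then extract it.  */
unsigned int flip_then_extract (unsigned int x)
{
  return ((x ^ 0x10u) >> 4) & 1u;
}

/* Number of inputs in [0, limit) on which any of the four forms disagree.  */
unsigned long count_mismatches (uint32_t limit)
{
  unsigned long bad = 0;
  for (uint32_t x = 0; x < limit; x++)
    {
      unsigned int want = flip_then_extract (x);
      if (test0 (x) != want || test1 (x) != want || test2 (x) != want)
        bad++;
    }
  return bad;
}
```

    Only bit 4 of the input affects the result, so checking the low 16 bits
    already exercises both cases; an exhaustive 32-bit loop works too, just
    more slowly.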

    Right now those generate sequences like this:

    >         li      a5,1
    >         srliw   a0,a0,4
    >         andn    a0,a5,a0

    But we can do better.  This is semantically equivalent, two bytes shorter and
    at least as fast.

    >         xori    a0,a0,16        # 8     [c=4 l=4]  *xordi3/1
    >         bexti   a0,a0,4 # 16    [c=4 l=4]  *bexti
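    To see why the two sequences agree, here is a rough C model of the RV64
    instructions involved.  It is an illustrative sketch, not GCC or simulator
    code: the helper names simply mirror the mnemonics, and it assumes the RV64
    convention that a 32-bit argument arrives sign-extended in a 64-bit
    register.

```c
#include <assert.h>
#include <stdint.h>

/* RV64 keeps 32-bit values sign-extended in 64-bit registers.  */
static uint64_t sext32 (uint32_t v)
{
  return (v & 0x80000000u) ? (uint64_t) v | UINT64_C (0xFFFFFFFF00000000)
                           : (uint64_t) v;
}

/* Simplified instruction models; a register is a uint64_t.  */
static uint64_t li (int64_t imm) { return (uint64_t) imm; }
static uint64_t srliw (uint64_t rs1, unsigned shamt)
{
  return sext32 ((uint32_t) rs1 >> shamt);   /* 32-bit logical shift */
}
static uint64_t andn (uint64_t rs1, uint64_t rs2) { return rs1 & ~rs2; }
static uint64_t xori (uint64_t rs1, int32_t imm12)
{
  return rs1 ^ (uint64_t) (int64_t) imm12;   /* immediate is sign-extended */
}
static uint64_t bexti (uint64_t rs1, unsigned shamt) { return (rs1 >> shamt) & 1; }

/* li a5,1; srliw a0,a0,4; andn a0,a5,a0 */
uint64_t old_seq (uint64_t a0)
{
  uint64_t a5 = li (1);
  a0 = srliw (a0, 4);
  return andn (a5, a0);
}

/* xori a0,a0,16; bexti a0,a0,4 */
uint64_t new_seq (uint64_t a0)
{
  a0 = xori (a0, 16);
  return bexti (a0, 4);
}

/* Count 32-bit inputs in [0, limit) on which the two sequences disagree.  */
unsigned long count_seq_mismatches (uint32_t limit)
{
  unsigned long bad = 0;
  for (uint32_t x = 0; x < limit; x++)
    if (old_seq (sext32 (x)) != new_seq (sext32 (x)))
      bad++;
  return bad;
}
```

    The old sequence needs the constant 1 in a register because andn has no
    immediate form; the new one folds the NOT into an xori with the immediate
    16 (1 << 4) and lets bexti do the extraction.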

    The core problem is the little white lie we have for and-not:

    > (define_insn_and_split "*<optab>_not_const<mode>"
    >   [(set (match_operand:X 0 "register_operand" "=r")
    >        (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
    >               (match_operand:X 2 "const_arith_operand" "I")))

    There is no such insn.  andn, orn and xnor do not accept constants.  But
    pretending we do may help generate better performing code in some cases.
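    As an illustration of the lie for the AND and IOR forms (a sketch under
    assumptions, not the actual split in bitmanip.md), the C model below
    contrasts what the RTL pattern claims, one operation on a register and a
    constant, with what Zbb hardware has to do: materialize the constant with
    li, then use the register-register andn/orn.  The *_rtl and *_split
    function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Zbb register-register forms: andn rd,rs1,rs2 = rs1 & ~rs2;
   orn rd,rs1,rs2 = rs1 | ~rs2.  Neither takes an immediate.  */
static uint64_t andn (uint64_t rs1, uint64_t rs2) { return rs1 & ~rs2; }
static uint64_t orn (uint64_t rs1, uint64_t rs2) { return rs1 | ~rs2; }

/* What the pattern claims: (and (not x) C) or (ior (not x) C) as one insn.  */
uint64_t and_not_const_rtl (uint64_t x, int64_t c) { return ~x & (uint64_t) c; }
uint64_t ior_not_const_rtl (uint64_t x, int64_t c) { return ~x | (uint64_t) c; }

/* What the hardware needs: two instructions, li then andn/orn.  */
uint64_t and_not_const_split (uint64_t x, int64_t c)
{
  uint64_t t = (uint64_t) c;   /* li   t,C    */
  return andn (t, x);          /* andn rd,t,x */
}

uint64_t ior_not_const_split (uint64_t x, int64_t c)
{
  uint64_t t = (uint64_t) c;   /* li   t,C    */
  return orn (t, x);           /* orn  rd,t,x */
}
```

    Both versions compute the same value; the difference is that combine counts
    the first as a single insn even though it costs two instructions.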

    That pattern is a single insn from combine's standpoint.  So when we see this:

    > Trying 6 -> 10:
    >     6: r140:DI=zero_extract(r145:DI,0x1c,0x4)
    >       REG_DEAD r145:DI
    >    10: {r143:DI=~r140:DI&0x1;clobber scratch;}
    >       REG_DEAD r140:DI
    > Failed to match this instruction:
    > (parallel [
    >         (set (reg:DI 143)
    >             (zero_extract:DI (xor:DI (reg:DI 145 [ x ])
    >                     (const_int 16 [0x10]))
    >                 (const_int 1 [0x1])
    >                 (const_int 4 [0x4])))
    >         (clobber (scratch:DI))
    >     ])

    We can't split it because the result would be 2 insns and it was already 2
    insns from combine's standpoint (the little white lie shows up in insn 10,
    which is really 2 instructions, but just one insn).
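    For readers less used to RTL, the following C sketch spells out what insns 6
    and 10 compute and what the single combined pattern from the dump would
    compute.  It assumes RTL's little-endian bit numbering (BITS_BIG_ENDIAN is
    0 on RISC-V); zero_extract here is a hypothetical C helper, not a GCC API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of RTL zero_extract (x, size, pos) with bit 0 as
   the least significant bit: SIZE bits starting at bit POS, zero-extended.  */
static uint64_t zero_extract (uint64_t x, unsigned size, unsigned pos)
{
  uint64_t mask = size >= 64 ? UINT64_MAX : (UINT64_C (1) << size) - 1;
  return (x >> pos) & mask;
}

/* Insns 6 and 10 from the dump, as two steps.  Insn 6 takes 28 bits
   starting at bit 4, i.e. the 32-bit logical x >> 4.  */
uint64_t insns_6_and_10 (uint64_t r145)
{
  uint64_t r140 = zero_extract (r145, 0x1c, 0x4);  /* insn 6  */
  return ~r140 & 0x1;                               /* insn 10 */
}

/* The single pattern combine tried to match.  */
uint64_t combined (uint64_t r145)
{
  return zero_extract (r145 ^ 0x10, 0x1, 0x4);
}
```

    Both forms yield the inverted bit 4, so the combination itself is valid; the
    obstacle is purely how many insns combine believes it is replacing.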

    I looked at the wacky possibility of making the problem pattern only
    available after reload in the hopes that late-combine could generate it, but
    late-combine doesn't handle scratches/clobbers like that.

    I consider the cases where the lie helps code generation very much on the
    margins and realized that we could turn it into a peephole2, which only runs
    after register allocation, long after combine.  That way we don't regress on
    those marginal cases, but the problem pattern doesn't get in the way of
    combine's work.

    So that's a good first step, but not entirely sufficient to get the best
    possible code for those tests.  In particular, given equal costs this patch
    also steers towards AND, which has the advantage that on an out-of-order
    core the constant load can sometimes issue for free, or the constant might
    be encodable directly as an immediate.  On an in-order core it gives the
    scheduler more freedom.

    There's a bit of tension on that topic and some issues I'm not trying to
    tackle at this time.  Essentially, in the sign-bit-splat path different
    sequences might be preferred depending on the constants (particularly when
    there's a NOT in the sequence).  It's on the margins and touched on by a
    different BZ.
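    For context, the general sign-bit-splat idea is to replace a select on the
    sign of x with a mask derived from the sign bit.  The sketch below
    illustrates that idea only; it is not the logic of noce_try_sign_bit_splat,
    and the function names are hypothetical.  The AND form is roughly the kind
    of AND-based sequence the patch prefers at equal cost, and the x >= 0
    variant shows where a NOT (or andn) enters the picture.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy reference: C if x is negative, else 0.  Results are uint64_t
   so the bitwise versions avoid signed-conversion corner cases.  */
uint64_t select_branchy (int64_t x, int64_t c)
{
  return x < 0 ? (uint64_t) c : 0;
}

/* All-ones mask when x < 0, zero otherwise.  An RV64 compiler could use a
   single srai x,63; the unsigned shift keeps this C well defined.  */
static uint64_t sign_mask (int64_t x)
{
  uint64_t sign = (uint64_t) x >> 63;  /* 1 if x < 0, else 0 */
  return -sign;                        /* all ones or zero */
}

/* AND-based splat: mask & C (srai + andi when C fits a 12-bit immediate).  */
uint64_t select_splat_and (int64_t x, int64_t c)
{
  return sign_mask (x) & (uint64_t) c;
}

/* Inverted condition, C if x >= 0: needs a NOT, e.g. andn on Zbb.  */
uint64_t select_splat_andn (int64_t x, int64_t c)
{
  return (uint64_t) c & ~sign_mask (x);
}
```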

    The net result is that we can fix the various extraction problems on RISC-V
    exposed by the testcases in PR64345 without regressing the minor and-not
    cases the define_insn_and_split was handling, and with one fewer
    define_insn_and_split in the port.  It likely improves PR80770 as well,
    though I haven't checked that.

    Bootstrapped and regression tested on both the Pioneer and BPI.  Also
    regression tested on riscv64-elf and riscv32-elf.  And since this touched
    ifcvt.cc, bootstrapped and regression tested on x86_64 as well.  It's also
    regression tested across all the embedded targets in my tester.

            PR target/64345
            PR tree-optimization/80770
    gcc/

            * config/riscv/bitmanip.md (<optab>_not_const<mode>): Turn into a
            peephole2 to avoid matching prior to combine.
            * ifcvt.cc (noce_try_sign_bit_splat): When costs are equal steer
            towards an AND based sequence.

    gcc/testsuite/
            * gcc.target/riscv/pr120553-2.c: Update expected output.
            * gcc.target/riscv/pr64345.c: New test.
            * gcc.target/riscv/zbb-andn-orn-01.c: Skip when peephole2 isn't run.
            * gcc.target/riscv/zbb-andn-orn-02.c: Likewise.
