https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116560
--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So a few more notes.
ISTM part of the problem is that the bswap pass ignores the fact that the target
may not have efficient support for unaligned accesses. So it creates a 16-bit
load from the two 8-bit elements, and the target then has to break that down
into a safe sequence. Then the bswap idiom gets expanded and we're left with a
redundant zero extension.
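A reduced testcase along these lines shows the shape of the problem (my own
sketch inferred from the assembly below, not necessarily the PR's actual
reproducer):

/* Sketch of a reducer: the two byte loads form a big-endian 16-bit
   value, which the bswap pass turns into a single (potentially
   unaligned) 16-bit load plus a byte-swap idiom.  */
int
foo (unsigned char *p, unsigned int *q)
{
  unsigned short v = (p[0] << 8) | p[1];
  *q = v;
  return v == 0;
}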
If you use -mtune=thead-c906 to select a design that has fast unaligned access,
you'll see improved, but not great code -- the redundant extension is still in
there:
lhu a0,0(a0) # 7 [c=28 l=4] *zero_extendhisi2/1
slli a4,a0,8 # 11 [c=4 l=4] *ashlsi3
srli a5,a0,8 # 10 [c=4 l=4] *lshrsi3
add a5,a5,a4 # 14 [c=4 l=4] *addsi3/0
slli a5,a5,16 # 31 [c=4 l=4] *ashlsi3
srli a5,a5,16 # 32 [c=4 l=4] *lshrsi3
seqz a0,a0 # 24 [c=4 l=4] *seq_zero_sisi
sw a5,0(a1) # 16 [c=4 l=4] *movsi_internal/3
ret # 35 [c=0 l=4] simple_return
We can get the same thing with the default tuning by hacking up the relevant
part of the bswap pass to query whether the target has fast unaligned access.
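Roughly the kind of guard I mean, in the load-merging path of
gimple-ssa-store-merging.cc -- a sketch only, not the actual patch, with src
and load_type standing in for whatever the surrounding code calls the source
reference and the merged load's type:

  /* Sketch: don't merge the byte accesses into one wider load if that
     load would be unaligned and the target says unaligned accesses are
     slow.  */
  unsigned int align = get_object_alignment (src);
  machine_mode mode = TYPE_MODE (load_type);
  if (align < GET_MODE_ALIGNMENT (mode)
      && targetm.slow_unaligned_access (mode, align))
    return false;  /* Or however we punt at that point.  */

Note the generic hook is phrased as "slow" rather than "fast", so the sense of
the test is inverted relative to the description above.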
The redundant zero extension is more problematic. For reasons I still don't
understand, our nonzero-bits tracking totally mucks up the set of potentially
nonzero bits. If we put a breakpoint in combine_simplify_rtx we eventually see:
(ior:SI (and:SI (reg:SI 147)
        (const_int 65280 [0xff00]))
    (and:SI (subreg:SI (reg:HI 148 [ _5 ]) 0)
        (const_int 255 [0xff])))
And if we query the nonzero bits of (reg 147) and (reg 148) we get:
(gdb) p/x nonzero_bits (x.u.fld[0].rt_rtx.u.fld[0].rt_rtx, E_SImode)
$198 = 0xff00ff00
(gdb) p/x nonzero_bits (x.u.fld[1].rt_rtx.u.fld[0].rt_rtx, E_SImode)
$203 = 0xffff00ff
The second result kind of makes sense in that we've got a paradoxical subreg,
so the bits outside HImode are undefined: the upper 16 bits all show as
potentially nonzero (0xffff0000) while the defined HImode bits contribute only
0x00ff, which is exactly the 0xffff00ff we get.