On 9/4/24 4:11 PM, Palmer Dabbelt wrote:
On Wed, 04 Sep 2024 13:47:58 PDT (-0700), jeffreya...@gmail.com wrote:

So I was looking at a performance regression in spec with Ventana's
internal tree.  Ultimately the problem was a bad interaction with an
internal patch (REP_MODE_EXTENDED), fwprop and ext-dce.  The details of
that problem aren't particularly important.


Removal of the local patch went reasonably well.  But I did see some
secondary cases where we had redundant sign extensions.  The most
notable cases come from the integer sCC insns.

Expansion of those cases for rv64 can be improved using Jivan's trick.
ie, if the target is not DImode, then create a DImode temporary for the
result and copy the low bits out with a promoted subreg to the real target.


With the change in expansion the final code we generate is slightly
different for a few tests at -O1/-Og, but should perform the same.  The
key for the affected tests is we're not seeing the introduction of
unnecessary extensions.  Rather than adjust the regexps to handle the
-O1/-Og output, skipping for those seemed OK to me.  I didn't extract a
testcase.  I'm a bit fried from digging through LTO'd code right now ;-)

Just spot-checking that first one, it's changing from

        slt     a0,a0,a1
        seqz    a0,a0
        ret

to

        slt     a0,a0,a1
        xori    a0,a0,1
        ret

which smells like it's just an arbitrary instruction selection decision getting
tripped up -- they're the same cost (or at least I'd expect them to be pretty
much everywhere) and do the same thing for the 1-bit output, so it's not like
there's a reason to select one rather than the other.

Exactly. It's just an instruction selection change caused by something changing earlier in the RTL pipeline. I would expect both of those sequences to perform the same.
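
To make the equivalence concrete, here is a minimal C sketch that models the three instructions on 64-bit register values. The function names are just stand-ins for the instructions (they are not GCC code), and the operands in the self-check are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of the RV64 instructions on 64-bit registers. */
static uint64_t slt(int64_t a, int64_t b) { return a < b ? 1u : 0u; }
static uint64_t seqz(uint64_t x) { return x == 0 ? 1u : 0u; }
static uint64_t xori(uint64_t x, uint64_t imm) { return x ^ imm; }

/* Returns 1 when "slt; seqz" and "slt; xori 1" give the same result. */
static int sequences_agree(int64_t a, int64_t b) {
    uint64_t t = slt(a, b);          /* t is always 0 or 1 */
    return seqz(t) == xori(t, 1);
}

/* Spot-check a few operand pairs, including the equal case. */
static void self_check(void) {
    assert(sequences_agree(1, 2));
    assert(sequences_agree(2, 1));
    assert(sequences_agree(-5, -5));
}
```

The two forms only agree because slt produces 0 or 1. For an arbitrary input such as 2, seqz gives 0 while xori with 1 gives 3, so the swap is only valid on a known 1-bit value.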

The RTL generation change is really targeted at cases that show up elsewhere. For example, feval(state_t*, int, t_eval_comps*) in deepsjeng:

      011038b3          snez                    a7,a7
      01e037b3          snez                    a5,t5
[ ... ]
      2881              sext.w                  a7,a7
[ ... ]
      0007851b          sext.w                  a0,a5


By constructing into an X mode temporary and extracting via a promoted subreg, we tell the rest of the RTL pipeline that the value is already sign extended and the explicit extensions above are unnecessary.
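
As a rough illustration of why those extensions are redundant, the C sketch below models snez and sext.w on 64-bit registers. The helper names are hypothetical models of the instructions, not anything from the patch. Since snez only ever yields 0 or 1, sign extending from bit 31 cannot change the value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of snez: 1 if the register is nonzero, else 0. */
static int64_t snez(int64_t x) { return x != 0 ? 1 : 0; }

/* Hypothetical model of sext.w: sign extend the low 32 bits to 64 bits. */
static int64_t sext_w(int64_t x) {
    uint32_t lo = (uint32_t)x;                 /* keep the low 32 bits */
    if (lo & UINT32_C(0x80000000))             /* bit 31 set: negative */
        return (int64_t)lo - INT64_C(0x100000000);
    return (int64_t)lo;
}

/* Returns 1 when sext.w applied to the snez result is a no-op. */
static int sext_is_redundant(int64_t x) {
    int64_t flag = snez(x);
    return sext_w(flag) == flag;
}

/* Spot-check zero, positive, and negative inputs. */
static void self_check_sext(void) {
    assert(sext_is_redundant(0));
    assert(sext_is_redundant(42));
    assert(sext_is_redundant(-1));
}
```

For a general 64-bit value the extension does matter (0x80000000 becomes negative), which is why the compiler needs to be told, via the promoted subreg, that the sCC result is already sign extended.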
Jeff
