Re: [PATCH V3] RISC-V: Prevent speculative vsetvl insn scheduling

Palmer Dabbelt Sat, 15 Feb 2025 10:49:42 -0800

On Sat, 15 Feb 2025 10:30:18 PST (-0800), jeffreya...@gmail.com wrote:



On 2/15/25 10:07 AM, Palmer Dabbelt wrote:

Since this is purely a performance related patch, gate the target hook
with an opt flag to see the fallout.

    PR 117974

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
    (TARGET_SCHED_CAN_SPECULATE_INSN): Implement.
    * config/riscv/riscv.opt: Add temporary opt.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/vsetvl/pr117974.c: New test.

I was kind of thinking we'd have a flag in the tuning structure so that
each uarch could make a reasonable default selection rather than having
it controlled by a flag.


Ya, seems reasonable.  We wanted the flag to make it easier to do some
expirements -- otherwise we'd need either two builds or an expirement-
specific tune struct, and our perf guys aren't so great with SW so the
flag is just easier.

Might make sense to start with a flag to test internally if it makes a
meaningful difference.  We've certainly done that as well.

I realize there may not be any designs upstream which would set this
right now, but it puts the right infrastructure in place.  Very similar
to the divmod expansion stuff from a while back.


I'm fine either way.  It's always kind of tough to tell what the right
answer is on these things: we're taking complexity without a strong
justification, but if it's an ISA corner case then it's not crazy to
assume some HW will trip over it at some point.  This one is so small I
don't think it really matters which way we go, but if someone else is
interested in using it that'd probably tip things towards getting it
upstream on my end.

My impression was the vl=0 scenario was fairly troublesome on some of
the designs out there, so I'd tend to think it's worth getting upstream
either now or for gcc-16.  I don't *think* it's an issue for ours, but
I'm constantly surprised by the corner cases issues that pop up.

It certainly smells like VL=0 is tripping people up, just given theamount of code that goes by trying to avoid VL=0. I think nobody's hadreal numbers, though (at least none in the glibc thread as ofyesterday).

FWIW: This is just in the idea stage right now so it's no big deal if we
don't merge it, it's both pretty easy to keep locally and of unknown
utility (even internally).  We've got to go look at the generated code
either way and we'll post what we can publicly.

So maybe we defer a bit, y'all get some data the reconvene on this patch
for gcc-16?  We were going to have to answer the gcc-15 vs gcc-16
question anyway, just wasn't quite to it yet.

Ya, I don't think this is stage4 material. Unless some of those otherVL=0 avoiders are trying to work around a functional bug, but we'd needsomeone who knows to speak up.

Edwin and I were also talking in the office yesterday and I think we can
do a slightly better job here if we use the runqueue sorting hook.  At
that point we can actually look at the instructions and see if the
branch will avoid a VL=0 case before preventing the hoisting, so we can
avoid introducing unnecessary latency.

The speculation stuff is fairly isolated.  You can probably walk
outwards in the call chain from the point were this hook is used to see
if there's a point with a better hook or a point were you could add a
new one to do his scanning.

Right, I forgot about that: I also tried playing with some of thedependency insertion stuff, but everything thing I did either appearedto have no effect or trigger an infinate-loop-type stack overflowsomewhere in the scheduler (some function called priority() or somethingcalling itself, I think).

Then the next morning I saw that weird short-circuit side-effect vectortype patch and accidentally git-checkout deleted everything when tryingto make sure that one actually did anything. After trying to readthrough the REG_DEP_* implementation/usage/docs a few time I decided tojust sort the ready queue.

I don't think I remember to tell Edwin that, Friday was kind ofhectic...

I put together some code that did what's in here via that sorting hook,
but my attempts at matching more specific subtrees failed so I sent it
to Edwin to try and figure out what I screwed up.  It's probably
something easy, I always screw that stuff up until I get it right.

I'm a bit surprised you have visibility into the branch in the sort
step, but if you do, then great.

I'm at least seeing both the jump_insn with an if_then_else for the zerotest and the vsetvli in the queue at the same time, and if I just sortthe queue to make the vsetvli happen after the jump then things getreordered. So I think I have enough information in this case.

No idea if that holds elsewhere, though, so if that way ends upimplementable then we'll also have to go poke around the generated codea bunch.


So assuming that works we'll have another flavor of this to post.

Q. Does Rivos have a cpu defined in your internal tree?  We do that for
ours as keep it as a local change for now.  Assuming you do, then adding
the bits to the tuning structure should make it trivial to make this
behavior the default on your design internally.


We've got internal tune structs.  They're super easy to keep around and
we'll likely always need them for internal stuff, so no big deal on that
end.

OK.  It's a slight pain on the merge side, but quite convenient in other
ways.  Sounds like we're in the same boat on that issue.

Ya.

Re: [PATCH V3] RISC-V: Prevent speculative vsetvl insn scheduling

Reply via email to