On 2/13/25 11:13 AM, Palmer Dabbelt wrote:
FWIW, that's what tripped up my "maybe there's a functional bug here"
thought. It looks like the scheduler is seeing
        bne t0, x0, end
        vsetvli t1, t2, ...
        vsetvli x0, t2, ...
        ...
  end:
        vsetvli x0, t2, ...
and thinking it's safe to schedule that like
        vsetvli t1, t2, ...
        bne t0, x0, end
        vsetvli x0, t2, ...
        ...
  end:
        vsetvli x0, t2, ...
which I'd assumed is because the scheduler sees both execution paths
overwriting the vector control registers and thus thinks it's safe to
move the first vsetvli to execute speculatively. From reading "6.
Configuration-Setting Instructions" in vector.md that seems intentional,
though, so maybe it's all just fine?
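A toy model makes it easy to see why the hoist is architecturally safe. This is just an illustrative sketch, not GCC or hardware behavior: each vsetvli is modeled as an unconditional overwrite of the vector configuration state, so speculating the first one above the branch cannot change the configuration observed after the join (the config names `cfg_a`, `cfg_b`, `cfg_end` are made up for the example).

```python
# Model each vsetvli as an unconditional write to the vector config
# state (vl/vtype).  Only the LAST write on a path is observable.

def final_config(schedule, branch_taken):
    """Run a schedule for one branch outcome; return the final config."""
    cfg = None
    for _op, arg in schedule(branch_taken):
        cfg = arg  # a vsetvli unconditionally overwrites vl/vtype
    return cfg

def original(branch_taken):
    # bne t0, x0, end
    if branch_taken:
        yield ("vsetvli", "cfg_end")   # end: vsetvli x0, t2, ...
    else:
        yield ("vsetvli", "cfg_a")     # vsetvli t1, t2, ...
        yield ("vsetvli", "cfg_b")     # vsetvli x0, t2, ...
        yield ("vsetvli", "cfg_end")   # falls through to end:

def hoisted(branch_taken):
    yield ("vsetvli", "cfg_a")         # speculated above the branch
    if branch_taken:
        yield ("vsetvli", "cfg_end")   # cfg_a write is dead on this path
    else:
        yield ("vsetvli", "cfg_b")
        yield ("vsetvli", "cfg_end")

# Both schedules reach the same final config on both branch outcomes.
for taken in (True, False):
    assert final_config(original, taken) == final_config(hoisted, taken)
```

On the taken path the speculated write is simply dead, which is why this is functionally safe even if it costs a cycle on some microarchitectures.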
I think it's fine. Perhaps not what we want from a performance
standpoint, but functionally safe.
Also, why doesn't the vsetvl pass fix the situation? IMHO we need to
understand the problem more thoroughly before changing things.
In the end LCM minimizes the number of vsetvls and inserts them at the
"earliest" point. If that is not sufficient I'd say we need modify
the constraints (maybe on a per-uarch basis)?
The vsetvl pass is LCM based. So it's not allowed to add a vsetvl on a
path that didn't have a vsetvl before. Consider this simple graph.
     0
    / \
   2-->3
If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl
stays in bb2. bb0 is not a valid insertion point for the vsetvl
pass because the path 0->3 doesn't strictly need a vsetvl. That's
inherent in the LCM algorithm (anticipatability).
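That anticipatability check can be sketched in a few lines. This is a deliberately simplified model, not the GCC lcm.cc implementation (it ignores transparency/kill sets, and the names `succs`, `needs_vsetvl` are invented for the example): an expression is anticipatable at a block's exit only if every path leaving that block reaches a use of it.

```python
# CFG from the discussion above: bb0 branches to bb2 and bb3, bb2
# falls through to bb3.
succs = {0: [2, 3], 2: [3], 3: []}
needs_vsetvl = {2}   # only bb2 requires this particular vector config

def anticipatable_at_exit(bb):
    """True iff EVERY successor path out of bb needs the vsetvl."""
    if not succs[bb]:
        return False
    return all(s in needs_vsetvl or anticipatable_at_exit(s)
               for s in succs[bb])

# The 0->3 path never needs the vsetvl, so LCM may not hoist it into
# bb0; it stays in bb2.
assert anticipatable_at_exit(0) is False

# By contrast, if bb3 also needed the same config, bb0's exit WOULD be
# a legal (and profitable) insertion point:
needs_vsetvl = {2, 3}
assert anticipatable_at_exit(0) is True
```

The scheduler's speculation rule is weaker than this: it only requires that the hoisted insn not change semantics on the other path, not that the other path need it, which is exactly the asymmetry described here.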
The scheduler has no such limitations. The scheduler might create a
scheduling region out of blocks 0 and 2. In that scenario, insns from
block 2 may speculate into block 0 as long as doing so doesn't change
semantics.
Ya. The combination of the scheduler moving a vsetvli before the branch
(IIUC from bb2 to bb0 here) and the vsetvli merging causes it to look
like the whole vsetvli was moved before the branch.
I'm not sure why the scheduler doesn't move both vsetvli instructions to
execute speculatively, but otherwise this seems to be behaving as
designed. It's just tripping up the VL=0 cases for us.
You'd have to get into those dumps and possibly throw the compiler under
a debugger. My guess is it didn't see any advantage in doing so.
Maybe that's a broad uarch split point here? For OOO designs we'd want
to rely on HW scheduling and thus avoid hoisting possibly-expensive
vsetvli instructions (where they'd need to execute in HW because of the
side effects), while on in-order designs we'd want to aggressively
schedule vsetvli instructions because we can't rely on HW scheduling to
hide the latency.
There may be. But the natural question would be cost/benefit. It may
not buy us anything on the performance side to defer vsetvl insertion
for OOO cores. At which point the only advantage is testsuite
stability. And if that's the only benefit, we may be able to do that
through other mechanisms.
In theory at sched2 time the insn stream should be fixed. There are
practical/historical exceptions, but changes to the insn stream after
that point are discouraged.
We were just talking about this in our toolchain team meeting, and it
seems like both GCC and LLVM are in similar spots here -- essentially
the required set of vsetvli instructions depends very strongly on
scheduling, so trying to do them independently is just always going to
lead to sub-par results. It feels kind of like we want some scheduling-
based cost feedback in the vsetvli pass (or the other way around if
they're in the other order) to get better results.
Maybe that's too much of a time sink for the OOO machines, though? If
we've got HW scheduling then the SW just has to be in the ballpark and
everything should be fine.
I'd guess it's more work than it'd be worth. We're just not seeing
vsetvls being all that problematic on our design. I do see a lot of
seemingly gratuitous changes in the vector config, but when we make
changes to fix that we generally end up with worse performing code.
Jeff