On 9/12/24 8:22 AM, Richard Sandiford wrote:


This has recently come up in the RISC-V space due to needing VXRM
assignments so that we can utilize the vaaddu add-with-averaging
instructions.  Placement of VXRM mode switches looks optimal from an
LCM standpoint, but speculation can measurably improve performance.  It
was something like 2% on the BPI for x264.  The k1/m1 chip in the BPI is
almost certainly flushing its pipelines on the VXRM assignment.

Ah yeah, good point.  I expect speculation would be best for FPMR as well.
I imagine most use cases will be well-structured in practice, but for
those that aren't...
It's certainly worth investigating.


I've got a hack here that I'll submit upstream at some point.  Just not
at the top of my list yet -- especially now that our uarch has been
fixed to not flush its pipelines at VXRM assignments ;-)

Is that handled by mode-switching, or is it a separate thing?

I abused one of the existing mode-switching hooks. Essentially I scan the function once to look for all the possible modes of vxrm. If there is precisely one mode needed, then my hack pretends that mode is needed on the first insn of the function. Then we let the standard mode-switching algorithm run.

For the cases that matter (and there are very very few with vxrm), that gets us the desired speculation. While I could certainly construct a testcase where the speculation was unprofitable, I doubt it ever happens in practice.
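
To make the idea concrete, here is a rough, self-contained model of it. This is not the actual patch and does not use GCC's real hook interfaces: the Insn struct, the Vxrm enum, and both function names are made up for illustration, and a function is modeled as a flat list of insns rather than a CFG.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Toy stand-ins, not GCC types. The four values mirror the RISC-V
// vxrm rounding modes (rnu, rne, rdn, rod).
enum class Vxrm { RNU, RNE, RDN, ROD };

struct Insn {
  std::optional<Vxrm> needed;  // std::nullopt: insn does not care about vxrm
};

// Scan the function once. If precisely one vxrm mode is needed anywhere,
// return it; otherwise (no mode, or several) return std::nullopt.
std::optional<Vxrm> single_needed_mode(const std::vector<Insn>& fn) {
  std::set<Vxrm> seen;
  for (const Insn& insn : fn)
    if (insn.needed)
      seen.insert(*insn.needed);
  if (seen.size() == 1)
    return *seen.begin();
  return std::nullopt;
}

// Hypothetical "mode needed" query with the hack applied: when the whole
// function needs a single mode, pretend the first insn needs it too, so
// the ordinary mode-switching pass places the switch at function entry.
std::optional<Vxrm> mode_needed(const std::vector<Insn>& fn, std::size_t idx,
                                std::optional<Vxrm> single) {
  assert(idx < fn.size());
  if (idx == 0 && single)
    return single;
  return fn[idx].needed;
}
```

In this model the scan result is computed once per function and passed to every query. When the function mixes modes, the scan yields nothing and the queries fall back to the unmodified per-insn answer, so the normal placement is left alone.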

jeff
