On 2/13/2025 4:12 AM, Vineet Gupta wrote:
On 2/13/25 14:17, Robin Dapp wrote:
Other thoughts?
The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff
we can't/don't model in the pipeline, but I have no idea how to model
the VL=0 case there.
Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be
the first time a hook got (ab)used in ways that weren't part of the
original intent.
I don't fully understand what's happening.  So the hoisting is being done
speculatively here?  And it just happens to be "bad" because that might
cause a VL=0 case.  But are we sure a lack of speculation cannot cause
such cases?
Exactly. My gut feeling w/o deep dive was this seemed like papering over the 
issue.

BTW what exactly is speculative scheduling ? As in what is it actually trying to
schedule ahead ?

Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
understand the problem more thoroughly before changing things.
In the end LCM minimizes the number of vsetvls and inserts them at the
"earliest" point.  If that is not sufficient I'd say we need modify
the constraints (maybe on a per-uarch basis)?
As far as LCM is concerned it is hoisting the insn to the optimal spot. However
there's some additional logic such as in can_use_next_avl_p () which influences
if things can be moved around.

Since sched1 put the vsetvl right before the branch, that was always determined to be the "earliest" point because it was now available on all outgoing edges. Without the vsetvl right before the branch, the "earliest" point to insert the vsetvls was determined to be the beginning of each basic block.

I did try adding some additional logic to adjust the way vsetvl fusion occurs across basic blocks in these scenarios  i.e. performing the fusion in the opposite manner (breaking lcm guarantees); however, from my testing, fusing two vsetvls didn't actually remove the fused expression from the vinfo list. I'm not sure if that's intended but as a result, phase 3 would remove the fused block and use the vinfo that should've been fused into the other.

That won't help with the problem here but might with others.
Right this needs to be evaluated independently with both icounts and BPI3 runs
to see if anything falls out.

-Vineet

I'll add an opt flag to gate this for testing purposes.

Edwin

Reply via email to