> For high-performance OoO uarchs, the number of ALUs is usually reflected
> directly in instruction throughput. So in theory we could derive the scalar
> vs vector scaling factor from the CPU scheduling model.
>
> I looked at the GCC RISC-V scheduling models (spacemit-x60.md, xiangshan.md,
> sifive-p600.md) - they do have ALU unit counts defined. I also noticed LLVM's
> SchedMachineModel has similar information (throughput, latency, resource units
> per instruction class). Not sure if LLVM already uses this for vector cost
> scaling though.

Right now there is no way to access that information during vector costing,
unfortunately. Well, not no way at all, but it hasn't been wired up at least.
And the vector costing is coarser in that it doesn't distinguish between
individual ALU ops etc.

It would be nice to establish a connection there in the mid to long term.
One question is whether the scheduler model is the right "ground truth" choice
or if there should be another unified file containing everything.

aarch64 has been doing some tuning-model work, with JSON parsers/dumpers for
tune files, but AFAIK they also don't connect the scheduler model and the
vectorizer cost model. That could help get us going. If you're up for some
work in that area, I'm sure patches are welcome.

> Also, if this "scalar" scaling is hardcoded as a fixed value, it will always
> be unfriendly to some uarchs - different CPUs have very different
> scalar/vector ALU ratios. Ideally this should come from the CPU model.

Yes, I was just giving an example. That's surely something that we'd get
directly from the model.

> Another concern: even the 4 scalar ALUs vs 2 vector ALUs ratio may not be
> sufficient for scaling. VLEN also matters - a vector op with VLEN=512 and one
> with VLEN=128 shouldn't have the same cost scaling, since what we really want
> to compare is the cost of processing the same amount of data. I know RVV is
> VLA, but maybe we could start with a default VLEN=128 and allow users to
> adjust the scaling via some options.

VLEN is already considered during vector costing. That's what it does by
default (via the vectorization factor and more). I just mentioned what we need
to do on top of that to compare fairly against scalar.

-- 
Regards
 Robin
