Re: [RISC-V] vector segment load/store width as a riscv_tune_param

Robin Dapp via Gcc Tue, 25 Mar 2025 02:55:10 -0700

I am revisiting an effort to make the number of lanes for vector segment
load/store a tunable parameter.


A year ago, Robin added minimal and not-yet-tunable
common_vector_cost::segment_permute_[2-8]

But it is tunable, just not a param? :) We have our own cost structure in our downstream repo, adjusted to our uarch. I suggest you do the same or upstream a separate cost structure. I don't think anybody would object to having several of those, one for each uarch (as long as they are sufficiently distinct).

BTW, just tangentially related and I don't know how sensitive your uarch is to scheduling, but with the x264 SAD and other sched issues we have seen you might consider disabling sched1 as well for your uarch? I know that for our uarch we want to keep it on but we surely could have another generic-like mtune option that disables it (maybe even generic-ooo and change the current generic-ooo to generic-in-order?). I would expect this to get more common in the future anyway.

Some issues & questions:

* Since this pertains only to segment load/store, why is the word "permute"
  in the name?

The vectorizer already performs costing for the segment loads/stores (IIRC as simple loads, though). At some point the idea was to explicitly model the "segment permute/transpose" as a separate operation, i.e.


v0, v1, v2 = segmented_load3x3 (...)
  {
    load vtmp0;
    load vtmp1;
    load vtmp2;
    v0 = {vtmp0[0], vtmp1[0], vtmp2[0]};
    v1 = {vtmp0[1], vtmp1[1], vtmp2[1]};
    v2 = {vtmp0[2], vtmp1[2], vtmp2[2]};
  }

and that permute is the expensive part of the operation in 99% of the cases.
That's where the wording comes from.
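
To make the decomposition concrete, here is a minimal scalar C sketch of the same idea, assuming an array of three-field records. The names segmented_load3, NF and NELEMS are hypothetical and not taken from GCC. It does the unit-stride loads first and then the transpose step that the segment_permute costs are meant to account for.

```c
#include <stddef.h>
#include <string.h>

#define NF 3      /* fields per segment (hypothetical) */
#define NELEMS 4  /* segments handled per call (hypothetical) */

/* Scalar model of a 3-field segment load: contiguous loads followed by
   a transpose that gathers field F of every segment into OUT[F].  */
static void
segmented_load3 (const int *mem, int out[NF][NELEMS])
{
  int tmp[NF * NELEMS];

  /* Step 1: plain unit-stride loads of the interleaved data.  */
  memcpy (tmp, mem, sizeof tmp);

  /* Step 2: the permute/transpose, usually the expensive part.  */
  for (size_t f = 0; f < NF; f++)
    for (size_t i = 0; i < NELEMS; i++)
      out[f][i] = tmp[i * NF + f];
}
```

Step 1 costs about the same as ordinary loads; step 2 is the part a per-uarch cost entry would have to describe.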

* Nit: why are these defined as individual members rather than an array
  referenced as segment_permute[NF-2]?

No real reason. I guess an array is preferable in several ways so feel free to change that.
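
A rough sketch of what the array form could look like, in plain C for brevity. The struct name segment_cost_sketch, the helper segment_permute_cost and the MIN_NF/MAX_NF macros are all hypothetical stand-ins, not the actual GCC definitions; only the segment_permute[NF-2] indexing comes from the question above.

```c
#include <assert.h>

#define MIN_NF 2  /* smallest segment field count */
#define MAX_NF 8  /* largest segment field count */

/* Hypothetical stand-in for the relevant part of the cost structure.  */
struct segment_cost_sketch
{
  /* segment_permute[NF - 2] is the permute cost for NF fields.  */
  int segment_permute[MAX_NF - MIN_NF + 1];
};

/* Return the permute cost for a segment access with NF fields.  */
static int
segment_permute_cost (const struct segment_cost_sketch *costs, int nf)
{
  assert (nf >= MIN_NF && nf <= MAX_NF);
  return costs->segment_permute[nf - MIN_NF];
}
```

With an array, a single lookup replaces a switch over seven separate members, and a per-uarch table is just one initializer.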

* I implemented tuning as a simple threshold for max NF where segment
  load/store is profitable. Test cases for vector segment store pass, but
  tests for load fail. I found that common_vector_cost::segment_permute is
  properly honored in the store case, but not even inspected in the load
  case. I will need to spelunk the autovec cost model. Clues are welcome.


Could you give an example for that?  Might just be a bug.

Looking at gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c, however, I see that the cost is adjusted for loads.
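
For reference, a reduced load-side reproducer might look like the sketch below. It is not copied from struct_vect-1.c; the function name sum_pairs and the element type are hypothetical. The loop reads two interleaved fields per iteration, which is the access pattern a two-field segment load would cover, so it is the kind of loop where a load-side permute cost should come into play.

```c
#include <stdint.h>

/* Sum the two interleaved fields of each pair in SRC into DST.
   The reads of src[2*i] and src[2*i+1] form a two-field group that the
   vectorizer may choose to implement with a segment load.  */
void
sum_pairs (int32_t *restrict dst, const int32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i * 2] + src[i * 2 + 1];
}
```

Compiling something like this with the vectorizer dump enabled (for example -fdump-tree-vect-details) should show whether the load group is costed differently from the store case.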


--
Regards
Robin
