On Tue, Mar 25, 2025 at 2:47 AM Robin Dapp <rdapp....@gmail.com> wrote:
> > A year ago, Robin added minimal and not-yet-tunable
> > common_vector_cost::segment_permute_[2-8]
>
> But it is tunable, just not a param? :)

I meant "param" generically, not necessarily a command-line
--param=thingy, though point taken! :)

> We have our own cost structure in our downstream repo, adjusted to our
> uarch.  I suggest you do the same or upstream a separate cost structure.
> I don't think anybody would object to having several of those, one for
> each uarch (as long as they are sufficiently distinct).

Yes, this is what I meant by not-yet-tunable: there is currently no
datapath between -mcpu/-mtune and common_vector_cost::segment_permute_*.
All CPUs get the same hard-coded value of 1 for all segment_permute_*
costs.

> BTW, just tangentially related and I don't know how sensitive your uarch
> is to scheduling, but with the x264 SAD and other sched issues we have
> seen you might consider disabling sched1 as well for your uarch?  I know
> that for our uarch we want to keep it on but we surely could have another
> generic-like mtune option that disables it (maybe even generic-ooo and
> change the current generic-ooo to generic-in-order?).  I would expect
> this to get more common in the future anyway.

Thanks for the tip.  We will look into it.

> > Some issues & questions:
> >
> > * Since this pertains only to segment load/store, why is the word
> >   "permute" in the name?
>
> The vectorizer already performs costing for the segment loads/stores
> (IIRC as simple loads, though).  At some point the idea was to explicitly
> model the "segment permute/transpose" as a separate operation i.e.

This is a different concept, so I ought to introduce a new cost param
which is the threshold value of NF for fast vs. slow.

> > * I implemented tuning as a simple threshold for max NF where segment
> >   load/store is profitable.  Test cases for vector segment store pass,
> >   but tests for load fail.  I found that
> >   common_vector_cost::segment_permute is properly honored in the store
> >   case, but not even inspected in the load case.  I will need to
> >   spelunk the autovec cost model.  Clues are welcome.
>
> Could you give an example for that?  Might just be a bug.
> Looking at gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c, however
> I see that the cost is adjusted for loads, though.

You won't see failures in the testsuite.  The failures only show up when
I attempt to impose huge costs on NF above the threshold.  A quick & dirty
way to expose the bug is to apply the appended patch, then observe that
you get output from it only for mask_struct_store-*.c and not for
mask_struct_load-*.c.

G

--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1140,6 +1140,7 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, loop_vec_info loop,
       int group_size = segment_loadstore_group_size (kind, stmt_info);
       if (group_size > 1)
         {
+          fprintf (stderr, "segment_loadstore_group_size = %d\n", group_size);
           switch (group_size)
             {
             case 2:
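
To make the "threshold value of NF for fast vs. slow" idea concrete, here is a minimal standalone sketch.  It is not GCC code: the names segment_tuning, max_fast_nf, fast_cost, slow_cost and segment_cost are all hypothetical, and the numbers are made up for illustration.  It only shows the shape of the tuning I have in mind, where NF at or below a per-uarch threshold keeps a small cost and anything above it gets a large cost.

```cpp
#include <cassert>

// Hypothetical per-uarch tuning record (an assumption for illustration,
// not the actual common_vector_cost layout in GCC).
struct segment_tuning
{
  int max_fast_nf; // largest NF still considered cheap on this uarch
  int fast_cost;   // cost charged when NF <= max_fast_nf
  int slow_cost;   // cost charged when NF >  max_fast_nf
};

// Return the illustrative cost of a segment load/store with NF fields.
constexpr int
segment_cost (const segment_tuning &tune, int nf)
{
  return nf <= tune.max_fast_nf ? tune.fast_cost : tune.slow_cost;
}

// Example tuning with made-up values: NF 2..4 is fast, NF 5..8 is slow.
constexpr segment_tuning example_tune = {4, 1, 100};

static_assert (segment_cost (example_tune, 4) == 1, "NF=4 is at the threshold");
static_assert (segment_cost (example_tune, 5) == 100, "NF=5 is above it");
```

With a large slow_cost, the vectorizer would be expected to reject segment accesses above the threshold, which is exactly the situation in which the load-side discrepancy described above becomes visible.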