On Tue, Mar 25, 2025 at 2:47 AM Robin Dapp <rdapp....@gmail.com> wrote:


> > A year ago, Robin added minimal and not-yet-tunable
> > common_vector_cost::segment_permute_[2-8]
>
> But it is tunable, just not a param? :)


I meant "param" generically, not necessarily a command-line --param=thingy,
though point taken! :)


> We have our own cost structure in our downstream repo, adjusted to our
> uarch.  I suggest you do the same or upstream a separate cost structure.
> I don't think anybody would object to having several of those, one for
> each uarch (as long as they are sufficiently distinct).
>

Yes, this is what I meant by not-yet-tunable: there is currently no datapath
between -mcpu/-mtune and common_vector_cost::segment_permute_*. All CPUs get
the same hard-coded value of 1 for all segment_permute_* costs.
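
As a rough illustration of the kind of datapath that is missing, here is a
standalone C++ sketch (not GCC code). The struct, the table, the
"example-uarch" name and its cost values are all hypothetical; only the
idea of one cost set per tune target, with a default of 1 everywhere, comes
from the discussion above.

```cpp
#include <cstring>

// Hypothetical stand-in for the segment_permute_[2-8] costs.
// Indexed by NF; only entries 2..8 are meaningful.
struct segment_permute_costs
{
  int nf[9];
};

// Mirrors today's behavior: every NF costs 1.
static const segment_permute_costs generic_segment_costs
  = {{0, 0, 1, 1, 1, 1, 1, 1, 1}};

// Made-up values for an imaginary uarch where NF > 4 is slower.
static const segment_permute_costs example_segment_costs
  = {{0, 0, 1, 1, 1, 3, 3, 3, 3}};

// Hypothetical mapping from an -mtune name to a cost set.
struct tune_entry
{
  const char *name;
  const segment_permute_costs *costs;
};

static const tune_entry tune_table[] = {
  {"generic", &generic_segment_costs},
  {"example-uarch", &example_segment_costs},
};

// Return the cost set for a tune name, falling back to the generic one.
const segment_permute_costs *
segment_costs_for_tune (const char *mtune)
{
  for (const tune_entry &e : tune_table)
    if (std::strcmp (e.name, mtune) == 0)
      return e.costs;
  return &generic_segment_costs;
}
```

With a lookup like this, an unknown tune name keeps the current all-ones
behavior, and each uarch can override only the NF values it cares about.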


> BTW, just tangentially related and I don't know how sensitive your uarch
> is to scheduling, but with the x264 SAD and other sched issues we have
> seen you might consider disabling sched1 as well for your uarch?  I know
> that for our uarch we want to keep it on but we surely could have another
> generic-like mtune option that disables it (maybe even generic-ooo and
> change the current generic-ooo to generic-in-order?).  I would expect
> this to get more common in the future anyway.


Thanks for the tip. We will look into it.


> > Some issues & questions:
> >
> > * Since this pertains only to segment load/store, why is the word
> >   "permute" in the name?
>
> The vectorizer already performs costing for the segment loads/stores (IIRC
> as simple loads, though).  At some point the idea was to explicitly model
> the "segment permute/transpose" as a separate operation i.e.
>

This is a different concept from what I need, so I ought to introduce a new
cost param: the threshold value of NF that separates fast segment
loads/stores from slow ones.
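
To make the threshold idea concrete, here is a minimal standalone C++ sketch
(not GCC code). The struct, its field names and the helper are hypothetical
and only illustrate the "NF at or below the threshold is fast, above it is
slow" rule.

```cpp
// Hypothetical per-uarch tuning: the largest NF that is still fast, plus
// the cost charged on each side of that threshold.
struct segment_tuning
{
  int max_fast_nf;
  int fast_cost;
  int slow_cost;
};

// Cost of one segment load/store with NF fields.
// NF below 2 is not a segment access, so it adds nothing here.
int
segment_cost (const segment_tuning &t, int nf)
{
  if (nf < 2)
    return 0;
  return nf <= t.max_fast_nf ? t.fast_cost : t.slow_cost;
}
```

Setting slow_cost very high would effectively disable segment load/store
above the threshold, which is the experiment described further down.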

> > * I implemented tuning as a simple threshold for max NF where segment
> >   load/store is profitable. Test cases for vector segment store pass, but
> >   tests for load fail. I found that common_vector_cost::segment_permute
> >   is properly honored in the store case, but not even inspected in the
> >   load case. I will need to spelunk the autovec cost model. Clues are
> >   welcome.
>
> Could you give an example for that?  Might just be a bug.
> Looking at gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c, however I
> see that the cost is adjusted for loads, though.


You won't see failures in the testsuite. The failures only show up when I
attempt to impose huge costs on NF above the threshold. A quick & dirty way
to expose the bug is to apply the appended patch, then observe that you get
output from it only for mask_struct_store-*.c and not for
mask_struct_load-*.c.

G

--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1140,6 +1140,7 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, loop_vec_info loop,
              int group_size = segment_loadstore_group_size (kind, stmt_info);
              if (group_size > 1)
                {
+                 fprintf (stderr, "segment_loadstore_group_size = %d\n", group_size);
                  switch (group_size)
                    {
                    case 2:
