On 8/25/2022 7:07 AM, Richard Sandiford via Gcc-patches wrote:
Currently SLP tries to force permute operations "down" the graph
from loads in the hope of reducing the total number of permutations
needed or (in the best case) removing the need for the permutations
entirely.  This patch tries to extend it as follows:

- Allow loads to take a different permutation from the one they
   started with, rather than choosing between "original permutation"
   and "no permutation".

- Allow changes in both directions, if the target supports the
   reverse permutation.

- Treat the placement of permutations as a two-way dataflow problem:
   after propagating information from leaves to roots (as now), propagate
   information back up the graph.

- Take execution frequency into account when optimising for speed,
   so that (for example) permutations inside loops have a higher
   cost than permutations outside loops.

- Try to reduce the total number of permutations when optimising for
   size, even if that increases the number of permutations on a given
   execution path.

See the big block comment above vect_optimize_slp_pass for
a detailed description.

The original motivation for doing this was to add a framework that would
allow other layout differences in future.  The two main ones are:

- Make it easier to represent predicated operations, including
   predicated operations with gaps.  E.g.:

      a[0] += 1;
      a[1] += 1;
      a[3] += 1;

   could be a single load/add/store for SVE.  We could handle this
   by representing a layout such as { 0, 1, _, 2 } or { 0, 1, _, 3 }
   (depending on what's being counted).  We might need to move
   elements between lanes at various points, like with permutes.

   (This would first mean adding support for stores with gaps.)

- Make it easier to switch between an even/odd and unpermuted layout
   when switching between wide and narrow elements.  E.g. if a widening
   operation produces an even vector and an odd vector, we should try
   to keep operations on the wide elements in that order rather than
   force them to be permuted back "in order".
Very cool.  Richi and I discussed this a bit a year or so ago -- basically noting that bi-directional movement is really the way to go and that the worst thing to do is push a permute down into the *middle* of a computation chain since that will tend to break FMA generation.  Moving to the loads or stores or to another permute point ought to be the goal.

It looks like you've covered that excessively well :-)

Jeff

Reply via email to