On 11/15/24 12:17 AM, Richard Biener wrote:
On Thu, Nov 14, 2024 at 10:41 PM Jeff Law <j...@ventanamicro.com> wrote:


Several weeks ago I was looking at SATD and realized that we had loads
of permutation constants that could be implemented as a trivial
adjustment to a prior loaded permutation constant.

For example, if we had loaded a permutation constant like 1, 3, 1, 3, 5,
7, 5, 7 and we needed 0, 2, 0, 2, 4, 6, 4, 6, we could use vadd.vi to
adjust the already loaded constant into the new constant we wanted.
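
The arithmetic behind that example can be sketched roughly as follows.
This is only an illustration in Python, not code from the patch: the
helper name uniform_delta is made up, the range check assumes the 5-bit
signed immediate that vadd.vi accepts (-16..15), and element-width
wraparound is ignored.

```python
def uniform_delta(loaded, wanted):
    """Return d such that wanted[i] == loaded[i] + d for every i, else None.

    Hypothetical helper for illustration only.
    """
    if not loaded or len(loaded) != len(wanted):
        return None
    d = wanted[0] - loaded[0]
    # Every element must differ by the same amount.
    if any(w - l != d for l, w in zip(loaded, wanted)):
        return None
    # Assumption: the adjustment must fit the simm5 immediate of vadd.vi.
    return d if -16 <= d <= 15 else None


loaded = [1, 3, 1, 3, 5, 7, 5, 7]
wanted = [0, 2, 0, 2, 4, 6, 4, 6]
print(uniform_delta(loaded, wanted))  # -1
```

So in this case the new constant would be the already loaded one
adjusted by -1.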

This has a lot of similarities to SLSR and reload_cse_move2add, but with
the added complexity that we're dealing with vectors and whether or not
we can profitably derive one constant from the other is a target
dependent decision.

So this is implemented as a mini pass after CSE2.  If other targets are
interested, we can certainly look to make this more generic.  I'm sure
we could use a hook to help with the profitability question.

The implementation works by normalizing permutation constants so that
the first element is 0, and we hash based on the normalized form.  So in
the case above, the normalized form is 0, 2, 0, 2, 4, 6, 4, 6.  That's
what gets entered into the hash table for 1, 3, 1, 3, 5, 7, 5, 7,
allowing us to realize that all the elements differ by the same value
when we later encounter 0, 2, 0, 2, 4, 6, 4, 6.
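
The normalize-and-hash idea described above could look something like
the following sketch.  It is Python for brevity, not the pass itself;
PermTable, normalize and the register names are hypothetical, and a real
implementation would work on RTL vector constants and still have to
answer the target's profitability question.

```python
def normalize(perm):
    """Shift a permutation constant so its first element is 0."""
    base = perm[0]
    return tuple(x - base for x in perm), base


class PermTable:
    """Hypothetical table keyed by the normalized form of a constant."""

    def __init__(self):
        self.seen = {}  # normalized form -> (register, first element)

    def lookup_or_record(self, perm, reg):
        key, base = normalize(perm)
        if key in self.seen:
            prev_reg, prev_base = self.seen[key]
            # Same normalized form: derive from the earlier register
            # by adding a single uniform delta.
            return prev_reg, base - prev_base
        self.seen[key] = (reg, base)
        return None


table = PermTable()
print(table.lookup_or_record([1, 3, 1, 3, 5, 7, 5, 7], "v1"))  # None
print(table.lookup_or_record([0, 2, 0, 2, 4, 6, 4, 6], "v2"))  # ('v1', -1)
```

The second lookup hits the entry recorded for the first constant and
reports that the new constant is the earlier one plus -1.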

Note this in principle applies to all (non-vector) constants.  The issues are
  a) can the target directly generate the constant
  b) requiring an earlier constant and some adjustment might increase
      register lifetime and thus cause spilling in the end
  c) a load from L1 might be better than the dependence on the earlier
      generated constant

I think it makes sense to consider this as part of LRA rematerialization
support?

Yea, it definitely applies to other constants, which is why I mentioned the related values stuff from CSE, move2add and SLSR.

(a) is probably the least "interesting" problem. An appropriate hook would give the target a chance to answer that question.

(b) is common to most CSE based transformations. We've typically driven CSE's decisions based on localized cost modeling, which we could certainly do here. If we moved the REG_EQUAL/REG_EQUIV note that would likely be a good thing for IRA/LRA. I'm not sure if they're currently rematerializing constants, but it'd at least provide a critical tidbit of information.
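
A localized cost check of the kind mentioned for (b) might be sketched as below.  Everything here is an assumption for illustration: the function name, the cost units and the max_distance cutoff (a crude stand-in for the register lifetime concern) are hypothetical and are not taken from the patch or from any existing GCC hook.

```python
def should_derive(adjust_cost, load_cost, insns_between, max_distance=32):
    """Decide whether to derive a constant from an earlier one.

    Hypothetical model: derive only when the adjustment is cheaper than
    reloading the constant, and the earlier value was set recently enough
    that keeping it live is unlikely to hurt register pressure.
    """
    cheaper = adjust_cost < load_cost
    short_lifetime = insns_between <= max_distance
    return cheaper and short_lifetime


print(should_derive(adjust_cost=1, load_cost=4, insns_between=10))   # True
print(should_derive(adjust_cost=1, load_cost=4, insns_between=100))  # False
```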

(c) is tough as well since it can be fairly dependent on the precise instruction mix. I'm pretty sure issues in this space are why the patch actually caused performance to go backwards a couple months ago. With some design adjustments and a sensible scheduler model it's likely a small win now in general on our design. But I could well see it behaving differently on other designs.

But yes, this could well be thought of as a remat problem in two ways.

First we could continue down the path of trying to optimize the related value in a CSE-like manner, but provide enough infrastructure for IRA/LRA to rematerialize the constant from the constant pool when register pressure is high. That would probably fit into the current IRA/LRA model.

Or we could extend remat to more generally work on trying to materialize constant pool accesses using existing values in the IL.

Or we could punt it to post-reload CSE since this is just a vector version of move2add. I'm not a fan of the move2add code, so I discounted this approach. But it would largely address (b) above.

Jeff

