On Thu, Nov 14, 2024 at 10:41 PM Jeff Law <j...@ventanamicro.com> wrote:
>
>
> Several weeks ago I was looking at SATD and realized that we had loads
> of permutation constants that could be implemented as a trivial
> adjustment to a prior loaded permutation constant.
>
> For example, if we had loaded a permutation constant like 1, 3, 1, 3, 5,
> 7, 5, 7 and we needed 0, 2, 0, 2, 4, 6, 4, 6, we could use vadd.vi to
> adjust the already loaded constant into the new constant we wanted.
>
> This has a lot of similarities to SLSR and reload_cse_mov2add, but with
> the added complexity that we're dealing with vectors and whether or not
> we can profitably derive one constant from the other is a target
> dependent decision.
>
> So this is implemented as a mini pass after CSE2.  If other targets are
> interested, we can certainly look to make this more generic.  I'm sure
> we could use a hook to help with the profitability question.
>
> The implementation works by normalizing permutation constants so that
> the first element is 0 and we hash based on the normalized form.  So in
> the case above, the normalized form is 0, 2, 0, 2, 4, 6, 4, 6; that's
> what gets entered into the hash table for 1, 3, 1, 3, 5, 7, 5, 7,
> allowing us to realize that all the elements differ by the same value
> when we later encounter 0, 2, 0, 2, 4, 6, 4, 6.
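
To make the normalize-and-hash idea concrete, here is a minimal standalone sketch. It is not the code from the patch: `Perm`, `normalize`, `PermTable` and `fits_vadd_vi` are hypothetical names, and an ordered map stands in for the pass's hash table. The only target fact assumed is that vadd.vi takes a 5-bit signed immediate (-16..15).

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical representation of a permutation constant.
using Perm = std::vector<int64_t>;

// Subtract the first element from every element so the first becomes 0.
Perm normalize(const Perm &p) {
    Perm out;
    out.reserve(p.size());
    for (int64_t v : p)
        out.push_back(v - p.front());
    return out;
}

// Assumption: vadd.vi encodes a 5-bit signed immediate.
bool fits_vadd_vi(int64_t delta) {
    return delta >= -16 && delta <= 15;
}

// Maps a normalized form to the first constant seen with that form.
struct PermTable {
    std::map<Perm, Perm> seen;

    // Returns the value to add to the earlier constant to obtain `p`,
    // or std::nullopt if `p` is the first constant with its normalized
    // form (in which case it is recorded for later lookups).
    std::optional<int64_t> lookup_or_insert(const Perm &p) {
        if (p.empty())
            return std::nullopt;
        auto [it, inserted] = seen.emplace(normalize(p), p);
        if (inserted)
            return std::nullopt;
        return p.front() - it->second.front();
    }
};
```

With the constants from the example, recording 1, 3, 1, 3, 5, 7, 5, 7 and then looking up 0, 2, 0, 2, 4, 6, 4, 6 yields a delta of -1, which fits the immediate range. A real pass would also have to check that the earlier register is still live and unmodified, and that the derivation is profitable on the target.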

Note this in principle applies to all constants, not just vector ones.  The
issues are
 a) can the target directly generate the constant
 b) requiring an earlier constant and some adjustment might increase
     register lifetime and thus cause spilling in the end
 c) a load from L1 might be better than the dependence on the earlier
     generated constant

I think it makes sense to consider this as part of LRA rematerialization
support?

> After we hit in the hash table we verify that a simple vadd.vi is
> sufficient to generate the second constant from the first and adjust the
> code appropriately.
>
> I tested it on the BPI at the time with good results, but the details
> have escaped me -- at the time it performed poorly on our design.  I had
> a strong suspicion the poor behavior on our design was due to a particularly
> poor scheduler model.  With our scheduler model in much better shape I
> recently retested and it's consistently 1c better for the SATD code.
> Big win!  Wahoo!
>
>
> Bootstrapped and regression tested on riscv64-linux-gnu, regression
> tested on riscv64-elf and riscv32-elf as well.
>
> Anyway, RFC for now since it introduces a new target dependent pass.
> Comments, criticism, etc all welcomed.  The goal would be to try and go
> forward with something in the gcc-15 timeframe.
>
> Thanks,
> Jeff
