Hi,

I'm working on optimising assignments to the AArch64 Floating-point Mode
Register (FPMR), as part of our FP8 enablement work.  Claudio has already
implemented FPMR as a hard register, with the intention that each FP8 intrinsic
function will compile to an assignment to fpmr, followed by an FP8 operation
that takes fpmr as an input operand.

It would clearly be inefficient to retain an explicit FPMR assignment prior to
each FP8 instruction (especially in the common case where every assignment uses
the same FPMR value).  I think the best way to optimise this would be to
implement a new pass that can optimise assignments to individual hard registers.
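
As a rough illustration of the redundancy in question, here is a toy Python
model (not GCC code).  The opcode names "set_fpmr", "fp8_op" and "clobber_fpmr"
are made up for this sketch, and a block is just a list of (opcode, operand)
pairs standing in for RTL.  It only shows the single-block case:

```python
from typing import List, Optional, Tuple

# (opcode, operand): a hypothetical stand-in for an RTL instruction.
Insn = Tuple[str, Optional[int]]


def drop_redundant_fpmr_sets(block: List[Insn]) -> List[Insn]:
    """Drop FPMR assignments whose value is already known to be in FPMR."""
    known: Optional[int] = None  # value currently held in FPMR, None if unknown
    out: List[Insn] = []
    for opcode, operand in block:
        if opcode == "set_fpmr":
            if operand is not None and operand == known:
                continue  # FPMR already holds this value, so skip the set
            known = operand
        elif opcode == "clobber_fpmr":
            known = None  # e.g. a call that may change FPMR
        out.append((opcode, operand))
    return out
```

For example, given set_fpmr 1, fp8_op, set_fpmr 1, fp8_op, the second set_fpmr
is dropped, whereas a clobber_fpmr between the two sets keeps both.  The
proposed pass would need to apply the same reasoning across the whole CFG
rather than within one block.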

There are a number of existing passes that do similar optimisations, but which
I believe are unsuitable for this scenario for various reasons.  For example:

- cse1 can already optimise FPMR assignments within an extended basic block,
  but can't optimise across extended basic block boundaries.
- pre (in gcse.cc) doesn't work with assigning constant values, which would miss
  many potential usages.  It also has limits on how far code can be moved,
  based around ideas of register pressure that don't apply to the context of a
  single hard register that shouldn't be used by the register allocator for
  anything else.  Additionally, it doesn't run at -Os.
- hoist (also in gcse.cc) only handles constant values, and only runs when
  optimising for size.  It also has the rest of the issues that pre does.
- mode_sw only handles a small finite set of modes.  The mode requirements are
  determined solely by the instructions that require the specific mode, so mode
  switches don't depend on the output of previous instructions.


My intention would be for the new pass to reuse ideas, and hopefully some of
the existing code, from the mode-switching and gcse passes.  In particular,
gcse.cc (or its dependencies) has code that could identify when values assigned
to the FPMR are known to be the same (although we may not need the full CSE
capabilities of gcse.cc), and mode-switching.cc knows how to globally optimise
mode assignments (and unlike gcse.cc, doesn't use cautious heuristics to avoid
excessively increasing register pressure).

Initially the new pass would only apply to the AArch64 FPMR register, but in
future it could also be used for other hard registers with similar properties.

Does anyone have any comments on this approach, before I start writing any
code?

Thanks,
Andrew

