Hi,

I'm working on optimising assignments to the AArch64 Floating-point Mode Register (FPMR), as part of our FP8 enablement work. Claudio has already implemented FPMR as a hard register, with the intention that FP8 intrinsic functions will compile to a combination of an fpmr register set, followed by an FP8 operation that takes fpmr as an input operand.

It would clearly be inefficient to retain an explicit FPMR assignment prior to each FP8 instruction (especially in the common case where every assignment uses the same FPMR value). I think the best way to optimise this would be to implement a new pass that can optimise assignments to individual hard registers.

There are a number of existing passes that do similar optimisations, but which I believe are unsuitable for this scenario for various reasons. For example:

- cse1 can already optimise FPMR assignments within an extended basic block, but can't handle broader optimisations.
- pre (in gcse.c) doesn't work with assigning constant values, which would miss many potential usages. It also has limits on how far code can be moved, based around ideas of register pressure that don't apply to the context of a single hard register that shouldn't be used by the register allocator for anything else. Additionally, it doesn't run at -Os.
- hoist (also using gcse.c) only handles constant values, and only runs when optimising for size. It also has the rest of the issues that pre does.
- mode_sw only handles a small finite set of modes. The mode requirements are determined solely by the instructions that require the specific mode, so mode switches don't depend on the output of previous instructions.

My intention would be for the new pass to reuse ideas, and hopefully some of the existing code, from the mode-switching and gcse passes. In particular, gcse.c (or its dependencies) has code that could identify when values assigned to the FPMR are known to be the same (although we may not need the full CSE capabilities of gcse.c), and mode-switching.cc knows how to globally optimise mode assignments (and, unlike gcse.c, doesn't use cautious heuristics to avoid excessively increasing register pressure).

Initially the new pass would only apply to the AArch64 FPMR register, but in future it could also be used for other hard registers with similar properties.

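As a rough illustration of the cross-block redundancy described above, here is a toy sketch in plain C++. It is not GCC code: it does not use RTL or GCC's dataflow infrastructure, and every name in it (Insn, Block, count_redundant_sets and so on) is hypothetical. It assumes each FPMR value can be identified by a non-negative integer, and it only covers the "which sets are already redundant" half of the problem, not the placement of new sets that mode-switching performs.

```cpp
#include <cstddef>
#include <vector>

// Toy model (an assumption for illustration): an instruction either sets FPMR
// to a value identified by a non-negative integer, or only reads FPMR.
struct Insn {
    bool sets_fpmr;
    int value;  // meaningful only when sets_fpmr is true
};

struct Block {
    std::vector<Insn> insns;
    std::vector<std::size_t> preds;  // indices of predecessor blocks
};

// Lattice for "value currently held in FPMR".
constexpr int kTop = -1;      // not yet computed (optimistic start)
constexpr int kUnknown = -2;  // unknown or conflicting values

// Combine the values arriving from two predecessors.
int meet(int a, int b) {
    if (a == kTop) return b;
    if (b == kTop) return a;
    return a == b ? a : kUnknown;
}

// Value held in FPMR at the end of a block, given the value at its start.
int transfer(const Block& block, int in) {
    int cur = in;
    for (const Insn& insn : block.insns)
        if (insn.sets_fpmr) cur = insn.value;
    return cur;
}

// Counts FPMR sets that write a value FPMR is already known to hold.
// Block 0 is treated as the function entry, where FPMR is unknown.
std::size_t count_redundant_sets(const std::vector<Block>& cfg) {
    std::vector<int> in(cfg.size(), kTop), out(cfg.size(), kTop);

    // Iterate the forward dataflow equations to a fixed point.
    bool changed = !cfg.empty();
    while (changed) {
        changed = false;
        for (std::size_t b = 0; b < cfg.size(); ++b) {
            int new_in = (b == 0) ? kUnknown : kTop;
            for (std::size_t p : cfg[b].preds) new_in = meet(new_in, out[p]);
            int new_out = transfer(cfg[b], new_in);
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in;
                out[b] = new_out;
                changed = true;
            }
        }
    }

    // Walk each block with the known entry value and count redundant sets.
    std::size_t redundant = 0;
    for (std::size_t b = 0; b < cfg.size(); ++b) {
        int cur = in[b];
        for (const Insn& insn : cfg[b].insns) {
            if (!insn.sets_fpmr) continue;
            if (cur == insn.value) ++redundant;
            cur = insn.value;
        }
    }
    return redundant;
}
```

In this model, a set in a join block is only found redundant when every predecessor leaves the same value in FPMR, which is the kind of global reasoning that an extended-basic-block pass cannot do.
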
Does anyone have any comments on this approach, before I start writing any code?

Thanks,
Andrew