Jeff Law <jeffreya...@gmail.com> writes:
> On 9/7/24 1:09 AM, Richard Biener wrote:
>>
>>
>>> On 06.09.2024 at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm working on optimising assignments to the AArch64 Floating-point Mode
>>> Register (FPMR), as part of our FP8 enablement work. Claudio has already
>>> implemented FPMR as a hard register, with the intention that FP8 intrinsic
>>> functions will compile to a combination of an fpmr register set, followed
>>> by an FP8 operation that takes fpmr as an input operand.
>>>
>>> It would clearly be inefficient to retain an explicit FPMR assignment prior
>>> to each FP8 instruction (especially in the common case where every
>>> assignment uses the same FPMR value). I think the best way to optimise this
>>> would be to implement a new pass that can optimise assignments to
>>> individual hard registers.
>>>
>>> There are a number of existing passes that do similar optimisations, but
>>> which I believe are unsuitable for this scenario for various reasons. For
>>> example:
>>>
>>> - cse1 can already optimise FPMR assignments within an extended basic block,
>>>   but can't handle broader optimisations.
>>> - pre (in gcse.c) doesn't work with assigning constant values, which would
>>>   miss many potential usages. It also has limits on how far code can be
>>>   moved, based around ideas of register pressure that don't apply to the
>>>   context of a single hard register that shouldn't be used by the register
>>>   allocator for anything else. Additionally, it doesn't run at -Os.
>>> - hoist (also using gcse.c) only handles constant values, and only runs when
>>>   optimising for size. It also has the rest of the issues that pre does.
>>> - mode_sw only handles a small finite set of modes. The mode requirements
>>>   are determined solely by the instructions that require the specific mode,
>>>   so mode switches don't depend on the output of previous instructions.
>>>
>>>
>>> My intention would be for the new pass to reuse ideas, and hopefully some of
>>> the existing code, from the mode-switching and gcse passes. In particular,
>>> gcse.c (or its dependencies) has code that could identify when values
>>> assigned to the FPMR are known to be the same (although we may not need the
>>> full CSE capabilities of gcse.c), and mode-switching.cc knows how to
>>> globally optimise mode assignments (and unlike gcse.c, doesn't use cautious
>>> heuristics to avoid excessively increasing register pressure).
>>>
>>> Initially the new pass would only apply to the AArch64 FPMR register, but in
>>> future it could also be used for other hard registers with similar
>>> properties.
>>>
>>> Does anyone have any comments on this approach, before I start writing any
>>> code?
>>
>> Can you explain in more detail why the mode-switching pass
>> infrastructure isn’t a good fit? ISTR it already is customizable via
>> target hooks.
> Agreed. Mode switching seems to be the right pass to look at.
>
> It probably is worth pointing out that mode switching is LCM based and
> as such never speculates. Given the potential cost of a mode switch,
> failure to speculate may be a notable limitation (though the same would
> apply to the ideas Andrew floated above).
>
> This has recently come up in the RISC-V space due to needing VXRM
> assignments so that we can utilize the vaaddu add-with-averaging
> instructions. Placement of VXRM mode switches looks optimal from an
> LCM standpoint, but speculation can measurably improve performance. It
> was something like 2% on the BPI for x264. The k1/m1 chip in the BPI is
> almost certainly flushing its pipelines on the VXRM assignment.

Ah yeah, good point. I expect speculation would be best for FPMR as
well. I imagine most use cases will be well-structured in practice,
but for those that aren't...

> I've got a hack here that I'll submit upstream at some point. Just not
> at the top of my list yet -- especially now that our uarch has been
> fixed to not flush its pipelines at VXRM assignments ;-)

Is that handled by mode-switching, or is it a separate thing?

Richard
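
For readers following the thread, here is a toy sketch of the cross-block redundancy question Andrew raises: given a control-flow graph, which assignments to a single hard register rewrite a value the register is already known to hold? This is not GCC code and not how mode-switching.cc or gcse.c work. The function name, the `("set", value)` / `("use",)` instruction encoding, and the example graph are all made up for illustration. It only detects redundant sets; it does not move assignments the way LCM does, and it never speculates.

```python
from collections import deque

UNKNOWN = object()  # register contents unknown, or predecessors disagree


def redundant_sets(blocks, succs, entry):
    """Return {(block, index)} of sets that rewrite the value already held.

    blocks: {name: [("set", value) or ("use",), ...]}   (hypothetical encoding)
    succs:  {name: [successor block names]}
    """
    preds = {b: [] for b in blocks}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)

    def transfer(b, value):
        # The last set in a block decides the value on exit.
        for insn in blocks[b]:
            if insn[0] == "set":
                value = insn[1]
        return value

    value_in, value_out = {}, {}  # a block missing from value_out is unvisited
    work = deque([entry])
    while work:
        b = work.popleft()
        if b == entry:
            v = UNKNOWN  # nothing is known on function entry
        else:
            # Meet over the predecessors visited so far (optimistic start).
            vals = {value_out[p] for p in preds[b] if p in value_out}
            v = vals.pop() if len(vals) == 1 else UNKNOWN
        value_in[b] = v
        new_out = transfer(b, v)
        if b not in value_out or value_out[b] != new_out:
            value_out[b] = new_out
            work.extend(succs.get(b, []))

    redundant = set()
    for b, v in value_in.items():
        for i, insn in enumerate(blocks[b]):
            if insn[0] == "set":
                if v is not UNKNOWN and v == insn[1]:
                    redundant.add((b, i))
                v = insn[1]
    return redundant


# Hypothetical example: the loop body re-sets the value its preheader set.
example_blocks = {
    "entry": [("set", 1), ("use",)],
    "loop": [("set", 1), ("use",)],
    "exit": [("set", 2), ("use",), ("set", 2)],
}
example_succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
print(redundant_sets(example_blocks, example_succs, "entry"))
```

In the example, the set at the top of `loop` and the second set in `exit` are reported as redundant. A block reached from predecessors that leave different values in the register gets `UNKNOWN` on entry, so its first set is kept. That is the partially redundant case where a placement algorithm such as LCM (or speculation) would be needed to do better.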