On 2/9/2022 1:12 PM, Roger Sayle wrote:
This patch adds middle-end support for target ABIs that pass/return floating point values in integer registers with precision wider than the original FP mode. An example is the nvptx backend, where 16-bit HFmode registers are passed/returned as (promoted to) SImode registers. Unfortunately, this currently falls foul of the various (recent?) sanity checks that (very sensibly) prevent creating paradoxical SUBREGs of floating point registers.

The approach below is to explicitly perform the conversion/promotion in two steps, via an integer mode of the same precision as the floating point value. So on nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using SUBREG), then zero-extended to SImode, and likewise when going the other way, parameters are truncated to HImode then converted to HFmode (using SUBREG).

These changes are localized to expand_value_return and expanding DECL_RTL to support strange ABIs, rather than inside convert_modes or gen_lowpart, as mismatched precision integer/FP conversions should be explicit in the RTL, and these semantics are not generally visible/implicit in user code.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures, and on nvptx-none, where it is the middle-end portion of a pair of patches to allow the default ISA to be advanced. Ok for mainline?

2022-02-09  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * cfgexpand.cc (expand_value_return): Allow backends to promote
        a scalar floating point return value to a wider integer mode.
        * expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
        backends to promote scalar FP PARM_DECLs to wider integer modes.
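The two-step promotion described above can be illustrated outside of GCC with a small bit-level analogy. This is only a sketch, not the patch itself: the function names promote_half_to_si and demote_si_to_half are hypothetical, and Python's struct module (format "e" for IEEE 754 binary16) stands in for the HFmode/HImode SUBREG step.

```python
import struct


def promote_half_to_si(value: float) -> int:
    """Hypothetical analogue of promoting an HFmode value to SImode."""
    # Step 1: reinterpret the 16-bit FP bit pattern as a 16-bit unsigned
    # integer (the role played by the HFmode -> HImode SUBREG).
    (hi_bits,) = struct.unpack("<H", struct.pack("<e", value))
    # Step 2: zero-extend to 32 bits. The upper 16 bits are already zero
    # here; the mask only makes the 32-bit width explicit.
    return hi_bits & 0xFFFFFFFF


def demote_si_to_half(si_bits: int) -> float:
    """Hypothetical analogue of the reverse path for incoming values."""
    # Step 1: truncate the 32-bit value to its low 16 bits (SImode -> HImode).
    hi_bits = si_bits & 0xFFFF
    # Step 2: reinterpret those 16 bits as a half-precision float
    # (the HImode -> HFmode SUBREG).
    (value,) = struct.unpack("<e", struct.pack("<H", hi_bits))
    return value


if __name__ == "__main__":
    promoted = promote_half_to_si(1.5)
    print(hex(promoted))                # prints 0x3e00
    print(demote_si_to_half(promoted))  # prints 1.5
```

Neither step converts the numeric value; both only move the same 16 bits between representations, which is why the intermediate integer mode has to match the FP mode's precision.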
Buried somewhere in our calling conventions code is the ability to pass around BLKmode objects in registers, along with the ability to tune left vs right padding adjustments. Much of this support grew out of the PA 32-bit SOM ABI.
While I think we could probably make those bits do what we want, I suspect the result would actually be uglier than what you've done here. I also wouldn't be surprised if there were a performance hit, as the code to handle those cases was pretty dumb in its implementation.
What I find rather surprising is the location of your changes -- they feel incomplete. For example, you fix the callee side of returns in expand_value_return, but I don't see analogous code for the caller side.
Similarly, while you fix things for arguments in expand_expr_real_1, that's again just the callee side. Don't you need to do something on the caller side too?
Jeff