On 12/5/24 8:45 AM, Andrew Carlotti wrote:
So at a 30k foot level, one thing to be very leery of is extending the
lifetime of any hard register. It's probably not a big deal on aarch, but
it can cause all kinds of headaches on other targets.
Essentially you probably need to avoid PRE on a hard register that's in a
likely spilled class.
This is not intended to be used for ordinary registers, so that shouldn't be a
concern. The use case is essentially as a form of mode-switching, where the
active mode is specified by a register that can take arbitrary values at
runtime.
I don't recall the details of the patch, but essentially you're going to
need some kind of way to prune the set of hard registers subject to this
optimization -- which would need to include filtering out any hard
register that's in a likely spilled class.
In this new hardreg PRE pass we want to move the actual assignment to the
hardreg, so we have to additionally check whether moving the assignment earlier
(and not just moving the compuation earlier) would conflict with existing uses
of the hardreg. The transparency (and antic) bitmaps are used to indicate this
property in LCM; perhaps the name is less accurate when applied to hardreg PRE.
Understood. I think this is primarily a naming problem.
@@ -747,6 +804,9 @@ static basic_block current_bb;
static bool
want_to_gcse_p (rtx x, machine_mode mode, HOST_WIDE_INT *max_distance_ptr)
{
+ if (doing_hardreg_pre_p)
+ return true;
This seems overly aggressive, though perhaps it's not so bad in practice.
This should be fine - want_to_gcse_p is aiming to filter out cases where the
lifetime of the newly-created pseduoregs would increase register to an extent
that the resulting spills would outweigh the benefits of the code motion.
For hardreg PRE we don't create new pseduoregs, but instead use specific
hardregs that aren't used by RA, and we check for conflicting uses in
determining the extent of the code motion. So the checks in want_to_gcse_p are
irrelevant here.
want_to_gcse_p was originally meant to filter out things we didn't know
how to handle or which were seen as not profitable to gcse, ever.
Register lifetimes didn't come into play until much later.
@@ -1537,6 +1623,18 @@ compute_hash_table_work (struct gcse_hash_table_d *table)
EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, regno, hrsi)
record_last_reg_set_info (insn, regno);
+ if (doing_hardreg_pre_p)
+ {
+ /* This is covered by the above clobbers, but let's
+ conservatively make this work as well for hardregs that
+ are call-used but not call-clobbered. */
+ record_last_reg_set_info (insn, current_hardreg_regno);
+
+ /* Mark this block as containing a call-clobber. */
+ bitmap_set_bit (blocks_with_calls,
+ BLOCK_FOR_INSN (insn)->index);
+
I don't think this is wrong, but isn't it redundant? ie, shouldn't we have
populated that bitmap elsewhere?
The existing code only uses blocks_with_calls for load motion, to indicate when
memory might be clobbered by a call. The existing computation of
blocks_with_calls is in
record_last_mem_set_info_common, but I bypass this with an added early exit
condition in record_last_mem_set_info (in the same way that I bypass all the
other code specific to load motion).
If I remember correctly, it's also used for loads outside the load/store
motion variant of PRE. ie, PRE of loads (without trying to do any
motion of stores).
In the hardreg PRE pass we don't need to check for clobbered memory, but we do
need to check whether the hardreg might be clobbered by a call. It seemed
sensible to reuse the existing suitably named bitmap to store this information,
but because I bypassed the existing computation, I needed to add the
computation back in elsewhere.
ACK. Note this all plays into the need to walk into the FUSAGE notes as
well, which I think this patch failed to do.
Jeff