On Sun, Dec 29, 2024 at 10:54:03AM -0700, Jeff Law wrote:
>
>
> On 12/5/24 8:45 AM, Andrew Carlotti wrote:
>
> > > So at a 30k foot level, one thing to be very leery of is extending the
> > > lifetime of any hard register. It's probably not a big deal on aarch, but
> > > it can cause all kinds of headaches on other targets.
> > >
> > > Essentially you probably need to avoid PRE on a hard register that's in a
> > > likely spilled class.
> >
> > This is not intended to be used for ordinary registers, so that shouldn't
> > be a
> > concern. The use case is essentially as a form of mode-switching, where the
> > active mode is specified by a register that can take arbitrary values at
> > runtime.
> I don't recall the details of the patch, but essentially you're going to
> need some kind of way to prune the set of hard registers subject to this
> optimization -- which would need to include filtering out any hard register
> that's in a likely spilled class.
By "in a likely splilled class", do you mean a register where register
allocation might end up splitting up the live range and spilling/reloading the
value? Or something else?
I think my pass handles this by only operating on a list of hardregs specified
by a new target hook. It would be up to the target hooks to ensure that the
optimisation isn't applied to inappropriate register classes.
> >
> > In this new hardreg PRE pass we want to move the actual assignment to the
> > hardreg, so we have to additionally check whether moving the assignment
> > earlier
> > (and not just moving the compuation earlier) would conflict with existing
> > uses
> > of the hardreg. The transparency (and antic) bitmaps are used to indicate
> > this
> > property in LCM; perhaps the name is less accurate when applied to hardreg
> > PRE.
> Understood. I think this is primarily a naming problem.
>
> > > > @@ -747,6 +804,9 @@ static basic_block current_bb;
> > > > static bool
> > > > want_to_gcse_p (rtx x, machine_mode mode, HOST_WIDE_INT
> > > > *max_distance_ptr)
> > > > {
> > > > + if (doing_hardreg_pre_p)
> > > > + return true;
> > > This seems overly aggressive, though perhaps it's not so bad in practice.
> >
> > This should be fine - want_to_gcse_p is aiming to filter out cases where the
> > lifetime of the newly-created pseduoregs would increase register to an
> > extent
> > that the resulting spills would outweigh the benefits of the code motion.
> >
> > For hardreg PRE we don't create new pseduoregs, but instead use specific
> > hardregs that aren't used by RA, and we check for conflicting uses in
> > determining the extent of the code motion. So the checks in want_to_gcse_p
> > are
> > irrelevant here.
> want_to_gcse_p was originally meant to filter out things we didn't know how
> to handle or which were seen as not profitable to gcse, ever. Register
> lifetimes didn't come into play until much later.
Right - I realised there were still some cases that should be filtered out for
hardreg PRE, so I changed this for the v2 patch.
(https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671801.html)
> >
> > > > @@ -1537,6 +1623,18 @@ compute_hash_table_work (struct
> > > > gcse_hash_table_d *table)
> > > > EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0,
> > > > regno, hrsi)
> > > > record_last_reg_set_info (insn, regno);
> > > > + if (doing_hardreg_pre_p)
> > > > + {
> > > > + /* This is covered by the above clobbers, but let's
> > > > + conservatively make this work as well for hardregs
> > > > that
> > > > + are call-used but not call-clobbered. */
> > > > + record_last_reg_set_info (insn,
> > > > current_hardreg_regno);
> > > > +
> > > > + /* Mark this block as containing a call-clobber. */
> > > > + bitmap_set_bit (blocks_with_calls,
> > > > + BLOCK_FOR_INSN (insn)->index);
> > > > +
> > > I don't think this is wrong, but isn't it redundant? ie, shouldn't we
> > > have
> > > populated that bitmap elsewhere?
> >
> > The existing code only uses blocks_with_calls for load motion, to indicate
> > when
> > memory might be clobbered by a call. The existing computation of
> > blocks_with_calls is in
> > record_last_mem_set_info_common, but I bypass this with an added early exit
> > condition in record_last_mem_set_info (in the same way that I bypass all the
> > other code specific to load motion).
> If I remember correctly, it's also used for loads outside the load/store
> motion variant of PRE. ie, PRE of loads (without trying to do any motion of
> stores).
I've dropped the reuse of blocks_with_calls from v2 of the patch anyway,
because I realised that DF analysis can already detect call-clobbered
registers.
> >
> > In the hardreg PRE pass we don't need to check for clobbered memory, but we
> > do
> > need to check whether the hardreg might be clobbered by a call. It seemed
> > sensible to reuse the existing suitably named bitmap to store this
> > information,
> > but because I bypassed the existing computation, I needed to add the
> > computation back in elsewhere.
> ACK. Note this all plays into the need to walk into the FUSAGE notes as
> well, which I think this patch failed to do.
Does the use of DF analysis cover this, or are there additional checks still
required?
> Jeff