Richard Biener <richard.guent...@gmail.com> writes: > On Mon, Aug 18, 2025 at 10:51 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> This patch fixes an internal disagreement in gcse about how to >> handle partial clobbers. Like many passes, gcse doesn't track >> the modes of live values, so if a call clobbers only part of >> a register, the pass has to make conservative assumptions. >> As the comment in the patch says, this means: >> >> (1) ignoring partial clobbers when computing liveness and reaching >> definitions >> >> (2) treating partial clobbers as full clobbers when computing >> availability >> >> DF is mostly concerned with (1), so ignores partial clobbers. >> >> compute_hash_table_work did (2) when calculating kill sets, >> but compute_transp didn't do (2) when computing transparency. >> This led to a nonsensical situation of a register being in both >> the transparency and kill sets. >> >> Andreas fixed the PR itself with a target-specific change, which is why >> there is no new testcase. But I was supposed to post this patch when >> the next stage 1 opened and it looks like I never did. >> >> Tested on aarch64-linux-gnu. OK to install? >> >> Richard >> >> >> gcc/ >> PR rtl-optimization/97497 >> * function-abi.h (predefined_function_abi::only_partial_reg_clobbers) >> (function_abi::only_partial_reg_clobbers): New member functions. >> * gcse-common.c: Include regs.h and function-abi.h. >> (compute_transp): Check for partially call-clobbered registers >> and treat them as not being transparent in blocks with calls. >> --- >> gcc/function-abi.h | 39 +++++++++++++++++++++++++++++++++++++++ >> gcc/gcse-common.cc | 40 ++++++++++++++++++++++++++++++++-------- >> 2 files changed, 71 insertions(+), 8 deletions(-) >> >> diff --git a/gcc/function-abi.h b/gcc/function-abi.h >> index edf83c1cb39..6cb552fbcac 100644 >> --- a/gcc/function-abi.h >> +++ b/gcc/function-abi.h >> @@ -99,6 +99,39 @@ public: >> return m_full_and_partial_reg_clobbers; >> } >> >> + /* Return the set of registers that a function call is allowed to alter >> + partially but not fully; i.e. those in full_and_partial_reg_clobbers () >> + but not in full_reg_clobbers (). >> + >> + If a register X is in this set and if we don't know which parts of >> + X are live (typically because we don't bother to track X's mode), >> + then the conservative assumptions are: >> + >> + - to ignore the call when computing reaching definitions >> + - to treat X as clobbered when computing availability >> + >> + For example, if we have: >> + >> + A: X := Y >> + B: call that partially clobbers X >> + C: ... := ... X ... >> + >> + and don't track the mode of X when computing reaching definitions, >> + then the conservative assumption is that A's definition survives >> + until C. But if we have: >> + >> + A: X := Y >> + B: call that partially clobbers X >> + C: ... := ... Y ... >> + >> + and don't track the mode of X when computing availability, then the >> + conservative assumption is that Y is not available in X at C. */ >> + HARD_REG_SET >> + only_partial_reg_clobbers () const >> + { >> + return full_and_partial_reg_clobbers () & ~full_reg_clobbers (); >> + } >> + >> /* Return the set of registers that cannot be used to hold a value of >> mode MODE across a function call. That is: >> >> @@ -165,6 +198,12 @@ public: >> return m_mask & m_base_abi->full_and_partial_reg_clobbers (); >> } >> >> + HARD_REG_SET >> + only_partial_reg_clobbers () const >> + { >> + return m_mask & m_base_abi->only_partial_reg_clobbers (); >> + } >> + >> HARD_REG_SET >> mode_clobbers (machine_mode mode) const >> { >> diff --git a/gcc/gcse-common.cc b/gcc/gcse-common.cc >> index 888e4d6356c..9ce9bb57148 100644 >> --- a/gcc/gcse-common.cc >> +++ b/gcc/gcse-common.cc >> @@ -27,7 +27,8 @@ >> #include "rtl.h" >> #include "df.h" >> #include "gcse-common.h" >> - >> +#include "regs.h" >> +#include "function-abi.h" >> >> /* Record all of the canonicalized MEMs of record_last_mem_set_info's insn. >> Note we store a pair of elements in the list, so they have to be >> @@ -129,13 +130,36 @@ compute_transp (const_rtx x, int indx, sbitmap *bmap, >> switch (code) >> { >> case REG: >> - { >> - df_ref def; >> - for (def = DF_REG_DEF_CHAIN (REGNO (x)); >> - def; >> - def = DF_REF_NEXT_REG (def)) >> - bitmap_clear_bit (bmap[DF_REF_BB (def)->index], indx); >> - } >> + { >> + df_ref def; >> + for (def = DF_REG_DEF_CHAIN (REGNO (x)); >> + def; >> + def = DF_REF_NEXT_REG (def)) >> + bitmap_clear_bit (bmap[DF_REF_BB (def)->index], indx); >> + >> + /* Check for hard registers that are partially clobbered (but not >> + fully clobbered) by calls. Such partial clobbers do not have >> + an associated definition or use in the DF representation, >> + but they do prevent the register from being transparent. >> + >> + ??? The check here is fairly crude. We could instead maintain >> + separate blocks_with_calls bitmaps for each ABI. */ >> + if (HARD_REGISTER_P (x)) > > It's probably cheap enough in that hard-reg uses in expressions are > seldom.
Yeah, hopefully. And uses of only-part-clobbered registers should be even rarer. > One does of course wonder about the cache-infriendlyness > of compute_transp in general (also looking at the MEM case...). Having blocks be the outer index and expressions be the inner index does seem a bit inefficient in this context, but I suppose it's better when solving the dataflow problem. I wonder whether it'd be more efficient to reformat the data after computing transparency? It might not be in practice though. And yeah, the MEM case does look a bit expensive :) Thanks for the review. Richard > > OK. > > Thanks, > Richard. > >> + for (unsigned int i = 0; i < NUM_ABI_IDS; ++i) >> + { >> + const predefined_function_abi &abi = function_abis[i]; >> + if (abi.initialized_p () >> + && overlaps_hard_reg_set_p (abi.only_partial_reg_clobbers >> (), >> + GET_MODE (x), REGNO (x))) >> + { >> + bitmap_iterator bi; >> + unsigned bb_index; >> + EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, >> bi) >> + bitmap_clear_bit (bmap[bb_index], indx); >> + break; >> + } >> + } >> + } >> >> return; >> >> -- >> 2.43.0 >>