On 11/03/2025 17:39, Richard Sandiford wrote:
> Alex Coplan <alex.cop...@arm.com> writes:
> > Hi,
> >
> > The PR shows us spinning in dce.cc:fast_dce at the start of combine.
> > This spinning appears to be because of a disagreement between the fast_dce 
> > code
> > and the code in df-problems.cc:df_lr_bb_local_compute.  Specifically, they
> > disagree on the treatment of partial defs.  For the testcase in the PR, we 
> > have
> > the following insn in bb 3:
> >
> > (insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1
> >      (nil))
> >
> > which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e.
> > DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG.
> >
> > Eliding the large block comment for readability, the code in
> > df_lr_bb_local_compute does the following (for each insn):
> >
> >       FOR_EACH_INSN_INFO_DEF (def, insn_info)
> >         {
> >           unsigned int dregno = DF_REF_REGNO (def);
> >           bitmap_set_bit (&bb_info->def, dregno);
> >           if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> >             bitmap_set_bit (&bb_info->use, dregno);
> >           else
> >             bitmap_clear_bit (&bb_info->use, dregno);
> >         }
> >
> > i.e. it models partial defs as a RMW operation; thus for the def arising
> > from i10 above, it records a use of r104; hence it ends up in the
> > live-in set for bb 3.
> >
> > However, as it stands, the code in dce.cc:fast_dce (and its callee
> > dce_process_block) has no such provision for DF_REF_PARTIAL defs.  It
> > does not treat these as a RMW and does not compute r104 above as being
> > live-in to bb 3.  At the end of dce_process_block we compute the
> > following "did something happen" condition used to decide termination of
> > the analysis:
> >
> >   block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb));
> >   if (block_changed)
> >     bitmap_copy (DF_LR_IN (bb), local_live);
> >
> >   BITMAP_FREE (local_live);
> >   return block_changed;
> >
> > because of the disagreement between df_lr_local_compute and the local
> > analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not
> > in local_live.  Hence we always return true here, call
> > df_analyze_problem (which re-computes DF_LR_IN according to
> > df_lr_bb_local_compute, re-adding r104), and so the analysis never
> > terminates.
> >
> > This patch therefore adjusts df_simulate_defs (called from
> > dce_process_block) to match the behaviour of df_lr_bb_local_compute in
> > this respect, namely we make it model partial defs as RMW operations by
> > setting the relevant register live.  This fixes the spinning in fast_dce
> > for this testcase.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu, arm-linux-gnueabihf, and
> > x86_64-linux-gnu: no regressions.  I also checked that this is netural
> > for SPEC CPU 2017 on Graviton 3.  OK for trunk?  If so, what about
> > backports?
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> >     PR rtl-optimization/116564
> >     * df-problems.cc (df_simulate_defs): For partial defs, mark the
> >     register live (treat it as a RMW operation).
> >
> > gcc/testsuite/ChangeLog:
> >
> >     PR rtl-optimization/116564
> >     * gcc.target/aarch64/torture/pr116564.c: New test.
> 
> OK, thanks.  I think it's also ok to backport after a settling period
> (but probably best to leave a reasonable amount of time before backporting,
> since this is sensitive code).

Thanks, pushed to trunk as g:758e617bcf224dc9d4a7e26dd858d43c1e63b916.

Yes this seems like sensitive code indeed, so I'll leave this for at
least a couple of weeks before thinking about backports.

Alex

> 
> Richard
> 
> >
> > diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
> > index f32185b3eac..90753793098 100644
> > --- a/gcc/df-problems.cc
> > +++ b/gcc/df-problems.cc
> > @@ -3893,9 +3893,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live)
> >      {
> >        unsigned int dregno = DF_REF_REGNO (def);
> >  
> > -      /* If the def is to only part of the reg, it does
> > -    not kill the other defs that reach here.  */
> > -      if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> > +      /* If the def is to only part of the reg, model it as a RMW operation
> > +    by marking it live.  It only kills the reg if it is a complete def.  */
> > +      if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> > +   bitmap_set_bit (live, dregno);
> > +      else
> >     bitmap_clear_bit (live, dregno);
> >      }
> >  }
> > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c 
> > b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
> > new file mode 100644
> > index 00000000000..d471e097294
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "" } */
> > +#include <arm_neon.h>
> > +void test()
> > +{
> > +  for (int L = 0; L < 4; ++L) {
> > +    float64_t ResData[1 * 2];
> > +    float64x1x2_t Src1;
> > +    vst2_f64(ResData, Src1);
> > +  }
> > +}

Reply via email to