Alex Coplan <[email protected]> writes:
> Hi,
>
> The PR shows us spinning in dce.cc:fast_dce at the start of combine.
> This spinning appears to be because of a disagreement between the fast_dce
> code
> and the code in df-problems.cc:df_lr_bb_local_compute. Specifically, they
> disagree on the treatment of partial defs. For the testcase in the PR, we
> have
> the following insn in bb 3:
>
> (insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1
> (nil))
>
> which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e.
> DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG.
>
> Eliding the large block comment for readability, the code in
> df_lr_bb_local_compute does the following (for each insn):
>
> FOR_EACH_INSN_INFO_DEF (def, insn_info)
> {
> unsigned int dregno = DF_REF_REGNO (def);
> bitmap_set_bit (&bb_info->def, dregno);
> if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> bitmap_set_bit (&bb_info->use, dregno);
> else
> bitmap_clear_bit (&bb_info->use, dregno);
> }
>
> i.e. it models partial defs as a RMW operation; thus for the def arising
> from i10 above, it records a use of r104; hence it ends up in the
> live-in set for bb 3.
>
> However, as it stands, the code in dce.cc:fast_dce (and its callee
> dce_process_block) has no such provision for DF_REF_PARTIAL defs. It
> does not treat these as a RMW and does not compute r104 above as being
> live-in to bb 3. At the end of dce_process_block we compute the
> following "did something happen" condition used to decide termination of
> the analysis:
>
> block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb));
> if (block_changed)
> bitmap_copy (DF_LR_IN (bb), local_live);
>
> BITMAP_FREE (local_live);
> return block_changed;
>
> because of the disagreement between df_lr_local_compute and the local
> analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not
> in local_live. Hence we always return true here, call
> df_analyze_problem (which re-computes DF_LR_IN according to
> df_lr_bb_local_compute, re-adding r104), and so the analysis never
> terminates.
>
> This patch therefore adjusts df_simulate_defs (called from
> dce_process_block) to match the behaviour of df_lr_bb_local_compute in
> this respect, namely we make it model partial defs as RMW operations by
> setting the relevant register live. This fixes the spinning in fast_dce
> for this testcase.
>
> Bootstrapped/regtested on aarch64-linux-gnu, arm-linux-gnueabihf, and
> x86_64-linux-gnu: no regressions. I also checked that this is netural
> for SPEC CPU 2017 on Graviton 3. OK for trunk? If so, what about
> backports?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
> PR rtl-optimization/116564
> * df-problems.cc (df_simulate_defs): For partial defs, mark the
> register live (treat it as a RMW operation).
>
> gcc/testsuite/ChangeLog:
>
> PR rtl-optimization/116564
> * gcc.target/aarch64/torture/pr116564.c: New test.
OK, thanks. I think it's also ok to backport after a settling period
(but probably best to leave a reasonable amount of time before backporting,
since this is sensitive code).
Richard
>
> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
> index f32185b3eac..90753793098 100644
> --- a/gcc/df-problems.cc
> +++ b/gcc/df-problems.cc
> @@ -3893,9 +3893,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live)
> {
> unsigned int dregno = DF_REF_REGNO (def);
>
> - /* If the def is to only part of the reg, it does
> - not kill the other defs that reach here. */
> - if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> + /* If the def is to only part of the reg, model it as a RMW operation
> + by marking it live. It only kills the reg if it is a complete def. */
> + if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> + bitmap_set_bit (live, dregno);
> + else
> bitmap_clear_bit (live, dregno);
> }
> }
> diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
> b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
> new file mode 100644
> index 00000000000..d471e097294
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +#include <arm_neon.h>
> +void test()
> +{
> + for (int L = 0; L < 4; ++L) {
> + float64_t ResData[1 * 2];
> + float64x1x2_t Src1;
> + vst2_f64(ResData, Src1);
> + }
> +}