On Fri, 2020-10-30 at 09:22 +0100, Richard Biener wrote:
> On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich <i...@linux.ibm.com> wrote:
> > On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote:
> > > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
> > > <gcc@gcc.gnu.org> wrote:
> > > > Hi,
> > > >
> > > > I'd like to revive the old discussion regarding the interaction of
> > > > jump threading and b_c_p causing the latter to incorrectly return 1
> > > > in certain cases:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
> > > >
> > > > The conclusion was that this happening during threading is just a
> > > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p
> > > > should not be registered for incremental updating.
> > > >
> > > > I performed a little experiment and added an assertion to
> > > > create_new_def_for:
> > > >
> > > > --- a/gcc/tree-into-ssa.c
> > > > +++ b/gcc/tree-into-ssa.c
> > > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple *stmt, def_operand_p def)
> > > >  {
> > > >    tree new_name;
> > > >
> > > > +  gcc_checking_assert (!used_by_bcp_p (old_name));
> > > > +
> > > >    timevar_push (TV_TREE_SSA_INCREMENTAL);
> > > >
> > > >    if (!update_ssa_initialized_fn)
> > > >
> > > > This has of course fired when performing basic block duplication
> > > > during threading, which can be fixed by avoiding duplication of
> > > > basic blocks with b_c_p calls:
> > > >
> > > > --- a/gcc/tree-cfg.c
> > > > +++ b/gcc/tree-cfg.c
> > > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
> > > >               || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> > > >               || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
> > > >               || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
> > > > -             || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
> > > > +             || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)
> > > > +             || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P)))
> > > >         return false;
> > > >      }
> > > >
> > > > The second occurrence is a bit more interesting:
> > > >
> > > > gimple *
> > > > vrp_insert::build_assert_expr_for (tree cond, tree v)
> > > > {
> > > > ...
> > > >   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
> > > >   assertion = gimple_build_assign (NULL_TREE, a);
> > > > ...
> > > >   tree new_def = create_new_def_for (v, assertion, NULL);
> > > >
> > > > The fix is also simple though:
> > > >
> > > > --- a/gcc/tree-vrp.c
> > > > +++ b/gcc/tree-vrp.c
> > > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for (tree name, assert_locus *loc)
> > > >    if (loc->expr == loc->val)
> > > >      return false;
> > > >
> > > > +  if (used_by_bcp_p (name))
> > > > +    return false;
> > > > +
> > > >    cond = build2 (loc->comp_code, boolean_type_node, loc->expr, loc->val);
> > > >    assert_stmt = build_assert_expr_for (cond, name);
> > > >    if (loc->e)
> > > >
> > > > My original testcase did not trigger anything else.  I'm planning to
> > > > check how this affects the testsuite, but before going further I
> > > > would like to ask: is this the right direction now?  To me it looks
> > > > a little-bit more heavy-handed than the original approach, but maybe
> > > > it's worth it.
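
(For reference, used_by_bcp_p itself is not quoted in the hunks above.  A
minimal sketch of what such a predicate could look like - a plain
immediate-use walk over NAME, not necessarily the exact helper from the
patch - would be:

  /* Hypothetical sketch only; assumes the usual middle-end headers
     (tree.h, gimple.h, ssa.h) are available.  Return true if the SSA
     name NAME appears as an argument of a __builtin_constant_p call.  */
  static bool
  used_by_bcp_p (tree name)
  {
    imm_use_iterator iter;
    gimple *use_stmt;
    bool found = false;

    if (TREE_CODE (name) != SSA_NAME)
      return false;

    /* Walk all statements that use NAME and look for b_c_p calls.  */
    FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
      if (gimple_call_builtin_p (use_stmt, BUILT_IN_CONSTANT_P))
        found = true;

    return found;
  }

The exact shape does not matter for the discussion; the point is that
detecting b_c_p uses of an SSA name is a cheap immediate-use walk.)
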
> > > Disabling threading looks reasonable but I'd rather not disallow BB
> > > duplication or renaming.  For VRP I guess we want to instead change
> > > register_edge_assert_for* to not register assertions for the result
> > > of __builtin_constant_p rather than just not allowing VRP to process
> > > it (there are other consumers still).
> >
> > If I understood Jeff correctly, we should disable incremental updates
> > for absolutely all b_c_p arguments, so affecting as many consumers as
> > possible must actually be a good thing when this approach is
> > considered?
> >
> > That said, regtest has revealed one more place where this is
> > happening: rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi ->
> > create_new_def_for.  The reduced code is:
> >
> > int a;
> >
> > void
> > b (void)
> > {
> >   while (a)
> >     __builtin_constant_p (a) ?: 0;
> >   if (a == 8)
> >     while (1)
> >       ;
> > }
> >
> > I guess we are not allowed to cancel this transformation, because we
> > really need LCSSA for later passes?  This now contradicts the
> > "prohibit all updates" idea..
>
> Yes.  I believe this looks at the issue from a wrong angle.  SSA
> rewrite is just renaming and that's OK.
>
> > If you think that disabling threading is reasonable, could you please
> > have another look at the original patches?
> >
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > v2: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549211.html
> >
> > Can anything like this go in after all?
>
> For v2 I'm not sure why we would allow b_c_p late?  That said, both
> approaches are reasonable - disallow threading or force bcp to
> evaluate to zero (guess that relies on the non-threaded variant being
> known to be _not_ constant, otherwise we create another
> inconsistency?).  So for sure v1 looks "safer" at the possible expense
> of less (early) optimization.  Of course backward threading is not the
> only source of jump threading (but it's the only truly early one).
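
For context, the code affected by all of this is the kernel-style idiom
in which __builtin_constant_p guards a constant-only fast path.  Such
code is only correct if the builtin never returns 1 for a value that is
not actually known to be constant; a conservative 0 merely loses an
optimization.  A simplified, hypothetical sketch (not the actual slab.h
code linked below; my_alloc and the two callees are made up):

  /* Hypothetical sketch of the idiom.  The first branch must only be
     taken when SIZE is proven constant, so folding the builtin to 0 on
     a threaded copy is conservatively safe, while folding it to 1
     there is not.  */
  extern void *alloc_constant_size (unsigned long size);
  extern void *alloc_generic (unsigned long size);

  void *
  my_alloc (unsigned long size)
  {
    if (__builtin_constant_p (size) && size <= 4096)
      return alloc_constant_size (size);
    return alloc_generic (size);
  }
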
According to the documentation, conservatively returning 0 should not
create inconsistencies:

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

  A return of 0 does not indicate that the value is not a constant,
  but merely that GCC cannot prove it is a constant with the specified
  value of the -O option.

v2 was suggested by Jakub as a possible optimization here:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549078.html

However, I don't see any differences between v1 and v2, either in SPEC
or in vmlinux builds on x86_64, so I think the additional complexity is
not worth it.

There are a few differences between master and v1/v2 in x86_64 vmlinux
builds, though.  For example:

https://elixir.bootlin.com/linux/v5.10-rc2/source/kernel/tracepoint.c#L56

Here master performs threading for the overflow case:

https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/overflow.h#L297

where the allocated size ends up being SIZE_MAX.  Threading goes as far
as this b_c_p in kmalloc:

https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/slab.h#L538

and ends up emitting a jo to the kmalloc_large call after the mul that
calculates the size.

I haven't analyzed all the cases, but the asm diff (including context)
is just 2k lines, which isn't a lot, given that the vmlinux objdump is
6m lines.

Best regards,
Ilya