On Fri, 2020-10-30 at 09:22 +0100, Richard Biener wrote:
> On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich <i...@linux.ibm.com> wrote:
> > On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote:
> > > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
> > > <gcc@gcc.gnu.org> wrote:
> > > > Hi,
> > > >
> > > > I'd like to revive the old discussion regarding the interaction of
> > > > jump threading and b_c_p causing the latter to incorrectly return 1
> > > > in certain cases:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
> > > >
> > > > The conclusion was that this happening during threading is just a
> > > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p
> > > > should not be registered for incremental updating.
> > > >
> > > > I performed a little experiment and added an assertion to
> > > > create_new_def_for:
> > > >
> > > > --- a/gcc/tree-into-ssa.c
> > > > +++ b/gcc/tree-into-ssa.c
> > > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple *stmt, def_operand_p def)
> > > >  {
> > > >    tree new_name;
> > > >
> > > > +  gcc_checking_assert (!used_by_bcp_p (old_name));
> > > > +
> > > >    timevar_push (TV_TREE_SSA_INCREMENTAL);
> > > >
> > > >    if (!update_ssa_initialized_fn)
> > > >
> > > > This has of course fired when performing basic block duplication
> > > > during threading, which can be fixed by avoiding duplication of
> > > > basic blocks with b_c_p calls:
> > > >
> > > > --- a/gcc/tree-cfg.c
> > > > +++ b/gcc/tree-cfg.c
> > > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
> > > >               || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> > > >               || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
> > > >               || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
> > > > -             || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
> > > > +             || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)
> > > > +             || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P)))
> > > >         return false;
> > > >      }
> > > >
> > > > The second occurrence is a bit more interesting:
> > > >
> > > > gimple *
> > > > vrp_insert::build_assert_expr_for (tree cond, tree v)
> > > > {
> > > > ...
> > > >   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
> > > >   assertion = gimple_build_assign (NULL_TREE, a);
> > > > ...
> > > >   tree new_def = create_new_def_for (v, assertion, NULL);
> > > >
> > > > The fix is also simple though:
> > > >
> > > > --- a/gcc/tree-vrp.c
> > > > +++ b/gcc/tree-vrp.c
> > > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for (tree name, assert_locus *loc)
> > > >    if (loc->expr == loc->val)
> > > >      return false;
> > > >
> > > > +  if (used_by_bcp_p (name))
> > > > +    return false;
> > > > +
> > > >    cond = build2 (loc->comp_code, boolean_type_node, loc->expr, loc->val);
> > > >    assert_stmt = build_assert_expr_for (cond, name);
> > > >    if (loc->e)
> > > >
> > > > My original testcase did not trigger anything else.  I'm planning to
> > > > check how this affects the testsuite, but before going further I
> > > > would like to ask: is this the right direction now?  To me it looks
> > > > a little-bit more heavy-handed than the original approach, but maybe
> > > > it's worth it.
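
(For reference, used_by_bcp_p itself is not quoted in the hunks above.  A
minimal sketch of what such a predicate could look like - a plain
immediate-use walk over NAME, not necessarily the exact helper from the
patch - would be:

  /* Hypothetical sketch only; assumes the usual middle-end headers
     (tree.h, gimple.h, ssa.h) are available.  Return true if the SSA
     name NAME appears as an argument of a __builtin_constant_p call.  */
  static bool
  used_by_bcp_p (tree name)
  {
    imm_use_iterator iter;
    gimple *use_stmt;
    bool found = false;

    if (TREE_CODE (name) != SSA_NAME)
      return false;

    /* Walk all statements that use NAME and look for b_c_p calls.  */
    FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
      if (gimple_call_builtin_p (use_stmt, BUILT_IN_CONSTANT_P))
        found = true;

    return found;
  }

The exact shape does not matter for the discussion; the point is that
detecting b_c_p uses of an SSA name is a cheap immediate-use walk.)
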
> > > Disabling threading looks reasonable but I'd rather not disallow BB
> > > duplication or renaming.  For VRP I guess we want to instead change
> > > register_edge_assert_for* to not register assertions for the result
> > > of __builtin_constant_p rather than just not allowing VRP to process
> > > it (there are other consumers still).
> >
> > If I understood Jeff correctly, we should disable incremental updates
> > for absolutely all b_c_p arguments, so affecting as many consumers as
> > possible must actually be a good thing when this approach is
> > considered?
> >
> > That said, regtest has revealed one more place where this is
> > happening: rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi ->
> > create_new_def_for.  The reduced code is:
> >
> > int a;
> >
> > void
> > b (void)
> > {
> >   while (a)
> >     __builtin_constant_p (a) ?: 0;
> >   if (a == 8)
> >     while (1)
> >       ;
> > }
> >
> > I guess we are not allowed to cancel this transformation, because we
> > really need LCSSA for later passes?  This now contradicts the
> > "prohibit all updates" idea..
>
> Yes.  I believe this looks at the issue from a wrong angle.  SSA
> rewrite is just renaming and that's OK.
>
> > If you think that disabling threading is reasonable, could you please
> > have another look at the original patches?
> >
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > v2: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549211.html
> >
> > Can anything like this go in after all?
>
> For v2 I'm not sure why we would allow b_c_p late?  That said, both
> approaches are reasonable - disallow threading or force bcp to
> evaluate to zero (guess that relies on the non-threaded variant being
> known to be _not_ constant, otherwise we create another
> inconsistency?).  So for sure v1 looks "safer" at the possible expense
> of less (early) optimization.  Of course backward threading is not the
> only source of jump threading (but it's the only truly early one).
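
For context, the code affected by all of this is the kernel-style idiom
in which __builtin_constant_p guards a constant-only fast path.  Such
code is only correct if the builtin never returns 1 for a value that is
not actually known to be constant; a conservative 0 merely loses an
optimization.  A simplified, hypothetical sketch (not the actual slab.h
code linked below; my_alloc and the two callees are made up):

  /* Hypothetical sketch of the idiom.  The first branch must only be
     taken when SIZE is proven constant, so folding the builtin to 0 on
     a threaded copy is conservatively safe, while folding it to 1
     there is not.  */
  extern void *alloc_constant_size (unsigned long size);
  extern void *alloc_generic (unsigned long size);

  void *
  my_alloc (unsigned long size)
  {
    if (__builtin_constant_p (size) && size <= 4096)
      return alloc_constant_size (size);
    return alloc_generic (size);
  }
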
According to the documentation, conservatively returning 0 should not
create inconsistencies:

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

  A return of 0 does not indicate that the value is not a constant,
  but merely that GCC cannot prove it is a constant with the specified
  value of the -O option.

v2 was suggested by Jakub as a possible optimization here:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549078.html

However, I don't see any differences between v1 and v2, either in SPEC
or in vmlinux builds on x86_64, so I think the additional complexity is
not worth it.

There are a few differences between master and v1/v2 in x86_64 vmlinux
builds, though.  For example:

https://elixir.bootlin.com/linux/v5.10-rc2/source/kernel/tracepoint.c#L56

Here master performs threading for the overflow case:

https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/overflow.h#L297

where the allocated size ends up being SIZE_MAX.  Threading goes as far
as this b_c_p in kmalloc:

https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/slab.h#L538

and ends up emitting a jo to the kmalloc_large call after the mul that
calculates the size.

I haven't analyzed all the cases, but the asm diff (including context)
is just 2k lines, which isn't a lot, given that the vmlinux objdump is
6m lines.

Best regards,
Ilya