On Thu, Apr 28, 2016 at 12:52 PM, Jeff Law <[email protected]> wrote:
> On 04/20/2016 03:02 AM, Richard Biener wrote:
>>
>> On Tue, Apr 19, 2016 at 7:50 PM, Patrick Palka <[email protected]>
>> wrote:
>>>
>>> This patch makes the jump threader look through the BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs within a condition so that we could find dominating
>>> ASSERT_EXPRs that could help make the overall condition evaluate to a
>>> constant. For example, we currently don't perform any jump threading in
>>> the following test case even though it's known that if the code calls
>>> foo() then it can't possibly call bar() afterwards:
>
> I'd always envisioned we'd do more simplifications than we're doing now and
> this fits well within how I expected to exploit ASSERT_EXPRs and DOM's
> available expressions/const/copies tables.
>
> However, I do have some long-term direction plans that may change a bit how
> we do this. In the meantime I don't see a reason not to go forward
> with your change.
>
>>>
>>> void
>>> baz_1 (int a, int b, int c)
>>> {
>>>   if (a && b)
>>>     foo ();
>>>   if (!b && c)
>>>     bar ();
>>> }
>>>
>>> <bb 2>:
>>> _4 = a_3(D) != 0;
>>> _6 = b_5(D) != 0;
>>> _7 = _4 & _6;
>>> if (_7 != 0)
>>> goto <bb 3>;
>>> else
>>> goto <bb 4>;
>>>
>>> <bb 3>:
>>> b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
>>> foo ();
>>>
>>> <bb 4>:
>>> _10 = b_5(D) == 0;
>>> _12 = c_11(D) != 0;
>>> _13 = _10 & _12;
>>> if (_13 != 0)
>>> goto <bb 5>;
>>> else
>>> goto <bb 6>;
>>>
>>> <bb 5>:
>>> bar ();
>>>
>>> <bb 6>:
>>> return;
>>>
>>> So here we miss a jump threading opportunity that would have made bb 3
>>> jump straight to bb 6 instead of falling through to bb 4.
>>>
>>> If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
>>> there is an ASSERT_EXPR that says its left operand b_5 is non-zero. We
>>> could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
>>> always false. This is what this patch does, basically by making
>>> simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs.
>>>
>>> Does this seem like a good idea/approach?
>
> So the other approach I've been pondering for a while is backwards
> substitution.
>
> So given _13 != 0, we expand that to
>
> (_10 & _12) != 0
>
> Which further expands into
> ((b_5 == 0) & (c_11 != 0)) != 0
>
> And we follow b_5 back to the ASSERT_EXPR which allows us to start
> simplifying terms.
>
>
> The glitch in that plan is there is no easy linkage between the use of b_5
> in bb4 and the ASSERT_EXPR in bb3. That's something Aldy, Andrew, and I
> are looking at independently for some of Aldy's work.
I see. One other deficiency I noticed in the existing threading code
is that there may have been multiple ASSERT_EXPRs registered for b_5,
so bb3 could look like:

<bb 3>:
b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
b_16 = ASSERT_EXPR <b_15, b_15 != 1>;
foo ();

but currently we don't consider the second ASSERT_EXPR because we only
look at the immediate uses of b_5. This oversight makes us fail to
thread the following:

void bar (void);
void baz (void);

void
foo (int a)
{
  if (a != 5 && a != 10)
    bar ();
  if (a == 10)
    baz ();
}
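
With the chained ASSERT_EXPRs taken into account, the result we'd want
is, written out by hand, roughly the following (just an illustration of
the desired threading, not actual compiler output; foo_threaded is a
made-up name):

void
foo_threaded (int a)
{
  if (a != 5 && a != 10)
    {
      bar ();
      return;  /* a == 10 cannot hold here, so the second test is dead.  */
    }
  if (a == 10)
    baz ();
}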
>
> But that's all future work... Back to your patch...
>
>>>
>>> Notes:
>>>
>>> 1. This patch introduces a "regression" in
>>> gcc.dg/tree-ssa/ssa-thread-11.c
>>> in that we no longer perform FSM threading during vrp2 but instead we
>>> detect two new jump threading opportunities during vrp1. Not sure if
>>> the new code is better but it is shorter. I wonder how this should be
>>> resolved...
>>
>>
>> Try adjusting the testcase so that it performs the FSM threading again
>> or adjust the expected outcome...
>
> Right. We just need to look closely at the before/after dumps and make a
> decision about whether the result is better or worse. If it's better, then
> we adjust the output to the new, better result (and I would claim that the
> same threading, but done earlier, is better).
>
> Shorter isn't generally a good indicator of whether or not something is
> better. The thing to look at is the number of conditionals executed on the
> various paths through the CFG.
>
> In this specific instance, there's a good chance your analysis is catching
> something earlier and allowing it to be better simplified. But let's do the
> analysis to make sure.
From what I can tell, the patch does cause fewer conditionals to get
executed in general. I spot two extra jumps that are threaded in the
final code compared to without the patch. I wouldn't trust my
analysis though!
By the way, the test case ssa-thread-11.c is somewhat buggy since its
two functions lack return statements. Also I would expect abort() to
have the noreturn attribute.
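
Something along these lines at the top of the test case would address
the noreturn point (just a sketch; I haven't looked at how the test case
actually declares abort):

extern void abort (void) __attribute__ ((noreturn));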
>
>>
>>> 2. I haven't tested the performance impact of this patch. What would be
>>> a good way to do this?
>
> I have ways to do that. Jump threading inherently is about reducing the
> number of conditionals we have to evaluate at runtime. Valgrind can give
> us that information. So what I do is...
>
> Build a control compiler. Use that to build an older version of gcc
> (gcc-4.7.3 in particular). I then use the just-built gcc-4.7.3 to build a
> bunch of .i files under valgrind control.
>
> I then repeat, but the first compiler has whatever patch I want to evaluate
> installed.
>
> I can then compare the output from valgrind to see which has better
> branching behaviour.
That makes sense! I will play around with this technique.
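
For my own reference, I'm guessing the branch counts could be pulled out
with cachegrind's branch simulation, along these lines (a rough sketch of
the idea; cc1-control/cc1-patched and the exact flags are placeholders,
not necessarily how your harness is set up):

# Compile the same preprocessed file with the control and the patched
# compiler, letting cachegrind count conditional branches.
valgrind --tool=cachegrind --branch-sim=yes \
  ./cc1-control -quiet -O2 test.i -o /dev/null 2> control.log
valgrind --tool=cachegrind --branch-sim=yes \
  ./cc1-patched -quiet -O2 test.i -o /dev/null 2> patched.log
# The "Branches:" line in each log gives the conditional branch count.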
>
> It takes a few hours, but it's not terrible and has provided good data
> through the years. I'll throw your patch into the tester and see what it
> spits out.
>
> I'll also have a look at the details of the patch.
Thanks!
>
> jeff