On Wed, May 3, 2017 at 6:32 PM, Jeff Law <[email protected]> wrote:
> [ With the patch attached... ]
>
>
> On 05/03/2017 10:31 AM, Jeff Law wrote:
>>
>> This is the first of 3-5 patches to address pr78496.
>>
>> The goal of these patches is to catch jump threads earlier in the pipeline
>> to avoid undesirable behavior in PRE and more generally be able to exploit
>> the secondary opportunities exposed by jump threading.
>>
>> One of the more serious issues I found while investigating 78496 was VRP
>> failing to find what should have been obvious jump threads. The fundamental
>> issue is that, prior to jump threading, VRP will simplify conditionals
>> which are fed by a typecast. So something like this:
>>
>> x = (typecast) y;
>> if (x == 42)
>>
>> can often be transformed into:
>>
>> if (y == 42)
>>
>>
>> The problem is that any ASSERT_EXPRs after the conditional will reference
>> "x" rather than "y". That in turn makes it impossible for VRP to use those
>> ASSERT_EXPRs to thread later jumps that test x == <whatever>.
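>>
>> Schematically (a minimal sketch; the SSA names are invented for
>> illustration):
>>
>>   x_1 = (typecast) y_2;
>>   if (y_2 == 42)                        ;; rewritten to test y_2
>>     ...
>>   x_3 = ASSERT_EXPR <x_1, x_1 == 42>;   ;; still refers to x_1
>>   ...
>>   if (y_2 == 42)                        ;; later jump, also rewritten
>>
>> The assert records a range for x_1, which tells VRP nothing about the
>> later test of y_2.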
>>
>>
>> More concretely consider this gimple code:
>>
>>
>> ;; basic block 5, loop depth 0, count 0, freq 10000, maybe hot
>> ;; prev block 4, next block 12, flags: (NEW, REACHABLE, VISITED)
>> ;;   pred: 3 [50.0%] (TRUE_VALUE,EXECUTABLE)
>> ;;         4 [100.0%] (FALLTHRU,EXECUTABLE)
>>   # iftmp.0_2 = PHI <1(3), 0(4)>
>>   in_loop_7 = (unsigned char) iftmp.0_2;
>>   if (in_loop_7 != 0)
>>     goto <bb 6>; [33.00%]
>>   else
>>     goto <bb 12>; [67.00%]
>>
>> ;;   succ: 6 [33.0%] (TRUE_VALUE,EXECUTABLE)
>> ;;         12 [67.0%] (FALSE_VALUE,EXECUTABLE)
>>
>> ;; basic block 12, loop depth 0, count 0, freq 6700, maybe hot
>> ;; prev block 5, next block 6, flags: (NEW)
>> ;;   pred: 5 [67.0%] (FALSE_VALUE,EXECUTABLE)
>>   in_loop_15 = ASSERT_EXPR <in_loop_7, in_loop_7 == 0>;
>>   goto <bb 7>; [100.00%]
>> ;;   succ: 7 [100.0%] (FALLTHRU)
>>
>> ;; basic block 6, loop depth 0, count 0, freq 3300, maybe hot
>> ;; prev block 12, next block 7, flags: (NEW, REACHABLE, VISITED)
>> ;;   pred: 5 [33.0%] (TRUE_VALUE,EXECUTABLE)
>>   in_loop_14 = ASSERT_EXPR <in_loop_7, in_loop_7 != 0>;
>>   simple_iv ();
>> ;;   succ: 7 [100.0%] (FALLTHRU,EXECUTABLE)
>>
>> And later we have:
>>
>> ;; basic block 9, loop depth 0, count 0, freq 8476, maybe hot
>> ;; prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
>> ;;   pred: 7 [84.8%] (FALSE_VALUE,EXECUTABLE)
>>   if (in_loop_7 == 0)
>>     goto <bb 10>; [36.64%]
>>   else
>>     goto <bb 11>; [63.36%]
>>
>> VRP knows it can replace the uses of in_loop_7 in the conditionals in
>> blocks 5 and 9 with iftmp.0_2 and happily does so *before* jump threading
>> (but well after ASSERT_EXPR insertion).
>>
>> As a result VRP is unable to utilize the ASSERT_EXPRs in blocks 12 and 6
>> (which reference in_loop_7) to thread the jump at bb9 (which now references
>> iftmp.0_2).
>>
>>
>> The cases in pr78496 are slightly more complex, but boil down to the same
>> core issue -- simplifying the conditional too early.
>>
>> Thankfully this is easy to fix. We just split the conditional
>> simplification into two steps so that the transformation noted above
>> occurs after jump threading, while the other simplifications still occur
>> before jump threading.
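>>
>> In outline (the names match the patch below):
>>
>>   /* All the existing simplifications except the cast walk; still
>>      called from simplify_stmt_using_ranges during propagation.  */
>>   static bool simplify_cond_using_ranges_1 (gcond *stmt);
>>
>>   /* Just the walk back through a type conversion; called once per
>>      conditional from execute_vrp after identify_jump_threads.  */
>>   static void simplify_cond_using_ranges_2 (gcond *stmt);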
>>
>> This allows VRP1 to pick up 27 missed jump threads in the testcase from
>> 78496. It could well be enough to address 78496, but since we don't have a
>> solid description of the desired end result I won't consider 78496 fixed
>> quite yet; there are significant further improvements we can make.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu. Installing on the
>> trunk.
I think this is a hack ;)  Basically the issue is that jump threading
uses ASSERT_EXPRs at all (which are an implementation detail of VRP).
As far as I understand, it does that because VRP can do "fancy" things
and create ASSERT_EXPRs that do not directly map to the conditional
but to its operands' def stmts.

I have meanwhile factored this "fanciness" out into (ok, bad name...)
register_edge_assert_for, which records all these fancy asserts in a
vec.  This is now used from EVRP:

  gimple *stmt = last_stmt (pred_e->src);
  if (stmt
      && gimple_code (stmt) == GIMPLE_COND
      && (op0 = gimple_cond_lhs (stmt))
      && TREE_CODE (op0) == SSA_NAME
      && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
          || POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        {
          fprintf (dump_file, "Visiting controlling predicate ");
          print_gimple_stmt (dump_file, stmt, 0, 0);
        }
      /* Entering a new scope.  Try to see if we can find a VR
         here.  */
      tree op1 = gimple_cond_rhs (stmt);
      if (TREE_OVERFLOW_P (op1))
        op1 = drop_tree_overflow (op1);
      tree_code code = gimple_cond_code (stmt);
      auto_vec<assert_info, 8> asserts;
      register_edge_assert_for (op0, pred_e, code, op0, op1, asserts);
      if (TREE_CODE (op1) == SSA_NAME)
        register_edge_assert_for (op1, pred_e, code, op0, op1, asserts);

Regular VRP transforms those into its own assert representation in
finish_register_edge_assert_for (though assert_info should be enough
for jump threading).
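
For reference, the record that fills in is roughly (paraphrased from
tree-vrp.c; see the tree for the exact definition):

  struct assert_info
  {
    /* Predicate code for the ASSERT_EXPR.  Must be a comparison code.  */
    enum tree_code comp_code;
    /* Name to register the assert for.  */
    tree name;
    /* Value being compared against.  */
    tree val;
    /* Expression to compare.  */
    tree expr;
  };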

So I think it should be possible for jump threading to use this new
machinery (and even DOM-based jump threading or the backwards threader
could make use of this!).
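
A minimal sketch of how a threader could consume that, purely
illustrative -- record_temporary_assert is a made-up hook, only
register_edge_assert_for and assert_info exist today:

  /* While threading through TAKEN_EDGE, collect the assertions its
     controlling GIMPLE_COND implies, without materializing
     ASSERT_EXPRs in the IL.  */
  auto_vec<assert_info, 8> asserts;
  register_edge_assert_for (op0, taken_edge, code, op0, op1, asserts);
  if (TREE_CODE (op1) == SSA_NAME)
    register_edge_assert_for (op1, taken_edge, code, op0, op1, asserts);
  for (unsigned i = 0; i < asserts.length (); ++i)
    /* Hypothetical: expose NAME COMP_CODE VAL to the threader's
       simplifier while it walks past TAKEN_EDGE.  */
    record_temporary_assert (asserts[i].name, asserts[i].comp_code,
                             asserts[i].val);
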
Richard.
>>
>> Jeff
>>
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 92a4e395ba8..32cc81978df 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,14 @@
> +2017-05-03 Jeff Law <[email protected]>
> +
> + PR tree-optimization/78496
> + * tree-vrp.c (simplify_cond_using_ranges_1): Renamed
> + from simplify_cond_using_ranges. Split off code to walk
> + backwards through casts into ...
> + (simplify_cond_using_ranges_2): New function.
> + (simplify_stmt_using_ranges): Call simplify_cond_using_ranges_1.
> + (execute_vrp): After identifying jump threads, call
> + simplify_cond_using_ranges_2.
> +
> 2017-05-03 Jeff Downs <[email protected]>
> Rainer Orth <[email protected]>
>
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 596468d33e9..55a44e4635a 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-05-03 Jeff Law <[email protected]>
> +
> + PR tree-optimization/78496
> + * gcc.dg/tree-ssa/ssa-thread-15.c: New test.
> +
> 2017-05-03 Uros Bizjak <[email protected]>
>
> * g++.dg/lto/pr79671_0.C (foo): Fix asm constraints.
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-15.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-15.c
> new file mode 100644
> index 00000000000..f73268cacbb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-15.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
> +
> +/* We should thread the if (!in_loop) completely leaving
> + just two conditionals. */
> +/* { dg-final { scan-tree-dump-times "if \\(" 2 "vrp1" } } */
> +
> +
> +union tree_node;
> +typedef union tree_node *tree;
> +
> +enum size_type_kind
> +{
> + SIZETYPE,
> + SSIZETYPE,
> + BITSIZETYPE,
> + SBITSIZETYPE,
> + TYPE_KIND_LAST
> +};
> +extern tree size_int_kind (long, enum size_type_kind);
> +
> +
> +
> +typedef struct
> +{
> +
> + tree base, step;
> +
> +} affine_iv;
> +
> +struct loop
> +{
> +
> + int num;
> +};
> +extern unsigned char simple_iv ();
> +
> +unsigned char
> +dr_analyze_innermost (struct loop *loop, tree poffset)
> +{
> + affine_iv offset_iv;
> + unsigned char in_loop = (loop && loop->num);
> +
> +
> + if (in_loop)
> + simple_iv ();
> +
> + if (!in_loop)
> + offset_iv.step = size_int_kind (0, SSIZETYPE);
> +
> +}
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index df5e4267491..6a4035dc924 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -9977,7 +9977,7 @@ range_fits_type_p (value_range *vr, unsigned dest_precision, signop dest_sgn)
> the original conditional. */
>
> static bool
> -simplify_cond_using_ranges (gcond *stmt)
> +simplify_cond_using_ranges_1 (gcond *stmt)
> {
> tree op0 = gimple_cond_lhs (stmt);
> tree op1 = gimple_cond_rhs (stmt);
> @@ -10080,6 +10080,22 @@ simplify_cond_using_ranges (gcond *stmt)
> }
> }
> }
> + return false;
> +}
> +
> +/* STMT is a conditional at the end of a basic block.
> +
> + If the conditional is of the form SSA_NAME op constant and the SSA_NAME
> + was set via a type conversion, try to replace the SSA_NAME with the RHS
> + of the type conversion. Doing so makes the conversion dead which helps
> + subsequent passes. */
> +
> +static void
> +simplify_cond_using_ranges_2 (gcond *stmt)
> +{
> + tree op0 = gimple_cond_lhs (stmt);
> + tree op1 = gimple_cond_rhs (stmt);
> + enum tree_code cond_code = gimple_cond_code (stmt);
>
> /* If we have a comparison of an SSA_NAME (OP0) against a constant,
> see if OP0 was set by a type conversion where the source of
> @@ -10097,7 +10113,7 @@ simplify_cond_using_ranges (gcond *stmt)
>
> if (!is_gimple_assign (def_stmt)
> || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
> - return false;
> + return;
>
> innerop = gimple_assign_rhs1 (def_stmt);
>
> @@ -10142,12 +10158,16 @@ simplify_cond_using_ranges (gcond *stmt)
> tree newconst = fold_convert (TREE_TYPE (innerop), op1);
> gimple_cond_set_lhs (stmt, innerop);
> gimple_cond_set_rhs (stmt, newconst);
> - return true;
> + update_stmt (stmt);
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Folded into: ");
> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> + fprintf (dump_file, "\n");
> + }
> }
> }
> }
> -
> - return false;
> }
>
> /* Simplify a switch statement using the value range of the switch
> @@ -10746,7 +10766,7 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
> }
> }
> else if (gimple_code (stmt) == GIMPLE_COND)
> - return simplify_cond_using_ranges (as_a <gcond *> (stmt));
> + return simplify_cond_using_ranges_1 (as_a <gcond *> (stmt));
> else if (gimple_code (stmt) == GIMPLE_SWITCH)
> return simplify_switch_using_ranges (as_a <gswitch *> (stmt));
> else if (is_gimple_call (stmt)
> @@ -11759,6 +11779,21 @@ execute_vrp (bool warn_array_bounds_p)
> the datastructures built by VRP. */
> identify_jump_threads ();
>
> + /* A comparison of an SSA_NAME against a constant where the SSA_NAME
> + was set by a type conversion can often be rewritten to use the
> + RHS of the type conversion.
> +
> + However, doing so inhibits jump threading through the comparison.
> + So that transformation is not performed until after jump threading
> + is complete. */
> + basic_block bb;
> + FOR_EACH_BB_FN (bb, cfun)
> + {
> + gimple *last = last_stmt (bb);
> + if (last && gimple_code (last) == GIMPLE_COND)
> + simplify_cond_using_ranges_2 (as_a <gcond *> (last));
> + }
> +
> vrp_free_lattice ();
>
> free_numbers_of_iterations_estimates (cfun);
>