Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Richard Biener Wed, 21 Oct 2015 03:42:50 -0700

On Mon, Oct 19, 2015 at 10:17 AM, Alan Hayward <alan.hayw...@arm.com> wrote:
>
>
>>On 30/09/2015 13:45, "Richard Biener" <richard.guent...@gmail.com> wrote:
>>
>>>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayw...@arm.com>
>>>wrote:
>>>>
>>>>
>>>> On 18/09/2015 14:53, "Alan Hayward" <alan.hayw...@arm.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>On 18/09/2015 14:26, "Alan Lawrence" <alan.lawre...@arm.com> wrote:
>>>>>
>>>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>>>
>>>>>>> Ok, I see.
>>>>>>>
>>>>>>> That this case is already vectorized is because it implements
>>>>>>>MAX_EXPR,
>>>>>>> modifying it slightly to
>>>>>>>
>>>>>>> int foo (int *a)
>>>>>>> {
>>>>>>>    int val = 0;
>>>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>>>      if (a[i] > val)
>>>>>>>        val = a[i] + 1;
>>>>>>>    return val;
>>>>>>> }
>>>>>>>
>>>>>>> makes it no longer handled by current code.
>>>>>>>
>>>>>>
>>>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>>>expressions
>>>>>>like
>>>>>>
>>>>>>int foo (int *a)
>>>>>>{
>>>>>>    int val = 0;
>>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>>      if (some_expression (i))
>>>>>>        val = another_expression (i);
>>>>>>    return val;
>>>>>>}
>>>>>
>>>>>Yes, thatâs correct. Hopefully my new test cases should cover
>>>>>everything.
>>>>>
>>>>
>>>> Attached is a new version of the patch containing all the changes
>>>> requested by Richard.
>>>
>>>+      /* Compare the max index vector to the vector of found indexes to
>>>find
>>>+        the postion of the max value.  This will result in either a
>>>single
>>>+        match or all of the values.  */
>>>+      tree vec_compare = make_ssa_name (index_vec_type_signed);
>>>+      gimple vec_compare_stmt = gimple_build_assign (vec_compare,
>>>EQ_EXPR,
>>>+                                                    induction_index,
>>>+                                                    max_index_vec);
>>>
>>>I'm not sure all targets can handle this.  If I deciper the code
>>>correctly then we do
>>>
>>>  mask = induction_index == max_index_vec;
>>>  vec_and = mask & vec_data;
>>>
>>>plus some casts.  So this is basically
>>>
>>>  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>>>
>>>without the need to relate the induction index vector type to the data
>>>vector type.
>>>I believe this is also the form all targets support.
>>
>>
>>Ok, Iâll replace this.
>>
>>>
>>>I am missing a comment before all this code-generation that shows the
>>>transform
>>>result with the variable names used in the code-gen.  I have a hard
>>>time connecting
>>>things here.
>>
>>Ok, Iâll add some comments.
>>
>>>
>>>+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR,
>>>scalar_type,
>>>+                                            matched_data_reduc);
>>>+      epilog_stmt = gimple_build_assign (new_scalar_dest,
>>>+                                        matched_data_reduc_cast);
>>>+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>>>+      gimple_assign_set_lhs (epilog_stmt, new_temp);
>>>
>>>this will leave the stmt unsimplified.  scalar sign-changes should use
>>>NOP_EXPR,
>>>not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
>>>Also just do like before - first make_ssa_name and then directly use it
>>>in the
>>>gimple_build_assign.
>>
>>We need the VIEW_CONVERT_EXPR for the cases where we have float data
>>values. The index is always integer.
>>
>>
>>>
>>>The patch is somewhat hard to parse with all the indentation changes.  A
>>>context
>>>diff would be much easier to read in those contexts.
>>
>>Ok, Iâll make the next patch like that
>>
>>>
>>>+  if (v_reduc_type == COND_REDUCTION)
>>>+    {
>>>+      widest_int ni;
>>>+
>>>+      if (! max_loop_iterations (loop, &ni))
>>>+       {
>>>+         if (dump_enabled_p ())
>>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>>+                            "loop count not known, cannot create cond "
>>>+                            "reduction.\n");
>>>
>>>ugh.  That's bad.
>>>
>>>+      /* The additional index will be the same type as the condition.
>>>Check
>>>+        that the loop can fit into this less one (because we'll use up
>>>the
>>>+        zero slot for when there are no matches).  */
>>>+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>>>+      if (wi::geu_p (ni, wi::to_widest (max_index)))
>>>+       {
>>>+         if (dump_enabled_p ())
>>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>>+                            "loop size is greater than data size.\n");
>>>+         return false;
>>>
>>>Likewise.
>>
>>We could do better if we made the index type larger.
>>But as a first implementation of this optimisation, I didnât want to
>>overcomplicate things more.
>>
>>>
>>>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>>>gimple_stmt_iterator *gsi,
>>>   if (dump_enabled_p ())
>>>     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>>>
>>>+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>>>+
>>>   /* FORNOW: Multiple types are not supported for condition.  */
>>>   if (code == COND_EXPR)
>>>
>>>this change looks odd (or wrong).  The type should be _only_ set/changed
>>>during
>>>analysis.
>>
>>
>>The problem is, for COND_EXPRs, this function calls
>>vectorizable_condition(), which sets STMT_VINFO_TYPE to
>>condition_vec_info_type.


Ah, the pre-existing issue of the transform phase re-doing the analysis...
a fix would be to condition that call on vec_stmt == NULL, thus analysis
phase.

>>Therefore we need something to restore it back to reduc_vec_info_type on
>>the non-analysis call.
>>
>>I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
>>the call to vectorizable_condition(), but that looked worse.
>>I could back up the value of STMT_VINFO_TYPE before calling
>>vectorizable_condition() and then restore it after? I think thatâll look a
>>lot better.
>>
>>
>>>
>>>+
>>>+      /* For cond reductions we need to add an additional conditional
>>>based on
>>>+        the loop index.  */
>>>+      if (v_reduc_type == COND_REDUCTION)
>>>+       {
>>>+         int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>>>+         int k;
>>>...
>>>+         STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
>>>+         set_vinfo_for_stmt (index_condition, index_vec_info);
>>>+
>>>+         /* Update the phi with the vec cond.  */
>>>+         add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>>>+                      UNKNOWN_LOCATION);
>>>
>>>same as before - I am missing a comment that shows the generated code
>>>and connects
>>>the local vars used.
>>
>>Ok, Iâll add something
>>
>>>
>>>
>>>+         tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
>>>+         gimple ccompare_stmt = gimple_build_assign (ccompare_name,
>>>ccompare);
>>>+         gsi_insert_before (&vec_stmt_gsi, ccompare_stmt,
>>>GSI_SAME_STMT);
>>>+         gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>>>
>>>hum - are you sure this works with ncopies > 1?  Will it use the
>>>correct vec_stmt?
>>
>>We donât support this when ncopies >1.
>>
>>In vectorizable_reduction():
>>
>>if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
>>    {
>>      if (dump_enabled_p ())
>>       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                        "multiple types in double reduction or condition "
>>                        "reduction.\n");
>>      return false;
>>    }
>>
>>
>>
>>>
>>>I still dislike the v_reduc_type plastered and passed everywhere.  Can
>>>you explore
>>>adding the reduction kind to stmt_info?
>>
>>Ok, I can do that.
>>
>>
>>Thanks for the comments.
>>Iâll put together a patch with the above changes.
>>
>>Thanks,
>>Alan.
>>
>>
>
> Richard, as requested I've updated with the follow changes:
>
> * AND and EQ replaced with a COND_EXPR
> * Better comments for the code gen, included references to variable names
> * Kept the VIEW_CONVERT_EXPR - we need this for when the data is float type
> * Backed up STMT_VINFO_TYPE before the call to vectorizable_condition()
> and restored after the call. I considered extracting the relvant parts of
> vectorizable_condition() into a sub function, but it had too many
> dependencies with the rest of vectorizable_condition().
> * v_reduc_type is now part of stmt_info
> * Created a diff using 50 lines of context

Ok with the vectorizable_condition call guarded as suggested instead.

Thanks,
Richard.

>
> Thanks,
> Alan.
>
>

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Reply via email to