On Thu, Sep 10, 2015 at 4:51 PM, Alan Hayward <[email protected]> wrote:
> Hi,
> This patch (attached) adds support for vectorizing conditional expressions
> (PR 65947), for example:
>
> int condition_reduction (int *a, int min_v)
> {
> int last = 0;
> for (int i = 0; i < N; i++)
> if (a[i] < min_v)
> last = a[i];
> return last;
> }
>
> To do this the loop is vectorised to create a vector of data results (ie
> of matching a[i] values). Using an induction variable, an additional
> vector is added containing the indexes where the matches occured. In the
> function epilogue this is reduced to a single max value and then used to
> index into the vector of data results.
> When no values are matched in the loop, the indexes vector will contain
> all zeroes, eventually matching the first entry in the data results vector.
>
> To vectorize sucessfully, support is required for REDUC_MAX_EXPR. This is
> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
> REDUC_MAX_EXPR is not supported for the required modes, failing the
> vectorization. On mips it complains that the required vcond expression is
> not supported. It is suggested the relevant backend experts add the
> required backend support.
>
> Using a simple testcase based around a large number of N and run on an
> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
> it's original time.
>
> This patch caused binary differences in three spec2006 binaries on aarch64
> - 4.16.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> showed no improvement or degregation in runtime.
>
>
> In the near future I hope to submit a further patch (as PR 66558) which
> optimises the case where the result is simply the index of the loop, for
> example:
> int condition_reduction (int *a, int min_v)
> {
> int last = 0;
> for (int i = 0; i < N; i++)
> if (a[i] < min_v)
> last = i;
> return last;
> }
> In this case a lot of the new code can be optimized away.
>
> I have run check for aarch64, arm and x86 and have seen no regressions.
Now comments on the patch itself.
+ if (code == COND_EXPR)
+ *v_reduc_type = COND_REDUCTION;
so why not simply use COND_EXPR instead of the new v_reduc_type?
+ if (check_reduction && code != COND_EXPR &&
+ vect_is_slp_reduction (loop_info, phi, def_stmt))
&&s go to the next line
+ /* Reduction of the max index and a reduction of the found
+ values. */
+ epilogue_cost += add_stmt_cost (target_cost_data, 1,
+ vec_to_scalar, stmt_info, 0,
+ vect_epilogue);
vec_to_scalar once isn't what the comment suggests. Instead the
comment suggests twice what a regular reduction would do
but I guess we can "hide" the vec_to_scalar cost and "merge" it
with the broadcast. Thus make the above two vector_stmt costs?
+ /* A broadcast of the max value. */
+ epilogue_cost += add_stmt_cost (target_cost_data, 2,
+ scalar_to_vec, stmt_info, 0,
+ vect_epilogue);
comment suggests a single broadcast.
@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
the final vector of induction results: */
exit_phi = NULL;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
- {
+ {
gimple use_stmt = USE_STMT (use_p);
if (is_gimple_debug (use_stmt))
continue;
please avoid unrelated whitespace changes.
+ case COND_EXPR:
+ if (v_reduc_type == COND_REDUCTION)
+ {
...
+ /* Fall through. */
+
case MIN_EXPR:
case MAX_EXPR:
- case COND_EXPR:
aww, so we could already handle COND_EXPR reductions? How do they
differ from what you add? Did you see if that path is even exercised today?
+ /* Create a vector of {init_value, 0, 0, 0...}. */
+ vec<constructor_elt, va_gc> *v;
+ vec_alloc (v, nunits);
+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
+ if (SCALAR_FLOAT_TYPE_P (scalar_type))
+ for (i = 1; i < nunits; ++i)
+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+ build_real (scalar_type, dconst0));
+ else
+ for (i = 1; i < nunits; ++i)
+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+ build_int_cst (scalar_type, 0));
+ init_def = build_constructor (vectype, v);
you can unify the float/int case by using build_zero_cst (scalar_type).
Note that you should build a vector constant instead of a constructor
if init_val is a constant. The convenient way is to build the vector
elements into a tree[] array and use build_vector_stat in that case.
+ /* Find maximum value from the vector of found indexes. */
+ tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
just use make_ssa_name (index_scalar_type);
+ /* Convert the vector of data to the same type as the EQ. */
+ tree vec_data_cast;
+ if ( TYPE_UNSIGNED (index_vec_type))
+ {
How come it never happens the element
sizes do not match? (int index type and double data type?)
+ /* Where the max index occured, use the value from the data vector. */
+ tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, "");
+ gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
+ vec_compare, vec_data_cast);
that is, don't you need to do some widening/shortening on the comparison result?
(also what happens in the case "or all of the values"?)
Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)
+ This function also handles reduction of condition expressions, for example:
+ for (int i = 0; i < N; i++)
+ if (a[i] < value)
+ last = a[i];
+ This is handled by vectorising the loop and creating an additional vector
+ containing the loop indexes for which "a[i] < value" was true. In the
+ function epilogue this is reduced to a single max value and then used to
+ index into the vector of results.
I miss a comment that shows the kind of code we transform this to.
"an additional vector containing the loop indexes" can't work - the vector
will not be large enough ;) Naiively I would have made 'last' a vector,
performing the reduction element-wise and in the epilogue reduce
'last' itself. And it looks like we are already doing that for
int foo (int *a)
{
int val = 0;
for (int i = 0; i < 1024; ++i)
if (a[i] > val)
val = a[i];
return val;
}
I must be missing something. Yes, I think we can't do index reduction yet,
but pr65947-10.c is alrady handled?
Stopping here.
Thanks,
Richard.
>
> Changelog:
>
> 2015-08-28 Alan Hayward <[email protected]>
>
> PR tree-optimization/65947
> * tree-vect-loop.c
> (vect_is_simple_reduction_1): Find condition reductions.
> (vect_model_reduction_cost): Add condition reduction costs.
> (get_initial_def_for_reduction): Add condition reduction initial
> var.
> (vect_create_epilog_for_reduction): Add condition reduction epilog.
> (vectorizable_reduction): Condition reduction support.
> * tree-vect-stmts.c
> (vectorizable_condition): Add vect reduction arg
> * doc/sourcebuild.texi (Vector-specific attributes): Document
> vect_max_reduc
>
> testsuite/Changelog:
>
> PR tree-optimization/65947
> * lib/target-supports.exp
> (check_effective_target_vect_max_reduc): Add.
> * gcc.dg/vect/pr65947-1.c: New test.
> * gcc.dg/vect/pr65947-2.c: New test.
> * gcc.dg/vect/pr65947-3.c: New test.
> * gcc.dg/vect/pr65947-4.c: New test.
> * gcc.dg/vect/pr65947-5.c: New test.
> * gcc.dg/vect/pr65947-6.c: New test.
> * gcc.dg/vect/pr65947-7.c: New test.
> * gcc.dg/vect/pr65947-8.c: New test.
> * gcc.dg/vect/pr65947-9.c: New test.
> * gcc.dg/vect/pr65947-10.c: New test.
> * gcc.dg/vect/pr65947-11.c: New test.
>
>
>
> Thanks,
> Alan
>
>