[PATCH] Fix result for conditional reductions matching at index 0
Hi, When translating conditional reductions based on integer induction, the compiler uses the value zero to indicate the absence of any matches: if the index of the last match is still zero at the end of the loop, the default value is returned. The problem with this approach is that this default value is returned not only when there were no matches at all, but also when the last match occurred at index 0. This causes the test gcc.dg/vect/pr65947-14.c to fail. This patch corrects this by reusing the vector of indices used for COND_REDUCTION, which starts at 1. If the 1-based index of the last match is non-zero, 1 is subtracted from it, otherwise the initial value is returned. I tested this patch on x86_64-pc-linux-gnu (both with SSE and AVX2, causing both paths through the reduc_code != ERROR_MARK branch being taken). 2017-11-21 Kilian Verhetsel * tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the returned value for INTEGER_INDUC_COND_REDUCTION whose last match occurred at index 0. (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets the induction variable to the code generating the epilogue. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 254913) +++ gcc/tree-vect-loop.c (working copy) @@ -4316,7 +4316,7 @@ get_initial_defs_for_reduction (slp_tree slp_node, static void vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt, - gimple *reduc_def_stmt, + gimple *reduc_def_stmt, gimple *induct_stmt, int ncopies, enum tree_code reduc_code, vec reduction_phis, bool double_reduc, @@ -4477,7 +4477,9 @@ vect_create_epilog_for_reduction (vec vect_d The first match will be a 1 to allow 0 to be used for non-matching indexes. If there are no matches at all then the vector will be all zeroes. */ - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + == INTEGER_INDUC_COND_REDUCTION) { tree indx_before_incr, indx_after_incr; int nunits_out = TYPE_VECTOR_SUBPARTS (vectype); @@ -4754,7 +4756,9 @@ vect_create_epilog_for_reduction (vec vect_d else new_phi_result = PHI_RESULT (new_phis[0]); - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + == INTEGER_INDUC_COND_REDUCTION) && reduc_code != ERROR_MARK) { /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing @@ -4797,76 +4801,118 @@ vect_create_epilog_for_reduction (vec vect_d induction_index); gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT); - /* Vector of {max_index, max_index, max_index,...}. */ - tree max_index_vec = make_ssa_name (index_vec_type); - tree max_index_vec_rhs = build_vector_from_val (index_vec_type, - max_index); - gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec, - max_index_vec_rhs); - gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT); + if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + { + /* Vector of {max_index, max_index, max_index,...}. */ + tree max_index_vec = make_ssa_name (index_vec_type); + tree max_index_vec_rhs = build_vector_from_val (index_vec_type, + max_index); + gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec, + max_index_vec_rhs); + gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT); - /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes - with the vector (INDUCTION_INDEX) of found indexes, choosing values - from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC) - otherwise. Only one value should match, resulting in a vector - (VEC_COND) with one data value and the rest zeros. - In the case where the loop never made any matches, every index will - match, resulting in a vector with all data values (which will all be - the default value). */ + /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes + with the vector (INDUCTION_INDEX) of found indexes, choosing values + from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC) + otherwise. Only one value should match, resulting in a vector + (VEC_COND) with one data value and the rest zeros. In the case + where the loop never made any matches, every index will match, + resulting in a vector with all data values (which will all be the + default value). */ - /* Compare the max index vector to the vector of found indexes to find - the posit
Re: [PATCH] Fix result for conditional reductions matching at index 0
> This is PR81179 I think, please mention that in the changelog. Correct, my bad for missing that. > This unconditionally pessimizes code even if there is no valid index > zero, right? Almost, since for a loop such as: #define OFFSET 1 unsigned int find(const unsigned int *a, unsigned int v) { unsigned int result = 120; for (unsigned int i = OFFSET; i < 32+OFFSET; i++) { if (a[i-OFFSET] == v) result = i; } return result; } the index i will match the contents of the index vector used here --- but this does indeed pessimize the code generated for, say, OFFSET = 2. It is probably more sensible to use the existing code path in those situations. > The issue with the COND_REDUCITION index vector is overflow IIRC. Does that mean such overflows can already manifest themselves for regular COND_REDUCTION? I had assumed sufficient checks were already in place because of the presence of the is_nonwrapping_integer_induction test.
Re: [PATCH] Fix result for conditional reductions matching at index 0
Thank you both for your comments. I have added the check to ensure the index vector won't cause an overflow. I also added tests to the testsuite in order to check that the loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX iterations. I was not able to write code that triggers INTEGER_INDUC_COND_REDUCTION when using char or other smaller types (changing the types of last, min_v and a to something else causes COND_REDUCTION to be used instead), so these tests are only compiled and not executed. I also moved an instruction that generates a vector of zeroes (used for COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this should remove the unused vector that Alan noticed. 2017-11-21 Kilian Verhetsel gcc/ChangeLog: PR testsuite/81179 * tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the returned value for INTEGER_INDUC_COND_REDUCTION whose last match occurred at index 0. (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets the induction variable to the code generating the epilogue and check that no overflow will occur. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for overflows when compiling conditional reductions. * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise. Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (working copy) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N UINT_MAX + +/* Condition reduction with maximum possible loop size. Will fail to vectorize + because values in the index vector will overflow. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (working copy) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N (UINT_MAX - 1) + +/* Condition reduction with maximum possible loop size, minus one. This should + still be vectorized correctly. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 254996) +++ gcc/tree-vect-loop.c (working copy) @@ -4316,7 +4316,7 @@ get_initial_defs_for_reduction (slp_tree slp_node, static void vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt, - gimple *reduc_def_stmt, + gimple *reduc_def_stmt, gimple *induct_stmt, int ncopies, enum tree_code reduc_code, vec reduction_phis, bool double_reduc, @@ -4477,7 +4477,9 @@ vect_create_epilog_for_reduction (vec vect_d The first match will be a 1 to allow 0 to be used for non-matching indexes. If there are no matches at all then the vector will be all zeroes. */ - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + == INTEGER_INDUC_COND_REDUCTION) { tree indx_before_incr, indx_after_incr; int nunits_out = TYPE_VECTOR_SUBPARTS (vectype); @@ -4754,7 +4756,9 @@ vect_create_epilog_for_reduction (vec vect_d else new_phi_result = PHI_RESULT (new_phis[0]); - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_R
Re: [PATCH] Fix result for conditional reductions matching at index 0
Hello, Jakub Jelinek writes: > As it doesn't apply, I can't easily check what the patch generates > on the PR80631 testcases vs. my thoughts on that; though, if it emits > something more complicated for the simple cases, perhaps we could improve > those (not handle it like COND_REDUCTION, but instead as former > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0 > if 0 isn't usable for the condition never matched value. While you could use values different from 0, I'm not sure this can be done as efficiently. 0 is convenient because a single bitwise-and between the index vector and the condition will set lanes that do not contain a match to 0. Jakub Jelinek writes: > First of all, I fail to see why we don't handle negative step, > that can be done with REDUC_MIN instead of REDUC_MAX. That would not work without first using values different from 0 to indicate the absence of matches (except in cases where all indices are negative), which I assume is why the test was there in the first place. Below is the patch with fixed formatting and changes to make it apply cleanly to r255537 (as far as I can tell this was simply caused by some variables and constants having been renamed). 2017-11-21 Kilian Verhetsel gcc/ChangeLog: PR testsuite/81179 * tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the returned value for INTEGER_INDUC_COND_REDUCTION whose last match occurred at index 0. (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets the induction variable to the code generating the epilogue and check that no overflow will occur. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for overflows when compiling conditional reductions. * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise. Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (working copy) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N UINT_MAX + +/* Condition reduction with maximum possible loop size. Will fail to vectorize + because values in the index vector will overflow. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (working copy) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N (UINT_MAX - 1) + +/* Condition reduction with maximum possible loop size, minus one. This should + still be vectorized correctly. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 255537) +++ gcc/tree-vect-loop.c (working copy) @@ -4325,7 +4325,7 @@ get_initial_defs_for_reduction (slp_tree slp_node, static void vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt, - gimple *reduc_def_stmt, + gimple *reduc_def_stmt, gimple *induct_stmt, int ncopies, internal_fn reduc_fn, vec reduction_phis, bool double_reduc, @@ -4486,7 +4486,9 @@ vect_create_epilog_for_reduction (vec vect_d The first match will be a 1 to allow 0 to be used for non-matching indexes. If there are no matches at all then the vector will be all zeroes. */ - if
Re: [PATCH] Fix result for conditional reductions matching at index 0
Jakub Jelinek writes: > Of course it can be done efficiently, what we care most is that the body of > the vectorized loop is efficient. That's fair, I was looking at the x86 assembly being generated when a single vectorized iteration was enough (because that is the context in which I first encountered this bug): int f(unsigned int *x, unsigned int k) { unsigned int result = 8; for (unsigned int i = 0; i < 8; i++) { if (x[i] == k) result = i; } return result; } where the vpand instruction this generates would have to be replaced with a variable blend if the default value weren't 0 — although I had not realized even SSE4.1 on x86 includes such an instruction, making this point less relevant. > Thanks, it applies cleanly now > > + else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION > > + || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > > + == INTEGER_INDUC_COND_REDUCTION)) > > + && reduc_fn == IFN_LAST)» > > contains a character at the end of line that makes it not to compile. My bad, I must have added this when I opened the patch file itself to inspect it... > Another thing is, as your patch is quite large, we need a copyright > assignment for the changes before we can accept it, see > https://gcc.gnu.org/contribute.html for details. > > If you are already covered by an assignment of some company, please tell > us which one it is, otherwise contact us and we'll get you the needed > forms. I am not covered by any copyright assignment yet. Do I need to send you any additional information?