https://gcc.gnu.org/g:8678fc697046fba1014f1db6321ee670538b0881
commit 8678fc697046fba1014f1db6321ee670538b0881
Author: Thomas Schwinge <tschwi...@baylibre.com>
Date:   Wed Jul 3 12:20:17 2024 +0200

    Revert "[og10] vect: Add target hook to prefer gather/scatter instructions"
    
    Testing current OG14 commit 735bbbfc6eaf58522c3ebb0946b66f33958ea134 for
    '--target=amdgcn-amdhsa' (I've tested '-march=gfx908', '-march=gfx1100'),
    this change has been identified to be causing ~100 instances of execution
    test PASS -> FAIL, thus wrong-code generation.
    
    It's possible that we've had the same misbehavior also on OG13 and earlier,
    but just nobody ever tested that.  And/or, that at some point in time, the
    original patch fell out of sync, wasn't updated for relevant upstream
    vectorizer changes.
    
    Until someone gets to analyze that (and upstream these changes here), we
    shall revert this commit on OG14.
    
    	gcc/
    	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Remove
    	documentation hook.
    	* doc/tm.texi: Regenerate.
    	* target.def (prefer_gather_scatter): Remove target hook under
    	vectorizer.
    	* tree-vect-stmts.cc (get_group_load_store_type): Remove code to
    	optionally prefer gather/scatter instructions to
    	scalar/elementwise fallback.
    	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER):
    	Remove hook definition.
    
    This reverts OG14 commit 4abc54b6d6c3129cf4233e49231b1255b236c2be.

Diff:
---
 gcc/ChangeLog.omp      | 13 +++++++++++++
 gcc/config/gcn/gcn.cc  |  2 --
 gcc/doc/tm.texi        |  5 -----
 gcc/doc/tm.texi.in     |  2 --
 gcc/target.def         |  8 --------
 gcc/tree-vect-stmts.cc |  9 ++-------
 6 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index ac4a30e81c8c..3dd5bd03dc99 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,16 @@
+2024-07-03  Thomas Schwinge  <tschwi...@baylibre.com>
+
+	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Remove
+	documentation hook.
+	* doc/tm.texi: Regenerate.
+	* target.def (prefer_gather_scatter): Remove target hook under
+	vectorizer.
+	* tree-vect-stmts.cc (get_group_load_store_type): Remove code to
+	optionally prefer gather/scatter instructions to
+	scalar/elementwise fallback.
+	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER):
+	Remove hook definition.
+
 2024-05-19  Roger Sayle  <ro...@nextmovesoftware.com>
 
 	* config/nvptx/nvptx.md (popcount<mode>2): Split into...
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index a247eecd8e8a..d6531f55190c 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -8059,8 +8059,6 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
-#undef TARGET_VECTORIZE_PREFER_GATHER_SCATTER
-#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 struct gcc_target targetm = TARGET_INITIALIZER;
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index e64c7541f605..c8b8b126b242 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6482,11 +6482,6 @@
 The default is @code{NULL_TREE} which means to not vectorize scatter stores.
 @end deftypefn
 
-@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
-This hook is set to TRUE if gather loads or scatter stores are cheaper on
-this target than a sequence of elementwise loads or stores.
-@end deftypevr
-
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 645950b12d78..658e1e63371e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4309,8 +4309,6 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
-@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
-
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index e4b26a7df3ee..fdad7bbc93e2 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2044,14 +2044,6 @@ all zeros.  GCC can then try to branch around the instruction instead.",
 (unsigned ifn),
 default_empty_mask_is_expensive)
 
-/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
-we cannot use a contiguous access.  */
-DEFHOOKPOD
-(prefer_gather_scatter,
- "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
-this target than a sequence of elementwise loads or stores.",
- bool, false)
-
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a7e33120edaf..f8d8636b139a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2217,14 +2217,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	 it probably isn't a win to use separate strided accesses based
 	 on nearby locations.  Or, even if it's a win over scalar code,
 	 it might not be a win over vectorizing at a lower VF, if that
-	 allows us to use contiguous accesses.
-
-	 On some targets (e.g. AMD GCN), always use gather/scatter accesses
-	 here since those are the only types of vector loads/stores available,
-	 and the fallback case of using elementwise accesses is very
-	 inefficient.  */
+	 allows us to use contiguous accesses.  */
       if (*memory_access_type == VMAT_ELEMENTWISE
-	  && (targetm.vectorize.prefer_gather_scatter || single_element_p)
+	  && single_element_p
	  && loop_vinfo
	  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
						 masked_p, gs_info))