https://gcc.gnu.org/g:4abc54b6d6c3129cf4233e49231b1255b236c2be
commit 4abc54b6d6c3129cf4233e49231b1255b236c2be
Author: Julian Brown <jul...@codesourcery.com>
Date:   Wed Nov 25 09:08:01 2020 -0800

    [og10] vect: Add target hook to prefer gather/scatter instructions

    For AMD GCN, the instructions available for loading/storing vectors
    are always scatter/gather operations (i.e. there are separate
    addresses for each vector lane), so the heuristic in
    get_group_load_store_type that avoids gather/scatter operations with
    too many elements is counterproductive on this target.  Avoiding such
    operations there can subsequently lead to a missed vectorization
    opportunity: later analyses in the vectorizer try to use a very wide
    array type which is not available on this target, and vectorization
    then bails out.

    This patch adds a target hook to override the "single_element_p"
    heuristic in that function, and enables the hook for GCN.  This
    allows much better code to be generated for affected loops.

    2021-01-13  Julian Brown  <jul...@codesourcery.com>

    gcc/
    	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
    	documentation hook.
    	* doc/tm.texi: Regenerate.
    	* target.def (prefer_gather_scatter): Add target hook under
    	vectorizer.
    	* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
    	gather/scatter instructions to scalar/elementwise fallback.
    	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
    	hook.

Diff:
---
 gcc/ChangeLog.omp      | 11 +++++++++++
 gcc/config/gcn/gcn.cc  |  2 ++
 gcc/doc/tm.texi        |  5 +++++
 gcc/doc/tm.texi.in     |  2 ++
 gcc/target.def         |  8 ++++++++
 gcc/tree-vect-stmts.cc |  9 +++++++--
 6 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 06ee9d83b27..e8ff6483444 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,14 @@
+2021-01-13  Julian Brown  <jul...@codesourcery.com>
+
+	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
+	documentation hook.
+	* doc/tm.texi: Regenerate.
+	* target.def (prefer_gather_scatter): Add target hook under vectorizer.
+	* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
+	gather/scatter instructions to scalar/elementwise fallback.
+	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
+	hook.
+
 2021-01-13  Julian Brown  <jul...@codesourcery.com>
 
 	* omp-offload.cc (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d6531f55190..a247eecd8e8 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -8059,6 +8059,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
+#undef TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 struct gcc_target targetm = TARGET_INITIALIZER;

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8b8b126b24..e64c7541f60 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6482,6 +6482,11 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+This hook is set to TRUE if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypevr
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also

diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 658e1e63371..645950b12d7 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4309,6 +4309,8 @@ address;  but often a machine-dependent
 strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST

diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e..e4b26a7df3e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2044,6 +2044,14 @@ all zeros.  GCC can then try to branch around the instruction instead.",
 (unsigned ifn),
 default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
+we cannot use a contiguous access.  */
+DEFHOOKPOD
+(prefer_gather_scatter,
+ "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool, false)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f8d8636b139..a7e33120eda 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2217,9 +2217,14 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	 it probably isn't a win to use separate strided accesses based
 	 on nearby locations.  Or, even if it's a win over scalar code,
 	 it might not be a win over vectorizing at a lower VF, if that
-	 allows us to use contiguous accesses.  */
+	 allows us to use contiguous accesses.
+
+	 On some targets (e.g. AMD GCN), always use gather/scatter accesses
+	 here since those are the only types of vector loads/stores available,
+	 and the fallback case of using elementwise accesses is very
+	 inefficient.  */
       if (*memory_access_type == VMAT_ELEMENTWISE
-	  && single_element_p
+	  && (targetm.vectorize.prefer_gather_scatter || single_element_p)
 	  && loop_vinfo
 	  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
						 masked_p, gs_info))
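[Editorial illustration, not part of the commit.]  A minimal sketch of the
kind of access pattern the hook is aimed at; the type, function name, and
stride below are invented for the example.  Each iteration loads a group of
two members (p[i * 2].x and p[i * 2].y), but consecutive groups are not
adjacent in memory, so the accesses cannot form a contiguous vector load and
the vectorizer may fall back to VMAT_ELEMENTWISE (whether it actually does
depends on the target and the surrounding analysis):

/* Hypothetical testcase sketch: a strided, grouped access with no
   contiguous vector form.  On a target that defines
   TARGET_VECTORIZE_PREFER_GATHER_SCATTER to true, the vectorizer may
   emit gather loads here instead of one scalar load per vector lane.  */

typedef struct { float x, y; } point;

void
sum_xy (float *restrict out, const point *restrict p, int n)
{
  for (int i = 0; i < n; i++)
    /* Two grouped loads per iteration, with a stride of two 'point's
       between groups: gather-friendly, but not contiguous.  */
    out[i] = p[i * 2].x + p[i * 2].y;
}

With the hook enabled, get_group_load_store_type consults
vect_use_strided_gather_scatters_p for such loops even when single_element_p
does not hold, so a target like GCN, where every vector load is in effect a
gather (one address per lane), can avoid the elementwise fallback that the
new comment in tree-vect-stmts.cc describes as very inefficient.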