Do not mark vector element broadcast resulting from OpenCL operations as having side effects for targets that have a suitable hardware operation, so that the `vec_duplicateM' standard RTL pattern can be directly used for them. This does not happen currently, because any RTL pattern named `vec_duplicateM' cannot express the side effects presumed, and therefore a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly prevents any `vec_duplicateM' pattern from being used.
This side-effect annotation was introduced for the C frontend with: <https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01075.html>, and then: <https://gcc.gnu.org/ml/gcc-patches/2011-08/msg00891.html> ("Scalar vector binary operation"), and finally landed with commit 0e3a99ae913c ("c-typeck.c (scalar_to_vector): New function."), along with original support for scalar-to-vector OpenCL operations. Support for these operations was then gradually added to C++ with: <https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html>, and then: <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00778.html> ("[C++] Mixed scalar-vector operations"), landing with commit a212e43fca22 ("re PR c++/54427 (Expose more vector extensions)"), followed by: <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01665.html>, and then: <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02253.html> ("[C++] Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR c++/54427 (Expose more vector extensions)"). And side-effect annotation for C++ was retrofitted, again gradually, with: <https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00254.html> ("[C++] Missing save_expr in vector-scalar ops"), landing with commit 6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before build_vector_from_val."), and then: <https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00233.html> ("vector conditional expression with scalar arguments"), landing with commit 07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with 1 vector and 2 scalars.") This was all long before we gained support for the `vec_duplicateM' standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR and associated optab") and it may have had sense for where the element broadcast operation was always open-coded. However where provided by hardware the element broadcast operation is just like any other vector operation, presumably cheap, and whether to repeat it or not should be based on the cost of the operation and not hardwired. So there is no point in artificially marking the operation as having a side effect for the lone purpose of preventing repetition. This does not currently prevent targets like the Aarch64 and MIPS ones from using the hardware element broadcast operation, but it requires the backend to wire it explicitly via the `vec_initMN' standard RTL pattern rather than relying on the middle end to pick it up via the `vec_duplicateM' pattern automatically. This change lifts this limitation. gcc/ * c/c-typeck.cc (build_binary_op) Do not annotate vector element broadcast operations as having a side effect if there is such an operation provided by the backend. * cp/call.cc (build_conditional_expr): Likewise. * cp/typeck.cc (cp_build_binary_op): Likewise. --- Hi, I've come across this while working on an out-of-tree microarchitecture, which does not require a `vec_initMN' RTL pattern, however has hardware support suitable for cheap `vec_duplicateM'. I could not persuade the middle end to make use of the new `vec_duplicateM' pattern however until I removed the side-effect anbnotation. So I think this is a move in the right direction even if we do not have support for said microarchitecture at the moment. For some reason I could not persuade our current Aarch64 code to use the `vec_duplicateM' pattern directly either. This is because at the time `store_constructor' is called the vector mode for the element broadcast operation is say `V4SFmode' (for a vector of floats), however there is no corresponding `vec_duplicatev4sf' pattern. Code in the `vec_initv4sfsf' pattern does the right thing though, referring to `aarch64_simd_dupv4sf' by hand instead. There is a `vec_duplicatevnx4sf' pattern corresponding to `VNx4SFmode', but I was unable to figure out how to activate it. This change has passed regression-testing with the `aarch64-linux-gnu' target (with `-march=armv9-a') and QEMU in the user emulation mode across all the GCC languages and libraries. NB there are indentation issues with code surrounding places I have modified, but I have chosen not to address them so as not to obfuscate this change. Likewise there are unnecessary compound statements used in the switch statements in the C++ frontend parts modified. Any questions or comments? Otherwise OK once GCC 13 has opened? Maciej --- gcc/c/c-typeck.cc | 9 +++++++-- gcc/cp/call.cc | 19 +++++++++++++++---- gcc/cp/typeck.cc | 9 +++++++-- 3 files changed, 29 insertions(+), 8 deletions(-) gcc-scalar-to-vector-no-side-effect.diff Index: gcc/gcc/c/c-typeck.cc =================================================================== --- gcc.orig/gcc/c/c-typeck.cc +++ gcc/gcc/c/c-typeck.cc @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. #include "gomp-constants.h" #include "spellcheck-tree.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op0, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type1), sc); op0 = build_vector_from_val (type1, sc); if (!maybe_const) @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op1, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type0), sc); op1 = build_vector_from_val (type0, sc); if (!maybe_const) Index: gcc/gcc/cp/call.cc =================================================================== --- gcc.orig/gcc/cp/call.cc +++ gcc/gcc/cp/call.cc @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. #include "langhooks.h" #include "c-family/c-objc.h" #include "internal-fn.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "gcc-rich-location.h" @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio return error_mark_node; } + bool have_vec_duplicate + = optab_handler (vec_duplicate_optab, + TYPE_MODE (vtype)) != CODE_FOR_nothing; + arg2 = cp_convert (stype, arg2, complain); - arg2 = save_expr (arg2); + if (!have_vec_duplicate) + arg2 = save_expr (arg2); arg2 = build_vector_from_val (vtype, arg2); arg2_type = vtype; arg3 = cp_convert (stype, arg3, complain); - arg3 = save_expr (arg3); + if (!have_vec_duplicate) + arg3 = save_expr (arg3); arg3 = build_vector_from_val (vtype, arg3); arg3_type = vtype; } @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio return error_mark_node; case stv_firstarg: { - arg2 = save_expr (arg2); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg3_type)) == CODE_FOR_nothing) + arg2 = save_expr (arg2); arg2 = convert (TREE_TYPE (arg3_type), arg2); arg2 = build_vector_from_val (arg3_type, arg2); arg2_type = TREE_TYPE (arg2); @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio } case stv_secondarg: { - arg3 = save_expr (arg3); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg2_type)) == CODE_FOR_nothing) + arg3 = save_expr (arg3); arg3 = convert (TREE_TYPE (arg2_type), arg3); arg3 = build_vector_from_val (arg2_type, arg3); arg3_type = TREE_TYPE (arg3); Index: gcc/gcc/cp/typeck.cc =================================================================== --- gcc.orig/gcc/cp/typeck.cc +++ gcc/gcc/cp/typeck.cc @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. #include "c-family/c-objc.h" #include "c-family/c-ubsan.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t case stv_firstarg: { op0 = convert (TREE_TYPE (type1), op0); - op0 = save_expr (op0); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + op0 = save_expr (op0); op0 = build_vector_from_val (type1, op0); type0 = TREE_TYPE (op0); code0 = TREE_CODE (type0); @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t case stv_secondarg: { op1 = convert (TREE_TYPE (type0), op1); - op1 = save_expr (op1); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + op1 = save_expr (op1); op1 = build_vector_from_val (type0, op1); type1 = TREE_TYPE (op1); code1 = TREE_CODE (type1);