On Thu, Jan 27, 2022 at 1:16 PM Maciej W. Rozycki <ma...@embecosm.com> wrote: > > Do not mark vector element broadcast resulting from OpenCL operations as > having side effects for targets that have a suitable hardware operation, > so that the `vec_duplicateM' standard RTL pattern can be directly used > for them. This does not happen currently, because any RTL pattern named > `vec_duplicateM' cannot express the side effects presumed, and therefore > a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly > prevents any `vec_duplicateM' pattern from being used. > > This side-effect annotation was introduced for the C frontend with: > <https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01075.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2011-08/msg00891.html> ("Scalar > vector binary operation"), and finally landed with commit 0e3a99ae913c > ("c-typeck.c (scalar_to_vector): New function."), along with original > support for scalar-to-vector OpenCL operations. > > Support for these operations was then gradually added to C++ with: > <https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00778.html> ("[C++] Mixed > scalar-vector operations"), landing with commit a212e43fca22 ("re PR > c++/54427 (Expose more vector extensions)"), followed by: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01665.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02253.html> ("[C++] > Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR > c++/54427 (Expose more vector extensions)"). > > And side-effect annotation for C++ was retrofitted, again gradually, > with: <https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00254.html> ("[C++] > Missing save_expr in vector-scalar ops"), landing with commit > 6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before > build_vector_from_val."), and then: > <https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00233.html> ("vector > conditional expression with scalar arguments"), landing with commit > 07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with > 1 vector and 2 scalars.") > > This was all long before we gained support for the `vec_duplicateM' > standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR > and associated optab") and it may have had sense for where the element > broadcast operation was always open-coded. However where provided by > hardware the element broadcast operation is just like any other vector > operation, presumably cheap, and whether to repeat it or not should be > based on the cost of the operation and not hardwired. > > So there is no point in artificially marking the operation as having a > side effect for the lone purpose of preventing repetition. This does > not currently prevent targets like the Aarch64 and MIPS ones from using > the hardware element broadcast operation, but it requires the backend to > wire it explicitly via the `vec_initMN' standard RTL pattern rather than > relying on the middle end to pick it up via the `vec_duplicateM' pattern > automatically. This change lifts this limitation. > > gcc/ > * c/c-typeck.cc (build_binary_op) Do not annotate vector element > broadcast operations as having a side effect if there is such an > operation provided by the backend. > * cp/call.cc (build_conditional_expr): Likewise. > * cp/typeck.cc (cp_build_binary_op): Likewise. > --- > Hi, > > I've come across this while working on an out-of-tree microarchitecture, > which does not require a `vec_initMN' RTL pattern, however has hardware > support suitable for cheap `vec_duplicateM'. I could not persuade the > middle end to make use of the new `vec_duplicateM' pattern however until I > removed the side-effect anbnotation. So I think this is a move in the > right direction even if we do not have support for said microarchitecture > at the moment. > > For some reason I could not persuade our current Aarch64 code to use the > `vec_duplicateM' pattern directly either. This is because at the time > `store_constructor' is called the vector mode for the element broadcast > operation is say `V4SFmode' (for a vector of floats), however there is no > corresponding `vec_duplicatev4sf' pattern. Code in the `vec_initv4sfsf' > pattern does the right thing though, referring to `aarch64_simd_dupv4sf' > by hand instead. There is a `vec_duplicatevnx4sf' pattern corresponding > to `VNx4SFmode', but I was unable to figure out how to activate it. > > This change has passed regression-testing with the `aarch64-linux-gnu' > target (with `-march=armv9-a') and QEMU in the user emulation mode across > all the GCC languages and libraries. > > NB there are indentation issues with code surrounding places I have > modified, but I have chosen not to address them so as not to obfuscate > this change. Likewise there are unnecessary compound statements used in > the switch statements in the C++ frontend parts modified. > > Any questions or comments? Otherwise OK once GCC 13 has opened? > > Maciej > --- > gcc/c/c-typeck.cc | 9 +++++++-- > gcc/cp/call.cc | 19 +++++++++++++++---- > gcc/cp/typeck.cc | 9 +++++++-- > 3 files changed, 29 insertions(+), 8 deletions(-) > > gcc-scalar-to-vector-no-side-effect.diff > Index: gcc/gcc/c/c-typeck.cc > =================================================================== > --- gcc.orig/gcc/c/c-typeck.cc > +++ gcc/gcc/c/c-typeck.cc > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > #include "gomp-constants.h" > #include "spellcheck-tree.h" > #include "gcc-rich-location.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "asan.h" > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > bool maybe_const = true; > tree sc; > sc = c_fully_fold (op0, false, &maybe_const); > - sc = save_expr (sc); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type1)) == CODE_FOR_nothing) > + sc = save_expr (sc);
This doesn't make much sense - I suppose the CONSTRUCTOR retains TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE and thus should have been cleared during gimplification or in the end ignored by RTL expansion. You do not add a testcase so it's hard to see what goes wrong where exactly. Richard. > sc = convert (TREE_TYPE (type1), sc); > op0 = build_vector_from_val (type1, sc); > if (!maybe_const) > @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en > bool maybe_const = true; > tree sc; > sc = c_fully_fold (op1, false, &maybe_const); > - sc = save_expr (sc); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type0)) == CODE_FOR_nothing) > + sc = save_expr (sc); > sc = convert (TREE_TYPE (type0), sc); > op1 = build_vector_from_val (type0, sc); > if (!maybe_const) > Index: gcc/gcc/cp/call.cc > =================================================================== > --- gcc.orig/gcc/cp/call.cc > +++ gcc/gcc/cp/call.cc > @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. > #include "langhooks.h" > #include "c-family/c-objc.h" > #include "internal-fn.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "gcc-rich-location.h" > @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio > return error_mark_node; > } > > + bool have_vec_duplicate > + = optab_handler (vec_duplicate_optab, > + TYPE_MODE (vtype)) != CODE_FOR_nothing; > + > arg2 = cp_convert (stype, arg2, complain); > - arg2 = save_expr (arg2); > + if (!have_vec_duplicate) > + arg2 = save_expr (arg2); > arg2 = build_vector_from_val (vtype, arg2); > arg2_type = vtype; > arg3 = cp_convert (stype, arg3, complain); > - arg3 = save_expr (arg3); > + if (!have_vec_duplicate) > + arg3 = save_expr (arg3); > arg3 = build_vector_from_val (vtype, arg3); > arg3_type = vtype; > } > @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio > return error_mark_node; > case stv_firstarg: > { > - arg2 = save_expr (arg2); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (arg3_type)) == > CODE_FOR_nothing) > + arg2 = save_expr (arg2); > arg2 = convert (TREE_TYPE (arg3_type), arg2); > arg2 = build_vector_from_val (arg3_type, arg2); > arg2_type = TREE_TYPE (arg2); > @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio > } > case stv_secondarg: > { > - arg3 = save_expr (arg3); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (arg2_type)) == > CODE_FOR_nothing) > + arg3 = save_expr (arg3); > arg3 = convert (TREE_TYPE (arg2_type), arg3); > arg3 = build_vector_from_val (arg2_type, arg3); > arg3_type = TREE_TYPE (arg3); > Index: gcc/gcc/cp/typeck.cc > =================================================================== > --- gcc.orig/gcc/cp/typeck.cc > +++ gcc/gcc/cp/typeck.cc > @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. > #include "c-family/c-objc.h" > #include "c-family/c-ubsan.h" > #include "gcc-rich-location.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "asan.h" > @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t > case stv_firstarg: > { > op0 = convert (TREE_TYPE (type1), op0); > - op0 = save_expr (op0); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type1)) == CODE_FOR_nothing) > + op0 = save_expr (op0); > op0 = build_vector_from_val (type1, op0); > type0 = TREE_TYPE (op0); > code0 = TREE_CODE (type0); > @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t > case stv_secondarg: > { > op1 = convert (TREE_TYPE (type0), op1); > - op1 = save_expr (op1); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type0)) == CODE_FOR_nothing) > + op1 = save_expr (op1); > op1 = build_vector_from_val (type0, op1); > type1 = TREE_TYPE (op1); > code1 = TREE_CODE (type1);