<[email protected]> writes:
> The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
> mandatory from Armv9.5-a. It introduces instructions for computing the
> floating point absolute maximum and minimum of the two vectors element-wise.
>
> This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of
> this extension are implemented as the following builtin functions:
> * vamax_f16
> * vamaxq_f16
> * vamax_f32
> * vamaxq_f32
> * vamaxq_f64
> * vamin_f16
> * vaminq_f16
> * vamin_f32
> * vaminq_f32
> * vaminq_f64
>
> We are defining a new way to add AArch64 AdvSIMD intrinsics by listing
> all the intrinsics in a .def file and then using that .def file to
> initialise various data structures. This would lead to more concise code
> and easier addition of the new AdvSIMD intrinsics in future.
>
> The faminmax intrinsics are defined using the new approach.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-builtins.cc
> (ENTRY): Macro to parse the contents of
> aarch64-simd-pragma-builtins.def.
> (enum aarch64_builtins): New enum values for faminmax builtins
> via aarch64-simd-pragma-builtins.def.
> (enum aarch64_builtin_signatures): Enum to specify the
> number of operands a builtin will take.
> (ENTRY_VHSDF): Macro to parse the contents of
> aarch64-simd-pragma-builtins.def.
> (struct aarch64_pragma_builtins_data): Struct to hold data from
> aarch64-simd-pragma-builtins.def.
> (aarch64_fntype): New function to define function types of
> intrinsics given an object of type aarch64_pragma_builtins_data.
> (aarch64_init_pragma_builtins): New function to define pragma
> builtins.
> (aarch64_get_pragma_builtin): New function to get a row of
> aarch64_pragma_builtins, given code.
> (handle_arm_neon_h): Modify to call
> aarch64_init_pragma_builtins.
> (aarch64_general_check_builtin_call): Modify to check whether
> required flag is being used for pragma builtins.
> (aarch64_expand_pragma_builtin): New function to emit
> instructions of pragma_builtin.
> (aarch64_general_expand_builtin): Modify to call
> aarch64_expand_pragma_builtin.
> * config/aarch64/aarch64-option-extensions.def
> (AARCH64_OPT_EXTENSION): Introduce new flag for this extension.
> * config/aarch64/aarch64-simd.md
> (@aarch64_<faminmax_uns_op><mode>): Instruction pattern for
> faminmax intrinsics.
> * config/aarch64/aarch64.h
> (TARGET_FAMINMAX): Introduce new flag for this extension.
> * config/aarch64/iterators.md: New iterators and unspecs.
> * config/arm/types.md: Introduce neon_fp_aminmax<q> attributes.
> * doc/invoke.texi: Document extension in AArch64 Options.
> * config/aarch64/aarch64-simd-pragma-builtins.def: New file to
> list pragma builtins.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test.
> * gcc.target/aarch64/simd/faminmax-builtins.c: New test.
> ---
> gcc/config/aarch64/aarch64-builtins.cc | 123 ++++++++++++++++++
> .../aarch64/aarch64-option-extensions.def | 2 +
> .../aarch64/aarch64-simd-pragma-builtins.def | 23 ++++
> gcc/config/aarch64/aarch64-simd.md | 11 ++
> gcc/config/aarch64/aarch64.h | 4 +
> gcc/config/aarch64/iterators.md | 9 ++
> gcc/config/arm/types.md | 5 +
> gcc/doc/invoke.texi | 2 +
> .../aarch64/simd/faminmax-builtins-no-flag.c | 10 ++
> .../aarch64/simd/faminmax-builtins.c | 115 ++++++++++++++++
> 10 files changed, 304 insertions(+)
> create mode 100644 gcc/config/aarch64/aarch64-simd-pragma-builtins.def
> create mode 100644
> gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc
> b/gcc/config/aarch64/aarch64-builtins.cc
> index eb878b933fe..6e64ae86c52 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -757,6 +757,18 @@ typedef struct
> #define VAR1(T, N, MAP, FLAG, A) \
> AARCH64_SIMD_BUILTIN_##T##_##N##A,
>
> +#undef ENTRY
> +#define ENTRY(N, S, M, U, F) \
> + AARCH64_##N,
> +
> +#undef ENTRY_VHSDF
> +#define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC, EXTENSIONS) \
> + AARCH64_##NAME##_f16, \
> + AARCH64_##NAME##q_f16, \
> + AARCH64_##NAME##_f32, \
> + AARCH64_##NAME##q_f32, \
> + AARCH64_##NAME##q_f64,
> +
> enum aarch64_builtins
> {
> AARCH64_BUILTIN_MIN,
> @@ -829,6 +841,10 @@ enum aarch64_builtins
> AARCH64_RBIT,
> AARCH64_RBITL,
> AARCH64_RBITLL,
> + /* Pragma builtins. */
> + AARCH64_PRAGMA_BUILTIN_START,
> +#include "aarch64-simd-pragma-builtins.def"
> + AARCH64_PRAGMA_BUILTIN_END,
> /* System register builtins. */
> AARCH64_RSR,
> AARCH64_RSRP,
> @@ -947,6 +963,7 @@ const char *aarch64_scalar_builtin_types[] = {
>
> extern GTY(()) aarch64_simd_type_info aarch64_simd_types[];
>
> +#undef ENTRY
> #define ENTRY(E, M, Q, G) \
> {E, "__" #E, #G "__" #E, NULL_TREE, NULL_TREE, E_##M##mode, qualifier_##Q},
> struct aarch64_simd_type_info aarch64_simd_types [] = {
> @@ -1547,6 +1564,80 @@ aarch64_init_simd_builtin_functions (bool
> called_from_pragma)
> }
> }
>
> +enum aarch64_builtin_signatures
> +{
> + AARCH64_BUILTIN_SIGNATURE_MIN,
> + binary,
> + AARCH64_BUILTIN_SIGNATURE_MAX,
> +};
"binary" on its own probably isn't unique enough, and similar names
could clash with target-independent code. One way of avoiding that
would be to use:
AARCH64_BUILTIN_SIGNATURE_binary
and use token pasting in ENTRY to add the "AARCH64_BUILTIN_SIGNATURE_"
prefix. But I guess a more C++ way would be to use enum classes:
enum class aarch64_builtin_signature
{
binary,
};
which is probably a bit nicer. (I don't think we need the min and
max values.) Then:
> +
> +#undef ENTRY
> +#define ENTRY(N, S, M, U, F) \
> + {#N, S, E_##M##mode, U, F},
...this would be:
{#N, aarch64_builtin_signature::S, E_##M##mode, U, F},
> +
> +#undef ENTRY_VHSDF
> +#define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC, EXTENSIONS) \
> + ENTRY (NAME##_f16, SIGNATURE, V4HF, UNSPEC, EXTENSIONS) \
> + ENTRY (NAME##q_f16, SIGNATURE, V8HF, UNSPEC, EXTENSIONS) \
> + ENTRY (NAME##_f32, SIGNATURE, V2SF, UNSPEC, EXTENSIONS) \
> + ENTRY (NAME##q_f32, SIGNATURE, V4SF, UNSPEC, EXTENSIONS) \
> + ENTRY (NAME##q_f64, SIGNATURE, V2DF, UNSPEC, EXTENSIONS)
> +
> +/* Initialize pragma builtins. */
> +
> +struct aarch64_pragma_builtins_data
> +{
> + const char *name;
> + aarch64_builtin_signatures signature;
> + machine_mode mode;
> + int unspec;
> + aarch64_feature_flags required_extensions;
> +};
> +
> +static aarch64_pragma_builtins_data aarch64_pragma_builtins[] = {
> +#include "aarch64-simd-pragma-builtins.def"
> +};
> +
> +static tree
> +aarch64_fntype (const aarch64_pragma_builtins_data &builtin_data)
> +{
> + auto type = aarch64_simd_builtin_type (builtin_data.mode, qualifier_none);
> + switch (builtin_data.signature)
> + {
> + case binary:
And similarly here.
> + return build_function_type_list (type, type, type, NULL_TREE);
> + default:
> + gcc_unreachable ();
> + }
> +}
> +
> [...]
> @@ -3189,6 +3287,25 @@ aarch64_expand_builtin_data_intrinsic (unsigned int
> fcode, tree exp, rtx target)
> return ops[0].value;
> }
>
> +static rtx
> +aarch64_expand_pragma_builtin (tree exp, rtx target,
> + const aarch64_pragma_builtins_data* builtin_data)
Formatting, should be:
const aarch64_pragma_builtins_data *builtin_data)
with the "*" after the space.
> +{
> + expand_operand ops[3];
> + auto mode = builtin_data->mode;
> + auto op1 = expand_normal (CALL_EXPR_ARG (exp, 0));
> + auto op2 = expand_normal (CALL_EXPR_ARG (exp, 1));
> + create_output_operand (&ops[0], target, mode);
> + create_input_operand (&ops[1], op1, mode);
> + create_input_operand (&ops[2], op2, mode);
> +
> + auto unspec = builtin_data->unspec;
> + auto icode = code_for_aarch64 (unspec, mode);
> + expand_insn (icode, 3, ops);
> +
> + return target;
> +}
> +
> /* Expand an expression EXP as fpsr or fpcr setter (depending on
> UNSPEC) using MODE. */
> static void
> @@ -3369,6 +3486,11 @@ aarch64_general_expand_builtin (unsigned int fcode,
> tree exp, rtx target,
> && fcode <= AARCH64_RBITLL)
> return aarch64_expand_builtin_data_intrinsic (fcode, exp, target);
>
> + if (auto builtin_data = aarch64_get_pragma_builtin (fcode))
> + {
> + return aarch64_expand_pragma_builtin (exp, target, builtin_data);
> + }
> +
Formatting nit, should be no braces for single statements:
if (auto builtin_data = aarch64_get_pragma_builtin (fcode))
return aarch64_expand_pragma_builtin (exp, target, builtin_data);
> gcc_unreachable ();
> }
>
> [...]
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index 23c03a96371..7542c81ed91 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -9910,3 +9910,14 @@
> "shl\\t%d0, %d1, #16"
> [(set_attr "type" "neon_shift_imm")]
> )
> +
> +;; faminmax
> +(define_insn "@aarch64_<faminmax_uns_op><mode>"
> + [(set (match_operand:VHSDF 0 "register_operand" "=w")
> + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
> + (match_operand:VHSDF 2 "register_operand" "w")]
> + FAMINMAX_UNS))]
> + "TARGET_FAMINMAX"
> + "<faminmax_uns_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
> + [(set_attr "type" "neon_fp_aminmax<q>")]
> +)
Sorry for not noticing this last time, but: rather than add a new
scheduling type for these instructions, let's just assume for now
that they have the same performance characteristics as fmax and fmin.
We can tweak it later if a scheduling model needs to distinguish them.
Reusing an existing type is better because existing scheduling models
will recognise it as a known operation, rather than treating it as an
unknown operation (which is usually done conservatively).
Not that it matters for modern OoO cores, but still. :)
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 2dfb999bea5..de14f57071a 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -457,6 +457,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE
> ATTRIBUTE_UNUSED
> enabled through +gcs. */
> #define TARGET_GCS AARCH64_HAVE_ISA (GCS)
>
> +/* Floating Point Absolute Maximum/Minimum extension instructions are
> + enabled through +faminmax. */
I realise this happens in pre-existing code too, but: there should just be
one space after "/*":
/* Floating Point Absolute Maximum/Minimum extension instructions are
enabled through +faminmax. */
Thanks,
Richard
> +#define TARGET_FAMINMAX AARCH64_HAVE_ISA (FAMINMAX)
> +
> [...]