This patch adds support for the SVE_B16B16 extension, which provides non-widening BF16 versions of existing instructions.
Mostly it's just a simple extension of iterators.  The main complications are:

(1) The new instructions have no immediate forms.  This is easy to handle
    for the cond_* patterns (the ones that have an explicit else value),
    since those are already divided into register and non-register
    versions.  All we need to do is tighten the predicates.

    However, the @aarch64_pred_<optab><mode> patterns handle the
    immediates directly.  Rather than complicate them further, it seemed
    best to add a single @aarch64_pred_<optab><mode> for all BF16
    arithmetic.

(2) There is no BFSUBR, so the usual method of handling reversed
    operands breaks down.  The patch deals with this using some new
    attributes that together disable the "BFSUBR" alternative.

(3) Similarly, there are no BFMAD or BFMSB instructions, so we need to
    disable those forms in the BFMLA and BFMLS patterns.

The patch includes support for generic bf16 vectors too.  It would be
possible to use these instructions for scalars, as with the recent FLOGB
patch, but that's left as future work.

gcc/
	* config/aarch64/aarch64-option-extensions.def (sve-b16b16): New
	extension.
	* doc/invoke.texi: Document it.
	* config/aarch64/aarch64.h (TARGET_SME_B16B16, TARGET_SVE2_OR_SME2)
	(TARGET_SSVE_B16B16): New macros.
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
	Conditionally define __ARM_FEATURE_SVE_B16B16.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Add AARCH64_FL_SVE2
	to the SVE2p1 requirements.  Add SVE_B16B16 forms of existing
	intrinsics.
	* config/aarch64/aarch64-sve-builtins.cc (type_suffixes): Treat
	bfloat as a floating-point type.
	(TYPES_h_bfloat): New macro.
	* config/aarch64/aarch64.md (is_bf16, is_rev, supports_bf16_rev)
	(mode_enabled): New attributes.
	(enabled): Test mode_enabled.
	* config/aarch64/iterators.md (SVE_FULL_F_BF): New mode iterator.
	(SVE_CLAMP_F): Likewise.
	(SVE_Fx24): Add BF16 modes when TARGET_SSVE_B16B16.
	(sve_lane_con): Handle BF16 modes.
	(b): Handle SF and DF modes.
	(is_bf16): New mode attribute.
	(supports_bf16, supports_bf16_rev): New int attributes.
	* config/aarch64/predicates.md (aarch64_sve_float_maxmin_immediate):
	Reject BF16 modes.
	* config/aarch64/aarch64-sve.md (*post_ra_<sve_fp_op><mode>3): Add
	BF16 support, and likewise for the associated define_split.
	(<optab:SVE_COND_FP_BINARY_OPTAB><mode>): Add BF16 support.
	(@cond_<optab:SVE_COND_FP_BINARY><mode>): Likewise.
	(*cond_<optab:SVE_COND_FP_BINARY><mode>_2_relaxed): Likewise.
	(*cond_<optab:SVE_COND_FP_BINARY><mode>_2_strict): Likewise.
	(*cond_<optab:SVE_COND_FP_BINARY><mode>_3_relaxed): Likewise.
	(*cond_<optab:SVE_COND_FP_BINARY><mode>_3_strict): Likewise.
	(*cond_<optab:SVE_COND_FP_BINARY><mode>_any_relaxed): Likewise.
	(*cond_<optab:SVE_COND_FP_BINARY><mode>_any_strict): Likewise.
	(@aarch64_mul_lane_<mode>): Likewise.
	(<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
	(@aarch64_pred_<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
	(@cond_<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
	(*cond_<optab:SVE_COND_FP_TERNARY><mode>_4_relaxed): Likewise.
	(*cond_<optab:SVE_COND_FP_TERNARY><mode>_4_strict): Likewise.
	(*cond_<optab:SVE_COND_FP_TERNARY><mode>_any_relaxed): Likewise.
	(*cond_<optab:SVE_COND_FP_TERNARY><mode>_any_strict): Likewise.
	(@aarch64_<optab:SVE_FP_TERNARY_LANE>_lane_<mode>): Likewise.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_pred_<optab:SVE_COND_FP_BINARY><mode>): Define BF16
	version.
	(@aarch64_sve_fclamp<mode>): Add BF16 support.
	(*aarch64_sve_fclamp<mode>_x): Likewise.
	(*aarch64_sve_<maxmin_uns_op><SVE_Fx24:mode>): Likewise.
	(*aarch64_sve_single_<maxmin_uns_op><SVE_Fx24:mode>): Likewise.
	* config/aarch64/aarch64.cc (aarch64_sve_float_arith_immediate_p)
	(aarch64_sve_float_mul_immediate_p): Return false for BF16 modes.

gcc/testsuite/
	* lib/target-supports.exp: Test the assembler for sve-b16b16
	support.
	* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test the new B16B16
	macros.
	* gcc.target/aarch64/sve/fmad_1.c: Test bfloat16 too.
	* gcc.target/aarch64/sve/fmla_1.c: Likewise.
	* gcc.target/aarch64/sve/fmls_1.c: Likewise.
	* gcc.target/aarch64/sve/fmsb_1.c: Likewise.
	* gcc.target/aarch64/sve/cond_mla_9.c: New test.
	* gcc.target/aarch64/sme2/acle-asm/clamp_bf16_x2.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/clamp_bf16_x4.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/max_bf16_x2.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/max_bf16_x4.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/maxnm_bf16_x2.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/maxnm_bf16_x4.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/min_bf16_x2.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/min_bf16_x4.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/minnm_bf16_x2.c: Likewise.
	* gcc.target/aarch64/sme2/acle-asm/minnm_bf16_x4.c: Likewise.
	* gcc.target/aarch64/sve/bf16_arith_1.c: Likewise.
	* gcc.target/aarch64/sve/bf16_arith_1.h: Likewise.
	* gcc.target/aarch64/sve/bf16_arith_2.c: Likewise.
	* gcc.target/aarch64/sve/bf16_arith_3.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/add_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/clamp_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/max_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/maxnm_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/min_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/minnm_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mla_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mla_lane_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mls_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mls_lane_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mul_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mul_lane_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sub_bf16.c: Likewise.
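The attribute mechanism behind complication (2) can be pictured roughly as
follows.  This is a hypothetical, simplified sketch of the interaction
between the new attributes, not the literal patch text (the exact
attribute values and conditions are illustrative):

```
;; Simplified sketch -- not the literal patch text.
;; Alternatives that emit a reversed instruction (e.g. FSUBR) are
;; tagged is_rev.  BF16 patterns set is_bf16, and record via
;; supports_bf16_rev whether a BF16 reversed form exists.
(define_attr "mode_enabled" "false,true"
  (cond [(and (eq_attr "is_bf16" "true")
	      (eq_attr "is_rev" "true")
	      (eq_attr "supports_bf16_rev" "false"))
	 (const_string "false")]
	(const_string "true")))

;; "enabled" then tests mode_enabled in addition to its existing
;; conditions, so the "BFSUBR" alternative is never selected for BF16
;; modes while the F16/F32/F64 alternatives are unaffected.
```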
---
 gcc/config/aarch64/aarch64-c.cc               |   3 +
 .../aarch64/aarch64-option-extensions.def     |   3 +
 .../aarch64/aarch64-sve-builtins-sve2.def     |  27 ++
 gcc/config/aarch64/aarch64-sve-builtins.cc    |   7 +-
 gcc/config/aarch64/aarch64-sve.md             | 371 ++++++++++--------
 gcc/config/aarch64/aarch64-sve2.md            |  75 +++-
 gcc/config/aarch64/aarch64.cc                 |   6 +-
 gcc/config/aarch64/aarch64.h                  |   9 +
 gcc/config/aarch64/aarch64.md                 |  29 +-
 gcc/config/aarch64/iterators.md               |  63 ++-
 gcc/config/aarch64/predicates.md              |   1 +
 gcc/doc/invoke.texi                           |   3 +
 .../gcc.target/aarch64/pragma_cpp_predefs_4.c |  41 ++
 .../aarch64/sme2/acle-asm/clamp_bf16_x2.c     |  98 +++++
 .../aarch64/sme2/acle-asm/clamp_bf16_x4.c     | 108 +++++
 .../aarch64/sme2/acle-asm/max_bf16_x2.c       | 211 ++++++++++
 .../aarch64/sme2/acle-asm/max_bf16_x4.c       | 253 ++++++++++++
 .../aarch64/sme2/acle-asm/maxnm_bf16_x2.c     | 211 ++++++++++
 .../aarch64/sme2/acle-asm/maxnm_bf16_x4.c     | 253 ++++++++++++
 .../aarch64/sme2/acle-asm/min_bf16_x2.c       | 211 ++++++++++
 .../aarch64/sme2/acle-asm/min_bf16_x4.c       | 253 ++++++++++++
 .../aarch64/sme2/acle-asm/minnm_bf16_x2.c     | 211 ++++++++++
 .../aarch64/sme2/acle-asm/minnm_bf16_x4.c     | 253 ++++++++++++
 .../gcc.target/aarch64/sve/bf16_arith_1.c     |  10 +
 .../gcc.target/aarch64/sve/bf16_arith_1.h     |  24 ++
 .../gcc.target/aarch64/sve/bf16_arith_2.c     |   8 +
 .../gcc.target/aarch64/sve/bf16_arith_3.c     |   8 +
 .../gcc.target/aarch64/sve/cond_mla_9.c       |  25 ++
 gcc/testsuite/gcc.target/aarch64/sve/fmad_1.c |   9 +-
 gcc/testsuite/gcc.target/aarch64/sve/fmla_1.c |   9 +-
 gcc/testsuite/gcc.target/aarch64/sve/fmls_1.c |   9 +-
 gcc/testsuite/gcc.target/aarch64/sve/fmsb_1.c |   9 +-
 .../aarch64/sve2/acle/asm/add_bf16.c          | 315 +++++++++++++++
 .../aarch64/sve2/acle/asm/clamp_bf16.c        |  49 +++
 .../aarch64/sve2/acle/asm/max_bf16.c          | 301 ++++++++++++++
 .../aarch64/sve2/acle/asm/maxnm_bf16.c        | 301 ++++++++++++++
 .../aarch64/sve2/acle/asm/min_bf16.c          | 301 ++++++++++++++
 .../aarch64/sve2/acle/asm/minnm_bf16.c        | 301 ++++++++++++++
 .../aarch64/sve2/acle/asm/mla_bf16.c          | 341 ++++++++++++++++
 .../aarch64/sve2/acle/asm/mla_lane_bf16.c     | 135 +++++++
 .../aarch64/sve2/acle/asm/mls_bf16.c          | 341 ++++++++++++++++
 .../aarch64/sve2/acle/asm/mls_lane_bf16.c     | 135 +++++++
 .../aarch64/sve2/acle/asm/mul_bf16.c          | 315 +++++++++++++++
 .../aarch64/sve2/acle/asm/mul_lane_bf16.c     | 121 ++++++
 .../aarch64/sve2/acle/asm/sub_bf16.c          | 304 ++++++++++++++
 gcc/testsuite/lib/target-supports.exp         |   2 +-
 46 files changed, 5864 insertions(+), 209 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/clamp_bf16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/clamp_bf16_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/max_bf16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/max_bf16_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/maxnm_bf16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/maxnm_bf16_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/min_bf16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/min_bf16_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/minnm_bf16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/minnm_bf16_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/bf16_arith_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/bf16_arith_1.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/bf16_arith_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/bf16_arith_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_mla_9.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/add_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/clamp_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/max_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/maxnm_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/min_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/minnm_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/mla_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/mla_lane_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/mls_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/mls_lane_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/mul_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/mul_lane_bf16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sub_bf16.c

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index d1ae80c0bb3..0b59ceb49e3 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -208,6 +208,9 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
 			"__ARM_FEATURE_SVE_MATMUL_FP32", pfile);
   aarch64_def_or_undef (TARGET_SVE_F64MM,
 			"__ARM_FEATURE_SVE_MATMUL_FP64", pfile);
+  aarch64_def_or_undef (AARCH64_HAVE_ISA (SVE_B16B16)
+			&& (TARGET_SVE2 || TARGET_SME2),
+			"__ARM_FEATURE_SVE_B16B16", pfile);
   aarch64_def_or_undef (TARGET_SVE2, "__ARM_FEATURE_SVE2", pfile);
   aarch64_def_or_undef (TARGET_SVE2_AES, "__ARM_FEATURE_SVE2_AES", pfile);
   aarch64_def_or_undef (TARGET_SVE2_BITPERM,
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index c9d419afc8f..a5ab16233ba 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -165,6 +165,9 @@ AARCH64_FMV_FEATURE("rpres", RPRES, ())
 AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
 
+/* This specifically does not imply +sve.
*/ +AARCH64_OPT_EXTENSION("sve-b16b16", SVE_B16B16, (), (), (), "") + AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm") AARCH64_FMV_FEATURE("f32mm", SVE_F32MM, (F32MM)) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def index c641ed510ff..39b5a59ae79 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def @@ -335,3 +335,30 @@ DEF_SVE_FUNCTION_GS (svzipq, unaryxn, all_data, x24, none) DEF_SVE_FUNCTION (svamax, binary_opt_single_n, all_float, mxz) DEF_SVE_FUNCTION (svamin, binary_opt_single_n, all_float, mxz) #undef REQUIRED_EXTENSIONS + +#define REQUIRED_EXTENSIONS \ + sve_and_sme (AARCH64_FL_SVE2 | AARCH64_FL_SVE_B16B16, \ + AARCH64_FL_SME2 | AARCH64_FL_SVE_B16B16) +DEF_SVE_FUNCTION (svadd, binary_opt_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svclamp, clamp, h_bfloat, none) +DEF_SVE_FUNCTION (svmax, binary_opt_single_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svmaxnm, binary_opt_single_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svmla, ternary_opt_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svmla_lane, ternary_lane, h_bfloat, none) +DEF_SVE_FUNCTION (svmls, ternary_opt_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svmls_lane, ternary_lane, h_bfloat, none) +DEF_SVE_FUNCTION (svmin, binary_opt_single_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svminnm, binary_opt_single_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svmul, binary_opt_n, h_bfloat, mxz) +DEF_SVE_FUNCTION (svmul_lane, binary_lane, h_bfloat, none) +DEF_SVE_FUNCTION (svsub, binary_opt_n, h_bfloat, mxz) +#undef REQUIRED_EXTENSIONS + +#define REQUIRED_EXTENSIONS \ + streaming_only (AARCH64_FL_SME2 | AARCH64_FL_SVE_B16B16) +DEF_SVE_FUNCTION_GS (svclamp, clamp, h_bfloat, x24, none) +DEF_SVE_FUNCTION_GS (svmax, binary_opt_single_n, h_bfloat, x24, none) +DEF_SVE_FUNCTION_GS (svmaxnm, binary_opt_single_n, h_bfloat, x24, none) +DEF_SVE_FUNCTION_GS (svmin, binary_opt_single_n, h_bfloat, x24, none) +DEF_SVE_FUNCTION_GS (svminnm, 
binary_opt_single_n, h_bfloat, x24, none) +#undef REQUIRED_EXTENSIONS diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index b3d961452d3..75b35f6b546 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -139,7 +139,7 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = { BITS / BITS_PER_UNIT, \ TYPE_##CLASS == TYPE_signed || TYPE_##CLASS == TYPE_unsigned, \ TYPE_##CLASS == TYPE_unsigned, \ - TYPE_##CLASS == TYPE_float, \ + TYPE_##CLASS == TYPE_float || TYPE_##CLASS == TYPE_bfloat, \ TYPE_##CLASS != TYPE_bool, \ TYPE_##CLASS == TYPE_bool, \ false, \ @@ -292,6 +292,10 @@ CONSTEXPR const group_suffix_info group_suffixes[] = { D (s16, s8), D (s32, s16), D (s64, s32), \ D (u16, u8), D (u32, u16), D (u64, u32) +/* _bf16. */ +#define TYPES_h_bfloat(S, D) \ + S (bf16) + /* _s16 _u16. */ #define TYPES_h_integer(S, D) \ @@ -739,6 +743,7 @@ DEF_SVE_TYPES_ARRAY (bhs_integer); DEF_SVE_TYPES_ARRAY (bhs_data); DEF_SVE_TYPES_ARRAY (bhs_widen); DEF_SVE_TYPES_ARRAY (c); +DEF_SVE_TYPES_ARRAY (h_bfloat); DEF_SVE_TYPES_ARRAY (h_integer); DEF_SVE_TYPES_ARRAY (hs_signed); DEF_SVE_TYPES_ARRAY (hs_integer); diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index 1602668271e..9ec19220f4d 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -5267,6 +5267,9 @@ (define_insn_and_rewrite "*cond_<optab><mode>_any_strict" ;; ---- [FP] General binary arithmetic corresponding to rtx codes ;; ------------------------------------------------------------------------- ;; Includes post-RA forms of: +;; - BFADD (SVE_B16B16) +;; - BFMUL (SVE_B16B16) +;; - BFSUB (SVE_B16B16) ;; - FADD ;; - FMUL ;; - FSUB @@ -5275,34 +5278,41 @@ (define_insn_and_rewrite "*cond_<optab><mode>_any_strict" ;; Split a predicated instruction whose predicate is unused into an ;; unpredicated instruction. 
(define_split - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") (match_operand:SI 4 "aarch64_sve_gp_strictness") - (match_operand:SVE_FULL_F 2 "register_operand") - (match_operand:SVE_FULL_F 3 "register_operand")] + (match_operand:SVE_FULL_F_BF 2 "register_operand") + (match_operand:SVE_FULL_F_BF 3 "register_operand")] <SVE_COND_FP>))] "TARGET_SVE && reload_completed && INTVAL (operands[4]) == SVE_RELAXED_GP" [(set (match_dup 0) - (SVE_UNPRED_FP_BINARY:SVE_FULL_F (match_dup 2) (match_dup 3)))] + (SVE_UNPRED_FP_BINARY:SVE_FULL_F_BF (match_dup 2) (match_dup 3)))] ) ;; Unpredicated floating-point binary operations (post-RA only). ;; These are generated by the split above. (define_insn "*post_ra_<sve_fp_op><mode>3" - [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w") - (SVE_UNPRED_FP_BINARY:SVE_FULL_F - (match_operand:SVE_FULL_F 1 "register_operand" "w") - (match_operand:SVE_FULL_F 2 "register_operand" "w")))] + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand" "=w") + (SVE_UNPRED_FP_BINARY:SVE_FULL_F_BF + (match_operand:SVE_FULL_F_BF 1 "register_operand" "w") + (match_operand:SVE_FULL_F_BF 2 "register_operand" "w")))] "TARGET_SVE && reload_completed" - "<sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>") + "<b><sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>") ;; ------------------------------------------------------------------------- ;; ---- [FP] General binary arithmetic corresponding to unspecs ;; ------------------------------------------------------------------------- ;; Includes merging forms of: +;; - BFADD (SVE_B16B16) +;; - BFMAX (SVE_B16B16) +;; - BFMAXNM (SVE_B16B16) +;; - BFMIN (SVE_B16B16) +;; - BFMINNM (SVE_B16B16) +;; - BFMUL (SVE_B16B16) +;; - BFSUB (SVE_B16B16) ;; - FADD (constant forms handled in the "Addition" section) ;; - FDIV ;; - FDIVR @@ -5332,14 +5342,14 @@ (define_insn 
"@aarch64_sve_<optab><mode>" ;; Unpredicated floating-point binary operations that need to be predicated ;; for SVE. (define_expand "<optab><mode>3" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_dup 3) (const_int SVE_RELAXED_GP) - (match_operand:SVE_FULL_F 1 "<sve_pred_fp_rhs1_operand>") - (match_operand:SVE_FULL_F 2 "<sve_pred_fp_rhs2_operand>")] + (match_operand:SVE_FULL_F_BF 1 "<sve_pred_fp_rhs1_operand>") + (match_operand:SVE_FULL_F_BF 2 "<sve_pred_fp_rhs2_operand>")] SVE_COND_FP_BINARY_OPTAB))] - "TARGET_SVE" + "TARGET_SVE && (<supports_bf16> || !<is_bf16>)" { operands[3] = aarch64_ptrue_reg (<VPRED>mode); } @@ -5364,37 +5374,37 @@ (define_insn "@aarch64_pred_<optab><mode>" ;; Predicated floating-point operations with merging. (define_expand "@cond_<optab><mode>" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") - (unspec:SVE_FULL_F + (unspec:SVE_FULL_F_BF [(match_dup 1) (const_int SVE_STRICT_GP) - (match_operand:SVE_FULL_F 2 "<sve_pred_fp_rhs1_operand>") - (match_operand:SVE_FULL_F 3 "<sve_pred_fp_rhs2_operand>")] + (match_operand:SVE_FULL_F_BF 2 "<sve_pred_fp_rhs1_operand>") + (match_operand:SVE_FULL_F_BF 3 "<sve_pred_fp_rhs2_operand>")] SVE_COND_FP_BINARY) - (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")] + (match_operand:SVE_FULL_F_BF 4 "aarch64_simd_reg_or_zero")] UNSPEC_SEL))] - "TARGET_SVE" + "TARGET_SVE && (<supports_bf16> || !<is_bf16>)" ) ;; Predicated floating-point operations, merging with the first input. 
(define_insn_and_rewrite "*cond_<optab><mode>_2_relaxed" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") - (unspec:SVE_FULL_F + (unspec:SVE_FULL_F_BF [(match_operand 4) (const_int SVE_RELAXED_GP) - (match_operand:SVE_FULL_F 2 "register_operand") - (match_operand:SVE_FULL_F 3 "register_operand")] + (match_operand:SVE_FULL_F_BF 2 "register_operand") + (match_operand:SVE_FULL_F_BF 3 "register_operand")] SVE_COND_FP_BINARY) (match_dup 2)] UNSPEC_SEL))] - "TARGET_SVE" + "TARGET_SVE && (<supports_bf16> || !<is_bf16>)" {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] - [ w , Upl , 0 , w ; * ] <sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> - [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %2\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ w , Upl , 0 , w ; * ] <b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %2\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> } "&& !rtx_equal_p (operands[1], operands[4])" { @@ -5403,21 +5413,21 @@ (define_insn_and_rewrite "*cond_<optab><mode>_2_relaxed" ) (define_insn "*cond_<optab><mode>_2_strict" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") - (unspec:SVE_FULL_F + (unspec:SVE_FULL_F_BF [(match_dup 1) (const_int SVE_STRICT_GP) - (match_operand:SVE_FULL_F 2 "register_operand") - (match_operand:SVE_FULL_F 3 "register_operand")] + (match_operand:SVE_FULL_F_BF 2 "register_operand") + (match_operand:SVE_FULL_F_BF 3 "register_operand")] SVE_COND_FP_BINARY) (match_dup 2)] UNSPEC_SEL))] - "TARGET_SVE" + "TARGET_SVE && (<supports_bf16> || !<is_bf16>)" {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] - [ w , Upl , 0 , w ; * ] <sve_fp_op>\t%0.<Vetype>, %1/m, 
%0.<Vetype>, %3.<Vetype> - [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %2\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ w , Upl , 0 , w ; * ] <b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %2\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> } ) @@ -5466,21 +5476,21 @@ (define_insn "*cond_<optab><mode>_2_const_strict" ;; Predicated floating-point operations, merging with the second input. (define_insn_and_rewrite "*cond_<optab><mode>_3_relaxed" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") - (unspec:SVE_FULL_F + (unspec:SVE_FULL_F_BF [(match_operand 4) (const_int SVE_RELAXED_GP) - (match_operand:SVE_FULL_F 2 "register_operand") - (match_operand:SVE_FULL_F 3 "register_operand")] + (match_operand:SVE_FULL_F_BF 2 "register_operand") + (match_operand:SVE_FULL_F_BF 3 "register_operand")] SVE_COND_FP_BINARY) (match_dup 3)] UNSPEC_SEL))] - "TARGET_SVE" + "TARGET_SVE && (<supports_bf16_rev> || !<is_bf16>)" {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] - [ w , Upl , w , 0 ; * ] <sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> - [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %3\;<sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> + [ w , Upl , w , 0 ; * ] <b><sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> + [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %3\;<b><sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> } "&& !rtx_equal_p (operands[1], operands[4])" { @@ -5489,46 +5499,48 @@ (define_insn_and_rewrite "*cond_<optab><mode>_3_relaxed" ) (define_insn "*cond_<optab><mode>_3_strict" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") - (unspec:SVE_FULL_F + 
(unspec:SVE_FULL_F_BF [(match_dup 1) (const_int SVE_STRICT_GP) - (match_operand:SVE_FULL_F 2 "register_operand") - (match_operand:SVE_FULL_F 3 "register_operand")] + (match_operand:SVE_FULL_F_BF 2 "register_operand") + (match_operand:SVE_FULL_F_BF 3 "register_operand")] SVE_COND_FP_BINARY) (match_dup 3)] UNSPEC_SEL))] - "TARGET_SVE" + "TARGET_SVE && (<supports_bf16_rev> || !<is_bf16>)" {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] - [ w , Upl , w , 0 ; * ] <sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> - [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %3\;<sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> + [ w , Upl , w , 0 ; * ] <b><sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> + [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %3\;<b><sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> } ) ;; Predicated floating-point operations, merging with an independent value. (define_insn_and_rewrite "*cond_<optab><mode>_any_relaxed" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") - (unspec:SVE_FULL_F + (unspec:SVE_FULL_F_BF [(match_operand 5) (const_int SVE_RELAXED_GP) - (match_operand:SVE_FULL_F 2 "register_operand") - (match_operand:SVE_FULL_F 3 "register_operand")] + (match_operand:SVE_FULL_F_BF 2 "register_operand") + (match_operand:SVE_FULL_F_BF 3 "register_operand")] SVE_COND_FP_BINARY) - (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")] + (match_operand:SVE_FULL_F_BF 4 "aarch64_simd_reg_or_zero")] UNSPEC_SEL))] "TARGET_SVE + && (<supports_bf16> || !<is_bf16>) && !rtx_equal_p (operands[2], operands[4]) - && !rtx_equal_p (operands[3], operands[4])" - {@ [ cons: =0 , 1 , 2 , 3 , 4 ] - [ &w , Upl , 0 , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> - [ &w , Upl , w , 0 , Dz ] movprfx\t%0.<Vetype>, %1/z, 
%0.<Vetype>\;<sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> - [ &w , Upl , w , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> - [ &w , Upl , w , w , 0 ] movprfx\t%0.<Vetype>, %1/m, %2.<Vetype>\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> - [ ?&w , Upl , w , w , w ] # + && !((<supports_bf16_rev> || !<is_bf16>) + && rtx_equal_p (operands[3], operands[4]))" + {@ [ cons: =0 , 1 , 2 , 3 , 4 ; attrs: is_rev ] + [ &w , Upl , 0 , w , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ &w , Upl , w , 0 , Dz ; true ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> + [ &w , Upl , w , w , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ &w , Upl , w , w , 0 ; * ] movprfx\t%0.<Vetype>, %1/m, %2.<Vetype>\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ ?&w , Upl , w , w , w ; * ] # } "&& 1" { @@ -5545,30 +5557,34 @@ (define_insn_and_rewrite "*cond_<optab><mode>_any_relaxed" else FAIL; } - [(set_attr "movprfx" "yes")] + [(set_attr "movprfx" "yes") + (set_attr "is_bf16" "<is_bf16>") + (set_attr "supports_bf16_rev" "<supports_bf16_rev>")] ) (define_insn_and_rewrite "*cond_<optab><mode>_any_strict" - [(set (match_operand:SVE_FULL_F 0 "register_operand") - (unspec:SVE_FULL_F + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand") + (unspec:SVE_FULL_F_BF [(match_operand:<VPRED> 1 "register_operand") - (unspec:SVE_FULL_F + (unspec:SVE_FULL_F_BF [(match_dup 1) (const_int SVE_STRICT_GP) - (match_operand:SVE_FULL_F 2 "register_operand") - (match_operand:SVE_FULL_F 3 "register_operand")] + (match_operand:SVE_FULL_F_BF 2 "register_operand") + (match_operand:SVE_FULL_F_BF 3 "register_operand")] SVE_COND_FP_BINARY) - (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")] + (match_operand:SVE_FULL_F_BF 4 
"aarch64_simd_reg_or_zero")] UNSPEC_SEL))] "TARGET_SVE + && (<supports_bf16> || !<is_bf16>) && !rtx_equal_p (operands[2], operands[4]) - && !rtx_equal_p (operands[3], operands[4])" - {@ [ cons: =0 , 1 , 2 , 3 , 4 ] - [ &w , Upl , 0 , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> - [ &w , Upl , w , 0 , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> - [ &w , Upl , w , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> - [ &w , Upl , w , w , 0 ] movprfx\t%0.<Vetype>, %1/m, %2.<Vetype>\;<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> - [ ?&w , Upl , w , w , w ] # + && !((<supports_bf16_rev> || !<is_bf16>) + && rtx_equal_p (operands[3], operands[4]))" + {@ [ cons: =0 , 1 , 2 , 3 , 4 ; attrs: is_rev ] + [ &w , Upl , 0 , w , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ &w , Upl , w , 0 , Dz ; true ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype> + [ &w , Upl , w , w , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ &w , Upl , w , w , 0 ; * ] movprfx\t%0.<Vetype>, %1/m, %2.<Vetype>\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> + [ ?&w , Upl , w , w , w ; * ] # } "&& reload_completed && register_operand (operands[4], <MODE>mode) @@ -5578,7 +5594,9 @@ (define_insn_and_rewrite "*cond_<optab><mode>_any_strict" operands[4], operands[1])); operands[4] = operands[2] = operands[0]; } - [(set_attr "movprfx" "yes")] + [(set_attr "movprfx" "yes") + (set_attr "is_bf16" "<is_bf16>") + (set_attr "supports_bf16_rev" "<supports_bf16_rev>")] ) ;; Same for operations that take a 1-bit constant. 
@@ -6390,6 +6408,7 @@ (define_insn_and_rewrite "*aarch64_cond_abd<mode>_any_strict" ;; ---- [FP] Multiplication ;; ------------------------------------------------------------------------- ;; Includes: +;; - BFMUL (SVE_B16B16) ;; - FMUL ;; ------------------------------------------------------------------------- @@ -6417,15 +6436,15 @@ (define_insn "@aarch64_pred_<optab><mode>" ;; Unpredicated multiplication by selected lanes. (define_insn "@aarch64_mul_lane_<mode>" - [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w") - (mult:SVE_FULL_F - (unspec:SVE_FULL_F - [(match_operand:SVE_FULL_F 2 "register_operand" "<sve_lane_con>") + [(set (match_operand:SVE_FULL_F_BF 0 "register_operand" "=w") + (mult:SVE_FULL_F_BF + (unspec:SVE_FULL_F_BF + [(match_operand:SVE_FULL_F_BF 2 "register_operand" "<sve_lane_con>") (match_operand:SI 3 "const_int_operand")] UNSPEC_SVE_LANE_SELECT) - (match_operand:SVE_FULL_F 1 "register_operand" "w")))] + (match_operand:SVE_FULL_F_BF 1 "register_operand" "w")))] "TARGET_SVE" - "fmul\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>[%3]" + "<b>fmul\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>[%3]" ) ;; ------------------------------------------------------------------------- @@ -7345,15 +7364,15 @@ (define_insn "@aarch64_sve_add_<optab><vsi2qi>" ;; Unpredicated floating-point ternary operations. 
 (define_expand "<optab><mode>4"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
 	  [(match_dup 4)
 	   (const_int SVE_RELAXED_GP)
-	   (match_operand:SVE_FULL_F 1 "register_operand")
-	   (match_operand:SVE_FULL_F 2 "register_operand")
-	   (match_operand:SVE_FULL_F 3 "register_operand")]
+	   (match_operand:SVE_FULL_F_BF 1 "register_operand")
+	   (match_operand:SVE_FULL_F_BF 2 "register_operand")
+	   (match_operand:SVE_FULL_F_BF 3 "register_operand")]
 	  SVE_COND_FP_TERNARY))]
-  "TARGET_SVE"
+  "TARGET_SVE && (<supports_bf16> || !<is_bf16>)"
   {
     operands[4] = aarch64_ptrue_reg (<VPRED>mode);
   }
@@ -7361,37 +7380,39 @@ (define_expand "<optab><mode>4"
 
 ;; Predicated floating-point ternary operations.
 (define_insn "@aarch64_pred_<optab><mode>"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
 	  [(match_operand:<VPRED> 1 "register_operand")
 	   (match_operand:SI 5 "aarch64_sve_gp_strictness")
-	   (match_operand:SVE_FULL_F 2 "register_operand")
-	   (match_operand:SVE_FULL_F 3 "register_operand")
-	   (match_operand:SVE_FULL_F 4 "register_operand")]
+	   (match_operand:SVE_FULL_F_BF 2 "register_operand")
+	   (match_operand:SVE_FULL_F_BF 3 "register_operand")
+	   (match_operand:SVE_FULL_F_BF 4 "register_operand")]
 	  SVE_COND_FP_TERNARY))]
-  "TARGET_SVE"
-  {@ [ cons: =0 , 1 , 2 , 3 , 4 ; attrs: movprfx ]
-     [ w , Upl , %w , w , 0 ; * ] <sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ w , Upl , 0 , w , w ; * ] <sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
-     [ ?&w , Upl , w , w , w ; yes ] movprfx\t%0, %4\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+  "TARGET_SVE && (<supports_bf16> || !<is_bf16>)"
+  {@ [ cons: =0 , 1 , 2 , 3 , 4 ; attrs: movprfx , is_rev ]
+     [ w , Upl , %w , w , 0 ; * , * ] <b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ w , Upl , 0 , w , w ; * , true ] <b><sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+     [ ?&w , Upl , w , w , w ; yes , * ] movprfx\t%0, %4\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
   }
+  [(set_attr "is_bf16" "<is_bf16>")
+   (set_attr "supports_bf16_rev" "false")]
 )
 
 ;; Predicated floating-point ternary operations with merging.
 (define_expand "@cond_<optab><mode>"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
 	  [(match_operand:<VPRED> 1 "register_operand")
-	   (unspec:SVE_FULL_F
+	   (unspec:SVE_FULL_F_BF
 	     [(match_dup 1)
 	      (const_int SVE_STRICT_GP)
-	      (match_operand:SVE_FULL_F 2 "register_operand")
-	      (match_operand:SVE_FULL_F 3 "register_operand")
-	      (match_operand:SVE_FULL_F 4 "register_operand")]
+	      (match_operand:SVE_FULL_F_BF 2 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 3 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 4 "register_operand")]
 	     SVE_COND_FP_TERNARY)
-	   (match_operand:SVE_FULL_F 5 "aarch64_simd_reg_or_zero")]
+	   (match_operand:SVE_FULL_F_BF 5 "aarch64_simd_reg_or_zero")]
 	  UNSPEC_SEL))]
-  "TARGET_SVE"
+  "TARGET_SVE && (<supports_bf16> || !<is_bf16>)"
   {
     /* Swap the multiplication operands if the fallback value is the
       second of the two.  */
@@ -7448,22 +7469,22 @@ (define_insn "*cond_<optab><mode>_2_strict"
 
 ;; Predicated floating-point ternary operations, merging with the
 ;; third input.
 (define_insn_and_rewrite "*cond_<optab><mode>_4_relaxed"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
 	  [(match_operand:<VPRED> 1 "register_operand")
-	   (unspec:SVE_FULL_F
+	   (unspec:SVE_FULL_F_BF
 	     [(match_operand 5)
 	      (const_int SVE_RELAXED_GP)
-	      (match_operand:SVE_FULL_F 2 "register_operand")
-	      (match_operand:SVE_FULL_F 3 "register_operand")
-	      (match_operand:SVE_FULL_F 4 "register_operand")]
+	      (match_operand:SVE_FULL_F_BF 2 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 3 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 4 "register_operand")]
 	     SVE_COND_FP_TERNARY)
 	   (match_dup 4)]
 	  UNSPEC_SEL))]
-  "TARGET_SVE"
+  "TARGET_SVE && (<supports_bf16> || !<is_bf16>)"
   {@ [ cons: =0 , 1 , 2 , 3 , 4 ; attrs: movprfx ]
-     [ w , Upl , w , w , 0 ; * ] <sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ ?&w , Upl , w , w , w ; yes ] movprfx\t%0, %4\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ w , Upl , w , w , 0 ; * ] <b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ ?&w , Upl , w , w , w ; yes ] movprfx\t%0, %4\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
   }
   "&& !rtx_equal_p (operands[1], operands[5])"
   {
@@ -7472,51 +7493,52 @@ (define_insn_and_rewrite "*cond_<optab><mode>_4_relaxed"
 )
 
 (define_insn "*cond_<optab><mode>_4_strict"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
 	  [(match_operand:<VPRED> 1 "register_operand")
-	   (unspec:SVE_FULL_F
+	   (unspec:SVE_FULL_F_BF
 	     [(match_dup 1)
 	      (const_int SVE_STRICT_GP)
-	      (match_operand:SVE_FULL_F 2 "register_operand")
-	      (match_operand:SVE_FULL_F 3 "register_operand")
-	      (match_operand:SVE_FULL_F 4 "register_operand")]
+	      (match_operand:SVE_FULL_F_BF 2 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 3 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 4 "register_operand")]
 	     SVE_COND_FP_TERNARY)
 	   (match_dup 4)]
 	  UNSPEC_SEL))]
-  "TARGET_SVE"
+  "TARGET_SVE && (<supports_bf16> || !<is_bf16>)"
   {@ [ cons: =0 , 1 , 2 , 3 , 4 ; attrs: movprfx ]
-     [ w , Upl , w , w , 0 ; * ] <sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ ?&w , Upl , w , w , w ; yes ] movprfx\t%0, %4\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ w , Upl , w , w , 0 ; * ] <b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ ?&w , Upl , w , w , w ; yes ] movprfx\t%0, %4\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
   }
 )
 
 ;; Predicated floating-point ternary operations, merging with an
 ;; independent value.
 (define_insn_and_rewrite "*cond_<optab><mode>_any_relaxed"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
 	  [(match_operand:<VPRED> 1 "register_operand")
-	   (unspec:SVE_FULL_F
+	   (unspec:SVE_FULL_F_BF
 	     [(match_operand 6)
 	      (const_int SVE_RELAXED_GP)
-	      (match_operand:SVE_FULL_F 2 "register_operand")
-	      (match_operand:SVE_FULL_F 3 "register_operand")
-	      (match_operand:SVE_FULL_F 4 "register_operand")]
+	      (match_operand:SVE_FULL_F_BF 2 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 3 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 4 "register_operand")]
 	     SVE_COND_FP_TERNARY)
-	   (match_operand:SVE_FULL_F 5 "aarch64_simd_reg_or_zero")]
+	   (match_operand:SVE_FULL_F_BF 5 "aarch64_simd_reg_or_zero")]
 	  UNSPEC_SEL))]
   "TARGET_SVE
-   && !rtx_equal_p (operands[2], operands[5])
-   && !rtx_equal_p (operands[3], operands[5])
+   && (<supports_bf16> || !<is_bf16>)
+   && (<is_bf16> || !rtx_equal_p (operands[2], operands[5]))
+   && (<is_bf16> || !rtx_equal_p (operands[3], operands[5]))
    && !rtx_equal_p (operands[4], operands[5])"
-  {@ [ cons: =0 , 1 , 2 , 3 , 4 , 5 ]
-     [ &w , Upl , w , w , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ &w , Upl , w , w , 0 , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ &w , Upl , 0 , w , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
-     [ &w , Upl , w , 0 , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fmad_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %4.<Vetype>
-     [ &w , Upl , w , w , w , 0 ] movprfx\t%0.<Vetype>, %1/m, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ ?&w , Upl , w , w , w , w ] #
+  {@ [ cons: =0 , 1 , 2 , 3 , 4 , 5 ; attrs: is_rev ]
+     [ &w , Upl , w , w , w , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %4.<Vetype>\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ &w , Upl , w , w , 0 , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ &w , Upl , 0 , w , w , Dz ; true ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+     [ &w , Upl , w , 0 , w , Dz ; true ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fmad_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %4.<Vetype>
+     [ &w , Upl , w , w , w , 0 ; * ] movprfx\t%0.<Vetype>, %1/m, %4.<Vetype>\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ ?&w , Upl , w , w , w , w ; * ] #
   }
   "&& 1"
   {
@@ -7533,33 +7555,36 @@ (define_insn_and_rewrite "*cond_<optab><mode>_any_relaxed"
     else
       FAIL;
   }
-  [(set_attr "movprfx" "yes")]
+  [(set_attr "movprfx" "yes")
+   (set_attr "is_bf16" "<is_bf16>")
+   (set_attr "supports_bf16_rev" "false")]
 )
 
 (define_insn_and_rewrite "*cond_<optab><mode>_any_strict"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
 	  [(match_operand:<VPRED> 1 "register_operand")
-	   (unspec:SVE_FULL_F
+	   (unspec:SVE_FULL_F_BF
 	     [(match_dup 1)
 	      (const_int SVE_STRICT_GP)
-	      (match_operand:SVE_FULL_F 2 "register_operand")
-	      (match_operand:SVE_FULL_F 3 "register_operand")
-	      (match_operand:SVE_FULL_F 4 "register_operand")]
+	      (match_operand:SVE_FULL_F_BF 2 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 3 "register_operand")
+	      (match_operand:SVE_FULL_F_BF 4 "register_operand")]
 	     SVE_COND_FP_TERNARY)
-	   (match_operand:SVE_FULL_F 5 "aarch64_simd_reg_or_zero")]
+	   (match_operand:SVE_FULL_F_BF 5 "aarch64_simd_reg_or_zero")]
 	  UNSPEC_SEL))]
   "TARGET_SVE
-   && !rtx_equal_p (operands[2], operands[5])
-   && !rtx_equal_p (operands[3], operands[5])
+   && (<supports_bf16> || !<is_bf16>)
+   && (<is_bf16> || !rtx_equal_p (operands[2], operands[5]))
+   && (<is_bf16> || !rtx_equal_p (operands[3], operands[5]))
    && !rtx_equal_p (operands[4], operands[5])"
-  {@ [ cons: =0 , 1 , 2 , 3 , 4 , 5 ]
-     [ &w , Upl , w , w , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ &w , Upl , w , w , 0 , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ &w , Upl , 0 , w , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
-     [ &w , Upl , w , 0 , w , Dz ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_fmad_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %4.<Vetype>
-     [ &w , Upl , w , w , w , 0 ] movprfx\t%0.<Vetype>, %1/m, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
-     [ ?&w , Upl , w , w , w , w ] #
+  {@ [ cons: =0 , 1 , 2 , 3 , 4 , 5 ; attrs: is_rev ]
+     [ &w , Upl , w , w , w , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %4.<Vetype>\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ &w , Upl , w , w , 0 , Dz ; * ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ &w , Upl , 0 , w , w , Dz ; true ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+     [ &w , Upl , w , 0 , w , Dz ; true ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<b><sve_fmad_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %4.<Vetype>
+     [ &w , Upl , w , w , w , 0 ; * ] movprfx\t%0.<Vetype>, %1/m, %4.<Vetype>\;<b><sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+     [ ?&w , Upl , w , w , w , w ; * ] #
   }
   "&& reload_completed
    && register_operand (operands[5], <MODE>mode)
@@ -7569,25 +7594,27 @@ (define_insn_and_rewrite "*cond_<optab><mode>_any_strict"
 				 operands[5], operands[1]));
     operands[5] = operands[4] = operands[0];
   }
-  [(set_attr "movprfx" "yes")]
+  [(set_attr "movprfx" "yes")
+   (set_attr "is_bf16" "<is_bf16>")
+   (set_attr "supports_bf16_rev" "false")]
 )
 
 ;; Unpredicated FMLA and FMLS by selected lanes.  It doesn't seem worth using
 ;; (fma ...) since target-independent code won't understand the indexing.
 (define_insn "@aarch64_<optab>_lane_<mode>"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
-	  [(match_operand:SVE_FULL_F 1 "register_operand")
-	   (unspec:SVE_FULL_F
-	     [(match_operand:SVE_FULL_F 2 "register_operand")
+  [(set (match_operand:SVE_FULL_F_BF 0 "register_operand")
+	(unspec:SVE_FULL_F_BF
+	  [(match_operand:SVE_FULL_F_BF 1 "register_operand")
+	   (unspec:SVE_FULL_F_BF
+	     [(match_operand:SVE_FULL_F_BF 2 "register_operand")
 	      (match_operand:SI 3 "const_int_operand")]
 	     UNSPEC_SVE_LANE_SELECT)
-	   (match_operand:SVE_FULL_F 4 "register_operand")]
+	   (match_operand:SVE_FULL_F_BF 4 "register_operand")]
 	  SVE_FP_TERNARY_LANE))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1 , 2 , 4 ; attrs: movprfx ]
-     [ w , w , <sve_lane_con> , 0 ; * ] <sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>[%3]
-     [ ?&w , w , <sve_lane_con> , w ; yes ] movprfx\t%0, %4\;<sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>[%3]
+     [ w , w , <sve_lane_con> , 0 ; * ] <b><sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>[%3]
+     [ ?&w , w , <sve_lane_con> , w ; yes ] movprfx\t%0, %4\;<b><sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>[%3]
   }
 )
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 9383c777d80..a721a6889b1 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -56,6 +56,7 @@
 ;; ---- [INT] General binary arithmetic that maps to unspecs
 ;; ---- [INT] Saturating binary arithmetic
 ;; ---- [INT] Saturating left shifts
+;; ---- [FP] Non-widening bfloat16 arithmetic
 ;; ---- [FP] Clamp to minimum/maximum
 ;;
 ;; == Uniform ternary arithmnetic
@@ -1317,52 +1318,84 @@ (define_insn_and_rewrite "*cond_<sve_int_op><mode>_any"
   [(set_attr "movprfx" "yes")]
 )
 
+;; -------------------------------------------------------------------------
+;; ---- [FP] Non-widening bfloat16 arithmetic
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - BFADD
+;; - BFMAX
+;; - BFMAXNM
+;; - BFMIN
+;; - BFMINNM
+;; - BFMUL
+;; -------------------------------------------------------------------------
+
+;; Predicated B16B16 binary operations.
+(define_insn "@aarch64_pred_<optab><mode>"
+  [(set (match_operand:VNx8BF_ONLY 0 "register_operand")
+	(unspec:VNx8BF_ONLY
+	  [(match_operand:<VPRED> 1 "register_operand")
+	   (match_operand:SI 4 "aarch64_sve_gp_strictness")
+	   (match_operand:VNx8BF_ONLY 2 "register_operand")
+	   (match_operand:VNx8BF_ONLY 3 "register_operand")]
+	  SVE_COND_FP_BINARY_OPTAB))]
+  "TARGET_SSVE_B16B16 && <supports_bf16>"
+  {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx , is_rev ]
+     [ w , Upl , 0 , w ; * , * ] <b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+     [ w , Upl , w , 0 ; * , true ] <b><sve_fp_op_rev>\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>
+     [ ?&w , Upl , w , w ; yes , * ] movprfx\t%0, %2\;<b><sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+  }
+  [(set_attr "is_bf16" "<is_bf16>")
+   (set_attr "supports_bf16_rev" "<supports_bf16_rev>")]
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [FP] Clamp to minimum/maximum
 ;; -------------------------------------------------------------------------
+;; - BFCLAMP (SVE_B16B16)
 ;; - FCLAMP
 ;; -------------------------------------------------------------------------
 
 ;; The minimum is applied after the maximum, which matters if the maximum
 ;; bound is (unexpectedly) less than the minimum bound.
 (define_insn "@aarch64_sve_fclamp<mode>"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
-	  [(unspec:SVE_FULL_F
-	     [(match_operand:SVE_FULL_F 1 "register_operand")
-	      (match_operand:SVE_FULL_F 2 "register_operand")]
+  [(set (match_operand:SVE_CLAMP_F 0 "register_operand")
+	(unspec:SVE_CLAMP_F
+	  [(unspec:SVE_CLAMP_F
+	     [(match_operand:SVE_CLAMP_F 1 "register_operand")
+	      (match_operand:SVE_CLAMP_F 2 "register_operand")]
 	     UNSPEC_FMAXNM)
-	   (match_operand:SVE_FULL_F 3 "register_operand")]
+	   (match_operand:SVE_CLAMP_F 3 "register_operand")]
 	  UNSPEC_FMINNM))]
-  "TARGET_SVE2p1_OR_SME2"
+  ""
   {@ [cons: =0, 1, 2, 3; attrs: movprfx]
-     [ w, %0, w, w; * ] fclamp\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>
-     [ ?&w, w, w, w; yes ] movprfx\t%0, %1\;fclamp\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>
+     [ w, %0, w, w; * ] <b>fclamp\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>
+     [ ?&w, w, w, w; yes ] movprfx\t%0, %1\;<b>fclamp\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>
   }
 )
 
 (define_insn_and_split "*aarch64_sve_fclamp<mode>_x"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-	(unspec:SVE_FULL_F
+  [(set (match_operand:SVE_CLAMP_F 0 "register_operand")
+	(unspec:SVE_CLAMP_F
 	  [(match_operand 4)
 	   (const_int SVE_RELAXED_GP)
-	   (unspec:SVE_FULL_F
+	   (unspec:SVE_CLAMP_F
 	     [(match_operand 5)
 	      (const_int SVE_RELAXED_GP)
-	      (match_operand:SVE_FULL_F 1 "register_operand")
-	      (match_operand:SVE_FULL_F 2 "register_operand")]
+	      (match_operand:SVE_CLAMP_F 1 "register_operand")
+	      (match_operand:SVE_CLAMP_F 2 "register_operand")]
 	     UNSPEC_COND_FMAXNM)
-	   (match_operand:SVE_FULL_F 3 "register_operand")]
+	   (match_operand:SVE_CLAMP_F 3 "register_operand")]
 	  UNSPEC_COND_FMINNM))]
-  "TARGET_SVE2p1_OR_SME2"
+  ""
   {@ [cons: =0, 1, 2, 3; attrs: movprfx]
     [ w, %0, w, w; * ] #
    [ ?&w, w, w, w; yes ] #
  }
  "&& true"
  [(set (match_dup 0)
-	(unspec:SVE_FULL_F
-	  [(unspec:SVE_FULL_F
+	(unspec:SVE_CLAMP_F
+	  [(unspec:SVE_CLAMP_F
 	     [(match_dup 1)
 	      (match_dup 2)]
 	     UNSPEC_FMAXNM)
@@ -1382,7 +1415,7 @@ (define_insn "@aarch64_sve_fclamp_single<mode>"
 	      (match_operand:<VSINGLE> 3 "register_operand" "w"))]
 	  UNSPEC_FMINNM))]
   "TARGET_STREAMING_SME2"
-  "fclamp\t%0, %2.<Vetype>, %3.<Vetype>"
+  "<b>fclamp\t%0, %2.<Vetype>, %3.<Vetype>"
 )
 
 ;; =========================================================================
@@ -2289,7 +2322,7 @@ (define_insn "*aarch64_sve_<maxmin_uns_op><mode>"
 	     (match_operand:SVE_Fx24 2 "aligned_register_operand" "Uw<vector_count>")]
 	  SVE_FP_BINARY_MULTI))]
   "TARGET_STREAMING_SME2"
-  "<maxmin_uns_op>\t%0, %0, %2"
+  "<b><maxmin_uns_op>\t%0, %0, %2"
 )
 
 (define_insn "@aarch64_sve_single_<maxmin_uns_op><mode>"
@@ -2300,7 +2333,7 @@ (define_insn "@aarch64_sve_single_<maxmin_uns_op><mode>"
 	     (match_operand:<VSINGLE> 2 "register_operand" "x"))]
 	  SVE_FP_BINARY_MULTI))]
   "TARGET_STREAMING_SME2"
-  "<maxmin_uns_op>\t%0, %0, %2.<Vetype>"
+  "<b><maxmin_uns_op>\t%0, %0, %2.<Vetype>"
 )
 
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 00bcf18ae97..9cc9dc06c6a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22924,7 +22924,8 @@ aarch64_sve_float_arith_immediate_p (rtx x, bool negate_p)
   rtx elt;
   REAL_VALUE_TYPE r;
 
-  if (!const_vec_duplicate_p (x, &elt)
+  if (GET_MODE_INNER (GET_MODE (x)) == BFmode
+      || !const_vec_duplicate_p (x, &elt)
       || !CONST_DOUBLE_P (elt))
     return false;
@@ -22948,7 +22949,8 @@ aarch64_sve_float_mul_immediate_p (rtx x)
 {
   rtx elt;
 
-  return (const_vec_duplicate_p (x, &elt)
+  return (GET_MODE_INNER (GET_MODE (x)) != BFmode
+	  && const_vec_duplicate_p (x, &elt)
 	  && CONST_DOUBLE_P (elt)
 	  && (real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconsthalf)
 	      || real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconst2)));
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f07b2c49f0d..c22a6bb69d8 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -358,6 +358,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 /* Same with streaming mode enabled.  */
 #define TARGET_STREAMING_SME2 (TARGET_STREAMING && TARGET_SME2)
 
+#define TARGET_SME_B16B16 AARCH64_HAVE_ISA (SME_B16B16)
+
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3 AARCH64_HAVE_ISA (V8_3A)
@@ -486,6 +488,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 
 /* Combinatorial tests.  */
 
+#define TARGET_SVE2_OR_SME2 \
+  ((TARGET_SVE2 || TARGET_STREAMING) \
+   && (TARGET_SME2 || TARGET_NON_STREAMING))
+
 /* There's no need to check TARGET_SME for streaming or streaming-compatible
    functions, since streaming mode itself implies SME.  */
 #define TARGET_SVE2p1_OR_SME (TARGET_SVE2p1 || TARGET_STREAMING)
@@ -494,6 +500,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
   ((TARGET_SVE2p1 || TARGET_STREAMING) \
    && (TARGET_SME2 || TARGET_NON_STREAMING))
 
+#define TARGET_SSVE_B16B16 \
+  (AARCH64_HAVE_ISA (SVE_B16B16) && TARGET_SVE2_OR_SME2)
+
 /* Standard register usage.  */
 
 /* 31 64-bit general purpose registers R0-R30:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8d10197c9e8..c59ca45c733 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -509,11 +509,32 @@ (define_attr "arch_enabled" "no,yes"
 	(const_string "yes")
 	(const_string "no")))
 
+;; True if this is a bfloat16 operation.  Only used for certain instructions.
+(define_attr "is_bf16" "false,true" (const_string "false"))
+
+;; True if this alternative uses an SVE instruction in which the operands
+;; are reversed.  This can happen for naturally commutative operations
+;; such as FADD, or when using things like FSUBR in preference to FSUB,
+;; or similarly when using things like FMAD in preference to FMLA.
+(define_attr "is_rev" "false,true" (const_string "false"))
+
+;; True if this operation supports is_rev-style instructions for bfloat16.
+(define_attr "supports_bf16_rev" "false,true" (const_string "false"))
+
+;; Selectively enable alternatives based on the mode of the operation.
+(define_attr "mode_enabled" "false,true"
+  (cond [(and (eq_attr "is_bf16" "true")
+	      (eq_attr "is_rev" "true")
+	      (eq_attr "supports_bf16_rev" "false"))
+	 (const_string "false")]
+	(const_string "true")))
+
 ;; Attribute that controls whether an alternative is enabled or not.
-;; Currently it is only used to disable alternatives which touch fp or simd
-;; registers when -mgeneral-regs-only is specified or to require a special
-;; architecture support.
-(define_attr "enabled" "no,yes" (attr "arch_enabled"))
+(define_attr "enabled" "no,yes"
+  (if_then_else (and (eq_attr "arch_enabled" "yes")
+		     (eq_attr "mode_enabled" "true"))
+		(const_string "yes")
+		(const_string "no")))
 
 ;; Attribute that specifies whether we are dealing with a branch to a
 ;; label that is far away, i.e. further away than the maximum/minimum
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0d0ce8cd387..3852f0d42fb 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -455,6 +455,14 @@ (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
 ;; Fully-packed SVE floating-point vector modes and their scalar equivalents.
 (define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF])
 
+(define_mode_iterator SVE_FULL_F_BF [(VNx8BF "TARGET_SSVE_B16B16") SVE_FULL_F])
+
+;; Modes for which (B)FCLAMP is supported.
+(define_mode_iterator SVE_CLAMP_F [(VNx8BF "TARGET_SSVE_B16B16")
+				   (VNx8HF "TARGET_SVE2p1_OR_SME2")
+				   (VNx4SF "TARGET_SVE2p1_OR_SME2")
+				   (VNx2DF "TARGET_SVE2p1_OR_SME2")])
+
 ;; Fully-packed SVE integer vector modes that have 8-bit or 16-bit elements.
 (define_mode_iterator SVE_FULL_BHI [VNx16QI VNx8HI])
@@ -643,7 +651,9 @@ (define_mode_iterator SVE_BHSx24 [VNx32QI VNx16HI VNx8SI
 (define_mode_iterator SVE_Ix24 [VNx32QI VNx16HI VNx8SI VNx4DI
 				VNx64QI VNx32HI VNx16SI VNx8DI])
 
-(define_mode_iterator SVE_Fx24 [VNx16HF VNx8SF VNx4DF
+(define_mode_iterator SVE_Fx24 [(VNx16BF "TARGET_SSVE_B16B16")
+				(VNx32BF "TARGET_SSVE_B16B16")
+				VNx16HF VNx8SF VNx4DF
 				VNx32HF VNx16SF VNx8DF])
 
 (define_mode_iterator SVE_SFx24 [VNx8SF VNx16SF])
@@ -2481,7 +2491,8 @@ (define_mode_attr narrower_mask [(VNx8HI "0x81") (VNx4HI "0x41")
 ;; The constraint to use for an SVE [SU]DOT, FMUL, FMLA or FMLS lane index.
 (define_mode_attr sve_lane_con [(VNx8HI "y") (VNx4SI "y") (VNx2DI "x")
 				(V2DI "x")
-				(VNx8HF "y") (VNx4SF "y") (VNx2DF "x")])
+				(VNx8BF "y") (VNx8HF "y")
+				(VNx4SF "y") (VNx2DF "x")])
 
 ;; The constraint to use for an SVE FCMLA lane index.
 (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")])
@@ -2491,8 +2502,13 @@ (define_mode_attr vec_or_offset [(V8QI "vec") (V16QI "vec") (V4HI "vec")
 				 (V2DI "vec") (DI "offset")])
 
 (define_mode_attr b [(VNx8BF "b") (VNx8HF "") (VNx4SF "") (VNx2DF "")
-		     (VNx16BF "b") (VNx16HF "")
-		     (VNx32BF "b") (VNx32HF "")])
+		     (VNx16BF "b") (VNx16HF "") (VNx8SF "") (VNx4DF "")
+		     (VNx32BF "b") (VNx32HF "") (VNx16SF "") (VNx8DF "")])
+
+(define_mode_attr is_bf16 [(VNx8BF "true")
+			   (VNx8HF "false")
+			   (VNx4SF "false")
+			   (VNx2DF "false")])
 
 (define_mode_attr aligned_operand [(VNx16QI "register_operand")
 				   (VNx8HI "register_operand")
@@ -4552,6 +4568,45 @@ (define_int_attr sve_fmad_op [(UNSPEC_COND_FMLA "fmad")
 			      (UNSPEC_COND_FNMLA "fnmad")
 			      (UNSPEC_COND_FNMLS "fnmsb")])
 
+(define_int_attr supports_bf16 [(UNSPEC_COND_FADD "true")
+				(UNSPEC_COND_FAMAX "false")
+				(UNSPEC_COND_FAMIN "false")
+				(UNSPEC_COND_FDIV "false")
+				(UNSPEC_COND_FMAX "true")
+				(UNSPEC_COND_FMAXNM "true")
+				(UNSPEC_COND_FMIN "true")
+				(UNSPEC_COND_FMINNM "true")
+				(UNSPEC_COND_FMLA "true")
+				(UNSPEC_COND_FMLS "true")
+				(UNSPEC_COND_FMUL "true")
+				(UNSPEC_COND_FMULX "false")
+				(UNSPEC_COND_FNMLA "false")
+				(UNSPEC_COND_FNMLS "false")
+				(UNSPEC_COND_FSUB "true")
+				(UNSPEC_COND_SMAX "true")
+				(UNSPEC_COND_SMIN "true")])
+
+;; Differs from supports_bf16 only in UNSPEC_COND_FSUB.
+(define_int_attr supports_bf16_rev [(UNSPEC_COND_FADD "true")
+				    (UNSPEC_COND_FAMAX "false")
+				    (UNSPEC_COND_FAMIN "false")
+				    (UNSPEC_COND_FDIV "false")
+				    (UNSPEC_COND_FMAX "true")
+				    (UNSPEC_COND_FMAXNM "true")
+				    (UNSPEC_COND_FMIN "true")
+				    (UNSPEC_COND_FMINNM "true")
+				    (UNSPEC_COND_FMLA "true")
+				    (UNSPEC_COND_FMLS "true")
+				    (UNSPEC_COND_FMUL "true")
+				    (UNSPEC_COND_FMULX "false")
+				    (UNSPEC_COND_FNMLA "false")
+				    (UNSPEC_COND_FNMLS "false")
+				    (UNSPEC_COND_FSUB "false")
+				    (UNSPEC_COND_SMAX "true")
+				    (UNSPEC_COND_SMIN "true")])
+
 ;; The register constraint to use for the final operand in a binary BRK.
 (define_int_attr brk_reg_con [(UNSPEC_BRKN "0")
 			      (UNSPEC_BRKPA "Upa") (UNSPEC_BRKPB "Upa")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 6ad9a4bd8b9..b4b0349e80c 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -923,6 +923,7 @@ (define_predicate "aarch64_sve_float_mul_immediate"
 
 (define_predicate "aarch64_sve_float_maxmin_immediate"
   (and (match_code "const_vector")
+       (match_test "GET_MODE_INNER (GET_MODE (op)) != BFmode")
       (ior (match_test "op == CONST0_RTX (GET_MODE (op))")
 	    (match_test "op == CONST1_RTX (GET_MODE (op))"))))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4a494f6a668..1e50edb53ac 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21809,6 +21809,9 @@ Enable the RCpc3 (Release Consistency) extension.
 Enable the fp8 (8-bit floating point) extension.
 @item faminmax
 Enable the Floating Point Absolute Maximum/Minimum extension.
+@item sve-b16b16
+Enable the SVE non-widening brain floating-point (@code{bf16}) extension.
+This only has an effect when @code{sve2} or @code{sme2} are also enabled.
 @end table
diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
index 23ebe5e4f50..9dd346faf96 100644
--- a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
@@ -83,3 +83,44 @@
 #ifndef __ARM_FEATURE_SME_F64F64
 #error Foo
 #endif
+
+#pragma GCC target "+nothing+sve-b16b16"
+#ifdef __ARM_FEATURE_SVE_B16B16
+#error Foo
+#endif
+#ifdef __ARM_FEATURE_SVE
+#error Foo
+#endif
+#ifdef __ARM_FEATURE_SME
+#error Foo
+#endif
+
+#pragma GCC target "+nothing+sve-b16b16+sve"
+#ifdef __ARM_FEATURE_SVE_B16B16
+#error Foo
+#endif
+#ifndef __ARM_FEATURE_SVE
+#error Foo
+#endif
+#ifdef __ARM_FEATURE_SME
+#error Foo
+#endif
+
+#pragma GCC target "+nothing+sve-b16b16+sve2"
+#ifndef __ARM_FEATURE_SVE_B16B16
+#error Foo
+#endif
+#ifndef __ARM_FEATURE_SVE
+#error Foo
+#endif
+#ifdef __ARM_FEATURE_SME
+#error Foo
+#endif
+
+#pragma GCC target "+nothing+sve-b16b16+sme2"
+#ifndef __ARM_FEATURE_SVE_B16B16
+#error Foo
+#endif
+#ifndef __ARM_FEATURE_SME
+#error Foo
+#endif
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index fd58682cae3..44671adea12 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12121,7 +12121,7 @@ proc check_effective_target_aarch64_tiny { } {
 foreach { aarch64_ext } { "fp" "simd" "crypto" "crc" "lse" "dotprod" "sve"
			  "i8mm" "f32mm" "f64mm" "bf16" "sb" "sve2" "ls64"
-			  "sme" "sme-i16i64" "sme2" } {
+			  "sme" "sme-i16i64" "sme2" "sve-b16b16" } {
     eval [string map [list FUNC $aarch64_ext] {
	proc check_effective_target_aarch64_asm_FUNC_ok { } {
	    if { [istarget aarch64*-*-*] } {
-- 
2.25.1
tests.diff.xz