I went through the patch series and it looks good to me; I have no additional comments. I'm not a maintainer, though, so please wait for others to review it.

Thanks,
Saurabh

On 12/18/2025 5:14 PM, Claudio Bantaloukas wrote:

This patch series completes support for SME2 and SME2p1 intrinsics relative to
modal 8-bit floating-point types.

- The first patch in the series introduces tests for using luti intrinsics with
   mf8, which has been working since the intrinsics were introduced; the tests
   are added now that this use is documented in ACLE.
- The second patch extends the definitions of existing non-interpreting sve2/sme
   intrinsics to support mfloat8 types.
- The third and fourth patches add widening and narrowing sme2 fp8 conversions
   respectively (svcvt).
- The fifth patch adds multi-vector floating-point adjust exponent intrinsics
   (svscale).
- The sixth patch adds support for the sme-f8f16 and sme-f8f32 arch features
   and related defines.
- Patch 7 adds multi-vector 8-bit floating-point multiply-add long intrinsics.
- Patch 8 adds 8-bit floating-point sum of outer products and accumulate
   intrinsics.
- Patch 9 adds 8-bit floating-point dot product intrinsics.
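
As a rough illustration of the "related defines" from the sixth patch, user
code could probe for the two features at compile time along the lines below.
This is only a sketch: the macro names __ARM_FEATURE_SME_F8F16 and
__ARM_FEATURE_SME_F8F32 are assumed from ACLE naming conventions and are not
quoted from the patches, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Assumption: these are the feature macros the compiler predefines when
   sme-f8f16 / sme-f8f32 are enabled.  The names follow ACLE conventions
   and are not taken verbatim from the patch series.  */
#ifdef __ARM_FEATURE_SME_F8F16
#define HAVE_SME_F8F16 1
#else
#define HAVE_SME_F8F16 0
#endif

#ifdef __ARM_FEATURE_SME_F8F32
#define HAVE_SME_F8F32 1
#else
#define HAVE_SME_F8F32 0
#endif

/* Hypothetical helpers: return 1 if the feature was enabled at compile
   time, 0 otherwise.  */
int have_sme_f8f16 (void) { return HAVE_SME_F8F16; }
int have_sme_f8f32 (void) { return HAVE_SME_F8F32; }

/* Print which FP8 SME features this translation unit was built with.  */
void report_fp8_sme_support (void)
{
  printf ("sme-f8f16: %s\n", have_sme_f8f16 () ? "yes" : "no");
  printf ("sme-f8f32: %s\n", have_sme_f8f32 () ? "yes" : "no");
}
```

On a toolchain or target without these extensions both helpers simply return
0, so the same source builds everywhere and FP8 code paths can be guarded on
the result.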

Compared to version 1 of this patch series:
- updated commit messages per requests.
- fixed gating of intrinsics in patch four (narrowing sme2 conversions to fp8).
- introduced aarch64_output_asm_with_extra_operand function and updated insns in
   aarch64-sme.md to no longer use out of bounds operands.

Compared to version 2 of this patch series:
- replaced aarch64_output_asm_with_extra_operand with
   aarch64_output_asm_with_offset which does not require allocating space for
   operands on the stack in patch 7.

Compared to version 3 of this patch series:
- Entirely removed functions using snprintf and returned to existing use of
   operands array as the array is long enough for this use.
- Addressed Richard Ball's feedback (renamed test files, improved readability in
   exp file, formatting).

Compared to version 4 of this patch series:
- Reposting patches inline rather than as attachments.
- The previous posting accidentally included only the last 8 of the 9 patches.
   Thanks to Artemiy, who spotted this.

Regression tested on aarch64-unknown-linux-gnu.

OK to merge?

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (8):
   aarch64: add tests for sme mfloat8 luti functions
   aarch64: extend sme intrinsics to mfp8
   aarch64: add widening sme2 fp8 conversions
   aarch64: add narrowing sme2 conversions to fp8
   aarch64: add multi-vector floating-point adjust exponent intrinsics
   aarch64: add basic support for sme-f8f16 and sme-f8f32
   aarch64: add Multi-vector 8-bit floating-point multiply-add long
   aarch64: add 8-bit floating-point sum of outer products and accumulate

Karl Meakin (1):
   aarch64: add 8-bit floating point dot product

  gcc/config/aarch64/aarch64-c.cc               |   4 +
  .../aarch64/aarch64-option-extensions.def     |   4 +
  gcc/config/aarch64/aarch64-sme.md             | 571 ++++++++++++++++++
  .../aarch64/aarch64-sve-builtins-base.cc      |  47 +-
  .../aarch64/aarch64-sve-builtins-functions.h  |  23 +-
  .../aarch64/aarch64-sve-builtins-shapes.cc    |  43 +-
  .../aarch64/aarch64-sve-builtins-shapes.h     |   1 +
  .../aarch64/aarch64-sve-builtins-sme.cc       |  20 +-
  .../aarch64/aarch64-sve-builtins-sme.def      |  55 +-
  gcc/config/aarch64/aarch64-sve-builtins-sme.h |   2 +
  .../aarch64/aarch64-sve-builtins-sve2.cc      |   2 +
  .../aarch64/aarch64-sve-builtins-sve2.def     |  12 +
  .../aarch64/aarch64-sve-builtins-sve2.h       |   2 +
  gcc/config/aarch64/aarch64-sve-builtins.cc    |  34 +-
  gcc/config/aarch64/aarch64-sve2.md            |  52 +-
  gcc/config/aarch64/aarch64.h                  |  10 +
  gcc/config/aarch64/iterators.md               |  73 ++-
  gcc/doc/invoke.texi                           |   6 +
  .../aarch64/sme2/aarch64-sme2-acle-asm.exp    |   3 +-
  .../gcc.target/aarch64/pragma_cpp_predefs_4.c |  34 ++
  .../aarch64/sme/acle-asm/read_hor_za128.c     |  31 +
  .../aarch64/sme/acle-asm/read_hor_za8.c       |  31 +
  .../aarch64/sme/acle-asm/read_ver_za128.c     |  31 +
  .../aarch64/sme/acle-asm/read_ver_za8.c       |  31 +
  .../aarch64/sme/acle-asm/revd_mf8.c           |  76 +++
  .../aarch64/sme/acle-asm/test_sme_acle.h      |   2 +-
  .../aarch64/sme/acle-asm/write_hor_za128.c    |  10 +
  .../aarch64/sme/acle-asm/write_hor_za8.c      |  10 +
  .../aarch64/sme/acle-asm/write_ver_za128.c    |  10 +
  .../aarch64/sme/acle-asm/write_ver_za8.c      |  10 +
  .../aarch64/sme2/aarch64-sme2-acle-asm.exp    |   3 +-
  .../aarch64/sme2/acle-asm/cvt_mf8_bf16_x2.c   |  56 ++
  .../aarch64/sme2/acle-asm/cvt_mf8_f16_x2.c    |  56 ++
  .../aarch64/sme2/acle-asm/cvt_mf8_f32_x4.c    |  72 +++
  .../aarch64/sme2/acle-asm/cvt_mf8_x2.c        |  47 ++
  .../aarch64/sme2/acle-asm/cvtl_mf8_x2.c       |  47 ++
  .../aarch64/sme2/acle-asm/cvtn_mf8_f32_x4.c   |  72 +++
  .../sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c   | 119 ++++
  .../sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c   | 125 ++++
  .../sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c   | 119 ++++
  .../sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c   | 125 ++++
  .../sme2/acle-asm/dot_single_za16_mf8_vg1x2.c | 126 ++++
  .../sme2/acle-asm/dot_single_za16_mf8_vg1x4.c | 126 ++++
  .../sme2/acle-asm/dot_single_za32_mf8_vg1x2.c | 126 ++++
  .../sme2/acle-asm/dot_single_za32_mf8_vg1x4.c | 126 ++++
  .../sme2/acle-asm/dot_za16_mf8_vg1x2.c        | 150 +++++
  .../sme2/acle-asm/dot_za16_mf8_vg1x4.c        | 166 +++++
  .../sme2/acle-asm/dot_za32_mf8_vg1x2.c        | 150 +++++
  .../sme2/acle-asm/dot_za32_mf8_vg1x4.c        | 166 +++++
  .../aarch64/sme2/acle-asm/ld1_mf8_x2.c        | 262 ++++++++
  .../aarch64/sme2/acle-asm/ld1_mf8_x4.c        | 354 +++++++++++
  .../aarch64/sme2/acle-asm/ldnt1_mf8_x2.c      | 262 ++++++++
  .../aarch64/sme2/acle-asm/ldnt1_mf8_x4.c      | 354 +++++++++++
  .../aarch64/sme2/acle-asm/luti2_mf8.c         |  48 ++
  .../aarch64/sme2/acle-asm/luti2_mf8_x2.c      |  50 ++
  .../aarch64/sme2/acle-asm/luti2_mf8_x4.c      |  56 ++
  .../aarch64/sme2/acle-asm/luti4_mf8.c         |  48 ++
  .../aarch64/sme2/acle-asm/luti4_mf8_x2.c      |  50 ++
  .../sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c   | 167 +++++
  .../sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c   | 136 +++++
  .../sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c   | 142 +++++
  .../sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c   | 169 ++++++
  .../sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c   | 137 +++++
  .../sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c   | 143 +++++
  .../sme2/acle-asm/mla_za16_mf8_vg2x1.c        | 167 +++++
  .../sme2/acle-asm/mla_za16_mf8_vg2x2.c        | 285 +++++++++
  .../sme2/acle-asm/mla_za16_mf8_vg2x4.c        | 287 +++++++++
  .../sme2/acle-asm/mla_za32_mf8_vg4x1.c        | 167 +++++
  .../sme2/acle-asm/mla_za32_mf8_vg4x2.c        | 277 +++++++++
  .../sme2/acle-asm/mla_za32_mf8_vg4x4.c        | 289 +++++++++
  .../aarch64/sme2/acle-asm/mopa_za16_mf8.c     |  36 ++
  .../aarch64/sme2/acle-asm/mopa_za32_mf8.c     |  36 ++
  .../aarch64/sme2/acle-asm/read_hor_za8_vg2.c  |  78 +++
  .../aarch64/sme2/acle-asm/read_hor_za8_vg4.c  |  91 +++
  .../aarch64/sme2/acle-asm/read_ver_za8_vg2.c  |  78 +++
  .../aarch64/sme2/acle-asm/read_ver_za8_vg4.c  |  91 +++
  .../aarch64/sme2/acle-asm/read_za8_vg1x2.c    |  48 ++
  .../aarch64/sme2/acle-asm/read_za8_vg1x4.c    |  54 ++
  .../aarch64/sme2/acle-asm/readz_hor_za128.c   |  10 +
  .../aarch64/sme2/acle-asm/readz_hor_za8.c     |  10 +
  .../aarch64/sme2/acle-asm/readz_hor_za8_vg2.c |  78 +++
  .../aarch64/sme2/acle-asm/readz_hor_za8_vg4.c |  91 +++
  .../aarch64/sme2/acle-asm/readz_ver_za128.c   | 197 ++++++
  .../aarch64/sme2/acle-asm/readz_ver_za8.c     |  10 +
  .../aarch64/sme2/acle-asm/readz_ver_za8_vg2.c |  77 +++
  .../aarch64/sme2/acle-asm/readz_ver_za8_vg4.c |  90 +++
  .../aarch64/sme2/acle-asm/readz_za8_vg1x2.c   |  48 ++
  .../aarch64/sme2/acle-asm/readz_za8_vg1x4.c   |  56 ++
  .../aarch64/sme2/acle-asm/scale_f16_x2.c      | 192 ++++++
  .../aarch64/sme2/acle-asm/scale_f16_x4.c      | 229 +++++++
  .../aarch64/sme2/acle-asm/scale_f32_x2.c      | 208 +++++++
  .../aarch64/sme2/acle-asm/scale_f32_x4.c      | 229 +++++++
  .../aarch64/sme2/acle-asm/scale_f64_x2.c      | 208 +++++++
  .../aarch64/sme2/acle-asm/scale_f64_x4.c      | 229 +++++++
  .../aarch64/sme2/acle-asm/sel_mf8_x2.c        |  92 +++
  .../aarch64/sme2/acle-asm/sel_mf8_x4.c        |  92 +++
  .../aarch64/sme2/acle-asm/st1_mf8_x2.c        | 262 ++++++++
  .../aarch64/sme2/acle-asm/st1_mf8_x4.c        | 354 +++++++++++
  .../aarch64/sme2/acle-asm/stnt1_mf8_x2.c      | 262 ++++++++
  .../aarch64/sme2/acle-asm/stnt1_mf8_x4.c      | 354 +++++++++++
  .../aarch64/sme2/acle-asm/test_sme2_acle.h    |  12 +-
  .../aarch64/sme2/acle-asm/uzp_mf8_x2.c        |  77 +++
  .../aarch64/sme2/acle-asm/uzp_mf8_x4.c        |  73 +++
  .../aarch64/sme2/acle-asm/uzpq_mf8_x2.c       |  77 +++
  .../aarch64/sme2/acle-asm/uzpq_mf8_x4.c       |  73 +++
  .../sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c  | 119 ++++
  .../sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c | 119 ++++
  .../sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c | 119 ++++
  .../aarch64/sme2/acle-asm/write_hor_za8_vg2.c |  78 +++
  .../aarch64/sme2/acle-asm/write_hor_za8_vg4.c |  91 +++
  .../aarch64/sme2/acle-asm/write_ver_za8_vg2.c |  78 +++
  .../aarch64/sme2/acle-asm/write_ver_za8_vg4.c |  91 +++
  .../aarch64/sme2/acle-asm/write_za8_vg1x2.c   |  48 ++
  .../aarch64/sme2/acle-asm/write_za8_vg1x4.c   |  54 ++
  .../aarch64/sme2/acle-asm/zip_mf8_x2.c        |  77 +++
  .../aarch64/sme2/acle-asm/zip_mf8_x4.c        |  73 +++
  .../aarch64/sme2/acle-asm/zipq_mf8_x2.c       |  77 +++
  .../aarch64/sme2/acle-asm/zipq_mf8_x4.c       |  73 +++
  .../aarch64/sve/acle/asm/test_sve_acle.h      |   3 +
  .../sve/acle/general-c/binary_za_m_1.c        |  14 +
  .../acle/general-c/binary_za_slice_lane_1.c   |  14 +
  .../general-c/binary_za_slice_opt_single_1.c  |  16 +
  .../general-c/dot_half_za_slice_lane_fpm.c    | 106 ++++
  .../aarch64/sve2/acle/asm/ld1_mf8_x2.c        | 269 +++++++++
  .../aarch64/sve2/acle/asm/ld1_mf8_x4.c        | 361 +++++++++++
  .../aarch64/sve2/acle/asm/ldnt1_mf8_x2.c      | 269 +++++++++
  .../aarch64/sve2/acle/asm/ldnt1_mf8_x4.c      | 361 +++++++++++
  .../aarch64/sve2/acle/asm/revd_mf8.c          |  80 +++
  .../aarch64/sve2/acle/asm/stnt1_mf8_x2.c      | 269 +++++++++
  .../aarch64/sve2/acle/asm/stnt1_mf8_x4.c      | 361 +++++++++++
  gcc/testsuite/lib/target-supports.exp         |   1 +
  131 files changed, 14445 insertions(+), 45 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/revd_mf8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_bf16_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_f16_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_f32_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvtl_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvtn_mf8_f32_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ld1_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ld1_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ldnt1_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ldnt1_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti4_mf8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti4_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mopa_za16_mf8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mopa_za32_mf8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/readz_ver_za128.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f16_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f16_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f32_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f32_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f64_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f64_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/sel_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/sel_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/st1_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/st1_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/stnt1_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/stnt1_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zip_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zip_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zipq_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zipq_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/dot_half_za_slice_lane_fpm.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ld1_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ld1_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_mf8_x4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_mf8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_mf8_x2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_mf8_x4.c

