I went through the patch series and it looks good to me; no additional
comments. I'm not a maintainer, though, so please wait for others to review it.
Thanks,
Saurabh
On 12/18/2025 5:14 PM, Claudio Bantaloukas wrote:
This patch series completes support for the SME2 and SME2p1 intrinsics that
operate on modal 8-bit floating-point types.
- The first patch in the series adds tests for using luti intrinsics with
mf8. These have worked since the intrinsics were introduced, and their use is
now documented in ACLE.
- The second patch extends the definitions of existing non-interpreting sve2/sme
intrinsics to support mfloat8 types.
- The third and fourth patches add widening and narrowing sme2 fp8 conversions
respectively (svcvt).
- The fifth patch adds multi-vector floating-point adjust exponent intrinsics
(svscale).
- The sixth patch adds support for the sme-f8f16 and sme-f8f32 arch features
and related defines.
- Patch 7 adds multi-vector 8-bit floating-point multiply-add long intrinsics.
- Patch 8 adds 8-bit floating-point sum of outer products and accumulate
intrinsics.
- Patch 9 adds 8-bit floating-point dot product intrinsics.
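As a rough illustration of the "related defines" mentioned for the sixth
patch: ACLE names the feature-test macros for these two extensions
__ARM_FEATURE_SME_F8F16 and __ARM_FEATURE_SME_F8F32. The sketch below assumes
those are the macros the patch predefines. The helper function names are
hypothetical and are not part of the series.

```c
#include <stdio.h>

/* Hypothetical helper (not from the patch series): returns 1 if the
   compiler predefines the ACLE macro for sme-f8f16, 0 otherwise.  */
int
have_sme_f8f16 (void)
{
#ifdef __ARM_FEATURE_SME_F8F16
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper (not from the patch series): same check for
   sme-f8f32.  */
int
have_sme_f8f32 (void)
{
#ifdef __ARM_FEATURE_SME_F8F32
  return 1;
#else
  return 0;
#endif
}

/* Print one line per feature so the result is easy to compare across
   different -march settings.  */
void
report_fp8_sme_features (void)
{
  printf ("sme-f8f16: %s\n", have_sme_f8f16 () ? "yes" : "no");
  printf ("sme-f8f32: %s\n", have_sme_f8f32 () ? "yes" : "no");
}
```

Calling report_fp8_sme_features () from a small main () should print "no"
twice on a compiler or target without these extensions. With the series
applied, enabling the sme-f8f16 or sme-f8f32 arch feature on the command line
would be expected to switch the corresponding line to "yes".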
Compared to version 1 of this patch series:
- updated commit messages per requests.
- fixed gating of intrinsics in patch four (narrowing sme2 conversions to fp8).
- introduced aarch64_output_asm_with_extra_operand function and updated insns in
aarch64-sme.md to no longer use out of bounds operands.
Compared to version 2 of this patch series:
- in patch 7, replaced aarch64_output_asm_with_extra_operand with
aarch64_output_asm_with_offset, which does not require allocating space for
operands on the stack.
Compared to version 3 of this patch series:
- Removed the functions using snprintf entirely and returned to the existing use
of the operands array, as the array is long enough for this purpose.
- Addressed Richard Ball's feedback (renamed test files, improved readability in
exp file, formatting).
Compared to version 4 of this patch series:
- Reposting patches inline rather than as attachments.
- Accidentally posted the last 8 rather than 9 patches. Thanks to Artemiy who
spotted this.
Regression tested on aarch64-unknown-linux-gnu.
OK to merge?
Thanks,
Claudio Bantaloukas
Claudio Bantaloukas (8):
  aarch64: add tests for sme mfloat8 luti functions
  aarch64: extend sme intrinsics to mfp8
  aarch64: add widening sme2 fp8 conversions
  aarch64: add narrowing sme2 conversions to fp8
  aarch64: add multi-vector floating-point adjust exponent intrinsics
  aarch64: add basic support for sme-f8f16 and sme-f8f32
  aarch64: add Multi-vector 8-bit floating-point multiply-add long
  aarch64: add 8-bit floating-point sum of outer products and accumulate

Karl Meakin (1):
  aarch64: add 8-bit floating point dot product

gcc/config/aarch64/aarch64-c.cc | 4 +
.../aarch64/aarch64-option-extensions.def | 4 +
gcc/config/aarch64/aarch64-sme.md | 571 ++++++++++++++++++
.../aarch64/aarch64-sve-builtins-base.cc | 47 +-
.../aarch64/aarch64-sve-builtins-functions.h | 23 +-
.../aarch64/aarch64-sve-builtins-shapes.cc | 43 +-
.../aarch64/aarch64-sve-builtins-shapes.h | 1 +
.../aarch64/aarch64-sve-builtins-sme.cc | 20 +-
.../aarch64/aarch64-sve-builtins-sme.def | 55 +-
gcc/config/aarch64/aarch64-sve-builtins-sme.h | 2 +
.../aarch64/aarch64-sve-builtins-sve2.cc | 2 +
.../aarch64/aarch64-sve-builtins-sve2.def | 12 +
.../aarch64/aarch64-sve-builtins-sve2.h | 2 +
gcc/config/aarch64/aarch64-sve-builtins.cc | 34 +-
gcc/config/aarch64/aarch64-sve2.md | 52 +-
gcc/config/aarch64/aarch64.h | 10 +
gcc/config/aarch64/iterators.md | 73 ++-
gcc/doc/invoke.texi | 6 +
.../aarch64/sme2/aarch64-sme2-acle-asm.exp | 3 +-
.../gcc.target/aarch64/pragma_cpp_predefs_4.c | 34 ++
.../aarch64/sme/acle-asm/read_hor_za128.c | 31 +
.../aarch64/sme/acle-asm/read_hor_za8.c | 31 +
.../aarch64/sme/acle-asm/read_ver_za128.c | 31 +
.../aarch64/sme/acle-asm/read_ver_za8.c | 31 +
.../aarch64/sme/acle-asm/revd_mf8.c | 76 +++
.../aarch64/sme/acle-asm/test_sme_acle.h | 2 +-
.../aarch64/sme/acle-asm/write_hor_za128.c | 10 +
.../aarch64/sme/acle-asm/write_hor_za8.c | 10 +
.../aarch64/sme/acle-asm/write_ver_za128.c | 10 +
.../aarch64/sme/acle-asm/write_ver_za8.c | 10 +
.../aarch64/sme2/aarch64-sme2-acle-asm.exp | 3 +-
.../aarch64/sme2/acle-asm/cvt_mf8_bf16_x2.c | 56 ++
.../aarch64/sme2/acle-asm/cvt_mf8_f16_x2.c | 56 ++
.../aarch64/sme2/acle-asm/cvt_mf8_f32_x4.c | 72 +++
.../aarch64/sme2/acle-asm/cvt_mf8_x2.c | 47 ++
.../aarch64/sme2/acle-asm/cvtl_mf8_x2.c | 47 ++
.../aarch64/sme2/acle-asm/cvtn_mf8_f32_x4.c | 72 +++
.../sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c | 119 ++++
.../sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c | 125 ++++
.../sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c | 119 ++++
.../sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c | 125 ++++
.../sme2/acle-asm/dot_single_za16_mf8_vg1x2.c | 126 ++++
.../sme2/acle-asm/dot_single_za16_mf8_vg1x4.c | 126 ++++
.../sme2/acle-asm/dot_single_za32_mf8_vg1x2.c | 126 ++++
.../sme2/acle-asm/dot_single_za32_mf8_vg1x4.c | 126 ++++
.../sme2/acle-asm/dot_za16_mf8_vg1x2.c | 150 +++++
.../sme2/acle-asm/dot_za16_mf8_vg1x4.c | 166 +++++
.../sme2/acle-asm/dot_za32_mf8_vg1x2.c | 150 +++++
.../sme2/acle-asm/dot_za32_mf8_vg1x4.c | 166 +++++
.../aarch64/sme2/acle-asm/ld1_mf8_x2.c | 262 ++++++++
.../aarch64/sme2/acle-asm/ld1_mf8_x4.c | 354 +++++++++++
.../aarch64/sme2/acle-asm/ldnt1_mf8_x2.c | 262 ++++++++
.../aarch64/sme2/acle-asm/ldnt1_mf8_x4.c | 354 +++++++++++
.../aarch64/sme2/acle-asm/luti2_mf8.c | 48 ++
.../aarch64/sme2/acle-asm/luti2_mf8_x2.c | 50 ++
.../aarch64/sme2/acle-asm/luti2_mf8_x4.c | 56 ++
.../aarch64/sme2/acle-asm/luti4_mf8.c | 48 ++
.../aarch64/sme2/acle-asm/luti4_mf8_x2.c | 50 ++
.../sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c | 167 +++++
.../sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c | 136 +++++
.../sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c | 142 +++++
.../sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c | 169 ++++++
.../sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c | 137 +++++
.../sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c | 143 +++++
.../sme2/acle-asm/mla_za16_mf8_vg2x1.c | 167 +++++
.../sme2/acle-asm/mla_za16_mf8_vg2x2.c | 285 +++++++++
.../sme2/acle-asm/mla_za16_mf8_vg2x4.c | 287 +++++++++
.../sme2/acle-asm/mla_za32_mf8_vg4x1.c | 167 +++++
.../sme2/acle-asm/mla_za32_mf8_vg4x2.c | 277 +++++++++
.../sme2/acle-asm/mla_za32_mf8_vg4x4.c | 289 +++++++++
.../aarch64/sme2/acle-asm/mopa_za16_mf8.c | 36 ++
.../aarch64/sme2/acle-asm/mopa_za32_mf8.c | 36 ++
.../aarch64/sme2/acle-asm/read_hor_za8_vg2.c | 78 +++
.../aarch64/sme2/acle-asm/read_hor_za8_vg4.c | 91 +++
.../aarch64/sme2/acle-asm/read_ver_za8_vg2.c | 78 +++
.../aarch64/sme2/acle-asm/read_ver_za8_vg4.c | 91 +++
.../aarch64/sme2/acle-asm/read_za8_vg1x2.c | 48 ++
.../aarch64/sme2/acle-asm/read_za8_vg1x4.c | 54 ++
.../aarch64/sme2/acle-asm/readz_hor_za128.c | 10 +
.../aarch64/sme2/acle-asm/readz_hor_za8.c | 10 +
.../aarch64/sme2/acle-asm/readz_hor_za8_vg2.c | 78 +++
.../aarch64/sme2/acle-asm/readz_hor_za8_vg4.c | 91 +++
.../aarch64/sme2/acle-asm/readz_ver_za128.c | 197 ++++++
.../aarch64/sme2/acle-asm/readz_ver_za8.c | 10 +
.../aarch64/sme2/acle-asm/readz_ver_za8_vg2.c | 77 +++
.../aarch64/sme2/acle-asm/readz_ver_za8_vg4.c | 90 +++
.../aarch64/sme2/acle-asm/readz_za8_vg1x2.c | 48 ++
.../aarch64/sme2/acle-asm/readz_za8_vg1x4.c | 56 ++
.../aarch64/sme2/acle-asm/scale_f16_x2.c | 192 ++++++
.../aarch64/sme2/acle-asm/scale_f16_x4.c | 229 +++++++
.../aarch64/sme2/acle-asm/scale_f32_x2.c | 208 +++++++
.../aarch64/sme2/acle-asm/scale_f32_x4.c | 229 +++++++
.../aarch64/sme2/acle-asm/scale_f64_x2.c | 208 +++++++
.../aarch64/sme2/acle-asm/scale_f64_x4.c | 229 +++++++
.../aarch64/sme2/acle-asm/sel_mf8_x2.c | 92 +++
.../aarch64/sme2/acle-asm/sel_mf8_x4.c | 92 +++
.../aarch64/sme2/acle-asm/st1_mf8_x2.c | 262 ++++++++
.../aarch64/sme2/acle-asm/st1_mf8_x4.c | 354 +++++++++++
.../aarch64/sme2/acle-asm/stnt1_mf8_x2.c | 262 ++++++++
.../aarch64/sme2/acle-asm/stnt1_mf8_x4.c | 354 +++++++++++
.../aarch64/sme2/acle-asm/test_sme2_acle.h | 12 +-
.../aarch64/sme2/acle-asm/uzp_mf8_x2.c | 77 +++
.../aarch64/sme2/acle-asm/uzp_mf8_x4.c | 73 +++
.../aarch64/sme2/acle-asm/uzpq_mf8_x2.c | 77 +++
.../aarch64/sme2/acle-asm/uzpq_mf8_x4.c | 73 +++
.../sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c | 119 ++++
.../sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c | 119 ++++
.../sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c | 119 ++++
.../aarch64/sme2/acle-asm/write_hor_za8_vg2.c | 78 +++
.../aarch64/sme2/acle-asm/write_hor_za8_vg4.c | 91 +++
.../aarch64/sme2/acle-asm/write_ver_za8_vg2.c | 78 +++
.../aarch64/sme2/acle-asm/write_ver_za8_vg4.c | 91 +++
.../aarch64/sme2/acle-asm/write_za8_vg1x2.c | 48 ++
.../aarch64/sme2/acle-asm/write_za8_vg1x4.c | 54 ++
.../aarch64/sme2/acle-asm/zip_mf8_x2.c | 77 +++
.../aarch64/sme2/acle-asm/zip_mf8_x4.c | 73 +++
.../aarch64/sme2/acle-asm/zipq_mf8_x2.c | 77 +++
.../aarch64/sme2/acle-asm/zipq_mf8_x4.c | 73 +++
.../aarch64/sve/acle/asm/test_sve_acle.h | 3 +
.../sve/acle/general-c/binary_za_m_1.c | 14 +
.../acle/general-c/binary_za_slice_lane_1.c | 14 +
.../general-c/binary_za_slice_opt_single_1.c | 16 +
.../general-c/dot_half_za_slice_lane_fpm.c | 106 ++++
.../aarch64/sve2/acle/asm/ld1_mf8_x2.c | 269 +++++++++
.../aarch64/sve2/acle/asm/ld1_mf8_x4.c | 361 +++++++++++
.../aarch64/sve2/acle/asm/ldnt1_mf8_x2.c | 269 +++++++++
.../aarch64/sve2/acle/asm/ldnt1_mf8_x4.c | 361 +++++++++++
.../aarch64/sve2/acle/asm/revd_mf8.c | 80 +++
.../aarch64/sve2/acle/asm/stnt1_mf8_x2.c | 269 +++++++++
.../aarch64/sve2/acle/asm/stnt1_mf8_x4.c | 361 +++++++++++
gcc/testsuite/lib/target-supports.exp | 1 +
131 files changed, 14445 insertions(+), 45 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/revd_mf8.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_bf16_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_f16_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_f32_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvtl_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvtn_mf8_f32_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ld1_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ld1_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ldnt1_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ldnt1_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti4_mf8.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti4_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mopa_za16_mf8.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mopa_za32_mf8.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/readz_ver_za128.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f16_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f16_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f32_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f32_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f64_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f64_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/sel_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/sel_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/st1_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/st1_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/stnt1_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/stnt1_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zip_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zip_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zipq_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zipq_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/dot_half_za_slice_lane_fpm.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ld1_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ld1_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_mf8_x4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_mf8.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_mf8_x2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_mf8_x4.c