This series incrementally adds support for operations on unpacked vectors
of floating-point values. By "unpacked", we're referring to the in-register
layout of partial SVE vector modes. For example, the elements of a VNx4HF
are stored as:
  ... | X | HF | X | HF | X | HF | X | HF |
where 'X' denotes the undefined upper half of the 32-bit container that
holds each 16-bit value. This padding must not affect an operation's
behavior, so it must not be interpreted by any operation that may trap.
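As a rough illustration of that layout, here is a small software model. It is not code from the series: the helper names `unpacked_store` and `unpacked_load` are hypothetical, and the 16-bit payloads are treated as raw bit patterns rather than real HF values. The point is only that the defined low halves read back the same whatever the padding holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a VNx4HF-style layout: each 16-bit element occupies
   the low half of a 32-bit container.  Helper names are hypothetical.  */

/* Write PAYLOAD into the low half of each container and put PADDING in
   the upper half, standing in for the undefined 'X' bits.  */
static void
unpacked_store (uint32_t *containers, const uint16_t *payload,
                uint16_t padding, size_t n)
{
  assert (containers != NULL && payload != NULL);
  for (size_t i = 0; i < n; i++)
    containers[i] = ((uint32_t) padding << 16) | payload[i];
}

/* Read back only the defined low halves; the padding is never
   interpreted.  */
static void
unpacked_load (uint16_t *payload, const uint32_t *containers, size_t n)
{
  assert (containers != NULL && payload != NULL);
  for (size_t i = 0; i < n; i++)
    payload[i] = (uint16_t) (containers[i] & 0xffff);
}
```

Storing the same payload with two different padding values gives different container contents but identical results from `unpacked_load`, which is the property an unpacked operation has to preserve.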
The series is organised as follows:
* NFCs to iterators.md that lay the groundwork for the rest of the
  series.
* Unpacked conversions, in which a solution to the issue described
  above is given.
* Unpacked comparisons, which are slightly less trivial than...
* Unpacked unary/binary/ternary operations, each of which is broken
  down into:
    * Defining the unconditional expansion
    * Supporting OP/UNSPEC_SEL combiner patterns under
      SVE_RELAXED_GP
    * Defining the conditional expander (if applicable)
This ordering keeps each change to aarch64-sve.md testable. Once the
conditional expander for an operation is defined, the rules in match.pd
canonicalize any occurrence of that operation combined with a VEC_COND_EXPR
into the conditional form, which leaves the SVE_RELAXED_GP patterns dead at
trunk. Adding those patterns before the conditional expander is what lets
them be tested. I’ve taken this approach because I believe it’s valuable to
have these patterns to fall back on.
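For readers less familiar with the shape being discussed, a select wrapped around a floating-point operation looks like the loop below at source level. This is a hypothetical example, not one of the tests in the series; the function name `cond_fadd` is made up, and whether the vectorizer picks an unpacked mode for it (here, 32-bit float data alongside a 64-bit condition) depends on target flags and costing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: a conditional FP add.  When vectorized, the
   addition plus the select is the "operation combined with a
   VEC_COND_EXPR" shape described above.  */
void
cond_fadd (float *restrict r, const float *restrict a,
           const float *restrict b, const uint64_t *restrict pred,
           size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? a[i] + b[i] : a[i];
}
```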
Notes on code generation under -ftrapping-math:
1) In the example below, we're currently unable to remove (1) in favour of
   (2).
     ptrue   p6.b, all                    (1)
     ptrue   p7.d, all                    (2)
     ld1w    z30.d, p6/z, [x1]
     ld1w    z29.d, p6/z, [x3]
     fsub    z30.s, p7/m, z30.s, #1.0
In the expanded RTL, the predicate source of the LD1Ws is a
(subreg:VNx2BI (reg:VNx16BI 111) 0), where every bit of 111 is a 1. The
predicate source of the FSUB is a (subreg:VNx4BI (reg:VNx16BI 112) 0),
where every 8th bit of 112 is a 1, and the rest are 0.
2) The AND emitted by the conditional expander typically follows a CMP<CC>
   operation, where it is trivially redundant.
     cmpne   p5.d, p7/z, z0.d, #0
     ptrue   p6.d, vl32
     and     p6.b, p6/z, p5.b, p5.b
The fold we need here is slightly different from what the existing
*cmp<cmp_op><mode>_and splitting patterns achieve, in that we don’t need to
replace p7 with p6 to make the AND redundant.
The AND in this case has the structure:
  (set (reg:VNx4BI 113)
       (and (subreg:VNx4BI (reg:VNx16BI 111) 0)
            (subreg:VNx4BI (reg:VNx2BI 112) 0)))
This problem feels somewhat related to how we might handle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151.
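The predicate relationships in the two notes above can be sketched with a small software model. This is only an illustration under stated assumptions: the vector length is fixed at 256 bits, the `ptrue p6.d, vl32` of note 2 is modelled as an all-true .d predicate, and the names `pred_t`, `model_ptrue`, `lane_active`, `same_lanes` and `model_and_z` are hypothetical. It relies on the fact that an SVE predicate has one bit per vector byte and that only the lowest bit of each element-sized group is significant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VL_BYTES = 32 };   /* Assumed 256-bit vector for this model.  */

/* One predicate bit per vector byte.  */
typedef struct { bool bit[VL_BYTES]; } pred_t;

/* Model of "ptrue pN.<T>, all" for elements of ELT_BYTES bytes: only the
   lowest bit of each element-sized group is set.  */
static pred_t
model_ptrue (size_t elt_bytes)
{
  pred_t p = { { false } };
  for (size_t i = 0; i < VL_BYTES; i += elt_bytes)
    p.bit[i] = true;
  return p;
}

/* Is LANE active when P governs elements of ELT_BYTES bytes?  */
static bool
lane_active (const pred_t *p, size_t elt_bytes, size_t lane)
{
  assert (lane * elt_bytes < VL_BYTES);
  return p->bit[lane * elt_bytes];
}

/* Do P and Q enable the same lanes for elements of ELT_BYTES bytes?  */
static bool
same_lanes (const pred_t *p, const pred_t *q, size_t elt_bytes)
{
  for (size_t lane = 0; lane < VL_BYTES / elt_bytes; lane++)
    if (lane_active (p, elt_bytes, lane) != lane_active (q, elt_bytes, lane))
      return false;
  return true;
}

/* Model of "and pd.b, pg/z, pn.b, pm.b".  */
static pred_t
model_and_z (const pred_t *pg, const pred_t *pn, const pred_t *pm)
{
  pred_t r;
  for (size_t i = 0; i < VL_BYTES; i++)
    r.bit[i] = pg->bit[i] && pn->bit[i] && pm->bit[i];
  return r;
}
```

In this model, `model_ptrue (1)` and `model_ptrue (8)` enable the same lanes when viewed with 8-byte elements, as the LD1Ws do, but not with 4-byte elements, as the FSUB does, so only the .d predicate can serve both uses. Likewise, passing a predicate whose set bits all lie on .d lane boundaries through `model_and_z` with an all-true .d governing predicate returns it unchanged, which is the sense in which the AND after a CMP<CC> is redundant.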
Bootstrapped & regtested on aarch64-linux-gnu.
Thanks,
Spencer
Spencer Abson (14):
  aarch64: Extend iterator support for partial SVE FP modes
  aarch64: Add support for unpacked SVE FP conversions
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions
  aarch64: Add support for unpacked SVE FP comparisons
  aarch64: Compare/and splits for unpacked SVE FP comparisons
  aarch64: Add support for unpacked SVE FP unary operations
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP unary
    operations
  aarch64: Add support for unpacked SVE FP binary arithmetic
  aarch64: Add support for unpacked SVE FDIV
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP binary
    arithmetic
  aarch64: Add support for unpacked SVE FP conditional binary arithmetic
  aarch64: Add support for unpacked SVE FP ternary arithmetic
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary
    arithmetic
  aarch64: Add support for unpacked SVE FP conditional ternary
    arithmetic
gcc/config/aarch64/aarch64-protos.h | 4 +
gcc/config/aarch64/aarch64-sve.md | 889 ++++++++++++------
gcc/config/aarch64/aarch64-sve2.md | 10 +-
gcc/config/aarch64/aarch64.cc | 125 ++-
gcc/config/aarch64/iterators.md | 97 +-
gcc/config/aarch64/predicates.md | 4 +
.../aarch64/sve/unpacked_binary_bf16_1.C | 35 +
.../aarch64/sve/unpacked_binary_bf16_2.C | 15 +
.../aarch64/sve/unpacked_cond_binary_bf16_1.C | 46 +
.../aarch64/sve/unpacked_cond_binary_bf16_2.C | 18 +
.../sve/unpacked_cond_ternary_bf16_1.C | 35 +
.../sve/unpacked_cond_ternary_bf16_2.C | 14 +
.../aarch64/sve/unpacked_ternary_bf16_1.C | 27 +
.../aarch64/sve/unpacked_ternary_bf16_2.C | 11 +
.../aarch64/sve/pack_fcvt_signed_1.c | 2 +-
.../aarch64/sve/pack_fcvt_unsigned_1.c | 2 +-
.../gcc.target/aarch64/sve/pack_float_1.c | 2 +-
.../gcc.target/aarch64/sve/unpack_float_1.c | 2 +-
.../aarch64/sve/unpacked_builtin_fmax_1.c | 40 +
.../aarch64/sve/unpacked_builtin_fmax_2.c | 16 +
.../aarch64/sve/unpacked_builtin_fmin_1.c | 40 +
.../aarch64/sve/unpacked_builtin_fmin_2.c | 16 +
.../sve/unpacked_cond_builtin_fmax_1.c | 47 +
.../sve/unpacked_cond_builtin_fmax_2.c | 20 +
.../sve/unpacked_cond_builtin_fmin_1.c | 47 +
.../sve/unpacked_cond_builtin_fmin_2.c | 20 +
.../aarch64/sve/unpacked_cond_cvtf_1.c | 47 +
.../aarch64/sve/unpacked_cond_fabs_1.c | 32 +
.../aarch64/sve/unpacked_cond_fadd_1.c | 58 ++
.../aarch64/sve/unpacked_cond_fadd_2.c | 24 +
.../aarch64/sve/unpacked_cond_fcvt_1.c | 37 +
.../aarch64/sve/unpacked_cond_fcvtz_1.c | 51 +
.../aarch64/sve/unpacked_cond_fdiv_1.c | 43 +
.../aarch64/sve/unpacked_cond_fdiv_2.c | 18 +
.../aarch64/sve/unpacked_cond_fmaxnm_1.c | 49 +
.../aarch64/sve/unpacked_cond_fmaxnm_2.c | 20 +
.../aarch64/sve/unpacked_cond_fminnm_1.c | 49 +
.../aarch64/sve/unpacked_cond_fminnm_2.c | 20 +
.../aarch64/sve/unpacked_cond_fmla_1.c | 47 +
.../aarch64/sve/unpacked_cond_fmla_2.c | 18 +
.../aarch64/sve/unpacked_cond_fmls_1.c | 47 +
.../aarch64/sve/unpacked_cond_fmls_2.c | 18 +
.../aarch64/sve/unpacked_cond_fmul_1.c | 46 +
.../aarch64/sve/unpacked_cond_fmul_2.c | 18 +
.../aarch64/sve/unpacked_cond_fneg_1.c | 34 +
.../aarch64/sve/unpacked_cond_fnmla_1.c | 47 +
.../aarch64/sve/unpacked_cond_fnmla_2.c | 18 +
.../aarch64/sve/unpacked_cond_fnmls_1.c | 47 +
.../aarch64/sve/unpacked_cond_fnmls_2.c | 18 +
.../aarch64/sve/unpacked_cond_frinta_1.c | 32 +
.../aarch64/sve/unpacked_cond_frinti_1.c | 32 +
.../aarch64/sve/unpacked_cond_frintm_1.c | 32 +
.../aarch64/sve/unpacked_cond_frintp_1.c | 32 +
.../aarch64/sve/unpacked_cond_frintx_1.c | 32 +
.../aarch64/sve/unpacked_cond_frintz_1.c | 32 +
.../aarch64/sve/unpacked_cond_fsubr_1.c | 53 ++
.../aarch64/sve/unpacked_cond_fsubr_2.c | 22 +
.../gcc.target/aarch64/sve/unpacked_cvtf_1.c | 217 +++++
.../gcc.target/aarch64/sve/unpacked_cvtf_2.c | 23 +
.../gcc.target/aarch64/sve/unpacked_cvtf_3.c | 12 +
.../gcc.target/aarch64/sve/unpacked_fabs_1.c | 24 +
.../gcc.target/aarch64/sve/unpacked_fadd_1.c | 48 +
.../gcc.target/aarch64/sve/unpacked_fadd_2.c | 22 +
.../gcc.target/aarch64/sve/unpacked_fcm_1.c | 547 +++++++++++
.../gcc.target/aarch64/sve/unpacked_fcm_2.c | 47 +
.../aarch64/sve/unpacked_fcm_and_1.c | 18 +
.../gcc.target/aarch64/sve/unpacked_fcvt_1.c | 118 +++
.../gcc.target/aarch64/sve/unpacked_fcvt_2.c | 16 +
.../gcc.target/aarch64/sve/unpacked_fcvtz_1.c | 244 +++++
.../gcc.target/aarch64/sve/unpacked_fcvtz_2.c | 26 +
.../gcc.target/aarch64/sve/unpacked_fdiv_1.c | 34 +
.../gcc.target/aarch64/sve/unpacked_fdiv_2.c | 11 +
.../gcc.target/aarch64/sve/unpacked_fdiv_3.c | 11 +
.../aarch64/sve/unpacked_fmaxnm_1.c | 41 +
.../aarch64/sve/unpacked_fmaxnm_2.c | 16 +
.../aarch64/sve/unpacked_fminnm_1.c | 42 +
.../aarch64/sve/unpacked_fminnm_2.c | 16 +
.../gcc.target/aarch64/sve/unpacked_fmla_1.c | 34 +
.../gcc.target/aarch64/sve/unpacked_fmla_2.c | 11 +
.../gcc.target/aarch64/sve/unpacked_fmls_1.c | 34 +
.../gcc.target/aarch64/sve/unpacked_fmls_2.c | 11 +
.../gcc.target/aarch64/sve/unpacked_fmul_1.c | 39 +
.../gcc.target/aarch64/sve/unpacked_fmul_2.c | 14 +
.../gcc.target/aarch64/sve/unpacked_fneg_1.c | 26 +
.../gcc.target/aarch64/sve/unpacked_fnmla_1.c | 34 +
.../gcc.target/aarch64/sve/unpacked_fnmla_2.c | 11 +
.../gcc.target/aarch64/sve/unpacked_fnmls_1.c | 34 +
.../gcc.target/aarch64/sve/unpacked_fnmls_2.c | 11 +
.../aarch64/sve/unpacked_frinta_1.c | 27 +
.../aarch64/sve/unpacked_frinta_2.c | 11 +
.../aarch64/sve/unpacked_frinti_1.c | 27 +
.../aarch64/sve/unpacked_frinti_2.c | 11 +
.../aarch64/sve/unpacked_frintm_1.c | 27 +
.../aarch64/sve/unpacked_frintm_2.c | 11 +
.../aarch64/sve/unpacked_frintp_1.c | 27 +
.../aarch64/sve/unpacked_frintp_2.c | 11 +
.../aarch64/sve/unpacked_frintx_1.c | 27 +
.../aarch64/sve/unpacked_frintx_2.c | 11 +
.../aarch64/sve/unpacked_frintz_1.c | 27 +
.../aarch64/sve/unpacked_frintz_2.c | 11 +
.../gcc.target/aarch64/sve/unpacked_fsubr_1.c | 42 +
.../gcc.target/aarch64/sve/unpacked_fsubr_2.c | 16 +
102 files changed, 4371 insertions(+), 364 deletions(-)
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_1.C
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_2.C
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_binary_bf16_1.C
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_ternary_bf16_1.C
create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_ternary_bf16_2.C
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fabs_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fadd_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fadd_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fcvt_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fcvtz_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fdiv_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmaxnm_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fminnm_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmla_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmla_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmls_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmls_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmul_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmul_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fneg_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frinta_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frinti_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintm_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintp_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintx_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintz_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fsubr_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_3.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fabs_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_and_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_3.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmla_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmla_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmls_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmls_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fneg_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmla_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmla_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmls_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmls_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinta_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinta_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinti_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinti_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintm_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintm_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintp_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintp_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintx_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintx_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintz_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintz_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_2.c
--
2.34.1