https://gcc.gnu.org/g:a26c5fc8fe2bdb4973d114732238d95c3896bc08

commit r16-1530-ga26c5fc8fe2bdb4973d114732238d95c3896bc08
Author: Spencer Abson <spencer.ab...@arm.com>
Date:   Mon Jun 16 19:31:30 2025 +0000

    aarch64: Add support for unpacked SVE FP conversions
    
    This patch introduces expanders for FP<-FP conversions that leverage
    partial vector modes.  We also extend the INT<-FP and FP<-INT conversions
    using the same approach.
    
    The ACLE enables vectorized conversions like the following:
    
    fcvt z0.h, p7/m, z1.s
    
    modelling the source vector as VNx4SF:
    
    ... |     SF|     SF|     SF|     SF|
    
    and the destination as a VNx8HF, where this operation would yield:
    
    ... | 0 | HF| 0 | HF| 0 | HF| 0 | HF|
    
    hence the useful results are stored unpacked, i.e.
    
    ... | X | HF| X | HF| X | HF| X | HF|   (VNx4HF)
    
    This patch allows the vectorizer to use this variant of fcvt as a
    conversion from VNx4SF to VNx4HF.  The same idea applies to widening
    conversions, and between vectors with FP and integer base types.
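
    For example, in the new unpacked_cvtf_3.c test (added by this patch), the
    vectorizer can operate on the int32_t data as VNx2SI, alongside the
    VNx2DF results, and convert it with a single scvtf per vector:
    
      void f64_i32 (double *restrict x, int32_t *restrict y, int n)
      {
        for (int i = 0; i < n; i++)
          x[i] = (double)y[i];
      }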
    
    If the source itself had been unpacked, e.g.
    
    ... |   X   |     SF|   X   |     SF|   (VNx2SF)
    
    the result would be:
    
    ... | X | X | X | HF| X | X | X | HF|   (VNx2HF)
    
    The upper bits of each container here are undefined, and it is important
    to avoid interpreting them during FP operations - doing so could introduce
    spurious traps.  The route we've taken here is to mask the undefined
    lanes using the operation's predicate when flag_trapping_math is set.
    
    The VPRED predicate mode (e.g. VNx2BI here) cannot do this; to ensure
    correct behavior, we need a predicate mode that can control the data as if
    it were fully-packed (VNx4BI).
    
    Both VNx2BI and VNx4BI must be recognised as legal governing predicate modes
    by the corresponding FP insns.  In general, the governing predicate mode for
    an insn could be any such with at least as many significant lanes as the
    data mode.  For example, addvnx4hf3 could be controlled by any of
    VNx{4,8,16}BI.
    
    We implement 'aarch64_predicate_operand', a new define_special_predicate, to
    achieve this.
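
    To illustrate the flag_trapping_math behaviour described above, consider
    one of the conversions from the new unpacked_fcvt tests (compiled with
    -msve-vector-bits=2048):
    
      typedef float    v32sf __attribute__((vector_size (128)));
      typedef _Float16 v32hf __attribute__((vector_size (64)));
    
      v32hf
      trunc_2hf2sf (v32sf x)
      {
        /* VNx2SF -> VNx2HF: only the low SF half of each 64-bit container
           is defined.  */
        return __builtin_convertvector (x, v32hf);
      }
    
    With flag_trapping_math (the default), the fcvt is governed by a predicate
    that enables only the defined SF lane of each 64-bit container (the test
    expects 'ptrue pN.d, vl32'); with -fno-trapping-math, a relaxed all-true
    predicate ('ptrue pN.b') is sufficient.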
    
    gcc/ChangeLog:
    
            * config/aarch64/aarch64-protos.h (aarch64_sve_valid_pred_p):
            Declare helper for aarch64_predicate_operand.
            (aarch64_sve_packed_pred): Declare helper for new expanders.
            (aarch64_sve_fp_pred): Likewise.
            * config/aarch64/aarch64-sve.md (<optab><mode><v_int_equiv>2):
            Extend into...
            (<optab><SVE_HSF:mode><SVE_HSDI:mode>2): New expander for converting
            vectors of HF,SF to vectors of HI,SI,DI.
            (<optab><VNx2DF_ONLY:mode><SVE_2SDI:mode>2): New expander for
            converting vectors of DF to vectors of SI,DI.
            (*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_F:mode><SVE_HSDI:mode>):
            New pattern to match those we've added here.
            (@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>):
            Extend into...
            (@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><SVE_SI:mode>): Match
            both VNx2SI<-VNx2DF and VNx4SI<-VNx2DF.
            (<optab><v_int_equiv><mode>2): Extend into...
            (<optab><SVE_HSDI:mode><SVE_F:mode>2): New expander for converting
            vectors of HI,SI,DI to vectors of HF,SF,DF.
            (*aarch64_sve_<optab>_nonextend<SVE_HSDI:mode><SVE_PARTIAL_F:mode>):
            New pattern to match those we've added here.
            (trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>2): New expander to handle
            narrowing ('truncating') FP<-FP conversions.
            (*aarch64_sve_<optab>_trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>):
            New pattern to handle those we've added here.
            (extend<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>2): New expander to
            handle widening ('extending') FP<-FP conversions.
            (*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>):
            New pattern to handle those we've added here.
            * config/aarch64/aarch64.cc (aarch64_sve_packed_pred): New function.
            (aarch64_sve_fp_pred): Likewise.
            (aarch64_sve_valid_pred_p): Likewise.
            * config/aarch64/iterators.md (SVE_PARTIAL_HSF): New mode iterator.
            (SVE_HSF): Likewise.
            (SVE_SDF): Likewise.
            (SVE_SI): Likewise.
            (SVE_2SDI): Likewise.
            (self_mask): Extend to all integer/FP vector modes.
            (narrower_mask): Likewise (excluding QI).
            * config/aarch64/predicates.md (aarch64_predicate_operand): New
            special predicate to handle narrower predicate modes.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/aarch64/sve/pack_fcvt_signed_1.c: Disable the aarch64
            vector cost model to preserve this test.
            * gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c: Likewise.
            * gcc.target/aarch64/sve/pack_float_1.c: Likewise.
            * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
            * gcc.target/aarch64/sve/unpacked_cvtf_1.c: New test.
            * gcc.target/aarch64/sve/unpacked_cvtf_2.c: Likewise.
            * gcc.target/aarch64/sve/unpacked_cvtf_3.c: Likewise.
            * gcc.target/aarch64/sve/unpacked_fcvt_1.c: Likewise.
            * gcc.target/aarch64/sve/unpacked_fcvt_2.c: Likewise.
            * gcc.target/aarch64/sve/unpacked_fcvtz_1.c: Likewise.
            * gcc.target/aarch64/sve/unpacked_fcvtz_2.c: Likewise.

Diff:
---
 gcc/config/aarch64/aarch64-protos.h                |   3 +
 gcc/config/aarch64/aarch64-sve.md                  | 173 +++++++++++++--
 gcc/config/aarch64/aarch64.cc                      |  51 +++++
 gcc/config/aarch64/iterators.md                    |  45 ++--
 gcc/config/aarch64/predicates.md                   |   5 +
 .../gcc.target/aarch64/sve/pack_fcvt_signed_1.c    |   2 +-
 .../gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c  |   2 +-
 .../gcc.target/aarch64/sve/pack_float_1.c          |   2 +-
 .../gcc.target/aarch64/sve/unpack_float_1.c        |   2 +-
 .../gcc.target/aarch64/sve/unpacked_cvtf_1.c       | 217 ++++++++++++++++++
 .../gcc.target/aarch64/sve/unpacked_cvtf_2.c       |  23 ++
 .../gcc.target/aarch64/sve/unpacked_cvtf_3.c       |  12 +
 .../gcc.target/aarch64/sve/unpacked_fcvt_1.c       | 118 ++++++++++
 .../gcc.target/aarch64/sve/unpacked_fcvt_2.c       |  16 ++
 .../gcc.target/aarch64/sve/unpacked_fcvtz_1.c      | 244 +++++++++++++++++++++
 .../gcc.target/aarch64/sve/unpacked_fcvtz_2.c      |  26 +++
 16 files changed, 902 insertions(+), 39 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b1ce42fc3966..40088db2b423 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -947,6 +947,7 @@ bool aarch64_parallel_select_half_p (machine_mode, rtx);
 bool aarch64_pars_overlap_p (rtx, rtx);
 bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
 bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
+bool aarch64_sve_valid_pred_p (rtx, machine_mode);
 bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
 bool aarch64_simd_valid_and_imm (rtx);
 bool aarch64_simd_valid_and_imm_fmov (rtx, unsigned int * = NULL);
@@ -1028,6 +1029,8 @@ rtx aarch64_ptrue_reg (machine_mode, unsigned int);
 rtx aarch64_ptrue_reg (machine_mode, machine_mode);
 rtx aarch64_pfalse_reg (machine_mode);
 bool aarch64_sve_same_pred_for_ptest_p (rtx *, rtx *);
+rtx aarch64_sve_packed_pred (machine_mode);
+rtx aarch64_sve_fp_pred (machine_mode, rtx *);
 void aarch64_emit_load_store_through_mode (rtx, rtx, machine_mode);
 bool aarch64_expand_maskloadstore (rtx *, machine_mode);
 void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 180c6dd9e5b5..450975dd088e 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -154,8 +154,10 @@
 ;; ---- [FP<-INT] Packs
 ;; ---- [FP<-INT] Unpacks
 ;; ---- [FP<-FP] Packs
+;; ---- [FP<-FP] Truncating conversions
 ;; ---- [FP<-FP] Packs (bfloat16)
 ;; ---- [FP<-FP] Unpacks
+;; ---- [FP<-FP] Extending conversions
 ;; ---- [PRED<-PRED] Packs
 ;; ---- [PRED<-PRED] Unpacks
 ;;
@@ -9524,18 +9526,37 @@
 ;; - FCVTZU
 ;; -------------------------------------------------------------------------
 
-;; Unpredicated conversion of floats to integers of the same size (HF to HI,
-;; SF to SI or DF to DI).
-(define_expand "<optab><mode><v_int_equiv>2"
-  [(set (match_operand:<V_INT_EQUIV> 0 "register_operand")
-       (unspec:<V_INT_EQUIV>
+;; Unpredicated conversion of floats to integers of the same size or wider,
+;; excluding conversions from DF (see below).
+(define_expand "<optab><SVE_HSF:mode><SVE_HSDI:mode>2"
+  [(set (match_operand:SVE_HSDI 0 "register_operand")
+       (unspec:SVE_HSDI
+         [(match_dup 2)
+          (match_dup 3)
+          (match_operand:SVE_HSF 1 "register_operand")]
+         SVE_COND_FCVTI))]
+  "TARGET_SVE
+   && (~(<SVE_HSDI:self_mask> | <SVE_HSDI:narrower_mask>) & <SVE_HSF:self_mask>) == 0"
+  {
+    operands[2] = aarch64_sve_fp_pred (<SVE_HSDI:MODE>mode, &operands[3]);
+  }
+)
+
+;; SI <- DF can't use SI <- trunc (DI <- DF) without -ffast-math, so this
+;; truncating variant of FCVTZ{S,U} is useful for auto-vectorization.
+;;
+;; DF is the only source mode for which the mask used above doesn't apply,
+;; we define a separate pattern for it here.
+(define_expand "<optab><VNx2DF_ONLY:mode><SVE_2SDI:mode>2"
+  [(set (match_operand:SVE_2SDI 0 "register_operand")
+       (unspec:SVE_2SDI
          [(match_dup 2)
           (const_int SVE_RELAXED_GP)
-          (match_operand:SVE_FULL_F 1 "register_operand")]
+          (match_operand:VNx2DF_ONLY 1 "register_operand")]
          SVE_COND_FCVTI))]
   "TARGET_SVE"
   {
-    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
+    operands[2] = aarch64_ptrue_reg (VNx2BImode);
   }
 )
 
@@ -9554,18 +9575,37 @@
   }
 )
 
-;; Predicated narrowing float-to-integer conversion.
-(define_insn "@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>"
-  [(set (match_operand:VNx4SI_ONLY 0 "register_operand")
-       (unspec:VNx4SI_ONLY
+;; As above, for pairs used by the auto-vectorizer only.
+(define_insn "*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_F:mode><SVE_HSDI:mode>"
+  [(set (match_operand:SVE_HSDI 0 "register_operand")
+       (unspec:SVE_HSDI
+         [(match_operand:<SVE_HSDI:VPRED> 1 "aarch64_predicate_operand")
+          (match_operand:SI 3 "aarch64_sve_gp_strictness")
+          (match_operand:SVE_PARTIAL_F 2 "register_operand")]
+         SVE_COND_FCVTI))]
+   "TARGET_SVE
+   && (~(<SVE_HSDI:self_mask> | <SVE_HSDI:narrower_mask>) & <SVE_PARTIAL_F:self_mask>) == 0"
+  {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
+     [ w        , Upl , 0 ; *              ] fcvtz<su>\t%0.<SVE_HSDI:Vetype>, %1/m, %2.<SVE_PARTIAL_F:Vetype>
+     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, %2\;fcvtz<su>\t%0.<SVE_HSDI:Vetype>, %1/m, %2.<SVE_PARTIAL_F:Vetype>
+  }
+)
+
+;; Predicated narrowing float-to-integer conversion.  The VNx2DF->VNx4SI
+;; variant is provided for the ACLE, where the zeroed odd-indexed lanes are
+;; significant.  The VNx2DF->VNx2SI variant is provided for auto-vectorization,
+;; where the upper 32 bits of each container are ignored.
+(define_insn "@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><SVE_SI:mode>"
+  [(set (match_operand:SVE_SI 0 "register_operand")
+       (unspec:SVE_SI
          [(match_operand:VNx2BI 1 "register_operand")
           (match_operand:SI 3 "aarch64_sve_gp_strictness")
           (match_operand:VNx2DF_ONLY 2 "register_operand")]
          SVE_COND_FCVTI))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
-     [ w        , Upl , 0 ; *              ] fcvtz<su>\t%0.<VNx4SI_ONLY:Vetype>, %1/m, %2.<VNx2DF_ONLY:Vetype>
-     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, %2\;fcvtz<su>\t%0.<VNx4SI_ONLY:Vetype>, %1/m, %2.<VNx2DF_ONLY:Vetype>
+     [ w        , Upl , 0 ; *              ] fcvtz<su>\t%0.<SVE_SI:Vetype>, %1/m, %2.<VNx2DF_ONLY:Vetype>
+     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, %2\;fcvtz<su>\t%0.<SVE_SI:Vetype>, %1/m, %2.<VNx2DF_ONLY:Vetype>
   }
 )
 
@@ -9710,18 +9750,19 @@
 ;; - UCVTF
 ;; -------------------------------------------------------------------------
 
-;; Unpredicated conversion of integers to floats of the same size
-;; (HI to HF, SI to SF or DI to DF).
-(define_expand "<optab><v_int_equiv><mode>2"
-  [(set (match_operand:SVE_FULL_F 0 "register_operand")
-       (unspec:SVE_FULL_F
+;; Unpredicated conversion of integers to floats of the same size or
+;; narrower.
+(define_expand "<optab><SVE_HSDI:mode><SVE_F:mode>2"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
          [(match_dup 2)
-          (const_int SVE_RELAXED_GP)
-          (match_operand:<V_INT_EQUIV> 1 "register_operand")]
+          (match_dup 3)
+          (match_operand:SVE_HSDI 1 "register_operand")]
          SVE_COND_ICVTF))]
-  "TARGET_SVE"
+  "TARGET_SVE
+   && (~(<SVE_HSDI:self_mask> | <SVE_HSDI:narrower_mask>) & <SVE_F:self_mask>) == 0"
   {
-    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
+    operands[2] = aarch64_sve_fp_pred (<SVE_HSDI:MODE>mode, &operands[3]);
   }
 )
 
@@ -9741,6 +9782,22 @@
   }
 )
 
+;; As above, for pairs that are used by the auto-vectorizer only.
+(define_insn "*aarch64_sve_<optab>_nonextend<SVE_HSDI:mode><SVE_PARTIAL_F:mode>"
+  [(set (match_operand:SVE_PARTIAL_F 0 "register_operand")
+       (unspec:SVE_PARTIAL_F
+         [(match_operand:<SVE_HSDI:VPRED> 1 "aarch64_predicate_operand")
+          (match_operand:SI 3 "aarch64_sve_gp_strictness")
+          (match_operand:SVE_HSDI 2 "register_operand")]
+         SVE_COND_ICVTF))]
+  "TARGET_SVE
+   && (~(<SVE_HSDI:self_mask> | <SVE_HSDI:narrower_mask>) & <SVE_PARTIAL_F:self_mask>) == 0"
+  {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
+     [ w        , Upl , 0 ; *              ] <su>cvtf\t%0.<SVE_PARTIAL_F:Vetype>, %1/m, %2.<SVE_HSDI:Vetype>
+     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, %2\;<su>cvtf\t%0.<SVE_PARTIAL_F:Vetype>, %1/m, %2.<SVE_HSDI:Vetype>
+  }
+)
+
 ;; Predicated widening integer-to-float conversion.
 (define_insn "@aarch64_sve_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>"
   [(set (match_operand:VNx2DF_ONLY 0 "register_operand")
@@ -9924,6 +9981,27 @@
   }
 )
 
+;; -------------------------------------------------------------------------
+;; ---- [FP<-FP] Truncating conversions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - FCVT
+;; -------------------------------------------------------------------------
+
+;; Unpredicated float-to-float truncation.
+(define_expand "trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>2"
+  [(set (match_operand:SVE_PARTIAL_HSF 0 "register_operand")
+       (unspec:SVE_PARTIAL_HSF
+         [(match_dup 2)
+          (match_dup 3)
+          (match_operand:SVE_SDF 1 "register_operand")]
+         SVE_COND_FCVT))]
+  "TARGET_SVE && (~<SVE_SDF:narrower_mask> & <SVE_PARTIAL_HSF:self_mask>) == 0"
+  {
+    operands[2] = aarch64_sve_fp_pred (<SVE_SDF:MODE>mode, &operands[3]);
+  }
+)
+
 ;; Predicated float-to-float truncation.
 (define_insn "@aarch64_sve_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>"
   [(set (match_operand:SVE_FULL_HSF 0 "register_operand")
@@ -9939,6 +10017,21 @@
   }
 )
 
+;; As above, for pairs that are used by the auto-vectorizer only.
+(define_insn "*aarch64_sve_<optab>_trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>"
+  [(set (match_operand:SVE_PARTIAL_HSF 0 "register_operand")
+       (unspec:SVE_PARTIAL_HSF
+         [(match_operand:<SVE_SDF:VPRED> 1 "aarch64_predicate_operand")
+          (match_operand:SI 3 "aarch64_sve_gp_strictness")
+          (match_operand:SVE_SDF 2 "register_operand")]
+         SVE_COND_FCVT))]
+  "TARGET_SVE && (~<SVE_SDF:narrower_mask> & <SVE_PARTIAL_HSF:self_mask>) == 0"
+  {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
+     [ w        , Upl , 0 ; *              ] fcvt\t%0.<SVE_PARTIAL_HSF:Vetype>, %1/m, %2.<SVE_SDF:Vetype>
+     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, %2\;fcvt\t%0.<SVE_PARTIAL_HSF:Vetype>, %1/m, %2.<SVE_SDF:Vetype>
+  }
+)
+
 ;; Predicated float-to-float truncation with merging.
 (define_expand "@cond_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>"
   [(set (match_operand:SVE_FULL_HSF 0 "register_operand")
@@ -10081,6 +10174,27 @@
   }
 )
 
+;; -------------------------------------------------------------------------
+;; ---- [FP<-FP] Extending conversions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - FCVT
+;; -------------------------------------------------------------------------
+
+;; Unpredicated float-to-float extension.
+(define_expand "extend<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>2"
+  [(set (match_operand:SVE_SDF 0 "register_operand")
+       (unspec:SVE_SDF
+         [(match_dup 2)
+          (match_dup 3)
+          (match_operand:SVE_PARTIAL_HSF 1 "register_operand")]
+         SVE_COND_FCVT))]
+  "TARGET_SVE && (~<SVE_SDF:narrower_mask> & <SVE_PARTIAL_HSF:self_mask>) == 0"
+  {
+    operands[2] = aarch64_sve_fp_pred (<SVE_SDF:MODE>mode, &operands[3]);
+  }
+)
+
 ;; Predicated float-to-float extension.
 (define_insn "@aarch64_sve_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>"
   [(set (match_operand:SVE_FULL_SDF 0 "register_operand")
@@ -10096,6 +10210,21 @@
   }
 )
 
+;; As above, for pairs that are used by the auto-vectorizer only.
+(define_insn "*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>"
+  [(set (match_operand:SVE_SDF 0 "register_operand")
+       (unspec:SVE_SDF
+         [(match_operand:<SVE_SDF:VPRED> 1 "aarch64_predicate_operand")
+          (match_operand:SI 3 "aarch64_sve_gp_strictness")
+          (match_operand:SVE_PARTIAL_HSF 2 "register_operand")]
+         SVE_COND_FCVT))]
+  "TARGET_SVE && (~<SVE_SDF:narrower_mask> & <SVE_PARTIAL_HSF:self_mask>) == 0"
+  {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
+     [ w        , Upl , 0 ; *              ] fcvt\t%0.<SVE_SDF:Vetype>, %1/m, %2.<SVE_PARTIAL_HSF:Vetype>
+     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, %2\;fcvt\t%0.<SVE_SDF:Vetype>, %1/m, %2.<SVE_PARTIAL_HSF:Vetype>
+  }
+)
+
 ;; Predicated float-to-float extension with merging.
 (define_expand "@cond_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>"
   [(set (match_operand:SVE_FULL_SDF 0 "register_operand")
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a65552a062f9..5540946eac71 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3860,6 +3860,44 @@ aarch64_sve_same_pred_for_ptest_p (rtx *pred1, rtx *pred2)
   return (ptrue1_p && ptrue2_p) || rtx_equal_p (pred1[0], pred2[0]);
 }
 
+
+/* Generate a predicate to control partial SVE mode DATA_MODE as if it
+   were fully packed, enabling the defined elements only.  */
+rtx
+aarch64_sve_packed_pred (machine_mode data_mode)
+{
+  unsigned int container_bytes
+    = aarch64_sve_container_bits (data_mode) / BITS_PER_UNIT;
+  /* Enable the significand of each container only.  */
+  rtx ptrue = force_reg (VNx16BImode, aarch64_ptrue_all (container_bytes));
+  /* Predicate at the element size.  */
+  machine_mode pmode
+    = aarch64_sve_pred_mode (GET_MODE_UNIT_SIZE (data_mode)).require ();
+  return gen_lowpart (pmode, ptrue);
+}
+
+/* Generate a predicate and strictness value to govern a floating-point
+   operation with SVE mode DATA_MODE.
+
+   If DATA_MODE is a partial vector mode, this pair prevents the operation
+   from interpreting undefined elements - unless we don't need to suppress
+   their trapping behavior.  */
+rtx
+aarch64_sve_fp_pred (machine_mode data_mode, rtx *strictness)
+{
+   unsigned int vec_flags = aarch64_classify_vector_mode (data_mode);
+   if (flag_trapping_math && (vec_flags & VEC_PARTIAL))
+     {
+       if (strictness)
+        *strictness = gen_int_mode (SVE_STRICT_GP, SImode);
+       return aarch64_sve_packed_pred (data_mode);
+     }
+   if (strictness)
+     *strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
+   /* Use the VPRED mode.  */
+   return aarch64_ptrue_reg (aarch64_sve_pred_mode (data_mode));
+}
+
 /* Emit a comparison CMP between OP0 and OP1, both of which have mode
    DATA_MODE, and return the result in a predicate of mode PRED_MODE.
    Use TARGET as the target register if nonnull and convenient.  */
@@ -23697,6 +23735,19 @@ aarch64_simd_shift_imm_p (rtx x, machine_mode mode, bool left)
     return IN_RANGE (INTVAL (x), 1, bit_width);
 }
 
+
+/* Check whether X can control SVE mode MODE.  */
+bool
+aarch64_sve_valid_pred_p (rtx x, machine_mode mode)
+{
+  machine_mode pred_mode = GET_MODE (x);
+  if (!aarch64_sve_pred_mode_p (pred_mode))
+    return false;
+
+  return known_ge (GET_MODE_NUNITS (pred_mode),
+                  GET_MODE_NUNITS (mode));
+}
+
 /* Return the bitmask CONST_INT to select the bits required by a zero extract
    operation of width WIDTH at bit position POS.  */
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d62953877517..2700392db5fa 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -541,6 +541,13 @@
 ;; elements.
 (define_mode_iterator SVE_FULL_HSF [VNx8HF VNx4SF])
 
+;; Partial SVE floating-point vector modes that have 16-bit or 32-bit
+;; elements.
+(define_mode_iterator SVE_PARTIAL_HSF [VNx2HF VNx4HF VNx2SF])
+
+;; SVE floating-point vector modes that have 16-bit or 32-bit elements.
+(define_mode_iterator SVE_HSF [SVE_PARTIAL_HSF SVE_FULL_HSF])
+
 ;; Fully-packed SVE integer vector modes that have 16-bit or 64-bit elements.
 (define_mode_iterator SVE_FULL_HDI [VNx8HI VNx2DI])
 
@@ -565,6 +572,9 @@
 (define_mode_iterator SVE_MATMULF [(VNx4SF "TARGET_SVE_F32MM")
                                   (VNx2DF "TARGET_SVE_F64MM")])
 
+;; SVE floating-point vector modes that have 32-bit or 64-bit elements.
+(define_mode_iterator SVE_SDF [VNx2SF SVE_FULL_SDF])
+
 ;; Fully-packed SVE vector modes that have 32-bit or smaller elements.
 (define_mode_iterator SVE_FULL_BHS [VNx16QI VNx8HI VNx4SI
                                    VNx8BF VNx8HF VNx4SF])
@@ -634,6 +644,9 @@
                                VNx4SI VNx2SI
                                VNx2DI])
 
+;; SVE integer vector modes with 32-bit elements.
+(define_mode_iterator SVE_SI [VNx2SI VNx4SI])
+
 (define_mode_iterator SVE_DIx24 [VNx4DI VNx8DI])
 
 ;; SVE modes with 2 or 4 elements.
@@ -649,6 +662,9 @@
 (define_mode_iterator SVE_2 [VNx2QI VNx2HI VNx2HF VNx2BF
                             VNx2SI VNx2SF VNx2DI VNx2DF])
 
+;; SVE SI and DI modes with 2 elements.
+(define_mode_iterator SVE_2SDI [VNx2SI VNx2DI])
+
 ;; SVE integer modes with 2 elements, excluding the widest element.
 (define_mode_iterator SVE_2BHSI [VNx2QI VNx2HI VNx2SI])
 
@@ -2596,19 +2612,22 @@
 (define_mode_attr data_bytes [(VNx16BI "1") (VNx8BI "2")
                              (VNx4BI "4") (VNx2BI "8")])
 
-;; Two-nybble mask for partial vector modes: nunits, byte size.
-(define_mode_attr self_mask [(VNx8QI "0x81")
-                            (VNx4QI "0x41")
-                            (VNx2QI "0x21")
-                            (VNx4HI "0x42")
-                            (VNx2HI "0x22")
-                            (VNx2SI "0x24")])
-
-;; For SVE_HSDI vector modes, the mask of narrower modes, encoded as above.
-(define_mode_attr narrower_mask [(VNx8HI "0x81") (VNx4HI "0x41")
-                                (VNx2HI "0x21")
-                                (VNx4SI "0x43") (VNx2SI "0x23")
-                                (VNx2DI "0x27")])
+;; Two-nybble mask for vector modes: nunits, byte size.
+(define_mode_attr self_mask [(VNx2HI "0x22") (VNx2HF "0x22")
+                            (VNx4HI "0x42") (VNx4HF "0x42")
+                            (VNx8HI "0x82") (VNx8HF "0x82")
+                            (VNx2SI "0x24") (VNx2SF "0x24")
+                            (VNx4SI "0x44") (VNx4SF "0x44")
+                            (VNx2DI "0x28") (VNx2DF "0x28")
+                            (VNx8QI "0x81") (VNx4QI "0x41") (VNx2QI "0x21")])
+
+;; The mask of narrower vector modes, encoded as above.
+(define_mode_attr narrower_mask [(VNx8HI "0x81") (VNx8HF "0x81")
+                                (VNx4HI "0x41") (VNx4HF "0x41")
+                                (VNx2HI "0x21") (VNx2HF "0x21")
+                                (VNx4SI "0x43") (VNx4SF "0x43")
+                                (VNx2SI "0x23") (VNx2SF "0x23")
+                                (VNx2DI "0x27") (VNx2DF "0x27")])
 
 ;; The constraint to use for an SVE [SU]DOT, FMUL, FMLA or FMLS lane index.
 (define_mode_attr sve_lane_con [(VNx8HI "y") (VNx4SI "y") (VNx2DI "x")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 2c6af831eae1..d8e9725a1b65 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -587,6 +587,11 @@
   return aarch64_simd_shift_imm_p (op, mode, false);
 })
 
+(define_special_predicate "aarch64_predicate_operand"
+  (and (match_code "reg,subreg")
+       (match_test "register_operand (op, GET_MODE (op))")
+       (match_test "aarch64_sve_valid_pred_p (op, mode)")))
+
 (define_predicate "aarch64_simd_imm_zero"
   (and (match_code "const,const_vector")
        (match_test "op == CONST0_RTX (GET_MODE (op))")))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_signed_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_signed_1.c
index 367fbd967a3e..5c76cbd88da7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_signed_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_signed_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c
index c5da480c9932..5e3881a895ec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pack_float_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pack_float_1.c
index 2683a87f4ff4..4810df88e407 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pack_float_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pack_float_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 void __attribute__ ((noinline, noclone))
 pack_float_plus_1point1 (float *d, double *s, int size)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c
index deb4cf5e940b..d1e74634ece8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize  --param aarch64-vect-compare-costs=0" } */
 
 void __attribute__ ((noinline, noclone))
 unpack_float_plus_7point9 (double *d, float *s, int size)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_1.c
new file mode 100644
index 000000000000..76baffa359b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_1.c
@@ -0,0 +1,217 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=2048 -fno-schedule-insns -fno-schedule-insns2" } */
+
+#include <stdint.h>
+
+typedef _Float16 v32hf __attribute__((vector_size(64)));
+typedef _Float16 v64hf __attribute__((vector_size(128)));
+
+typedef float v32sf __attribute__((vector_size(128)));
+typedef float v64sf __attribute__((vector_size(256)));
+
+typedef double v32df __attribute__((vector_size(256)));
+
+typedef int16_t v32hi __attribute__((vector_size(64)));
+typedef int16_t v64hi __attribute__((vector_size(128)));
+typedef uint16_t v32uhi __attribute__((vector_size(64)));
+typedef uint16_t v64uhi __attribute__((vector_size(128)));
+
+typedef int32_t v32si __attribute__((vector_size(128)));
+typedef int32_t v64si __attribute__((vector_size(256)));
+typedef uint32_t v32usi __attribute__((vector_size(128)));
+typedef uint32_t v64usi __attribute__((vector_size(256)));
+
+typedef int64_t v32di __attribute__((vector_size(256)));
+typedef uint64_t v32udi __attribute__((vector_size(256)));
+
+/*
+** float_2hf2hi:
+**     ...
+**     ld1h    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     scvtf   (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v32hf
+float_2hf2hi (v32hi x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** float_2hf2uhi:
+**     ...
+**     ld1h    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     ucvtf   (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v32hf
+float_2hf2uhi (v32uhi x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** float_2hf2si:
+**     ...
+**     ld1w    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     scvtf   (z[0-9]+)\.h, \2/m, \1\.s
+**     ...
+*/
+v32hf
+float_2hf2si (v32si x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** float_2hf2usi:
+**     ...
+**     ld1w    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     ucvtf   (z[0-9]+)\.h, \2/m, \1\.s
+**     ...
+*/
+v32hf
+float_2hf2usi (v32usi x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** float_2hf2di:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     scvtf   (z[0-9]+)\.h, \1/m, \2\.d
+**     ...
+*/
+v32hf
+float_2hf2di (v32di x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** float_2hf2udi:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     ucvtf   (z[0-9]+)\.h, \1/m, \2\.d
+**     ...
+*/
+v32hf
+float_2hf2udi (v32udi x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** float_4hf4hi:
+**     ...
+**     ld1h    (z[0-9]+)\.s, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.s, vl64
+**     scvtf   (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v64hf
+float_4hf4hi (v64hi x)
+{
+  return __builtin_convertvector (x, v64hf);
+}
+
+/*
+** float_4hf4uhi:
+**     ...
+**     ld1h    (z[0-9]+)\.s, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.s, vl64
+**     ucvtf   (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v64hf
+float_4hf4uhi (v64uhi x)
+{
+  return __builtin_convertvector (x, v64hf);
+}
+
+/*
+** float_4hf4si:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1w    (z[0-9]+)\.s, \1/z, \[x0\]
+**     scvtf   (z[0-9]+)\.h, \1/m, \2\.s
+**     ...
+*/
+v64hf
+float_4hf4si (v64si x)
+{
+  return __builtin_convertvector (x, v64hf);
+}
+
+/*
+** float_4hf4usi:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1w    (z[0-9]+)\.s, \1/z, \[x0\]
+**     ucvtf   (z[0-9]+)\.h, \1/m, \2\.s
+**     ...
+*/
+v64hf
+float_4hf4usi (v64usi x)
+{
+  return __builtin_convertvector (x, v64hf);
+}
+
+/*
+** float_2sf2si:
+**     ...
+**     ld1w    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     scvtf   (z[0-9]+)\.s, \2/m, \1\.s
+**     ...
+*/
+v32sf
+float_2sf2si (v32si x)
+{
+  return __builtin_convertvector (x, v32sf);
+}
+
+/*
+** float_2sf2usi:
+**     ...
+**     ld1w    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     ucvtf   (z[0-9]+)\.s, \2/m, \1\.s
+**     ...
+*/
+v32sf
+float_2sf2usi (v32usi x)
+{
+  return __builtin_convertvector (x, v32sf);
+}
+
+/*
+** float_2sf2di:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     scvtf   (z[0-9]+)\.s, \1/m, \2\.d
+**     ...
+*/
+v32sf
+float_2sf2di (v32di x)
+{
+  return __builtin_convertvector (x, v32sf);
+}
+
+/*
+** float_2sf2udi:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     ucvtf   (z[0-9]+)\.s, \1/m, \2\.d
+**     ...
+*/
+v32sf
+float_2sf2udi (v32udi x)
+{
+  return __builtin_convertvector (x, v32sf);
+}
+
+/* { dg-final { check-function-bodies "**" "" ""} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_2.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_2.c
new file mode 100644
index 000000000000..f578bcfbdef1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=2048 -fno-trapping-math" } */
+
+#include "unpacked_cvtf_1.c"
+
+/* { dg-final { scan-assembler-not {\tptrue\tp[0-7]\.d} } } */
+/* { dg-final { scan-assembler-not {\tptrue\tp[0-7]\.s} } } */
+/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.b} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.s\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_3.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_3.c
new file mode 100644
index 000000000000..6324bdd8db77
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+void f64_i32 (double *restrict x, int32_t  *restrict y, int n)
+{
+  for (int i = 0; i < n; i++)
+    x[i] = (double)y[i];
+}
+
+/* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.[sd], p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_1.c
new file mode 100644
index 000000000000..0babf1523849
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_1.c
@@ -0,0 +1,118 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=2048 -fno-schedule-insns -fno-schedule-insns2" } */
+
+typedef _Float16 v32hf __attribute__((vector_size(64)));
+typedef _Float16 v64hf __attribute__((vector_size(128)));
+
+typedef float v32sf __attribute__((vector_size(128)));
+typedef float v64sf __attribute__((vector_size(256)));
+
+typedef double v32df __attribute__((vector_size(256)));
+
+/*
+** trunc_2sf2df:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvt    (z[0-9]+)\.s, \1/m, \2\.d
+**     ...
+*/
+v32sf
+trunc_2sf2df (v32df x)
+{
+  return __builtin_convertvector (x, v32sf);
+}
+
+/*
+** trunc_2hf2df:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvt    (z[0-9]+)\.h, \1/m, \2\.d
+**     ...
+*/
+v32hf
+trunc_2hf2df (v32df x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** trunc_4hf4sf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1w    (z[0-9]+)\.s, \1/z, \[x0\]
+**     fcvt    (z[0-9]+)\.h, \1/m, \2\.s
+**     ...
+*/
+v64hf
+trunc_4hf4sf (v64sf x)
+{
+  return __builtin_convertvector (x, v64hf);
+}
+
+/*
+** trunc_2hf2sf:
+**     ...
+**     ld1w    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvt    (z[0-9]+)\.h, \2/m, \1\.s
+**     ...
+*/
+v32hf
+trunc_2hf2sf (v32sf x)
+{
+  return __builtin_convertvector (x, v32hf);
+}
+
+/*
+** extend_2df2hf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1h    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvt    (z[0-9]+)\.d, \1/m, \2\.h
+**     ...
+*/
+v32df
+extend_2df2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32df);
+}
+
+/*
+** extend_2df2sf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1w    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvt    (z[0-9]+)\.d, \1/m, \2\.s
+**     ...
+*/
+v32df
+extend_2df2sf (v32sf x)
+{
+  return __builtin_convertvector (x, v32df);
+}
+
+/*
+** extend_4sf4hf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1h    (z[0-9]+)\.s, \1/z, \[x0\]
+**     fcvt    (z[0-9]+)\.s, \1/m, \2\.h
+**     ...
+*/
+v64sf
+extend_4sf4hf (v64hf x)
+{
+  return __builtin_convertvector (x, v64sf);
+}
+
+/*
+** extend_2sf2hf:
+**     ...
+**     ld1h    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvt    (z[0-9]+)\.s, \2/m, \1\.h
+**     ...
+*/
+v32sf
+extend_2sf2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32sf);
+}
+
+/* { dg-final { check-function-bodies "**" "" ""} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_2.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_2.c
new file mode 100644
index 000000000000..8c369eec6aed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=2048 -fno-trapping-math" } */
+
+#include "unpacked_fcvt_1.c"
+
+/* { dg-final { scan-assembler-not {\tptrue\tp[0-7]\.d} } } */
+/* { dg-final { scan-assembler-not {\tptrue\tp[0-7]\.s} } } */
+/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.b} 8 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.s\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.h\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_1.c
new file mode 100644
index 000000000000..773a3dc961da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_1.c
@@ -0,0 +1,244 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=2048 -fno-schedule-insns -fno-schedule-insns2" } */
+
+#include <stdint.h>
+
+typedef _Float16 v32hf __attribute__((vector_size(64)));
+typedef _Float16 v64hf __attribute__((vector_size(128)));
+
+typedef float v32sf __attribute__((vector_size(128)));
+typedef float v64sf __attribute__((vector_size(256)));
+
+typedef double v32df __attribute__((vector_size(256)));
+
+typedef int16_t v32hi __attribute__((vector_size(64)));
+typedef int16_t v64hi __attribute__((vector_size(128)));
+typedef uint16_t v32uhi __attribute__((vector_size(64)));
+typedef uint16_t v64uhi __attribute__((vector_size(128)));
+
+typedef int32_t v32si __attribute__((vector_size(128)));
+typedef int32_t v64si __attribute__((vector_size(256)));
+typedef uint32_t v32usi __attribute__((vector_size(128)));
+typedef uint32_t v64usi __attribute__((vector_size(256)));
+
+typedef int64_t v32di __attribute__((vector_size(256)));
+typedef uint64_t v32udi __attribute__((vector_size(256)));
+
+
+/*
+** fix_trunc_2hi2hf:
+**     ...
+**     ld1h    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvtzs  (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v32hi
+fix_trunc_2hi2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32hi);
+}
+
+/*
+** fix_trunc_2uhi2hf:
+**     ...
+**     ld1h    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvtzu  (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v32uhi
+fix_trunc_2uhi2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32uhi);
+}
+
+/*
+** fix_trunc_2si2hf:
+**     ...
+**     ld1h    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvtzs  (z[0-9]+)\.s, \2/m, \1\.h
+**     ...
+*/
+v32si
+fix_trunc_2si2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32si);
+}
+
+/*
+** fix_trunc_2usi2hf:
+**     ...
+**     ld1h    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvtzu  (z[0-9]+)\.s, \2/m, \1\.h
+**     ...
+*/
+v32usi
+fix_trunc_2usi2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32usi);
+}
+
+/*
+** fix_trunc_2di2hf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1h    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvtzs  (z[0-9]+)\.d, \1/m, \2\.h
+**     ...
+*/
+v32di
+fix_trunc_2di2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32di);
+}
+
+/*
+** fix_trunc_2udi2hf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1h    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvtzu  (z[0-9]+)\.d, \1/m, \2\.h
+**     ...
+*/
+v32udi
+fix_trunc_2udi2hf (v32hf x)
+{
+  return __builtin_convertvector (x, v32udi);
+}
+
+/*
+** fix_trunc_4hi4hf:
+**     ...
+**     ld1h    (z[0-9]+)\.s, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.s, vl64
+**     fcvtzs  (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v64hi
+fix_trunc_4hi4hf (v64hf x)
+{
+  return __builtin_convertvector (x, v64hi);
+}
+
+/*
+** fix_trunc_4uhi4hf:
+**     ...
+**     ld1h    (z[0-9]+)\.s, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.s, vl64
+**     fcvtzu  (z[0-9]+)\.h, \2/m, \1\.h
+**     ...
+*/
+v64uhi
+fix_trunc_4uhi4hf (v64hf x)
+{
+  return __builtin_convertvector (x, v64uhi);
+}
+
+/*
+** fix_trunc_4si4hf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1h    (z[0-9]+)\.s, \1/z, \[x0\]
+**     fcvtzs  (z[0-9]+)\.s, \1/m, \2\.h
+**     ...
+*/
+v64si
+fix_trunc_4si4hf (v64hf x)
+{
+  return __builtin_convertvector (x, v64si);
+}
+
+/*
+** fix_trunc_4usi4hf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1h    (z[0-9]+)\.s, \1/z, \[x0\]
+**     fcvtzu  (z[0-9]+)\.s, \1/m, \2\.h
+**     ...
+*/
+v64usi
+fix_trunc_4usi4hf (v64hf x)
+{
+  return __builtin_convertvector (x, v64usi);
+}
+
+/*
+** fix_trunc_2si2sf:
+**     ...
+**     ld1w    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvtzs  (z[0-9]+)\.s, \2/m, \1\.s
+**     ...
+*/
+v32si
+fix_trunc_2si2sf (v32sf x)
+{
+  return __builtin_convertvector (x, v32si);
+}
+
+/*
+** fix_trunc_2usi2sf:
+**     ...
+**     ld1w    (z[0-9]+)\.d, p[0-7]/z, \[x0\]
+**     ptrue   (p[0-7])\.d, vl32
+**     fcvtzu  (z[0-9]+)\.s, \2/m, \1\.s
+**     ...
+*/
+v32usi
+fix_trunc_2usi2sf (v32sf x)
+{
+  return __builtin_convertvector (x, v32usi);
+}
+
+/*
+** fix_trunc_2di2sf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1w    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvtzs  (z[0-9]+)\.d, \1/m, \2\.s
+**     ...
+*/
+v32di
+fix_trunc_2di2sf (v32sf x)
+{
+  return __builtin_convertvector (x, v32di);
+}
+
+/*
+** fix_trunc_2udi2sf:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1w    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvtzu  (z[0-9]+)\.d, \1/m, \2\.s
+**     ...
+*/
+v32udi
+fix_trunc_2udi2sf (v32sf x)
+{
+  return __builtin_convertvector (x, v32udi);
+}
+
+/*
+** fix_trunc_2si2df:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvtzs  (z[0-9]+)\.s, \1/m, \2\.d
+**     ...
+*/
+v32si
+fix_trunc_2si2df (v32df x)
+{
+  return __builtin_convertvector (x, v32si);
+}
+
+/*
+** fix_trunc_2usi2df:
+**     ptrue   (p[0-7])\.b, vl256
+**     ld1d    (z[0-9]+)\.d, \1/z, \[x0\]
+**     fcvtzu  (z[0-9]+)\.s, \1/m, \2\.d
+**     ...
+*/
+v32usi
+fix_trunc_2usi2df (v32df x)
+{
+  return __builtin_convertvector (x, v32usi);
+}
+
+/* { dg-final { check-function-bodies "**" "" ""} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_2.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_2.c
new file mode 100644
index 000000000000..0587753d15e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=2048 -fno-trapping-math" } */
+
+#include "unpacked_fcvtz_1.c"
+
+/* { dg-final { scan-assembler-not {\tptrue\tp[0-7]\.d} } } */
+/* { dg-final { scan-assembler-not {\tptrue\tp[0-7]\.s} } } */
+/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.b} 16 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.h\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.h\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
