> -----Original Message-----
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Sent: Thursday, July 31, 2025 3:47 PM
> To: Jennifer Schmitz <jschm...@nvidia.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Andrew Pinski
> <pins...@gmail.com>; Richard Earnshaw <richard.earns...@arm.com>; Richard
> Sandiford <richard.sandif...@arm.com>; Tamar Christina
> <tamar.christ...@arm.com>; Alex Coplan <alex.cop...@arm.com>
> Subject: Re: [PATCH 3/3] AArch64: Enable dispatch scheduling for Neoverse V2.
> 
> 
> 
> > On 29 Jul 2025, at 17:14, Jennifer Schmitz <jschm...@nvidia.com> wrote:
> >
> > This patch adds dispatch constraints for Neoverse V2 and illustrates the
> > steps necessary to enable dispatch scheduling for an AArch64 core.
> >
> > The dispatch constraints are based on section 4.1 of the Neoverse V2 SWOG.
> > Please note that the values used here deviate slightly from the current SWOG
> > version but are based on correct numbers. Arm will do an official Neoverse V2
> > SWOG release with the updated values in due time.
> >
> > Here are the steps we took to implement the dispatch constraints for
> > Neoverse V2:
> > 1. We used instruction attributes to group instructions into dispatch
> >    groups, corresponding to operations that utilize a certain pipeline type.
> >    For that, we added a new attribute (neoversev2_dispatch) with values for
> >    the different dispatch groups. The values of neoversev2_dispatch are
> >    determined using expressions of other instruction attributes.
> >    For example, the SWOG describes a constraint of "Up to 4 uOPs utilizing
> >    the M pipelines". Thus, one of the values of neoversev2_dispatch is "m"
> >    and it groups instructions that use the M pipelines, such as integer
> >    multiplication.
> >    Note that we made some minor simplifications compared to the information
> >    in the SWOG, because the instruction annotation does not allow for a
> >    fully accurate mapping of instructions to utilized pipelines. To give
> >    one example, the instructions IRG and LDG are both tagged with "memtag",
> >    but IRG uses the M pipelines, while LDG uses the L pipelines.
> > 2. In the Neoverse V2 tuning model, we added an array of dispatch_constraint
> >   objects and referenced it in the tune_params. The new attribute
> >   neoversev2_dispatch provided a compact way to define the dispatch
> >   constraints.
> > 3. We enabled dispatch scheduling for Neoverse V2 by adding the
> >   AARCH64_EXTRA_TUNE_DISPATCH_SCHED tune flag.
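To illustrate steps 1 and 2 in miniature, here is a rough, self-contained model of the scheme. All names here are illustrative only: GCC's actual dispatch_constraint objects take an rtx_insn *, the grouping is done via the neoversev2_dispatch attribute rather than a lookup table, and real constraints (e.g. the combined B/S/M budget) are richer than this sketch.

```python
# Simplified model of per-cycle dispatch constraints (illustrative only).

# Step 1 analogue: map each instruction "type" to a dispatch group,
# standing in for the neoversev2_dispatch attribute.
DISPATCH_GROUP = {
    "mul": "m",     # integer multiply uses the M pipelines
    "fadd": "v",    # FP add uses the V pipelines
    "ldr": "l",     # loads use the L pipelines
    "add": "bsm",   # simple ALU ops can issue on the B/S or M pipelines
}

# Step 2 analogue: constraints as (name, per-cycle uop limit, uop-count
# function), mirroring SWOG rules such as "up to 4 uOPs utilizing the
# M pipelines".  The combined budgets of the real model are omitted.
CONSTRAINTS = [
    ("total", 16, lambda group: 1),
    ("m", 4, lambda group: 1 if group == "m" else 0),
    ("v", 4, lambda group: 1 if group == "v" else 0),
    ("l", 6, lambda group: 1 if group == "l" else 0),
]

def fits_dispatch_window(insn_types):
    """Return True if a candidate dispatch window violates no constraint."""
    for _name, limit, count in CONSTRAINTS:
        if sum(count(DISPATCH_GROUP[t]) for t in insn_types) > limit:
            return False
    return True
```

For example, four back-to-back multiplies fit the 4-uop M budget, while a fifth in the same window exceeds it; the scheduler's job is to interleave other work instead.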
> >
> > Performance evaluation on a Grace machine using SPEC2017 and GROMACS 2024:
> > We ran each benchmark 5 times compiled with trunk (commit a1fb757) and with
> > the patch series, and computed the speed-up of the median values per test
> > (i.e. values >1 mean that the patch series improves performance):
> >
> > SPEC2017 FP (-O3 -Wl,-z,muldefs -lm -fallow-argument-mismatch -fpermissive
> >     -fstack-arrays -flto=auto -Wl,--sort-section=name -march=native
> >     -mcpu=neoverse-v2 -std=gnu17):
> > Geom. mean of speed-ups         1.0006
> > blender                         1.0008
> > bwaves                          0.9996
> > cactuBSSN                       1.0007
> > fotonik3d                       1.0002
> > imagick                         0.9999
> > lbm                             1.0016
> > nab                             1.0012
> > namd                            1.0002
> > parest                          1.0004
> > povray                          1.0029
> > roms                            1.0000
> > wrf                             1.0003
> >
> > SPEC2017 INT (same as SPEC2017 FP):
> > Geom. mean of speed-ups         0.9994
> > deepsjeng                       0.9991
> > gcc                             1.0024
> > leela                           0.9985
> > mcf                             0.9985
> > exchange2                       1.0000
> > omnetpp                         1.0005
> > perlbench                       0.9975
> > x264                            1.0032
> > xalancbmk                       0.9916
> > xz                              1.0032
> >
> > GROMACS 2024 (-O3 -Wl,-z,muldefs -lm -flto=auto -Wl,--sort-section=name
> >      -march=native -mcpu=neoverse-v2)
> > Geom. mean of speed-ups:                     1.0024
> > 22vs23_cut_arm_neon_asimd_cpu_perf           1.0005
> > 22vs23_cut_arm_sve_cpu_perf                  1.0153
> > 22vs23_fsw_arm_neon_asimd_cpu_perf           1.0107
> > 22vs23_fsw_arm_sve_cpu_perf                  1.0156
> > 22vs23_ljpme-geom_arm_neon_asimd_cpu_perf    1.0081
> > 22vs23_ljpme-geom_arm_sve_cpu_perf           1.0024
> > 22vs23_ljpme-lb_arm_neon_asimd_cpu_perf      1.0068
> > 22vs23_ljpme-lb_arm_sve_cpu_perf             0.9957
> > 22vs23_psh_arm_neon_asimd_cpu_perf           0.9957
> > 22vs23_psh_arm_sve_cpu_perf                  0.9885
> > 22vs23_psw_arm_neon_asimd_cpu_perf           0.9983
> > 22vs23_psw_arm_sve_cpu_perf                  1.0024
> > 22vs23_rf_arm_neon_asimd_cpu_perf            0.9976
> > 22vs23_rf_arm_sve_cpu_perf                   0.9916
> >
> > The effect of the patch series on compile times was evaluated by
> > comparing the compile times of insn-emit-1.cc. Speed-up for the median
> > values of 5 repetitions: 1.0001
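For reference, the speed-up computation described above (median of 5 runs per configuration, then the geometric mean across tests) can be sketched as follows; the helper names are ours, not from the patch:

```python
from math import prod
from statistics import median

def speedup(base_runs, patched_runs):
    """Speed-up of the medians; these are runtimes, so >1 favors the patch."""
    return median(base_runs) / median(patched_runs)

def geomean(values):
    """Geometric mean of the per-test speed-ups."""
    return prod(values) ** (1.0 / len(values))
```

E.g. the per-test ratios in the tables above would be produced by speedup() and summarized by geomean().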
> >
> > Any help with further performance evaluation would be greatly appreciated.
> >
> > The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> 
> My thoughts on this:
> * From first principles it seems that scheduling for dispatch constraints is
> the sensible strategy for aggressive OoO CPUs.
> Trying to fill in gaps created by high-latency instructions, as per the
> traditional scheduling approach, is not useful as the hardware should handle
> it automatically.
> These CPUs are instead more sensitive to frontend limitations like dispatch.
> * The performance results here show that SPEC is not particularly sensitive
> to the scheduling approach. GROMACS looks a bit more interesting, with some
> subtests getting up to 1.5% better.
> GROMACS uses more explicit intrinsics-based vector code, which is different
> to how SPEC is written. If someone has access to Neoverse V2 hardware and
> non-SPEC-shaped workloads it’d be very interesting to get more data points
> on the performance.
> * The implementation of the relevant hooks and the CPU-specific code is
> nicely isolated in the new .cc and neoversev2.md files, so hopefully CPUs
> that won’t use this scheduling scheme shouldn’t need to care much about the
> code for it.
> 

FWIW it's on my backlog to look at as well. I don't in principle have any
objections; it's just a large thing to go through :)

I'll try to get to it soon!

Thanks,
Tamar

> Thanks,
> Kyrill
> 
> 
> >
> > Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md: Include neoversev2.md.
> > * config/aarch64/tuning_models/neoversev2.h: Enable dispatch
> > scheduling and add dispatch constraints.
> > * config/aarch64/neoversev2.md: New file; add instruction attribute
> > neoversev2_dispatch.
> > ---
> > gcc/config/aarch64/aarch64.md                 |   3 +
> > gcc/config/aarch64/neoversev2.md              | 192 ++++++++++++++++++
> > gcc/config/aarch64/tuning_models/neoversev2.h | 102 +++++++++-
> > 3 files changed, 294 insertions(+), 3 deletions(-)
> > create mode 100644 gcc/config/aarch64/neoversev2.md
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index fc9c819b864..bceaf40ae97 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -672,6 +672,9 @@
> > (include "tsv110.md")
> > (include "thunderx3t110.md")
> >
> > +;; Dispatch scheduling
> > +(include "neoversev2.md")
> > +
> > ;; -------------------------------------------------------------------
> > ;; Jumps and other miscellaneous insns
> > ;; -------------------------------------------------------------------
> > diff --git a/gcc/config/aarch64/neoversev2.md b/gcc/config/aarch64/neoversev2.md
> > new file mode 100644
> > index 00000000000..8dc9b098d09
> > --- /dev/null
> > +++ b/gcc/config/aarch64/neoversev2.md
> > @@ -0,0 +1,192 @@
> > +;; Instruction attribute for dispatch scheduling for Neoverse V2.
> > +;; Copyright The GNU Toolchain Authors.
> > +;;
> > +;; This file is part of GCC.
> > +;;
> > +;; GCC is free software; you can redistribute it and/or modify it
> > +;; under the terms of the GNU General Public License as published by
> > +;; the Free Software Foundation; either version 3, or (at your option)
> > +;; any later version.
> > +;;
> > +;; GCC is distributed in the hope that it will be useful, but
> > +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +;; General Public License for more details.
> > +;;
> > +;; You should have received a copy of the GNU General Public License
> > +;; along with GCC; see the file COPYING3.  If not see
> > +;; <http://www.gnu.org/licenses/>.
> > +
> > +;; Attribute that groups other instruction attributes into dispatch groups
> > +;; for Neoverse V2 cores.  Dispatch groups are groups of pipelines for which
> > +;; the SWOG specifies a dispatch constraint.  For example: Because the SWOG
> > +;; contains a dispatch constraint for the V02 pipelines, there is an attribute
> > +;; value "v02" that groups instructions that are processed by the V0 and V2
> > +;; pipelines.
> > +;; Values that contain a "_" represent combinations of dispatch groups.
> > +;; For example, there are dispatch constraints for the M0 and V pipelines.
> > +;; The value "m0_v" groups instructions that utilize the M0 as well as the
> > +;; V pipelines, such that both dispatch constraints apply.
> > +
> > +(define_attr "neoversev2_dispatch"
> > +  "none,bs01,bsm,m,m0,v02,v13,v,l01,l,bsm_l,m_l,m0_v,v_v13,v_l,\
> > +   l01_d,l01_v"
> > +  (cond [(eq_attr "type" "branch,call")
> > + (const_string "bs01")
> > + (ior
> > +   (eq_attr "type" "adc_reg,alu_ext,alu_imm,alu_sreg,alus_ext,\
> > +    alus_imm,alus_sreg,clz,csel,logic_imm,logic_reg,logics_imm,\
> > +    logics_reg,mov_imm,rbit,rev,shift_reg")
> > +   (eq_attr "sve_type" "sve_pred_cnt_scalar"))
> > + (const_string "bsm")
> > + (ior
> > +   (eq_attr "type" "alu_ext,alus_ext,bfm,bfx,mul,rotate_imm,\
> > +    smull,umull")
> > +   (eq_attr "autodetect_type" "alu_shift_asr_op2,alu_shift_lsl_op2,\
> > +    alu_shift_lsr_op2")
> > +   (eq_attr "sve_type" "sve_pred_cnt_ctrl,sve_pred_misc"))
> > + (const_string "m")
> > + (ior
> > +   (eq_attr "type" "crc,f_cvti2f,mla,neon_from_gp,neon_from_gp_q,\
> > +    sdiv,smlal,udiv,umlal")
> > +   (eq_attr "sve_type" "sve_ffr,sve_pred_logical"))
> > + (const_string "m0")
> > + (ior
> > +   (eq_attr "type"
> > +    "crypto_sha256_slow,crypto_sha3,crypto_sha512,crypto_sm3,\
> > +     crypto_sm4,f_rintd,f_rints,fccmpd,fccmps,fcmpd,fcmps,fdivd,\
> > +     fdivs,fsqrtd,fsqrts,neon_fp_cvt_narrow_d_q,\
> > +     neon_fp_cvt_narrow_s_q,neon_fp_cvt_widen_h,neon_fp_cvt_widen_s,\
> > +     neon_fp_div_d,neon_fp_div_d_q,neon_fp_div_s,neon_fp_div_s_q,\
> > +     neon_fp_recpe_d,neon_fp_recpe_d_q,neon_fp_recpe_s,\
> > +     neon_fp_recpe_s_q,neon_fp_recps_d,neon_fp_recps_d_q,\
> > +     neon_fp_recps_s,neon_fp_recps_s_q,neon_fp_recpx_d,\
> > +     neon_fp_recpx_d_q,neon_fp_recpx_s,neon_fp_recpx_s_q,\
> > +     neon_fp_round_d,neon_fp_round_d_q,neon_fp_round_s,\
> > +     neon_fp_round_s_q,neon_fp_rsqrte_d,neon_fp_rsqrte_d_q,\
> > +     neon_fp_rsqrte_s,neon_fp_rsqrte_s_q,neon_fp_rsqrts_d,\
> > +     neon_fp_rsqrts_d_q,neon_fp_rsqrts_s,neon_fp_rsqrts_s_q,\
> > +     neon_fp_sqrt_d,neon_fp_sqrt_d_q,neon_fp_sqrt_s,\
> > +     neon_fp_sqrt_s_q,neon_fp_to_int_d,neon_fp_to_int_d_q,\
> > +     neon_fp_to_int_s,neon_fp_to_int_s_q,neon_int_to_fp_d,\
> > +     neon_int_to_fp_d_q,neon_int_to_fp_s,neon_int_to_fp_s_q,\
> > +     neon_mla_b,neon_mla_b_q,neon_mla_h,neon_mla_h_q,\
> > +     neon_mla_s,neon_mla_s_q,neon_mla_b_long,neon_mla_h_long,\
> > +     neon_mla_h_scalar,neon_mla_h_scalar_q,neon_mla_s_long,\
> > +     neon_mla_s_scalar,neon_mla_s_scalar_q,neon_mla_h_scalar_long,\
> > +     neon_mla_s_scalar_long,neon_mul_b,neon_mul_b_q,\
> > +     neon_mul_d_long,neon_mul_h,neon_mul_h_q,neon_mul_h_long,\
> > +     neon_mul_h_scalar,neon_mul_h_scalar_q,neon_mul_h_scalar_long,\
> > +     neon_mul_s,neon_mul_s_q,neon_mul_s_long,neon_mul_s_scalar,\
> > +     neon_mul_s_scalar_q,neon_mul_s_scalar_long,neon_sat_mla_b_long,\
> > +     neon_sat_mla_h_long,neon_sat_mla_h_scalar_long,\
> > +     neon_sat_mla_s_long,neon_sat_mla_s_scalar_long,\
> > +     neon_sat_mul_b,neon_sat_mul_b_q,neon_sat_mul_b_long,\
> > +     neon_sat_mul_h,neon_sat_mul_h_q,neon_sat_mul_h_long,\
> > +     neon_sat_mul_h_scalar,neon_sat_mul_h_scalar_q,\
> > +     neon_sat_mul_h_scalar_long,neon_sat_mul_s,neon_sat_mul_s_q,\
> > +     neon_sat_mul_s_long,neon_sat_mul_s_scalar,\
> > +     neon_sat_mul_s_scalar_q,neon_sat_mul_s_scalar_long")
> > +   (eq_attr "sve_type"
> > +    "sve_crypto_sha3,sve_fp_cmp,sve_fp_cvt,sve_fp_div,sve_fp_log,\
> > +     sve_fp_sqrt,sve_int_cvt,sve_int_div,sve_int_dot,sve_int_index,\
> > +     sve_int_mul,sve_int_recip_est"))
> > + (const_string "v02")
> > + (ior
> > +   (eq_attr "type"
> > +    "neon_arith_acc,neon_arith_acc_q,neon_reduc_add,\
> > +     neon_reduc_add_long,neon_reduc_add_q,neon_reduc_minmax,\
> > +     neon_reduc_minmax_q,neon_sat_shift_imm,\
> > +     neon_sat_shift_imm_narrow_q,neon_sat_shift_imm_q,\
> > +     neon_sat_shift_reg,neon_sat_shift_reg_q,neon_shift_acc,\
> > +     neon_shift_acc_q,neon_shift_imm,neon_shift_imm_long,\
> > +     neon_shift_imm_narrow_q,neon_shift_imm_q,neon_shift_reg,\
> > +     neon_shift_reg_q")
> > +   (eq_attr "sve_type"
> > +    "sve_fp_assoc_add,sve_fp_exp,sve_int_accum,sve_int_bit_perm,\
> > +     sve_int_extend,sve_int_extract,sve_int_shift"))
> > + (const_string "v13")
> > + (ior
> > +   (eq_attr "type" "crypto_pmull,f_cvt,f_cvtf2i,f_minmaxd,f_minmaxs,\
> > +    faddd,fadds,fconstd,fconsts,fcsel,ffarithd,ffariths,fmacd,fmacs,\
> > +    fmov,fmuld,fmuls,f_mcr,f_mrc,neon_abd,\
> > +    neon_abd_long,neon_abd_q,neon_abs,neon_abs_q,neon_add,\
> > +    neon_add_halve,neon_add_halve_narrow_q,neon_add_halve_q,\
> > +    neon_add_long,neon_add_q,neon_add_widen,neon_bsl,neon_bsl_q,\
> > +    neon_cls,neon_cls_q,neon_cnt,neon_cnt_q,neon_compare,\
> > +    neon_compare_q,neon_compare_zero,neon_compare_zero_q,\
> > +    neon_dup,neon_dup_q,neon_ext,neon_ext_q,neon_fcadd,neon_fcmla,\
> > +    neon_fp_abd_d,neon_fp_abd_d_q,neon_fp_abd_s,neon_fp_abd_s_q,\
> > +    neon_fp_abs_d,neon_fp_abs_d_q,neon_fp_abs_s,neon_fp_abs_s_q,\
> > +    neon_fp_addsub_d,neon_fp_addsub_d_q,neon_fp_addsub_s,\
> > +    neon_fp_addsub_s_q,neon_fp_compare_d,neon_fp_compare_d_q,\
> > +    neon_fp_compare_s,neon_fp_compare_s_q,neon_fp_minmax_d,\
> > +    neon_fp_minmax_d_q,neon_fp_minmax_s,neon_fp_minmax_s_q,\
> > +    neon_fp_mla_d,neon_fp_mla_d_q,neon_fp_mla_d_scalar_q,\
> > +    neon_fp_mla_s,neon_fp_mla_s_q,neon_fp_mla_s_scalar,\
> > +    neon_fp_mla_s_scalar_q,neon_fp_mul_d,neon_fp_mul_d_q,\
> > +    neon_fp_mul_d_scalar_q,neon_fp_mul_s,neon_fp_mul_s_q,\
> > +    neon_fp_mul_s_scalar,neon_fp_mul_s_scalar_q,neon_fp_neg_d,\
> > +    neon_fp_neg_d_q,neon_fp_neg_s,neon_fp_neg_s_q,neon_fp_reduc_add_d,\
> > +    neon_fp_reduc_add_d_q,neon_fp_reduc_add_s,neon_fp_reduc_add_s_q,\
> > +    neon_fp_reduc_minmax_d,neon_fp_reduc_minmax_d_q,\
> > +    neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_s_q,neon_logic,\
> > +    neon_logic_q,neon_minmax,neon_minmax_q,neon_move,\
> > +    neon_move_narrow_q,neon_move_q,neon_neg,neon_neg_q,neon_permute,\
> > +    neon_permute_q,neon_qabs,neon_qabs_q,neon_qadd,neon_qadd_q,\
> > +    neon_qneg,neon_qneg_q,neon_qsub,neon_qsub_q,neon_rbit,\
> > +    neon_rbit_q,neon_rev,neon_rev_q,neon_sub,neon_sub_halve,\
> > +    neon_sub_halve_narrow_q,neon_sub_halve_q,neon_sub_long,\
> > +    neon_sub_q,neon_sub_widen,neon_tbl1,neon_tbl1_q,neon_tbl2,\
> > +    neon_tbl2_q,neon_tbl3,neon_tbl3_q,neon_tbl4,neon_tbl4_q,\
> > +    neon_to_gp,neon_to_gp_q,neon_tst,neon_tst_q,neon_zip,\
> > +    neon_zip_q")
> > +   (eq_attr "sve_type" "sve_fp_arith,sve_fp_misc,sve_fp_mul,\
> > +    sve_fp_reduc,sve_int_general,sve_int_pmul"))
> > + (const_string "v")
> > + (eq_attr "sve_type" "sve_store_pred")
> > + (const_string "l01")
> > + (ior
> > +   (eq_attr "type" "neon_ldp,neon_ldp_q,neon_load1_1reg,\
> > +    neon_load1_1reg_q,neon_load1_2reg,neon_load1_2reg_q,\
> > +    neon_load1_3reg,neon_load1_3reg_q,neon_load1_4reg,\
> > +    neon_load1_4reg_q")
> > +   (eq_attr "sve_type" "sve_load_1reg"))
> > + (const_string "l")
> > + (eq_attr "type" "f_loadd,f_loads")
> > + (const_string "bsm_l")
> > + (eq_attr "sve_type" "sve_load_pred")
> > + (const_string "m_l")
> > + (ior
> > +   (eq_attr "type" "neon_ins,neon_ins_q")
> > +   (eq_attr "sve_type" "sve_int_cmp_set,sve_int_match,sve_pred_vec"))
> > + (const_string "m0_v")
> > + (eq_attr "sve_type" "sve_int_reduc")
> > + (const_string "v_v13")
> > + (ior
> > +   (eq_attr "type" "neon_load1_all_lanes,neon_load1_one_lane,\
> > +    neon_load1_one_lane_q,neon_load2_2reg,neon_load2_2reg_q,\
> > +    neon_load2_all_lanes,neon_load2_all_lanes_q,neon_load2_one_lane,\
> > +    neon_load3_3reg,neon_load3_3reg_q,neon_load3_all_lanes,\
> > +    neon_load3_all_lanes_q,neon_load3_one_lane,neon_load4_4reg,\
> > +    neon_load4_4reg_q,neon_load4_all_lanes,neon_load4_all_lanes_q,\
> > +    neon_load4_one_lane")
> > +   (eq_attr "sve_type" "sve_gatherload_32,sve_gatherload_64,\
> > +    sve_load_2reg,sve_load_3reg,sve_load_4reg"))
> > + (const_string "v_l")
> > + (eq_attr "type" "load_16,load_4,load_8,store_16,store_4,store_8")
> > + (const_string "l01_d")
> > + (ior
> > +   (eq_attr "type" "f_stored,f_stores,neon_stp,neon_stp_q,\
> > +    neon_store1_1reg,neon_store1_1reg_q,neon_store1_2reg,\
> > +    neon_store1_2reg_q,neon_store1_3reg,neon_store1_3reg_q,\
> > +    neon_store1_4reg,neon_store1_4reg_q,neon_store1_one_lane,\
> > +    neon_store1_one_lane_q,neon_store2_2reg,neon_store2_2reg_q,\
> > +    neon_store2_one_lane,neon_store2_one_lane_q,neon_store3_3reg,\
> > +    neon_store3_3reg_q,neon_store3_one_lane,neon_store3_one_lane_q,\
> > +    neon_store4_4reg,neon_store4_4reg_q,neon_store4_one_lane,\
> > +    neon_store4_one_lane_q")
> > +   (eq_attr "sve_type" "sve_scatterstore_32,sve_scatterstore_64,\
> > +    sve_store_1reg,sve_store_2reg,sve_store_3reg,sve_store_4reg"))
> > + (const_string "l01_v")]
> > + (const_string "none")))
> > \ No newline at end of file
> > diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
> > index faf06d8e7ed..c3749d0c194 100644
> > --- a/gcc/config/aarch64/tuning_models/neoversev2.h
> > +++ b/gcc/config/aarch64/tuning_models/neoversev2.h
> > @@ -21,6 +21,7 @@
> > #define GCC_AARCH64_H_NEOVERSEV2
> >
> > #include "generic.h"
> > +#include "../aarch64-sched-dispatch.h"
> >
> > static const struct cpu_regmove_cost neoversev2_regmove_cost =
> > {
> > @@ -188,6 +189,100 @@ static const struct cpu_vector_cost neoversev2_vector_cost =
> >   &neoversev2_vec_issue_info /* issue_info  */
> > };
> >
> > +/* Neoverse V2 dispatch constraints for instruction scheduling.  */
> > +static const dispatch_constraint neoversev2_dispatch_constraints[] = {
> > +  dispatch_constraint ("total", 16, [](rtx_insn *)
> > +    {
> > +      return 1;
> > +    }),
> > +  dispatch_constraint ("b_s01", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01);
> > +    }),
> > +  dispatch_constraint ("m0", 2, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M0
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> > +    }),
> > +  dispatch_constraint ("m", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M0
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M_L
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> > +    }),
> > +  dispatch_constraint ("b_s_m", 8, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_BSM
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M0
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_BSM_L
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M_L
> > +   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> > +    }),
> > +  dispatch_constraint ("v02", 2, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V02);
> > +    }),
> > +  dispatch_constraint ("v13", 2, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V13);
> > +    }),
> > +  dispatch_constraint ("v", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      switch (dispatch_group) {
> > + case NEOVERSEV2_DISPATCH_V02:
> > + case NEOVERSEV2_DISPATCH_V13:
> > + case NEOVERSEV2_DISPATCH_V:
> > + case NEOVERSEV2_DISPATCH_M0_V:
> > + case NEOVERSEV2_DISPATCH_V_L:
> > + case NEOVERSEV2_DISPATCH_L01_V:
> > +  return 1;
> > + case NEOVERSEV2_DISPATCH_V_V13:
> > +  return 2;
> > + default:
> > +  return 0;
> > +      }
> > +    }),
> > +  dispatch_constraint ("l01_d", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      switch (dispatch_group) {
> > + case NEOVERSEV2_DISPATCH_L01_V:
> > + case NEOVERSEV2_DISPATCH_L01:
> > +  return 1;
> > + case NEOVERSEV2_DISPATCH_L01_D:
> > +  return 2;
> > + default:
> > +  return 0;
> > +      }
> > +    }),
> > +  dispatch_constraint ("l", 6, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      switch (dispatch_group) {
> > + case NEOVERSEV2_DISPATCH_L:
> > + case NEOVERSEV2_DISPATCH_BSM_L:
> > + case NEOVERSEV2_DISPATCH_M_L:
> > + case NEOVERSEV2_DISPATCH_V_L:
> > + case NEOVERSEV2_DISPATCH_L01_V:
> > +  return 1;
> > + case NEOVERSEV2_DISPATCH_L01_D:
> > +  return 2;
> > + default:
> > +  return 0;
> > +      }
> > +    })
> > +};
> > +
> > static const struct tune_params neoversev2_tunings =
> > {
> >   &cortexa76_extra_costs,
> > @@ -221,12 +316,13 @@ static const struct tune_params neoversev2_tunings =
> >    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> >    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> >    | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW
> > -   | AARCH64_EXTRA_TUNE_AVOID_LDAPUR), /* tune_flags.  */
> > +   | AARCH64_EXTRA_TUNE_AVOID_LDAPUR
> > +   | AARCH64_EXTRA_TUNE_DISPATCH_SCHED), /* tune_flags.  */
> >   &generic_armv9a_prefetch_tune,
> >   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> >   AARCH64_LDP_STP_POLICY_ALWAYS,   /* stp_policy_model.  */
> > -  nullptr, /* dispatch_constraints.  */
> > -  0 /* num_dispatch_constraints.  */
> > +  neoversev2_dispatch_constraints,  /* dispatch_constraints.  */
> > +  ARRAY_SIZE (neoversev2_dispatch_constraints)  /* num_dispatch_constraints.  */
> > };
> >
> > #endif /* GCC_AARCH64_H_NEOVERSEV2.  */
> > --
> > 2.34.1
