> -----Original Message-----
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Sent: Thursday, July 31, 2025 3:47 PM
> To: Jennifer Schmitz <jschm...@nvidia.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Andrew Pinski <pins...@gmail.com>;
> Richard Earnshaw <richard.earns...@arm.com>; Richard Sandiford
> <richard.sandif...@arm.com>; Tamar Christina <tamar.christ...@arm.com>;
> Alex Coplan <alex.cop...@arm.com>
> Subject: Re: [PATCH 3/3] AArch64: Enable dispatch scheduling for Neoverse V2.
>
>
> > On 29 Jul 2025, at 17:14, Jennifer Schmitz <jschm...@nvidia.com> wrote:
> >
> > This patch adds dispatch constraints for Neoverse V2 and illustrates the
> > steps necessary to enable dispatch scheduling for an AArch64 core.
> >
> > The dispatch constraints are based on section 4.1 of the Neoverse V2
> > SWOG. Please note that the values used here deviate slightly from the
> > current SWOG version but are based on the correct numbers. Arm will do
> > an official Neoverse V2 SWOG release with the updated values in due
> > time.
> >
> > These are the steps we took to implement the dispatch constraints for
> > Neoverse V2:
> > 1. We used instruction attributes to group instructions into dispatch
> >    groups, corresponding to operations that utilize a certain pipeline
> >    type. For that, we added a new attribute (neoversev2_dispatch) with
> >    values for the different dispatch groups. The values of
> >    neoversev2_dispatch are determined using expressions over other
> >    instruction attributes.
> >    For example, the SWOG describes a constraint of "Up to 4 uOPs
> >    utilizing the M pipelines". Thus, one of the values of
> >    neoversev2_dispatch is "m", and it groups instructions that use the
> >    M pipelines, such as integer multiplication.
> >    Note that we made some minor simplifications compared to the
> >    information in the SWOG, because the instruction annotation does not
> >    allow for a fully accurate mapping of instructions to utilized
> >    pipelines. To give one example, the instructions IRG and LDG are
> >    both tagged with "memtag", but IRG uses the M pipelines, while LDG
> >    uses the L pipelines.
> > 2. In the Neoverse V2 tuning model, we added an array of
> >    dispatch_constraint objects and referenced it in the tune_params.
> >    The new attribute neoversev2_dispatch provided a compact way to
> >    define the dispatch constraints.
> > 3. We enabled dispatch scheduling for Neoverse V2 by adding the
> >    AARCH64_EXTRA_TUNE_DISPATCH_SCHED tune flag.
> >
> > Performance evaluation on a Grace machine using SPEC2017 and GROMACS
> > 2024: we ran each benchmark 5 times, compiled with trunk (commit
> > a1fb757) and with the patch series, and computed the speed-up of the
> > median values per test (i.e. values >1 mean that the patch series
> > improves performance):
> >
> > SPEC2017 FP (-O3 -Wl,-z,muldefs -lm -fallow-argument-mismatch
> > -fpermissive -fstack-arrays -flto=auto -Wl,--sort-section=name
> > -march=native -mcpu=neoverse-v2 -std=gnu17):
> > Geom. mean of speed-ups  1.0006
> > blender    1.0008
> > bwaves     0.9996
> > cactuBSSN  1.0007
> > fotonik3d  1.0002
> > imagick    0.9999
> > lbm        1.0016
> > nab        1.0012
> > namd       1.0002
> > parest     1.0004
> > povray     1.0029
> > roms       1.0000
> > wrf        1.0003
> >
> > SPEC2017 INT (same flags as SPEC2017 FP):
> > Geom. mean of speed-ups  0.9994
> > deepsjeng  0.9991
> > gcc        1.0024
> > leela      0.9985
> > mcf        0.9985
> > exchange2  1.0000
> > omnetpp    1.0005
> > perlbench  0.9975
> > x264       1.0032
> > xalancbmk  0.9916
> > xz         1.0032
> >
> > GROMACS 2024 (-O3 -Wl,-z,muldefs -lm -flto=auto -Wl,--sort-section=name
> > -march=native -mcpu=neoverse-v2):
> > Geom. mean of speed-ups                    1.0024
> > 22vs23_cut_arm_neon_asimd_cpu_perf         1.0005
> > 22vs23_cut_arm_sve_cpu_perf                1.0153
> > 22vs23_fsw_arm_neon_asimd_cpu_perf         1.0107
> > 22vs23_fsw_arm_sve_cpu_perf                1.0156
> > 22vs23_ljpme-geom_arm_neon_asimd_cpu_perf  1.0081
> > 22vs23_ljpme-geom_arm_sve_cpu_perf         1.0024
> > 22vs23_ljpme-lb_arm_neon_asimd_cpu_perf    1.0068
> > 22vs23_ljpme-lb_arm_sve_cpu_perf           0.9957
> > 22vs23_psh_arm_neon_asimd_cpu_perf         0.9957
> > 22vs23_psh_arm_sve_cpu_perf                0.9885
> > 22vs23_psw_arm_neon_asimd_cpu_perf         0.9983
> > 22vs23_psw_arm_sve_cpu_perf                1.0024
> > 22vs23_rf_arm_neon_asimd_cpu_perf          0.9976
> > 22vs23_rf_arm_sve_cpu_perf                 0.9916
> >
> > The effect of the patch series on compile times was evaluated by
> > comparing the compile times of insn-emit-1.cc; the speed-up of the
> > median values of 5 repetitions was 1.0001.
> >
> > Any help with further performance evaluation would be greatly
> > appreciated.
> >
> > The patch was bootstrapped and tested on aarch64-linux-gnu, no
> > regression.
>
> My thoughts on this:
> * From first principles it seems that scheduling for dispatch constraints
>   is the sensible strategy for aggressive OoO CPUs. Trying to fill gaps
>   created by high-latency instructions, as in the traditional scheduling
>   approach, is not useful, as the hardware should handle that
>   automatically. These CPUs are instead more sensitive to frontend
>   limitations like dispatch.
> * The performance results here show that SPEC is not particularly
>   sensitive to the scheduling approach. GROMACS looks a bit more
>   interesting, with some subtests getting up to 1.5% better. GROMACS uses
>   more explicit intrinsics-based vector code, which is different from how
>   SPEC is written. If someone has access to Neoverse V2 hardware and
>   non-SPEC-shaped workloads, it'd be very interesting to get more data
>   points on the performance.
> * The implementation of the relevant hooks and the CPU-specific logic is
>   nicely isolated in the new .cc and neoversev2.md files, so hopefully
>   CPUs that won't use this scheduling scheme shouldn't need to care much
>   about the code for it.
>
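A note for readers following the series: as I understand the design, each
dispatch_constraint in the patch pairs a per-cycle uop limit with a
callback that returns how many uops a given insn contributes towards that
constraint, so the scheduler only needs one counter per constraint for the
current dispatch cycle. A rough standalone sketch of the bookkeeping this
implies (the names constraint, window_tracker, fits_p and note_issue are
invented for illustration and are not the interface from patch 1/3; the
callback is keyed on a toy group id instead of an rtx_insn *):

  #include <functional>
  #include <utility>
  #include <vector>

  struct constraint
  {
    const char *name;              /* For dumps only.  */
    int limit;                     /* Max uops per dispatch cycle.  */
    std::function<int (int)> uops; /* Uops an insn contributes.  */
  };

  class window_tracker
  {
    std::vector<constraint> m_cs;
    std::vector<int> m_used;       /* One counter per constraint.  */

  public:
    window_tracker (std::vector<constraint> cs)
      : m_cs (std::move (cs)), m_used (m_cs.size (), 0) {}

    /* True iff INSN still fits in the current dispatch cycle, i.e.
       issuing it would leave every counter within its limit.  */
    bool fits_p (int insn) const
    {
      for (size_t i = 0; i < m_cs.size (); i++)
        if (m_used[i] + m_cs[i].uops (insn) > m_cs[i].limit)
          return false;
      return true;
    }

    /* Account for INSN, starting a fresh cycle first if it no longer
       fits in the current one.  */
    void note_issue (int insn)
    {
      if (!fits_p (insn))
        m_used.assign (m_used.size (), 0);
      for (size_t i = 0; i < m_cs.size (); i++)
        m_used[i] += m_cs[i].uops (insn);
    }
  };

With the Neoverse V2 table, the "total" entry returns 1 for every insn, so
its counter simply caps the window at 16 uops per cycle, while the
narrower entries ("m0", "v02", ...) bound the subsets of uops that compete
for the same pipelines.
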
FWIW, it's on my backlog to look at as well. I don't in principle have any
objections; it's just a large thing to go through :) I'll try to get to it
soon!

Thanks,
Tamar

> Thanks,
> Kyrill
> >
> >
> > Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64.md: Include neoversev2.md.
> > 	* config/aarch64/tuning_models/neoversev2.h: Enable dispatch
> > 	scheduling and add dispatch constraints.
> > 	* config/aarch64/neoversev2.md: New file with the new instruction
> > 	attribute neoversev2_dispatch.
> > ---
> >  gcc/config/aarch64/aarch64.md                 |   3 +
> >  gcc/config/aarch64/neoversev2.md              | 192 ++++++++++++++++++
> >  gcc/config/aarch64/tuning_models/neoversev2.h | 102 +++++++++-
> >  3 files changed, 294 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/config/aarch64/neoversev2.md
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index fc9c819b864..bceaf40ae97 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -672,6 +672,9 @@
> >  (include "tsv110.md")
> >  (include "thunderx3t110.md")
> >
> > +;; Dispatch scheduling
> > +(include "neoversev2.md")
> > +
> >  ;; -------------------------------------------------------------------
> >  ;; Jumps and other miscellaneous insns
> >  ;; -------------------------------------------------------------------
> > diff --git a/gcc/config/aarch64/neoversev2.md b/gcc/config/aarch64/neoversev2.md
> > new file mode 100644
> > index 00000000000..8dc9b098d09
> > --- /dev/null
> > +++ b/gcc/config/aarch64/neoversev2.md
> > @@ -0,0 +1,192 @@
> > +;; Instruction attribute for dispatch scheduling for Neoverse V2.
> > +;; Copyright The GNU Toolchain Authors.
> > +;;
> > +;; This file is part of GCC.
> > +;;
> > +;; GCC is free software; you can redistribute it and/or modify it
> > +;; under the terms of the GNU General Public License as published by
> > +;; the Free Software Foundation; either version 3, or (at your option)
> > +;; any later version.
> > +;;
> > +;; GCC is distributed in the hope that it will be useful, but
> > +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +;; General Public License for more details.
> > +;;
> > +;; You should have received a copy of the GNU General Public License
> > +;; along with GCC; see the file COPYING3.  If not see
> > +;; <http://www.gnu.org/licenses/>.
> > +
> > +;; Attribute that groups other instruction attributes into dispatch
> > +;; groups for Neoverse V2 cores.  Dispatch groups are groups of
> > +;; pipelines for which the SWOG specifies a dispatch constraint.  For
> > +;; example: because the SWOG contains a dispatch constraint for the
> > +;; V02 pipelines, there is an attribute value "v02" that groups
> > +;; instructions that are processed by the V0 and V2 pipelines.
> > +;; Values that contain a "_" represent combinations of dispatch
> > +;; groups.  For example, there are dispatch constraints for the M0 and
> > +;; V pipelines.  The value "m0_v" groups instructions that utilize the
> > +;; M0 as well as the V pipelines, such that both dispatch constraints
> > +;; apply.
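> > +;; As a concrete example of the mapping below: an integer multiply
> > +;; (type "mul") falls into the "m" group and counts towards the
> > +;; M-pipeline constraint only, while an SVE gather load (sve_type
> > +;; "sve_gatherload_32") falls into the "v_l" group and counts towards
> > +;; both the V- and the L-pipeline constraints.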
> > +
> > +(define_attr "neoversev2_dispatch"
> > +  "none,bs01,bsm,m,m0,v02,v13,v,l01,l,bsm_l,m_l,m0_v,v_v13,v_l,\
> > +   l01_d,l01_v"
> > +  (cond [(eq_attr "type" "branch,call")
> > +           (const_string "bs01")
> > +         (ior
> > +           (eq_attr "type" "adc_reg,alu_ext,alu_imm,alu_sreg,alus_ext,\
> > +             alus_imm,alus_sreg,clz,csel,logic_imm,logic_reg,logics_imm,\
> > +             logics_reg,mov_imm,rbit,rev,shift_reg")
> > +           (eq_attr "sve_type" "sve_pred_cnt_scalar"))
> > +           (const_string "bsm")
> > +         (ior
> > +           (eq_attr "type" "alu_ext,alus_ext,bfm,bfx,mul,rotate_imm,\
> > +             smull,umull")
> > +           (eq_attr "autodetect_type" "alu_shift_asr_op2,alu_shift_lsl_op2,\
> > +             alu_shift_lsr_op2")
> > +           (eq_attr "sve_type" "sve_pred_cnt_ctrl,sve_pred_misc"))
> > +           (const_string "m")
> > +         (ior
> > +           (eq_attr "type" "crc,f_cvti2f,mla,neon_from_gp,neon_from_gp_q,\
> > +             sdiv,smlal,udiv,umlal")
> > +           (eq_attr "sve_type" "sve_ffr,sve_pred_logical"))
> > +           (const_string "m0")
> > +         (ior
> > +           (eq_attr "type"
> > +             "crypto_sha256_slow,crypto_sha3,crypto_sha512,crypto_sm3,\
> > +              crypto_sm4,f_rintd,f_rints,fccmpd,fccmps,fcmpd,fcmps,fdivd,\
> > +              fdivs,fsqrtd,fsqrts,neon_fp_cvt_narrow_d_q,\
> > +              neon_fp_cvt_narrow_s_q,neon_fp_cvt_widen_h,neon_fp_cvt_widen_s,\
> > +              neon_fp_div_d,neon_fp_div_d_q,neon_fp_div_s,neon_fp_div_s_q,\
> > +              neon_fp_recpe_d,neon_fp_recpe_d_q,neon_fp_recpe_s,\
> > +              neon_fp_recpe_s_q,neon_fp_recps_d,neon_fp_recps_d_q,\
> > +              neon_fp_recps_s,neon_fp_recps_s_q,neon_fp_recpx_d,\
> > +              neon_fp_recpx_d_q,neon_fp_recpx_s,neon_fp_recpx_s_q,\
> > +              neon_fp_round_d,neon_fp_round_d_q,neon_fp_round_s,\
> > +              neon_fp_round_s_q,neon_fp_rsqrte_d,neon_fp_rsqrte_d_q,\
> > +              neon_fp_rsqrte_s,neon_fp_rsqrte_s_q,neon_fp_rsqrts_d,\
> > +              neon_fp_rsqrts_d_q,neon_fp_rsqrts_s,neon_fp_rsqrts_s_q,\
> > +              neon_fp_sqrt_d,neon_fp_sqrt_d_q,neon_fp_sqrt_s,\
> > +              neon_fp_sqrt_s_q,neon_fp_to_int_d,neon_fp_to_int_d_q,\
> > +              neon_fp_to_int_s,neon_fp_to_int_s_q,neon_int_to_fp_d,\
> > +              neon_int_to_fp_d_q,neon_int_to_fp_s,neon_int_to_fp_s_q,\
> > +              neon_mla_b,neon_mla_b_q,neon_mla_h,neon_mla_h_q,\
> > +              neon_mla_s,neon_mla_s_q,neon_mla_b_long,neon_mla_h_long,\
> > +              neon_mla_h_scalar,neon_mla_h_scalar_q,neon_mla_s_long,\
> > +              neon_mla_s_scalar,neon_mla_s_scalar_q,neon_mla_h_scalar_long,\
> > +              neon_mla_s_scalar_long,neon_mul_b,neon_mul_b_q,\
> > +              neon_mul_d_long,neon_mul_h,neon_mul_h_q,neon_mul_h_long,\
> > +              neon_mul_h_scalar,neon_mul_h_scalar_q,neon_mul_h_scalar_long,\
> > +              neon_mul_s,neon_mul_s_q,neon_mul_s_long,neon_mul_s_scalar,\
> > +              neon_mul_s_scalar_q,neon_mul_s_scalar_long,neon_sat_mla_b_long,\
> > +              neon_sat_mla_h_long,neon_sat_mla_h_scalar_long,\
> > +              neon_sat_mla_s_long,neon_sat_mla_s_scalar_long,\
> > +              neon_sat_mul_b,neon_sat_mul_b_q,neon_sat_mul_b_long,\
> > +              neon_sat_mul_h,neon_sat_mul_h_q,neon_sat_mul_h_long,\
> > +              neon_sat_mul_h_scalar,neon_sat_mul_h_scalar_q,\
> > +              neon_sat_mul_h_scalar_long,neon_sat_mul_s,neon_sat_mul_s_q,\
> > +              neon_sat_mul_s_long,neon_sat_mul_s_scalar,\
> > +              neon_sat_mul_s_scalar_q,neon_sat_mul_s_scalar_long")
> > +           (eq_attr "sve_type"
> > +             "sve_crypto_sha3,sve_fp_cmp,sve_fp_cvt,sve_fp_div,sve_fp_log,\
> > +              sve_fp_sqrt,sve_int_cvt,sve_int_div,sve_int_dot,sve_int_index,\
> > +              sve_int_mul,sve_int_recip_est"))
> > +           (const_string "v02")
> > +         (ior
> > +           (eq_attr "type"
> > +             "neon_arith_acc,neon_arith_acc_q,neon_reduc_add,\
> > +              neon_reduc_add_long,neon_reduc_add_q,neon_reduc_minmax,\
> > +              neon_reduc_minmax_q,neon_sat_shift_imm,\
> > +              neon_sat_shift_imm_narrow_q,neon_sat_shift_imm_q,\
> > +              neon_sat_shift_reg,neon_sat_shift_reg_q,neon_shift_acc,\
> > +              neon_shift_acc_q,neon_shift_imm,neon_shift_imm_long,\
> > +              neon_shift_imm_narrow_q,neon_shift_imm_q,neon_shift_reg,\
> > +              neon_shift_reg_q")
> > +           (eq_attr "sve_type"
> > +             "sve_fp_assoc_add,sve_fp_exp,sve_int_accum,sve_int_bit_perm,\
> > +              sve_int_extend,sve_int_extract,sve_int_shift"))
> > +           (const_string "v13")
> > +         (ior
> > +           (eq_attr "type" "crypto_pmull,f_cvt,f_cvtf2i,f_minmaxd,f_minmaxs,\
> > +             faddd,fadds,fconstd,fconsts,fcsel,ffarithd,ffariths,fmacd,fmacs,\
> > +             fmov,fmuld,fmuls,f_mcr,f_mrc,neon_abd,\
> > +             neon_abd_long,neon_abd_q,neon_abs,neon_abs_q,neon_add,\
> > +             neon_add_halve,neon_add_halve_narrow_q,neon_add_halve_q,\
> > +             neon_add_long,neon_add_q,neon_add_widen,neon_bsl,neon_bsl_q,\
> > +             neon_cls,neon_cls_q,neon_cnt,neon_cnt_q,neon_compare,\
> > +             neon_compare_q,neon_compare_zero,neon_compare_zero_q,\
> > +             neon_dup,neon_dup_q,neon_ext,neon_ext_q,neon_fcadd,neon_fcmla,\
> > +             neon_fp_abd_d,neon_fp_abd_d_q,neon_fp_abd_s,neon_fp_abd_s_q,\
> > +             neon_fp_abs_d,neon_fp_abs_d_q,neon_fp_abs_s,neon_fp_abs_s_q,\
> > +             neon_fp_addsub_d,neon_fp_addsub_d_q,neon_fp_addsub_s,\
> > +             neon_fp_addsub_s_q,neon_fp_compare_d,neon_fp_compare_d_q,\
> > +             neon_fp_compare_s,neon_fp_compare_s_q,neon_fp_minmax_d,\
> > +             neon_fp_minmax_d_q,neon_fp_minmax_s,neon_fp_minmax_s_q,\
> > +             neon_fp_mla_d,neon_fp_mla_d_q,neon_fp_mla_d_scalar_q,\
> > +             neon_fp_mla_s,neon_fp_mla_s_q,neon_fp_mla_s_scalar,\
> > +             neon_fp_mla_s_scalar_q,neon_fp_mul_d,neon_fp_mul_d_q,\
> > +             neon_fp_mul_d_scalar_q,neon_fp_mul_s,neon_fp_mul_s_q,\
> > +             neon_fp_mul_s_scalar,neon_fp_mul_s_scalar_q,neon_fp_neg_d,\
> > +             neon_fp_neg_d_q,neon_fp_neg_s,neon_fp_neg_s_q,neon_fp_reduc_add_d,\
> > +             neon_fp_reduc_add_d_q,neon_fp_reduc_add_s,neon_fp_reduc_add_s_q,\
> > +             neon_fp_reduc_minmax_d,neon_fp_reduc_minmax_d_q,\
> > +             neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_s_q,neon_logic,\
> > +             neon_logic_q,neon_minmax,neon_minmax_q,neon_move,\
> > +             neon_move_narrow_q,neon_move_q,neon_neg,neon_neg_q,neon_permute,\
> > +             neon_permute_q,neon_qabs,neon_qabs_q,neon_qadd,neon_qadd_q,\
> > +             neon_qneg,neon_qneg_q,neon_qsub,neon_qsub_q,neon_rbit,\
> > +             neon_rbit_q,neon_rev,neon_rev_q,neon_sub,neon_sub_halve,\
> > +             neon_sub_halve_narrow_q,neon_sub_halve_q,neon_sub_long,\
> > +             neon_sub_q,neon_sub_widen,neon_tbl1,neon_tbl1_q,neon_tbl2,\
> > +             neon_tbl2_q,neon_tbl3,neon_tbl3_q,neon_tbl4,neon_tbl4_q,\
> > +             neon_to_gp,neon_to_gp_q,neon_tst,neon_tst_q,neon_zip,\
> > +             neon_zip_q")
> > +           (eq_attr "sve_type" "sve_fp_arith,sve_fp_misc,sve_fp_mul,\
> > +             sve_fp_reduc,sve_int_general,sve_int_pmul"))
> > +           (const_string "v")
> > +         (eq_attr "sve_type" "sve_store_pred")
> > +           (const_string "l01")
> > +         (ior
> > +           (eq_attr "type" "neon_ldp,neon_ldp_q,neon_load1_1reg,\
> > +             neon_load1_1reg_q,neon_load1_2reg,neon_load1_2reg_q,\
> > +             neon_load1_3reg,neon_load1_3reg_q,neon_load1_4reg,\
> > +             neon_load1_4reg_q")
> > +           (eq_attr "sve_type" "sve_load_1reg"))
> > +           (const_string "l")
> > +         (eq_attr "type" "f_loadd,f_loads")
> > +           (const_string "bsm_l")
> > +         (eq_attr "sve_type" "sve_load_pred")
> > +           (const_string "m_l")
> > +         (ior
> > +           (eq_attr "type" "neon_ins,neon_ins_q")
> > +           (eq_attr "sve_type" "sve_int_cmp_set,sve_int_match,sve_pred_vec"))
> > +           (const_string "m0_v")
> > +         (eq_attr "sve_type" "sve_int_reduc")
> > +           (const_string "v_v13")
> > +         (ior
> > +           (eq_attr "type" "neon_load1_all_lanes,neon_load1_one_lane,\
> > +             neon_load1_one_lane_q,neon_load2_2reg,neon_load2_2reg_q,\
> > +             neon_load2_all_lanes,neon_load2_all_lanes_q,neon_load2_one_lane,\
> > +             neon_load3_3reg,neon_load3_3reg_q,neon_load3_all_lanes,\
> > +             neon_load3_all_lanes_q,neon_load3_one_lane,neon_load4_4reg,\
> > +             neon_load4_4reg_q,neon_load4_all_lanes,neon_load4_all_lanes_q,\
> > +             neon_load4_one_lane")
> > +           (eq_attr "sve_type" "sve_gatherload_32,sve_gatherload_64,\
> > +             sve_load_2reg,sve_load_3reg,sve_load_4reg"))
> > +           (const_string "v_l")
> > +         (eq_attr "type" "load_16,load_4,load_8,store_16,store_4,store_8")
> > +           (const_string "l01_d")
> > +         (ior
> > +           (eq_attr "type" "f_stored,f_stores,neon_stp,neon_stp_q,\
> > +             neon_store1_1reg,neon_store1_1reg_q,neon_store1_2reg,\
> > +             neon_store1_2reg_q,neon_store1_3reg,neon_store1_3reg_q,\
> > +             neon_store1_4reg,neon_store1_4reg_q,neon_store1_one_lane,\
> > +             neon_store1_one_lane_q,neon_store2_2reg,neon_store2_2reg_q,\
> > +             neon_store2_one_lane,neon_store2_one_lane_q,neon_store3_3reg,\
> > +             neon_store3_3reg_q,neon_store3_one_lane,neon_store3_one_lane_q,\
> > +             neon_store4_4reg,neon_store4_4reg_q,neon_store4_one_lane,\
> > +             neon_store4_one_lane_q")
> > +           (eq_attr "sve_type" "sve_scatterstore_32,sve_scatterstore_64,\
> > +             sve_store_1reg,sve_store_2reg,sve_store_3reg,sve_store_4reg"))
> > +           (const_string "l01_v")]
> > +        (const_string "none")))
> > \ No newline at end of file
> > diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
> > index faf06d8e7ed..c3749d0c194 100644
> > --- a/gcc/config/aarch64/tuning_models/neoversev2.h
> > +++ b/gcc/config/aarch64/tuning_models/neoversev2.h
> > @@ -21,6 +21,7 @@
> >  #define GCC_AARCH64_H_NEOVERSEV2
> >
> >  #include "generic.h"
> > +#include "../aarch64-sched-dispatch.h"
> >
> >  static const struct cpu_regmove_cost neoversev2_regmove_cost =
> >  {
> > @@ -188,6 +189,100 @@ static const struct cpu_vector_cost neoversev2_vector_cost =
> >    &neoversev2_vec_issue_info /* issue_info */
> >  };
> >
> > +/* Neoverse V2 dispatch constraints for instruction scheduling.  */
> > +static const dispatch_constraint neoversev2_dispatch_constraints[] = {
> > +  dispatch_constraint ("total", 16, [](rtx_insn *)
> > +    {
> > +      return 1;
> > +    }),
> > +  dispatch_constraint ("b_s01", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01);
> > +    }),
> > +  dispatch_constraint ("m0", 2, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M0
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> > +    }),
> > +  dispatch_constraint ("m", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M_L
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> > +    }),
> > +  dispatch_constraint ("b_s_m", 8, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_BSM
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_BSM_L
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M_L
> > +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> > +    }),
> > +  dispatch_constraint ("v02", 2, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V02);
> > +    }),
> > +  dispatch_constraint ("v13", 2, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V13);
> > +    }),
> > +  dispatch_constraint ("v", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      switch (dispatch_group) {
> > +      case NEOVERSEV2_DISPATCH_V02:
> > +      case NEOVERSEV2_DISPATCH_V13:
> > +      case NEOVERSEV2_DISPATCH_V:
> > +      case NEOVERSEV2_DISPATCH_M0_V:
> > +      case NEOVERSEV2_DISPATCH_V_L:
> > +      case NEOVERSEV2_DISPATCH_L01_V:
> > +        return 1;
> > +      case NEOVERSEV2_DISPATCH_V_V13:
> > +        return 2;
> > +      default:
> > +        return 0;
> > +      }
> > +    }),
> > +  dispatch_constraint ("l01_d", 4, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      switch (dispatch_group) {
> > +      case NEOVERSEV2_DISPATCH_L01_V:
> > +      case NEOVERSEV2_DISPATCH_L01:
> > +        return 1;
> > +      case NEOVERSEV2_DISPATCH_L01_D:
> > +        return 2;
> > +      default:
> > +        return 0;
> > +      }
> > +    }),
> > +  dispatch_constraint ("l", 6, [](rtx_insn *insn)
> > +    {
> > +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> > +      switch (dispatch_group) {
> > +      case NEOVERSEV2_DISPATCH_L:
> > +      case NEOVERSEV2_DISPATCH_BSM_L:
> > +      case NEOVERSEV2_DISPATCH_M_L:
> > +      case NEOVERSEV2_DISPATCH_V_L:
> > +      case NEOVERSEV2_DISPATCH_L01_V:
> > +        return 1;
> > +      case NEOVERSEV2_DISPATCH_L01_D:
> > +        return 2;
> > +      default:
> > +        return 0;
> > +      }
> > +    })
> > +};
> > +
> >  static const struct tune_params neoversev2_tunings =
> >  {
> >    &cortexa76_extra_costs,
> > @@ -221,12 +316,13 @@ static const struct tune_params neoversev2_tunings =
> >    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> >    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> >    | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW
> > -  | AARCH64_EXTRA_TUNE_AVOID_LDAPUR), /* tune_flags.  */
> > +  | AARCH64_EXTRA_TUNE_AVOID_LDAPUR
> > +  | AARCH64_EXTRA_TUNE_DISPATCH_SCHED), /* tune_flags.  */
> >    &generic_armv9a_prefetch_tune,
> >    AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model.  */
> >    AARCH64_LDP_STP_POLICY_ALWAYS, /* stp_policy_model.  */
> > -  nullptr, /* dispatch_constraints.  */
> > -  0 /* num_dispatch_constraints.  */
> > +  neoversev2_dispatch_constraints, /* dispatch_constraints.  */
> > +  ARRAY_SIZE (neoversev2_dispatch_constraints) /* num_dispatch_constraints.  */
> >  };
> >
> >  #endif /* GCC_AARCH64_H_NEOVERSEV2.  */
> > --
> > 2.34.1