Re: [PATCH] aarch64: Add the scheduling model for Neoverse N1

Ramana Radhakrishnan Sat, 22 Apr 2023 14:47:11 -0700

On Tue, Apr 18, 2023 at 10:42 PM Evandro Menezes via Gcc-patches
<[email protected]> wrote:
>
> This patch adds the scheduling model for Neoverse N1, based on the 
> information from the "Arm Neoverse N1 Software Optimization Guide”.


There haven't been many large core schedulers recently - so it will be
interesting to note and to see the experience.

We are probably missing any mention of the performance impact of these
patches - relative changes for a range of benchmarks are usually
specified in such submissions. I would strongly suggest seeing such
numbers. Prior experience over many generations has been that plugging
in numbers from a SWOG doesn't mean that the scheduler will perform
well.

Equally the patch has a number of really large blocking usage of
functional units in the scheduling descriptions especially for fp div
and fp sqrt instructions - these usually just bloat the automaton for
not enough gain in terms of actual performance in real world
measurements. I've noted a couple in a quick review below in the
patches.

A minor nit wearing my AArch32 hat is that -mcpu=neoverse-n1 is an
option for AArch32 thus technically the scheduling descriptions need
to be in gcc/config/arm but its possibly ok given the usage
statistics.

Reviewed-by: Ramana Radhakrishnan. <[email protected]>


regards
Ramana

>
> --
> Evandro Menezes
>
> ================================================================================
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-cores.def: Use the Neoverse N1 scheduling 
> model.
>         * config/aarch64/aarch64.md: Include `neoverse-n1.md`.
>         * config/aarch64/neoverse-n1.md: New file.
>
> Signed-off-by: Evandro Menezes <[email protected]>
> ---
>  gcc/config/aarch64/aarch64-cores.def |   2 +-
>  gcc/config/aarch64/aarch64.md        |   1 +
>  gcc/config/aarch64/neoverse-n1.md    | 711 +++++++++++++++++++++++++++
>  3 files changed, 713 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/aarch64/neoverse-n1.md
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index e352e4077b1..cc842c4e22c 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -116,7 +116,7 @@ AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, 
> V8_2A,  (F16, RCPC, DOTPRO
>  AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
> SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
>  AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, 
> DOTPROD, SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
>  AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
> PROFILE), cortexa76, 0x41, 0xd0c, -1)
> -AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
> DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
> +AARCH64_CORE("neoverse-n1",  neoversen1, neoversen1, V8_2A,  (F16, RCPC, 
> DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
>  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
> DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
>
>  /* Cavium ('C') cores. */
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 022eef80bc1..6cb9e31259b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -471,6 +471,7 @@
>  (include "../arm/cortex-a57.md")
>  (include "../arm/exynos-m1.md")
>  (include "falkor.md")
> +(include "neoverse-n1.md")
>  (include "saphira.md")
>  (include "thunderx.md")
>  (include "../arm/xgene1.md")
> diff --git a/gcc/config/aarch64/neoverse-n1.md 
> b/gcc/config/aarch64/neoverse-n1.md
> new file mode 100644
> index 00000000000..d66fa10c330
> --- /dev/null
> +++ b/gcc/config/aarch64/neoverse-n1.md
> @@ -0,0 +1,711 @@
> +;; Arm Neoverse N1 pipeline description
> +;; (Based on the "Arm Neoverse N1 Software Optimization Guide")
> +;;
> +;; Copyright (C) 2014-2023 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but
> +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;; General Public License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +;; The Neoverse N1 core is modelled as a multiple issue pipeline that has
> +;; the following functional units.
> +
> +(define_automaton "neoverse_n1")
> +
> +;; 1 - Two pipelines for integer operations: SX1, SX2.
> +
> +(define_cpu_unit "neon1_sx1_issue" "neoverse_n1")
> +(define_reservation "neon1_sx1" "neon1_sx1_issue")
> +
> +(define_cpu_unit "neon1_sx2_issue" "neoverse_n1")
> +(define_reservation "neon1_sx2" "neon1_sx2_issue")
> +
> +;; 2 - One pipeline for complex integer operations: MX.
> +
> +(define_cpu_unit "neon1_mx_issue"
> +                "neoverse_n1")
> +(define_reservation "neon1_mx" "neon1_mx_issue")
> +(define_reservation "neon1_m_block" "neon1_mx_issue")
> +
> +;; 3 - Two asymmetric pipelines for Neon and FP operations: CX1, CX2.
> +(define_automaton "neoverse_n1_cx")
> +
> +(define_cpu_unit "neon1_cx1_issue"
> +                "neoverse_n1_cx")
> +(define_cpu_unit "neon1_cx2_issue"
> +                "neoverse_n1_cx")
> +
> +(define_reservation "neon1_cx1" "neon1_cx1_issue")
> +(define_reservation "neon1_cx2" "neon1_cx2_issue")
> +(define_reservation "neon1_v0_block" "neon1_cx1_issue")
> +
> +;; 4 - One pipeline for branch operations: BX.
> +
> +(define_cpu_unit "neon1_bx_issue" "neoverse_n1")
> +(define_reservation "neon1_bx" "neon1_bx_issue")
> +
> +;; 5 - Two pipelines for load and store operations: LS1, LS2.
> +
> +(define_cpu_unit "neon1_ls1_issue" "neoverse_n1")
> +(define_reservation "neon1_ls1" "neon1_ls1_issue")
> +
> +(define_cpu_unit "neon1_ls2_issue" "neoverse_n1")
> +(define_reservation "neon1_ls2" "neon1_ls2_issue")
> +
> +;; Block all issue queues.
> +
> +(define_reservation "neon1_block" "neon1_sx1_issue + neon1_sx2_issue
> +                                 + neon1_mx_issue
> +                                 + neon1_cx1_issue + neon1_cx2_issue
> +                                 + neon1_ls1_issue + neon1_ls2_issue")
> +
> +;; Issue groups.
> +
> +(define_reservation "neon1_b" "neon1_bx")
> +(define_reservation "neon1_i" "(neon1_sx1 | neon1_sx2 | neon1_mx)")
> +(define_reservation "neon1_m" "neon1_mx")
> +(define_reservation "neon1_d" "(neon1_sx2 | neon1_mx)")
> +(define_reservation "neon1_l" "(neon1_ls1 | neon1_ls2)")
> +(define_reservation "neon1_v" "(neon1_cx1 | neon1_cx2)")
> +(define_reservation "neon1_v0" "neon1_cx1")
> +(define_reservation "neon1_v1" "neon1_cx2")
> +
> +;; Intructions resouces.
> +
> +;; Block.
> +(define_insn_reservation "neoverse_n1_block" 1
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "block"))
> +  "neon1_block")
> +
> +;; Branches
> +;; No latency as there is no result.
> +(define_insn_reservation "neoverse_n1_branch" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "branch"))
> +  "neon1_b")
> +
> +;; Calls
> +;; No latency as there is no result.
> +(define_insn_reservation "neoverse_n1_call" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "call"))
> +  "neon1_i + neon1_b")
> +
> +;; ALU with no or simple shift.
> +;; TODO: there should also be "alus_shift_imm_lsl_1to4".
> +(define_insn_reservation "neoverse_n1_alu" 1
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "alu_imm, alu_shift_imm_lsl_1to4, alu_sreg, \
> +                        alus_imm, alus_sreg, \
> +                        csel, \
> +                        logic_imm, logic_reg, logic_shift_imm, \
> +                        logics_imm, logics_reg, \
> +                        mov_reg"))
> +  "neon1_i")
> +
> +;; ALU with extension or complex shift.
> +;; TODO: there should also be "alus_shift_imm_other".
> +(define_insn_reservation "neoverse_n1_alu_shift" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "alu_ext, alu_shift_imm_other, alu_shift_reg, \
> +                        alus_shift_imm, alus_shift_reg, \
> +                        logic_shift_reg, logics_shift_imm, logics_shift_reg, 
> \
> +                        crc"))
> +  "neon1_m")
> +
> +;; Miscellaneous ALU.
> +;; TODO: model 2-register "extr", "bfi", variable shifts.
> +(define_insn_reservation "neoverse_n1_alu_misc" 1
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "adr, rotate_imm, bfm, clz, mov_imm, rbit, rev"))
> +  "neon1_i")
> +
> +;; Integer divide.
> +;; Divisions are not pipelined.
> +(define_insn_reservation "neoverse_n1_div" 12
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "udiv, sdiv"))
> +  "neon1_m, (neon1_m_block * 12)")
> +
> +;; Narrow multiply.
> +(define_insn_reservation "neoverse_n1_mul" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "mla, mul"))
> +  "neon1_m")
> +
> +;; Wide multiply.
> +;; TODO: model multiply high.
> +(define_insn_reservation "neoverse_n1_mull" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "smull, umull"))
> +  "neon1_m")
> +
> +;; Integer load.
> +(define_insn_reservation "neoverse_n1_ld" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "load_byte, load_4, load_8"))
> +  "neon1_l")
> +
> +(define_insn_reservation "neoverse_n1_ld16" 5
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "load_16"))
> +  "neon1_l * 2")
> +
> +;; Integer store.
> +(define_insn_reservation "neoverse_n1_st" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "store_4, store_8"))
> +  "neon1_d, neon1_l")
> +
> +(define_insn_reservation "neoverse_n1_stp" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "store_16"))
> +  "neon1_i, (neon1_l * 2)")
> +
> +;; FP arithmetic.
> +(define_insn_reservation "neoverse_n1_fp_alu" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_minmaxd, f_minmaxs, \
> +                        faddd, fadds, \
> +                        fconstd, fconsts, \
> +                        fcsel, \
> +                        ffarithd, ffariths, \
> +                        fmov"))
> +  "neon1_v")
> +
> +;; FP compare.
> +(define_insn_reservation "neoverse_n1_fp_cmp" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "fcmpd, fcmps, fccmpd, fccmps"))
> +  "neon1_v0")
> +
> +;; FP round.
> +(define_insn_reservation "neoverse_n1_fp_rint" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_rintd, f_rints"))
> +  "neon1_v0")
> +
> +;; FP divide & square-root.
> +;; Divisions are not pipelined.
> +(define_insn_reservation "neoverse_n1_fp_divd" 15
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "fdivd, fsqrtd"))
> +  "neon1_v0, (neon1_v0_block * 15)")
> +
> +(define_insn_reservation "neoverse_n1_fp_divs" 10
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "fdivs, fsqrts"))
> +  "neon1_v0, (neon1_v0_block * 10)")
> +
> +;; FP multiply.
> +(define_insn_reservation "neoverse_n1_fp_mul" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "fmuld, fmuls"))
> +  "neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_fp_mac" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "fmacd, fmacs"))
> +  "neon1_v")
> +
> +;; FP convert.
> +(define_insn_reservation "neoverse_n1_fp_cvt" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_cvt"))
> +  "neon1_v0")
> +
> +(define_insn_reservation "neoverse_n1_fp_cvti2f" 6
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_cvti2f"))
> +  "neon1_m + neon1_v0")
> +
> +(define_insn_reservation "neoverse_n1_fp_cvtf2i" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_cvtf2i"))
> +  "neon1_v0 + neon1_v1")
> +
> +;; FP move.
> +(define_insn_reservation "neoverse_n1_fp_mov" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "fconstd, fconsts, \
> +                        fmov"))
> +  "neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_fp_movi2f" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_mcr"))
> +  "neon1_m")
> +
> +(define_insn_reservation "neoverse_n1_fp_movf2i" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_mrc, \
> +                        neon_to_gp, neon_to_gp_q"))
> +  "neon1_v1")
> +
> +;; FP load.
> +(define_insn_reservation "neoverse_n1_fp_ld" 5
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_loadd, f_loads"))
> +  "neon1_i, neon1_l")
> +
> +(define_insn_reservation "neoverse_n1_fp_ldp" 5
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_ldp"))
> +  "neon1_i, (neon1_l * 2)")
> +
> +(define_insn_reservation "neoverse_n1_fp_ldp_q" 7
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_ldp_q"))
> +  "neon1_i, (neon1_l * 2)")
> +
> +;; FP store.
> +(define_insn_reservation "neoverse_n1_fp_st" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "f_stored, f_stores"))
> +  "neon1_i, neon1_l")
> +
> +(define_insn_reservation "neoverse_n1_fp_stp" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_stp"))
> +  "neon1_l + neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_fp_stp_q" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_stp_q"))
> +  "(neon1_l * 2) + neon1_v")
> +
> +;; ASIMD arithmetic.
> +(define_insn_reservation "neoverse_n1_asimd_abd_long" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_abd_long"))
> +  "neon1_v1")
> +
> +(define_insn_reservation "neoverse_n1_asimd_alu" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_abd, neon_abd_q, \
> +                        neon_abs, neon_abs_q, \
> +                        neon_add, neon_add_q, \
> +                        neon_add_halve, neon_add_halve_q, \
> +                        neon_add_halve_narrow_q, \
> +                        neon_add_long, neon_add_widen, \
> +                        neon_bsl, neon_bsl_q, \
> +                        neon_cls, neon_cls_q, \
> +                        neon_compare, neon_compare_q, \
> +                        neon_compare_zero, neon_compare_zero_q, \
> +                        neon_dot, neon_dot_q, \
> +                        neon_dup, neon_dup_q, \
> +                        neon_ext, neon_ext_q, \
> +                        neon_ins, neon_ins_q, \
> +                        neon_logic, neon_logic_q, \
> +                        neon_minmax, neon_minmax_q, \
> +                        neon_move, neon_move_q, \
> +                        neon_move_narrow_q, \
> +                        neon_neg, neon_neg_q, \
> +                        neon_permute, neon_permute_q, \
> +                        neon_qabs, neon_qabs_q, \
> +                        neon_qadd, neon_qadd_q, \
> +                        neon_qneg, neon_qneg_q, \
> +                        neon_qsub, neon_qsub_q, \
> +                        neon_rbit, neon_rbit_q, \
> +                        neon_reduc_add, neon_reduc_add_q, \
> +                        neon_rev, neon_rev_q, \
> +                        neon_sub, neon_sub_q, \
> +                        neon_sub_halve, neon_sub_halve_q, \
> +                        neon_sub_halve_narrow_q, \
> +                        neon_sub_widen, neon_sub_long, \
> +                        neon_tbl1, neon_tbl1_q, \
> +                        neon_tbl2, neon_tbl2_q"))
> +  "neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_asimd_arith_acc" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_arith_acc"))
> +  "neon1_v1")
> +
> +(define_insn_reservation "neoverse_n1_asimd_shift_acc_q" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_shift_acc_q"))
> +  "neon1_v1")
> +
> +(define_insn_reservation "neoverse_n1_asimd_reduc" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_reduc_add_long, \
> +                        neon_reduc_minmax, neon_reduc_minmax_q"))
> +  "neon1_v1")
> +
> +
> +;; ASIMD multiply.
> +(define_insn_reservation "neoverse_n1_asimd_mla" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_mla_b, neon_mla_b_long, \
> +                        neon_mla_h, neon_mla_h_long, \
> +                        neon_mla_h_scalar, neon_mla_h_scalar_long, \
> +                        neon_mla_s, neon_mla_s_long, \
> +                        neon_mla_s_scalar, neon_mla_s_scalar_long"))
> +  "neon1_v0")
> +
> +(define_insn_reservation "neoverse_n1_asimd_mla_q" 5
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_mla_b_q,
> +                        neon_mla_h_q, neon_mla_h_scalar_q, \
> +                        neon_mla_s_q, neon_mla_s_scalar_q"))
> +  "neon1_v0 * 2")
> +
> +(define_insn_reservation "neoverse_n1_asimd_mul" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_mul_b, neon_mul_b_long, \
> +                        neon_mul_h, neon_mul_h_long, \
> +                        neon_mul_s, neon_mul_s_long,
> +                        neon_sat_mul_b, neon_sat_mul_b_long,
> +                        neon_sat_mul_h, neon_sat_mul_h_long, \
> +                        neon_sat_mul_h_scalar, neon_sat_mul_h_scalar_long,
> +                        neon_sat_mul_s, neon_sat_mul_s_long, \
> +                        neon_sat_mul_s_scalar, neon_sat_mul_s_scalar_long"))
> +  "neon1_v0")
> +
> +(define_insn_reservation "neoverse_n1_asimd_mul_q" 5
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_mul_b_q, neon_mul_h_q, neon_mul_s_q, \
> +                        neon_sat_mul_b_q, \
> +                        neon_sat_mul_h_q, neon_sat_mul_h_scalar_q, \
> +                        neon_sat_mul_s_q, neon_sat_mul_s_scalar_q"))
> +  "neon1_v0 * 2")
> +
> +(define_insn_reservation "neoverse_n1_asimd_sat_mla" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_sat_mla_b_long, \
> +                        neon_sat_mla_h_long, neon_sat_mla_h_scalar_long, \
> +                        neon_sat_mla_s_long, neon_sat_mla_s_scalar_long"))
> +  "neon1_v0")
> +
> +;; ASIMD shift.
> +(define_insn_reservation "neoverse_n1_asimd_shift" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_shift_imm, neon_shift_imm_q, 
> neon_shift_imm_long, \
> +                        neon_shift_reg, neon_shift_reg_q"))
> +  "neon1_v1")
> +
> +(define_insn_reservation "neoverse_n1_asimd_shift_q" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_sat_shift_imm, neon_sat_shift_imm_q, \
> +                        neon_sat_shift_imm_narrow_q, \
> +                        neon_sat_shift_reg, neon_sat_shift_reg_q, \
> +                        neon_shift_imm_narrow_q"))
> +  "neon1_v1")
> +
> +;; ASIMD FP arithmetic.
> +(define_insn_reservation "neoverse_n1_asimd_fp_alu" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_abd_d, neon_fp_abd_d_q, \
> +                        neon_fp_abd_s, neon_fp_abd_s_q, \
> +                        neon_fp_abs_d, neon_fp_abs_d_q, \
> +                        neon_fp_abs_s, neon_fp_abs_s_q, \
> +                        neon_fp_addsub_d, neon_fp_addsub_d_q, \
> +                        neon_fp_addsub_s, neon_fp_addsub_s_q, \
> +                        neon_fp_compare_d, neon_fp_compare_d_q, \
> +                        neon_fp_compare_s, neon_fp_compare_s_q, \
> +                        neon_fp_minmax_d, neon_fp_minmax_d_q, \
> +                        neon_fp_minmax_s, neon_fp_minmax_s_q, \
> +                        neon_fp_neg_d, neon_fp_neg_d_q, \
> +                        neon_fp_neg_s, neon_fp_neg_s_q, \
> +                        neon_fp_reduc_add_d, neon_fp_reduc_add_d_q, \
> +                        neon_fp_reduc_add_s, neon_fp_reduc_add_s_q"))
> +  "neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_asimd_fp_reduc" 5
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_reduc_minmax_d, neon_fp_reduc_minmax_d_q, \
> +                        neon_fp_reduc_minmax_s, neon_fp_reduc_minmax_s_q"))
> +  "neon1_v")
> +
> +;; ASIMD FP convert.
> +(define_insn_reservation "neoverse_n1_asimd_cvt" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_cvt_narrow_d_q, \
> +                        neon_fp_cvt_widen_s, \
> +                        neon_fp_to_int_d, neon_fp_to_int_d_q, \
> +                        neon_fp_to_int_s, \
> +                        neon_int_to_fp_d, neon_int_to_fp_d_q, \
> +                        neon_int_to_fp_s, \
> +                        neon_fp_recpe_d, neon_fp_recpe_s, \
> +                        neon_fp_recpx_d, neon_fp_recpx_s, \
> +                        neon_fp_round_d, neon_fp_round_d_q, \
> +                        neon_fp_round_s"))
> +  "neon1_v0")
> +
> +(define_insn_reservation "neoverse_n1_asimd_cvt_q" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_cvt_narrow_s_q, \
> +                        neon_fp_cvt_widen_h, \
> +                        neon_fp_to_int_s_q, \
> +                        neon_int_to_fp_s_q, \
> +                        neon_fp_recpe_d_q, neon_fp_recpe_s_q, \
> +                        neon_fp_recpx_d_q, neon_fp_recpx_s_q, \
> +                        neon_fp_round_s_q"))
> +  "neon1_v0 * 2")
> +
> +;; ASIMD FP divide & square-root.
> +;; Divisions are not pipelined.
> +(define_insn_reservation "neoverse_n1_asimd_fp_divd_q" 15
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_div_d_q"))
> +  "neon1_v0, (neon1_v0_block * 14)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_fp_divs" 10
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_div_s"))
> +  "neon1_v0, (neon1_v0_block * 5)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_fp_divs_q" 10
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_div_s_q"))
> +  "neon1_v0, (neon1_v0_block * 9)")
> +

Why 9 here ?

> +(define_insn_reservation "neoverse_n1_asimd_fp_sqrtd_q" 17
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_sqrt_d_q"))
> +  "neon1_v0, (neon1_v0_block * 16)")

Such large numbers and large reservations of units usually increases
compile time and automaton size for not enough gain - Please look at
this and the automaton size using the verbose options that are
available for debugging automatons.

16 is a really large number - possibly worth dropping it down to 5 or 6.

Consider something similar above.

> +
> +(define_insn_reservation "neoverse_n1_asimd_fp_sqrts" 10
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_sqrt_s"))
> +  "neon1_v0, (neon1_v0_block * 5)")
> +




> +(define_insn_reservation "neoverse_n1_asimd_fp_sqrts_q" 10
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_sqrt_s_q"))
> +  "neon1_v0, (neon1_v0_block * 9)")
> +
> +;; ASIMD FP multiply.
> +(define_insn_reservation "neoverse_n1_asimd_fp_mul" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_mul_d, neon_fp_mul_d_q, 
> neon_fp_mul_d_scalar_q, \
> +                        neon_fp_mul_s, neon_fp_mul_s_q, 
> neon_fp_mul_s_scalar_q"))
> +  "neon1_v")
> +
> +;; TODO: model the long form.
> +(define_insn_reservation "neoverse_n1_asimd_fp_mla" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_fp_mla_d, neon_fp_mla_d_q, 
> neon_fp_mla_d_scalar_q, \
> +                        neon_fp_mla_s, neon_fp_mla_s_q, 
> neon_fp_mla_s_scalar_q, \
> +                        neon_fp_recps_d, neon_fp_recps_d_q, \
> +                        neon_fp_recps_s, neon_fp_recps_s_q"))
> +  "neon1_v")
> +
> +;; ASIMD miscellaneous.
> +(define_insn_reservation "neoverse_n1_asimd_gp_fp" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_from_gp, neon_from_gp_q"))
> +  "neon1_m")
> +
> +;; TODO: model "tbx" fully.
> +(define_insn_reservation "neoverse_n1_asimd_tbl_3" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_tbl3, neon_tbl3_q"))
> +  "neon1_v * 4")
> +
> +(define_insn_reservation "neoverse_n1_asimd_tbl_4" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_tbl4, neon_tbl4_q"))
> +  "neon1_v * 6")
> +
> +;; ASIMD load.
> +(define_insn_reservation "neoverse_n1_asimd_ld_a" 5
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load1_1reg, neon_load1_1reg_q, \
> +                        neon_load1_2reg, neon_load1_2reg_q"))
> +  "neon1_l")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_b" 6
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load1_3reg, neon_load1_3reg_q"))
> +  "neon1_l * 3")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_c" 6
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load1_4reg, neon_load1_4reg_q"))
> +  "neon1_l * 4")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_d" 7
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load1_all_lanes, neon_load1_all_lanes_q, \
> +                        neon_load1_one_lane, neon_load1_one_lane_q"))
> +  "neon1_l + neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_e" 7
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load2_2reg, neon_load2_2reg_q, \
> +                        neon_load2_all_lanes, neon_load2_all_lanes_q, \
> +                        neon_load2_one_lane, neon_load2_one_lane_q"))
> +  "(neon1_l * 2) + neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_f" 8
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load3_3reg, neon_load3_3reg_q, \
> +                        neon_load4_all_lanes, neon_load4_all_lanes_q, \
> +                        neon_load4_one_lane, neon_load4_one_lane_q"))
> +  "(neon1_l * 4) + neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_g" 7
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load3_all_lanes, neon_load3_all_lanes_q, \
> +                        neon_load3_one_lane, neon_load3_one_lane_q"))
> +  "(neon1_l * 4) + neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_h" 8
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load4_4reg"))
> +  "(neon1_l * 7) + neon1_v")
> +
> +(define_insn_reservation "neoverse_n1_asimd_ld_i" 10
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_load4_4reg_q"))
> +  "(neon1_l * 10) + neon1_v")
> +
> +;; ASIMD store.
> +(define_insn_reservation "neoverse_n1_asimd_st_a" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store1_1reg, neon_store1_1reg_q, \
> +                        neon_store1_2reg"))
> +  "neon1_v + neon1_l")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_b" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store1_1reg_q, \
> +                        neon_store1_2reg"))
> +  "neon1_v + (neon1_l * 2)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_c" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store1_2reg_q, \
> +                        neon_store1_4reg"))
> +  "neon1_v + (neon1_l * 4)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_d" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store1_3reg"))
> +  "neon1_v + (neon1_l * 3)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_e" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store1_3reg_q"))
> +  "neon1_v + (neon1_l * 6)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_f" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store1_4reg_q"))
> +  "neon1_v + (neon1_l * 8)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_g" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store1_one_lane, neon_store1_one_lane_q, \
> +                        neon_store2_2reg, \
> +                        neon_store2_one_lane, neon_store2_one_lane_q"))
> +  "neon1_v + (neon1_l * 2)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_h" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store2_2reg_q, \
> +                        neon_store3_3reg, \
> +                        neon_store3_one_lane_q"))
> +  "neon1_v + (neon1_l * 4)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_i" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store3_3reg_q"))
> +  "neon1_v + (neon1_l * 6)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_j" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store3_one_lane"))
> +  "neon1_v + (neon1_l * 4)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_k" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store4_4reg"))
> +  "neon1_v + (neon1_l * 6)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_l" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store4_4reg_q"))
> +  "neon1_v + (neon1_l * 12)")
> +
> +(define_insn_reservation "neoverse_n1_asimd_st_m" 0
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "neon_store4_one_lane, neon_store4_one_lane_q"))
> +  "neon1_v + (neon1_l * 3)")
> +
> +;; ASIMD crypto.
> +;; TODO: model different widths.
> +(define_insn_reservation "neoverse_n1_asimd_aese" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "crypto_aese"))
> +  "neon1_v0")
> +
> +(define_insn_reservation "neoverse_n1_asimd_aesmc" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "crypto_aesmc"))
> +  "neon1_v0")
> +
> +;; FIXME: "sha256u1" should be "crypto_sha256_fast".
> +(define_insn_reservation "neoverse_n1_asimd_sha_fast" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "crypto_sha1_fast, crypto_sha1_xor,
> +                        crypto_sha256_fast"))
> +  "neon1_v0")
> +
> +(define_insn_reservation "neoverse_n1_asimd_sha_slow" 4
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "crypto_sha1_slow, crypto_sha256_slow"))
> +  "neon1_v0")
> +
> +;; FIXME: "pmull" sometimes is also "neon_mul_{b,h,s}(_scalar)?(_(q|long))?"
> +(define_insn_reservation "neoverse_n1_asimd_poly" 3
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "crypto_pmull"))
> +  "neon1_v0")
> +
> +;; CRC
> +(define_insn_reservation "neoverse_n1_crc" 2
> +  (and (eq_attr "tune" "neoversen1")
> +       (eq_attr "type" "crc"))
> +  "neon1_m")
> +
> +;; Bypasses.
> +
> +;; Integer multiply.
> +;; TODO: model the X and high forms.
> +(define_bypass 1 "neoverse_n1_mul, neoverse_n1_mull"
> +                 "neoverse_n1_mul, neoverse_n1_mull")
> +
> +;; FP multiply.
> +(define_bypass 2 "neoverse_n1_fp_mul" "neoverse_n1_fp_mul")
> +(define_bypass 2 "neoverse_n1_fp_mac" "neoverse_n1_fp_mac")
> +
> +;; ASIMD arithmetic.
> +(define_bypass 1 "neoverse_n1_asimd_arith_acc" "neoverse_n1_asimd_arith_acc")
> +(define_bypass 1 "neoverse_n1_asimd_shift_acc_q" 
> "neoverse_n1_asimd_shift_acc_q")
> +
> +;; ASIMD multiply.
> +(define_bypass 1 "neoverse_n1_asimd_mla"   "neoverse_n1_asimd_mla")
> +(define_bypass 2 "neoverse_n1_asimd_mla_q" "neoverse_n1_asimd_mla_q")
> +
> +;; ASIMD FP multiply.
> +(define_bypass 2 "neoverse_n1_asimd_fp_mul" "neoverse_n1_asimd_fp_mul")
> +(define_bypass 2 "neoverse_n1_asimd_fp_mla" "neoverse_n1_asimd_fp_mla")
> +
> +;; CRC
> +(define_bypass 1 "neoverse_n1_crc" "neoverse_n1_*")
> --
> 2.39.2 (Apple Git-143)
>

Re: [PATCH] aarch64: Add the scheduling model for Neoverse N1

Reply via email to