On 2/27/24 1:52 AM, Jiawei wrote:
From: Chen Jiawei <jia...@iscas.ac.cn>
Co-Authored by: Lin Jiawei <jiawei....@epfl.ch>
This patch adds the XiangShan Nanhu CPU microarchitecture.
Nanhu is a 6-issue, superscalar, out-of-order processor.
For more details, see: https://xiangshan-doc.readthedocs.io/zh-cn/latest/arch
gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): New def.
(RISCV_CORE): Ditto.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): New option.
* config/riscv/riscv.cc: New def.
* config/riscv/riscv.md: New include.
* config/riscv/xiangshan.md: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mcpu-xiangshan-nanhu.c: New test.
As was discussed last Tuesday, this should be safe, even at this late
stage in the gcc-14 cycle.
+/* Costs to use when optimizing for xiangshan nanhu. */
+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+ {COSTS_N_INSNS (3), COSTS_N_INSNS (3)}, /* fp_add */
+ {COSTS_N_INSNS (3), COSTS_N_INSNS (3)}, /* fp_mul */
+ {COSTS_N_INSNS (10), COSTS_N_INSNS (20)}, /* fp_div */
+ {COSTS_N_INSNS (3), COSTS_N_INSNS (3)}, /* int_mul */
+ {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
+ 6, /* issue_rate */
+ 3, /* branch_cost */
+ 3, /* memory_cost */
+ 3, /* fmv_cost */
+ true, /* slow_unaligned_access */
+ false, /* use_divmod_expansion */
+ RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH, /* fusible_ops */
+ NULL, /* vector cost */
Is your integer division really that fast? The table above essentially
says that your cpu can do integer division in 6 cycles.
+
+(define_insn_reservation "xiangshan_mul" 3
+ (and (eq_attr "tune" "xiangshan")
+ (eq_attr "type" "imul"))
+ "xs_mdu_rs")
+
+(define_insn_reservation "xiangshan_div" 21
+ (and (eq_attr "tune" "xiangshan")
+ (eq_attr "type" "idiv"))
+ "xs_mdu_rs")
Whereas your pipeline description says it's 21c.
I strongly suspect you want to increase the cost of the int_div in the
tuning table. And with the higher cost you probably want to turn on
use_divmod_expansion.
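To put numbers on the mismatch, here is a minimal sketch in plain C. It
only assumes the COSTS_N_INSNS definition from GCC's rtl.h; the enum names
and the cost_gap_in_insns helper are hypothetical and exist purely for
illustration. The 6 and 21 are the values quoted from the patch.

```c
#include <assert.h>

/* GCC's rtl.h expresses "the cost of N simple instructions" this way.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Values copied from the patch: the int_div entry in the tune table and
   the latency of the "xiangshan_div" reservation.  */
enum { TUNE_INT_DIV_INSNS = 6, SCHED_INT_DIV_LATENCY = 21 };

/* Hypothetical helper: by how many instruction-equivalents does the cost
   table under-state the scheduler latency?  Returns 0 when the table is
   consistent with (or higher than) the latency.  */
static int cost_gap_in_insns(int tune_insns, int sched_latency)
{
    assert(tune_insns > 0 && sched_latency > 0);
    int gap = COSTS_N_INSNS(sched_latency) - COSTS_N_INSNS(tune_insns);
    return gap > 0 ? gap / COSTS_N_INSNS(1) : 0;
}
```

Calling cost_gap_in_insns (TUNE_INT_DIV_INSNS, SCHED_INT_DIV_LATENCY)
gives 15, i.e. the cost model treats a divide as 15 instructions cheaper
than the pipeline description does.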
I'll also note that your scheduler description also indicates your
division is fully pipelined. Is that correct? If not, you'll want to
adjust that reservation.
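As a rough illustration of why the pipelining question matters, the
sketch below models a single unit running independent divides
back-to-back. It is a simplified model, not GCC's scheduler: the
finish_cycle helper and its parameters are hypothetical, and the only
number taken from the patch is the 21-cycle latency.

```c
#include <assert.h>

/* Cycle in which the last of N independent operations completes on a
   single unit with the given latency.  A pipelined unit accepts a new
   operation every cycle; a non-pipelined unit stays busy for the whole
   latency of each operation.  */
static int finish_cycle(int n, int latency, int pipelined)
{
    assert(n > 0 && latency > 0);
    int issue_interval = pipelined ? 1 : latency;
    return (n - 1) * issue_interval + latency;
}
```

With four divides at a latency of 21, finish_cycle (4, 21, 1) is 24
while finish_cycle (4, 21, 0) is 84, so a scheduler that believes the
unit is pipelined would be off by a large margin if the hardware is not.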
+
+(define_insn_reservation "xiangshan_sfdiv" 11
+ (and (eq_attr "tune" "xiangshan")
+ (eq_attr "type" "fdiv")
+ (eq_attr "mode" "SF"))
+ "xs_fmisc_rs")
+
+(define_insn_reservation "xiangshan_sfsqrt" 17
+ (and (eq_attr "tune" "xiangshan")
+ (eq_attr "type" "fsqrt")
+ (eq_attr "mode" "SF"))
+ "xs_fmisc_rs")
+
+(define_insn_reservation "xiangshan_dfdiv" 21
+ (and (eq_attr "tune" "xiangshan")
+ (eq_attr "type" "fdiv")
+ (eq_attr "mode" "DF"))
+ "xs_fmisc_rs")
+
+(define_insn_reservation "xiangshan_dfsqrt" 37
+ (and (eq_attr "tune" "xiangshan")
+ (eq_attr "type" "fsqrt")
+ (eq_attr "mode" "DF"))
+ "xs_fmisc_rs")
Similarly these say your fpdiv and fpsqrt are fully pipelined. It's
certainly possible, but I suspect it's really just an oversight. Given
these values you may also want to adjust the cost of an fp division in
the cost table.
Finally, with such high values for the div/sqrt units, we find that
the DFA "blows up" causing genattrtab to run for a very long time. We'll
have to keep an eye on that.
And just to be clear, I think these can be done as a followup patch. I'm
going to push this patch as-is rather than make any adjustments -- you
almost certainly know the processor's capabilities better than myself or
anyone else on this list :-)
Jeff