On Fri, 2020-12-04 at 13:19 -0600, acsawdey--- via Gcc-patches wrote:
> From: Aaron Sawdey <[email protected]>
>
Assorted comments sprinkled around below.
thanks
-Will
> This patch adds the first batch of patterns to support p10 fusion. These
> will allow combine to create a single insn for a pair of instructions
> that that power10 can fuse and execute. These particular ones have the
Just one 'that', or maybe 'that the'.
s/ones/fusion pairs/ ?
> requirement that only cr0 can be used when fusing a load with a compare
> immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
> to put that requirement in, and if it doesn't work out later the splitter
> can get used.
Maybe reword: '... the splitter can get used', or '... the splitter will <do something...>'?
>
> The patterns are generated by a script genfusion.pl and live in new file
> fusion.md. This script will be expanded to generate more patterns for
> fusion.
ok
>
> This also adds option -mpower10-fusion which defaults on for power10 and
> will gate all these fusion patterns. In addition I have added an
> undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
> that just controls the load+compare-immediate patterns. I have make
s/make/made/
> these default on for power10 but they are not disallowed for earlier
> processors because it is still valid code. This allows us to test the
> correctness of fusion code generation by turning it on explicitly.
>
> If bootstrap/regtest is clean, ok for trunk?
>
> Thanks!
>
> Aaron
>
> gcc/ChangeLog:
>
> * config/rs6000/genfusion.pl: New file, script to generate
> define_insn_and_split patterns so combine can arrange fused
> instructions next to each other.
New script to generate ...
> * config/rs6000/fusion.md: New file, generated fused instruction
> patterns for combine.
> * config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
> (non_update_memory_operand): New predicate.
ok
> * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
> OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
> POWERPC_MASKS.
> * config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
> prototype.
All usages of address_is_non_pfx_d_or_x() appear to be negated, i.e.
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0),
DImode, NON_PREFIXED_DS))"
Fully understanding that naming is hard, I wonder if that can be
adjusted to avoid the double negative, with something like
address_load_mode_requires_prefix (...foo)?
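A rough sketch of the kind of positive-sense helper I have in mind. The
enums here are cut-down stand-ins rather than the real GCC definitions,
SKETCH_FORM_PREFIXED and both function names are made up for illustration,
and the helper takes an already-classified form instead of an rtx. Since
the negation also covers forms that are simply not accepted (not only
prefixed ones), a name along the lines of "needs split" may fit better than
"requires prefix".

```c
#include <stdbool.h>

/* Stand-ins for the GCC enums; only the members this sketch needs.  */
enum non_prefixed_form { NON_PREFIXED_D, NON_PREFIXED_DS };
enum insn_form
{
  INSN_FORM_BAD,
  INSN_FORM_BASE_REG,
  INSN_FORM_D,
  INSN_FORM_DS,
  INSN_FORM_X,
  SKETCH_FORM_PREFIXED		/* Hypothetical placeholder member.  */
};

/* Same acceptance table as the patch's address_is_non_pfx_d_or_x, but
   applied to an already-classified form.  */
static bool
form_is_non_pfx_d_or_x (enum insn_form form, enum non_prefixed_form np)
{
  switch (form)
    {
    case INSN_FORM_X:
    case INSN_FORM_DS:
    case INSN_FORM_BASE_REG:
      return true;
    case INSN_FORM_D:
      /* A plain D-form address is only fine for a D-form load.  */
      return np == NON_PREFIXED_D;
    default:
      return false;
    }
}

/* Hypothetical positive-sense wrapper, so the split condition can read
   "|| ld_cmpi_address_needs_split (...)" without a leading '!'.  */
static bool
ld_cmpi_address_needs_split (enum insn_form form, enum non_prefixed_form np)
{
  return !form_is_non_pfx_d_or_x (form, np);
}
```

With something like this, the split condition would call the wrapper
directly, and the negation lives in exactly one place.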
> * config/rs6000/rs6000.c (rs6000_option_override_internal):
> automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
> if target is power10. (rs600_opt_masks): Allow -mpower10-fusion
> in function attributes. (address_is_non_pfx_d_or_x): New function.
ok
> * config/rs6000/rs6000.h: Add MASK_P10_FUSION.
> * config/rs6000/rs6000.md: Include fusion.md.
> * config/rs6000/rs6000.opt: Add -mpower10-fusion
> and -mpower10-fusion-ld-cmpi.
ok
> * config/rs6000/t-rs6000: Add dependencies involving fusion.md.
ok
> ---
> gcc/config/rs6000/fusion.md | 357 ++++++++++++++++++++++++++++++
> gcc/config/rs6000/genfusion.pl | 144 ++++++++++++
> gcc/config/rs6000/predicates.md | 14 ++
> gcc/config/rs6000/rs6000-cpus.def | 6 +-
> gcc/config/rs6000/rs6000-protos.h | 2 +
> gcc/config/rs6000/rs6000.c | 51 +++++
> gcc/config/rs6000/rs6000.h | 1 +
> gcc/config/rs6000/rs6000.md | 1 +
> gcc/config/rs6000/rs6000.opt | 8 +
> gcc/config/rs6000/t-rs6000 | 6 +-
> 10 files changed, 588 insertions(+), 2 deletions(-)
> create mode 100644 gcc/config/rs6000/fusion.md
> create mode 100755 gcc/config/rs6000/genfusion.pl
>
> diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
> new file mode 100644
> index 00000000000..a4d3a6ae7f3
> --- /dev/null
> +++ b/gcc/config/rs6000/fusion.md
> @@ -0,0 +1,357 @@
> +;; -*- buffer-read-only: t -*-
> +;; Generated automatically by genfusion.pl
> +
> +;; Copyright (C) 2020 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it under
> +;; the terms of the GNU General Public License as published by the Free
> +;; Software Foundation; either version 3, or (at your option) any later
> +;; version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +;; for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3. If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is DI result mode is clobber compare mode is CC extend is none
> +(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
> + [(set (match_operand:CC 2 "cc_reg_operand" "=x")
> + (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
> + (match_operand:DI 3 "const_m1_to_1_operand" "n")))
> + (clobber (match_scratch:DI 0 "=r"))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "ld%X1 %0,%1\;cmpdi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode,
> NON_PREFIXED_DS))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CC (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is DI result mode is clobber compare mode is CCUNS extend is
> none
> +(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
> + (match_operand:DI 3 "const_0_to_1_operand" "n")))
> + (clobber (match_scratch:DI 0 "=r"))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "ld%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode,
> NON_PREFIXED_DS))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is DI result mode is DI compare mode is CC extend is none
> +(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
> + [(set (match_operand:CC 2 "cc_reg_operand" "=x")
> + (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
> + (match_operand:DI 3 "const_m1_to_1_operand" "n")))
> + (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "ld%X1 %0,%1\;cmpdi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode,
> NON_PREFIXED_DS))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CC (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is DI result mode is DI compare mode is CCUNS extend is none
> +(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
> + (match_operand:DI 3 "const_0_to_1_operand" "n")))
> + (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "ld%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode,
> NON_PREFIXED_DS))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is SI result mode is clobber compare mode is CC extend is none
> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
> + [(set (match_operand:CC 2 "cc_reg_operand" "=x")
> + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
> + (match_operand:SI 3 "const_m1_to_1_operand" "n")))
> + (clobber (match_scratch:SI 0 "=r"))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode,
> NON_PREFIXED_DS))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CC (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is SI result mode is clobber compare mode is CCUNS extend is
> none
> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
> + (match_operand:SI 3 "const_0_to_1_operand" "n")))
> + (clobber (match_scratch:SI 0 "=r"))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is SI result mode is SI compare mode is CC extend is none
> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
> + [(set (match_operand:CC 2 "cc_reg_operand" "=x")
> + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
> + (match_operand:SI 3 "const_m1_to_1_operand" "n")))
> + (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode,
> NON_PREFIXED_DS))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CC (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is SI result mode is SI compare mode is CCUNS extend is none
> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
> + (match_operand:SI 3 "const_0_to_1_operand" "n")))
> + (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (match_dup 1))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
> + [(set (match_operand:CC 2 "cc_reg_operand" "=x")
> + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
> + (match_operand:SI 3 "const_m1_to_1_operand" "n")))
> + (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI
> (match_dup 1)))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode,
> NON_PREFIXED_DS))"
> + [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CC (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
> + (match_operand:SI 3 "const_0_to_1_operand" "n")))
> + (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI
> (match_dup 1)))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is HI result mode is clobber compare mode is CC extend is sign
> +(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
> + [(set (match_operand:CC 2 "cc_reg_operand" "=x")
> + (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
> + (match_operand:HI 3 "const_m1_to_1_operand" "n")))
> + (clobber (match_scratch:GPR 0 "=r"))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lha%X1 %0,%1\;cmpdi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CC (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is HI result mode is clobber compare mode is CCUNS extend is
> zero
> +(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
> + (match_operand:HI 3 "const_0_to_1_operand" "n")))
> + (clobber (match_scratch:GPR 0 "=r"))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
> +(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
> + [(set (match_operand:CC 2 "cc_reg_operand" "=x")
> + (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
> + (match_operand:HI 3 "const_m1_to_1_operand" "n")))
> + (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI
> (match_dup 1)))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lha%X1 %0,%1\;cmpdi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CC (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
> +(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
> + (match_operand:HI 3 "const_0_to_1_operand" "n")))
> + (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI
> (match_dup 1)))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is QI result mode is clobber compare mode is CCUNS extend is
> zero
> +(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
> + (match_operand:QI 3 "const_0_to_1_operand" "n")))
> + (clobber (match_scratch:GPR 0 "=r"))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
> +;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
> +(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> + (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
> + (match_operand:QI 3 "const_0_to_1_operand" "n")))
> + (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR
> (match_dup 1)))]
> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
> + "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
> + "&& reload_completed
> + && (cc_reg_not_cr0_operand (operands[2], CCmode)
> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode,
> NON_PREFIXED_D))"
> + [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
> + (set (match_dup 2)
> + (compare:CCUNS (match_dup 0)
> + (match_dup 3)))]
> + ""
> + [(set_attr "type" "load")
> + (set_attr "cost" "8")
> + (set_attr "length" "8")])
> +
Reviewed with a mix of in-depth analysis and skimming; nothing jumped
out at me here.
> diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
> new file mode 100755
> index 00000000000..494537c9439
> --- /dev/null
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -0,0 +1,144 @@
> +#!/usr/bin/perl -w
> +# Generate fusion.md
> +# Copyright (C) 2020 Free Software Foundation, Inc.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3. If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +my $copyright = <<'EOF';
> +;; -*- buffer-read-only: t -*-
> +;; Generated automatically by genfusion.pl
> +
> +;; Copyright (C) 2020 Free Software Foundation, Inc.
> +;;
Embedding the date in an autogenerated file catches my eye. I don't
see this in things like $GCC_BUILD/gcc/insn-recog.c, so I'm not sure it's
necessary in this case (but it probably doesn't hurt).
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it under
> +;; the terms of the GNU General Public License as published by the Free
> +;; Software Foundation; either version 3, or (at your option) any later
> +;; version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +;; for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3. If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +EOF
> +
> +print $copyright;
> +
> +sub mode_to_ldst_char
> +{
> + my ($mode) = @_;
> + if ($mode eq 'DI') { return 'd'; }
> + if ($mode eq 'SI') { return 'w'; }
> + if ($mode eq 'HI') { return 'h'; }
> + if ($mode eq 'QI') { return 'b'; }
> + return '?';
> +}
> +
> +sub gen_ld_cmpi_p10
> +{
> + LMODE: foreach $lmode ('DI','SI','HI','QI') {
> + $ldst = mode_to_ldst_char($lmode);
> + $clobbermode = $lmode;
> + # For clobber, we need a SI/DI reg in case we split because we have to
> sign/zero extend.
> + if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; }
> + RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
> + # EXTDI does not exist, and we cannot directly produce HI/QI results.
> + next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
> + # Don't allow EXTQI because that would allow HI result which we can't
> do.
> + if ( $result eq "EXTQI" ) { $result = "GPR"; }
> + CCMODE: foreach $ccmode ('CC','CCUNS') {
> + $np = "NON_PREFIXED_D";
> + if ( $ccmode eq 'CC' ) {
> + next CCMODE if $lmode eq 'QI';
> + if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
> + # ld and lwa are both DS-FORM.
> + $np = "NON_PREFIXED_DS";
> + }
> + $cmpl = "";
> + $echr = "a";
> + $constpred = "const_m1_to_1_operand";
> + } else {
> + if ( $lmode eq 'DI' ) {
> + # ld is DS-form, but lwz is not.
> + $np = "NON_PREFIXED_DS";
> + }
> + $cmpl = "l";
> + $echr = "z";
> + $constpred = "const_0_to_1_operand";
> + }
> + if ($lmode eq 'DI') { $echr = ""; }
> + if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
> + # We always need extension if result > lmode.
> + if ( $ccmode eq 'CC' ) {
> + $extend = "sign";
> + } else {
> + $extend = "zero";
> + }
> + } else {
> + # Result of SI/DI does not need sign extension.
> + $extend = "none";
> + }
> + print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
> + print ";; load mode is $lmode result mode is $result compare mode is
> $ccmode extend is $extend\n";
> +
> + print "(define_insn_and_split
> \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
> + print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\"
> \"=x\")\n";
> + print " (compare:${ccmode} (match_operand:${lmode} 1
> \"non_update_memory_operand\" \"m\")\n";
> + print " (match_operand:${lmode} 3 \"${constpred}\"
> \"n\")))\n";
> + if ($result eq 'clobber') {
> + print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
> + } elsif ($result eq $lmode) {
> + print " (set (match_operand:${result} 0 \"gpc_reg_operand\"
> \"=r\") (match_dup 1))]\n";
> + } else {
> + print " (set (match_operand:${result} 0 \"gpc_reg_operand\"
> \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
> + }
> + print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
> + print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n";
> + print " \"&& reload_completed\n";
> + print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
> + print " || !address_is_non_pfx_d_or_x (XEXP (operands[1],0),
> ${lmode}mode, ${np}))\"\n";
> + if ($extend eq "none") {
> + print " [(set (match_dup 0) (match_dup 1))\n";
> + } else {
> + $resultmode = $result;
> + if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
> + print " [(set (match_dup 0) (${extend}_extend:${resultmode}
> (match_dup 1)))\n";
> + }
> + print " (set (match_dup 2)\n";
> + print " (compare:${ccmode} (match_dup 0)\n";
> + print " (match_dup 3)))]\n";
> + print " \"\"\n";
> + print " [(set_attr \"type\" \"load\")\n";
> + print " (set_attr \"cost\" \"8\")\n";
> + print " (set_attr \"length\" \"8\")])\n";
> + print "\n";
> + }
> + }
> + }
> +}
Looked over, seems OK. Presumably testing will reveal any issues. :-)
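As a cross-check of the loop structure, here is a small C re-statement of
the LMODE/RESULT/CCMODE nesting that only builds the pattern names. It is a
sketch for checking the skip rules, not a replacement for the script; the
function name and NAME_LEN are my own. By my reading it yields 16 names, in
the same order as the 16 define_insn_and_split patterns in fusion.md above.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64

/* Mirror the loops of gen_ld_cmpi_p10 and build the pattern names it
   would print.  Stores at most MAX names into NAMES and returns the
   total number of patterns.  */
static int
list_ld_cmpi_patterns (char names[][NAME_LEN], int max)
{
  static const char *const lmodes[] = { "DI", "SI", "HI", "QI" };
  static const char ldst_chars[] = { 'd', 'w', 'h', 'b' };
  static const char *const ccmodes[] = { "CC", "CCUNS" };
  int count = 0;

  for (int m = 0; m < 4; m++)
    {
      int is_di = (m == 0);
      int is_qi = (m == 3);
      int sub_word = (m >= 2);  /* HI or QI: the clobber mode is GPR.  */

      /* Result kinds: 0 = clobber, 1 = same mode as load, 2 = extended.  */
      for (int r = 0; r < 3; r++)
        {
          char result[16];

          if (r == 0)
            strcpy (result, "clobber");
          else if (r == 1)
            {
              if (sub_word)
                continue;       /* Cannot directly produce HI/QI results.  */
              strcpy (result, lmodes[m]);
            }
          else
            {
              if (is_di)
                continue;       /* EXTDI does not exist.  */
              if (is_qi)
                strcpy (result, "GPR");  /* The script maps EXTQI to GPR.  */
              else
                snprintf (result, sizeof result, "EXT%s", lmodes[m]);
            }

          for (int c = 0; c < 2; c++)
            {
              int is_signed = (c == 0);
              if (is_signed && is_qi)
                continue;       /* The script skips CC for QI.  */

              const char *echr = is_di ? "" : (is_signed ? "a" : "z");
              const char *cmpl = is_signed ? "" : "l";
              const char *extend = "none";
              if (r == 2 || sub_word)
                extend = is_signed ? "sign" : "zero";

              if (count < max)
                snprintf (names[count], NAME_LEN,
                          "*l%c%s_cmp%sdi_cr0_%s_%s_%s_%s",
                          ldst_chars[m], echr, cmpl, lmodes[m], result,
                          ccmodes[c], extend);
              count++;
            }
        }
    }
  return count;
}
```

Calling list_ld_cmpi_patterns (names, 32) on a char names[32][NAME_LEN]
buffer and printing each entry gives a list that can be diffed against the
names in the generated file.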
> +
> +
> +gen_ld_cmpi_p10();
> +
> +exit(0);
> +
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 9ad5ae67302..78de8102f44 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -297,6 +297,11 @@ (define_predicate "const_0_to_1_operand"
> (and (match_code "const_int")
> (match_test "IN_RANGE (INTVAL (op), 0, 1)")))
>
> +;; Match op = -1, op = 0, or op = 1.
> +(define_predicate "const_m1_to_1_operand"
> + (and (match_code "const_int")
> + (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
> +
What does the _m1 indicate here? (I can't tell from pre-existing usage
whether it's negative, or match, or mode, or something else.)
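Going by the comment and the IN_RANGE bounds in the hunk above, my best
guess is that 'm1' stands for 'minus 1'. A small sketch of how I read the
two predicates side by side; IN_RANGE_SKETCH is a simplified stand-in for
GCC's IN_RANGE, and the function names and the use of plain long values
instead of const_int rtxes are assumptions for illustration only.

```c
#include <stdbool.h>

/* Simplified stand-in for GCC's IN_RANGE: inclusive on both ends.  */
#define IN_RANGE_SKETCH(value, lower, upper) \
  ((value) >= (lower) && (value) <= (upper))

/* Signed compare immediates the new predicate would accept: -1, 0, 1.  */
static bool
sketch_const_m1_to_1_p (long value)
{
  return IN_RANGE_SKETCH (value, -1, 1);
}

/* Unsigned compare immediates, as in const_0_to_1_operand: 0, 1.  */
static bool
sketch_const_0_to_1_p (long value)
{
  return IN_RANGE_SKETCH (value, 0, 1);
}
```

If that reading is right, the only difference from const_0_to_1_operand is
that -1 is also accepted, which lines up with the -1/0/1 (signed) versus
0/1 (unsigned) wording in the patch description.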
> ;; Match op = 0..3.
> (define_predicate "const_0_to_3_operand"
> (and (match_code "const_int")
> @@ -847,6 +852,15 @@ (define_special_predicate "update_address_mem"
> || GET_CODE (XEXP (op, 0)) == PRE_DEC
> || GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
>
> +;; Anything that matches memory_operand but does not update the address.
> +(define_predicate "non_update_memory_operand"
> + (match_code "mem")
> +{
> + if (update_address_mem (op, mode))
> + return 0;
> + return memory_operand (op, mode);
> +})
> +
> ;; Return 1 if the operand is a MEM with an indexed-form address.
> (define_special_predicate "indexed_address_mem"
> (match_test "(MEM_P (op)
> diff --git a/gcc/config/rs6000/rs6000-cpus.def
> b/gcc/config/rs6000/rs6000-cpus.def
> index 8d2c1ffd6cf..3e65289d8df 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -82,7 +82,9 @@
>
> #define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
> | OPTION_MASK_POWER10 \
> - | OTHER_POWER10_MASKS)
> + | OTHER_POWER10_MASKS \
> + | OPTION_MASK_P10_FUSION \
> + | OPTION_MASK_P10_FUSION_LD_CMPI)
>
> /* Flags that need to be turned off if -mno-power9-vector. */
> #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW
> \
> @@ -129,6 +131,8 @@
> | OPTION_MASK_FLOAT128_KEYWORD \
> | OPTION_MASK_FPRND \
> | OPTION_MASK_POWER10 \
> + | OPTION_MASK_P10_FUSION \
> + | OPTION_MASK_P10_FUSION_LD_CMPI \
> | OPTION_MASK_HTM \
> | OPTION_MASK_ISEL \
> | OPTION_MASK_MFCRF \
ok
> diff --git a/gcc/config/rs6000/rs6000-protos.h
> b/gcc/config/rs6000/rs6000-protos.h
> index 3c4682b0e26..cd644083558 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -191,6 +191,8 @@ enum non_prefixed_form {
>
> extern enum insn_form address_to_insn_form (rtx, machine_mode,
> enum non_prefixed_form);
> +extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
> + enum non_prefixed_form
> non_prefix_format);
> extern bool prefixed_load_p (rtx_insn *);
> extern bool prefixed_store_p (rtx_insn *);
> extern bool prefixed_paddi_p (rtx_insn *);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 517467ebc63..759551d07ec 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4423,6 +4423,12 @@ rs6000_option_override_internal (bool global_init_p)
> if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
> rs6000_isa_flags |= OPTION_MASK_MMA;
>
> + if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION)
> == 0)
> + rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
> +
> + if (TARGET_POWER10 && (rs6000_isa_flags_explicit &
> OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
> + rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
> +
> /* Turn off vector pair/mma options on non-power10 systems. */
> else if (!TARGET_POWER10 && TARGET_MMA)
> {
> @@ -23614,6 +23620,7 @@ static struct rs6000_opt_mask const
> rs6000_opt_masks[] =
> { "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
> { "power9-misc", OPTION_MASK_P9_MISC, false, true },
> { "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
> + { "power10-fusion", OPTION_MASK_P10_FUSION, false,
> true },
> { "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false,
> true },
> { "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
> { "prefixed", OPTION_MASK_PREFIXED, false,
> true },
> @@ -25705,6 +25712,50 @@ address_to_insn_form (rtx addr,
> return INSN_FORM_BAD;
> }
>
ok
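For reference, the way I read the defaulting in the
rs6000_option_override_internal hunk: each fusion flag is turned on for
power10 unless the user set it explicitly, and nothing is forced off for
earlier processors. A minimal sketch of that logic, with made-up mask
values and a made-up function name (the real masks come from rs6000.opt):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real OPTION_MASK_* values are generated.  */
#define SKETCH_MASK_P10_FUSION          (UINT64_C (1) << 0)
#define SKETCH_MASK_P10_FUSION_LD_CMPI  (UINT64_C (1) << 1)

/* Return FLAGS with each fusion bit defaulted on when targeting power10,
   except for bits the user set explicitly (recorded in EXPLICIT_FLAGS).  */
static uint64_t
sketch_apply_p10_fusion_defaults (uint64_t flags, uint64_t explicit_flags,
                                  bool target_power10)
{
  const uint64_t fusion_masks
    = SKETCH_MASK_P10_FUSION | SKETCH_MASK_P10_FUSION_LD_CMPI;

  if (target_power10)
    flags |= fusion_masks & ~explicit_flags;
  return flags;
}
```

So an explicit -mno-power10-fusion on power10 stays off, and an explicit
-mpower10-fusion on an earlier processor stays on, which is what makes the
testing approach from the description possible.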
> +/* Given address rtx ADDR for a load of MODE, is this legitimate for a
> + non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
> + given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
> + a D-form or DS-form instruction. X-form and base_reg are always
> + allowed. */
> +bool
> +address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
> + enum non_prefixed_form non_prefixed_format)
> +{
> + enum insn_form result_form;
> +
> + result_form = address_to_insn_form (addr, mode, non_prefixed_format);
> +
> + switch (non_prefixed_format)
> + {
> + case NON_PREFIXED_D:
> + switch (result_form)
> + {
> + case INSN_FORM_X:
> + case INSN_FORM_D:
> + case INSN_FORM_DS:
> + case INSN_FORM_BASE_REG:
> + return true;
> + default:
> + break;
> + }
> + break;
> + case NON_PREFIXED_DS:
> + switch (result_form)
> + {
> + case INSN_FORM_X:
> + case INSN_FORM_DS:
> + case INSN_FORM_BASE_REG:
> + return true;
> + default:
> + break;
> + }
> + break;
> + default:
> + break;
> + }
> + return false;
> +}
> +
> /* Helper function to see if we're potentially looking at lfs/stfs.
> - PARALLEL containing a SET and a CLOBBER
> - stfs:
ok
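My understanding of why NON_PREFIXED_D also accepts INSN_FORM_DS while
NON_PREFIXED_DS does not accept INSN_FORM_D: a D-form displacement is any
signed 16-bit value, whereas a DS-form displacement must additionally be a
multiple of 4, so every DS-legal offset is also D-legal but not the other
way around. A toy model of just that offset rule; the function names are
made up, and the real address_to_insn_form considers much more than the
offset.

```c
#include <stdbool.h>

/* D-form (e.g. lwz, lhz, lha, lbz): signed 16-bit displacement.  */
static bool
sketch_offset_fits_d_form (long offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* DS-form (e.g. ld, lwa): signed 16-bit displacement whose low two
   bits are zero, i.e. a multiple of 4.  */
static bool
sketch_offset_fits_ds_form (long offset)
{
  return sketch_offset_fits_d_form (offset) && offset % 4 == 0;
}
```

For example, an offset of 6 is fine for lwz but would force the split for
ld or lwa, while an offset of 8 works for all of them.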
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 5bf9c83fc1e..307c0b200bd 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
> #define MASK_UPDATE OPTION_MASK_UPDATE
> #define MASK_VSX OPTION_MASK_VSX
> #define MASK_POWER10 OPTION_MASK_POWER10
> +#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
>
> #ifndef IN_LIBGCC2
> #define MASK_POWERPC64 OPTION_MASK_POWERPC64
ok
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b89990f46bf..c39b7098978 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -14926,3 +14926,4 @@ (define_insn "*cmpeqb_internal"
> (include "dfp.md")
> (include "crypto.md")
> (include "htm.md")
> +(include "fusion.md")
ok
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 2888172cb27..008a318b98d 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -479,6 +479,14 @@ mpower8-vector
> Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags)
> Use vector and scalar instructions added in ISA 2.07.
>
> +mpower10-fusion
> +Target Report Mask(P10_FUSION) Var(rs6000_isa_flags)
> +Fuse certain integer operations together for better performance on power10.
> +
> +mpower10-fusion-ld-cmpi
> +Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
> +Fuse certain integer operations together for better performance on power10.
> +
> mcrypto
> Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
> Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
ok
> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
> index 1ddb5729cb2..bcc71a9e21b 100644
> --- a/gcc/config/rs6000/t-rs6000
> +++ b/gcc/config/rs6000/t-rs6000
> @@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
> $(COMPILE) $<
> $(POSTCOMPILE)
>
> +$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
> + $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
> +
> $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh
> \
> $(srcdir)/config/rs6000/rs6000-cpus.def
> $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
> @@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
> $(srcdir)/config/rs6000/mma.md \
> $(srcdir)/config/rs6000/crypto.md \
> $(srcdir)/config/rs6000/htm.md \
> - $(srcdir)/config/rs6000/dfp.md
> + $(srcdir)/config/rs6000/dfp.md \
> + $(srcdir)/config/rs6000/fusion.md
ok.