Hi Uros,

On 2026/5/28 19:26, Kewen Lin wrote:
> Hi,
>
> On 2026/5/28 14:29, Uros Bizjak wrote:
>> On Tue, May 26, 2026 at 1:30 PM Kewen Lin <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Commit r17-258 introduced separate c86-4g fdiv units to avoid the
>>> automaton explosion caused by modeling the whole divider latency on
>>> normal FPU pipes. But the real hardware may keep the associated FPU
>>> pipe occupied for some cycles at both the beginning and the end of
>>> an fdiv or sqrt operation. Following Alexander's suggestion in [1],
>>> this patch still keeps the long-latency part on the dedicated fdiv
>>> unit but models only a bounded part of the FPU pipe occupancy. It
>>> makes the first four cycles reserve both the selected FPU pipe and
>>> the fdiv unit, then keep only the fdiv unit for the remaining cycles.
>>>
>>> Taking r17-258 as baseline, I tried K = 1,2,3,4 for
>>>
>>>   fpu,divider*N -> (fpu+divider)*K, divider*(N-K)
>>>
>>> and measured the time for build/genautomata and the top 100 symbol
>>> sizes of insn-automata.o (baseline normalized as 100) as below:
>>>
>>> 1) without any other changes:
>>>               time     size
>>>   baseline    100      100
>>>   r17-203     340.0    629.3
>>>   K1          100.3    100
>>>   K2          105.5    112.5
>>>   K3          112.8    129
>>>   K4          119.4    141
>>>
>>> 2) Splitting fpu0/fpu2 and fpu1/fpu3 to paired automata:
>>>               time     size
>>>   baseline    100      100
>>>   r17-203     340.0    629.3
>>>   KS1         79.6     43.3
>>>   KS2         79.8     43.3
>>>   KS3         79.6     43.3
>>>   KS4         79.4     43.3
>>>
>>> It turns out that if we want to model the FPU occupancy for some
>>> beginning cycles, separating the involved fpu1/fpu3 from the
>>> original fpu looks better. So this patch splits fpu0/fpu2 and
>>> fpu1/fpu3 into two paired automata and this extra coupling does
>>> not grow the main FPU automata significantly.
>>>
>>> This patch also corrects some other modeling omissions like:
>>>
>>> - Fix c86_4g_fp_op_idiv_load latency typo by one cycle.
>>> - Merge the old c86_4g_m7 idiv DI/SI/HI reservations after
>>>   aligning their latency and divider unit occupancy (with
>>>   updated values), while keeping QI separate.
>>> - Adjust reservation units in templates like
>>>   c86_4g_m7_avx_vpinsr_reg_load and c86_4g_m7_avx512_sseadd_xy
>>>   etc.
>>> - Add missing reservation units and unit occupancy in templates
>>>   like c86_4g_m7_avx512_permi2_ymm and
>>>   c86_4g_m7_sse_sseiadd_hplus_load etc.
>>> - Adjust reservation units and unit occupancy in templates like
>>>   c86_4g_m7_avx512_perm_zmm_imm, c86_4g_m7_avx512_expand and
>>>   c86_4g_m7_avx512_ssemul etc.
>>>
>>> It also introduces some reusable reservation aliases to simplify
>>> the modeling.
>>>
>>> I tested build time for i686 bootstrapping in a docker container:
>>> - r17-202: 2437s (before c86-4g support)
>>> - r17-203: 7291s (c86-4g support)
>>> - r17-258: 2646s (tweaking for build time)
>>> - this: 2358s
>>> It looks like this patch improves build time (even better than
>>> r17-202, though that small gap may just be jitter).
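
For readers following the thread, the K-cycle split quoted above (fpu,divider*N -> (fpu+divider)*K, divider*(N-K)) can be sketched as a small string builder. This is only an illustration and not part of the patch: the helper name split_reservation is hypothetical, and the unit names are simply the ones that appear in the quoted diff.

```python
def split_reservation(fpu: str, divider: str, n: int, k: int) -> str:
    """Return a reservation string of the form "(fpu+divider)*K,divider*(N-K)".

    Hypothetical helper for illustration only: the first K cycles reserve
    both the FPU pipe and the divider, the remaining N-K cycles reserve
    the divider alone.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < K <= N")
    head = f"({fpu}+{divider})*{k}"
    tail = n - k
    # When K == N there is no divider-only part left.
    return head if tail == 0 else f"{head},{divider}*{tail}"


# N = 22 and K = 4 mirror the c86_4g_m7_fp_sqrt case in the quoted diff,
# where fdiv*22 becomes a 4-cycle coupled part plus an 18-cycle divider part.
sqrt_on_pipe1 = split_reservation("c86-4g-m7-fpu1", "c86-4g-m7-fdiv1", 22, 4)
print(sqrt_on_pipe1)
```

With K fixed at 4, only the divider-only tail changes per instruction, which matches the x4x8 through x4x18 aliases defined in the quoted diff.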
>>>
>>> The symbol sizes are improved as below:
>>>
>>> nm -CS -t d --defined-only gcc/insn-automata.o \
>>>   | sed 's/^[0-9]* 0*//' \
>>>   | sort -n | tail -20
>>>
>>> with r17-258:
>>>
>>>   20068 r bdver1_fp_transitions
>>>   22354 r c86_4g_m7_ieu_min_issue_delay
>>>   26208 r slm_min_issue_delay
>>>   26580 t internal_min_issue_delay(int, DFA_chip*)
>>>   26869 t internal_state_transition(int, DFA_chip*)
>>>   27244 r bdver1_fp_min_issue_delay
>>>   28518 r glm_check
>>>   28518 r glm_transitions
>>>   33690 r geode_min_issue_delay
>>>   33728 r c86_4g_fp_transitions
>>>   45436 r znver4_fpu_min_issue_delay
>>>   46980 r bdver3_fp_min_issue_delay
>>>   49428 r glm_min_issue_delay
>>>   53730 r btver2_fp_min_issue_delay
>>>   53760 r znver1_fp_transitions
>>>   89414 r c86_4g_m7_ieu_transitions
>>>   93960 r bdver3_fp_transitions
>>>  181744 r znver4_fpu_transitions
>>>  326322 r c86_4g_m7_fpu_min_issue_delay
>>> 1305288 r c86_4g_m7_fpu_transitions
>>>
>>> with this:
>>>
>>>   17872 r print_reservation(_IO_FILE*, rtx_insn*)::...
>>>   20068 r bdver1_fp_check
>>>   20068 r bdver1_fp_transitions
>>>   22016 r c86_4g_m7_fpu02_transitions
>>>   22354 r c86_4g_m7_ieu_min_issue_delay
>>>   26208 r slm_min_issue_delay
>>>   27244 r bdver1_fp_min_issue_delay
>>>   28199 t internal_min_issue_delay(int, DFA_chip*)
>>>   28362 t internal_state_transition(int, DFA_chip*)
>>>   28518 r glm_check
>>>   28518 r glm_transitions
>>>   33690 r geode_min_issue_delay
>>>   45436 r znver4_fpu_min_issue_delay
>>>   46980 r bdver3_fp_min_issue_delay
>>>   49428 r glm_min_issue_delay
>>>   53730 r btver2_fp_min_issue_delay
>>>   53760 r znver1_fp_transitions
>>>   89414 r c86_4g_m7_ieu_transitions
>>>   93960 r bdver3_fp_transitions
>>>  181744 r znver4_fpu_transitions
>>>
>>> Based on random sampling of SPEC2017 benchmarks 525.x264_r and
>>> 521.wrf_r, I verified that the new modeling introduces no
>>> significant compilation overhead. Testing with a single job on a
>>> c86-4g-m7 machine revealed no impact on x264 and a tiny increase
>>> for wrf (~0.3%).
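
As a rough cross-check of the two nm listings quoted above, the sketch below parses "size type name" lines and compares the largest c86_4g_m7 FPU table before and after. It is only an illustration: the helper parse_nm_sizes is hypothetical, and the "after" input contains only the one c86_4g_m7 FPU symbol that is visible in the quoted top-20 listing, so smaller tables (for example anything belonging to c86_4g_m7_fpu13) are not accounted for.

```python
def parse_nm_sizes(text: str) -> dict:
    """Map symbol name -> size for lines shaped like "<size> <type> <name>"."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit():
            sizes[parts[2]] = int(parts[0])
    return sizes


# Lines copied from the quoted "with r17-258" listing.
before = parse_nm_sizes("""\
326322 r c86_4g_m7_fpu_min_issue_delay
1305288 r c86_4g_m7_fpu_transitions
""")

# The only c86_4g_m7 FPU symbol visible in the quoted "with this" listing.
after = parse_nm_sizes("""\
22016 r c86_4g_m7_fpu02_transitions
""")

largest_before = max(before.values())
largest_after = max(after.values())
ratio = largest_before / largest_after
print(f"largest visible FPU table: {largest_before} -> {largest_after} bytes "
      f"(ratio {ratio:.1f})")
```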
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716681.html
>>>
>>> Bootstrapped and regress-tested on one c86-4g-m7 machine, as well
>>> as a cfarm x86-64 machine. Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -----
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/i386/c86-4g-m7.md (c86_4g_m7_fpu): Remove automaton.
>>> 	(c86_4g_m7_fpu02): New automaton.
>>> 	(c86_4g_m7_fpu13): Ditto.
>>> 	(c86-4g-m7-fpu0): Move to c86_4g_m7_fpu02 automaton.
>>> 	(c86-4g-m7-fpu1): Move to c86_4g_m7_fpu13 automaton.
>>> 	(c86-4g-m7-fpu2): Move to c86_4g_m7_fpu02 automaton.
>>> 	(c86-4g-m7-fpu3): Move to c86_4g_m7_fpu13 automaton.
>>> 	(c86-4g-m7-fdiv): Remove cpu unit.
>>> 	(c86-4g-m7-fdiv1): New cpu unit.
>>> 	(c86-4g-m7-fdiv3): Ditto.
>>> 	(c86-4g-m7-fpu_0_3): New reservation.
>>> 	(c86-4g-m7-fpu_1_3x2): Ditto.
>>> 	(c86-4g-m7-fpu_1_3x3): Ditto.
>>> 	(c86-4g-m7-fpu_1_3x6): Ditto.
>>> 	(c86-4g-m7-fpux2): Ditto.
>>> 	(c86-4g-m7-fpux4): Ditto.
>>> 	(c86-4g-m7-fpux6): Ditto.
>>> 	(c86-4g-m7-fpux8): Ditto.
>>> 	(c86-4g-m7-fpux16): Ditto.
>>> 	(c86-4g-m7-fp1fdiv1x4): Ditto.
>>> 	(c86-4g-m7-fp3fdiv3x4): Ditto.
>>> 	(c86-4g-m7-fdiv13): Ditto.
>>> 	(c86-4g-m7-fp13div13): Ditto.
>>> 	(c86-4g-m7-fp13div13x4): Ditto.
>>> 	(c86-4g-m7-fp1div1_fp3div3_x4x8): Ditto.
>>> 	(c86-4g-m7-fp1div1_fp3div3_x4x9): Ditto.
>>> 	(c86-4g-m7-fp1div1_fp3div3_x4x11): Ditto.
>>> 	(c86-4g-m7-fp1div1_fp3div3_x4x15): Ditto.
>>> 	(c86-4g-m7-fp1div1_fp3div3_x4x18): Ditto.
>>> 	(c86_4g_m7_idiv): New reservation.
>>> 	(c86_4g_m7_idiv_QI): Adjust reservation latency and unit occupancy.
>>> 	(c86_4g_m7_idiv_load): New reservation.
>>> 	(c86_4g_m7_idiv_QI_load): Adjust reservation latency and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_idiv_DI): Remove reservation.
>>> 	(c86_4g_m7_idiv_SI): Ditto.
>>> 	(c86_4g_m7_idiv_HI): Ditto.
>>> 	(c86_4g_m7_idiv_DI_load): Ditto.
>>> 	(c86_4g_m7_idiv_SI_load): Ditto.
>>> 	(c86_4g_m7_idiv_HI_load): Ditto.
>>> 	(c86_4g_m7_sse_insertimm): Adjust reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_sse_insert): Ditto.
>>> 	(c86_4g_m7_fp_sqrt): Adjust reservation.
>>> 	(c86_4g_m7_fp_div): Ditto.
>>> 	(c86_4g_m7_fp_div_load): Ditto.
>>> 	(c86_4g_m7_fp_idiv_load): Ditto.
>>> 	(c86_4g_m7_sse_pinsr_reg): Adjust reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_sse_pinsr_reg_load): Ditto.
>>> 	(c86_4g_m7_avx_vpinsr_reg): Ditto.
>>> 	(c86_4g_m7_avx_vpinsr_reg_load): Ditto.
>>> 	(c86_4g_m7_avx512_perm_xmm): Delete the prefix condition.
>>> 	(c86_4g_m7_avx512_perm_xmm_opload): Ditto.
>>> 	(c86_4g_m7_avx512_permi2_ymm): Adjust reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_avx512_permi2_zmm): Ditto.
>>> 	(c86_4g_m7_avx512_permi2_ymm_load): Ditto.
>>> 	(c86_4g_m7_avx512_permi2_zmm_load): Ditto.
>>> 	(c86_4g_m7_avx512_perm_zmm_imm): Ditto.
>>> 	(c86_4g_m7_avx512_perm_zmm_imm_load): Ditto.
>>> 	(c86_4g_m7_avx512_perm_zmm_noimm): Ditto.
>>> 	(c86_4g_m7_sse_perm_zmm_noimm_load): Ditto.
>>> 	(c86_4g_m7_avx_perm_ymm): Remove.
>>> 	(c86_4g_m7_avx_perm_ymem): Ditto.
>>> 	(c86_4g_m7_avx512_shuf_zmm): Adjust reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_avx512_shuf_zmem): Ditto.
>>> 	(c86_4g_m7_avx512_cmpestr): Ditto.
>>> 	(c86_4g_m7_avx512_cmpestr_load): Ditto.
>>> 	(c86_4g_m7_avx512_vdbpsadbw_zmm): Ditto.
>>> 	(c86_4g_m7_avx512_vdbpsadbw_zmem): Ditto.
>>> 	(c86_4g_m7_avx_ssecomi_comi): Ditto.
>>> 	(c86_4g_m7_avx_ssecomi_comi_load): Ditto.
>>> 	(c86_4g_m7_avx512_expand): Ditto.
>>> 	(c86_4g_m7_avx512_expand_load): Ditto.
>>> 	(c86_4g_m7_avx512_expand_z): Ditto.
>>> 	(c86_4g_m7_avx512_expand_z_load): Ditto.
>>> 	(c86_4g_m7_sse_movnt_xy): Rename to c86_4g_m7_sse_movnt.
>>> 	(c86_4g_m7_avx512_sseadd_xy): Adjust reservation units.
>>> 	(c86_4g_m7_avx512_sseadd_xy_load): Ditto.
>>> 	(c86_4g_m7_sse_sseiadd_hplus): Adjust reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_sse_sseiadd_hplus_load): Ditto.
>>> 	(c86_4g_m7_avx512_ssemul): Adjust reservation units.
>>> 	(c86_4g_m7_avx512_ssemul_load): Ditto.
>>> 	(c86_4g_m7_avx512_ssediv): Remove.
>>> 	(c86_4g_m7_avx512_ssediv_mem): Remove.
>>> 	(c86_4g_m7_avx512_ssediv_x): New.
>>> 	(c86_4g_m7_avx512_ssediv_xmem): New.
>>> 	(c86_4g_m7_avx512_ssediv_y): New.
>>> 	(c86_4g_m7_avx512_ssediv_ymem): New.
>>> 	(c86_4g_m7_avx512_ssediv_z): Adjust reservation units.
>>> 	(c86_4g_m7_avx512_ssediv_zmem): Ditto.
>>> 	(c86_4g_m7_avx512_ssecmp_z): Add reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_avx512_ssecmp_z_load): Ditto.
>>> 	(c86_4g_m7_avx512_ssecmp_vp_z): New reservation.
>>> 	(c86_4g_m7_avx512_ssecmp_vp_z_load): Ditto.
>>> 	(c86_4g_m7_avx512_ssecmp_test_z): Remove reservation.
>>> 	(c86_4g_m7_avx512_ssecmp_test_z_load): Ditto.
>>> 	(c86_4g_m7_avx512_muladd): Broaden matching condition.
>>> 	(c86_4g_m7_avx512_muladd_load): Ditto.
>>> 	(c86_4g_m7_fma_muladd): Remove reservation.
>>> 	(c86_4g_m7_fma_muladd_load): Ditto.
>>> 	(c86_4g_m7_avx512_sse_conflict_x): Add reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_avx512_sse_conflict_x_load): Ditto.
>>> 	(c86_4g_m7_avx512_sse_conflict_y): Ditto.
>>> 	(c86_4g_m7_avx512_sse_conflict_y_load): Ditto.
>>> 	(c86_4g_m7_avx512_sse_conflict_z): Ditto.
>>> 	(c86_4g_m7_avx512_sse_conflict_z_load): Ditto.
>>> 	(c86_4g_m7_avx512_sse_class_z): Add reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_avx512_sse_class_z_load): Ditto.
>>> 	(c86_4g_m7_avx512_sse_sqrt): Remove.
>>> 	(c86_4g_m7_avx512_sse_sqrt_load): Remove.
>>> 	(c86_4g_m7_avx512_sse_sqrt_sf_x): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_sf_xload): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_sf_y): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_sf_yload): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_sf_z): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_sf_zload): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_df_x): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_df_xload): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_df_y): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_df_yload): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_df_z): New.
>>> 	(c86_4g_m7_avx512_sse_sqrt_df_zload): New.
>>> 	(c86_4g_m7_avx512_msklog_vector): Add reservation units and unit
>>> 	occupancy.
>>> 	(c86_4g_m7_avx512_mskmov_z_k): Ditto.
>>> 	(c86_4g_m7_avx512_mskmov_k_reg): Ditto.
>>> 	* config/i386/c86-4g.md (c86_4g_fp): Remove automaton.
>>> 	(c86_4g_fp024): New automaton.
>>> 	(c86_4g_fp1): Ditto.
>>> 	(c86-4g-fp0): Move to c86_4g_fp024 automaton.
>>> 	(c86-4g-fp1): Move to c86_4g_fp1 automaton.
>>> 	(c86-4g-fp2): Move to c86_4g_fp024 automaton.
>>> 	(c86-4g-fp3): Ditto.
>>> 	(c86-4g-fp1fdivx4): New reservation.
>>> 	(c86_4g_fp_sqrt): Adjust reservation.
>>> 	(c86_4g_sse_sqrt_sf): Ditto.
>>> 	(c86_4g_sse_sqrt_sf_mem): Ditto.
>>> 	(c86_4g_sse_sqrt_df): Ditto.
>>> 	(c86_4g_sse_sqrt_df_mem): Ditto.
>>> 	(c86_4g_fp_op_div): Ditto.
>>> 	(c86_4g_fp_op_div_load): Ditto.
>>> 	(c86_4g_fp_op_idiv_load): Adjust reservation latency.
>>> 	(c86_4g_ssediv_ss_ps): Adjust reservation.
>>> 	(c86_4g_ssediv_ss_ps_load): Ditto.
>>> 	(c86_4g_ssediv_sd_pd): Ditto.
>>> 	(c86_4g_ssediv_sd_pd_load): Ditto.
>>> 	(c86_4g_ssediv_avx256_ps): Ditto.
>>> 	(c86_4g_ssediv_avx256_ps_load): Ditto.
>>> 	(c86_4g_ssediv_avx256_pd): Ditto.
>>> 	(c86_4g_ssediv_avx256_pd_load): Ditto.
>>
>> LGTM (not a thorough review, but this patch is definitely an improvement).
>
> Thanks Uros! Pushed as r17-895-gdd682ea0414926.

Since this patch landed more than 10 days ago and no issues have been
reported, would it be OK to backport the three patches below to all
active release branches?

r17-203-g2a64a63d982584 i386: Support HYGON c86-4g series processors
r17-258-gc776dcd5f868a1 i386: Adjust some c86-4g*.md modeling to reduce build time
r17-895-gdd682ea0414926 i386: Refine c86-4g fdiv scheduling model

BR,
Kewen

>
> BR,
> Kewen
>
>>
>> Thanks,
>> Uros.
>>
>>>
>>> Co-authored-by: Xin Liu <[email protected]>
>>> Signed-off-by: Xin Liu <[email protected]>
>>> Signed-off-by: Kewen Lin <[email protected]>
>>>
>>> ---
>>>  gcc/config/i386/c86-4g-m7.md | 412 ++++++++++++++++++++---------------
>>>  gcc/config/i386/c86-4g.md    |  61 +++---
>>>  2 files changed, 270 insertions(+), 203 deletions(-)
>>>
>>> diff --git a/gcc/config/i386/c86-4g-m7.md b/gcc/config/i386/c86-4g-m7.md
>>> index 54a850db3be..96bd322a288 100644
>>> --- a/gcc/config/i386/c86-4g-m7.md
>>> +++ b/gcc/config/i386/c86-4g-m7.md
>>> @@ -20,8 +20,10 @@
>>>  ;; HYGON c86-4g-m7 Scheduling
>>>  ;; Modeling automatons for decoders, integer execution pipes,
>>>  ;; AGU pipes, branch, floating point execution, fp store units,
>>> -;; integer and floating point dividers.
>>> -(define_automaton "c86_4g_m7, c86_4g_m7_ieu, c86_4g_m7_agu, c86_4g_m7_fpu, c86_4g_m7_idiv, c86_4g_m7_fdiv")
>>> +;; integer and floating point dividers. Split fpu1 and fpu3
>>> +;; into their own automata to keep these units independent
>>> +;; without increasing the main c86_4g_m7_fpu state space.
>>> +(define_automaton "c86_4g_m7, c86_4g_m7_ieu, c86_4g_m7_agu, c86_4g_m7_fpu02, c86_4g_m7_fpu13, c86_4g_m7_idiv, c86_4g_m7_fdiv")
>>>
>>>  ;; Decoders unit has 4 decoders and all of them can decode fast path
>>>  ;; and vector type instructions.
>>> @@ -30,10 +32,6 @@ (define_cpu_unit "c86-4g-m7-decode1" "c86_4g_m7") >>> (define_cpu_unit "c86-4g-m7-decode2" "c86_4g_m7") >>> (define_cpu_unit "c86-4g-m7-decode3" "c86_4g_m7") >>> >>> -;; Two separated dividers for int and fp. >>> -(define_cpu_unit "c86-4g-m7-idiv" "c86_4g_m7_idiv") >>> -(define_cpu_unit "c86-4g-m7-fdiv" "c86_4g_m7_fdiv") >>> - >>> ;; Currently blocking all decoders for vector path instructions as >>> ;; they are dispatched separetely as microcode sequence. >>> (define_reservation "c86-4g-m7-vector" >>> "c86-4g-m7-decode0+c86-4g-m7-decode1+c86-4g-m7-decode2+c86-4g-m7-decode3") >>> @@ -50,6 +48,9 @@ (define_cpu_unit "c86-4g-m7-ieu1" "c86_4g_m7_ieu") >>> (define_cpu_unit "c86-4g-m7-ieu2" "c86_4g_m7_ieu") >>> (define_cpu_unit "c86-4g-m7-ieu3" "c86_4g_m7_ieu") >>> >>> +;; One separated integer divider. >>> +(define_cpu_unit "c86-4g-m7-idiv" "c86_4g_m7_idiv") >>> + >>> ;; c86-4g-m7 has an additional branch unit. >>> (define_cpu_unit "c86-4g-m7-bru0" "c86_4g_m7_ieu") >>> (define_reservation "c86-4g-m7-ieu" >>> "c86-4g-m7-ieu0|c86-4g-m7-ieu1|c86-4g-m7-ieu2|c86-4g-m7-ieu3") >>> @@ -67,23 +68,48 @@ (define_reservation "c86-4g-m7-store" >>> "c86-4g-m7-agu-reserve") >>> ;; vectorpath (microcoded) instructions are single issue instructions. >>> ;; So, they occupy all the integer units. >>> (define_reservation "c86-4g-m7-ivector" "c86-4g-m7-ieu0+c86-4g-m7-ieu1 >>> - >>> +c86-4g-m7-ieu2+c86-4g-m7-ieu3+c86-4g-m7-bru0 >>> - >>> +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") >>> + >>> +c86-4g-m7-ieu2+c86-4g-m7-ieu3+c86-4g-m7-bru0 >>> + >>> +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") >>> >>> ;; Floating point unit 4 FP pipes. 
>>> -(define_cpu_unit "c86-4g-m7-fpu0" "c86_4g_m7_fpu") >>> -(define_cpu_unit "c86-4g-m7-fpu1" "c86_4g_m7_fpu") >>> -(define_cpu_unit "c86-4g-m7-fpu2" "c86_4g_m7_fpu") >>> -(define_cpu_unit "c86-4g-m7-fpu3" "c86_4g_m7_fpu") >>> +(define_cpu_unit "c86-4g-m7-fpu0" "c86_4g_m7_fpu02") >>> +(define_cpu_unit "c86-4g-m7-fpu1" "c86_4g_m7_fpu13") >>> +(define_cpu_unit "c86-4g-m7-fpu2" "c86_4g_m7_fpu02") >>> +(define_cpu_unit "c86-4g-m7-fpu3" "c86_4g_m7_fpu13") >>> + >>> (define_reservation "c86-4g-m7-fpu" >>> "c86-4g-m7-fpu0|c86-4g-m7-fpu1|c86-4g-m7-fpu2|c86-4g-m7-fpu3") >>> -(define_reservation "c86-4g-m7-fpu_0_2" "c86-4g-m7-fpu0|c86-4g-m7-fpu2") >>> -(define_reservation "c86-4g-m7-fpu_1_3" "c86-4g-m7-fpu1|c86-4g-m7-fpu3") >>> (define_reservation "c86-4g-m7-fpu_0_1" "c86-4g-m7-fpu0|c86-4g-m7-fpu1") >>> +(define_reservation "c86-4g-m7-fpu_0_2" "c86-4g-m7-fpu0|c86-4g-m7-fpu2") >>> (define_reservation "c86-4g-m7-fpu_0_2x2" >>> "c86-4g-m7-fpu0*2|c86-4g-m7-fpu2*2") >>> (define_reservation "c86-4g-m7-fpu_0_2x4" >>> "c86-4g-m7-fpu0*4|c86-4g-m7-fpu2*4") >>> +(define_reservation "c86-4g-m7-fpu_0_3" "c86-4g-m7-fpu0|c86-4g-m7-fpu3") >>> +(define_reservation "c86-4g-m7-fpu_1_3" "c86-4g-m7-fpu1|c86-4g-m7-fpu3") >>> +(define_reservation "c86-4g-m7-fpu_1_3x2" >>> "c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") >>> +(define_reservation "c86-4g-m7-fpu_1_3x3" >>> "c86-4g-m7-fpu1*3|c86-4g-m7-fpu3*3") >>> +(define_reservation "c86-4g-m7-fpu_1_3x6" >>> "c86-4g-m7-fpu1*6|c86-4g-m7-fpu3*6") >>> +(define_reservation "c86-4g-m7-fpux2" >>> "c86-4g-m7-fpu0*2|c86-4g-m7-fpu1*2|c86-4g-m7-fpu2*2|c86-4g-m7-fpu3*2") >>> +(define_reservation "c86-4g-m7-fpux4" >>> "c86-4g-m7-fpu0*4|c86-4g-m7-fpu1*4|c86-4g-m7-fpu2*4|c86-4g-m7-fpu3*4") >>> +(define_reservation "c86-4g-m7-fpux8" >>> "c86-4g-m7-fpu0*8|c86-4g-m7-fpu1*8|c86-4g-m7-fpu2*8|c86-4g-m7-fpu3*8") >>> +(define_reservation "c86-4g-m7-fpux6" >>> "c86-4g-m7-fpu0*6|c86-4g-m7-fpu1*6|c86-4g-m7-fpu2*6|c86-4g-m7-fpu3*6") >>> +(define_reservation "c86-4g-m7-fpux16" >>> 
"c86-4g-m7-fpu0*16|c86-4g-m7-fpu1*16|c86-4g-m7-fpu2*16|c86-4g-m7-fpu3*16") >>> (define_reservation "c86-4g-m7-fvector" "c86-4g-m7-fpu0+c86-4g-m7-fpu1 >>> - +c86-4g-m7-fpu2+c86-4g-m7-fpu3 >>> - >>> +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") >>> + +c86-4g-m7-fpu2+c86-4g-m7-fpu3 >>> + >>> +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") >>> + >>> +;; Two FP dividers. >>> +(define_cpu_unit "c86-4g-m7-fdiv1" "c86_4g_m7_fdiv") >>> +(define_cpu_unit "c86-4g-m7-fdiv3" "c86_4g_m7_fdiv") >>> + >>> +(define_reservation "c86-4g-m7-fp1fdiv1x4" >>> "(c86-4g-m7-fpu1+c86-4g-m7-fdiv1)*4") >>> +(define_reservation "c86-4g-m7-fp3fdiv3x4" >>> "(c86-4g-m7-fpu3+c86-4g-m7-fdiv3)*4") >>> +(define_reservation "c86-4g-m7-fdiv13" "(c86-4g-m7-fdiv1+c86-4g-m7-fdiv3)") >>> +(define_reservation "c86-4g-m7-fp13div13" >>> "(c86-4g-m7-fpu1+c86-4g-m7-fpu3+c86-4g-m7-fdiv1+c86-4g-m7-fdiv3)") >>> +(define_reservation "c86-4g-m7-fp13div13x4" "c86-4g-m7-fp13div13*4") >>> +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x8" >>> "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*8)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*8)") >>> +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x9" >>> "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*9)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*9)") >>> +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x11" >>> "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*11)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*11)") >>> +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x15" >>> "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*15)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*15)") >>> +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x18" >>> "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*18)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*18)") >>> >>> ;; IMOV/IMOVX >>> (define_insn_reservation "c86_4g_m7_imov_xchg" 1 >>> @@ -168,61 +194,33 @@ (define_insn_reservation "c86_4g_m7_imul_load" 7 >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-ieu1") >>> >>> ;; IDIV >>> -(define_insn_reservation "c86_4g_m7_idiv_DI" 41 >>> - (and (eq_attr "cpu" 
"c86_4g_m7") >>> - (and (eq_attr "type" "idiv") >>> - (and (eq_attr "mode" "DI") >>> - (eq_attr "memory" "none")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*41") >>> - >>> -(define_insn_reservation "c86_4g_m7_idiv_SI" 25 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "idiv") >>> - (and (eq_attr "mode" "SI") >>> - (eq_attr "memory" "none")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*25") >>> - >>> -(define_insn_reservation "c86_4g_m7_idiv_HI" 17 >>> +(define_insn_reservation "c86_4g_m7_idiv" 7 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "idiv") >>> - (and (eq_attr "mode" "HI") >>> + (and (eq_attr "mode" "!QI") >>> (eq_attr "memory" "none")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*17") >>> + "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*7") >>> >>> -(define_insn_reservation "c86_4g_m7_idiv_QI" 15 >>> +(define_insn_reservation "c86_4g_m7_idiv_QI" 6 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "idiv") >>> (and (eq_attr "mode" "QI") >>> (eq_attr "memory" "none")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-ieu3,c86-4g-m7-idiv*15") >>> - >>> -(define_insn_reservation "c86_4g_m7_idiv_DI_load" 45 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "idiv") >>> - (and (eq_attr "mode" "DI") >>> - (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*41") >>> - >>> -(define_insn_reservation "c86_4g_m7_idiv_SI_load" 29 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "idiv") >>> - (and (eq_attr "mode" "SI") >>> - (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*25") >>> + "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*6") >>> >>> -(define_insn_reservation "c86_4g_m7_idiv_HI_load" 21 >>> +(define_insn_reservation "c86_4g_m7_idiv_load" 11 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "idiv") >>> - (and (eq_attr "mode" "HI") >>> + 
(and (eq_attr "mode" "!QI") >>> (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*17") >>> + >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*7") >>> >>> -(define_insn_reservation "c86_4g_m7_idiv_QI_load" 19 >>> +(define_insn_reservation "c86_4g_m7_idiv_QI_load" 10 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "idiv") >>> (and (eq_attr "mode" "QI") >>> (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*15") >>> + >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*6") >>> >>> ;; Integer/genaral Instructions >>> (define_insn_reservation "c86_4g_m7_insn" 1 >>> @@ -385,14 +383,14 @@ (define_insn_reservation "c86_4g_m7_sse_insertimm" 3 >>> (and (eq_attr "type" "sseins") >>> (and (eq_attr "memory" "none") >>> (eq_attr "length_immediate" "2")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-fpu0|c86-4g-m7-fpu3,c86-4g-m7-fpu1") >>> + >>> "c86-4g-m7-double,c86-4g-m7-fpu_0_3,c86-4g-m7-fpu1") >>> >>> (define_insn_reservation "c86_4g_m7_sse_insert" 3 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "sseins") >>> (and (eq_attr "memory" "none") >>> (eq_attr "length_immediate" "!2")))) >>> - "c86-4g-m7-direct,c86-4g-m7-fpu1") >>> + "c86-4g-m7-direct,c86-4g-m7-fpu1*2") >>> >>> ;; FCMOV >>> (define_insn_reservation "c86_4g_m7_fp_cmov" 4 >>> @@ -444,7 +442,7 @@ (define_insn_reservation "c86_4g_m7_fp_sqrt" 22 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "fpspc") >>> (eq_attr "c86_attr" "sqrt"))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-fpu1,c86-4g-m7-fdiv*22") >>> + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x18") >>> >>> ;; FPSPC >>> (define_insn_reservation "c86_4g_m7_fp_spc_direct" 5 >>> @@ -487,21 +485,21 @@ (define_insn_reservation "c86_4g_m7_fp_div" 15 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "fdiv") >>> (eq_attr "memory" "none"))) >>> - >>> 
"c86-4g-m7-direct,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15") >>> + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x11") >>> >>> (define_insn_reservation "c86_4g_m7_fp_div_load" 22 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "fdiv") >>> (and (eq_attr "fp_int_src" "false") >>> (eq_attr "memory" "!none")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x11") >>> >>> (define_insn_reservation "c86_4g_m7_fp_idiv_load" 26 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "fdiv") >>> (and (eq_attr "fp_int_src" "true") >>> (eq_attr "memory" "!none")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15") >>> + >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1*4,c86-4g-m7-fp1div1_fp3div3_x4x11") >>> >>> (define_insn_reservation "c86_4g_m7_fp_fsgn" 1 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -634,7 +632,7 @@ (define_insn_reservation "c86_4g_m7_sse_pinsr_reg" 1 >>> (and (eq_attr "c86_attr" "insr") >>> (and (eq_attr "prefix" "orig") >>> (eq_attr "memory" "none"))))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-ieu2,c86-4g-m7-fpu_0_1") >>> + "c86-4g-m7-double,c86-4g-m7-ieu2,c86-4g-m7-fpu") >>> >>> (define_insn_reservation "c86_4g_m7_sse_pinsr_reg_load" 3 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -642,7 +640,7 @@ (define_insn_reservation "c86_4g_m7_sse_pinsr_reg_load" >>> 3 >>> (and (eq_attr "c86_attr" "insr") >>> (and (eq_attr "prefix" "orig") >>> (eq_attr "memory" "load"))))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1") >>> + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu") >>> >>> (define_insn_reservation "c86_4g_m7_avx_vpinsr_reg" 2 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -650,7 +648,7 @@ (define_insn_reservation "c86_4g_m7_avx_vpinsr_reg" 2 >>> (and (eq_attr "c86_attr" "insr") >>> (and (eq_attr "prefix" "!orig") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-double,c86-4g-m7-fpu2*2") >>> + 
"c86-4g-m7-double,c86-4g-m7-fpu_1_3x2") >>> >>> (define_insn_reservation "c86_4g_m7_avx_vpinsr_reg_load" 8 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -658,7 +656,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx_vpinsr_reg_load" 8 >>> (and (eq_attr "c86_attr" "insr") >>> (and (eq_attr "prefix" "!orig") >>> (eq_attr "memory" "load"))))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1|c86-4g-m7-fpu2|c86-4g-m7-fpu3") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_1_3") >>> >>> ;; PERM >>> (define_insn_reservation "c86_4g_m7_avx512_perm_xmm" 3 >>> @@ -668,8 +666,7 @@ (define_insn_reservation "c86_4g_m7_avx512_perm_xmm" 3 >>> (eq_attr "mode" >>> "V4SF,V2DF,TI")) >>> (and (eq_attr "c86_attr" >>> "perm") >>> (eq_attr "mode" >>> "V8SF,V4DF,TI,OI"))) >>> - (and (eq_attr "prefix" "evex") >>> - (eq_attr "memory" "none"))))) >>> + (eq_attr "memory" "none")))) >>> "c86-4g-m7-direct,c86-4g-m7-fpu_0_2x2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_perm_xmm_opload" 10 >>> @@ -679,8 +676,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_perm_xmm_opload" 10 >>> (eq_attr "mode" >>> "V4SF,V2DF,TI")) >>> (and (eq_attr "c86_attr" >>> "perm") >>> (eq_attr "mode" >>> "V8SF,V4DF,TI,OI"))) >>> - (and (eq_attr "prefix" "evex") >>> - (eq_attr "memory" "load"))))) >>> + (eq_attr "memory" "load")))) >>> >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2x2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_permi2_ymm" 4 >>> @@ -689,7 +685,7 @@ (define_insn_reservation "c86_4g_m7_avx512_permi2_ymm" 4 >>> (and (eq_attr "c86_attr" "perm2") >>> (and (eq_attr "mode" "V8SF,V4DF,OI") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpux4") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_permi2_zmm" 16 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -697,7 +693,7 @@ (define_insn_reservation "c86_4g_m7_avx512_permi2_zmm" >>> 16 >>> (and (eq_attr "c86_attr" "perm2") >>> (and (eq_attr "mode" "V16SF,V8DF,XI") >>> (eq_attr 
"memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpux16") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_permi2_ymm_load" 11 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -705,7 +701,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_permi2_ymm_load" 11 >>> (and (eq_attr "c86_attr" "perm2") >>> (and (eq_attr "mode" "V8SF,V4DF,OI") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux4") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_permi2_zmm_load" 23 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -713,7 +709,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_permi2_zmm_load" 23 >>> (and (eq_attr "c86_attr" "perm2") >>> (and (eq_attr "mode" "V16SF,V8DF,XI") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux16") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_imm" 4 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -722,7 +718,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_perm_zmm_imm" 4 >>> (and (eq_attr "mode" "V16SF,V8DF,XI") >>> (and (match_operand 2 >>> "immediate_operand") >>> (eq_attr "memory" "none")))))) >>> - "c86-4g-m7-direct,c86-4g-m7-fpu_0_2x4") >>> + "c86-4g-m7-direct,c86-4g-m7-fpux4") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_imm_load" 11 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -731,7 +727,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_perm_zmm_imm_load" 11 >>> (and (eq_attr "mode" "V16SF,V8DF,XI") >>> (and (match_operand 2 >>> "immediate_operand") >>> (eq_attr "memory" "load")))))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2x4") >>> + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpux4") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_noimm" 8 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -740,7 +736,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_perm_zmm_noimm" 8 >>> (and (eq_attr "mode" 
"V16SF,V8DF,XI") >>> (and (match_operand 2 >>> "nonimmediate_operand") >>> (eq_attr "memory" "none")))))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpux8") >>> >>> (define_insn_reservation "c86_4g_m7_sse_perm_zmm_noimm_load" 15 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -749,23 +745,7 @@ (define_insn_reservation >>> "c86_4g_m7_sse_perm_zmm_noimm_load" 15 >>> (and (eq_attr "mode" "V16SF,V8DF,XI") >>> (and (match_operand 2 >>> "nonimmediate_operand") >>> (eq_attr "memory" "load")))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> - >>> -(define_insn_reservation "c86_4g_m7_avx_perm_ymm" 3 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "sselog") >>> - (and (eq_attr "c86_attr" "perm") >>> - (and (eq_attr "prefix" "!evex") >>> - (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> - >>> -(define_insn_reservation "c86_4g_m7_avx_perm_ymem" 10 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "sselog") >>> - (and (eq_attr "c86_attr" "perm") >>> - (and (eq_attr "prefix" "!evex") >>> - (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux8") >>> >>> ;; VINSERT >>> (define_insn_reservation "c86_4g_m7_avx512_insertx_ymm" 3 >>> @@ -853,7 +833,7 @@ (define_insn_reservation "c86_4g_m7_avx512_shuf_zmm" 4 >>> (and (eq_attr "c86_attr" "shufx") >>> (and (eq_attr "mode" "V8DF,V16SF,XI") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpu_0_2x4") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_shuf_xymem" 10 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -869,7 +849,7 @@ (define_insn_reservation "c86_4g_m7_avx512_shuf_zmem" 11 >>> (and (eq_attr "c86_attr" "shufx") >>> (and (eq_attr "mode" "V8DF,V16SF,XI") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2x4") >>> >>> ;; SSELOGIC >>> (define_insn_reservation 
"c86_4g_m7_sselogic_xymm" 1 >>> @@ -892,14 +872,14 @@ (define_insn_reservation "c86_4g_m7_avx512_cmpestr" 6 >>> (and (eq_attr "type" "sselog") >>> (and (eq_attr "c86_attr" "cmpestr") >>> (eq_attr "memory" "none")))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpux6") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_cmpestr_load" 13 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "sselog") >>> (and (eq_attr "c86_attr" "cmpestr") >>> (eq_attr "memory" "load")))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux6") >>> >>> ;; SSELOG >>> (define_insn_reservation "c86_4g_m7_avx512_log" 1 >>> @@ -940,7 +920,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_vdbpsadbw_zmm" 4 >>> (and (eq_attr "c86_attr" "sadbw") >>> (and (eq_attr "mode" "XI") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_vdbpsadbw_zmem" 11 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -948,7 +928,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_vdbpsadbw_zmem" 11 >>> (and (eq_attr "c86_attr" "sadbw") >>> (and (eq_attr "mode" "XI") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2") >>> >>> ;; ABS >>> (define_insn_reservation "c86_4g_m7_avx512_abs" 1 >>> @@ -1052,14 +1032,14 @@ (define_insn_reservation >>> "c86_4g_m7_avx_ssecomi_comi" 1 >>> (and (eq_attr "type" "ssecomi") >>> (and (eq_attr "prefix_extra" "0") >>> (eq_attr "memory" "none")))) >>> - "c86-4g-m7-double,c86-4g-m7-fpu2|c86-4g-m7-fpu3") >>> + "c86-4g-m7-double,c86-4g-m7-fpu") >>> >>> (define_insn_reservation "c86_4g_m7_avx_ssecomi_comi_load" 8 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssecomi") >>> (and (eq_attr "prefix_extra" "0") >>> (eq_attr "memory" "load")))) >>> - >>> 
"c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu2|c86-4g-m7-fpu3") >>> + "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu") >>> >>> (define_insn_reservation "c86_4g_m7_avx_ssecomi_test" 1 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1201,7 +1181,7 @@ (define_insn_reservation "c86_4g_m7_avx512_expand" 3 >>> (and (eq_attr "c86_attr" >>> "expand,compress") >>> (and (not (eq_attr "mode" >>> "XI,V16SF,V8DF")) >>> (eq_attr "memory" "none"))))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_expand_load" 10 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1209,7 +1189,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_expand_load" 10 >>> (and (eq_attr "c86_attr" >>> "expand,compress") >>> (and (not (eq_attr "mode" >>> "XI,V16SF,V8DF")) >>> (eq_attr "memory" "load"))))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_expand_z" 10 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1217,7 +1197,7 @@ (define_insn_reservation "c86_4g_m7_avx512_expand_z" >>> 10 >>> (and (eq_attr "c86_attr" >>> "expand,compress") >>> (and (eq_attr "mode" "XI,V16SF,V8DF") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_expand_z_load" 17 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1225,7 +1205,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_expand_z_load" 17 >>> (and (eq_attr "c86_attr" >>> "expand,compress") >>> (and (eq_attr "mode" "XI,V16SF,V8DF") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") >>> >>> ;; MOVNT >>> (define_insn_reservation 
"c86_4g_m7_avx512_movnt_load" 8 >>> @@ -1252,7 +1232,7 @@ (define_insn_reservation "c86_4g_m7_sse_movnt_store" 4 >>> (eq_attr "memory" "!none"))))) >>> "c86-4g-m7-direct,c86-4g-m7-store,c86-4g-m7-fpu1") >>> >>> -(define_insn_reservation "c86_4g_m7_sse_movnt_xy" 4 >>> +(define_insn_reservation "c86_4g_m7_sse_movnt" 4 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssemov") >>> (and (eq_attr "c86_attr" "movnt") >>> @@ -1377,14 +1357,14 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sseadd_xy" 3 >>> (and (eq_attr "type" "sseadd") >>> (and (eq_attr "c86_attr" "other") >>> (eq_attr "memory" "none")))) >>> - "c86-4g-m7-direct,c86-4g-m7-fpu3") >>> + "c86-4g-m7-direct,c86-4g-m7-fpu_1_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sseadd_xy_load" 10 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "sseadd") >>> (and (eq_attr "c86_attr" "other") >>> (eq_attr "memory" "load")))) >>> - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_1_3") >>> >>> ;; HADD/HSUB >>> (define_insn_reservation "c86_4g_m7_avx_sseadd_hplus" 7 >>> @@ -1507,7 +1487,7 @@ (define_insn_reservation >>> "c86_4g_m7_sse_sseiadd_hplus" 3 >>> (and (eq_attr "c86_attr" "hplus") >>> (and (eq_attr "prefix" "orig") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-fpu0*2") >>> + "c86-4g-m7-vector,c86-4g-m7-fpux2") >>> >>> (define_insn_reservation "c86_4g_m7_sse_sseiadd_hplus_load" 10 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1515,49 +1495,63 @@ (define_insn_reservation >>> "c86_4g_m7_sse_sseiadd_hplus_load" 10 >>> (and (eq_attr "c86_attr" "hplus") >>> (and (eq_attr "prefix" "orig") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu0*2") >>> + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux2") >>> >>> ;; SSEMUL >>> (define_insn_reservation "c86_4g_m7_avx512_ssemul" 3 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssemul") >>> (eq_attr "memory" 
"none"))) >>> - "c86-4g-m7-direct,c86-4g-m7-fpu0") >>> + "c86-4g-m7-direct,c86-4g-m7-fpu_0_2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_ssemul_load" 10 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssemul") >>> (eq_attr "memory" "load"))) >>> - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu0") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2") >>> >>> ;; SSEDIV >>> -(define_insn_reservation "c86_4g_m7_avx512_ssediv" 13 >>> +(define_insn_reservation "c86_4g_m7_avx512_ssediv_x" 13 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "ssediv") >>> + (and (eq_attr "mode" "SF,DF,V4SF,V2DF") >>> + (eq_attr "memory" "none")))) >>> + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x8") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_ssediv_xmem" 20 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "ssediv") >>> + (and (eq_attr "mode" "SF,DF,V4SF,V2DF") >>> + (eq_attr "memory" "load")))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x8") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_ssediv_y" 13 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssediv") >>> - (and (not (eq_attr "mode" "V16SF,V8DF")) >>> + (and (eq_attr "mode" "V8SF,V4DF") >>> (eq_attr "memory" "none")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-fpu3,c86-4g-m7-fdiv*13") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*8") >>> >>> -(define_insn_reservation "c86_4g_m7_avx512_ssediv_mem" 20 >>> +(define_insn_reservation "c86_4g_m7_avx512_ssediv_ymem" 20 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssediv") >>> - (and (not (eq_attr "mode" "V16SF,V8DF")) >>> + (and (eq_attr "mode" "V8SF,V4DF") >>> (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fdiv*13") >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*8") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_ssediv_z" 24 >>> (and 
(eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssediv") >>> (and (eq_attr "mode" "V16SF,V8DF") >>> (eq_attr "memory" "none")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-fpu3,c86-4g-m7-fdiv*24") >>> + >>> "c86-4g-m7-double,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*20") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_ssediv_zmem" 31 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssediv") >>> (and (eq_attr "mode" "V16SF,V8DF") >>> (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fdiv*24") >>> + >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*20") >>> >>> ;; SSECMP >>> (define_insn_reservation "c86_4g_m7_avx512_ssecmp" 5 >>> @@ -1582,7 +1576,7 @@ (define_insn_reservation "c86_4g_m7_avx512_ssecmp_z" 5 >>> (and (eq_attr "mode" "V16SF,V8DF,XI") >>> (and (eq_attr "c86_attr" "other") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_ssecmp_z_load" 12 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1590,7 +1584,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_ssecmp_z_load" 12 >>> (and (eq_attr "mode" "V16SF,V8DF,XI") >>> (and (eq_attr "c86_attr" "other") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp" 5 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1610,6 +1604,24 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_ssecmp_vp_load" 12 >>> (eq_attr "memory" "load")))))) >>> >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3") >>> >>> +(define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp_z" 5 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "ssecmp") >>> + (and (eq_attr "prefix" "evex") >>> + (and (eq_attr "mode" "XI") >>> + (and (eq_attr "c86_attr" 
"other,ptest") >>> + (eq_attr "memory" "none")))))) >>> + "c86-4g-m7-double,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp_z_load" 12 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "ssecmp") >>> + (and (eq_attr "prefix" "evex") >>> + (and (eq_attr "mode" "XI") >>> + (and (eq_attr "c86_attr" "other,ptest") >>> + (eq_attr "memory" "load")))))) >>> + >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3x2") >>> + >>> (define_insn_reservation "c86_4g_m7_avx_ssecmp_vp" 1 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssecmp") >>> @@ -1641,22 +1653,6 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_ssecmp_test_load" 13 >>> (eq_attr "memory" "load"))))) >>> >>> "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fpu_1_3") >>> >>> -(define_insn_reservation "c86_4g_m7_avx512_ssecmp_test_z" 4 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "ssecmp") >>> - (and (eq_attr "mode" "XI") >>> - (and (eq_attr "c86_attr" "ptest") >>> - (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> - >>> -(define_insn_reservation "c86_4g_m7_avx512_ssecmp_test_z_load" 11 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "ssecmp") >>> - (and (eq_attr "mode" "XI") >>> - (and (eq_attr "c86_attr" "ptest") >>> - (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> - >>> ;; SSECVT >>> (define_insn_reservation "c86_4g_m7_avx512_ssecvt_xy" 4 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1768,17 +1764,14 @@ (define_insn_reservation "c86_4g_m7_avx512_muladd" 4 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssemuladd") >>> (and (eq_attr "c86_attr" "other") >>> - (and (not (eq_attr "isa" "fma,fma4")) >>> - (eq_attr "mode" >>> "V32HF,V16SF,V8DF,XI") >>> - (eq_attr "memory" "none"))))) >>> + (eq_attr "memory" "none")))) >>> "c86-4g-m7-direct,c86-4g-m7-fpu_0_2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_muladd_load" 11 
>>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "ssemuladd") >>> (and (eq_attr "c86_attr" "other") >>> - (and (not (eq_attr "isa" "fma,fma4")) >>> - (eq_attr "memory" "load"))))) >>> + (eq_attr "memory" "load")))) >>> >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_muladd_madd" 4 >>> @@ -1797,20 +1790,6 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_muladd_madd_load" 11 >>> (eq_attr "memory" "load"))))) >>> >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2") >>> >>> -(define_insn_reservation "c86_4g_m7_fma_muladd" 4 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "ssemuladd") >>> - (and (eq_attr "isa" "fma,fma4") >>> - (eq_attr "memory" "none")))) >>> - "c86-4g-m7-direct,c86-4g-m7-fpu_0_1") >>> - >>> -(define_insn_reservation "c86_4g_m7_fma_muladd_load" 11 >>> - (and (eq_attr "cpu" "c86_4g_m7") >>> - (and (eq_attr "type" "ssemuladd") >>> - (and (eq_attr "isa" "fma,fma4") >>> - (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1") >>> - >>> ;; SSE >>> (define_insn_reservation "c86_4g_m7_avx512_sse_range" 1 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1838,7 +1817,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_conflict_x" 2 >>> (and (eq_attr "c86_decode" "vector") >>> (and (eq_attr "mode" "TI") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_x_load" 9 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1846,7 +1825,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_conflict_x_load" 9 >>> (and (eq_attr "c86_decode" "vector") >>> (and (eq_attr "mode" "TI") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x2") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_y" 5 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ 
-1854,7 +1833,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_conflict_y" 5 >>> (and (eq_attr "c86_decode" "vector") >>> (and (eq_attr "mode" "OI") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_y_load" 12 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1862,7 +1841,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_conflict_y_load" 12 >>> (and (eq_attr "c86_decode" "vector") >>> (and (eq_attr "mode" "OI") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_z" 8 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1870,7 +1849,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_conflict_z" 8 >>> (and (eq_attr "c86_decode" "vector") >>> (and (eq_attr "mode" "XI") >>> (eq_attr "memory" "none"))))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x6") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_z_load" 15 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1878,7 +1857,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_conflict_z_load" 15 >>> (and (eq_attr "c86_decode" "vector") >>> (and (eq_attr "mode" "XI") >>> (eq_attr "memory" "load"))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x6") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sse_class" 4 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1905,7 +1884,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_class_z" 4 >>> (and (eq_attr "length_immediate" "1") >>> (and (eq_attr "mode" >>> "V32HF,V16SF,V8DF") >>> (eq_attr "memory" "none")))))) >>> - "c86-4g-m7-vector") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-fpu_1_3,c86-4g-m7-fpu_1_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_sse_class_z_load" 11 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> 
@@ -1914,7 +1893,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_sse_class_z_load" 11 >>> (and (eq_attr "length_immediate" "1") >>> (and (eq_attr "mode" >>> "V32HF,V16SF,V8DF") >>> (eq_attr "memory" "load")))))) >>> - "c86-4g-m7-vector,c86-4g-m7-load") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3,c86-4g-m7-fpu_1_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx_sse" 5 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1932,19 +1911,102 @@ (define_insn_reservation "c86_4g_m7_avx_sse_load" >>> 12 >>> (eq_attr "memory" "load"))))) >>> >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1") >>> >>> -(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt" 16 >>> +;; SSE SQRT >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_x" 14 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "sse") >>> - (and (eq_attr "c86_attr" "sqrt") >>> - (eq_attr "memory" "none")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-fpu1|c86-4g-m7-fpu3,c86-4g-m7-fdiv*16") >>> + (and (eq_attr "mode" "SF,V4SF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "none"))))) >>> + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x9") >>> >>> -(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_load" 23 >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_xload" 21 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "sse") >>> - (and (eq_attr "c86_attr" "sqrt") >>> - (eq_attr "memory" "load")))) >>> - >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1|c86-4g-m7-fpu3,c86-4g-m7-fdiv*16") >>> + (and (eq_attr "mode" "SF,V4SF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "load"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x9") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_y" 14 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V8SF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "none"))))) >>> + >>> 
"c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*9") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_yload" 21 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V8SF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "load"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*9") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_z" 26 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V16SF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "none"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*22") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_zload" 33 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V16SF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "load"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*22") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_x" 20 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "DF,V2DF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "none"))))) >>> + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x15") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_xload" 27 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "DF,V2DF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "load"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x15") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_y" 20 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V4DF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "none"))))) >>> + 
>>> "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*15") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_yload" 27 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V4DF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "load"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*15") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_z" 38 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V8DF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "none"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*34") >>> + >>> +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_zload" 45 >>> + (and (eq_attr "cpu" "c86_4g_m7") >>> + (and (eq_attr "type" "sse") >>> + (and (eq_attr "mode" "V8DF") >>> + (and (eq_attr "c86_attr" "sqrt") >>> + (eq_attr "memory" "load"))))) >>> + >>> "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*34") >>> >>> ;; MSKLOG/MSKMOV >>> (define_insn_reservation "c86_4g_m7_avx512_msklog" 1 >>> @@ -1957,7 +2019,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_msklog_vector" 4 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "msklog") >>> (eq_attr "c86_decode" "vector"))) >>> - "c86-4g-m7-vector") >>> + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_mskmov_reg_k" 1 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> @@ -1977,7 +2039,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_mskmov_z_k" 3 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> (and (eq_attr "type" "mskmov") >>> (match_operand:V8DI 0 "register_operand" >>> "v"))) >>> - >>> "c86-4g-m7-vector,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") >>> + >>> "c86-4g-m7-vector,c86-4g-m7-fpu3,c86-4g-m7-fpu_1_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_mskmov_k_k" 1 >>> (and (eq_attr "cpu" 
"c86_4g_m7") >>> @@ -1991,7 +2053,7 @@ (define_insn_reservation >>> "c86_4g_m7_avx512_mskmov_k_reg" 3 >>> (and (eq_attr "type" "mskmov") >>> (and (match_operand 0 "register_operand" >>> "k") >>> (match_operand 1 "register_operand" >>> "r")))) >>> - >>> "c86-4g-m7-double,c86-4g-m7-fpu1*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") >>> + >>> "c86-4g-m7-double,c86-4g-m7-fpu1,c86-4g-m7-fpu_1_3") >>> >>> (define_insn_reservation "c86_4g_m7_avx512_mskmov_k_m" 8 >>> (and (eq_attr "cpu" "c86_4g_m7") >>> diff --git a/gcc/config/i386/c86-4g.md b/gcc/config/i386/c86-4g.md >>> index 49a46a8aa19..8b81fcaabb2 100644 >>> --- a/gcc/config/i386/c86-4g.md >>> +++ b/gcc/config/i386/c86-4g.md >>> @@ -30,8 +30,10 @@ (define_attr "c86_attr" >>> "other,abs,sqrt,maxmin,blend,blendv,rcp,movnt,avg, >>> ;; HYGON Scheduling >>> ;; Modeling automatons for decoders, integer execution pipes, >>> ;; AGU pipes, floating point execution units, integer and >>> -;; floating point dividers. >>> -(define_automaton "c86_4g, c86_4g_ieu, c86_4g_fp, c86_4g_agu, c86_4g_idiv, >>> c86_4g_fdiv") >>> +;; floating point dividers. Split fp1 into its own automaton >>> +;; to keep this unit independent without increasing the main >>> +;; c86_4g_fp state space. >>> +(define_automaton "c86_4g, c86_4g_ieu, c86_4g_fp024, c86_4g_fp1, >>> c86_4g_agu, c86_4g_idiv, c86_4g_fdiv") >>> >>> ;; Decoders unit has 4 decoders and all of them can decode fast path >>> ;; and vector type instructions. >>> @@ -40,10 +42,6 @@ (define_cpu_unit "c86-4g-decode1" "c86_4g") >>> (define_cpu_unit "c86-4g-decode2" "c86_4g") >>> (define_cpu_unit "c86-4g-decode3" "c86_4g") >>> >>> -;; Two separated dividers for int and fp. >>> -(define_cpu_unit "c86-4g-idiv" "c86_4g_idiv") >>> -(define_cpu_unit "c86-4g-fdiv" "c86_4g_fdiv") >>> - >>> ;; Currently blocking all decoders for vector path instructions as >>> ;; they are dispatched separetely as microcode sequence. >>> ;; Fix me: Need to revisit this. 
>>> @@ -55,7 +53,6 @@ (define_reservation "c86-4g-direct" >>> "c86-4g-decode0|c86-4g-decode1|c86-4g-decode >>> ;; Fix me: Need to revisit this later to simulate fast path double >>> behavior. >>> (define_reservation "c86-4g-double" "c86-4g-direct") >>> >>> - >>> ;; Integer unit 4 ALU pipes. >>> (define_cpu_unit "c86-4g-ieu0" "c86_4g_ieu") >>> (define_cpu_unit "c86-4g-ieu1" "c86_4g_ieu") >>> @@ -63,6 +60,9 @@ (define_cpu_unit "c86-4g-ieu2" "c86_4g_ieu") >>> (define_cpu_unit "c86-4g-ieu3" "c86_4g_ieu") >>> (define_reservation "c86-4g-ieu" >>> "c86-4g-ieu0|c86-4g-ieu1|c86-4g-ieu2|c86-4g-ieu3") >>> >>> +;; One separated integer divider. >>> +(define_cpu_unit "c86-4g-idiv" "c86_4g_idiv") >>> + >>> ;; 2 AGU pipes in c86_4g >>> ;; According to CPU diagram last AGU unit is used only for stores. >>> (define_cpu_unit "c86-4g-agu0" "c86_4g_agu") >>> @@ -81,10 +81,10 @@ (define_reservation "c86-4g-ivector" >>> "c86-4g-ieu0+c86-4g-ieu1 >>> +c86-4g-agu0+c86-4g-agu1") >>> >>> ;; Floating point unit 4 FP pipes. >>> -(define_cpu_unit "c86-4g-fp0" "c86_4g_fp") >>> -(define_cpu_unit "c86-4g-fp1" "c86_4g_fp") >>> -(define_cpu_unit "c86-4g-fp2" "c86_4g_fp") >>> -(define_cpu_unit "c86-4g-fp3" "c86_4g_fp") >>> +(define_cpu_unit "c86-4g-fp0" "c86_4g_fp024") >>> +(define_cpu_unit "c86-4g-fp1" "c86_4g_fp1") >>> +(define_cpu_unit "c86-4g-fp2" "c86_4g_fp024") >>> +(define_cpu_unit "c86-4g-fp3" "c86_4g_fp024") >>> >>> (define_reservation "c86-4g-fpu" >>> "c86-4g-fp0|c86-4g-fp1|c86-4g-fp2|c86-4g-fp3") >>> >>> @@ -92,6 +92,11 @@ (define_reservation "c86-4g-fvector" >>> "c86-4g-fp0+c86-4g-fp1 >>> +c86-4g-fp2+c86-4g-fp3 >>> +c86-4g-agu0+c86-4g-agu1") >>> >>> +;; One separated FP divider. 
>>> +(define_cpu_unit "c86-4g-fdiv" "c86_4g_fdiv") >>> + >>> +(define_reservation "c86-4g-fp1fdivx4" "(c86-4g-fp1+c86-4g-fdiv)*4") >>> + >>> ;; Call instruction >>> (define_insn_reservation "c86_4g_call" 1 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> @@ -387,7 +392,7 @@ (define_insn_reservation "c86_4g_fp_sqrt" 22 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (and (eq_attr "type" "fpspc") >>> (eq_attr "c86_attr" "sqrt"))) >>> - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*22") >>> + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*18") >>> >>> (define_insn_reservation "c86_4g_sse_sqrt_sf" 14 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> @@ -395,7 +400,7 @@ (define_insn_reservation "c86_4g_sse_sqrt_sf" 14 >>> (and (eq_attr "memory" "none,unknown") >>> (and (eq_attr "c86_attr" "sqrt") >>> (eq_attr "type" "sse"))))) >>> - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*14") >>> + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*10") >>> >>> (define_insn_reservation "c86_4g_sse_sqrt_sf_mem" 21 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> @@ -403,7 +408,7 @@ (define_insn_reservation "c86_4g_sse_sqrt_sf_mem" 21 >>> (and (eq_attr "memory" "load") >>> (and (eq_attr "c86_attr" "sqrt") >>> (eq_attr "type" "sse"))))) >>> - >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*14") >>> + >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*10") >>> >>> (define_insn_reservation "c86_4g_sse_sqrt_df" 20 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> @@ -411,7 +416,7 @@ (define_insn_reservation "c86_4g_sse_sqrt_df" 20 >>> (and (eq_attr "memory" "none,unknown") >>> (and (eq_attr "c86_attr" "sqrt") >>> (eq_attr "type" "sse"))))) >>> - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*20") >>> + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*16") >>> >>> (define_insn_reservation "c86_4g_sse_sqrt_df_mem" 27 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> @@ -419,7 +424,7 @@ (define_insn_reservation "c86_4g_sse_sqrt_df_mem" 27 >>> (and (eq_attr "memory" "load") >>> (and (eq_attr "c86_attr" 
"sqrt") >>> (eq_attr "type" "sse"))))) >>> - >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*20") >>> + >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*16") >>> >>> ;; RCP >>> (define_insn_reservation "c86_4g_sse_rcp" 5 >>> @@ -492,20 +497,20 @@ (define_insn_reservation "c86_4g_fp_op_div" 15 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (and (eq_attr "type" "fdiv") >>> (eq_attr "memory" "none"))) >>> - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*15") >>> + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*11") >>> >>> (define_insn_reservation "c86_4g_fp_op_div_load" 22 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (and (eq_attr "type" "fdiv") >>> (eq_attr "memory" "load"))) >>> - >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*15") >>> + >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*11") >>> >>> -(define_insn_reservation "c86_4g_fp_op_idiv_load" 27 >>> +(define_insn_reservation "c86_4g_fp_op_idiv_load" 26 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (and (eq_attr "type" "fdiv") >>> (and (eq_attr "fp_int_src" "true") >>> (eq_attr "memory" "load")))) >>> - >>> "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*19") >>> + >>> "c86-4g-double,c86-4g-load,c86-4g-fp1*4,c86-4g-fp1fdivx4,c86-4g-fdiv*11") >>> >>> ;; MMX, SSE, SSEn.n, AVX, AVX2 instructions >>> (define_insn_reservation "c86_4g_fp_insn" 1 >>> @@ -1024,28 +1029,28 @@ (define_insn_reservation "c86_4g_ssediv_ss_ps" 10 >>> (eq_attr "mode" "V4SF,SF")) >>> (and (eq_attr "type" "ssediv") >>> (eq_attr "memory" "none"))) >>> - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*10") >>> + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*6") >>> >>> (define_insn_reservation "c86_4g_ssediv_ss_ps_load" 17 >>> (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (eq_attr "mode" "V4SF,SF")) >>> (and (eq_attr "type" "ssediv") >>> (eq_attr "memory" "load"))) >>> - >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*10") >>> + >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*6") >>> >>> 
(define_insn_reservation "c86_4g_ssediv_sd_pd" 13 >>> (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (eq_attr "mode" "V2DF,DF")) >>> (and (eq_attr "type" "ssediv") >>> (eq_attr "memory" "none"))) >>> - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*13") >>> + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*9") >>> >>> (define_insn_reservation "c86_4g_ssediv_sd_pd_load" 20 >>> (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (eq_attr "mode" "V2DF,DF")) >>> (and (eq_attr "type" "ssediv") >>> (eq_attr "memory" "load"))) >>> - >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*13") >>> + >>> "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*9") >>> >>> >>> (define_insn_reservation "c86_4g_ssediv_avx256_ps" 10 >>> @@ -1053,28 +1058,28 @@ (define_insn_reservation "c86_4g_ssediv_avx256_ps" >>> 10 >>> (and (eq_attr "mode" "V8SF") >>> (and (eq_attr "memory" "none") >>> (eq_attr "type" "ssediv")))) >>> - "c86-4g-double,c86-4g-fp1,c86-4g-fdiv*10") >>> + "c86-4g-double,c86-4g-fp1fdivx4,c86-4g-fdiv*6") >>> >>> (define_insn_reservation "c86_4g_ssediv_avx256_ps_load" 17 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (and (eq_attr "mode" "V8SF") >>> (and (eq_attr "type" "ssediv") >>> (eq_attr "memory" "load")))) >>> - >>> "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*10") >>> + >>> "c86-4g-double,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*6") >>> >>> (define_insn_reservation "c86_4g_ssediv_avx256_pd" 13 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (and (eq_attr "mode" "V4DF") >>> (and (eq_attr "type" "ssediv") >>> (eq_attr "memory" "none")))) >>> - "c86-4g-double,c86-4g-fp1,c86-4g-fdiv*13") >>> + "c86-4g-double,c86-4g-fp1fdivx4,c86-4g-fdiv*9") >>> >>> (define_insn_reservation "c86_4g_ssediv_avx256_pd_load" 20 >>> (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> (and (eq_attr "mode" "V4DF") >>> (and (eq_attr "type" "ssediv") >>> (eq_attr "memory" "load")))) >>> - >>> "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*13") >>> + >>> 
"c86-4g-double,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*9") >>> ;; SSE MUL >>> (define_insn_reservation "c86_4g_ssemul_ss_ps" 3 >>> (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") >>> -- >>> 2.34.1 >>> >

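For readers following the c86-4g.md hunks above, here is a minimal sketch of the arithmetic behind the rewritten m4/m6 divider reservations. It only illustrates the `fpu,divider*N -> (fpu+divider)*K, divider*(N-K)` rule from the patch description with K = 4; the helper name `split_divider_reservation` and the table `OLD_FDIV_CYCLES` are hypothetical and are not part of the patch or of genautomata. The cycle counts are the old `c86-4g-fdiv*N` values taken from the removed lines.

```python
# Sketch only: reproduces the reservation-string arithmetic, not GCC code.

def split_divider_reservation(total_cycles, k=4,
                              pipe="c86-4g-fp1", divider="c86-4g-fdiv"):
    """Build '(pipe+divider)*K,divider*(N-K)' from an old 'divider*N' count.

    The first K cycles reserve both the FP pipe and the divider; the
    remaining N-K cycles reserve the divider alone.
    """
    if not 0 < k < total_cycles:
        raise ValueError("k must satisfy 0 < k < total_cycles")
    return f"({pipe}+{divider})*{k},{divider}*{total_cycles - k}"


# Old divider occupancy (N) per reservation, copied from the '-' lines.
OLD_FDIV_CYCLES = {
    "c86_4g_fp_sqrt": 22,
    "c86_4g_sse_sqrt_sf": 14,
    "c86_4g_sse_sqrt_df": 20,
    "c86_4g_fp_op_div": 15,
    "c86_4g_ssediv_ss_ps": 10,
    "c86_4g_ssediv_sd_pd": 13,
}

if __name__ == "__main__":
    for name, cycles in OLD_FDIV_CYCLES.items():
        print(f"{name}: {split_divider_reservation(cycles)}")
```

In the patch the `(c86-4g-fp1+c86-4g-fdiv)*4` prefix is spelled through the alias `c86-4g-fp1fdivx4`, so the 22-cycle x87 sqrt case corresponds to `c86-4g-fp1fdivx4,c86-4g-fdiv*18`. The `c86_4g_fp_op_idiv_load` reservation does not follow this simple rule (it adds a leading `c86-4g-fp1*4`), so it is left out of the table.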