Hi Uros,

On 2026/6/8 15:35, Uros Bizjak wrote:
> On Mon, Jun 8, 2026 at 8:09 AM Kewen Lin <[email protected]> wrote:
>>
>> Hi Uros,
>>
>>>>> Based on random sampling of SPEC2017 benchmarks 525.x264_r and
>>>>> 521.wrf_r, I verified that the new modeling introduces no
>>>>> significant compilation overhead. Testing with a single job on a
>>>>> c86-4g-m7 machine revealed no impact on x264 and a tiny increase
>>>>> for wrf (~0.3%).
>>>>>
>>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716681.html
>>>>>
>>>>> Bootstrapped and regress-tested on one c86-4g-m7 machine, as well
>>>>> as a cfarm x86-64 machine. Is it ok for trunk?
>>>>>
>>>>> BR,
>>>>> Kewen
>>>>> -----
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * config/i386/c86-4g-m7.md (c86_4g_m7_fpu): Remove automaton.
>>>>> (c86_4g_m7_fpu02): New automaton.
>>>>> (c86_4g_m7_fpu13): Ditto.
>>>>> (c86-4g-m7-fpu0): Move to c86_4g_m7_fpu02 automaton.
>>>>> (c86-4g-m7-fpu1): Move to c86_4g_m7_fpu13 automaton.
>>>>> (c86-4g-m7-fpu2): Move to c86_4g_m7_fpu02 automaton.
>>>>> (c86-4g-m7-fpu3): Move to c86_4g_m7_fpu13 automaton.
>>>>> (c86-4g-m7-fdiv): Remove cpu unit.
>>>>> (c86-4g-m7-fdiv1): New cpu unit.
>>>>> (c86-4g-m7-fdiv3): Ditto.
>>>>> (c86-4g-m7-fpu_0_3): New reservation.
>>>>> (c86-4g-m7-fpu_1_3x2): Ditto.
>>>>> (c86-4g-m7-fpu_1_3x3): Ditto.
>>>>> (c86-4g-m7-fpu_1_3x6): Ditto.
>>>>> (c86-4g-m7-fpux2): Ditto.
>>>>> (c86-4g-m7-fpux4): Ditto.
>>>>> (c86-4g-m7-fpux6): Ditto.
>>>>> (c86-4g-m7-fpux8): Ditto.
>>>>> (c86-4g-m7-fpux16): Ditto.
>>>>> (c86-4g-m7-fp1fdiv1x4): Ditto.
>>>>> (c86-4g-m7-fp3fdiv3x4): Ditto.
>>>>> (c86-4g-m7-fdiv13): Ditto.
>>>>> (c86-4g-m7-fp13div13): Ditto.
>>>>> (c86-4g-m7-fp13div13x4): Ditto.
>>>>> (c86-4g-m7-fp1div1_fp3div3_x4x8): Ditto.
>>>>> (c86-4g-m7-fp1div1_fp3div3_x4x9): Ditto.
>>>>> (c86-4g-m7-fp1div1_fp3div3_x4x11): Ditto.
>>>>> (c86-4g-m7-fp1div1_fp3div3_x4x15): Ditto.
>>>>> (c86-4g-m7-fp1div1_fp3div3_x4x18): Ditto.
>>>>> (c86_4g_m7_idiv): New reservation.
>>>>> (c86_4g_m7_idiv_QI): Adjust reservation latency and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_idiv_load): New reservation.
>>>>> (c86_4g_m7_idiv_QI_load): Adjust reservation latency and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_idiv_DI): Remove reservation.
>>>>> (c86_4g_m7_idiv_SI): Ditto.
>>>>> (c86_4g_m7_idiv_HI): Ditto.
>>>>> (c86_4g_m7_idiv_DI_load): Ditto.
>>>>> (c86_4g_m7_idiv_SI_load): Ditto.
>>>>> (c86_4g_m7_idiv_HI_load): Ditto.
>>>>> (c86_4g_m7_sse_insertimm): Adjust reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_sse_insert): Ditto.
>>>>> (c86_4g_m7_fp_sqrt): Adjust reservation.
>>>>> (c86_4g_m7_fp_div): Ditto.
>>>>> (c86_4g_m7_fp_div_load): Ditto.
>>>>> (c86_4g_m7_fp_idiv_load): Ditto.
>>>>> (c86_4g_m7_sse_pinsr_reg): Adjust reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_sse_pinsr_reg_load): Ditto.
>>>>> (c86_4g_m7_avx_vpinsr_reg): Ditto.
>>>>> (c86_4g_m7_avx_vpinsr_reg_load): Ditto.
>>>>> (c86_4g_m7_avx512_perm_xmm): Delete the prefix condition.
>>>>> (c86_4g_m7_avx512_perm_xmm_opload): Ditto.
>>>>> (c86_4g_m7_avx512_permi2_ymm): Adjust reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_avx512_permi2_zmm): Ditto.
>>>>> (c86_4g_m7_avx512_permi2_ymm_load): Ditto.
>>>>> (c86_4g_m7_avx512_permi2_zmm_load): Ditto.
>>>>> (c86_4g_m7_avx512_perm_zmm_imm): Ditto.
>>>>> (c86_4g_m7_avx512_perm_zmm_imm_load): Ditto.
>>>>> (c86_4g_m7_avx512_perm_zmm_noimm): Ditto.
>>>>> (c86_4g_m7_sse_perm_zmm_noimm_load): Ditto.
>>>>> (c86_4g_m7_avx_perm_ymm): Remove.
>>>>> (c86_4g_m7_avx_perm_ymem): Ditto.
>>>>> (c86_4g_m7_avx512_shuf_zmm): Adjust reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_avx512_shuf_zmem): Ditto.
>>>>> (c86_4g_m7_avx512_cmpestr): Ditto.
>>>>> (c86_4g_m7_avx512_cmpestr_load): Ditto.
>>>>> (c86_4g_m7_avx512_vdbpsadbw_zmm): Ditto.
>>>>> (c86_4g_m7_avx512_vdbpsadbw_zmem): Ditto.
>>>>> (c86_4g_m7_avx_ssecomi_comi): Ditto.
>>>>> (c86_4g_m7_avx_ssecomi_comi_load): Ditto.
>>>>> (c86_4g_m7_avx512_expand): Ditto.
>>>>> (c86_4g_m7_avx512_expand_load): Ditto.
>>>>> (c86_4g_m7_avx512_expand_z): Ditto.
>>>>> (c86_4g_m7_avx512_expand_z_load): Ditto.
>>>>> (c86_4g_m7_sse_movnt_xy): Rename to c86_4g_m7_sse_movnt.
>>>>> (c86_4g_m7_avx512_sseadd_xy): Adjust reservation units.
>>>>> (c86_4g_m7_avx512_sseadd_xy_load): Ditto.
>>>>> (c86_4g_m7_sse_sseiadd_hplus): Adjust reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_sse_sseiadd_hplus_load): Ditto.
>>>>> (c86_4g_m7_avx512_ssemul): Adjust reservation units.
>>>>> (c86_4g_m7_avx512_ssemul_load): Ditto.
>>>>> (c86_4g_m7_avx512_ssediv): Remove.
>>>>> (c86_4g_m7_avx512_ssediv_mem): Remove.
>>>>> (c86_4g_m7_avx512_ssediv_x): New.
>>>>> (c86_4g_m7_avx512_ssediv_xmem): New.
>>>>> (c86_4g_m7_avx512_ssediv_y): New.
>>>>> (c86_4g_m7_avx512_ssediv_ymem): New.
>>>>> (c86_4g_m7_avx512_ssediv_z): Adjust reservation units.
>>>>> (c86_4g_m7_avx512_ssediv_zmem): Ditto.
>>>>> (c86_4g_m7_avx512_ssecmp_z): Add reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_avx512_ssecmp_z_load): Ditto.
>>>>> (c86_4g_m7_avx512_ssecmp_vp_z): New reservation.
>>>>> (c86_4g_m7_avx512_ssecmp_vp_z_load): Ditto.
>>>>> (c86_4g_m7_avx512_ssecmp_test_z): Remove reservation.
>>>>> (c86_4g_m7_avx512_ssecmp_test_z_load): Ditto.
>>>>> (c86_4g_m7_avx512_muladd): Broaden matching condition.
>>>>> (c86_4g_m7_avx512_muladd_load): Ditto.
>>>>> (c86_4g_m7_fma_muladd): Remove reservation.
>>>>> (c86_4g_m7_fma_muladd_load): Ditto.
>>>>> (c86_4g_m7_avx512_sse_conflict_x): Add reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_avx512_sse_conflict_x_load): Ditto.
>>>>> (c86_4g_m7_avx512_sse_conflict_y): Ditto.
>>>>> (c86_4g_m7_avx512_sse_conflict_y_load): Ditto.
>>>>> (c86_4g_m7_avx512_sse_conflict_z): Ditto.
>>>>> (c86_4g_m7_avx512_sse_conflict_z_load): Ditto.
>>>>> (c86_4g_m7_avx512_sse_class_z): Add reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_avx512_sse_class_z_load): Ditto.
>>>>> (c86_4g_m7_avx512_sse_sqrt): Remove.
>>>>> (c86_4g_m7_avx512_sse_sqrt_load): Remove.
>>>>> (c86_4g_m7_avx512_sse_sqrt_sf_x): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_sf_xload): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_sf_y): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_sf_yload): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_sf_z): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_sf_zload): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_df_x): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_df_xload): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_df_y): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_df_yload): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_df_z): New.
>>>>> (c86_4g_m7_avx512_sse_sqrt_df_zload): New.
>>>>> (c86_4g_m7_avx512_msklog_vector): Add reservation units and unit
>>>>> occupancy.
>>>>> (c86_4g_m7_avx512_mskmov_z_k): Ditto.
>>>>> (c86_4g_m7_avx512_mskmov_k_reg): Ditto.
>>>>> * config/i386/c86-4g.md (c86_4g_fp): Remove automaton.
>>>>> (c86_4g_fp024): New automaton.
>>>>> (c86_4g_fp1): Ditto.
>>>>> (c86-4g-fp0): Move to c86_4g_fp024 automaton.
>>>>> (c86-4g-fp1): Move to c86_4g_fp1 automaton.
>>>>> (c86-4g-fp2): Move to c86_4g_fp024 automaton.
>>>>> (c86-4g-fp3): Ditto.
>>>>> (c86-4g-fp1fdivx4): New reservation.
>>>>> (c86_4g_fp_sqrt): Adjust reservation.
>>>>> (c86_4g_sse_sqrt_sf): Ditto.
>>>>> (c86_4g_sse_sqrt_sf_mem): Ditto.
>>>>> (c86_4g_sse_sqrt_df): Ditto.
>>>>> (c86_4g_sse_sqrt_df_mem): Ditto.
>>>>> (c86_4g_fp_op_div): Ditto.
>>>>> (c86_4g_fp_op_div_load): Ditto.
>>>>> (c86_4g_fp_op_idiv_load): Adjust reservation latency.
>>>>> (c86_4g_ssediv_ss_ps): Adjust reservation.
>>>>> (c86_4g_ssediv_ss_ps_load): Ditto.
>>>>> (c86_4g_ssediv_sd_pd): Ditto.
>>>>> (c86_4g_ssediv_sd_pd_load): Ditto.
>>>>> (c86_4g_ssediv_avx256_ps): Ditto.
>>>>> (c86_4g_ssediv_avx256_ps_load): Ditto.
>>>>> (c86_4g_ssediv_avx256_pd): Ditto.
>>>>> (c86_4g_ssediv_avx256_pd_load): Ditto.
>>>>
>>>> LGTM (not a thorough review, but this patch is definitely an improvement).
>>>
>>> Thanks Uros! Pushed as r17-895-gdd682ea0414926.
>>
>> Since this patch has landed for more than 10 days without any issues
>> reported, would it be ok to backport these three patches below to all
>> the active release branches?
>>
>> r17-203-g2a64a63d982584 i386: Support HYGON c86-4g series processors
>> r17-258-gc776dcd5f868a1 i386: Adjust some c86-4g*.md modeling to reduce build time
>> r17-895-gdd682ea0414926 i386: Refine c86-4g fdiv scheduling model
>
> These patches are so intrusive that I don't feel comfortable to
> backport them. As a new development, they don't fix any regression, so
> it would be against current policy that allows only backports of
> regression fixes in release branches.
>
> If you really feel they should go to release branches, then please
> discuss the issue with release managers (CC'd).

Thanks for the quick response and for clarifying the concern!

I understand that this is not a regression fix in the usual sense, and
that the full three-patch series is more intrusive than a normal
release-branch backport.

The main motivation here is downstream usability. Some of our users and
customers are still on older GCC releases, and GCC 17 may be too new for
them to adopt in the near future. If the basic c86-4g enablement were
available in an active release branch, they would have a practical
upgrade path to a GCC release that already includes this CPU support,
without having to jump directly to GCC 17.

IMHO, this patch series is relatively self-contained and wouldn't affect
other existing i386 processors. In particular, the exposed build-time
issue has been addressed by the follow-up patches.

I also looked at the history of the AMD znver4/5 CPU enablement: in each
case the enablement patches were backported to the two then-latest
release branches. Would you be willing to reconsider this c86-4g series
as a similar special-case CPU enablement backport, perhaps only for
gcc-16 and gcc-15?

Also looking forward to Richi's and Jakub's opinions.

BR,
Kewen
