On Mon, Jun 8, 2026 at 3:36 PM Richard Biener
<[email protected]> wrote:
>
> On Mon, Jun 8, 2026 at 3:15 PM Kewen Lin <[email protected]> wrote:
> >
> > Hi Uros,
> >
> > On 2026/6/8 15:35, Uros Bizjak wrote:
> > > On Mon, Jun 8, 2026 at 8:09 AM Kewen Lin <[email protected]> wrote:
> > >>
> > >> Hi Uros,
> > >>
> > >>>>>> Based on random sampling of SPEC2017 benchmarks 525.x264_r and
> > >>>>> 521.wrf_r, I verified that the new modeling introduces no
> > >>>>> significant compilation overhead.  Testing with a single job on a
> > >>>>> c86-4g-m7 machine revealed no impact on x264 and a tiny increase
> > >>>>> for wrf (~0.3%).
> > >>>>>
> > >>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716681.html
> > >>>>>
> > >>>>> Bootstrapped and regress-tested on one c86-4g-m7 machine, as well
> > >>>>> as a cfarm x86-64 machine.  Is it ok for trunk?
> > >>>>>
> > >>>>> BR,
> > >>>>> Kewen
> > >>>>> -----
> > >>>>>
> > >>>>> gcc/ChangeLog:
> > >>>>>
> > >>>>>         * config/i386/c86-4g-m7.md (c86_4g_m7_fpu): Remove automaton.
> > >>>>>         (c86_4g_m7_fpu02): New automaton.
> > >>>>>         (c86_4g_m7_fpu13): Ditto.
> > >>>>>         (c86-4g-m7-fpu0): Move to c86_4g_m7_fpu02 automaton.
> > >>>>>         (c86-4g-m7-fpu1): Move to c86_4g_m7_fpu13 automaton.
> > >>>>>         (c86-4g-m7-fpu2): Move to c86_4g_m7_fpu02 automaton.
> > >>>>>         (c86-4g-m7-fpu3): Move to c86_4g_m7_fpu13 automaton.
> > >>>>>         (c86-4g-m7-fdiv): Remove cpu unit.
> > >>>>>         (c86-4g-m7-fdiv1): New cpu unit.
> > >>>>>         (c86-4g-m7-fdiv3): Ditto.
> > >>>>>         (c86-4g-m7-fpu_0_3): New reservation.
> > >>>>>         (c86-4g-m7-fpu_1_3x2): Ditto.
> > >>>>>         (c86-4g-m7-fpu_1_3x3): Ditto.
> > >>>>>         (c86-4g-m7-fpu_1_3x6): Ditto.
> > >>>>>         (c86-4g-m7-fpux2): Ditto.
> > >>>>>         (c86-4g-m7-fpux4): Ditto.
> > >>>>>         (c86-4g-m7-fpux6): Ditto.
> > >>>>>         (c86-4g-m7-fpux8): Ditto.
> > >>>>>         (c86-4g-m7-fpux16): Ditto.
> > >>>>>         (c86-4g-m7-fp1fdiv1x4): Ditto.
> > >>>>>         (c86-4g-m7-fp3fdiv3x4): Ditto.
> > >>>>>         (c86-4g-m7-fdiv13): Ditto.
> > >>>>>         (c86-4g-m7-fp13div13): Ditto.
> > >>>>>         (c86-4g-m7-fp13div13x4): Ditto.
> > >>>>>         (c86-4g-m7-fp1div1_fp3div3_x4x8): Ditto.
> > >>>>>         (c86-4g-m7-fp1div1_fp3div3_x4x9): Ditto.
> > >>>>>         (c86-4g-m7-fp1div1_fp3div3_x4x11): Ditto.
> > >>>>>         (c86-4g-m7-fp1div1_fp3div3_x4x15): Ditto.
> > >>>>>         (c86-4g-m7-fp1div1_fp3div3_x4x18): Ditto.
> > >>>>>         (c86_4g_m7_idiv): New reservation.
> > >>>>>         (c86_4g_m7_idiv_QI): Adjust reservation latency and unit
> > >>>>>         occupancy.
> > >>>>>         (c86_4g_m7_idiv_load): New reservation.
> > >>>>>         (c86_4g_m7_idiv_QI_load): Adjust reservation latency and unit
> > >>>>>         occupancy.
> > >>>>>         (c86_4g_m7_idiv_DI): Remove reservation.
> > >>>>>         (c86_4g_m7_idiv_SI): Ditto.
> > >>>>>         (c86_4g_m7_idiv_HI): Ditto.
> > >>>>>         (c86_4g_m7_idiv_DI_load): Ditto.
> > >>>>>         (c86_4g_m7_idiv_SI_load): Ditto.
> > >>>>>         (c86_4g_m7_idiv_HI_load): Ditto.
> > >>>>>         (c86_4g_m7_sse_insertimm): Adjust reservation units and unit
> > >>>>>         occupancy.
> > >>>>>         (c86_4g_m7_sse_insert): Ditto.
> > >>>>>         (c86_4g_m7_fp_sqrt): Adjust reservation.
> > >>>>>         (c86_4g_m7_fp_div): Ditto.
> > >>>>>         (c86_4g_m7_fp_div_load): Ditto.
> > >>>>>         (c86_4g_m7_fp_idiv_load): Ditto.
> > >>>>>         (c86_4g_m7_sse_pinsr_reg): Adjust reservation units and unit
> > >>>>>         occupancy.
> > >>>>>         (c86_4g_m7_sse_pinsr_reg_load): Ditto.
> > >>>>>         (c86_4g_m7_avx_vpinsr_reg): Ditto.
> > >>>>>         (c86_4g_m7_avx_vpinsr_reg_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_perm_xmm): Delete the prefix condition.
> > >>>>>         (c86_4g_m7_avx512_perm_xmm_opload): Ditto.
> > >>>>>         (c86_4g_m7_avx512_permi2_ymm): Adjust reservation units and
> > >>>>>         unit occupancy.
> > >>>>>         (c86_4g_m7_avx512_permi2_zmm): Ditto.
> > >>>>>         (c86_4g_m7_avx512_permi2_ymm_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_permi2_zmm_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_perm_zmm_imm): Ditto.
> > >>>>>         (c86_4g_m7_avx512_perm_zmm_imm_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_perm_zmm_noimm): Ditto.
> > >>>>>         (c86_4g_m7_sse_perm_zmm_noimm_load): Ditto.
> > >>>>>         (c86_4g_m7_avx_perm_ymm): Remove.
> > >>>>>         (c86_4g_m7_avx_perm_ymem): Ditto.
> > >>>>>         (c86_4g_m7_avx512_shuf_zmm): Adjust reservation units and unit
> > >>>>>         occupancy.
> > >>>>>         (c86_4g_m7_avx512_shuf_zmem): Ditto.
> > >>>>>         (c86_4g_m7_avx512_cmpestr): Ditto.
> > >>>>>         (c86_4g_m7_avx512_cmpestr_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_vdbpsadbw_zmm): Ditto.
> > >>>>>         (c86_4g_m7_avx512_vdbpsadbw_zmem): Ditto.
> > >>>>>         (c86_4g_m7_avx_ssecomi_comi): Ditto.
> > >>>>>         (c86_4g_m7_avx_ssecomi_comi_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_expand): Ditto.
> > >>>>>         (c86_4g_m7_avx512_expand_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_expand_z): Ditto.
> > >>>>>         (c86_4g_m7_avx512_expand_z_load): Ditto.
> > >>>>>         (c86_4g_m7_sse_movnt_xy): Rename to c86_4g_m7_sse_movnt.
> > >>>>>         (c86_4g_m7_avx512_sseadd_xy): Adjust reservation units.
> > >>>>>         (c86_4g_m7_avx512_sseadd_xy_load): Ditto.
> > >>>>>         (c86_4g_m7_sse_sseiadd_hplus): Adjust reservation units and
> > >>>>>         unit occupancy.
> > >>>>>         (c86_4g_m7_sse_sseiadd_hplus_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_ssemul): Adjust reservation units.
> > >>>>>         (c86_4g_m7_avx512_ssemul_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_ssediv): Remove.
> > >>>>>         (c86_4g_m7_avx512_ssediv_mem): Remove.
> > >>>>>         (c86_4g_m7_avx512_ssediv_x): New.
> > >>>>>         (c86_4g_m7_avx512_ssediv_xmem): New.
> > >>>>>         (c86_4g_m7_avx512_ssediv_y): New.
> > >>>>>         (c86_4g_m7_avx512_ssediv_ymem): New.
> > >>>>>         (c86_4g_m7_avx512_ssediv_z): Adjust reservation units.
> > >>>>>         (c86_4g_m7_avx512_ssediv_zmem): Ditto.
> > >>>>>         (c86_4g_m7_avx512_ssecmp_z): Add reservation units and unit
> > >>>>>         occupancy.
> > >>>>>         (c86_4g_m7_avx512_ssecmp_z_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_ssecmp_vp_z): New reservation.
> > >>>>>         (c86_4g_m7_avx512_ssecmp_vp_z_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_ssecmp_test_z): Remove reservation.
> > >>>>>         (c86_4g_m7_avx512_ssecmp_test_z_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_muladd): Broaden matching condition.
> > >>>>>         (c86_4g_m7_avx512_muladd_load): Ditto.
> > >>>>>         (c86_4g_m7_fma_muladd): Remove reservation.
> > >>>>>         (c86_4g_m7_fma_muladd_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_sse_conflict_x): Add reservation units and
> > >>>>>         unit occupancy.
> > >>>>>         (c86_4g_m7_avx512_sse_conflict_x_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_sse_conflict_y): Ditto.
> > >>>>>         (c86_4g_m7_avx512_sse_conflict_y_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_sse_conflict_z): Ditto.
> > >>>>>         (c86_4g_m7_avx512_sse_conflict_z_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_sse_class_z): Add reservation units and unit
> > >>>>>         occupancy.
> > >>>>>         (c86_4g_m7_avx512_sse_class_z_load): Ditto.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt): Remove.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_load): Remove.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_sf_x): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_sf_xload): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_sf_y): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_sf_yload): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_sf_z): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_sf_zload): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_df_x): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_df_xload): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_df_y): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_df_yload): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_df_z): New.
> > >>>>>         (c86_4g_m7_avx512_sse_sqrt_df_zload): New.
> > >>>>>         (c86_4g_m7_avx512_msklog_vector): Add reservation units and
> > >>>>>         unit occupancy.
> > >>>>>         (c86_4g_m7_avx512_mskmov_z_k): Ditto.
> > >>>>>         (c86_4g_m7_avx512_mskmov_k_reg): Ditto.
> > >>>>>         * config/i386/c86-4g.md (c86_4g_fp): Remove automaton.
> > >>>>>         (c86_4g_fp024): New automaton.
> > >>>>>         (c86_4g_fp1): Ditto.
> > >>>>>         (c86-4g-fp0): Move to c86_4g_fp024 automaton.
> > >>>>>         (c86-4g-fp1): Move to c86_4g_fp1 automaton.
> > >>>>>         (c86-4g-fp2): Move to c86_4g_fp024 automaton.
> > >>>>>         (c86-4g-fp3): Ditto.
> > >>>>>         (c86-4g-fp1fdivx4): New reservation.
> > >>>>>         (c86_4g_fp_sqrt): Adjust reservation.
> > >>>>>         (c86_4g_sse_sqrt_sf): Ditto.
> > >>>>>         (c86_4g_sse_sqrt_sf_mem): Ditto.
> > >>>>>         (c86_4g_sse_sqrt_df): Ditto.
> > >>>>>         (c86_4g_sse_sqrt_df_mem): Ditto.
> > >>>>>         (c86_4g_fp_op_div): Ditto.
> > >>>>>         (c86_4g_fp_op_div_load): Ditto.
> > >>>>>         (c86_4g_fp_op_idiv_load): Adjust reservation latency.
> > >>>>>         (c86_4g_ssediv_ss_ps): Adjust reservation.
> > >>>>>         (c86_4g_ssediv_ss_ps_load): Ditto.
> > >>>>>         (c86_4g_ssediv_sd_pd): Ditto.
> > >>>>>         (c86_4g_ssediv_sd_pd_load): Ditto.
> > >>>>>         (c86_4g_ssediv_avx256_ps): Ditto.
> > >>>>>         (c86_4g_ssediv_avx256_ps_load): Ditto.
> > >>>>>         (c86_4g_ssediv_avx256_pd): Ditto.
> > >>>>>         (c86_4g_ssediv_avx256_pd_load): Ditto.
> > >>>>
> > >>>> LGTM (not a thorough review, but this patch is definitely an 
> > >>>> improvement).
> > >>>
> > >>> Thanks Uros!  Pushed as r17-895-gdd682ea0414926.
> > >>
> > >> Since this patch landed more than 10 days ago without any issues
> > >> reported, would it be ok to backport the three patches below to all
> > >> the active release branches?
> > >>
> > >>   r17-203-g2a64a63d982584  i386: Support HYGON c86-4g series processors
> > >>   r17-258-gc776dcd5f868a1  i386: Adjust some c86-4g*.md modeling to
> > >>                            reduce build time
> > >>   r17-895-gdd682ea0414926  i386: Refine c86-4g fdiv scheduling model
> > >
> > > These patches are so intrusive that I don't feel comfortable
> > > backporting them. As a new development, they don't fix any regression,
> > > so backporting them would go against the current policy, which allows
> > > only regression fixes on release branches.
> > >
> > > If you really feel they should go to release branches, then please
> > > discuss the issue with release managers (CC'd).
> >
> > Thanks for the quick response and for clarifying the concern!
> >
> > I understand that this is not a regression fix in the usual sense,
> > and that the full three-patch series is more intrusive than a normal
> > release-branch backport.  The main motivation here is downstream
> > usability. Some of our users and customers are still on older GCC
> > releases, and GCC 17 may be too new for them to adopt in the near
> > future.  If the basic c86-4g enablement were available in an active
> > release branch, they would have a practical upgrade path to a new
> > GCC release that already includes this CPU support, without
> > having to jump directly to GCC 17.  IMHO, this patch series is
> > relatively self-contained and wouldn't affect other existing i386
> > processors.  In particular, the exposed build-time issue has been
> > addressed by the follow-up patches.
> >
> > I also looked at the history of AMD znver4/5 CPU enablement: the
> > enablement patches were backported to the two then-latest release
> > branches respectively.  I wonder if this c86-4g enablement patch
> > could be considered a similar special-case CPU enablement
> > backport.
> >
> > Would you be willing to reconsider this as a special-case CPU
> > enablement backport, perhaps only for gcc-16 and gcc-15?
> >
> > Also looking forward to Richi's and Jakub's opinions.
>
> I think we've usually refrained from adding scheduler models to branches,
> but indeed -march/tune=XYZ enablement is usually backported, including
> basic cost tables when necessary.
>
> As RM I defer to target maintainers for such issues.

OK, then let's backport the series to gcc-16 and (once it is unfrozen) gcc-15.

Thanks,
Uros.
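
For readers following along, the agreed backport could be planned roughly as below. This is only a dry-run sketch that prints commands rather than running them. The branch names releases/gcc-16 and releases/gcc-15 and the use of plain `git cherry-pick -x` are assumptions not stated in the thread; the commit hashes are the ones quoted above. The helper name plan_backport is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print the cherry-pick commands for the three-patch series.
# Assumptions (not from the thread): branch names follow releases/gcc-N and
# the picks are done with plain `git cherry-pick -x`.

# Commits in the order they landed on trunk (hashes quoted in the thread).
SERIES="2a64a63d982584 c776dcd5f868a1 dd682ea0414926"

# Hypothetical helper: print the commands for one release branch.
plan_backport() {
    branch=$1
    echo "git checkout $branch"
    for rev in $SERIES; do
        echo "git cherry-pick -x $rev"
    done
}

plan_backport releases/gcc-16
plan_backport releases/gcc-15   # only once the branch is unfrozen
```

Keeping the trunk order matters here because the two follow-up patches modify the files introduced by the enablement commit.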
