[PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-24 Thread mayshao
Hi all:
This patch enables -march/-mtune=yongfeng. Costs and tunings are set 
according to the characteristics of the processor. We add a new md file to 
describe the yongfeng processor.

Bootstrapped and regtested on x86_64.

Ok for trunk?
BR
Mayshao
gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize yongfeng.
* common/config/i386/i386-common.cc: Add yongfeng.
* common/config/i386/i386-cpuinfo.h (enum processor_subtypes): Add 
ZHAOXIN_FAM7H_YONGFENG.
* config.gcc: Add yongfeng.
* config/i386/driver-i386.cc (host_detect_local_cpu): Let -march=native
recognize yongfeng processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add yongfeng.
* config/i386/i386-options.cc (m_YONGFENG): New definition.
(m_ZHAOXIN): Ditto.
* config/i386/i386.h (enum processor_type): Add PROCESSOR_YONGFENG.
* config/i386/i386.md: Add yongfeng.
* config/i386/lujiazui.md: Fix typo.
* config/i386/x86-tune-costs.h (struct processor_costs): Add yongfeng 
costs.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace m_LUJIAZUI by 
m_ZHAOXIN.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_MOVX): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
(X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_LCP_STALL): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_OPT_AGU): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_EXPAND_ABS): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_USE_FFREEP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
(X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER_8PARTS): Ditto.
(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
* doc/extend.texi: Add details about yongfeng.
* doc/invoke.texi: Ditto.
* config/i386/yongfeng.md: New file for describing yongfeng processor.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv32.C: Handle new march.
* gcc.target/i386/funcspec-56.inc: Ditto.
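
For readers less familiar with x86-tune.def, the "Replace m_LUJIAZUI by 
m_ZHAOXIN" entries above amount to widening a per-processor bit mask so that 
one selector covers both Zhaoxin processors. The following is only a minimal 
sketch of that idea, not the actual GCC definitions: the enum values, the mask 
width and the tune_enabled helper are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical processor indices; the real enum processor_type in
   i386.h has many more entries and different values.  */
enum processor_type
{
  PROCESSOR_GENERIC,
  PROCESSOR_LUJIAZUI,
  PROCESSOR_YONGFENG
};

/* One bit per processor, in the style of the m_* masks.  */
#define m_GENERIC  (UINT64_C (1) << PROCESSOR_GENERIC)
#define m_LUJIAZUI (UINT64_C (1) << PROCESSOR_LUJIAZUI)
#define m_YONGFENG (UINT64_C (1) << PROCESSOR_YONGFENG)

/* Group mask covering every Zhaoxin processor known to this sketch.  */
#define m_ZHAOXIN  (m_LUJIAZUI | m_YONGFENG)

/* Return true if a tuning whose selector is MASK applies to PROC.  */
static bool
tune_enabled (uint64_t mask, enum processor_type proc)
{
  return (mask & (UINT64_C (1) << proc)) != 0;
}
```

With a group mask like this, a tuning keyed on m_ZHAOXIN applies to both 
processors, while one keyed on m_YONGFENG alone (as for the gather and FMA 
chain entries) leaves lujiazui unaffected.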
---
 gcc/common/config/i386/cpuinfo.h              |   6 +
 gcc/common/config/i386/i386-common.cc         |  10 +-
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/config.gcc                                |  12 +-
 gcc/config/i386/driver-i386.cc                |   5 +
 gcc/config/i386/i386-c.cc                     |   7 +
 gcc/config/i386/i386-options.cc               |   3 +
 gcc/config/i386/i386.h                        |   9 +
 gcc/config/i386/i386.md                       |   3 +-
 gcc/config/i386/lujiazui.md                   |   2 +-
 gcc/config/i386/x86-tune-costs.h              | 116 +++
 gcc/config/i386/x86-tune-sched.cc             |  27 +-
 gcc/config/i386/x86-tune.def                  |  75 +-
 gcc/config/i386/yongfeng.md                   | 848 ++
 gcc/doc/extend.texi                           |   3 +
 gcc/doc/invoke.texi                           |   6 +
 gcc/testsuite/g++.target/i386/mv32.C          |   5 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   6 +-
 18 files changed, 1095 insertions(+), 49 deletions(-)
 create mode 100644 gcc/config/i386/yongfeng.md

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index f7218887c0f..7d25479eb89 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -663,6 +663,12 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
  reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
  cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
}
+ else if (model >= 0x5b)
+   {
+ cpu = "yongfeng";
+ CHECK___builtin_cpu_is ("yongfeng");
+ cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
+   }
   break;
   

[gcc12 backport] i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

2023-03-08 Thread mayshao
From: mayshao-oc 

Hi Jakub:
  This is a backport of the fix for PR target/100758 from mainline to the gcc12 
release branch. The bug still exists in gcc12 on Zhaoxin platforms, where it 
causes ISA feature detection to fail, so we want to fix it the same way as on 
mainline. This patch has been retested against the gcc12 branch on Intel, AMD, 
and Zhaoxin with make bootstrap and make -k check without failures. Ok for the 
gcc12 branch?

BR
Mayshao

gcc/ChangeLog:
PR target/100758
* common/config/i386/cpuinfo.h (cpu_indicator_init): Call 
get_available_features
for all CPUs with max_level >= 1, rather than just Intel, AMD.
---
 gcc/common/config/i386/cpuinfo.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 388f4798406..0333da56ba5 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -910,6 +910,10 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   extended_model = (eax >> 12) & 0xf0;
   extended_family = (eax >> 20) & 0xff;
 
+  /* Find available features. */
+  get_available_features (cpu_model, cpu_model2, cpu_features2,
+ ecx, edx);
+
   if (vendor == signature_INTEL_ebx)
 {
   /* Adjust model and family for Intel CPUS. */
@@ -924,9 +928,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_intel_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_INTEL;
@@ -943,9 +944,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_amd_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_AMD;
-- 
2.17.1
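
The effect of the reordering in the hunks above can be seen in a small 
stand-alone model. This is a sketch only: the model_* names, the vendor enum 
and the single "features detected" flag are hypothetical and do not come from 
cpuinfo.h. It assumes, as the removed lines suggest, that the old code queried 
features only inside the Intel and AMD branches.

```c
#include <stdbool.h>

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct model_cpu
{
  enum vendor vendor;
  bool features_detected;
};

/* Stand-in for get_available_features: just records that it ran.  */
static void
model_get_features (struct model_cpu *cpu)
{
  cpu->features_detected = true;
}

/* Old ordering: features are queried only in the Intel and AMD arms.  */
static void
model_init_before_fix (struct model_cpu *cpu, enum vendor vendor)
{
  cpu->vendor = vendor;
  cpu->features_detected = false;
  if (vendor == VENDOR_INTEL)
    model_get_features (cpu);
  else if (vendor == VENDOR_AMD)
    model_get_features (cpu);
}

/* New ordering: features are queried once, before the vendor dispatch.  */
static void
model_init_after_fix (struct model_cpu *cpu, enum vendor vendor)
{
  cpu->vendor = vendor;
  cpu->features_detected = false;
  model_get_features (cpu);
  /* Vendor-specific CPU type detection would follow here.  */
}
```

In this model a VENDOR_OTHER CPU, which stands in for Zhaoxin on these 
branches, ends up with no detected features under the old ordering and with 
detected features under the new one.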



[gcc11 backport] i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

2023-03-08 Thread mayshao
Hi Jakub:
  This is a backport of the fix for PR target/100758 from mainline to the gcc11 
release branch. The bug still exists in gcc11 on Zhaoxin platforms, where it 
causes ISA feature detection to fail, so we want to fix it the same way as on 
mainline. This patch has been retested against the gcc11 branch on Intel, AMD, 
and Zhaoxin with make bootstrap and make -k check without failures. Ok for the 
gcc11 branch?

BR
Mayshao

gcc/ChangeLog:
PR target/100758
* common/config/i386/cpuinfo.h (cpu_indicator_init): Call 
get_available_features
for all CPUs with max_level >= 1, rather than just Intel, AMD.
---
 gcc/common/config/i386/cpuinfo.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 18ff71ac5ad..00b5eee21d4 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -882,6 +882,10 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   extended_model = (eax >> 12) & 0xf0;
   extended_family = (eax >> 20) & 0xff;
 
+  /* Find available features. */
+  get_available_features (cpu_model, cpu_model2, cpu_features2,
+ ecx, edx);
+
   if (vendor == signature_INTEL_ebx)
 {
   /* Adjust model and family for Intel CPUS. */
@@ -896,9 +900,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_intel_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_INTEL;
@@ -915,9 +916,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_amd_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_AMD;
-- 
2.17.1



[gcc10 backport] i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

2023-03-08 Thread mayshao
From: mayshao-oc 

Hi Jakub:
  This is a backport of the fix for PR target/100758 from mainline to the gcc10 
release branch. The bug still exists in gcc10 on Zhaoxin platforms, where it 
causes ISA feature detection to fail, so we want to fix it the same way as on 
mainline. This patch has been retested against the gcc10 branch on Intel, AMD, 
and Zhaoxin with make bootstrap and make -k check without failures. Ok for the 
gcc10 branch?

BR
Mayshao

libgcc/ChangeLog:
PR target/100758
* config/i386/cpuinfo.c (__cpu_indicator_init): Call 
get_available_features
for all CPUs with max_level >= 1, rather than just Intel, AMD.
---
 libgcc/config/i386/cpuinfo.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index 83301a1445f..ad250edbcb7 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -474,6 +474,9 @@ __cpu_indicator_init (void)
   extended_model = (eax >> 12) & 0xf0;
   extended_family = (eax >> 20) & 0xff;
 
+  /* Find available features. */
+  get_available_features (ecx, edx, max_level);
+
   if (vendor == signature_INTEL_ebx)
 {
   /* Adjust model and family for Intel CPUS. */
@@ -487,8 +490,6 @@ __cpu_indicator_init (void)
 
   /* Get CPU type.  */
   get_intel_cpu (family, model, brand_id);
-  /* Find available features. */
-  get_available_features (ecx, edx, max_level);
   __cpu_model.__cpu_vendor = VENDOR_INTEL;
 }
   else if (vendor == signature_AMD_ebx)
@@ -502,8 +503,6 @@ __cpu_indicator_init (void)
 
   /* Get CPU type.  */
   get_amd_cpu (family, model);
-  /* Find available features. */
-  get_available_features (ecx, edx, max_level);
   __cpu_model.__cpu_vendor = VENDOR_AMD;
 }
   else
-- 
2.17.1



[PATCH] [x86_64] Zhaoxin lujiazui enablement

2022-03-24 Thread MayShao
Hi Uros,

This patch fixes the Zhaoxin CPU vendor ID detection problem
and adds Zhaoxin "lujiazui" processor support and tuning.

Currently gcc can't recognize Zhaoxin CPUs (vendor ID "CentaurHauls" and 
"Shanghai")
and wrongly identifies Zhaoxin "lujiazui" as Intel core2 or i386, which is 
confusing for users.

This patch enables -march/-mtune=lujiazui. Lujiazui is a Zhaoxin family 7 
processor.
Costs and tunings are set according to the characteristics of the processor.
We add a new md file to describe the lujiazui pipeline.

Testing :
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.

OK for master?

Background:
Related Zhaoxin linux kernel patch can be found at:
https://lore.kernel.org/lkml/01042674b2f741b2aed1f797359bd...@zhaoxin.com/

Related Zhaoxin glibc patch can be found at:
https://sourceware.org/git/?p=glibc.git;a=commit;h=32ac0b988466785d6e3cc1dffc364bb26fc63193

gcc/ChangeLog:

   * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Detect
   the cpu type of ZHAOXIN processors.
   (cpu_indicator_init): Handle ZHAOXIN processors.
   * common/config/i386/i386-common.cc: Add lujiazui.
   * common/config/i386/i386-cpuinfo.h (enum processor_vendor): Add
   VENDOR_ZHAOXIN.
   (enum processor_types): Add ZHAOXIN_FAM7H.
   (enum processor_subtypes): Add ZHAOXIN_FAM7H_LUJIAZUI.
   * config.gcc: Add -march=lujiazui.
   * config/i386/cpuid.h (signature_SHANGHAI_ebx): New definition
   for ZHAOXIN.
   (signature_SHANGHAI_ecx): Likewise.
   (signature_SHANGHAI_edx): Likewise.
   * config/i386/driver-i386.cc (host_detect_local_cpu): Let
   -march=native recognize lujiazui processor.
   * config/i386/i386-c.cc (ix86_target_macros_internal): Add
   lujiazui def_or_undef.
   * config/i386/i386-options.cc (m_LUJIAZUI): New definition.
   * config/i386/i386.h (enum processor_type): Add PROCESSOR_LUJIAZUI.
   * config/i386/i386.md: Add lujiazui cpu and include new md file.
   * config/i386/x86-tune-costs.h (struct processor_costs): Add
   lujiazui_cost.
   * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add lujiazui.
   (ix86_adjust_cost): Likewise.
   * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for lujiazui.
   (X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise.
   (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise.
   (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise.
   (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
   (X86_TUNE_MOVX): Likewise.
   (X86_TUNE_MEMORY_MISMATCH_STALL): Likewise.
   (X86_TUNE_FUSE_CMP_AND_BRANCH_32): Likewise.
   (X86_TUNE_FUSE_CMP_AND_BRANCH_64): Likewise.
   (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Likewise.
   (X86_TUNE_FUSE_ALU_AND_BRANCH): Likewise.
   (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Likewise.
   (X86_TUNE_USE_LEAVE): Likewise.
   (X86_TUNE_PUSH_MEMORY): Likewise.
   (X86_TUNE_LCP_STALL): Likewise.
   (X86_TUNE_USE_INCDEC): Likewise.
   (X86_TUNE_INTEGER_DFMODE_MOVES): Likewise.
   (X86_TUNE_OPT_AGU): Likewise.
   (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Likewise.
   (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
   (X86_TUNE_USE_SAHF): Likewise.
   (X86_TUNE_USE_BT): Likewise.
   (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
   (X86_TUNE_ONE_IF_CONV_INSN): Likewise.
   (X86_TUNE_AVOID_MFENCE): Likewise.
   (X86_TUNE_EXPAND_ABS): Likewise.
   (X86_TUNE_USE_SIMODE_FIOP): Likewise.
   (X86_TUNE_USE_FFREEP): Likewise.
   (X86_TUNE_EXT_80387_CONSTANTS): Likewise.
   (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise.
   (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
   (X86_TUNE_SSE_TYPELESS_STORES): Likewise.
   (X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
   (X86_TUNE_USE_GATHER): Likewise.
   * doc/extend.texi: Add lujiazui.
   * doc/invoke.texi: Add details about lujiazui.
   * config/i386/lujiazui.md: New file for describing lujiazui pipeline.

gcc/testsuite/ChangeLog:

   * gcc.target/i386/funcspec-56.inc: Handle new march.
   * g++.target/i386/mv31.C: New test for -march=lujiazui.
---
 gcc/common/config/i386/cpuinfo.h              |  51 +-
 gcc/common/config/i386/i386-common.cc         |   9 +
 gcc/common/config/i386/i386-cpuinfo.h         |   3 +
 gcc/config.gcc                                |  10 +-
 gcc/config/i386/cpuid.h                       |   4 +
 gcc/config/i386/driver-i386.cc                |  20 +-
 gcc/config/i386/i386-c.cc                     |   7 +
 gcc/config/i386/i386-options.cc               |   3 +
 gcc/config/i386/i386.h                        |   1 +
 gcc/config/i386/i386.md                       |   5 +-
 gcc/config/i386/lujiazui.md                   | 844 ++
 gcc/config/i386/x86-tune-costs.h              | 115 +++
 gcc/config/i386/x86-tune-sched.cc             |   2 +
 gcc/config/i386/x86-tune.def                  |  91 +-
 gcc/doc/extend.texi   

[PATCH] [x86_64]: Zhaoxin lujiazui enablement

2022-05-16 Thread mayshao
Hi Uros:
This patch fixes the Zhaoxin CPU vendor ID detection problem and adds Zhaoxin 
"lujiazui" processor support.
Currently gcc can't recognize Zhaoxin CPUs (vendor ID "CentaurHauls" and 
"Shanghai") when the user passes -march=native, which is confusing for users.
This patch enables -march=native on Zhaoxin family 7 processors and 
-march/-mtune=lujiazui. Costs and tunings are set according to the 
characteristics of the processor. We add a new md file to describe the lujiazui 
pipeline.

Testing:
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.

Ok for master?

Background:
Related Zhaoxin linux kernel patch can be found at:
https://lore.kernel.org/lkml/01042674b2f741b2aed1f797359bd...@zhaoxin.com/

Related Zhaoxin glibc patch can be found at:
https://sourceware.org/git/?p=glibc.git;a=commit;h=32ac0b988466785d6e3cc1dffc364bb26fc63193

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu):
(cpu_indicator_init):
* common/config/i386/i386-common.cc:
* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
(enum processor_types):
(enum processor_subtypes):
* config.gcc:
* config/i386/cpuid.h (signature_SHANGHAI_ebx):
(signature_SHANGHAI_ecx):
(signature_SHANGHAI_edx):
* config/i386/driver-i386.cc (host_detect_local_cpu):
* config/i386/i386-c.cc (ix86_target_macros_internal):
* config/i386/i386-options.cc (m_LUJIAZUI):
* config/i386/i386.h (enum processor_type):
* config/i386/i386.md:
* config/i386/x86-tune-costs.h (struct processor_costs):
* config/i386/x86-tune-sched.cc (ix86_issue_rate):
(ix86_adjust_cost):
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE):
(X86_TUNE_PARTIAL_REG_DEPENDENCY):
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY):
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY):
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY):
(X86_TUNE_MOVX):
(X86_TUNE_MEMORY_MISMATCH_STALL):
(X86_TUNE_FUSE_CMP_AND_BRANCH_32):
(X86_TUNE_FUSE_CMP_AND_BRANCH_64):
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS):
(X86_TUNE_FUSE_ALU_AND_BRANCH):
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS):
(X86_TUNE_USE_LEAVE):
(X86_TUNE_PUSH_MEMORY):
(X86_TUNE_LCP_STALL):
(X86_TUNE_USE_INCDEC):
(X86_TUNE_INTEGER_DFMODE_MOVES):
(X86_TUNE_OPT_AGU):
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES):
(X86_TUNE_USE_SAHF):
(X86_TUNE_USE_BT):
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI):
(X86_TUNE_ONE_IF_CONV_INSN):
(X86_TUNE_AVOID_MFENCE):
(X86_TUNE_EXPAND_ABS):
(X86_TUNE_USE_SIMODE_FIOP):
(X86_TUNE_USE_FFREEP):
(X86_TUNE_EXT_80387_CONSTANTS):
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL):
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL):
(X86_TUNE_SSE_TYPELESS_STORES):
(X86_TUNE_SSE_LOAD0_BY_PXOR):
* doc/extend.texi:
* doc/invoke.texi:
* config/i386/lujiazui.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc:
* g++.target/i386/mv32.C: New test.
---
 gcc/common/config/i386/cpuinfo.h              |  54 +-
 gcc/common/config/i386/i386-common.cc         |   8 +
 gcc/common/config/i386/i386-cpuinfo.h         |   3 +
 gcc/config.gcc                                |  10 +-
 gcc/config/i386/cpuid.h                       |   4 +
 gcc/config/i386/driver-i386.cc                |  20 +-
 gcc/config/i386/i386-c.cc                     |   7 +
 gcc/config/i386/i386-options.cc               |   3 +
 gcc/config/i386/i386.h                        |   1 +
 gcc/config/i386/i386.md                       |   5 +-
 gcc/config/i386/lujiazui.md                   | 844 ++
 gcc/config/i386/x86-tune-costs.h              | 115 +++
 gcc/config/i386/x86-tune-sched.cc             |   2 +
 gcc/config/i386/x86-tune.def                  |  89 +-
 gcc/doc/extend.texi                           |   3 +
 gcc/doc/invoke.texi                           |   5 +
 gcc/testsuite/g++.target/i386/mv32.C          |  31 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 18 files changed, 1159 insertions(+), 47 deletions(-)
 create mode 100644 gcc/config/i386/lujiazui.md
 create mode 100644 gcc/testsuite/g++.target/i386/mv32.C

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 6d6171f4555..adc02bc3d98 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -526,6 +526,39 @@ get_intel_cpu (struct __processor_model *cpu_model,
   return cpu;
 }
 
+/* Get the specific type of ZHAOXIN CPU and return ZHAOXIN CPU name.
+   Return NULL for unknown ZHAOXIN CPU.  */
+
+static inline const char *
+get_zhaoxin_cpu (struct __processor_model *cpu_model,
+   struct __processor_model2 *cpu_model2,
+   un

[PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-05-27 Thread MayShao
From: mayshao 

Hi all:
This patch enables -march/-mtune=shijidadao. Costs and tunings are set 
according to the characteristics of the processor.

Bootstrapped and regtested on x86_64.

Ok for trunk?
BR
Mayshao
gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao.
* common/config/i386/i386-common.cc: Add shijidadao.
* common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
Add ZHAOXIN_FAM7H_SHIJIDADAO.
* config.gcc: Add shijidadao.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Let -march=native recognize shijidadao processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
* config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
(m_SHIJIDADAO): New definition.
* config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
* config/i386/x86-tune-costs.h (struct processor_costs):
Add shijidadao_cost.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add 
m_SHIJIDADAO.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER_8PARTS): Ditto.
(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
* doc/extend.texi: Add details about shijidadao.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv32.C: Handle new -march.
* gcc.target/i386/funcspec-56.inc: Ditto.
---
 gcc/common/config/i386/cpuinfo.h              |   8 +-
 gcc/common/config/i386/i386-common.cc         |   8 +-
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/config.gcc                                |  14 ++-
 gcc/config/i386/driver-i386.cc                |  11 +-
 gcc/config/i386/i386-c.cc                     |   7 ++
 gcc/config/i386/i386-options.cc               |   4 +-
 gcc/config/i386/i386.h                        |   1 +
 gcc/config/i386/x86-tune-costs.h              | 116 ++
 gcc/config/i386/x86-tune-sched.cc             |   2 +
 gcc/config/i386/x86-tune.def                  |   8 +-
 gcc/doc/extend.texi                           |   3 +
 gcc/doc/invoke.texi                           |   6 +
 gcc/testsuite/g++.target/i386/mv32.C          |   6 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 15 files changed, 183 insertions(+), 14 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 4610bf6d6a4..936039725ab 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
  reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
  cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
}
- else if (model >= 0x5b)
+ else if (model == 0x5b)
{
  cpu = "yongfeng";
  CHECK___builtin_cpu_is ("yongfeng");
  cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
}
+ else if (model >= 0x6b)
+   {
+ cpu = "shijidadao";
+ CHECK___builtin_cpu_is ("shijidadao");
+ cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
+   }
   break;
 default:
   break;
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 895e5fa662d..eb3f94c529c 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2066,6 +2066,7 @@ const char *const processor_names[] =
   "intel",
   "lujiazui",
   "yongfeng",
+  "shijidadao",
   "geode",
   "k6",
   "athlon",
@@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
   | PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
   {"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
PTA_LUJIAZUI,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
   {"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
PTA_YONGFENG,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
+  {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
+   PTA_YONGFENG,
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
   {"k8", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
   | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
index 9edad96d4fd..fa3b76f4931 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -104,6 +104,7 @@ enum processor_subtypes
   INTEL_COREI7_PANTHERLAKE,
   ZHAOXIN_FAM7H_YONGFENG,
   AMDFAM1AH_ZNVER5,
+ 
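
The model dispatch introduced by the cpuinfo.h hunk of this patch can be 
restated as a small stand-alone function. This is a sketch under stated 
assumptions: zhaoxin_fam7_name is a hypothetical helper, not a GCC function, 
and it covers only the two branches visible in the hunk. The lujiazui branch 
is left out because its condition is not shown there.

```c
#include <stddef.h>

/* Map a family-7 Zhaoxin model number to a CPU name, following the two
   branches visible in the hunk.  Returns NULL for models that match
   neither branch.  */
static const char *
zhaoxin_fam7_name (unsigned int model)
{
  if (model == 0x5b)
    return "yongfeng";
  else if (model >= 0x6b)
    return "shijidadao";
  return NULL;
}
```

Note that under these two conditions a model such as 0x5c, which the earlier 
"model >= 0x5b" test accepted as yongfeng, is no longer matched by either 
branch.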

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-30 Thread Mayshao-oc
>On Fri, Oct 27, 2023 at 12:20 PM mayshao  wrote:
>>
>> On 2023/10/26 17:34, Uros Bizjak wrote:
>> > On Wed, Oct 25, 2023 at 8:43 AM mayshao  wrote:
>> >>
>> >> Hi all:
>> >>  This patch enables -march/-mtune=yongfeng, costs and tunings are set 
>> >> according to the characteristics of the processor. We add a new md file 
>> >> to describe yongfeng processor.
>> >>
>> >>  Bootstrapped /regtested X86_64.
>> >>
>> >>  Ok for trunk?
>> >> BR
>> >> Mayshao
>> >> gcc/ChangeLog:
>> >>
>> >>  * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize 
>> >> yongfeng.
>> >>  * common/config/i386/i386-common.cc: Add yongfeng.
>> >>  * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): 
>> >> Add ZHAOXIN_FAM7H_YONGFENG.
>> >>  * config.gcc: Add yongfeng.
>> >>  * config/i386/driver-i386.cc (host_detect_local_cpu): Let 
>> >> -march=native
>> >>  recognize yongfeng processors.
>> >>  * config/i386/i386-c.cc (ix86_target_macros_internal): Add 
>> >> yongfeng.
>> >>  * config/i386/i386-options.cc (m_YONGFENG): New definition.
>> >>  (m_ZHAOXIN): Ditto.
>> >>  * config/i386/i386.h (enum processor_type): Add 
>> >> PROCESSOR_YONGFENG.
>> >>  * config/i386/i386.md: Add yongfeng.
>> >>  * config/i386/lujiazui.md: Fix typo.
>> >>  * config/i386/x86-tune-costs.h (struct processor_costs): Add 
>> >> yongfeng costs.
>> >>  * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng.
>> >>  (ix86_adjust_cost): Ditto.
>> >>  * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace 
>> >> m_LUJIAZUI by m_ZHAOXIN.
>> >>  (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
>> >>  (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
>> >>  (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
>> >>  (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
>> >>  (X86_TUNE_MOVX): Ditto.
>> >>  (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
>> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
>> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
>> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
>> >>  (X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
>> >>  (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
>> >>  (X86_TUNE_USE_LEAVE): Ditto.
>> >>  (X86_TUNE_PUSH_MEMORY): Ditto.
>> >>  (X86_TUNE_LCP_STALL): Ditto.
>> >>  (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
>> >>  (X86_TUNE_OPT_AGU): Ditto.
>> >>  (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
>> >>  (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
>> >>  (X86_TUNE_USE_SAHF): Ditto.
>> >>  (X86_TUNE_USE_BT): Ditto.
>> >>  (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
>> >>  (X86_TUNE_ONE_IF_CONV_INSN): Ditto.
>> >>  (X86_TUNE_AVOID_MFENCE): Ditto.
>> >>  (X86_TUNE_EXPAND_ABS): Ditto.
>> >>  (X86_TUNE_USE_SIMODE_FIOP): Ditto.
>> >>  (X86_TUNE_USE_FFREEP): Ditto.
>> >>  (X86_TUNE_EXT_80387_CONSTANTS): Ditto.
>> >>  (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
>> >>  (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
>> >>  (X86_TUNE_SSE_TYPELESS_STORES): Ditto.
>> >>  (X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
>> >>  (X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG.
>> >>  (X86_TUNE_USE_GATHER_4PARTS): Ditto.
>> >>  (X86_TUNE_USE_GATHER_8PARTS): Ditto.
>> >>  (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
>> >>  * doc/extend.texi: Add details about yongfeng.
>> >>  * doc/invoke.texi: Ditto.
>> >>  * config/i386/yongfeng.md: New file for decribing yongfeng 
>> >> processor.
>> >>
>> >> gcc/testsuite/ChangeLog:
>> >>
>> >>  * g++.target/i386/mv32.C: Handle new march.
>> >>  * gcc.target/i386/funcspec-56.inc: Ditto.
>> >
>> > LGTM.
>> >
>> > There are a couple of comments that 

Re: [PATCH] [x86_64] Zhaoxin lujiazui enablement

2022-03-28 Thread Mayshao-oc
On Sun, Mar 27, 2022 at 5:15 PM Uros Bizjak  wrote:
> On Fri, Mar 25, 2022 at 3:08 AM MayShao  wrote:
> >
> > Hi Uros,
> >
> > This patch fix Zhaoxin CPU Vendor ID detection problem
> > and add Zhaoxin "lujiazui" processor support and tuning.
> >
> > Currently gcc can't recognize Zhaoxin CPU (Vendor ID "CentaurHauls" and 
> > "Shanghai")
> > and wrongly identify Zhaoxin "lujiazui" as Intel core2 or i386, which is 
> > confusing for users.
> >
> > This patch enables -march/-mtune=lujiazui. Lujiazui is Zhaonxin family 7th 
> > processor.
> > Costs and tunings are set according to the characteristics of the processor.
> > We add a new md file to describe lujiazui pipeline.
> >
> > Testing :
> > Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
> >
> > OK for master?
>
> This patch is not a bugfix, so it will have to wait for a next stage 1
> to reopen.
>
> Uros.
>
Yes, thanks for the reminder.
Please review this patch again
when the next stage 1 opens.
I have contributed to glibc before; do I need to
re-sign the FSF copyright assignment for this patch?


May
> >
> > Background:
> > Related Zhaoxin linux kernel patch can be found at:
> >  https://lore.kernel.org/lkml/01042674b2f741b2aed1f797359bd...@zhaoxin.com/
> >
> > Related Zhaoxin glibc patch can be found at:
> >  
> > https://sourceware.org/git/?p=glibc.git;a=commit;h=32ac0b988466785d6e3cc1dffc364bb26fc63193
> >
> > gcc/ChangeLog:
> >
> >* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Detect
> >the cpu type of ZHAOXIN processors.
> >(cpu_indicator_init): Handle ZHAOXIN processors.
> >* common/config/i386/i386-common.cc: Add lujiazui.
> >* common/config/i386/i386-cpuinfo.h (enum processor_vendor): Add
> >VENDOR_ZHAOXIN.
> >(enum processor_types): Add ZHAOXIN_FAM7H.
> >(enum processor_subtypes):Add ZHAOXIN_FAM7H_LUJIAZUI.
> >* config.gcc: Add -march=lujiazui.
> >* config/i386/cpuid.h (signature_SHANGHAI_ebx): New definition
> >for ZHAOXIN.
> >(signature_SHANGHAI_ecx): Likewise.
> >(signature_SHANGHAI_edx): Likewise.
> >* config/i386/driver-i386.cc (host_detect_local_cpu): Let
> >-march=native recognize lujiazui processor.
> >* config/i386/i386-c.cc (ix86_target_macros_internal): Add
> >lujiazui def_or_undef.
> >* config/i386/i386-options.cc (m_LUJIAZUI): New definition.
> >* config/i386/i386.h (enum processor_type): Add PROCESSOR_LUJIAZUI.
> >* config/i386/i386.md: Add lujiazui cpu and include new md file.
> >* config/i386/x86-tune-costs.h (struct processor_costs): Add
> >lujiazui_cost.
> >* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add lujiazui.
> >(ix86_adjust_cost): Likewise.
> >* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for lujiazui.
> >(X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise.
> >(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise.
> >(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise.
> >(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
> >(X86_TUNE_MOVX): Likewise.
> >(X86_TUNE_MEMORY_MISMATCH_STALL): Likewise.
> >(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Likewise.
> >(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Likewise.
> >(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Likewise.
> >(X86_TUNE_FUSE_ALU_AND_BRANCH): Likewise.
> >(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Likewise.
> >(X86_TUNE_USE_LEAVE): Likewise.
> >(X86_TUNE_PUSH_MEMORY): Likewise.
> >(X86_TUNE_LCP_STALL): Likewise.
> >(X86_TUNE_USE_INCDEC): Likewise.
> >(X86_TUNE_INTEGER_DFMODE_MOVES): Likewise.
> >(X86_TUNE_OPT_AGU): Likewise.
> >(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Likewise.
> >(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
> >(X86_TUNE_USE_SAHF): Likewise.
> >(X86_TUNE_USE_BT): Likewise.
> >(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
> >(X86_TUNE_ONE_IF_CONV_INSN): Likewise.
> >(X86_TUNE_AVOID_MFENCE): Likewise.
> >(X86_TUNE_EXPAND_ABS): Likewise.
> >(X86_TUNE_USE_SIMODE_FIOP): Likewise.
> >(X86_TUNE_USE_FFREEP): Likewise.
> >(X86_TUNE_EXT_80387_CONSTANTS): Likewise.
> >(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise.
> >

Re: [PATCH] [x86_64] Zhaoxin lujiazui enablement

2022-10-26 Thread Mayshao-oc


Hi Martin:
Thanks for your patch. My answers to the questions are below.

> Hello.

> I noticed this patch set which is kind of related to 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107364.

> And I have a couple of questions:

>1) I noticed you drop AVX and F16C features for the newly added "lujiazui". 
>Why do you need it?
>  I would expect these features would be properly detected by cpuid?

Yes, these features can be detected by cpuid. Functionally they work 
correctly, but their performance needs further improvement, so we decided to 
drop them for now and add them back when performance meets our expectations.
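
To illustrate what dropping a detected feature means, here is a minimal sketch 
of masking feature bits after detection. The bit positions and the 
drop_slow_features helper are hypothetical and exist only for this example; 
the patch itself uses reset_cpu_feature on the cpu model structures.

```c
#include <stdint.h>

/* Hypothetical feature bits for this sketch only.  */
#define FEAT_SSE4_2 (UINT32_C (1) << 0)
#define FEAT_AVX    (UINT32_C (1) << 1)
#define FEAT_F16C   (UINT32_C (1) << 2)

/* Clear the features that are functional but not yet fast enough,
   leaving every other detected feature untouched.  */
static uint32_t
drop_slow_features (uint32_t detected)
{
  return detected & ~(FEAT_AVX | FEAT_F16C);
}
```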

> 2) If you really need it, can you please test for me the attached patch? It 
> should come up
>  with a new function.

I have tested the patch; it works.

> 3) Have question about:

> else if (vendor == signature_CENTAUR_ebx && family < 0x07)
>cpu_model->__cpu_vendor = VENDOR_CENTAUR;
> else if (vendor == signature_SHANGHAI_ebx
>   || vendor == signature_CENTAUR_ebx)

> Are there any signature_CENTAUR_ebx models with family == 0x7 ?
> Similarly, are there any signature_SHANGHAI_ebx modes with family < 0x7 ?

Yes, both cases exist in our products.
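
For reference, the vendor/family split discussed in question 3 can be sketched as a small pure function. This is only an illustrative sketch, not the code from cpuinfo.h: the toy_ names are hypothetical, the two constants are the EBX words of the CPUID leaf 0 vendor strings written out by hand, and it assumes the second branch of the quoted snippet selects the Zhaoxin vendor.

```c
#include <stdint.h>

/* EBX word of CPUID leaf 0 for the two vendor strings (little-endian ASCII).
   Hypothetical local names; GCC's cpuid.h has signature_*_ebx macros.  */
#define TOY_SIG_CENTAUR_EBX  UINT32_C(0x746e6543)  /* "Cent" */
#define TOY_SIG_SHANGHAI_EBX UINT32_C(0x68532020)  /* "  Sh" */

enum toy_vendor { TOY_VENDOR_OTHER, TOY_VENDOR_CENTAUR, TOY_VENDOR_ZHAOXIN };

/* Classify a CPU from the leaf 0 EBX word and its family number,
   using the same order of checks as the snippet quoted above.  */
static enum toy_vendor
toy_classify_vendor (uint32_t vendor_ebx, unsigned int family)
{
  if (vendor_ebx == TOY_SIG_CENTAUR_EBX && family < 0x07)
    return TOY_VENDOR_CENTAUR;
  if (vendor_ebx == TOY_SIG_SHANGHAI_EBX
      || vendor_ebx == TOY_SIG_CENTAUR_EBX)
    return TOY_VENDOR_ZHAOXIN;
  return TOY_VENDOR_OTHER;
}
```

With this ordering, a "CentaurHauls" part with family 7 and a "Shanghai" part with family 6 are both treated as Zhaoxin, which covers the two cases confirmed above.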

> Thanks,
> Martin

BR
Mayshao


Re: [PATCH] [x86_64] Zhaoxin lujiazui enablement

2022-10-27 Thread Mayshao-oc




>>
>> Hi Martin:
>> Thanks for your patch,  I comment the questions below.

>Hi.

>:)

>>
>>> Hello.
>>
>>> I noticed this patch set which is kind of related to 
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107364.
>>
>>> And I have a couple of questions:
>>
>>>1) I noticed you drop AVX and F16C features for the newly added "lujiazui". 
>>>Why do you need it?
>>>  I would expect these features would be properly detected by cpuid?
>>
>> Yes, these features could be detected by cpuid, and in respect of 
>> functionality, these features are ok, but in respect of performance, these 
>> features need further improvement, so we decide to drop it now, and add 
>> these features back when performance meet our expectation.

> I see. So theoretically you can increase costs of the corresponding insns and 
> that could be dropped now?
> But I'm not a costing expert.

I am new to GCC and have a lot to learn. I have read some of the material on 
LTO and PGO that you and hubicka shared, and it helped me a lot. Since this is 
a performance issue, solving it through the cost model is a good idea, and 
disabling AVX entirely does seem like overkill. However, a cost model needs 
appropriate cost values: choosing the numbers is challenging, and justifying 
them is even more challenging. Our current approach has one pitfall: it does 
not accommodate AVX intrinsic functions (e.g. _mm256_loadu_pd). Users can pass 
-mavx explicitly to overcome this.
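
As an illustration of the intrinsics pitfall, here is a minimal sketch of code that uses _mm256_loadu_pd without AVX being enabled for the whole translation unit. It uses a per-function target attribute plus a runtime check, which is an alternative to -mavx that was not discussed in this thread; the function names (sum4, sum4_avx, sum4_scalar) are made up for the example.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX path: AVX is enabled for this function only, so the intrinsic is
   usable even if the rest of the file is built without -mavx.  */
static __attribute__ ((target ("avx"))) double
sum4_avx (const double *p)
{
  __m256d v = _mm256_loadu_pd (p);   /* unaligned load of 4 doubles */
  double tmp[4];
  _mm256_storeu_pd (tmp, v);
  return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
#endif

/* Portable fallback with the same summation order.  */
static double
sum4_scalar (const double *p)
{
  return p[0] + p[1] + p[2] + p[3];
}

/* Pick the AVX path only when the runtime check reports AVX.  */
double
sum4 (const double *p)
{
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports ("avx"))
    return sum4_avx (p);
#endif
  return sum4_scalar (p);
}
```

If the runtime feature detection filters AVX out on a given CPU, the check simply falls back to the scalar path, so the sketch stays correct either way.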

>>
>>> 2) If you really need it, can you please test for me the attached patch? It 
>>> should come up
>>>  with a new function.
>>
>> I have tested the patch, It's ok.

> Good, I'm going to install it.

>>
>>> 3) Have question about:
>>
>>> else if (vendor == signature_CENTAUR_ebx && family < 0x07)
>>>cpu_model->__cpu_vendor = VENDOR_CENTAUR;
>>> else if (vendor == signature_SHANGHAI_ebx
>>>   || vendor == signature_CENTAUR_ebx)
>>
>>> Are there any signature_CENTAUR_ebx models with family == 0x7 ?
>>> Similarly, are there any signature_SHANGHAI_ebx modes with family < 0x7 ?
>>
>> Yes, both cases exist in our products.

> Good. Then we miss a CPU features detection for (vendor == 
> signature_CENTAUR_ebx && family < 0x07)
> aka https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107364. But it's not worth 
> it as it's a legacy hardware,
> right?

Yes. For legacy hardware we need to keep it working correctly, but we do not 
spend much time on performance tuning.

> Cheers,
> Martin

>>
>>> Thanks,
>> Martin
>>
>> BR
>> Mayshao



Re: [PATCH] [x86_64]: Zhaoxin lujiazui enablement

2022-05-17 Thread Mayshao-oc
> On Tue, May 17, 2022 at 5:15 AM mayshao  wrote:
>> Hi Uros:
>> This patch fixes the Zhaoxin CPU vendor ID detection problem and adds 
>> Zhaoxin "lujiazui" processor support.
>> Currently GCC can't recognize Zhaoxin CPUs (vendor IDs "CentaurHauls" 
>> and "Shanghai") when the user passes -march=native, which is confusing for 
>> users.
>> This patch enables -march=native on Zhaoxin family 7 processors and 
>> -march/-mtune=lujiazui; costs and tunings are set according to the 
>> characteristics of the processor. We add a new md file to describe the 
>> lujiazui pipeline.
>> Testing:
>> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>> Ok for master?
>> Background:
>> Related Zhaoxin linux kernel patch can be found at:
>> https://lore.kernel.org/lkml/01042674b2f741b2aed1f797359bd...@zhaoxin.com/
>> Related Zhaoxin glibc patch can be found at:
>> https://sourceware.org/git/?p=glibc.git;a=commit;h=32ac0b988466785d6e3cc1dffc364bb26fc63193
>> gcc/ChangeLog:
> The entries below are suspiciously empty - please fill in the details.

Sorry for forgetting this. Here is the updated patch. Thanks.

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Detect
the specific type of Zhaoxin CPU, and return Zhaoxin CPU name.
(cpu_indicator_init): Handle Zhaoxin processors.
* common/config/i386/i386-common.cc: Add lujiazui.
* common/config/i386/i386-cpuinfo.h (enum processor_vendor): Add
VENDOR_ZHAOXIN.
(enum processor_types): Add ZHAOXIN_FAM7H.
(enum processor_subtypes): Add ZHAOXIN_FAM7H_LUJIAZUI.
* config.gcc: Add lujiazui.
* config/i386/cpuid.h (signature_SHANGHAI_ebx): Add
Signatures for zhaoxin
(signature_SHANGHAI_ecx): Ditto.
(signature_SHANGHAI_edx): Ditto.
* config/i386/driver-i386.cc (host_detect_local_cpu): Let
-march=native recognize lujiazui processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add lujiazui.
* config/i386/i386-options.cc (m_LUJIAZUI): New definition.
* config/i386/i386.h (enum processor_type): Ditto.
* config/i386/i386.md: Add lujiazui.
* config/i386/x86-tune-costs.h (struct processor_costs): Add
lujiazui costs.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add lujiazui.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Add lujiazui tunings.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_MOVX): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
(X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_LCP_STALL): Ditto.
(X86_TUNE_USE_INCDEC): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_OPT_AGU): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_EXPAND_ABS): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_USE_FFREEP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
* doc/extend.texi: Add details about lujiazui.
* doc/invoke.texi: Add details about lujiazui.
* config/i386/lujiazui.md: Introduce lujiazui cpu and include new md file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Test -arch=lujiazui and -tune=lujiazui.
* g++.target/i386/mv32.C: Ditto.

>> * common/config/i386/cpuinfo.h (get_zhaoxin_cpu):
>> (cpu_indicator_init):
>> * common/config/i386/i386-common.cc:
>> * common/config/i386/i386-cpuinfo.h (enum processor_vendor):
>> (enum processor_types):
>> (enum processor_subtypes):
>> * config.gcc:
>> * config/i386/cpuid.h (signature_SHANGHAI_ebx):
>> (signature_SHANGHAI_ecx):
>> (signature_SHANGHAI_edx):
>> * config/i386/driver-i386.cc (host_detect_local_cpu):
>> * config/i386/i386-c.cc (ix86_target_macros_internal):
>> * config/i386/i386-options.cc (m_LUJIAZUI):
>> * config/i386/i386.h (enum processor_type):
>> * config/i386/i386.md:
>> * config/i386/x86-tune-costs.h (struct processor_costs):
>> * config/i386/x86-tune-sched.cc (ix86_issue_rate):
>> (ix86_adjust_cost):
>> * config/i386/x86-tun

Re: [PATCH] i386: correct division modeling in lujiazui.md

2022-12-20 Thread Mayshao-oc
>Ping. If there are any questions or concerns about the patch, please let me
>know: I'm interested in continuing this cleanup at least for older AMD models.
>
Thanks for your patch.
We are running the SPEC CPU 2017 benchmarks to get performance numbers, which 
takes some time. 
As soon as we have the results, we will give feedback. 
BR 
Mayshao
>I noticed I had an extra line in my Changelog:
>
>>  (lua_sseicvt_si): Ditto.
>
>It got there accidentally and I will drop it.
>
>Alexander
>
>On Fri, 9 Dec 2022, Alexander Monakov wrote:
>
>> Model the divider in Lujiazui processors as a separate automaton to 
>> significantly reduce the overall model size. This should also result 
>> in improved accuracy, as pipe 0 should be able to accept new 
>> instructions while the divider is occupied.
>> 
>> It is unclear why integer divisions are modeled as if pipes 0-3 are 
>> all occupied. I've opted to keep a single-cycle reservation of all 
>> four pipes together, so GCC should continue trying to pack 
>> instructions around a division accordingly.
>> 
>> Currently top three symbols in insn-automata.o are:
>> 
>> 106102 r lujiazui_core_check
>> 106102 r lujiazui_core_transitions
>> 196123 r lujiazui_core_min_issue_delay
>> 
>> This patch shrinks all lujiazui tables to:
>> 
>> 3 r lujiazui_decoder_min_issue_delay
>> 20 r lujiazui_decoder_transitions
>> 32 r lujiazui_agu_min_issue_delay
>> 126 r lujiazui_agu_transitions
>> 304 r lujiazui_div_base
>> 352 r lujiazui_div_check
>> 352 r lujiazui_div_transitions
>> 1152 r lujiazui_core_min_issue_delay
>> 1592 r lujiazui_agu_translate
>> 1592 r lujiazui_core_translate
>> 1592 r lujiazui_decoder_translate
>> 1592 r lujiazui_div_translate
>> 3952 r lujiazui_div_min_issue_delay
>> 9216 r lujiazui_core_transitions
>> 
>> This continues the work on reducing i386 insn-automata.o size started 
>> with similar fixes for division and multiplication instructions in 
>> znver.md [1][2]. I plan to submit corresponding fixes for 
>> b[td]ver[123].md as well.
>> 
>> [1] 
>> https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f12
>> 15f5...@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
>> [2] 
>> https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonak
>> o...@ispras.ru/
>> 
>> gcc/ChangeLog:
>> 
>>  PR target/87832
>>  * config/i386/lujiazui.md (lujiazui_div): New automaton.
>>  (lua_div): New unit.
>>  (lua_idiv_qi): Correct unit in the reservation.
>>  (lua_idiv_qi_load): Ditto.
>>  (lua_idiv_hi): Ditto.
>>  (lua_idiv_hi_load): Ditto.
>>  (lua_idiv_si): Ditto.
>>  (lua_idiv_si_load): Ditto.
>>  (lua_idiv_di): Ditto.
>>  (lua_idiv_di_load): Ditto.
>>  (lua_fdiv_SF): Ditto.
>>  (lua_fdiv_SF_load): Ditto.
>>  (lua_fdiv_DF): Ditto.
>>  (lua_fdiv_DF_load): Ditto.
>>  (lua_fdiv_XF): Ditto.
>>  (lua_fdiv_XF_load): Ditto.
>>  (lua_ssediv_SF): Ditto.
>>  (lua_ssediv_load_SF): Ditto.
>>  (lua_ssediv_V4SF): Ditto.
>>  (lua_ssediv_load_V4SF): Ditto.
>>  (lua_ssediv_V8SF): Ditto.
>>  (lua_ssediv_load_V8SF): Ditto.
>>  (lua_ssediv_SD): Ditto.
>>  (lua_ssediv_load_SD): Ditto.
>>  (lua_ssediv_V2DF): Ditto.
>>  (lua_ssediv_load_V2DF): Ditto.
>>  (lua_ssediv_V4DF): Ditto.
>>  (lua_ssediv_load_V4DF): Ditto.
>>  (lua_sseicvt_si): Ditto.
>> ---
>>  gcc/config/i386/lujiazui.md | 58 +++--
>>  1 file changed, 30 insertions(+), 28 deletions(-)
>> 
>> diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md 
>> index 9046c09f2..58a230c70 100644
>> --- a/gcc/config/i386/lujiazui.md
>> +++ b/gcc/config/i386/lujiazui.md
>> @@ -19,8 +19,8 @@
>>  
>>  ;; Scheduling for ZHAOXIN lujiazui processor.
>>  
>> -;; Modeling automatons for decoders, execution pipes and AGU pipes.
>> -(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu")
>> +;; Modeling automatons for decoders, execution pipes, AGU pipes, and divider.
>> +(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div")
>>  
>>  ;; The rules for the decoder are simple:
>>  ;;  - an instruction with 1 uop can be decoded by any of the three
>> @@ -55,6 +55,8 @@ (define_reservation "lua_decoder01" "lua_decoder0|lua_decoder1")
>>  (de

Re: [PATCH] i386: correct division modeling in lujiazui.md

2022-12-29 Thread Mayshao-oc
>Ping. If there are any questions or concerns about the patch, please let me
>know: I'm interested in continuing this cleanup at least for older AMD models.
>
Hi Alexander:
According to the SPEC CPU 2017 benchmark results, the patch looks good on 
lujiazui. 
BR 
Mayshao
>I noticed I had an extra line in my Changelog:
>
>>  (lua_sseicvt_si): Ditto.
>
>It got there accidentally and I will drop it.
>
>Alexander
>
>On Fri, 9 Dec 2022, Alexander Monakov wrote:
>
>> Model the divider in Lujiazui processors as a separate automaton to 
>> significantly reduce the overall model size. This should also result 
>> in improved accuracy, as pipe 0 should be able to accept new 
>> instructions while the divider is occupied.
>> 
>> It is unclear why integer divisions are modeled as if pipes 0-3 are 
>> all occupied. I've opted to keep a single-cycle reservation of all 
>> four pipes together, so GCC should continue trying to pack 
>> instructions around a division accordingly.
>> 
>> Currently top three symbols in insn-automata.o are:
>> 
>> 106102 r lujiazui_core_check
>> 106102 r lujiazui_core_transitions
>> 196123 r lujiazui_core_min_issue_delay
>> 
>> This patch shrinks all lujiazui tables to:
>> 
>> 3 r lujiazui_decoder_min_issue_delay
>> 20 r lujiazui_decoder_transitions
>> 32 r lujiazui_agu_min_issue_delay
>> 126 r lujiazui_agu_transitions
>> 304 r lujiazui_div_base
>> 352 r lujiazui_div_check
>> 352 r lujiazui_div_transitions
>> 1152 r lujiazui_core_min_issue_delay
>> 1592 r lujiazui_agu_translate
>> 1592 r lujiazui_core_translate
>> 1592 r lujiazui_decoder_translate
>> 1592 r lujiazui_div_translate
>> 3952 r lujiazui_div_min_issue_delay
>> 9216 r lujiazui_core_transitions
>> 
>> This continues the work on reducing i386 insn-automata.o size started 
>> with similar fixes for division and multiplication instructions in 
>> znver.md [1][2]. I plan to submit corresponding fixes for 
>> b[td]ver[123].md as well.
>> 
>> [1] 
>> https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f12
>> 15f5...@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
>> [2] 
>> https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonak
>> o...@ispras.ru/
>> 
>> gcc/ChangeLog:
>> 
>>  PR target/87832
>>  * config/i386/lujiazui.md (lujiazui_div): New automaton.
>>  (lua_div): New unit.
>>  (lua_idiv_qi): Correct unit in the reservation.
>>  (lua_idiv_qi_load): Ditto.
>>  (lua_idiv_hi): Ditto.
>>  (lua_idiv_hi_load): Ditto.
>>  (lua_idiv_si): Ditto.
>>  (lua_idiv_si_load): Ditto.
>>  (lua_idiv_di): Ditto.
>>  (lua_idiv_di_load): Ditto.
>>  (lua_fdiv_SF): Ditto.
>>  (lua_fdiv_SF_load): Ditto.
>>  (lua_fdiv_DF): Ditto.
>>  (lua_fdiv_DF_load): Ditto.
>>  (lua_fdiv_XF): Ditto.
>>  (lua_fdiv_XF_load): Ditto.
>>  (lua_ssediv_SF): Ditto.
>>  (lua_ssediv_load_SF): Ditto.
>>  (lua_ssediv_V4SF): Ditto.
>>  (lua_ssediv_load_V4SF): Ditto.
>>  (lua_ssediv_V8SF): Ditto.
>>  (lua_ssediv_load_V8SF): Ditto.
>>  (lua_ssediv_SD): Ditto.
>>  (lua_ssediv_load_SD): Ditto.
>>  (lua_ssediv_V2DF): Ditto.
>>  (lua_ssediv_load_V2DF): Ditto.
>>  (lua_ssediv_V4DF): Ditto.
>>  (lua_ssediv_load_V4DF): Ditto.
>>  (lua_sseicvt_si): Ditto.
>> ---
>>  gcc/config/i386/lujiazui.md | 58 +++--
>>  1 file changed, 30 insertions(+), 28 deletions(-)
>> 
>> diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md 
>> index 9046c09f2..58a230c70 100644
>> --- a/gcc/config/i386/lujiazui.md
>> +++ b/gcc/config/i386/lujiazui.md
>> @@ -19,8 +19,8 @@
>>  
>>  ;; Scheduling for ZHAOXIN lujiazui processor.
>>  
>> -;; Modeling automatons for decoders, execution pipes and AGU pipes.
>> -(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu")
>> +;; Modeling automatons for decoders, execution pipes, AGU pipes, and divider.
>> +(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div")
>>  
>>  ;; The rules for the decoder are simple:
>>  ;;  - an instruction with 1 uop can be decoded by any of the three
>> @@ -55,6 +55,8 @@ (define_reservation "lua_decoder01" "lua_decoder0|lua_decoder1")
>>  (define_cpu_unit "lua_p0,lua_p1,lua_p2,lua_p3" "

[PATCH] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-11 Thread MayShao-oc
From: mayshao 

Hi all:
As we replied in PR104688, ZHAOXIN guarantees that a 16-byte VMOVDQA on a 
16-byte aligned address is atomic if the memory type of the address is WB. So 
there is no need to clear bit_AVX on ZHAOXIN CPUs.

Bootstrapped /regtested X86_64.

Ok for trunk?
BR
Mayshao

libatomic/ChangeLog:

PR target/104688
* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on ZHAOXIN CPUs.
---
 libatomic/config/x86/init.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c..3740c88a936 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -34,20 +34,6 @@ __libat_feat1_init (void)
   unsigned int eax, ebx, ecx, edx;
   FEAT1_REGISTER = 0;
   __get_cpuid (1, &eax, &ebx, &ecx, &edx);
-#ifdef __x86_64__
-  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
-  == (bit_AVX | bit_CMPXCHG16B))
-{
-  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-is atomic, and AMD is going to do something similar soon.
-We don't have a guarantee from vendors of other CPUs with AVX,
-like Zhaoxin and VIA.  */
-  unsigned int ecx2 = 0;
-  __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
-   FEAT1_REGISTER &= ~bit_AVX;
-}
-#endif
   /* See the load in load_feat1.  */
   __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
   return FEAT1_REGISTER;
-- 
2.27.0



[PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread MayShao-oc
From: mayshao 

Hi Jakub:

Thanks for your review. We should just amend this to handle Zhaoxin.

Bootstrapped /regtested X86_64.

Ok for trunk?
BR
Mayshao

libatomic/ChangeLog:

PR target/104688
* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on ZHAOXIN CPUs.
---
 libatomic/config/x86/init.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c..0d6864909bb 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -39,12 +39,15 @@ __libat_feat1_init (void)
   == (bit_AVX | bit_CMPXCHG16B))
 {
   /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-is atomic, and AMD is going to do something similar soon.
-We don't have a guarantee from vendors of other CPUs with AVX,
-like Zhaoxin and VIA.  */
-  unsigned int ecx2 = 0;
+is atomic, and AMD is going to do something similar soon. Zhaoxin also
+guarantees this. We don't have a guarantee from vendors of other CPUs 
+with AVX,like VIA.  */
+  unsigned int ecx2 = 0, family = 0;
+  family = (eax >> 8) & 0x0f;
   __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx
+  && !(ecx2 == signature_CENTAUR_ecx && family > 0x6)
+  && ecx2 != signature_SHANGHAI_ecx)
FEAT1_REGISTER &= ~bit_AVX;
 }
 #endif
-- 
2.27.0
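
For reference, the vendor test in the v2 hunk above can be read as the following predicate. This is a sketch only, not the libatomic code: the toy_ names are hypothetical, and the constants are the ECX words of the CPUID leaf 0 vendor strings written out by hand instead of the signature_*_ecx macros from cpuid.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECX word of CPUID leaf 0 for each vendor string (little-endian ASCII).  */
#define TOY_SIG_INTEL_ECX    UINT32_C(0x6c65746e)  /* "ntel" */
#define TOY_SIG_AMD_ECX      UINT32_C(0x444d4163)  /* "cAMD" */
#define TOY_SIG_CENTAUR_ECX  UINT32_C(0x736c7561)  /* "auls" */
#define TOY_SIG_SHANGHAI_ECX UINT32_C(0x20206961)  /* "ai  " */

/* Base family field of the EAX value returned by CPUID leaf 1 (bits 11:8).  */
static unsigned int
toy_base_family (uint32_t leaf1_eax)
{
  return (leaf1_eax >> 8) & 0x0f;
}

/* True if bit_AVX may stay set, i.e. the vendor is one of those the
   patch treats as guaranteeing atomic aligned 16-byte VMOVDQA.  */
static bool
toy_keep_avx_bit (uint32_t vendor_ecx, uint32_t leaf1_eax)
{
  unsigned int family = toy_base_family (leaf1_eax);
  return vendor_ecx == TOY_SIG_INTEL_ECX
         || vendor_ecx == TOY_SIG_AMD_ECX
         || (vendor_ecx == TOY_SIG_CENTAUR_ECX && family > 0x6)
         || vendor_ecx == TOY_SIG_SHANGHAI_ECX;
}
```

Note that the family must be taken from the leaf 1 EAX value before leaf 0 is queried, since the second __get_cpuid call overwrites eax; the hunk above does it in that order.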



Re: [PATCH] invoke.texi: Clarify -march=lujiazui

2024-05-23 Thread mayshao-oc

Hi Jakub:

  I think the modified lujiazui description matches what actually 
happens. Thanks.


BR
Mayshao




Hi!

Yesterday I was searching which exact CPUs are affected by the PR114576
wrong-code issue and went from the PTA_* bitmasks in GCC, so arrived
at the goldmont, goldmont-plus, tremont and lujiazui CPUs (as -march=
cases which do enable -maes and don't enable -mavx).
But when double-checking that against the invoke.texi documentation,
that was true for the first 3, but lujiazui said it supported AVX.
I was really confused by that, until I found the
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604407.html
explanation.  So, seems the CPUs do have AVX and F16C but -march=lujiazui
doesn't enable those and even actively attempts to filter those out from
the announced CPUID features, in glibc as well as e.g. in libgcc.

Thus, I think we should document what actually happens, otherwise
users could assume that
gcc -march=lujiazui predefines __AVX__ and __F16C__, which it doesn't.
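
A quick way to see which of these macros a given set of options predefines is a tiny probe like the one below. It is only a sketch; the function names are made up for the example, and the result depends entirely on the options the file is compiled with.

```c
/* Return 1 if this translation unit was compiled with AVX enabled.  */
static int
compiled_with_avx (void)
{
#ifdef __AVX__
  return 1;
#else
  return 0;
#endif
}

/* Return 1 if this translation unit was compiled with F16C enabled.  */
static int
compiled_with_f16c (void)
{
#ifdef __F16C__
  return 1;
#else
  return 0;
#endif
}
```

Building it once with -march=lujiazui and once with -march=lujiazui -mavx -mf16c should show the difference described above.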

Tested on x86_64, ok for trunk?

2024-04-11  Jakub Jelinek  

 * doc/invoke.texi (lujiazui): Clarify that while the CPUs do support
 AVX and F16C, -march=lujiazui actually doesn't enable those.

--- gcc/doc/invoke.texi.jj  2024-04-11 09:26:01.156865894 +0200
+++ gcc/doc/invoke.texi 2024-04-11 10:47:53.457582922 +0200
@@ -34696,8 +34696,10 @@ instruction set support.

  @item lujiazui
  ZHAOXIN lujiazui CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
-SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
-ABM, BMI, BMI2, F16C, FXSR, RDSEED instruction set support.
+SSE4.2, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
+ABM, BMI, BMI2, FXSR, RDSEED instruction set support.  While the CPUs
+do support AVX and F16C, these aren't enabled by @code{-march=lujiazui}
+for performance reasons.

  @item yongfeng
  ZHAOXIN yongfeng CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,

 Jakub



Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-06-18 Thread mayshao-oc




On 5/28/24 14:15, Uros Bizjak wrote:




On Mon, May 27, 2024 at 10:33 AM MayShao  wrote:


From: mayshao 

Hi all:
 This patch enables -march/-mtune=shijidadao, costs and tunings are set 
according to the characteristics of the processor.

 Bootstrapped /regtested X86_64.

 Ok for trunk?


OK.

Thanks,
Uros.


Thanks for your review, please help me commit.

BR
Mayshao




BR
Mayshao
gcc/ChangeLog:

 * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao.
 * common/config/i386/i386-common.cc: Add shijidadao.
 * common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
 Add ZHAOXIN_FAM7H_SHIJIDADAO.
 * config.gcc: Add shijidadao.
 * config/i386/driver-i386.cc (host_detect_local_cpu):
 Let -march=native recognize shijidadao processors.
 * config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
 * config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
 (m_SHIJIDADAO): New definition.
 * config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
 * config/i386/x86-tune-costs.h (struct processor_costs):
 Add shijidadao_cost.
 * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
 (ix86_adjust_cost): Ditto.
 * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add 
m_SHIJIDADAO.
 (X86_TUNE_USE_GATHER_4PARTS): Ditto.
 (X86_TUNE_USE_GATHER_8PARTS): Ditto.
 (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
 * doc/extend.texi: Add details about shijidadao.
 * doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

 * g++.target/i386/mv32.C: Handle new -march
 * gcc.target/i386/funcspec-56.inc: Ditto.
---
  gcc/common/config/i386/cpuinfo.h  |   8 +-
  gcc/common/config/i386/i386-common.cc |   8 +-
  gcc/common/config/i386/i386-cpuinfo.h |   1 +
  gcc/config.gcc|  14 ++-
  gcc/config/i386/driver-i386.cc|  11 +-
  gcc/config/i386/i386-c.cc |   7 ++
  gcc/config/i386/i386-options.cc   |   4 +-
  gcc/config/i386/i386.h|   1 +
  gcc/config/i386/x86-tune-costs.h  | 116 ++
  gcc/config/i386/x86-tune-sched.cc |   2 +
  gcc/config/i386/x86-tune.def  |   8 +-
  gcc/doc/extend.texi   |   3 +
  gcc/doc/invoke.texi   |   6 +
  gcc/testsuite/g++.target/i386/mv32.C  |   6 +
  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
  15 files changed, 183 insertions(+), 14 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 4610bf6d6a4..936039725ab 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
   reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
   cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
 }
- else if (model >= 0x5b)
+ else if (model == 0x5b)
 {
   cpu = "yongfeng";
   CHECK___builtin_cpu_is ("yongfeng");
   cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
 }
+ else if (model >= 0x6b)
+   {
+ cpu = "shijidadao";
+ CHECK___builtin_cpu_is ("shijidadao");
+ cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
+   }
break;
  default:
break;
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 895e5fa662d..eb3f94c529c 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2066,6 +2066,7 @@ const char *const processor_names[] =
"intel",
"lujiazui",
"yongfeng",
+  "shijidadao",
"geode",
"k6",
"athlon",
@@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
| PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
{"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
 PTA_LUJIAZUI,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
{"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
 PTA_YONGFENG,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
+  {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
+   PTA_YONGFENG,
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
{"k8", PROCESSOR_K8, CPU_K8,
  PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
| PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
inde

[PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-04 Thread MayShao-oc
Hi all:
This patch adds the -malign-tight-loops flag to control
pass_align_tight_loops.
The motivation is that pass_align_tight_loops may cause a performance
regression in nested loops.

The example code is as follows:

#define ITER 20000
#define ITER_O 10

int i, j, k;
int array[ITER];

void loop()
{
  int i;
  for (k = 0; k < ITER_O; k++)
    for (j = 0; j < ITER; j++)
      for (i = 0; i < ITER; i++)
        {
          array[i] += j;
          array[i] += i;
          array[i] += 2*j;
          array[i] += 2*i;
        }
}

When I compile it with gcc -O1 loop.c, the output assembly is as follows.
It is not optimal, because too many nops are inserted in the outer loop.

0000000000400540 <loop>:
  400540:   48 83 ec 08             sub    $0x8,%rsp
  400544:   bf 0a 00 00 00          mov    $0xa,%edi
  400549:   b9 00 00 00 00          mov    $0x0,%ecx
  40054e:   8d 34 09                lea    (%rcx,%rcx,1),%esi
  400551:   b8 00 00 00 00          mov    $0x0,%eax
  400556:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  40055d:   00 00 00 00
  400561:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  400568:   00 00 00 00
  40056c:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  400573:   00 00 00 00
  400577:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40057e:   00 00
  400580:   89 ca                   mov    %ecx,%edx
  400582:   03 14 85 60 10 60 00    add    0x601060(,%rax,4),%edx
  400589:   01 c2                   add    %eax,%edx
  40058b:   01 f2                   add    %esi,%edx
  40058d:   8d 14 42                lea    (%rdx,%rax,2),%edx
  400590:   89 14 85 60 10 60 00    mov    %edx,0x601060(,%rax,4)
  400597:   48 83 c0 01             add    $0x1,%rax
  40059b:   48 3d 20 4e 00 00       cmp    $0x4e20,%rax
  4005a1:   75 dd                   jne    400580 <loop+0x40>

   I benchmarked this program on an Intel Xeon and found that the optimization 
can cause a performance regression of about 40% 
(6.6B cycles vs. 9.3B cycles).
   So I propose adding a -malign-tight-loops flag to control the tight loop 
alignment optimization; we could disable this optimization by default.
   Bootstrapped on x86_64.
   Ok for trunk?
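
For anyone who wants to reproduce the comparison, a minimal timing harness could look like the sketch below. It is not the program that produced the numbers above: the function names are made up, the loop bounds are passed as parameters, and the counters are unsigned locals rather than int globals, so the generated code will differ somewhat from the listing.

```c
#include <time.h>

/* Same loop nest as the example, with sizes passed in and unsigned
   arithmetic so that wraparound is well defined.  */
static void
run_loop (unsigned int *array, unsigned int iter, unsigned int iter_o)
{
  for (unsigned int k = 0; k < iter_o; k++)
    for (unsigned int j = 0; j < iter; j++)
      for (unsigned int i = 0; i < iter; i++)
        {
          array[i] += j;
          array[i] += i;
          array[i] += 2 * j;
          array[i] += 2 * i;
        }
}

/* Return the CPU time in seconds spent in one call of run_loop.  */
static double
time_loop (unsigned int *array, unsigned int iter, unsigned int iter_o)
{
  clock_t start = clock ();
  run_loop (array, iter, iter_o);
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop with a 20000-element array, iter = 20000 and iter_o = 10 from binaries built with and without the alignment pass gives a rough comparison; a cycle counter such as perf stat would be closer to the figures quoted above.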

BR
Mayshao

gcc/ChangeLog:

* config/i386/i386-features.cc (ix86_align_tight_loops): New flag.
* config/i386/i386.opt (malign-tight-loops): New option.
* doc/invoke.texi (-malign-tight-loops): Document.
---
 gcc/config/i386/i386-features.cc | 4 +++-
 gcc/config/i386/i386.opt | 4 
 gcc/doc/invoke.texi  | 7 ++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index e2e85212a4f..f9546e00b07 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3620,7 +3620,9 @@ public:
   /* opt_pass methods: */
   bool gate (function *) final override
 {
-  return optimize && optimize_function_for_speed_p (cfun);
+  return ix86_align_tight_loops
+  && optimize
+  && optimize_function_for_speed_p (cfun);
 }
 
   unsigned int execute (function *) final override
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 64c295d344c..ec41de192bc 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1266,6 +1266,10 @@ mlam=
 Target RejectNegative Joined Enum(lam_type) Var(ix86_lam_type) Init(lam_none)
 -mlam=[none|u48|u57] Instrument meta data position in user data pointers.
 
+malign-tight-loops
+Target Var(ix86_align_tight_loops) Init(0) Optimization
+Enable align tight loops.
+
 Enum
 Name(lam_type) Type(enum lam_type) UnknownError(unknown lam type %qs)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 07920e07b4d..9ec1e1f0095 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1510,7 +1510,7 @@ See RS/6000 and PowerPC Options.
 -mindirect-branch=@var{choice}  -mfunction-return=@var{choice}
 -mindirect-branch-register -mharden-sls=@var{choice}
 -mindirect-branch-cs-prefix -mneeded -mno-direct-extern-access
--munroll-only-small-loops -mlam=@var{choice}}
+-munroll-only-small-loops -mlam=@var{choice} -malign-tight-loops}
 
 @emph{x86 Windows Options}
 
@@ -36530,6 +36530,11 @@ LAM(linear-address masking) allows special bits in the pointer to be used
 for metadata. The default is @samp{none}. With @samp{u48}, pointer bits in
 positions 62:48 can be used for metadata; With @samp{u57}, pointer bits in
 positions 62:57 can be used for metadata.
+
+@opindex malign-tight-loops
+@opindex mno-align-tight-loops
+@item -malign-tight-loops
+Controls tight loop alignment optimization.
 @end table
 
 @node x86 Windows Options
-- 
2.27.0



Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Mayshao-oc
> > On Thu, Nov 7, 2024 at 10:29 AM MayShao-oc  wrote:
> >
> > Hi all:
> >For Zhaoxin, I find no improvement when enabling pass_align_tight_loops,
> > and see a performance drop in some cases.
> >This patch adds a new tunable to bypass pass_align_tight_loops on Zhaoxin.
> >
> >Bootstrapped X86_64.
> >Ok for trunk?
> > BR
> > Mayshao
> > gcc/ChangeLog:
> >
> > * config/i386/i386-features.cc (TARGET_ALIGN_TIGHT_LOOPS):
> > default true in all processors except for zhaoxin.
> > * config/i386/i386.h (TARGET_ALIGN_TIGHT_LOOPS): New Macro.
> > * config/i386/x86-tune.def (X86_TUNE_ALIGN_TIGHT_LOOPS):
> > New tune
> > ---
> >  gcc/config/i386/i386-features.cc | 4 +++-
> >  gcc/config/i386/i386.h   | 3 +++
> >  gcc/config/i386/x86-tune.def | 4 
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index e2e85212a4f..d9fd92964fe 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -3620,7 +3620,9 @@ public:
> >/* opt_pass methods: */
> >bool gate (function *) final override
> >  {
> > -  return optimize && optimize_function_for_speed_p (cfun);
> > +  return TARGET_ALIGN_TIGHT_LOOPS
> > +&& optimize
> > +&& optimize_function_for_speed_p (cfun);
> >  }
> >
> >unsigned int execute (function *) final override
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 2dcd8803a08..7f9010246c2 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -466,6 +466,9 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
> >  #define TARGET_USE_RCR ix86_tune_features[X86_TUNE_USE_RCR]
> >  #define TARGET_SSE_MOVCC_USE_BLENDV \
> > ix86_tune_features[X86_TUNE_SSE_MOVCC_USE_BLENDV]
> > +#define TARGET_ALIGN_TIGHT_LOOPS \
> > +ix86_tune_features[X86_TUNE_ALIGN_TIGHT_LOOPS]
> > +
> >
> >  /* Feature tests against the various architecture variations.  */
> >  enum ix86_arch_indices {
> > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> > index 6ebb2fd3414..bd4fa8b3eee 100644
> > --- a/gcc/config/i386/x86-tune.def
> > +++ b/gcc/config/i386/x86-tune.def
> > @@ -542,6 +542,10 @@ DEF_TUNE (X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD,
> >  DEF_TUNE (X86_TUNE_SSE_MOVCC_USE_BLENDV,
> >   "sse_movcc_use_blendv", ~m_CORE_ATOM)
> >
> > +/* X86_TUNE_ALIGN_TIGHT_LOOPS: if false, tight loops are not aligned. */
> > +DEF_TUNE (X86_TUNE_ALIGN_TIGHT_LOOPS, "align_tight_loops",
> > +~(m_ZHAOXIN))
> Please also add m_CASCADELAKE and m_SKYLAKE_AVX512, i.e.
> ~(m_ZHAOXIN | m_CASCADELAKE | m_SKYLAKE_AVX512).
> And could you put it under the section of
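>
>  /*****************************************************************************/
> -/* Branch predictor tuning                                                   */
> +/* Branch predictor and The Front-end tuning                                 */
>  /*****************************************************************************/
> > +
> >  /*****************************************************************************/
> >  /* AVX instruction selection tuning (some of SSE flags affects AVX, too)    */
> >  /*****************************************************************************/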
> > --
> > 2.27.0
> >
> 
> 
> --
> BR,
> Hongtao

Ok

BR
Mayshao
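
The selector arithmetic behind the review comment above can be sketched as follows. This is a minimal, self-contained model and not GCC's code: the bit positions are arbitrary, the helper `selects` is hypothetical, and `m_ZHAOXIN` is assumed to be the union of `m_LUJIAZUI` and `m_YONGFENG`. A set bit in a selector means the tune is enabled for that processor, so widening the negated mask also switches tight-loop alignment off for Cascade Lake and Skylake-AVX512.

```cpp
#include <cstdint>

// Hypothetical bit positions; GCC derives the real masks from its
// processor enumeration.
constexpr std::uint64_t m_GENERIC        = std::uint64_t{1} << 0;
constexpr std::uint64_t m_LUJIAZUI       = std::uint64_t{1} << 1;
constexpr std::uint64_t m_YONGFENG       = std::uint64_t{1} << 2;
constexpr std::uint64_t m_SKYLAKE_AVX512 = std::uint64_t{1} << 3;
constexpr std::uint64_t m_CASCADELAKE    = std::uint64_t{1} << 4;

// Assumption: the Zhaoxin group is the union of its member processors.
constexpr std::uint64_t m_ZHAOXIN = m_LUJIAZUI | m_YONGFENG;

// Selector as posted in the patch, and as requested in review.
constexpr std::uint64_t selector_posted   = ~m_ZHAOXIN;
constexpr std::uint64_t selector_reviewed =
  ~(m_ZHAOXIN | m_CASCADELAKE | m_SKYLAKE_AVX512);

// True if the tune is enabled when tuning for the given processor.
constexpr bool selects (std::uint64_t selector, std::uint64_t cpu)
{
  return (selector & cpu) != 0;
}

// With the posted selector only Zhaoxin opts out; the reviewed selector
// also opts out Cascade Lake and Skylake-AVX512.
static_assert (selects (selector_posted, m_CASCADELAKE), "still aligned");
static_assert (!selects (selector_reviewed, m_CASCADELAKE), "opted out");
static_assert (!selects (selector_reviewed, m_YONGFENG), "opted out");
static_assert (selects (selector_reviewed, m_GENERIC), "still aligned");
```

The checks run at compile time, so building the file with any C++11 compiler is enough to confirm the behaviour.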


0001-x86_64-Add-microarchtecture-tunable-for-pass_align_v1.patch


[PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread MayShao-oc
Hi all:
   For Zhaoxin, I find no improvement when enabling pass_align_tight_loops,
and see a performance drop in some cases.
   This patch adds a new tunable to bypass pass_align_tight_loops on Zhaoxin.

   Bootstrapped X86_64.
   Ok for trunk?
BR
Mayshao
gcc/ChangeLog:

* config/i386/i386-features.cc (TARGET_ALIGN_TIGHT_LOOPS):
Default true in all processors except for Zhaoxin.
* config/i386/i386.h (TARGET_ALIGN_TIGHT_LOOPS): New macro.
* config/i386/x86-tune.def (X86_TUNE_ALIGN_TIGHT_LOOPS):
New tune.
---
 gcc/config/i386/i386-features.cc | 4 +++-
 gcc/config/i386/i386.h           | 3 +++
 gcc/config/i386/x86-tune.def     | 4 ++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index e2e85212a4f..d9fd92964fe 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3620,7 +3620,9 @@ public:
   /* opt_pass methods: */
   bool gate (function *) final override
 {
-  return optimize && optimize_function_for_speed_p (cfun);
+  return TARGET_ALIGN_TIGHT_LOOPS
+&& optimize
+&& optimize_function_for_speed_p (cfun);
 }
 
   unsigned int execute (function *) final override
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 2dcd8803a08..7f9010246c2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -466,6 +466,9 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 #define TARGET_USE_RCR ix86_tune_features[X86_TUNE_USE_RCR]
 #define TARGET_SSE_MOVCC_USE_BLENDV \
ix86_tune_features[X86_TUNE_SSE_MOVCC_USE_BLENDV]
+#define TARGET_ALIGN_TIGHT_LOOPS \
+ix86_tune_features[X86_TUNE_ALIGN_TIGHT_LOOPS]
+
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 6ebb2fd3414..bd4fa8b3eee 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -542,6 +542,10 @@ DEF_TUNE (X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD,
 DEF_TUNE (X86_TUNE_SSE_MOVCC_USE_BLENDV,
  "sse_movcc_use_blendv", ~m_CORE_ATOM)
 
+/* X86_TUNE_ALIGN_TIGHT_LOOPS: if false, tight loops are not aligned. */
+DEF_TUNE (X86_TUNE_ALIGN_TIGHT_LOOPS, "align_tight_loops",
+~(m_ZHAOXIN))
+
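 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)    */
 /*****************************************************************************/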
-- 
2.27.0
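
As a rough illustration of how the three hunks fit together: the `DEF_TUNE` selector decides per processor whether the feature is on, the `TARGET_ALIGN_TIGHT_LOOPS` macro reads that decision, and the pass gate consults it before the existing optimization checks. The sketch below is a simplified model under stated assumptions, not the GCC implementation: the reduced enums and the helpers `mask_of`, `tune_feature` and the free function `gate` are hypothetical, and the real pass reads the precomputed `ix86_tune_features` array and global optimization state instead of taking parameters.

```cpp
#include <cstdint>

// Reduced, hypothetical processor list; GCC's real enumeration is longer.
enum processor_type { PROCESSOR_GENERIC, PROCESSOR_LUJIAZUI, PROCESSOR_YONGFENG };

// Reduced tune index list with the single tune added by the patch.
enum tune_index { X86_TUNE_ALIGN_TIGHT_LOOPS, X86_TUNE_LAST };

// One bit per processor.
constexpr std::uint64_t mask_of (processor_type p)
{
  return std::uint64_t{1} << p;
}

// Assumption: the Zhaoxin group covers Lujiazui and Yongfeng.
constexpr std::uint64_t m_ZHAOXIN
  = mask_of (PROCESSOR_LUJIAZUI) | mask_of (PROCESSOR_YONGFENG);

// Selector table in the spirit of DEF_TUNE (..., ~(m_ZHAOXIN)).
constexpr std::uint64_t initial_tune_features[X86_TUNE_LAST] = { ~m_ZHAOXIN };

// Model of the lookup behind TARGET_ALIGN_TIGHT_LOOPS: the feature is on
// when the selector has the bit of the processor being tuned for.
constexpr bool tune_feature (tune_index f, processor_type tune)
{
  return (initial_tune_features[f] & mask_of (tune)) != 0;
}

// Model of the patched gate: the tunable is checked first, then the
// two conditions that were already there.
constexpr bool gate (processor_type tune, int optimize, bool for_speed)
{
  return tune_feature (X86_TUNE_ALIGN_TIGHT_LOOPS, tune)
	 && optimize
	 && for_speed;
}

static_assert (gate (PROCESSOR_GENERIC, 2, true), "pass runs");
static_assert (!gate (PROCESSOR_YONGFENG, 2, true), "pass bypassed");
```

Putting the tunable first in the condition mirrors the patch; since all three operands are cheap, the order affects readability more than cost.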



Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-07 Thread Mayshao-oc
> > -----Original Message-----
> > From: Xi Ruoyao 
> > Sent: Thursday, November 7, 2024 1:12 PM
> > To: Liu, Hongtao ; Mayshao-oc  > o...@zhaoxin.com>; Hongtao Liu 
> > Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; ubiz...@gmail.com;
> > richard.guent...@gmail.com; Tim Hu(WH-RD) ; Silvia
> > Zhao(BJ-RD) ; Louis Qi(BJ-RD)
> > ; Cobe Chen(BJ-RD) 
> > Subject: Re: [PATCH] [x86_64] Add microarchtecture tunable for
> > pass_align_tight_loops
> > On Thu, 2024-11-07 at 04:58 +, Liu, Hongtao wrote:
> > > > > > Hi all:
> > > > > > For zhaoxin, I find no improvement when enable
> > > > > > pass_align_tight_loops, and have performance drop in some cases.
> > > > > > This patch add a new tunable to bypass
> > > > > > pass_align_tight_loops in
> > > > zhaoxin.
> > > > > >
> > > > > > Bootstrapped X86_64.
> > > > > > Ok for trunk?
> > > LGTM.
> >
> > I'd suggest to add the reference to PR 117438 into the subject and 
> > ChangeLog.
> Yes, thanks.
Added PR 117438 to the subject and ChangeLog.
> >
> > --
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
BR
Mayshao


0001-x86_64-Add-microarchtecture-tunable-for-pass_align_v2.patch


Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-07 Thread Mayshao-oc
> On Fri, Nov 8, 2024 at 10:21 AM Mayshao-oc  wrote:
> > > > -----Original Message-----
> > > > From: Xi Ruoyao 
> > > > Sent: Thursday, November 7, 2024 1:12 PM
> > > > To: Liu, Hongtao ; Mayshao-oc  > > > o...@zhaoxin.com>; Hongtao Liu 
> > > > Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; ubiz...@gmail.com;
> > > > richard.guent...@gmail.com; Tim Hu(WH-RD) ; Silvia
> > > > Zhao(BJ-RD) ; Louis Qi(BJ-RD)
> > > > ; Cobe Chen(BJ-RD) 
> > > > Subject: Re: [PATCH] [x86_64] Add microarchtecture tunable for
> > > > pass_align_tight_loops
> > > > On Thu, 2024-11-07 at 04:58 +, Liu, Hongtao wrote:
> > > > > > > > Hi all:
> > > > > > > > For zhaoxin, I find no improvement when enable
> > > > > > > > pass_align_tight_loops, and have performance drop in some cases.
> > > > > > > > This patch add a new tunable to bypass
> > > > > > > > pass_align_tight_loops in
> > > > > > zhaoxin.
> > > > > > > >
> > > > > > > > Bootstrapped X86_64.
> > > > > > > > Ok for trunk?
> > > > > LGTM.
> > > >
> > > > I'd suggest to add the reference to PR 117438 into the subject and 
> > > > ChangeLog.
> > > Yes, thanks.
> > Add PR 117438 into the subject and ChangeLog.
> PR target/117438
> Others LGTM.
Updated this in the ChangeLog.
I should have filed the PR in Bugzilla under the target category in the first place.
Thanks.
> > > >
> > > > --
> > > > Xi Ruoyao 
> > > > School of Aerospace Science and Technology, Xidian University
> > BR
> > Mayshao
> 
> 
> 
> --
> BR,
> Hongtao
BR
Mayshao


0001-x86_64-Add-microarchtecture-tunable-for-pass_align_v3.patch


Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-19 Thread Mayshao-oc
> On Fri, Nov 8, 2024 at 10:21 AM Mayshao-oc  wrote:
> >
> > > > -----Original Message-----
> > > > From: Xi Ruoyao 
> > > > Sent: Thursday, November 7, 2024 1:12 PM
> > > > To: Liu, Hongtao ; Mayshao-oc  > > > o...@zhaoxin.com>; Hongtao Liu 
> > > > Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; ubiz...@gmail.com; 
> > > > richard.guent...@gmail.com; Tim Hu(WH-RD) ; 
> > > > Silvia
> > > > Zhao(BJ-RD) ; Louis Qi(BJ-RD) 
> > > > ; Cobe Chen(BJ-RD) 
> > > > Subject: Re: [PATCH] [x86_64] Add microarchtecture tunable for 
> > > > pass_align_tight_loops On Thu, 2024-11-07 at 04:58 +, Liu, 
> > > > Hongtao wrote:
> > > > > > > > Hi all:
> > > > > > > > For zhaoxin, I find no improvement when enable 
> > > > > > > > pass_align_tight_loops, and have performance drop in some cases.
> > > > > > > > This patch add a new tunable to bypass 
> > > > > > > > pass_align_tight_loops in
> > > > > > zhaoxin.
> > > > > > > >
> > > > > > > > Bootstrapped X86_64.
> > > > > > > > Ok for trunk?
> > > > > LGTM.
> > > >
> > > > I'd suggest to add the reference to PR 117438 into the subject and 
> > > > ChangeLog.
> > > Yes, thanks.
> > Add PR 117438 into the subject and ChangeLog.
> PR target/117438
> Others LGTM.
> > > >
> > > > --
> > > > Xi Ruoyao 
> > > > School of Aerospace Science and Technology, Xidian University
> > BR
> > Mayshao
> 
> 
> 
> --
> BR,
> Hongtao

Hi Hongtao:

  It seems there are no further comments. Could you please help me commit this patch?

BR
Mayshao



0001-x86_64-Add-microarchtecture-tunable-for-pass_align_v3.patch