[PATCH v6] RISC-V:Optimize the MASK opt generation

2023-09-12 Thread Feng Wang
The new patch adds some comments and updates the docs for this new usage. --- According to Kito's advice, use "MASK(name) Var(other_flag_name)" to generate the MASK and TARGET macros automatically. This patch improves the macro generation of MASK_* and TARGET_*. Due to the more and more riscv extensions are added

Re: [PATCH] LoongArch: Fix lo_sum rtx cost

2023-09-16 Thread WANG Xuerui
Hi, On 9/16/23 17:16, mengqinggang wrote: The cost of lo_sum rtx for addi.d instruction may be a very big number if computed by common function. It may cause some symbols saving to stack and loading from stack if there are not enough registers during loop optimization. Thanks for the patch! It seems

[PATCH] RISC-V: Support simplifying x/(-1) to neg for vector.

2023-09-19 Thread yanzhang . wang
From: Yanzhang Wang gcc/ChangeLog: * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): support simplifying vector int not only scalar int. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/simplify-vdiv.c: New test. Signed-off-by: Yanzhang Wang
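For illustration only, here is a minimal C sketch of the kind of loop this simplification targets. The function names are hypothetical and are not taken from the patch or its test case; the sketch only shows that dividing each element by -1 yields the same values as negating it, which is what allows a vector divide to become a vector negate. The most negative value is left out of consideration because both forms overflow on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: element-wise division by -1.  When such a loop is
   vectorized, the divide-by-minus-one can be simplified to a negate. */
void div_by_minus_one(int32_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] / -1;
}

/* The equivalent form the simplification is expected to produce. */
void negate_all(int32_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = -src[i];
}
```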

[PATCH v2 00/13] Support Intel APX EGPR

2023-09-22 Thread Hongyu Wang
h previous constraints. 3. Support constraint mapping for all gpr related common constraints in inline asm. Bootstrapped/regtested x86_64-linux-gnu. Ok for trunk? Hongyu Wang (2): [APX EGPR] middle-end: Add index_reg_class with insn argument. [APX EGPR] Handle GPR16 only vector move i

[PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.

2023-09-22 Thread Hongyu Wang
Like base_reg_class, INDEX_REG_CLASS also does not support backend insn. Add index_reg_class with insn argument for lra/reload usage. gcc/ChangeLog: * addresses.h (index_reg_class): New wrapper function like base_reg_class. * doc/tm.texi: Document INSN_INDEX_REG_CLASS.

[PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR

2023-09-22 Thread Hongyu Wang
traint. (jp): Likewise for "p" constraint. * config/i386/i386.h (enum reg_class): Add new reg class GENERAL_GPR16. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/config/i386/constraints.md | 59 +- gcc/config/i386/i386.h

[PATCH 07/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-22 Thread Hongyu Wang
s to EGPR prohibited constraints. (ix86_md_asm_adjust): Calls map_egpr_constraints. * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-inline-gpr-norex2.c: New test. Co-authored-by: Hongyu Wang Co-authored-by: Hon

[PATCH 03/13] [APX_EGPR] Initial support for APX_F

2023-09-22 Thread Hongyu Wang
Wang Co-authored-by: Hongtao Liu --- gcc/common/config/i386/cpuinfo.h | 12 +++- gcc/common/config/i386/i386-common.cc | 17 + gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h| 1 + gcc/config/i386/cpuid.h | 1 + gcc

[PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-22 Thread Hongyu Wang
): Ditto. * reload1.cc (maybe_fix_stack_asms): Ditto. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/addresses.h| 19 +++ gcc/doc/tm.texi| 14 ++ gcc/doc/tm.texi.in | 14 ++ gcc/lra-constraints.cc | 15

[PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-09-22 Thread Hongyu Wang
): Likewise. (aesimc): Likewise. (aeskeygenassist): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic tests. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/config/i386/i386-protos.h

[PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers

2023-09-22 Thread Hongyu Wang
-names.c: New test. * gcc.target/i386/apx-spill_to_egprs-1.c: Likewise. * gcc.target/i386/apx-interrupt-1.c: Likewise. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/config/i386/i386-protos.h | 1 + gcc/config/i386/i386.cc

[PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)

2023-09-22 Thread Hongyu Wang
t): Likewise. (pclmulqdq): Likewise. (vgf2p8affineinvqb_): Likewise. (vgf2p8affineqb_): Likewise. (vgf2p8mulb_): Likewise. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/config/i386/i386.md | 42 +++--- gcc/config/i386/mmx.md | 143 +

[PATCH 06/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class.

2023-09-22 Thread Hongyu Wang
. (INSN_INDEX_REG_CLASS): Likewise. (enum reg_class): Add INDEX_GPR16. (GENERAL_GPR16_REGNO_P): Define. * config/i386/i386.md (gpr32): New attribute. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/config/i386/i386-protos.h | 3 ++ gcc/config/i386/i386.cc

[PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5)

2023-09-22 Thread Hongyu Wang
intrinsic tests. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/config/i386/sse.md| 73 .../i386/apx-legacy-insn-check-norex2.c | 106 ++ 2 files changed, 155 insertions(+), 24 deletions(-) diff --git a/gcc/config/i386

[PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-09-22 Thread Hongyu Wang
For vector move insns like vmovdqa/vmovdqu, their evex counterparts require explicit suffix 64/32/16/8. The use of these instructions is prohibited under AVX10_1 or AVX512F, so we select vmovaps/vmovups for vector load/store insns that contain EGPR if there is no AVX512VL, and keep the origi

[PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5)

2023-09-22 Thread Hongyu Wang
et its constraint to jm and set attr_gpr32 to 0. (vec_set_lo_): Likewise. (vec_set_lo_): Likewise for SF/SI modes. (vec_set_hi_): Likewise. (vec_set_hi_): Likewise for SF/SI modes. (vec_set_hi_): Likewise. (vec_set_lo_): Likewise. (avx2_set_hi_v32qi

[PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)

2023-09-22 Thread Hongyu Wang
: * lib/target-supports.exp: Add apxf check. * gcc.target/i386/apx-legacy-insn-check-norex2.c: New test. * gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler test. Co-authored-by: Hongyu Wang Co-authored-by: Hongtao Liu --- gcc/config/i386/i386.md

[PATCH 2/3 v2] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-06-27 Thread Feng Wang
v2: Rebase. According to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic functions are added by this patch. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f): Add 'Zvfbfmin' intrinsic in bases. (class vfwcvtbf16_f): Ditto.

[PATCH 3/3 v2] RISC-V: Add md files for vector BFloat16

2024-06-27 Thread Feng Wang
v2: Rebase. According to the BFloat16 spec, some vector iterators and new patterns are added in md files. gcc/ChangeLog: * config/riscv/riscv.md: Add new insn name for vector BFloat16. * config/riscv/vector-iterators.md: Add some iterators for vector BFloat16. * config/risc

[PATCH 1/3 v2] RISC-V: Add vector type of BFloat16 format

2024-06-27 Thread Feng Wang
v2: Rebase. The vector type of BFloat16 format is added in this patch, subsequent extensions to zvfbfmin and zvfwma need to be based on this patch. gcc/ChangeLog: * config/riscv/genrvv-type-indexer.cc (bfloat16_type): Generate bf16 vector_type and scalar_type in DEF_RVV_TYPE_I

[PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-07-01 Thread Hongyu Wang
Hi, According to the APX spec, the pushp/popp pairs should be matched, otherwise the PPX hint cannot take effect and causes performance loss. In ix86_expand_epilogue, there are several optimizations that may cause the epilogue to use mov to restore the regs. Check if PPX applied and prevent usage o

Re: [PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-07-02 Thread Hongyu Wang
apx spec, the mismatched pushp/popp pair does confuse the fast-forwarding logic and turns off the PPX optimization. We just need to make sure every pushp for a certain reg has corresponding popp for that reg. Richard Biener wrote on Tue, Jul 2, 2024 at 16:18: > > On Tue, Jul 2, 2024 at 5:24 AM Hongyu Wan

[PATCH] [APX NF] Add a pass to convert legacy insn to NF insns

2024-07-09 Thread Hongyu Wang
Hi, For APX ccmp, current infrastructure will always generate cstore for the ccmp flag user, like cmpe%rcx, %r8 ccmpnel %rax, %rbx seta%dil add %rcx, %r9 add %r9, %rdx testb %dil, %dil je .L2 For such case, the legacy

[PATCH 1/3 v3] RISC-V: Add vector type of BFloat16 format

2024-07-11 Thread Feng Wang
v3: Rebase v2: Rebase The vector type of BFloat16 format is added in this patch, subsequent extensions to zvfbfmin and zvfwma need to be based on this patch. Signed-off-by: Feng Wang gcc/ChangeLog: * config/riscv/genrvv-type-indexer.cc (bfloat16_type): Generate bf16

[PATCH 2/3 v3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-07-11 Thread Feng Wang
v3: Modify warning message in riscv.cc v2: Rebase According to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic functions are added by this patch. Signed-off-by: Feng Wang gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16

[PATCH 3/3 v3] RISC-V: Add md files for vector BFloat16

2024-07-11 Thread Feng Wang
V3: Add Bfloat16 vector insn in generic-vector-ooo.md v2: Rebase According to the BFloat16 spec, some vector iterators and new patterns are added in md files. Signed-off-by: Feng Wang gcc/ChangeLog: * config/riscv/generic-vector-ooo.md: Add def_insn_reservation for vector BFloat16

[PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-13 Thread Hongyu Wang
Hi, According to the instruction spec of AVX512BF16, the conversion from float to BF16 is not a simple truncation. It has special handling for denormal/nan, even for normal float it will add an extra bias according to the least significant bit for bf number. This means we cannot use the vcvtne2ps2bf1
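To make the difference concrete, the following C sketch contrasts a plain truncation of the upper 16 bits with a round-to-nearest-even conversion that adds a bias depending on the least significant kept bit. It is an illustrative software model under simplifying assumptions, not the instruction's exact definition: the function names are made up for this example, NaN inputs are simply turned into a quiet NaN, and the denormal handling mentioned above is not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit IEEE-754 pattern. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Simple truncation: keep the upper 16 bits, drop the rest. */
uint16_t bf16_truncate(float f)
{
    return (uint16_t)(float_bits(f) >> 16);
}

/* Round to nearest even: add 0x7fff plus the lowest kept bit before
   shifting.  Denormal handling is intentionally omitted (assumption). */
uint16_t bf16_round_nearest_even(float f)
{
    uint32_t u = float_bits(f);
    if ((u & 0x7fffffffu) > 0x7f800000u)        /* NaN: force a quiet NaN */
        return (uint16_t)((u >> 16) | 0x0040u);
    uint32_t lsb = (u >> 16) & 1u;              /* lowest bit that survives */
    u += 0x7fffu + lsb;                         /* rounding bias */
    return (uint16_t)(u >> 16);
}
```

Because the two functions disagree whenever the discarded bits are more than half of the last kept bit, a permutation that merely selects the upper halves of floats is not equivalent to the rounding conversion.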

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongyu Wang
> Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? We can still deal with BFmode permutation the same way as HFmode, so the change in ix86_vectorize_vec_perm_const can be preserved. Hongtao Liu wrote on Mon, Jul 15, 2024 at 09:40: > > On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wa

[PATCH 0/3] Support Intel APX CCMP

2024-05-15 Thread Hongyu Wang
html Hongyu Wang (3): [APX CCMP] Support APX CCMP [APX CCMP] Adjust startegy for selecting ccmp candidates [APX CCMP] Support ccmp for float compare gcc/ccmp.cc| 12 +- gcc/config/i386/i386-expand.cc | 164 + gcc/config/i386/

[PATCH 3/3] [APX CCMP] Support ccmp for float compare

2024-05-15 Thread Hongyu Wang
The ccmp insn itself doesn't support fp compare, but x86 has fp comi insn that changes EFLAGS which can be the scc input to ccmp. Allow scalar fp compare in ix86_gen_ccmp_first except ORDERED/UNORDERED compare which can not be identified in ccmp. gcc/ChangeLog: * config/i386/i386-expand.cc

[PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-15 Thread Hongyu Wang
For general ccmp scenario, the tree sequence is like _1 = (a < b) _2 = (c < d) _3 = _1 & _2 current ccmp expanding will try to swap compare order for _1 and _2, compare the cost/cost2 between compare _1 and _2 first, then return the sequence with lower cost. For x86 ccmp, we don't support FP com
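As a rough illustration, a C function of the following shape would typically be lowered to the three-statement sequence quoted above (two comparisons combined with a bitwise AND). The function name is hypothetical, and whether a given compiler version produces exactly this GIMPLE is an assumption.

```c
/* Hypothetical source whose body corresponds to
     _1 = (a < b);  _2 = (c < d);  _3 = _1 & _2;
   A ccmp-capable target may expand this as one compare followed by one
   conditional compare, in either order. */
int both_less(int a, int b, int c, int d)
{
    return (a < b) & (c < d);
}
```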

[PATCH 1/3] [APX CCMP] Support APX CCMP

2024-05-15 Thread Hongyu Wang
APX CCMP feature implements conditional compare which executes compare when EFLAGS matches certain condition. CCMP introduces default flags value (dfv), when conditional compare does not execute, it will directly set the flags according to dfv. The instruction goes like ccmpeq {dfv=sf,of,cf,zf}
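The behaviour described above can be modeled in a few lines of C. This is a simplified sketch, not the architectural definition: the struct and function names are invented for the example, only the four flags named in the dfv list are tracked, and the condition is passed in as an already evaluated boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four flags that a default flags value (dfv) can specify. */
struct flags { bool sf, of, cf, zf; };

/* Flags produced by an ordinary 32-bit compare, i.e. by computing a - b. */
struct flags compare_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;
    f.cf = ua < ub;                                /* unsigned borrow */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) & 1u;    /* signed overflow */
    return f;
}

/* Conditional compare: if the condition on the incoming flags holds, do the
   compare; otherwise set the flags directly from dfv. */
struct flags ccmp_model(bool condition_holds, int32_t a, int32_t b,
                        struct flags dfv)
{
    return condition_holds ? compare_flags(a, b) : dfv;
}
```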

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-15 Thread Hongyu Wang
t cmp supports but ccmp not, so ret/ret2 will all be valid when comparing cost. Thanks in advance. Hongyu Wang wrote on Wed, May 15, 2024 at 16:22: > > For general ccmp scenario, the tree sequence is like > > _1 = (a < b) > _2 = (c < d) > _3 = _1 & _2 > > current ccmp expandin

[PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-15 Thread Hongyu Wang
Hi, In ix86_override_options_after_change, calls to ix86_default_align and ix86_recompute_optlev_based_flags will cause mismatched target opt_set when doing cl_optimization_restore. Move them back to ix86_option_override_internal to solve the issue. Bootstrapped & regtested on x86_64-pc-linux-gnu

Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-16 Thread Hongyu Wang
Richard Biener wrote on Thu, May 16, 2024 at 15:05: > > On Thu, May 16, 2024 at 8:25 AM Hongyu Wang wrote: > > > > Hi, > > > > In ix86_override_options_after_change, calls to ix86_default_align > > and ix86_recompute_optlev_based_flags will cause mism

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-23 Thread Hongyu Wang
Gently ping for this :) Hi Richard, Is it OK to adopt the ccmp change? Or do you know who can help to review this part? Thanks. Hongyu Wang wrote on Wed, May 15, 2024 at 16:25: > > CC'd Richard for ccmp part as previously it is added only for aarch64. > The original logic will not interr

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-29 Thread Hongyu Wang
Gently ping :) Hi Richard, Is it OK to adopt the ccmp change? Or do you know who can help to review this part? Thanks. Hongyu Wang wrote on Thu, May 23, 2024 at 16:27: > > Gently ping for this :) > Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can > help to review this pa

[PATCH v2] RISC-V: Add auto-vect pattern for vector rotate shift

2024-08-07 Thread Feng Wang
/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vrolr-1.c: New test. * gcc.target/riscv/rvv/autovec/binop/vrolr-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vrolr-template.h: New test. Signed-off-by: Feng Wang --- gcc/config/riscv/autovec.md | 16

[PATCH] [APX] Adjust target-support check [PR 115341]

2024-06-05 Thread Hongyu Wang
Current target apxf check does not specify sub-features that assembler supports, so the check with older binutils will fail at assemble stage for new apx features like NF,CCMP or CFCMOV. Adjust the assembler check for latest apx subfeatures. Bootstrapped & regtested on x86-64-pc-linux-gnu with bin

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-06-06 Thread Hongyu Wang
ns first. The costs are not + meaningful for failed expansions. */ + + if (ret2 && (!ret || cost2 < cost1)) { *prep_seq = prep_seq_2; *gen_seq = gen_seq_2; -- 2.31.1 Richard Sandiford wrote on Wed, Jun 5, 2024 at 17:21: > > Hongyu Wang writes: > > CC'd R
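The selection rule in the quoted hunk, `ret2 && (!ret || cost2 < cost1)`, can be restated as a small standalone function. The struct and function names below are hypothetical and exist only for this sketch; the point is that a cost is compared only when both expansions succeeded, since the cost of a failed expansion is not meaningful.

```c
#include <stdbool.h>

/* One attempted expansion order: whether it succeeded and what it cost. */
struct ccmp_attempt {
    bool valid;   /* corresponds to ret / ret2 being non-null */
    int cost;     /* corresponds to cost1 / cost2; ignored when !valid */
};

/* Returns 2 if the second order should be used, 1 if the first should be
   used, and 0 if neither expansion succeeded. */
int choose_ccmp_order(struct ccmp_attempt first, struct ccmp_attempt second)
{
    if (second.valid && (!first.valid || second.cost < first.cost))
        return 2;
    return first.valid ? 1 : 0;
}
```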

[PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-11 Thread Hongyu Wang
Hi, For CTEST, we don't have conditional AND so there's no optimization opportunity to write a new ctest pattern. Emit ctest when ccmp did comparison to const 0 to save bytes. Bootstrapped & regtested under x86-64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.md (@ccmp)

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Hongyu Wang
Thanks for the advice, updated patch in attachment. Bootstrapped/regtested on x86-64-pc-linux-gnu. Ok for trunk? Uros Bizjak wrote on Wed, Jun 12, 2024 at 18:12: > > On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak wrote: > > > > On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang wro

[PATCH] Add targetm.have_ccmp hook [PR115370]

2024-06-12 Thread Hongyu Wang
Hi, In cfgexpand, there is an optimization for branch which tests targetm.gen_ccmp_first == NULL. However for target like x86-64, the hook was implemented but it does not indicate that ccmp was enabled. Add a new target hook TARGET_HAVE_CCMP and replace the middle-end check for the existence of ge

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Hongyu Wang
> Perhaps the constraint can be slightly optimized to avoid repeating > (,) pairs. > > ",m," > "C ,," Yes, will check-in with this change. Thanks! Uros Bizjak wrote on Thu, Jun 13, 2024 at 14:06: > > On Thu, Jun 13, 2024 at 3:44 AM Hongyu Wang wrote: > &

Re: [PATCH] [i386] restore recompute to override opts after change [PR113719]

2024-06-13 Thread Hongyu Wang
Sorry for breaking the original logic, and I very much appreciate your patch!! It does make the logic more clear on top of opts and opts_set. I think the function name can be like ix86_unroll_flag_adjust instead of ix86_override_options_after_change_1, like the previous 2 functions which declares th

Re: [PATCH] Add targetm.have_ccmp hook [PR115370]

2024-06-13 Thread Hongyu Wang
Thanks, this is the patch I'm going to check-in. Richard Sandiford wrote on Thu, Jun 13, 2024 at 17:04: > > Hongyu Wang writes: > > Hi, > > > > In cfgexpand, there is an optimization for branch which tests > > targetm.gen_ccmp_first == NULL. However for target like x86-64,

[PATCH] i386: Fix some ISA bit test in option_override

2024-06-19 Thread Hongyu Wang
Hi, This patch adjusts several new feature checks in ix86_option_override_interal that directly use TARGET_* instead of TARGET_*_P (opts->ix86_isa_flags), which caused the cmdline option to override the target_attribute isa flag. Bootstrapped && regtested on x86_64-pc-linux-gnu. Ok for trunk? gcc/ChangeLo
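The class of bug described here can be sketched with a toy example. Everything below is hypothetical: the DEMO_ names, the struct, and the two functions are invented for illustration and are not GCC's real macros or data structures. The idea is only that a macro reading the global (command-line) flags gives the wrong answer when the function is asked about a different option set, such as one built from a target attribute.

```c
#include <stdbool.h>

#define DEMO_ISA_FOO (1u << 0)

/* Hypothetical option set; GCC's real structures are far larger. */
struct demo_options { unsigned isa_flags; };

/* Stand-in for the global options coming from the command line. */
struct demo_options demo_global_options;

/* Reads the global state, regardless of which options are being processed. */
#define DEMO_TARGET_FOO \
    ((demo_global_options.isa_flags & DEMO_ISA_FOO) != 0)

/* Reads the flags that are explicitly passed in. */
#define DEMO_TARGET_FOO_P(flags) (((flags) & DEMO_ISA_FOO) != 0)

/* Buggy pattern: ignores opts and consults the command-line state. */
bool foo_enabled_buggy(const struct demo_options *opts)
{
    (void)opts;
    return DEMO_TARGET_FOO;
}

/* Fixed pattern: tests the option set that is actually being overridden. */
bool foo_enabled_fixed(const struct demo_options *opts)
{
    return DEMO_TARGET_FOO_P(opts->isa_flags);
}
```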

[PATCH 2/3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-06-20 Thread Feng Wang
According to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic functions are added by this patch. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f): Add 'Zvfbfmin' intrinsic in bases. (class vfwcvtbf16_f): Ditto. (class

[PATCH 3/3] RISC-V: Add md files for vector BFloat16

2024-06-20 Thread Feng Wang
According to the BFloat16 spec, some vector iterators and new patterns are added in md files. All these changes passed the rvv test and rvv-intrinsic test for bfloat16. gcc/ChangeLog: * config/riscv/riscv.md: Add new insn name for vector BFloat16. * config/riscv/vector-iterators.m

[PATCH 1/3] RISC-V: Add vector type of BFloat16 format

2024-06-20 Thread Feng Wang
The vector type of BFloat16 format is added in this patch, subsequent extensions to zvfbfmin and zvfwma need to be based on this patch. gcc/ChangeLog: * config/riscv/genrvv-type-indexer.cc (bfloat16_type): Generate bf16 vector_type and scalar_type in DEF_RVV_TYPE_INDEX.

[PATCH] libstdc++: Fix --disable-libstdcxx-verbose abi break [PR115585]

2024-06-22 Thread Shengdun Wang
From: Shengdun Wang __glibcxx_assert_fail is not defined when we disable the libstdcxx-verbose. This causes ABI break when a binary is compiled with verbose enabled. libstdc++-v3/ChangeLog: * src/c++11/assert_fail.cc: --- libstdc++-v3/src/c++11/assert_fail.cc | 13 + 1

[PATCH] libstdc++: Fix --disable-libstdcxx-verbose abi break [PR115585]

2024-06-22 Thread Shengdun Wang
From: Shengdun Wang __glibcxx_assert_fail is not defined when we disable the libstdcxx-verbose. This causes ABI break when a binary is compiled with verbose enabled. libstdc++-v3/ChangeLog: * src/c++11/assert_fail.cc: --- libstdc++-v3/src/c++11/assert_fail.cc | 13 + 1

[PATCH] libstdc++: Fix --disable-libstdcxx-verbose abi break [PR115585]

2024-06-22 Thread Shengdun Wang
__glibcxx_assert_fail is not defined when we disable the libstdcxx-verbose. This causes ABI break when a binary is compiled with verbose enabled. libstdc++-v3/ChangeLog: * src/c++11/assert_fail.cc: --- libstdc++-v3/src/c++11/assert_fail.cc | 13 + 1 file changed, 9 insertions

[PATCH] Always -lntdll for all cygming targets [PR113501]

2024-06-22 Thread Shengdun Wang
From: Shengdun Wang The mcf thread has already linked to -lntdll, and it's confirmed that even Windows 95 includes ntdll.dll. Additionally, if users do not use any functions from ntdll directly, the inclusion of -lntdll does not result in linking to it. Therefore, I propose making

Re: Re: [PATCH] RISC-V: Support -m[no-]unaligned-access

2024-06-24 Thread Wang Pengcheng
riscv.opt: Add option alias. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/predef-align-10.c: New test. >> * gcc.target/riscv/predef-align-7.c: New test. >> * gcc.target/riscv/predef-align-8.c: New test. >> * gcc.target/riscv/predef-align-9.c: New test.

[PATCH] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-26 Thread Hongyu Wang
Hi, When introducing munroll-only-small-loops, the option was marked as Target Save and added to -O2 default, which makes attribute(optimize) reset the target option and causes an error when cmdline has O1 and function attribute has O2 and other target options. Mark this option as Optimization to fix.

Re: [PATCH] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-29 Thread Hongyu Wang
Richard Biener wrote on Fri, Jul 26, 2024 at 19:45: > > On Fri, Jul 26, 2024 at 10:50 AM Hongyu Wang wrote: > > > > Hi, > > > > When introducing munroll-only-small-loops, the option was marked as > > Target Save and added to -O2 default which makes attribute(optimize)

Re: [PATCH 0/1] Initial support for AVX10.2

2024-08-04 Thread Hongyu Wang
Andi Kleen wrote on Mon, Aug 5, 2024 at 06:31: > > > BTW, I noticed that in LLVM there is FP8 support for ARM currently > > undergoing. I will have a look on it to see if everything is mature. > > There's even FP8 work for ARM work under way for gcc, see > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/6

[PATCH] RISC-V: Add auto-vect pattern for vector rotate shift

2024-08-07 Thread Feng Wang
This patch adds the vector rotate shift pattern for auto-vect. With this patch, the scalar rotate shift can be automatically vectorized into vector rotate shift. signed-off-by: Feng Wang gcc/ChangeLog: * config/riscv/autovec-opt.md (v3): Add define_expand for vector
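For context, a scalar rotate loop of the kind that such a pattern allows the vectorizer to handle might look like the sketch below. The function names are hypothetical and are not the ones used in the patch's vrolr tests; whether a given compiler and option set actually vectorizes this loop is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate left by n bits, written in the form compilers commonly recognize
   as a rotate.  Masking keeps both shift counts in the range 0..31. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Scalar loop with a per-element rotate amount; with a vector rotate
   pattern available, each iteration can map onto one vector lane. */
void rotate_left_all(uint32_t *dst, const uint32_t *src,
                     const uint32_t *amount, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = rotl32(src[i], amount[i]);
}
```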

Re: Fix PR69752, insn with REG_INC being removed as equiv_init insn

2016-02-12 Thread Jiong Wang
On 12/02/16 07:43, Jeff Law wrote: On 02/11/2016 06:28 PM, Bernd Schmidt wrote: This seems fairly straightforward: (insn 213 455 216 6 (set (reg:SI 266) (mem/u/c:SI (post_inc:SI (reg/f:SI 267)) [4 S4 A32])) 748 {*thumb1_movsi_insn} (expr_list:REG_EQUAL (const_int -1044200508 [

Re: Fix PR69752, insn with REG_INC being removed as equiv_init insn

2016-02-12 Thread Jiong Wang
On 12/02/16 13:33, Bernd Schmidt wrote: On 02/12/2016 02:18 PM, Jiong Wang wrote: PR rtl-optimization/69752 * ira.c (update_equiv_regs): When looking for more than a single SET, also take other side effects into account. Will it be better that we don't remove the insn if i

Re: [PATCH ARM] RFC: PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc generated by -pg

2016-02-12 Thread Jiong Wang
On 12/02/16 14:56, Charles Baylis wrote: This is encountered when building an allyesconfig Linux kernel because the Linux build system generates very large sections by partial linking a large number of object files. This causes link failures I have tried latest BFD linker? I suspect the follo

Re: [PATCH ARM] RFC: PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc generated by -pg

2016-02-12 Thread Jiong Wang
On 12/02/16 15:02, Jiong Wang wrote: On 12/02/16 14:56, Charles Baylis wrote: This is encountered when building an allyesconfig Linux kernel because the Linux build system generates very large sections by partial linking a large number of object files. This causes link failures I have

[AArch64] Tighten direct call pattern to repair -fno-plt

2015-07-16 Thread Jiong Wang
and tightening their predicates appropriately. > > Jeff Attachment is the patch which repair -fno-plt support for AArch64. aarch64_is_noplt_call_p will only be true if: * gcc is generating position independent code. * function symbol has declaration. * either -fno-plt or "(no_pl

Re: [AArch64] Tighten direct call pattern to repair -fno-plt

2015-07-16 Thread Jiong Wang
* either -fno-plt or "(no_plt)" attribute specified. >> * it's a external function. >> >> OK for trunk? >> >> 2015-07-16 Jiong Wang >> >> gcc/ >> * config/aarch64/aarch64-protos.h (aarch64_is_noplt_call_p): New >> declara

[COMMITTED][AArch64] Restrict got_mem_hoist_1.c with small memory model

2015-07-20 Thread Jiong Wang
pported on AArch64, while for absolute address, anchor used, single "ldr" generated, IV hoisted by PRE pass also, in either case, this testcase doesn't apply, we should skip it thus. Committed attached patch as obvious. 2015-07-20 Jiong Wang gcc/testsuite/ * gcc.target/aarch64

[AArch64][sibcall]Tighten direct call pattern to repair -fno-plt

2015-07-21 Thread Jiong Wang
Jiong Wang writes: > Alexander Monakov writes: > >>> Attachment is the patch which repair -fno-plt support for AArch64. >>> >>> aarch64_is_noplt_call_p will only be true if: >>> >>> * gcc is generating position independent code. >>&

Re: [AArch64] PR63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-22 Thread Jiong Wang
Jiong Wang writes: > Current IRA still use both target macros in a few places. > > Tell IRA to use the order we defined rather than with it's own cost > calculation. Allocate caller saved first, then callee saved. > > This is especially useful for LR/x30, as it's fr

Re: [AArch64/wwwdoc] Document -fpic support for small memory model

2015-07-22 Thread Jiong Wang
Jiong Wang writes: > Marcus Shawcroft writes: > >> On 26 June 2015 at 10:32, Jiong Wang wrote: >>> >>> This patch respin https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01804.html. >>> >>> A new symbol classification "SYMBOL_SMALL_GOT_28K"

Re: [AArch64/wwwdoc] Document -fpic support for small memory model

2015-07-23 Thread Jiong Wang
James Greenhalgh writes: > On Fri, Jun 26, 2015 at 02:45:39PM +0100, Jiong Wang wrote: >> >> Marcus Shawcroft writes: >> >> 2015-06-26 Jiong Wang >> >> wwwdocs/ >> * htdocs/gcc-6/changes.html (AArch64): Document -fpic for sm

[Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-24 Thread Jiong Wang
James Greenhalgh writes: > On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: >> Current IRA still use both target macros in a few places. >> >> Tell IRA to use the order we defined rather than with it's own cost >> calculation. Allocate calle

Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-27 Thread Jiong Wang
Andrew Pinski writes: > On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang wrote: >> >> James Greenhalgh writes: >> >>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: >>>> Current IRA still use both target macros in a few places. >>>>

[AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-07-28 Thread Jiong Wang
model the override of x0 caused by the function call which is hidden by the UNSPEC. So here, we restrict operand 0 to be x0, so that the override of x0 can be reflected to gcc. OK for trunk? 2015-07-28 Ramana Radhakrishnan Jiong Wang gcc/ * config/aarch64/aarch64.d (tlsdesc_smal

Re: Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-08-05 Thread Jiong Wang
On 28/07/15 16:44, Martin Sebor wrote: Attached is an updated patch with the changes above. gcc/testsuite/ChangeLog 2015-07-28 Martin Sebor * g++.dg/Wframe-address-in-Wall.C: New test. * g++.dg/Wframe-address.C: New test. * g++.dg/Wno-frame-address.C: New test. * gcc.dg/Wfr

[COMMITTED][AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-08-06 Thread Jiong Wang
James Greenhalgh writes: > On Tue, Jul 28, 2015 at 02:12:36PM +0100, Jiong Wang wrote: >> >> The instruction sequences for preparing argument for TLS descriptor >> runtime resolver and the later function call to resolver can actually be >> hoisted out of the loop

[COMMITTED][AArch64] Tighten direct call pattern to repair -fno-plt

2015-08-06 Thread Jiong Wang
James Greenhalgh writes: > On Thu, Jul 16, 2015 at 11:21:25AM +0100, Jiong Wang wrote: >> >> Jeff Law writes: >> >> > On 06/23/2015 02:29 AM, Ramana Radhakrishnan wrote: >> > >> >>> If you try disabling the REG_EQUAL note generation

[COMMITTED][AArch64][sibcall]Tighten direct call pattern to repair -fno-plt

2015-08-06 Thread Jiong Wang
James Greenhalgh writes: > On Tue, Jul 21, 2015 at 01:42:35PM +0100, Jiong Wang wrote: >> >> Jiong Wang writes: >> >> > Alexander Monakov writes: >> > >> >>> Attachment is the patch which repair -fno-plt support for AArch64. >&

[Patch/ccmp] Cost instruction sequences to choose better expand order

2015-09-18 Thread Jiong Wang
ecursive call of expand_ccmp_expr_1 while this patch only handles the innermost call where both operands of the incoming gimple are comparison operations. NOTE: AArch64 backend can't cost CCMP instruction accurately, so I marked the testcase as XFAIL which will be removed once we fix

Re: [Patch/ccmp] Cost instruction sequences to choose better expand order

2015-09-21 Thread Jiong Wang
Bernd Schmidt writes: > On 09/18/2015 05:21 PM, Jiong Wang wrote: >> >> Current conditional compare (CCMP) support in GCC aims to optimize >> short circuit for cascade comparison, given a simple conditional >> compare candidate: >> >>if (a == 17 || a =

Re: [AArch64/testsuite] Add more TLS local executable testcases

2015-09-22 Thread Jiong Wang
Marcus Shawcroft writes: > On 26 August 2015 at 14:58, Jiong Wang wrote: >> >> This patch cover tlsle tiny model tests, tls size truncation for tiny & >> small model included also. >> >> All testcases pass native test. >> >> OK for trunk? &

Re: [AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-09-28 Thread Jiong Wang
Andrew Pinski writes: > On Tue, Jul 28, 2015 at 6:12 AM, Jiong Wang wrote: >> >> The instruction sequences for preparing argument for TLS descriptor >> runtime resolver and the later function call to resolver can actually be >> hoisted out of the loop. >> >

Re: [AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-09-28 Thread Jiong Wang
Jiong Wang writes: > Andrew Pinski writes: > >> On Tue, Jul 28, 2015 at 6:12 AM, Jiong Wang wrote: >>> >>> The instruction sequences for preparing argument for TLS descriptor >>> runtime resolver and the later function call to resolver can

Re: [AArch64/testsuite] Add more TLS local executable testcases

2015-10-02 Thread Jiong Wang
Jiong Wang writes: > Marcus Shawcroft writes: > >> On 26 August 2015 at 14:58, Jiong Wang wrote: >>> >>> This patch cover tlsle tiny model tests, tls size truncation for tiny & >>> small model included also. >>> >>> All testcases pass

Re: [AArch64] [TLSIE][2/2] Implement TLS IE for tiny model

2015-10-05 Thread Jiong Wang
James Greenhalgh writes: > Hi Jiong, > > I was looking at another bug and in the process of auditing our code > spotted an issue with this patch from back in June... > > On Fri, Jun 19, 2015 at 10:15:38AM +0100, Jiong Wang wrote: >> diff --git a/gcc/config/aarch64/aarch64

[AArch64] --with-arch in config.gcc support "."

2015-10-14 Thread Jiong Wang
Since armv8.1 added, we need to improve --with-arch recognition sed pattern to catch the new "." in the architecture base name. OK for trunk? 2015-10-14 Jiong Wang gcc/ * config.gcc: Recognize "." in architecture base name for AArch64. diff --git a/gcc/config.gcc b/

Re: [AArch64] --with-arch in config.gcc support "."

2015-10-15 Thread Jiong Wang
On 14/10/15 16:24, Andreas Schwab wrote: Jiong Wang writes: diff --git a/gcc/config.gcc b/gcc/config.gcc index 5818663..215ad9a 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -3544,7 +3544,7 @@ case "${target}" in eval "val=\$with_$which"

[AArch64] Update comments on the usage of X30 in FIXED_REGISTERS and CALL_USED_REGISTERS

2015-10-16 Thread Jiong Wang
hanks. 2015-10-16 Jiong. Wang gcc/ * config/aarch64/aarch64.h: Update the comments on usage of X30. diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 5a8db76..1eaaca0 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -210,14 +210,17 @@ e

Re: PING: [PATCH] PR target/67215: -fno-plt needs improvements for x86

2015-10-27 Thread Jiong Wang
On 27/10/15 11:37, H.J. Lu wrote: On Tue, Oct 27, 2015 at 4:20 AM, Bernd Schmidt wrote: On 10/19/2015 09:55 PM, H.J. Lu wrote: * calls.c (prepare_call_address): Don't handle -fno-plt here. Is any other target using -fno-plt? If not, and if that's really just a aarch64 is the onl

Re: PING: [PATCH] PR target/67215: -fno-plt needs improvements for x86

2015-10-27 Thread Jiong Wang
On 27/10/15 13:06, H.J. Lu wrote: On Tue, Oct 27, 2015 at 5:52 AM, Jiong Wang wrote: On 27/10/15 11:37, H.J. Lu wrote: On Tue, Oct 27, 2015 at 4:20 AM, Bernd Schmidt wrote: On 10/19/2015 09:55 PM, H.J. Lu wrote: * calls.c (prepare_call_address): Don't handle -fno-plt

Re: PING: [PATCH] PR target/67215: -fno-plt needs improvements for x86

2015-10-27 Thread Jiong Wang
On 27/10/15 14:50, H.J. Lu wrote: On Tue, Oct 27, 2015 at 7:34 AM, Ramana Radhakrishnan wrote: OK, then it's fairly x86-64 specific optimization, because we can't do "call *mem" in aarch64 and some other targets. It is a fairly x86_64 specific optimization and doesn't apply to AArch64. The

Re: [AArch64] Update comments on the usage of X30 in FIXED_REGISTERS and CALL_USED_REGISTERS

2015-10-30 Thread Jiong Wang
On 16/10/15 15:36, Jiong Wang wrote: The patch https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html from last year changed the definition of LR in CALL_USED_REGISTERS, but didn't update the comment above the #define to reflect the new usage. This patch brings the comment in line wit

Re: [AArch64] Update comments on the usage of X30 in FIXED_REGISTERS and CALL_USED_REGISTERS

2015-11-02 Thread Jiong Wang
On 02/11/15 12:01, Richard Earnshaw wrote: On 16/10/15 15:36, Jiong Wang wrote: The patch https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html from last year changed the definition of LR in CALL_USED_REGISTERS, but didn't update the comment above the #define to reflect the new

Re: Re: [Patch ARM-AArch64/testsuite Neon intrinsics 00/20] Executable tests

2015-11-02 Thread Jiong Wang
On 27 May 2015 at 22:15, Christophe Lyon wrote: * gcc.target/aarch64/advsimd-intrinsics/vtbX.c: Likewise. Noticed this testcase failed on big-endian on my local test gcc.target/aarch64/advsimd-intrinsics/vtbX.c line 188 in buffer 'expected_vtbl3') at type int8x8 index 0: got 0x0

Re: [Patch ARM-AArch64/testsuite Neon intrinsics 00/20] Executable tests

2015-11-02 Thread Jiong Wang
On 02/11/15 14:38, Christophe Lyon wrote: On 2 November 2015 at 15:20, Jiong Wang wrote: On 27 May 2015 at 22:15, Christophe Lyon wrote: * gcc.target/aarch64/advsimd-intrinsics/vtbX.c: Likewise. Noticed this testcase failed on big-endian on my local test gcc.target/aarch64

Re: [AArch64] Update comments on the usage of X30 in FIXED_REGISTERS and CALL_USED_REGISTERS

2015-11-03 Thread Jiong Wang
On 02/11/15 14:52, Richard Earnshaw wrote: On 02/11/15 12:58, Jiong Wang wrote: On 02/11/15 12:01, Richard Earnshaw wrote: On 16/10/15 15:36, Jiong Wang wrote: The patch https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html from last year changed the definition of LR in

[PATCH] PR67305, tighten neon_vector_mem_operand on eliminable registers

2015-11-04 Thread Jiong Wang
perand is only used by several misalign patterns, I guess that's why this bug was not exposed for a long time. bootstrap & regression OK on armv8 aarch32, ok for trunk? 2015-11-04 Jiong Wang Jim Wilson gcc/ PR target/67305 * config/arm/arm.md (neon_vector_mem_operand): Return F

Re: Re: [AArch64][TLSGD][2/2] Implement TLS GD traditional for tiny code model

2015-11-05 Thread Jiong Wang
you compile with "-O2 -ftls-model=global-dynamic -fpic -mtls-dialect=trad t.c -mcmodel=tiny -fomit-frame-pointer", wrong code will be generated: main: str x19, [sp, -16]! <--- x30 is not saved. adr x0, :tlsgd:t0 bl __tls_get_addr nop Patc

Re: Re: [PATCH] Fix PRs 66502 and 67167

2015-11-06 Thread Jiong Wang
On 21/08/15 10:47, Jiong Wang wrote: Richard Biener writes: I see the following ICE: t.c:13:1: internal compiler error: in decompose_normal_address, at rtlanal.c:6090 } ^ 0xc94a37 decompose_normal_address /space/rguenther/tramp3d/trunk/gcc/rtlanal.c:6090 0xc94d25

[Patch] PR tree-optimization/68234 Improve range info for loop Phi node

2015-11-11 Thread Jiong Wang
8 new VR_VARYING -> VR_RANGE found by vrp1, and 5008 new by vrp2. While on AArch64 there are 44756 new by vrp1, and 6047 new by vrp2. OK for trunk? 2015-11-11 Richard Biener Jiong Wang gcc/ PR tree-optimization/68234 * tree-vrp.c (vrp_visit_phi_node): Extend SCEV check to th

Re: [PATCH] PR67305, tighten neon_vector_mem_operand on eliminable registers

2015-11-11 Thread Jiong Wang
On 04/11/15 09:45, Jiong Wang wrote: As discussed in the bugzilla https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67305 neon_vector_mem_operand is broken. As the comment says "/* Reject eliminable registers. */", the code block at the head of this function which checks eliminable
