[PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-15 Thread liuhongt via Gcc-patches
Ping rebased on latest trunk. gcc/ChangeLog: * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize). * doc/invoke.texi (Options That Control Optimization): Update documents. * opts.c (default_options_table): Enable auto-vectorization at O2 with very-c

[PATCH] Check mask type when doing cond_op related gimple simplification.

2021-09-15 Thread liuhongt via Gcc-patches
Ping. Bootstrapped and regtest on x86_64-linux-gnu{-m32,}, aarch64-unknown-linux-gnu{-m32,} Ok for trunk? gcc/ChangeLog: PR middle-end/102080 * match.pd: Check mask type when doing cond_op related gimple simplification. * tree.c (is_truth_type_for): New funct

[PATCH] [AVX512FP16] Support embedded broadcast for AVX512FP16 instructions.

2021-09-16 Thread liuhongt via Gcc-patches
Bootstrapped and regtest on x86_64-pc-linux-gnu{-m32,}. Runtime tests passed under sde{-m32,}. gcc/ChangeLog: PR target/87767 * config/i386/i386.c (ix86_print_operand): Handle V8HF/V16HF/V32HFmode. * config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode. *

[PATCH] [i386] Fix ICE in pass_rpad.

2021-09-17 Thread liuhongt via Gcc-patches
Besides conversion instructions, pass_rpad also handles scalar sqrt/rsqrt/rcp/round instructions, while r12-3614 should only want to handle conversion instructions, so fix it. Bootstrapped and regtest on x86_64-linux-gnu{-m32,} w/ configure --enable-checking=yes,rtl,extra, failed tests are fixed

[PATCH] Support 64bit fma/fms/fnma/fnms under avx512vl.

2021-09-21 Thread liuhongt via Gcc-patches
Hi: fma/fms/fnma/fnmsv2sf4 are defined only under (TARGET_FMA || TARGET_FMA4). The patch extends the expanders to TARGET_AVX512VL. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/mmx.md (fmav2sf4): Extend to AVX512 fma. (f

[PATCH] [i386] Adjust testcase.

2021-09-21 Thread liuhongt via Gcc-patches
Pushed to trunk. gcc/testsuite/ChangeLog: * gcc.target/i386/pr92658-avx512f.c: Refine testcase. * gcc.target/i386/pr92658-avx512vl.c: Adjust scan-assembler, only v2di->v2qi truncate is not supported, v4di->v4qi should be supported. --- gcc/testsuite/gcc.target/i38

[PATCH] wwwdocs: [GCC12] Mention Intel AVX512-FP16.

2021-09-22 Thread liuhongt via Gcc-patches
--- htdocs/gcc-12/changes.html | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index 81f62fe3..14149212 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -165,8 +165,12 @@ a work-in-progre

[PATCH 0/7] AVX512FP16: Support bunch of expanders for HFmode and vector HFmodes

2021-09-22 Thread liuhongt via Gcc-patches
expander for smin/maxhf3. AVX512FP16: Add fix(uns)?_truncmn2 for HF scalar and vector modes AVX512FP16: Add float(uns)?mn2 expander AVX512FP16: add truncmn2/extendmn2 expanders AVX512FP16: Enable vec_cmpmn/vcondmn expanders for HF modes. liuhongt (2): AVX512FP16: Add expander for rint

[PATCH 1/7] AVX512FP16: Add expander for rint/nearbyinthf2.

2021-09-22 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/i386.md (rinthf2): New expander. (nearbyinthf2): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-builtin-round-1.c: Add new testcase. --- gcc/config/i386/i386.md | 22 +++ .../i386/avx

[PATCH 2/7] AVX512FP16: Add expander for fmahf4

2021-09-22 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/sse.md (FMAMODEM): extend to handle FP16. (VFH_SF_AVX512VL): Extend to handle HFmode. (VF_SF_AVX512VL): Deleted. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-fma-1.c: New test. * gcc.target/i386/avx512fp16vl-fma-1.c: N

[PATCH 3/7] AVX512FP16: Add expander for smin/maxhf3.

2021-09-22 Thread liuhongt via Gcc-patches
From: Hongyu Wang gcc/ChangeLog: * config/i386/i386.md (hf3): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-builtin-minmax-1.c: New test. --- gcc/config/i386/i386.md | 11 ++ .../i386/avx512fp16-builtin-minmax-1.c| 35 +++

[PATCH 4/7] AVX512FP16: Add fix(uns)?_truncmn2 for HF scalar and vector modes

2021-09-22 Thread liuhongt via Gcc-patches
From: Hongyu Wang NB: 64bit/32bit vectorize for HFmode is not supported for now, will adjust this patch when V2HF/V4HF operations supported. gcc/ChangeLog: * config/i386/i386.md (fix_trunchf2): New expander. (fixuns_trunchfhi2): Likewise. (*fixuns_trunchfsi2zext): New de

[PATCH 5/7] AVX512FP16: Add float(uns)?mn2 expander

2021-09-22 Thread liuhongt via Gcc-patches
From: Hongyu Wang gcc/ChangeLog: * config/i386/sse.md (float2): New expander. (avx512fp16_vcvt2ph_): Rename to ... (floatv4hf2): ... this, and drop constraints. (avx512fp16_vcvtqq2ph_v2di): Rename to ... (floatv2div2hf2): ... this, and like

[PATCH 6/7] AVX512FP16: add truncmn2/extendmn2 expanders

2021-09-22 Thread liuhongt via Gcc-patches
From: Hongyu Wang gcc/ChangeLog: * config/i386/sse.md (extend2): New expander. (extendv4hf2): Likewise. (extendv2hfv2df2): Likewise. (trunc2): Likewise. (avx512fp16_vcvt2ph_): Rename to ... (truncv4hf2): ... this, and drop constraints.

[PATCH 7/7] AVX512FP16: Enable vec_cmpmn/vcondmn expanders for HF modes.

2021-09-22 Thread liuhongt via Gcc-patches
From: Hongyu Wang gcc/ChangeLog: * config/i386/i386-expand.c (ix86_use_mask_cmp_p): Enable HFmode mask_cmp. * config/i386/sse.md (sseintvecmodelower): Add HF vector modes. (_store_mask): Extend to support HF vector modes. (vec_cmp): Likewise. (vcon

[PATCH] [GCC12] Mention Intel AVX512-FP16 and _Float16 support.

2021-09-23 Thread liuhongt via Gcc-patches
Updated, mention _Float16 support. --- htdocs/gcc-12/changes.html | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index 81f62fe3..f19c6718 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.

[PATCH] [GIMPLE] Simplify (_Float16) ceil ((double) x) to .CEIL (x) when available.

2021-09-24 Thread liuhongt via Gcc-patches
Hi: Related discussion in [1] and PR. Bootstrapped and regtest on x86_64-linux-gnu{-m32,}. Ok for trunk? [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574330.html gcc/ChangeLog: PR target/102464 * config/i386/i386.c (ix86_optab_supported_p): Return true f

[PATCH] [i386] Remove storage only description for _Float16 w/o avx512fp16.

2021-09-24 Thread liuhongt via Gcc-patches
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580207.html gcc/ChangeLog: * doc/extend.texi (Half-Precision): Remove storage only description for _Float16 w/o avx512fp16. --- gcc/doc/extend.texi | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff

[PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-25 Thread liuhongt via Gcc-patches
Hi: > Please don't add the -fno- option to the warning tests.  As I said, > I would prefer to either suppress the vectorization for the failing > cases by tweaking the test code or xfail them.  That way future > regressions won't be masked by the option.  Once we've moved > the warning to a more su

[PATCH] Revert "Optimize v4sf reduction.".

2021-09-27 Thread liuhongt via Gcc-patches
Revert due to performance regression. This reverts commit 8f323c712ea76cc4506b03895e9b991e4e4b2baf. PR target/102473 PR target/101059 --- gcc/config/i386/sse.md| 39 ++- gcc/testsuite/gcc.target/i386/sse2-pr101059.c | 32 --- gcc/tests

[PATCH] Support 128/256/512-bit vector _Float16 plus/smin/smax reduce.

2021-09-27 Thread liuhongt via Gcc-patches
Hi: Add expanders for reduc_{smin,smax,plus}_scal_{v8hf,v16hf,v32hf} Bootstrapped and regtest on x86_64-pc-linux-gnu{-m32,} gcc/ChangeLog: * config/i386/i386-expand.c (emit_reduc_half): Handle V8HF/V16HF/V32HFmode. * config/i386/sse.md (REDUC_SSE_PLUS_MODE): Add V8HF

[PATCH] [i386] Support reduc_{plus,smax,smin,umax,umin}_scal_v4hi.

2021-09-27 Thread liuhongt via Gcc-patches
Hi: Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/102494 * config/i386/i386-expand.c (emit_reduc_half): Handle V4HImode. * config/i386/mmx.md (reduc_plus_scal_v4hi): New. (reduc__scal_v4hi): New. gcc/testsuite

[PATCH] Adjust testcase for O2 vect.

2021-10-31 Thread liuhongt via Gcc-patches
> (I'm assuming the difference is due to some architectural > constraints as opposed to arbitrary limitations in the code There are 2 differences: 1. whether the target supports unaligned stores. 2. whether the target supports move by pieces (which enables block moves at the gimple level). Updated patch. Adjust c

[PATCH v5] Improve integer bit test on __atomic_fetch_[or|and]_* returns

2021-11-03 Thread liuhongt via Gcc-patches
Sorry for the slow reply: Here is update according to comments 1. Define new match function in match.pd. 2. Adjust code for below >> + gsi_remove (gsip, true); >> + var = build1 (NOP_EXPR, TREE_TYPE (use_nop_lhs), var); > >instead of building a GENERIC NOP you co

[PATCH 1/2] [Middle-end] Simplify (trunc)copysign((extend)a, (extend)b) to .COPYSIGN (a, b).

2021-11-03 Thread liuhongt via Gcc-patches
a and b have the same type as the truncation type, which has less precision than the extended type. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/102464 * match.pd: Simplify (trunc)copysign((extend)a, (extend)b) to .COPYSIGN (a,b) when
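The shape of this simplification can be sketched in plain C. This is only an illustration: `float` and `double` stand in for the narrow and extended types (the PR concerns `_Float16`), and the function names are hypothetical, not taken from the patch.

```c
#include <math.h>

/* Source form that the match.pd pattern looks for: both operands are
   extended, the wide copysign is called, and the result is truncated. */
float copysign_via_double(float a, float b)
{
    return (float) copysign((double) a, (double) b);
}

/* Form it can be simplified to: copysign applied directly in the narrow
   type. Only the sign bit of b is used, so no precision is involved. */
float copysign_narrow(float a, float b)
{
    return copysignf(a, b);
}
```

For ordinary (non-signaling) inputs the two functions return the same value, which is why the extension and truncation can be dropped.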

[PATCH 2/2] [i386] Extend vternlog define_insn_and_split to memory_operand to enable more optimization.

2021-11-03 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk after first patch is approved. gcc/ChangeLog: PR target/101989 * config/i386/predicates.md (reg_or_notreg_operand): Rename to .. (regmem_or_bitnot_regmem_operand): .. and extend to handle

[PATCH] Add !flag_signaling_nans to simplification: (trunc)copysign((extend)a, (extend)b) to copysign (a, b).

2021-11-04 Thread liuhongt via Gcc-patches
> Note that this is not safe with -fsignaling-nans, so needs to be disabled > for that option (if there isn't already logic somewhere with that effect), > because the extend will convert a signaling NaN to quiet (raising > "invalid"), but copysign won't, so this transformation could result in a > s

[PATCH 1/2] [Gimple] Simplify (trunc)fmax/fmin((extend)a, (extend)b) to MAX/MIN(a, b)

2021-11-04 Thread liuhongt via Gcc-patches
a and b have the same type as the truncation type, which has less precision than the extended type; the transformation is guarded by flag_finite_math_only. Bootstrapped and regtested under x86_64-pc-linux-gnu{-m32,} Ok for trunk? gcc/ChangeLog: PR target/102464 * match.pd: Simplify (trunc)fmax/fmin((ex

[PATCH 2/2] [Gimple] Simplify (trunc)fma ((extend)a, (extend)b, (extend)c) to IFN_FMA (a, b, c).

2021-11-04 Thread liuhongt via Gcc-patches
a, b and c have the same type as the truncation type, which has less precision than the extended type; the optimization is guarded by flag_unsafe_math_optimizations. Bootstrapped and regtested under x86_64-pc-linux-gnu{-m32,} Ok for trunk? gcc/ChangeLog: PR target/102464 * match.pd: Simplify

[PATCH] Update documentation for -ftree-loop-vectorize and -ftree-slp-vectorize which are enabled by default at -O2.

2021-11-05 Thread liuhongt via Gcc-patches
Bootstrapped on x86_64-pc-linux-gnu{-m32,} Ok for trunk? gcc/ChangeLog: PR tree-optimization/103077 * doc/invoke.texi (Options That Control Optimization): Update documentation for -ftree-loop-vectorize and -ftree-slp-vectorize which are enabled by default at -O2.

[PATCH] [pass_if_conversion] Extend is_cond_scalar_reduction to handle bit_and/bit_xor/bit_ior.

2021-11-08 Thread liuhongt via Gcc-patches
This will enable transformation like - # sum1_50 = PHI - # sum2_52 = PHI + # sum1_50 = PHI <_87(13), 0(4)> + # sum2_52 = PHI <_89(13), 0(4)> # ivtmp_62 = PHI i.2_7 = (long unsigned int) i_49; _8 = i.2_7 * 8; ... vec1_i_38 = vec1_29 >> _10; vec2_i_39 = vec2_31 >> _10; _11 =

[PATCH] Improve integer bit test on __atomic_fetch_[or|and]_* returns

2021-11-09 Thread liuhongt via Gcc-patches
> > > > +#if GIMPLE > > +(match (nop_atomic_bit_test_and_p @0 @1) > > + (bit_and:c (nop_convert?@4 (ATOMIC_FETCH_OR_XOR_N @2 INTEGER_CST@0 @3)) > > +           INTEGER_CST@1) > > no need for the :c on the bit_and when the 2nd operand is an Changed. > INTEGER_CST (likewise below) > > > + (with { >

[PATCH] [i386] Extend vpcmov to handle V8HF/V16HFmode under TARGET_XOP.

2021-11-09 Thread liuhongt via Gcc-patches
This patch fixes ICE in pr103151. Bootstrap and regtest on x86_64-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: PR target/103151 * config/i386/sse.md (V_128_256): Extend to V8HF/V16HF. (avxsizesuffix): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386

[PATCH] Enhance optimize_atomic_bit_test_and to handle truncation.

2021-11-16 Thread liuhongt via Gcc-patches
r12-5102-gfb161782545224f5 improves integer bit test on __atomic_fetch_[or|and]_* returns only for nop_convert, i.e. transform mask_5 = 1 << bit_4(D); mask.0_1 = (unsigned int) mask_5; _2 = __atomic_fetch_or_4 (a_7(D), mask.0_1, 0); t1_9 = (int) _2; t2_10 = mask_5 & t1_9; to mask_5
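For orientation, C source of roughly the following shape would be expected to produce the GIMPLE quoted above. This is a sketch, not the patch's testcase: the function name is hypothetical, and the memory order `__ATOMIC_RELAXED` is chosen to correspond to the literal `0` in the `__atomic_fetch_or_4` call.

```c
/* Atomically set bit `bit` in *a and report whether it was already set.
   The statements mirror the GIMPLE: a signed mask, an unsigned atomic
   fetch-or, a conversion back to int, then a test against the mask. */
int atomic_test_and_set_bit(unsigned int *a, int bit)
{
    int mask = 1 << bit;
    int t1 = (int) __atomic_fetch_or(a, (unsigned int) mask, __ATOMIC_RELAXED);
    int t2 = mask & t1;
    return t2;
}
```

The return value is nonzero only if the bit was set before the call. Recognizing this fetch-then-test idiom across the intermediate conversions is what optimize_atomic_bit_test_and is concerned with.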

[no subject]

2021-06-01 Thread liuhongt via Gcc-patches
This is the updated patch.

[PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-01 Thread liuhongt via Gcc-patches
For i386, it will enable the optimization below, from notl %edi vpbroadcastd %edi, %xmm0 vpand %xmm1, %xmm0, %xmm0 to vpbroadcastd %edi, %xmm0 vpandn %xmm1, %xmm0, %xmm0 gcc/ChangeLog: PR target/100711 * simplify-rtx.c (simplify_unary_operat
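A source-level sketch of the pattern, written with GCC's generic vector extension. The type and function names are hypothetical and not taken from the patch's testcase; the point is only that the complement is applied to the scalar before it is broadcast, which is the `(vec_duplicate (not A))` form.

```c
/* Four 32-bit lanes, using the GCC/Clang vector_size extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Compute broadcast(~a) & b lane by lane. After canonicalization to
   (not (vec_duplicate A)), the NOT sits next to the AND, so the pair
   can be matched as a single and-not operation. */
v4si broadcast_not_and(int a, v4si b)
{
    int na = ~a;
    v4si dup = { na, na, na, na };
    return dup & b;
}
```

Whether a given compiler version and option set actually emits `vpandn` for this function is not something the excerpt shows.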

[PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-01 Thread liuhongt via Gcc-patches
For i386, it will enable the optimization below, from notl %edi vpbroadcastd %edi, %xmm0 vpand %xmm1, %xmm0, %xmm0 to vpbroadcastd %edi, %xmm0 vpandn %xmm1, %xmm0, %xmm0 gcc/ChangeLog: PR target/100711 * simplify-rtx.c (simplify_unary_operat

[PATCH 1/2] CALL_INSN may not be a real function call.

2021-06-02 Thread liuhongt via Gcc-patches
Use "used" flag for CALL_INSN to indicate it's a fake call. If it's a fake call, it won't have its own function stack. gcc/ChangeLog PR target/82735 * df-scan.c (df_get_call_refs): When call_insn is a fake call, it won't use stack pointer reg. * final.c (leaf_funct

[PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI.

2021-06-02 Thread liuhongt via Gcc-patches
When __builtin_ia32_vzeroupper is called explicitly, the corresponding vzeroupper pattern does not carry any CLOBBERS or SETs before LRA, which leads to incorrect optimization in pass_reload. In order to solve this problem, this patch refine instructions as call_insns in which the call has a specia

[PATCH] [i386] Fix ICE of insn does not satisfy its constraints.

2021-06-03 Thread liuhongt via Gcc-patches
For evex encoding extended instructions, when vector length is less than 512 bits, AVX512VL is needed, besides some instructions like vpmovzxbx need extra AVX512BW. So this patch refines corresponding constraints, i.e. from "v/vm" to "Yv/Yvm", from "v/vm" to "Yw/Ywm". Bootstrapped and regtested on

[PATCH] [GCC-12] Mention O2 vectorization enabling.

2021-10-08 Thread liuhongt via Gcc-patches
--- htdocs/gcc-12/changes.html | 9 + 1 file changed, 9 insertions(+) diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index 22839f2d..6e898db7 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -68,6 +68,15 @@ a work-in-progress. General Im

[PATCH] Refine movhfcc.

2021-10-08 Thread liuhongt via Gcc-patches
For AVX512-FP16, HFmode only supports vcmpsh whose dest is mask register, so for movhfcc, it's vcmpsh op2, op1, %k1 vmovsh op1, op2{%k1} mov op2, dest gcc/ChangeLog: PR target/102639 * config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Handle HFmode. (ix86_use_

[PATCH] Adjust more testcases for O2 vectorization enabling.

2021-10-08 Thread liuhongt via Gcc-patches
Pushed to trunk. libgomp/ChangeLog: * testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap. * testsuite/libgomp.c++/scan-11.C: Ditto. * testsuite/libgomp.c++/scan-12.C: Ditto. * testsuite/libgomp.c++/scan-13.C: Ditto. * testsuite/libgomp.c++/

[PATCH] Adjust testcase for O2 vectorization enabling

2021-10-10 Thread liuhongt via Gcc-patches
libgomp/ChangeLog: * testsuite/libgomp.graphite/force-parallel-8.c: Add -fno-tree-vectorize. --- libgomp/testsuite/libgomp.graphite/force-parallel-8.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgomp/testsuite/libgomp.graphite/force-parallel-8.c b/libgomp/test

[PATCH] Adjust testcase for O2 vectorization enabling.

2021-10-10 Thread liuhongt via Gcc-patches
gcc/testsuite/ChangeLog: PR middle-end/102669 * gnat.dg/unroll1.adb: Add -fno-tree-vectorize. --- gcc/testsuite/gnat.dg/unroll1.adb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gnat.dg/unroll1.adb b/gcc/testsuite/gnat.dg/unroll1.adb index 34d8

[PATCH][i386] Support reduc_{plus,smax,smin,umax,umin}_scal_v4qi.

2021-10-10 Thread liuhongt via Gcc-patches
After providing expanders for reduc_umin/umax/smin/smax_scal_v4qi, reduce operations are a little faster than before w/ options -O2 -march=haswell, -O2 -march=skylake-avx512 and -Ofast -march=skylake-avx512. gcc/ChangeLog PR target/102483 * config/i386/i386-ex

[PATCH] Adjust testcase for O2 vectorization[Wuninitialized]

2021-10-12 Thread liuhongt via Gcc-patches
As discussed in PR. It looks like it's just the location of the warning that's off, the warning itself is still issued but it's swallowed by the dg-prune-output directive. Since the test was added to verify the fix for an ICE without vectorization I think disabling vectorization should be fine

[PATCH] Adjust testcase for O2 vectorization.

2021-10-14 Thread liuhongt via Gcc-patches
Hi Kewen: Could you help verify whether this patch fixes those regressions for the rs6000 port? As discussed in [1], this patch adds xfail/target selectors to those testcases, and also makes a copy of them so that they can be tested w/o vectorization. Newly added xfail/target selectors are used to check the v

[PATCH] Adjust testcase for O2 vectorization.

2021-10-19 Thread liuhongt via Gcc-patches
Updated patch: 1. Add documentation in doc/sourcebuild.texi (Effective-Target Keywords). 2. Reduce -novec.c testcases to contain only the newly failing parts caused by O2 vectorization. 3. Add PR in dg-warning comment. As discussed in [1], this patch adds xfail/target selectors to those testcas

[PATCH] Canonicalize __atomic/sync_fetch_or/xor/and for constant mask.

2021-10-21 Thread liuhongt via Gcc-patches
Hi: This patch tries to canonicalize bit_and and nop_convert order for __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*,__sync_fetch_and_or_*, __sync_fetch_and_xor_*,__sync_xor_and_fetch_*, __atomic_fetch_and_*,__sync_fetch_and_and_* when mask is constant. i.e. +/* Canonicalize +

[PATCH] Simplify (_Float16) sqrtf((float) a) to .SQRT(a) when a is a _Float16 value.

2021-10-24 Thread liuhongt via Gcc-patches
Similar for sqrt/sqrtl. gcc/ChangeLog: PR target/102464 * match.pd: Simplify (_Float16) sqrtf((float) a) to .SQRT(a) when direct_internal_fn_supported_p, similar for sqrt/sqrtl. gcc/testsuite/ChangeLog: PR target/102464 * gcc.target/i386/pr102464-sqrtph.c

[PATCH] Canonicalize __atomic/sync_fetch_or/xor/and for constant mask.

2021-10-24 Thread liuhongt via Gcc-patches
Canonicalize & and nop_convert order for __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*,__sync_fetch_and_or_*, __sync_fetch_and_xor_*,__sync_xor_and_fetch_*, __atomic_fetch_and_*,__sync_fetch_and_and_* when mask is constant. i.e. +/* Canonicalize + _1 = __atomic_fetch_or_4 (&v,

[PATCH] Enable vectorization for _Float16 floor/ceil/trunc/nearbyint/rint operations.

2021-10-25 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/102464 * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New function type. (V16HF_FTYPE_V16HF): Ditto. (V32HF_FTYPE_V32HF): Ditto. (V8HF_FTYP

[PATCH] Adjust testcase for O2 vect.

2021-10-28 Thread liuhongt via Gcc-patches
Adjust code in check_vect_slp_aligned_store_usage to make it an exact pattern match of the corresponding testcases. These new target/xfail selectors are added as a temporary solution, and should be removed after real issue is fixed for Wstringop-overflow. gcc/ChangeLog: * doc/sourcebuild.

[PATCH] [i386] Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

2021-07-13 Thread liuhongt via Gcc-patches
By optimizing vector movement to broadcast in ix86_expand_vector_move during pass_expand, pass_reload/LRA can automatically generate an avx512 embedded broadcast, pass_cpb is not needed. Considering that in the absence of avx512f, broadcast from memory is still slightly faster than loading the ent

[PATCH] Support logic shift left/right for avx512 mask type.

2021-07-20 Thread liuhongt via Gcc-patches
Hi: As mentioned in https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html cut start- > note for the lowpart we can just view-convert away the excess bits, > fully re-using the mask. We generate surprisingly "good" code: > > kmovb %k1, %edi > shrb $4, %dil >

[PATCH V2 00/10] Initial support for AVX512FP16

2021-07-21 Thread liuhongt via Gcc-patches
Hi: As discussed in [1], this patch supports _Float16 under target sse2 and above, w/o avx512fp16, _Float16 type is storage only, all operations are emulated by soft-fp and float instructions. Soft-fp keeps the intermediate result of the operation at 32-bit precision by default, which may lead to

[PATCH 01/10] Update hf soft-fp from glibc.

2021-07-21 Thread liuhongt via Gcc-patches
libgcc/ChangeLog * soft-fp/eqhf2.c: New file. * soft-fp/extendhfdf2.c: New file. * soft-fp/extendhfsf2.c: New file. * soft-fp/extendhfxf2.c: New file. * soft-fp/half.h (FP_CMP_EQ_H): New macro. * soft-fp/truncdfhf2.c: New file * soft-fp/trunc

[PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-07-21 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode. * config/i386/i386.c (enum x86_64_reg_class): Add X86_64_SSEHF_CLASS. (merge_classes): Handle X86_64_SSEHF_CLASS. (examine_argument): Ditto. (construct_container): Ditto.

[PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.

2021-07-21 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * optabs-query.c (get_best_extraction_insn): Use word_mode for HF field. libgcc/ChangeLog: * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro. * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto. * config/i386/sfp-machine.h (_FP_NAN

[PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.

2021-07-21 Thread liuhongt via Gcc-patches
From: "Guo, Xuepeng" gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect FEATURE_AVX512FP16. * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512FP16_SET, OPTION_MASK_ISA_AVX512FP16_UNSET, OPTION_MASK_ISA2_AVX512FP1

[PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16.

2021-07-21 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic. (_mm256_set_ph): Likewise. (_mm512_set_ph): Likewise. (_mm_setr_ph): Likewise. (_mm256_setr_ph): Likewise. (_mm512_setr_ph): Likewise. (_mm_set1_ph): Likewise.

[PATCH 06/10] AVX512FP16: Add testcase for vector init and broadcast intrinsics.

2021-07-21 Thread liuhongt via Gcc-patches
gcc/testsuite/ChangeLog: * gcc.target/i386/m512-check.h: Add union128h, union256h, union512h. * gcc.target/i386/avx512fp16-10a.c: New test. * gcc.target/i386/avx512fp16-10b.c: Ditto. * gcc.target/i386/avx512fp16-1a.c: Ditto. * gcc.target/i386/avx512fp16-1b.c

[PATCH 07/10] AVX512FP16: Add tests for vector passing in variable arguments.

2021-07-21 Thread liuhongt via Gcc-patches
From: "H.J. Lu" gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vararg-1.c: New test. * gcc.target/i386/avx512fp16-vararg-2.c: Ditto. * gcc.target/i386/avx512fp16-vararg-3.c: Ditto. * gcc.target/i386/avx512fp16-vararg-4.c: Ditto. --- .../gcc.target/i386/avx

[PATCH 09/10] AVX512FP16: Add ABI test for ymm.

2021-07-21 Thread liuhongt via Gcc-patches
gcc/testsuite/ChangeLog: * gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp: New exp file. * gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header. * gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h: Likewise. * gcc.targ

[PATCH 10/10] AVX512FP16: Add abi test for zmm

2021-07-21 Thread liuhongt via Gcc-patches
gcc/testsuite/ChangeLog: * gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp: New file. * gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise. * gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise. * gcc.target/x86_64/abi/avx512fp1

[PATCH] [i386] Add a separate function to calculate cost for WIDEN_MULT_EXPR.

2021-07-28 Thread liuhongt via Gcc-patches
Hi: As described in PR 39821, WIDEN_MULT_EXPR should use a different cost model from MULT_EXPR, this patch adds ix86_widen_mult_cost for that. Reference basis for the cost model is https://godbolt.org/z/EMjaz4Knn. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. gcc/ChangeLog: *

[PATCH] Adjust/Refine testcases.

2021-07-28 Thread liuhongt via Gcc-patches
Committed as obvious fix, and opened pr101668 to record the issue related to pr92658-{avx512bw-2,sse4-2,sse4}.c. gcc/testsuite/ChangeLog: PR target/99881 * gcc.target/i386/pr91446.c: Adjust testcase. * gcc.target/i386/pr92658-avx512bw-2.c: Ditto. * gcc.target/i38

[PATCH] [x86] x86: Don't add crtfastmath.o for -shared and add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2022-12-13 Thread liuhongt via Gcc-patches
Don't add crtfastmath.o for -shared to avoid changing the MXCSR register when loading a shared library. crtfastmath.o will be used only when building executables. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/55522 PR target/368

[PATCH V2 1/2] x86: Don't add crtfastmath.o for -shared

2022-12-14 Thread liuhongt via Gcc-patches
Update in V2: Split -shared change into a separate commit and add some documentation for it. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? Don't add crtfastmath.o for -shared to avoid changing the MXCSR register when loading a shared library. crtfastmath.o will be used onl

[PATCH V2 2/2] [x86] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2022-12-14 Thread liuhongt via Gcc-patches
Update in v2: 1. Support -mno-daz-ftz, and make the option effectively three state as: if (mdaz-ftz) link crtfastmath.o else if ((Ofast || ffast-math || funsafe-math-optimizations) && !shared && !mno-daz-ftz) link crtfastmath.o else Don't link crtfastmath.o 2. Still make the op
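The three-state rule in item 1 can be written out as a small decision function. This is a model of the logic quoted in the message, not GCC's actual spec-string implementation; the enum, parameter and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the three states of the option:
   not given, -mdaz-ftz, or -mno-daz-ftz. */
enum daz_ftz_state { DAZ_FTZ_UNSET, DAZ_FTZ_ON, DAZ_FTZ_OFF };

/* fast_math stands for (Ofast || ffast-math || funsafe-math-optimizations).
   Returns true when crtfastmath.o would be linked under the quoted rule. */
bool links_crtfastmath(enum daz_ftz_state daz_ftz, bool fast_math, bool shared)
{
    if (daz_ftz == DAZ_FTZ_ON)
        return true;                 /* explicit -mdaz-ftz always links it */
    if (fast_math && !shared && daz_ftz != DAZ_FTZ_OFF)
        return true;                 /* implied by fast-math, unless vetoed */
    return false;                    /* otherwise do not link it */
}
```

Read this way, -mdaz-ftz overrides everything (including -shared), -mno-daz-ftz suppresses the fast-math default, and leaving the option unset keeps the fast-math behaviour for executables only.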

[PATCH] Don't add crtfastmath.o for -shared.

2023-01-13 Thread liuhongt via Gcc-patches
Patches [1] and [2] fixed PR55522 for x86-linux but left all other x86 targets unfixed (x86-cygwin, x86-darwin and x86-mingw32). This patch applies a similar change to other specs using crtfastmath.o. Ok for trunk? [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608528.html [2] https:

[PATCH] Fix target_clone ("arch=graniterapids-d") and target_clone ("arch=arrowlake-s")

2023-08-22 Thread liuhongt via Gcc-patches
Both "graniterapids-d" and "graniterapids" are associated with PROCESSOR_GRANITERAPIDS in processor_alias_table but mapped to different __cpu_subtype in get_intel_cpu. And get_builtin_code_for_version will try to match the first PROCESSOR_GRANITERAPIDS in processor_alias_table which maps to "granitepr

[PATCH] [x86] Refactor mode iterator V_128 and V_128H, V_256 and V_256H

2023-08-24 Thread liuhongt via Gcc-patches
Merge V_128H and V_256H into V_128 and V_256, adjust related patterns. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready push to trunk. gcc/ChangeLog: * config/i386/sse.md (vec_set): Removed. (V_128H): Merge into .. (V_128): .. this. (V_256H): Merge

[PATCH] Use vmaskmov{ps, pd} for VI48_128_256 when TARGET_AVX2 is not available.

2023-08-24 Thread liuhongt via Gcc-patches
vpmaskmov{d,q} is available for TARGET_AVX2, vmaskmov{ps,pd} is available for TARGET_AVX, w/o TARGET_AVX2, we can use vmaskmov{ps,pd} for VI48_128_256 Bootstrapped and regtested on x86_64-pc-linux{-m32,}. Ready push to trunk. gcc/ChangeLog: PR target/19 * config/i386/sse.md (

[PATCH] Refactor vector HF/BF mode iterators and patterns.

2023-08-30 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready push to trunk. gcc/ChangeLog: * config/i386/sse.md (_blendm): Merge VF_AVX512HFBFVL into VI12HFBF_AVX512VL. (VF_AVX512HFBF16): Renamed to VHFBF. (VF_AVX512FP16VL): Renamed to VHF_AVX512VL. (VF_

[PATCH] Adjust costing of emulated vectorized gather/scatter

2023-08-30 Thread liuhongt via Gcc-patches
r14-332-g24905a4bd1375c adjusts costing of emulated vectorized gather/scatter. commit 24905a4bd1375ccd99c02510b9f9529015a48315 Author: Richard Biener Date: Wed Jan 18 11:04:49 2023 +0100 Adjust costing of emulated vectorized gather/scatter Emulated gather/scatter behave similar to

[PATCH] Generate vmovsh instead of vpblendw for specific vec_merge.

2023-09-04 Thread liuhongt via Gcc-patches
On SPR, vmovsh can be executed on 3 ports, vpblendw can only be executed on 2 ports. On znver4, vpblendw can be executed on 4 ports, if vmovsh is similar as vmovss, then it can also be executed on 4 ports. So there's no difference for znver? but vmovsh is more optimized on SPR. Bootstrapped and reg

[PATCH] Support vpermw/vpermi2w/vpermt2w instructions for vector HF/BFmodes.

2023-09-06 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready push to trunk. gcc/ChangeLog: * config/i386/sse.md (_vpermt2var3): New define_insn. (VHFBF_AVX512VL): New mode iterator. (VI2HFBF_AVX512VL): New mode iterator. --- gcc/config/i386/sse.md | 32

[PATCH] Remove constraint modifier % for fcmaddcph/fcmulcph since they're not commutative.

2023-09-07 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} on SPR. Ready push to trunk and backport to GCC13/GCC12. gcc/ChangeLog: PR target/111306 * config/i386/sse.md (int_comm): New int_attr. (fma__): Remove % for Complex conjugate operations since they're not

[PATCH] Remove constraint modifier % for fcmaddcph/fmaddcph/fcmulcph since they're not commutative.

2023-09-10 Thread liuhongt via Gcc-patches
Here's the patch I've committed. The patch also removes % for vfmaddcph. gcc/ChangeLog: PR target/111306 PR target/111335 * config/i386/sse.md (int_comm): New int_attr. (fma__): Remove % for Complex conjugate operations since they're not commutative.

[PATCH] Fix incorrect insn type to avoid ICE in memory attr auto-detection.

2022-11-07 Thread liuhongt via Gcc-patches
Memory attribute auto detection will check operand 2 for type sselog, and check operand 1 for type sselog1. For below 2 insns, there's no operand 2. Change type to sselog1. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/107540 * c

[PATCH 2/2] Enable hwasan for x86-64.

2022-11-10 Thread liuhongt via Gcc-patches
libsanitizer * configure.tgt: Enable hwasan for x86-64. --- libsanitizer/configure.tgt | 1 + 1 file changed, 1 insertion(+) diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt index 87d8a2c3820..72385a4a39d 100644 --- a/libsanitizer/configure.tgt +++ b/libsanitizer/confi

[PATCH 1/2] Implement hwasan target_hook.

2022-11-10 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/i386-opts.h (enum lam_type): New enum. * config/i386/i386.c (ix86_memtag_can_tag_addresses): New. (ix86_memtag_set_tag): Ditto. (ix86_memtag_extract_tag): Ditto. (ix86_memtag_add_tag): Ditto. (ix86_memtag_tag_size): Ditto

[PATCH 0/2] Support HWASAN with Intel LAM

2022-11-10 Thread liuhongt via Gcc-patches
}. Ok for trunk? liuhongt (2): Implement hwasan target_hook. Enable hwasan for x86-64. gcc/config/i386/i386-expand.cc | 12 gcc/config/i386/i386-options.cc | 3 + gcc/config/i386/i386-opts.h | 6 ++ gcc/config/i386/i386-protos.h | 2 + gcc/config/i386/i386.c

[PATCH] [x86] define builtins for "shared" avxneconvert-avx512bf16vl builtins.

2022-11-17 Thread liuhongt via Gcc-patches
This should fix the incorrect error when calling those builtins with -mavxneconvert and w/o -mavx512bf16 -mavx512vl. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/i386-builtins.cc (def_builtin): Handle "shared" avx512bf16vl-

[PATCH] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-04-19 Thread liuhongt via Gcc-patches
-Jakub's comments-- That said, these fundamental types whose presence/absence depends on ISA flags are quite problematic IMHO, as they are incompatible with the target attribute/pragmas. Whether they are available or not available depends on whether in this case SSE2 is enabled during c

[PATCH 1/2] Use NO_REGS in cost calculation when the preferred register class are not known yet.

2023-04-19 Thread liuhongt via Gcc-patches
/* If this insn loads a parameter from its stack slot, then it represents a savings, rather than a cost, if the parameter is stored in memory. Record this fact. Similarly if we're loading other constants from memory (constant pool, TOC references, sma

[PATCH 2/2] Adjust testcases after better RA decision.

2023-04-19 Thread liuhongt via Gcc-patches
After the optimization for RA, a memory op is no longer propagated into instructions (>1), so the testcases no longer generate vxorps: the memory is loaded into the dest, and the dest is never unused now. So rewrite the testcases to make the codegen more stable. gcc/testsuite/ChangeLog: * gcc.target/

[PATCH] Canonicalize vec_merge when mask is constant.

2023-04-19 Thread liuhongt via Gcc-patches
Use swap_commutative_operands_p for canonicalization. When both values have the same operand precedence value, the first bit in the mask should select the first operand. The canonicalization should help backends with pattern matching, i.e. the x86 backend has lots of vec_merge patterns, combine will create any f

[PATCH 2/2] [i386] def_or_undef __STDCPP_FLOAT16_T__ and __STDCPP_BFLOAT16_T__ for target attribute/pragmas.

2023-04-21 Thread liuhongt via Gcc-patches
> But for the C++23 macros, more importantly I think we really should > also in ix86_target_macros_internal add > if (c_dialect_cxx () > && cxx_dialect > cxx20 > && (isa_flag & OPTION_MASK_ISA_SSE2)) > { > def_or_undef (parse_in, "__STDCPP_FLOAT16_T__"); > def_or_undef

[PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-04-21 Thread liuhongt via Gcc-patches
> > + if (!TARGET_SSE2) > > +{ > > + if (c_dialect_cxx () > > + && cxx_dialect > cxx20) > > Formatting, both conditions are short, so just put them on one line. Changed. > But for the C++23 macros, more importantly I think we really should > also in ix86_target_macros_internal add

[PATCH] Add testcases for ffs/ctz vectorization.

2023-04-22 Thread liuhongt via Gcc-patches
Ready to push to trunk. gcc/testsuite/ChangeLog: PR tree-optimization/109011 * gcc.target/i386/pr109011-b1.c: New test. * gcc.target/i386/pr109011-b2.c: New test. * gcc.target/i386/pr109011-d1.c: New test. * gcc.target/i386/pr109011-d2.c: New test. * g

[PATCH] [vect]Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

2023-04-26 Thread liuhongt via Gcc-patches
Similar to WIDEN FLOAT_EXPR, when a direct optab does not exist, try an intermediate integer type whenever the gimple ranger can tell it's safe, i.e. when there's no direct optab for vector long long -> vector float, but the value range of the integer can be represented as int, try vector int -> vector floa
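A minimal sketch of the kind of source loop this snippet describes, under stated assumptions: the function name narrow_to_float and the 16-bit mask are hypothetical, chosen only so that the value range of each element provably fits in int. Whether a given GCC version and target actually vectorizes this loop through an intermediate vector int is not confirmed by the snippet.

```c
#include <stddef.h>

/* Hypothetical kernel: each element is long long, but the mask limits
   its value range to [0, 65535], which fits in int.  A long long ->
   float conversion with such a range is the case the snippet says can
   be done as int -> float instead. */
void narrow_to_float(float *dst, const long long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)(src[i] & 0xffff);
}
```

The scalar semantics are unchanged either way; the narrowing only matters for which vector conversion the compiler may pick.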

[PATCH v2] Canonicalize vec_merge when mask is constant.

2023-05-03 Thread liuhongt via Gcc-patches
Here's the updated patch with documentation in md.texi. Ok for trunk? -- Use swap_commutative_operands_p for canonicalization. When both values have the same operand precedence value, the first bit in the mask should select the first operand. The canonicalization should help backends for pattern match

[PATCH] [powerpc] Add a peephole2 to eliminate redundant move from VSX_REGS to GENERAL_REGS when it's from memory.

2023-05-03 Thread liuhongt via Gcc-patches
r14-172-g0368d169492017 uses NO_REGS instead of GENERAL_REGS in memory cost calculation when the preferred register class is unknown. + /* Costs for NO_REGS are used in cost calculation on the + 1st pass when the preferred register classes are not + known yet. In this case we take the

[PATCH V2] [vect]Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

2023-05-07 Thread liuhongt via Gcc-patches
> > @@ -4799,7 +4800,8 @@ vect_create_vectorized_demotion_stmts (vec_info > > *vinfo, vec *vec_oprnds, > >stmt_vec_info stmt_info, > >vec &vec_dsts, > >gimple_stmt_iterator *gsi,

[PATCH] Detect bswap + rotate for byte permutation in pass_bswap.

2023-05-09 Thread liuhongt via Gcc-patches
The patch doesn't handle: 1. cast64_to_32, 2. memory source with rsize < range. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR middle-end/108938 * gimple-ssa-store-merging.cc (is_bswap_or_nop_p): New function, cut from origin
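As an illustration of what "bswap + rotate" means for a byte permutation, here is a minimal sketch. The function names are hypothetical, and whether pass_bswap recognizes this exact source form is an assumption, not something the snippet confirms. It shows that swapping the two bytes inside each 16-bit half of a 32-bit word equals a 32-bit byte swap followed by a rotate by 16.

```c
#include <stdint.h>

/* Hypothetical open-coded permutation: swap the bytes inside each
   16-bit half, e.g. 0xAABBCCDD -> 0xBBAADDCC. */
uint32_t swap_bytes_in_halves(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}

/* The same permutation written as a full byte swap plus a rotate.
   __builtin_bswap32 is a GCC builtin. */
uint32_t bswap_then_rotate(uint32_t x)
{
    uint32_t y = __builtin_bswap32(x);   /* 0xAABBCCDD -> 0xDDCCBBAA */
    return (y << 16) | (y >> 16);        /* rotate by 16 bits */
}
```

Both functions return the same value for every input, which is the equivalence a bswap + rotate detection relies on.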

[PATCH] Change AVX512FP16 to AVX512-FP16 in the document.

2023-01-28 Thread liuhongt via Gcc-patches
The official name is AVX512-FP16. Ready to push to trunk. gcc/ChangeLog: * config/i386/i386.opt: Change AVX512FP16 to AVX512-FP16. * doc/invoke.texi: Ditto. --- gcc/config/i386/i386.opt | 2 +- gcc/doc/invoke.texi | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-)

[PATCH] Change AVX512FP16 to AVX512-FP16 which is official name.

2023-01-28 Thread liuhongt via Gcc-patches
Ready to push to trunk. --- htdocs/gcc-12/changes.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index 30fa4d6e..49055ffe 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -754,7 +754,7 @@
