[PATCH] Allocate general register(memory/immediate) for 16/32/64-bit vector bit_op patterns.

2022-07-10 Thread liuhongt via Gcc-patches
And split it to GPR-version instruction after reload. This will enable the optimization below for 16/32/64-bit vector bit_op:
-       movd    (%rdi), %xmm0
-       movd    (%rsi), %xmm1
-       pand    %xmm1, %xmm0
-       movd    %xmm0, (%rdi)
+       movl    (%rsi), %eax
+       andl    %eax, (%rdi)
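For readers unfamiliar with sub-128-bit vector bit operations, the following is a minimal sketch of source code that performs a 32-bit vector AND through memory. The type name `v4qi` and the function name `and_v4qi` are illustrative assumptions, not taken from the patch, and whether a given compiler version emits the GPR sequence depends on its flags and target.

```c
#include <string.h>

/* Four 8-bit lanes packed into 32 bits (GCC/Clang vector extension).
   The typedef name is an assumption chosen for this sketch. */
typedef unsigned char v4qi __attribute__((vector_size(4)));

/* AND the 32-bit vector stored at src into the one stored at dst. */
void and_v4qi(void *dst, const void *src)
{
    v4qi a, b;
    memcpy(&a, dst, sizeof a);   /* load both 32-bit vectors */
    memcpy(&b, src, sizeof b);
    a &= b;                      /* lane-wise bitwise AND */
    memcpy(dst, &a, sizeof a);   /* store the result back */
}
```

Since a bitwise AND has no cross-lane effects, the whole operation is equivalent to a single 32-bit integer AND, which is why a general-register form is possible at all.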

[PATCH] [RFC]Support vectorization for Complex type.

2022-07-10 Thread liuhongt via Gcc-patches
The patch only handles load/store (including ctor/permutation, except gather/scatter) for complex type; other operations don't need to be handled since they will be lowered by pass cplxlower. (MASK_LOAD is not supported for complex type, so there is no need to handle it either.) Instead of support vector(2) _C
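As an illustration of the kind of loop this concerns, here is a minimal sketch that only loads and stores complex values, with no complex arithmetic. The function name `copy_complex` is an assumption made for this example; it is not one of the patch's testcases.

```c
#include <complex.h>

/* Copy n complex doubles. The loop body is a pure load/store of a
   complex value, so nothing here needs lowering of complex arithmetic. */
void copy_complex(double complex *restrict dst,
                  const double complex *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}
```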

[PATCH] Extend 64-bit vector bit_op patterns with ?r alternative

2022-07-13 Thread liuhongt via Gcc-patches
And split it to GPR-version instruction after reload. > ?r was introduced under the assumption that we want vector values > mostly in vector registers. Currently there are no instructions with > memory or immediate operand, so that made sense at the time. Let's > keep ?r until logic instructions w

[PATCH] Extend 16/32-bit vector bit_op patterns with (m, 0, i)(vertical) alternative.

2022-07-17 Thread liuhongt via Gcc-patches
And split it after reload. >IMO, the only case it is worth adding is a direct immediate store to >memory, which HJ recently added. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/106038 * config/i386/mmx.md (3): Extend to AND mem,

[PATCH V2] [RFC]Support vectorization for Complex type.

2022-07-17 Thread liuhongt via Gcc-patches
V2 update: Handle VMAT_ELEMENTWISE, VMAT_CONTIGUOUS_PERMUTE, VMAT_STRIDED_SLP, VMAT_CONTIGUOUS_REVERSE, VMAT_CONTIGUOUS_DOWN for complex type. I've run SPECspeed@2017 627.cam4_s; there are some vectorization cases, but no big performance impact (since this patch only handles load/store). Any co

[PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-18 Thread liuhongt via Gcc-patches
And split it after reload. > You will need ix86_binary_operator_ok insn constraint here with > corresponding expander using ix86_fixup_binary_operands_no_copy to > prepare insn operands. Split define_expand with just register_operand, and allow memory/immediate in define_insn, assume combine/forwp

[PATCH] Move pass_cse_sincos after vectorizer.

2022-07-19 Thread liuhongt via Gcc-patches
__builtin_cexpi can't be vectorized since there's a gap between it and the vectorized sincos version (in libmvec, it takes a double and two double pointers and returns nothing). Some vectorization opportunities are also lost if sin & cos are optimized to cexpi before the vectorizer. I'm trying to add vect_r

gcc-patches@gcc.gnu.org

2022-07-19 Thread liuhongt via Gcc-patches
> My original comments still stand (it feels like this should be more generic). > Can we go the way lowering complex loads/stores first?  A large part > of the testcases > added by the patch should pass after that. This is the patch as suggested, one additional change is handling COMPLEX_CST for r

[PATCH V3] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-20 Thread liuhongt via Gcc-patches
And split it after reload. gcc/ChangeLog: PR target/106038 * config/i386/mmx.md (3): New define_expand, it's original "3". (*3): New define_insn, it's original "3" be extended to handle memory and immediate operand with ix86_binary_operator_ok. Also

[PATCH] Adjust testcase.

2022-07-21 Thread liuhongt via Gcc-patches
r13-1762-gf9d4c3b45c5ed5f45c8089c990dbd4e181929c3d lowers complex type moves to scalars, but testcase pr23911 is supposed to scan for a __complex__ constant which is never available, so adjust the testcase to scan IMAGPART/REALPART_EXPR constants separately. Pushed as an obvious patch. gcc/testsuite/ChangeLog

[RFC: PATCH] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.

2022-08-03 Thread liuhongt via Gcc-patches
For neg, the patch creates a vec_init as [ a, -a, a, -a, ... ] and no vec_step is needed to update the vectorized iv since vf is always a multiple of 2 (negative * negative is positive). For shift, the patch creates a vec_init as [ a, a >> c, a >> 2*c, ..] and a vec_step as [ c * nunits, c * nunits, c * nuni
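To make the term "nonlinear induction" concrete, the scalar loops below are a minimal sketch of the two patterns described: an induction variable updated by negation, and one updated by a right shift with a loop-invariant amount. The function names `fill_neg` and `fill_shr` are assumptions for this example and do not come from the patch or its testsuite.

```c
/* Negation induction: out = { a, -a, a, -a, ... }.
   The update x = -x is not of the linear form x += step. */
void fill_neg(int *out, int n, int a)
{
    int x = a;
    for (int i = 0; i < n; i++) {
        out[i] = x;
        x = -x;
    }
}

/* Shift induction: out = { a, a >> c, a >> 2*c, ... }.
   Assumes c is smaller than the bit width of unsigned. */
void fill_shr(unsigned *out, int n, unsigned a, unsigned c)
{
    unsigned x = a;
    for (int i = 0; i < n; i++) {
        out[i] = x;
        x >>= c;
    }
}
```

In both loops the value stored in iteration i depends only on a, c, and i, which is what lets a vectorizer precompute an initial vector and, for the shift case, a per-iteration step.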

[PATCH] Fix ICE in rtl check when bootstrap.

2023-08-07 Thread liuhongt via Gcc-patches
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20230806/work/gcc-14-20230806/libgfortran/generated/matmul_i1.c: In function ‘matmul_i1_avx512f’: /var/tmp/portage/sys-devel/gcc-14.0.0_pre20230806/work/gcc-14-20230806/libgfortran/generated/matmul_i1.c:1781:1: internal compiler error: RTL check: expected

[PATCH] i386: Clear upper bits of XMM register for V4HFmode/V2HFmode operations [PR110762]

2023-08-07 Thread liuhongt via Gcc-patches
Similar to r14-2786-gade30fad6669e5, but this patch is for V4HF/V2HFmode. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/110762 * config/i386/mmx.md (3): Changed from define_insn to define_expand and break into .. (v4

[PATCH] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread liuhongt via Gcc-patches
Don't access leaf 7 subleaf 1 unless subleaf 0 says it is supported via EAX. Intel documentation says invalid subleaves return 0. We had been relying on that behavior instead of checking the max subleaf number. It appears that some Sandy Bridge CPUs return at least the subleaf 0 EDX value for subl
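The guard described here can be sketched as follows. This is a portable illustration rather than the code in cpuinfo.h: the `cpuid_regs` struct, the `cpuid_fn` callback type, and both function names are assumptions made so the logic can be exercised without executing a real CPUID instruction. It also assumes the caller has already confirmed that leaf 7 itself is available.

```c
#include <stdint.h>

/* Register values returned by one CPUID query (sketch type). */
struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Callback standing in for the CPUID instruction (hypothetical). */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf, uint32_t subleaf);

/* Query leaf 7 subleaf 1 only if subleaf 0 reports, via EAX, that
   subleaf 1 exists; otherwise report all-zero feature bits. */
struct cpuid_regs read_leaf7_subleaf1(cpuid_fn cpuid)
{
    struct cpuid_regs none = {0, 0, 0, 0};
    uint32_t max_subleaf_level = cpuid(7, 0).eax;

    if (max_subleaf_level < 1)
        return none;
    return cpuid(7, 1);
}

/* Model of the misbehaving CPU: subleaf 0 reports a maximum subleaf
   of 0, yet subleaf 1 echoes the subleaf 0 EDX value instead of 0. */
struct cpuid_regs fake_buggy_cpuid(uint32_t leaf, uint32_t subleaf)
{
    struct cpuid_regs r = {0, 0, 0, 0};
    (void)subleaf;
    if (leaf == 7)
        r.edx = 0x10000000u;  /* arbitrary nonzero bits for the demo */
    return r;
}
```

With the guard in place, `read_leaf7_subleaf1(fake_buggy_cpuid)` yields zeroed registers even though a direct subleaf 1 query on the modeled CPU would return stale EDX bits.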

[PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread liuhongt via Gcc-patches
> Please rather do it in a more self-descriptive way, as proposed in the > attached patch. You won't need a comment then. > Adjusted in V2 patch. Don't access leaf 7 subleaf 1 unless subleaf 0 says it is supported via EAX. Intel documentation says invalid subleaves return 0. We had been relying

[PATCH] Rename local variable subleaf_level to max_subleaf_level.

2023-08-08 Thread liuhongt via Gcc-patches
This minor fix is preapproved in [1]. Committed to trunk. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626758.html gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Rename local variable subleaf_level to max_subleaf_level. --- gcc/common/config

[PATCH] i386: Do not sanitize upper part of V2HFmode and V4HFmode reg with -fno-trapping-math [PR110832]

2023-08-09 Thread liuhongt via Gcc-patches
Also add ix86_partial_vec_fp_math to the condition of V2HF/V4HF named patterns in order to avoid generation of partial vector V8HFmode trapping instructions. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} Ok for trunk? gcc/ChangeLog: PR target/110832 * config/i386/mmx.md

[PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread liuhongt via Gcc-patches
Currently we have 3 different independent tunes for gather, "use_gather,use_gather_2parts,use_gather_4parts"; similarly for scatter, there are "use_scatter,use_scatter_2parts,use_scatter_4parts". The patch supports 2 standardizing options to enable/disable vectorization for all gather/scatter instructio

[PATCH] Software mitigation: Disable gather generation in vectorization for GDS affected Intel Processors.

2023-08-10 Thread liuhongt via Gcc-patches
For more details of GDS (Gather Data Sampling), refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html After the microcode update, there's a performance regression. To avoid that, the patch disables gather gene

[PATCH V2] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions

2023-08-10 Thread liuhongt via Gcc-patches
Rename original use_gather to use_gather_8parts, Support -mtune-ctrl={,^}use_gather to set/clear tune features use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather as alias of -mtune-ctrl=, use_gather, ^use_gather. Similar for use_scatter. How about this version? gcc/ChangeLog:

[PATCH] Generate vmovapd instead of vmovsd for moving DFmode between SSE_REGS.

2023-08-13 Thread liuhongt via Gcc-patches
vmovapd can enable register renaming and has the same code size as vmovsd. Similarly for vmovsh vs vmovaps: vmovaps is 1 byte shorter than vmovsh. When TARGET_AVX512VL is not available, still generate vmovsd/vmovss/vmovsh to avoid vmovapd/vmovaps zmm16-31. Bootstrapped and regtested on x86_64-pc-linux-gn

[PATCH] Support -march=gracemont

2023-08-17 Thread liuhongt via Gcc-patches
Alderlake-N is E-core only, add it as an alias of Alderlake. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Any comments? gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Detect Alderlake-N. * common/config/i386/i386-common.cc (alias_table): Suppo

[PATCH] Mention Intel -march=gracemont for Alderlake-N.

2023-08-20 Thread liuhongt via Gcc-patches
--- htdocs/gcc-14/changes.html | 4 1 file changed, 4 insertions(+) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index eae25f1a..2c888660 100644 --- a/htdocs/gcc-14/changes.html +++ b/htdocs/gcc-14/changes.html @@ -151,6 +151,10 @@ a work-in-progress. -march=luna

[PATCH] Adjust testcase for Intel GDS.

2023-08-21 Thread liuhongt via Gcc-patches
gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-pr88464-2.c: Add -mgather to options. * gcc.target/i386/avx512f-pr88464-3.c: Ditto. * gcc.target/i386/avx512f-pr88464-4.c: Ditto. * gcc.target/i386/avx512f-pr88464-6.c: Ditto. * gcc.target/i386/avx51

[PATCH] [x86] Testcase fix.

2023-08-21 Thread liuhongt via Gcc-patches
Committed as an obvious fix. gcc/testsuite/ChangeLog: * gcc.target/i386/invariant-ternlog-1.c: Only scan %rdx under TARGET_64BIT. --- gcc/testsuite/gcc.target/i386/invariant-ternlog-1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/i386

[PATCH] [vect]Use intermediate integer type for float_expr/fix_trunc_expr when direct optab does not exist.

2023-06-20 Thread liuhongt via Gcc-patches
I notice there's been some refactoring in vectorizable_conversion for code_helper, so I've adjusted my patch to that. Here's the patch I'm going to commit. We already use an intermediate type in case WIDEN, but not for NONE; this patch extends that. gcc/ChangeLog: PR target/110018 * tre

[PATCH] Refine maskloadmn pattern with UNSPEC_MASKLOAD.

2023-06-20 Thread liuhongt via Gcc-patches
If mem_addr points to a memory region with less than whole vector size bytes of accessible memory and k is a mask that would prevent reading the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent it from being transformed to vpblendd. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32

[PATCH 2/3] Don't use intermediate type for FIX_TRUNC_EXPR when ftrapping-math.

2023-06-25 Thread liuhongt via Gcc-patches
> > Hmm, good question. GENERIC has a direct truncation to unsigned char > > for example, the C standard generally says if the integral part cannot > > be represented then the behavior is undefined. So I think we should be > > safe here (0x1.0p32 doesn't fit an int). > > We should be following An

[PATCH 3/3] [aarch64] Adjust testcase to match assembly output after r14-2007.

2023-06-25 Thread liuhongt via Gcc-patches
The new assembly looks better than original one, so I adjust those testcases. Ok for trunk? gcc/testsuite/ChangeLog: PR tree-optimization/110371 PR tree-optimization/110018 * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Scan scvt + sxtw instead of scvt + zip1 + z

[PATCH 1/3] Use cvt_op to save intermediate type operand instead of "subtle" vec_dest.

2023-06-25 Thread liuhongt via Gcc-patches
When there are multiple operands in vec_oprnds0, vec_dest will be overwritten to vectype_out, but in multi_step_cvt case, cvt_type is expected. It caused an ICE when verify_gimple_in_cfg. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and aarch64-linux-gnu. Ok for trunk? gcc/ChangeLog:

[PATCH] Issue a warning for conversion between short and __bf16 under TARGET_AVX512BF16.

2023-06-26 Thread liuhongt via Gcc-patches
__bfloat16 is redefined from typedef short to real __bf16 since GCC V13. The patch issues a warning for potential silent implicit conversion between __bf16 and short where users may only expect a data movement. To avoid too many false positives, the warning is only issued under TARGET_AVX512BF16. Bootstrapp

[PATCH] [x86] Refine maskstore patterns with UNSPEC_MASKMOV.

2023-06-26 Thread liuhongt via Gcc-patches
At the rtl level, we cannot guarantee that the maskstore is not optimized to other full-memory accesses, as the current implementations are equivalent in terms of pattern. To solve this potential problem, this patch refines the pattern of the maskstore and the intrinsics with unspec. One thing I'm

[PATCH 2/2] Make option mvzeroupper independent of optimization level.

2023-06-26 Thread liuhongt via Gcc-patches
pass_insert_vzeroupper is under condition TARGET_AVX && TARGET_VZEROUPPER && flag_expensive_optimizations && !optimize_size. But the documentation of mvzeroupper doesn't mention that the insertion requires -O2 and above, which may confuse users when they explicitly use -Os -mvzeroupper. mvzeroupp

[PATCH 1/2] Don't issue vzeroupper for vzeroupper call_insn.

2023-06-26 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/82735 * config/i386/i386.cc (ix86_avx_u127_mode_needed): Don't emit vzeroupper for vzeroupper call_insn. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-vzeroupper-30.

[PATCH] Break false dependence for vpternlog by inserting vpxor.

2023-07-03 Thread liuhongt via Gcc-patches
vpternlog is also used for optimization which doesn't need any valid input operand, in that case, the destination is used as input in the instruction and that creates a false dependence. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: PR t

[PATCH] Disparage slightly for the alternative which move DFmode between SSE_REGS and GENERAL_REGS.

2023-07-05 Thread liuhongt via Gcc-patches
For testcase

void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}

GCC-14 with -O2 and -march=x86-64 options generates the following code:

__cond_swap(double*, double*):
        movsd xmm1, QWORD

[PATCH 2/2] Adjust rtx_cost for DF/SFmode AND/IOR/XOR/ANDN operations.

2023-07-05 Thread liuhongt via Gcc-patches
They should have the same cost as vector mode since both generate pand/pandn/pxor/por instructions. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for DF/SFmode AND/IOR/XOR/ANDN operations.

[PATCH 1/2] [x86] Add pre_reload splitter to detect fp min/max pattern.

2023-07-05 Thread liuhongt via Gcc-patches
We have ix86_expand_sse_fp_minmax to detect min/max semantics, but it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false. For the testcase in the PR, there's an extra move from cmp_op0 to if_true, and it failed ix86_expand_sse_fp_minmax. This patch adds pre_reload splitter to detect the

[PATCH V2] [x86] Add pre_reload splitter to detect fp min/max pattern.

2023-07-06 Thread liuhongt via Gcc-patches
> Please split the above pattern into two, one emitting UNSPEC_IEEE_MAX > and the other emitting UNSPEC_IEEE_MIN. Split. > The test involves blendv instruction, which is SSE4.1, so it is > pointless to test it without -msse4.1. Please add -msse4.1 instead of > -march=x86_64 and use sse4_runtime

[PATCH] Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'

2023-07-09 Thread liuhongt via Gcc-patches
False dependency happens when destination is only updated by pternlog. There is no false dependency when destination is also used in source. So either a pxor should be inserted, or input operand should be set with constraint '0'. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to p

[PATCH] Add peephole to eliminate redundant comparison after cmpccxadd.

2023-07-10 Thread liuhongt via Gcc-patches
Similar to what we did for cmpxchg, but extended to all ix86_comparison_int_operator since cmpccxadd sets EFLAGS exactly the same as CMP. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, Ok for trunk? gcc/ChangeLog: PR target/110591 * config/i386/sync.md (cmpccxadd_): Add a new

[PATCH v2] Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'

2023-07-10 Thread liuhongt via Gcc-patches
Here's updated patch. 1. use optimize_insn_for_speed_p instead of using optimize_function_for_speed_p. 2. explicitly move memory to dest register to avoid false dependence in one_cmpl pattern. False dependency happens when destination is only updated by pternlog. There is no false dependency whe

[PATCH] Add peephole to eliminate redundant comparison after cmpccxadd.

2023-07-11 Thread liuhongt via Gcc-patches
Similar to what we did for CMPXCHG, but extended to all ix86_comparison_int_operator since CMPCCXADD sets EFLAGS exactly the same as CMP. When the operand order in the CMP insn is the same as that in CMPCCXADD, the CMP insn can be eliminated directly. When operand order is swapped in CMP insn, only optimize cmpccxadd +

[PATCH] Fix typo in the testcase.

2023-07-11 Thread liuhongt via Gcc-patches
Antony Polukhin 2023-07-11 09:51:58 UTC There's a typo at https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87 It should be `|| !test3() || !test3r()` rather than `|| !te

[PATCH] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2023-05-10 Thread liuhongt via Gcc-patches
> The quoted patch shows -shared in context and you didn't post a > backport version > to look at. But yes, we shouldn't change -shared behavior on a > branch, even less so make it > inconsistent between targets. Here's the patch. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for

[PATCH] Provide -fcf-protection=branch,return.

2023-05-11 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/89701 * common.opt: Refactor -fcf-protection= to support combination of param. * lto-wrapper.c (merge_and_complain): Adjusted. * opts.c (parse_cf_protection_opt

[PATCH V2] Provide -fcf-protection=branch,return.

2023-05-13 Thread liuhongt via Gcc-patches
> I think this could be simplified if you use either EnumSet or > EnumBitSet instead in common.opt for `-fcf-protection=`. Use EnumSet instead of EnumBitSet since CF_FULL is not power of 2. It is a bit tricky for sets classification, cf_branch and cf_return should be in different sets, but they bo
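The remark that CF_FULL is not a power of 2 can be illustrated with a small bitmask sketch. The enum name `cf_protection_sketch`, the numeric values, and the function `parse_cf_protection` are assumptions made for this example; this is not GCC's option-handling code, only a model of how a comma-separated list could combine into a mask.

```c
#include <string.h>

/* Illustrative flag values (assumed for this sketch). */
enum cf_protection_sketch {
    CF_NONE   = 0,
    CF_BRANCH = 1 << 0,
    CF_RETURN = 1 << 1,
    CF_FULL   = CF_BRANCH | CF_RETURN  /* 3: two bits set, not a power of 2 */
};

/* Parse a comma-separated list such as "branch,return" into a mask.
   Returns -1 for an unknown or empty keyword. */
int parse_cf_protection(const char *arg)
{
    int mask = CF_NONE;

    while (*arg) {
        size_t len = strcspn(arg, ",");  /* length of the current keyword */

        if (len == 6 && strncmp(arg, "branch", 6) == 0)
            mask |= CF_BRANCH;
        else if (len == 6 && strncmp(arg, "return", 6) == 0)
            mask |= CF_RETURN;
        else if (len == 4 && strncmp(arg, "full", 4) == 0)
            mask |= CF_FULL;
        else if (len == 4 && strncmp(arg, "none", 4) == 0)
            mask = CF_NONE;
        else
            return -1;

        arg += len;
        if (*arg == ',')
            arg++;                       /* skip the separator */
    }
    return mask;
}
```

Because "full" overlaps both single-bit values, it cannot be treated as just another independent bit, which is the classification difficulty the message alludes to.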

[PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-16 Thread liuhongt via Gcc-patches
r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost calculation when the preferred register class is not known yet. It regressed powerpc PR109610 and PR109858; it looks too aggressive to use NO_REGS when mode can be allocated with GENERAL_REGS. The patch takes a step back, still use

[PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABS_EXPR.

2023-05-22 Thread liuhongt via Gcc-patches
Also for 64-bit vector abs intrinsics _mm_abs_{pi8,pi16,pi32}. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/109900 * config/i386/i386.cc (ix86_gimple_fold_builtin): Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} and

[PATCH] [x86] Split notl + pbroadcast + pand to pbroadcast + pandn more modes.

2023-05-25 Thread liuhongt via Gcc-patches
r12-5595-gc39d77f252e895306ef88c1efb3eff04e4232554 adds 2 splitters to transform notl + pbroadcast + pand to pbroadcast + pandn for VI124_AVX2, which leaves out all DI-element-size ones as well as all 512-bit ones. This patch extends the splitter to VI_AVX2 which will handle DImode for AVX2, and V64QI

[PATCH] Disable avoid_false_dep_for_bmi for atom and icelake(and later) core processors.

2023-05-25 Thread liuhongt via Gcc-patches
lzcnt/tzcnt has been fixed since skylake, popcnt has been fixed since icelake. At least for icelake and later intel Core processors, the errata tune is not needed. And the tune isn't needed for ATOM either. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/Chang

[PATCH] Support cond_add/sub/mul/div for vector float/double.

2021-08-01 Thread liuhongt via Gcc-patches
Hi: This patch supports cond_add/sub/mul/div expanders for vector float/double. There are still cond_fma/fms/fnms/fma/max/min/xor/ior/and left, for which I failed to come up with a testcase to validate them. Also cond_add/sub/mul for vector integer. Bootstrap is ok, survive the regression test on
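A conditional operation of the kind a cond_add expander targets can be sketched as a scalar loop in which the addition is applied only where a per-element condition holds, with the first addend passed through otherwise. The function name `cond_add_sketch` and the particular condition are assumptions for this illustration, not the patch's testcase.

```c
/* dst[i] = a[i] + b[i] where c[i] > 0, otherwise a[i] unchanged.
   A vectorizer can express this as one masked add per vector. */
void cond_add_sketch(double *restrict dst, const double *restrict a,
                     const double *restrict b, const double *restrict c,
                     int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (c[i] > 0.0) ? a[i] + b[i] : a[i];
}
```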

[PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-08-01 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode. * config/i386/i386.c (enum x86_64_reg_class): Add X86_64_SSEHF_CLASS. (merge_classes): Handle X86_64_SSEHF_CLASS. (examine_argument): Ditto. (construct_container): Ditto.

[PATCH V3 0/6] Initial support for AVX512FP16

2021-08-01 Thread liuhongt via Gcc-patches
AVX512FP16 feature and scalar _Float16 instructions. liuhongt (5): Update hf soft-fp from glibc. [i386] Enable _Float16 type for TARGET_SSE2 and above. [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations. Support -fexcess-precision=16 which will enable

[PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.

2021-08-01 Thread liuhongt via Gcc-patches
gcc/ada/ChangeLog: * gcc-interface/misc.c (gnat_post_options): Issue an error for -fexcess-precision=16. gcc/c-family/ChangeLog: * c-common.c (excess_precision_mode_join): Update below comments. (c_ts18661_flt_eval_method): Set excess_precision_type to EXC

[PATCH 1/6] Update hf soft-fp from glibc.

2021-08-01 Thread liuhongt via Gcc-patches
libgcc/ChangeLog * soft-fp/eqhf2.c: New file. * soft-fp/extendhfdf2.c: New file. * soft-fp/extendhfsf2.c: New file. * soft-fp/extendhfxf2.c: New file. * soft-fp/half.h (FP_CMP_EQ_H): New macro. * soft-fp/truncdfhf2.c: New file * soft-fp/trunc

[PATCH 3/6] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.

2021-08-01 Thread liuhongt via Gcc-patches
libgcc/ChangeLog: * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro. * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto. * config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto. * config/i386/t-softfp: Add hf soft-fp. * config.host: Add i386/64/t-softf

[PATCH 6/6] AVX512FP16: Support vector init/broadcast/set/extract for FP16.

2021-08-01 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic. (_mm256_set_ph): Likewise. (_mm512_set_ph): Likewise. (_mm_setr_ph): Likewise. (_mm256_setr_ph): Likewise. (_mm512_setr_ph): Likewise. (_mm_set1_ph): Likewise.

[PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.

2021-08-01 Thread liuhongt via Gcc-patches
From: "Guo, Xuepeng" gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect FEATURE_AVX512FP16. * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512FP16_SET, OPTION_MASK_ISA_AVX512FP16_UNSET, OPTION_MASK_ISA2_AVX512FP1

[PATCH] Add cond_add/sub/mul for vector integer modes.

2021-08-02 Thread liuhongt via Gcc-patches
Hi: This is a follow up of [1]. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Pushed to trunk. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576514.html gcc/ChangeLog: * config/i386/sse.md (cond_): New expander. (cond_mul): Ditto. gcc/testsuite/ChangeLo

[PATCH] [i386] Refine predicate of peephole2 to general_reg_operand. [PR target/101743]

2021-08-03 Thread liuhongt via Gcc-patches
Hi: The define_peephole2 which is added by r12-2640-gf7bf03cf69ccb7dc should only work on general registers, considering that x86 also supports mov instructions between gpr, sse reg, mask reg, limiting the peephole2 predicate to general_reg_operand. I failed to construct a testcase, but I believ

[PATCH] [i386] Support cond_{fma, fms, fnma, fnms} for vector float/double under AVX512.

2021-08-03 Thread liuhongt via Gcc-patches
Hi: This patch adds expanders cond_{fma,fms,fnma,fnms} for vector float/double modes. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Pushed to trunk. gcc/ChangeLog: * config/i386/sse.md (cond_fma): New expander. (cond_fms): Ditto. (cond_fnma): Ditto.

[PATCH] Add dg-require-effective-target for testcases.

2021-08-03 Thread liuhongt via Gcc-patches
Hi: Pushed to trunk as an obvious fix. gcc/testsuite/ChangeLog: * gcc.target/i386/cond_op_addsubmul_d-2.c: Add dg-require-effective-target for avx512. * gcc.target/i386/cond_op_addsubmul_q-2.c: Ditto. * gcc.target/i386/cond_op_addsubmul_w-2.c: Ditto. * gc

[PATCH 0/3] [i386] Support cond_{smax, smin, umax, umin, xor, ior, and} for vector modes under AVX512

2021-08-04 Thread liuhongt via Gcc-patches
Hi: Together with the previous 3 patches, all cond_op expanders of vector modes are supported (if they have a corresponding avx512 mask instruction). Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. liuhongt (3): [i386] Support cond_{smax,smin,umax,umin} for vector integer modes

[PATCH 1/3] [i386] Support cond_{smax, smin, umax, umin} for vector integer modes under AVX512.

2021-08-04 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/sse.md (cond_): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/cond_op_maxmin_b-1.c: New test. * gcc.target/i386/cond_op_maxmin_b-2.c: New test. * gcc.target/i386/cond_op_maxmin_d-1.c: New test. * gcc.target/i386/cond

[PATCH 3/3] [i386] Support cond_{xor, ior, and} for vector integer mode under AVX512.

2021-08-04 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/sse.md (cond_): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/cond_op_anylogic_d-1.c: New test. * gcc.target/i386/cond_op_anylogic_d-2.c: New test. * gcc.target/i386/cond_op_anylogic_q-1.c: New test. * gcc.target/i38

[PATCH 2/3] [i386] Support cond_{smax, smin} for vector float/double modes under AVX512.

2021-08-04 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * config/i386/sse.md (cond_): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/cond_op_maxmin_double-1.c: New test. * gcc.target/i386/cond_op_maxmin_double-2.c: New test. * gcc.target/i386/cond_op_maxmin_float-1.c: New test. * gcc.ta

[PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-05 Thread liuhongt via Gcc-patches
Hi: --- OK, I think sth is amiss here upthread. insv/extv do look like they are designed to work on integer modes (but docs do not say anything about this here). In fact the caller of extract_bit_field_using_extv is named extract_integral_bit_field. Of course nothing seems to check what kind of m

[PATCH] [rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element.

2021-08-06 Thread liuhongt via Gcc-patches
Hi: Bootstrapped and regtested on x86_64-linux-gnu{-m32,} Ok for trunk? gcc/ChangeLog: PR rtl-optimization/101796 * simplify-rtx.c (simplify_context::simplify_binary_operation_1): Simplify vector shift/rotate with const_vec_duplicate to vector shift/rot

[PATCH] [i386] Support cond_ashr/lshr/ashl for vector integer modes under AVX512.

2021-08-09 Thread liuhongt via Gcc-patches
Hi: Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. gcc/ChangeLog: * config/i386/sse.md (cond_): New expander. (VI248_AVX512VLBW): New mode iterator. * config/i386/predicates.md (nonimmediate_or_const_vec_dup_operand): New predicate. gcc/testsuite/ChangeLo

[PATCH] Extend ldexp{s, d}f3 to vscalefs{s, d} when TARGET_AVX512F and TARGET_SSE_MATH.

2021-08-10 Thread liuhongt via Gcc-patches
Hi: AVX512F supported vscalefs{s,d} which is the same as ldexp except the second operand should be floating point. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. gcc/ChangeLog: PR target/98309 * config/i386/i386.md (ldexp3): Extend to vscalefs[sd] when TARGET_

[PATCH] [i386] Combine avx_vec_concatv16si and avx512f_zero_extendv16hiv16si2_1 to avx512f_zero_extendv16hiv16si2_2.

2021-08-10 Thread liuhongt via Gcc-patches
Hi: Add define_insn_and_split to combine avx_vec_concatv16si/2 and avx512f_zero_extendv16hiv16si2_1 since the latter already zero_extend the upper bits, similar for other patterns which are related to pmovzx{bw,wd,dq}. It will do optimization like - vmovdqa %ymm0, %ymm0# 7 [c=4 l=

[PATCH] [i386] Introduce a scalar version of avx512f_vmscalef and adjust ldexp3 for it.

2021-08-11 Thread liuhongt via Gcc-patches
Hi: This is the patch I'm going to check in. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}; 2021-08-12 Uros Bizjak gcc/ChangeLog: PR target/98309 * config/i386/i386.md (avx512f_scalef2): New define_insn. (ldexp3): Adjust for new define_insn.

[PATCH] [i386] Optimize vec_perm_expr to match vpmov{dw,qd,wb}.

2021-08-11 Thread liuhongt via Gcc-patches
Hi: This is another patch to optimize vec_perm_expr to match vpmov{dw,qd,wb} under AVX512. For scenarios (like pr101846-2.c) where the upper half is not used, this patch generates better code with only one vpmov{wb,dw,qd} instruction. For scenarios (like pr101846-3.c) where the upper half is actu

[PATCH] [i386] Optimize __builtin_shuffle_vector.

2021-08-15 Thread liuhongt via Gcc-patches
Hi: Here's the updated patch which does 3 things: 1. Support vpermw/vpermb in ix86_expand_vec_one_operand_perm_avx512. 2. Support 256/128-bits vpermi2b in ix86_expand_vec_perm_vpermt2. 3. Add define_insn_and_split to optimize specific vector permutation to vpmov{dw,wb,qd}. Bootstrapped and regtes

[PATCH] [i386] Fix ICE.

2021-08-16 Thread liuhongt via Gcc-patches
Hi: avx512f_scalef2 only accepts register_operand for operands[1], so force it to reg in ldexp3. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for trunk. gcc/ChangeLog: PR target/101930 * config/i386/i386.md (ldexp3): Force operands[1] to reg. gcc/testsuite

[PATCH] [i386] Add x86 tune to enable v2df vector reduction by haddpd.

2021-08-17 Thread liuhongt via Gcc-patches
Hi: This patch adds a new x86 tune named X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD to enable haddpd for v2df vector reduction; the tune is disabled by default. Bootstrapped and regtested on x86_64-linux-gnu{-m32,} Ok for trunk? gcc/ChangeLog: PR target/97147 * config/i386/i386.h

[PATCH] Revert "Add the member integer_to_sse to processor_cost as a cost simulation for movd/pinsrd. It will be used to calculate the cost of vec_construct."

2021-08-17 Thread liuhongt via Gcc-patches
This reverts commit 872da9a6f664a06d73c987aa0cb2e5b830158a10. PR target/101936 PR target/101929 Bootstrapped and regtested on x86_64-linux-gnu{-m32,} Pushed to master. --- gcc/config/i386/i386.c | 6 +- gcc/config/i386/i386.h | 1 - gcc/config/i386/x8

[PATCH] Disable slp in loop vectorizer when cost model is very-cheap.

2021-08-22 Thread liuhongt via Gcc-patches
Performance impact for the commit with option: -march=x86-64 -O2 -ftree-vectorize -fvect-cost-model=very-cheap

SPEC2017 fprate
503.bwaves_r       BuildSame
507.cactuBSSN_r    -0.04
508.namd_r         0.14
510.parest_r       -0.54
511.povray_r       0.10
519.lbm_r          B

[PATCH] [i386] Fix ICE.

2021-08-23 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Pushed to trunk. gcc/ChangeLog: PR target/102016 * config/i386/sse.md (*avx512f_pshufb_truncv8hiv8qi_1): Add TARGET_AVX512BW to condition. gcc/testsuite/ChangeLog: PR target/102016 * gcc.target/i3

[PATCH] [i386] Optimize (a & b) | (c & ~b) to vpternlog instruction.

2021-08-23 Thread liuhongt via Gcc-patches
Also optimize the 3 forms below to vpternlog, where op1, op2 and op3 are register_operand or unary_p as (not reg): A: (any_logic (any_logic op1 op2) op3) B: (any_logic (any_logic op1 op2) (any_logic op3 op4)) op3/op4 should be equal to op1/op2 C: (any_logic (any_logic (any_logic:op1 op2) op3) op4) op3/op4 shoul
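
For the (a & b) | (c & ~b) form named in the subject, the 8-bit immediate that vpternlog needs can be derived by evaluating the expression on three fixed truth-table patterns. The sketch below is not taken from the patch: the macro and function names are hypothetical, and it assumes a, b and c map to the first, second and third vpternlog operands respectively.

```c
#include <stdint.h>

/* Truth-table patterns for the three vpternlog inputs: bit i of each
   constant is the value that input takes in row i of the 8-row table,
   where i = (a << 2) | (b << 1) | c.  */
#define TERN_A 0xF0u
#define TERN_B 0xCCu
#define TERN_C 0xAAu

/* Immediate for (a & b) | (c & ~b): take a where b is set, else c.  */
static inline uint8_t
ternlog_imm_select (void)
{
  return (uint8_t) ((TERN_A & TERN_B) | (TERN_C & ~TERN_B));
}

/* Software model of one vpternlog bit lane: the result is the bit of
   IMM indexed by the three input bits.  */
static inline unsigned
ternlog_bit (uint8_t imm, unsigned a, unsigned b, unsigned c)
{
  return (imm >> ((a << 2) | (b << 1) | c)) & 1u;
}
```

Under this operand mapping the immediate evaluates to 0xE2; a different assignment of a, b and c to operands would give a different constant for the same expression.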

[PATCH] Change illegitimate constant into memref of constant pool in change_zero_ext.

2021-08-24 Thread liuhongt via Gcc-patches
Hi: This patch extends change_zero_ext to change an illegitimate constant into a constant pool reference, which enables the simplification below: Trying 5 -> 7: 5: r85:V4SF=[`*.LC0'] REG_EQUAL const_vector 7: r84:V4SF=vec_select(vec_concat(r85:V4SF,r85:V4SF),parallel) REG_DEAD r85:V4SF

[PATCH] [i386] Enable avx512 embedded broadcast for vpternlog.

2021-08-24 Thread liuhongt via Gcc-patches
gcc/ChangeLog: PR target/101989 * config/i386/sse.md (_vternlog): Enable avx512 embedded broadcast. (*_vternlog_all): Ditto. (_vternlog_mask): Ditto. gcc/testsuite/ChangeLog: PR target/101989 * gcc.target/i386/pr101989-broadcast-1.c: New te

[PATCH] Adjust testcases to avoid new failures brought by r12-3108 when compiled with -march=cascadelake.

2021-08-24 Thread liuhongt via Gcc-patches
Pushed to trunk as an obvious fix. gcc/testsuite/ChangeLog: PR target/101989 * gcc.target/i386/avx2-shiftqihi-constant-1.c: Add -mno-avx512f. * gcc.target/i386/sse2-shiftqihi-constant-1.c: Add -mno-avx --- gcc/testsuite/gcc.target/i386/avx2-shiftqihi-constant-1.c | 2 +-

[PATCH] Fold more shuffle builtins to VEC_PERM_EXPR.

2021-08-25 Thread liuhongt via Gcc-patches
This patch is a follow-up to [1]; it folds all shufps/shufpd builtins into gimple. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. [1] https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html gcc/ PR target/98167 PR target/43147 * config/i386/i386.c (ix86_

[PATCH] Check the type of mask while generating cond_op in gimple simplification.

2021-08-26 Thread liuhongt via Gcc-patches
When gimple simplification tries to combine op and vec_cond_expr into cond_op, it doesn't check whether the mask type matches. This causes an ICE when expanding cond_op with a mismatched mode. This patch adds a function named cond_vectorized_internal_fn_supported_p to additionally check mask type than vectorized_in

[PATCH] [i386] Unify UNSPEC_MASKED_EQ/GT to the form of UNSPEC_PCMP.

2021-08-30 Thread liuhongt via Gcc-patches
Currently the EVEX vpcmpeqb instruction has two forms of RTL template representation: one is (unspec [op1 op2] UNSPEC_MASKED_EQ), the other is (unspec [op1, op2, const_int 0] UNSPEC_PCMP). This increases the maintenance burden; for example, optimizing (not: vpcmpeqb) to (vpcmpneqb) requires two de

[PATCH 1/2] Revert "Make sure we're playing with integral modes before call extract_integral_bit_field."

2021-08-31 Thread liuhongt via Gcc-patches
This reverts commit 7218c2ec365ce95f5a1012a6eb425b0a36aec6bf. PR middle-end/102133 --- gcc/expmed.c | 103 +-- 1 file changed, 25 insertions(+), 78 deletions(-) diff --git a/gcc/expmed.c b/gcc/expmed.c index f083d6e86d0..3143f38e057 100644 ---

[PATCH 0/2] Get rid of all float-int special cases in validate_subreg.

2021-08-31 Thread liuhongt via Gcc-patches
o see whether binaries are the same as HEAD~2, I guess they're the same. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578189.html. liuhongt (2): Revert "Make sure we're playing with integral modes before call extract_integral_bit_field." Get rid of all f

[PATCH 2/2] Get rid of all float-int special cases in validate_subreg.

2021-08-31 Thread liuhongt via Gcc-patches
gcc/ChangeLog: * emit-rtl.c (validate_subreg): Get rid of all float-int special cases. --- gcc/emit-rtl.c | 40 1 file changed, 40 deletions(-) diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index ff3b4449b37..77ea8948ee8 100644 --- a/gcc/em

[PATCH] [ICE] Check another epilog variable peeling case in vectorizable_nonlinear_induction.

2022-09-13 Thread liuhongt via Gcc-patches
In vectorizable_nonlinear_induction, r13-2503-gc13223b790bbc5 prevents variable peeling by checking only LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo). But when "!vect_use_loop_mask_for_alignment_p (loop_vinfo) && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0", the vectorizer will still do variable peel

[PATCH] Modernize ix86_builtin_vectorized_function with corresponding expanders.

2022-09-15 Thread liuhongt via Gcc-patches
For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround, when the size of in_mode is not equal to that of out_mode, the vectorizer doesn't go the internal fn way, so that part is still left in ix86_builtin_vectorized_function. Remove the other builtins and add corresponding expanders. Note the patch just refactors the code,

[PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-15 Thread liuhongt via Gcc-patches
There's a peephole2 submitted in the 1990s that splits cmp mem, 0 into load mem, reg + test reg, reg. I don't know the exact reason why GCC does this. For the latest x86 processors, ciscization should help the processor frontend and also code size; for the processor backend, the two should be the same (same uops). So the patch de
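
A small C function of the kind affected by that peephole2 is sketched below. The function name is hypothetical and the assembly in the comment is only an illustration of the two sequences the message describes; the actual output depends on the compiler version and flags.

```c
/* Returns nonzero when the int that P points to is zero.

   Two candidate x86-64 sequences for the comparison (AT&T syntax):

     cmpl  $0, (%rdi)        # compare directly against memory

   versus the split form:

     movl  (%rdi), %eax      # load mem, reg
     testl %eax, %eax        # test reg, reg  */
int
is_zero (const int *p)
{
  return *p == 0;
}
```

Both forms set the flags identically for the equality test; the difference discussed in the thread is instruction count and code size, not the result.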

[PATCH] [x86] Adjust issue_rate for latest Intel processors.

2022-09-15 Thread liuhongt via Gcc-patches
For Skylake-based processors, the decoder is 4-way. For Sunny Cove and Willow Cove, the decoder is 5-way. For Golden Cove, the decoder is 6-way. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to install. gcc/ChangeLog: * config/i386/x86-tune-sched.cc (ix86_issue_rate): Adjust for
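
The decoder widths listed above can be pictured as a simple lookup. This is a sketch only: the enum uarch_model and the function issue_rate_model are hypothetical, GCC's real ix86_issue_rate switches on its own processor enumeration, and the fallback value here is a placeholder rather than anything stated in the patch.

```c
/* Hypothetical microarchitecture identifiers, for illustration only.  */
enum uarch_model
{
  UARCH_SKYLAKE,
  UARCH_SUNNY_COVE,
  UARCH_WILLOW_COVE,
  UARCH_GOLDEN_COVE,
  UARCH_OTHER
};

/* Map a microarchitecture to the decoder width quoted in the message.  */
static inline int
issue_rate_model (enum uarch_model u)
{
  switch (u)
    {
    case UARCH_SKYLAKE:
      return 4;
    case UARCH_SUNNY_COVE:
    case UARCH_WILLOW_COVE:
      return 5;
    case UARCH_GOLDEN_COVE:
      return 6;
    default:
      return 1; /* Placeholder for targets not covered here.  */
    }
}
```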

[PATCH] Support 64-bit vectorization for single-precision floating rounding operation.

2022-09-19 Thread liuhongt via Gcc-patches
Here's the list of operations the patch supports: rint/nearbyint/ceil/floor/trunc/lrint/lceil/lfloor/round/lround. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/106910 * config/i386/mmx.md (nearbyintv2sf2): New expander. (rintv2sf2): Ditt

[PATCH] Fix incorrect handle in vectorizable_induction for mixed induction type.

2022-09-19 Thread liuhongt via Gcc-patches
The code in vectorizable_induction for slp_node assumes all phi_info have the same induction type (vect_step_op_add), but since we now support nonlinear induction, mixed types could be handled wrongly. So the patch returns false when slp_node has mixed induction types. Note code in other places will still vectorize the

[PATCH] Don't check can_vec_perm_const_p for nonlinear iv_init when it's constant.

2022-09-20 Thread liuhongt via Gcc-patches
When init_expr is INTEGER_CST or REAL_CST, can_vec_perm_const_p is not necessary since there's no real vec_perm needed, but vec_gen_perm_mask_checked will gcc_assert (can_vec_perm_const_p). So it's better to use vec_gen_perm_mask_any in vect_create_nonlinear_iv_init. Bootstrapped and regtested on

[PATCH] [x86] Fix typo in floorv2sf2, should be register_operand for op1, not vector_operand.

2022-09-21 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Verified that 526.blender_r can be rebuilt with the fix. Ok for trunk? gcc/ChangeLog: PR target/106994 * config/i386/mmx.md (floorv2sf2): Fix typo, use register_operand instead of vector_operand for operands[1]. gcc/testsu

[PATCH] [x86] Support 2-instruction vector shuffle for V4SI/V4SF in ix86_expand_vec_perm_const_1.

2022-09-22 Thread liuhongt via Gcc-patches
x86 has shufps, which shuffles elements of the first operand into the lower 64 bits and elements of the second operand into the upper 64 bits. __builtin_shufflevector (op0, op1, 1, 4, 3, 6) will be veclowered since can_vec_perm_const_p returns false for an sse2 target. This patch adds a new function to support 2-operand v4s
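
One way to see why two shufps-style steps suffice for the { 1, 4, 3, 6 } permutation is the scalar model below. It is a sketch and not the patch's code: v4sf_model, shufps_model, SHUF_IMM and shuffle_1_4_3_6 are hypothetical names, and the decomposition shown is only one possible choice, not necessarily the sequence GCC emits.

```c
/* Scalar model of a four-element float vector (V4SF).
   Hypothetical type, for illustration only.  */
typedef struct { float lane[4]; } v4sf_model;

/* Pack four 2-bit lane selectors into a shufps-style immediate.  */
#define SHUF_IMM(e0, e1, e2, e3) \
  ((e0) | ((e1) << 2) | ((e2) << 4) | ((e3) << 6))

/* Model of shufps: the two low result lanes are picked from A, the
   two high result lanes are picked from B.  */
static inline v4sf_model
shufps_model (v4sf_model a, v4sf_model b, unsigned imm)
{
  v4sf_model r;
  r.lane[0] = a.lane[imm & 3];
  r.lane[1] = a.lane[(imm >> 2) & 3];
  r.lane[2] = b.lane[(imm >> 4) & 3];
  r.lane[3] = b.lane[(imm >> 6) & 3];
  return r;
}

/* Permutation { 1, 4, 3, 6 } of the concatenation op0:op1.  */
static inline v4sf_model
shuffle_1_4_3_6 (v4sf_model op0, v4sf_model op1)
{
  /* Step 1: gather the needed lanes: { op0[1], op0[3], op1[0], op1[2] }.  */
  v4sf_model tmp = shufps_model (op0, op1, SHUF_IMM (1, 3, 0, 2));
  /* Step 2: swap the two middle lanes: { tmp[0], tmp[2], tmp[1], tmp[3] }.  */
  return shufps_model (tmp, tmp, SHUF_IMM (0, 2, 1, 3));
}
```

The first step is needed because a single shufps cannot interleave lanes from the two operands; the second step only reorders lanes within one register.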

[PATCH] [x86] Support 2-instruction vector shuffle for V4SI/V4SF in ix86_expand_vec_perm_const_1.

2022-09-25 Thread liuhongt via Gcc-patches
>Missing space before ( Changed. >> + /* shufps. */ >> + ok = expand_vselect_vconcat(tmp, d->op0, d->op1, >> + perm1, d->nelt, false); > >Ditto. Changed. > >> + /* When lone_idx is not 0, it must from second op(count == 1). */ >> + gcc_assert ((lo
