Re: [PATCH 00/18] Support -mevex512 for AVX512

2023-10-06 Thread Hongtao Liu
On Thu, Sep 28, 2023 at 11:23 AM ZiNgA BuRgA wrote: > > That sounds about right. The code I had in mind would perhaps look like: > > > #if defined(__AVX512BW__) && defined(__AVX512VL__) > #if defined(__EVEX256__) && !defined(__EVEX512__) > // compiled code is AVX10.1/256 and AVX512
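The macro test quoted in the snippet above can be tried out with a small sketch like the following. The four macro names are taken from the snippet; the function name evex_config and the strings it returns are hypothetical labels chosen for illustration, not GCC terminology, and the quoted code is cut off, so this does not claim to reproduce the author's full example.

```c
#include <assert.h>
#include <stdio.h>

/* Return a label for the vector-width configuration this translation unit
   was compiled with.  Macro names come from the quoted mail; the labels
   are illustrative assumptions only. */
static const char *evex_config(void)
{
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#  if defined(__EVEX256__) && !defined(__EVEX512__)
    return "AVX512BW+VL, EVEX limited to 256 bits";
#  else
    return "AVX512BW+VL, no 256-bit-only restriction detected";
#  endif
#else
    return "AVX512BW+VL not enabled";
#endif
}

/* Hypothetical helper: print the label chosen at compile time. */
static void print_evex_config(void)
{
    const char *label = evex_config();
    assert(label != NULL);
    printf("%s\n", label);
}
```

Building the same file with different -m options and calling print_evex_config() shows which preprocessor branch was taken.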

Re: [PATCH 03/13] [APX_EGPR] Initial support for APX_F

2023-10-06 Thread Hongtao Liu
> (apx_egpr): Likewise. > (apx_push2pop2): Likewise. > (apx_ndd): Likewise. > (apx_all): Likewise. > * doc/invoke.texi: Document mapxf. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/apx-1.c: New test. > > Co-aut

Re: [PATCH] [i386] APX EGPR: fix missing pattern that prohibits egpr

2023-10-08 Thread Hongtao Liu
On Mon, Oct 9, 2023 at 10:05 AM Hongyu Wang wrote: > > For vec_concatv2di, m constraint in alternative 0 and 1 could result in > egpr allocated on operand 2 under -mapxf. Should use jm instead. > > Bootstrapped/regtested on x86-64-linux-gnu. > > Ok for trunk? Ok. > > gcc/ChangeLog: > > * c

Re: [PATCH] [APX] Support Intel APX PUSH2POP2

2023-10-11 Thread Hongtao Liu
On Tue, Oct 10, 2023 at 2:51 PM Hongyu Wang wrote: > > From: "Mo, Zewei" > > Hi, > > Intel APX PUSH2POP2 feature has been released in [1]. > > This feature requires stack to be aligned at 16byte, therefore in > prologue/epilogue, a standalone push/pop will be emitted before any > push2/pop2 if th

Re: [PATCH] Disparage slightly for the alternative which move DFmode between SSE_REGS and GENERAL_REGS.

2023-10-12 Thread Hongtao Liu
On Thu, Jul 6, 2023 at 1:53 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Jul 6, 2023 at 3:14 AM liuhongt wrote: > > > > For testcase > > > > void __cond_swap(double* __x, double* __y) { > > bool __r = (*__x < *__y); > > auto __tmp = __r ? *__x : *__y; > > *__y = __r ? *__y : *__x; > >

Re: [PATCH 0/3] Add Intel new cpu archs

2023-10-17 Thread Hongtao Liu
On Mon, Oct 16, 2023 at 2:25 PM Haochen Jiang wrote: > > Hi all, > > The patches aim to add new cpu archs Clear Water Forest and > Panther Lake. Here comes the documentation: > > https://cdrdv2.intel.com/v1/dl/getContent/671368 > > Also in the patches, I refactored how we detect cpu according to f

Re: [PATCH] Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big.

2023-10-18 Thread Hongtao Liu
On Wed, Oct 18, 2023 at 4:33 PM liuhongt wrote: > Cut from subject... There's a loop in vect_peel_nonlinear_iv_init to get init_expr * pow (step_expr, skip_niters). When skip_niters is too big, compile time hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to init_expr << (exa
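A minimal sketch of why a shift can stand in for the loop described above. The snippet is cut off before the exact condition, so this assumes the step is a power of two and that arithmetic wraps modulo 2^32; the function names peel_by_loop and peel_by_shift are hypothetical and are not the names used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: multiply init by step n times, wrapping modulo 2^32.
   Runtime grows linearly with n, which is the cost the mail complains about. */
static uint32_t peel_by_loop(uint32_t init, uint32_t step, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        init *= step;
    return init;
}

/* Shortcut, assuming step == 1u << log2_step: the product is
   init * 2^(log2_step * n), i.e. a single left shift.  Once the shift
   count reaches the type width, every bit has been shifted out. */
static uint32_t peel_by_shift(uint32_t init, unsigned log2_step, uint32_t n)
{
    assert(log2_step < 32);
    uint64_t shift = (uint64_t)log2_step * n;
    return shift >= 32 ? 0u : init << shift;
}
```

For example, with init 3, step 4 and 5 skipped iterations, both functions give 3 << 10.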

Re: [PATCH] x86: Correct ISA enabled for clients since Arrow Lake

2023-10-19 Thread Hongtao Liu
On Wed, Oct 18, 2023 at 4:10 PM Haochen Jiang wrote: > > Hi all, > > I just found that since ISAs enabled on Sierra Forest changed, clients since > Arrow Lake will wrongly enable ENQCMD according to the current code. > > To avoid messing up again in the future, I changed the dependency on how ISAs

Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread Hongtao Liu
On Mon, Oct 23, 2023 at 8:35 PM Richard Biener wrote: > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt wrote: > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > > vcond and vcondeq shouldn't be necessary if there's > vcond_mask and vcmp support which is the

Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 10:53 AM Hongtao Liu wrote: > > On Mon, Oct 23, 2023 at 8:35 PM Richard Biener > wrote: > > > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt wrote: > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > >

Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 1:23 PM Hongtao Liu wrote: > > On Tue, Oct 24, 2023 at 10:53 AM Hongtao Liu wrote: > > > > On Mon, Oct 23, 2023 at 8:35 PM Richard Biener > > wrote: > > > > > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt wrote: > > >

Re: [PATCH] i386: Fix undefined masks in vpopcnt tests

2023-10-24 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 6:10 PM Richard Sandiford wrote: > > The files changed in this patch had tests for masked and unmasked > popcnt. However, the mask inputs to the masked forms were undefined, > and would be set to zero by init_regs. Any combine-like pass that > ran after init_regs could th

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

2023-10-27 Thread Hongtao Liu
On Fri, Oct 27, 2023 at 2:49 PM Richard Biener wrote: > > > > > On 27.10.2023 at 07:50, liuhongt wrote: > > > > When 2 vectors are equal, kmask is allones and kortest will set CF, > > else CF will be cleared. > > > > So CF bit can be used to check for the result of the comparison. > > > > Boots
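The CF behaviour described above can be modelled in portable C without intrinsics. This is only a sketch: it assumes a byte-granular compare over one 64-byte block, and the names cmpeq_bytes_mask, kortest_cf and block64_equal are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a 512-bit byte-wise equality compare: bit i of the result is
   set when byte i of both inputs matches. */
static uint64_t cmpeq_bytes_mask(const uint8_t *a, const uint8_t *b)
{
    assert(a != NULL && b != NULL);
    uint64_t k = 0;
    for (int i = 0; i < 64; i++)
        if (a[i] == b[i])
            k |= (uint64_t)1 << i;
    return k;
}

/* Model of the carry flag after a 64-bit kortest: CF is set exactly when
   the OR of the two masks is all ones. */
static bool kortest_cf(uint64_t k1, uint64_t k2)
{
    return (k1 | k2) == UINT64_MAX;
}

/* Equality of two 64-byte blocks read off the modelled carry flag. */
static bool block64_equal(const uint8_t *a, const uint8_t *b)
{
    uint64_t k = cmpeq_bytes_mask(a, b);
    return kortest_cf(k, k);
}
```

A single differing byte clears one mask bit, so the OR is no longer all ones and the modelled CF is cleared.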

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

2023-10-27 Thread Hongtao Liu
On Fri, Oct 27, 2023 at 3:21 PM Hongtao Liu wrote: > > On Fri, Oct 27, 2023 at 2:49 PM Richard Biener > wrote: > > > > > > > > > On 27.10.2023 at 07:50, liuhongt wrote: > > > > > > When 2 vectors are equal, kmask is allones

Re: [PATCH] Fix incorrect option mask and avx512cd target push

2023-10-30 Thread Hongtao Liu
On Mon, Oct 30, 2023 at 3:47 PM Haochen Jiang wrote: > > Hi all, > > This patch fixed two obvious bugs in current evex512 implementation. > > Also, I moved AVX512CD+AVX512VL part out of the AVX512VL to avoid > accidental handle miss in avx512cd in the future. > > Ok for trunk? Ok. > > BRs, > Haoche

Re: [PATCH 0/4] Fix no-evex512 function attribute

2023-10-31 Thread Hongtao Liu
On Tue, Oct 31, 2023 at 2:39 PM Haochen Jiang wrote: > > Hi all, > > These four patches are going to fix no-evex512 function attribute. The detail > of the issue comes following: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889 > > My proposal for this problem is to also push "no-evex512" w

Re: [RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Hongtao Liu
On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak wrote: > > The patch generalizes address register class handling to allow multiple > address register classes. For APX EGPR targets, some instructions can't be > encoded with REX2 prefix, so it is necessary to limit address register > class to avoid REX2

Re: [PATCH 5/5] x86: yet more PR target/100711-like splitting

2023-11-06 Thread Hongtao Liu
On Mon, Nov 6, 2023 at 7:10 PM Jan Beulich wrote: > > On 25.06.2023 08:41, Hongtao Liu wrote: > > On Sun, Jun 25, 2023 at 2:35 PM Hongtao Liu wrote: > >> > >> On Sun, Jun 25, 2023 at 2:25 PM Jan Beulich wrote: > >>> > >>> On 25.06.2023 07:1

[PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-22 Thread Hongtao Liu
Hi uros: This patch fixes false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale. Bootstrap ok, regression test on i386/x86 ok. It does something like this: - For scalar instructions with both xmm operands: op %xmmN,%xmmQ,%xmmQ -> op %xmmN, %xmmN, %xmmQ for scalar instruc

Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-22 Thread Hongtao Liu
Update patch: Add m constraint to define_insn (sse_1_round): Change constraint x to xm since vround support memory operand. * (*sse4_1_round): Ditto. Bootstrap and regression test ok. On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu wrote: > > Hi uros: > This patch fi

Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-24 Thread Hongtao Liu
On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak wrote: > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu wrote: > > > > Update patch: > > Add m constraint to define_insn (sse_1_round > *sse_1_round > when under sse4 but not avx512f. > > It looks to me that the origi

Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-24 Thread Hongtao Liu
On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu wrote: > > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak wrote: > > > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu wrote: > > > > > > Update patch: > > > Add m constraint to define_insn (sse_1_round >

Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-25 Thread Hongtao Liu
Update patch. On Fri, Oct 25, 2019 at 4:01 PM Uros Bizjak wrote: > > On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu wrote: > > > > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu wrote: > > > > > > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak wrote: > >

[PATCH] Adjust predicates and constraints of scalar insns

2019-10-25 Thread Hongtao Liu
> Looking into sse.md, there is a lot of inconsistencies in existing *vm > patterns w.r.t. operand constraints. Unfortunately, these were copied > into proposed patterns. One example is existing > > (define_insn "_vmsqrt2" > [(set (match_operand:VF_128 0 "register_operand" "=x,v") > (vec_merg

[PATCH] Remove redundant iptr when operand already has a scalar mode.

2019-10-26 Thread Hongtao Liu
> BTW: Please also note that there is no need to use or operand > mode override in scalar insn templates for intel asm dialect when > operand already has a scalar mode. https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01868.html This patch is to remove redundant iptr when operand already has a scalar mo

[PATCH target/92295] Fix inefficient vector constructor

2019-10-31 Thread Hongtao Liu
Hi uros: This patch is about to fix inefficient vector constructor. Currently in ix86_expand_vector_init_concat, vector are initialized per 2 elements which can miss some optimization opportunity like pr92295. Bootstrap and i386 regression test is ok. Ok for trunk? Changelog gcc/

Re: [PATCH target/92295] Fix inefficient vector constructor

2019-11-02 Thread Hongtao Liu
Hi Jakub: Could you help review this patch? PS: Since this patch is related to vectors (avx512f), and Uros mentioned before that he has no intention to maintain avx512f. On Fri, Nov 1, 2019 at 9:12 AM Hongtao Liu wrote: > > Hi uros: > This patch is about to fix inefficie

Re: [PATCH target/92295] Fix inefficient vector constructor

2019-11-06 Thread Hongtao Liu
Ping! On Sat, Nov 2, 2019 at 9:38 PM Hongtao Liu wrote: > > Hi Jakub: > Could you help review this patch? > > PS: Since this patch is related to vectors (avx512f), and Uros > mentioned before that he has no intention to maintain avx512f. > > On Fri, Nov 1, 2019 at 9:

[PATCH] Set AVX128_OPTIMAL for all avx targets.

2019-11-11 Thread Hongtao Liu
Hi: This patch is about to set X86_TUNE_AVX128_OPTIMAL as default for all AVX target because we found there's still performance gap between 128-bit auto-vectorization and 256-bit auto-vectorization even with epilog vectorized. The performance influence of setting avx128_optimal as default on SP

Re: [PATCH] Set AVX128_OPTIMAL for all avx targets.

2019-11-12 Thread Hongtao Liu
On Tue, Nov 12, 2019 at 4:19 PM Richard Biener wrote: > > On Tue, Nov 12, 2019 at 8:36 AM Hongtao Liu wrote: > > > > Hi: > > This patch is about to set X86_TUNE_AVX128_OPTIMAL as default for > > all AVX target because we found there's still perfo

Re: [PATCH] Set AVX128_OPTIMAL for all avx targets.

2019-11-12 Thread Hongtao Liu
On Tue, Nov 12, 2019 at 4:29 PM Richard Biener wrote: > > On Tue, Nov 12, 2019 at 9:19 AM Richard Biener > wrote: > > > > On Tue, Nov 12, 2019 at 8:36 AM Hongtao Liu wrote: > > > > > > Hi: > > > This patch is about to set X86_TUNE_AVX128_OPTIMA

[PATCH] Split X86_TUNE_AVX128_OPTIMAL into X86_TUNE_AVX256_SPLIT_REGS and X86_TUNE_AVX128_OPTIMAL

2019-11-12 Thread Hongtao Liu
Hi: As mentioned in https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00832.html > So yes, it's poorly named. A preparatory patch to clean this up > (and maybe split it into TARGET_AVX256_SPLIT_REGS and TARGET_AVX128_OPTIMAL) > would be nice. Bootstrap and regression test for i386 backend is ok.

Re: [PATCH] Set AVX128_OPTIMAL for all avx targets.

2019-11-12 Thread Hongtao Liu
On Tue, Nov 12, 2019 at 4:41 PM Richard Biener wrote: > > On Tue, Nov 12, 2019 at 9:29 AM Hongtao Liu wrote: > > > > On Tue, Nov 12, 2019 at 4:19 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 12, 2019 at 8:36 AM Hongtao Liu wrote: > > >

[PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]

2020-02-12 Thread Hongtao Liu
Hi As mentioned in PR93724, several intrinsic macros lack a closing parenthesis. These macros are only used with -O0 option, and currently unit tests use -O2, so not covered. Bootstrap ok, regression tests on i386/x86_64 is ok. Ok for trunk? Changelog gcc/ * config/i386/avx512vbmi2in
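The coverage gap mentioned above comes from a pattern in which an intrinsic is an inline function when optimizing and a macro otherwise, so the macro text is only ever compiled at -O0. A minimal sketch of that pattern follows; demo_shift_left is a hypothetical name, not one of the intrinsics touched by the patch.

```c
#include <assert.h>

#ifdef __OPTIMIZE__
/* Optimizing builds see a real inline function, so the macro body below
   is never parsed and a missing ')' in it would go unnoticed. */
static inline int demo_shift_left(int x, int count)
{
    assert(count >= 0 && count < 31);
    return (int)((unsigned)x << count);
}
#else
/* -O0 builds expand this macro instead; every '(' needs its ')'. */
#define demo_shift_left(x, count) ((int)((unsigned)(x) << (count)))
#endif
```

Compiling a caller once with -O0 and once with -O2 exercises both definitions, which is the kind of check that would have caught an unbalanced macro.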

Re: [PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]

2020-02-13 Thread Hongtao Liu
On Thu, Feb 13, 2020 at 5:12 PM Uros Bizjak wrote: > > On Thu, Feb 13, 2020 at 9:53 AM Jakub Jelinek wrote: > > > > On Thu, Feb 13, 2020 at 09:39:05AM +0100, Uros Bizjak wrote: > > > > Changelog > > > > gcc/ > > > >* config/i386/avx512vbmi2intrin.h > > > >(_mm512_[,mask_,maskz_]sh

Re: [PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]

2020-02-13 Thread Hongtao Liu
On Thu, Feb 13, 2020 at 5:31 PM Hongtao Liu wrote: > > On Thu, Feb 13, 2020 at 5:12 PM Uros Bizjak wrote: > > > > On Thu, Feb 13, 2020 at 9:53 AM Jakub Jelinek wrote: > > > > > > On Thu, Feb 13, 2020 at 09:39:05AM +0100, Uros Bizjak wrot

Re: [PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]

2020-02-14 Thread Hongtao Liu
Done. On Fri, Feb 14, 2020 at 7:16 PM Uros Bizjak wrote: > > On Fri, Feb 14, 2020 at 8:06 AM Uros Bizjak wrote: > > > > On Fri, Feb 14, 2020 at 7:03 AM Hongtao Liu wrote: > > > > > > On Thu, Feb 13, 2020 at 5:31 PM Hongtao Liu wrote: > > > > >

Re: [PATCH] Enable mask operation for 128/256-bit vector VCOND_EXPR under avx512f (PR92686)

2019-12-04 Thread Hongtao Liu
On Wed, Dec 4, 2019 at 4:22 PM Jakub Jelinek wrote: > > On Wed, Dec 04, 2019 at 10:07:05AM +0800, Hongtao Liu wrote: > > Changelog > > gcc/ > > PR target/92686 > > * config/i386/sse.md > > (*_cmp3, > > *_cmp3, > > *_uc

Re: [PATCH] Enable mask operation for 128/256-bit vector VCOND_EXPR under avx512f (PR92686)

2019-12-08 Thread Hongtao Liu
On Thu, Dec 5, 2019 at 4:03 PM Jakub Jelinek wrote: > > On Thu, Dec 05, 2019 at 09:56:46AM +0800, Hongtao Liu wrote: > > --- a/gcc/config/i386/i386-expand.c > > +++ b/gcc/config/i386/i386-expand.c > > + /* Using vector move with mask register. */ > > +

[PATCH] Use OPTION_MASK_ISA2_$target_[SET, UNSET, ] to indicate those for x_ix86_isa_flags2

2019-12-09 Thread Hongtao Liu
Hi uros: This patch is about to rename OPTION_MASK_ISA_$target_[SET,UNSET, ] to OPTION_MASK_ISA2_$target_[SET,UNSET, ] for those targets setting x_ix86_isa_flags2. Target list as below: - static struct ix86_target_opts isa2_opts[] = { { "-mcx16", OPTION_MASK_ISA2_CX

[PATCH] Fix unrecognizable insn of pr92865

2019-12-09 Thread Hongtao Liu
Hi jakub: This patch is to enable integer mask cmp/cmov under AVX512F even with TARGET_XOP . Bootstrap and regression test on i386/x86_64 backend is ok. Changelog: PR target/92865 * gcc/config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable integer mask cmov when available ev

Re: [PATCH] Fix unrecognizable insn of pr92865

2019-12-10 Thread Hongtao Liu
On Tue, Dec 10, 2019 at 4:11 PM Jakub Jelinek wrote: > > On Tue, Dec 10, 2019 at 01:47:50PM +0800, Hongtao Liu wrote: > > This patch is to enable integer mask cmp/cmov under AVX512F even > > with TARGET_XOP . > > Bootstrap and regression test on i386/x86_64 backend

Re: [PATCH] Fix unrecognizable insn of pr92865

2019-12-10 Thread Hongtao Liu
On Wed, Dec 11, 2019 at 3:54 PM Jakub Jelinek wrote: > > On Wed, Dec 11, 2019 at 09:55:24AM +0800, Hongtao Liu wrote: > > Changelog > > gcc/ > > PR target/92865 > > * config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable > > integer mask cmov

[PATCH]Add tune option for integer mask cmov, enable this tune for m_CORE_AVX512

2019-12-11 Thread Hongtao Liu
Hi: This patch adds a tune option for integer mask cmov. For targets that have both integer mask registers and SSE mask registers, this tune indicates that the integer one should be used. Currently it's on by default for m_CORE_AVX512. Bootstrap is ok, regression test on i386/x86_64 backends is ok. ok for

[PATCH] Fix redundant load missed by fre [tree-optimization 92980]

2019-12-17 Thread Hongtao Liu
Hi: This patch is to simplify A * C + (-D) -> (A - D/C) * C when C is a power of 2 and D mod C == 0. bootstrap and make check is ok. changelog gcc/ * gcc/match.pd (A * C + (-D) = (A - D/C) * C. when C is a power of 2 and D mod C == 0): Add new simplification. gcc/testsuite
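The identity behind this simplification can be checked with a short sketch. It assumes unsigned 32-bit arithmetic with wraparound, which is a choice made here for illustration; the snippet does not say which types the match.pd rule applies to, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Left-hand side: A * C + (-D), computed modulo 2^32. */
static uint32_t original_form(uint32_t a, uint32_t c, uint32_t d)
{
    return a * c + (0u - d);
}

/* Right-hand side: (A - D/C) * C, valid under the stated preconditions. */
static uint32_t rewritten_form(uint32_t a, uint32_t c, uint32_t d)
{
    assert(c != 0 && (c & (c - 1)) == 0); /* C is a power of 2 */
    assert(d % c == 0);                   /* D mod C == 0 */
    return (a - d / c) * c;
}
```

Because D is an exact multiple of C, (D/C) * C equals D, so distributing the multiplication gives A * C - D in both forms, including when A - D/C wraps around.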

Re: [PATCH] Fix redundant load missed by fre [tree-optimization 92980]

2019-12-17 Thread Hongtao Liu
On Wed, Dec 18, 2019 at 10:50 AM Andrew Pinski wrote: > > On Tue, Dec 17, 2019 at 6:33 PM Hongtao Liu wrote: > > > > Hi: > > This patch is to simplify A * C + (-D) -> (A - D/C) * C when C is a > > power of 2 and D mod C == 0. > > bootstrap and make ch

Re: [PATCH] Fix redundant load missed by fre [tree-optimization 92980]

2019-12-18 Thread Hongtao Liu
On Wed, Dec 18, 2019 at 4:26 PM Segher Boessenkool wrote: > > On Wed, Dec 18, 2019 at 10:37:11AM +0800, Hongtao Liu wrote: > > Hi: > > This patch is to simplify A * C + (-D) -> (A - D/C) * C when C is a > > power of 2 and D mod C == 0. > > bootstrap and make c

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-03-04 Thread Hongtao Liu
On Thu, Feb 29, 2024 at 2:20 PM Hongtao Liu wrote: > > On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek wrote: > > > > Hi! > > > > Adding Hongtao and Honza into the loop as the ones who acked the original > > patch. > > > > The no_callee_saved_regist

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-12 Thread Hongtao Liu
On Tue, Mar 12, 2024 at 8:00 PM liuhongt wrote: > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not multiple of > alignb. (base_align_bias - base_offset) may not be aligned to alignb, and > caused segment fault. > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk and backpo

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt wrote: > > > > When we split > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ]) > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct > > SQRefCounted *)CallNative_nclosure.0_1]._u

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 10:46 PM Uros Bizjak wrote: > > On Thu, Mar 14, 2024 at 8:42 AM Uros Bizjak wrote: > > > > On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu wrote: > > > > > > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > > > >

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: > > Don't enable excess lanes when inverting vector bit-masks smaller than the > integer mode. This is yet another case of wrong-code due to mishandling > of oversized bitmasks. > > This issue shows up in vect/tsvc/vect-tsvc-s278.c and > vect/

Re: [PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread Hongtao Liu
On Mon, Mar 18, 2024 at 6:59 PM Uros Bizjak wrote: > > On Mon, Mar 18, 2024 at 11:52 AM liuhongt wrote: > > > > Commit r14-9459-g618e34d56cc38e only handles > > general_scalar_chain::convert_op. The patch also handles > > timode_scalar_chain::convert_op to avoid potential similar bug. > > > > Boo

Re: [PATCH] Document -fexcess-precision=16.

2024-03-18 Thread Hongtao Liu
On Tue, Mar 19, 2024 at 12:16 AM Joseph Myers wrote: > > On Mon, 18 Mar 2024, liuhongt wrote: > > > +If @option{-fexcess-precision=16} is specified, casts and assignments of > > +@code{_Float16} and @code{bfloat16_t} cause value to be rounded to their > > +semantic types if they're supported by th

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek wrote: > > On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote: > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not multiple of > > alignb. (base_align_bias - base_offset) may not be aligned to alignb, and > > caused segment fault. > > > > Boots

Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Tue, Mar 26, 2024 at 11:26 AM Hongtao Liu wrote: > > On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek wrote: > > > > On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote: > > > if alignb > ASAN_RED_ZONE_SIZE and offset[0] is not multiple of > > > alig

Re: [PATCH] x86: Define macros for APX options

2024-04-08 Thread Hongtao Liu
On Mon, Apr 8, 2024 at 11:44 PM H.J. Lu wrote: > > Define following macros for APX options: > > 1. __APX_EGPR__: -mapx-features=egpr. > 2. __APX_PUSH2POP2__: -mapx-features=push2pop2. > 3. __APX_NDD__: -mapx-features=ndd. > 4. __APX_PPX__: -mapx-features=ppx. For -mapx-features=, we haven't decide

Re: [PATCH v2] x86: Define __APX_INLINE_ASM_USE_GPR32__

2024-04-08 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 9:58 AM H.J. Lu wrote: > > Define __APX_INLINE_ASM_USE_GPR32__ for -mapx-inline-asm-use-gpr32. > When __APX_INLINE_ASM_USE_GPR32__ is defined, inline asm statements > should contain only instructions compatible with r16-r31. Ok. > > gcc/ > > PR target/114587 >

Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]

2024-04-08 Thread Hongtao Liu
On Thu, Apr 4, 2024 at 4:42 PM Jakub Jelinek wrote: > > On Wed, Apr 19, 2023 at 02:40:59AM +, Jiang, Haochen via Gcc-patches > wrote: > > > > (define_insn "aesenc" > > > > - [(set (match_operand:V2DI 0 "register_operand" "=x,x") > > > > - (unspec:V2DI [(match_operand:V2DI 1 "register_

Re: [PATCH] i386, v2: Fix aes/vaes patterns [PR114576]

2024-04-09 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 5:18 PM Jakub Jelinek wrote: > > On Tue, Apr 09, 2024 at 11:23:40AM +0800, Hongtao Liu wrote: > > I think we can merge alternative 2 with 3 to > > * return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" : > > \"%{evex%} vae

Re: [PATCH] Prohibit SHA/KEYLOCKER usage of EGPR when APX enabled

2024-04-09 Thread Hongtao Liu
On Tue, Apr 9, 2024 at 3:05 PM Hongyu Wang wrote: > > The latest APX spec announced removal of SHA/KEYLOCKER evex promotion [1], > which means the SHA/KEYLOCKER insn does not support EGPR when APX > enabled. Update the corresponding constraints to their EGPR-disabled > counterparts. > > Bootstrapp

Re: [PATCH] x86: Update constraints for APX NDD instructions

2024-02-07 Thread Hongtao Liu
On Tue, Feb 6, 2024 at 11:49 AM H.J. Lu wrote: > > 1. The only supported TLS code sequence with ADD is > > addq foo@gottpoff(%rip),%reg > > Change je constraint to a memory operand in APX NDD ADD pattern with > register source operand. > > 2. The instruction length of APX NDD instructions

Re: [PATCH] x86-64: Generate push2/pop2 only if the incoming stack is 16-byte aligned

2024-02-17 Thread Hongtao Liu
On Wed, Feb 14, 2024 at 5:33 AM H.J. Lu wrote: > > Since push2/pop2 requires 16-byte stack alignment, don't generate them > if the incoming stack isn't 16-byte aligned. Ok. > > gcc/ > > PR target/113912 > * config/i386/i386.cc (ix86_can_use_push2pop2): New. > (ix86_pro_and_

Re: PING: [PATCH] x86-64: Check R_X86_64_CODE_6_GOTTPOFF support

2024-02-22 Thread Hongtao Liu
On Thu, Feb 22, 2024 at 10:33 PM H.J. Lu wrote: > > On Sun, Feb 18, 2024 at 8:02 AM H.J. Lu wrote: > > > > If assembler and linker supports > > > > add %reg1, name@gottpoff(%rip), %reg2 > > > > with R_X86_64_CODE_6_GOTTPOFF, we can generate it instead of > > > > mov name@gottpoff(%rip), %reg2 > >

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote: > > ldtilecfg and sttilecfg take a 512-byte memory block. With > _tile_loadconfig implemented as > > extern __inline void > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > _tile_loadconfig (const void *__config) > { > __asm__ v

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote: > > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu wrote: > > > > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote: > > > > > > ldtilecfg and sttilecfg take a 512-byte memory block. With > > > _tile_loadconf

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 11:26 AM wrote: > > From: Pan Li > > We allowed vector type for get_stored_val when read is less than or > equal to store in previous. Unfortunately, we missed to adjust the > validate_subreg part accordingly. For vector type, we don't need to > restrict the mode size is

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Hongtao Liu
MODE_NATURAL_SIZE (imode); > > Pan > > -Original Message- > From: Hongtao Liu > Sent: Monday, February 26, 2024 11:41 AM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > richard.guent...@gmail.com; Wang, Yanzhang ; > rda

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-26 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 6:30 PM H.J. Lu wrote: > > On Sun, Feb 25, 2024 at 8:25 PM H.J. Lu wrote: > > > > On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu wrote: > > > > > > On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote: > > > > > >

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-26 Thread Hongtao Liu
On Tue, Feb 27, 2024 at 3:44 PM Richard Biener wrote: > > On Tue, 27 Feb 2024, haochen.jiang wrote: > > > On Linux/x86_64, > > > > af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit > > commit af66ad89e8169f44db723813662917cf4cbb78fc > > Author: Richard Biener > > Date: Fri Feb 23

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-28 Thread Hongtao Liu
On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek wrote: > > Hi! > > Adding Hongtao and Honza into the loop as the ones who acked the original > patch. > > The no_callee_saved_registers by default for noreturn functions change can > break in-process backtrace(3) or backtraces from debugger or other pr

Re: [PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-10 Thread Hongtao Liu
On Tue, Jan 9, 2024 at 3:09 PM Hongyu Wang wrote: > > Hi, > > For APX, the inline asm behavior was not mentioned in any document > before. Add description for it. > > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.opt: Adjust document. > * doc/invoke.texi: Add description

Re: [PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-10 Thread Hongtao Liu
On Thu, Jan 11, 2024 at 7:06 AM Andi Kleen wrote: > > Hongtao Liu writes: > >> > >> +@opindex mapx-inline-asm-use-gpr32 > >> +@item -mapx-inline-asm-use-gpr32 > >> +When APX_F enabled, EGPR usage was by default disabled to prevent > >> +unexp

Re: [PATCH] i386: Add AVX10.1 related macros

2024-01-11 Thread Hongtao Liu
On Fri, Jan 12, 2024 at 10:55 AM Jiang, Haochen wrote: > > > -Original Message- > > From: Richard Biener > > Sent: Thursday, January 11, 2024 4:19 PM > > To: Liu, Hongtao > > Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org; > > ubiz...@gmail.com; bur...@net-b.de; san...@codesourcery.com > >

Re: [PATCH] Update documents for fcf-protection=

2024-01-11 Thread Hongtao Liu
On Thu, Jan 11, 2024 at 12:06 AM H.J. Lu wrote: > > On Tue, Jan 9, 2024 at 6:02 PM liuhongt wrote: > > > > After r14-2692-g1c6231c05bdcca, the option is defined as EnumSet and > > -fcf-protection=branch won't unset any others bits since they're in > > different groups. So to override -fcf-protect

Re: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.

2024-01-16 Thread Hongtao Liu
On Wed, Jan 17, 2024 at 5:59 AM Roger Sayle wrote: > > > I thought I'd just missed the bug fixing season of stage3, but there > appears to a little latitude in early stage4 (for vector patches), so > I'll post this now. > > This patch resolves PR target/106060 by providing efficient methods for >

Re: [PATCH] hwasan: Check if Intel LAM_U57 is enabled

2024-01-17 Thread Hongtao Liu
On Wed, Jan 10, 2024 at 12:47 AM H.J. Lu wrote: > > When -fsanitize=hwaddress is used, libhwasan will try to enable LAM_U57 > in the startup code. Update the target check to enable hwaddress tests > if LAM_U57 is enabled. Also compile hwaddress tests with -mlam=u57 on > x86-64 since hwasan requi

Re: [PATCH 1/2] x86: Add no_callee_saved_registers function attribute

2024-01-21 Thread Hongtao Liu
On Sat, Jan 20, 2024 at 10:30 PM H.J. Lu wrote: > > When an interrupt handler is implemented by an assembly stub which does: > > 1. Save all registers. > 2. Call a C function. > 3. Restore all registers. > 4. Return from interrupt. > > it is completely unnecessary to save and restore any registers

Re: [PATCH] i386: Modify testcases failed under -DDEBUG

2024-01-24 Thread Hongtao Liu
On Mon, Jan 22, 2024 at 10:31 AM Haochen Jiang wrote: > > Hi all, > > Recently, I happened to run i386.exp under -DDEBUG and found some fail. > > This patch aims to fix that. Ok for trunk? OK. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/adx-check.h: Include stdio.

Re: [PATCH v3 0/2] x86: Don't save callee-saved registers if not needed

2024-01-24 Thread Hongtao Liu
On Tue, Jan 23, 2024 at 11:00 PM H.J. Lu wrote: > > Changes in v3: > > 1. Rebase against commit 02e68389494 > 2. Don't add call_no_callee_saved_registers to machine_function since > all callee-saved registers are properly clobbered by callee with > no_callee_saved_registers attribute. > The patch

Re: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.

2024-01-25 Thread Hongtao Liu
constants. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline (in stage 1)? Ok, thanks for handling this. > > > 2024-01-25 Roger Sayle >

Re: [PATCH] [ICE] Support vpcmov for V4HF/V4BF/V2HF/V2BF under TARGET_XOP.

2023-12-13 Thread Hongtao Liu
On Wed, Dec 13, 2023 at 7:59 PM Jakub Jelinek wrote: > > On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > > > > gcc/ChangeLog: > > > > PR target/112904 > > * config/i386/mmx.md (*xop_pcmov

Re: [PATCH] i386: Remove RAO-INT from Grand Ridge

2023-12-14 Thread Hongtao Liu
On Thu, Dec 14, 2023 at 10:55 AM Haochen Jiang wrote: > > Hi all, > > According to ISE050 published at the end of September, RAO-INT will not > be in Grand Ridge anymore. This patch aims to remove it. > > The documentation comes following: > > https://cdrdv2.intel.com/v1/dl/getContent/671368 > > R

Re: [PATCH] i386: Sync move_max/store_max with prefer-vector-width [PR112824]

2023-12-14 Thread Hongtao Liu
On Thu, Dec 14, 2023 at 3:54 PM Hongyu Wang wrote: > > Hi, > > Currently move_max follows the tuning feature first, but ideally it > should sync with prefer-vector-width when it is explicitly set to keep > vector move and operation with same vector size. > > Bootstrapped/regtested on x86-64-pc-lin

Re: [PATCH] i386: Allow 64 bit mask register for -mno-evex512

2023-12-19 Thread Hongtao Liu
On Fri, Dec 15, 2023 at 10:34 AM Haochen Jiang wrote: > > Hi all, > > There is a recent change in AVX10 documentation which allows 64 bit mask > register instructions in AVX10-256, the documentation comes following: > > Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification >

Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2024-01-01 Thread Hongtao Liu
On Fri, Dec 22, 2023 at 6:25 PM Roger Sayle wrote: > > > This patch resolves the second part of PR target/112992, building upon > Hongtao Liu's solution to the first part. > > The issue addressed by this patch is that when initializing vectors by > broadcasting integer constants, the compiler has

Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2024-01-07 Thread Hongtao Liu
get/i386/pr100865-5a.c: Likewise. > * gcc.target/i386/pr100865-5b.c: Likewise. > * gcc.target/i386/pr100865-9a.c: Likewise. > * gcc.target/i386/pr100865-9b.c: Likewise. > * gcc.target/i386/pr102021.c: Likewise. > * gcc.target/i386/pr90773-17.c: Likewise. &

Re: Disable FMADD in chains for Zen4 and generic

2024-01-07 Thread Hongtao Liu
On Thu, Dec 14, 2023 at 12:03 AM Jan Hubicka wrote: > > > > The difference is that Cores understand the fact that fmadd does not need > > > all three parameters to start computation, while Zen cores don't. > > > > > > Since this seems noticeable win on zen and not loss on Core it seems like >

Re: [PATCH] i386: [APX] Add missing document for APX

2024-01-07 Thread Hongtao Liu
On Mon, Jan 8, 2024 at 11:09 AM Hongyu Wang wrote: > > Hi, > > The supported sub-features for APX was missing in option document and > target attribute section. Add those missing ones. > > Ok for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i386.opt: Add supported sub-features. >

Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-03 Thread Hongtao Liu
On Fri, Dec 1, 2023 at 10:26 PM Richard Biener wrote: > > On Fri, Dec 1, 2023 at 3:39 AM liuhongt wrote: > > > > > Hmm, I would suggest you put reg_needed into the class and accumulate > > > over all vec_construct, with your patch you pessimize a single v32qi > > > over two separate v16qi for exa

Re: [PATCH v2 00/17] Support Intel APX NDD

2023-12-04 Thread Hongtao Liu
On Tue, Dec 5, 2023 at 10:32 AM Hongyu Wang wrote: > > Hi, > > APX NDD patches have been posted at > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636604.html > > Thanks to Hongtao's review, the V2 patch adds support of zext semantic with > memory input as NDD by default clear upper bits

Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-04 Thread Hongtao Liu
On Mon, Dec 4, 2023 at 3:51 PM Uros Bizjak wrote: > > On Mon, Dec 4, 2023 at 8:11 AM Hongtao Liu wrote: > > > > On Fri, Dec 1, 2023 at 10:26 PM Richard Biener > > wrote: > > > > > > On Fri, Dec 1, 2023 at 3:39 AM liuhongt wrote: > > > >

Re: [PATCH] i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]

2023-12-05 Thread Hongtao Liu
On Wed, Dec 6, 2023 at 6:23 AM Jakub Jelinek wrote: > > Hi! > > Regardless of the outcome of the REG_UNUSED discussions, I think > it is a good idea to move the vzeroupper pass one pass later. > As can be seen in the multiple PRs and as postreload.cc documents, > reload/LRA is known to create dead

Re: [PATCH] Don't vectorize when vector stmts are only vec_contruct and stores

2023-12-05 Thread Hongtao Liu
On Mon, Dec 4, 2023 at 10:10 PM Richard Biener wrote: > > On Mon, Dec 4, 2023 at 6:32 AM liuhongt wrote: > > > > i.e. for below cases. > >a[0] = b1; > >a[1] = b2; > >.. > >a[n] = bn; > > > > There're extra dependences when constructing the vector, but not for > > scalar store. Acc

Re: [PATCH v3 00/16] Support Intel APX NDD

2023-12-06 Thread Hongtao Liu
On Wed, Dec 6, 2023 at 8:11 PM Uros Bizjak wrote: > > On Wed, Dec 6, 2023 at 9:08 AM Hongyu Wang wrote: > > > > Hi, > > > > Following up the discussion of V2 patches in > > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639368.html, > > this patch series add early clobber for all TImode

Re: [V2 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-07 Thread Hongtao Liu
ping. On Thu, Nov 16, 2023 at 6:49 PM liuhongt wrote: > > Update in V2: > 1) Add some comments before the pattern. > 2) Remove ? from view_convert. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > When I'm working on PR112443, I notice there's some misoptimization

Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-12-07 Thread Hongtao Liu
On Wed, Dec 6, 2023 at 3:52 PM Richard Biener wrote: > > On Wed, Dec 6, 2023 at 3:33 AM Jiang, Haochen wrote: > > > > > -Original Message- > > > From: Jiang, Haochen > > > Sent: Friday, December 1, 2023 4:51 PM > > > To: Richard Biener > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; >

Re: [v3 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-11 Thread Hongtao Liu
On Mon, Dec 11, 2023 at 4:14 PM Richard Biener wrote: > > On Mon, Dec 11, 2023 at 7:51 AM liuhongt wrote: > > > > > since you are looking at TYPE_PRECISION below you want > > > VECTOR_INTEGER_TYPE_P here as well? The alternative > > > would be to compare TYPE_SIZE. > > > > > > Some of the check

Re: [PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Hongtao Liu
On Mon, Dec 11, 2023 at 8:39 PM Hongyu Wang wrote: > > > > +__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* { > > > dg-warning "integer constant is so large that it is unsigned" "so large" > > > } */ > > > > Just you can use (9223372036854775807LL + (__int128) 1) instead of > >

Re: [PATCH] Don't assume it's AVX_U128_CLEAN after call_insn whose abi.mode_clobber(V4DImode) deosn't contains all SSE_REGS.

2023-12-11 Thread Hongtao Liu
On Fri, Dec 8, 2023 at 10:17 AM liuhongt wrote: > > If the function doesn't clobber any sse registers or only clobbers > 128-bit part, then vzeroupper isn't issued before the function exit. > the status is not CLEAN but ANY after the function. > > Also for sibling_call, it's safe to issue a vzeroupper
