from:"Liu, Hongtao"

RE: [PATCH] Support Intel USER_MSR

2023-10-11 Thread Liu, Hongtao




> -Original Message-
> From: Hu, Lin1 
> Sent: Tuesday, October 10, 2023 4:06 PM
> To: Hu, Lin1 ; gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: RE: [PATCH] Support Intel USER_MSR
> 
> There are some typos in /gcc/doc/extend.texi and /gcc/doc/invoke.texi. They
> should be USER_MSR, not UMSR. I have modified them in my branch.
> 
> -Original Message-
> From: Hu, Lin1 
> Sent: Tuesday, October 10, 2023 3:47 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] Support Intel USER_MSR
> 
> This patch aims to support Intel USER_MSR.
Ok.
> 
> gcc/ChangeLog:
> 
>   * common/config/i386/cpuinfo.h (get_available_features):
>   Detect USER_MSR.
>   * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_USER_MSR_SET): New.
>   (OPTION_MASK_ISA2_USER_MSR_UNSET): Ditto.
>   (ix86_handle_option): Handle -musermsr.
>   * common/config/i386/i386-cpuinfo.h (enum processor_features):
>   Add FEATURE_USER_MSR.
>   * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
> for usermsr.
>   * config.gcc: Add usermsrintrin.h
>   * config/i386/cpuid.h (bit_USER_MSR): New.
>   * config/i386/i386-builtin-types.def:
>   Add DEF_FUNCTION_TYPE (VOID, UINT64, UINT64).
>   * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
>   Add __builtin_urdmsr and __builtin_uwrmsr.
>   * config/i386/i386-builtins.h (ix86_builtins):
>   Add IX86_BUILTIN_URDMSR and IX86_BUILTIN_UWRMSR.
>   * config/i386/i386-c.cc (ix86_target_macros_internal):
>   Define __USER_MSR__.
>   * config/i386/i386-expand.cc (ix86_expand_builtin):
>   Handle new builtins.
>   * config/i386/i386-isa.def (USER_MSR): Add DEF_PTA(USER_MSR).
>   * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
>   Handle usermsr.
>   * config/i386/i386.md (urdmsr): New define_insn.
>   (uwrmsr): Ditto.
>   * config/i386/i386.opt: Add option -musermsr.
>   * config/i386/x86gprintrin.h: Include usermsrintrin.h
>   * doc/extend.texi: Document usermsr.
>   * doc/invoke.texi: Document -musermsr.
>   * doc/sourcebuild.texi: Document target usermsr.
>   * config/i386/usermsrintrin.h: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/funcspec-56.inc: Add new target attribute.
>   * gcc.target/i386/x86gprintrin-1.c: Add -musermsr for 64bit target.
>   * gcc.target/i386/x86gprintrin-2.c: Ditto.
>   * gcc.target/i386/x86gprintrin-3.c: Ditto.
>   * gcc.target/i386/x86gprintrin-4.c: Add -musermsr for 64bit target.
>   * gcc.target/i386/x86gprintrin-5.c: Ditto.
>   * gcc.target/i386/usermsr-1.c: New test.
>   * gcc.target/i386/usermsr-2.c: Ditto.
> ---
>  gcc/common/config/i386/cpuinfo.h  |  2 +
>  gcc/common/config/i386/i386-common.cc | 15 +
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/common/config/i386/i386-isas.h|  1 +
>  gcc/config.gcc|  3 +-
>  gcc/config/i386/cpuid.h   |  1 +
>  gcc/config/i386/i386-builtin-types.def|  3 +
>  gcc/config/i386/i386-builtins.cc  |  8 +++
>  gcc/config/i386/i386-builtins.h   |  2 +
>  gcc/config/i386/i386-c.cc |  2 +
>  gcc/config/i386/i386-expand.cc| 35 +++
>  gcc/config/i386/i386-isa.def  |  1 +
>  gcc/config/i386/i386-options.cc   |  4 +-
>  gcc/config/i386/i386.md   | 24 
>  gcc/config/i386/i386.opt  |  4 ++
>  gcc/config/i386/usermsrintrin.h   | 60 +++
>  gcc/config/i386/x86gprintrin.h|  2 +
>  gcc/doc/extend.texi   |  5 ++
>  gcc/doc/invoke.texi   |  6 +-
>  gcc/doc/sourcebuild.texi  |  3 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  2 +
>  gcc/testsuite/gcc.target/i386/user_msr-1.c| 20 +++
>  gcc/testsuite/gcc.target/i386/user_msr-2.c| 16 +
>  .../gcc.target/i386/x86gprintrin-1.c  |  2 +-
>  .../gcc.target/i386/x86gprintrin-2.c  |  6 +-
>  .../gcc.target/i386/x86gprintrin-3.c  | 28 -
>  .../gcc.target/i386/x86gprintrin-4.c  | 32 +-
>  .../gcc.target/i386/x86gprintrin-5.c  |  6 +-
>  28 files changed, 286 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/config/i386/usermsrintrin.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/user_msr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/user_msr-2.c
> 
> diff --git a/gcc/commo

RE: [PATCH] i386: Prevent splitting to xmm16+ when !TARGET_AVX512VL

2023-10-19 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Friday, October 20, 2023 2:21 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao 
> Subject: [PATCH] i386: Prevent splitting to xmm16+ when !TARGET_AVX512VL
> 
> Hi all,
> 
> Currently, a split can end up using xmm16+/ymm16+ without AVX512VL,
> which leads to an ICE, as in PR111753.
> 
> This patch aims to fix that.
> 
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
LGTM.
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   PR target/111753
>   * config/i386/i386.cc (ix86_standard_x87sse_constant_load_p):
>   Do not split to xmm16+ when !TARGET_AVX512VL.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/111753
>   * gcc.target/i386/pr111753.c: New test.
> ---
>  gcc/config/i386/i386.cc  |  3 ++
>  gcc/testsuite/gcc.target/i386/pr111753.c | 69
> 
>  2 files changed, 72 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111753.c
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index
> 641e7680335..5f8c5eb98a2 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -5481,6 +5481,9 @@ ix86_standard_x87sse_constant_load_p (const
> rtx_insn *insn, rtx dst)
>if (src == NULL
>|| (SSE_REGNO_P (REGNO (dst))
> && standard_sse_constant_p (src, GET_MODE (dst)) != 1)
> +  || (!TARGET_AVX512VL
> +   && EXT_REX_SSE_REGNO_P (REGNO (dst))
> +   && standard_sse_constant_p (src, GET_MODE (dst)) == 1)
>|| (STACK_REGNO_P (REGNO (dst))
>  && standard_80387_constant_p (src) < 1))
>  return false;
> diff --git a/gcc/testsuite/gcc.target/i386/pr111753.c
> b/gcc/testsuite/gcc.target/i386/pr111753.c
> new file mode 100644
> index 000..16ceca6ddc6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr111753.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512bw -fno-tree-ter -Wno-div-by-zero" } */
> +
> +typedef int __attribute__((__vector_size__ (8))) v64u8;
> +typedef char __attribute__((__vector_size__ (16))) v128u8;
> +typedef int __attribute__((__vector_size__ (16))) v128u32;
> +typedef int __attribute__((__vector_size__ (32))) v256u8;
> +typedef int __attribute__((__vector_size__ (64))) v512u8;
> +typedef short __attribute__((__vector_size__ (4))) v32s16;
> +typedef short __attribute__((__vector_size__ (16))) v128s16;
> +typedef short __attribute__((__vector_size__ (32))) v256s16;
> +typedef _Float16 __attribute__((__vector_size__ (16))) f16;
> +typedef _Float32 f32;
> +typedef double __attribute__((__vector_size__ (64))) v512f64;
> +typedef _Decimal32 d32;
> +typedef _Decimal64 __attribute__((__vector_size__ (32))) v256d64;
> +typedef _Decimal64 __attribute__((__vector_size__ (64))) v512d64;
> +d32 foo0_d32_0, foo0_ret;
> +v256d64 foo0_v256d64_0;
> +v128s16 foo0_v128s16_0;
> +int foo0_v256d128_0;
> +
> +extern void bar(int);
> +
> +void
> +foo (v64u8, v128u8 v128u8_0, v128u8 v128s8_0,
> + v256u8 v256u8_0, int v256s8_0, v512u8 v512u8_0, int v512s8_0,
> + v256s16 v256s16_0,
> + v512u8 v512s16_0,
> + v128u32 v128u64_0,
> + v128u32 v128s64_0,
> + int, int, __int128 v128u128_0, __int128 v128s128_0, v128u32
> +v128f64_0) {
> +  v512d64 v512d64_0;
> +  v256u8 v256f32_0, v256d64_1 = foo0_v256d64_0 == foo0_d32_0;
> +  f32 f32_0;
> +  f16 v128f16_0;
> +  f32_0 /= 0;
> +  v128u8 v128u8_1 = v128u8_0 != 0;
> +  int v256d32_1;
> +  v256f32_0 /= 0;
> +  v32s16 v32s16_1 = __builtin_shufflevector ((v128s16) { }, v256s16_0,
> +5, 10);
> +  v512f64 v512f64_1 = __builtin_convertvector (v512d64_0, v512f64);
> +  v512u8 v512d128_1 = v512s16_0;
> +  v128s16 v128s16_2 =
> +__builtin_shufflevector ((v32s16) { }, v32s16_1, 0, 3, 2, 1,
> +  0, 0, 0, 3), v128s16_3 = foo0_v128s16_0 > 0;
> +  v128f16_0 /= 0;
> +  __int128 v128s128_1 = 0 == v128s128_0;
> +  v512u8 v512u8_r = v512u8_0 + v512s8_0 + (v512u8) v512f64_1 +
> +v512s16_0;
> +  v256u8 v256u8_r = ((union {
> +   v512u8 a;
> +   v256u8 b;}) v512u8_r).b +
> +v256u8_0 + v256s8_0 + v256f32_0 + v256d32_1 +
> +(v256u8) v256d64_1 + foo0_v256d128_0;
> +  v128u8 v128u8_r = ((union {
> +   v256u8 a;
> +   v128u8 b;}) v256u8_r).b +
> +v128u8_0 + v128u8_1 + v128s8_0 + (v128u8) v128s16_2 +
> +(v128u8) v128s16_3 + (v128u8) v128u64_0 + (v128u8) v128s64_0 +
> +(v128u8) v128u128_0 + (v128u8) v128s128_1 +
> +(v128u8) v128f16_0 + (v128u8) v128f64_0;
> +  bar (f32_0 + (int) foo0_d32_0);
> +  foo0_ret = ((union {
> +v64u8 a;
> +int b;}) ((union {
> +   v128u8 a;
> +   v64u8 b;}) v128u8_r).b).b;
> +}
> --
> 2.31.1
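
The guard added by the patch can be read as a plain boolean decision. The following is only a rough sketch of that logic in standalone C, not the GCC source: the enum, the function name constant_load_ok and its parameters are hypothetical stand-ins for SSE_REGNO_P, EXT_REX_SSE_REGNO_P, STACK_REGNO_P, standard_sse_constant_p and standard_80387_constant_p, and it assumes the real function returns true when none of the rejecting conditions match.

```c
#include <stdbool.h>

/* Hypothetical model of the destination register kinds tested in the patch.
   DST_SSE_EXT stands for xmm16+ (EXT_REX_SSE_REGNO_P), which is also an
   SSE register as far as SSE_REGNO_P is concerned.  */
enum dst_kind { DST_SSE_LOW, DST_SSE_EXT, DST_X87_STACK, DST_OTHER };

/* Sketch of the decision in ix86_standard_x87sse_constant_load_p after the
   patch.  sse_const_kind models the standard_sse_constant_p result and
   x87_const_kind models the standard_80387_constant_p result.  */
bool constant_load_ok(bool has_src, enum dst_kind dst, int sse_const_kind,
                      int x87_const_kind, bool avx512vl)
{
  bool is_sse = (dst == DST_SSE_LOW || dst == DST_SSE_EXT);

  if (!has_src)
    return false;
  if (is_sse && sse_const_kind != 1)
    return false;
  /* New condition from the patch: without AVX512VL, do not treat a
     load into xmm16+ as a standard constant load.  */
  if (!avx512vl && dst == DST_SSE_EXT && sse_const_kind == 1)
    return false;
  if (dst == DST_X87_STACK && x87_const_kind < 1)
    return false;
  return true;
}
```

In this model a destination in xmm16+ is accepted only when AVX512VL is available, while xmm0-15 destinations behave as before.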

RE: [PATCH] asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector* [PR110027]

2024-04-11 Thread Liu, Hongtao




> -Original Message-
> From: Jakub Jelinek 
> Sent: Thursday, April 11, 2024 4:39 PM
> To: Richard Biener ; Jeff Law ;
> Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] asan, v3: Fix up handling of > 32 byte aligned variables with
> -fsanitize=address -fstack-protector* [PR110027]
> 
> On Tue, Mar 26, 2024 at 02:08:02PM +0800, liuhongt wrote:
> > > > So, try to add some other variable with larger size and smaller
> > > > alignment to the frame (and make sure it isn't optimized away).
> > > >
> > > > alignb above is the alignment of the first partition's var, if
> > > > align_frame_offset really needs to depend on the var alignment, it
> > > > probably should be the maximum alignment of all the vars with
> > > > alignment alignb * BITS_PER_UNIT <=
> > > > MAX_SUPPORTED_STACK_ALIGNMENT
> > > >
> >
> > In asan_emit_stack_protection, when it allocates the fake stack, it assumes
> > the bottom of the stack is also aligned to alignb. The place that violates
> > this is the first var partition, which is at a 32-byte offset; it should be
> > BIGGEST_ALIGNMENT / BITS_PER_UNIT.
> > So I think we need to use MAX (BIGGEST_ALIGNMENT / BITS_PER_UNIT,
> > ASAN_RED_ZONE_SIZE) for the first var partition.
> 
> Your first patch aligned offsets[0] to maximum of alignb and
> ASAN_RED_ZONE_SIZE.  But as I wrote in the reply to that mail, alignb there is
> the alignment of just a single variable which is the first one to appear in
> the sorted list and is placed in the highest spot in the stack frame.
> That is not necessarily the largest alignment, the sorting ensures that it is
> a variable with the largest size in the frame (and only if several of them have
> equal size, largest alignment from the same sized ones).  Your second patch
> used maximum of BIGGEST_ALIGNMENT / BITS_PER_UNIT and
> ASAN_RED_ZONE_SIZE.  That doesn't change anything at all when using
> -mno-avx512f - offsets[0] is still just 32-byte aligned in that case relative
> to top of frame, just changes the -mavx512f case to be 64-byte aligned
> offsets[0] (aka offsets[0] is then either 0 or -64 instead of either
> 0 or -32).  That will not help if any variable in the frame needs 128-byte,
> 256-byte, 512-byte ...  4096-byte alignment.  If you want to fix the bug in the
> spot you've touched, you'd need to walk all the
> stack_vars[stack_vars_sorted[si2]]
> for si2 [si + 1, n - 1] and for those where the loop would do anything (i.e.
> stack_vars[i2].representative == i2
> && TREE_CODE (decl2) == SSA_NAME
>? SA.partition_to_pseudo[var_to_partition (SA.map, decl2)] == NULL_RTX
>: DECL_RTL (decl2) == pc_rtx
> and the pred applies (but that means also walking the earlier ones!
> because with -fstack-protector* the vars can be processed in several calls)
> and alignb2 * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT and
> compute maximum of those alignments.
> That maximum is already computed,
> data->asan_alignb = MAX (data->asan_alignb, alignb);
> computes that, but you get the final result only after you do all the
> expand_stack_vars calls.  You'd need to compute it before.
> 
> Though, that change would be still in the wrong place.
> The thing is, it would be a waste of the precious stack space when it isn't
> needed at all (e.g.  when asan will not at compile time do the use after
> return checking, or if it won't do it at runtime, or even if it will do at
> runtime it will waste the space on the stack).
> 
> The following patch fixes it solely for the __asan_stack_malloc_N allocations,
> doesn't enlarge unnecessarily further the actual stack frame.
> Because asan is only supported on FRAME_GROWS_DOWNWARD
> architectures (mips, rs6000 and xtensa are conditional
> FRAME_GROWS_DOWNWARD arches, which for -fsanitize=address or
> -fstack-protector* use FRAME_GROWS_DOWNWARD 1, otherwise 0, others
> supporting asan always just use 1), the assumption for the dynamic stack
> realignment is that the top of the stack frame (aka offset
> 0) is aligned to alignb passed to the function (which is the maximum of alignb
> of all the vars in the frame).  As checked by the assertion in the patch,
> offsets[0] is 0 most of the time and so that assumption is correct, the only
> case when it is not 0 is if -fstack-protector* is on together with
> -fsanitize=address and cfgexpand.cc (create_stack_guard) created a stack
> guard.  That is the only variable which is allocated in the stack frame right
> away, for all others with -fsanitize=address defer_stack_allocation (or
> -fstack-protector*) r

RE: [PATCH] i386: Add AVX10.1 related macros

2024-01-10 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Wednesday, January 10, 2024 3:35 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com; burnus@net-b.de;
> san...@codesourcery.com
> Subject: [PATCH] i386: Add AVX10.1 related macros
> 
> Hi all,
> 
> This patch adds AVX10.1 related macros at libgomp's request. The
> request is here:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html
> 
> Ok for trunk?
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   PR target/113288
>   * config/i386/i386-c.cc (ix86_target_macros_internal):
>   Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.
> ---
>  gcc/config/i386/i386-c.cc | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index
> c3ae984670b..366b560158a 100644
> --- a/gcc/config/i386/i386-c.cc
> +++ b/gcc/config/i386/i386-c.cc
> @@ -735,6 +735,13 @@ ix86_target_macros_internal (HOST_WIDE_INT
> isa_flag,
>  def_or_undef (parse_in, "__EVEX512__");
>if (isa_flag2 & OPTION_MASK_ISA2_USER_MSR)
>  def_or_undef (parse_in, "__USER_MSR__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_256)
> +{
> +  def_or_undef (parse_in, "__AVX10_1_256__");
> +  def_or_undef (parse_in, "__AVX10_1__");
I think this is not needed; the rest LGTM.
> +}
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
> +def_or_undef (parse_in, "__AVX10_1_512__");
>if (TARGET_IAMCU)
>  {
>def_or_undef (parse_in, "__iamcu");
> --
> 2.31.1

RE: [PATCH] i386: Add AVX10.1 related macros

2024-01-10 Thread Liu, Hongtao



> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, January 10, 2024 5:44 PM
> To: Liu, Hongtao 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org;
> ubiz...@gmail.com; bur...@net-b.de; san...@codesourcery.com
> Subject: Re: [PATCH] i386: Add AVX10.1 related macros
> 
> On Wed, Jan 10, 2024 at 9:01 AM Liu, Hongtao 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Jiang, Haochen 
> > > Sent: Wednesday, January 10, 2024 3:35 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Liu, Hongtao ; ubiz...@gmail.com;
> > > burnus@net-b.de; san...@codesourcery.com
> > > Subject: [PATCH] i386: Add AVX10.1 related macros
> > >
> > > Hi all,
> > >
> > > This patch adds AVX10.1 related macros at libgomp's request.
> > > The request is here:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html
> > >
> > > Ok for trunk?
> > >
> > > Thx,
> > > Haochen
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR target/113288
> > >   * config/i386/i386-c.cc (ix86_target_macros_internal):
> > >   Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.
> > > ---
> > >  gcc/config/i386/i386-c.cc | 7 +++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
> > > index c3ae984670b..366b560158a 100644
> > > --- a/gcc/config/i386/i386-c.cc
> > > +++ b/gcc/config/i386/i386-c.cc
> > > @@ -735,6 +735,13 @@ ix86_target_macros_internal (HOST_WIDE_INT
> > > isa_flag,
> > >  def_or_undef (parse_in, "__EVEX512__");
> > >if (isa_flag2 & OPTION_MASK_ISA2_USER_MSR)
> > >  def_or_undef (parse_in, "__USER_MSR__");
> > > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_256)
> > > +{
> > > +  def_or_undef (parse_in, "__AVX10_1_256__");
> > > +  def_or_undef (parse_in, "__AVX10_1__");
> > I think this is not needed; the rest LGTM.
> 
> So __AVX10_1_256__ and __AVX10_1_512__ are redundant with
> __AVX10_1__ and __EVEX512__, right?
No, I mean __AVX10_1__ is redundant with __AVX10_1_256__, since -mavx10.1 is
just an alias of -mavx10.1-256.
We want explicit __AVX10_1_256__ and __AVX10_1_512__ and don't want to mix
__EVEX512__ with AVX10. (They are related in their internal implementation, but
we don't want the user to control the vector length of AVX10 with -mno-evex512;
-mno-evex512 is intended for the existing AVX512.)
> 
> > > +}
> > > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
> > > +def_or_undef (parse_in, "__AVX10_1_512__");
> > >if (TARGET_IAMCU)
> > >  {
> > >def_or_undef (parse_in, "__iamcu");
> > > --
> > > 2.31.1
> >
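
For readers following the macro discussion, here is a small sketch of how user code might select on the two explicit macros. It assumes __AVX10_1_256__ and __AVX10_1_512__ end up defined as proposed in the patch; the function name avx10_1_level and the returned labels are made up for illustration. It deliberately does not consult __EVEX512__, in line with the point above.

```c
/* Report which AVX10.1 vector length the translation unit was compiled for.
   Only the two explicit macros from the patch are tested; the 512-bit case
   is checked first on the assumption that it may coexist with the 256-bit
   macro.  */
const char *avx10_1_level(void)
{
#if defined(__AVX10_1_512__)
  return "avx10.1-512";
#elif defined(__AVX10_1_256__)
  return "avx10.1-256";
#else
  return "none";
#endif
}
```

Built without any AVX10.1 option, or with a compiler that does not define these macros, the function simply reports "none".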

RE: [PATCH] i386: Remove redundant move in vnni pattern

2024-01-11 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Friday, January 12, 2024 10:26 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Remove redundant move in vnni pattern
> 
> Hi all,
> 
> This patch removes all redundant SETs in vnni patterns.
> 
> Ok for trunk?
Ok.
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   * config/i386/sse.md (sdot_prod): Remove redundant SET.
>   (usdot_prod): Ditto.
>   (sdot_prod): Ditto.
>   (udot_prod): Ditto.
> ---
>  gcc/config/i386/sse.md | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> 532738dcf94..acd10908d76 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16174,7 +16174,6 @@
>operands[2] = lowpart_subreg (mode,
>   force_reg (mode, operands[2]),
>   mode);
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>emit_insn (gen_vpdpwssd_ (operands[0],
> operands[3],
>  operands[1], operands[2]));
>  }
> @@ -29963,7 +29962,6 @@
>operands[2] = lowpart_subreg (mode,
>   force_reg (mode, operands[2]),
>   mode);
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>emit_insn (gen_vpdpbusd_ (operands[0],
> operands[3],
> operands[1], operands[2]));
>DONE;
> @@ -30780,7 +30778,6 @@
>operands[2] = lowpart_subreg (mode,
>   force_reg (mode, operands[2]),
>   mode);
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>emit_insn (gen_vpdpbssd_ (operands[0],
> operands[3],
> operands[1], operands[2]));
>  }
> @@ -30857,7 +30854,6 @@
>operands[2] = lowpart_subreg (mode,
>   force_reg (mode, operands[2]),
>   mode);
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>emit_insn (gen_vpdpbuud_ (operands[0],
> operands[3],
> operands[1], operands[2]));
> }
> --
> 2.31.1

RE: [PATCH] i386: Fix recent testcase fail

2024-01-08 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Monday, January 8, 2024 4:41 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Fix recent testcase fail
> 
> After commit 01f4251b8775c832a92d55e2df57c9ac72eaceef, early break
> vectorization is supported. The two testcases need to be fixed.
Ok.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/avx512fp16-xorsign-1.c: Fix testcase.
>   * gcc.target/i386/part-vect-absneghf.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c | 2 +-
>  gcc/testsuite/gcc.target/i386/part-vect-absneghf.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> index a22a6ceabff..f5dd457c9eb 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> @@ -35,7 +35,7 @@ do_test (void)
>abort ();
>  }
> 
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } }
> +*/
>  /* { dg-final { scan-assembler "\[ \t\]xor" } } */
>  /* { dg-final { scan-assembler "\[ \t\]and" } } */
>  /* { dg-final { scan-assembler-not "copysign" } } */ diff --git
> a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> index 48aed14d604..713f0bff4dd 100644
> --- a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> +++ b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target avx512fp16 } } */
> -/* { dg-options "-O1 -mavx512fp16 -mavx512vl -ftree-vectorize -fdump-
> tree-slp-details -fdump-tree-optimized" } */
> +/* { dg-options "-O1 -mavx512fp16 -mavx512vl -fdump-tree-slp-details
> +-fdump-tree-optimized" } */
> 
>  extern void abort ();
> 
> --
> 2.31.1

RE: [PATCH] i386: Improve code generation for vector __builtin_signbit (x.x[i]) ? -1 : 0 [PR112816]

2023-12-04 Thread Liu, Hongtao




> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, December 5, 2023 3:01 PM
> To: Uros Bizjak ; Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] i386: Improve code generation for vector __builtin_signbit
> (x.x[i]) ? -1 : 0 [PR112816]
> 
> Hi!
> 
> On the testcase I've recently fixed I've noticed bad code generation, we emit
>         pxor    %xmm1, %xmm1
>         psrld   $31, %xmm0
>         pcmpeqd %xmm1, %xmm0
>         pcmpeqd %xmm1, %xmm0
> or
>         vpxor   %xmm1, %xmm1, %xmm1
>         vpsrld  $31, %xmm0, %xmm0
>         vpcmpeqd        %xmm1, %xmm0, %xmm0
>         vpcmpeqd        %xmm1, %xmm0, %xmm2
> rather than
>         psrad   $31, %xmm2
> or
>         vpsrad  $31, %xmm1, %xmm2
> The following patch fixes that using a combiner splitter.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok.
> 
> 2023-12-04  Jakub Jelinek  
> 
>   PR target/112816
>   * config/i386/sse.md ((eq (eq (lshiftrt x elt_bits-1) 0) 0)): New
>   splitter to turn psrld $31; pcmpeq; pcmpeq into psrad $31.
> 
>   * gcc.target/i386/pr112816.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2023-12-04 09:00:12.722437462 +0100
> +++ gcc/config/i386/sse.md2023-12-04 13:22:38.565833465 +0100
> @@ -16614,6 +16614,18 @@ (define_insn_and_split "*ashrv1ti3_inter
>DONE;
>  })
> 
> +(define_split
> +  [(set (match_operand:VI248_AVX2 0 "register_operand")
> +(eq:VI248_AVX2
> +   (eq:VI248_AVX2
> + (lshiftrt:VI248_AVX2
> +   (match_operand:VI248_AVX2 1 "register_operand")
> +   (match_operand:SI 2 "const_int_operand"))
> + (match_operand:VI248_AVX2 3 "const0_operand"))
> +   (match_operand:VI248_AVX2 4 "const0_operand")))]
> +  "INTVAL (operands[2]) == GET_MODE_PRECISION (mode)
> - 1"
> +  [(set (match_dup 0) (ashiftrt:VI248_AVX2 (match_dup 1) (match_dup
> +2)))])
> +
>  (define_expand "rotlv1ti3"
>[(set (match_operand:V1TI 0 "register_operand")
>   (rotate:V1TI
> --- gcc/testsuite/gcc.target/i386/pr112816.c.jj   2023-12-04
> 13:31:51.215061445 +0100
> +++ gcc/testsuite/gcc.target/i386/pr112816.c  2023-12-04
> 13:34:14.008053097 +0100
> @@ -0,0 +1,27 @@
> +/* PR target/112816 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-avx512f -masm=att" } */
> +/* { dg-final { scan-assembler-times "psrad\t\\\$31," 2 } } */
> +/* { dg-final { scan-assembler-not "pcmpeqd\t" } } */
> +
> +#define N 4
> +struct S { float x[N]; };
> +struct T { int x[N]; };
> +
> +__attribute__((target ("no-sse3,sse2"))) struct T foo (struct S x) {
> +  struct T res;
> +  for (int i = 0; i < N; ++i)
> +res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
> +  return res;
> +}
> +
> +__attribute__((target ("avx2"))) struct T bar (struct S x) {
> +  struct T res;
> +  for (int i = 0; i < N; ++i)
> +res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
> +  return res;
> +}
> 
>   Jakub
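
The equivalence the splitter relies on can be checked per lane in scalar C. This is only an illustrative model for 32-bit elements, not code from the patch: lane_eq, signmask_slow and signmask_fast are hypothetical names, lane_eq mimics a vector compare that yields -1 or 0, and signmask_fast assumes GCC's documented behaviour that >> on a negative signed value shifts arithmetically.

```c
#include <stdint.h>

/* Model of one lane of pcmpeqd: all ones when equal, zero otherwise.  */
static int32_t lane_eq(int32_t a, int32_t b)
{
  return a == b ? -1 : 0;
}

/* The long sequence: logical shift right by 31, then two compares
   against zero, i.e. (eq (eq (lshiftrt x 31) 0) 0).  */
int32_t signmask_slow(int32_t x)
{
  int32_t shifted = (int32_t) ((uint32_t) x >> 31);  /* psrld $31 */
  return lane_eq(lane_eq(shifted, 0), 0);
}

/* The short sequence: a single arithmetic shift right by 31 (psrad $31).
   Relies on the compiler sign-extending on >> of a negative value.  */
int32_t signmask_fast(int32_t x)
{
  return x >> 31;
}
```

Both functions give -1 when the sign bit is set and 0 otherwise, which is why the double compare can collapse into one shift.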

RE: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-16 Thread Liu, Hongtao



> -Original Message-
> From: Uros Bizjak 
> Sent: Wednesday, July 17, 2024 2:52 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> Subject: Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1
> in UNSPEC_MASKMOV
> 
> On Wed, Jul 17, 2024 at 3:27 AM liuhongt  wrote:
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ready to push to trunk.
> >
> > gcc/ChangeLog:
> >
> > PR target/115843
> > * config/i386/predicates.md (const0_or_m1_operand): New
> > predicate.
> > * config/i386/sse.md (*_store_mask_1): New
> > pre_reload define_insn_and_split.
> > (V): Add V32BF,V16BF,V8BF.
> > (V4SF_V8HF): Rename to ..
> > (V24F_128): .. this.
> > (*vec_concat): Adjust with V24F_128.
> > (*vec_concat_0): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr115843.c: New test.
> > ---
> >  gcc/config/i386/predicates.md|  5 
> >  gcc/config/i386/sse.md   | 32 
> >  gcc/testsuite/gcc.target/i386/pr115843.c | 38
> > 
> >  3 files changed, 69 insertions(+), 6 deletions(-)  create mode 100644
> > gcc/testsuite/gcc.target/i386/pr115843.c
> >
> > diff --git a/gcc/config/i386/predicates.md
> > b/gcc/config/i386/predicates.md index 5d0bb1e0f54..680594871de
> 100644
> > --- a/gcc/config/i386/predicates.md
> > +++ b/gcc/config/i386/predicates.md
> > @@ -825,6 +825,11 @@ (define_predicate "constm1_operand"
> >(and (match_code "const_int")
> > (match_test "op == constm1_rtx")))
> >
> > +;; Match 0 or -1.
> > +(define_predicate "const0_or_m1_operand"
> > +  (ior (match_operand 0 "const0_operand")
> > +   (match_operand 0 "constm1_operand")))
> > +
> >  ;; Match exactly eight.
> >  (define_predicate "const8_operand"
> >(and (match_code "const_int")
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > e44822f705b..e11610f4b88 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -294,6 +294,7 @@ (define_mode_iterator V
> > (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX")
> V4SI
> > (V8DI "TARGET_AVX512F && TARGET_EVEX512")  (V4DI "TARGET_AVX")
> V2DI
> > (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF
> "TARGET_AVX")
> > V8HF
> > +   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF
> "TARGET_AVX")
> > + V8BF
> > (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX")
> V4SF
> > (V8DF "TARGET_AVX512F && TARGET_EVEX512")  (V4DF "TARGET_AVX")
> > (V2DF "TARGET_SSE2")])
> >
> > @@ -430,8 +431,8 @@ (define_mode_iterator VFB_512
> > (V16SF "TARGET_EVEX512")
> > (V8DF "TARGET_EVEX512")])
> >
> > -(define_mode_iterator V4SF_V8HF
> > -  [V4SF V8HF])
> > +(define_mode_iterator V24F_128
> > +  [V4SF V8HF V8BF])
> >
> >  (define_mode_iterator VI48_AVX512VL
> >[(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI
> > "TARGET_AVX512VL") @@ -11543,8 +11544,8 @@ (define_insn
> "*vec_concatv2sf_sse"
> > (set_attr "mode" "V4SF,SF,DI,DI")])
> >
> >  (define_insn "*vec_concat"
> > -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=x,v,x,v")
> > -   (vec_concat:V4SF_V8HF
> > +  [(set (match_operand:V24F_128 0 "register_operand"   "=x,v,x,v")
> > +   (vec_concat:V24F_128
> >   (match_operand: 1 "register_operand" " 
> > 0,v,0,v")
> >   (match_operand: 2 "nonimmediate_operand" "
> x,v,m,m")))]
> >"TARGET_SSE"
> > @@ -11559,8 +11560,8 @@ (define_insn "*vec_concat"
> > (set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
> >
> >  (define_insn "*vec_concat_0"
> > -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=v")
> > -   (vec_concat:V4SF_V8HF
> > +  [(set (match_operand:V24F_128 0 "register_operand"   "=v")
>

RE: [PATCH] [x86] Mention _Float16 and __bf16 changes in GCC14.

2024-08-11 Thread Liu, Hongtao




> -Original Message-
> From: Gerald Pfeifer 
> Sent: Saturday, August 10, 2024 6:33 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> Subject: Re: [PATCH] [x86] Mention _Float16 and __bf16 changes in GCC14.
> 
> On Wed, 31 Jul 2024, liuhongt wrote:
> > +   The _Float16 and __bf16 type are
> supported
> > +independent of SSE2. W/o SSE2, these types are storage-only, compiler
> will
> > +issue an error when they're used in conversion, unary operation,
> > +binary operation, parameter passing or value return.
> 
> "types" (plural)
> "independently"
> "Without" (spelt out)
> "the compiler"
> 
> And personally I would use an Oxford comma, so "..., or value return".
> 
> > +instead of __FLT16_MAX__(or other similar Macros).
> 
> "macros" (lowercase)
> 
> 
> --- a/htdocs/gcc-14/porting_to.html
> +++ b/htdocs/gcc-14/porting_to.html
> 
> I don't think we need this in porting_to.html as well; the release notes are
> sufficient.
> 
> 
> This patch is okay with the changes above. I see this is already
> committed. Can you please make them as follow-up? Or should I?
Could you help refine the wording? Many thanks for that.
> 
> Thanks,
> Gerald

RE: [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-08-13 Thread Liu, Hongtao




> -Original Message-
> From: Victor Do Nascimento 
> Sent: Tuesday, August 13, 2024 8:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: tamar.christ...@arm.com; claz...@gmail.com; Liu, Hongtao
> ; s...@gcc.gnu.org; bernds_...@t-online.de;
> al...@redhat.com; Victor Do Nascimento 
> Subject: [PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and
> sse targets
> 
> Following the migration of the dot_prod optab from a direct to a conversion-
> type optab, ensure all back-end patterns incorporate the second machine
> mode into pattern names.

Ok.

RE: [PATCH] i386: Fix some vex insns that prohibit egpr

2024-08-13 Thread Liu, Hongtao




> -Original Message-
> From: Kong, Lingling 
> Sent: Wednesday, August 14, 2024 9:38 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Jiang, Haochen
> 
> Subject: [PATCH] i386: Fix some vex insns that prohibit egpr
> 
> Although these VEX insns have EVEX counterparts, they should not support
> APX EGPR when they use an explicit VEX prefix,
> e.g. for TARGET_AVXVNNI, TARGET_IFMA and TARGET_AVXNECONVERT.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>   * config/i386/sse.md (vpmadd52):
>   Prohibit egpr for vex version.
>   (vcvtneps2bf16_v8sf): Ditto.
>   (vcvtneps2bf16_v8sf): Ditto.
>   (vpdpwssds_): Ditto.
>   (vpdpwssd_): Ditto.
>   (vpdpbusds_): Ditto.
>   (vpdpbusd_): Ditto.
> ---
>  gcc/config/i386/sse.md | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> d1010bc5682..7b9f619e112 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -29886,7 +29886,7 @@
>   (unspec:VI8_AVX2
> [(match_operand:VI8_AVX2 1 "register_operand" "0,0")
>  (match_operand:VI8_AVX2 2 "register_operand" "x,v")
> -(match_operand:VI8_AVX2 3 "nonimmediate_operand" "xm,vm")]
> +(match_operand:VI8_AVX2 3 "nonimmediate_operand" "xjm,vm")]
> VPMADD52))]
>"TARGET_AVXIFMA || (TARGET_AVX512IFMA && TARGET_AVX512VL)"
>"@
> @@ -30253,7 +30253,7 @@
>   (unspec:VI4_AVX2
> [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
>  (match_operand:VI4_AVX2 2 "register_operand" "x,v")
> -(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
> +(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
> UNSPEC_VPDPBUSD))]
>"TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
>"@
> @@ -30321,7 +30321,7 @@
>   (unspec:VI4_AVX2
> [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
>  (match_operand:VI4_AVX2 2 "register_operand" "x,v")
> -(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
> +(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
You also need to set the addr attribute to gpr16 for that alternative.

> UNSPEC_VPDPBUSDS))]
>"TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
>"@
> @@ -30389,7 +30389,7 @@
>   (unspec:VI4_AVX2
> [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
>  (match_operand:VI4_AVX2 2 "register_operand" "x,v")
> -(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
> +(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
> UNSPEC_VPDPWSSD))]
>"TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
>"@
> @@ -30457,7 +30457,7 @@
>   (unspec:VI4_AVX2
> [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
>  (match_operand:VI4_AVX2 2 "register_operand" "x,v")
> -(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
> +(match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
> UNSPEC_VPDPWSSDS))]
>"TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
>"@
> @@ -30681,7 +30681,7 @@
>[(set (match_operand:V8BF 0 "register_operand" "=x,v")
>   (vec_concat:V8BF
> (float_truncate:V4BF
> - (match_operand:V4SF 1 "nonimmediate_operand" "xm,vm"))
> + (match_operand:V4SF 1 "nonimmediate_operand" "xjm,vm"))
> (match_operand:V4BF 2 "const0_operand")))]
>"TARGET_AVXNECONVERT || (TARGET_AVX512BF16 &&
> TARGET_AVX512VL)"
>"@
> @@ -30745,7 +30745,7 @@
>  (define_insn "vcvtneps2bf16_v8sf"
>[(set (match_operand:V8BF 0 "register_operand" "=x,v")
>   (float_truncate:V8BF
> -   (match_operand:V8SF 1 "nonimmediate_operand" "xm,vm")))]
> +   (match_operand:V8SF 1 "nonimmediate_operand" "xjm,vm")))]
>"TARGET_AVXNECONVERT || (TARGET_AVX512BF16 &&
> TARGET_AVX512VL)&q

RE: [PATCH 2/2] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Liu, Hongtao



> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 14, 2024 2:13 PM
> To: Kong, Lingling ; Richard Sandiford
> 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Uros
> Bizjak 
> Subject: Re: [PATCH 2/2] [APX CFCMOV] Support APX CFCMOV
> 
> On Fri, Jun 14, 2024 at 3:39 AM Kong, Lingling 
> wrote:
> >
> > From: konglin1 
> >
> >
> >
> > The APX CFCMOV feature implements conditional faulting, which means that
> > all memory faults are suppressed when the condition code evaluates to
> > false and the instruction loads or stores a memory operand. A conditional
> > move can therefore now load or store a memory operand that may trap or
> > fault.
> >
> > To enable CFCMOV, we add a target hook,
> > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP, to the if-conversion pass so that
> > it can convert such code to cmov.
> >
> >
> >
> > Bootstrapped & regtested on x86-64-pc-linux-gnu with binutils 2.42 branch.
> >
> > OK for trunk?
> 
> How does if-conversion end up modifying the IL?
> 
> I have the gut feeling that your hook changes semantics of RTL and you should
> instead have an optab for a "masked" load/store?
> 
> Richard - do you already have plans how to represent the first-fault loads?
> (are there first-fault stores?)
Yes.
> 
> Richard.
> 
> >
> >
> > gcc/ChangeLog:
> >
> >
> >
> >* config/i386/i386-expand.cc (ix86_can_cfcmov_p): New
> > function that
> >
> >test if the cfcmov can be generated.
> >
> >(ix86_expand_int_movcc): Expand to cfcmov pattern if
> > ix86_can_cfcmov_p
> >
> > >return true.
> >
> >* config/i386/i386-opts.h (enum apx_features): Add 
> > apx_cfcmov.
> >
> >* config/i386/i386.cc
> > (ix86_have_conditional_move_mem_notrap): New
> >
> >function to hook
> > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
> >
> >(TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP): Target hook
> define.
> >
> >(ix86_rtx_costs): Add UNSPEC_APX_CFCMOV cost;
> >
> >* config/i386/i386.h (TARGET_APX_CFCMOV): Define.
> >
> >* config/i386/i386.md (*cfcmov_1): New
> > define_insn to support
> >
> >cfcmov.
> >
> >(*cfcmov_2): Ditto.
> >
> >(UNSPEC_APX_CFCMOV): New unspec for cfcmov.
> >
> >* config/i386/i386.opt: Add enum value for cfcmov.
> >
> >* ifcvt.cc (noce_try_cmove_load_mem_notrap): Use target
> > hook to allow
> >
> >convert to cfcmov for conditional load.
> >
> >(noce_try_cmove_store_mem_notrap): Convert to conditional 
> > store.
> >
> >(noce_process_if_block): Ditto.
> >
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> >
> >
> >* gcc.target/i386/apx-cfcmov-1.c: New test.
> >
> >* gcc.target/i386/apx-cfcmov-2.c: Ditto.
> >
> > ---
> >
> > gcc/config/i386/i386-expand.cc   |  63 +
> >
> > gcc/config/i386/i386-opts.h  |   4 +-
> >
> > gcc/config/i386/i386.cc  |  33 ++-
> >
> > gcc/config/i386/i386.h   |   1 +
> >
> > gcc/config/i386/i386.md  |  53 +++-
> >
> > gcc/config/i386/i386.opt |   3 +
> >
> > gcc/config/i386/predicates.md|   7 +
> >
> > gcc/ifcvt.cc | 247 ++-
> >
> > gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c |  73 ++
> >
> > gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c |  40 +++
> >
> > 10 files changed, 511 insertions(+), 13 deletions(-)
> >
> > create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
> >
> > create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c
> >
> >
> >
> > diff --git a/gcc/config/i386/i386-expand.cc
> > b/gcc/config/i386/i386-expand.cc
> >
> > index 312329e550b..c02a4bcbec3 100644
> >
> > --- a/gcc/config/i386/i386-expand.cc
> >
> > +++ b/gcc/config/i386/i386-expand.cc
> >
> > @@ -3336,6 +3336,30 @@ ix86_expand_int_addcc (rtx operands[])
> >
> >return true;
> >
> > }
> >
> >
>

RE: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Liu, Hongtao



> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 14, 2024 2:52 PM
> To: Kong, Lingling 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Uros
> Bizjak 
> Subject: Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV
> 
> On Fri, Jun 14, 2024 at 5:12 AM Kong, Lingling 
> wrote:
> >
> > The APX CFCMOV[1] feature implements conditional faulting, which means
> > that all memory faults are suppressed when the condition code
> > evaluates to false and the instruction loads or stores a memory operand.
> > A conditional move can therefore now load or store a memory operand that
> > may trap or fault.
> >
> > In the middle-end, we currently don't support a conditional move if we
> > know that a load from A or B could trap or fault.
> 
> What's the cost of suppressing a fault?  ISTR that for example fault 
> suppression
> for vector masked load/store is quite expensive, so when this is for example
Yes, for AVX512 masked load/store the cost is high when the memory is invalid.
> done in a loop where there's always a fault that's suppressed you can see
> 1000-fold slowdown.  I would suspect this is similar for cfcmov?  So how is 
> this
> reflected in the decision to if-convert?
But for APXF, we were told that accessing invalid memory is as cheap as accessing valid memory.
(Why else would these instructions have been designed?)
> 
> > To enable CFCMOV, we add a target hook,
> > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP, to the if-conversion pass so that
> > it can convert such code to cmov.
> >
> > All the changes passed bootstrap & regtest on x86-64-pc-linux-gnu.
> > We also tested SPEC with SDE and it passed the runtime test.
> >
> > Ok for trunk?
> >
> > [1].https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
> >
> > Lingling Kong (3):
> >   [APX CFCMOV] Add a new target hook:
> TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
> >   [APX CFCMOV] Support APX CFCMOV in if_convert pass
> >   [APX CFCMOV] Support APX CFCMOV in backend
> >
> >  gcc/config/i386/i386-expand.cc   |  63 +
> >  gcc/config/i386/i386-opts.h  |   4 +-
> >  gcc/config/i386/i386.cc  |  33 ++-
> >  gcc/config/i386/i386.h   |   1 +
> >  gcc/config/i386/i386.md  |  53 +++-
> >  gcc/config/i386/i386.opt |   3 +
> >  gcc/config/i386/predicates.md|   7 +
> >  gcc/doc/tm.texi  |   6 +
> >  gcc/doc/tm.texi.in   |   2 +
> >  gcc/ifcvt.cc | 247 ++-
> >  gcc/target.def   |  11 +
> >  gcc/targhooks.cc |   8 +
> >  gcc/targhooks.h  |   1 +
> >  gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c |  73 ++
> > gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c |  40 +++
> >  15 files changed, 539 insertions(+), 13 deletions(-)  create mode
> > 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c
> >
> > --
> > 2.31.1
> >
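
For readers following the thread, here is a minimal C sketch of the kind of source the cover letter is about. It is an illustration only, not taken from the patch or its testcases, and the function names `cond_load` and `cond_store` are hypothetical. In both functions the memory access happens only when the condition holds, so turning the branch into an ordinary cmov would access `*p` unconditionally and could fault; this is the case the quoted text says the middle-end does not if-convert today.

```c
/* Conditional load: *p is dereferenced only when cond is nonzero.
   With a plain cmov the load would have to be executed speculatively,
   which is unsafe if p may be invalid when cond is zero. */
int cond_load(int cond, const int *p, int fallback)
{
    int r = fallback;
    if (cond)
        r = *p;
    return r;
}

/* Conditional store: *p is written only when cond is nonzero. */
void cond_store(int cond, int *p, int v)
{
    if (cond)
        *p = v;
}
```

Whether a given compiler version actually emits cfcmov for these functions depends on the patch series under review and on the options used, so no expected assembly is shown here.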

RE: [PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Liu, Hongtao




> -Original Message-
> From: Kong, Lingling 
> Sent: Thursday, August 1, 2024 9:30 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Wang, Hongyu
> 
> Subject: [PATCH] i386: Fix memory constraint for APX NF
> 
> The je constraint should be used for APX NDD ADD with register source
> operand. The jM is for APX NDD patterns with immediate operand.
But these 2 alternatives are for non-NDD.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
> * config/i386/i386.md (nf_mem_constraint): Fixed the constraint
> for the define_subst_attr.
> (nf_mem_constraint): Added new define_subst_attr.
> (*add_1): Fixed the constraint.
> ---
>  gcc/config/i386/i386.md | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index
> fb10fdc9f96..aa7220ee17c 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6500,7 +6500,8 @@
>  (define_subst_attr "nf_name" "nf_subst" "_nf" "")
>  (define_subst_attr "nf_prefix" "nf_subst" "%{nf%} " "")
>  (define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
> -(define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
> +(define_subst_attr "nf_add_mem_constraint" "nf_subst" "je" "m")
> +(define_subst_attr "nf_mem_constraint" "nf_subst" "jM" "m")
>  (define_subst_attr "nf_applied" "nf_subst" "true" "false")
>  (define_subst_attr "nf_nonf_attr" "nf_subst"  "noapx_nf" "*")
>  (define_subst_attr "nf_nonf_x64_attr" "nf_subst" "noapx_nf" "x64")
> @@ -6514,7 +6515,7 @@
> (clobber (reg:CC FLAGS_REG))])
> 
>  (define_insn "*add_1"
> -  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> "=rm,r,r,r,r,r,r,r")
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> + "=r,r,r,r,r,r,r,r")
> (plus:SWI48
>   (match_operand:SWI48 1 "nonimmediate_operand"
> "%0,0,0,r,r,rje,jM,r")
>   (match_operand:SWI48 2 "x86_64_general_operand"
> "r,e,BM,0,le,r,e,BM")))]
> --
> 2.31.1

RE: [PATCH] i386: Fix comment/naming for APX NDD constraints

2024-08-02 Thread Liu, Hongtao




> -Original Message-
> From: Kong, Lingling 
> Sent: Friday, August 2, 2024 2:43 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; H. J. Lu 
> Subject: [PATCH] i386: Fix comment/naming for APX NDD constraints
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
> * config/i386/constraints.md: Fixed the comment/naming
> for je/jM/jO.
> * config/i386/predicates.md (apx_ndd_memory_operand):
> Renamed and fixed the comment.
> (apx_evex_memory_operand): New name.
> (apx_ndd_add_memory_operand): Ditto.
> (apx_evex_add_memory_operand): Ditto.
> ---
>  gcc/config/i386/constraints.md | 12 ++--
> gcc/config/i386/predicates.md  | 21 +++--
>  2 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 18389c47800..e03d0e1b45b 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -463,14 +463,14 @@
>   "TARGET_APX_EGPR && !TARGET_AVX ? GENERAL_GPR16 : GENERAL_REGS")
> 
>  (define_memory_constraint "je"
> -  "@internal Memory operand for APX NDD ADD."
> -  (match_operand 0 "apx_ndd_add_memory_operand"))
> +  "@internal Memory operand for APX EVEX ADD for NDD or NF."
> +  (match_operand 0 "apx_evex_add_memory_operand"))
> 
>  (define_memory_constraint "jM"
> -  "@internal Memory operand, with APX NDD check."
> -  (match_operand 0 "apx_ndd_memory_operand"))
> +  "@internal Memory operand, with APX EVEX check for NDD or NF."
> +  (match_operand 0 "apx_evex_memory_operand"))
> 
>  (define_memory_constraint "jO"
> -  "@internal Offsettable memory operand, with APX NDD check."
> -  (and (match_operand 0 "apx_ndd_memory_operand")
> +  "@internal Offsettable memory operand, with APX EVEX check for NDD or
> NF."
> +  (and (match_operand 0 "apx_evex_memory_operand")
>(match_test "offsettable_nonstrict_memref_p (op)")))
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 680594871de..8cab10550e8 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -2264,11 +2264,11 @@
>return true;
>  })
> 
> -;; Return true if OP is a memory operand that can be also used in APX
> -;; NDD patterns with immediate operand.  With non-default address space,
> -;; segment register or address size prefix, APX NDD instruction length
> +;; Return true if OP is a memory operand that can be also used in APX
> +EVEX for ;; NDD or NF patterns with immediate operand.  With
APX EVEX-encoded patterns (i.e. APX NDD/NF) with ...
> +non-default address space, ;; segment register or address size prefix,
> +APX EVEX instruction length
>  ;; can exceed the 15 byte size limit.
> -(define_predicate "apx_ndd_memory_operand"
> +(define_predicate "apx_evex_memory_operand"
>(match_operand 0 "memory_operand")
>  {
>/* OK if immediate operand size < 4 bytes.  */
> @@ -2312,19 +2312,20 @@
>return true;
>  })
> 
> -;; Return true if OP is a memory operand which can be used in APX NDD
> -;; ADD with register source operand.  UNSPEC_GOTNTPOFF memory operand
> -;; is allowed with APX NDD ADD only if R_X86_64_CODE_6_GOTTPOFF works.
> -(define_predicate "apx_ndd_add_memory_operand"
> +;; Return true if OP is a memory operand which can be used in APX EVEX
EVEX-encoded ADD patterns (i.e. APX NDD/NF) with ...
> +ADD for ;; NDD or NF with register source operand.  UNSPEC_GOTNTPOFF
> +memory operand is ;; allowed with APX EVEX ADD only if
> R_X86_64_CODE_6_GOTTPOFF works.
> +(define_predicate "apx_evex_add_memory_operand"
>(match_operand 0 "memory_operand")
>  {
> -  /* OK if "add %reg1, name@gottpoff(%rip), %reg2" is supported.  */
> +  /* OK if "add %reg1, name@gottpoff(%rip), %reg2" or
> +   "{nf} add name@gottpoff(%rip), %reg1" are supported.  */
>if (HAVE_AS_R_X86_64_CODE_6_GOTTPOFF)
>  return true;
> 
>op = XEXP (op, 0);
> 
> -  /* Disallow APX NDD ADD with UNSPEC_GOTNTPOFF.  */
> +  /* Disallow APX EVEX ADD with UNSPEC_GOTNTPOFF.  */
APX EVEX-encoded ADD; otherwise LGTM.
>if (GET_CODE (op) == CONST
>&& GET_CODE (XEXP (op, 0)) == UNSPEC
>&& XINT (XEXP (op, 0), 1) == UNSPEC_GOTNTPOFF)
> --
> 2.31.1

Enable BF16 support (Please ignore my former email)

2019-04-12 Thread Liu, Hongtao

Hi:
This patch enables support for bfloat16, which will be available in the future
Cooper Lake. Please refer to
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
for more details about BF16.

AVX512BF16 has 3 instructions, VCVTNE2PS2BF16, VCVTNEPS2BF16 and VDPBF16PS,
which are Vector Neural Network Instructions supporting:

-   VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
-   VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
-   VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.

Since only BF16 intrinsics are supported, we treat it as HI for simplicity.
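
As a rough scalar illustration of what the "NE" (round-to-nearest-even) conversions compute per element, here is a C sketch. It is not the patch's implementation: `float_to_bf16` and `bf16_to_float` are hypothetical helper names, the NaN handling is simplified to forcing a quiet NaN, and hardware details such as denormal treatment are not modeled. The bfloat16 value is carried in a 16-bit integer, in the same spirit as treating it as HI above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow one float to bfloat16 bits using
   round-to-nearest-even on the 16 discarded mantissa bits. */
static uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    /* NaN: keep the top half and force the quiet bit (simplified). */
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Add 0x7fff plus the lowest kept bit, so exact ties round to even. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return (uint16_t)(bits >> 16);
}

/* Hypothetical helper: widen bfloat16 bits back to float.  This is exact
   because bfloat16 is the upper 16 bits of an IEEE single. */
static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A dot product in the style of VDPBF16PS could then be modeled by widening pairs with `bf16_to_float`, multiplying, and accumulating in float; that part is left out to keep the sketch short.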

Bootstrap and regression test for x86/i386 backend are ok.

Changelog:

2019-04-07  Wei Xiao
gcc/:
  * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512BF16_SET,
  OPTION_MASK_ISA_AVX512BF16_UNSET,
  OPTION_MASK_ISA2_AVX512BW_UNSET): New.
  (OPTION_MASK_ISA2_AVX512F_UNSET): Add OPTION_MASK_ISA_AVX512BF16_UNSET.
  (ix86_handle_option): Handle -mavx512bf16.
  * config.gcc: Add avx512bf16vlintrin.h and avx512bf16intrin.h
  to extra_headers.
  * config/i386/avx512bf16vlintrin.h: New.
  * config/i386/avx512bf16intrin.h: New.
  * config/i386/cpuid.h (bit_AVX512BF16): New.
  * config/i386/driver-i386.c (host_detect_local_cpu): Detect BF16.
  * config/i386/i386-builtin-types.def: Add new types.
  * config/i386/i386-builtin.def: Add new builtins.
  * config/i386/i386-c.c (ix86_target_macros_internal): Define
  __AVX512BF16__.
  * config/i386/i386.c (ix86_target_string): Add -mavx512bf16.
  (ix86_option_override_internal): Handle BF16.
  (ix86_valid_target_attribute_inner_p): Ditto.
  (fold_builtin_cpu): Ditto.
  (ix86_expand_args_builtin): Ditto.
  * config/i386/i386.h (TARGET_AVX512BF16, TARGET_AVX512BF16_P): New.
  (PTA_AVX512BF16): Ditto.
  * config/i386/i386.opt: Add -mavx512bf16.
  * config/i386/immintrin.h: Include avx512bf16intrin.h
  and avx512bf16vlintrin.h.
  * config/i386/sse.md (avx512f_cvtne2ps2bf16_,
  avx512f_cvtneps2bf16_,
  avx512f_dpbf16ps_): New define_insn patterns.
  * config/i386/subst.md (mask_half): Add new subst.
  * doc/invoke.texi: Document -mavx512bf16.

gcc/testsuite/:
  * gcc.target/i386/avx512bf16-vcvtne2ps2bf16-1.c: New test.
  * gcc.target/i386/avx512bf16-vcvtneps2bf16-1.c: New test.
  * gcc.target/i386/avx512bf16-vdpbf16ps-1.c: New test.
  * gcc.target/i386/avx512bf16vl-vcvtne2ps2bf16-1.c: New test.
  * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: New test.
  * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test.
  * gcc.target/i386/sse-12.c: Add -mavx512bf16.
  * gcc.target/i386/sse-13.c: Ditto.
  * gcc.target/i386/sse-14.c: Ditto.
  * gcc.target/i386/sse-22.c: Ditto.
  * gcc.target/i386/sse-23.c: Ditto.
  * g++.dg/other/i386-2.C: Ditto.
  * gcc.target/i386/avx-1.c: Ditto.
  * gcc.target/i386/avx-2.c: Ditto.
  * g++.dg/other/i386-3.C: Add avx512bf16.

Regards
Hongtao Liu




0001-Enable-BF16-support.patch
Description: 0001-Enable-BF16-support.patch

RE: [PATCH] i386: Fix scalar VCOMSBF16 which only compares low word

2024-10-14 Thread Liu, Hongtao




> -Original Message-
> From: Kong, Lingling 
> Sent: Thursday, October 10, 2024 9:57 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Xu, Liwei 
> Subject: [PATCH] i386: Fix scalar VCOMSBF16 which only compares low word
> 
> Hi,
> 
> Fixed scalar VCOMSBF16 misused in AVX10.2.
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}.
> 
> Ok for trunk?
Ok.
> 
> gcc/ChangeLog:
> 
>   * config/i386/sse.md (avx10_2_comsbf16_v8bf): Fixed scalar
>   operands.
> ---
>  gcc/config/i386/sse.md | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> d6e2135423d..a529849898e 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -32332,8 +32332,12 @@
>  (define_insn "avx10_2_comsbf16_v8bf"
>[(set (reg:CCFP FLAGS_REG)
>   (unspec:CCFP
> -   [(match_operand:V8BF 0 "register_operand" "v")
> -(match_operand:V8BF 1 "nonimmediate_operand" "vm")]
> +   [(vec_select:BF
> +  (match_operand:V8BF 0 "register_operand" "v")
> +  (parallel [(const_int 0)]))
> +(vec_select:BF
> +  (match_operand:V8BF 1 "nonimmediate_operand" "vm")
> +  (parallel [(const_int 0)]))]
>UNSPEC_VCOMSBF16))]
>"TARGET_AVX10_2_256"
>"vcomsbf16\t{%1, %0|%0, %1}"
> --
> 2.31.1
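
The `vec_select` wrappers with `(parallel [(const_int 0)])` in the fixed pattern state that only element 0 of each V8BF operand takes part in the comparison. A small C model of that semantics is sketched below; it is an illustration under assumptions, not the instruction's definition. The names `bf16_to_f32` and `compare_low_bf16` and the integer return encoding are hypothetical (the real instruction sets flags instead of returning a value).

```c
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 bit pattern to float (exact: bfloat16 is the
   upper 16 bits of an IEEE single). */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical model of a scalar compare on two 8-element vectors:
   only element 0 of each input is read, elements 1..7 are ignored.
   Returns -1, 0 or 1 for less/equal/greater, and 2 when unordered. */
static int compare_low_bf16(const uint16_t a[8], const uint16_t b[8])
{
    float x = bf16_to_f32(a[0]);
    float y = bf16_to_f32(b[0]);

    if (x != x || y != y)   /* either low element is a NaN */
        return 2;
    return (x > y) - (x < y);
}
```

Describing the operands as whole V8BF values, as the pattern did before the fix, would instead claim that all eight elements matter, which does not match a scalar compare.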

RE: [PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-04 Thread Liu, Hongtao




> -Original Message-
> From: MayShao-oc 
> Sent: Tuesday, November 5, 2024 11:20 AM
> To: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; Liu, Hongtao
> ; ubiz...@gmail.com
> Cc: ti...@zhaoxin.com; silviaz...@zhaoxin.com; loui...@zhaoxin.com;
> cobec...@zhaoxin.com
> Subject: [PATCH] [x86_64] Add flag to control tight loops alignment opt
> 
> Hi all:
> This patch adds the -malign-tight-loops flag to control pass_align_tight_loops.
> The motivation is that pass_align_tight_loops may cause performance
> regressions in nested loops.
> 
> The example code is as follows:
> 
> #define ITER 2
> #define ITER_O 10
> 
> int i, j,k;
> int array[ITER];
> 
> void loop()
> {
>   int i;
>   for(k = 0; k < ITER_O; k++)
>   for(j = 0; j < ITER; j++)
>   for(i = 0; i < ITER; i++)
>   {
> array[i] += j;
> array[i] += i;
> array[i] += 2*j;
> array[i] += 2*i;
>   }
> }
> 
> When I compile it with gcc -O1 loop.c, the output assembly is as follows.
> It is not optimal because too many nops are inserted in the outer loop.
> 
> 00400540 :
>   400540: 48 83 ec 08             sub    $0x8,%rsp
>   400544: bf 0a 00 00 00          mov    $0xa,%edi
>   400549: b9 00 00 00 00          mov    $0x0,%ecx
>   40054e: 8d 34 09                lea    (%rcx,%rcx,1),%esi
>   400551: b8 00 00 00 00          mov    $0x0,%eax
>   400556: 66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
>   40055d: 00 00 00 00
>   400561: 66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
>   400568: 00 00 00 00
>   40056c: 66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
>   400573: 00 00 00 00
>   400577: 66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
>   40057e: 00 00
>   400580: 89 ca                   mov    %ecx,%edx
>   400582: 03 14 85 60 10 60 00    add    0x601060(,%rax,4),%edx
>   400589: 01 c2                   add    %eax,%edx
>   40058b: 01 f2                   add    %esi,%edx
>   40058d: 8d 14 42                lea    (%rdx,%rax,2),%edx
>   400590: 89 14 85 60 10 60 00    mov    %edx,0x601060(,%rax,4)
>   400597: 48 83 c0 01             add    $0x1,%rax
>   40059b: 48 3d 20 4e 00 00       cmp    $0x4e20,%rax
>   4005a1: 75 dd                   jne    400580
> 
>I benchmarked this program on an Intel Xeon and found that the optimization may
> cause a 40% performance regression (6.6B cycles vs. 9.3B cycles).
>So I propose adding a -malign-tight-loops flag to control the tight-loop
> alignment optimization; we could disable this optimization by default.
>Bootstrapped X86_64.
>Ok for trunk?
> 
> BR
> Mayshao
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-features.cc (ix86_align_tight_loops): New flag.
>   * config/i386/i386.opt (malign-tight-loops): New option.
>   * doc/invoke.texi (-malign-tight-loops): Document.
> ---
>  gcc/config/i386/i386-features.cc | 4 +++-
>  gcc/config/i386/i386.opt | 4 
>  gcc/doc/invoke.texi  | 7 ++-
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-
> features.cc
> index e2e85212a4f..f9546e00b07 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -3620,7 +3620,9 @@ public:
>/* opt_pass methods: */
>bool gate (function *) final override
>  {
> -  return optimize && optimize_function_for_speed_p (cfun);
> +  return ix86_align_tight_loops
> +&& optimize
> +&& optimize_function_for_speed_p (cfun);
>  }
> 
>unsigned int execute (function *) final override
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 64c295d344c..ec41de192bc 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1266,6 +1266,10 @@ mlam=
>  Target RejectNegative Joined Enum(lam_type) Var(ix86_lam_type) Init(lam_none)
>  -mlam=[none|u48|u57] Instrument meta data position in user data pointers.
> 
> +malign-tight-loops
> +Target Var(ix86_align_tight_loops) Init(0) Optimization
> +Enable align tight loops.

I'd like it to be on by default, so Init (1)?

> +
>  Enum
>  Name(lam_type) Type(enum lam_type) UnknownError(unknown lam
> type %qs)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> 07920e07b4d..9ec1e1f0095 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1510,7 +1510,7 @@ See RS/6000 and PowerPC Options.
>  -mindirect-branch=@var{choice}  -mfunction

RE: [PATCH] i386: Update the comment for mapxf option

2024-09-18 Thread Liu, Hongtao




> -Original Message-
> From: Kong, Lingling 
> Sent: Wednesday, September 18, 2024 4:31 PM
> To: gcc-patches 
> Cc: Liu, Hongtao ; Wang, Hongyu
> 
> Subject: [PATCH] i386: Update the comment for mapxf option
> 
> Hi,
> 
> Now that the APX NF, CCMP and ZU features are supported, the comment for the
> APX option also needs to be updated.
> 
> Ok for trunk?
Ok
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386.opt: Update the features included in apxf.
> ---
>  gcc/config/i386/i386.opt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index
> fe16e44a4ea..64c295d344c 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1313,7 +1313,7 @@ Enable vectorization for scatter instruction.
>  mapxf
>  Target Mask(ISA2_APX_F) Var(ix86_isa_flags2) Save
>  Support code generation for APX features, including EGPR, PUSH2POP2,
> -NDD and PPX.
> +NDD, PPX, NF, CCMP and ZU.
> 
>  mapx-features=
>  Target Undocumented Joined Enum(apx_features) EnumSet Var(ix86_apx_features) Init(apx_none) Save
> --
> 2.31.1

RE: [PATCH] testsuite: Add -march=x86-64-v3 to AVX10 testcases to slience warning for GCC built with AVX512 arch

2024-10-16 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Wednesday, October 16, 2024 3:45 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] testsuite: Add -march=x86-64-v3 to AVX10 testcases to
> slience warning for GCC built with AVX512 arch
> 
> Hi all,
> 
> Currently, when GCC is built with --with-arch=native on AVX512
> machines, running the AVX10.2 testcases produces vector size warnings.
> This is expected but annoying. Simply add -march=x86-64-v3 to override
> --with-arch=native and silence all the warnings.
> 
> Tested on x86-64-linux-gnu. Ok for trunk?
Ok.
> 
> Thx,
> Haochen
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/avx10_1-25.c: Add -march=x86-64-v3.
>   * gcc.target/i386/avx10_1-26.c: Ditto.
>   * gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-bf16-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-convert-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-media-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-minmax-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-satcvt-1.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcmppbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vmulnepbf16-2.c: Ditto.
>   * gcc.target/i386/avx10_2-

RE: [PATCH] i386: Add -mavx512vl for pr117304-1.c

2024-11-06 Thread Liu, Hongtao




> -Original Message-
> From: Hu, Lin1 
> Sent: Thursday, November 7, 2024 11:03 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Add -mavx512vl for pr117304-1.c
> 
> Hi, all
> 
> Testing pr117304-1.c on a machine with only avx2 generates some different
> hints, so add -mavx512vl to its option list.
I didn't quite understand: what kind of hint is it, and why is avx512vl needed?
> 
> Bootstrapped and regtested on x86-64-pc-linux-gnu.
> I think it is an obvious commit, but I will still wait a while
> in case someone has other suggestions.
> 
> BRs,
> Lin
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr117304-1.c: Add -mavx512vl.
> ---
>  gcc/testsuite/gcc.target/i386/pr117304-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr117304-1.c
> b/gcc/testsuite/gcc.target/i386/pr117304-1.c
> index fc1c5bfd3e3..da26f4bd1b7 100644
> --- a/gcc/testsuite/gcc.target/i386/pr117304-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr117304-1.c
> @@ -1,6 +1,6 @@
>  /* PR target/117304 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx512f -mno-evex512" } */
> +/* { dg-options "-O2 -mavx512f -mno-evex512 -mavx512vl" } */
> 
>  typedef __attribute__((__vector_size__(32))) int __v8si;
>  typedef __attribute__((__vector_size__(32))) unsigned int __v8su;
> --
> 2.31.1

RE: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Liu, Hongtao




> -Original Message-
> From: Mayshao-oc 
> Sent: Thursday, November 7, 2024 11:13 AM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; Liu, Hongtao
> ; ubiz...@gmail.com; richard.guent...@gmail.com;
> Tim Hu(WH-RD) ; Silvia Zhao(BJ-RD)
> ; Louis Qi(BJ-RD) ; Cobe
> Chen(BJ-RD) 
> Subject: Re: [PATCH] [x86_64] Add microarchtecture tunable for
> pass_align_tight_loops
> 
> > > On Thu, Nov 7, 2024 at 10:29 AM MayShao-oc  o...@zhaoxin.com> wrote:
> > >
> > > Hi all:
> > >For Zhaoxin, I find no improvement when pass_align_tight_loops is
> > > enabled, and performance drops in some cases.
> > >This patch adds a new tunable to bypass pass_align_tight_loops on
> > > Zhaoxin.
> > >
> > >Bootstrapped X86_64.
> > >Ok for trunk?
LGTM.
> > > BR
> > > Mayshao
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-features.cc (TARGET_ALIGN_TIGHT_LOOPS):
> > > default true in all processors except for zhaoxin.
> > > * config/i386/i386.h (TARGET_ALIGN_TIGHT_LOOPS): New Macro.
> > > * config/i386/x86-tune.def (X86_TUNE_ALIGN_TIGHT_LOOPS):
> > > New tune
> > > ---
> > >  gcc/config/i386/i386-features.cc | 4 +++-
> > >  gcc/config/i386/i386.h   | 3 +++
> > >  gcc/config/i386/x86-tune.def | 4 
> > >  3 files changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/i386/i386-features.cc
> > > b/gcc/config/i386/i386-features.cc
> > > index e2e85212a4f..d9fd92964fe 100644
> > > --- a/gcc/config/i386/i386-features.cc
> > > +++ b/gcc/config/i386/i386-features.cc
> > > @@ -3620,7 +3620,9 @@ public:
> > >/* opt_pass methods: */
> > >bool gate (function *) final override
> > >  {
> > > -  return optimize && optimize_function_for_speed_p (cfun);
> > > +  return TARGET_ALIGN_TIGHT_LOOPS
> > > +&& optimize
> > > +&& optimize_function_for_speed_p (cfun);
> > >  }
> > >
> > >unsigned int execute (function *) final override diff --git
> > > a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index
> > > 2dcd8803a08..7f9010246c2 100644
> > > --- a/gcc/config/i386/i386.h
> > > +++ b/gcc/config/i386/i386.h
> > > @@ -466,6 +466,9 @@ extern unsigned char
> > > ix86_tune_features[X86_TUNE_LAST];
> > >  #define TARGET_USE_RCR ix86_tune_features[X86_TUNE_USE_RCR]
> > >  #define TARGET_SSE_MOVCC_USE_BLENDV \
> > > ix86_tune_features[X86_TUNE_SSE_MOVCC_USE_BLENDV]
> > > +#define TARGET_ALIGN_TIGHT_LOOPS \
> > > +ix86_tune_features[X86_TUNE_ALIGN_TIGHT_LOOPS]
> > > +
> > >
> > >  /* Feature tests against the various architecture variations.  */
> > > enum ix86_arch_indices { diff --git a/gcc/config/i386/x86-tune.def
> > > b/gcc/config/i386/x86-tune.def index 6ebb2fd3414..bd4fa8b3eee
> 100644
> > > --- a/gcc/config/i386/x86-tune.def
> > > +++ b/gcc/config/i386/x86-tune.def
> > > @@ -542,6 +542,10 @@ DEF_TUNE
> > > (X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD,
> > >  DEF_TUNE (X86_TUNE_SSE_MOVCC_USE_BLENDV,
> > >   "sse_movcc_use_blendv", ~m_CORE_ATOM)
> > >
> > > +/* X86_TUNE_ALIGN_TIGHT_LOOPS: if false, tight loops are not
> > > +aligned. */ DEF_TUNE (X86_TUNE_ALIGN_TIGHT_LOOPS,
> "align_tight_loops",
> > > +~(m_ZHAOXIN))
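> > Please also add ~(m_ZHAOXIN | m_CASCADELAKE | m_SKYLAKE_AVX512))
> > And could you put it under the section of
> >
> > /*****************************************************************************/
> > -/* Branch predictor tuning                                                   */
> > +/* Branch predictor and The Front-end tuning                                 */
> > /*****************************************************************************/
> > > +
> > >  /*****************************************************************************/
> > >  /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
> > >  /*****************************************************************************/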
> > > --
> > > 2.27.0
> > >
> >
> >
> > --
> > BR,
> > Hongtao
> 
> Ok
> 
> BR
> Mayshao
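
For readers following the thread, here is a minimal sketch of how a per-processor tunable can gate a pass, in the spirit of the patch above. It is not GCC's code: the mask values, the TUNE_LIST table, set_tune and gate are all hypothetical stand-ins for the real m_* masks, DEF_TUNE entries, ix86_tune_features[] and the pass gate.

```c
#include <stdbool.h>

/* Hypothetical processor bit masks; the real m_* masks live in the
   i386 backend and are not reproduced here.  */
enum { M_GENERIC = 1u << 0, M_ZHAOXIN = 1u << 1, M_SKYLAKE_AVX512 = 1u << 2 };

/* One row per tunable: index name and the set of processors for which
   the tunable is enabled (an X-macro standing in for DEF_TUNE).  */
#define TUNE_LIST(X) \
  X (TUNE_USE_RCR, ~0u) \
  X (TUNE_ALIGN_TIGHT_LOOPS, ~(unsigned) M_ZHAOXIN)

enum tune_index {
#define X(name, mask) name,
  TUNE_LIST (X)
#undef X
  TUNE_LAST
};

static const unsigned tune_masks[TUNE_LAST] = {
#define X(name, mask) mask,
  TUNE_LIST (X)
#undef X
};

/* Per-compilation feature table, filled in once the target CPU is known.  */
static unsigned char tune_features[TUNE_LAST];

void set_tune (unsigned cpu_mask)
{
  for (int i = 0; i < TUNE_LAST; i++)
    tune_features[i] = (tune_masks[i] & cpu_mask) != 0;
}

#define ALIGN_TIGHT_LOOPS tune_features[TUNE_ALIGN_TIGHT_LOOPS]

/* The pass runs only if the tunable is on, we optimize, and we
   optimize this function for speed.  */
bool gate (bool optimize, bool for_speed)
{
  return ALIGN_TIGHT_LOOPS && optimize && for_speed;
}
```

With this shape, excluding another processor from the pass is a one-line change to the mask in the table, which is what the review comment about adding more m_* bits amounts to.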

RE: [PATCH] avx10_2-comibf-2.c: Require AVX10.2 support

2024-11-06 Thread Liu, Hongtao



> -Original Message-
> From: H.J. Lu 
> Sent: Wednesday, November 6, 2024 4:17 PM
> To: Liu, Hongtao ; GCC Patches  patc...@gcc.gnu.org>; Uros Bizjak 
> Subject: [PATCH] avx10_2-comibf-2.c: Require AVX10.2 support
> 
> Since avx10_2-comibf-2.c is a run test, require AVX10.2 support.
> 
> * gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target.
Ok, thanks.
> 
> --
> H.J.

RE: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Liu, Hongtao



> -Original Message-
> From: Xi Ruoyao 
> Sent: Thursday, November 7, 2024 1:12 PM
> To: Liu, Hongtao ; Mayshao-oc  o...@zhaoxin.com>; Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; ubiz...@gmail.com;
> richard.guent...@gmail.com; Tim Hu(WH-RD) ; Silvia
> Zhao(BJ-RD) ; Louis Qi(BJ-RD)
> ; Cobe Chen(BJ-RD) 
> Subject: Re: [PATCH] [x86_64] Add microarchtecture tunable for
> pass_align_tight_loops
> 
> On Thu, 2024-11-07 at 04:58 +, Liu, Hongtao wrote:
> > > > > Hi all:
> > > > >     For zhaoxin, I find no improvement when enable
> > > > > pass_align_tight_loops, and have performance drop in some cases.
> > > > >     This patch add a new tunable to bypass
> > > > > pass_align_tight_loops in
> > > zhaoxin.
> > > > >
> > > > >     Bootstrapped X86_64.
> > > > >     Ok for trunk?
> > LGTM.
> 
> I'd suggest to add the reference to PR 117438 into the subject and ChangeLog.
Yes, thanks.
> 
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University

RE: [PATCH] i386: Modify regexp of pr117304-1.c

2024-11-06 Thread Liu, Hongtao




> -Original Message-
> From: Hu, Lin1 
> Sent: Thursday, November 7, 2024 2:35 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Modify regexp of pr117304-1.c
> 
> OK, so just modify the regexp.
> 
> Since the test doesn't care if the hint is correct, modify the regexp of the
> hint part to avoid future changes to the hint that would cause the test to fail.
Ok.
> 
> BRs,
> Lin
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr117304-1.c: Modify regexp.
> ---
>  gcc/testsuite/gcc.target/i386/pr117304-1.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr117304-1.c
> b/gcc/testsuite/gcc.target/i386/pr117304-1.c
> index fc1c5bfd3e3..ec75f271447 100644
> --- a/gcc/testsuite/gcc.target/i386/pr117304-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr117304-1.c
> @@ -20,9 +20,9 @@ volatile __v16su ui;
>  void
>  foo()
>  {
> -  hi ^= __builtin_ia32_cvttpd2dq512_mask(df, hi, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttpd2dq512_mask'; did you mean '__builtin_ia32_cvttpd2dq128_mask'?" } */
> -  hui ^= __builtin_ia32_cvttpd2udq512_mask(df, hui, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttpd2udq512_mask'; did you mean '__builtin_ia32_cvttpd2udq128_mask'?" } */
> -  ui ^= __builtin_ia32_cvttps2dq512_mask(sf, ui, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttps2dq512_mask'; did you mean '__builtin_ia32_cvttps2dq128_mask'?" } */
> -  ui ^= __builtin_ia32_cvttps2udq512_mask(sf, ui, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttps2udq512_mask'; did you mean '__builtin_ia32_cvttps2udq128_mask'?" } */
> -  __builtin_ia32_cvtudq2ps512_mask(ui, sf, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvtudq2ps512_mask'; did you mean '__builtin_ia32_cvtudq2ps128_mask'?" } */
> +  hi ^= __builtin_ia32_cvttpd2dq512_mask(df, hi, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttpd2dq512_mask'; did you mean '__builtin_ia32_\[^\n\r]*'?" } */
> +  hui ^= __builtin_ia32_cvttpd2udq512_mask(df, hui, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttpd2udq512_mask'; did you mean '__builtin_ia32_\[^\n\r]*'?" } */
> +  ui ^= __builtin_ia32_cvttps2dq512_mask(sf, ui, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttps2dq512_mask'; did you mean '__builtin_ia32_\[^\n\r]*'?" } */
> +  ui ^= __builtin_ia32_cvttps2udq512_mask(sf, ui, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvttps2udq512_mask'; did you mean '__builtin_ia32_\[^\n\r]*'?" } */
> +  __builtin_ia32_cvtudq2ps512_mask(ui, sf, 0, 4); /* { dg-error "implicit declaration of function '__builtin_ia32_cvtudq2ps512_mask'; did you mean '__builtin_ia32_\[^\n\r]*'?" } */
>  }
> --
> 2.31.1
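
To illustrate why the relaxed pattern is more robust, here is a small sketch that checks a diagnostic string against a pattern of the same shape. It is only an approximation: DejaGnu matches with Tcl regular expressions, while this sketch uses POSIX extended regexps via regcomp/regexec, and the helper name hint_matches is made up for the example. The "?" is escaped here so that it matches a literal question mark.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if MSG contains a "did you mean" hint that names any
   __builtin_ia32_* function, whichever one the compiler suggests.  */
bool hint_matches (const char *msg)
{
  regex_t re;
  /* POSIX ERE spelling of the relaxed hint; "\\?" is a literal '?'.  */
  const char *pat = "did you mean '__builtin_ia32_[^\n\r]*'\\?";

  if (regcomp (&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, msg, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Any suggestion that starts with __builtin_ia32_ is accepted, so a future change to the spelling hint no longer breaks the test, while an unrelated suggestion is still rejected.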

RE: [PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector pattern

2024-11-24 Thread Liu, Hongtao




> -Original Message-
> From: Li, Pan2 
> Sent: Monday, November 25, 2024 10:01 AM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao ; Li, Pan2
> 
> Subject: [PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector
> pattern
> 
> From: Pan Li 
> 
> Some forms, like the one below, failed to be recognized as the SAT_ADD pattern
> on i386.  This was related to match pattern extraction and got fixed by the
> refactor of the SAT_ADD pattern.  Thus, add testcases to ensure we do not hit
> a similar issue in the future.
> 
>   #define DEF_SAT_ADD(T)   \
>   T sat_add_##T (T x, T y) \
>   {\
> T res; \
> res = x + y;   \
> res |= -(T)(res < x);  \
> return res;\
>   }
> 
>   #define VEC_DEF_SAT_ADD(T)   \
>   void vec_sat_add(T * restrict a, T * restrict b) \
>   {\
> for (int i = 0; i < 8; i++)\
>   b[i] = sat_add_##T (a[i], b[i]); \
>   }
> 
>   DEF_SAT_ADD (uint32_t)
>   VEC_DEF_SAT_ADD (uint32_t)
> 
> The test suites below pass with this patch.
> * The full x86 regression test.
> 
>   PR target/112600
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr112600-5a-u16.c: New test.
>   * gcc.target/i386/pr112600-5a-u32.c: New test.
>   * gcc.target/i386/pr112600-5a-u64.c: New test.
>   * gcc.target/i386/pr112600-5a-u8.c: New test.
>   * gcc.target/i386/pr112600-5a.h: New test.
> 
> Signed-off-by: Pan Li 
> ---
>  .../gcc.target/i386/pr112600-5a-u16.c | 10 +
>  .../gcc.target/i386/pr112600-5a-u32.c | 10 +
>  .../gcc.target/i386/pr112600-5a-u64.c | 10 +
>  .../gcc.target/i386/pr112600-5a-u8.c  | 10 +
>  gcc/testsuite/gcc.target/i386/pr112600-5a.h   | 22 +++
>  5 files changed, 62 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> new file mode 100644
> index 000..a278703fbdc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand-details" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint16_t)
> +VEC_DEF_SAT_ADD (uint16_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */

You're scanning ".SAT_ADD ", so maybe it would be better to use the "optimized" pass instead of "expand"?
Others LGTM.
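
A sketch of what the u16 test might look like with that suggestion applied. This is an assumption, not the committed test: it switches to -fdump-tree-optimized and scan-tree-dump-times, inlines the two macros from the patch description instead of including pr112600-5a.h so that it stands alone, and carries over the expected count of 4 from the original without verifying it against the "optimized" dump.

```c
/* PR target/112600 */
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

#include <stdint.h>

/* Branchless unsigned saturating add: on wrap-around (res < x) the
   result is OR-ed with all ones.  */
#define DEF_SAT_ADD(T) \
  T sat_add_##T (T x, T y) \
  { \
    T res; \
    res = x + y; \
    res |= -(T)(res < x); \
    return res; \
  }

#define VEC_DEF_SAT_ADD(T) \
  void vec_sat_add (T * restrict a, T * restrict b) \
  { \
    for (int i = 0; i < 8; i++) \
      b[i] = sat_add_##T (a[i], b[i]); \
  }

DEF_SAT_ADD (uint16_t)
VEC_DEF_SAT_ADD (uint16_t)

/* Count carried over from the original test; not re-verified here.  */
/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" } } */
```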

> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> new file mode 100644
> index 000..52e31b7e1c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand-details" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint32_t)
> +VEC_DEF_SAT_ADD (uint32_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> new file mode 100644
> index 000..4ee717471b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand-details" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint64_t)
> +VEC_DEF_SAT_ADD (uint64_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
> new file mode 100644
> index 000..9f488ebf658
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand-details" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_A

RE: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-19 Thread Liu, Hongtao



> -Original Message-
> From: Mayshao-oc 
> Sent: Wednesday, November 20, 2024 2:43 PM
> To: Hongtao Liu 
> Cc: Liu, Hongtao ; Xi Ruoyao ;
> gcc-patches@gcc.gnu.org; hubi...@ucw.cz; ubiz...@gmail.com;
> richard.guent...@gmail.com; Tim Hu(WH-RD) ; Silvia
> Zhao(BJ-RD) ; Louis Qi(BJ-RD)
> ; Cobe Chen(BJ-RD) 
> Subject: Re: [PATCH] [x86_64] Add microarchtecture tunable for
> pass_align_tight_loops
> 
> > On Fri, Nov 8, 2024 at 10:21 AM Mayshao-oc 
> wrote:
> > >
> > > > > -Original Message-
> > > > > From: Xi Ruoyao 
> > > > > Sent: Thursday, November 7, 2024 1:12 PM
> > > > > To: Liu, Hongtao ; Mayshao-oc  > > > > o...@zhaoxin.com>; Hongtao Liu 
> > > > > Cc: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; ubiz...@gmail.com;
> > > > > richard.guent...@gmail.com; Tim Hu(WH-RD) ;
> > > > > Silvia
> > > > > Zhao(BJ-RD) ; Louis Qi(BJ-RD)
> > > > > ; Cobe Chen(BJ-RD)
> 
> > > > > Subject: Re: [PATCH] [x86_64] Add microarchtecture tunable for
> > > > > pass_align_tight_loops On Thu, 2024-11-07 at 04:58 +, Liu,
> > > > > Hongtao wrote:
> > > > > > > > > Hi all:
> > > > > > > > > For zhaoxin, I find no improvement when enable
> > > > > > > > > pass_align_tight_loops, and have performance drop in some
> cases.
> > > > > > > > > This patch add a new tunable to bypass
> > > > > > > > > pass_align_tight_loops in
> > > > > > > zhaoxin.
> > > > > > > > >
> > > > > > > > > Bootstrapped X86_64.
> > > > > > > > > Ok for trunk?
> > > > > > LGTM.
> > > > >
> > > > > I'd suggest to add the reference to PR 117438 into the subject and
> ChangeLog.
> > > > Yes, thanks.
> > > Add PR 117438 into the subject and ChangeLog.
> > PR target/117438
> > Others LGTM.
> > > > >
> > > > > --
> > > > > Xi Ruoyao 
> > > > > School of Aerospace Science and Technology, Xidian University
> > > BR
> > > Mayshao
> >
> >
> >
> > --
> > BR,
> > Hongtao
> 
> Hi Hongtao:
> 
>   It seems no further comments. Could you please help me commit this patch?
Done.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=6350e956d1a74963a62bedabef3d4a1a3f2d4852
> 
> BR
> Mayshao

RE: [PATCH] i386: Fix ICE with conditional QI/HI vector maxmin [PR118776]

2025-02-07 Thread Liu, Hongtao




> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, February 7, 2025 4:08 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] i386: Fix ICE with conditional QI/HI vector maxmin
> [PR118776]
> 
> Hi!
> 
> The following testcase ICEs starting with GCC 12 since r12-4526 although the
> bug has been introduced already in r12-2751.
> The problem was in the addition of cond_ define_expand
> which uses nonimmediate_operand predicates for both maxmin operands for
> all VI1248_AVX512VLBW modes.  It works fine with VI48_AVX512VL modes
> because the 3_mask VI48_AVX512VL define_expand uses
> ix86_fixup_binary_operands_no_copy and the
> *avx512f_3 VI48_AVX512VL define_insn uses %
> in constraint and !(MEM_P && MEM_P) check in condition (and
> 3 define_expand with VI124_256_AVX512F_AVX512BW
> iterator does that too), but even though the 8-bit and 16-bit element maxmin
> is commutative too, the 3
> define_insn with VI12_AVX512VL iterator didn't use % in constraint to make it
> commutative.  So, e.g. cond_umaxv32qi define_expand allowed
> nonimmediate_operand for both umax operands, but used
> gen_umaxv32qi_mask which wasn't commutative and only allowed
> nonimmediate_operand for the second operand.
> 
> The following patch fixes it by keeping the 3
> VI124_256_AVX512F_AVX512BW define_expand as is (it does
> ix86_fixup_binary_operands_no_copy) but extending the
> 3_mask define_expand from VI48_AVX512VL to
> VI1248_AVX512VLBW which keeps the current modes with their ISA
> conditions and adds the VI12_AVX512VL modes under additional
> TARGET_AVX512BW condition, and turning the actual define_insn into an *
> prefixed name (which it was before just for the non-masked
> case) and having the same commutative operand handling as in other
> define_insns.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok, thanks.
> 
> 2025-02-07  Jakub Jelinek  
> 
>   PR target/118776
>   * config/i386/sse.md (3_mask): Use
> VI1248_AVX512VLBW
>   iterator rather than VI48_AVX512VL.
>   (3): Rename to ...
>   (*avx512bw_3): ... this.  Use
>   nonimmediate_operand rather than register_operand predicate
> and %v
>   rather than v constraint for operand 1 and adjust condition to reject
>   MEMs in both operand 1 and 2.
> 
>   * gcc.target/i386/pr118776.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2025-01-23 15:54:53.160911648 +0100
> +++ gcc/config/i386/sse.md2025-02-07 00:16:49.155363094 +0100
> @@ -17703,12 +17703,12 @@ (define_expand "cond_"
>  })
> 
>  (define_expand "3_mask"
> -  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
> - (vec_merge:VI48_AVX512VL
> -   (maxmin:VI48_AVX512VL
> - (match_operand:VI48_AVX512VL 1 "nonimmediate_operand")
> - (match_operand:VI48_AVX512VL 2 "nonimmediate_operand"))
> -   (match_operand:VI48_AVX512VL 3 "nonimm_or_0_operand")
> +  [(set (match_operand:VI1248_AVX512VLBW 0 "register_operand")
> + (vec_merge:VI1248_AVX512VLBW
> +   (maxmin:VI1248_AVX512VLBW
> + (match_operand:VI1248_AVX512VLBW 1
> "nonimmediate_operand")
> + (match_operand:VI1248_AVX512VLBW 2
> "nonimmediate_operand"))
> +   (match_operand:VI1248_AVX512VLBW 3
> "nonimm_or_0_operand")
> (match_operand: 4 "register_operand")))]
>"TARGET_AVX512F"
>"ix86_fixup_binary_operands_no_copy (, mode,
> operands);")
> @@ -17724,12 +17724,12 @@ (define_insn "*avx512f_3
>     (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
> 
> -(define_insn "3"
> +(define_insn "*avx512bw_3"
>[(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
>  (maxmin:VI12_AVX512VL
> -  (match_operand:VI12_AVX512VL 1 "register_operand" "v")
> +  (match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "%v")
>(match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm")))]
> -  "TARGET_AVX512BW"
> +  "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P
> (operands[2]))"
> 
> "vp\t{%2, %1, %0|%0 _operand3>, %1, %2}"
>[(set_attr "type" "sseiadd")
> (set_attr "prefix" "evex")
> --- gcc/testsuite/gcc.target/i386/pr118776.c.jj   2025-02-07
> 08:41:46.054157905 +0100
> +++ gcc/testsuite/gcc.target/i386/pr118776.c  2025-02-07
> 08:40:30.508196302 +0100
> @@ -0,0 +1,23 @@
> +/* PR target/118776 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
> +
> +void bar (unsigned char *);
> +
> +void
> +foo (unsigned char *x)
> +{
> +  unsigned char b[32];
> +  bar (b);
> +  for (int i = 0; i < 32; i++)
> +{
> +  unsigned char c = 8;
> +  if (i > 3)
> + {
> +   unsigned char d = b[i];
> +   d = 1 > d ? 1 : d;
> +   c = d;
> + }
> +  x[i] = c;
> +}
> +}
> 
>   Jakub

RE: [PATCH] i386: Fix AVX512BW intrin header with OPTIMIZE [PR 118813]

2025-02-09 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Monday, February 10, 2025 2:10 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR
> 118813]
> 
> Hi all,
> 
> When moving intrins around for the AVX10 implementation in GCC 14, the intrins
> _kshiftli_mask32 and _kshiftri_mask32 were wrongly wrapped in "#if
> __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__", so the intrin file has not been
> `-Wsystem-headers -Wundef` clean since r14-4490.
> 
> Ok for trunk?
Ok, and please backport to GCC14 release branch.
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   PR target/118813
>   * config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__
>   wrap.
> ---
>  gcc/config/i386/avx512bwintrin.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/avx512bwintrin.h
> b/gcc/config/i386/avx512bwintrin.h
> index 187e15a80ca..47c4c03e796 100644
> --- a/gcc/config/i386/avx512bwintrin.h
> +++ b/gcc/config/i386/avx512bwintrin.h
> @@ -199,7 +199,7 @@ _kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
> (__mmask32) __B);
>  }
> 
> -#if __OPTIMIZE__
> +#ifdef __OPTIMIZE__
>  extern __inline __mmask32
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _kshiftli_mask32 (__mmask32 __A, unsigned int __B)
> --
> 2.31.1
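
A small sketch of the difference between the two directives. HYPOTHETICAL_OPT is a made-up macro standing in for __OPTIMIZE__ and is deliberately never defined, and have_opt_intrins is a hypothetical helper used only to make the result observable.

```c
/* HYPOTHETICAL_OPT stands in for __OPTIMIZE__ and is left undefined,
   as __OPTIMIZE__ is when compiling without optimization.

   Writing
       #if HYPOTHETICAL_OPT
   evaluates the undefined name as 0, and -Wundef warns about that.

   "#ifdef" only asks whether the name is defined, so it selects the
   same branch here without triggering -Wundef.  */
#ifdef HYPOTHETICAL_OPT
# define HAVE_OPT_INTRINS 1
#else
# define HAVE_OPT_INTRINS 0
#endif

/* Reports which branch the preprocessor took.  */
int have_opt_intrins (void)
{
  return HAVE_OPT_INTRINS;
}
```

Both spellings pick the same branch, so the fix changes only whether the header is warning-clean, not which intrinsics are visible.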

RE: [PATCH] i386: Change mnemonics from TCVTROWPS2PBF16[H,L] to TCVTROWPS2BF16[H,L]

2025-01-03 Thread Liu, Hongtao



> -Original Message-
> From: Jiang, Haochen 
> Sent: Friday, January 3, 2025 4:55 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Change mnemonics from TCVTROWPS2PBF16[H,L] to
> TCVTROWPS2BF16[H,L]
> 
> Hi all,
> 
> The mnemonics for TCVTROWPS2PBF16[H,L] have been changed to
> TCVTROWPS2BF16[H,L] in ISE056. There will also be some more BF16
> mnemonic changes upcoming, which will fix the regression in PR118270.
Please add PR target/118270 to changelog, otherwise LGTM.
> 
> Bootstrapped and tested on x86_64-pc-linux-gnu. Ok for trunk?
> 
> Ref: https://cdrdv2.intel.com/v1/dl/getContent/671368
> 
> Thx,
> Haochen
> 
> ---
> 
> In ISE056, the mnemonics for TCVTROWPS2PBF16[H,L] have been changed to
> TCVTROWPS2BF16[H,L].
> 
> gcc/ChangeLog:
> 
>   * config/i386/amxavx512intrin.h
>   (_tile_cvtrowps2pbf16h_internal): Rename to...
>   (_tile_cvtrowps2bf16h_internal): ...this.
>   (_tile_cvtrowps2pbf16hi_internal): Rename to...
>   (_tile_cvtrowps2bf16hi_internal): ...this.
>   (_tile_cvtrowps2pbf16l_internal): Rename to...
>   (_tile_cvtrowps2bf16l_internal): ...this.
>   (_tile_cvtrowps2pbf16li_internal): Rename to...
>   (_tile_cvtrowps2bf16li_internal): ...this.
>   (_tile_cvtrowps2pbf16h): Rename to...
>   (_tile_cvtrowps2bf16h): ...this.
>   (_tile_cvtrowps2pbf16hi): Rename to...
>   (_tile_cvtrowps2bf16hi): ...this.
>   (_tile_cvtrowps2pbf16l): Rename to...
>   (_tile_cvtrowps2bf16l): ...this.
>   (_tile_cvtrowps2pbf16li): Rename to...
>   (_tile_cvtrowps2bf16li): ...this.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/amxavx512-asmatt-1.c: Adjust intrin call.
>   * gcc.target/i386/amxavx512-asmintel-1.c: Ditto.
>   * gcc.target/i386/amxavx512-cvtrowps2pbf16-2.c: Rename to...
>   * gcc.target/i386/amxavx512-cvtrowps2bf16-2.c: ...this. Rename
>   test functions.
> ---
>  gcc/config/i386/amxavx512intrin.h | 32 +--
>  .../gcc.target/i386/amxavx512-asmatt-1.c  | 12 +++
>  .../gcc.target/i386/amxavx512-asmintel-1.c| 12 +++
>  ...2pbf16-2.c => amxavx512-cvtrowps2bf16-2.c} | 30 -
>  4 files changed, 43 insertions(+), 43 deletions(-)
>  rename gcc/testsuite/gcc.target/i386/{amxavx512-cvtrowps2pbf16-2.c => amxavx512-cvtrowps2bf16-2.c} (67%)
> 
> diff --git a/gcc/config/i386/amxavx512intrin.h
> b/gcc/config/i386/amxavx512intrin.h
> index 59d142948fb..ab5362571d1 100644
> --- a/gcc/config/i386/amxavx512intrin.h
> +++ b/gcc/config/i386/amxavx512intrin.h
> @@ -53,38 +53,38 @@
>    dst;                                                                  \
>  })
> 
> -#define _tile_cvtrowps2pbf16h_internal(src,A)                           \
> +#define _tile_cvtrowps2bf16h_internal(src,A)                            \
>  ({                                                                      \
>    __m512bh dst;                                                         \
>    __asm__ volatile                                                      \
> -  ("{tcvtrowps2pbf16h\t%1, %%tmm"#src", %0|tcvtrowps2pbf16h\t%0, %%tmm"#src", %1}" \
> +  ("{tcvtrowps2bf16h\t%1, %%tmm"#src", %0|tcvtrowps2bf16h\t%0, %%tmm"#src", %1}" \
>    : "=v" (dst) : "r" ((unsigned) (A)));                                 \
>    dst;                                                                  \
>  })
> 
> -#define _tile_cvtrowps2pbf16hi_internal(src,imm)                        \
> +#define _tile_cvtrowps2bf16hi_internal(src,imm)                         \
>  ({                                                                      \
>    __m512bh dst;                                                         \
>    __asm__ volatile                                                      \
> -  ("{tcvtrowps2pbf16h\t$"#imm", %%tmm"#src", %0|tcvtrowps2pbf16h\t%0, %%tmm"#src", "#imm"}" \
> +  ("{tcvtrowps2bf16h\t$"#imm", %%tmm"#src", %0|tcvtrowps2bf16h\t%0, %%tmm"#src", "#imm"}" \
>    : "=v" (dst) :);                                                      \
>    dst;                                                                  \
>  })
> 
> -#define _tile_cvtrowps2pbf16l_internal(src,A)                           \
> +#define _tile_cvtrowps2bf16l_internal(src,A)                            \
>  ({                                                                      \
>    __m512bh dst;                                                         \
>__asm_

RE: [PATCH] Document refactoring of the option -fcf-protection=x.

2024-12-24 Thread Liu, Hongtao




> -Original Message-
> From: Gerald Pfeifer 
> Sent: Wednesday, December 25, 2024 11:40 AM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org; hjl.to...@gmail.com
> Subject: Re: [PATCH] Document refactoring of the option -fcf-protection=x.
> 
> On Fri, 12 Jan 2024, Gerald Pfeifer wrote:
> > On Wed, 10 Jan 2024, liuhongt wrote:
> >> To override -fcf-protection, -fcf-protection=none needs to be added
> >> and then with -fcf-protection=xxx.
> > I'm afraid I am struggling with the English of this, but need more
> > time to untangle and suggest an alternative.
> >
> > For the time being I pushed the follow-up below.
> 
> Okay, I think I got it now.
> 
> Did you mean the following?
Yes, thanks.
> 
> Gerald
> 
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index ba9fc680..5d324767 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -43,9 +43,9 @@ You may also want to check out our
>for details.
>
>https://gcc.gnu.org/onlinedocs/gcc-
> 14.1.0/gcc/Instrumentation-Options.html">-fcf-
> protection=[full|branch|return|none|check]
> -  is refactored, to override -fcf-protection,
> -  -fcf-protection=none needs to be added and then
> -  with -fcf-protection=xxx.
> +  has been refactored. To override -fcf-protection with
> +  a more specific setting, add -fcf-protection=none
> +  followed by a specific -fcf-protection=xxx.
>
>Support for the ia64*-*- target ports which have been
>unmaintained for quite a while has been declared obsolete in GCC 14.
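
One way to check which setting actually took effect is to look at the __CET__ macro from inside a translation unit. The sketch below assumes an x86 GCC, where __CET__ is expected to be 1 for branch, 2 for return and 3 for full protection, and to be undefined when protection is off; the function name cet_level is made up for the example.

```c
/* Report which control-flow protection this translation unit was
   compiled with.  Assumption: on x86 GCC, __CET__ is a bit mask with
   1 = branch, 2 = return, 3 = full, and is undefined for "none".

   Following the wording above, an earlier setting would be overridden
   by resetting first, e.g.
     -fcf-protection=branch -fcf-protection=none -fcf-protection=return
   which is then expected to report 2 here.  */
int cet_level (void)
{
#ifdef __CET__
  return __CET__;
#else
  return 0;
#endif
}
```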

RE: [PATCH] i386: Treat Granite Rapids/Granite Rapids-D/Diamond Rapids similar as Sapphire Rapids in x86-tune.def

2025-02-26 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Wednesday, February 26, 2025 4:18 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Treat Granite Rapids/Granite Rapids-D/Diamond Rapids
> similar as Sapphire Rapids in x86-tune.def
> 
> Hi all,
> 
> Since GNR, GNR-D and DMR are all P-core based, we should treat them just like
> SPR in tuning for now.
> 
> Ok for trunk and backport to GCC13/14 for GNR/GNR-D part?
Ok.
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   * config/i386/x86-tune.def
>   (X86_TUNE_DEST_FALSE_DEP_FOR_GLC): Add GNR, GNR-D, DMR.
>   (X86_TUNE_AVOID_256FMA_CHAINS): Ditto.
>   (X86_TUNE_AVX512_MOVE_BY_PIECES): Ditto.
>   (X86_TUNE_AVX512_STORE_BY_PIECES): Ditto.
> ---
>  gcc/config/i386/x86-tune.def | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index df7b4ed22bc..0bdad7234a6 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -87,7 +87,8 @@ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY,
>     several insns to break false dependency on the dest register for GLC
>     micro-architecture.  */
>  DEF_TUNE (X86_TUNE_DEST_FALSE_DEP_FOR_GLC,
> -          "dest_false_dep_for_glc", m_SAPPHIRERAPIDS | m_CORE_HYBRID
> +          "dest_false_dep_for_glc", m_SAPPHIRERAPIDS | m_GRANITERAPIDS
> +          | m_GRANITERAPIDS_D | m_DIAMONDRAPIDS | m_CORE_HYBRID
>            | m_CORE_ATOM)
> 
>  /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
> @@ -527,7 +528,8 @@ DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER
>     smaller FMA chain.  */
>  DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains",
>            m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_ZNVER5 | m_CORE_HYBRID
> -          | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
> +          | m_SAPPHIRERAPIDS | m_GRANITERAPIDS | m_GRANITERAPIDS_D
> +          | m_DIAMONDRAPIDS | m_CORE_ATOM | m_GENERIC)
> 
>  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
>     smaller FMA chain.  */
> @@ -594,12 +596,14 @@ DEF_TUNE (X86_TUNE_AVX256_STORE_BY_PIECES, "avx256_store_by_pieces",
>  /* X86_TUNE_AVX512_MOVE_BY_PIECES: Optimize move_by_pieces with 512-bit
>     AVX instructions.  */
>  DEF_TUNE (X86_TUNE_AVX512_MOVE_BY_PIECES, "avx512_move_by_pieces",
> -          m_SAPPHIRERAPIDS | m_ZNVER4 | m_ZNVER5)
> +          m_SAPPHIRERAPIDS | m_GRANITERAPIDS | m_GRANITERAPIDS_D
> +          | m_DIAMONDRAPIDS | m_ZNVER4 | m_ZNVER5)
> 
>  /* X86_TUNE_AVX512_STORE_BY_PIECES: Optimize store_by_pieces with 512-bit
>     AVX instructions.  */
>  DEF_TUNE (X86_TUNE_AVX512_STORE_BY_PIECES, "avx512_store_by_pieces",
> -          m_SAPPHIRERAPIDS | m_ZNVER4 | m_ZNVER5)
> +          m_SAPPHIRERAPIDS | m_GRANITERAPIDS | m_GRANITERAPIDS_D
> +          | m_DIAMONDRAPIDS | m_ZNVER4 | m_ZNVER5)
> 
>  /* X86_TUNE_AVX512_TWO_EPILOGUES: Use two vector epilogues for 512-bit
>     vectorized loops.  */
> --
> 2.31.1

RE: [PATCH 0/4] Fix AVX10.2 SAT CVT.

2025-03-19 Thread Liu, Hongtao




> -Original Message-
> From: Hu, Lin1 
> Sent: Wednesday, March 19, 2025 3:49 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH 0/4] Fix AVX10.2 SAT CVT.
> 
> Hi, all
> 
> This series of patches fixes three issues in AVX10.2 SAT CVT:
> 
> 1. Adds ep[i|u]8 suffix to *[i|u]bs intrinsic names.
> 2. Introduces SAT CVT intrinsics without rounding control.
> 3. Marks saturation by adding 's_' before core name.

I think the risk of this series is relatively low, so even though it's stage4, I'm approving these patches.

> 
> BRs,
> Lin
> 
> Hu, Lin1 (4):
>   i386: Update Suffix for AVX10.2 SAT CVT Intrinsics
>   i386: Add AVX10.2 SAT CVT Intrinsics without Rounding Control
>   i386: Fix AVX10.2 SAT CVT testcases.
>   i386: Add "s_" as Saturation for AVX10.2 SAT CVT Intrinsics.
> 
>  gcc/config/i386/avx10_2-512satcvtintrin.h | 648 --
>  gcc/config/i386/avx10_2satcvtintrin.h | 844 +++---
>  gcc/config/i386/i386-builtin-types.def|   5 +
>  gcc/config/i386/i386-builtin.def  |  32 +
>  gcc/config/i386/i386-expand.cc|   5 +
>  .../gcc.target/i386/avx10_2-512-satcvt-1.c| 200 +++--
>  .../i386/avx10_2-512-vcvtbf162ibs-2.c |   6 +-
>  .../i386/avx10_2-512-vcvtbf162iubs-2.c|   6 +-
>  .../i386/avx10_2-512-vcvtph2ibs-2.c   |  29 +-
>  .../i386/avx10_2-512-vcvtph2iubs-2.c  |  29 +-
>  .../i386/avx10_2-512-vcvtps2ibs-2.c   |  29 +-
>  .../i386/avx10_2-512-vcvtps2iubs-2.c  |  29 +-
>  .../i386/avx10_2-512-vcvttbf162ibs-2.c|   6 +-
>  .../i386/avx10_2-512-vcvttbf162iubs-2.c   |   6 +-
>  .../i386/avx10_2-512-vcvttpd2dqs-2.c  |  27 +-
>  .../i386/avx10_2-512-vcvttpd2qqs-2.c  |  27 +-
>  .../i386/avx10_2-512-vcvttpd2udqs-2.c |  27 +-
>  .../i386/avx10_2-512-vcvttpd2uqqs-2.c |  27 +-
>  .../i386/avx10_2-512-vcvttph2ibs-2.c  |  29 +-
>  .../i386/avx10_2-512-vcvttph2iubs-2.c |  18 +-
>  .../i386/avx10_2-512-vcvttps2dqs-2.c  |  27 +-
>  .../i386/avx10_2-512-vcvttps2ibs-2.c  |  29 +-
>  .../i386/avx10_2-512-vcvttps2iubs-2.c |  29 +-
>  .../i386/avx10_2-512-vcvttps2qqs-2.c  |  28 +-
>  .../i386/avx10_2-512-vcvttps2udqs-2.c |  27 +-
>  .../i386/avx10_2-512-vcvttps2uqqs-2.c |  27 +-
>  .../gcc.target/i386/avx10_2-satcvt-1.c| 360 +---
>  .../gcc.target/i386/avx10_2-vcvtps2iubs-2.c   |  16 +
>  .../gcc.target/i386/avx10_2-vcvttsd2sis-2.c   |  24 +
>  .../gcc.target/i386/avx10_2-vcvttsd2usis-2.c  |  24 +
>  .../gcc.target/i386/avx10_2-vcvttss2sis-2.c   |  24 +
>  .../gcc.target/i386/avx10_2-vcvttss2usis-2.c  |  24 +
>  gcc/testsuite/gcc.target/i386/sse-14.c|  96 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c|  96 +-
>  34 files changed, 2213 insertions(+), 647 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtps2iubs-2.c
> 
> --
> 2.31.1

RE: [PATCH 00/27] Use avx10.x as the only option for AVX10 with 512 bit vector support while remove avx10.x-256/512 option and 256 bit rounding support

2025-03-19 Thread Liu, Hongtao




> -Original Message-
> From: Jiang, Haochen 
> Sent: Wednesday, March 19, 2025 3:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH 00/27] Use avx10.x as the only option for AVX10 with 512 bit
> vector support while remove avx10.x-256/512 option and 256 bit rounding
> support
> 
> Hi all,
> 
> It is a little late for this change but I hope this will be a welcome
> change since it will greatly simplify the compiler options and reduce
> confusion for AVX10 option combination with AVX512.
> 
> AVX10 whitepaper just got a major change and it will impact how we design
> the compiler option, which will actually simplify everything.
I like the decision.
Ok for the series.
But I'd like to leave a little time for others to comment, so please commit this series next week if there's no objection.
> 
> Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
> 
> In this new whitepaper, all the platforms will support 512 bit vector width
> (previously, E-core was up to 256 bit, leading to hybrid clients and Atom
> Server being 256 bit only). Also, 256 bit rounding is not that useful because
> we now have the rounding feature directly on E-core and no need to use 256-bit
> rounding as a workaround. HW will remove that support.
> 
> Thus, there is no need to add avx10.x-256/512 into compiler options. A
> simple avx10.x supporting all vector length is all we need. The change also
> makes -mno-evex512 not that useful. It is introduced with avx10.1-256 for
> compiling 256 bit only binary on legacy platforms to have a partial trial for
> avx10.x-256. What we also need to do is to remove 256 bit rounding.
> 
> For new features added in GCC 15 (i.e., option avx10.2-256/512 and 256 bit
> rounding), we will simply remove them. Since avx10.2 was already directed
> to 512 bit, there is no need to change that option. We will keep the testcase
> and intrin file structure for now since it is late in the GCC 15 release cycle
> and do the clean up in GCC 16.
> 
> For features added in GCC 14 (i.e., option avx10.1-256/512 and no-evex512),
> we will raise a deprecation warning in GCC 15 and finally remove them in GCC
> 16. Also, we will need to add avx10.1 back (it was removed less than one
> month ago) and direct it to 512 bit with a warning in GCC 15, since it was
> aliased to 256 bit previously. We would also like to backport the
> deprecation and re-alias warnings to GCC 14 so that they land in GCC 14.3.
> 
> The change will also simplify the design of AVX10 and AVX512 option
> combinations. Since we do not need to consider the enabled vector width
> anymore, it is much more natural to just treat avx10.1 as an alias of all
> avx512xxx in SPR. Thus, "-mavx10.1 -mno-avx512fp16" should enable all
> AVX512 features while disabling AVX512FP16. And "-mavx512f -mno-avx10.1"
> should disable all AVX512 features. Both combinations currently ignore
> "-mno-". In GCC 15, we will raise a warning to mention that we are
> going to change that behavior in GCC 16.
> 
> Upcoming are the patches we want to land in GCC 15. The first two
> patches remove the 256 bit roundings added with the AVX10.2 new
> instructions.
> Then the following 22 are simple revert patches for the legacy instruction
> 256 bit ymm rounding addition. The last three patches are for the avx10.x
> option change.
> 
> Thx,
> Haochen
>

RE: [PATCH 0/4] Fix AVX10.2 SAT CVT.

2025-03-19 Thread Liu, Hongtao




> -Original Message-
> From: Liu, Hongtao
> Sent: Thursday, March 20, 2025 9:29 AM
> To: Hu, Lin1 ; gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com
> Subject: RE: [PATCH 0/4] Fix AVX10.2 SAT CVT.
> 
> 
> 
> > -Original Message-
> > From: Hu, Lin1 
> > Sent: Wednesday, March 19, 2025 3:49 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; ubiz...@gmail.com
> > Subject: [PATCH 0/4] Fix AVX10.2 SAT CVT.
> >
> > Hi, all
> >
> > This series of patches fixes three issues in AVX10.2 SAT CVT:
> >
> > 1. Adds ep[i|u]8 suffix to *[i|u]bs intrinsic names.
> > 2. Introduces SAT CVT intrinsics without rounding control.
> > 3. Marks saturation by adding 's_' before core name.
> 
> I'm like the risk of this series is relatively low, so even though it's 
> stage4, I'm
Typo: I think
> also approving those patches.
> 
> >
> > BRs,
> > Lin
> >
> > Hu, Lin1 (4):
> >   i386: Update Suffix for AVX10.2 SAT CVT Intrinsics
> >   i386: Add AVX10.2 SAT CVT Intrinsics without Rounding Control
> >   i386: Fix AVX10.2 SAT CVT testcases.
> >   i386: Add "s_" as Saturation for AVX10.2 SAT CVT Intrinsics.
> >
> >  gcc/config/i386/avx10_2-512satcvtintrin.h | 648 --
> >  gcc/config/i386/avx10_2satcvtintrin.h | 844 +++---
> >  gcc/config/i386/i386-builtin-types.def|   5 +
> >  gcc/config/i386/i386-builtin.def  |  32 +
> >  gcc/config/i386/i386-expand.cc|   5 +
> >  .../gcc.target/i386/avx10_2-512-satcvt-1.c| 200 +++--
> >  .../i386/avx10_2-512-vcvtbf162ibs-2.c |   6 +-
> >  .../i386/avx10_2-512-vcvtbf162iubs-2.c|   6 +-
> >  .../i386/avx10_2-512-vcvtph2ibs-2.c   |  29 +-
> >  .../i386/avx10_2-512-vcvtph2iubs-2.c  |  29 +-
> >  .../i386/avx10_2-512-vcvtps2ibs-2.c   |  29 +-
> >  .../i386/avx10_2-512-vcvtps2iubs-2.c  |  29 +-
> >  .../i386/avx10_2-512-vcvttbf162ibs-2.c|   6 +-
> >  .../i386/avx10_2-512-vcvttbf162iubs-2.c   |   6 +-
> >  .../i386/avx10_2-512-vcvttpd2dqs-2.c  |  27 +-
> >  .../i386/avx10_2-512-vcvttpd2qqs-2.c  |  27 +-
> >  .../i386/avx10_2-512-vcvttpd2udqs-2.c |  27 +-
> >  .../i386/avx10_2-512-vcvttpd2uqqs-2.c |  27 +-
> >  .../i386/avx10_2-512-vcvttph2ibs-2.c  |  29 +-
> >  .../i386/avx10_2-512-vcvttph2iubs-2.c |  18 +-
> >  .../i386/avx10_2-512-vcvttps2dqs-2.c  |  27 +-
> >  .../i386/avx10_2-512-vcvttps2ibs-2.c  |  29 +-
> >  .../i386/avx10_2-512-vcvttps2iubs-2.c |  29 +-
> >  .../i386/avx10_2-512-vcvttps2qqs-2.c  |  28 +-
> >  .../i386/avx10_2-512-vcvttps2udqs-2.c |  27 +-
> >  .../i386/avx10_2-512-vcvttps2uqqs-2.c |  27 +-
> >  .../gcc.target/i386/avx10_2-satcvt-1.c| 360 +---
> >  .../gcc.target/i386/avx10_2-vcvtps2iubs-2.c   |  16 +
> >  .../gcc.target/i386/avx10_2-vcvttsd2sis-2.c   |  24 +
> >  .../gcc.target/i386/avx10_2-vcvttsd2usis-2.c  |  24 +
> >  .../gcc.target/i386/avx10_2-vcvttss2sis-2.c   |  24 +
> >  .../gcc.target/i386/avx10_2-vcvttss2usis-2.c  |  24 +
> >  gcc/testsuite/gcc.target/i386/sse-14.c|  96 +-
> >  gcc/testsuite/gcc.target/i386/sse-22.c|  96 +-
> >  34 files changed, 2213 insertions(+), 647 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcvtps2iubs-2.c
> >
> > --
> > 2.31.1

RE: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-05 Thread Liu, Hongtao



> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, April 1, 2025 5:24 PM
> To: Hongtao Liu 
> Cc: Wang, Hongyu ; gcc-patches@gcc.gnu.org; Liu,
> Hongtao 
> Subject: Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR
> 119539]
> 
> On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu  wrote:
> >
> > On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang 
> wrote:
> > >
> > > Hi,
> > >
> > > For the splitter after 3_mask, it now splits the
> > > pattern to *3_mask, so the splitter doesn't
> > > generate the nf variant. Add the corresponding nf counterpart for
> > > define_insn_and_split to make the splitter also work for nf insns.
> > >
> > > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> > >
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/119539
> > > * config/i386/i386.md (*3_mask_nf): New
> > > define_insn_and_split.
> > > (*3_mask_1_nf): Likewise.
> > > (*3_mask): Use force_lowpart_subreg.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/119539
> > > * gcc.target/i386/apx-nf-pr119539.c: New test.
> > > ---
> > >  gcc/config/i386/i386.md   | 46 ++-
> > >  .../gcc.target/i386/apx-nf-pr119539.c |  6 +++
> > >  2 files changed, 50 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
> > >
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index f7f790d2aeb..42312f0c330 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -18131,6 +18131,30 @@ (define_expand "3"
> > >DONE;
> > >  })
> > >
> > > +;; Avoid useless masking of count operand.
> > > +(define_insn_and_split "*3_mask_nf"
> > > +  [(set (match_operand:SWI 0 "nonimmediate_operand")
> > > +   (any_rotate:SWI
> > > + (match_operand:SWI 1 "nonimmediate_operand")
> > > + (subreg:QI
> > > +   (and
> > > + (match_operand 2 "int248_register_operand" "c")
> > > + (match_operand 3 "const_int_operand")) 0)))]
> > > +  "TARGET_APX_NF
> > > +   && ix86_binary_operator_ok (, mode, operands)
> > > +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> > > +  == GET_MODE_BITSIZE (mode)-1
> > > +   && ix86_pre_reload_split ()"
> > > +  "#"
> > > +  "&& 1"
> > > +  [(set (match_dup 0)
> > > +   (any_rotate:SWI (match_dup 1)
> > > +   (match_dup 2)))] {
> > > +  operands[2] = force_lowpart_subreg (QImode, operands[2],
> > > + GET_MODE (operands[2]));
> > > +})
> > Can we just change the output in original pattern, I think combine
> > will still match the pattern even w/ clobber flags.
> >
> > like
> >
> > @@ -17851,8 +17851,17 @@ (define_insn_and_split
> "*3_mask"
> > (match_dup 2)))
> >(clobber (reg:CC FLAGS_REG))])]  {
> > -  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> > -  operands[2] = gen_lowpart (QImode, operands[2]);
> > +  if (TARGET_APX_F)
> > +{
> > +  emit_move_insn (operands[0],
> > +gen_rtx_ (mode, operands[1], operands[2]));
> > +  DONE;
> > +}
> > +  else
> > +{
> > +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> > +  operands[2] = gen_lowpart (QImode, operands[2]);
> 
> Please note we have a new "force_lowpart_subreg" function that operates on
> a register operand and (if possible) simplifies a subreg of a subreg;
> otherwise it forces the operand to a register and creates a subreg of the
> temporary. It is similar to the above combination, with a possibility to
> avoid a temporary reg.
> 
> > Also we can remove constraint "c" in the original pattern.
> 
> The constraint is here to handle corner case, where combine propagated an
> insn RTX with fixed register, other than %ecx, into shift RTX.
> Register allocator was not able to fix the combination, so
> TARGET_LEGITIMATE_COMBINED_INSN hook was introduced that rejected
> unwanted combinations. Please see the comment in
> ix86_legitimate_combined_insn function.
Oh, thanks for the explanation. 
> 
> Perhaps the above is not relevant anymore with the new register allocator
> (LRA), and the constraint can indeed be removed. But please take some
> caution.
> 
> Uros.
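
For reference, the "Avoid useless masking of count operand" condition in the
quoted *3_mask_nf pattern can be illustrated in plain C. This is only a
sketch: rotl32 and mask_is_redundant are hypothetical helper names, not GCC
functions, and a 32-bit operand is assumed. The AND on the count can be
dropped when its constant keeps every low bit the rotate actually looks at.

```c
#include <stdint.h>

/* Rotate left; only the low 5 bits of the count matter for a 32-bit
   operand, mirroring what the hardware rotate does.  */
static uint32_t
rotl32 (uint32_t x, unsigned int count)
{
  count &= 31;
  return count ? (x << count) | (x >> (32 - count)) : x;
}

/* Mirrors the insn condition
     (INTVAL (operands[3]) & (bitsize - 1)) == bitsize - 1
   i.e. the AND keeps every bit the rotate looks at, so it is useless.  */
static int
mask_is_redundant (uint64_t and_mask, unsigned int bitsize)
{
  return (and_mask & (bitsize - 1)) == bitsize - 1;
}
```

With these helpers, rotl32 (x, n & 31) and rotl32 (x, n) give the same result
for any n, while an AND with 15 would change the result and must be kept.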

RE: [PATCH v2] i386: Add "s_" as Saturation for AVX10.2 Converting Intrinsics.

2025-03-25 Thread Liu, Hongtao




> -Original Message-
> From: Hu, Lin1 
> Sent: Tuesday, March 25, 2025 4:23 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: RE: [PATCH v2] i386: Add "s_" as Saturation for AVX10.2 Converting
> Intrinsics.
> 
> More details: Alignment with llvm
> (https://github.com/llvm/llvm-project/pull/131592)
> 
> BRs,
> Lin
> 
> > -Original Message-
> > From: Hu, Lin1 
> > Sent: Tuesday, March 25, 2025 4:10 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; ubiz...@gmail.com
> > Subject: [PATCH v2] i386: Add "s_" as Saturation for AVX10.2
> > Converting Intrinsics.
> >
> > Modify ChangeLog.
> >
> > This patch aims to add "s_" after 'cvt' to represent saturation.
> >
> > gcc/ChangeLog:
> >
> > * config/i386/avx10_2-512convertintrin.h
> (_mm512_mask_cvtx2ps_ph):
> > Formatting fixes
> > (_mm512_mask_cvtx_round2ps_ph): Ditto
> > (_mm512_maskz_cvtx_round2ps_ph): Ditto
> > (_mm512_cvtbiassph_bf8): Rename to _mm512_cvts_biasph_bf8.
> > (_mm512_mask_cvtbiassph_bf8): Rename to
> _mm512_mask_cvts_biasph_bf8.
> > (_mm512_maskz_cvtbiassph_bf8): Rename to
> > _mm512_maskz_cvts_biasph_bf8.
> > (_mm512_cvtbiassph_hf8): Rename to _mm512_cvts_biasph_hf8.
> > (_mm512_mask_cvtbiassph_hf8): Rename to
> _mm512_mask_cvts_biasph_hf8.
> > (_mm512_maskz_cvtbiassph_hf8): Rename to
> > _mm512_maskz_cvts_biasph_hf8.
> > (_mm512_cvts2ph_bf8): Rename to _mm512_cvts_2ph_bf8.
> > (_mm512_mask_cvts2ph_bf8): Rename to
> > _mm512_mask_cvts_2ph_bf8.
> > (_mm512_maskz_cvts2ph_bf8): Rename to
> _mm512_maskz_cvts_2ph_bf8.
> > (_mm512_cvts2ph_hf8): Rename to _mm512_cvts_2ph_hf8.
> > (_mm512_mask_cvts2ph_hf8): Rename to
> > _mm512_mask_cvts_2ph_hf8.
> > (_mm512_maskz_cvts2ph_hf8): Rename to
> _mm512_maskz_cvts_2ph_hf8.
> > (_mm512_cvtsph_bf8): Rename to _mm512_cvts_ph_bf8.
> > (_mm512_mask_cvtsph_bf8): Rename to
> _mm512_mask_cvts_ph_bf8.
> > (_mm512_maskz_cvtsph_bf8): Rename to
> _mm512_maskz_cvts_ph_bf8.
> > (_mm512_cvtsph_hf8): Rename to _mm512_cvts_ph_hf8.
> > (_mm512_mask_cvtsph_hf8): Rename to
> _mm512_mask_cvts_ph_hf8.
> > (_mm512_maskz_cvtsph_hf8): Rename to
> _mm512_maskz_cvts_ph_hf8.
> > * config/i386/avx10_2convertintrin.h
> > (_mm_cvtbiassph_bf8): Rename to _mm_cvts_biasph_bf8.
> > (_mm_mask_cvtbiassph_bf8): Rename to
> _mm_mask_cvts_biasph_bf8.
> > (_mm_maskz_cvtbiassph_bf8): Rename to
> _mm_maskz_cvts_biasph_bf8.
> > (_mm256_cvtbiassph_bf8): Rename to _mm256_cvts_biasph_bf8.
> > (_mm256_mask_cvtbiassph_bf8): Rename to
> _mm256_mask_cvts_biasph_bf8.
> > (_mm256_maskz_cvtbiassph_bf8): Rename to
> > _mm256_maskz_cvts_biasph_bf8.
> > (_mm_cvtbiassph_hf8): Rename to _mm_cvts_biasph_hf8.
> > (_mm_mask_cvtbiassph_hf8): Rename to
> _mm_mask_cvts_biasph_hf8.
> > (_mm_maskz_cvtbiassph_hf8): Rename to
> _mm_maskz_cvts_biasph_hf8.
> > (_mm256_cvtbiassph_hf8): Rename to _mm256_cvts_biasph_hf8.
> > (_mm256_mask_cvtbiassph_hf8): Rename to
> _mm256_mask_cvts_biasph_hf8.
> > (_mm256_maskz_cvtbiassph_hf8): Rename to
> > _mm256_maskz_cvts_biasph_hf8.
> > (_mm_cvts2ph_bf8): Rename to _mm_cvts_2ph_bf8.
> > (_mm_mask_cvts2ph_bf8): Rename to _mm_mask_cvts_2ph_bf8.
> > (_mm_maskz_cvts2ph_bf8): Rename to _mm_maskz_cvts_2ph_bf8.
> > (_mm256_cvts2ph_bf8): Rename to _mm256_cvts_2ph_bf8.
> > (_mm256_mask_cvts2ph_bf8): Rename to
> > _mm256_mask_cvts_2ph_bf8.
> > (_mm256_maskz_cvts2ph_bf8): Rename to
> _mm256_maskz_cvts_2ph_bf8.
> > (_mm_cvts2ph_hf8): Rename to _mm_cvts_2ph_hf8.
> > (_mm_mask_cvts2ph_hf8): Rename to _mm_mask_cvts_2ph_hf8.
> > (_mm_maskz_cvts2ph_hf8): Rename to _mm_maskz_cvts_2ph_hf8.
> > (_mm256_cvts2ph_hf8): Rename to _mm256_cvts_2ph_hf8.
> > (_mm256_mask_cvts2ph_hf8): Rename to
> > _mm256_mask_cvts_2ph_hf8.
> > (_mm256_maskz_cvts2ph_hf8): Rename to
> _mm256_maskz_cvts_2ph_hf8.
> > (_mm_cvtsph_bf8): Rename to _mm_cvts_ph_bf8.
> > (_mm_mask_cvtsph_bf8): Rename to _mm_mask_cvts_ph_bf8.
> > (_mm_maskz_cvtsph_bf8): Rename to _mm_maskz_cvts_ph_bf8.
> > (_mm256_cvtsph_bf8): Rename to _mm256_cvts_ph_bf8.
> > (_mm256_mask_cvtsph_bf8): Rename to
> _mm256_mask_cvts_ph_bf8.
> > (_mm256_maskz_cvtsph_bf8): Rename to
> _mm256_maskz_cvts_ph_bf8.
> > (_mm_cvtsph_hf8): Rename to _mm_cvts_ph_hf8.
> > (_m

RE: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Liu, Hongtao




> -Original Message-
> From: Jan Hubicka 
> Sent: Friday, April 25, 2025 12:27 AM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> Subject: Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.
> 
> > Since ix86_expand_sse_movcc will simplify them into a simple vmov,
> > vpand or vpandn.
> > Current register_operand/vector_operand could lose some optimization
> > opportunity.
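
To illustrate why a constant 0 or all-ones operand is worth accepting, here
is a scalar sketch of a per-lane blend and the forms it collapses to. The
helper names are made up for this example and are not taken from
ix86_expand_sse_movcc; the mask is assumed to be all-ones or all-zeros per
lane, as a vector compare would produce.

```c
#include <stdint.h>

/* Generic per-lane blend: take IF_TRUE where MASK bits are set,
   IF_FALSE elsewhere.  */
static uint32_t
select_lane (uint32_t mask, uint32_t if_true, uint32_t if_false)
{
  return (mask & if_true) | (~mask & if_false);
}

/* With IF_FALSE == 0 the blend is a plain AND (vpand).  */
static uint32_t
select_or_zero (uint32_t mask, uint32_t if_true)
{
  return mask & if_true;
}

/* With IF_TRUE == 0 the blend is an AND-NOT (vpandn).  */
static uint32_t
zero_or_select (uint32_t mask, uint32_t if_false)
{
  return ~mask & if_false;
}
```

When IF_TRUE is all-ones and IF_FALSE is 0, the blend is just the mask
itself, which corresponds to the plain vmov case.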
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/predicates.md (vector_or_0_or_1s_operand): New
> predicate.
> > (nonimm_or_0_or_1s_operand): Ditto.
> > * config/i386/sse.md (vcond_mask_):
> > Extend the predicate of operands1 to accept 0 or allones
> > operands.
> > (vcond_mask_): Ditto.
> > (vcond_mask_v1tiv1ti): Ditto.
> > (vcond_mask_): Ditto.
> > * config/i386/i386.md (movcc): Ditto for operands[2] and
> > operands[3].
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/blendv-to-maxmin.c: New test.
> > * gcc.target/i386/blendv-to-pand.c: New test.
> 
> > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > new file mode 100644
> > index 000..042eb7d8f24
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > +
> > +double
> > +foo (double a)
> > +{
> > +  if (a > 0.0)
> > +return a;
> > +  return 0.0;
> > +}
> 
> With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> -ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are explicitly
> documented as unspecified when one of the parameters is NaN.
> 
> So without -ffast-math at combine time we see:
> (insn 6 3 7 2 (set (reg:DF 103)
> (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
>  (nil))
> (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> (unspec:DF [
> (reg:DF 104 [ a ])
> (reg:DF 103)
> ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
>  (expr_list:REG_DEAD (reg:DF 104 [ a ])
> (expr_list:REG_DEAD (reg:DF 103)
> (nil
> 
> maxss is defined as:
> 
> MAX(SRC1, SRC2)
> {
> IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> ELSE DEST := SRC2;
> FI;
> }
> 
> which I think translates to
>   SRC1 > SRC2 ? SRC1 : SRC2
Yes, for minss/maxss
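
A scalar C model of the maxss/maxsd pseudocode quoted above makes the corner
cases easy to check. This is a sketch only; x86_max is a made-up name, and it
assumes the code is compiled without -ffast-math so that NaN and signed-zero
comparisons keep their IEEE meaning.

```c
#include <math.h>

/* Scalar model of the quoted pseudocode: every comparison involving NaN
   is false, and +0.0 > -0.0 is false, so all the special cases fall
   through to SRC2.  */
static double
x86_max (double src1, double src2)
{
  return src1 > src2 ? src1 : src2;
}
```

Note the asymmetry: x86_max (NAN, 1.0) is 1.0 but x86_max (1.0, NAN) is NaN,
and x86_max (0.0, -0.0) is -0.0, which is why a commutative MAX_EXPR or SMAX
can not describe it.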
> 
> If SRC1 and SRC2 are both 0, this should evaluate to false and return SRC2;
> if one of them is NaN, this should evaluate to false and return SRC2.
> 
> So it seems to handle the side cases correctly and has a direct RTL
> equivalent.  So why do we need UNSPEC_IEEE_MAX at all? Expressing this in
> RTL directly would enable RTL passes to do a better job.
> Similarly for BLENDV...
Note that blendv checks the most significant bit of the mask, so it is not a
simple
 if_then_else
  mask
  if_true
  if_false

It should be
if_then_else
   ashiftrt mask 31
   if_true
   if_false
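
The difference can be shown with a small scalar sketch for one 32-bit lane.
The two helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* What a plain if_then_else on the mask would mean: any non-zero
   mask lane selects IF_TRUE.  */
static uint32_t
select_nonzero (uint32_t mask, uint32_t if_true, uint32_t if_false)
{
  return mask != 0 ? if_true : if_false;
}

/* What blendv does per 32-bit lane: only the most significant bit
   of the mask lane is inspected.  */
static uint32_t
select_msb (uint32_t mask, uint32_t if_true, uint32_t if_false)
{
  return (mask >> 31) ? if_true : if_false;
}
```

The two agree for all-ones and all-zeros lanes, but differ for a lane such
as 0x00000001, where only select_nonzero picks IF_TRUE.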

Maybe that is not very useful in practice, which is also why there's UNSPEC_FMADDSUB:

;; It would be possible to represent these without the UNSPEC as
;;
;; (vec_merge
;;   (fma op1 op2 op3)
;;   (fma op1 op2 (neg op3))
;;   (merge-const))
;;
;; But this doesn't seem useful in practice.

(define_expand "vec_fmaddsub4"
  [(set (match_operand:VFH 0 "register_operand")
        (unspec:VFH
          [(match_operand:VFH 1 "nonimmediate_operand")
           (match_operand:VFH 2 "nonimmediate_operand")
           (match_operand:VFH 3 "nonimmediate_operand")]
          UNSPEC_FMADDSUB))]
  "TARGET_FMA || TARGET_FMA4 || ( == 64 || TARGET_AVX512VL)")

> 
> Honza

RE: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, June 14, 2023 10:30 PM
> To: Andrew Stubbs 
> Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com; Jan Hubicka
> ; Liu, Hongtao ;
> kirill.yuk...@gmail.com
> Subject: Re: [PATCH 3/3] AVX512 fully masked vectorization
> 
> 
> 
> > On 14.06.2023 at 16:27, Andrew Stubbs wrote:
> >
> > On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
> >> This implements fully masked vectorization or a masked epilog for
> >> AVX512 style masks which single themselves out by representing each
> >> lane with a single bit and by using integer modes for the mask (both
> >> are much like GCN).
> >> AVX512 is also special in that it doesn't have any instruction to
> >> compute the mask from a scalar IV like SVE has with while_ult.
> >> Instead the masks are produced by vector compares and the loop
> >> control retains the scalar IV (mainly to avoid dependences on mask
> >> generation, a suitable mask test instruction is available).
> >
> > This also sounds like GCN. We currently use WHILE_ULT in the middle end
> which expands to a vector compare against a vector of stepped values. This
> requires an additional instruction to prepare the comparison vector
> (compared to SVE), but the "while_ultv64sidi" pattern (for example) returns
> the DImode bitmask, so it works reasonably well.
> >
> >> Like RVV code generation prefers a decrementing IV though IVOPTs
> >> messes things up in some cases removing that IV to eliminate it with
> >> an incrementing one used for address generation.
> >> One of the motivating testcases is from PR108410 which in turn is
> >> extracted from x264 where large size vectorization shows issues with
> >> small trip loops.  Execution time there improves compared to classic
> >> AVX512 with AVX2 epilogues for the cases of less than 32 iterations.
> >> size  scalar     128     256     512    512e    512f
> >>    1    9.42   11.32    9.35   11.17   15.13   16.89
> >>    2    5.72    6.53    6.66    6.66    7.62    8.56
> >>    3    4.49    5.10    5.10    5.74    5.08    5.73
> >>    4    4.10    4.33    4.29    5.21    3.79    4.25
> >>    6    3.78    3.85    3.86    4.76    2.54    2.85
> >>    8    3.64    1.89    3.76    4.50    1.92    2.16
> >>   12    3.56    2.21    3.75    4.26    1.26    1.42
> >>   16    3.36    0.83    1.06    4.16    0.95    1.07
> >>   20    3.39    1.42    1.33    4.07    0.75    0.85
> >>   24    3.23    0.66    1.72    4.22    0.62    0.70
> >>   28    3.18    1.09    2.04    4.20    0.54    0.61
> >>   32    3.16    0.47    0.41    0.41    0.47    0.53
> >>   34    3.16    0.67    0.61    0.56    0.44    0.50
> >>   38    3.19    0.95    0.95    0.82    0.40    0.45
> >>   42    3.09    0.58    1.21    1.13    0.36    0.40
> >> 'size' specifies the number of actual iterations, 512e is for a
> >> masked epilog and 512f for the fully masked loop.  From
> >> 4 scalar iterations on, the AVX512 masked epilog code is clearly the
> >> winner; the fully masked variant is clearly worse and its size
> >> benefit is also tiny.
> >
> > Let me check I understand correctly. In the fully masked case, there is a
> single loop in which a new mask is generated at the start of each iteration.
> In the masked epilogue case, the main loop uses no masking whatsoever, thus
> avoiding the need for generating a mask, carrying the mask, inserting
> vec_merge operations, etc, and then the epilogue looks much like the fully
> masked case, but unlike smaller mode epilogues there is no loop because the
> epilogue vector size is the same. Is that right?
> 
> Yes.
What about the vectorizer and unrolling? When the vector size is the same and
the unroll factor is N, there are at most N - 1 iterations for the epilogue
loop; will there still be a loop?
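
For readers following along, the masked-epilogue scheme confirmed above can
be modeled in scalar C. This is only a sketch under stated assumptions: 16
lanes of 32 bits, no unrolling, and the inner loops stand in for single
vector instructions. LANES, lane_mask and add_one_masked_epilogue are names
invented for this example; they do not come from the patch.

```c
#include <stdint.h>

#define LANES 16  /* assumed: 16 x 32-bit lanes in a 512-bit vector */

/* One bit per lane: lane I is active when IV + I is still below N.
   This models the vector compare against the scalar IV.  */
static uint16_t
lane_mask (uint64_t iv, uint64_t n)
{
  uint16_t mask = 0;
  for (int i = 0; i < LANES; i++)
    if (iv + i < n)
      mask |= (uint16_t) (1u << i);
  return mask;
}

/* Masked epilogue: the main loop runs unmasked on full vectors, and a
   single masked step (no loop) handles the remaining elements.  */
static void
add_one_masked_epilogue (int32_t *a, uint64_t n)
{
  uint64_t iv = 0;
  for (; iv + LANES <= n; iv += LANES)
    for (int i = 0; i < LANES; i++)
      a[iv + i] += 1;

  uint16_t mask = lane_mask (iv, n);
  for (int i = 0; i < LANES; i++)
    if (mask & (1u << i))
      a[iv + i] += 1;
}
```

In a fully masked loop, lane_mask would instead be evaluated at the start of
every iteration and applied to every vector operation in the body.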
> > This scheme seems like it might also benefit GCN, in so much as it
> simplifies the hot code path.
> >
> > GCN does not actually have smaller vector sizes, so there's no analogue to
> AVX2 (we pretend we have some smaller sizes, but that's because the
> middle end can't do masking everywhere yet, and it helps make some vector
> constants smaller, perhaps).
> >
> >> This patch does not enable using fully masked loops or masked
> >> epilogues by default.  More work on cost modeling and vectorization
> >> kind selection on x86_64 is necessary for this.
> >> Implementation wise this introduces

RE: [PATCH] Initial Granite Rapids D Support

2023-07-11 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Mo, Zewei 
> Sent: Wednesday, July 12, 2023 1:56 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] Initial Granite Rapids D Support
> 
> Hi all,
> 
> This patch is to add initial support for Granite Rapids D for GCC.
> 
> The link of related information is listed below:
> https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
> 
> Also, the patch of removing AMX-COMPLEX from Granite Rapids will be
> backported to GCC13.
> 
> This has been tested on x86_64-pc-linux-gnu. Is this ok for trunk? Thank you.
Ok.
> 
> Sincerely,
> Zewei Mo
> 
> gcc/ChangeLog:
> 
>   * common/config/i386/cpuinfo.h
>   (get_intel_cpu): Handle Granite Rapids D.
>   * common/config/i386/i386-common.cc:
>   (processor_alias_table): Add graniterapids-d.
>   * common/config/i386/i386-cpuinfo.h
>   (enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
>   * config.gcc: Add -march=graniterapids-d.
>   * config/i386/driver-i386.cc (host_detect_local_cpu):
>   Handle graniterapids-d.
>   * gcc/config/i386/i386.h: (PTA_GRANITERAPIDS_D): New.
>   * doc/extend.texi: Add graniterapids-d.
>   * doc/invoke.texi: Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/mv16.C: Add graniterapids-d.
>   * gcc.target/i386/funcspec-56.inc: Handle new march.
> ---
>  gcc/common/config/i386/cpuinfo.h  |  9 -
>  gcc/common/config/i386/i386-common.cc |  2 ++
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/config.gcc|  2 +-
>  gcc/config/i386/driver-i386.cc|  3 +++
>  gcc/config/i386/i386.h|  4 +++-
>  gcc/doc/extend.texi   |  3 +++
>  gcc/doc/invoke.texi   | 11 +++
>  gcc/testsuite/g++.target/i386/mv16.C  |  6 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  1 +
>  10 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/common/config/i386/cpuinfo.h
> b/gcc/common/config/i386/cpuinfo.h
> index ae48bc17771..7c2565c1d93 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -565,7 +565,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_type = INTEL_SIERRAFOREST;
>break;
>  case 0xad:
> -case 0xae:
>/* Granite Rapids.  */
>cpu = "graniterapids";
>CHECK___builtin_cpu_is ("corei7");
> @@ -573,6 +572,14 @@ get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_type = INTEL_COREI7;
>cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
>break;
> +case 0xae:
> +  /* Granite Rapids D.  */
> +  cpu = "graniterapids-d";
> +  CHECK___builtin_cpu_is ("corei7");
> +  CHECK___builtin_cpu_is ("graniterapids-d");
> +  cpu_model->__cpu_type = INTEL_COREI7;
> +  cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS_D;
> +  break;
>  case 0xb6:
>/* Grand Ridge.  */
>cpu = "grandridge";
> diff --git a/gcc/common/config/i386/i386-common.cc
> b/gcc/common/config/i386/i386-common.cc
> index bf126f14073..8cea3669239 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -2094,6 +2094,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL,
> PTA_GRANITERAPIDS,
>  M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
> +  {"graniterapids-d", PROCESSOR_GRANITERAPIDS, CPU_HASWELL,
> PTA_GRANITERAPIDS_D,
> +M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
> P_PROC_AVX512F},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
>  M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
>{"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index 2dafbb25a49..254dfec70e5 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -98,6 +98,7 @@ enum processor_subtypes
>ZHAOXIN_FAM7H_LUJIAZUI,
>AMDFAM19H_ZNVER4,
>INTEL_COREI7_GRANITERAPIDS,
> +  INTEL_COREI7_GRANITERAPIDS_D,
>CPU_SUBTYPE_MAX
>  };
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d88071773c9..1446eb2b3ca 100644
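
The effect of the cpuinfo.h hunk quoted above is that model 0xae no longer
shares the Granite Rapids case. A stand-alone sketch of that dispatch
follows; granite_rapids_name is a hypothetical helper written for
illustration, and it covers only the two model numbers visible in the hunk.

```c
#include <stddef.h>

/* Map a CPUID model number to the -march name, covering only the two
   models touched by the quoted hunk.  Returns NULL for anything else.  */
static const char *
granite_rapids_name (unsigned int model)
{
  switch (model)
    {
    case 0xad:
      return "graniterapids";
    case 0xae:
      return "graniterapids-d";
    default:
      return NULL;
    }
}
```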

RE: [PATCH] Replace invariant ternlog operands

2023-07-26 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Yan Simonaytes 
> Sent: Wednesday, July 26, 2023 2:11 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Uros Bizjak ;
> Yan Simonaytes 
> Subject: [PATCH] Replace invariant ternlog operands
> 
> Sometimes GCC generates ternlog with three operands, but some of them are
> invariant.
> For example:
> 
> vpternlogq $252, %zmm2, %zmm1, %zmm0
> 
> In this case the zmm2 register isn't used by ternlog.
> So we should replace zmm2 with zmm0 or zmm1:
> 
> vpternlogq $252, %zmm0, %zmm1, %zmm0
> 
> When the third operand of ternlog is memory and both others are invariant,
> we should add a load instruction from this memory to a register and replace
> the first and the second operands with this register.
> So instead of
> 
> vpternlogq $85, (%rdi), %zmm1, %zmm0
> 
> we should emit
> 
> vmovdqa64 (%rdi), %zmm0
> vpternlogq $85, %zmm0, %zmm0, %zmm0
> 
> gcc/ChangeLog:
> 
> * config/i386/i386.cc (ternlog_invariant_operand_mask): New helper
>   function for replacing invariant operands.
> (reduce_ternlog_operands): Likewise.
> * config/i386/i386-protos.h (ternlog_invariant_operand_mask):
> Prototype here.
> (reduce_ternlog_operands): Likewise.
> * config/i386/sse.md:
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/reduce-ternlog-operands-1.c: New test.
> * gcc.target/i386/reduce-ternlog-operands-2.c: New test.
> ---
>  gcc/config/i386/i386-protos.h |  2 +
>  gcc/config/i386/i386.cc   | 45 +++
>  gcc/config/i386/sse.md| 43 ++
>  .../i386/reduce-ternlog-operands-1.c  | 20 +
>  .../i386/reduce-ternlog-operands-2.c  | 11 +
>  5 files changed, 121 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/reduce-ternlog-operands-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/reduce-ternlog-operands-2.c
> 
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 27fe73ca65c..49398ef9936 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -57,6 +57,8 @@ extern int standard_80387_constant_p (rtx);
>  extern const char *standard_80387_constant_opcode (rtx);
>  extern rtx standard_80387_constant_rtx (int);
>  extern int standard_sse_constant_p (rtx, machine_mode);
> +extern int ternlog_invariant_operand_mask (rtx *operands);
> +extern void reduce_ternlog_operands (rtx *operands);
>  extern const char *standard_sse_constant_opcode (rtx_insn *, rtx *);
>  extern bool ix86_standard_x87sse_constant_load_p (const rtx_insn *, rtx);
>  extern bool ix86_pre_reload_split (void);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f0d6167e667..140de478571 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -5070,6 +5070,51 @@ ix86_check_no_addr_space (rtx insn)
>  }
>return true;
>  }
> +
> +/* Return mask of invariant operands:
> +   bit number 0 1 2
> +   operand number 1 2 3.  */
> +
> +int
> +ternlog_invariant_operand_mask (rtx *operands) {
> +  int mask = 0;
> +  int imm8 = XINT (operands[4], 0);
> +
> +  if (((imm8 >> 4) & 0xF) == (imm8 & 0xF))
> +mask |= 1;
> +  if (((imm8 >> 2) & 0x33) == (imm8 & 0x33))
> +mask |= (1 << 1);
> +  if (((imm8 >> 1) & 0x55) == (imm8 & 0x55))
> +mask |= (1 << 2);
> +
> +  return mask;
> +}
> +
> +/* Replace one of the unused operators with the one used.  */
> +
> +void
> +reduce_ternlog_operands (rtx *operands) {
> +  int mask = ternlog_invariant_operand_mask (operands);
> +
> +  if (mask & 1) /* the first operand is invariant.  */
> +operands[1] = operands[2];
> +
> +  if (mask & 2) /* the second operand is invariant.  */
> +operands[2] = operands[1];
> +
> +  if (mask & 4)  /* the third operand is invariant.  */
> +   operands[3] = operands[1];
> +  else if (!MEM_P (operands[3]))
> +{
> +  if (mask & 1) /* the first operand is invariant.  */
> + operands[1] = operands[3];
> +  if (mask & 2) /* the second operands is invariant.  */
> + operands[2] = operands[3];
> +}
> +}
> +
> 
> 
> 
>  /* Initialize the table of extra 80387 mathematical constants.  */
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index a2099373123..f88d82b315c 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12625,6 +12625,49 @@
> (symbol_ref " == 64 || TARGET_AVX512VL")
>
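
The invariance test in the quoted ternlog_invariant_operand_mask can be
tried outside GCC. The sketch below takes the imm8 directly instead of an
rtx operand array, and adds a brute-force cross-check over the truth table;
both function names are made up for this example.

```c
#include <stdint.h>

/* Bit I of IMM8 is the result for inputs (a, b, c) with I = a*4 + b*2 + c,
   where a, b, c are bits of operands 1, 2 and 3.  An operand is invariant
   when flipping its bit never changes the result.
   Returned mask: bit 0 = operand 1, bit 1 = operand 2, bit 2 = operand 3.  */
static int
ternlog_invariant_mask (uint8_t imm8)
{
  int mask = 0;
  if (((imm8 >> 4) & 0x0F) == (imm8 & 0x0F))
    mask |= 1;
  if (((imm8 >> 2) & 0x33) == (imm8 & 0x33))
    mask |= 2;
  if (((imm8 >> 1) & 0x55) == (imm8 & 0x55))
    mask |= 4;
  return mask;
}

/* Brute-force cross-check: evaluate the truth table directly.  */
static int
ternlog_invariant_mask_slow (uint8_t imm8)
{
  int mask = 0;
  for (int op = 0; op < 3; op++)
    {
      int flip = 4 >> op;  /* index bit that belongs to operand op + 1 */
      int invariant = 1;
      for (int idx = 0; idx < 8; idx++)
        if (((imm8 >> idx) & 1) != ((imm8 >> (idx ^ flip)) & 1))
          invariant = 0;
      if (invariant)
        mask |= 1 << op;
    }
  return mask;
}
```

For the two immediates from the patch description, 252 gives mask 4 (only
the third operand is unused) and 85 gives mask 3 (the first and second
operands are unused).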

RE: [PATCH] x86: fold two of vec_dupv2df's alternatives

2023-07-31 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, August 1, 2023 1:49 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Kirill Yukhin
> 
> Subject: [PATCH] x86: fold two of vec_dupv2df's alternatives
> 
> By using Yvm in the source, both can be expressed in one.
> 
> gcc/
> 
>   * sse.md (vec_dupv2df): Fold the middle two of the
>   alternatives.
Ok, thanks.
> 
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -13784,21 +13784,20 @@
> (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
> 
>  (define_insn "vec_dupv2df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,v")
> +  [(set (match_operand:V2DF 0 "register_operand" "=x,v,v")
>   (vec_duplicate:V2DF
> -   (match_operand:DF 1 "nonimmediate_operand" "0,xm,vm,vm")))]
> +   (match_operand:DF 1 "nonimmediate_operand" "0,Yvm,vm")))]
>"TARGET_SSE2"
>"@
> unpcklpd\t%0, %0
> %vmovddup\t{%1, %0|%0, %1}
> -   vmovddup\t{%1, %0|%0, %1}
> vbroadcastsd\t{%1, }%g0{|, %1}"
> -  [(set_attr "isa" "noavx,sse3,avx512vl,*")
> -   (set_attr "type" "sselog1,ssemov,ssemov,ssemov")
> -   (set_attr "prefix" "orig,maybe_vex,evex,evex")
> -   (set_attr "mode" "V2DF,DF,DF,V8DF")
> +  [(set_attr "isa" "noavx,sse3,*")
> +   (set_attr "type" "sselog1,ssemov,ssemov")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "mode" "V2DF,DF,V8DF")
> (set (attr "enabled")
> - (cond [(eq_attr "alternative" "3")
> + (cond [(eq_attr "alternative" "2")
>(symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
> && !TARGET_PREFER_AVX256")
>  (match_test "")

RE: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Uros Bizjak 
> Sent: Wednesday, August 9, 2023 2:33 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy
> Bridge.
> 
> On Wed, Aug 9, 2023 at 3:48 AM liuhongt  wrote:
> >
> > > Please rather do it in a more self-descriptive way, as proposed in
> > > the attached patch. You won't need a comment then.
> > >
> >
> > Adjusted in V2 patch.
> >
> > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is supported
> > via EAX.
> >
> > > Intel documentation says invalid subleaves return 0. We had been
> > > relying on that behavior instead of checking the max subleaf number.
> >
> > It appears that some Sandy Bridge CPUs return at least the subleaf 0
> > EDX value for subleaf 1. Best guess is that this is a bug in a
> > microcode patch since all of the bits we're seeing set in EDX were
> > introduced after Sandy Bridge was originally released.
> >
> > This is causing avxvnniint16 to be incorrectly enabled with
> > -march=native on these CPUs.
> >
> > gcc/ChangeLog:
> >
> > * common/config/i386/cpuinfo.h (get_available_features): Check
> > EAX for valid subleaf before use CPUID.
> > ---
> >  gcc/common/config/i386/cpuinfo.h | 82
> > +---
> >  1 file changed, 43 insertions(+), 39 deletions(-)
> >
> > diff --git a/gcc/common/config/i386/cpuinfo.h
> > b/gcc/common/config/i386/cpuinfo.h
> > index 30ef0d334ca..9fa4dec2a7e 100644
> > --- a/gcc/common/config/i386/cpuinfo.h
> > +++ b/gcc/common/config/i386/cpuinfo.h
> > @@ -663,6 +663,7 @@ get_available_features (struct __processor_model
> *cpu_model,
> >unsigned int max_cpuid_level = cpu_model2->__cpu_max_level;
> >unsigned int eax, ebx;
> >unsigned int ext_level;
> > +  unsigned int subleaf_level;
> 
> Oh, I missed this in my previous review. This variable should be named
> max_subleaf_level, as it represents the maximum supported ECX value.
I've committed the previous patch, but haven't backported it yet.
I guess I can just commit another patch to change the name?
For the backport, I'll merge both changes into a single commit.
> 
> Uros.
> 
> >
> >/* Get XCR_XFEATURE_ENABLED_MASK register with xgetbv.  */
> >  #define XCR_XFEATURE_ENABLED_MASK  0x0
> > @@ -762,7 +763,7 @@ get_available_features (struct __processor_model
> *cpu_model,
> >/* Get Advanced Features at level 7 (eax = 7, ecx = 0/1). */
> >if (max_cpuid_level >= 7)
> >  {
> > -  __cpuid_count (7, 0, eax, ebx, ecx, edx);
> > +  __cpuid_count (7, 0, subleaf_level, ebx, ecx, edx);
> >if (ebx & bit_BMI)
> > set_feature (FEATURE_BMI);
> >if (ebx & bit_SGX)
> > @@ -874,45 +875,48 @@ get_available_features (struct
> __processor_model *cpu_model,
> > set_feature (FEATURE_AVX512FP16);
> > }
> >
> > -  __cpuid_count (7, 1, eax, ebx, ecx, edx);
> > -  if (eax & bit_HRESET)
> > -   set_feature (FEATURE_HRESET);
> > -  if (eax & bit_CMPCCXADD)
> > -   set_feature(FEATURE_CMPCCXADD);
> > -  if (edx & bit_PREFETCHI)
> > -   set_feature (FEATURE_PREFETCHI);
> > -  if (eax & bit_RAOINT)
> > -   set_feature (FEATURE_RAOINT);
> > -  if (avx_usable)
> > -   {
> > - if (eax & bit_AVXVNNI)
> > -   set_feature (FEATURE_AVXVNNI);
> > - if (eax & bit_AVXIFMA)
> > -   set_feature (FEATURE_AVXIFMA);
> > - if (edx & bit_AVXVNNIINT8)
> > -   set_feature (FEATURE_AVXVNNIINT8);
> > - if (edx & bit_AVXNECONVERT)
> > -   set_feature (FEATURE_AVXNECONVERT);
> > - if (edx & bit_AVXVNNIINT16)
> > -   set_feature (FEATURE_AVXVNNIINT16);
> > - if (eax & bit_SM3)
> > -   set_feature (FEATURE_SM3);
> > - if (eax & bit_SHA512)
> > -   set_feature (FEATURE_SHA512);
> > - if (eax & bit_SM4)
> > -   set_feature (FEATURE_SM4);
> > -   }
> > -  if (avx512_usable)
> > -   {
> > - if (eax & bit_AVX512BF16)
> > -   set_feature (FEATURE_AVX512BF16);
> > -   }
> > -  if (amx_usable)
> > +  if (subleaf_level >= 1)
> > {
> > - if (eax & bit_AMX_FP16)
> > -   set_feature (FEATURE_AM
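
The guard described in the patch (query leaf 7 subleaf 1 only when subleaf 0
reports it in EAX) can be sketched in isolation. This is a minimal
illustration, not the code in cpuinfo.h: the callback type, both simulated
CPUs and the DEMO_ bit name are hypothetical and exist only so the logic can
be exercised without real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Register values returned by one CPUID query.  */
struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical callback standing in for __cpuid_count.  */
typedef struct cpuid_regs (*cpuid_fn) (uint32_t leaf, uint32_t subleaf);

/* Illustrative bit only; the real bit_AVXVNNIINT16 is defined in cpuid.h.  */
#define DEMO_BIT_AVXVNNIINT16 (1u << 10)

/* Return 1 if leaf 7 subleaf 1 reports the demo feature.  Subleaf 1 is
   queried only when subleaf 0's EAX says that it exists.  */
int
has_avxvnniint16 (cpuid_fn cpuid, uint32_t max_cpuid_level)
{
  assert (cpuid != 0);
  if (max_cpuid_level < 7)
    return 0;

  struct cpuid_regs r = cpuid (7, 0);
  uint32_t max_subleaf_level = r.eax;   /* maximum supported ECX value */
  if (max_subleaf_level < 1)
    return 0;

  r = cpuid (7, 1);
  return (r.edx & DEMO_BIT_AVXVNNIINT16) != 0;
}

/* Simulated misbehaving CPU: only subleaf 0 is valid, yet the same EDX
   value comes back for every subleaf of leaf 7.  */
struct cpuid_regs
buggy_cpuid (uint32_t leaf, uint32_t subleaf)
{
  struct cpuid_regs r = { 0, 0, 0, 0 };
  (void) subleaf;
  if (leaf == 7)
    r.edx = DEMO_BIT_AVXVNNIINT16;
  return r;
}

/* Simulated well-behaved CPU that really implements subleaf 1.  */
struct cpuid_regs
good_cpuid (uint32_t leaf, uint32_t subleaf)
{
  struct cpuid_regs r = { 0, 0, 0, 0 };
  if (leaf == 7 && subleaf == 0)
    r.eax = 1;
  else if (leaf == 7 && subleaf == 1)
    r.edx = DEMO_BIT_AVXVNNIINT16;
  return r;
}
```

With the guard in place, has_avxvnniint16 (buggy_cpuid, 7) returns 0 even
though the stray EDX bit is set, while has_avxvnniint16 (good_cpuid, 7)
returns 1.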

RE: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Xi Ruoyao 
> Sent: Thursday, August 10, 2023 9:48 AM
> To: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; ubiz...@gmail.com; hubi...@ucw.cz
> Subject: Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable
> vectorization for all gather/scatter instructions.
> 
> On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch supports 2 standardized options to enable/disable
> > vectorization for all gather/scatter instructions. The options are
> > translated by the driver into the 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
> 
> And should we set -mno-gather as the default for GDS affected processors?
> We'll likely apply the ucode update for them, and then the gathering
> instructions will be much slower.
I assume you're talking about
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
Yes, there will be a separate patch for microarchitecture tuning.
> 
> > gcc/ChangeLog:
> >
> > * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> > GATHER_SCATTER_DRIVER_SELF_SPECS.
> > (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> > * config/i386/i386.opt (mgather): New option.
> > (mscatter): Ditto.
> > ---
> >  gcc/config/i386/i386.h   | 12 +++-
> >  gcc/config/i386/i386.opt |  8 
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index
> > ef342fcee9b..d9ac2c29bde 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
> >  # define SUBTARGET_DRIVER_SELF_SPECS ""
> >  #endif
> >
> > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS # define
> > +GATHER_SCATTER_DRIVER_SELF_SPECS \
> > +  "%{mno-gather:-mtune-
> > ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> > +   %{mgather:-mtune-
> > ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> > +   %{mno-scatter:-mtune-
> > ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> > +   %{mscatter:-mtune-
> > ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> > +#endif
> > +
> > +#define DRIVER_SELF_SPECS \
> > +  SUBTARGET_DRIVER_SELF_SPECS " " \
> > +  GATHER_SCATTER_DRIVER_SELF_SPECS
> >
> >  /* -march=native handling only makes sense with compiler running on
> >     an x86 or x86_64 chip.  If changing this condition, also change
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index
> > ddb7f110aa2..99948644a8d 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -424,6 +424,14 @@ mdaz-ftz
> >  Target
> >  Set the FTZ and DAZ Flags.
> >
> > +mgather
> > +Target
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target
> > +Enable vectorization for scatter instruction.
> > +
> >  mpreferred-stack-boundary=
> >  Target RejectNegative Joined UInteger
> > Var(ix86_preferred_stack_boundary_arg)
> >  Attempt to keep stack aligned to this power of 2.
> 
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University
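
The option-to-tune mapping in GATHER_SCATTER_DRIVER_SELF_SPECS can be
restated as a small lookup table. This is only a sketch of the mapping quoted
in the patch; the function name is hypothetical and nothing here is the
driver's actual spec machinery.

```c
#include <assert.h>
#include <string.h>

/* Return the -mtune-ctrl= argument that the quoted self spec associates
   with OPT, or a null pointer if OPT is not one of the four options.  */
const char *
gather_scatter_tune_ctrl (const char *opt)
{
  static const struct { const char *opt; const char *tune; } map[] = {
    { "-mno-gather",  "^use_gather_2parts,^use_gather_4parts,^use_gather" },
    { "-mgather",     "use_gather_2parts,use_gather_4parts,use_gather" },
    { "-mno-scatter", "^use_scatter_2parts,^use_scatter_4parts,^use_scatter" },
    { "-mscatter",    "use_scatter_2parts,use_scatter_4parts,use_scatter" },
  };

  assert (opt != 0);
  for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
    if (strcmp (opt, map[i].opt) == 0)
      return map[i].tune;
  return 0;
}
```

For example, gather_scatter_tune_ctrl ("-mno-gather") yields the three
negated gather tunes, so a single option stands in for the whole -mtune-ctrl=
list.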

RE: [PATCH v2] x86: correct and improve "*vec_dupv2di"

2023-06-18 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Friday, June 16, 2023 2:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Kirill Yukhin
> 
> Subject: [PATCH v2] x86: correct and improve "*vec_dupv2di"
> 
> The input constraint for the %vmovddup alternative was wrong, as the upper
> 16 XMM registers require AVX512VL to be used with this insn. To
> compensate, introduce a new alternative permitting all 32 registers, by
> broadcasting to the full 512 bits in that case if AVX512VL is not available.
> 
> gcc/
> 
>   * config/i386/sse.md (vec_dupv2di): Correct %vmovddup input
>   constraint. Add new AVX512F alternative.
Could you add a testcase for that?
Ok with the testcase.
> ---
> Strictly speaking the new alternative could be enabled from AVX2 onwards,
> but vmovddup can frequently be a shorter encoding (VEX2 vs VEX3).
> 
> It was suggested that the previously flawed %vmovddup alternative could
> use "xm" as source constraint. But then its destination would better also use
> "x", I think?
> ---
> v2: Use "* return ..." form. Set "mode" to XI for new alternative
> without AVX512VL.
> 
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -26033,19 +26033,35 @@
>  (symbol_ref "true")))])
> 
>  (define_insn "*vec_dupv2di"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,v,x")
>   (vec_duplicate:V2DI
> -   (match_operand:DI 1 "nonimmediate_operand" " 0,Yv,vm,0")))]
> +   (match_operand:DI 1 "nonimmediate_operand" "
> 0,Yv,vm,Yvm,0")))]
>"TARGET_SSE"
>"@
> punpcklqdq\t%0, %0
> vpunpcklqdq\t{%d1, %0|%0, %d1}
> +   * return TARGET_AVX512VL ? \"vpbroadcastq\t{%1, %0|%0, %1}\" :
> + \"vpbroadcastq\t{%1, %g0|%g0, %1}\";
> %vmovddup\t{%1, %0|%0, %1}
> movlhps\t%0, %0"
> -  [(set_attr "isa" "sse2_noavx,avx,sse3,noavx")
> -   (set_attr "type" "sselog1,sselog1,sselog1,ssemov")
> -   (set_attr "prefix" "orig,maybe_evex,maybe_vex,orig")
> -   (set_attr "mode" "TI,TI,DF,V4SF")])
> +  [(set_attr "isa" "sse2_noavx,avx,avx512f,sse3,noavx")
> +   (set_attr "type" "sselog1,sselog1,ssemov,sselog1,ssemov")
> +   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig")
> +   (set (attr "mode")
> + (cond [(and (eq_attr "alternative" "2")
> + (match_test "!TARGET_AVX512VL"))
> +  (const_string "XI")
> +(eq_attr "alternative" "3")
> +  (const_string "DF")
> +(eq_attr "alternative" "4")
> +  (const_string "V4SF")
> +   ]
> +   (const_string "TI")))
> +   (set (attr "enabled")
> + (if_then_else
> +   (eq_attr "alternative" "2")
> +   (symbol_ref "TARGET_AVX512VL
> +|| (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
> +   (const_string "*")))])
> 
>  (define_insn "avx2_vbroadcasti128_"
>[(set (match_operand:VI_256 0 "register_operand" "=x,v,v")

RE: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-18 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Friday, June 16, 2023 2:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kirill Yukhin ; Liu, Hongtao
> 
> Subject: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit
> operands with just AVX512F
> 
> There's no reason to constrain this to AVX512VL, unless instructed so by -
> mprefer-vector-width=, as the wider operation is unusable for more narrow
> operands only when the possible memory source is a non-broadcast one.
> This way even the scalar copysign3 can benefit from the operation
> being a single-insn one (leaving aside moves which the compiler decides to
> insert for unclear reasons, and leaving aside the fact that
> bcst_mem_operand() is too restrictive for broadcast to be embedded right
> into VPTERNLOG*).
> 
> Along with this also request value duplication in ix86_expand_copysign()'s
> call to ix86_build_signbit_mask(), eliminating excess space allocation
> in .rodata.*, filled with zeros which are never read.
> 
> gcc/
> 
>   * config/i386/i386-expand.cc (ix86_expand_copysign): Request
>   value duplication by ix86_build_signbit_mask() when AVX512F and
>   not HFmode.
>   * config/i386/sse.md (*_vternlog_all): Convert to
>   2-alternative form. Adjust "mode" attribute. Add "enabled"
>   attribute.
>   (*_vpternlog_1): Also permit when
> TARGET_AVX512F
>   && !TARGET_PREFER_AVX256.
>   (*_vpternlog_2): Likewise.
>   (*_vpternlog_3): Likewise.
> ---
> I guess the underlying pattern, going along the lines of what
> one_cmpl2 uses, can be applied
> elsewhere as well.
> 
> HFmode could use embedded broadcast too for copysign and alike, but that
> would need to be V2HF -> V8HF (for which I don't think there are any existing
> patterns).
> ---
> v2: Respect -mprefer-vector-width=.
> 
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
>else
>  dest = NULL_RTX;
>op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
> -  mask = ix86_build_signbit_mask (vmode, 0, 0);
> +  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode !=
> + HFmode, 0);
> 
>if (CONST_DOUBLE_P (operands[1]))
>  {
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12597,11 +12597,11 @@
> (set_attr "mode" "")])
> 
>  (define_insn "*_vternlog_all"
> -  [(set (match_operand:V 0 "register_operand" "=v")
> +  [(set (match_operand:V 0 "register_operand" "=v,v")
>   (unspec:V
> -   [(match_operand:V 1 "register_operand" "0")
> -(match_operand:V 2 "register_operand" "v")
> -(match_operand:V 3 "bcst_vector_operand" "vmBr")
> +   [(match_operand:V 1 "register_operand" "0,0")
> +(match_operand:V 2 "register_operand" "v,v")
> +(match_operand:V 3 "bcst_vector_operand" "vBr,m")
>  (match_operand:SI 4 "const_0_to_255_operand")]
> UNSPEC_VTERNLOG))]
>"TARGET_AVX512F
Change condition to  == 64 || TARGET_AVX512VL || (TARGET_AVX512F &&
!TARGET_PREFER_AVX256).
Also, please add a testcase for the case TARGET_AVX512F && !TARGET_PREFER_AVX256.
> @@ -12609,10 +12609,22 @@
> it's not real AVX512FP16 instruction.  */
>&& (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
>   || GET_CODE (operands[3]) != VEC_DUPLICATE)"
> -  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
> +{
> +  if (TARGET_AVX512VL)
> +return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
> +  else
> +return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3,
> +%4}"; }
>[(set_attr "type" "sselog")
> (set_attr "prefix" "evex")
> -   (set_attr "mode" "")])
> +   (set (attr "mode")
> +(if_then_else (match_test "TARGET_AVX512VL")
> +   (const_string "")
> +   (const_string "XI")))
> +   (set (attr "enabled")
> + (if_then_else (eq_attr "alternative" "1")
> +   (symbol_ref " == 64 || TARGET_AVX512VL")
> +   (const_string "*")))])
> 
>  ;; There must be lots of other combinations like  ;; @@ -12641,7 +12653,8
> @@
> (any_logic2:V
>   (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
>   (mat

RE: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-24 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Wednesday, June 21, 2023 8:40 PM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin ; Liu,
> Hongtao 
> Subject: Re: [PATCH v2] x86: make better use of VBROADCASTSS /
> VPBROADCASTD
> 
> On 21.06.2023 09:44, Jan Beulich wrote:
> > On 21.06.2023 09:37, Hongtao Liu wrote:
> >> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
> >>  wrote:
> >>>
> >>> Isn't prefix_extra use bogus here? What extra prefix does
> >>> vbroadcastss
> >> According to comments, yes, no extra prefix is needed.
> >>
> >> ;; There are also additional prefixes in 3DNOW, SSSE3.
> >> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte, ;;
> >> sseiadd1,ssecvt1 to 0f7a with no DREX byte.
> >> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.
> >
> > Right, that's what triggered my question. I guess dropping these
> > "prefix_extra" really wants to be a separate patch (or maybe even
> > multiple, but it's hard to see how to split), dealing with all of the
> > instances which likely have accumulated simply via copy-and-paste.
> 
> Or wait - I'm altering those lines anyway, so I could as well drop them right
> away (and slightly shrink patch size), if that's okay with you. Of course I
> should then not forget to also mention this in the changelog entry.
> 
Yes.
> Jan

RE: [PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-07-04 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, July 4, 2023 11:30 PM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin ; Liu,
> Hongtao 
> Subject: Re: [PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit
> operands with just AVX512F
> 
> On 27.06.2023 07:11, Hongtao Liu wrote:
> > On Tue, Jun 20, 2023 at 5:34 PM Hongtao Liu  wrote:
> >>
> >> On Tue, Jun 20, 2023 at 5:03 PM Jan Beulich  wrote:
> >>>
> >>> On 20.06.2023 10:33, Hongtao Liu wrote:
> >>>> On Tue, Jun 20, 2023 at 3:07 PM Jan Beulich via Gcc-patches
> >>>>  wrote:
> >>>>>
> >>>>> I guess the underlying pattern, going along the lines of what
> >>>>> one_cmpl2 uses, can be applied elsewhere as well.
> >>>> That should be guarded with !TARGET_PREFER_AVX256, let's handle
> >>>> that in a separate patch.
> >>>
> >>> Sure, and as indicated there are more places where similar things
> >>> could be done.
> >>>
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/i386/avx512f-copysign.c
> >>>>> @@ -0,0 +1,32 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-mavx512f -mno-avx512vl -O2" } */
> >>>> Please explicitly add -mprefer-vector-width=512; our tester will
> >>>> also test unix{-m32 \-march=cascadelake,\ -march=cascadelake}, which
> >>>> sets -mprefer-vector-width=256. Specifying -mprefer-vector-width=512
> >>>> in dg-options overrides that.
> >>>
> >>> Oh, I see. Will do. And I expect I then also need to adjust the
> >>> newly added avx512f-dupv2di.c from the earlier patch. I guess I
> >>> could commit that option addition there as obvious?
> >> Still need to send out the patch, and commit as an obvious fix.
> >>>
> >>>> Others LGTM.
> >>>
> >>> May I take this as "okay with that change", or should I submit v4?
> >> Okay. no need for a v4 version.
> >>>
> > avx512f-copysign.c failed for -m32, we need to add -mfpmath=sse to
> > dg-options.
> 
> Oh, of course. I will take care of this, but it may take me a couple of days,
> as I just came back from a week of vacation. One question though:
> Elsewhere such tests are simply suppressed for 32-bit. Personally I'd prefer
> going that route, but if you think adding -mfpmath=sse is indeed better, I'll
> follow your request.
Either is ok.
> 
> Jan
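
As background for the copysign discussion: VPTERNLOG evaluates an arbitrary
three-input boolean function per bit, selected by its 8-bit immediate, so a
copysign is a single bitwise select between the two operands under a
sign-bit mask. The sketch below models one 32-bit lane in scalar C. The
function names are hypothetical, and the immediate 0xCA (a ? b : c) matches
the operand order chosen here, which is an assumption and not necessarily
the order or immediate GCC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise model of VPTERNLOG on one 32-bit lane: at each bit position the
   three input bits form an index (A is the high bit, C the low bit) that
   selects one bit of IMM8.  */
uint32_t
ternlog32 (uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      unsigned int idx = (((a >> i) & 1) << 2)
			 | (((b >> i) & 1) << 1)
			 | ((c >> i) & 1);
      r |= (uint32_t) ((imm8 >> idx) & 1) << i;
    }
  return r;
}

/* copysign as a select: where the mask bit is set take Y's bit (the sign),
   elsewhere take X's bit (the magnitude).  0xCA encodes a ? b : c.  */
float
copysign_ternlog (float x, float y)
{
  const uint32_t sign_mask = 0x80000000u;
  uint32_t xb, yb, rb;
  float r;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  rb = ternlog32 (sign_mask, yb, xb, 0xCA);
  memcpy (&r, &rb, sizeof r);
  return r;
}
```

For instance, copysign_ternlog (1.5f, -2.0f) gives -1.5f. Only the sign bit
of the mask is ever consumed, which is the reason a broadcast (duplicated)
mask constant is sufficient.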

RE: [PATCH] Initial Granite Rapids D Support

2023-07-06 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Mo, Zewei 
> Sent: Thursday, July 6, 2023 2:37 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] Initial Granite Rapids D Support
> 
> Hi all,
> 
> This patch is to add initial support for Granite Rapids D for GCC.
> The link of related information is listed below:
> https://www.intel.com/content/www/us/en/develop/download/intel-
> architecture-instruction-set-extensions-programming-reference.html
> 
> Also, the patch of removing AMX-COMPLEX from Granite Rapids will be
> backported to GCC13.
Ok.
> 
> This has been tested on x86_64-pc-linux-gnu. Is this ok for trunk? Thank you.
> 
> Sincerely,
> Zewei Mo
> 
> gcc/ChangeLog:
> 
>   * common/config/i386/cpuinfo.h
>   (get_intel_cpu): Handle Granite Rapids D.
>   * common/config/i386/i386-common.cc:
>   (processor_names): Add graniterapids-d.
>   (processor_alias_table): Ditto.
>   * common/config/i386/i386-cpuinfo.h
>   (enum processor_subtypes): Add INTEL_GRANITERAPIDS_D.
>   * config.gcc: Add -march=graniterapids-d.
>   * config/i386/driver-i386.cc (host_detect_local_cpu):
>   Handle graniterapids-d.
>   * config/i386/i386-c.cc (ix86_target_macros_internal):
>   Ditto.
>   * config/i386/i386-options.cc (m_GRANITERAPIDSD): New.
>   (processor_cost_table): Add graniterapids-d.
>   * config/i386/i386.h (enum processor_type):
>   Add PROCESSOR_GRANITERAPIDS_D.
>   * doc/extend.texi: Add graniterapids-d.
>   * doc/invoke.texi: Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/mv16.C: Add graniterapids-d.
>   * gcc.target/i386/funcspec-56.inc: Handle new march.
> ---
>  gcc/common/config/i386/cpuinfo.h  |  9 -
>  gcc/common/config/i386/i386-common.cc |  3 +++
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/config.gcc|  2 +-
>  gcc/config/i386/driver-i386.cc|  3 +++
>  gcc/config/i386/i386-c.cc |  7 +++
>  gcc/config/i386/i386-options.cc   |  4 +++-
>  gcc/config/i386/i386.h|  5 -
>  gcc/doc/extend.texi   |  3 +++
>  gcc/doc/invoke.texi   | 11 +++
>  gcc/testsuite/g++.target/i386/mv16.C  |  6 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  1 +
>  12 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/common/config/i386/cpuinfo.h
> b/gcc/common/config/i386/cpuinfo.h
> index ae48bc17771..7c2565c1d93 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -565,7 +565,6 @@ get_intel_cpu (struct __processor_model
> *cpu_model,
>cpu_model->__cpu_type = INTEL_SIERRAFOREST;
>break;
>  case 0xad:
> -case 0xae:
>/* Granite Rapids.  */
>cpu = "graniterapids";
>CHECK___builtin_cpu_is ("corei7"); @@ -573,6 +572,14 @@
> get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_type = INTEL_COREI7;
>cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
>break;
> +case 0xae:
> +  /* Granite Rapids D.  */
> +  cpu = "graniterapids-d";
> +  CHECK___builtin_cpu_is ("corei7");
> +  CHECK___builtin_cpu_is ("graniterapids-d");
> +  cpu_model->__cpu_type = INTEL_COREI7;
> +  cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS_D;
> +  break;
>  case 0xb6:
>/* Grand Ridge.  */
>cpu = "grandridge";
> diff --git a/gcc/common/config/i386/i386-common.cc
> b/gcc/common/config/i386/i386-common.cc
> index bf126f14073..5a337c5b8be 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1971,6 +1971,7 @@ const char *const processor_names[] =
>"alderlake",
>"rocketlake",
>"graniterapids",
> +  "graniterapids-d",
>"intel",
>"lujiazui",
>"geode",
> @@ -2094,6 +2095,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL,
> PTA_GRANITERAPIDS,
>  M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
> +  {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL,
> PTA_GRANITERAPIDS_D,
> +M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
> P_PROC_AVX512F},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
&
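
The effect of the get_intel_cpu hunk is that model 0xae no longer shares the
Granite Rapids case. A stripped-down sketch of that dispatch follows; the
function name and the "unknown" fallback are hypothetical, and only the three
model numbers visible in the quoted hunk are covered.

```c
#include <assert.h>
#include <string.h>

/* Map an Intel model number to the -march name used in the quoted hunk.
   Models not shown in the hunk fall through to "unknown".  */
const char *
intel_model_to_march (unsigned int model)
{
  switch (model)
    {
    case 0xad:
      return "graniterapids";
    case 0xae:
      /* Split out of the Granite Rapids case by the patch.  */
      return "graniterapids-d";
    case 0xb6:
      return "grandridge";
    default:
      return "unknown";
    }
}
```

After the change, intel_model_to_march (0xae) returns "graniterapids-d"
rather than "graniterapids".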

RE: [PATCH v3] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-07-10 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, July 11, 2023 2:04 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kirill Yukhin ; Liu, Hongtao
> 
> Subject: [PATCH v3] x86: make better use of VBROADCASTSS /
> VPBROADCASTD
> 
> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are never
> longer (yet sometimes shorter) than the corresponding VSHUFPS / VPSHUFD,
> due to the immediate operand of the shuffle insns balancing the
> (uniform) need for VEX3 in the broadcast ones. When EVEX encoding is
> respective the broadcast insns are always shorter.
> 
> Add new alternatives to cover the AVX2 and AVX512 cases as appropriate.
> 
> While touching this anyway, switch to consistently using "sseshuf1" in the
> "type" attributes for all shuffle forms.
> 
> gcc/
> 
>   * config/i386/sse.md (vec_dupv4sf): Make first alternative use
>   vbroadcastss for AVX2. New AVX512F alternative.
>   (*vec_dupv4si): New AVX2 and AVX512F alternatives using
>   vpbroadcastd. Replace sselog1 by sseshuf1 in "type" attribute.
> 
> gcc/testsuite/
> 
>   * gcc.target/i386/avx2-dupv4sf.c: New test.
>   * gcc.target/i386/avx2-dupv4si.c: Likewise.
>   * gcc.target/i386/avx512f-dupv4sf.c: Likewise.
>   * gcc.target/i386/avx512f-dupv4si.c: Likewise.
> ---
> Note that unlike originally intended, "prefix_extra" isn't dropped:
> "length_vex" uses it to determine whether 2-byte VEX encoding is possible
> (which it isn't for VBROADCASTSS / VPBROADCASTD). "length"
> itself specifically does not use it for VEX/EVEX encoded insns.
> 
> Especially with the added "enabled" attribute I didn't really see how to
> (further) fold alternatives 0 and 1. Instead *vec_dupv4si might benefit from
> using sse2_noavx2 instead of sse2 for alternative 2, except that there is no
> sse2_noavx2, only sse2_noavx.
> 
> I'm working from the assumption that the isa attributes to the original 1st
> and 2nd alternatives don't need further restricting (to sse2_noavx2 or
> avx_noavx2 as applicable), as the new earlier alternatives cover all operand
> forms already when at least AVX2 is enabled.
Yes, the patch LGTM.
> ---
> v3: Testcases for new alternatives. "type" and "prefix_extra"
> adjustments.
> v2: Correct operand constraints. Respect -mprefer-vector-width=. Fold
> two alternatives of vec_dupv4sf.
> 
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25969,41 +25969,64 @@
>   (const_int 1)))])
> 
>  (define_insn "vec_dupv4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=v,v,x")
> +  [(set (match_operand:V4SF 0 "register_operand" "=v,v,v,x")
>   (vec_duplicate:V4SF
> -   (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))]
> +   (match_operand:SF 1 "nonimmediate_operand" "Yv,v,m,0")))]
>"TARGET_SSE"
>"@
> -   vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
> +   * return TARGET_AVX2 ? \"vbroadcastss\t{%1, %0|%0, %1}\" :
> \"vshufps\t{$0, %d1, %0|%0, %d1, 0}\";
> +   vbroadcastss\t{%1, %g0|%g0, %1}
> vbroadcastss\t{%1, %0|%0, %1}
> shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "avx,avx,noavx")
> -   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "maybe_evex,maybe_evex,orig")
> -   (set_attr "mode" "V4SF")])
> +  [(set_attr "isa" "avx,*,avx,noavx")
> +   (set (attr "type")
> + (cond [(and (eq_attr "alternative" "0")
> + (match_test "!TARGET_AVX2"))
> +  (const_string "sseshuf1")
> +(eq_attr "alternative" "3")
> +  (const_string "sseshuf1")
> +   ]
> +   (const_string "ssemov")))
> +   (set (attr "length_immediate")
> + (if_then_else (eq_attr "type" "sseshuf1")
> +   (const_string "1")
> +   (const_string "0")))
> +   (set_attr "prefix_extra" "0,1,1,*")
> +   (set_attr "prefix" "maybe_evex,evex,maybe_evex,orig")
> +   (set_attr "mode" "V4SF,V16SF,V4SF,V4SF")
> +   (set (attr "enabled")
> + (if_then_else (eq_attr "alternative" "1")
>

RE: [PATCH] x86: improve fast bfloat->float conversion

2023-07-10 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, July 11, 2023 2:08 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Kirill Yukhin
> 
> Subject: [PATCH] x86: improve fast bfloat->float conversion
> 
> There's nothing AVX512BW-ish in here, so no reason to use Yw as the
> constraints for the AVX alternative. Furthermore, by using the 512-bit form of
> VPSLLD (in a new alternative) all 32 registers can be used directly by the
> insn without AVX512VL needing to be enabled.
Yes, the instruction vpslld doesn't need AVX512BW, the patch LGTM.
> 
> Also adjust the originally last alternative's "prefix" attribute to maybe_evex.
> 
> gcc/
> 
>   * config/i386/i386.md (extendbfsf2_1): Add new AVX512F
>   alternative. Adjust original last alternative's "prefix"
>   attribute to maybe_evex.
> ---
> The corresponding expander, "extendbfsf2", looks to have been dead since
> its introduction in a1ecc5600464 ("Fix incorrect _mm_cvtsbh_ss"): The builtin
> references the insn (extendbfsf2_1), not the expander. Can't the expander
> be deleted and the name of the insn then pruned of the _1 suffix? If so, that
> further raises the question of the significance of the "!HONOR_NANS
> (BFmode)" that the expander has, but the insn doesn't have. Which may
> instead suggest the builtin was meant to reference the expander. Yet then I
> can't see what the builtin would expand to when HONOR_NANS
> (BFmode) is true.

Quote from what Jakub said in [1].
---
This is not correct.
While using such code for _mm_cvtsbh_ss is fine if it is documented not to
raise exceptions and turn a sNaN into a qNaN, it is not fine for HONOR_NANS
(i.e. when -ffast-math is not on), because a __bf16 -> float conversion
on sNaN should raise invalid exception and turn it into a qNaN.
We could have extendbfsf2 expander that would FAIL; if HONOR_NANS and
emit extendbfsf2_1 otherwise. 
---
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html
> 
> I further wonder whether the nearby "extendhfdf2" expander is really
> needed. It doesn't look to specify anything that the corresponding insn
> doesn't also specify.
> 
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -5181,21 +5181,27 @@
>  ;; Don't use float_extend since psrlld doesn't raise  ;; exceptions and turn 
> a
> sNaN into a qNaN.
>  (define_insn "extendbfsf2_1"
> -  [(set (match_operand:SF 0 "register_operand"   "=x,Yw")
> +  [(set (match_operand:SF 0 "register_operand"   "=x,Yv,v")
>   (unspec:SF
> -   [(match_operand:BF 1 "register_operand" " 0,Yw")]
> +   [(match_operand:BF 1 "register_operand" " 0,Yv,v")]
> UNSPEC_CVTBFSF))]
>   "TARGET_SSE2"
>   "@
>pslld\t{$16, %0|%0, 16}
> -  vpslld\t{$16, %1, %0|%0, %1, 16}"
> -  [(set_attr "isa" "noavx,avx")
> +  vpslld\t{$16, %1, %0|%0, %1, 16}
> +  vpslld\t{$16, %g1, %g0|%g0, %g1, 16}"
> +  [(set_attr "isa" "noavx,avx,*")
> (set_attr "type" "sseishft1")
> (set_attr "length_immediate" "1")
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix" "orig,vex")
> -   (set_attr "mode" "TI")
> -   (set_attr "memory" "none")])
> +   (set_attr "prefix_data16" "1,*,*")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "mode" "TI,TI,XI")
> +   (set_attr "memory" "none")
> +   (set (attr "enabled")
> + (if_then_else (eq_attr "alternative" "2")
> +   (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
> + && !TARGET_PREFER_AVX256")
> +   (const_string "*")))])
> 
>  (define_expand "extendxf2"
>[(set (match_operand:XF 0 "nonimmediate_operand")

RE: [PATCH] x86: improve fast bfloat->float conversion

2023-07-11 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, July 11, 2023 3:50 PM
> To: Liu, Hongtao 
> Cc: Kirill Yukhin ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] x86: improve fast bfloat->float conversion
> 
> On 11.07.2023 08:45, Liu, Hongtao wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Tuesday, July 11, 2023 2:08 PM
> >>
> >> There's nothing AVX512BW-ish in here, so no reason to use Yw as the
> >> constraints for the AVX alternative. Furthermore by using the 512-bit
> >> form of VPSLLD (in a new alternative) all 32 registers can be used
> >> directly by the insn without AVX512VL needing to be enabled.
> > Yes, the instruction vpslld doesn't need AVX512BW, the patch LGTM.
> 
> Thanks.
> 
> >> ---
> >> The corresponding expander, "extendbfsf2", looks to have been dead
> >> since its introduction in a1ecc5600464 ("Fix incorrect
> >> _mm_cvtsbh_ss"): The builtin references the insn (extendbfsf2_1), not
> >> the expander. Can't the expander be deleted and the name of the insn
> >> then pruned of the _1 suffix? If so, that further raises the question
> >> of the significance of the "!HONOR_NANS (BFmode)" that the expander
> >> has, but the insn doesn't have. Which may instead suggest the builtin
> >> was meant to reference the expander. Yet then I can't see what
> >> the builtin would expand to when HONOR_NANS
> >> (BFmode) is true.
> >
> > Quote from what Jakub said in [1].
> > ---
> > This is not correct.
> > While using such code for _mm_cvtsbh_ss is fine if it is documented
> > not to raise exceptions and turn a sNaN into a qNaN, it is not fine
> > for HONOR_NANS (i.e. when -ffast-math is not on), because a __bf16 ->
> > float conversion on sNaN should raise invalid exception and turn it
> > into a qNaN.
> > We could have extendbfsf2 expander that would FAIL; if HONOR_NANS and
> > emit extendbfsf2_1 otherwise.
> > ---
> > [1]
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html
> 
> I'm not sure I understand: It sounds like what Jakub said matches my
> observation, yet then it seems unlikely that the issue wasn't fixed in over
> half a year.
> 
> Also having the expander FAIL when HONOR_NANS (matching what I was
> thinking) still doesn't clarify to me what then would happen to uses of the
> builtin. Is there any (common code) fallback for such a case? I didn't think
> there would be, in which case wouldn't this result in an internal compiler
> error?
For __bf16 -> float or target-specific builtins, it should be OK, since __bf16
is just an extension type.
But extendbfsf2 is a standard pattern name, which is also used to expand the
C++23 std::bfloat16_t -> float conversion, and that conversion is assumed to
raise exceptions for sNaN.
Since vpslld won't raise any exception, we need to add the HONOR_NANS check to
the extendbfsf2 pattern.
That's my understanding; std::bfloat16_t support is mentioned in [2].

[2] https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601865.html
> 
> Jan
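
To illustrate the sNaN point discussed in this message, here is a minimal C
sketch, assuming the usual IEEE-style bit layouts for bf16 and float. The
helper names are hypothetical and are not GCC code. It only shows what a
shift-based widening does at the bit level: a signaling-NaN payload is carried
over unchanged, so nothing is quieted and no invalid exception can be raised.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen raw bf16 bits to raw float bits by moving
   them into the upper 16 bits, the way a shift-left-by-16 expansion
   would.  */
static uint32_t
bf16_bits_to_float_bits (uint16_t bits)
{
  return (uint32_t) bits << 16;
}

/* Hypothetical helper: nonzero if the float bit pattern is a signaling
   NaN (exponent all ones, mantissa nonzero, quiet bit 22 clear).  */
static int
float_bits_is_snan (uint32_t bits)
{
  return (bits & 0x7f800000u) == 0x7f800000u
	 && (bits & 0x007fffffu) != 0
	 && (bits & 0x00400000u) == 0;
}
```

A conversion that honors NaNs would be expected to return a quiet NaN here,
which the plain shift cannot do.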

RE: [PATCH] i386: Fix incorrect intrinsic signature for AVX512 s{lli|rai|rli}

2023-05-25 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Hu, Lin1 
> Sent: Thursday, May 25, 2023 3:52 PM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: RE: [PATCH] i386: Fix incorrect intrinsic signature for AVX512
> s{lli|rai|rli}
> 
> OK, I have updated the change log and adjusted some of the formatting. The
> attached file is the new version.
LGTM.
> 
> -Original Message-
> From: Hongtao Liu 
> Sent: Thursday, May 25, 2023 11:40 AM
> To: Hu, Lin1 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Fix incorrect intrinsic signature for AVX512
> s{lli|rai|rli}
> 
> On Thu, May 25, 2023 at 10:55 AM Hu, Lin1 via Gcc-patches
>  wrote:
> >
> > Hi all,
> >
> > This patch aims to fix the incorrect intrinsic signatures for
> > _mm{512|256|}_s{lli|rai|rli}_epi*. It has been tested on
> > x86_64-pc-linux-gnu. OK for trunk?
> >
> > BRs,
> > Lin
> >
> > gcc/ChangeLog:
> >
> > PR target/109173
> > PR target/109174
> > * config/i386/avx512bwintrin.h (_mm512_srli_epi16): Change type from
> > int to const int.
> int to unsigned int or const int to const unsigned int.
> Others LGTM.
> > (_mm512_mask_srli_epi16): Ditto.
> > (_mm512_slli_epi16): Ditto.
> > (_mm512_mask_slli_epi16): Ditto.
> > (_mm512_maskz_slli_epi16): Ditto.
> > (_mm512_srai_epi16): Ditto.
> > (_mm512_mask_srai_epi16): Ditto.
> > (_mm512_maskz_srai_epi16): Ditto.
> > * config/i386/avx512vlintrin.h (_mm256_mask_srli_epi32): Ditto.
> > (_mm256_maskz_srli_epi32): Ditto.
> > (_mm_mask_srli_epi32): Ditto.
> > (_mm_maskz_srli_epi32): Ditto.
> > (_mm256_mask_srli_epi64): Ditto.
> > (_mm256_maskz_srli_epi64): Ditto.
> > (_mm_mask_srli_epi64): Ditto.
> > (_mm_maskz_srli_epi64): Ditto.
> > (_mm256_mask_srai_epi32): Ditto.
> > (_mm256_maskz_srai_epi32): Ditto.
> > (_mm_mask_srai_epi32): Ditto.
> > (_mm_maskz_srai_epi32): Ditto.
> > (_mm256_srai_epi64): Ditto.
> > (_mm256_mask_srai_epi64): Ditto.
> > (_mm256_maskz_srai_epi64): Ditto.
> > (_mm_srai_epi64): Ditto.
> > (_mm_mask_srai_epi64): Ditto.
> > (_mm_maskz_srai_epi64): Ditto.
> > (_mm_mask_slli_epi32): Ditto.
> > (_mm_maskz_slli_epi32): Ditto.
> > (_mm_mask_slli_epi64): Ditto.
> > (_mm_maskz_slli_epi64): Ditto.
> > (_mm256_mask_slli_epi32): Ditto.
> > (_mm256_maskz_slli_epi32): Ditto.
> > (_mm256_mask_slli_epi64): Ditto.
> > (_mm256_maskz_slli_epi64): Ditto.
> > (_mm_mask_srai_epi16): Ditto.
> > (_mm_maskz_srai_epi16): Ditto.
> > (_mm256_srai_epi16): Ditto.
> > (_mm256_mask_srai_epi16): Ditto.
> > (_mm_mask_slli_epi16): Ditto.
> > (_mm_maskz_slli_epi16): Ditto.
> > (_mm256_mask_slli_epi16): Ditto.
> > (_mm256_maskz_slli_epi16): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/109173
> > PR target/109174
> > * gcc.target/i386/pr109173-1.c: New test.
> > * gcc.target/i386/pr109174-1.c: Ditto.
> > ---
> >  gcc/config/i386/avx512bwintrin.h   |  32 +++---
> >  gcc/config/i386/avx512fintrin.h|  58 +++
> >  gcc/config/i386/avx512vlbwintrin.h |  36 ---
> >  gcc/config/i386/avx512vlintrin.h   | 112 +++--
> >  gcc/testsuite/gcc.target/i386/pr109173-1.c |  57 +++
> >  gcc/testsuite/gcc.target/i386/pr109174-1.c |  45 +
> >  6 files changed, 236 insertions(+), 104 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr109173-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr109174-1.c
> >
> > diff --git a/gcc/config/i386/avx512bwintrin.h
> b/gcc/config/i386/avx512bwintrin.h
> > index 89790f7917b..791d4e35f32 100644
> > --- a/gcc/config/i386/avx512bwintrin.h
> > +++ b/gcc/config/i386/avx512bwintrin.h
> > @@ -2880,7 +2880,7 @@ _mm512_maskz_dbsad_epu8 (__mmask32 __U,
> __m512i __A, __m512i __B,
> >
> >  extern __inline __m512i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm512_srli_epi16 (__m512i __A, const int __imm)
> > +_mm512_srli_epi16 (__m512i __A, const unsigned int __imm)
> >  {
> >return (__m512i) __builtin_ia32_psrlwi512_mask ((

RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-16 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Kong, Lingling 
> Sent: Friday, September 16, 2022 3:40 PM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]
> 
> Hi,
> 
> > >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> > >  : mode == V16HFmode ? V8HFmode
> > > +: mode == V16BFmode ? V8BFmode
> > Can it be written as switch case?
> Sure, I fixed it in the new patch. Thanks again for taking a look.
> OK for master ?
+ switch (mode)
+   {
+ case V16HImode:
+   hvmode = V8HImode;
+   break;
+ case V16HFmode:
+   hvmode = V8HFmode;
+   break;
+ case V16BFmode:
+   hvmode = V8BFmode;
+   break;
+ case V32QImode:
+   hvmode = V16QImode;
+   break;
+ default:
+   gcc_unreachable ();
+   }
> 

For the format, shouldn't case align with {?
Others LGTM.
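
As a sketch of the layout being asked for, assuming the usual GNU formatting
where the braces of a switch body are indented two columns and the case labels
sit in the same column as the braces. The enum and function below are
hypothetical stand-ins for machine_mode and the code in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the machine_mode values in the patch.  */
enum toy_mode
{
  V8HImode, V8HFmode, V8BFmode, V16QImode,
  V16HImode, V16HFmode, V16BFmode, V32QImode
};

/* Map a 256-bit vector mode to its 128-bit half, with the case labels
   aligned with the braces.  */
static enum toy_mode
half_vector_mode (enum toy_mode mode)
{
  enum toy_mode hvmode;

  switch (mode)
    {
    case V16HImode:
      hvmode = V8HImode;
      break;
    case V16HFmode:
      hvmode = V8HFmode;
      break;
    case V16BFmode:
      hvmode = V8BFmode;
      break;
    case V32QImode:
      hvmode = V16QImode;
      break;
    default:
      abort ();
    }
  return hvmode;
}
```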

> Thanks,
> Lingling
> 
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: Thursday, September 15, 2022 11:46 AM
> > To: Kong, Lingling 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]
> >
> > On Thu, Sep 15, 2022 at 11:36 AM Kong, Lingling via Gcc-patches wrote:
> > >
> > > Hi
> > >
> > > The patch is to fix vec_init_dup_v16bf, add correct handle for v16bf
> > > mode in ix86_expand_vector_init_duplicate.
> > > Add testcase with sse2 without avx2.
> > >
> > > OK for master?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/106887
> > > * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
> > > Fixed V16BF mode case.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/106887
> > > * gcc.target/i386/vect-bfloat16-2c.c: New test.
> > > ---
> > >  gcc/config/i386/i386-expand.cc|  1 +
> > >  .../gcc.target/i386/vect-bfloat16-2c.c| 76 +++
> > >  2 files changed, 77 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc
> > > b/gcc/config/i386/i386-expand.cc index d7b49c99dc8..9451c561489
> > > 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -15111,6 +15111,7 @@ ix86_expand_vector_init_duplicate (bool
> > mmx_ok, machine_mode mode,
> > > {
> > >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> > >  : mode == V16HFmode ? V8HFmode
> > > +: mode == V16BFmode ? V8BFmode
> > Can it be written as switch case?
> > >  : V16QImode);
> > >   rtx x = gen_reg_rtx (hvmode);
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > > b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > > new file mode 100644
> > > index 000..bead94e46a1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > > @@ -0,0 +1,76 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-mf16c -msse2 -mno-avx2 -O2" } */
> > > +
> > > +typedef __bf16 v8bf __attribute__ ((__vector_size__ (16))); typedef
> > > +__bf16 v16bf __attribute__ ((__vector_size__ (32)));
> > > +
> > > +#define VEC_EXTRACT(V,S,IDX)   \
> > > +  S\
> > > +  __attribute__((noipa))   \
> > > +  vec_extract_##V##_##IDX (V v)\
> > > +  {\
> > > +return v[IDX]; \
> > > +  }
> > > +
> > > +#define VEC_SET(V,S,IDX)   \
> > > +  V\
> > > +  __attribute__((noipa))   \
> > > +  vec_set_##V##_##IDX (V v, S s)   \
> > > +  {\
> > > +v[IDX] = s;\
> > > +return v;

RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-09-21 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jiang, Haochen 
> Sent: Thursday, September 22, 2022 2:23 PM
> To: Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: RE: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> Hi all,
> 
> I would like to backport this patch to the GCC 12 release branch. Machines
> whose default GCC is 12.x always use newer kernels, so if the patch is not
> backported, the amx tests will always fail.
> 
> Ok for backport?
Ok.
> 
> BRs,
> Haochen
> 
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Tuesday, June 21, 2022 10:53 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > kernels
> >
> > On Tue, Jun 21, 2022 at 9:41 AM Jiang, Haochen
> > 
> > wrote:
> > >
> > > > -Original Message-
> > > > From: Uros Bizjak 
> > > > Sent: Tuesday, June 21, 2022 3:06 PM
> > > > To: Jiang, Haochen 
> > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > > kernels
> > > >
> > > > On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen
> > > > 
> > > > wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Uros Bizjak 
> > > > > > Sent: Monday, June 20, 2022 10:54 PM
> > > > > > To: Jiang, Haochen 
> > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > 
> > > > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for
> > > > > > latest kernels
> > > > > >
> > > > > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > > > > 
> > > > > > wrote:
> > > > > > >
> > > > > > > From: "Jiang, Haochen" 
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > We need syscall to enable AMX for kernels>=5.4. It is
> > > > > > > missing in current amx tests, which will cause test fail.
> > > > > >
> > > > > > So this new code is only valid for linux & co?
> > > > >
> > > > > Thanks for reminding me for that, I only test on linux since the
> > > > > header file is only in linux.
> > > > >
> > > > > Just updated a patch wrapping with a macro not to change the
> > > > > behavior on windows.
> > > >
> > > > I think you want __linux__ there, not __unix__.
> > >
> > > Fixed with __linux__.
> >
> > OK.
> >
> > Thanks,
> > Uros.
> >
> > >
> > > Thx,
> > > Haochen
> > >
> > > >
> > > > Uros.
> > > >
> > > > >
> > > > > Regtested on x86_64-pc-linux-gnu.
> > > > >
> > > > > Thx,
> > > > > Haochen
> > > > > >
> > > > > > Uros.
> > > > > >
> > > > > > >
> > > > > > > This patch aims to add them to fix this bug.
> > > > > > >
> > > > > > > BRs,
> > > > > > > Haochen
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > > > > New function to check if AMX is usable and enable AMX.
> > > > > > > (main): Run test if AMX is usable.
> > > > > > > ---
> > > > > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > > > > +++
> > > > > > >  1 file changed, 24 insertions(+)
> > > > > > >
> > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > > index 434b0e59703..92ed8669304 100644
> > > > > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > >

RE: [PATCH] testsuite: Fix up avx256-unaligned-store-3.c test.

2022-09-25 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Hu, Lin1 
> Sent: Monday, September 26, 2022 1:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] testsuite: Fix up avx256-unaligned-store-3.c test.
> 
> Hi all,
> 
> This patch aims to fix a problem that avx256-unaligned-store-3.c test reports
> two unexpected fails under "-march=cascadelake".
> 
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
> 
> BRs,
> Lin
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/94962
>   * gcc.target/i386/avx256-unaligned-store-3.c: Add -mno-avx512f
> ---
>  gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> b/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> index f909099bcb1..67635fb9e66 100644
> --- a/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> +++ b/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic -fno-common" } */
> +/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic -fno-common -mno-avx512f" } */
> 
>  #define N 1024
> 
> --
> 2.18.2

RE: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, October 11, 2022 9:59 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [x86] Add define_insn_and_split to support general
> version of "kxnor".
> 
> On Tue, Oct 11, 2022 at 04:03:16PM +0800, liuhongt via Gcc-patches wrote:
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (*notxor_1): New post_reload
> > define_insn_and_split.
> > (*notxorqi_1): Ditto.
> 
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -10826,6 +10826,39 @@ (define_insn "*_1"
> > (set_attr "type" "alu, alu, msklog")
> > (set_attr "mode" "")])
> >
> > +(define_insn_and_split "*notxor_1"
> > +  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
> > +   (not:SWI248
> > + (xor:SWI248
> > +   (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
> > +   (match_operand:SWI248 2 "" "r,,k"
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "ix86_binary_operator_ok (XOR, mode, operands)"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(parallel
> > +[(set (match_dup 0)
> > + (xor:SWI248 (match_dup 1) (match_dup 2)))
> > + (clobber (reg:CC FLAGS_REG))])
> > +   (set (match_dup 0)
> > +   (not:SWI248 (match_dup 1)))]
> > +{
> > +  if (MASK_REGNO_P (REGNO (operands[0])))
> 
> This causes --enable-checking=yes,rtl,extra regression on
> gcc.dg/store_merging_13.c test on x86_64-linux:
> .../gcc/testsuite/gcc.dg/store_merging_13.c: In function 'f13':
> .../gcc/testsuite/gcc.dg/store_merging_13.c:189:1: internal compiler error: RTL check: expected code 'reg', have 'mem' in rhs_regno, at rtl.h:1932
> 0x7b0c8f rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
> ../../gcc/rtl.cc:916
> 0x8e74be rhs_regno
> ../../gcc/rtl.h:1932
> 0x9785fd rhs_regno
> ./genrtl.h:120
> 0x9785fd gen_split_260(rtx_insn*, rtx_def**)
> ../../gcc/config/i386/i386.md:10846
> 0x23596dc split_insns(rtx_def*, rtx_insn*)
> ../../gcc/config/i386/i386.md:16392
> 0xfccd5a try_split(rtx_def*, rtx_insn*, int)
> ../../gcc/emit-rtl.cc:3799
> 0x132e9d8 split_insn
> ../../gcc/recog.cc:3384
> 0x13359d5 split_all_insns()
> ../../gcc/recog.cc:3488
> 0x1335ae8 execute
> ../../gcc/recog.cc:4412
> Please submit a full bug report, with preprocessed source (by using -freport-bug).
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> 
> Fixed thusly, tested on x86_64-linux, committed to trunk as obvious.
Thanks.
> 
> 2022-10-11  Jakub Jelinek  
> 
>   PR target/107185
>   * config/i386/i386.md (*notxor_1): Use MASK_REG_P (x)
> instead of
>   MASK_REGNO_P (REGNO (x)).
> 
> --- gcc/config/i386/i386.md.jj2022-10-11 12:10:42.188891134 +0200
> +++ gcc/config/i386/i386.md   2022-10-11 15:47:45.531449089 +0200
> @@ -10843,7 +10843,7 @@ (define_insn_and_split "*notxor_1"
> (set (match_dup 0)
>   (not:SWI248 (match_dup 0)))]
>  {
> -  if (MASK_REGNO_P (REGNO (operands[0])))
> +  if (MASK_REG_P (operands[0]))
>  {
>emit_insn (gen_kxnor (operands[0], operands[1], operands[2]));
>DONE;
> 
> 
>   Jakub
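
The difference between the two predicates in this fix can be sketched with a
toy model. Everything below is hypothetical (the struct, the names and the
register numbers are not GCC's); it only illustrates why the code of the
operand has to be tested before its register number is read.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_code { TOY_REG, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  unsigned int regno;	/* Only meaningful when code == TOY_REG.  */
};

/* Hypothetical register numbers for the mask registers.  */
#define TOY_FIRST_MASK_REG 68
#define TOY_LAST_MASK_REG 75

static bool
toy_mask_regno_p (unsigned int regno)
{
  return regno >= TOY_FIRST_MASK_REG && regno <= TOY_LAST_MASK_REG;
}

/* Checked accessor: like REGNO with RTL checking enabled, it refuses
   anything that is not a register.  */
static unsigned int
toy_regno (const struct toy_rtx *x)
{
  assert (x->code == TOY_REG);
  return x->regno;
}

/* Safe predicate in the spirit of MASK_REG_P: test the code first and
   only then look at the register number.  */
static bool
toy_mask_reg_p (const struct toy_rtx *x)
{
  return x->code == TOY_REG && toy_mask_regno_p (toy_regno (x));
}
```

Calling toy_mask_regno_p (toy_regno (x)) directly on a memory operand would
trip the assertion, which mirrors the checking failure reported in this
message.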

RE: [PATCH] Remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS

2022-10-11 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Cui, Lili 
> Sent: Wednesday, October 12, 2022 11:00 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com; Lu, Hongjiu
> 
> Subject: [PATCH] Remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS
> 
> Hi Hongtao,
> 
> This patch is to remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS.
> The new Intel ISE removes AVX512_VP2INTERSECT from SAPPHIRERAPIDS;
> AVX512_VP2INTERSECT is only supported in Tigerlake.
> 
> Hi Uros,
> 
> This patch is to remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS.
> The new Intel ISE removes AVX512_VP2INTERSECT from SAPPHIRERAPIDS;
> AVX512_VP2INTERSECT is only supported in Tigerlake.
> 
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
> 
> OK for master?
Yes, thanks.
> 
> 
> gcc/ChangeLog:
> 
>   * config/i386/driver-i386.cc (host_detect_local_cpu):
>   Move sapphirerapids out of AVX512_VP2INTERSECT.
>   * config/i386/i386.h: Remove AVX512_VP2INTERSECT from
> PTA_SAPPHIRERAPIDS
>   * doc/invoke.texi: Remove AVX512_VP2INTERSECT from
> SAPPHIRERAPIDS
> ---
>  gcc/config/i386/driver-i386.cc | 13 +
>  gcc/config/i386/i386.h |  7 +++
>  gcc/doc/invoke.texi|  8 
>  3 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc 
> index
> 3c702fdca33..ef567045c67 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -589,15 +589,12 @@ const char *host_detect_local_cpu (int argc, const
> char **argv)
> /* This is unknown family 0x6 CPU.  */
> if (has_feature (FEATURE_AVX))
>   {
> +   /* Assume Tiger Lake */
> if (has_feature (FEATURE_AVX512VP2INTERSECT))
> - {
> -   if (has_feature (FEATURE_TSXLDTRK))
> - /* Assume Sapphire Rapids.  */
> - cpu = "sapphirerapids";
> -   else
> - /* Assume Tiger Lake */
> - cpu = "tigerlake";
> - }
> + cpu = "tigerlake";
> +   /* Assume Sapphire Rapids.  */
> +   else if (has_feature (FEATURE_TSXLDTRK))
> + cpu = "sapphirerapids";
> /* Assume Cooper Lake */
> else if (has_feature (FEATURE_AVX512BF16))
>   cpu = "cooperlake";
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index
> 900a3bc3673..372a2cff8fe 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2326,10 +2326,9 @@ constexpr wide_int_bitmask PTA_ICELAKE_SERVER
> = PTA_ICELAKE_CLIENT  constexpr wide_int_bitmask PTA_TIGERLAKE =
> PTA_ICELAKE_CLIENT | PTA_MOVDIRI
>| PTA_MOVDIR64B | PTA_CLWB | PTA_AVX512VP2INTERSECT | PTA_KL |
> PTA_WIDEKL;  constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS =
> PTA_ICELAKE_SERVER | PTA_MOVDIRI
> -  | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD |
> PTA_CLDEMOTE
> -  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK |
> PTA_AMX_TILE
> -  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI |
> PTA_AVX512FP16
> -  | PTA_AVX512BF16;
> +  | PTA_MOVDIR64B | PTA_ENQCMD | PTA_CLDEMOTE | PTA_PTWRITE |
> + PTA_WAITPKG  | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE |
> + PTA_AMX_INT8 | PTA_AMX_BF16  | PTA_UINTR | PTA_AVXVNNI |
> + PTA_AVX512FP16 | PTA_AVX512BF16;
>  constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
>| PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
> constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE; diff --
> git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> 271c8bb8468..a9ecc4426a4 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -32057,11 +32057,11 @@ Intel sapphirerapids CPU with 64-bit extensions,
> MOVBE, MMX, SSE, SSE2, SSE3,  SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF,
> FXSR, AVX, XSAVE, PCLMUL, FSGSBASE,  RDRND, F16C, AVX2, BMI, BMI2, LZCNT,
> FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW,  AES, CLFLUSHOPT, XSAVEC,
> XSAVES, SGX, AVX512F, AVX512VL, AVX512BW, AVX512DQ, -AVX512CD, PKU,
> AVX512VBMI, AVX512IFMA, SHA, AVX512VNNI, GFNI, VAES, AVX512VBMI2
> +AVX512CD, PKU, AVX512VBMI, AVX512IFMA, SHA, AVX512VNNI, GFNI, VAES,
> +AVX512VBMI2,
>  VPCLMULQDQ, AVX512BITALG, RDPID, AVX512VPOPCNTDQ, PCONFIG,
> WBNOINVD, CLWB, -MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, ENQCMD,
> CLDEMOTE, PTWRITE, WAITPKG, -SERIALIZE, TSXLDTRK, UINTR, AMX-BF16,
> AMX-TILE, AMX-INT8, AVX-VNNI, AVX512FP16 -and AVX512BF16 instruction set
> support.
> +MOVDIRI, MOVDIR64B, ENQCMD, CLDEMOTE, PTWRITE, WAITPKG, SERIALIZE,
> +TSXLDTRK, UINTR, AMX-BF16, AMX-TILE, AMX-INT8, AVX-VNNI, AVX512FP16
> and
> +AVX512BF16 instruction set support.
> 
>  @item alderlake
>  Intel Alderlake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
> SSSE3,
> --
> 2.17.1
> 
> Thanks,
> Lili.
> Thanks
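
The new detection order in the driver-i386.cc hunk of this patch can be
sketched as a small stand-alone function. The function name, the boolean
parameters and the "unknown" fallback are hypothetical; the real code queries
has_feature () and continues with further checks that are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Guess the CPU name from three feature bits, in the order used after
   the patch: VP2INTERSECT now implies Tiger Lake on its own, and
   TSXLDTRK is checked separately for Sapphire Rapids.  */
static const char *
guess_cpu (bool has_vp2intersect, bool has_tsxldtrk, bool has_avx512bf16)
{
  if (has_vp2intersect)
    return "tigerlake";
  else if (has_tsxldtrk)
    return "sapphirerapids";
  else if (has_avx512bf16)
    return "cooperlake";
  return "unknown";	/* Placeholder for the remaining checks.  */
}
```

With this ordering a CPU that reports TSXLDTRK without VP2INTERSECT is
classified as sapphirerapids, which the old nested test could not do.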

RE: [PATCH] MAINTAINERS: Add myself for write after approval

2022-10-12 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Cui, Lili 
> Sent: Wednesday, October 12, 2022 3:50 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao 
> Subject: [PATCH] MAINTAINERS: Add myself for write after approval
> 
> Hi,
> 
> I want to add myself to MAINTAINERS for write after approval.
> 
> OK for master?
Obvious fixes can be committed without prior approval
(https://gcc.gnu.org/gitwrite.html).
This can be considered an obvious fix (but you still need to send the patch
out like this).
> 
> ChangeLog:
>   * MAINTAINERS (Write After Approval): Add myself.
> 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 11fa8bc6dbd..e4e7349a6d9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -377,6 +377,7 @@ Andrea Corallo
>   
>  Christian Cornelssen 
>  Ludovic Courtès  
>  Lawrence Crowl   
> +Lili Cui 
>  Ian Dall 
>  David Daney
>   
>  Robin Dapp   
> --
> 2.17.1

RE: [PATCH v3] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Wang, Hongyu 
> Sent: Tuesday, March 22, 2022 11:28 AM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch
> [PR 104978]
> 
> Hi, here is the patch with force_reg before lowpart_subreg.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> 
> Ok for master?
> 
> For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be
> ANDed with 1 to ensure the mask is bound to the lowest byte.
> Use masked vmovss to perform the same operation, which omits the higher bits
> of the mask.
> 
> gcc/ChangeLog:
> 
>   PR target/104978
>   * config/i386/sse.md
>   (avx512fp16_fmaddcsh_v8hf_mask1): Use avx512f_movsf_mask instead of
>   vmovaps or vblend, and force_reg before lowpart_subreg.
>   (avx512fp16_fcmaddcsh_v8hf_mask1): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/104978
>   * gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Adjust asm scan.
>   * gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
>   * gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: Removed.
>   * gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto.
>   * gcc.target/i386/pr104978.c: New test.
> 
> V3
> ---
>  gcc/config/i386/sse.md| 62 ++-
>  .../i386/avx512fp16-vfcmaddcsh-1a.c   |  4 +-
>  .../i386/avx512fp16-vfcmaddcsh-1c.c   | 13 
>  .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c |  4 +-
>   .../gcc.target/i386/avx512fp16-vfmaddcsh-1c.c | 13 
>  gcc/testsuite/gcc.target/i386/pr104978.c  | 18 ++
>  6 files changed, 42 insertions(+), 72 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c
>  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104978.c
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> 21bf3c55c95..6f7af2f21d6 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -6576,7 +6576,7 @@ (define_expand
> "avx512fp16_fmaddcsh_v8hf_mask1"
> (match_operand:QI 4 "register_operand")]
>"TARGET_AVX512FP16 && "
>  {
> -  rtx op0, op1;
> +  rtx op0, op1, dest;
> 
>if ()
>  emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask
> ( @@ -6586,26 +6586,15 @@ (define_expand
> "avx512fp16_fmaddcsh_v8hf_mask1"
>  emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (operands[0],
>operands[1], operands[2], operands[3], operands[4]));
> 
> -  if (TARGET_AVX512VL)
> -  {
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
> -  }
> -  else
> -  {
> -rtx mask, tmp, vec_mask;
> -mask = lowpart_subreg (SImode, operands[4], QImode),
> -tmp = gen_reg_rtx (SImode);
> -emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
> -vec_mask = gen_reg_rtx (V4SImode);
> -emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
> -emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
> -vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
> -  }
> +  op0 = lowpart_subreg (V4SFmode, force_reg (V8HFmode, operands[0]),
> + V8HFmode);
> +  if (!MEM_P (operands[1]))
> +operands[1] = force_reg (V8HFmode, operands[1]);
> +  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> +  dest = gen_reg_rtx (V4SFmode);
> +  emit_insn (gen_avx512f_movsf_mask (dest, op1, op0, op1,
> +operands[4]));
> +  emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest,
> +V4SFmode));
>DONE;
>  })
> 
> @@ -6631,7 +6620,7 @@ (define_expand
> "avx512fp16_fcmaddcsh_v8hf_mask1"
> (match_operand:QI 4 "register_operand")]
>"TARGET_AVX512FP16 && "
>  {
> -  rtx op0, op1;
> +  rtx op0, op1, dest;
> 
>if ()
>  emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask
> ( @@ -6641,26 +6630,15 @@ (define_expand
> "avx512fp16_fcmaddcsh_v8hf_mask1"
>  emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (operands[0],
>operands[1], operands[2], operands[3], operands[4]));
> 
> -  if (TARGET_AVX512VL)
> -  {
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode)

RE: [PATCH] docs: Document new param x86-stlf-window-ninsns.

2022-04-06 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Martin Liška 
> Sent: Wednesday, April 6, 2022 3:35 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao 
> Subject: [PATCH] docs: Document new param x86-stlf-window-ninsns.
> 
> Hi.
> 
> The patch documents the newly added parameter. One question I have is if it's
> fine listing it under 'i386 and x86_64 targets'?
Yes, thanks.
> 
> Cheers,
> Martin
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi: Document it.
> ---
>   gcc/doc/invoke.texi | 8 
>   1 file changed, 8 insertions(+)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> 3936aef69d0..1a51759e6e4 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15247,6 +15247,14 @@ loop.  The default value is four.
> 
>   @end table
> 
> +The following choices of @var{name} are available on i386 and x86_64 targets:
> +
> +@table @gcctabopt
> +@item x86-stlf-window-ninsns
> +Instructions number above which STFL stall penalty can be compensated.
> +
> +@end table
> +
>   @end table
> 
>   @node Instrumentation Options
> --
> 2.35.1

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Liu, Hongtao via Gcc-patches




>-Original Message-
>From: Kong, Lingling 
>Sent: Wednesday, November 24, 2021 2:25 PM
>To: Liu, Hongtao ; gcc-patches@gcc.gnu.org
>Cc: Kong, Lingling 
>Subject: RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert
>_Float16 to SFmode with -mf16c [PR 102811]
>
>Hi,
>
>vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
>-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
>And cleared before conversion, updated  movhi_internal and
>ix86_can_change_mode_class.
>
>OK for master?
>
>gcc/ChangeLog:
>
>   PR target/102811
>   * config/i386/i386.c (ix86_can_change_mode_class): SSE2 can load
>16bit data
>   to sse register via pinsrw.
>   * config/i386/i386.md (extendhfsf2): Add extendhfsf2 for f16c.
>   (extendhfdf2): Split extendhf2 into separate extendhfsf2,
>extendhfdf2.
>   extendhfdf only for target_avx512fp16.
>   (*extendhf2):rename extendhf2.
>   (truncsfhf2): Likewise.
>   (truncdfhf2): Likewise.
>   (*trunc2): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>   PR target/102811
>   * gcc.target/i386/pr90773-21.c: Optimized movhi_internal,
>   optimize vmovd + movw to vpextrw.
>   * gcc.target/i386/pr90773-23.c: Ditto.
>   * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
>---
> gcc/config/i386/i386.c|  5 +-
> gcc/config/i386/i386.md   | 74 +--
> .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
> gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
> gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
> 5 files changed, 83 insertions(+), 11 deletions(-)  create mode 100644
>gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
>
>diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index
>e94efdf39fb..4b813533961 100644
>--- a/gcc/config/i386/i386.c
>+++ b/gcc/config/i386/i386.c
>@@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode
>from, machine_mode to,
>disallow a change to these modes, reload will assume it's ok to
>drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>the vec_dupv4hi pattern.
>-   NB: AVX512FP16 supports vmovw which can load 16bit data to sse
>-   register.  */
>-  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ?
>2 : 4;
>+   NB: SSE2 can load 16bit data to sse register via pinsrw.  */
>+  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 :
>+4;
>   if (GET_MODE_SIZE (from) < mov_size)
>   return false;
> }
>diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index
>6eb9de81921..6ee264f1151 100644
>--- a/gcc/config/i386/i386.md
>+++ b/gcc/config/i386/i386.md
>@@ -2525,6 +2525,16 @@
> case TYPE_SSEMOV:
>   return ix86_output_ssemov (insn, operands);
>
>+case TYPE_SSELOG:
>+  if (SSE_REG_P (operands[0]))
>+  return MEM_P (operands[1])
>+? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
>+: "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
>+  else
>+  return MEM_P (operands[1])
>+? "pextrw\t{$0, %1, %0|%0, %1, 0}"
>+: "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
>+
> case TYPE_MSKLOG:
>   if (operands[1] == const0_rtx)
>   return "kxorw\t%0, %0, %0";
>@@ -2540,13 +2550,17 @@
> }
> }
>   [(set (attr "isa")
>-  (cond [(eq_attr "alternative" "9,10,11,12,13")
>-(const_string "avx512fp16")
>+  (cond [(eq_attr "alternative" "9,10,11,12")
>+(const_string "sse2")
>+ (eq_attr "alternative" "13")
>+(const_string "sse4")
>  ]
>  (const_string "*")))
>(set (attr "type")
>  (cond [(eq_attr "alternative" "9,10,11,12,13")
>-(const_string "ssemov")
>+(if_then_else (match_test "TARGET_AVX512FP16")
>+  (const_string "ssemov")
>+  (const_string "sselog"))
>   (eq_attr "alternative" "4,5,6,7")
> (const_string "mskmov")
>   (eq_attr "alternative" "8")
>@@ -4574,8 +4588,32 @@
>   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
> })
>
>-(define_insn "extendhf2"
>-  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
>+(define_expand "extendhfsf2"
>+  [(set (match_operand:S

RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-30 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Kong, Lingling 
> Sent: Friday, October 28, 2022 4:57 PM
> To: Hongtao Liu 
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; Jiang,
> Haochen 
> Subject: RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT
> 
> Hi,
> 
> Because we switched the intrinsics for avx512bf16 to the new type __bf16, we
> can now use m128/256bh for the vector bf16 type instead of m128/256bf16.
> The builtins for avx512bf16/avxneconvert are unified as well.
Ok.
> 
> Thanks,
> Lingling
> 
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: Tuesday, October 25, 2022 1:23 PM
> > To: Kong, Lingling 
> > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org;
> > Jiang, Haochen 
> > Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
> >
> > On Mon, Oct 24, 2022 at 2:20 PM Kong, Lingling
> > 
> > wrote:
> > >
> > > > From: Gcc-patches
> > > > 
> > > > On Behalf Of Hongtao Liu via Gcc-patches
> > > > Sent: Monday, October 17, 2022 1:47 PM
> > > > To: Jiang, Haochen 
> > > > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> > > > Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
> > > >
> > > > On Fri, Oct 14, 2022 at 3:58 PM Haochen Jiang via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > From: Kong Lingling 
> > > > > +(define_insn "vbcstne2ps_"
> > > > > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > > > > +(vec_duplicate:VF1_128_256
> > > > > + (unspec:SF
> > > > > +  [(match_operand:HI 1 "memory_operand" "m")]
> > > > > +  VBCSTNE)))]
> > > > > +  "TARGET_AVXNECONVERT"
> > > > > +  "vbcstne2ps\t{%1, %0|%0, %1}"
> > > > > +  [(set_attr "prefix" "vex")
> > > > > +  (set_attr "mode" "")])
> > > > Since Jakub has added bf16 software emulation support, can we rewrite
> > > > it with generic RTL, without unspec?
> > > > Like (float_extend:SF (match_operand:BF "memory_operand" "m"))
> > > > > +
> > > > > +(define_int_iterator VCVTNEBF16
> > > > > +  [UNSPEC_VCVTNEEBF16SF
> > > > > +   UNSPEC_VCVTNEOBF16SF])
> > > > > +
> > > > > +(define_int_attr vcvtnebf16type
> > > > > +  [(UNSPEC_VCVTNEEBF16SF "ebf16")
> > > > > +   (UNSPEC_VCVTNEOBF16SF "obf16")])
> > > > > +(define_insn "vcvtne2ps_"
> > > > > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > > > > +(unspec:VF1_128_256
> > > > > +  [(match_operand: 1 "memory_operand" "m")]
> > > > > + VCVTNEBF16))]
> > > > > +  "TARGET_AVXNECONVERT"
> > > > > +  "vcvtne2ps\t{%1, %0|%0, %1}"
> > > > > +  [(set_attr "prefix" "vex")
> > > > > +   (set_attr "mode" "")])
> > > > Similar for this one and all those patterns below.
> > >
> > > That's great! Thanks for the review!
> > > I have now rewritten it without unspec, using float_extend for the new define_insn.
> > Ok.
> > >
> > > Thanks
> > > Lingling
> > >
> > >
> >
> >
> > --
> > BR,
> > Hongtao
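
The float_extend suggestion above is natural because bf16 is the upper 16 bits of an IEEE binary32 value, so widening it to SF is always exact. A minimal C sketch of that conversion followed by a broadcast, roughly what the rewritten broadcast pattern describes. The helper names are hypothetical, `float` is assumed to be binary32, and this is not the GCC implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen one bf16 value (given as raw bits) to float.
   bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of
   binary32, so the extension is a 16-bit left shift and is always exact.  */
float bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float result;
  memcpy (&result, &wide, sizeof result);
  return result;
}

/* Hypothetical helper: model (vec_duplicate (float_extend:SF (mem:BF)))
   for a 4-lane vector by converting once and copying to every lane.  */
void broadcast_bf16 (const uint16_t *src, float dst[4])
{
  float value = bf16_bits_to_float (*src);
  for (int i = 0; i < 4; i++)
    dst[i] = value;
}

/* Small self-check: 0x3F80 is the bf16 encoding of 1.0.  */
void check_bf16_broadcast (void)
{
  uint16_t one = 0x3F80;
  float lanes[4];
  broadcast_bf16 (&one, lanes);
  for (int i = 0; i < 4; i++)
    assert (lanes[i] == 1.0f);
}
```

Because the conversion is a plain bit shift, no rounding mode is involved, which is what makes a generic float_extend a faithful description of the instruction.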

RE: [PATCH] Optimize vpermtiw/b to vpunpcklqdq for certain cases.

2022-05-13 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, May 13, 2022 4:15 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Optimize vpermtiw/b to vpunpcklqdq for certain cases.
> 
> On Fri, May 13, 2022 at 9:16 AM liuhongt  wrote:
> >
> > Assembly Optimization like:
> > -   vmovq   %xmm0, %xmm2
> > -   vmovdqa .LC0(%rip), %xmm0
> > vmovq   %xmm1, %xmm1
> > -   vpermi2w%xmm1, %xmm2, %xmm0
> > +   vmovq   %xmm0, %xmm0
> > +   vpunpcklqdq %xmm1, %xmm0, %xmm0
> >
> > ...
> >
> > -.LC0:
> > -   .value  0
> > -   .value  1
> > -   .value  2
> > -   .value  3
> > -   .value  8
> > -   .value  9
> > -   .value  10
> > -   .value  11
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/105033
> > * config/i386/sse.md (*vec_concatv4si): Extend to ..
> > (*vec_concat): .. V16QI and V8HImode.
> > (*vec_concatv16qi_permt2): New pre_reload define_insn_and_split.
> > (*vec_concatv8hi_permt2): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr105033.c: New test.
> > ---
> >  gcc/config/i386/sse.md   | 62 ++--
> >  gcc/testsuite/gcc.target/i386/pr105033.c | 27 +++
> >  2 files changed, 84 insertions(+), 5 deletions(-)  create mode 100644
> > gcc/testsuite/gcc.target/i386/pr105033.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > a63df0d0b1f..2e417e47d20 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -19600,11 +19600,11 @@ (define_insn "*vec_concatv2si"
> > (set_attr "type" "sselog,ssemov,sselog,ssemov,mmxcvt,mmxmov")
> > (set_attr "mode" "TI,TI,V4SF,SF,DI,DI")])
> >
> > -(define_insn "*vec_concatv4si"
> > -  [(set (match_operand:V4SI 0 "register_operand"   "=x,v,x,x,v")
> > -   (vec_concat:V4SI
> > - (match_operand:V2SI 1 "register_operand" " 0,v,0,0,v")
> > - (match_operand:V2SI 2 "nonimmediate_operand" " x,v,x,m,m")))]
> > +(define_insn "*vec_concat"
> > +  [(set (match_operand:VI124_128 0 "register_operand"   "=x,v,x,x,v")
> > +   (vec_concat:VI124_128
> > + (match_operand: 1 "register_operand" " 0,v,0,0,v")
> > + (match_operand: 2 "nonimmediate_operand" " x,v,x,m,m")))]
> >"TARGET_SSE"
> >"@
> > punpcklqdq\t{%2, %0|%0, %2}
> > @@ -19617,6 +19617,58 @@ (define_insn "*vec_concatv4si"
> > (set_attr "prefix" "orig,maybe_evex,orig,orig,maybe_evex")
> > (set_attr "mode" "TI,TI,V4SF,V2SF,V2SF")])
> >
> > +(define_insn_and_split "*vec_concatv16qi_permt2"
> > +  [(set (match_operand:V16QI 0 "register_operand")
> > +   (unspec:V16QI
> > + [(const_vector:V16QI [(const_int 0) (const_int 1)
> > +   (const_int 2) (const_int 3)
> > +   (const_int 4) (const_int 5)
> > +   (const_int 6) (const_int 7)
> > +   (const_int 16) (const_int 17)
> > +   (const_int 18) (const_int 19)
> > +   (const_int 20) (const_int 21)
> > +   (const_int 22) (const_int 23)])
> > +  (match_operand:V16QI 1 "register_operand")
> > +  (match_operand:V16QI 2 "nonimmediate_operand")]
> > + UNSPEC_VPERMT2))]
> > +  "TARGET_AVX512VL && TARGET_AVX512VBMI"
> 
> You need "&& ix86_pre_reload_split ()" here, because a pseudo can be
> generated via force_reg.
> 
will change.
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0)
> > +   (vec_concat:V16QI (match_dup 1) (match_dup 2)))] {
> > +  operands[1] = lowpart_subreg (V8QImode,
> > +   force_reg (V16QImode, operands[1]),
> > +   V16QImode);
> > +  if (!MEM_P (operands[2]))
> > +operands[2] = force_reg (V16QImode, operands[2]);
> 
>
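
Why the permute can become an unpack: with the selector {0, 1, 2, 3, 8, 9, 10, 11} from the removed .LC0 constant, a two-source word permute just takes the low four words of each input, which is the concatenation of the two low 64-bit halves. A minimal C sketch using plain arrays instead of intrinsics. The function names are hypothetical and the permute model assumes selector values 0..7 pick from the first source and 8..15 from the second:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a two-source 8 x 16-bit permute: bit 3 of each selector
   chooses the source, the low 3 bits choose the element.  */
void permute2_w (const uint16_t a[8], const uint16_t b[8],
                 const uint8_t sel[8], uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (sel[i] & 8) ? b[sel[i] & 7] : a[sel[i] & 7];
}

/* Model of an unpack-low-quadword: low 64 bits of a, then low 64 bits
   of b.  */
void unpack_low_qword (const uint16_t a[8], const uint16_t b[8],
                       uint16_t out[8])
{
  memcpy (out, a, 4 * sizeof (uint16_t));
  memcpy (out + 4, b, 4 * sizeof (uint16_t));
}

/* Self-check: both routes give the same vector for this selector.  */
void check_permute_is_unpack (void)
{
  const uint16_t a[8] = {10, 11, 12, 13, 14, 15, 16, 17};
  const uint16_t b[8] = {20, 21, 22, 23, 24, 25, 26, 27};
  const uint8_t sel[8] = {0, 1, 2, 3, 8, 9, 10, 11};
  uint16_t via_permute[8], via_unpack[8];

  permute2_w (a, b, sel, via_permute);
  unpack_low_qword (a, b, via_unpack);
  assert (memcmp (via_permute, via_unpack, sizeof via_permute) == 0);
}
```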

RE: gcc-wwwdocs branch master updated. 88e29096c36837553fc841bd1fa5df6caa776b44

2020-11-05 Thread Liu, Hongtao via Gcc-patches




>-Original Message-
>From: Gerald Pfeifer 
>Sent: Friday, November 6, 2020 5:57 AM
>To: Hongtao Liu ; hongtao Liu
>
>Cc: gcc-patches@gcc.gnu.org
>Subject: Re: gcc-wwwdocs branch master updated.
>88e29096c36837553fc841bd1fa5df6caa776b44
>
>On Thu, 29 Oct 2020, hongtao Liu via Gcc-cvs-wwwdocs wrote:
>> The branch, master has been updated
>>via  88e29096c36837553fc841bd1fa5df6caa776b44 (commit)
>>   from  053c956f6e9c71efac5be01f8a8ba79f15d87f4b (commit)
>
>>GCC now supports the Intel CPU named Alderlake through
>>  -march=alderlake.
>> -The switch enables the CLDEMOTE PTWRITE WAITPKG SERIALIZE ISA
>extensions.
>> +The switch enables the CLDEMOTE PTWRITE WAITPKG SERIALIZE
>KEYLOCKER
>> +ISA extensions.
>
>I did not see this posted on gcc-patches.  Should this list of extensions be
>separated by commas?
>
>(I can make that change if you agree.)
>

Yes, thanks for that.
Patch for adding -march=alderlake  
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
Patch for Keylocker  
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556026.html

>Also, I did not see you in gcc/MAINTAINERS, or did miss it?
>Since evidently you have write after approval access, please add yourself
>there.
>

Will do.

>Gerald

RE: gcc-wwwdocs branch master updated. 88e29096c36837553fc841bd1fa5df6caa776b44

2020-11-05 Thread Liu, Hongtao via Gcc-patches




>-Original Message-
>From: Liu, Hongtao
>Sent: Friday, November 6, 2020 9:22 AM
>To: Gerald Pfeifer ; Hongtao Liu ;
>hongtao Liu 
>Cc: gcc-patches@gcc.gnu.org
>Subject: RE: gcc-wwwdocs branch master updated.
>88e29096c36837553fc841bd1fa5df6caa776b44
>
>
>
>>-Original Message-
>>From: Gerald Pfeifer 
>>Sent: Friday, November 6, 2020 5:57 AM
>>To: Hongtao Liu ; hongtao Liu
>>
>>Cc: gcc-patches@gcc.gnu.org
>>Subject: Re: gcc-wwwdocs branch master updated.
>>88e29096c36837553fc841bd1fa5df6caa776b44
>>
>>On Thu, 29 Oct 2020, hongtao Liu via Gcc-cvs-wwwdocs wrote:
>>> The branch, master has been updated
>>>via  88e29096c36837553fc841bd1fa5df6caa776b44 (commit)
>>>   from  053c956f6e9c71efac5be01f8a8ba79f15d87f4b (commit)
>>
>>>GCC now supports the Intel CPU named Alderlake through
>>>  -march=alderlake.
>>> -The switch enables the CLDEMOTE PTWRITE WAITPKG SERIALIZE ISA
>>extensions.
>>> +The switch enables the CLDEMOTE PTWRITE WAITPKG SERIALIZE
>>KEYLOCKER
>>> +ISA extensions.
>>
>>I did not see this posted on gcc-patches.  Should this list of
>>extensions be separated by commas?
>>

I realize you're talking about the patch for gcc-wwwdocs.
No, I didn't send out a patch, sorry for that; I will do so for future commits.
  
>>(I can make that change if you agree.)
>>
>
>Yes, thanks for that.
>Patch for adding -march=alderlake
>https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
>Patch for Keylocker
>https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556026.html
>
>>Also, I did not see you in gcc/MAINTAINERS, or did miss it?
>>Since evidently you have write after approval access, please add
>>yourself there.
>>
>
>Will do.
>
>>Gerald

RE: [PATCH] AVX512FP16: Support cond_op for HFmode

2021-09-23 Thread Liu, Hongtao via Gcc-patches




>-Original Message-
>From: Wang, Hongyu 
>Sent: Thursday, September 23, 2021 5:16 PM
>To: Liu, Hongtao 
>Cc: gcc-patches@gcc.gnu.org
>Subject: [PATCH] AVX512FP16: Support cond_op for HFmode
>
>Hi,
>
>This patch extends the expanders for cond_op to support vector HF modes.
>Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Do runtime tests pass on sde{-m32,}?
>Ok for master?
>
>gcc/ChangeLog:
>
>   * config/i386/sse.md (cond_): Extend to support
>   vector HFmodes.
>   (cond_mul): Likewise.
>   (cond_div): Likewise.
>   (cond_): Likewise.
>   (cond_fma): Likewise.
>   (cond_fms): Likewise.
>   (cond_fnma): Likewise.
>   (cond_fnms): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>   * gcc.target/i386/cond_op_addsubmuldiv__Float16-1.c: New test.
>   * gcc.target/i386/cond_op_addsubmuldiv__Float16-2.c: Ditto.
>   * gcc.target/i386/cond_op_fma__Float16-1.c: Ditto.
>   * gcc.target/i386/cond_op_fma__Float16-2.c: Ditto.
>   * gcc.target/i386/cond_op_maxmin__Float16-1.c: Ditto.
>   * gcc.target/i386/cond_op_maxmin__Float16-2.c: Ditto.
>---
> gcc/config/i386/sse.md| 112 +-
> .../i386/cond_op_addsubmuldiv__Float16-1.c|   9 ++
> .../i386/cond_op_addsubmuldiv__Float16-2.c|   7 ++
> .../gcc.target/i386/cond_op_fma__Float16-1.c  |  20 
> .../gcc.target/i386/cond_op_fma__Float16-2.c  |   7 ++
> .../i386/cond_op_maxmin__Float16-1.c  |   8 ++
> .../i386/cond_op_maxmin__Float16-2.c  |   6 +
> 7 files changed, 113 insertions(+), 56 deletions(-)  create mode 100644
>gcc/testsuite/gcc.target/i386/cond_op_addsubmuldiv__Float16-1.c
> create mode 100644
>gcc/testsuite/gcc.target/i386/cond_op_addsubmuldiv__Float16-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/cond_op_fma__Float16-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/cond_op_fma__Float16-2.c
> create mode 100644
>gcc/testsuite/gcc.target/i386/cond_op_maxmin__Float16-1.c
> create mode 100644
>gcc/testsuite/gcc.target/i386/cond_op_maxmin__Float16-2.c
>
>diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
>1ca95984afc..c2eeb7b1517 100644
>--- a/gcc/config/i386/sse.md
>+++ b/gcc/config/i386/sse.md
>@@ -2118,12 +2118,12 @@
>   [(set_attr "isa" "noavx,noavx,avx,avx")])
>
> (define_expand "cond_"
>-  [(set (match_operand:VF 0 "register_operand")
>-  (vec_merge:VF
>-(plusminus:VF
>-  (match_operand:VF 2 "vector_operand")
>-  (match_operand:VF 3 "vector_operand"))
>-(match_operand:VF 4 "nonimm_or_0_operand")
>+  [(set (match_operand:VFH 0 "register_operand")
>+  (vec_merge:VFH
>+(plusminus:VFH
>+  (match_operand:VFH 2 "vector_operand")
>+  (match_operand:VFH 3 "vector_operand"))
>+(match_operand:VFH 4 "nonimm_or_0_operand")
> (match_operand: 1 "register_operand")))]
>   " == 64 || TARGET_AVX512VL"
> {
>@@ -2207,12 +2207,12 @@
>(set_attr "mode" "")])
>
> (define_expand "cond_mul"
>-  [(set (match_operand:VF 0 "register_operand")
>-  (vec_merge:VF
>-(mult:VF
>-  (match_operand:VF 2 "vector_operand")
>-  (match_operand:VF 3 "vector_operand"))
>-(match_operand:VF 4 "nonimm_or_0_operand")
>+  [(set (match_operand:VFH 0 "register_operand")
>+  (vec_merge:VFH
>+(mult:VFH
>+  (match_operand:VFH 2 "vector_operand")
>+  (match_operand:VFH 3 "vector_operand"))
>+(match_operand:VFH 4 "nonimm_or_0_operand")
> (match_operand: 1 "register_operand")))]
>   " == 64 || TARGET_AVX512VL"
> {
>@@ -2322,12 +2322,12 @@
> })
>
> (define_expand "cond_div"
>-  [(set (match_operand:VF 0 "register_operand")
>-  (vec_merge:VF
>-(div:VF
>-  (match_operand:VF 2 "register_operand")
>-  (match_operand:VF 3 "vector_operand"))
>-(match_operand:VF 4 "nonimm_or_0_operand")
>+  [(set (match_operand:VFH 0 "register_operand")
>+  (vec_merge:VFH
>+(div:VFH
>+  (match_operand:VFH 2 "register_operand")
>+  (match_operand:VFH 3 "vector_operand"))
>+(match_operand:VFH 4 "nonimm_or_0_operand")
> (match_operand: 1 "register_operand")))]
>   " == 64 || TARGET_AVX512VL"
> {
>@@ -2660,12 +2660,12 @@
>(set_at
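
For readers unfamiliar with the cond_ expanders touched above: each one describes a vec_merge of an arithmetic result with a fallback vector under a mask. A minimal C sketch of that per-lane behaviour for addition, assuming 8 lanes and using `float` in place of the HF element type so it builds anywhere. The function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Model of (vec_merge (plus a b) fallback mask): lane i takes a[i] + b[i]
   when bit i of the mask is set, otherwise it keeps fallback[i].  */
void cond_add (uint8_t mask, const float a[8], const float b[8],
               const float fallback[8], float out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = ((mask >> i) & 1) ? a[i] + b[i] : fallback[i];
}

/* Self-check with lanes 0 and 2 enabled.  */
void check_cond_add (void)
{
  const float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  const float b[8] = {10, 10, 10, 10, 10, 10, 10, 10};
  const float fallback[8] = {0, 0, 0, 0, 0, 0, 0, 0};
  float out[8];

  cond_add (0x05, a, b, fallback, out);
  assert (out[0] == 11.0f);
  assert (out[1] == 0.0f);
  assert (out[2] == 13.0f);
  assert (out[3] == 0.0f);
}
```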

RE: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-03 Thread Liu, Hongtao via Gcc-patches




>-Original Message-
>From: Segher Boessenkool 
>Sent: Thursday, June 3, 2021 4:46 AM
>To: Richard Biener 
>Cc: Liu, Hongtao ; GCC Patches patc...@gcc.gnu.org>
>Subject: Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not
>(vec_duplicate A)).
>
>Hi!
>
>On Wed, Jun 02, 2021 at 09:07:35AM +0200, Richard Biener wrote:
>> On Wed, Jun 2, 2021 at 7:41 AM liuhongt via Gcc-patches
>>  wrote:
>> > For i386, it will enable below opt
>> >
>> > from
>> > notl%edi
>> > vpbroadcastd%edi, %xmm0
>> > vpand   %xmm1, %xmm0, %xmm0
>> > to
>> > vpbroadcastd%edi, %xmm0
>> > vpandn   %xmm1, %xmm0, %xmm0
>>
>> There will be cases where (vec_duplicate (not A)) is better than (not
>> (vec_duplicate A)), so I'm not sure it is a good idea to forcefully
>> canonicalize unary operations.
>
>It is two unaries in sequence, where the order does not matter either.
>As in all such cases you either have to handle both cases everywhere, or have
>a canonical order.
>
>> I suppose the
>> simplification happens inside combine
>
>combine uses simplify-rtx for most cases (it is part of combine, but used in
>quite a few other places these days).
>
>> - doesn't combine
>> already have code to try variants of an expression and isn't this a
>> good candidate that can be added there, avoiding the canonicalization?
>
>As I mentioned, this is done in simplify-rtx in cases that do not have a
>canonical representation.  This is critical because it prevents loops.
>
>A very typical example is how UMIN is optimised:
>
>   case UMIN:
>  if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0))
>   return op1;
>  if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0))
>   return op0;
>  tem = simplify_associative_operation (code, mode, op0, op1);
>  if (tem)
>   return tem;
>  break;
>
>(the stuff using "tem").
>
>Hongtao, can we do something similar here?  Does that work well?  Please try
>it out :-)

In simplify_rtx no simplification occurs; the only difference is between
(vec_duplicate (not REG)) and (not (vec_duplicate REG)), so here tem will
only be 0.
Basically we don't know it's a simplification until combine successfully does
the 3->2 combination (not + broadcast + and to andnot + broadcast), but it's
pretty awkward to do this in combine.

Considering that andnot exists in many backends, I think a canonicalization is
needed here.
Maybe we can add an insn canonicalization that transforms
(and (vec_duplicate (not A)) B) to (and (not (vec_duplicate A)) B), instead of
canonicalizing (vec_duplicate (not A)) to (not (vec_duplicate A))?

>
>
>Segher
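
The two instruction sequences discussed in this message compute the same value, which is why either form is a valid canonical one. A minimal C sketch over four 32-bit lanes that checks the equivalence. The function names are hypothetical and the vectors are modelled as plain arrays rather than with intrinsics:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Original sequence: complement the scalar, broadcast it, then AND
   (notl + vpbroadcastd + vpand).  */
void not_broadcast_and (uint32_t a, const uint32_t b[LANES],
                        uint32_t out[LANES])
{
  uint32_t not_a = ~a;
  for (int i = 0; i < LANES; i++)
    out[i] = not_a & b[i];
}

/* Optimized sequence: broadcast the scalar, then and-not
   (vpbroadcastd + vpandn).  */
void broadcast_andnot (uint32_t a, const uint32_t b[LANES],
                       uint32_t out[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      uint32_t splat = a;
      out[i] = ~splat & b[i];
    }
}

/* Self-check on one arbitrary input.  */
void check_andnot_equivalence (void)
{
  const uint32_t b[LANES] = {0xFFFFFFFFu, 0x0F0F0F0Fu, 0x12345678u, 0u};
  uint32_t x[LANES], y[LANES];

  not_broadcast_and (0x00FF00FFu, b, x);
  broadcast_andnot (0x00FF00FFu, b, y);
  assert (memcmp (x, y, sizeof x) == 0);
  assert (x[0] == 0xFF00FF00u);
}
```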

RE: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints.

2021-06-03 Thread Liu, Hongtao via Gcc-patches



>-Original Message-
>From: Jakub Jelinek 
>Sent: Thursday, June 3, 2021 9:49 PM
>To: Liu, Hongtao 
>Cc: gcc-patches@gcc.gnu.org
>Subject: Re: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints.
>
>On Thu, Jun 03, 2021 at 05:07:26PM +0800, liuhongt via Gcc-patches wrote:
>> @@ -18163,10 +18163,10 @@ (define_expand "v16qiv16si2"
>>"TARGET_AVX512F")
>>
>>  (define_insn "avx2_v8qiv8si2"
>> -  [(set (match_operand:V8SI 0 "register_operand" "=v")
>> +  [(set (match_operand:V8SI 0 "register_operand" "=Yv")
>>  (any_extend:V8SI
>>(vec_select:V8QI
>> -(match_operand:V16QI 1 "register_operand" "v")
>> +(match_operand:V16QI 1 "register_operand" "Yv")
>>  (parallel [(const_int 0) (const_int 1)
>> (const_int 2) (const_int 3)
>> (const_int 4) (const_int 5)
>
>Why do you need this change (and similarly other v -> Yv changes)?
>I mean, ix86_hard_regno_mode_ok for TARGET_AVX512F
>&& !TARGET_AVX512VL should return false for the 16-byte and 32-byte vector
>modes.
>
>The reason to use Yv is typically where the match_operand has 64-byte vector
>mode or scalar mode, yet it needs an AVX512VL instruction.
>
>The changes to use Yw look ok, that is for the cases where the insn requires
>both AVX512VL and AVX512BW, while ix86_hard_regno_mode_ok ensures
>the xmm16+ regs won't be used for the 16/32-byte vectors when AVX512VL is
>not on, it doesn't ensure that AVX512BW will be enabled.
Thanks for the review.
Yes, you're right, AVX512VL parts are already guaranteed by 
ix86_hard_regno_mode_ok.

Here is updated patch.
>
>   Jakub



0001-i386-Fix-ICE-of-insn-does-not-satisfy-its-constraint_v2.patch
Description: 0001-i386-Fix-ICE-of-insn-does-not-satisfy-its-constraint_v2.patch

RE: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-03 Thread Liu, Hongtao via Gcc-patches




>-Original Message-
>From: Segher Boessenkool 
>Sent: Friday, June 4, 2021 4:00 AM
>To: Liu, Hongtao 
>Cc: Richard Biener ; GCC Patches patc...@gcc.gnu.org>
>Subject: Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not
>(vec_duplicate A)).
>
>On Thu, Jun 03, 2021 at 11:03:43AM +, Liu, Hongtao wrote:
>> >A very typical example is how UMIN is optimised:
>> >
>> >   case UMIN:
>> >  if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0))
>> >return op1;
>> >  if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0))
>> >return op0;
>> >  tem = simplify_associative_operation (code, mode, op0, op1);
>> >  if (tem)
>> >return tem;
>> >  break;
>> >
>> >(the stuff using "tem").
>> >
>> >Hongtao, can we do something similar here?  Does that work well?
>> >Please try it out :-)
>>
>> In simplify_rtx, no simplification occurs, there is just the difference
>> between (vec_duplicate (not REG)) and (not (vec_duplicate REG)). So here
>> tem will only be 0.
>
>simplify-rtx is used by combine.  When you do and+not+splat for example my
>suggestion should kick in.  Try it out, don't just dismiss it?
>
Forgive my obtuseness; do you mean trying the following change? If so, nothing
will "kick in": temp will be 0 and there is no simplification here, since the
only difference is between (vec_duplicate (not REG)) and
(not (vec_duplicate REG)). Or maybe you mean something else?

@@ -1708,6 +1708,17 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 #endif
   break;

+  /* Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).  */
+case VEC_DUPLICATE:
+  if (GET_CODE (op) == NOT)
+   {
+ rtx vec_dup = gen_rtx_VEC_DUPLICATE (mode, XEXP (op, 0));
+ temp = simplify_unary_operation (NOT, mode, vec_dup, GET_MODE (op));
+ if (temp)
+   return temp;
+   }
+  break;
+
>> Basically we don't know it's a simplification until combine successfully
>> split the 3->2 instructions (not + broadcast + and to andnot + broadcast),
>> but it's pretty awkward to do this in combine.
>
>But you need to do this *before* it is split.  That is the whole point.
>
>> Considering that andnot exists in many backends, I think a canonicalization is
>needed here.
>
>Please do note that that is not as easy as you may think: you need to make
>sure nothing ever creates non-canonical code.
>
>> Maybe we can add insn canonicalization for transforming (and
>> (vec_duplicate (not A)) B) to (and (not (vec_duplicate A)) B) instead of
>(vec_duplicate (not A)) to (not (vec_duplicate A))?
>
>I don't understand what this means?
I mean let's take one last shot at andnot in the AND case, like below:

@@ -3702,6 +3702,16 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
   tem = simplify_associative_operation (code, mode, op0, op1);
   if (tem)
return tem;
+
+  if (GET_CODE (op0) == VEC_DUPLICATE
+ && GET_CODE (XEXP (op0, 0)) == NOT)
+   {
+ rtx vec_dup = gen_rtx_VEC_DUPLICATE (GET_MODE (op0),
+  XEXP (XEXP (op0, 0), 0));
+ return simplify_gen_binary (AND, mode,
+ gen_rtx_NOT (mode, vec_dup),
+ op1);
+   }
   break;
>
>
>Segher
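
To make the shape of the proposed AND-case rewrite concrete, here is a toy model in C. It is only a sketch: the `struct expr` tree and its codes are hypothetical stand-ins for rtx, machine modes are ignored entirely, and nothing here uses the real simplify-rtx API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression codes; a stand-in for GCC's rtx codes.  */
enum code { REG, NOT, VEC_DUPLICATE, AND };

struct expr
{
  enum code code;
  struct expr *op0;   /* First operand, NULL for REG.  */
  struct expr *op1;   /* Second operand, only used by AND.  */
};

/* If E is (and (vec_duplicate (not A)) B), rewrite it in place to
   (and (not (vec_duplicate A)) B) and return 1; otherwise return 0.
   The two inner nodes are reused with their codes swapped.  */
int canonicalize_and (struct expr *e)
{
  if (e->code != AND
      || e->op0->code != VEC_DUPLICATE
      || e->op0->op0->code != NOT)
    return 0;
  e->op0->code = NOT;
  e->op0->op0->code = VEC_DUPLICATE;
  return 1;
}

/* Self-check: the rewrite fires once and then leaves the tree alone.  */
void check_canonicalize_and (void)
{
  struct expr a = { REG, NULL, NULL };
  struct expr b = { REG, NULL, NULL };
  struct expr not_a = { NOT, &a, NULL };
  struct expr dup = { VEC_DUPLICATE, &not_a, NULL };
  struct expr and_e = { AND, &dup, &b };

  assert (canonicalize_and (&and_e) == 1);
  assert (and_e.op0->code == NOT);
  assert (and_e.op0->op0->code == VEC_DUPLICATE);
  assert (and_e.op0->op0->op0 == &a);
  assert (canonicalize_and (&and_e) == 0);
}
```

The second call returning 0 illustrates the point about loops raised earlier in the thread: once the expression is in the chosen form, the rule no longer matches.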

RE: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-21 Thread Liu, Hongtao via Gcc-patches



>-Original Message-
>From: Uros Bizjak 
>Sent: Wednesday, July 21, 2021 4:23 PM
>To: Hongtao Liu 
>Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; H. J. Lu
>; Richard Biener 
>Subject: Re: [PATCH] Support logic shift left/right for avx512 mask type.
>
>On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu  wrote:
>>
>> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak  wrote:
>> >
>> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt  wrote:
>> > >
>> > > Hi:
>> > >   As mentioned in
>> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
>> > >
>> > > cut start-
>> > > > note for the lowpart we can just view-convert away the excess
>> > > > bits, fully re-using the mask.  We generate surprisingly "good" code:
>> > > >
>> > > > kmovb   %k1, %edi
>> > > > shrb$4, %dil
>> > > > kmovb   %edi, %k2
>> > > >
>> > > > besides the lack of using kshiftrb.  I guess we're just lacking
>> > > > a mask register alternative for
>> > > Yes, we can do it similar as kor/kand/kxor.
>> > > ---cut end
>> > >
>> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>> > >   Ok for trunk?
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > > * config/i386/constraints.md (Wb): New constraint.
>> > > (Ww): Ditto.
>> > > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
>> > > shift.
>> > > (*ashlqi3_1): Ditto.
>> > > (*3_1): Ditto.
>> > > (*3_1): Ditto.
>> > > * config/i386/sse.md (k): New define_split after
>> > > it to convert generic shift pattern to mask shift ones.
>> > >
>> > > gcc/testsuite/ChangeLog:
>> > >
>> > > * gcc.target/i386/mask-shift.c: New test.
>
>
>+(define_insn "*lshr3_1"
>+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m, ?k")
>+(lshiftrt:SWI12
>+  (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
>+  (match_operand:QI 2 "nonmemory_operand" "c, ")))
>+   (clobber (reg:CC FLAGS_REG))]
>+  "ix86_binary_operator_ok (LSHIFTRT, mode, operands)"
>
>Also split this one to QImode and HImode to avoid conditions in isa attribute.
>
>OK with this change.
>

Thanks for the review, here's the patch I'm checking in.

>Thanks,
>Uros.


V3-0001-Support-logic-shift-left-right-for-avx512-mask-type.patch
Description: V3-0001-Support-logic-shift-left-right-for-avx512-mask-type.patch
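
The kmovb/shrb/kmovb sequence quoted in this thread computes a logical right shift of an 8-bit mask, which is what a single mask-shift instruction can do directly. A minimal C sketch of the operation on a mask held in a `uint8_t`. The function names are hypothetical, and treating counts of 8 or more as producing zero is an assumption of this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift of an 8-bit mask; counts of 8 or more give 0.  */
uint8_t mask8_shift_right (uint8_t k, unsigned count)
{
  return count < 8 ? (uint8_t) (k >> count) : 0;
}

/* Logical left shift of an 8-bit mask; bits shifted past bit 7 are
   dropped.  */
uint8_t mask8_shift_left (uint8_t k, unsigned count)
{
  return count < 8 ? (uint8_t) (k << count) : 0;
}

/* Self-check: moving the upper half of a mask down to the low lanes.  */
void check_mask_shift (void)
{
  assert (mask8_shift_right (0xF0, 4) == 0x0F);
  assert (mask8_shift_left (0x0F, 4) == 0xF0);
  assert (mask8_shift_left (0xFF, 4) == 0xF0);
  assert (mask8_shift_right (0xFF, 8) == 0x00);
}
```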

RE: [PATCH] [x86] x86: Don't add crtfastmath.o for -shared and add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2022-12-14 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, December 14, 2022 4:23 PM
> To: Jakub Jelinek 
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org;
> crazy...@gmail.com; hjl.to...@gmail.com; ubiz...@gmail.com
> Subject: Re: [PATCH] [x86] x86: Don't add crtfastmath.o for -shared and add a
> new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.
> 
> On Wed, Dec 14, 2022 at 9:16 AM Jakub Jelinek  wrote:
> >
> > On Wed, Dec 14, 2022 at 09:08:02AM +0100, Richard Biener via Gcc-patches
> wrote:
> > > On Wed, Dec 14, 2022 at 3:21 AM liuhongt via Gcc-patches
> > >  wrote:
> > > >
> > > > Don't add crtfastmath.o for -shared to avoid changing the MXCSR
> > > > register when loading a shared library.  crtfastmath.o will be
> > > > used only when building executables.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > Ok for trunk?
> > >
> > > You reject negative -mdaz-ftz but wouldn't that be useful with
> > > -Ofast -mno-daz-ftz since there's otherwise no way to avoid that?
> >
> > Agreed.
> > I even wonder if the best wouldn't be to make the option effectively
> > three state, default, no and yes, where if the option isn't specified
> > at all, then crtfastmath.o* is linked as is now except for -shared, if
> > it is -mno-daz-ftz, then it is never linked in regardless of other
> > options and if it is -mdaz-ftz, then it is linked even for -shared.
> 
> Possibly.  I'd also suggest to split the changed -shared handling to a
> separate patch since people may want to backport this and it should be
> applicable to all other targets with similar handling.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522#c26
So is the patch at the link above OK for trunk?
I'll send the -mdaz-ftz part as a separate patch.
> 
> > > > --- a/gcc/config/i386/i386.opt
> > > > +++ b/gcc/config/i386/i386.opt
> > > > @@ -420,6 +420,10 @@ mpc80
> > > >  Target RejectNegative
> > > >  Set 80387 floating-point precision to 80-bit.
> > > >
> > > > +mdaz-ftz
> > > > +Target RejectNegative
> > > > +Set the FTZ and DAZ Flags.
> >
> > As the option is only used in the driver, shouldn't it be marked
> > Driver and not Target?  It doesn't need to be saved/restored on every
> > cfun switch etc.
> >
> > > > +@item -mdaz-ftz
> > > > +@opindex mdaz-ftz
> > > > +
> > > > +the flush-to-zero (FTZ) and denormals-are-zero (DAZ) flags in the
> > > > +MXCSR register
> >
> > Shouldn't description start with capital letter?
> >
> > > > +are used to control floating-point calculations.SSE and AVX
> > > > +instructions including scalar and vector instructions could
> > > > +benefit from enabling the FTZ and DAZ flags when @option{-mdaz-ftz}
> is specified.
> > >
> > > Maybe say that the MXCSR register is set at program start to achieve
> > > this when the flag is specified at _link_ time and say this switch
> > > is ignored when -shared is specified?
> >
> > Jakub
> >
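
Jakub's three-state proposal above can be written down as a small decision function. This is only a sketch of the suggested behaviour, not the driver spec: the enum, the function name and the `fast_math` flag (standing for whatever currently causes crtfastmath.o to be linked, such as -Ofast) are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-state value for the option.  */
enum daz_ftz { DAZ_FTZ_DEFAULT, DAZ_FTZ_NO, DAZ_FTZ_YES };

/* Decide whether crtfastmath.o would be linked under the proposal:
   -mno-daz-ftz never links it, -mdaz-ftz always links it (even with
   -shared), and the default keeps the existing trigger except for
   -shared.  */
bool link_crtfastmath (enum daz_ftz opt, bool fast_math, bool shared)
{
  switch (opt)
    {
    case DAZ_FTZ_NO:
      return false;
    case DAZ_FTZ_YES:
      return true;
    default:
      return fast_math && !shared;
    }
}

/* Self-check covering the cases raised in the review.  */
void check_link_crtfastmath (void)
{
  /* -Ofast -mno-daz-ftz: the case Richard asked about.  */
  assert (!link_crtfastmath (DAZ_FTZ_NO, true, false));
  /* -Ofast -shared: no longer changes MXCSR when the library loads.  */
  assert (!link_crtfastmath (DAZ_FTZ_DEFAULT, true, true));
  /* -Ofast for an executable: unchanged.  */
  assert (link_crtfastmath (DAZ_FTZ_DEFAULT, true, false));
  /* Explicit -mdaz-ftz wins even with -shared.  */
  assert (link_crtfastmath (DAZ_FTZ_YES, false, true));
}
```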

RE: [PATCH 2/4] Initial Emeraldrapids Support

2023-01-03 Thread Liu, Hongtao via Gcc-patches

There are actually only two patches, not four; the *Patch 2/4* in the subject 
must be a typo.

> -Original Message-
> From: Hu, Lin1 
> Sent: Tuesday, January 3, 2023 4:37 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH 2/4] Initial Emeraldrapids Support
> 
> gcc/ChangeLog:
> 
>   * common/config/i386/cpuinfo.h (get_intel_cpu): Handle
> Emeraldrapids.
>   * common/config/i386/i386-common.cc: Add Emeraldrapids.
> ---
>  gcc/common/config/i386/cpuinfo.h  | 2 ++
>  gcc/common/config/i386/i386-common.cc | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/gcc/common/config/i386/cpuinfo.h
> b/gcc/common/config/i386/cpuinfo.h
> index bde231c07ee..3729b0f14a5 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -551,6 +551,8 @@ get_intel_cpu (struct __processor_model *cpu_model,
>break;
>  case 0x8f:
>/* Sapphire Rapids.  */
> +case 0xcf:
> +  /* Emerald Rapids.  */
>cpu = "sapphirerapids";
>CHECK___builtin_cpu_is ("corei7");
>CHECK___builtin_cpu_is ("sapphirerapids");
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 7751265aff4..026926d8b41 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -2465,6 +2465,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_COOPERLAKE), P_PROC_AVX512F},
>{"sapphirerapids", PROCESSOR_SAPPHIRERAPIDS, CPU_HASWELL,
> PTA_SAPPHIRERAPIDS,
>  M_CPU_SUBTYPE (INTEL_COREI7_SAPPHIRERAPIDS), P_PROC_AVX512F},
> +  {"emeraldrapids", PROCESSOR_SAPPHIRERAPIDS, CPU_HASWELL,
> PTA_SAPPHIRERAPIDS,
> +M_CPU_SUBTYPE (INTEL_COREI7_SAPPHIRERAPIDS), P_PROC_AVX512F},
>{"alderlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"raptorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
> --
> 2.18.2
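
The cpuinfo.h hunk above relies on case fallthrough so that model 0xcf is reported as "sapphirerapids". A minimal C sketch of that dispatch in isolation. The function name and the "unknown" default are hypothetical, and only the two model numbers shown in the patch are handled:

```c
#include <assert.h>
#include <string.h>

/* Map an Intel model number to a CPU name, mirroring the fallthrough in
   the patch: Emerald Rapids (0xcf) shares the Sapphire Rapids name.  */
const char *intel_cpu_name (unsigned model)
{
  switch (model)
    {
    case 0x8f:
      /* Sapphire Rapids.  */
    case 0xcf:
      /* Emerald Rapids.  */
      return "sapphirerapids";
    default:
      return "unknown";
    }
}

/* Self-check: both models resolve to the same name.  */
void check_intel_cpu_name (void)
{
  assert (strcmp (intel_cpu_name (0x8f), "sapphirerapids") == 0);
  assert (strcmp (intel_cpu_name (0xcf), "sapphirerapids") == 0);
  assert (strcmp (intel_cpu_name (0x00), "unknown") == 0);
}
```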

RE: [PATCH] Re-arrange sections of i386 cpuid

2023-04-18 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Mo, Zewei 
> Sent: Wednesday, April 19, 2023 10:03 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] Re-arrange sections of i386 cpuid
> 
> Re-order i386 cpuid based on the order of CPUID.
> 
> gcc/ChangeLog:
> 
> * config/i386/cpuid.h: Open a new section for Extended Features
>   Leaf (%eax == 7, %ecx == 0) and Extended Features Sub-leaf (%eax
> == 7,
>   %ecx == 1).
Ok.
> ---
>  gcc/config/i386/cpuid.h | 35 +++
>  1 file changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index
> be162dd8c78..971781c2b91 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -24,15 +24,6 @@
>  #ifndef _CPUID_H_INCLUDED
>  #define _CPUID_H_INCLUDED
> 
> -/* %eax */
> -#define bit_RAOINT   (1 << 3)
> -#define bit_AVXVNNI  (1 << 4)
> -#define bit_AVX512BF16   (1 << 5)
> -#define bit_CMPCCXADD(1 << 7)
> -#define bit_AMX_FP16 (1 << 21)
> -#define bit_HRESET   (1 << 22)
> -#define bit_AVXIFMA  (1 << 23)
> -
>  /* %ecx */
>  #define bit_SSE3 (1 << 0)
>  #define bit_PCLMUL   (1 << 1)
> @@ -52,10 +43,7 @@
>  #define bit_RDRND(1 << 30)
> 
>  /* %edx */
> -#define bit_AVXVNNIINT8 (1 << 4)
> -#define bit_AVXNECONVERT (1 << 5)
>  #define bit_CMPXCHG8B(1 << 8)
> -#define bit_PREFETCHI(1 << 14)
>  #define bit_CMOV (1 << 15)
>  #define bit_MMX  (1 << 23)
>  #define bit_FXSAVE   (1 << 24)
> @@ -84,7 +72,7 @@
>  #define bit_CLZERO   (1 << 0)
>  #define bit_WBNOINVD (1 << 9)
> 
> -/* Extended Features (%eax == 7) */
> +/* Extended Features Leaf (%eax == 7, %ecx == 0) */
>  /* %ebx */
>  #define bit_FSGSBASE (1 << 0)
>  #define bit_SGX (1 << 2)
> @@ -132,9 +120,9 @@
>  #define bit_AVX5124VNNIW (1 << 2)
>  #define bit_AVX5124FMAPS (1 << 3)
>  #define bit_AVX512VP2INTERSECT   (1 << 8)
> -#define bit_AVX512FP16   (1 << 23)
> -#define bit_IBT  (1 << 20)
> -#define bit_UINTR (1 << 5)
> +#define bit_AVX512FP16   (1 << 23)
> +#define bit_IBT (1 << 20)
> +#define bit_UINTR   (1 << 5)
>  #define bit_PCONFIG  (1 << 18)
>  #define bit_SERIALIZE(1 << 14)
>  #define bit_TSXLDTRK(1 << 16)
> @@ -142,6 +130,21 @@
>  #define bit_AMX_TILE(1 << 24)
>  #define bit_AMX_INT8(1 << 25)
> 
> +/* Extended Features Sub-leaf (%eax == 7, %ecx == 1) */
> +/* %eax */
> +#define bit_RAOINT  (1 << 3)
> +#define bit_AVXVNNI (1 << 4)
> +#define bit_AVX512BF16  (1 << 5)
> +#define bit_CMPCCXADD   (1 << 7)
> +#define bit_AMX_FP16(1 << 21)
> +#define bit_HRESET  (1 << 22)
> +#define bit_AVXIFMA (1 << 23)
> +
> +/* %edx */
> +#define bit_AVXVNNIINT8 (1 << 4)
> +#define bit_AVXNECONVERT (1 << 5)
> +#define bit_PREFETCHI (1 << 14)
> +
>  /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
>  #define bit_XSAVEOPT (1 << 0)
>  #define bit_XSAVEC   (1 << 1)
> --
> 2.31.1
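
Grouping the macros by leaf and register matters because the same bit position means different features in different registers. A minimal C sketch using four of the values from the patch. The struct and function names are hypothetical, and the register values are passed in rather than read with CPUID so the example runs on any host:

```c
#include <assert.h>
#include <stdint.h>

/* Extended Features Sub-leaf (%eax == 7, %ecx == 1), values as in the
   patch.  */
/* %eax */
#define bit_AVXVNNI      (1 << 4)
#define bit_AVX512BF16   (1 << 5)
/* %edx */
#define bit_AVXVNNIINT8  (1 << 4)
#define bit_AVXNECONVERT (1 << 5)

/* Hypothetical container for the two output registers of that sub-leaf.  */
struct leaf7_sub1
{
  uint32_t eax;
  uint32_t edx;
};

int has_avxvnni (const struct leaf7_sub1 *r)
{
  return (r->eax & bit_AVXVNNI) != 0;
}

int has_avxvnniint8 (const struct leaf7_sub1 *r)
{
  return (r->edx & bit_AVXVNNIINT8) != 0;
}

/* Self-check: bit 4 set in %eax only means AVXVNNI, not AVXVNNIINT8.  */
void check_leaf7_sub1 (void)
{
  struct leaf7_sub1 r = { 1u << 4, 0 };
  assert (has_avxvnni (&r));
  assert (!has_avxvnniint8 (&r));
}
```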

RE: [PATCH] i386: Share AES xmm intrin with VAES

2023-04-18 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jiang, Haochen 
> Sent: Wednesday, April 19, 2023 10:41 AM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: RE: [PATCH] i386: Share AES xmm intrin with VAES
> 
> > > a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > 33e281901cf..e7d565a8389 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -25107,67 +25107,71 @@
> > >
> > > 
> > > ;;
> > > ;;
> > >
> > >  (define_insn "aesenc"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +  (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >   UNSPEC_AESENC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >"@
> > > aesenc\t{%2, %0|%0, %2}
> > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > > vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > TARGET_AVX512VL)" from condition.
> 
> Since VAES should not imply AES, we need that "|| (TARGET_VAES &&
> TARGET_AVX512VL)"
> 
> And there is no need to add vaes_avx512vl, since the last alternative will only
> be hit when there is no AES. Without AES, the pattern needs both VAES and
> AVX512VL, or it cannot be used at all. avx512vl here is just a placeholder.
Ok, I see, then LGTM.
> 
> BRs,
> Haochen
> 
> > Similar for below patterns.
> > Others LGTM.
> > > (set_attr "type" "sselog1")
> > > (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > > (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesenclast"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +  (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >   UNSPEC_AESENCLAST))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >"@
> > > aesenclast\t{%2, %0|%0, %2}
> > > +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> > > vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > > (set_attr "type" "sselog1")
> > > (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > > (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesdec"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -   (unspec:V2D

RE: [PATCH 1/2] Use NO_REGS in cost calculation when the preferred register class are not known yet.

2023-04-22 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Vladimir Makarov 
> Sent: Saturday, April 22, 2023 3:26 AM
> To: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Cc: crazy...@gmail.com; hjl.to...@gmail.com
> Subject: Re: [PATCH 1/2] Use NO_REGS in cost calculation when the
> preferred register class are not known yet.
> 
> 
> On 4/19/23 20:46, liuhongt via Gcc-patches wrote:
> >   /* If this insn loads a parameter from its stack slot, then it
> >      represents a savings, rather than a cost, if the parameter is
> >      stored in memory.  Record this fact.
> >
> >      Similarly if we're loading other constants from memory (constant
> >      pool, TOC references, small data areas, etc) and this is the only
> >      assignment to the destination pseudo.
> >
> > At that time, preferred regclass is unknown, and GENERAL_REGS is used
> > to record memory move cost, but it's not accurate especially for large
> > vector modes, i.e. 512-bit vector in x86 which would most probably
> > allocate with SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS
> > here will overestimate the cost of this load and make RA propagate the
> > memory operand into many consumer instructions, which causes worse
> performance.
> 
> For this case GENERAL_REGS was used in GCC practically all the time. You can
> check this in the old regclass.c file (existing until IRA introduction).
> 
> But I guess it is ok to use NO_REGS for this to promote more usage of
> registers instead of equiv memory and as a lot of code was changed since
> then (the old versions of GCC even did not support vector regs).
> 
> Although it would be nice to do some benchmarking (SPEC is preferable) for
> such kind of changes.
Thanks, I've run SPEC2017 on x86 ICX: no big performance change, and a small
code size improvement as expected (the code size of one load plus several ops
should be smaller than that of several CISC-style ops).
> 
> On the other hand, I expect that any performance regression (if any) will be
> reported anyway.
> 
> The patch is ok for me.  You can commit it into the trunk.
> 
> Thank you for addressing this issue.
> 
> > Fortunately, NO_REGS is used to record the best scenario, so the patch
> > uses NO_REGS instead of GENERAL_REGS here, it could help RA in
> PR108707.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and
> > aarch64-linux-gnu.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR rtl-optimization/108707
> > * ira-costs.cc (scan_one_insn): Use NO_REGS instead of
> > GENERAL_REGS when preferred reg_class is not known.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr108707.c: New test.

RE: [PATCH] i386: Fix up -Wuninitialized warnings in avx512erintrin.h [PR105593]

2023-01-31 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, January 31, 2023 4:09 PM
> To: Liu, Hongtao ; Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] i386: Fix up -Wuninitialized warnings in avx512erintrin.h
> [PR105593]
> 
> Hi!
> 
> As reported in the PR, there are some -Wuninitialized warnings in
> avx512erintrin.h.  One can see that by compiling sse-23.c testcase with -
> Wuninitialized (or when actually using those intrinsics).
> Those 6 spots use an uninitialized variable and pass it as one of the arguments
> to a builtin with constant mask -1, because there is no unmasked builtin.  It is
> true that expansion of those builtins into RTL will see mask is all ones and
> ignore the unneeded argument, but -Wuninitialized is diagnosed on GIMPLE
> and on GIMPLE these builtins are just builtin calls.
> avx512fintrin.h and other headers use in these cases the _mm*_undefined_*
> () intrinsics, like:
>   return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
>  (__v8di) __Y,
>  (__v8di)
>  _mm512_undefined_epi32 (),
>  (__mmask8) -1);
> etc. and the following patch does the same for avx512erintrin.h.
> With the recent changes in C++ FE and the _mm*_undefined_* intrinsics, we
> don't emit -Wuninitialized warnings for those (previously we didn't just in C
> due to self-initialization).  Of course we could also just self-initialize these
> uninitialized vars and add the #pragma GCC diagnostic dances around it, but
> using the intrinsics is consistent with the rest and IMHO cleaner.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok, thanks.
> 
> 2023-01-31  Jakub Jelinek  
> 
>   PR c++/105593
>   * config/i386/avx512erintrin.h (_mm512_exp2a23_round_pd,
>   _mm512_exp2a23_round_ps, _mm512_rcp28_round_pd,
> _mm512_rcp28_round_ps,
>   _mm512_rsqrt28_round_pd, _mm512_rsqrt28_round_ps): Use
>   _mm512_undefined_pd () or _mm512_undefined_ps () instead of
> using
>   uninitialized automatic variable __W.
> 
>   * gcc.target/i386/sse-23.c: Add -Wuninitialized to dg-options.
> 
> --- gcc/config/i386/avx512erintrin.h.jj   2023-01-16 11:52:15.944736113
> +0100
> +++ gcc/config/i386/avx512erintrin.h  2023-01-30 20:53:08.057769691
> +0100
> @@ -51,9 +51,8 @@ extern __inline __m512d  __attribute__
> ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_exp2a23_round_pd (__m512d __A, int __R)  {
> -  __m512d __W;
>return (__m512d) __builtin_ia32_exp2pd_mask ((__v8df) __A,
> -(__v8df) __W,
> +(__v8df) _mm512_undefined_pd
> (),
>  (__mmask8) -1, __R);
>  }
> 
> @@ -79,9 +78,8 @@ extern __inline __m512  __attribute__
> ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_exp2a23_round_ps (__m512 __A, int __R)  {
> -  __m512 __W;
>return (__m512) __builtin_ia32_exp2ps_mask ((__v16sf) __A,
> -   (__v16sf) __W,
> +   (__v16sf) _mm512_undefined_ps
> (),
> (__mmask16) -1, __R);
>  }
> 
> @@ -107,9 +105,8 @@ extern __inline __m512d  __attribute__
> ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_rcp28_round_pd (__m512d __A, int __R)  {
> -  __m512d __W;
>return (__m512d) __builtin_ia32_rcp28pd_mask ((__v8df) __A,
> - (__v8df) __W,
> + (__v8df)
> _mm512_undefined_pd (),
>   (__mmask8) -1, __R);
>  }
> 
> @@ -135,9 +132,8 @@ extern __inline __m512  __attribute__
> ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_rcp28_round_ps (__m512 __A, int __R)  {
> -  __m512 __W;
>return (__m512) __builtin_ia32_rcp28ps_mask ((__v16sf) __A,
> -(__v16sf) __W,
> +(__v16sf) _mm512_undefined_ps
> (),
>  (__mmask16) -1, __R);
>  }
> 
> @@ -229,9 +225,8 @@ extern __inline __m512d  __attribute__
> ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_rsqrt28_round_pd (__m512d __A, int __R)  {
> -  __m512d __W;
>return (__m512d) __builtin_ia32_rsqrt28pd_mask ((__v8df) __A,
> -   (__v8df) __W,
> +
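
The point made in the mail, that an all-ones mask makes the pass-through operand irrelevant to the result even though it is still passed as an argument, can be modelled in scalar C. This is only a sketch: `masked_double8` is a hypothetical stand-in for a merge-masked operation, not a GCC builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a merge-masked 8-lane operation: lane i receives
   2 * a[i] when bit i of MASK is set, otherwise it keeps passthru[i].  */
void
masked_double8 (double dst[8], const double a[8],
                const double passthru[8], uint8_t mask)
{
  for (size_t i = 0; i < 8; i++)
    dst[i] = ((mask >> i) & 1) ? 2.0 * a[i] : passthru[i];
}
```

With `mask == 0xff` no lane reads `passthru`, which is why its contents do not matter after expansion; a warning pass that only sees the call still sees the argument being passed.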

RE: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-29 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Alexander Monakov 
> Sent: Friday, May 27, 2022 5:39 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Add a bit dislike for separate mem alternative when op is
> REG_P.
> 
> On Wed, 25 May 2022, liuhongt via Gcc-patches wrote:
> 
> > Right now, mem_cost for a separate mem alternative is 1 * frequency, which
> > is pretty small and causes the unnecessary SSE spill in the PR. I've
> > tried to rework the backend cost model, but RA is still not happy with
> > that (it regresses somewhere else). I think the root cause is that the cost
> > of a separate 'm' alternative is too small, especially considering that the
> > mov cost of GPRs is 2 (the default for REGISTER_MOVE_COST). So this patch
> > increases mem_cost to 2 * frequency, and also increases the reg_class cost
> > by 1 for an 'm' alternative.
> 
> In the PR, the spill happens in the initial basic block of the function, i.e.
> the one with the highest frequency.
> 
> Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids the
> spill, even though it does not affect the frequency of the initial basic block,
> and makes the block with the use more rarely executed.

The spill is mainly decided by 3 insns related to r92

(insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
        (reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 102)

(insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
        (subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 {*movsi_internal}
     (nil))

And
(insn 28 27 29 5 (set (reg:DF 98)
        (float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 {*extendsfdf2}
     (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
        (nil)))
(insn 29 28 30 5 (s

The frequencies of INSN 3 and INSN 9 are not affected, but the frequency of
INSN 28 drops from 805 to 89 after swapping "unlikely" and "likely".
Because of that, the GPR cost decreases a lot, which finally makes the RA
choose GPR instead of MEM.

GENERAL_REGS:2356,2356 
SSE_REGS:6000,6000
MEM:4089,4089

Dump of 301.ira:
67  a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356 
BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356 
CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356 
TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000 
NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 \
   MMX_REGS:19534,19534 INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534 
MEM:4089,4089

And although there's no spill, there's an extra VMOVD in the later BB, which
looks suboptimal (I guess we can live with that since it's cold).

        vmovd   %eax, %xmm2
        vcvtss2sd       %xmm2, %xmm2, %xmm1
        vmulsd  %xmm0, %xmm1, %xmm0
        vcvtsd2ss       %xmm0, %xmm0, %xmm0
> 
> Do you have a root cause analysis that explains the above?
> 
> Alexander
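
The cost comparison in the reply above can be read as picking the cheapest candidate for r92. The sketch below is a deliberately simplified model of that reading, not IRA's actual algorithm; `struct class_cost` and `cheapest_class` are hypothetical names, and the three costs are the ones quoted in the dump.

```c
#include <stddef.h>

struct class_cost
{
  const char *name;
  int cost;
};

/* Costs of r92 after swapping the branch hints, copied from the dump.  */
const struct class_cost r92_costs[] = {
  { "GENERAL_REGS", 2356 },
  { "SSE_REGS", 6000 },
  { "MEM", 4089 },
};

/* Return the entry with the lowest cost, or NULL for an empty table.  */
const struct class_cost *
cheapest_class (const struct class_cost *costs, size_t n)
{
  const struct class_cost *best = NULL;
  for (size_t i = 0; i < n; i++)
    if (best == NULL || costs[i].cost < best->cost)
      best = &costs[i];
  return best;
}
```

With these numbers the general-purpose registers are cheaper than memory, which matches the observation that no spill is generated in that configuration.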

RE: [PATCH] i386: Add AVX512BW to AVX512F in MASK_ISA2

2022-06-29 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Jiang, Haochen 
> Sent: Thursday, June 30, 2022 9:51 AM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao 
> Subject: [PATCH] i386: Add AVX512BW to AVX512F in MASK_ISA2
> 
> Hi all,
> 
> I just found that in the MASK_ISA2_UNSET part, since AVX512BW is based on
> AVX512F, we should add OPTION_MASK_ISA2_AVX512BW_UNSET to AVX512F for
> maintenance convenience and logical correctness; otherwise we will need to add
> all future ISAs based on AVX512BW to both AVX512F and AVX512BW. This is
> easily forgotten and might cause confusion.
> 
> Also remove the redundant ones in this change.
> 
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
LGTM.
> 
> BRs,
> Haochen
> 
> gcc/ChangeLog:
> 
>   * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_AVX512F_UNSET):
>   Add OPTION_MASK_ISA2_AVX512BW_UNSET, remove
>   OPTION_MASK_ISA2_AVX512BF16_UNSET and
>   OPTION_MASK_ISA2_AVX512FP16_UNSET.
> ---
>  gcc/common/config/i386/i386-common.cc | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/common/config/i386/i386-common.cc
> b/gcc/common/config/i386/i386-common.cc
> index cb878163492..c0c2ad74d87 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -315,11 +315,10 @@ along with GCC; see the file COPYING3.  If not see
> | OPTION_MASK_ISA_SSE_UNSET)
> 
>  #define OPTION_MASK_ISA2_AVX512F_UNSET \
> -  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
> +  (OPTION_MASK_ISA2_AVX512BW_UNSET \
> | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
> | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
> -   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> -   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> +   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
>OPTION_MASK_ISA2_SSE_UNSET
>  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> --
> 2.18.1
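
The maintenance argument in the patch description can be illustrated with nested masks. The sketch below assumes a single 64-bit flag word and made-up bit positions; all `EX_` names are hypothetical, and the composition of `EX_AVX512BW_UNSET` is an assumption based on the mail calling the removed entries redundant.

```c
#include <stdint.h>

/* Hypothetical bit assignments; GCC's real OPTION_MASK_ISA* values
   differ and are split across more than one flag word.  */
#define EX_AVX512F            (UINT64_C (1) << 0)
#define EX_AVX512BW           (UINT64_C (1) << 1)
#define EX_AVX512BF16         (UINT64_C (1) << 2)
#define EX_AVX512FP16         (UINT64_C (1) << 3)
#define EX_AVX5124FMAPS       (UINT64_C (1) << 4)
#define EX_AVX5124VNNIW       (UINT64_C (1) << 5)
#define EX_AVX512VP2INTERSECT (UINT64_C (1) << 6)

/* Assumption: disabling AVX512BW also disables the ISAs built on it.  */
#define EX_AVX512BW_UNSET (EX_AVX512BW | EX_AVX512BF16 | EX_AVX512FP16)

/* Shape of the definition after the patch: the BW mask is nested, so
   anything later added to EX_AVX512BW_UNSET is covered automatically.  */
#define EX_AVX512F_UNSET \
  (EX_AVX512F | EX_AVX512BW_UNSET | EX_AVX5124FMAPS \
   | EX_AVX5124VNNIW | EX_AVX512VP2INTERSECT)

/* Clear every ISA named in UNSET_MASK from ENABLED.  */
uint64_t
ex_disable (uint64_t enabled, uint64_t unset_mask)
{
  return enabled & ~unset_mask;
}
```

A future ISA that depends on AVX512BW would then only need to be listed in one place.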

RE: [PATCH] i386: Extend cvtps2pd to memory

2022-06-30 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, June 30, 2022 4:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> 
> On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak  wrote:
> >
> > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen 
> wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Uros Bizjak 
> > > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > > To: Jiang, Haochen 
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > 
> > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > > >
> > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang
> > > > > 
> > > > > wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > This patch aims to fix the cvtps2pd insn, which should also
> > > > > > work on memory operand but currently does not. After this fix,
> > > > > > when loop == 2, it will eliminate movq instruction.
> > > > > >
> > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > > >
> > > > > > BRs,
> > > > > > Haochen
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR target/43618
> > > > > > * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > > > (sse2_cvtps2pd_load): Rename extendvsdfv2df2.
> >
> > Rename FROM ...
> >
> > Please also mention change to sse2_cvtps2pd.
> >
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > PR target/43618
> > > > > > * gcc.target/i386/pr43618-1.c: New test.
> > > > >
> > > > > This patch could be as simple as:
> > > > >
> > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > index 8cd0f617bf3..c331445cb2d 100644
> > > > > --- a/gcc/config/i386/sse.md
> > > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -9195,7 +9195,7 @@
> > > > > (define_insn "extendv2sfv2df2"
> > > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > > >(float_extend:V2DF
> > > > > - (match_operand:V2SF 1 "register_operand" "v")))]
> > > > > + (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > > >   "TARGET_MMX_WITH_SSE"
> > > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > > >   [(set_attr "type" "ssecvt")
> > > >
> > > > We also tested on this version, it is ok.
> > > >
> > > > The reason why the patch looks like this is that in the
> > > > previous insn sse2_cvtps2pd, the constraint vm and
> > > > vector_operand actually do not match the actual instruction.
> > > > The memory operand is V2SF, not V4SF.
> > > >
> > > > Therefore, we changed the constraint in that insn. Then it caused
> > > > another issue: for a memory operand, it seems that we cannot generate
> > > > those mask instructions. So I changed the pattern to work the way
> > > > extendv2hfv2df2 does.
> > >
> > > If you want to change the memory access in sse2_cvtps2pd,
> > > then please see how e.g. v2hiv2di is handled in sse.md. In
> > > addition to two instructions, you will need one
> > > define_insn_and_split with a pre-reload splitter.
> >
> > Oh, nowadays combine does vec_select from a paradoxical subreg on its own.
> >
> > +(define_expand "extendv2sfv2df2"
> > +  [(set (match_operand:V2DF 0 "register_operand")
> > +(float_extend:V2DF
> > +  (match_operand:V2SF 1 "nonimmediate_operand")))]
> > +  "TARGET_MMX_WITH_SSE"
> > +{
> > +  if (!MEM_P (operands[1]))
> > +{
> >
> > You will need force reg here:
> >
> > rtx op1 = force_reg (V2SFmode, operands[1]);
> > +  operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode);
> > +  emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
> > +  DONE;
> > +}
> > +})
> >
> >
> > -(define_insn "extendv2sfv2df2"
> > +(define_insn "sse2_cvtps2pd_load"
> >
> > Please name this insn "*sse2_cvtps2pd_1". Please note the
> > star at the beginning; you don't have to make the name public.
> >
> > OK with the above changes.
> 
> Forgot to mention:
> 
> 
> - (match_operand:V2SF 1 "register_operand" "v")))]
> -  "TARGET_MMX_WITH_SSE"
> -  "%vcvtps2pd\t{%1, %0|%0, %1}"
> + (match_operand:V2SF 1 "memory_operand" "m")))]
> + "TARGET_MMX_WITH_SSE && "
> +  "%vcvtps2pd\t{%1, %0|%0 and2>, %q1}"
>[(set_attr "type" "ssecvt")
> 
> The new insn does not need to be limited to TARGET_MMX_WITH_SSE, so we
> can use TARGET_SSE2 here.
> 
> Which opens the question if the expander could also be TARGET_SSE2 only.
> There are no MMX registers involved in any of the patterns anymore.
Yes.
> 
> Uros.
> >
> > Thanks,
> > Uros,

RE: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-20 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Wang, Hongyu 
> Sent: Saturday, November 19, 2022 2:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; ubiz...@gmail.com; Liu, Hongtao
> 
> Subject: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]
> 
> Hi,
> 
> Following the discussion in pr107602, -munroll-only-small-loops does not
PR107692?
> turn on/off -funroll-loops, and the current check in pass_rtl_unroll_loops::gate
> would cause -funroll-loops to not take effect. Revert the change about
> targetm.loop_unroll_adjust and apply the backend option change to strictly
> follow the rule that -funroll-loops takes full control of loop unrolling, and
> -munroll-only-small-loops just changes its behavior to unroll small size loops.
> 
> Bootstrapped and regtested on x86-64-pc-linux-gnu.
> 
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>   PR target/107602
>   * common/config/i386/i386-common.cc (ix86_optimization_table):
>   Enable loop unroll O2, disable -fweb and -frename-registers
>   by default.
>   * config/i386/i386-options.cc
>   (ix86_override_options_after_change):
>   Disable small loop unroll when funroll-loops enabled, reset
>   cunroll_grow_size when it is not explicitly enabled.
>   (ix86_option_override_internal): Call
>   ix86_override_options_after_change instead of calling
>   ix86_recompute_optlev_based_flags and ix86_default_align
>   separately.
>   * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
>   factor if -munroll-only-small-loops enabled.
>   * loop-init.cc (pass_rtl_unroll_loops::gate): Do not enable
>   loop unrolling for -O2-speed.
>   (pass_rtl_unroll_loops::execute): Remove
>   targetm.loop_unroll_adjust check.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/107602
>   * gcc.target/i386/pr86270.c: Add -fno-unroll-loops.
>   * gcc.target/i386/pr93002.c: Likewise.
> ---
>  gcc/common/config/i386/i386-common.cc   |  8 ++
>  gcc/config/i386/i386-options.cc | 34 ++---
>  gcc/config/i386/i386.cc | 18 -
>  gcc/loop-init.cc| 11 +++-
>  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
> gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
>  6 files changed, 49 insertions(+), 26 deletions(-)
> 
> diff --git a/gcc/common/config/i386/i386-common.cc
> b/gcc/common/config/i386/i386-common.cc
> index 6ce2a588adc..660a977b68b 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1808,7 +1808,15 @@ static const struct default_options
> ix86_option_optimization_table[] =
>  /* The STC algorithm produces the smallest code at -Os, for x86.  */
>  { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
>REORDER_BLOCKS_ALGORITHM_STC },
> +
> +/* Turn on -funroll-loops with -munroll-only-small-loops to enable small
> +   loop unrolling at -O2.  */
> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>  { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL,
> 1 },
> +/* Turns off -frename-registers and -fweb which are enabled by
> +   funroll-loops.  */
> +{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
> +{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
>  /* Turn off -fschedule-insns by default.  It tends to make the
> problem with not enough registers even worse.  */
>  { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, diff --git
> a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index
> e5c77f3a84d..bc1d36e36a8 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1838,8 +1838,37 @@ ix86_recompute_optlev_based_flags (struct
> gcc_options *opts,  void  ix86_override_options_after_change (void)  {
> +  /* Default align_* from the processor table.  */
>ix86_default_align (&global_options);
> +
>ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
> +
> +  /* Disable unrolling small loops when there's explicit
> + -f{,no}unroll-loop.  */
> +  if ((OPTION_SET_P (flag_unroll_loops))
> + || (OPTION_SET_P (flag_unroll_all_loops)
> +  && flag_unroll_all_loops))
> +{
> +  if (!OPTION_SET_P (ix86_unroll_only_small_loops))
> + ix86_unroll_only_small_loops = 0;
> +  /* Re-enable -frename-registers and -fweb if funroll-loops
> +  enabled.  */
> +  if (!OPTION_SET_P (flag_web))
> + flag_web = flag_unroll_loops;
> +  if (!OPTION_SET_P (flag_rename_registers))
> + flag_rename_registers = flag_unroll_loo
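
The option handling visible in the truncated hunk above can be modelled as a small state update. This is a sketch under assumptions: `struct unroll_opts` and `override_after_change` are hypothetical, each `*_set` field stands in for `OPTION_SET_P`, and the final assignment is assumed to complete the cut-off line in the same way as the `flag_web` line before it.

```c
#include <stdbool.h>

/* Each option has a value and a flag recording whether the user set it
   explicitly on the command line.  */
struct unroll_opts
{
  bool unroll_loops, unroll_loops_set;
  bool unroll_all_loops, unroll_all_loops_set;
  bool only_small_loops, only_small_loops_set;
  bool web, web_set;
  bool rename_registers, rename_registers_set;
};

/* Model of the visible part of the hunk: an explicit -f[no-]unroll-loops
   (or an enabled, explicit -funroll-all-loops) turns off the small-loop
   restriction and restores the -fweb / -frename-registers defaults.  */
void
override_after_change (struct unroll_opts *o)
{
  if (o->unroll_loops_set
      || (o->unroll_all_loops_set && o->unroll_all_loops))
    {
      if (!o->only_small_loops_set)
        o->only_small_loops = false;
      if (!o->web_set)
        o->web = o->unroll_loops;
      /* Assumed completion of the truncated line.  */
      if (!o->rename_registers_set)
        o->rename_registers = o->unroll_loops;
    }
}
```

In this model the -O2 defaults (unrolling on, small loops only, nothing set explicitly) pass through unchanged, while an explicit -funroll-loops lifts the small-loop restriction.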

RE: [PATCH] x86: Update model value for Alderlake and Rocketlake

2022-01-03 Thread Liu, Hongtao via Gcc-patches




> -Original Message-
> From: Cui, Lili 
> Sent: Tuesday, January 4, 2022 1:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao ;
> hjl.to...@gmail.com
> Subject: [PATCH] x86: Update model value for Alderlake and Rocketlake
> 
> Hi Uros,
> 
> This patch is to update model value for Alderlake and Rocketlake.
Just a note: the update follows the latest SDM,
https://www.intel.com/content/dam/develop/public/us/en/documents/325462-sdm-vol-1-2abcd-3abcd.pdf
> 
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
> 
> OK for master?
> 
> gcc/ChangeLog
> 
>   * common/config/i386/cpuinfo.h (get_intel_cpu): Add new model
> values
>   to Alderlake and Rocketlake.
> ---
>  gcc/common/config/i386/cpuinfo.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/common/config/i386/cpuinfo.h
> b/gcc/common/config/i386/cpuinfo.h
> index 2d8ea201ab5..61b1a0f291c 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -415,6 +415,7 @@ get_intel_cpu (struct __processor_model
> *cpu_model,
>cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
>break;
>  case 0xa7:
> +case 0xa8:
>/* Rocket Lake.  */
>cpu = "rocketlake";
>CHECK___builtin_cpu_is ("corei7"); @@ -487,6 +488,7 @@ get_intel_cpu
> (struct __processor_model *cpu_model,
>break;
>  case 0x97:
>  case 0x9a:
> +case 0xbf:
>/* Alder Lake.  */
>cpu = "alderlake";
>CHECK___builtin_cpu_is ("corei7");
> --
> 2.17.1
> 
> Thanks,
> Lili.

RE: [PATCH] Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.

2023-04-05 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Vladimir Makarov 
> Sent: Wednesday, April 5, 2023 8:59 PM
> To: Jeff Law ; Liu, Hongtao
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Check hard_regno_mode_ok before setting lowest
> memory move cost for the mode with different reg classes.
> 
> 
> On 4/4/23 21:29, Jeff Law wrote:
> >
> >
> > On 4/3/23 23:13, liuhongt via Gcc-patches wrote:
> >> There's a potential performance issue when the backend returns some
> >> unreasonable value for a mode which can never be allocated with the
> >> reg class.
> >>
> >> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >> Ok for trunk(or GCC14 stage1)?
> >>
> >> gcc/ChangeLog:
> >>
> >> PR rtl-optimization/109351
> >> * ira.cc (setup_class_subset_and_memory_move_costs): Check
> >> hard_regno_mode_ok before setting lowest memory move cost for
> >> the mode with different reg classes.
> > Not a regression *and* changing register allocation.  This seems like
> > it should defer to gcc-14.
> >
> Yes, I agree.  It should wait for gcc-14, especially when we are close to the
> release. Also, testing on x86-64 alone is not enough for such changes (although
> I tried ppc64le and did not find any problem).
> 
> Cost related patches for RA frequently result in new testsuite failures on
> some targets, even if the change seems obvious and is expected to improve
> the generated code.
> 
> Target dependent code sometimes defines the costs correctly only for some
> possible cases, and making the allocator less dependent on this pitfall is
> good.  So I think the patch moves us in the right direction.
> 
> The patch is ok for me to commit to the trunk after the gcc-13 release and if
> arm64 testing shows no GCC testsuite regression.
Bootstrapped and regtested on aarch64-unknown-linux-gnu.
Waiting for GCC14.
> 
> Thank you for working on this issue.
>
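
The idea described in the ChangeLog entry, ignoring a register class when lowering the memory move cost if that class cannot hold the mode, can be sketched with a toy model. Everything here is hypothetical: `struct ex_class`, `lowest_mem_move_cost` and the `mode_ok` field stand in for the real IRA tables and the hard_regno_mode_ok check.

```c
#include <stdbool.h>
#include <stddef.h>

struct ex_class
{
  int mem_move_cost;  /* cost the backend reports for this class */
  bool mode_ok;       /* can any register of the class hold the mode? */
};

/* Start from the class's own cost and lower it to the cheapest
   candidate class, but skip candidates that cannot hold the mode.  */
int
lowest_mem_move_cost (int own_cost, const struct ex_class *candidates,
                      size_t n)
{
  int best = own_cost;
  for (size_t i = 0; i < n; i++)
    if (candidates[i].mode_ok && candidates[i].mem_move_cost < best)
      best = candidates[i].mem_move_cost;
  return best;
}
```

Without the `mode_ok` test, an unreasonable cost reported for a class that can never be allocated for the mode would drag the result down.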
