from:"Jiang, Haochen"

[r14-4629 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-10-17 Thread Jiang, Haochen

On Linux/x86_64,

3179ad72f67f31824c444ef30ef171ad7495d274 is the first bad commit
commit 3179ad72f67f31824c444ef30ef171ad7495d274
Author: Richard Biener rguent...@suse.de
Date:   Fri Oct 13 12:32:51 2023 +0200

OMP SIMD inbranch call vectorization for AVX512 style masks

caused

FAIL: gcc.dg/vect/vect-simd-clone-16b.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17b.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18b.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-4629/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16b.c \
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16b.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16.c \
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17b.c \
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17b.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17.c \
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18b.c \
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18b.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18.c \
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c \
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c \
--target_board='unix{-m64\ -march=cascadelake}'"

(If you run into problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)

RE: [PATCH 0/3] Add Intel new cpu archs

2023-10-17 Thread Jiang, Haochen



> -----Original Message-----
> From: Hongtao Liu 
> Sent: Wednesday, October 18, 2023 8:25 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH 0/3] Add Intel new cpu archs
> 
> On Mon, Oct 16, 2023 at 2:25 PM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > The patches aim to add new cpu archs Clear Water Forest and Panther
> > Lake. Here comes the documentation:
> >
> > https://cdrdv2.intel.com/v1/dl/getContent/671368
> >
> > Also in the patches, I refactored how we detect cpu according to
> > features and added m_CORE_ATOM.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> Ok, please also update https://gcc.gnu.org/gcc-14/changes.html with your
> patches and USER_MSR.

I will commit the patches with the name changed from Clear Water Forest to
Clearwater Forest.

Thx,
Haochen

> >
> > Thx,
> > Haochen
> >
> >
> 
> 
> 
> --
> BR,
> Hongtao

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread Jiang, Haochen

It seems that the mail got caught elsewhere and was not delivered to the
gcc-patches mailing list. Resending it.

Thx,
Haochen

-----Original Message-----
From: Jiang, Haochen 
Sent: Tuesday, October 24, 2023 4:43 PM
To: HAO CHEN GUI ; Richard Sandiford 

Cc: gcc-patches 
Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces 
[PR111449]


RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread Jiang, Haochen

Hi Haochen Gui,

It seems that the commit caused a lot of test cases to fail on x86 platforms:

https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html

Please help verify whether we need some testcase changes or whether there is a
bug here.

A simple reproducer under the build folder is:

make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C \
--target_board='unix{-m64\ -march=cascadelake,-m32\ -march=cascadelake,-m32,-m64}'"

Thx,
Haochen

> -----Original Message-----
> From: HAO CHEN GUI 
> Sent: Monday, October 23, 2023 9:30 AM
> To: Richard Sandiford 
> Cc: gcc-patches 
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> Committed as r14-4835.
> 
> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
> 
> Thanks
> Gui Haochen
> 
> 在 2023/10/20 16:49, Richard Sandiford 写道:
> > HAO CHEN GUI  writes:
> >> Hi,
> >>   Vector mode instructions are efficient for compares on some targets.
> >> This patch enables vector mode for compare_by_pieces. Two helper
> >> functions are added to check whether vector mode is available for certain
> >> by-pieces operations and whether optabs exist for the mode and certain
> >> by-pieces operations. One member is added to class op_by_pieces_d to
> >> record the type of operation.
> >>
> >>   The test case is in the second patch which is rs6000 specific.
> >>
> >>   Compared to the last version, the main change is to add a target hook
> >> check - scalar_mode_supported_p when retrieving the available scalar
> >> modes. A mode that is not supported on a target should be skipped
> >> (e.g. TImode on ppc). Also, some function names and comments are refined
> >> according to the reviewer's advice.
> >>
> >>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> >> regressions.
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >> ChangeLog
> >> Expand: Enable vector mode for by pieces compares
> >>
> >> Vector mode compare instructions are efficient for equality compare on
> >> rs6000. This patch refactors the codes of by pieces operation to enable
> >> vector mode for compare.
> >>
> >> gcc/
> >>PR target/111449
> >>* expr.cc (can_use_qi_vectors): New function to return true if
> >>we know how to implement OP using vectors of bytes.
> >>(qi_vector_mode_supported_p): New function to check if optabs
> >>exists for the mode and certain by pieces operations.
> >>(widest_fixed_size_mode_for_size): Replace the second argument
> >>with the type of by pieces operations.  Call can_use_qi_vectors
> >>and qi_vector_mode_supported_p to do the check.  Call
> >>scalar_mode_supported_p to check if the scalar mode is supported.
> >>(by_pieces_ninsns): Pass the type of by pieces operation to
> >>widest_fixed_size_mode_for_size.
> >>(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
> >>record the type of by pieces operations.
> >>(op_by_pieces_d::op_by_pieces_d): Change last argument to the
> >>type of by pieces operations, initialize m_op with it.  Pass
> >>m_op to function widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::get_usable_mode): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
> >>can_use_qi_vectors and qi_vector_mode_supported_p to do the
> >>check.
> >>(op_by_pieces_d::run): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
> >>(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
> >>(can_store_by_pieces): Pass the type of by pieces operations to
> >>widest_fixed_size_mode_for_size.
> >>(clear_by_pieces): Initialize class store_by_pieces_d with
> >>CLEAR_BY_PIECES.
> >>(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
> >>COMPARE_BY_PIECES.
> >
> > OK, thanks.  And thanks for your patience.
> >
> > Richard
> >
> >> patch.diff
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index 2c9930ec674..ad5f9dd8ec2 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
> >>    return align;
> >>  }
> >>
> >> -/* Return the widest QI vector, if QI_MODE is true, or integer mode
> >> -   that is narrower than SIZE bytes.  */
> >> +/* Return true if we know how to implement OP using vectors of bytes.  */
> >> +static bool
> >> +can_use_qi_vectors (by_pieces_operation op)
> >> +{
> >> +  return (op == COMPARE_BY_PIECES
> >> +          || op == SET_BY_PIECES
> >> +          || op == CLEAR_BY_PIECES);
> >> +}
> >> +

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread Jiang, Haochen

> -----Original Message-----
> From: Richard Sandiford 
> Sent: Wednesday, October 25, 2023 4:40 PM
> To: HAO CHEN GUI 
> Cc: Jiang, Haochen ; gcc-patches  patc...@gcc.gnu.org>
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> HAO CHEN GUI  writes:
> > Hi Haochen,
> >   The regression cases are caused by "targetm.scalar_mode_supported_p"
> > I added for scalar mode checking. XImode, OImode and TImode (with
> > -m32) are not enabled in ix86_scalar_mode_supported_p. So they're
> > excluded from by pieces operations on i386.
> >
> >   The original code doesn't do a check for scalar modes. I think it
> > might be incorrect as not all scalar modes support move and compare
> > optabs (e.g. TImode with -m32 on rs6000).
> >
> >   I drafted a new patch to manually check optabs for scalar modes. Now
> > both vector and scalar modes are checked for optabs.
> >
> >   I did a simple test. All former regression cases pass again. Could you
> > help do a full regression test? I am worried about the coverage of my CI
> > system.

Thanks for that. I am running the regression test now.

Thx,
Haochen

> 
> Thanks for the quick fix.  The patch LGTM FWIW.  Just a small suggestion for
> the function name:
> 
> >
> > Thanks
> > Gui Haochen
> >
> > patch.diff
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 7aac575eff8..2af9fcbed18 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
> >  /* Return true if optabs exists for the mode and certain by pieces
> >     operations.  */
> >  static bool
> > -qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> > +mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> 
> Might be worth calling this something more specific, such as
> by_pieces_mode_supported_p.
> 
> Otherwise the patch is OK for trunk if it passes the x86 testing.
> 
> Thanks,
> Richard
> 
> >  {
> > +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
> > +    return false;
> > +
> >    if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> > -      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
> > -    return true;
> > +      && VECTOR_MODE_P (mode)
> > +      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
> > +    return false;
> >
> >    if (op == COMPARE_BY_PIECES
> > -      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
> > -      && can_compare_p (EQ, mode, ccp_jump))
> > -    return true;
> > +      && !can_compare_p (EQ, mode, ccp_jump))
> > +    return false;
> >
> > -  return false;
> > +  return true;
> >  }
> >
> >  /* Return the widest mode that can be used to perform part of an
> > @@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
> >                {
> >                  if (GET_MODE_SIZE (candidate) >= size)
> >                    break;
> > -                if (qi_vector_mode_supported_p (candidate, op))
> > +                if (mode_supported_p (candidate, op))
> >                    result = candidate;
> >                }
> >
> > @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
> >      {
> >        mode = tmode.require ();
> >        if (GET_MODE_SIZE (mode) < size
> > -          && targetm.scalar_mode_supported_p (mode))
> > +          && mode_supported_p (mode, op))
> >        result = mode;
> >      }
> >
> > @@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
> >                break;
> >
> >              if (GET_MODE_SIZE (candidate) >= size
> > -                && qi_vector_mode_supported_p (candidate, m_op))
> > +                && mode_supported_p (candidate, m_op))
> >                return candidate;
> >            }
> >      }
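
For readers following the thread, the reworked check in the quoted diff can be modeled outside GCC. The sketch below is illustrative only and is not the committed code: `mode_caps` and its boolean fields are hypothetical stand-ins for the `optab_handler`, `VECTOR_MODE_P` and `can_compare_p` queries, the enum is a local copy listing only the operation kinds named in this thread, and the function uses the name Richard suggested.

```cpp
#include <cassert>

// Local copy of the operation kinds named in this thread (not GCC's header).
enum by_pieces_operation
{
  CLEAR_BY_PIECES,
  MOVE_BY_PIECES,
  SET_BY_PIECES,
  COMPARE_BY_PIECES
};

// Hypothetical per-mode capability record; each field stands in for a GCC query.
struct mode_caps
{
  bool has_mov;           // optab_handler (mov_optab, mode) != CODE_FOR_nothing
  bool is_vector;         // VECTOR_MODE_P (mode)
  bool has_vec_duplicate; // optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing
  bool can_compare_eq;    // can_compare_p (EQ, mode, ccp_jump)
};

// Follows the same decision order as the "+" lines of the quoted diff.
static bool
by_pieces_mode_supported_p (const mode_caps &mode, by_pieces_operation op)
{
  // Every by-pieces operation needs a move pattern for the mode.
  if (!mode.has_mov)
    return false;

  // Set/clear with a vector mode additionally needs vec_duplicate.
  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
      && mode.is_vector
      && !mode.has_vec_duplicate)
    return false;

  // Compare needs an equality compare-and-jump for the mode.
  if (op == COMPARE_BY_PIECES && !mode.can_compare_eq)
    return false;

  return true;
}

// Small self-check using made-up capability sets.
static void
check_examples ()
{
  mode_caps scalar_full = { true, false, false, true };
  mode_caps vector_no_dup = { true, true, false, true };
  mode_caps no_mov = { false, false, false, true };

  assert (by_pieces_mode_supported_p (scalar_full, SET_BY_PIECES));
  assert (!by_pieces_mode_supported_p (vector_no_dup, CLEAR_BY_PIECES));
  assert (by_pieces_mode_supported_p (vector_no_dup, COMPARE_BY_PIECES));
  assert (!by_pieces_mode_supported_p (no_mov, MOVE_BY_PIECES));
}
```

In this model a scalar mode is accepted or rejected purely on the optab-style checks, which matches the stated intent of replacing the `targetm.scalar_mode_supported_p` test; how real targets populate those capabilities is outside the sketch.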

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-26 Thread Jiang, Haochen

> -----Original Message-----
> From: Jiang, Haochen
> Sent: Wednesday, October 25, 2023 4:47 PM
> To: Richard Sandiford ; HAO CHEN GUI
> 
> Cc: gcc-patches 
> Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> > -----Original Message-----
> > From: Richard Sandiford 
> > Sent: Wednesday, October 25, 2023 4:40 PM
> > To: HAO CHEN GUI 
> > Cc: Jiang, Haochen ; gcc-patches  > patc...@gcc.gnu.org>
> > Subject: Re: [PATCH-1v4, expand] Enable vector mode for
> > compare_by_pieces [PR111449]
> >
> > HAO CHEN GUI  writes:
> > > Hi Haochen,
> > >   The regression cases are caused by "targetm.scalar_mode_supported_p"
> > > I added for scalar mode checking. XImode, OImode and TImode (with
> > > -m32) are not enabled in ix86_scalar_mode_supported_p. So they're
> > > excluded from by pieces operations on i386.
> > >
> > >   The original code doesn't do a check for scalar modes. I think it
> > > might be incorrect as not all scalar modes support move and compare
> > > optabs (e.g. TImode with -m32 on rs6000).
> > >
> > >   I drafted a new patch to manually check optabs for scalar modes.
> > > Now both vector and scalar modes are checked for optabs.
> > >
> > >   I did a simple test. All former regression cases pass again. Could
> > > you help do a full regression test? I am worried about the coverage of
> > > my CI system.
> 
> Thanks for that. I am running the regression test now.

The patch works. Thanks a lot!

Sorry for the delay; my CI crashed unexpectedly.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> >
> > Thanks for the quick fix.  The patch LGTM FWIW.  Just a small
> > suggestion for the function name:
> >
> > >
> > > Thanks
> > > Gui Haochen
> > >
> > > patch.diff
> > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > index 7aac575eff8..2af9fcbed18 100644
> > > --- a/gcc/expr.cc
> > > +++ b/gcc/expr.cc
> > > @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
> > >  /* Return true if optabs exists for the mode and certain by pieces
> > >     operations.  */
> > >  static bool
> > > -qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> > > +mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> >
> > Might be worth calling this something more specific, such as
> > by_pieces_mode_supported_p.
> >
> > Otherwise the patch is OK for trunk if it passes the x86 testing.
> >
> > Thanks,
> > Richard
> >
> > >  {
> > > +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
> > > +    return false;
> > > +
> > >    if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> > > -      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
> > > -    return true;
> > > +      && VECTOR_MODE_P (mode)
> > > +      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
> > > +    return false;
> > >
> > >    if (op == COMPARE_BY_PIECES
> > > -      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
> > > -      && can_compare_p (EQ, mode, ccp_jump))
> > > -    return true;
> > > +      && !can_compare_p (EQ, mode, ccp_jump))
> > > +    return false;
> > >
> > > -  return false;
> > > +  return true;
> > >  }
> > >
> > >  /* Return the widest mode that can be used to perform part of an
> > > @@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
> > >                {
> > >                  if (GET_MODE_SIZE (candidate) >= size)
> > >                    break;
> > > -                if (qi_vector_mode_supported_p (candidate, op))
> > > +                if (mode_supported_p (candidate, op))
> > >                    result = candidate;
> > >                }
> > >
> > > @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
> > >      {
> > >        mode = tmode.require ();
> > >        if (GET_MODE_SIZE (mode) < size
> > > -          && targetm.scalar_mode_supported_p (mode))
> > > +          && mode_supported_p (mode, op))
> > >        result = mode;
> > >      }
> > >
> > > @@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
> > >                break;
> > >
> > >              if (GET_MODE_SIZE (candidate) >= size
> > > -                && qi_vector_mode_supported_p (candidate, m_op))
> > > +                && mode_supported_p (candidate, m_op))
> > >                return candidate;
> > >            }
> > >      }

RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march support

2023-10-26 Thread Jiang, Haochen

> -----Original Message-----
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang
> via Gcc-patches
> Sent: Monday, July 17, 2023 11:34 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march
> support
> 
> Hi all,
> 
> This patch adds documentation to wwwdocs to mention the recent
> introduction of Intel new ISA and march.
> 
> Ok for trunk?

I will commit the patch next Monday if there is no objection.

Thx,
Haochen

> 
> BRs,
> Haochen
> 
> ---
>  htdocs/gcc-13/changes.html |  4 ++++
>  htdocs/gcc-14/changes.html | 34 +++++++++++++++++++++++++++++++++-
>  2 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 39414e18..68e8c5cc 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -593,6 +593,10 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Granite Rapids through
>  -march=graniterapids.
> +The switch enables the AMX-FP16, PREFETCHI ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Granite Rapids D through
> +-march=graniterapids-d.
>  The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA extensions.
>
>GCC now supports AMD CPUs based on the znver4 core
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 3f797642..dad1ba53 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -108,7 +108,39 @@ a work-in-progress.
> 
>  
> 
> -
> +IA-32/x86-64
> +
> +  New ISA extension support for Intel AVX-VNNI-INT16 was added.
> +  AVX-VNNI-INT16 intrinsics are available via the -mavxvnniint16
> +  compiler switch.
> +  
> +  New ISA extension support for Intel SHA512 was added.
> +  SHA512 intrinsics are available via the -msha512
> +  compiler switch.
> +  
> +  New ISA extension support for Intel SM3 was added.
> +  SM3 intrinsics are available via the -msm3
> +  compiler switch.
> +  
> +  New ISA extension support for Intel SM4 was added.
> +  SM4 intrinsics are available via the -msm4
> +  compiler switch.
> +  
> +  GCC now supports the Intel CPU named Arrow Lake through
> +-march=arrowlake.
> +Based on Alder Lake, the switch further enables the AVX-IFMA,
> +AVX-VNNI-INT8, AVX-NE-CONVERT and CMPccXADD ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Arrow Lake S through
> +-march=arrowlake-s.
> +Based on Arrow Lake, the switch further enables the AVX-VNNI-INT16, SHA512,
> +SM3 and SM4 ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Lunar Lake through
> +-march=lunarlake.
> +Lunar Lake is based on Arrow Lake S.
> +  
> +
> 
>  
> 
> --
> 2.31.1

RE: [gccwwwdocs PATCH] gcc-13/14: Mention Intel new ISA and march support

2023-10-26 Thread Jiang, Haochen

> -----Original Message-----
> From: Haochen Jiang 
> Sent: Monday, October 23, 2023 10:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: ger...@pfeifer.com; ubiz...@gmail.com; Liu, Hongtao
> 
> Subject: [gccwwwdocs PATCH] gcc-13/14: Mention Intel new ISA and march
> support
> 
> Hi all,
> 
> This patch mentions recent updates to the x86-64 backend, including updated
> ISA lists for previously introduced CPUs and newly introduced
> options/ISAs/CPUs.
> 
> Ok for wwwdocs?

I will commit the patch if there is no objection.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> ---
>  htdocs/gcc-13/changes.html |  8 ++++----
>  htdocs/gcc-14/changes.html | 19 +++++++++++++++++++
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 10c54689..8ef3d639 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -579,13 +579,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Sierra Forest through
>  -march=sierraforest.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT and
> -CMPccXADD ISA extensions.
> +The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD,
> +ENQCMD and UINTR ISA extensions.
>
>GCC now supports the Intel CPU named Grand Ridge through
>  -march=grandridge.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD
> -and RAO-INT ISA extensions.
> +The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD,
> +ENQCMD, UINTR and RAO-INT ISA extensions.
>
>GCC now supports the Intel CPU named Emerald Rapids through
>  -march=emeraldrapids.
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index c817dde4..4f71061f 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -186,6 +186,10 @@ a work-in-progress.
> 
>  IA-32/x86-64
>  
> +  New compiler option -m[no-]evex512 was added.
> +  The compiler switch enables/disables 512 bit vector and 64 bit mask
> +  register. It will be default on if AVX512F is enabled.
> +  
>New ISA extension support for Intel AVX-VNNI-INT16 was added.
>AVX-VNNI-INT16 intrinsics are available via the -mavxvnniint16
>compiler switch.
> @@ -202,6 +206,16 @@ a work-in-progress.
>SM4 intrinsics are available via the -msm4
>compiler switch.
>
> +  New ISA extension support for Intel USER_MSR was added.
> +  USER_MSR intrinsics are available via the -muser_msr
> +  compiler switch.
> +  
> +  GCC now supports the Intel CPU named Clearwater Forest through
> +-march=clearwaterforest.
> +Based on Sierra Forest, the switch further enables the AVX-VNNI-INT16,
> +SHA512, SM3, SM4, USER_MSR and PREFETCHI ISA extensions.
> +extensions.
> +  
>GCC now supports the Intel CPU named Arrow Lake through
>  -march=arrowlake.
>  Based on Alder Lake, the switch further enables the AVX-IFMA,
> @@ -216,6 +230,11 @@ a work-in-progress.
>  -march=lunarlake.
>  Lunar Lake is based on Arrow Lake S.
>
> +  GCC now supports the Intel CPU named Panther Lake through
> +-march=pantherlake.
> +Based on Arrow Lake S, the switch further enables the PREFETCHI ISA
> +extensions.
> +  
>  
> 
>  
> --
> 2.31.1

RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march support

2023-10-26 Thread Jiang, Haochen

> -----Original Message-----
> From: Jiang, Haochen 
> Sent: Friday, October 27, 2023 10:52 AM
> To: Jiang, Haochen ; gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com; Gerald Pfeifer
> 
> Subject: RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and
> march support
> 
> > -----Original Message-----
> > From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang
> > via Gcc-patches
> > Sent: Monday, July 17, 2023 11:34 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; ubiz...@gmail.com
> > Subject: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and
> > march support
> >
> > Hi all,
> >
> > This patch adds documentation to wwwdocs to mention the recent
> > introduction of Intel new ISA and march.
> >
> > Ok for trunk?
> 
> I will commit the patch next Monday if there is no objection.

Sorry for the disturbance. I replied to the wrong mail because the two are
too similar.

> 
> Thx,
> Haochen
> 
> >
> > BRs,
> > Haochen
> >
> > ---
> >  htdocs/gcc-13/changes.html |  4 ++++
> >  htdocs/gcc-14/changes.html | 34 +++++++++++++++++++++++++++++++++-
> >  2 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 39414e18..68e8c5cc 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -593,6 +593,10 @@ You may also want to check out our
> >
> >GCC now supports the Intel CPU named Granite Rapids through
> >  -march=graniterapids.
> > +The switch enables the AMX-FP16, PREFETCHI ISA extensions.
> > +  
> > +  GCC now supports the Intel CPU named Granite Rapids D through
> > +-march=graniterapids-d.
> >  The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA extensions.
> >
> >GCC now supports AMD CPUs based on the znver4 core
> > diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> > index 3f797642..dad1ba53 100644
> > --- a/htdocs/gcc-14/changes.html
> > +++ b/htdocs/gcc-14/changes.html
> > @@ -108,7 +108,39 @@ a work-in-progress.
> >
> >  
> >
> > -
> > +IA-32/x86-64
> > +
> > +  New ISA extension support for Intel AVX-VNNI-INT16 was added.
> > +  AVX-VNNI-INT16 intrinsics are available via the -mavxvnniint16
> > +  compiler switch.
> > +  
> > +  New ISA extension support for Intel SHA512 was added.
> > +  SHA512 intrinsics are available via the -msha512
> > +  compiler switch.
> > +  
> > +  New ISA extension support for Intel SM3 was added.
> > +  SM3 intrinsics are available via the -msm3
> > +  compiler switch.
> > +  
> > +  New ISA extension support for Intel SM4 was added.
> > +  SM4 intrinsics are available via the -msm4
> > +  compiler switch.
> > +  
> > +  GCC now supports the Intel CPU named Arrow Lake through
> > +-march=arrowlake.
> > +Based on Alder Lake, the switch further enables the AVX-IFMA,
> > +AVX-VNNI-INT8, AVX-NE-CONVERT and CMPccXADD ISA extensions.
> > +  
> > +  GCC now supports the Intel CPU named Arrow Lake S through
> > +-march=arrowlake-s.
> > +Based on Arrow Lake, the switch further enables the AVX-VNNI-INT16, SHA512,
> > +SM3 and SM4 ISA extensions.
> > +  
> > +  GCC now supports the Intel CPU named Lunar Lake through
> > +-march=lunarlake.
> > +Lunar Lake is based on Arrow Lake S.
> > +  
> > +
> >
> >  
> >
> > --
> > 2.31.1

RE: [x86 PATCH] PR target/110551: Fix reg allocation for widening multiplications.

2023-10-29 Thread Jiang, Haochen

Hi Roger,

It seems that your patch caused some regressions on x86_64:

https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078390.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html

Could you help verify that?

A simple reproducer, run from the build folder, is:

make check RUNTESTFLAGS="conformance.exp=std/time/year_month_day/io.cc --target_board='unix{-m64\ -march=cascadelake,-m32\ -march=cascadelake,-m32,-m64}'"

Thx,
Haochen

> -Original Message-
> From: Roger Sayle 
> Sent: Wednesday, October 18, 2023 10:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' ; tobias.bur...@siemens.com
> Subject: RE: [x86 PATCH] PR target/110551: Fix reg allocation for widening
> multiplications.
> 
> 
> Many thanks to Tobias Burnus for pointing out the mistake/typo in the PR
> number.
> This fix is for PR 110551, not PR 110511.  I'll update the ChangeLog and
> filename
> of the new testcase, if approved.
> 
> Sorry for any inconvenience/confusion.
> Cheers,
> Roger
> --
> 
> > -Original Message-
> > From: Roger Sayle 
> > Sent: 17 October 2023 20:06
> > To: 'gcc-patches@gcc.gnu.org' 
> > Cc: 'Uros Bizjak' 
> > Subject: [x86 PATCH] PR target/110511: Fix reg allocation for widening
> > multiplications.
> >
> >
> > This patch contains clean-ups of the widening multiplication patterns in
> i386.md,
> > and provides variants of the existing highpart multiplication
> > peephole2 transformations (that tidy up register allocation after reload),
> and
> > thereby fixes PR target/110511, which is a superfluous move instruction.
> >
> > For the new test case, compiled on x86_64 with -O2.
> >
> > Before:
> > mulx64: movabsq $-7046029254386353131, %rcx
> > movq%rcx, %rax
> > mulq%rdi
> > xorq%rdx, %rax
> > ret
> >
> > After:
> > mulx64: movabsq $-7046029254386353131, %rax
> > mulq%rdi
> > xorq%rdx, %rax
> > ret
> >
> > The clean-ups are (i) that operand 1 is consistently made register_operand
> and
> > operand 2 becomes nonimmediate_operand, so that predicates match the
> > constraints, (ii) the representation of the BMI2 mulx instruction is
> updated to use
> > the new umul_highpart RTX, and (iii) because operands
> > 0 and 1 have different modes in widening multiplications, "a" is a more
> > appropriate constraint than "0" (which avoids spills/reloads containing
> SUBREGs).
> > The new peephole2 transformations are based upon those at around line
> 9951
> of
> > i386.md, that begins with the comment ;; Highpart multiplication
> peephole2s to
> > tweak register allocation.
> > ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq %rdi
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> > make -k check, both with and without --target_board=unix{-m32} with no
> new
> > failures.  Ok for mainline?
> >
> >
> > 2023-10-17  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/110511
> > * config/i386/i386.md (mul3): Make operands 1 and
> > 2 take "regiser_operand" and "nonimmediate_operand" respectively.
> > (mulqihi3): Likewise.
> > (*bmi2_umul3_1): Operand 2 needs to be
> register_operand
> > matching the %d constraint.  Use umul_highpart RTX to represent
> > the highpart multiplication.
> > (*umul3_1):  Operand 2 should use regiser_operand
> > predicate, and "a" rather than "0" as operands 0 and 2 have
> > different modes.
> > (define_split): For mul to mulx conversion, use the new
> > umul_highpart RTX representation.
> > (*mul3_1):  Operand 1 should be register_operand
> > and the constraint %a as operands 0 and 1 have different modes.
> > (*mulqihi3_1): Operand 1 should be register_operand matching
> > the constraint %0.
> > (define_peephole2): Providing widening multiplication variants
> > of the peephole2s that tweak highpart multiplication register
> > allocation.
> >
> > gcc/testsuite/ChangeLog
> > PR target/110511
> > * gcc.target/i386/pr110511.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
>

RE: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Jiang, Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, November 2, 2023 3:23 AM
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> using peephole2.
> 
> On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle 
> wrote:
> >
> >
> > Hi Uros,
> >
> > > From: Uros Bizjak 
> > > Sent: 01 November 2023 10:05
> > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register
> allocation
> > > using peephole2.
> > >
> > > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle
> 
> > > wrote:
> > > >
> > > >
> > > > This patch is a follow-up to my previous PR target/110551 patch, this
> > > > time to address the additional move after mulx, seen on TARGET_BMI2
> > > > architectures (such as -march=haswell).  The complication here is that
> > > > the flexible multiple-set mulx instruction is introduced into RTL
> > > > after reload, by split2, and therefore can't benefit from register
> > > > preferencing.  This results in RTL like the following:
> > > >
> > > > (insn 32 31 17 2 (parallel [
> > > > (set (reg:DI 4 si [orig:101 r ] [101])
> > > > (mult:DI (reg:DI 1 dx [109])
> > > > (reg:DI 5 di [109])))
> > > > (set (reg:DI 5 di [ r+8 ])
> > > > (umul_highpart:DI (reg:DI 1 dx [109])
> > > > (reg:DI 5 di [109])))
> > > > ]) "pr110551-2.c":8:17 -1
> > > >  (nil))
> > > >
> > > > (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> > > > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
> > > >  (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> > > > (nil)))
> > > >
> > > > Here insn 32, the mulx instruction, places its results in si and di,
> > > > and then immediately after decides to move di to ax, with di now dead.
> > > > This can be trivially cleaned up by a peephole2.  I've added an
> > > > additional constraint that the two SET_DESTs can't be the same
> > > > register to avoid confusing the middle-end, but this has well-defined
> > > > behaviour on x86_64/BMI2, encoding a umul_highpart.
> > > >
> > > > For the new test case, compiled on x86_64 with -O2 -march=haswell:
> > > >
> > > > Before:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > > mulx%rdi, %rsi, %rdi
> > > > movq%rdi, %rax
> > > > xorq%rsi, %rax
> > > > ret
> > > >
> > > > After:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > > mulx%rdi, %rsi, %rax
> > > > xorq%rsi, %rax
> > > > ret
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > > and make -k check, both with and without --target_board=unix{-m32}
> > > > with no new failures.  Ok for mainline?
> > >
> > > It looks that your previous PR110551 patch regressed -march=cascadelake [1].

Actually, this is not limited to -march=cascadelake; the test also fails
without -march=cascadelake.

Thx,
Haochen

> > > Let's fix these regressions first.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
> > >
> > > Uros.
> >
> > This patch fixes that "regression".  Originally, the test case in PR110551
> > contained one unnecessary mov on "default" x86_targets, but two extra movs
> > on BMI2 targets, including -march=haswell and -march=cascadelake.  The
> > first patch eliminated one of these MOVs, this patch eliminates the second.
> > I'm not sure that you can call it a regression, the added test failed when
> > run with a non-standard -march setting.  The good news is that test case
> > doesn't have to be changed with this patch applied, i.e. the correct
> > intended behaviour is no MOVs on all architectures.
> 
> I was not worried about the extra mov, but more about [2], also
> referred from [1], but it looks that that regression was wrongly
> attributed to your patch.
> 
> [2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html
> 
> > I'll admit the timing is unusual; I had already written and was regression
> > testing a patch for the BMI2 issue, when the -march=cascadelake regression
> > tester let me know it was required for folks that helpfully run the
> > regression suite with non standard settings.  i.e. a long standing bug that
> > wasn't previously tested for by the testsuite.
> >
> > > > 2023-10-30  Roger Sayle  
> > > >
> > > > gcc/ChangeLog
> > > > PR target/110551
> > > > * config/i386/i386.md (*bmi2_umul3_1): Tidy condition
> > > > as operands[2] with predicate register_operand must be !MEM_P.
> > > > (peephole2): Optimize a mulx followed by a register-to-register
> > > > move, to place result in the correct destination if possible.
> > > >
> > > > gcc/testsuite/ChangeLog
> > > > PR target/110551
> > > > * gcc.target/i386/pr110551-2.c: New test case.
> 
> The patch is OK.
> 
> Thanks,
> Uros.

RE: [PATCH] i386: Fix aes/vaes patterns [PR114576]

2024-04-08 Thread Jiang, Haochen

Hi Jakub,

Sorry for the late response; I am on vacation at the moment.

> As the following testcase shows, the above change was incorrect.
> 
> Using aes isa for the second alternative is obviously wrong, aes is enabled
> whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> means that for -maes -mno-avx RA can choose, either it matches the first
> alternative with the dup operand, or it matches the second one (but that
> is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).

When I wrote that patch, I assumed the second alternative would never be
matched when AVX is not enabled, because matching would immediately fall to
the first one. That would make the second one implicitly AES && AVX, which is
the tricky part here.

But that patch is buggy for "-maes -mavx512vl -mno-vaes" with %xmm16 and
above, so your change is needed. I really appreciate it.

> 
> The big question is if "Since VAES should not imply AES" is the case or not.
> Looking around at what LLVM does on godbolt, seems since clang 6 which added
> -mvaes support -mvaes there implies -maes, but GCC treats those two
> independent.
> 
> Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
> imply -mno-vaes, then we should probably just revert the above patch and
> tweak common/config/i386/ to do the implications (+ add the testcase from
> this patch).

LLVM has always had fewer restrictions on ISA in such cases. I would prefer to
stick to what the SDM specifies when implementing this, which is a little
conservative.

However, I am also OK with VAES implying AES, to reduce complexity in this
scenario, if there is no real hardware that has VAES without AES.

Thx,
Haochen

RE: [PATCH] i386: Fix aes/vaes patterns [PR114576]

2024-04-08 Thread Jiang, Haochen

> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, April 8, 2024 9:43 PM
> To: Jiang, Haochen 
> Cc: Hongtao Liu ; gcc-patches@gcc.gnu.org; Liu, Hongtao
> ; ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]
> 
> On Mon, Apr 08, 2024 at 12:33:39PM +, Jiang, Haochen wrote:
> > Sorry for the late response since I am on vacation for now.
> >
> > > As the following testcase shows, the above change was incorrect.
> > >
> > > Using aes isa for the second alternative is obviously wrong, aes is 
> > > enabled
> > > whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> > > means that for -maes -mno-avx RA can choose, either it matches the first
> > > alternative with the dup operand, or it matches the second one (but that
> > > is of course wrong because vaesenc VEX encoded insn needs AES & AVX
> CPUID).
> >
> > When I wrote that patch, I suppose it will never match the second one when
> > AVX is not enabled because it will immediately drop to the first one so the
> > second one is automatically AES && AVX, which is tricky here.
> 
> Before the -mvaes changes the alternatives were noavx,avx isa and so clearly
> it was either the first alternative is the solely available, or the second,
> depending on TARGET_AVX.  But with noavx,aes on the first alternative is
> enabled only for !TARGET_AVX, but the second one whenever TARGET_AES, which
> is both if !TARGET_AVX and TARGET_AVX.  So, the RA is free to consider both
> alternatives, and because the first one is more restrictive (requires
> output matching input), if there is a match between those, it will use the
> first alternative, but if there isn't, it will happily use the second
> alternative.
> 

Aha, I see. Thanks for the explanation.

Thx,
Haochen
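
The point about which alternatives the RA may consider can be sketched in C. This is a simplified model, not GCC's code: the enum and function names are invented for illustration, and it covers only the three "isa" values mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the "isa" attribute values discussed in the thread.
   All names here are hypothetical.  */
enum isa_attr { ISA_NOAVX, ISA_AVX, ISA_AES };

/* Is one alternative enabled for the given target flags?  */
bool
alternative_enabled (enum isa_attr isa, bool target_avx, bool target_aes)
{
  switch (isa)
    {
    case ISA_NOAVX: return !target_avx;
    case ISA_AVX:   return target_avx;
    case ISA_AES:   return target_aes;
    }
  return false;
}

/* Count how many of the two alternatives are enabled at the same time.  */
int
enabled_alternatives (enum isa_attr first, enum isa_attr second,
                      bool target_avx, bool target_aes)
{
  return alternative_enabled (first, target_avx, target_aes)
         + alternative_enabled (second, target_avx, target_aes);
}

/* With "noavx,avx" exactly one alternative is enabled, whatever TARGET_AVX is.  */
void
aes_alternatives_sanity (void)
{
  assert (enabled_alternatives (ISA_NOAVX, ISA_AVX, false, true) == 1);
  assert (enabled_alternatives (ISA_NOAVX, ISA_AVX, true, true) == 1);
}
```

In this model, "noavx,aes" with -maes -mno-avx enables both alternatives at once, which is the situation Jakub describes: the second (VEX-encoded) alternative becomes selectable even though AVX is off.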

RE: [PATCH] i386: Add AVX10.1 related macros

2024-01-11 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, January 11, 2024 4:19 PM
> To: Liu, Hongtao 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org;
> ubiz...@gmail.com; bur...@net-b.de; san...@codesourcery.com
> Subject: Re: [PATCH] i386: Add AVX10.1 related macros
> 
> On Thu, Jan 11, 2024 at 2:16 AM Liu, Hongtao 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, January 10, 2024 5:44 PM
> > > To: Liu, Hongtao 
> > > Cc: Jiang, Haochen ;
> > > gcc-patches@gcc.gnu.org; ubiz...@gmail.com; bur...@net-b.de;
> > > san...@codesourcery.com
> > > Subject: Re: [PATCH] i386: Add AVX10.1 related macros
> > >
> > > On Wed, Jan 10, 2024 at 9:01 AM Liu, Hongtao 
> > > wrote:
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Jiang, Haochen 
> > > > > Sent: Wednesday, January 10, 2024 3:35 PM
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > Cc: Liu, Hongtao ; ubiz...@gmail.com;
> > > > > burnus@net- b.de; san...@codesourcery.com
> > > > > Subject: [PATCH] i386: Add AVX10.1 related macros
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch aims to add AVX10.1 related macros for libgomp's request.
> > > > > The request comes following:
> > > > >
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html
> > > > >
> > > > > Ok for trunk?
> > > > >
> > > > > Thx,
> > > > > Haochen
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   PR target/113288
> > > > >   * config/i386/i386-c.cc (ix86_target_macros_internal):
> > > > >   Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.
> > > > > ---
> > > > >  gcc/config/i386/i386-c.cc | 7 +++
> > > > >  1 file changed, 7 insertions(+)
> > > > >
> > > > > > diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
> > > > > > index c3ae984670b..366b560158a 100644
> > > > > > --- a/gcc/config/i386/i386-c.cc
> > > > > > +++ b/gcc/config/i386/i386-c.cc
> > > > > > @@ -735,6 +735,13 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
> > > > >  def_or_undef (parse_in, "__EVEX512__");
> > > > >if (isa_flag2 & OPTION_MASK_ISA2_USER_MSR)
> > > > >  def_or_undef (parse_in, "__USER_MSR__");
> > > > > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_256)
> > > > > +{
> > > > > +  def_or_undef (parse_in, "__AVX10_1_256__");
> > > > > +  def_or_undef (parse_in, "__AVX10_1__");
> > > > I think this is not needed, others LGTM.
> > >
> > > So __AVX10_1_256__ and __AVX10_1_512__ are redundant with
> > > __AVX10_1__ and __EVEX512__, right?
> > No, I mean __AVX10_1__ is redundant of __AVX10_1_256__ since -mavx10.1 is
> > just alias of -mavx10.1-256.
> > We want explicit __AVX10_1_256__ and __AVX10_1_512__ and don't want mix
> > __EVEX512__ with AVX10 (They are related in their internal implementation,
> > but we don't want the user to control the vector length of avx10 with
> > -mno-evex512, -mno-evex512 is supposed for the existing AVX512).

Let's keep both of them even if we prefer __AVX10_1_256__, since I just found
that LLVM defines the macro __AVX10_1__:

https://github.com/llvm/llvm-project/pull/67278/files#diff-7435d50346a810555df89deb1f879b767ee985ace43fb3990de17fb23a47f004

See clang/lib/Basic/Targets/X86.cpp, lines 774-777.

Thx,
Haochen

> 
> Ah, that makes sense.
> 
> > > > > +}
> > > > > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
> > > > > +def_or_undef (parse_in, "__AVX10_1_512__");
> > > > >if (TARGET_IAMCU)
> > > > >  {
> > > > >def_or_undef (parse_in, "__iamcu");
> > > > > --
> > > > > 2.31.1
> > > >

RE: [r14-6770 Regression] FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors) on Linux/x86_64

2023-12-24 Thread Jiang, Haochen

It is not specific to one -march value; it fails whenever AVX is enabled.

e.g.:

$ /export/users/haochenj/env/build_no_bootstrap_master/gcc/xgcc \
  -B/export/users/haochenj/env/build_no_bootstrap_master/gcc/ \
  /export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c \
  -m64 -mavx -fdiagnostics-plain-output -std=gnu23 -S -o gnu23-tag-4.s
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c: In function ‘bar’:
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c:18:47: error: initialization of ‘struct g *’ from incompatible pointer type ‘struct g *’ [-Wincompatible-pointer-types]

Thx,
Haochen

> -Original Message-
> From: Martin Uecker 
> Sent: Friday, December 22, 2023 5:39 PM
> To: gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> ; Joseph Myers 
> Subject: Re: [r14-6770 Regression] FAIL: gcc.dg/gnu23-tag-4.c (test for excess
> errors) on Linux/x86_64
> 
> 
> Hm, this is weird, as it really seems to depend on the -march=  So if there
> is really a difference between those structs which make them incompatible on
> some archs, we should not consider them to be compatible in general.
> 
> struct g { int a[n]; int b; } *y;
> { struct g { int a[4]; int b; } *y2 = y; }
> 
> But I do not see what could go wrong here as sizeof / alignment is the same
> for n = 4.  So there is something else I missed
> 
> 
> 
> Am Freitag, dem 22.12.2023 um 05:07 +0800 schrieb haochen.jiang:
> > On Linux/x86_64,
> >
> > 23fee88f84873b0b8b41c8e5a9b229d533fb4022 is the first bad commit
> > commit 23fee88f84873b0b8b41c8e5a9b229d533fb4022
> > Author: Martin Uecker 
> > Date:   Tue Aug 15 14:58:32 2023 +0200
> >
> > c23: tag compatibility rules for struct and unions
> >
> > caused
> >
> > FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors)
> >
> > with GCC configured with
> >
> > ../../gcc/configure
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6770/
> > usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-tag-4.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-tag-4.c --target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report,
> > contact me at haochen dot jiang at intel.com.) (If you met problems
> > with cascadelake related, disabling AVX512F in command line might save
> that.) (However, please make sure that there is no potential problems with
> AVX512.)

RE: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for x86_64 backend

2023-12-27 Thread Jiang, Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Thursday, December 21, 2023 4:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao ;
> ger...@pfeifer.com
> Subject: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for
> x86_64 backend
> 
> Hi all,
> 
> This is the v2 patch for the wwwdocs change, addressing the review comments.
> 
> If there is no objection, I will push this change next Tuesday.

I will commit the doc change patch.

Thx,
Haochen

> 
> Changes in v2:
> 
>   - Remove RAO-INT from Grand Ridge
>   - Remove the mask register restriction for -mno-evex512
>   - Arrange the options alphabetically
>   - Other minor text change
> 
> Thx,
> Haochen
> 
> Messages in v1:
> 
> This patch will mention the following changes in wwwdocs for x86_64
> backend:
> 
>   - AVX10.1 support
>   - APX EGPR, PUSH2POP2, PPX and NDD support
>   - Xeon Phi ISAs deprecated
> 
> I also adjusted the wording of the x86_64 part for GCC 13.
> 
> ---
> Mention AVX10.1 support, APX support and Xeon Phi deprecation in GCC 14.
> Also adjust documentation in GCC 13.
> ---
>  htdocs/gcc-13/changes.html | 38 --
>  htdocs/gcc-14/changes.html | 27 ++-
>  2 files changed, 42 insertions(+), 23 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index d3bacc16..b4b1a39a 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -543,24 +543,28 @@ You may also want to check out our
>__bf16 type to x86 psABI. Users need to adjust their
>AVX512BF16-related source code when upgrading GCC12 to GCC13.
>
> -  New ISA extension support for Intel AVX-IFMA was added.
> -  AVX-IFMA intrinsics are available via the -mavxifma
> +  New ISA extension support for Intel AMX-COMPLEX was added.
> +  AMX-COMPLEX intrinsics are available via the
> + -mamx-complex
>compiler switch.
>
> -  New ISA extension support for Intel AVX-VNNI-INT8 was added.
> -  AVX-VNNI-INT8 intrinsics are available via the -
> mavxvnniint8
> +  New ISA extension support for Intel AMX-FP16 was added.
> +  AMX-FP16 intrinsics are available via the -mamx-fp16
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AVX-IFMA was added.
> +  AVX-IFMA intrinsics are available via the -mavxifma
>compiler switch.
>
>New ISA extension support for Intel AVX-NE-CONVERT was added.
>AVX-NE-CONVERT intrinsics are available via the
>-mavxneconvert compiler switch.
>
> -  New ISA extension support for Intel CMPccXADD was added.
> -  CMPccXADD intrinsics are available via the -mcmpccxadd
> +  New ISA extension support for Intel AVX-VNNI-INT8 was added.
> +  AVX-VNNI-INT8 intrinsics are available via the
> + -mavxvnniint8
>compiler switch.
>
> -  New ISA extension support for Intel AMX-FP16 was added.
> -  AMX-FP16 intrinsics are available via the -mamx-fp16
> +  New ISA extension support for Intel CMPccXADD was added.
> +  CMPccXADD intrinsics are available via the
> + -mcmpccxadd
>compiler switch.
>
>New ISA extension support for Intel PREFETCHI was added.
> @@ -571,10 +575,6 @@ You may also want to check out our
>RAO-INT intrinsics are available via the -mraoint
>compiler switch.
>
> -  New ISA extension support for Intel AMX-COMPLEX was added.
> -  AMX-COMPLEX intrinsics are available via the -mamx-
> complex
> -  compiler switch.
> -  
>GCC now supports the Intel CPU named Raptor Lake through
>  -march=raptorlake.
>  Raptor Lake is based on Alder Lake.
> @@ -585,13 +585,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Sierra Forest through
>  -march=sierraforest.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT,
> CMPccXADD,
> -ENQCMD and UINTR ISA extensions.
> +Based on ISA extensions enabled on Alder Lake, the switch further enables
> +the AVX-IFMA, AVX-NE-CONVERT, AVX-VNNI-INT8, CMPccXADD,
> ENQCMD and UINTR
> +ISA extensions.
>
>GCC now supports the Intel CPU named Grand Ridge through
>  -march=grandridge.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT,
> CMPccXADD,
> -ENQCMD, UINTR and RAO-INT ISA extensions.
> +Grand Ridge is based on Sierra Forest.
>
>GCC now supports the Intel CPU named Emerald Rapids through
>  -march=emeraldrapids.
> @@ -599,11 +599,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Granite Rapids through
>  -march=graniterapids.
> -The switch enables the AMX-FP16 and PREFETCHI ISA extensions.
> +Based on Sapphire Rapids, the switch further enables the AMX-FP16 and
> +PREFETCHI ISA extensions.
>
>GCC now supports the Intel CPU named Granite Rapids D through
>  -march=graniterapids-d.
> -The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA
> ext

RE: [PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-11-30 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Friday, December 1, 2023 3:04 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> 
> On Fri, Dec 1, 2023 at 3:22 AM Haochen Jiang 
> wrote:
> >
> > Since Knight Landing and Knight Mill microarchitectures are EOL, we
> > would like to remove its support in GCC 15. In GCC 14, we will first
> > emit a warning for the usage.
> 
> I think it's better to keep supporting -mtune/arch=knl without diagnostics

I see, that is an option and might be better. But if we take it, how we should
define -mtune=knl remains a question.

> but simply not enable the ISAs we don't support.  The better question is
> what to do about KNL specific intrinsics headers / intrinsics?  Will we
> simply remove those?

If there is no objection, the intrinsics are planned to be removed in GCC 15.
As far as I know, almost nobody is using them with the latest GCC, and there
were no complaints when they were removed from ICC/ICX.

Thx,
Haochen

> 
> Richard.
> 
> > gcc/ChangeLog:
> >
> > * config/i386/driver-i386.cc (host_detect_local_cpu):
> > Do not append "-mno-" for Xeon Phi ISAs.
> > * config/i386/i386-options.cc (ix86_option_override_internal):
> > Emit a warning for KNL/KNM targets.
> > * config/i386/i386.opt: Emit a warning for Xeon Phi ISAs.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/other/i386-2.C: Adjust testcases.
> > * g++.dg/other/i386-3.C: Ditto.
> > * g++.dg/pr80481.C: Ditto.
> > * gcc.dg/pr71279.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2pd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2pd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2ps-1.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2ps-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28pd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28pd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-3.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-4.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28sd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28sd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ss-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ss-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28pd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28pd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-3.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-4.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-5.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-6.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28sd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28sd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ss-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ss-2.c: Ditto.
> > * gcc.target/i386/avx512f-gather-1.c: Ditto.
> > * gcc.target/i386/avx512f-gather-2.c: Ditto.
> > * gcc.target/i386/avx512f-gather-3.c: Ditto.
> > * gcc.target/i386/avx512f-gather-4.c: Ditto.
> > * gcc.target/i386/avx512f-gather-5.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherd512-1.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherd512-2.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherpd512-1.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherpd512-2.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherps512-1.c: Ditto.
> > * gcc.target/i386/avx512f

RE: [PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-12-01 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Friday, December 1, 2023 4:37 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> 
> On Fri, Dec 1, 2023 at 8:34 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, December 1, 2023 3:04 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > ubiz...@gmail.com
> > > Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> > >
> > > On Fri, Dec 1, 2023 at 3:22 AM Haochen Jiang 
> > > wrote:
> > > >
> > > > Since Knight Landing and Knight Mill microarchitectures are EOL, we
> > > > would like to remove its support in GCC 15. In GCC 14, we will first
> > > > emit a warning for the usage.
> > >
> > > I think it's better to keep supporting -mtune/arch=knl without diagnostics
> >
> > I see, it could be a choice and might be better. But if we take this, how
> > should we define -mtune=knl remains a question.
> 
> I'd say mapping it to a "close" micro-architecture makes most sense, but
> we could also simply keep the tuning entry for knl?

Actually, I have written a test patch for the removal. One issue might be that
there is something knl-specific in the tuning for VZEROUPPER, which is also
reflected in PR82990.

/* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion
   before a transfer of control flow out of the function.  */
DEF_TUNE (X86_TUNE_EMIT_VZEROUPPER, "emit_vzeroupper", ~m_KNL)

If we choose to keep them, this behavior will change.
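
To make the ~m_KNL mask concrete, here is a toy C model of a per-processor tuning bitmask. It is a sketch under assumptions: the processor list is cut down to three entries and every name below is invented, so it only mirrors the shape of the DEF_TUNE entry above, not GCC's actual tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy processor list; the real one in the i386 backend is much longer.
   All names here are hypothetical.  */
enum processor { PROC_GENERIC, PROC_HASWELL, PROC_KNL };

#define M_GENERIC (1u << PROC_GENERIC)
#define M_HASWELL (1u << PROC_HASWELL)
#define M_KNL     (1u << PROC_KNL)

/* Same shape as DEF_TUNE (..., ~m_KNL): on for every processor except KNL.  */
static const unsigned int tune_emit_vzeroupper = ~M_KNL;

/* Is the tuning flag set for the processor selected by -mtune?  */
bool
tune_enabled (unsigned int tune_mask, enum processor tune)
{
  return (tune_mask & (1u << tune)) != 0;
}

/* Hypothetical helper: KNL is the only entry with the flag cleared.  */
void
tune_sanity (void)
{
  assert (!tune_enabled (tune_emit_vzeroupper, PROC_KNL));
  assert (tune_enabled (tune_emit_vzeroupper, PROC_HASWELL));
}
```

In this model, pointing -mtune=knl at any other entry flips the flag from off to on, which is the kind of behavior change in question.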

> 
> > > but simply not enable the ISAs we don't support.  The better question is
> > > what to do about KNL specific intrinsics headers / intrinsics?  Will we
> > > simply remove those?
> >
> > If there is no objection, The intrinsics are planned to be removed in GCC
> > 15. As far as concerned, almost nobody are using them with the latest GCC.
> > And there is no complaint when removing them in ICC/ICX.
> 
> I see.  Replacing the header contents with #error "XYZ is no longer supported"
> might be nicer.  OTOH x86intrin.h should simply no longer include them.

That is nicer. I will adopt that in the GCC 15 patch.

Thx,
Haochen

> 
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > > Richard.
> > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/driver-i386.cc (host_detect_local_cpu):
> > > > Do not append "-mno-" for Xeon Phi ISAs.
> > > > * config/i386/i386-options.cc (ix86_option_override_internal):
> > > > Emit a warning for KNL/KNM targets.
> > > > * config/i386/i386.opt: Emit a warning for Xeon Phi ISAs.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * g++.dg/other/i386-2.C: Adjust testcases.
> > > > * g++.dg/other/i386-3.C: Ditto.
> > > > * g++.dg/pr80481.C: Ditto.
> > > > * gcc.dg/pr71279.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2pd-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2pd-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2ps-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2ps-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28pd-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28pd-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28ps-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28ps-2.c: Ditto.
> > > > * gcc.target/i386/avx5

RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Jiang, Haochen

> -Original Message-
> From: Andrew Pinski (QUIC) 
> Sent: Tuesday, December 12, 2023 9:01 AM
> To: haochen.jiang ; Andrew Pinski (QUIC)
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Jiang, Haochen 
> Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> > -Original Message-
> > From: haochen.jiang 
> > Sent: Monday, December 11, 2023 4:54 PM
> > To: Andrew Pinski (QUIC) ; gcc-
> > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; haochen.ji...@intel.com
> > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > On Linux/x86_64,
> >
> > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit commit
> > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > Author: Andrew Pinski 
> > Date:   Sat Nov 11 15:54:10 2023 -0800
> >
> > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are
> > the same
> >
> > caused
> >
> > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> 
> 
> So I think this is a testsuite issue, in that the shrx instruction is being
> used here instead of just `shrq` due to that instruction being enabled with
> `-march=cascadelake`.
> Can someone confirm that and submit a testcase change?

I will do that today.

Thx,
Haochen

> 
> Thanks,
> Andrew
> 
> >
> > with GCC configured with
> >
> > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-zlib -
> > -with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check
> > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report, contact 
> > me at
> > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > related, disabling AVX512F in command line might save that.) (However,
> > please make sure that there is no potential problems with AVX512.)

RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Jiang, Haochen

> -Original Message-
> From: Jiang, Haochen
> Sent: Tuesday, December 12, 2023 9:11 AM
> To: Andrew Pinski (QUIC) ; haochen.jiang
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org
> Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> > -Original Message-
> > From: Andrew Pinski (QUIC) 
> > Sent: Tuesday, December 12, 2023 9:01 AM
> > To: haochen.jiang ; Andrew Pinski (QUIC)
> > ; gcc-regress...@gcc.gnu.org; gcc-
> > patc...@gcc.gnu.org; Jiang, Haochen 
> > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > > -Original Message-
> > > From: haochen.jiang 
> > > Sent: Monday, December 11, 2023 4:54 PM
> > > To: Andrew Pinski (QUIC) ; gcc-
> > > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> haochen.ji...@intel.com
> > > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > > assembler-times shrq 2 on Linux/x86_64
> > >
> > > On Linux/x86_64,
> > >
> > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
> commit
> > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > > Author: Andrew Pinski 
> > > Date:   Sat Nov 11 15:54:10 2023 -0800
> > >
> > > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type
> are
> > > the same
> > >
> > > caused
> > >
> > > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> >
> >
> > So I think this is a testsuite issue, in that shrx instruction is being 
> > used here
> > instead of just ` shrq` due to that instruction being enabled with `-
> > march=cascadelake` .
> > Can someone confirm that and submit a testcase change?
> 
> I will do that today.

I suppose we just need to change the scan-assembler pattern from shrq to shr
so that it also covers shrx.

Is that ok? If it is, I will commit a patch to change that.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> >
> > Thanks,
> > Andrew
> >
> > >
> > > with GCC configured with
> > >
> > > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-
> zlib -
> > > -with-demangler-in-ld --with-fpmath=sse --enable-
> languages=c,c++,fortran --
> > > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> > >
> > > To reproduce:
> > >
> > > $ cd {build_dir}/gcc && make check
> > > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > > target_board='unix{-m64\ -march=cascadelake}'"
> > >
> > > (Please do not reply to this email, for question about this report, 
> > > contact
> me at
> > > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > > related, disabling AVX512F in command line might save that.) (However,
> > > please make sure that there is no potential problems with AVX512.)

RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-12 Thread Jiang, Haochen

> -Original Message-
> From: Hongtao Liu 
> Sent: Tuesday, December 12, 2023 2:06 PM
> To: Jiang, Haochen 
> Cc: Andrew Pinski (QUIC) ; haochen.jiang
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> On Tue, Dec 12, 2023 at 1:47 PM Jiang, Haochen via Gcc-regression
>  wrote:
> >
> > > -----Original Message-
> > > From: Jiang, Haochen
> > > Sent: Tuesday, December 12, 2023 9:11 AM
> > > To: Andrew Pinski (QUIC) ; haochen.jiang
> > > ; gcc-regress...@gcc.gnu.org; gcc-
> > > patc...@gcc.gnu.org
> > > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > > assembler-times shrq 2 on Linux/x86_64
> > >
> > > > -Original Message-
> > > > From: Andrew Pinski (QUIC) 
> > > > Sent: Tuesday, December 12, 2023 9:01 AM
> > > > To: haochen.jiang ; Andrew Pinski
> (QUIC)
> > > > ; gcc-regress...@gcc.gnu.org; gcc-
> > > > patc...@gcc.gnu.org; Jiang, Haochen 
> > > > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> > > scan-
> > > > assembler-times shrq 2 on Linux/x86_64
> > > >
> > > > > -Original Message-
> > > > > From: haochen.jiang 
> > > > > Sent: Monday, December 11, 2023 4:54 PM
> > > > > To: Andrew Pinski (QUIC) ; gcc-
> > > > > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> > > haochen.ji...@intel.com
> > > > > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > > > > assembler-times shrq 2 on Linux/x86_64
> > > > >
> > > > > On Linux/x86_64,
> > > > >
> > > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
> > > commit
> > > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > > > > Author: Andrew Pinski 
> > > > > Date:   Sat Nov 11 15:54:10 2023 -0800
> > > > >
> > > > > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one
> type
> > > are
> > > > > the same
> > > > >
> > > > > caused
> > > > >
> > > > > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> > > >
> > > >
> > > > So I think this is a testsuite issue, in that shrx instruction is being 
> > > > used
> here
> > > > instead of just ` shrq` due to that instruction being enabled with `-
> > > > march=cascadelake` .
> > > > Can someone confirm that and submit a testcase change?
> > >
> > > I will do that today.
> >
> > I suppose we might just need to change the scan-asm from shrq to shr to
> cover
> > shrx.
> Please use shr\[qx\], not shr.

I see. I will take that.
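
A rough sketch of why the narrower pattern is preferable, written in Python rather than the testsuite's Tcl. The assembly strings are made-up samples, not real compiler output, and `count` is a hypothetical helper that only approximates what scan-assembler-times does:

```python
import re

# Made-up assembly samples for illustration only (not real compiler output).
asm_generic = "\tshrq\t%cl, %rax\n\tshrq\t%cl, %rdx\n"
asm_bmi2 = "\tshrx\t%rsi, %rdi, %rax\n\tshrx\t%rcx, %rdx, %rdx\n"
asm_other = "\tshrl\t$3, %eax\n"


def count(pattern, text):
    """Hypothetical helper: number of non-overlapping regex matches."""
    return len(re.findall(pattern, text))


# The regex is shr[qx]; in the test file the brackets are written as
# shr\[qx\] because Tcl would otherwise treat [...] specially.
for name, asm in [("generic", asm_generic), ("bmi2", asm_bmi2), ("other", asm_other)]:
    print(name, count(r"shrq", asm), count(r"shr[qx]", asm), count(r"shr", asm))
```

A bare `shr` would also count unrelated shifts such as `shrl`, which is presumably why the tighter pattern was requested.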

Thx,
Haochen

> >
> > Is that ok? If it is, I will commit a patch to change that.
> >
> > Thx,
> > Haochen
> >
> > >
> > > Thx,
> > > Haochen
> > >
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > >
> > > > > with GCC configured with
> > > > >
> > > > > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > > > > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-
> system-
> > > zlib -
> > > > > -with-demangler-in-ld --with-fpmath=sse --enable-
> > > languages=c,c++,fortran --
> > > > > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-
> bootstrap
> > > > >
> > > > > To reproduce:
> > > > >
> > > > > $ cd {build_dir}/gcc && make check
> > > > > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > > > > target_board='unix{-m64\ -march=cascadelake}'"
> > > > >
> > > > > (Please do not reply to this email, for question about this report,
> contact
> > > me at
> > > > > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > > > > related, disabling AVX512F in command line might save that.)
> (However,
> > > > > please make sure that there is no potential problems with AVX512.)
> 
> 
> 
> --
> BR,
> Hongtao

RE: [RFC] Intel AVX10.1 Compiler Design and Support

2023-12-12 Thread Jiang, Haochen

> > On the other hand, a new EVEX-capable level might bring earlier adoption
> > of EVEX capabilities to AMD CPUs, which still should be an improvement
> > over AVX2.  This could benefit AMD as well.  So I would really like to
> > see some AMD feedback here.
> >
> > There's also the matter that time scales for EVEX adoption are so long
> > that by then, Intel CPUs may end up supporting and preferring 512 bit
> > vectors again.
> 
> True, there isn't even widespread VEX adoption yet ... and now there's
> APX as the next best thing to target.
> 
> That said, my main point was that x86-64-v4 is "broken" as it appears
> as a dead end - AVX512 is no more, the future is AVX10, but yet we have
> to define x86-64-v5 as something that includes x86-64-v4.
> 
> So, can we un-do x86-64-v4?

As far as I have heard, x86-64-v4 is rarely used, so there might be a small
chance to undo it without breaking too many things. But I am not sure.

Thx,
Haochen

> 
> Richard.
> 
> > Thanks,
> > Florian
> >

RE: [gcc-wwwdocs PATCH] gcc-13/14: Mention recent update for x86_64 backend

2023-12-13 Thread Jiang, Haochen

> -Original Message-
> From: Gerald Pfeifer 
> Sent: Wednesday, December 13, 2023 2:20 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [gcc-wwwdocs PATCH] gcc-13/14: Mention recent update for
> x86_64 backend
> 
> On Fri, 8 Dec 2023, Haochen Jiang wrote:
> > +++ b/htdocs/gcc-13/changes.html
> 
> > +Based on ISA extensions enabled on Alder Lake, the switch further enables
> > +the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD, ENQCMD and UINTR
> > +ISA extensions.
> 
> Personally I would alphabetically sort all the options, like you have mostly
> done already. Just AVX-VNNI-INT8 and AVX-NE-CONVERT are not.
> 
> (Maybe you have a reason, and in any case this should not block this
> patch.)
> 
> 
> > +++ b/htdocs/gcc-14/changes.html
> > +  New ISA extension support for Intel AVX10.1 was added.
> > +  AVX10.1 intrinsics are available via the -mavx10.1 or
> > +  -mavx10.1-256 compiler switch with 256 bit vector size
> > +  support. 512 bit vector size support for AVX10.1 intrinsics are
> 
> We usually write 256-bit and 512-bit as adjectives, cf.
> gcc.gnu.org/codingconventions.html .
> 
> > +  Part of new feature support for Intel APX was added, including EGPR,
> > +  PUSH2POP2, PPX and NDD.
> 
> Alphabetically?
> 
> > APX features are available via the
> > +  -mapxf compiler switch.
> 
> Could we say "APX is enabled via..." or "APX support is available via..."?
> 
> > +  Xeon Phi CPUs support (a.k.a. Knight Landing and Knight Mill) are marked
> > +as deprecated. GCC will emit a warning when using the
> > +-mavx5124fmaps, -mavx5124vnniw,
> > +-mavx512er, -mavx512pf,
> > +-mprefetchwt1, -march=knl,
> > +-march=knm, -mtune=knl and -mtune=knm
> > +compiler switch. The support will be removed in GCC 15.
> > +  
> 
> I believe "or" instead of "and" will be clearer.
> 
> And "compiler switches" (plural).
> 
> And just "Support" in the last sentence.
> 
> 
> Thanks for submitting these! No need for further review before committing
> (a minor variation).

Thanks for your review. I will fix them, along with the other alphabetical-order
issues in the GCC 13/14 docs. Since a Grand Ridge ISA change is coming soon in
GCC 13 (I have not sent out the patch yet, but it should happen within one week),
I will commit the whole patch after that lands.

Thx,
Haochen

> 
> Gerald

RE: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-13 Thread Jiang, Haochen

> > > > I wonder whether adoption could be made easier by also providing a
> > > > -mavx10[.0] level that removes some of the more obscure sub-ISA
> > > > requirements to cover more existing implementations (I'd not add 
> > > > -mavx10.0-512 here).
> > > > I'd require only skylake-AVX512 features here, basically all
> > > > non-KNL AVX512 CPUs should have a "virtual" AVX10 level that
> > > > allows to use that feature set,
> > >
> > > We have -mno-evex512 can cover those cases, so what you want is like
> > > a simple alias of "-march=skylake-avx512 -mno-evex512"?
> >
> > For the AVX512 enabled sub-isas of skylake-avx512 yes I guess.
> >
> > > > restricted to 256bits so future AVX10-256 implementations can
> > > > handle it as well as all existing (and relevant, which excludes
> > > > KNL) AVX512 implementations.
> > > >
> > > > Otherwise AVX10 is really a hard sell (as AVX512 was originally).
> > >
> > > It's a rebranding of the existing AVX512 to AVX10, AVX10.0  just
> > > complicated things further(considering we already have x86-64-v4
> > > which is different from skylake-avx512).
> >
> > Well, the cut-off for "AVX512" is quite arbitrary.  Introducing a
> > "new" ISA that's only available in HW available in the future and
> > suggesting users to embrace that already (like Intel did with AVX512
> > without offering client SKU support) is a hard sell.
> >
> > I realize Intel thinks client SKU support for AVX10 (restricted to
> > 256bit) will be "easier".  But then don't expect anybody to adopt that in 
> > the next 10 years.
> >
> > Just to add - we were suggesting to use x86_64-v3 for the "next"
> > enterprise product but got downvoted to x86_64-v2 for compatibility reasons.
> >
> > If it were possible I'd axe x86_64-v4.  Maybe we should add a
> > x86_64-v3.5 that sits inbetween v3 and v4, offering AVX512 but
> > restricted to 256bit (and obviously not requiring more of the AVX512 
> > features that v4 requires).
>
> About the arch level is indeed a problem, especially since the default size of
> avx10 is 256.
> +Florian Weimer for more inputs.

IMO, the AVX10.1 options should be there, and the arch level issue should not
affect the existence of this series of options.

The issue we are currently facing is actually mostly about the arch level, since
we have already defined x86-64-v4. "-march=skylake-avx512 -mno-evex512" is
much like something such as x86-64-v4-256.

Thx,
Haochen

RE: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-23 Thread Jiang, Haochen

> -Original Message-
> From: Sebastian Huber 
> Sent: Wednesday, November 22, 2023 10:24 PM
> To: Christophe Lyon 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()
> 
> On 22.11.23 15:22, Christophe Lyon wrote:
> > On Tue, 21 Nov 2023 at 12:22, Sebastian Huber
> >   wrote:
> >> On 21.11.23 11:46, Jakub Jelinek wrote:
> >>> On Tue, Nov 21, 2023 at 11:42:06AM +0100, Sebastian Huber wrote:
>  On 21.11.23 11:34, Jakub Jelinek wrote:
> >> --- a/gcc/tree-profile.cc
> >> +++ b/gcc/tree-profile.cc
> >> @@ -281,10 +281,13 @@ gen_assign_counter_update
> (gimple_stmt_iterator *gsi, gcall *call, tree func,
> >>   if (result)
> >> {
> >>   tree result_type = TREE_TYPE (TREE_TYPE (func));
> >> -  tree tmp = make_temp_ssa_name (result_type, NULL, name);
> >> -  gimple_set_lhs (call, tmp);
> >> +  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
> >> +  gimple_set_lhs (call, tmp1);
> >>   gsi_insert_after (gsi, call, GSI_NEW_STMT);
> >> -  gassign *assign = gimple_build_assign (result, tmp);
> >> +  tree tmp2 = make_ssa_name (TREE_TYPE (result));
> >> +  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
> >> +  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> >> +  assign = gimple_build_assign (result, gimple_assign_lhs 
> >> (assign));
> > When you use a temporary tmp2 for the lhs of the conversion, you can
> just
> > use it here,
> >  assign = gimple_build_assign (result, tmp2);
> >
> > Ok for trunk with that change.
>  Just a question, could I also use
> 
>  tree tmp2 = make_temp_ssa_name (TREE_TYPE (result), NULL, name);
> 
>  ?
> 
>  This make_temp_ssa_name() is used throughout the file and the new
>  make_ssa_name() would be the first use in this file.
> >>> Yes.  The only difference is that it won't be _234 = (type) something;
> >>> but PROF_time_profile_234 = (type) something; in the dumps, but sure,
> >>> consistency is useful.
> >> Thanks for your help. I checked in an updated version.
> >>
> > Our CI bisected a regression to this commit:
> > Running gcc:gcc.dg/tree-prof/tree-prof.exp ...
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
> > "Read tp_first_run: 0" 1
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
> > "Read tp_first_run: 2" 1
> >
> > (on aarch64)
> >
> > Can you check?
> 
> Yes, I will have a look at it.

The same issue also happens on i386, so you can reproduce it on x86-64
platforms as well.

Thx,
Haochen

> 
> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
> 
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/

RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march support

2023-11-26 Thread Jiang, Haochen

> -Original Message-
> From: Gerald Pfeifer 
> Sent: Saturday, November 25, 2023 7:29 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march
> support
> 
> On Mon, 17 Jul 2023, Haochen Jiang via Gcc-patches wrote:
> >GCC now supports the Intel CPU named Granite Rapids through
> >  -march=graniterapids.
> > +The switch enables the AMX-FP16, PREFETCHI ISA extensions.
> 
> Do I understand correctly that it enables AMX-FP16 and PREFETCHI?
> 
> How about changing this to use "and", as in
>   "The switch enables the AMX-FP16 and PREFETCHI ISA extensions."
> ?
> 
> Let me know, and I can make the change.
> 

Ok for me.

Thx,
Haochen

> Gerald

RE: [r14-5578 Regression] FAIL: gfortran.dg/gomp/pr27573.f90 -O (test for excess errors) on Linux/x86_64

2023-11-27 Thread Jiang, Haochen

> -Original Message-
> From: Sebastian Huber 
> Sent: Monday, November 27, 2023 3:58 PM
> To: haochen.jiang ; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> 
> Subject: Re: [r14-5578 Regression] FAIL: gfortran.dg/gomp/pr27573.f90 -O (test
> for excess errors) on Linux/x86_64
> 
> On 26.11.23 12:18, haochen.jiang wrote:
> > On Linux/x86_64,
> >
> > a350a74d6113e3a84943266eb691275951c109d9 is the first bad commit
> > commit a350a74d6113e3a84943266eb691275951c109d9
> > Author: Sebastian Huber
> > Date:   Sat Oct 21 15:52:15 2023 +0200
> >
> >  gcov: Add gen_counter_update()
> >
> > caused
> >
> > FAIL: gcc.dg/gomp/pr27573.c (internal compiler error: verify_gimple
> > failed)
> > FAIL: gcc.dg/gomp/pr27573.c (test for excess errors)
> > FAIL: gcc.dg/profile-update-warning.c (internal compiler error:
> > verify_gimple failed)
> > FAIL: gcc.dg/profile-update-warning.c (test for excess errors)
> > FAIL: gfortran.dg/gomp/pr27573.f90   -O  (internal compiler error:
> > verify_gimple failed)
> > FAIL: gfortran.dg/gomp/pr27573.f90   -O  (test for excess errors)
> 
> The errors were fixed by:
> 
> commit 41aacdea55c5d795a7aa195357d966645845d00e
> Author: Sebastian Huber 
> Date:   Mon Nov 20 15:26:38 2023 +0100
> 
>  gcov: Fix integer types in gen_counter_update()
> 
> commit a034cca0a222598cb42302c059262b654685ff19
> Author: Sebastian Huber 
> Date:   Mon Nov 20 14:48:03 2023 +0100
> 
>  gcov: Use unshare_expr() in gen_counter_update()
> 

Hi Sebastian,

Thanks for your fix! This mail was sent automatically and was delayed by the
earlier bootstrap failure on trunk. If everything is fixed now, that is fine.

Thx,
Haochen


RE: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-11 Thread Jiang, Haochen

> -Original Message-
> From: Hongtao Liu 
> Sent: Thursday, July 11, 2024 9:45 AM
> To: Victor Do Nascimento 
> Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com;
> richard.earns...@arm.com
> Subject: Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and
> sse targets
> 
> On Wed, Jul 10, 2024 at 10:10 PM Victor Do Nascimento
>  wrote:
> >
> > Following the migration of the dot_prod optab from a direct to a
> > conversion-type optab, ensure all back-end patterns incorporate the
> > second machine mode into pattern names.
> The patch LGTM. BTW you can use existing  instead of
> new  and  instead of 
> >
> > gcc/ChangeLog:
> >
> > * config/i386/mmx.md (usdot_prodv8qi): Deleted.
> > (usdot_prodv2siv8qi): New.

Hi Victor,

I suppose all the patterns are renamed rather than deleted and newly added, right?
If that is the case, the ChangeLog might be clearer and easier to understand
if it were changed to something like:

(old pattern): Renamed to ...
(new pattern): this.
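
As an illustration only, the suggested layout could be generated from old/new name pairs roughly as follows. This is a Python sketch; `rename_entries` is a hypothetical helper, not part of any GCC tooling, and the wording of each entry simply follows the form suggested above:

```python
def rename_entries(path, renames):
    """Format (old, new) pattern-name pairs as ChangeLog rename entries."""
    lines = []
    for index, (old, new) in enumerate(renames):
        # Only the first entry carries the "* file" prefix.
        prefix = f"* {path} " if index == 0 else ""
        lines.append(f"{prefix}({old}): Renamed to ...")
        lines.append(f"({new}): this.")
    return "\n".join(lines)


# Pattern names taken from the ChangeLog quoted in this mail.
print(rename_entries(
    "config/i386/mmx.md",
    [("usdot_prodv8qi", "usdot_prodv2siv8qi"),
     ("sdot_prodv8qi", "sdot_prodv2siv8qi")],
))
```

Each old/new pair then reads as one rename instead of an unrelated deletion and addition.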

Thx,
Haochen

> > (sdot_prodv8qi): Deleted.
> > (sdot_prodv2siv8qi): New.
> > (udot_prodv8qi): Deleted.
> > (udot_prodv2siv8qi): New.
> > (usdot_prodv4hi): Deleted.
> > (usdot_prodv2siv4hi): New.
> > (udot_prodv4hi): Deleted.
> > (udot_prodv2siv4hi): New.
> > (sdot_prodv4hi): Deleted.
> > (sdot_prodv2siv4hi): New.
> > * config/i386/sse.md (fourwayacc): New.
> > (twowayacc): New.
> > (sdot_prod): Deleted.
> > (sdot_prod): New.
> > (sdot_prodv4si): Deleted.
> > (sdot_prodv2div4si): New.
> > (usdot_prod): Deleted.
> > (usdot_prod): New.
> > (sdot_prod): Deleted.
> > (sdot_prod): New.
> > (sdot_prodv64qi): Deleted.
> > (sdot_prodv16siv64qi): New.
> > (udot_prod): Deleted.
> > (udot_prod): New.
> > (udot_prodv64qi): Deleted.
> > (udot_prodv16qiv64qi): New.
> > (usdot_prod): Deleted.
> > (usdot_prod): New.
> > (udot_prod): Deleted.
> > (udot_prod): New.
> > ---
> >  gcc/config/i386/mmx.md | 30 +--
> > gcc/config/i386/sse.md | 47 +
> -
> >  2 files changed, 43 insertions(+), 34 deletions(-)
> >
> > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index
> > 94d3a6e5692..d78739b033d 100644
> > --- a/gcc/config/i386/mmx.md
> > +++ b/gcc/config/i386/mmx.md
> > @@ -6344,7 +6344,7 @@ (define_expand "usadv8qi"
> >DONE;
> >  })
> >
> > -(define_expand "usdot_prodv8qi"
> > +(define_expand "usdot_prodv2siv8qi"
> >[(match_operand:V2SI 0 "register_operand")
> > (match_operand:V8QI 1 "register_operand")
> > (match_operand:V8QI 2 "register_operand") @@ -6363,7 +6363,7 @@
> > (define_expand "usdot_prodv8qi"
> >rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
> >rtx op0 = gen_reg_rtx (V4SImode);
> >
> > -  emit_insn (gen_usdot_prodv16qi (op0, op1, op2, op3));
> > +  emit_insn (gen_usdot_prodv4siv16qi (op0, op1, op2, op3));
> >emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0,
> V4SImode));
> >   }
> > else
> > @@ -6377,7 +6377,7 @@ (define_expand "usdot_prodv8qi"
> >emit_move_insn (op3, CONST0_RTX (V4SImode));
> >emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
> >emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
> > -  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
> > +  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
> >
> >/* vec_perm (op0, 2, 3, 0, 1);  */
> >emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78))); @@
> > -6388,7 +6388,7 @@ (define_expand "usdot_prodv8qi"
> >  DONE;
> >  })
> >
> > -(define_expand "sdot_prodv8qi"
> > +(define_expand "sdot_prodv2siv8qi"
> >[(match_operand:V2SI 0 "register_operand")
> > (match_operand:V8QI 1 "register_operand")
> > (match_operand:V8QI 2 "register_operand") @@ -6406,7 +6406,7 @@
> > (define_expand "sdot_prodv8qi"
> >rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
> >rtx op0 = gen_reg_rtx (V4SImode);
> >
> > -  emit_insn (gen_sdot_prodv16qi (op0, op1, op2, op3));
> > +  emit_insn (gen_sdot_prodv4siv16qi (op0, op1, op2, op3));
> >emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0,
> V4SImode));
> >  }
> >else
> > @@ -6420,7 +6420,7 @@ (define_expand "sdot_prodv8qi"
> >emit_move_insn (op3, CONST0_RTX (V4SImode));
> >emit_insn (gen_extendv8qiv8hi2 (op1, operands[1]));
> >emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
> > -  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
> > +  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
> >
> >/* vec_perm (op0, 2, 3, 0, 1);  */
> >emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78))); @@
> > -6432,7 +6432,7 @@ (define_expand "sdot_prodv8qi"
> >
> >  })
> >
> > -(define_e

RE: [r15-2135 Regression] FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings, line 31) on Linux/x86_64

2024-07-18 Thread Jiang, Haochen

Hi Paul,

I suspect that is not the right fix: those lines are OK since they are XFAIL.
The problem is that one specific warning test.

Thx,
Haochen

From: Paul Richard Thomas 
Sent: Friday, July 19, 2024 12:28 AM
To: haochen.jiang 
Cc: pa...@gcc.gnu.org; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: Re: [r15-2135 Regression] FAIL: 
libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings, line 
31) on Linux/x86_64

Hi Haochen,

Try removing lines 37-41 since these are precisely the bogus warnings that the 
patch is meant to eliminate.

Regards

Paul

On Thu, 18 Jul 2024 at 14:38, haochen.jiang <haoch...@ecsmtp.sh.intel.com> wrote:
On Linux/x86_64,

c3aa339ea50f050caf7ed2e497f5499ec2d7b9cc is the first bad commit
commit c3aa339ea50f050caf7ed2e497f5499ec2d7b9cc
Author: Paul Thomas <pa...@gcc.gnu.org>
Date:   Thu Jul 18 08:51:35 2024 +0100

Fortran: Suppress bogus used uninitialized warnings [PR108889].

caused

FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os   at line 32 (test for warnings, line 31)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2135/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.oacc-fortran/privatized-ref-2.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.oacc-fortran/privatized-ref-2.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.oacc-fortran/privatized-ref-2.f90 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.oacc-fortran/privatized-ref-2.f90 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

RE: [r15-2135 Regression] FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings, line 31) on Linux/x86_64

2024-07-18 Thread Jiang, Haochen

I just did a quick test and need to correct my previous mail: those lines
also need to be removed, since they are XPASS now.

However, the real issue is the dg-note at line 32, i.e. that warning has
disappeared.
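
To spell out the reasoning, here is a simplified Python model of how I understand the directive outcomes. It is a sketch only: `outcome` and its `kind` values are hypothetical names, and the real DejaGnu logic has more cases than this:

```python
def outcome(kind, xfail, message_seen):
    """Simplified model of a diagnostic directive's result.

    kind: "expect" for directives that require a message (dg-warning or
          dg-note style), "bogus" for dg-bogus, which requires its absence.
    xfail: whether the directive is marked as an expected failure.
    message_seen: whether the compiler actually printed the message.
    """
    passed = message_seen if kind == "expect" else not message_seen
    if xfail:
        return "XPASS" if passed else "XFAIL"
    return "PASS" if passed else "FAIL"


# Before the commit: the bogus warnings were printed, so dg-bogus + xfail
# gave XFAIL, and the expected note was present.
print(outcome("bogus", True, True), outcome("expect", False, True))
# After the commit: the bogus warnings are gone (XPASS) and so is the note (FAIL).
print(outcome("bogus", True, False), outcome("expect", False, False))
```

Under this model the dg-bogus lines turn into XPASS once the warnings go away, while the missing note is what produces the reported FAIL.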

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
index 498ef70b63a..8cf79a10e8d 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
@@ -29,16 +29,10 @@ program main
   implicit none (type, external)
   integer :: j
   integer, allocatable :: A(:)
-  ! { dg-note {'a' declared here} {} { target *-*-* } .-1 }
   character(len=:), allocatable :: my_str
   character(len=15), allocatable :: my_str15

   A = [(3*j, j=1, 10)]
-  ! { dg-bogus {'a\.offset' is used uninitialized} {PR77504 etc.} { xfail *-*-* } .-1 }
-  ! { dg-bogus {'a\.dim\[0\]\.lbound' is used uninitialized} {PR77504 etc.} { xfail *-*-* } .-2 }
-  ! { dg-bogus {'a\.dim\[0\]\.ubound' is used uninitialized} {PR77504 etc.} { xfail *-*-* } .-3 }
-  ! { dg-bogus {'a\.dim\[0\]\.lbound' may be used uninitialized} {PR77504 etc.} { xfail { ! __OPTIMIZE__ } } .-4 }
-  ! { dg-bogus {'a\.dim\[0\]\.ubound' may be used uninitialized} {PR77504 etc.} { xfail { ! __OPTIMIZE__ } } .-5 }
   call foo (A, size(A))
   call bar (A)
   my_str = "1234567890"

After the change, all the tests pass. However, is that right?

I am not familiar with either Fortran or libgomp, but a note like
"something declared here", which might report a variable declaration
conflict, seems to be needed.

Thx,
Haochen

From: Jiang, Haochen
Sent: Friday, July 19, 2024 9:49 AM
To: Paul Richard Thomas 
Cc: pa...@gcc.gnu.org; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org
Subject: RE: [r15-2135 Regression] FAIL: 
libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings, line 
31) on Linux/x86_64

Hi Paul,

I suspect that is not the correct fix; those lines are OK since they are
XFAIL. The problem is that specific warning test.

Thx,
Haochen

From: Paul Richard Thomas <paul.richard.tho...@gmail.com>
Sent: Friday, July 19, 2024 12:28 AM
To: haochen.jiang <haoch...@ecsmtp.sh.intel.com>
Cc: pa...@gcc.gnu.org; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
Jiang, Haochen <haochen.ji...@intel.com>
Subject: Re: [r15-2135 Regression] FAIL: 
libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings, line 
31) on Linux/x86_64

Hi Haochen,

Try removing lines 37-41 since these are precisely the bogus warnings that the 
patch is meant to eliminate.

Regards

Paul

On Thu, 18 Jul 2024 at 14:38, haochen.jiang <haoch...@ecsmtp.sh.intel.com> wrote:
On Linux/x86_64,

c3aa339ea50f050caf7ed2e497f5499ec2d7b9cc is the first bad commit
commit c3aa339ea50f050caf7ed2e497f5499ec2d7b9cc
Author: Paul Thomas <pa...@gcc.gnu.org>
Date:   Thu Jul 18 08:51:35 2024 +0100

Fortran: Suppress bogus used uninitialized warnings [PR108889].

caused

FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g   at line 32 (test for warnings, line 31)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os   at line 32 (test for warnings, line 31)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2135/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.oacc-fortran/privatized-ref-2.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite &&

RE: [r15-429 Regression] FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test for excess errors) on Linux/x86_64

2024-05-14 Thread Jiang, Haochen

Hi Matthias,

On my side, I get several errors like this:

/export/users/haochenj/src/gcc-bisect/master/master/r15-429/bld/x86_64-linux/32/libstdc++-v3/include/experimental/bits/simd_builtin.h:131:
 error: could not convert 
'std::experimental::parallelism_v2::__vec_shuffle<__vector(4) wchar_t, 
__extract_part<2, 3, 2, wchar_t, 3>(_SimdWrapper)::, std::integer_sequence 
>(std::experimental::parallelism_v2::__as_vector<_SimdWrapper 
>(__x), (std::make_index_sequence<2>(), std::make_index_sequence<2>()), 
(std::experimental::parallelism_v2::__extract_part<2, 3, 
2, wchar_t, 3>(_SimdWrapper)::(), 
std::experimental::parallelism_v2::__extract_part<2, 3, 2, wchar_t, 
3>(_SimdWrapper)::()))' from 
'__vector(2) wchar_t' to 'std::conditional_t >' {aka 
'std::conditional >::type'}

See if this helps.

Thx,
Haochen

> -Original Message-
> From: Matthias Kretz 
> Sent: Tuesday, May 14, 2024 9:26 PM
> To: Jiang, Haochen 
> Cc: gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org
> Subject: Re: [r15-429 Regression] FAIL:
> experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test
> for excess errors) on Linux/x86_64
> 
> Thanks for the report. But I'm unable to reproduce the issue. I'm testing on a
> Skylake-AVX512 system. I even did a clean rebuild of all of GCC using your
> configuration (minus your prefix) and still no failure.
> 
> Could you please send me your libstdc++.log after failing the test?
> 
> Best,
>   Matthias
> 
> On Montag, 13. Mai 2024 18:55:13 MESZ haochen. jiang wrote:
> > On Linux/x86_64,
> >
> > fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9 is the first bad commit
> > commit fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9
> > Author: Matthias Kretz 
> > Date:   Mon May 6 12:13:55 2024 +0200
> >
> > libstdc++: Use __builtin_shufflevector for simd split and concat
> >
> > caused
> >
> > FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi
> > (test for excess errors)
> >
> > with GCC configured with
> >
> > ../../gcc/configure
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-429/usr
> > --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check
> > RUNTESTFLAGS="conformance.exp=experimental/simd/pr109261_constexpr_simd.cc
> > --target_board='unix{-m32}'"
> >
> > (Please do not reply to this email. For questions about this report,
> > contact me at haochen dot jiang at intel.com.)
> > (If you run into cascadelake-related problems, disabling AVX512F on the
> > command line might help.)
> > (However, please make sure that there are no potential problems with
> > AVX512.)
> 
> 
> --
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
>  std::simd

RE: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-14 Thread Jiang, Haochen

Also CCing Honza and Richard, since we touched the generic tuning.

Thx,
Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Wednesday, May 15, 2024 11:04 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
> 
> Hi all,
> 
> Recently, we have seen several seemingly random commit-to-commit
> performance regressions in benchmarks. They are caused by tight loops
> crossing a cache line boundary.
> 
> We are trying to solve the issue with two patches. One adjusts the loop
> alignment for generic tuning; the other aligns tight and hot loops more
> aggressively.
> 
> For SPECINT, we get a 0.85% overall improvement in rates with
> -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
> 
> Benchmark        EMR Rates
> 500.perlbench_r  -1.21%
> 502.gcc_r         0.78%
> 505.mcf_r         0.00%
> 520.omnetpp_r     0.41%
> 523.xalancbmk_r   1.33%
> 525.x264_r        2.83%
> 531.deepsjeng_r   1.11%
> 541.leela_r       0.00%
> 548.exchange2_r   2.36%
> 557.xz_r          0.98%
> Geomean-int       0.85%
> 
> The side effect is a 1.40% increase in code size.
> 
> Benchmark        EMR Codesize
> 500.perlbench_r   0.70%
> 502.gcc_r         0.67%
> 505.mcf_r         3.26%
> 520.omnetpp_r     0.31%
> 523.xalancbmk_r   1.15%
> 525.x264_r        1.11%
> 531.deepsjeng_r   1.40%
> 541.leela_r       1.31%
> 548.exchange2_r   3.06%
> 557.xz_r          1.04%
> Geomean-int       1.40%
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> 
> If nothing unexpected happens within a month after committing to trunk,
> we plan to backport it to GCC 14.2.
> 
> Thx,
> Haochen
> 
> Haochen Jiang (1):
>   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
> 
> liuhongt (1):
>   Align tight&hot loop without considering max skipping bytes.
> 
>  gcc/config/i386/i386.cc  | 148 ++-
>  gcc/config/i386/i386.md  |  10 ++-
>  gcc/config/i386/x86-tune-costs.h |   2 +-
>  3 files changed, 154 insertions(+), 6 deletions(-)
> 
> --
> 2.31.1
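
The Geomean-int rows in the two quoted tables can be cross-checked with a few lines of C. This is only a sketch: it assumes the geomean is taken over the ratios (1 + delta/100) and reported back as a percentage, which is not stated in the mail. The names geomean_pct, pow_n, emr_rates and emr_codesize are made up for this illustration; the numbers are copied from the tables.

```c
#include <assert.h>
#include <stddef.h>

/* Per-benchmark deltas in percent, copied from the two quoted tables. */
const double emr_rates[] = {
  -1.21, 0.78, 0.00, 0.41, 1.33, 2.83, 1.11, 0.00, 2.36, 0.98
};
const double emr_codesize[] = {
  0.70, 0.67, 3.26, 0.31, 1.15, 1.11, 1.40, 1.31, 3.06, 1.04
};

/* x raised to the n-th power by repeated multiplication.  */
static double
pow_n (double x, size_t n)
{
  double r = 1.0;
  while (n--)
    r *= x;
  return r;
}

/* Geometric mean of the ratios (1 + pct/100), returned as a percentage.
   The n-th root is found by bisection so that libm is not needed.  */
double
geomean_pct (const double *pct, size_t n)
{
  double prod = 1.0, lo = 0.0, hi;
  size_t i;
  int iter;

  assert (n > 0);
  for (i = 0; i < n; i++)
    prod *= 1.0 + pct[i] / 100.0;

  /* The n-th root of prod lies in [0, max (1, prod)].  */
  hi = prod > 1.0 ? prod : 1.0;
  for (iter = 0; iter < 200; iter++)
    {
      double mid = (lo + hi) / 2.0;
      if (pow_n (mid, n) < prod)
        lo = mid;
      else
        hi = mid;
    }
  return (lo - 1.0) * 100.0;
}
```

Under that assumption, geomean_pct (emr_rates, 10) and geomean_pct (emr_codesize, 10) round to the 0.85% and 1.40% figures quoted above.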

[r15-579 Regression] FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_prof-kernels-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2024-05-19 Thread Jiang, Haochen

On Linux/x86_64,

a9251ab3c91c8c559d0306838575a666ae62dff4 is the first bad commit
commit a9251ab3c91c8c559d0306838575a666ae62dff4
Author: Richard Biener 
Date:   Thu May 16 12:35:28 2024 +0200

wrong code with points-to and volatile

caused


with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-579/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/acc_prof-kernels-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/acc_prof-kernels-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/acc_prof-kernels-1.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/acc_prof-kernels-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact
me at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)

RE: [COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls

2024-05-20 Thread Jiang, Haochen

Thanks for your help! I had not noticed that this file was newly added to GCC.
I suppose that is why the buildbot has been reporting something to me the
whole afternoon.

Just to confirm: does that mean we will always need to run
gcc/regenerate-opt-urls.py after adding or removing options in GCC?
My current understanding is yes.

Thx,
Haochen

> -Original Message-
> From: Mark Wielaard 
> Sent: Monday, May 20, 2024 7:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Palmer Dabbelt; Jeff Law; Jiang, Haochen; Hu, Lin1; Mark Wielaard
> Subject: [COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls
> 
> risc-v added an -mfence-tso option. i386 removed Xeon Phi ISA support options.
> But the opt.urls files weren't regenerated.
> 
> Fixes: a6114c2a6911 ("RISC-V: Implement -m{,no}fence-tso")
> Fixes: e1a7e2c54d52 ("i386: Remove Xeon Phi ISA support")
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.opt.urls: Regenerate.
>   * config/i386/i386.opt.urls: Likewise.
> ---
>  gcc/config/i386/i386.opt.urls   | 15 ---
>  gcc/config/riscv/riscv.opt.urls |  3 +++
>  2 files changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
> index 81c5bb9a9270..40e8a8449367 100644
> --- a/gcc/config/i386/i386.opt.urls
> +++ b/gcc/config/i386/i386.opt.urls
> @@ -238,12 +238,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx2)
>  mavx512f
>  UrlSuffix(gcc/x86-Options.html#index-mavx512f)
> 
> -mavx512pf
> -UrlSuffix(gcc/x86-Options.html#index-mavx512pf)
> -
> -mavx512er
> -UrlSuffix(gcc/x86-Options.html#index-mavx512er)
> -
>  mavx512cd
>  UrlSuffix(gcc/x86-Options.html#index-mavx512cd)
> 
> @@ -262,12 +256,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx512ifma)
>  mavx512vbmi
>  UrlSuffix(gcc/x86-Options.html#index-mavx512vbmi)
> 
> -mavx5124fmaps
> -UrlSuffix(gcc/x86-Options.html#index-mavx5124fmaps)
> -
> -mavx5124vnniw
> -UrlSuffix(gcc/x86-Options.html#index-mavx5124vnniw)
> -
>  mavx512vpopcntdq
>  UrlSuffix(gcc/x86-Options.html#index-mavx512vpopcntdq)
> 
> @@ -409,9 +397,6 @@ UrlSuffix(gcc/x86-Options.html#index-mrdrnd)
>  mf16c
>  UrlSuffix(gcc/x86-Options.html#index-mf16c)
> 
> -mprefetchwt1
> -UrlSuffix(gcc/x86-Options.html#index-mprefetchwt1)
> -
>  mfentry
>  UrlSuffix(gcc/x86-Options.html#index-mfentry)
> 
> diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
> index 2f01ae5d6271..e02ef3ee3dd9 100644
> --- a/gcc/config/riscv/riscv.opt.urls
> +++ b/gcc/config/riscv/riscv.opt.urls
> @@ -91,3 +91,6 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
> 
>  ; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
> 
> +mfence-tso
> +UrlSuffix(gcc/RISC-V-Options.html#index-mfence-tso)
> +
> --
> 2.45.1

RE: [PATCH v2] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Jiang, Haochen

> > diff --git a/gcc/testsuite/gcc.target/i386/pr115069.c b/gcc/testsuite/gcc.target/i386/pr115069.c
> > new file mode 100644
> > index 000..c4b48b602ef
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr115069.c
> > @@ -0,0 +1,78 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2" } */
> > +/* { dg-final { scan-assembler-not "vpermq" } } */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +typedef int8_t  stress_vint8_t  __attribute__ ((vector_size (16)));
> No need for such a big testcase,
> 
> typedef char v16qi __attribute__((vector_size(16)));
> v16qi
> foo (v16qi a, v16qi b)
> {
> return a * b;
> }
> 
> should be enough, with -mavx2 -mno-avx512f

Yes, I will switch to that.

Thx,
Haochen
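
For reference, the reduced testcase suggested above might look roughly like the sketch below. The dg-do and dg-final lines are taken from the original testcase in this thread. Combining "-O2 -mavx2" with "-mno-avx512f" in dg-options is an assumption based on the suggestion, not the committed test. check_foo is a hypothetical helper added only so the multiplication can be checked lane by lane; it would not belong in a compile-only scan-assembler test.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mavx2 -mno-avx512f" } */
/* { dg-final { scan-assembler-not "vpermq" } } */

#include <assert.h>

typedef char v16qi __attribute__ ((vector_size (16)));

/* The reduced testcase: an element-wise byte multiply.  */
v16qi
foo (v16qi a, v16qi b)
{
  return a * b;
}

/* Hypothetical self-check, not part of the suggested testcase: compare
   each lane of foo's result with the scalar product.  The inputs are
   kept small so that no lane overflows a char.  Returns the number of
   lanes checked.  */
int
check_foo (void)
{
  v16qi a, b, r;
  int i;

  for (i = 0; i < 16; i++)
    {
      a[i] = (char) i;
      b[i] = 3;
    }
  r = foo (a, b);
  for (i = 0; i < 16; i++)
    assert (r[i] == (char) (i * 3));
  return 16;
}
```

The directives are ordinary comments to the compiler, so the file also builds outside the testsuite.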

> > +
> > +#define OPS(a, b, c, s, v23, v3) \
> > +do {   \
> > +   a += b; \
> > +   a |= b; \
> > +   a -= b; \
> > +   a &= ~b;\
> > +   a *= c; \
> > +   a = ~a; \
> > +   a *= s; \
> > +   a ^= c; \
> > +   a <<= 1;\
> > +   b >>= 1;\
> > +   b += c; \
> > +   a %= v23;   \
> > +   c /= v3;\
> > +   b = b ^ c;  \
> > +   c = b ^ c;  \
> > +   b = b ^ c;  \
> > +} while (0)
> > +
> > +volatile uint8_t csum8_put;
> > +
> > +void stress_vecmath(void)
> > +{
> > +   const stress_vint8_t v23_8 = {
> > +   0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
> > +   0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17
> > +   };
> > +   const stress_vint8_t v3_8 = {
> > +   0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
> > +   0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03
> > +   };
> > +   stress_vint8_t a8 = {
> > +   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> > +   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
> > +   };
> > +   stress_vint8_t b8 = {
> > +   0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
> > +   0x0f, 0x1e, 0x2d, 0x3c, 0x4b, 0x5a, 0x69, 0x78
> > +   };
> > +   stress_vint8_t c8 = {
> > +   0x01, 0x02, 0x03, 0x02, 0x01, 0x02, 0x03, 0x02,
> > +   0x03, 0x02, 0x01, 0x02, 0x03, 0x02, 0x01, 0x02
> > +   };
> > +   stress_vint8_t s8 = {
> > +   0x01, 0x01, 0x01, 0x01, 0x02, 0x02, 0x02, 0x02,
> > +   0x01, 0x01, 0x02, 0x02, 0x01, 0x01, 0x02, 0x02,
> > +   };
> > +   const uint8_t csum8_val =  (uint8_t)0x1b;
> > +   int i;
> > +   uint8_t csum8;
> > +
> > +   for (i = 1000; i; i--) {
> > +   OPS(a8, b8, c8, s8, v23_8, v3_8);
> > +   OPS(a8, b8, c8, s8, v23_8, v3_8);
> > +   OPS(a8, b8, c8, s8, v23_8, v3_8);
> > +   OPS(a8, b8, c8, s8, v23_8, v3_8);
> > +   OPS(a8, b8, c8, s8, v23_8, v3_8);
> > +   OPS(a8, b8, c8, s8, v23_8, v3_8);
> > +   }
> > +
> > +   csum8 = a8[0]  ^ a8[1]  ^ a8[2]  ^ a8[3]  ^
> > +   a8[4]  ^ a8[5]  ^ a8[6]  ^ a8[7]  ^
> > +   a8[8]  ^ a8[9]  ^ a8[10] ^ a8[11] ^
> > +   a8[12] ^ a8[13] ^ a8[14] ^ a8[15];
> > +   csum8_put = csum8;
> > +}
> > --
> > 2.31.1
> >
> 
> 
> --
> BR,
> Hongtao

RE: [PATCH v3] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Jiang, Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, May 21, 2024 9:04 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH v3] i386: Disable ix86_expand_vecop_qihi2
> when !TARGET_AVX512BW
> 
> On Tue, May 21, 2024 at 11:01 AM Haochen Jiang
>  wrote:
> >
> > Hi all,
> >
> > This is the v3 patch to fix PR115069. The new testcase has passed.
> >
> > Changes in v3:
> >   - Simplify the testcase.
> >
> > Changes in v2:
> >   - Add a testcase.
> >   - Change the comment for the early exit.
> >
> > Thx,
> > Haochen
> >
> > Since vpermq is really slow, we should avoid using it for permutation
> > when vpmovwb is not available (needs AVX512BW) for
> > ix86_expand_vecop_qihi2 and fall back to ix86_expand_vecop_qihi.
> >
> > gcc/ChangeLog:
> >
> > PR target/115069
> > * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> > Do not enable the optimization when AVX512BW is not enabled.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/115069
> > * gcc.target/i386/pr115069.c: New.
> 
> LGTM, with a nit below.

Ok and I will also backport the patch to GCC14.

Thx,
Haochen

> 
> Thanks,
> Uros.

RE: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-28 Thread Jiang, Haochen

> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for this if there's no objection in 48 hours.
> > > >
> > > > If nothing unexpected happens within a month after committing to
> > > > trunk, we plan to backport it to GCC 14.2.

I accidentally backported it to GCC 14.2 already, since I did not realize
that my local branch was on GCC 14, not trunk.

If something unexpected shows up on trunk, I will revert the patches on
GCC 14.

Thx,
Haochen

> > > >
> > > > Thx,
> > > > Haochen
> > > >
> > > > Haochen Jiang (1):
> > > >   Adjust generic loop alignment from 16:11:8 to 16 for Intel
> > > > processors
> > For this one, the current znver{1,2,3,4,5}_cost already set loop alignment
> > to 16, so I think it should be fine to set it for generic_cost.
> > > >
> > > > liuhongt (1):
> > > >   Align tight&hot loop without considering max skipping bytes.
> > For this one, although we have seen similar growth on AMD's
> > processors, it's still nice to have someone from AMD to look at this
> > to see if it's what they need.
> > > >
> > > >  gcc/config/i386/i386.cc  | 148 ++-
> > > >  gcc/config/i386/i386.md  |  10 ++-
> > > >  gcc/config/i386/x86-tune-costs.h |   2 +-
> > > >  3 files changed, 154 insertions(+), 6 deletions(-)
> > > >
> > > > --
> > > > 2.31.1
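
To illustrate what changing the generic loop alignment from 16:11:8 to 16 means in practice, here is a small sketch. It assumes the n:m:n2 form behaves as the GCC manual describes for the -falign-* options (align to n if fewer than m bytes need to be skipped, otherwise fall back to n2) and assumes a 64-byte cache line. The function names, the example address and the 12-byte loop size are made up for this illustration and do not come from the patches.

```c
#include <assert.h>

/* Padding needed to move ADDR up to the next multiple of ALIGN.  */
static unsigned long
pad_to (unsigned long addr, unsigned long align)
{
  return (align - addr % align) % align;
}

/* Where a loop header originally at ADDR ends up under an n:m:n2
   alignment request: align to N if that skips fewer than M bytes,
   otherwise fall back to the secondary alignment N2 (0 means none).
   A plain "16" corresponds to n = 16, m = 16, n2 = 0.  */
unsigned long
place_loop (unsigned long addr, unsigned long n, unsigned long m,
            unsigned long n2)
{
  unsigned long pad = pad_to (addr, n);

  if (pad < m)
    return addr + pad;
  if (n2 != 0)
    return addr + pad_to (addr, n2);
  return addr;
}

/* Nonzero if the SIZE bytes starting at START span two cache lines of
   LINE bytes each.  */
int
crosses_line (unsigned long start, unsigned long size, unsigned long line)
{
  return start / line != (start + size - 1) / line;
}

/* Example: a 12-byte loop whose header would land at 0x35.
   With 16:11:8, reaching 0x40 needs 11 bytes of padding, which is not
   fewer than 11, so the loop is only aligned to 8 (0x38) and then
   straddles the 64-byte boundary at 0x40.  With plain 16 it starts at
   0x40 and stays within one line.  */
int
example_old_crosses (void)
{
  return crosses_line (place_loop (0x35, 16, 11, 8), 12, 64);
}

int
example_new_crosses (void)
{
  return crosses_line (place_loop (0x35, 16, 16, 0), 12, 64);
}
```

Note that aligning to 16 alone does not rule out every crossing (a 24-byte loop at 0x30 still crosses 0x40), which is presumably why the second patch aligns tight and hot loops more aggressively.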

RE: [r15-983 Regression] FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors) on Linux/x86_64

2024-06-03 Thread Jiang, Haochen

The failure is expected, since -march=cascadelake -mavx10.1-256 leads to a
warning.
Also, we cannot use -mno-avx512f with -mavx10.1-256, since that also leads to
a warning.

> -Original Message-
> From: haochen.jiang 
> Sent: Monday, June 3, 2024 10:22 PM
> To: Jiang, Haochen ; gcc-regress...@gcc.gnu.org;
> gcc-patches@gcc.gnu.org
> Subject: [r15-983 Regression] FAIL: gcc.target/i386/avx10_1-25.c (test for
> excess errors) on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 1f2ca510065a2033bac408eb5a960ef0126f25cc is the first bad commit
> commit 1f2ca510065a2033bac408eb5a960ef0126f25cc
> Author: Haochen Jiang 
> Date:   Mon May 20 15:52:32 2024 +0800
> 
> Add AVX10.1 target_clones support
> 
> caused
> 
> FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-983/usr
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_1-25.c
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_1-25.c
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email. For questions about this report,
> contact me at haochen dot jiang at intel.com.)
> (If you run into cascadelake-related problems, disabling AVX512F on the
> command line might help.)
> (However, please make sure that there are no potential problems with
> AVX512.)

RE: [PATCH 00/22] Support AVX10.2 ymm rounding

2024-08-14 Thread Jiang, Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Wednesday, August 14, 2024 5:02 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH 00/22] Support AVX10.2 ymm rounding
> 
> Hi all,
> 
> The initial patch for AVX10.2 has been merged this week.
> 
> For the upcoming patches, we will first upstream the ymm rounding control
> part.
> 
> In the ymm rounding part, ALL the instructions in AVX512 with 512-bit
> rounding control will also have 256-bit rounding control in AVX10.2.
> 
> For clarity, the patches are ordered alphabetically. Each patch includes
> its intrinsic definitions and related tests. Sometimes a pattern is not
> changed in a patch because an earlier patch in the series has already
> enabled 256-bit rounding in that pattern.

It seems that the patch series somehow got corrupted.

Therefore, I have uploaded a vendor branch with all the patches for anyone
interested:
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ix86/heads/avx10.2

Thx,
Haochen

> 
> Bootstrapped on x86_64-pc-linux-gnu. Ok for trunk?
> 
> Thx,
> Haochen
> 
> Ref: Intel Advanced Vector Extensions 10.2 Architecture Specification
> https://cdrdv2.intel.com/v1/dl/getContent/828965
>

RE: [r15-3000 Regression] FAIL: gcc.target/i386/avx10_2-rounding-3.c (test for excess errors) on Linux/x86_64

2024-08-20 Thread Jiang, Haochen

The three AVX10.2-related test regressions are all caused by using
-mavx10.2 together with -march=cascadelake, which produces an unavoidable
warning.

I am considering changing -march=cascadelake to an arch with AVX10 in
the future to avoid these false alarms.

Thx,
Haochen

> -Original Message-
> From: haochen.jiang 
> Sent: Tuesday, August 20, 2024 11:59 PM
> To: Hu, Lin1; gcc-regress...@gcc.gnu.org;
> gcc-patc...@gcc.gnu.org; Jiang, Haochen
> Subject: [r15-3000 Regression] FAIL: gcc.target/i386/avx10_2-rounding-3.c
> (test for excess errors) on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 3d1b5530ea1d23e26dc5ab70aa4a2e7b9dc19b50 is the first bad commit
> commit 3d1b5530ea1d23e26dc5ab70aa4a2e7b9dc19b50
> Author: Hu, Lin1 
> Date:   Mon Aug 19 10:09:03 2024 +0800
> 
> AVX10.2 ymm rounding: Support vcvt{,u}w2ph and vdivp{s,d,h} intrins
> 
> caused
> 
> FAIL: gcc.target/i386/avx10_2-rounding-3.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3000/usr
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-rounding-3.c
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-rounding-3.c
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email. For questions about this report,
> contact me at haochen dot jiang at intel.com.)
> (If you run into cascadelake-related problems, disabling AVX512F on the
> command line might help.)
> (However, please make sure that there are no potential problems with
> AVX512.)

RE: [r15-3185 Regression] FAIL: gcc.target/i386/avx10_2-compare-1.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread Jiang, Haochen

As with all the AVX10.2 patches, this is caused by the vector size warning
mentioned previously.

Thx,
Haochen

> -Original Message-
> From: haochen.jiang 
> Sent: Monday, August 26, 2024 11:54 PM
> To: jun.zh...@intel.com; gcc-regress...@gcc.gnu.org;
> gcc-patc...@gcc.gnu.org; Jiang, Haochen
> Subject: [r15-3185 Regression] FAIL: gcc.target/i386/avx10_2-compare-1.c
> (test for excess errors) on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 576bd309ded9dfe258023f26924c064a7bf12875 is the first bad commit
> commit 576bd309ded9dfe258023f26924c064a7bf12875
> Author: Zhang, Jun 
> Date:   Mon Aug 26 10:53:54 2024 +0800
> 
> AVX10.2: Support compare instructions
> 
> caused
> 
> FAIL: gcc.target/i386/avx10_2-compare-1.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3185/usr
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-compare-1.c
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-compare-1.c
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email. For questions about this report,
> contact me at haochen dot jiang at intel.com.)
> (If you run into cascadelake-related problems, disabling AVX512F on the
> command line might help.)
> (However, please make sure that there are no potential problems with
> AVX512.)

RE: [gcc-wwwdocs PATCH] gcc-15: Mention recent update for x86_64 backend

2024-08-28 Thread Jiang, Haochen

> -Original Message-
> From: Gerald Pfeifer 
> Sent: Thursday, August 29, 2024 3:20 AM
> 
> On Wed, 28 Aug 2024, Haochen Jiang wrote:
> > Sorry for the disturbance; I mistyped gcc-patches as gcc-patchs, so I am
> > resending the patch.
> 
> No worries.
> 
> > This patch adds documentation for recent updates in the x86-64 backend.
> 
> Thank you!
> 
> > +  Xeon Phi CPUs support (a.k.a. Knight Landing and Knight Mill)
> > + were removed
> 
> I believe "Support for Xeon Phi CPUs" or "Xeon Phi CPU support" would be
> better, though not 100% sure.
> 
> > +  in GCC 15. GCC will no longer accept -mavx5124fmaps,
> > +  -mavx5124vnniw, -mavx512er,
> > +  -mavx512pf, -mprefetchwt1,
> > +  -march=knl, -march=knm, -mtune=knl
> > +  or -mtune=knm compiler switches.
> 
> Is there a particular rationale for the order of switches? If not, I'd sort
> them alphabetically (which is partially the case already) and start with
> -march=...
> 
> The patch is okay if you consider (which is not necessarily making) these
> changes.

I will change them and commit them tomorrow if there is no objection.

Thx,
Haochen

> 
> Gerald

RE: [COMMITTED] testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

2024-06-04 Thread Jiang, Haochen

Hi Rainer,

I will also backport the patch to GCC 14, since the original patch was also
backported.

Thanks for testing on Solaris/x86!

Thx,
Haochen

> -Original Message-
> From: Rainer Orth 
> Sent: Tuesday, June 4, 2024 7:34 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jiang, Haochen 
> Subject: [COMMITTED] testsuite: i386: Require ifunc support in
> gcc.target/i386/avx10_1-25.c etc.
> 
> Two new AVX10.1 tests FAIL on Solaris/x86:
> 
> FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)
> FAIL: gcc.target/i386/avx10_1-26.c (test for excess errors)
> 
> Excess errors:
> /vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/avx10_1-25.c:6:9: error: the call requires 'ifunc', which is not supported by this target
> 
> Fixed by requiring ifunc support.
> 
> Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
> 
> Committed to trunk.
> 
>   Rainer
> 
> --
> Rainer Orth, Center for Biotechnology, Bielefeld University
> 
> 
> 2024-06-04  Rainer Orth  
> 
>   gcc/testsuite:
>   * gcc.target/i386/avx10_1-25.c: Require ifunc support.
>   * gcc.target/i386/avx10_1-26.c: Likewise.
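
As background on why these tests need ifunc: target_clones makes the compiler emit several versions of a function plus a resolver that picks one at load time, and that resolver relies on the target's ifunc support. The sketch below illustrates the mechanism only. It uses "avx2" rather than an AVX10.1 clone so that it builds with older compilers, guards the attribute so the file still compiles where ifunc is unlikely to be available, and the names sum8, check_sum8 and HAVE_CLONES are made up. It does not reproduce the contents of avx10_1-25.c.

```c
#include <assert.h>
#include <stdio.h>   /* on glibc this also defines __GLIBC__ */

/* In the GCC testsuite the requirement is expressed with a directive
   comment of the form:  { dg-require-ifunc "" }
   Here a preprocessor guard plays a similar role (an assumption about
   where ifunc is available, not an exhaustive check).  */
#if defined (__GNUC__) && !defined (__clang__) \
    && defined (__x86_64__) && defined (__GLIBC__)
# define HAVE_CLONES 1
__attribute__ ((target_clones ("default", "avx2")))
#else
# define HAVE_CLONES 0
#endif
int
sum8 (const int *v)
{
  int i, s = 0;

  /* The same source is compiled once per clone; a resolver selects
     the best version for the running CPU.  */
  for (i = 0; i < 8; i++)
    s += v[i];
  return s;
}

/* Hypothetical self-check: the result must not depend on which clone
   the resolver selected.  */
int
check_sum8 (void)
{
  static const int v[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int s = sum8 (v);

  assert (s == 36);
  return s;
}
```

On a target without ifunc, the attribute alone is what triggers the "the call requires 'ifunc'" error quoted above, so the testsuite fix skips the tests there instead.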

RE: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -std=gnu++98 execution test on Linux/x86_64

2024-07-23 Thread Jiang, Haochen

This might be a false-positive timeout alert. Please ignore it for now.

Thx,
Haochen

> -Original Message-
> From: haochen.jiang 
> Sent: Tuesday, July 23, 2024 7:51 PM
> To: j...@ventanamicro.com; gcc-regress...@gcc.gnu.org;
> gcc-patc...@gcc.gnu.org; Jiang, Haochen
> Subject: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c
> -std=gnu++98 execution test on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2 is the first bad commit
> commit 88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2
> Author: Jeff Law 
> Date:   Mon Jul 22 08:45:10 2024 -0600
> 
> [NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as
> live in ext-dce
> 
> caused
> 
> FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test
> 
> with GCC configured with
> 
> ../../gcc/configure
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2196/usr
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-10.c
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-10.c
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-6.c
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-6.c
> --target_board='unix{-m32\ -march=cascadelake}'"
> 
> (Please do not reply to this email. For questions about this report,
> contact me at haochen dot jiang at intel.com.)
> (If you run into cascadelake-related problems, disabling AVX512F on the
> command line might help.)
> (However, please make sure that there are no potential problems with
> AVX512.)

RE: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -std=gnu++98 execution test on Linux/x86_64

2024-07-23 Thread Jiang, Haochen




> -Original Message-
> From: Jakub Jelinek 
> Sent: Wednesday, July 24, 2024 1:09 PM
> To: Jiang, Haochen 
> Cc: j...@ventanamicro.com; gcc-regress...@gcc.gnu.org;
> gcc-patc...@gcc.gnu.org
> Subject: Re: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c
> -std=gnu++98 execution test on Linux/x86_64
> 
> On Wed, Jul 24, 2024 at 01:49:06AM +, Jiang, Haochen wrote:
> > It might be a false positive timeout alert. Please ignore that first.
> 
> It is not.  I'm seeing it too consistently on i686-linux:
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++11 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++23 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++26 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++11 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++23 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++26 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++11 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++23 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++26 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++11 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++23 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++26 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test
> 
> The compilation of convert-bfp-6.c itself is identical between the older 
> (where
> it didn't fail) and newer (where it fails) builds, what has changed is 
> libgcc.a.
> In particular, what matters is libgcc/bid_binarydecimal.o.
> If I link all objects from libgcc from older (good libgcc) but 
> bid_binarydecimal.o
> (that one from newer bad libgcc), convert-bfp-6 still aborts, if I link all 
> objects
> from libgcc from newer (bad libgcc) but bid_binarydecimal.o (that one from
> older good libgcc), convert-bfp-6 works.

I see. If it is not a false alarm, then it seems to me that
gcc-15-2212-gad642d2c950 from Jeff might fix the problem from the
regression report. But I am not sure whether it really fixes the problem
or just happens to work.

Thx,
Haochen

> 
>   Jakub

RE: [PATCH Ping] i386: Use BLKmode for {ld,st}tilecfg

2024-07-25 Thread Jiang, Haochen

Ping for this patch

Thx,
Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Thursday, July 18, 2024 9:45 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; hjl.to...@gmail.com;
> ubiz...@gmail.com
> Subject: [PATCH] i386: Use BLKmode for {ld,st}tilecfg
> 
> Hi all,
> 
> For AMX instructions related with memory, we will treat the memory
> size as not specified since there won't be different size causing
> confusion for memory.
> 
> This will change the output under Intel mode, which is broken for now when
> using with assembler and aligns to current binutils behavior.
> 
> Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk?
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-expand.cc (ix86_expand_builtin): Change
>   from XImode to BLKmode.
>   * config/i386/i386.md (ldtilecfg): Change XI to BLK.
>   (sttilecfg): Ditto.
> ---
>  gcc/config/i386/i386-expand.cc |  2 +-
>  gcc/config/i386/i386.md| 12 +---
>  2 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 9a31e6df2aa..d9ad06264aa 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -14198,7 +14198,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx
> subtarget,
> op0 = convert_memory_address (Pmode, op0);
> op0 = copy_addr_to_reg (op0);
>   }
> -  op0 = gen_rtx_MEM (XImode, op0);
> +  op0 = gen_rtx_MEM (BLKmode, op0);
>if (fcode == IX86_BUILTIN_LDTILECFG)
>   icode = CODE_FOR_ldtilecfg;
>else
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index de9f4ba0496..86989d4875a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -28975,24 +28975,22 @@
> (set_attr "type" "other")])
> 
>  (define_insn "ldtilecfg"
> -  [(unspec_volatile [(match_operand:XI 0 "memory_operand" "m")]
> +  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
>  UNSPECV_LDTILECFG)]
>"TARGET_AMX_TILE"
>"ldtilecfg\t%0"
>[(set_attr "type" "other")
> (set_attr "prefix" "maybe_evex")
> -   (set_attr "memory" "load")
> -   (set_attr "mode" "XI")])
> +   (set_attr "memory" "load")])
> 
>  (define_insn "sttilecfg"
> -  [(set (match_operand:XI 0 "memory_operand" "=m")
> -(unspec_volatile:XI [(const_int 0)] UNSPECV_STTILECFG))]
> +  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +(unspec_volatile:BLK [(const_int 0)] UNSPECV_STTILECFG))]
>"TARGET_AMX_TILE"
>"sttilecfg\t%0"
>[(set_attr "type" "other")
> (set_attr "prefix" "maybe_evex")
> -   (set_attr "memory" "store")
> -   (set_attr "mode" "XI")])
> +   (set_attr "memory" "store")])
> 
>  (include "mmx.md")
>  (include "sse.md")
> --
> 2.31.1

RE: [PATCH] i386: Fix AVX512 intrin macro typo

2024-07-25 Thread Jiang, Haochen

> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, July 26, 2024 2:31 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Fix AVX512 intrin macro typo
> 
> On Fri, Jul 26, 2024 at 02:25:22PM +0800, Haochen Jiang wrote:
> > Hi all,
> >
> > There are several typo in AVX512 intrins macro define. They will
> > eventually result in errors with -O0. This patch will fix that.
> 
> Add a testcase that verifies that?

Ok, I will add testcases with -O0 for them.

Thx,
Haochen

> 
> > Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to
> > GCC14, GCC 13 and GCC 12?
> >
> > Thx,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > * config/i386/avx512dqintrin.h
> > (_mm_mask_fpclass_ss_mask): Correct operand order.
> > (_mm_mask_fpclass_sd_mask): Ditto.
> > (_mm_reduce_round_sd): Use -1 as mask since it is non-mask.
> > (_mm_reduce_round_ss): Ditto.
> > * config/i386/avx512vlbwintrin.h
> > (_mm256_mask_alignr_epi8): Correct operand usage.
> > (_mm_mask_alignr_epi8): Ditto.
> > * config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.
> 
>   Jakub
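
For context on why such typos only show up at -O0: GCC's intrinsic headers commonly define intrinsics that take an immediate operand as inline functions when __OPTIMIZE__ is defined and as macros otherwise, so the macro bodies are compiled only in unoptimized builds. A minimal sketch of that pattern follows; `my_rotl` is a hypothetical name invented for illustration and is not one of the intrinsics touched by this patch.

```c
#ifdef __OPTIMIZE__
/* Optimized builds: an always-inline function, so IMM is still seen
   as a constant after inlining.  */
static inline __attribute__ ((__always_inline__)) unsigned int
my_rotl (unsigned int x, const int imm)
{
  return (x << imm) | (x >> (32 - imm));
}
#else
/* Unoptimized (-O0) builds: a macro.  A typo in this body, such as a
   swapped operand, is never compiled by a test that uses -O2.
   (my_rotl is hypothetical; valid for IMM in 1..31.)  */
#define my_rotl(X, IMM) \
  ((unsigned int) (((unsigned int) (X) << (IMM)) \
                   | ((unsigned int) (X) >> (32 - (IMM)))))
#endif
```

Because only one of the two definitions is compiled in any given build, a testcase built with -O0 is what actually exercises the macro form.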

RE: [PATCH v2] i386: Fix AVX512 intrin macro typo

2024-07-28 Thread Jiang, Haochen

> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, July 26, 2024 7:59 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH v2] i386: Fix AVX512 intrin macro typo
> 
> On Fri, Jul 26, 2024 at 04:10:48PM +0800, Haochen Jiang wrote:
> > * config/i386/avx512dqintrin.h
> > (_mm_mask_fpclass_ss_mask): Correct operand order.
> > (_mm_mask_fpclass_sd_mask): Ditto.
> > (_mm_reduce_round_sd): Use -1 as mask since it is non-mask.
> > (_mm_reduce_round_ss): Ditto.
> 
> You haven't mentioned the
>   (_mm_maskz_reduce_round_ss): Use
> __builtin_ia32_reducess_mask_round
>   instead of __builtin_ia32_reducesd_mask_round.
> change here.
> 
> > * config/i386/avx512vlbwintrin.h
> > (_mm256_mask_alignr_epi8): Correct operand usage.
> > (_mm_mask_alignr_epi8): Ditto.
> > * config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx512bw-vpalignr-1b.c: New test.
> > * gcc.target/i386/avx512dq-vfpclasssd-1b.c: Ditto.
> > * gcc.target/i386/avx512dq-vfpcla-1b.c: Ditto.
> > * gcc.target/i386/avx512dq-vreducesd-1b.c: Ditto.
> > * gcc.target/i386/avx512dq-vreducess-1b.c: Ditto.
> > * gcc.target/i386/avx512vl-valignq-1b.c: Ditto.
> 
> I went through all the cases and agree with all the changes.
> 
> LGTM with the above ChangeLog nit fixed, for trunk/release branches, even for
> 14.2 if committed RSN.

Ok. I will commit them and backport them to GCC13 and GCC12 now. For GCC14,
we could wait for GCC14.3, since a weekend has already passed and that is not
exactly RSN. But if it could go into GCC14.2, I would also be happy with that.

Thx,
Haochen

> 
>   Jakub

RE: [r13-8949 Regression] FAIL: gcc.target/i386/avx512dq-vreducess-1b.c (test for excess errors) on Linux/x86_64

2024-07-28 Thread Jiang, Haochen

Ah... When I resolved the huge conflict this morning caused by the AVX10 refactor
for GCC13/12, I forgot that the _mm_avx512_setzero_ps/pd I added comes from the
GCC14 AVX10.1 refactor. The backport should use _mm_setzero_ps/pd instead.

Never make a quick tweak on a sleepy Monday morning.

Thx,
Haochen

> -Original Message-
> From: haochen.jiang 
> Sent: Monday, July 29, 2024 1:28 PM
> To: Jiang, Haochen ; gcc-regress...@gcc.gnu.org;
> gcc-patches@gcc.gnu.org
> Subject: [r13-8949 Regression] FAIL: gcc.target/i386/avx512dq-vreducess-
> 1b.c (test for excess errors) on Linux/x86_64
> 
> On Linux/x86_64,
> 
> bb15c4cf21dbe76df5a225342d1fbe8ecd3c7971 is the first bad commit
> commit bb15c4cf21dbe76df5a225342d1fbe8ecd3c7971
> Author: Haochen Jiang 
> Date:   Thu Jul 25 16:12:20 2024 +0800
> 
> i386: Fix AVX512 intrin macro typo
> 
> caused
> 
> FAIL: gcc.target/i386/avx512dq-vreducesd-1b.c (test for excess errors)
> FAIL: gcc.target/i386/avx512dq-vreducess-1b.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/gcc-
> 13/releases/gcc-13/r13-8949/usr --enable-clocale=gnu --with-system-zlib --
> with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducesd-1b.c --
> target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducesd-1b.c --
> target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducesd-1b.c --
> target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducesd-1b.c --
> target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducess-1b.c --
> target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducess-1b.c --
> target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducess-1b.c --
> target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512dq-vreducess-1b.c --
> target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at
> haochen dot jiang at intel.com.) (If you met problems with cascadelake
> related, disabling AVX512F in command line might save that.) (However,
> please make sure that there is no potential problems with AVX512.)

RE: [PATCH v2] i386: Fix AVX512 intrin macro typo

2024-07-29 Thread Jiang, Haochen




> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, July 29, 2024 4:41 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH v2] i386: Fix AVX512 intrin macro typo
> 
> On Mon, Jul 29, 2024 at 02:07:24AM +, Jiang, Haochen wrote:
> > > LGTM with the above ChangeLog nit fixed, for trunk/release branches,
> > > even for
> > > 14.2 if committed RSN.
> >
> > Ok. I will commit them and backport them to GCC13 and GCC12 now. For
> > GCC14, we could wait for GCC14.3 since it has been a weekend passed and
> not that RSN.
> > But if it could be in GCC14.2, I will also happy for that.
> 
> Please commit it to 14.2 ASAP.

Pushed to GCC14.2

Thx,
Haochen

> 
>   Jakub

RE: [PATCH v2] i386: Add non-optimize prefetchi intrins

2024-07-30 Thread Jiang, Haochen

> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, July 30, 2024 2:57 PM
> To: Hongtao Liu 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org;
> Liu, Hongtao ; ubiz...@gmail.com
> Subject: Re: [PATCH v2] i386: Add non-optimize prefetchi intrins
> 
> On Tue, Jul 30, 2024 at 09:28:46AM +0800, Hongtao Liu wrote:
> > On Tue, Jul 30, 2024 at 9:27 AM Hongtao Liu  wrote:
> > >
> > > On Fri, Jul 26, 2024 at 4:55 PM Haochen Jiang 
> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I added related O0 testcase in this patch.
> > > >
> > > > Ok for trunk and backport to GCC 14 and GCC 13?
> > > Ok.
> > I mean for trunk, and it needs jakub's approval to backport to GCC14.2.
> 
> IMHO this needs to wait for GCC 14.3 (aka can be committed to 14 branch
> after the 14.2 release).

Ok, for GCC14, I will wait until the release happens.

Thx,
Haochen

> 
>   Jakub

RE: [PATCH 0/1] Initial support for AVX10.2

2024-08-01 Thread Jiang, Haochen

> -Original Message-
> From: Andi Kleen 
> Sent: Friday, August 2, 2024 2:04 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH 0/1] Initial support for AVX10.2
> 
> Haochen Jiang  writes:
> 
> > Hi all,
> >
> > AVX10.2 tech details has been just published on July 31st in the
> > following link:
> >
> > https://cdrdv2.intel.com/v1/dl/getContent/828965
> >
> > For new features and instructions, we could divide them into two parts.
> > One is ymm rounding control, the other is the new instructions.
> >
> > In the following weeks, we plan to upstream ymm rounding part first,
> > following by new instructions. After all of them upstreamed, we will
> > also upstream several patches optimizing codegen with new AVX10.2
> > instructions.
> 
> Are there plans to make INT8/FP8 types supported by the compiler?
> Or just supporting it through some intrinsics?

Hi Andi,

INT8 is actually char, per my understanding.

For FP8, there are currently no basic calculation instructions yet. So we have
no support for the type in AVX10.2 for now, and just treat it as a piece
of char.

Also, there might be other issues to discuss for FP8, such as ABI issues, so
we have put the support aside for now. When everything is mature, we may
add the support for that.

Thx,
Haochen

> 
> It seems explicit types would be much more convenient to use
> for developers, although it has some drawbacks (like accuracy
> depending on spills)
> 
> I realize it's likely a lot more work, but it might be worth it?
> 
> -Andi

RE: [PATCH 0/1] Initial support for AVX10.2

2024-08-03 Thread Jiang, Haochen

> -Original Message-
> From: Andi Kleen 
> Sent: Saturday, August 3, 2024 3:06 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH 0/1] Initial support for AVX10.2
> 
> >
> > INT8 is actually char per my understanding.
> >
> > For FP8, currently there is no basic calculation insts yet. So we have no
> > support for them in AVX10.2 currently, and treat them just as a piece
> > of char.
> >
> > Also there might be other issues for FP8 to discuss, like ABI issues, so
> > we put the support aside for now. When everything is mature, we may
> > add the support for that.
> 
> But then it's too late isn't it? You wouldn't be able to change
> the types of the existing intrinsics anymore, or later end up with

Hi Andi,

For the bf16 type added in x86, we actually did not add another set of
intrins for the memory type to real type conversion, but reported a warning
or maybe a note. See https://gcc.gnu.org/gcc-13/changes.html

But you are right, it is somewhat nasty to make that change in the
future. People need to rewrite their code.

BTW, I noticed that FP8 support for ARM is currently under way in LLVM.
I will have a look at it to see if everything is mature.

We may need some more input.

Thx,
Haochen

> two sets of intrinsics, and end up with interoperability problems
> with full computation.
> 
> Better to define proper types from the beginning.
> 
> -Andi

RE: [PATCH] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Jiang, Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, April 26, 2024 5:13 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Fix array index overflow in pr105354-2.c
> 
> On Fri, Apr 26, 2024 at 11:03 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > The array index should not be over 8 for v8hi, or it will fail
> > under -O0 or using -fstack-protector.
> >
> > This patch aims to fix that, which is mentioned in PR110621.
> >
> > Commit as obvious and backport to GCC13.
> >
> > Thx,
> > Haochen
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/110621
> > * gcc.target/i386/pr105354-2.c: As mentioned.
> 
> Please note that the ChangeLog entry gets copied into the relevant
> ChangeLog file independently of the commit message. So, the above
> entry will be copied to gcc/testsuite/ChangeLog without any reference
> to what was mentioned.
>

I see. I forgot to pay attention to that ChangeLog entry. My bad.

Thx,
Haochen
 
> Uros.
>

RE: [r15-3359 Regression] FAIL: gcc.target/i386/avx10_2-bf-vector-cmpp-1.c (test for excess errors) on Linux/x86_64

2024-09-02 Thread Jiang, Haochen

As with each AVX10.2 testcase previously, this is caused by the option
combination warning, which is expected.

> From: haochen.jiang 
> Sent: Monday, September 2, 2024 9:06 PM
> 
> On Linux/x86_64,
> 
> f77435aa3911c437cba71991509eee57b333b3ce is the first bad commit commit
> f77435aa3911c437cba71991509eee57b333b3ce
> Author: Levy Hsu 
> Date:   Mon Sep 2 10:24:49 2024 +0800
> 
> i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2
> 
> caused
> 
> FAIL: gcc.target/i386/avx10_2-bf-vector-cmpp-1.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> bisect/master/master/r15-3359/usr --enable-clocale=gnu --with-system-zlib --
> with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-bf-vector-cmpp-1.c --
> target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-bf-vector-cmpp-1.c --
> target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at
> haochen dot jiang at intel.com.) (If you met problems with cascadelake 
> related,
> disabling AVX512F in command line might save that.) (However, please make sure
> that there is no potential problems with AVX512.)

RE: [r15-3359 Regression] FAIL: gcc.target/i386/avx10_2-bf-vector-cmpp-1.c (test for excess errors) on Linux/x86_64

2024-09-02 Thread Jiang, Haochen



> -Original Message-
> From: Hongtao Liu 
> Sent: Tuesday, September 3, 2024 1:47 PM
> To: Jiang, Haochen 
> Cc: haochen.jiang ; ad...@levyhsu.com; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org
> Subject: Re: [r15-3359 Regression] FAIL: gcc.target/i386/avx10_2-bf-vector-
> cmpp-1.c (test for excess errors) on Linux/x86_64
> 
> On Tue, Sep 3, 2024 at 9:45 AM Jiang, Haochen via Gcc-regression
>  wrote:
> >
> > As each AVX10.2 testcases previously, this is caused by option combination
> warning,
> > which is expected.
> >
> Can we put the warning for mix usage of mavx10 and -mavx512f under -
> Wpsabi
> And add -Wno-psabi in addition to -march=cascadelake to avoid the
> false positive?

We could do that if nobody objects.

Thx,
Haochen

> 
> --
> BR,
> Hongtao

RE: [r15-3359 Regression] FAIL: gcc.target/i386/avx10_2-bf-vector-cmpp-1.c (test for excess errors) on Linux/x86_64

2024-09-04 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, September 3, 2024 2:40 PM
> 
> On Tue, Sep 3, 2024 at 7:36 AM Jiang, Haochen 
> wrote:
> >
> >
> >
> > > From: Hongtao Liu 
> > > Sent: Tuesday, September 3, 2024 1:47 PM
> > >
> > > On Tue, Sep 3, 2024 at 9:45 AM Jiang, Haochen via Gcc-regression
> > >  wrote:
> > > >
> > > > As each AVX10.2 testcases previously, this is caused by option
> combination
> > > warning,
> > > > which is expected.
> > > >
> > > Can we put the warning for mix usage of mavx10 and -mavx512f under -
> > > Wpsabi
> > > And add -Wno-psabi in addition to -march=cascadelake to avoid the
> > > false positive?
> >
> > We could do that if nobody has objection to that.
> 
> But mixing both doesn't do anything to the ABI so -Wpsabi sounds like the
> wrong bucket to me.  Instead we have to solve the issue at hand - I would
> expect users to run into this warning as well if we do within our testsuite?

If we can bear that "false positive", I suppose it is ok.

I will change -march=cascadelake to a future CPU that contains AVX10.2,
once that is doable, to eliminate them.

Thx,
Haochen

> 
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > > --
> > > BR,
> > > Hongtao

RE: [PATCH] x86: Refine V4BF/V2BF FMA testcase

2024-09-05 Thread Jiang, Haochen

> From: Levy Hsu 
> Sent: Thursday, September 5, 2024 4:55 PM
> To: gcc-patches@gcc.gnu.org
> 
> Simple testcase fix, ok for trunk?
> 
> This patch removes specific register checks to account for possible
> register spills and disables tests in 32-bit mode. This adjustment
> is necessary because V4BF operations in 32-bit mode require duplicating
> instructions, which lead to unintended test failures. It fixed the
> case when testing with --target_board='unix{-m32\ -march=cascadelake}'
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Remove specific
> register checks to account for potential register spills. Exclude 
> tests
> in 32-bit mode to prevent incorrect failure reports due to the need 
> for
> multiple instruction executions in handling V4BF operations.
> ---
>  .../gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> index 72e17e99603..17c32c1d36b 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile } */

You could simply add { target { ! ia32 } } here, rather than on each
scan-assembler-times line.

I don't think we need this test to be run for -m32, due to V4BF. Actually,
the better choice is to split the testcase into two parts; for V2BF, I suppose
it could be run under -m32.

Thx,
Haochen

>  /* { dg-options "-mavx10.2 -O2" } */
> -/* { dg-final { scan-assembler-times
> "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-
> 9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times
> "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-
> 9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times
> "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-
> 9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times
> "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-
> 9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-
> 9\]" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-
> 9\]" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times
> "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times
> "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
> 
>  typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
>  typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
> --
> 2.31.1
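
As a rough sketch of the suggestion above, the target selector can sit once on the dg-do line so that the whole file is skipped for ia32, instead of being repeated on every dg-final line. The function body is a placeholder invented for illustration (the real test's functions are not shown in this thread), and the options are simplified to "-O2" because the placeholder needs no ISA flags.

```c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2" } */

/* Placeholder body: stands in for the V4BF functions of the real test,
   which are not reproduced here.  */
int
placeholder_fma (int a, int b, int c)
{
  return a * b + c;
}
```

With the selector on dg-do, any dg-final scans that follow apply only where the test is actually compiled, so they need no selector of their own.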

RE: [PATCH v5] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-10-09 Thread Jiang, Haochen

> From: Andreas Schwab 
> Sent: Wednesday, October 9, 2024 2:04 PM
> 
> ../../libcpp/directives.cc: In function 'void do_pragma_once(cpp_reader*)':
> ../../libcpp/directives.cc:2078:20: error: unknown conversion type character
> '<' in format [-Werror=format=]
>  2078 |  "%<#pragma once%> in main file");
>   |^
> ../../libcpp/directives.cc:2078:34: error: unknown conversion type character
> '>' in format [-Werror=format=]
>  2078 |  "%<#pragma once%> in main file");
>   |  ^
> cc1plus: all warnings being treated as errors
> make[3]: *** [Makefile:227: directives.o] Error 1
> 

Same bootstrap failure for me and my script on x86_64:

https://gcc.gnu.org/pipermail/gcc-regression/2024-October/080957.html

Thx,
Haochen

> --
> Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196
> BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7 "And now for something
> completely different."

RE: [PATCH] i386: Refactor get_intel_cpu

2024-10-17 Thread Jiang, Haochen

> From: Uros Bizjak 
> Sent: Friday, October 18, 2024 2:05 PM
> 
> On Fri, Oct 18, 2024 at 4:56 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > ISE054 has just been disclosed and you can find doc from here:
> >
> > https://cdrdv2.intel.com/v1/dl/getContent/671368
> >
> > From ISE, it shows that we will have family 0x13 for Diamond Rapids.
> > Therefore, we need to refactor the get_intel_cpu to accept new families.
> > Also I did some reorder in the switch for clearness by putting earlier
> > added products on top for search convenience.
> 
> You can post "git diff -w" patch to see what the patch really does
> without drowning the real change in whitespace changes.
> 

That is a good idea. The change after using git diff -w:

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 2ae383eb6ab..e3eb6e9d250 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -343,10 +343,8 @@ get_intel_cpu (struct __processor_model *cpu_model,
 {
   const char *cpu = NULL;

-  /* Parse family and model only for model 6. */
-  if (cpu_model2->__cpu_family != 0x6)
-return cpu;
-
+  /* Parse family and model for family 0x6.  */
+  if (cpu_model2->__cpu_family == 0x6)
 switch (cpu_model2->__cpu_model)
   {
   case 0x1c:
@@ -390,6 +388,15 @@ get_intel_cpu (struct __processor_model *cpu_model,
CHECK___builtin_cpu_is ("tremont");
cpu_model->__cpu_type = INTEL_TREMONT;
break;
+  case 0x17:
+  case 0x1d:
+   /* Penryn.  */
+  case 0x0f:
+   /* Merom.  */
+   cpu = "core2";
+   CHECK___builtin_cpu_is ("core2");
+   cpu_model->__cpu_type = INTEL_CORE2;
+   break;
   case 0x1a:
   case 0x1e:
   case 0x1f:
@@ -466,14 +473,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
break;
-case 0xa7:
-  /* Rocket Lake.  */
-  cpu = "rocketlake";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("rocketlake");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_ROCKETLAKE;
-  break;
   case 0x55:
CHECK___builtin_cpu_is ("corei7");
cpu_model->__cpu_type = INTEL_COREI7;
@@ -509,6 +508,16 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_CANNONLAKE;
break;
+  case 0x7e:
+  case 0x7d:
+  case 0x9d:
+   /* Ice Lake client.  */
+   cpu = "icelake-client";
+   CHECK___builtin_cpu_is ("corei7");
+   CHECK___builtin_cpu_is ("icelake-client");
+   cpu_model->__cpu_type = INTEL_COREI7;
+   cpu_model->__cpu_subtype = INTEL_COREI7_ICELAKE_CLIENT;
+   break;
   case 0x6a:
   case 0x6c:
/* Ice Lake server.  */
@@ -518,15 +527,13 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_ICELAKE_SERVER;
break;
-case 0x7e:
-case 0x7d:
-case 0x9d:
-   /* Ice Lake client.  */
-  cpu = "icelake-client";
+  case 0xa7:
+   /* Rocket Lake.  */
+   cpu = "rocketlake";
CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("icelake-client");
+   CHECK___builtin_cpu_is ("rocketlake");
cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_ICELAKE_CLIENT;
+   cpu_model->__cpu_subtype = INTEL_COREI7_ROCKETLAKE;
break;
   case 0x8c:
   case 0x8d:
@@ -537,7 +544,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_TIGERLAKE;
break;
-
   case 0xbe:
/* Alder Lake N, E-core only.  */
   case 0x97:
@@ -626,15 +632,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_PANTHERLAKE;
break;
-case 0x17:
-case 0x1d:
-  /* Penryn.  */
-case 0x0f:
-  /* Merom.  */
-  cpu = "core2";
-  CHECK___builtin_cpu_is ("core2");
-  cpu_model->__cpu_type = INTEL_CORE2;
-  break;
   default:
break;
   }

Thx,
Haochen
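
The shape of the refactored control flow can be sketched as below. This is a simplified stand-in, not the real get_intel_cpu: `sketch_intel_cpu_name` is a hypothetical function, it covers only a few of the model numbers visible in the diff above, and it leaves out the CHECK___builtin_cpu_is and __cpu_type bookkeeping.

```c
#include <stddef.h>

/* Hypothetical, reduced version of the dispatch: the early return for
   "family != 0x6" becomes a guarded switch, which leaves room for
   further family branches after it.  */
const char *
sketch_intel_cpu_name (unsigned int family, unsigned int model)
{
  const char *cpu = NULL;

  /* Parse family and model for family 0x6.  */
  if (family == 0x6)
    switch (model)
      {
      case 0x17:
      case 0x1d:
        /* Penryn.  */
      case 0x0f:
        /* Merom.  */
        cpu = "core2";
        break;
      case 0x7e:
      case 0x7d:
      case 0x9d:
        /* Ice Lake client.  */
        cpu = "icelake-client";
        break;
      case 0xa7:
        /* Rocket Lake.  */
        cpu = "rocketlake";
        break;
      default:
        break;
      }
  /* Another family would get its own branch here; its model numbers
     are not shown in this thread, so none are invented.  */

  return cpu;
}
```

Any family without a branch still yields NULL, which matches the old early-return behaviour for families other than 0x6.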

RE: [PATCH] testsuite: Fix up builtin-prefetch-1.c tests

2024-11-01 Thread Jiang, Haochen

> From: Xi Ruoyao 
> Sent: Saturday, November 2, 2024 1:16 AM
> 
> How can you use "read-shared" as an identifier?  It's not allowed by all C
> standard versions.
> 
 
I made a last-minute change to fix that, but unfortunately it did not get into
my patch.

My apologies.

Thx,
Haochen
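
To illustrate the problem Xi Ruoyao points out: a hyphen cannot appear in a C identifier, so an enumerator spelled "read-shared" is a syntax error, while "read_shared" is fine. The enum below is a sketch with assumed names and values (`rw_kind`, `read_access`, `write_access`, and the value 2 for `read_shared` are assumptions, not copied from the testcase).

```c
/* Hypothetical enum for the read/write argument of a prefetch.
   Writing the last enumerator as "read-shared" would not compile:
   '-' is the subtraction operator, not part of an identifier.  */
enum rw_kind
{
  read_access = 0,
  write_access = 1,
  read_shared = 2   /* Assumed value for the new "read-shared" hint.  */
};
```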

RE: Pushed: [PATCH] testsuite: Fix up builtin-prefetch-1.c tests

2024-11-01 Thread Jiang, Haochen

> From: Xi Ruoyao 
> Sent: Saturday, November 2, 2024 1:50 AM
>
> > On 11/1/24 11:16 AM, Xi Ruoyao wrote:
> > > How can you use "read-shared" as an identifier?  It's not allowed by
> > > all C standard versions.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.c-torture/execute/builtin-prefetch-1.c (rws): Use
> > >   "read_shared" instead of "read-shared" as the identifier for
> > >   enum value.
> > >   * gcc.dg/builtin-prefetch-1.c (rws): Likewise.
> > OK.  I was seeing similar failures in my tester, but hadn't started to
> > analyze yet.
> 
> Pushed to trunk.

I have an approved patch that also fixes that; I was just waiting another day
before pushing it.

But still, thanks for fixing that!

Thx,
Haochen

> 
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University

RE: [testsuite] Fix bb-slp-77.c for x86

2024-10-31 Thread Jiang, Haochen

> From: Kugan Vivekanandarajah 
> Sent: Thursday, October 31, 2024 4:29 PM
> 
> This test  bb-slp-77.c  extracted  relies on the completely unrolling of the 
> inner
> loop. However, for x86 in gcc.dg/vect/, loop is not unrolled and the inner 
> loop
> is vectorized thus not triggering expected BB SLP
> 
> Also noticed that the "vectorizing stmts using SLP” count is different when I
> force the loop to unroll for x86. Thus, to keep it simple, moving the test to
> gcc.target/aarch64.

I suppose that should be ok for x86 if the test is limited to aarch64. Thanks for your help!

Thx,
Haochen

> 
> Regression tested on aarch64-linux-gnu. Is this OK?
> 
> Thanks,
> Kugan

RE: [r15-4833 Regression] FAIL: gcc.dg/builtin-prefetch-1.c (test for warnings, line 36) on Linux/x86_64

2024-11-01 Thread Jiang, Haochen

> From: haochen.jiang 
> Sent: Friday, November 1, 2024 4:32 PM
> 
> On Linux/x86_64,
> 
> e9ab41b79933d42410126f0eb7b29f820745276c is the first bad commit commit
> e9ab41b79933d42410126f0eb7b29f820745276c
> Author: Hu, Lin1 
> Date:   Fri Nov 1 10:04:40 2024 +0800
> 
> Support Intel MOVRS
> 
> caused
> 
> FAIL: gcc.c-torture/execute/builtin-prefetch-1.c   -O0  (test for excess errors)
> FAIL: gcc.c-torture/execute/builtin-prefetch-1.c   -O1  (test for excess errors)
> FAIL: gcc.c-torture/execute/builtin-prefetch-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
> FAIL: gcc.c-torture/execute/builtin-prefetch-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.c-torture/execute/builtin-prefetch-1.c   -O2  (test for excess errors)
> FAIL: gcc.c-torture/execute/builtin-prefetch-1.c   -O3 -g  (test for excess errors)
> FAIL: gcc.c-torture/execute/builtin-prefetch-1.c   -Os  (test for excess errors)
> FAIL: gcc.dg/builtin-prefetch-1.c (test for excess errors)
> FAIL: gcc.dg/builtin-prefetch-1.c  (test for warnings, line 31)
> FAIL: gcc.dg/builtin-prefetch-1.c  (test for warnings, line 32)
> FAIL: gcc.dg/builtin-prefetch-1.c  (test for warnings, line 33)
> FAIL: gcc.dg/builtin-prefetch-1.c  (test for warnings, line 34)
> FAIL: gcc.dg/builtin-prefetch-1.c  (test for warnings, line 35)
> FAIL: gcc.dg/builtin-prefetch-1.c  (test for warnings, line 36)
> 
Hmm... It seems that the last-minute change I made to the patch did not get
committed.

I will send out a patch to fix that.

Thx,
Haochen

RE: [PATCH 6/7] Support Intel MOVRS

2024-10-23 Thread Jiang, Haochen

> From: Uros Bizjak 
> Sent: Tuesday, October 22, 2024 7:32 PM
> 
> On Tue, Oct 22, 2024 at 8:31 AM Haochen Jiang 
> wrote:
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 37c7c98e5c7..52520d54b84 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -1296,8 +1296,8 @@ expand_builtin_prefetch (tree exp)
> >  }
> >else
> >  op1 = expand_normal (arg1);
> > -  /* Argument 1 must be either zero or one.  */
> > -  if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
> > +  /* Argument 1 must be 0, 1 or 2.  */
> > +  if (INTVAL (op1) < 0 || INTVAL (op1) > 2)
> 
> You can use the IN_RANGE macro here:
> 
> if (!IN_RANGE (INTVAL (op1), 0, 2))
> 
> This will call INTVAL only once.

OK, I will make that change.

Thx,
Haochen

> 
> Uros.
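
For reference, the effect of the suggested check can be sketched in standalone C. This is only a sketch under stated assumptions: in_range and prefetch_rw_ok are hypothetical stand-ins, not GCC's IN_RANGE macro or the code in builtins.cc, and the single unsigned comparison is just one common way to write such a range test.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a range-check helper: true when
   lo <= value <= hi.  Assumes lo <= hi.  The unsigned subtraction folds
   both bound checks into one comparison, and `value' is evaluated only
   once because it is a function argument.  */
bool
in_range (long value, long lo, long hi)
{
  return (unsigned long) value - (unsigned long) lo
	 <= (unsigned long) hi - (unsigned long) lo;
}

/* Hypothetical helper mirroring the patched check: the second argument
   of the prefetch builtin must be 0, 1 or 2.  */
bool
prefetch_rw_ok (long rw)
{
  return in_range (rw, 0, 2);
}
```

With this sketch, prefetch_rw_ok accepts 0, 1 and 2 and rejects everything else, including negative values.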

RE: [gcc-wwwdocs PATCH] gcc-15: Mention new ISA and Diamond Rapids support for x86_64 backend

2024-11-24 Thread Jiang, Haochen

> From: Gerald Pfeifer 
> Sent: Sunday, November 24, 2024 7:17 AM
> 
> On Mon, 11 Nov 2024, Haochen Jiang wrote:
> > This patch will add recent new ISA and arch support for x86_64 backend
> > into gcc-wwwdocs.
> 
> > +  New ISA extension support for Intel AMX-AVX512 was added.
> 
> In all these cases, can we just say "ISA extension support ... was added" and
> drop the "New"?
> 
> > +  compiler switch. 128 and 256 bit MOVRS intrinsics are available
> > + via
> 
> "128- and 256-bit..."
> 
> > +  The EVEX version support for Intel SM4 was added.
> > +  New 512-bit SM4 intrinsics are available via the
> > +  -msm4 -mavx10.2-512 compiler switch.
> 
> Just "EVEX version support..."
> 
> > +AMX-FP8, AMX-MOVRS, AMX-TF32, AMX-TRANSPOSE, APX_F, AVX10.2 with 512 bit
> 
> "512-bit"
> 
> >Support for Xeon Phi CPUs (a.k.a. Knight Landing and Knight Mill) 
> > were
> >removed in GCC 15. GCC will no longer accept -march=knl,
> >-march=knm,-mavx5124fmaps,
> 
> "...no longer accepts..."
> 
> And make the last ", and -mavx5124fmaps" (adding a blank
> and the word "and").
>

I will make all of the changes, with one small tweak: the "and" should be added
(actually, the previous "or" changed to "and") between -mtune=knl
and -mtune=knm.

Thx,
Haochen

RE: [committed] c: Default to -std=gnu23

2024-11-24 Thread Jiang, Haochen

> From: Joseph Myers 
> Sent: Saturday, November 16, 2024 7:47 AM
> 
> Change the default language version for C compilation from -std=gnu17
> to -std=gnu23.  A few tests are updated to remove local definitions of
> bool, true and false (where making such an unconditional test change
> seemed to make more sense than changing the test conditionally earlier
> or building it with -std=gnu17); most test issues were already
> addressed in previous patches.  In the case of
> ctf-function-pointers-2.c, it was agreed in bug 117289 that it would
> be OK to put -std=gnu17 in the test and leave more optimal BTF / CTF
> output for this test as a potential future improvement.
> 
> Since the original test fixes, more such fixes have become necessary
> and so are included in this patch.  More noinline attributes are added
> to simulate-thread tests where () meaning a prototype affected test
> results, while gcc.dg/torture/pr117496-1.c (a test declaring a
> function with () then calling it with arguments) gets -std=gnu17
> added.
> 
> Bootstrapped with no regressions for x86_64-pc-linux-gnu.
> 
> NOTE: it's likely there are target-specific tests for non-x86 targets
> that need updating as a result of this change.  See commit
> 9fb5348e3021021e82d75e4ca4e6f8d51a34c24f ("testsuite: Prepare for
> -std=gnu23 default") for examples of changes to prepare the testsuite
> to work with a -std=gnu23 default.  In most cases, adding
> -Wno-old-style-definition (for warnings for old-style function
> definitions) or -std=gnu17 (for other issues such as unprototyped
> function declarations with ()) is appropriate, but watch out for cases
> that indicate bugs with -std=gnu23 (in particular, any ICEs - there
> was only the one nested function test where I had to fix an ICE on
> x86_64).
> 

A quick question: should we mention this in the gcc-wwwdocs porting doc or
somewhere else? The upgrade does cause some old code to fail to compile,
although it is correct for that code to fail.

Thx,
Haochen
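
To illustrate the kind of old code affected, here is a minimal sketch based on the two issues named in the quoted commit message (unprototyped declarations with () and local definitions of bool). The function names are hypothetical; the code itself is written so that it is accepted under both -std=gnu17 and -std=gnu23, with the failing forms described in the comments.

```c
#include <stdbool.h>

/* With -std=gnu17, "int add ();" declares a function with unspecified
   parameters, so a later call add (1, 2) is accepted.  With -std=gnu23,
   "()" means "(void)" and that call is rejected.  A full prototype is
   accepted under both.  */
int add (int a, int b);

int
add (int a, int b)
{
  return a + b;
}

/* bool, true and false are keywords in C23, so a local definition such
   as "typedef int bool;" no longer compiles.  Including <stdbool.h> and
   using the standard names works under both defaults.  */
bool
is_positive (int x)
{
  return x > 0;
}
```

Code that cannot be updated can still be built by passing -std=gnu17 explicitly, as the quoted message does for some tests.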

RE: [gcc-wwwdocs PATCH] gcc-15: Mention new ISA and Diamond Rapids support for x86_64 backend

2024-11-18 Thread Jiang, Haochen

Ping for this gcc-wwwdocs patch.

Thx,
Haochen

> From: Haochen Jiang 
> Sent: Monday, November 11, 2024 11:16 AM
> 
> Hi all,
> 
> This patch will add recent new ISA and arch support for x86_64 backend into
> gcc-wwwdocs.
> 
> Ok for gcc-wwwdocs?
> 
> Thx,
> Haochen
> 
> ---
>  htdocs/gcc-15/changes.html | 37 +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
> index 46dad391..d138942c 100644
> --- a/htdocs/gcc-15/changes.html
> +++ b/htdocs/gcc-15/changes.html
> @@ -191,12 +191,49 @@ a work-in-progress.
>  IA-32/x86-64
> 
>  
> +  New ISA extension support for Intel AMX-AVX512 was added.
> +  AMX-AVX512 intrinsics are available via the -mamx-avx512
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AMX-FP8 was added.
> +  AMX-FP8 intrinsics are available via the -mamx-fp8
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AMX-MOVRS was added.
> +  AMX-MOVRS intrinsics are available via the -mamx-movrs
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AMX-TF32 was added.
> +  AMX-TF32 intrinsics are available via the -mamx-tf32
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AMX-TRANSPOSE was added.
> +  AMX-TRANSPOSE intrinsics are available via the -mamx-transpose
> +  compiler switch.
> +  
>New ISA extension support for Intel AVX10.2 was added.
>AVX10.2 intrinsics are available via the -mavx10.2 or
>-mavx10.2-256 compiler switch with 256-bit vector size
>support. 512-bit vector size support for AVX10.2 intrinsics are
>available via the -mavx10.2-512 compiler switch.
>
> +  New ISA extension support for Intel MOVRS was added.
> +  MOVRS intrinsics are available via the -mmovrs
> +  compiler switch. 128 and 256 bit MOVRS intrinsics are available via the
> +  -mmovrs -mavx10.2 compiler switch. 512 bit MOVRS intrinsics
> +  are available via the -mmovrs -mavx10.2-512 compiler switch.
> +  
> +  The EVEX version support for Intel SM4 was added.
> +  New 512-bit SM4 intrinsics are available via the
> +  -msm4 -mavx10.2-512 compiler switch.
> +  
> +  GCC now supports the Intel CPU named Diamond Rapids through
> +-march=diamondrapids.
> +Based on Granite Rapids, the switch further enables the AMX-AVX512,
> +AMX-FP8, AMX-MOVRS, AMX-TF32, AMX-TRANSPOSE, APX_F, AVX10.2 with 512 bit
> +support, AVX-IFMA. AVX-NE-CONVERT, AVX-VNNI-INT16, AVX-VNNI-INT8,
> +CMPccXADD, MOVRS, SHA512, SM3, SM4 and USER_MSR ISA extensions.
> +  
>Support for Xeon Phi CPUs (a.k.a. Knight Landing and Knight Mill) were
>removed in GCC 15. GCC will no longer accept -march=knl,
>-march=knm,-mavx5124fmaps,
> --
> 2.31.1

RE: [PATCH] i386: Enhance AVX10.2 convert tests

2024-09-17 Thread Jiang, Haochen

> From: Haochen Jiang 
> Sent: Wednesday, September 18, 2024 1:38 PM
> 
> Hi all,
> 
> All of the AVX10.2 convert tests were previously missing mask tests; this
> patch adds them.
> 
> Tested on sde with assembler with these insts. Ok for trunk?

Please ignore this patch; I missed some files and will resend it soon.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Enhance mask test.
>   * gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
>   * gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
>   * gcc.target/i386/avx512f-helper.h: Fix a typo in macro define.
> ---
>  .../i386/avx10_2-512-vcvt2ps2phx-2.c  | 35 ---
>  .../i386/avx10_2-512-vcvthf82ph-2.c   | 27 ++
>  .../i386/avx10_2-512-vcvtne2ph2bf8-2.c| 25 ++---
>  .../i386/avx10_2-512-vcvtne2ph2bf8s-2.c   | 25 ++---
>  .../i386/avx10_2-512-vcvtne2ph2hf8-2.c| 25 ++---
>  .../i386/avx10_2-512-vcvtne2ph2hf8s-2.c   | 25 ++---
>  .../i386/avx10_2-512-vcvtneph2bf8-2.c | 29 ++-
>  .../i386/avx10_2-512-vcvtneph2bf8s-2.c| 27 ++
>  .../i386/avx10_2-512-vcvtneph2hf8-2.c | 27 ++
>  .../i386/avx10_2-512-vcvtneph2hf8s-2.c| 27 ++
>  .../gcc.target/i386/avx512f-helper.h  |  2 +-
>  11 files changed, 209 insertions(+), 65 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c
> index 40dbe18abbe..5e355ae53d4 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c
> @@ -10,24 +10,25 @@
>  #include "avx10-helper.h"
>  #include 
> 
> -#define SIZE_RES (AVX512F_LEN / 16)
> +#define SIZE (AVX512F_LEN / 16)
> +#include "avx512f-mask-type.h"
> 
>  static void
>  CALC (_Float16 *res_ref, float *src1, float *src2)
>  {
>float fp32;
>int i;
> -  for (i = 0; i < SIZE_RES / 2; i++)
> +  for (i = 0; i < SIZE / 2; i++)
>  {
>fp32 = (float) 2 * i + 7 + i * 0.5;
>res_ref[i] = fp32;
>src2[i] = fp32;
>  }
> -  for (i = SIZE_RES / 2; i < SIZE_RES; i++)
> +  for (i = SIZE / 2; i < SIZE; i++)
>  {
>fp32 = (float)2 * i + 7 + i * 0.5;
>res_ref[i] = fp32;
> -  src1[i - (SIZE_RES / 2)] = fp32;
> +  src1[i - (SIZE / 2)] = fp32;
>  }
>  }
> 
> @@ -35,17 +36,27 @@ void
>  TEST (void)
>  {
>int i;
> -  UNION_TYPE (AVX512F_LEN, h) res1;
> +  UNION_TYPE (AVX512F_LEN, h) res1, res2, res3;
>UNION_TYPE (AVX512F_LEN, ) src1, src2;
> -  _Float16 res_ref[SIZE_RES];
> -  float fp32;
> -
> -  for (i = 0; i < SIZE_RES; i++)
> -res1.a[i] = 5;
> -
> +  MASK_TYPE mask = MASK_VALUE;
> +  _Float16 res_ref[SIZE];
> +
> +  for (i = 0; i < SIZE; i++)
> +res2.a[i] = DEFAULT_VALUE;
> +
>CALC (res_ref, src1.a, src2.a);
> -
> +
>res1.x = INTRINSIC (_cvtx2ps_ph) (src1.x, src2.x);
>if (UNION_CHECK (AVX512F_LEN, h) (res1, res_ref))
>  abort ();
> +
> +  res2.x = INTRINSIC (_mask_cvtx2ps_ph) (res2.x, mask, src1.x, src2.x);
> +  MASK_MERGE (h) (res_ref, mask, SIZE);
> +  if (UNION_CHECK (AVX512F_LEN, h) (res2, res_ref))
> +abort ();
> +
> +  res3.x = INTRINSIC (_maskz_cvtx2ps_ph) (mask, src1.x, src2.x);
> +  MASK_ZERO (h) (res_ref, mask, SIZE);
> +  if (UNION_CHECK (AVX512F_LEN, h) (res3, res_ref))
> +abort ();
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c
> index 6b9f07ff86a..1aa5daa6c58 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvthf82ph-2.c
> @@ -12,13 +12,14 @@
>  #include "fp8-helper.h"
> 
>  #define SIZE_SRC (AVX512F_LEN_HALF / 8)
> -#define SIZE_RES (AVX512F_LEN / 16)
> +#define SIZE (AVX512F_LEN / 16)
> +#include "avx512f-mask-type.h"
> 
>  void
>  CALC (_Float16 *r, unsigned char *s)
>  {
>int i;
> -  for (i = 0; i < SIZE_RES; i++)
> +  for (i = 0; i < SIZE; i++)
>  r[i] = convert_hf8_to_fp16(s[i]);
>  }
> 
> @@ -26,9 +27,10 @@ void
>  TEST (void)
>  {
>int i,sign;
> -  UNION_TYPE (AVX512F_LEN, h) res;
> +  UNION_TYPE (AVX512F_LEN, h) res1, res2, res3;
>UNION_TYPE (AVX512F_LEN_HALF, i_b) src;
> -  _Float16 res_ref[SIZE_RES];
> +  MASK_TYPE mask = MASK_VALUE;
> +  _Float16 res_ref[SIZE];
> 
>sign = 1;
>

RE: [PATCH] Support Intel AVX10.2 minmax, vector copy and compare instructions

2024-12-03 Thread Jiang, Haochen

Oops, please ignore this patch; it should have been sent to Binutils, not here.

Multi-threading made the mistake.

Thx,
Haochen

> From: Haochen Jiang 
> Sent: Wednesday, December 4, 2024 3:29 PM
> 
> From: "Mo, Zewei" 
> 
> Hi all,
> 
> As the satcvt patch is about to be committed, we will move on to the final
> patch of AVX10.2.
> 
> This patch will focus on AVX10.2 minmax, vector copy and compare
> instructions, which is mainly Chapter 8, 11 and 14 of AVX10.2 SPEC.
> 
> Reference:
> Intel Advanced Vector Extensions 10.2 Architecture Specification
> https://cdrdv2.intel.com/v1/dl/getContent/828965
> 
> All of the instructions in this patch are new instruction forms except
> for vmovd and vmovw, which are extended usage from the old ones.
> 
> Patch description and changes are embedded below.
> 
> Tested on x86-64-pc-linux-gnu. Ok for trunk?
> 
> Nit: As mentioned in the patch description, VMINMAXNEPBF16 will be changed
> to VMINMAXPBF16 eventually.
> 
> Thx,
> Haochen
>

RE: [PATCH] i386: Change mnemonics from TCVTROWPS2PBF16[H,L] to TCVTROWPS2BF16[H,L]

2025-01-07 Thread Jiang, Haochen

> From: Liu, Hongtao 
> Sent: Friday, January 3, 2025 6:33 PM
> 
> > From: Jiang, Haochen 
> > Sent: Friday, January 3, 2025 4:55 PM
> >
> > Hi all,
> >
> > The mnemonics for TCVTROWPS2PBF16[H,L] have been changed to
> > TCVTROWPS2BF16[H,L] in ISE056. There will also be some more BF16
> > mnemonic changes upcoming, which will fix the regression in PR118270.
> Please add PR target/118270 to changelog, otherwise LGTM.

This instruction has not landed on Binutils trunk, so it would not run into the
problem mentioned in that PR.

I will add the PR in the following patches that fix up AVX10.2.

Thx,
Haochen

> >
> > Bootstrapped and tested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > Ref: https://cdrdv2.intel.com/v1/dl/getContent/671368
> >
> > Thx,
> > Haochen
> >

RE: [gcc-wwwdocs PATCH] gcc-15: Mention new ISA and Diamond Rapids support for x86_64 backend

2024-12-17 Thread Jiang, Haochen

> From: Gerald Pfeifer 
> Sent: Tuesday, December 17, 2024 3:57 PM
> 
> On Mon, 25 Nov 2024, Jiang, Haochen wrote:
> > I will do all of the changes with little tweak here. The "and" should
> > be added (actually changed the previous "or" to "and") between
> > -mtune=knl and -mtune=knm.
> 
> Thank you.
> 
> I just pushed a little follow up patch, see below.
> 

Thanks for correcting my trivial grammar mistakes! :)

Thx,
Haochen

> Gerald
> 
> 
> commit 7f4a4f377ca5e5fae8ffe6ab45a300799bd75b6f
> Author: Gerald Pfeifer 
> Date:   Tue Dec 17 16:54:47 2024 +0900
> 
> gcc-15: Copy edit Xeon Phi CPU support removal
> 
> diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
> index 23866bde..1c690c4a 100644
> --- a/htdocs/gcc-15/changes.html
> +++ b/htdocs/gcc-15/changes.html
> @@ -242,11 +242,11 @@ a work-in-progress.
>  CMPccXADD, MOVRS, SHA512, SM3, SM4 and USER_MSR ISA extensions.
>
>Support for Xeon Phi CPUs (a.k.a. Knight Landing and Knight Mill) were
> -  removed in GCC 15. GCC will no longer accepts -march=knl,
> +  removed in GCC 15. GCC will no longer accept -march=knl,
>-march=knm, -mavx5124fmaps,
>-mavx5124vnniw, -mavx512er,
>-mavx512pf, -mprefetchwt1,
> -  -mtune=knl and -mtune=knm compiler switches.
> +  -mtune=knl, and -mtune=knm compiler switches.
>
>  
>

RE: [RFC PATCH] i386: Re-alias -mavx10.2 to 512 bit and make -mno-avx10.x-512 disable the whole AVX10.x

2025-01-27 Thread Jiang, Haochen

> From: Richard Biener 
> Sent: Monday, January 27, 2025 5:09 PM
> 
> On Mon, Jan 27, 2025 at 8:30 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > AVX10 has been published for a year and a half and we have received a
> > lot of feedback on it. One piece of feedback concerns whether the alias
> > option -mavx10.x should point to 256 or 512.
> >
> > If you also pay attention to LLVM community, you might see this thread
> > related to AVX10 options just sent out several hours ago:
> >
> > [X86][AVX10] Disable m[no-]avx10.1 and switch m[no-]avx10.2 to alias
> > of 512 bit options
> > https://github.com/llvm/llvm-project/pull/124511
> >
> > In GCC, we will also do so. This RFC patch is slightly different from
> > LLVM's, including just:
> >
> >   - Switch -m[no-]avx10.2 to an alias of the 512 bit options.
> >   - Change -mno-avx10.[1,2]-512 to disable both 256 and 512 bit
> >   instructions. As a result, -mno-avx10.2 would still disable both 256
> >   and 512 bit insts, since the new alias points to 512.
> >
> > It does not include disabling -m[no-]avx10.1, since I still want more
> > input on how to handle that. We actually have three choices:
> >
> >  a. Directly re-alias -m[no-]avx10.1 to -m[no-]avx10.1-512 in GCC 15 and
> >  backport to GCC 14.
> >  b. Disable -m[no-]avx10.1 in GCC 15, and add it back pointing to
> >  -m[no-]avx10.1-512 in the future. This covers the case where someone
> >  compiles with different versions of GCC using -mavx10.1 and would
> >  otherwise silently get an unexpected result.
> >  c. Disable -m[no-]avx10.1 in GCC 15, and never add it back. Since the
> >  option has been 256 bit, changing it back and forth is messy.
> >
> > This might be the final chance to change the alias option, since real
> > AVX10.1 hardware is coming soon. The change is x86 specific, so it
> > might still squeeze into GCC 15 at this time.
> >
> > I call this an RFC patch since we also need to change the doc and
> > testcases accordingly, which makes this patch incomplete. Discussion
> > and input are welcome on this topic.
> 
> Can you re-hash on how users need to select 256bit vs 512bit support?
> I understand the above change basically makes -m[no-]avx10.[12] gate ISA
> features but not size?
> So that will now enable 512bit unless -mno-evex512 is given?
> 

-mno-evex512 has no effect on AVX10 related options. It only applies to
-mavx512xxx options.

Currently in GCC, taking AVX10.2 as an example, we have the following options
for each AVX10 version:

-mavx10.2-256: Enable AVX10.2 ISA set with 256 bit vector size only
-mavx10.2-512: Enable AVX10.2 ISA set with both 256 and 512 bit vector size
-mavx10.2: An alias to -mavx10.2-256

Based on that, the current -mno- options are:

-mno-avx10.2-256: Disable AVX10.2-256 ISA set, which actually disables the whole
AVX10.2.
-mno-avx10.2-512: Disable AVX10.2 ISA set 512 bit vector usage, but keep 256 bit
there.
-mno-avx10.2: An alias to -mno-avx10.2-256.

In this RFC, we basically have two main goals, both concerning options
introduced in GCC 15:
  - Change the -mavx10.2 alias to 512 bit instead of 256 bit.
  - Make -mno-avx10.2-512 disable the whole AVX10.2, instead of 512 bit only,
  to eliminate differing interpretations, since in AVX10 we turned the imply
  relationship from line-like to square-like. It also means the new
  -mno-avx10.2 still disables the whole AVX10.2 ISA set, which matches the
  first impression given by the option name.

Because of that, -mno-avx10.1-512, which was introduced in GCC 14, should also
align with this behavior.
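
The difference between the current semantics and the RFC can be sketched as a tiny state model. This is only an illustration of the behavior described above, not GCC's option-handling code: the enum names, the apply_opt function and the rfc flag are all hypothetical, and the ISA state is reduced to two bits (256 bit and 512 bit support).

```c
/* Hypothetical model: one bit per enabled AVX10.2 vector width.  */
enum { W256 = 1, W512 = 2 };

enum avx10_2_opt
{
  OPT_AVX10_2,		/* -mavx10.2 */
  OPT_AVX10_2_256,	/* -mavx10.2-256 */
  OPT_AVX10_2_512,	/* -mavx10.2-512 */
  OPT_NO_AVX10_2,	/* -mno-avx10.2 */
  OPT_NO_AVX10_2_256,	/* -mno-avx10.2-256 */
  OPT_NO_AVX10_2_512	/* -mno-avx10.2-512 */
};

/* Apply one option to STATE.  RFC == 0 models the current behavior,
   RFC != 0 models the behavior proposed in this RFC.  */
unsigned
apply_opt (unsigned state, enum avx10_2_opt opt, int rfc)
{
  switch (opt)
    {
    case OPT_AVX10_2_256:
      return state | W256;
    case OPT_AVX10_2_512:
      return state | W256 | W512;
    case OPT_AVX10_2:
      /* Alias of -256 today, alias of -512 under the RFC.  */
      return apply_opt (state, rfc ? OPT_AVX10_2_512 : OPT_AVX10_2_256, rfc);
    case OPT_NO_AVX10_2_256:
      /* Disables the whole AVX10.2 in both models.  */
      return 0;
    case OPT_NO_AVX10_2_512:
      /* Today: drop 512 bit only.  RFC: disable the whole AVX10.2.  */
      return rfc ? 0 : state & ~(unsigned) W512;
    case OPT_NO_AVX10_2:
      /* Follows the alias target of the positive option.  */
      return apply_opt (state, rfc ? OPT_NO_AVX10_2_512 : OPT_NO_AVX10_2_256,
			rfc);
    }
  return state;
}
```

In this model, -mno-avx10.2 ends with nothing enabled both before and after the change, while -mno-avx10.2-512 applied to a fully enabled state leaves W256 set today and clears everything under the RFC.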

> Given for the -mavx10.[12]-{256,512} the behavior changes compared to GCC
> 14 I'd rather drop the options that behave differently from GCC 14 on GCC 15
> than changing their meaning.  That unfortunately will make them a hard error
> (but I don't expect much use).  I'm not sure it's worth retaining -m[no-
> ]avx10.[12]-512.

-mavx10.[12]-256/512 are clear, and -mno-avx10.[12]-256 is also clear, so there
is no need to worry about those. As said, the changes are to the -mavx10.x alias
and -mno-avx10.x-512, where AVX10.1 was introduced in GCC 14 and AVX10.2 in
GCC 15.

In my opinion, I would go with either option a) or option c) for -mavx10.1.
AVX10.1 options are not widely used for now, since AVX10.1 first appears in GNR.
That is why option a) could be doable even though it changes the meaning, but
this is the last chance for option a). Option c) is a safe choice and I quite
like it. I don't like option b), but I listed it since it is also a choice. I
listed them all for discussion.

For AVX10.2, since it is first introduced in GCC 15, we can change it without
worrying about compatibility issues.

> 
> But maybe I'm misunderstanding the change (too many avx10.x related
> options are there).
> 

Yes, that is quite confusing, and it took me some time to try to explain them
as clearly as I can.

Thx,
Haochen

RE: [RFC PATCH] i386: Re-alias -mavx10.2 to 512 bit and make -mno-avx10.x-512 disable the whole AVX10.x

2025-01-28 Thread Jiang, Haochen

> From: Richard Biener 
> Sent: Tuesday, January 28, 2025 4:41 PM
> 
> >On Mon, Jan 27, 2025 at 3:54 PM Jiang, Haochen 
> >wrote:
> > -mno-evex512 will do nothing with AVX10 related options. It will only
> > apply on -mavx512xxx options.
> >
> > In GCC currently, take AVX10.2 as example, we have the following
> > options for one
> > AVX10 version:
> >
> > -mavx10.2-256: Enable AVX10.2 ISA set with 256 bit vector size only
> > -mavx10.2-512: Enable AVX10.2 ISA set with both 256 and 512 bit vector
> > size
> > -mavx10.2: An alias to -mavx10.2-256
> >
> > Based on that, the current -mno- option would be:
> >
> > -mno-avx10.2-256: Disable AVX10.2-256 ISA set, which actually disables
> > the whole AVX10.2.
> > -mno-avx10.2-512: Disable AVX10.2 ISA set 512 bit vector usage, but
> > keep 256 bit there.
> > -mno-avx10.2: An alias to -mno-avx10.2-256.
> 
> IMO -mavx10.2-512 -mno-avx10.2-512 should cancel, thus not leave
> -mavx10.2-256 on.  Mixing both ISA and width into a single set of options
> makes combinations quite awkward - consider
> 
>   -mavx10.2-256 -mavx10.2-512 -mno-avx10.2-512 -- should leave all of
> avx10.2 disabled?
>   -mavx10.2-512 -mavx10.2-256 -- gets you 512? or 256?
>   -mavx10.2-256 -mno-avx10.2-512 -mavx10.2-512 -- gets you 512
> 
> so you have to disable AVX 10.x with either -mno-avx10.2-256 or
> -mno-avx10.2-512 to switch from -512 to -256 or vice versa.  That means
> people would need to remember to always append -mno-avx10.x-256
> -mavx10.x-WIDTH when they want to override previous settings with
> something specific.
> 
> If I were to re-design this from scratch I'd have
> 
>  -mavx10.2-{256,512}  but no negative form

It is a great idea; the negative form is quite confusing here. However, I am
not quite sure about ...

>  -m[no-]avx10.2
> 
> with -mavx10.2 controlling the ISA and -mavx10.2-{256,512} controlling the
> width.
> -mavx10.2-{256,512} might imply -mavx10.2 and -mavx10.2 would imply a
> default width.
> 
> For -mavx10.2-{256,512} the latest option would take precedence (so it'd be
> -mavx10.2={256,512} and not really two independent options).

...this. Having the latest option take precedence might go against some users'
expectations of how ISA options work. Typically, they would expect everything
they typed in the options to be enabled. "=" might solve that, but it looks a
little odd to me for an ISA option. Let's see if anyone else has ideas on that.
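
The alternative design quoted above (ISA and width tracked separately, with the latest width option winning) can be sketched as follows. Everything here is hypothetical: the struct, the function names and in particular DEFAULT_WIDTH, since the thread does not say which width a plain -mavx10.2 would default to.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed split between ISA and width.  */
struct avx10_2_state
{
  bool isa;	/* Is the AVX10.2 ISA enabled?  */
  int width;	/* 0 = not chosen yet, otherwise 256 or 512.  */
};

/* Assumption for illustration only: the default width is not settled
   in this thread.  */
enum { DEFAULT_WIDTH = 512 };

/* Models -mavx10.2-256 / -mavx10.2-512 (or -mavx10.2=WIDTH): the latest
   width option replaces any earlier one and implies the ISA.  */
struct avx10_2_state
set_width (struct avx10_2_state s, int width)
{
  s.isa = true;
  s.width = width;
  return s;
}

/* Models -mavx10.2 (ON == true) and -mno-avx10.2 (ON == false).
   Enabling the ISA picks the default width only if none was chosen.  */
struct avx10_2_state
set_isa (struct avx10_2_state s, bool on)
{
  s.isa = on;
  if (on && s.width == 0)
    s.width = DEFAULT_WIDTH;
  return s;
}
```

Under this model, -mavx10.2-512 followed by -mavx10.2-256 ends at 256 bit, which is exactly the "last one wins" behavior that may surprise users who expect every listed option to stay enabled.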

> >
> >
> > > Given for the -mavx10.[12]-{256,512} the behavior changes compared
> > > to GCC
> > > 14 I'd rather drop the options that behave differently from GCC 14
> > > on GCC 15 than changing their meaning.  That unfortunately will make
> > > them a hard error (but I don't expect much use).  I'm not sure it's
> > > worth retaining -m[no- ]avx10.[12]-512.
> >
> > -mavx10.[12]-256/512 are clear, -mno-avx10.[12]-256 is also clear. No
> > need to concern on that. As said, the changes are on -mavx10.x alias
> > and -mno-avx10.x-512, where AVX10.1 introduced in GCC 14 and AVX10.2
> in GCC 15.
> >
> > In my opinion, I would go either option a) or option c) for -mavx10.1.
> > AVX10.1 options are not widely used for now due to its first
> > appearance in GNR. So that is why option a) could be doable although
> > changing the meaning, but it is the last chance for option a. Option
> > c) is a safe choice and I pretty like it. I don't like option b) but I list 
> > that out
> since it is also a choice. I listed them all for discussion.
> 
> So I like Option c), drop -m[no-]avx10.1 from GCC 15 (and not introduce
> -m[no-]avx10.2).
> 
> But then as said above mixing two concepts will result in confusion.
> An important
> part will be to ensure all the different option combinations end up behaving
> the same between different compilers.
> 
> Iff we settle on something incompatible with GCC 14 (that is, you get
> semantically different behavior), then I'd recommend to backport the change
> to GCC 14 as well, even if we usually would try to avoid that.

Currently we all favor Option c).
Let me also align with the LLVM folks on how to handle that. Hopefully they can
go the same way so that we end up with the same options.

Thx,
Haochen

RE: [COMMITTED] OpenMP: Fix metadirective test failures on x86_64 with -m32

2025-01-16 Thread Jiang, Haochen

> From: Sandra Loosemore 
> Sent: Friday, January 17, 2025 12:11 PM
> 

Thanks for the quick fix!

Thx,
Haochen

> gcc/testsuite/ChangeLog
>   * c-c++-common/gomp/metadirective-device.c: Don't add extra options
>   for target ia32.
>   * c-c++-common/gomp/metadirective-target-device-1.c: Likewise.
> ---
>  gcc/testsuite/c-c++-common/gomp/metadirective-device.c  | 2 +-
>  gcc/testsuite/c-c++-common/gomp/metadirective-target-device-1.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/c-c++-common/gomp/metadirective-device.c b/gcc/testsuite/c-c++-common/gomp/metadirective-device.c
> index 09b795eeabe..380762477b0 100644
> --- a/gcc/testsuite/c-c++-common/gomp/metadirective-device.c
> +++ b/gcc/testsuite/c-c++-common/gomp/metadirective-device.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile }  */
>  /* { dg-additional-options "-foffload=disable -fdump-tree-optimized" } */
> -/* { dg-additional-options "-DDEVICE_ARCH=x86_64 -DDEVICE_ISA=sse -msse" { target x86_64-*-* } } */
> +/* { dg-additional-options "-DDEVICE_ARCH=x86_64 -DDEVICE_ISA=sse -msse" { target { x86_64-*-* && { ! ia32 } } } } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/c-c++-common/gomp/metadirective-target-device-1.c b/gcc/testsuite/c-c++-common/gomp/metadirective-target-device-1.c
> index 6373349d37f..5d3a4c3ff9b 100644
> --- a/gcc/testsuite/c-c++-common/gomp/metadirective-target-device-1.c
> +++ b/gcc/testsuite/c-c++-common/gomp/metadirective-target-device-1.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile }  */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
> -/* { dg-additional-options "-DDEVICE_ARCH=x86_64 -DDEVICE_ISA=mmx -mmmx" { target x86_64-*-* } }  */
> +/* { dg-additional-options "-DDEVICE_ARCH=x86_64 -DDEVICE_ISA=mmx -mmmx" { target { x86_64-*-* && { ! ia32 } } } } */
> 
>  #include 
> 
> --
> 2.34.1

RE: [PATCH 00/27] Use avx10.x as the only option for AVX10 with 512 bit vector support while remove avx10.x-256/512 option and 256 bit rounding support

2025-04-05 Thread Jiang, Haochen

> all AVX512 features while disabling AVX512FP16. And "-mavx512f -mno-
> avx10.1"
> should disable all AVX512 features. Both of the two combinations currently
> will ignore "-mno-". In GCC 15, we will raise a warning to mention that we

Correction here: "-mavx512f -mno-avx10.1" should keep AVX512 on, since
eventually AVX10.1 will imply all of AVX512, just like AVX512F implies
AVX2. So it won't disable AVX512.
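
The implication chain in this correction can be sketched as a small bitmask model. It is an illustration only: the feature bits and both functions are hypothetical, all of AVX512 is collapsed into the single F_AVX512F bit, and the rule that disabling a feature also disables whatever depends on it is an assumption of the model rather than a statement about GCC's implementation.

```c
/* Hypothetical feature bits; AVX512 is reduced to one bit.  */
enum { F_AVX2 = 1, F_AVX512F = 2, F_AVX10_1 = 4 };

/* Enabling a feature also enables everything it implies:
   AVX10.1 -> AVX512F -> AVX2.  */
unsigned
enable_feature (unsigned set, unsigned f)
{
  if (f == F_AVX10_1)
    return set | F_AVX10_1 | F_AVX512F | F_AVX2;
  if (f == F_AVX512F)
    return set | F_AVX512F | F_AVX2;
  return set | f;
}

/* Disabling a feature also disables the features that depend on it
   (model assumption), but never the features it merely implies.  */
unsigned
disable_feature (unsigned set, unsigned f)
{
  if (f == F_AVX2)
    return set & ~(unsigned) (F_AVX2 | F_AVX512F | F_AVX10_1);
  if (f == F_AVX512F)
    return set & ~(unsigned) (F_AVX512F | F_AVX10_1);
  return set & ~f;
}
```

In this model, "-mavx512f -mno-avx10.1" corresponds to enabling F_AVX512F and then disabling F_AVX10_1, which leaves F_AVX512F and F_AVX2 set.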

Thx,
Haochen

RE: [gcc-wwwdocs PATCH] gcc-14/15: Mention recent change for Intel x86_64

2025-04-08 Thread Jiang, Haochen

> From: Gerald Pfeifer 
> Sent: Sunday, March 30, 2025 4:23 AM
> 
> On Mon, 24 Mar 2025, Haochen Jiang wrote:
> > Mention AVX10.1 option changes, revise AVX10.2 option and mention
> > APX_F new feature in GCC 15.
> > ---
> 
> >New ISA extension support for Intel AVX10.1 was added.
> > -  AVX10.1 intrinsics are available via the -mavx10.1 or
> > -  -mavx10.1-256 compiler switch with 256-bit vector size
> > -  support. 512-bit vector size support for AVX10.1 intrinsics are
> > -  available via the -mavx10.1-512 compiler switch.
> > +  AVX10.1 intrinsics are available via the -mavx10.1-256
> > +  compiler switch with 256-bit vector size support. 512-bit vector size
> > +  support for AVX10.1 intrinsics are available via the
> > +  -mavx10.1-512 compiler switch. -mavx10.1
> > +  enables AVX10.1 intrinsics with 256-bit vector size support in GCC 14.1
> 
> I suggest to just use "256-bit vector support", dropping the word "size", and
> similar for the 512-bit case.
> 
> > +  and GCC 14.2. Since GCC 14.3, it enables AVX10.1 intrinsics with 512-bit
> > +  vector size support. Since GCC 14.3, using -mavx10.1 will
> > +  emit a warning due to this behavior change.
> 
> How about streamlining this to
> 
>   "Since GCC 14.3, it enables AVX10.1 intrinsics with 512-bit vector
>   support (and emits a warning due to this behavior change)."
> 
> ?
> 
> > +  compiler switch. MOVRS vector intrinsics are available via
> > +  the -mmovrs -mavx10.2 compiler switch.
> 
> Technically this is not one switch, so "switches"?
> 
> > +AMX-FP8, AMX-MOVRS, AMX-TF32, AMX-TRANSPOSE, APX_F, AVX10.2, AVX-IFMA,
> > +AVX-NE-CONVERT, AVX-VNNI-INT16, AVX-VNNI-INT8, CMPccXADD, MOVRS, SHA512,
> > +SM3, SM4 and USER_MSR ISA extensions.
> 
> We usually go for an Oxford comma, so "...SM4, and USER_MSR...".
> 
> > +  -mavx10.1-256, -mavx10.1-512 and
> > +  -mevex512 are marked as deprecated. Meanwhile,
> 
> How about just "... are deprecated"?
> 
> 
> > +  -mavx10.1 enables AVX10.1 intrinsics with 512-bit
> > +  vector size support, while in GCC 14.1 and GCC 14.2, it only enables
> > +  256-bit vector size support. GCC will emit a warning when using these
> > +  compiler switches. -mavx10.1-256, -mavx10.1-512
> > +  and -mevex512 will be removed in GCC 16, while the warning
> > +  for the behavior change on -mavx10.1 will also be removed.
> 
> How about "..in GCC 16 together with the warning..." or "in GCC 16, as will the
> warning..."?
> 
> 
> This is fine with these (or similar) changes.
> 

I have made all the changes and am going to commit the patch.

Thx,
Haochen

RE: [gcc-wwwdocs PATCH] gcc-14/15: Mention recent change for Intel x86_64

2025-03-31 Thread Jiang, Haochen

> From: Gerald Pfeifer 
> Sent: Sunday, March 30, 2025 5:23 AM
> 
> On Mon, 24 Mar 2025, Haochen Jiang wrote:
> > Mention AVX10.1 option changes, revise AVX10.2 option and mention
> > APX_F new feature in GCC 15.
> > ---
> 
> >New ISA extension support for Intel AVX10.1 was added.
> > -  AVX10.1 intrinsics are available via the -mavx10.1 or
> > -  -mavx10.1-256 compiler switch with 256-bit vector size
> > -  support. 512-bit vector size support for AVX10.1 intrinsics are
> > -  available via the -mavx10.1-512 compiler switch.
> > +  AVX10.1 intrinsics are available via the -mavx10.1-256
> > +  compiler switch with 256-bit vector size support. 512-bit vector size
> > +  support for AVX10.1 intrinsics are available via the
> > +  -mavx10.1-512 compiler switch. -mavx10.1
> > +  enables AVX10.1 intrinsics with 256-bit vector size support in GCC 14.1
> 
> I suggest to just use "256-bit vector support", dropping the word "size", and
> similar for the 512-bit case.
> 
> > +  and GCC 14.2. Since GCC 14.3, it enables AVX10.1 intrinsics with 512-bit
> > +  vector size support. Since GCC 14.3, using -mavx10.1 will
> > +  emit a warning due to this behavior change.
> 
> How about streamlining this to
> 
>   "Since GCC 14.3, it enables AVX10.1 intrinsics with 512-bit vector
>   support (and emits a warning due to this behavior change)."
> 
> ?
> 
> > +  compiler switch. MOVRS vector intrinsics are available via
> > +  the -mmovrs -mavx10.2 compiler switch.
> 
> Technically this is not one switch, so "switches"?
> 
> > +AMX-FP8, AMX-MOVRS, AMX-TF32, AMX-TRANSPOSE, APX_F, AVX10.2, AVX-IFMA,
> > +AVX-NE-CONVERT, AVX-VNNI-INT16, AVX-VNNI-INT8, CMPccXADD, MOVRS, SHA512,
> > +SM3, SM4 and USER_MSR ISA extensions.
> 
> We usually go for an Oxford comma, so "...SM4, and USER_MSR...".
> 
> > +  -mavx10.1-256, -mavx10.1-512 and
> > +  -mevex512 are marked as deprecated. Meanwhile,
> 
> How about just "... are deprecated"?
> 
> 
> > +  -mavx10.1 enables AVX10.1 intrinsics with 512-bit
> > +  vector size support, while in GCC 14.1 and GCC 14.2, it only enables
> > +  256-bit vector size support. GCC will emit a warning when using these
> > +  compiler switches. -mavx10.1-256, -mavx10.1-512
> > +  and -mevex512 will be removed in GCC 16, while the warning
> > +  for the behavior change on -mavx10.1 will also be removed.
> 
> How about "...in GCC 16 together with the warning..." or "in GCC 16, as will the
> warning..."?
> 
> 
> This is fine with these (or similar) changes.
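
For readers following the wording discussed above, here is a minimal sketch of how plain -mavx10.1 behaves per GCC release, as far as the quoted release-note text states it. The function name and the tuple layout are hypothetical and only summarize the text; this is not taken from GCC sources.

```python
def avx10_1_behavior(gcc_version):
    """Summarize plain -mavx10.1 as described in the quoted release notes.

    gcc_version is a (major, minor) tuple, e.g. (14, 3).
    Returns (max_vector_bits, emits_warning).
    Hypothetical helper for illustration only.
    """
    major, minor = gcc_version
    if major == 14 and minor in (1, 2):
        # GCC 14.1 and 14.2: 256-bit vector support only.
        return 256, False
    if (major == 14 and minor >= 3) or major == 15:
        # Since GCC 14.3: 512-bit support, with a warning about the change.
        return 512, True
    if major >= 16:
        # GCC 16: the behavior-change warning is removed again.
        return 512, False
    raise ValueError("no -mavx10.1 behavior is stated for this version")


print(avx10_1_behavior((14, 2)))
print(avx10_1_behavior((14, 3)))
```

The deprecated -mavx10.1-256, -mavx10.1-512 and -mevex512 switches are deliberately left out of this sketch, since they are removed in GCC 16.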

Thanks for the review!

I will make those changes after my vacation ends (around April 8th).

Thx,
Haochen

RE: [PATCH 0/5] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-13 Thread Jiang, Haochen

> From: Haochen Jiang 
> Sent: Wednesday, May 14, 2025 2:17 PM
> 
> Hi all,
> 
> As mentioned in GCC 15, we will remove the -mavx10.1-256/512 and -mno-evex512
> options in GCC 16. We will also do some cleanup in the code for all the
> size handling at the same time.
> 
> The first patch of the patch set removes those options, while the following

Oops, the first patch seems too big for the server to handle. The biggest part is
the i386-builtin.def change. Let me find a way to split it up.

Thx,
Haochen

> four are refactoring and cleanup for the machine description and AVX10.2.
>

[r14-2462 Regression] FAIL: libgomp.c++/../libgomp.c-c++-common/alloc-12.c execution test on Linux/x86_64

2023-07-13 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

450b05ce54d3f08c583c3b5341233ce0df99725b is the first bad commit
commit 450b05ce54d3f08c583c3b5341233ce0df99725b
Author: Tobias Burnus 
Date:   Wed Jul 12 13:50:21 2023 +0200

libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

caused


with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2462/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-11.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-11.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-12.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-12.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-13 Thread Jiang, Haochen via Gcc-patches

> The recent change in TImode parameter passing on x86_64 results in the FAIL
> of pr91681-1.c.  The issue is that with the extra flexibility, the combine
> pass is now spoilt for choice between using either the
> *add3_doubleword_concat or the *add3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is to provide an
> *add3_doubleword_concat_zext define_insn_and_split, that can
> benefit both from the register allocation of *concat, and still avoid the xor
> normally required by zero extension.
> 
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =&r, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no
> new failures.  Ok for mainline?
> 
> 
> 2023-07-11  Roger Sayle  
> 
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (*add3_doubleword_concat_zext): New
> define_insn_and_split derived from *add3_doubleword_concat
> and *add3_doubleword_zext.

Hi Roger,

This commit currently changed the codegen of testcase p443644-2.c from:

        movq    %rdx, %rax
        xorl    %edx, %edx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
to:

        movq    %rdx, %rcx
        movq    %rdi, %rax
        movq    %rsi, %rdx
        addq    %rcx, %rax
        adcq    $0, %rdx

which causes the testcase to fail under -m64.

Is this within your expectation?
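
To make the comparison concrete, here is a small sketch that emulates both listings on 64-bit registers. It assumes, as the listings suggest, that the 128-bit operand arrives in %rsi:%rdi and the 64-bit operand in %rdx; the helper names are hypothetical and this only models the arithmetic, not the scan-assembler pattern the test checks.

```python
MASK64 = (1 << 64) - 1


def add64(a, b, carry_in=0):
    """64-bit add with carry; returns (result, carry_out)."""
    total = a + b + carry_in
    return total & MASK64, total >> 64


def old_sequence(rdi, rsi, rdx):
    rax = rdx                       # movq %rdx, %rax
    rdx = 0                         # xorl %edx, %edx
    rax, cf = add64(rdi, rax)       # addq %rdi, %rax
    rdx, _ = add64(rsi, rdx, cf)    # adcq %rsi, %rdx
    return rdx, rax                 # result in %rdx:%rax


def new_sequence(rdi, rsi, rdx):
    rcx = rdx                       # movq %rdx, %rcx
    rax = rdi                       # movq %rdi, %rax
    rdx = rsi                       # movq %rsi, %rdx
    rax, cf = add64(rcx, rax)       # addq %rcx, %rax
    rdx, _ = add64(0, rdx, cf)      # adcq $0, %rdx
    return rdx, rax                 # result in %rdx:%rax


# A low half that overflows, so the carry path is exercised.
print(old_sequence(MASK64, 5, 1) == new_sequence(MASK64, 5, 1))
```

Both sequences produce the same value under these assumptions; the difference is one extra move in the new code, which is what the question above is about.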

BRs,
Haochen

> 
> 
> Thanks,
> Roger
> --

RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-16 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Jiang, Haochen
> Sent: Friday, July 14, 2023 10:50 AM
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' 
> Subject: RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
> 
> > The recent change in TImode parameter passing on x86_64 results in the
> > FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> > the combine pass is now spoilt for choice between using either the
> > *add3_doubleword_concat or the *add3_doubleword_zext
> > patterns, when one operand is a *concat and the other is a zero_extend.
> > The solution proposed below is to provide an
> > *add3_doubleword_concat_zext define_insn_and_split, that can
> > benefit both from the register allocation of *concat, and still avoid
> > the xor normally required by zero extension.
> >
> > I'm investigating a follow-up refinement to improve register
> > allocation further by avoiding the early clobber in the =&r, and
> > handling (custom) reloads explicitly, but this piece resolves the testcase
> > failure.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-11  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/91681
> > * config/i386/i386.md (*add3_doubleword_concat_zext): New
> > define_insn_and_split derived from *add3_doubleword_concat
> > and *add3_doubleword_zext.
> 
> Hi Roger,
> 
> This commit currently changed the codegen of testcase p443644-2.c from:

Oops, a typo, I mean pr43644-2.c.

Haochen

> 
>         movq    %rdx, %rax
>         xorl    %edx, %edx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
> to:
> 
>         movq    %rdx, %rcx
>         movq    %rdi, %rax
>         movq    %rsi, %rdx
>         addq    %rcx, %rax
>         adcq    $0, %rdx
> 
> which causes the testcase to fail under -m64.
> 
> Is this within your expectation?
> 
> BRs,
> Haochen
> 
> >
> >
> > Thanks,
> > Roger
> > --

RE: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64

2023-07-20 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 20, 2023 9:28 PM
> To: Maciej W. Rozycki 
> Cc: haochen.jiang ; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> 
> Subject: Re: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c
> scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64
> 
> On Thu, Jul 20, 2023 at 3:13 PM Maciej W. Rozycki 
> wrote:
> >
> > On Thu, 20 Jul 2023, Richard Biener wrote:
> >
> > > > c1e420549f2305efb70ed37e693d380724eb7540 is the first bad commit
> > > > commit c1e420549f2305efb70ed37e693d380724eb7540
> > > > Author: Maciej W. Rozycki 
> > > > Date:   Wed Jul 19 11:59:29 2023 +0100
> > > >
> > > > testsuite: Add 64-bit vector variant for bb-slp-pr95839.c
> > >
> > > I think the issue is we disable V2SF on ia32 because of the conflict
> > > with MMX which we don't want to use.
> >
> >  I'm not sure if I have a way to test with such a target.  Would you
> > expect:
> >
> > /* { dg-require-effective-target vect64 } */
> >
> > to cover it?  If so, then I'll put it back as in the original version
> > and post for Haochen to verify.

I suppose you can just commit to trunk and it should be ok, since it is only an -m32 issue.

Thx,
Haochen

> 
> Yeah, that should work here.
> 
> Richard.
> 
> >   Maciej

RE: [PATCH] i386: Extend cvtps2pd to memory

2022-07-03 Thread Jiang, Haochen via Gcc-patches

Hi all,

I revised my patch according to all your reviews.

Regtested on x86_64-pc-linux-gnu.

BRs,
Haochen

> -Original Message-
> From: Liu, Hongtao 
> Sent: Thursday, June 30, 2022 4:57 PM
> To: Uros Bizjak ; Jiang, Haochen
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] i386: Extend cvtps2pd to memory
> 
> 
> 
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Thursday, June 30, 2022 4:53 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> >
> > On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen
> 
> > wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Uros Bizjak 
> > > > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > > > To: Jiang, Haochen 
> > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > 
> > > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang
> > > > > > 
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This patch aims to fix the cvtps2pd insn, which should also
> > > > > > > work on memory operand but currently does not. After this fix,
> > > > > > > when loop == 2, it will eliminate movq instruction.
> > > > > > >
> > > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > > > >
> > > > > > > BRs,
> > > > > > > Haochen
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > PR target/43618
> > > > > > > * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > > > > (sse2_cvtps2pd_load): Rename
> extendvsdfv2df2.
> > >
> > > Rename FROM ...
> > >
> > > Please also mention change to sse2_cvtps2pd.
> > >
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > PR target/43618
> > > > > > > * gcc.target/i386/pr43618-1.c: New test.
> > > > > >
> > > > > > This patch could be as simple as:
> > > > > >
> > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > index 8cd0f617bf3..c331445cb2d 100644
> > > > > > --- a/gcc/config/i386/sse.md
> > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > @@ -9195,7 +9195,7 @@
> > > > > > (define_insn "extendv2sfv2df2"
> > > > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > > > >(float_extend:V2DF
> > > > > > - (match_operand:V2SF 1 "register_operand" "v")))]
> > > > > > + (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > > > >   "TARGET_MMX_WITH_SSE"
> > > > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > > > >   [(set_attr "type" "ssecvt")
> > > > >
> > > > > We also tested on this version, it is ok.
> > > > >
> > > > > The reason why the patch looks like this is because in the
> > > > > previous insn sse2_cvtps2pd, the constraint vm and
> > > > > vector_operand actually does not match the actual instruction.
> > > > > Memory operand is V2SF, not V4SF.
> > > > >
> > > > > Therefore, we changed the constraint in that insn. Then it caused
> > > > > another issue.
> > > > > For memory operand, it seems that we cannot generate those mask
> > > > > instructions.
> > > > > So I change the pattern to how extendv2hfv2df2 works.
> > > >
> > > > If you want to change the memory access in sse2_cvtps2pd,
> > > > then please see how e.g. v2hiv2di is handled in sse.md. In
> > >

RE: [PATCH][pushed] MAINTAINERS: fix alphabetic sorting

2022-07-04 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Martin Liška 
> Sent: Monday, July 4, 2022 6:17 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jiang, Haochen 
> Subject: [PATCH][pushed] MAINTAINERS: fix alphabetic sorting
> 
> ChangeLog:
> 
>   * MAINTAINERS: fix sorting of names
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f4a11cdc755..7d9aab76dd9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -463,8 +463,8 @@ Andreas Jaeger
>   
>  Harsha Jagasia   
>  Fariborz Jahanian
>  Surya Kumari Jangala 
> -Qian Jianhua 
>  Haochen Jiang
> 
> +Qian Jianhua 

Sorry for misordering g and h in the alphabet.

Maybe it is time to go back to kindergarten for a review of that. Thanks for
fixing!

Haochen

>  Janis Johnson
>   
>  Teresa Johnson
>   
>  Kean Johnston
> --
> 2.36.1

RE: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Jiang, Haochen via Gcc-patches

Hi Jakub,

> So, what does this imply for the current ISAs?

AVX10 will imply AVX2 at the ISA level, and we treat AVX10 as an
independent ISA feature set. Although they share the same instructions and
encodings, AVX10 and AVX512 are conceptually independent features, which
means they are orthogonal.

> The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> means 512 bit vector support is available and most of the various -mavx512XXX
> options imply -mavx512f (and -mno-avx512f turns those off).  And if
> -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> access [xy]mm16+).

For AVX10, the 128/256/scalar versions of the instructions are always there, as
is access to [xy]mm16+. The 512-bit version is "optional", and the user needs to
request it through options. When the 512-bit version is enabled, the
128/256/scalar versions are also enabled, which is roughly the reverse of the
current relation between AVX512F and AVX512VL.

Since we treat AVX10 and AVX512 as orthogonal, we will add OR logic to the
current patterns, as shown in our AVX512DQ+VL sample patches.

> Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> one mixes the -mavx10* options together with -mno-avx512vl or similar
> options?  Will -mno-avx512f still imply -mno-avx512vl etc.?

For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon Server
will have AVX512 related CPUIDs for backward compatibility. For GNR, it will be
AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom
Server and client will only have AVX10 CPUIDs with 256 bit support set.

-mno-avx512f will still imply -mno-avx512vl.

As we mention below, we don't recommend that users combine the AVX10 and legacy
AVX512 options. We understand that there will be different opinions on how the
compiler should behave for some controversial option combinations.

If someone does mix the options, the golden rule is that we use OR logic.
Therefore, enabling either feature will turn on the shared instructions,
regardless of whether the other feature is unspecified or disabled. That is why
we emit a warning in some scenarios, which is also mentioned in the letter.
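
As a rough illustration of the OR logic for an instruction shared between the two feature sets, consider the sketch below. It models a 128/256-bit instruction that legacy AVX512 gates on both AVX512DQ and AVX512VL; the function and parameter names are hypothetical, and this is not the actual pattern condition from the sample patches.

```python
def shared_insn_enabled(avx512dq, avx512vl, avx10_1):
    """Model the enabling condition of a 128/256-bit DQ instruction.

    Hypothetical helper: the legacy AVX512 condition and the AVX10.1
    condition are combined with OR, so either one is sufficient.
    """
    legacy_avx512 = avx512dq and avx512vl
    return bool(legacy_avx512 or avx10_1)


# -mavx10.1 -mno-avx512vl: still enabled, because AVX10.1 alone suffices.
print(shared_insn_enabled(avx512dq=True, avx512vl=False, avx10_1=True))
# -mavx512dq -mno-avx512vl without AVX10: disabled.
print(shared_insn_enabled(avx512dq=True, avx512vl=False, avx10_1=False))
```

The first case is the kind of combination where a warning makes sense, since the user disabled one feature yet the shared instructions stay on.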

Thx,
Haochen

> 
>   Jakub

RE: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 9, 2023 1:38 PM
> To: Phoebe Wang 
> Cc: Hongtao Liu ; Joseph Myers
> ; Jiang, Haochen ; gcc-
> patc...@gcc.gnu.org; ubiz...@gmail.com; Liu, Hongtao
> ; Zhang, Annita ; Wang,
> Phoebe ; x86-64-abi  a...@googlegroups.com>; llvm-dev ; Craig Topper
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> 
> 
> > Am 09.08.2023 um 06:02 schrieb Phoebe Wang via Gcc-patches  patc...@gcc.gnu.org>:
> >
> > I have some proposals about unifying ABI on AVX10 for both 256-bit
> > and 512-bit.
> >
> >
> >
> > Proposal 1: Promote attribute from AVX10-256 to AVX10-512 for any
> > function which has 512-bit or above vectors in passing/returning arguments.
> >
> > Problem: Binary cannot run on AVX10-256 only target.
> >
> > Reason:
> >
> > When user tries to pass/return 512-bit vector, they should be aware of
> > it will become target dependent. User should be taught not to use it
> > on 256-bit targets and there will be unexpected things happening if
> > they insist.
> >
> > Actually, ICC and MSVC already have chosen to promote for the argument:
> > https://godbolt.org/z/vcrf9qW5z I think if compiler have to choose the
> > misbehavior between fail in result and crash due to illegal
> > instruction, the latter is definitely better than the former.
> >
> > In this way, we can also declare x86-64-v5 is inherit from x86-64-v4
> > and has the interaction with previous versions.
> >
> >
> >
> > Proposal 2: Abort compilation when user tries to pass/return 512-bit
> > vectors.
> >
> > Reason: This turns possible run time crash into compile time error.
> >
> >
> >
> > Proposal 3: Change the ABI of 512-bit vector and always be
> > passed/returned from memory.
> 
> I don’t think we can realistically change the ABI.  If we could, passing them
> in two 256bit registers would be possible as well.
> 
> Note I fully expect intel to turn around and implement 512 bits on a 256 bit
> data path on the E cores in 5 years.  And it will take at least that time for
> AVX10 to take off (look at AVX512 for this and how they cautiously chose to
> include bf16 to cut off Zen4).  So IMHO we shouldn’t worry at all and just wait
> and see for AVX42 to arrive.

Let me try to clarify the whole thing.

I suppose Phoebe's "change" is based on LLVM.

In GCC, the current behavior is to pass 512-bit vectors in memory when there is
no 512-bit support. When there is support, everything is passed in registers.

For AVX10, I prefer to keep this behavior. But if most of you want to change it,
I have no objection, since AVX10 is a new start.

Thx,
Haochen

> 
> Richard
> 
> > Reason: We expect AVX10-256 is a universal configuration and in most
> > scenarios, 512-bit vector won't bring performance improvements. So we
> > can sacrifice a little 512-bit performance to achieve the interaction
> > between
> > AVX10-256 and AVX10-512. In this way, there won't have any runtime
> > issue in the future either.
> >
> >
> >
> > Thanks
> >
> > Phoebe
> >
> > Hongtao Liu  于2023年8月9日周三 10:18写道：
> >
> >>> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
> >>>
> >>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> >>>>
> >>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers
> >>>> 
> >> wrote:
> >>>>>
> >>>>> Do you have any comments on the interaction of AVX10 with the
> >>>>> micro-architecture levels defined in the ABI (and supported with
> >>>>> glibc-hwcaps directories in glibc)?  Given that the levels are
> >> cumulative,
> >>>>> should we take it that any future levels will be ones supporting
> >> 512-bit
> >>>>> vector width for AVX10 (because x86-64-v4 requires the current
> >> AVX512F,
> >>>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future
> >> processors
> >>>>> that only support 256-bit vector width will be considered to match
> >> the
> >>>>> x86-64-v3 micro-architecture level but not any higher level?
> >>>> This is actually something we really want to discuss in the
> >>>> community, our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-
> 256) + APX.
> >>>> One big reason is Intel E-core will only support AVX10 256-bit, if
> >>>>

RE: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 8, 2023 8:45 PM
> To: Jiang, Haochen 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org;
> ubiz...@gmail.com; Liu, Hongtao 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi Jakub,
> >
> > > So, what does this imply for the current ISAs?
> >
> > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > independent ISA feature set. Although sharing the same instructions
> > and encodings, AVX10 and AVX512 are conceptual independent features,
> > which means they are orthogonal.
> >
> > > The expectations in lots of config/i386/* is that -mavx512f /
> > > TARGET_AVX512F means 512 bit vector support is available and most of
> > > the various -mavx512XXX options imply -mavx512f (and -mno-avx512f
> > > turns those off).  And if -mavx512vl / TARGET_AVX512VL isn't
> > > available, tons of places just use 512-bit EVEX instructions for
> > > 256-bit or 128-bit stuff (mostly to be able to access [xy]mm16+).
> >
> > For AVX10, the 128/256/scalar version of the instructions are always
> > there, and also for [xy]mm16+. 512 version is "optional", which needs
> > user to indicate them in options. When 512 version is enabled,
> > 128/256/scalar version is also enabled, which is kind of reverse relation
> > between the current AVX512F/AVX512VL.
> >
> > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic
> > for the current pattern, which is shown in our AVX512DQ+VL sample patches.
> 
> Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
> complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> parts of the former AVX512 ISA one doesn't like to get code generated for?
> -mavx10 would then enable all the existing sub-AVX512 ISAs?
>

We treat AVX10 and AVX512 as two independent ISAs.

Therefore, it is quite weird to disable something with an option from another,
unrelated ISA. I don't think -mavx10.1 -mno-avx512f should disable anything.

Thx,
Haochen

> > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they
> > > have AVX512F CPUID even when the 512-bit vectors aren't present?
> > > What happens if one mixes the -mavx10* options together with
> > > -mno-avx512vl or similar options?  Will -mno-avx512f still imply 
> > > -mno-avx512vl etc.?
> >
> > For the CPUID part, AVX10 and AVX512 have different emulation. Only
> > Xeon Server will have AVX512 related CPUIDs for backward
> > compatibility. For GNR, it will be AVX512F, AVX512VL, AVX512CD,
> > AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI, AVX512_VNNI,
> > AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
> > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom
> > Server and client will only have AVX10 CPUIDs with 256 bit support set.
> >
> > -mno-avx512f will still imply -mno-avx512vl.
> >
> > As we mentioned below, we don't recommend users to combine the AVX10
> > and legacy
> > AVX512 options. We understand that there will be different opinions on
> > what should compiler behave on some controversial option combinations.
> >
> > If there is someone mixes the options, the golden rule is that we are using 
> > OR logic.
> > Therefore, enabling either feature will turn on the shared
> > instructions, no matter the other feature is not mentioned or closed.
> > That is why we are emitting warning for some scenarios, which is also
> > mentioned in the letter.
> 
> I'm refraining from commenting on the senselessness of AVX10 as you're likely on
> the same receiving side as us.
> 
> Thanks,
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > >   Jakub
> >

RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Jan Beulich 
> Sent: Thursday, August 10, 2023 9:31 PM
> To: Phoebe Wang 
> Cc: Joseph Myers ; Wang, Phoebe
> ; Hongtao Liu ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com; Liu,
> Hongtao ; Zhang, Annita ;
> x86-64-abi ; llvm-dev  d...@lists.llvm.org>; Craig Topper ; Richard Biener
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined
> > ABI in practice. A well defined ABI should guarantee 1) interlinkable
> > across different compile options within the same compiler; 2)
> > interlinkable across different compilers. Both aspects are failed in the
> > non 512-bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can
> > run on both AVX10-256 and AVX10-512. It would be common that binaries
> > compiled with AVX10-256 may link with native built binaries on AVX10-512
> > targets.

IMO it is not acceptable for AVX10-256 to generate zmm registers.

If I have to choose among the three proposals, the second is the best.

But I suppose the best choice is to keep what we are doing currently, which is
passing them in memory and emitting a warning. It is reasonable behavior.

Thx,
Haochen

> 
> But you're only describing a pre-existing problem here afaict. Code compiled
> with -mavx512f passing __m512 type data to a function compiled with only, say,
> -mavx2 won't interoperate properly either. What's worse, imo the psABI doesn't
> sufficiently define what __m256 etc actually are. After all these aren't types
> defined by the C standard (as opposed to at least most other types in the
> respective table there), and you can't really make assumptions like "this is
> what certain compilers think this is".
> 
> Jan

RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches

Hi all,

There has been a lot of discussion on arch levels and ABIs, and I really
appreciate that.

For the arch level issue, it might be a little early to discuss and should not
block these patches.

For the ABI issue, the problem comes from GCC and clang/LLVM currently behaving
differently in the return value for m512 w/o 512 bit support. It then becomes a
question of unification, which is where the whole discussion came from.
However, it is a corner case.

So let's first focus on the options design and its behavior. We can continue to
discuss those two issues after the main behavior is settled.
Richard has raised some concerns about option combinations. Any other concerns?

Thx,
Haochen

> -Original Message-
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang via
> Gcc-patches
> Sent: Tuesday, August 8, 2023 3:13 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao 
> Subject: Intel AVX10.1 Compiler Design and Support
> 
> Hi all,
> 
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we
> would like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - Since the introduction of AVX10, we would like to establish a common,
> converged vector instruction set across all Intel architectures, including
> Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit
> is optional.
>   - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
>   - There will be no new AVX512 CPUID introduced in future. All EVEX vector
> instructions will be under AVX10 umbrella.
>   - AVX10 will be version-based ISA instead of tons of different CPUIDs like
> AVX512BW, AVX512DQ, AVX512FP16, etc.
>   - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
> (Suppressed All Exceptions) control and new instructions.
> 
> If you would like to have a closer look at the details, please follow the
> links below:
> 
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
> It describes the Intel Advanced Vector Extensions 10 Instruction Set
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
> 
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
> It provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
> 
> Hence, we will have several compiler design ground rules for AVX10:
>   - AVX10 is a converged ISA feature set.
> We will not provide -m[no-]xxx to enable/disable each single vector feature
> in one version as we used to before. Instead, a simple option -m[no-]avx10.x
> is used. If the 512 bit version is needed, -mavx10.x-512 is all you need. Also,
> the maximum vector width should be the same when different versions of AVX10
> are used. For example, enabling AVX10.1 with 512 bit vector width while
> enabling AVX10.2 with only 256 bit vector width is not a desired behavior.
>   - AVX10 is an evolving ISA feature set.
> Every feature that shows up in the current version will always show up in
> future versions.
>   - AVX10 is an independent ISA feature set.
> Although sharing the same instructions and encodings, AVX10 and AVX512 are
> conceptually independent features, which means they are orthogonal.
> 
> Since AVX10 will have several benefits like bringing AVX512 features on Atom
> Server and Clients and getting rid of tons of AVX512 CPUIDs but a simple AVX10
> option to enable features, we lean towards the adoption of AVX10 instead of
> AVX512 from now on.
> 
> Based on all we got, we would like to introduce the following compiler options:
>   - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
> 256 bit vector width to make sure the compatibility on all platforms.
>   - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512 bit
> vector width. “-mno-avx10.x-512” option will not be provided to avoid
> confusion of disabling 512 vector width or avx10.x itself.
>   - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256 bit
> vector width. But it will disable 512 bit vector width since the vector size
> is indicated in option. “-mno-avx10.x-256” option will not be provided to
> keep align with the 512 ones.
>   - -mno-avx10.x: The option will disable all the features introduced >=avx10.x
> (both 256 and 512 bit) and keep features <avx10.x, in line with how
> -mno- options behave previously.
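
The four option forms proposed in this design mail can be summarized with a small sketch. It only models a single option as described in the list above (the 2023 proposal, not necessarily what later GCC releases shipped); the function name and the dictionary keys are hypothetical, and how several options combine is left out on purpose.

```python
import re

# Matches -mavx10.x, -mavx10.x-256, -mavx10.x-512 and their -mno- spellings.
_AVX10_OPTION = re.compile(r"-m(no-)?avx10\.(\d+)(?:-(256|512))?")


def describe_avx10_option(option):
    """Describe the effect of one proposed AVX10 option (hypothetical helper)."""
    match = _AVX10_OPTION.fullmatch(option)
    if match is None:
        raise ValueError(f"not an AVX10 option: {option}")
    negated = match.group(1) is not None
    minor = int(match.group(2))
    width = match.group(3)
    if negated:
        if width is not None:
            # The proposal explicitly does not provide -mno-avx10.x-256/-512.
            raise ValueError("-mno-avx10.x-256 and -mno-avx10.x-512 are not provided")
        # Disables everything introduced in AVX10.x and later.
        return {"disable_from_minor": minor}
    # Enables AVX10.1 through AVX10.x; the width defaults to 256 bit.
    return {
        "enable_up_to_minor": minor,
        "max_vector_bits": int(width) if width else 256,
    }


print(describe_avx10_option("-mavx10.2"))
print(describe_avx10_option("-mavx10.2-512"))
print(describe_avx10_option("-mno-avx10.2"))
```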
> 
> When there comes an option combination of various vector size indicated (e.g.
> -mavx10.x-512 -mavx10.y-256), we would like to emit a wa

[r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches

From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c 
scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

On Linux/x86_64,

3a13884b23ae32b43d56d68a9c6bd4ce53d60017 is the first bad commit
commit 3a13884b23ae32b43d56d68a9c6bd4ce53d60017
Author: Richard Biener 
Date:   Fri Aug 11 12:08:10 2023 +0200

Improve BB vectorization opt-info

caused

FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  
scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: 
basic block" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3148/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

[r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches

From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c 
scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

On Linux/x86_64,

46c8c225455273ce7f7da7cc5707aed54f23e78d is the first bad commit
commit 46c8c225455273ce7f7da7cc5707aed54f23e78d
Author: Richard Biener 
Date:   Wed Jul 26 15:23:45 2023 +0200

Improve sinking with unrelated defs

caused

FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2946/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

RE: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: ZiNgA BuRgA 
> Sent: Monday, August 21, 2023 5:27 PM
> To: Richard Biener ; Hongtao Liu
> 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> Another way (not saying this is better, just throwing out ideas) is to
> break AVX10.1 into all the AVX-512 subsets.
> So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.
> 
> * -mavx10.1-256  would effectively be an alias for all the 128+256-bit
> subsets, and set the __AVX10_1__ define
> * -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi
> -mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define
> (`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
> * -mno-avx512vbmi  would similarly be an alias for
> `-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`;
> with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if
> unusual (enable all AVX10.1 but disable all VBMI)
> * -mavx10.2-256  would act as a single feature, cementing in AVX10.2
> like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off

I am considering a proposal quite similar to this, in case we want to change the
design to make it more flexible.

But there are a few proposals on the table. The problem with this proposal
is that splitting out each AVX512 feature might be over-design, since in most
scenarios we just need to keep the vector width the same.

Thx,
Haochen

> 
> 
> On 21/08/2023 5:36 pm, Richard Biener wrote:
> > On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> >  wrote:
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
> >
> > Richard.
>
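
The feature macros mentioned in the quoted suggestion can be probed at compile time. The following is only a minimal sketch: __AVX512F__, __AVX512VL__ and __AVX512VBMI__ are existing GCC predefined macros, while __AVX10_1__ is the name used in the quoted message and is assumed here rather than confirmed by this thread. The function name report_isa_macros is hypothetical.

```c
#include <stdio.h>

/* Print which of the ISA feature macros discussed in the thread are
   defined for this compilation and return how many were found.
   __AVX10_1__ is taken from the quoted proposal (an assumption);
   the __AVX512*__ macros are existing GCC predefines.  */
int
report_isa_macros (void)
{
  int found = 0;

#ifdef __AVX512F__
  puts ("__AVX512F__ is defined");
  found++;
#endif
#ifdef __AVX512VL__
  puts ("__AVX512VL__ is defined");
  found++;
#endif
#ifdef __AVX512VBMI__
  puts ("__AVX512VBMI__ is defined");
  found++;
#endif
#ifdef __AVX10_1__
  puts ("__AVX10_1__ is defined");
  found++;
#endif

  return found;
}
```

Building the same file with different -m options and calling report_isa_macros () from a small driver shows which defines a given option combination implies.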

RE: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8 on Linux/x86_64

2023-07-07 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Hongtao Liu 
> Sent: Friday, July 7, 2023 3:55 PM
> To: Beulich, Jan 
> Cc: haochen.jiang ; Jiang, Haochen
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-
> assembler-times vpandn 8 on Linux/x86_64
> 
> On Fri, Jul 7, 2023 at 3:50 PM Hongtao Liu  wrote:
> >
> > On Fri, Jul 7, 2023 at 3:50 PM Jan Beulich  wrote:
> > >
> > > On 07.07.2023 09:46, Hongtao Liu wrote:
> > > > On Fri, Jul 7, 2023 at 3:18 PM Jan Beulich via Gcc-regression
> > > >  wrote:
> > > >>
> > > >> On 06.07.2023 13:57, haochen.jiang wrote:
> > > >>> On Linux/x86_64,
> > > >>>
> > > >>> e007369c8b67bcabd57c4fed8cff2a6db82e78e6 is the first bad commit
> > > >>> commit e007369c8b67bcabd57c4fed8cff2a6db82e78e6
> > > >>> Author: Jan Beulich 
> > > >>> Date:   Wed Jul 5 09:49:16 2023 +0200
> > > >>>
> > > >>> x86: yet more PR target/100711-like splitting
> > > >>>
> > > >>> caused
> > > >>>
> > > >>> FAIL: gcc.target/i386/pr100711-1.c scan-assembler-times pandn 2
> > > >>> FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8
> > > >>
> > > >> I expect the same applies here - -mno-avx512f (or -mno-avx512vl)
> > > >> might
> > > > For this one, we can just add -mno-avx512f to the testcase,it aims
> > > > to optimize pandn for avx2 target.
> > > >> address this failure. But whether that's really the way to go I'm
> > > >> not sure of. Plus of course such adjustments should have been
> > > >> done ahead of time, when it was decided that testing with certain
> > > >> -march= settings is a goal. My changes have merely uncovered the
> > > > >> prior omissions.
> > > > It's not a standard request, it's just our private tester which is
> > > > used to find gcc bugs and miss-optimizations.
> > > > It sometimes generates false positive reports (usually adding
> > > > -mno-avx512f to the testcase can fix that), hope that's not too
> > > > annoying.
> > >
> > > Wouldn't that then better be done once uniformly for all affected
> > > tests, rather than being discovered piecemeal?
> This also prevents us from finding potential problems.

Yes, -march=cascadelake actually enables AVX512F-related features. Sometimes it
exposes potential problems, and sometimes it produces false positives.

I will add a hint to the script email.

Thx,
Haochen

> > >
> > > Anyway, in this case: Since you said you'd take care of the other
> > > test, will/can you do so for the two ones here as well, or am I on the 
> > > hook?
> > I'll do that.
> > >
> > > Jan
> >
> >
> >
> > --
> > BR,
> > Hongtao
> 
> 
> 
> --
> BR,
> Hongtao

[r13-5971 Regression] FAIL: gcc.target/i386/pr108774.c (test for excess errors) on Linux/x86_64

2023-02-14 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

a33e3dcbd15e73603796e30b5eeec11a0c8bacec is the first bad commit
commit a33e3dcbd15e73603796e30b5eeec11a0c8bacec
Author: Vladimir N. Makarov 
Date:   Mon Feb 13 16:05:04 2023 -0500

RA: Clear reg equiv caller_save_p flag when clearing defined_p flag

caused

FAIL: gcc.target/i386/pr108774.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5971/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108774.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108774.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)

RE: [COMMITTED] i386: Use 2x-wider modes when emulating QImode vector instructions

2023-05-25 Thread Jiang, Haochen via Gcc-patches

> gcc/ChangeLog:
> 
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
> instructions when available.  Emulate truncation via
> ix86_expand_vec_perm_const_1 when native truncate insn
> is not available.
> (ix86_expand_vecop_qihi_partial) : Use pmovzx
> when available.  Trivially rename some variables.
> (ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.

Hi Uros,

I suppose you pushed the wrong patch to trunk.

On trunk, we see this:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;

-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;

   switch (qimode)

It should not be "if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2))".

The patch in this thread is correct, which is:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;
 
-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;
 
   switch (qimode)

Thx,
Haochen

> * config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
> calculation of V*QImode emulations to account for generation of
> 2x-wider mode instructions.
> (ix86_shift_rotate_cost): Update cost calculation of V*QImode
> emulations to account for generation of 2x-wider mode instructions.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Uros.
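
The ChangeLog above describes emulating QImode vector operations by expanding to 2x-wider elements and then truncating. As a rough illustration of that idea only (a scalar, per-lane model and not the code in i386-expand.cc), a byte multiply can be written as widen, multiply, truncate. The function name mul_u8_via_u16 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of emulating an 8-bit lane multiply with 16-bit lanes:
   widen each operand to 16 bits, multiply there, then truncate the
   product back to 8 bits.  This only illustrates the arithmetic; it is
   not the vector expansion GCC performs.  */
void
mul_u8_via_u16 (const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint16_t wide_a = a[i];   /* zero-extend to the 2x-wider type */
      uint16_t wide_b = b[i];
      /* The product of two 8-bit values is at most 255 * 255 = 65025,
         so it fits in 16 bits.  */
      uint16_t product = (uint16_t) (wide_a * wide_b);
      dst[i] = (uint8_t) (product & 0xff);   /* truncate to the low byte */
    }
}
```

The low byte of the wide product equals the wrapped 8-bit product, which is why the truncation step recovers the QImode result.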

RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-09-21 Thread Jiang, Haochen via Gcc-patches

Hi all,

I would like to backport this patch to the GCC 12 release branch. Machines whose
default GCC is 12.x are typically running newer kernels, so if the patch is not
backported, the amx tests will always fail there.

Ok for backport?

BRs,
Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, June 21, 2022 10:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Tue, Jun 21, 2022 at 9:41 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Tuesday, June 21, 2022 3:06 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > kernels
> > >
> > > On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen
> > > 
> > > wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Uros Bizjak 
> > > > > Sent: Monday, June 20, 2022 10:54 PM
> > > > > To: Jiang, Haochen 
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > 
> > > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > > > kernels
> > > > >
> > > > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > > > 
> > > > > wrote:
> > > > > >
> > > > > > From: "Jiang, Haochen" 
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We need syscall to enable AMX for kernels>=5.4. It is missing
> > > > > > in current amx tests, which will cause test fail.
> > > > >
> > > > > So this new code is only valid for linux & co?
> > > >
> > > > Thanks for reminding me for that, I only test on linux since the
> > > > header file is
> > > only in linux.
> > > >
> > > > Just updated a patch wrapping with a macro not to change the
> > > > behavior on
> > > windows.
> > >
> > > I think you want __linux__ there, not __unix__.
> >
> > Fixed with __linux__.
> 
> OK.
> 
> Thanks,
> Uros.
> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > Uros.
> > >
> > > >
> > > > Regtested on x86_64-pc-linux-gnu.
> > > >
> > > > Thx,
> > > > Haochen
> > > > >
> > > > > Uros.
> > > > >
> > > > > >
> > > > > > This patch aims to add them to fix this bug.
> > > > > >
> > > > > > BRs,
> > > > > > Haochen
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > > > New function to check if AMX is usable and enable AMX.
> > > > > > (main): Run test if AMX is usable.
> > > > > > ---
> > > > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > > > +++
> > > > > >  1 file changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > index 434b0e59703..92ed8669304 100644
> > > > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > @@ -4,11 +4,22 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > > +#include 
> > > > > >  #ifdef DEBUG
> > > > > >  #include 
> > > > > >  #endif
> > > > > >  #include "cpuid.h"
> > > > > >
> > > > > > +#define XFEATURE_XTILECFG  17
> > > > > > +#define XFEATURE_XTILEDATA 18
> > > > > > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > > > > > +#define XFEATURE_MASK_XTILEDATA(1 << XFEATURE_XTILEDATA)
> > > > > > +#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILECFG |
> > > > > XFEATURE_MASK_XTILEDATA)
> > > > > > +
> > > > > > +#define ARCH_GET_XCOMP_PERM0x1022
> > > > > > +#define ARCH_REQ_XCOMP_PERM0x1023
> > > > > > +
> > > > > >  /* TODO: The tmm emulation is temporary for current
> > > > > > AMX implementation with no tmm regclass, should
> > > > > > be changed in the future. */ @@ -44,6 +55,18 @@ typedef
> > > > > > struct __tile
> > > > > >  /* Stride (colum width in byte) used for tileload/store */
> > > > > > #define _STRIDE 64
> > > > > >
> > > > > > +/* We need syscall to use amx functions */ int
> > > > > > +request_perm_xtile_data() {
> > > > > > +  unsigned long bitmask;
> > > > > > +
> > > > > > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> > > > > XFEATURE_XTILEDATA) ||
> > > > > > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> > > > > > +return 0;
> > > > > > +
> > > > > > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > > > > > +
> > > > > >  /* Initialize tile config by setting all tmm size to 16x64 */
> > > > > > void init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7
> > > > > > @@ main () #ifdef AMX_BF16
> > > > > >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > > > > > +  && request_perm_xtile_data ()
> > > > > >)
> > > > > >  {
> > > > > >DO_TEST ();
> > > > > > --
> > > > > > 2.18.2
> > > > > >

RE: [PATCH] [i386]Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

2021-12-07 Thread Jiang, Haochen via Gcc-patches

Hi Uros,

I have fixed that in the attached patch. Is it OK for trunk?

Regtested on x86_64-pc-linux-gnu.

Thx,
Haochen

-Original Message-
From: Uros Bizjak  
Sent: Wednesday, December 8, 2021 12:14 AM
To: Jiang, Haochen 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] [i386]Add combine splitter to transform 
vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

On Tue, Dec 7, 2021 at 3:10 AM Haochen Jiang via Gcc-patches 
 wrote:
>
> This patch adds combine splitter to transform vpcmpeqd/vpxor/vblendvps to 
> vblendvps for ~op0.
>
> OK for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/100738
> * config/i386/sse.md 
> (*_blendv_not_ltint):
> Add new define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100738
> * g++.target/i386/pr100738-1.C: New test.

OK with a change below.

Thanks,
Uros.

>
> ---
>  gcc/config/i386/sse.md | 28 ++
>  gcc/testsuite/g++.target/i386/pr100738-1.C | 19 +++
>  2 files changed, 47 insertions(+)
>  create mode 100755 gcc/testsuite/g++.target/i386/pr100738-1.C
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
> 08bdcddc111..db3506c78d7 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -20659,6 +20659,34 @@
> (set_attr "btver2_decode" "vector,vector,vector")
> (set_attr "mode" "")])
>
> +;; PR target/100738: Transform vpcmpeqd + vpxor + vblendvps to 
> +vblendvps for inverted mask; (define_insn_and_split 
> "*_blendv_not_ltint"
> +  [(set (match_operand: 0 "register_operand")
> +   (unspec:
> + [(match_operand: 1 "register_operand")
> +  (match_operand: 2 "vector_operand")
> +  (subreg:
> +(lt:VI48_AVX
> +  (subreg:VI48_AVX
> +  (not:
> +(match_operand: 3 "register_operand")) 0)
> +  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> + UNSPEC_BLENDV))]
> +  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +   (unspec:
> +[(match_dup 2) (match_dup 1) (match_dup 3)] UNSPEC_BLENDV))] 
> +{
> +  operands[0] = gen_lowpart (mode, operands[0]);
> +  operands[1] = gen_lowpart (mode, operands[1]);
> +  operands[2] = gen_lowpart (mode, operands[2]);
> +  operands[3] = gen_lowpart (mode, operands[3]);
> +  if (MEM_P (operands[2]))
> +operands[2] = force_reg (mode, operands[2]);

You don't need to check for MEM_P, force_reg will do it for you.

> +})
> +
>  (define_insn "_dp"
>[(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> (unspec:VF_128_256
> diff --git a/gcc/testsuite/g++.target/i386/pr100738-1.C 
> b/gcc/testsuite/g++.target/i386/pr100738-1.C
> new file mode 100755
> index 000..5a04c5b031f
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr100738-1.C
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mavx2" } */
> +/* { dg-final {scan-assembler-times "vblendvps\[ \\t\]" 2 } } */
> +/* { dg-final {scan-assembler-not "vpcmpeqd\[ \\t\]" } } */
> +/* { dg-final {scan-assembler-not "vpxor\[ \\t\]" } } */
> +
> +typedef int v4si __attribute__((vector_size(16))); typedef char v16qi 
> +__attribute__((vector_size(16)));
> +v4si
> +foo_1 (v16qi a, v4si b, v4si c, v4si d) {
> +  return ((v4si)~a) < 0 ? c : d;
> +}
> +
> +v4si
> +foo_2 (v16qi a, v4si b, v4si c, v4si d) {
> +  return ((v4si)~a) >= 0 ? c : d;
> +}
> --
> 2.18.1
>


0001-i386-Add-combine-splitter-to-transform-vpcmpeqd-vpxo.patch
Description: 0001-i386-Add-combine-splitter-to-transform-vpcmpeqd-vpxo.patch
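
The splitter in this thread relies on the fact that a blend keyed on the sign bit of an inverted mask picks the same lanes as a blend on the original mask with the two data operands swapped. The following is a per-lane scalar sketch of that equivalence, assuming four 32-bit lanes and plain arrays instead of the v4si/v16qi types in the test. The function names are hypothetical.

```c
#include <stdint.h>

#define LANES 4

/* Form from the test case: invert the mask, then select c when the
   inverted lane is negative (sign bit set), otherwise d.  */
void
select_on_inverted_mask (const int32_t *mask, const int32_t *c,
                         const int32_t *d, int32_t *dst)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = (~mask[i] < 0) ? c[i] : d[i];
}

/* Equivalent form without the inversion: test the original mask and
   swap the two data operands.  */
void
select_with_swapped_operands (const int32_t *mask, const int32_t *c,
                              const int32_t *d, int32_t *dst)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = (mask[i] < 0) ? d[i] : c[i];
}
```

Inverting every bit flips the sign bit, so "~mask < 0" is true exactly when "mask < 0" is false, and both functions produce the same result for any input.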

[r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"

[r13-3212 Regression] FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR. on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

b88adba751da635c6f0c353c5bc51bbe2ecf4c89 is the first bad commit
commit b88adba751da635c6f0c353c5bc51bbe2ecf4c89
Author: Liwei Xu liwei...@intel.com
Date:   Fri Sep 23 13:46:02 2022 +0800

Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

caused

FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR.

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3212/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64\ -march=cascadelake}'"
