[PATCH] Avoid ICE in except.cc on targets that don't support exceptions.

2024-05-22 Thread Roger Sayle

A number of testcases currently fail on nvptx with the ICE:

during RTL pass: final
openmp-simd-2.c: In function 'foo':
openmp-simd-2.c:28:1: internal compiler error: in get_personality_function,
at expr.cc:14037
   28 | }
  | ^
0x98a38f get_personality_function(tree_node*)
/home/roger/GCC/nvptx-none/gcc/gcc/expr.cc:14037
0x969d3b output_function_exception_table(int)
/home/roger/GCC/nvptx-none/gcc/gcc/except.cc:3226
0x9b760d rest_of_handle_final
/home/roger/GCC/nvptx-none/gcc/gcc/final.cc:4252

The simple oversight in output_function_exception_table is that it calls
get_personality_function (immediately) before checking the target's
except_unwind_info hook (which on nvptx always returns UI_NONE).
The (perhaps obvious) fix is to move the assignments of fnname and
personality after the tests that check whether they are needed, and
before their first use.
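
For illustration, the same ordering bug in a self-contained sketch
(hypothetical names, not the real GCC functions):

  #include <stdio.h>

  static int supported = 0;

  /* Stand-in for get_personality_function: traps (like the ICE) when
     the target has no exception support.  */
  static int helper (void)
  {
    if (!supported)
      __builtin_trap ();
    return 42;
  }

  /* Before: helper () runs even though the guard would bail out.
     Calling bad () here traps.  */
  static int bad (void)
  {
    int v = helper ();
    if (!supported)
      return 0;
    return v;
  }

  /* After: test first, compute afterwards; good () safely returns 0.  */
  static int good (void)
  {
    if (!supported)
      return 0;
    return helper ();
  }

  int main (void) { printf ("%d\n", good ()); return 0; }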

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with no new failures in the testsuite, and ~220 fewer FAILs.
Ok for mainline?

2024-05-22  Roger Sayle  

gcc/ChangeLog
* except.cc (output_function_exception_table): Move call to
get_personality_function after targetm_common.except_unwind_info
check, to avoid ICE on targets that don't support exceptions.


Thanks in advance,
Roger
--

diff --git a/gcc/except.cc b/gcc/except.cc
index 2080fcc..b5886e9 100644
--- a/gcc/except.cc
+++ b/gcc/except.cc
@@ -3222,9 +3222,6 @@ output_one_function_exception_table (int section)
 void
 output_function_exception_table (int section)
 {
-  const char *fnname = get_fnname_from_decl (current_function_decl);
-  rtx personality = get_personality_function (current_function_decl);
-
   /* Not all functions need anything.  */
   if (!crtl->uses_eh_lsda
   || targetm_common.except_unwind_info (&global_options) == UI_NONE)
@@ -3234,6 +3231,9 @@ output_function_exception_table (int section)
   if (section == 1 && !crtl->eh.call_site_record_v[1])
 return;
 
+  const char *fnname = get_fnname_from_decl (current_function_decl);
+  rtx personality = get_personality_function (current_function_decl);
+
   if (personality)
 {
   assemble_external_libcall (personality);


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-22 Thread Richard Biener
On Tue, May 21, 2024 at 11:36 PM David Malcolm  wrote:
>
> On Tue, 2024-05-21 at 15:13 +, Qing Zhao wrote:
> > Thanks for the comments and suggestions.
> >
> > > On May 15, 2024, at 10:00, David Malcolm 
> > > wrote:
> > >
> > > On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> > > > On Mon, 13 May 2024, Qing Zhao wrote:
> > > >
> > > > > -Warray-bounds is an important option to enable the Linux kernel
> > > > > to keep the array out-of-bound errors out of the source tree.
> > > > >
> > > > > However, due to the false positive warnings reported in
> > > > > PR109071
> > > > > (-Warray-bounds false positive warnings due to code duplication
> > > > > from
> > > > > jump threading), -Warray-bounds=1 cannot be added on by
> > > > > default.
> > > > >
> > > > > Although it's impossible to eliminate all the false positive
> > > > > warnings from -Warray-bounds=1 (see PR104355, "Misleading
> > > > > -Warray-bounds documentation says 'always out of bounds'"), we
> > > > > should minimize the false positive warnings in -Warray-bounds=1.
> > > > >
> > > > > The root reason for the false positive warnings reported in
> > > > > PR109071 is:
> > > > >
> > > > > When the jump threading optimization tries to reduce the # of
> > > > > branches inside the routine, sometimes it needs to duplicate the
> > > > > code and split it into two conditional paths.  For example:
> > > > >
> > > > > The original code:
> > > > >
> > > > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > > > {
> > > > >   if (index >= 4)
> > > > > warn ();
> > > > >   *ptr = 0;
> > > > >   *val = sg->vals[index];
> > > > >   if (index >= 4)
> > > > > warn ();
> > > > >   *ptr = *val;
> > > > >
> > > > >   return;
> > > > > }
> > > > >
> > > > > With the thread jump, the above becomes:
> > > > >
> > > > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > > > {
> > > > >   if (index >= 4)
> > > > >     {
> > > > >       warn ();
> > > > >       *ptr = 0;               // Code duplication, since "warn" does return;
> > > > >       *val = sg->vals[index]; // same for this line.
> > > > >                               // In this path, since it's under the
> > > > >                               // condition "index >= 4", the compiler
> > > > >                               // knows the value of "index" is larger
> > > > >                               // than 4, therefore the out-of-bound
> > > > >                               // warning.
> > > > >       warn ();
> > > > >     }
> > > > >   else
> > > > >     {
> > > > >       *ptr = 0;
> > > > >       *val = sg->vals[index];
> > > > >     }
> > > > >   *ptr = *val;
> > > > >   return;
> > > > > }
> > > > >
> > > > > We can see that, after the jump threading optimization, the # of
> > > > > branches inside the routine "sparx5_set" is reduced from 2 to 1;
> > > > > however, due to the code duplication (which is needed for the
> > > > > correctness of the code), we get a false positive out-of-bound
> > > > > warning.
> > > > >
> > > > > In order to eliminate such false positive out-of-bound warnings:
> > > > >
> > > > > A. Add one more flag for GIMPLE: is_splitted.
> > > > > B. During the jump threading optimization, when the basic blocks
> > > > >    are duplicated, mark all the STMTs inside the original and
> > > > >    duplicated basic blocks as "is_splitted";
> > > > > C. Inside the array bound checker, add the following new
> > > > >    heuristic:
> > > > >
> > > > >    If
> > > > >      1. the stmt is duplicated and split into two conditional paths;
> > > > >      2. the warning level < 2;
> > > > >      3. the current block is not dominating the exit block
> > > > >    then do not report the warning.
> > > > >
> > > > > The false positive warnings are moved from -Warray-bounds=1 to
> > > > >  -Warray-bounds=2 now.
> > > > >
> > > > > Bootstrapped and regression tested on both x86 and aarch64.
> > > > > Adjusted -Warray-bounds-61.c due to the false positive warnings.
> > > > >
> > > > > Let me know if you have any comments and suggestions.
> > > >
> > > > At the last Cauldron I talked with David Malcolm about these kinds
> > > > of issues and thought of, instead of suppressing diagnostics,
> > > > recording how a block was duplicated.  For jump threading my idea
> > > > was to record the condition that was proved true when entering the
> > > > path, and to do this by recording the corresponding locations
> >
> > Is only recording the location for the TRUE path enough?
> > We might need to record the corresponding locations for both TRUE and
> > FALSE paths, since the VRP might be more accurate on both paths.
> > Is recording only the location enough?
> > Do we need to record the pointer to the original condition stmt?
>
> Just to be clear: I don't plan to work on this myself (I have far too
> much already to wor
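
For concreteness, the heuristic Qing proposes above might look roughly
like this inside the array-bounds checker (a sketch only;
gimple_is_splitted_p is a hypothetical accessor for the proposed flag,
while dominated_by_p and warn_array_bounds are existing GCC interfaces):

  /* Suppress the diagnostic at -Warray-bounds=1 when the statement was
     duplicated by jump threading and its block does not dominate the
     exit block, i.e. the access is only reached on one duplicated path.  */
  if (gimple_is_splitted_p (stmt)
      && warn_array_bounds < 2
      && !dominated_by_p (CDI_DOMINATORS,
                          EXIT_BLOCK_PTR_FOR_FN (cfun),
                          gimple_bb (stmt)))
    return;  /* Do not report the warning.  */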

Re: [PATCH] Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

2024-05-22 Thread Richard Biener
On Wed, May 22, 2024 at 3:58 AM liuhongt  wrote:
>
> According to the IEEE standard, for conversions from floating point to
> integer: when a NaN or infinite operand cannot be represented in the
> destination format and this cannot otherwise be indicated, the invalid
> operation exception shall be signaled.  When a numeric operand would
> convert to an integer outside the range of the destination format, the
> invalid operation exception shall be signaled if this situation cannot
> otherwise be indicated.
>
> The patch prevents simplification of the conversion from floating point
> to integer for NAN/INF/out-of-range constants when flag_trapping_math.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?

OK if there are no further comments today.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR rtl-optimization/100927
> PR rtl-optimization/115161
> PR rtl-optimization/115115
> * simplify-rtx.cc (simplify_const_unary_operation): Prevent
> simplification of FIX/UNSIGNED_FIX for NAN/INF/out-of-range
> constant when flag_trapping_math.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr100927.c: New test.
> ---
>  gcc/simplify-rtx.cc  | 23 
>  gcc/testsuite/gcc.target/i386/pr100927.c | 27 
>  2 files changed, 46 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100927.c
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 53f54d1d392..b7a770dad60 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -2256,14 +2256,25 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
>switch (code)
> {
> case FIX:
> +	  /* According to IEEE standard, for conversions from floating point to
> +	     integer. When a NaN or infinite operand cannot be represented in
> +	     the destination format and this cannot otherwise be indicated, the
> +	     invalid operation exception shall be signaled. When a numeric
> +	     operand would convert to an integer outside the range of the
> +	     destination format, the invalid operation exception shall be
> +	     signaled if this situation cannot otherwise be indicated.  */
>   if (REAL_VALUE_ISNAN (*x))
> -   return const0_rtx;
> +   return flag_trapping_math ? NULL_RTX : const0_rtx;
> +
> + if (REAL_VALUE_ISINF (*x) && flag_trapping_math)
> +   return NULL_RTX;
>
>   /* Test against the signed upper bound.  */
>   wmax = wi::max_value (width, SIGNED);
>   real_from_integer (&t, VOIDmode, wmax, SIGNED);
>   if (real_less (&t, x))
> -   return immed_wide_int_const (wmax, mode);
> +   return (flag_trapping_math
> +   ? NULL_RTX : immed_wide_int_const (wmax, mode));
>
>   /* Test against the signed lower bound.  */
>   wmin = wi::min_value (width, SIGNED);
> @@ -2276,13 +2287,17 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
>
> case UNSIGNED_FIX:
>   if (REAL_VALUE_ISNAN (*x) || REAL_VALUE_NEGATIVE (*x))
> -   return const0_rtx;
> +   return flag_trapping_math ? NULL_RTX : const0_rtx;
> +
> + if (REAL_VALUE_ISINF (*x) && flag_trapping_math)
> +   return NULL_RTX;
>
>   /* Test against the unsigned upper bound.  */
>   wmax = wi::max_value (width, UNSIGNED);
>   real_from_integer (&t, VOIDmode, wmax, UNSIGNED);
>   if (real_less (&t, x))
> -   return immed_wide_int_const (wmax, mode);
> +   return (flag_trapping_math
> +   ? NULL_RTX : immed_wide_int_const (wmax, mode));
>
>   return immed_wide_int_const (real_to_integer (x, &fail, width),
>mode);
> diff --git a/gcc/testsuite/gcc.target/i386/pr100927.c b/gcc/testsuite/gcc.target/i386/pr100927.c
> new file mode 100644
> index 000..b137396c30f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100927.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2 -ftrapping-math" } */
> +/* { dg-final { scan-assembler-times "cvttps2dq" 3 } }  */
> +
> +#include <immintrin.h>
> +
> +__m128i foo_ofr() {
> +  const __m128i iv = _mm_set_epi32(0x4f000000, 0x4f000000,
> +                                   0x4f000000, 0x4f000000);
> +  const __m128  fv = _mm_castsi128_ps(iv);
> +  const __m128i riv = _mm_cvttps_epi32(fv);
> +  return riv;
> +}
> +
> +__m128i foo_nan() {
> +  const __m128i iv = _mm_set_epi32(0xff810000, 0xff810000,
> +                                   0xff810000, 0xff810000);
> +  const __m128  fv = _mm_castsi128_ps(iv);
> +  const __m128i riv = _mm_cvttps_epi32(fv);
> +  return riv;
> +}
> +
> +__m128i foo_inf() {
> +  const __m128i iv = _mm_set_epi32(0xff800000, 0xff800000,
> +                                   0xff800000, 0xff800000);
> +  const __m128  fv = _mm_castsi128_ps(iv);
> +  const __m128i riv = _mm_cvttps_epi32(fv);
>

Re: [PATCH] Avoid ICE in except.cc on targets that don't support exceptions.

2024-05-22 Thread Richard Biener
On Wed, May 22, 2024 at 9:21 AM Roger Sayle  wrote:
>
>
> A number of testcases currently fail on nvptx with the ICE:
>
> during RTL pass: final
> openmp-simd-2.c: In function 'foo':
> openmp-simd-2.c:28:1: internal compiler error: in get_personality_function,
> at expr.cc:14037
>28 | }
>   | ^
> 0x98a38f get_personality_function(tree_node*)
> /home/roger/GCC/nvptx-none/gcc/gcc/expr.cc:14037
> 0x969d3b output_function_exception_table(int)
> /home/roger/GCC/nvptx-none/gcc/gcc/except.cc:3226
> 0x9b760d rest_of_handle_final
> /home/roger/GCC/nvptx-none/gcc/gcc/final.cc:4252
>
> The simple oversight in output_function_exception_table is that it calls
> get_personality_function (immediately) before checking the target's
> except_unwind_info hook (which on nvptx always returns UI_NONE).
> The (perhaps obvious) fix is to move the assignments of fnname and
> personality after the tests that check whether they are needed, and
> before their first use.
>
> This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
> with no new failures in the testsuite, and ~220 fewer FAILs.
> Ok for mainline?

OK.

Richard.

> 2024-05-22  Roger Sayle  
>
> gcc/ChangeLog
> * except.cc (output_function_exception_table): Move call to
> get_personality_function after targetm_common.except_unwind_info
> check, to avoid ICE on targets that don't support exceptions.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

2024-05-22 Thread Jakub Jelinek
On Wed, May 22, 2024 at 09:46:41AM +0200, Richard Biener wrote:
> On Wed, May 22, 2024 at 3:58 AM liuhongt  wrote:
> >
> > According to the IEEE standard, for conversions from floating point to
> > integer: when a NaN or infinite operand cannot be represented in the
> > destination format and this cannot otherwise be indicated, the invalid
> > operation exception shall be signaled.  When a numeric operand would
> > convert to an integer outside the range of the destination format, the
> > invalid operation exception shall be signaled if this situation cannot
> > otherwise be indicated.
> >
> > The patch prevents simplification of the conversion from floating point
> > to integer for NAN/INF/out-of-range constants when flag_trapping_math.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > Ok for trunk?
> 
> OK if there are no further comments today.

As I wrote in the PR, I don't think this is the right fix for the PR;
the simplify-rtx.cc change is the right thing to do, as the C standard
in F.4 says that the out-of-range conversions to integers should raise
exceptions, but still says that the resulting value in those cases is
unspecified.
So, for the C part we should verify that with -ftrapping-math we don't
constant fold it, and cover it both by pure C and perhaps backend-specific
testcases which just search the asm for the conversion instructions,
or even a runtime test which checks that the exceptions are triggered;
and verify that we don't fold it either during GIMPLE opts or RTL opts
(dunno whether they can be folded in e.g. C constant initializers or not).

And then on the backend side, it should stop using FIX/UNSIGNED_FIX RTLs
in patterns which are used for the intrinsics (and keep using them in
patterns used for C scalar/vector conversions), because even with
-fno-trapping-math the intrinsics should have the documented behavior,
i.e. a particular result value, while in C the results are clearly
unspecified, and FIX/UNSIGNED_FIX folding even with the patch chooses
some particular values which don't match those (sure, they are like that
because of Java, but I am not sure it is the right time to change what
we do in those cases, say by providing a target hook to pick a different
value).

The provided testcase tests the values though, so I think it is
inappropriate for this patch.

Jakub
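
A minimal sketch of the kind of pure C, asm-scanning test suggested
here; the dg directives and the x86-specific scan are assumptions, not
part of the submitted patch:

  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
  /* { dg-options "-O2 -ftrapping-math" } */

  int
  conv_inf (void)
  {
    /* Out of range: with -ftrapping-math this must not be constant
       folded, so the truncating conversion stays in the asm.  */
    return (int) __builtin_inff ();
  }

  /* { dg-final { scan-assembler "cvttss2si" } } */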



[PATCH] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-22 Thread Manolis Tsamis
The match.pd patterns that merge two vector permutes into one fail when a
potentially no-op view-convert expression sits between the two permutes.
This change lifts that restriction.
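
As a quick illustration of the merge, the two shuffle masks in the new
test below compose as sel2[i] = sel0[sel1[i]]; a self-contained check of
that arithmetic:

  #include <stdio.h>

  int
  main (void)
  {
    /* The masks from gcc.dg/fold-perm-2.c below.  */
    int sel0[4] = { 0, 5, 2, 7 };   /* first shuffle */
    int sel1[4] = { 2, 3, 1, 0 };   /* second shuffle */
    int sel2[4];

    for (int i = 0; i < 4; i++)
      sel2[i] = sel0[sel1[i]];      /* composed permutation */

    for (int i = 0; i < 4; i++)
      printf ("%d ", sel2[i]);      /* prints "2 7 5 0", the mask the
                                       scan-tree-dump in the test expects */
    return 0;
  }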

gcc/ChangeLog:

* match.pd: Allow no-op view_convert between permutes.

gcc/testsuite/ChangeLog:

* gcc.dg/fold-perm-2.c: New test.

Signed-off-by: Manolis Tsamis 
---

 gcc/match.pd   | 14 --
 gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
 2 files changed, 24 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..cbb3c5d86e0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -10039,19 +10039,21 @@ and,
  d = VEC_PERM_EXPR ;  */
 
 (simplify
- (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 VECTOR_CST@5)
  (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
   (with
{
  machine_mode result_mode = TYPE_MODE (type);
- machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
  int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
  vec_perm_builder builder0;
  vec_perm_builder builder1;
  vec_perm_builder builder2 (nelts, nelts, 1);
}
-   (if (tree_to_vec_perm_builder (&builder0, @3)
-   && tree_to_vec_perm_builder (&builder1, @4))
+   (if (tree_to_vec_perm_builder (&builder0, @4)
+   && tree_to_vec_perm_builder (&builder1, @5)
+   && element_precision (TREE_TYPE (@0))
+  == element_precision (TREE_TYPE (@1)))
 (with
  {
vec_perm_indices sel0 (builder0, 2, nelts);
@@ -10073,10 +10075,10 @@ and,
   ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
  || !can_vec_perm_const_p (result_mode, op_mode, sel1, false))
   : !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
-op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+op0 = vec_perm_indices_to_tree (TREE_TYPE (@5), sel2);
  }
  (if (op0)
-  (vec_perm @1 @2 { op0; })))
+  (view_convert (vec_perm @2 @3 { op0; }
 
 /* Merge
  c = VEC_PERM_EXPR ;
diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c 
b/gcc/testsuite/gcc.dg/fold-perm-2.c
new file mode 100644
index 000..1a4ab4065de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-perm-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+typedef unsigned int vecu __attribute__ ((vector_size (4 * sizeof (unsigned int))));
+
+void fun (veci *a, veci *b, veci *c)
+{
+  veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
+  vecu r2 = __builtin_convertvector (r1, vecu);
+  vecu r3 = __builtin_shufflevector (r2, r2, 2, 3, 1, 0);
+  *c = __builtin_convertvector (r3, veci);
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 7, 5, 0 }" "fre1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "fre1" } } */
-- 
2.44.0



[PATCH] web/115183 - fix typo in C++ docs

2024-05-22 Thread Richard Biener
The following fixes a reported typo.

Pushed.

* doc/invoke.texi (C++ Modules): Fix typo.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 218901c0b20..0625a5ede6f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -37646,7 +37646,7 @@ not get debugging information for routines in the precompiled header.
 @cindex speed of compilation
 
 Modules are a C++20 language feature.  As the name suggests, they
-provides a modular compilation system, intending to provide both
+provide a modular compilation system, intending to provide both
 faster builds and better library isolation.  The ``Merging Modules''
 paper @uref{https://wg21.link/p1103}, provides the easiest to read set
 of changes to the standard, although it does not capture later
-- 
2.35.3


[PATCH v2 1/8] [APX NF]: Support APX NF add

2024-05-22 Thread Kong, Lingling
> I wonder if we can use "define_subst" to conditionally add flags clobber
> for !TARGET_APX_NF targets. Even the example for "Define Subst" uses the insn
> w/ and w/o the clobber, so I think it is worth considering this approach.
> 
> Uros.

Good suggestion. I defined a new subst for no flags, and bootstrapped and
regtested on x86_64-linux-gnu. SPEC 2017 also runs normally on the Intel
software development emulator.
Ok for trunk?

Thanks,
Lingling

Subject: [PATCH v2 1/8] [APX NF]: Support APX NF add
The APX NF (no flags) feature suppresses the update of status flags
for arithmetic operations.

For NF add, it is not clear whether an NF add can be faster than lea.
If so, the pattern needs to be adjusted to prefer lea generation.
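
To illustrate what NF buys the compiler (a schematic, not output from
this patch): an {nf}-prefixed instruction leaves EFLAGS untouched, so
its RTL pattern needs no (clobber (reg:CC FLAGS_REG)) and a compare
result may stay live across it.  Hypothetical codegen, assuming APX NF
is enabled:

  /* Possible asm for the function below:
         cmpl      %esi, %edi
         {nf} addl %ecx, %edx     # does not clobber EFLAGS
         ...                      # a later branch/cmov can still use
                                  # the flags from the cmpl above
  */
  int
  pick (int a, int b, int x, int y)
  {
    int s = x + y;
    return (a < b) ? s : a;
  }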

gcc/ChangeLog:

* config/i386/i386-opts.h (enum apx_features): Add nf
enumeration.
* config/i386/i386.h (TARGET_APX_NF): New.
* config/i386/i386.md (nf_subst): New define_subst.
(nf_name): New subst_attr.
(nf_prefix): Ditto.
(nf_condition): Ditto.
(nf_mem_constraint): Ditto.
(nf_applied): Ditto.
(*add_1_nf): New define_insn.
(addhi_1_nf): Ditto.
(addqi_1_nf): Ditto.
* config/i386/i386.opt: Add apx_nf enumeration.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Fixed test.
* gcc.target/i386/apx-nf.c: New test.

Co-authored-by: Lingling Kong 
---
 gcc/config/i386/i386-opts.h |   3 +-
 gcc/config/i386/i386.h  |   1 +
 gcc/config/i386/i386.md | 179 +++-
 gcc/config/i386/i386.opt|   3 +
 gcc/testsuite/gcc.target/i386/apx-ndd.c |   2 +-
 gcc/testsuite/gcc.target/i386/apx-nf.c  |   6 +
 6 files changed, 126 insertions(+), 68 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf.c

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index ef2825803b3..60176ce609f 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -140,7 +140,8 @@ enum apx_features {
   apx_push2pop2 = 1 << 1,
   apx_ndd = 1 << 2,
   apx_ppx = 1 << 3,
-  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx,
+  apx_nf = 1 << 4,
+  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx | apx_nf,
 };
 
 #endif
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 529edff93a4..f20ae4726da 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -55,6 +55,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
 #define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
 #define TARGET_APX_PPX (ix86_apx_features & apx_ppx)
+#define TARGET_APX_NF (ix86_apx_features & apx_nf)
 
 #include "config/vxworks-dummy.h"
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 764bfe20ff2..bae344518bd 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6233,28 +6233,6 @@
 }
 })
 

-;; Load effective address instructions
-
-(define_insn "*lea"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
-   (match_operand:SWI48 1 "address_no_seg_operand" "Ts"))]
-  "ix86_hardreg_mov_ok (operands[0], operands[1])"
-{
-  if (SImode_address_operand (operands[1], VOIDmode))
-{
-  gcc_assert (TARGET_64BIT);
-  return "lea{l}\t{%E1, %k0|%k0, %E1}";
-}
-  else
-return "lea{}\t{%E1, %0|%0, %E1}";
-}
-  [(set_attr "type" "lea")
-   (set (attr "mode")
- (if_then_else
-   (match_operand 1 "SImode_address_operand")
-   (const_string "SI")
-   (const_string "")))])
-
 (define_peephole2
   [(set (match_operand:SWI48 0 "register_operand")
(match_operand:SWI48 1 "address_no_seg_operand"))]
@@ -6290,6 +6268,13 @@
   [(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))
   (clobber (reg:CC FLAGS_REG))])]
   "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
+
+(define_split
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (mult:SWI48 (match_dup 0) (match_operand:SWI48 1 "const1248_operand")))]
+  "TARGET_APX_NF && reload_completed"
+  [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))]
+  "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
 

 ;; Add instructions
 
@@ -6437,48 +6422,65 @@
  (clobber (reg:CC FLAGS_REG))])]
  "split_double_mode (mode, &operands[0], 1, &operands[0], &operands[5]);")
 
-(define_insn "*add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r")
+(define_subst_attr "nf_name" "nf_subst" "_nf" "")
+(define_subst_attr "nf_prefix" "nf_subst" "%{nf%} " "")
+(define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
+(define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
+(define_subst_attr "nf_applied" "nf_subst" "true" "false")
+
+(define_subst "nf_subst"
+  [(set (match_operand:SWI 0)
+(match_operand:SWI 1))]
+  ""
+  [(set (match_dup 0)
+ 

[PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog:

   * config/i386/i386.md (nf_and_applied): New subst_attr.
   (nf_x64_and_applied): Ditto.
   (*sub_1_nf): New define_insn.
   (*anddi_1_nf): Ditto.
   (*and_1_nf): Ditto.
   (*qi_1_nf): Ditto.
   (*_1_nf): Ditto.
   (*neg_1_nf): Ditto.
   * config/i386/sse.md : New define_split.

gcc/testsuite/ChangeLog:

   * gcc.target/i386/apx-nf.c: Add test.
---
gcc/config/i386/i386.md| 174 +
gcc/config/i386/sse.md |  11 ++
gcc/testsuite/gcc.target/i386/apx-nf.c |   9 ++
3 files changed, 112 insertions(+), 82 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bae344518bd..099d7f35c8f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -575,7 +575,7 @@
 
noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
 
avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
 
avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
- vaes_avx512vl"
+vaes_avx512vl,noapx_nf"
   (const_string "base"))
 ;; The (bounding maximum) length of an instruction immediate.
@@ -981,6 +981,7 @@
 (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
   (eq_attr "mmx_isa" "avx")
 (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
+ (eq_attr "isa" "noapx_nf") (symbol_ref "!TARGET_APX_NF")
  ]
  (const_int 1)))
@@ -7893,20 +7894,21 @@
   "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);"
[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
-(define_insn "*sub_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,,r,r,r")
+(define_insn "*sub_1"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" 
"=m,r,,r,r,r")
  (minus:SWI
-(match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,rjM,r")
-(match_operand:SWI 2 "" 
",,r,,")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands, TARGET_APX_NDD)"
+   (match_operand:SWI 1 "nonimmediate_operand" "0,0,0,rm,rjM,r")
+   (match_operand:SWI 2 "" 
",,,r,,")))]
+  "ix86_binary_operator_ok (MINUS, mode, operands, TARGET_APX_NDD)
+  && "
   "@
-  sub{}\t{%2, %0|%0, %2}
-  sub{}\t{%2, %0|%0, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd")
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd,apx_ndd")
(set_attr "type" "alu")
(set_attr "mode" "")])
@@ -11795,27 +11797,31 @@
}
[(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd_64,apx_ndd")])
-(define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm,r,r,r,r,r,?k")
+(define_subst_attr "nf_and_applied" "nf_subst"  "noapx_nf" "*")
+(define_subst_attr "nf_x64_and_applied" "nf_subst" "noapx_nf" "x64")
+
+(define_insn "*anddi_1"
+  [(set (match_operand:DI 0 "nonimmediate_operand" 
"=r,r,rm,r,r,r,r,r,r,?k")
  (and:DI
-  (match_operand:DI 1 "nonimmediate_operand" 
"%0,r,0,0,rm,rjM,r,qm,k")
-  (match_operand:DI 2 "x86_64_szext_general_operand" 
"Z,Z,re,m,r,e,m,L,k")))
-   (clobber (reg:CC FLAGS_REG))]
+ (match_operand:DI 1 "nonimmediate_operand" 
"%0,r,0,0,0,rm,rjM,r,qm,k")
+ (match_operand:DI 2 "x86_64_szext_general_operand" 
"Z,Z,r,e,m,r,e,m,L,k")))]
   "TARGET_64BIT
-   && ix86_binary_operator_ok (AND, DImode, operands, TARGET_APX_NDD)"
+   && ix86_binary_operator_ok (AND, DImode, operands, TARGET_APX_NDD)
+   && "
   "@
-   and{l}\t{%k2, %k0|%k0, %k2}
-   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
-   and{q}\t{%2, %0|%0, %2}
-   and{q}\t{%2, %0|%0, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{l}\t{%k2, %k0|%k0, %k2}
+   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
#
#"
-  [(set_attr "isa" "x64,apx_ndd,x64,x64,apx_ndd,apx_ndd,apx_ndd,x64,avx512bw")
-   (set_attr "type" "alu,alu,alu,alu,alu,alu,alu,imovx,msklog")
-   (set_attr "length_immediate" "*,*,*,*,*,*,*,0,*")
+  [(set_attr "isa" 
"x64,apx_ndd,x64,x64,x64,apx_ndd,apx_ndd,apx_ndd,,avx512bw")
+   (set_attr "type" "alu,alu,alu,alu,alu,alu,alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,*,*,*,*,*,*,0,*")
(set (attr "prefix_rex")
  (if_then_else
(and (eq_attr "type" "imovx")
@@ -11

Re: [PATCH v2 1/8] [APX NF]: Support APX NF add

2024-05-22 Thread Uros Bizjak
On Wed, May 22, 2024 at 10:29 AM Kong, Lingling  wrote:
>
> > I wonder if we can use "define_subst" to conditionally add flags clobber
> > for !TARGET_APX_NF targets. Even the example for "Define Subst" uses the 
> > insn
> > w/ and w/o the clobber, so I think it is worth considering this approach.
> >
> > Uros.
>
> Good suggestion. I defined a new subst for no flags, and bootstrapped and
> regtested on x86_64-linux-gnu. SPEC 2017 also runs normally on the Intel
> software development emulator.
> Ok for trunk?
>
> Thanks,
> Lingling
>
> Subject: [PATCH v2 1/8] [APX NF]: Support APX NF add
> The APX NF (no flags) feature suppresses the update of status flags
> for arithmetic operations.
>
> For NF add, it is not clear whether an NF add can be faster than lea.
> If so, the pattern needs to be adjusted to prefer lea generation.
>
> gcc/ChangeLog:
>
> * config/i386/i386-opts.h (enum apx_features): Add nf
> enumeration.
> * config/i386/i386.h (TARGET_APX_NF): New.
> * config/i386/i386.md (nf_subst): New define_subst.
> (nf_name): New subst_attr.
> (nf_prefix): Ditto.
> (nf_condition): Ditto.
> (nf_mem_constraint): Ditto.
> (nf_applied): Ditto.
> (*add_1_nf): New define_insn.
> (addhi_1_nf): Ditto.
> (addqi_1_nf): Ditto.
> * config/i386/i386.opt: Add apx_nf enumeration.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ndd.c: Fixed test.
> * gcc.target/i386/apx-nf.c: New test.

LGTM, but I'll leave the final approval to Hongtao.

Thanks,
Uros.

>
> Co-authored-by: Lingling Kong 
> ---
>  gcc/config/i386/i386-opts.h |   3 +-
>  gcc/config/i386/i386.h  |   1 +
>  gcc/config/i386/i386.md | 179 +++-
>  gcc/config/i386/i386.opt|   3 +
>  gcc/testsuite/gcc.target/i386/apx-ndd.c |   2 +-
>  gcc/testsuite/gcc.target/i386/apx-nf.c  |   6 +
>  6 files changed, 126 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf.c
>
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index ef2825803b3..60176ce609f 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -140,7 +140,8 @@ enum apx_features {
>apx_push2pop2 = 1 << 1,
>apx_ndd = 1 << 2,
>apx_ppx = 1 << 3,
> -  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx,
> +  apx_nf = 1 << 4,
> +  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx | apx_nf,
>  };
>
>  #endif
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 529edff93a4..f20ae4726da 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -55,6 +55,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
>  #define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
>  #define TARGET_APX_PPX (ix86_apx_features & apx_ppx)
> +#define TARGET_APX_NF (ix86_apx_features & apx_nf)
>
>  #include "config/vxworks-dummy.h"
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 764bfe20ff2..bae344518bd 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6233,28 +6233,6 @@
>  }
>  })
>
>
> -;; Load effective address instructions
> -
> -(define_insn "*lea"
> -  [(set (match_operand:SWI48 0 "register_operand" "=r")
> -   (match_operand:SWI48 1 "address_no_seg_operand" "Ts"))]
> -  "ix86_hardreg_mov_ok (operands[0], operands[1])"
> -{
> -  if (SImode_address_operand (operands[1], VOIDmode))
> -{
> -  gcc_assert (TARGET_64BIT);
> -  return "lea{l}\t{%E1, %k0|%k0, %E1}";
> -}
> -  else
> -return "lea{}\t{%E1, %0|%0, %E1}";
> -}
> -  [(set_attr "type" "lea")
> -   (set (attr "mode")
> - (if_then_else
> -   (match_operand 1 "SImode_address_operand")
> -   (const_string "SI")
> -   (const_string "")))])
> -
>  (define_peephole2
>[(set (match_operand:SWI48 0 "register_operand")
> (match_operand:SWI48 1 "address_no_seg_operand"))]
> @@ -6290,6 +6268,13 @@
>[(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))
>(clobber (reg:CC FLAGS_REG))])]
>"operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
> +
> +(define_split
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +   (mult:SWI48 (match_dup 0) (match_operand:SWI48 1 
> "const1248_operand")))]
> +  "TARGET_APX_NF && reload_completed"
> +  [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))]
> +  "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
>
>
>  ;; Add instructions
>
> @@ -6437,48 +6422,65 @@
>   (clobber (reg:CC FLAGS_REG))])]
>   "split_double_mode (mode, &operands[0], 1, &operands[0], 
> &operands[5]);")
>
> -(define_insn "*add_1"
> -  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r")
> +(define_subst_attr "nf_name" "nf_subst" "_

[PATCH v2 3/8] [APX NF] Support APX NF for left shift insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (*ashl3_1_nf): New.
(*ashlhi3_1_nf): Ditto.
(*ashlqi3_1_nf): Ditto.
* config/i386/sse.md: New define_split.
---
 gcc/config/i386/i386.md | 80 +++--
 gcc/config/i386/sse.md  | 13 +++
 2 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 099d7f35c8f..271d449d7c4 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15012,12 +15012,12 @@
   [(set_attr "type" "ishiftx")
(set_attr "mode" "")])
 
-(define_insn "*ashl3_1"
+(define_insn "*ashl3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k,r")
(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" 
"0,l,rm,k,rm")
- (match_operand:QI 2 "nonmemory_operand" 
"c,M,r,,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" 
"c,M,r,,c")))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -15030,7 +15030,7 @@
 case TYPE_ALU:
   gcc_assert (operands[2] == const1_rtx);
   gcc_assert (rtx_equal_p (operands[0], operands[1]));
-  return "add{}\t%0, %0";
+  return "add{}\t%0, %0";
 
 default:
   if (operands[2] == const1_rtx
@@ -15038,11 +15038,11 @@
  /* For NDD form instructions related to TARGET_SHIFT1, the $1
 immediate do not need to be omitted as assembler will map it
 to use shorter encoding. */
- && !use_ndd)
+ && !use_ndd && !)
return "sal{}\t%0";
   else
-   return use_ndd ? "sal{}\t{%2, %1, %0|%0, %1, %2}"
-  : "sal{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sal{}\t{%2, %1, %0|%0, %1, 
%2}"
+  : "sal{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,*,bmi2,avx512bw,apx_ndd") @@ -15073,6 +15073,17 @@
(set_attr "mode" "")])
 
 ;; Convert shift to the shiftx pattern to avoid flags dependency.
+;; The NF form does not support the shift count in a register (r), only
+;; in %cl (c); the BMI2 shiftx form takes a register count and sets no
+;; flags.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
+ (match_operand:QI 2 "register_operand")))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+   (ashift:SWI48 (match_dup 1) (match_dup 2)))]
+  "operands[2] = gen_lowpart (mode, operands[2]);")
+
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
 	(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
@@ -15159,12 +15170,12 @@
(zero_extend:DI (ashift:SI (match_dup 1) (match_dup 2]
   "operands[2] = gen_lowpart (SImode, operands[2]);")
 
-(define_insn "*ashlhi3_1"
+(define_insn "*ashlhi3_1"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k,r")
(ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k,rm")
-  (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww,cI")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, HImode, operands, TARGET_APX_NDD)"
+  (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww,cI")))]
+  "ix86_binary_operator_ok (ASHIFT, HImode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -15175,16 +15186,16 @@
 
 case TYPE_ALU:
   gcc_assert (operands[2] == const1_rtx);
-  return "add{w}\t%0, %0";
+  return "add{w}\t%0, %0";
 
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "sal{w}\t%0";
   else
-   return use_ndd ? "sal{w}\t{%2, %1, %0|%0, %1, %2}"
-  : "sal{w}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sal{w}\t{%2, %1, %0|%0, %1, %2}"
+  : "sal{w}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,*,avx512f,apx_ndd") @@ -15212,12 +15223,12 @@
(const_string "*")))
(set_attr "mode" "HI,SI,HI,HI")])
 
-(define_insn "*ashlqi3_1"
+(define_insn "*ashlqi3_1"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k,r")
(ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k,rm")
-  (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb,cI")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, QImode, operands, TARGET_APX_NDD)"
+  (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb,cI")))]
+  "ix86_binary_operator_ok (ASHIFT, QImode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -15229,14 +15240,14 @@
 case TYPE_ALU:
   gcc_as

[PATCH v2 4/8] [APX NF] Support APX NF for right shift insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (*ashr3_1_nf): New.
(*lshr3_1_nf): Ditto.
(*lshrqi3_1_nf): Ditto.
(*lshrhi3_1_nf): Ditto.
---
 gcc/config/i386/i386.md | 82 +++--
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 271d449d7c4..7f191749342 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16308,13 +16308,13 @@
   [(set_attr "type" "ishiftx")
(set_attr "mode" "")])
 
-(define_insn "*ashr3_1"
+(define_insn "*ashr3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(ashiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,r,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,r,c")))]
+  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -16325,11 +16325,11 @@
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "sar{}\t%0";
   else
-   return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
-  : "sar{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, 
%2}"
+  : "sar{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,bmi2,apx_ndd")
@@ -16369,14 +16369,13 @@
 }
 [(set_attr "isa" "*,*,*,apx_ndd")])
 
-
-(define_insn "*lshr3_1"
+(define_insn "*lshr3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k,r")
(lshiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,r,,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (LSHIFTRT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,r,,c")))]
+  "ix86_binary_operator_ok (LSHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -16388,11 +16387,11 @@
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "shr{}\t%0";
   else
-   return use_ndd ? "shr{}\t{%2, %1, %0|%0, %1, %2}"
-  : "shr{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "shr{}\t{%2, %1, %0|%0, %1, 
%2}"
+  : "shr{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,bmi2,avx512bw,apx_ndd") @@ -16408,6 +16407,17 @@
(set_attr "mode" "")])
 
 ;; Convert shift to the shiftx pattern to avoid flags dependency.
+;; The NF form does not support the shift count in a register (r), only
+;; in %cl (c); the BMI2 shiftx form takes a register count and sets no
+;; flags.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
+  (match_operand:QI 2 "register_operand")))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+   (any_shiftrt:SWI48 (match_dup 1) (match_dup 2)))]
+  "operands[2] = gen_lowpart (mode, operands[2]);")
+
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
 	(any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
@@ -16476,22 +16486,22 @@
(zero_extend:DI (any_shiftrt:SI (match_dup 1) (match_dup 2]
   "operands[2] = gen_lowpart (SImode, operands[2]);")
 
-(define_insn "*ashr3_1"
+(define_insn "*ashr3_1"
   [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m, r")
(ashiftrt:SWI12
  (match_operand:SWI12 1 "nonimmediate_operand" "0, rm")
- (match_operand:QI 2 "nonmemory_operand" "c, c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c, c")))]
+  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   if (operands[2] == const1_rtx
   && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
-  && !use_ndd)
+  && !use_ndd && !)
 return "sar{}\t%0";
   else
-return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
-  : "sar{}\t{%2, %0|%0, %2}";
+return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sar{}\t{%2, %0|%0, %2}";
 }
   [(set_attr "isa" "*, apx_ndd")
(set_attr "type" "ishift")
@@ -16504,13 +16514,13 @@
(const_string "*")))
(set_attr "mode" "")])
 
-(define_insn "*lshrqi3_1"
+(define_insn "*lshrqi3_1"
   [(set (match_operand:QI 0 "nonimmediate_operand"  "=qm,?k,r")
(lshiftrt:QI
  (match_operand

[PATCH v2 5/8] [APX NF] Support APX NF for rotate insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (ashr3_cvt_nf): New define_insn.
(*3_1_nf): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-nf.c: Add NF test for rotate insns.
---
 gcc/config/i386/i386.md| 53 --
 gcc/testsuite/gcc.target/i386/apx-nf.c |  5 +++
 2 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7f191749342..731eb12d13a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16230,19 +16230,19 @@
 (define_mode_attr cvt_mnemonic
   [(SI "{cltd|cdq}") (DI "{cqto|cqo}")])
 
-(define_insn "ashr3_cvt"
+(define_insn "ashr3_cvt"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm,r")
(ashiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "*a,0,rm")
- (match_operand:QI 2 "const_int_operand")))
-   (clobber (reg:CC FLAGS_REG))]
+ (match_operand:QI 2 "const_int_operand")))]
   "INTVAL (operands[2]) == GET_MODE_BITSIZE (mode)-1
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)"
+   && ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
   "@

-   sar{}\t{%2, %0|%0, %2}
-   sar{}\t{%2, %1, %0|%0, %1, %2}"
+   sar{}\t{%2, %0|%0, %2}
+   sar{}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "*,*,apx_ndd")
(set_attr "type" "imovx,ishift,ishift")
(set_attr "prefix_0f" "0,*,*")
@@ -17094,13 +17094,13 @@
   [(set_attr "type" "rotatex")
(set_attr "mode" "")])
 
-(define_insn "*3_1"
+(define_insn "*3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(any_rotate:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,,c")))]
+  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -17111,11 +17111,11 @@
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "{}\t%0";
   else
-   return use_ndd ? "{}\t{%2, %1, %0|%0, %1, %2}"
-  : "{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "{}\t{%2, %1, %0|%0, 
%1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,bmi2,apx_ndd")
@@ -17135,6 +17135,19 @@
(set_attr "mode" "")])
 
 ;; Convert rotate to the rotatex pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (rotate:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
+ (match_operand:QI 2 "const_int_operand")))]
+  "TARGET_BMI2 && reload_completed && !optimize_function_for_size_p (cfun)"
+  [(set (match_dup 0)
+   (rotatert:SWI48 (match_dup 1) (match_dup 2)))]
+{
+  int bitsize = GET_MODE_BITSIZE (mode);
+
+  operands[2] = GEN_INT ((bitsize - INTVAL (operands[2])) % bitsize);
+})
+
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
 	(rotate:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
@@ -17236,22 +17249,22 @@
   [(set (match_dup 0)
(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2])
 
-(define_insn "*3_1"
+(define_insn "*3_1"
   [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m,r")
(any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" 
"c,c")))]
+  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   if (operands[2] == const1_rtx
   && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
-  && !use_ndd)
+  && !use_ndd && !)
 return "{}\t%0";
   else
 return use_ndd
-  ? "{}\t{%2, %1, %0|%0, %1, %2}"
-  : "{}\t{%2, %0|%0, %2}";
+  ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
   [(set_attr "isa" "*,apx_ndd")
(set_attr "type" "rotate")
diff --git a/gcc/testsuite/gcc.target/i386/apx-nf.c 
b/gcc/testsuite/gcc.target/i386/apx-nf.c
index 608dbf8f5f7..6e59803be64 100644
--- a/gcc/testsuite/gcc.target/i386/apx-nf.c
+++ b/gcc/testsuite/gcc.target/i386/apx-nf.c
@@ -3,6 +3,7 @@
 /* { dg-final { scan-assembler-times "\{nf\} add" 4 } } */
 /* { dg-final { scan-assembler-times "\{nf\} and" 1 } } */
 /* { dg-final { scan-assembler-times "\{nf\} or" 1 } } */
+/* { dg-final { scan-assembler-times "\{nf\} rol" 4 } } */
 
 #
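
The new rotate-to-rotatert split above relies on the identity
rol(x, n) == ror(x, (w - n) % w) for word size w; a quick self-contained
check of that arithmetic:

  #include <assert.h>
  #include <stdint.h>

  static uint32_t rol32 (uint32_t x, unsigned n)
  { return (x << n) | (x >> ((32 - n) % 32)); }

  static uint32_t ror32 (uint32_t x, unsigned n)
  { return (x >> n) | (x << ((32 - n) % 32)); }

  int
  main (void)
  {
    for (unsigned n = 1; n < 32; n++)
      assert (rol32 (0x12345678u, n) == ror32 (0x12345678u, (32 - n) % 32));
    return 0;
  }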

[PATCH v2 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (x86_64_shld_nf): New define_insn.
(x86_64_shld_ndd_nf): Ditto.
(x86_64_shld_1_nf): Ditto.
(x86_64_shld_ndd_1_nf): Ditto.
(*x86_64_shld_shrd_1_nozext_nf): Ditto.
(x86_shld_nf): Ditto.
(x86_shld_ndd_nf): Ditto.
(x86_shld_1_nf): Ditto.
(x86_shld_ndd_1_nf): Ditto.
(*x86_shld_shrd_1_nozext_nf): Ditto.
(3_doubleword_lowpart_nf): Ditto.
(x86_64_shrd_nf): Ditto.
(x86_64_shrd_ndd_nf): Ditto.
(x86_64_shrd_1_nf): Ditto.
(x86_64_shrd_ndd_1_nf): Ditto.
(*x86_64_shrd_shld_1_nozext_nf): Ditto.
(x86_shrd_nf): Ditto.
(x86_shrd_ndd_nf): Ditto.
(x86_shrd_1_nf): Ditto.
(x86_shrd_ndd_1_nf): Ditto.
(*x86_shrd_shld_1_nozext_nf): Ditto.
---
 gcc/config/i386/i386.md | 377 +++-
 1 file changed, 296 insertions(+), 81 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 731eb12d13a..4d684e8d919 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14552,7 +14552,7 @@
   DONE;
 })
 
-(define_insn "x86_64_shld"
+(define_insn "x86_64_shld"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
 	 (and:QI (match_operand:QI 2 "nonmemory_operand" "Jc")
@@ -14562,10 +14562,9 @@
(zero_extend:TI
  (match_operand:DI 1 "register_operand" "r"))
(minus:QI (const_int 64)
- (and:QI (match_dup 2) (const_int 63 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT"
-  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
+ (and:QI (match_dup 2) (const_int 63 0)))]
+  "TARGET_64BIT && "
+  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ishift")
(set_attr "prefix_0f" "1")
(set_attr "mode" "DI")
@@ -14573,7 +14572,7 @@
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
-(define_insn "x86_64_shld_ndd"
+(define_insn "x86_64_shld_ndd"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
 	 (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc")
@@ -14583,14 +14582,13 @@
(zero_extend:TI
  (match_operand:DI 2 "register_operand" "r"))
(minus:QI (const_int 64)
- (and:QI (match_dup 3) (const_int 63 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_APX_NDD"
-  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+ (and:QI (match_dup 3) (const_int 63 0)))]
+  "TARGET_APX_NDD && "
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ishift")
(set_attr "mode" "DI")])
 
-(define_insn "x86_64_shld_1"
+(define_insn "x86_64_shld_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
   (match_operand:QI 2 "const_0_to_63_operand")) @@ 
-14598,11 +14596,11 @@
  (lshiftrt:TI
(zero_extend:TI
  (match_operand:DI 1 "register_operand" "r"))
-   (match_operand:QI 3 "const_0_to_255_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
+   (match_operand:QI 3 "const_0_to_255_operand")) 0)))]
   "TARGET_64BIT
-   && INTVAL (operands[3]) == 64 - INTVAL (operands[2])"
-  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
+   && INTVAL (operands[3]) == 64 - INTVAL (operands[2])
+   && "
+  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ishift")
(set_attr "prefix_0f" "1")
(set_attr "mode" "DI")
@@ -14611,7 +14609,7 @@
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
-(define_insn "x86_64_shld_ndd_1"
+(define_insn "x86_64_shld_ndd_1"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
   (match_operand:QI 3 "const_0_to_63_operand")) @@ 
-14619,15 +14617,66 @@
  (lshiftrt:TI
(zero_extend:TI
  (match_operand:DI 2 "register_operand" "r"))
-   (match_operand:QI 4 "const_0_to_255_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
+   (match_operand:QI 4 "const_0_to_255_operand")) 0)))]
   "TARGET_APX_NDD
-   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])"
-  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])
+   && "
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ishift")
(set_attr "mode" "DI")
(set_attr "length_immediate" "1")])
 
+(define_insn_and_split "*x86_64_shld_shrd_1_nozext_nf"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+   (ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_oper

[PATCH v2 7/8] [APX NF] Support APX NF for mul/div

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (*mul3_1_nf): New define_insn.
(*mulqi3_1_nf): Ditto.
(*divmod4_noext_nf): Ditto.
(divmodhiqi3_nf): Ditto.
---
 gcc/config/i386/i386.md | 47 ++---
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4d684e8d919..087761e5b3a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9896,17 +9896,17 @@
 ;;
 ;; On BDVER1, all HI MULs use DoublePath
 
-(define_insn "*mul3_1"
+(define_insn "*mul3_1"
   [(set (match_operand:SWIM248 0 "register_operand" "=r,r,r")
(mult:SWIM248
  (match_operand:SWIM248 1 "nonimmediate_operand" "%rm,rm,0")
- (match_operand:SWIM248 2 "" "K,,r")))
-   (clobber (reg:CC FLAGS_REG))]
-  "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
+ (match_operand:SWIM248 2 "" "K,,r")))]
+  "!(MEM_P (operands[1]) && MEM_P (operands[2]))
+   && "
   "@
-   imul{}\t{%2, %1, %0|%0, %1, %2}
-   imul{}\t{%2, %1, %0|%0, %1, %2}
-   imul{}\t{%2, %0|%0, %2}"
+   imul{}\t{%2, %1, %0|%0, %1, %2}
+   imul{}\t{%2, %1, %0|%0, %1, %2}
+   imul{}\t{%2, %0|%0, %2}"
   [(set_attr "type" "imul")
(set_attr "prefix_0f" "0,0,1")
(set (attr "athlon_decode")
@@ -9967,14 +9967,14 @@
 ;; MUL reg8Direct
 ;; MUL mem8Direct
 
-(define_insn "*mulqi3_1"
+(define_insn "*mulqi3_1"
   [(set (match_operand:QI 0 "register_operand" "=a")
(mult:QI (match_operand:QI 1 "nonimmediate_operand" "%0")
-(match_operand:QI 2 "nonimmediate_operand" "qm")))
-   (clobber (reg:CC FLAGS_REG))]
+(match_operand:QI 2 "nonimmediate_operand" "qm")))]
   "TARGET_QIMODE_MATH
-   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "mul{b}\t%2"
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))
+   && "
+  "mul{b}\t%2"
   [(set_attr "type" "imul")
(set_attr "length_immediate" "0")
(set (attr "athlon_decode")
@@ -7,6 +7,19 @@
   [(set_attr "type" "multi")
(set_attr "mode" "SI")])
 
+(define_insn "*divmod4_noext_nf"
+  [(set (match_operand:SWIM248 0 "register_operand" "=a")
+   (any_div:SWIM248
+ (match_operand:SWIM248 2 "register_operand" "0")
+ (match_operand:SWIM248 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:SWIM248 1 "register_operand" "=d")
+   (:SWIM248 (match_dup 2) (match_dup 3)))
+   (use (match_operand:SWIM248 4 "register_operand" "1"))]
+  "TARGET_APX_NF"
+  "%{nf%} div{}\t%3"
+  [(set_attr "type" "idiv")
+   (set_attr "mode" "")])
+
 (define_insn "*divmod4_noext"
   [(set (match_operand:SWIM248 0 "register_operand" "=a")
(any_div:SWIM248
@@ -11264,7 +11277,7 @@
 ;; Change div/mod to HImode and extend the second argument to HImode
 ;; so that mode of div/mod matches with mode of arguments.  Otherwise
 ;; combine may fail.
-(define_insn "divmodhiqi3"
+(define_insn "divmodhiqi3"
   [(set (match_operand:HI 0 "register_operand" "=a")
(ior:HI
  (ashift:HI
@@ -11276,10 +11289,10 @@
(const_int 8))
  (zero_extend:HI
(truncate:QI
- (div:HI (match_dup 1) (any_extend:HI (match_dup 2)))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_QIMODE_MATH"
-  "div{b}\t%2"
+ (div:HI (match_dup 1) (any_extend:HI (match_dup 2)))]
+  "TARGET_QIMODE_MATH
+   && "
+  "div{b}\t%2"
   [(set_attr "type" "idiv")
(set_attr "mode" "QI")])
 
--
2.31.1



[PATCH v2 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (clz2_lzcnt_nf): New define_insn.
(*clz2_lzcnt_falsedep_nf): Ditto.
(__nf): Ditto.
(*__falsedep_nf): Ditto.
(_hi_nf): Ditto.
(popcount2_nf): Ditto.
(*popcount2_falsedep_nf): Ditto.
(popcounthi2_nf): Ditto.
---
 gcc/config/i386/i386.md | 124 
 1 file changed, 113 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 087761e5b3a..c9a3a99ca70 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20250,6 +20250,24 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
+(define_insn_and_split "clz2_lzcnt_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (clz:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
+  "TARGET_APX_NF && TARGET_LZCNT"
+  "%{nf%} lzcnt{}\t{%1, %0|%0, %1}"
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_mentioned_p (operands[0], operands[1])"
+  [(parallel
+[(set (match_dup 0)
+ (clz:SWI48 (match_dup 1)))
+ (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
+  "ix86_expand_clear (operands[0]);"
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
+
 (define_insn_and_split "clz2_lzcnt"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(clz:SWI48
@@ -20273,6 +20291,18 @@
 ; False dependency happens when destination is only updated by tzcnt,
 ; lzcnt or popcnt.  There is no false dependency when destination is
 ; also used in source.
+(define_insn "*clz2_lzcnt_falsedep_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (clz:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "rm")))
+   (unspec [(match_operand:SWI48 2 "register_operand" "0")]
+  UNSPEC_INSN_FALSE_DEP)]
+  "TARGET_APX_NF && TARGET_LZCNT"
+  "%{nf%} lzcnt{}\t{%1, %0|%0, %1}"
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
+
 (define_insn "*clz2_lzcnt_falsedep"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(clz:SWI48
@@ -20379,6 +20409,25 @@
;; Version of lzcnt/tzcnt that is expanded from intrinsics.  This version
;; provides operand size as output when source operand is zero.
 
+(define_insn_and_split "__nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (unspec:SWI48
+ [(match_operand:SWI48 1 "nonimmediate_operand" "rm")] LT_ZCNT))]
+  "TARGET_APX_NF"
+  "%{nf%} {}\t{%1, %0|%0, %1}"
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_mentioned_p (operands[0], operands[1])"
+  [(parallel
+[(set (match_dup 0)
+ (unspec:SWI48 [(match_dup 1)] LT_ZCNT))
+ (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
+  "ix86_expand_clear (operands[0]);"
+  [(set_attr "type" "")
+   (set_attr "prefix_0f" "1")
+   (set_attr "prefix_rep" "1")
+   (set_attr "mode" "")])
+
 (define_insn_and_split "_"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(unspec:SWI48
@@ -20403,6 +20452,19 @@
;; False dependency happens when destination is only updated by tzcnt,
;; lzcnt or popcnt.  There is no false dependency when destination is
;; also used in source.
+(define_insn "*__falsedep_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (unspec:SWI48
+ [(match_operand:SWI48 1 "nonimmediate_operand" "rm")] LT_ZCNT))
+   (unspec [(match_operand:SWI48 2 "register_operand" "0")]
+  UNSPEC_INSN_FALSE_DEP)]
+  "TARGET_APX_NF"
+  "%{nf%} {}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "")
+   (set_attr "prefix_0f" "1")
+   (set_attr "prefix_rep" "1")
+   (set_attr "mode" "")])
+
 (define_insn "*__falsedep"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(unspec:SWI48
@@ -20417,13 +20479,12 @@
(set_attr "prefix_rep" "1")
(set_attr "mode" "")])
 
-(define_insn "_hi"
+(define_insn "_hi"
   [(set (match_operand:HI 0 "register_operand" "=r")
(unspec:HI
- [(match_operand:HI 1 "nonimmediate_operand" "rm")] LT_ZCNT))
-   (clobber (reg:CC FLAGS_REG))]
-  ""
-  "{w}\t{%1, %0|%0, %1}"
+ [(match_operand:HI 1 "nonimmediate_operand" "rm")] LT_ZCNT))]
+  ""
+  "{w}\t{%1, %0|%0, %1}"
   [(set_attr "type" "")
(set_attr "prefix_0f" "1")
(set_attr "prefix_rep" "1")
@@ -20841,6 +20902,30 @@
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
+(define_insn_and_split "popcount2_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (popcount:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
+  "TARGET_APX_NF && TARGET_POPCNT"
+{
+#if TARGET_MACHO
+  return "%{nf%} popcnt\t{%1, %0|%0, %1}"; #else
+  return "%{nf%} popcnt{}\t{%1, %0|%0, %1}"; #endif }
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_menti

RE: [PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-22 Thread Kong, Lingling
Cc Uros.

From: Kong, Lingling 
Sent: Wednesday, May 22, 2024 4:35 PM
To: gcc-patches@gcc.gnu.org
Cc: Liu, Hongtao ; Kong, Lingling 

Subject: [PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

gcc/ChangeLog:

   * config/i386/i386.md (nf_and_applied): New subst_attr.
   (nf_x64_and_applied): Ditto.
   (*sub_1_nf): New define_insn.
   (*anddi_1_nf): Ditto.
   (*and_1_nf): Ditto.
   (*qi_1_nf): Ditto.
   (*_1_nf): Ditto.
   (*neg_1_nf): Ditto.
   * config/i386/sse.md : New define_split.

gcc/testsuite/ChangeLog:

   * gcc.target/i386/apx-nf.c: Add test.
---
gcc/config/i386/i386.md| 174 +
gcc/config/i386/sse.md |  11 ++
gcc/testsuite/gcc.target/i386/apx-nf.c |   9 ++
3 files changed, 112 insertions(+), 82 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bae344518bd..099d7f35c8f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -575,7 +575,7 @@
 
noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
 
avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
 
avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
- vaes_avx512vl"
+vaes_avx512vl,noapx_nf"
   (const_string "base"))

 ;; The (bounding maximum) length of an instruction immediate.
@@ -981,6 +981,7 @@
 (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
   (eq_attr "mmx_isa" "avx")
 (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
+ (eq_attr "isa" "noapx_nf") (symbol_ref "!TARGET_APX_NF")
  ]
  (const_int 1)))

@@ -7893,20 +7894,21 @@
   "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);"
[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])

-(define_insn "*sub_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,,r,r,r")
+(define_insn "*sub_1"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" 
"=m,r,,r,r,r")
  (minus:SWI
-(match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,rjM,r")
-(match_operand:SWI 2 "" 
",,r,,")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands, TARGET_APX_NDD)"
+   (match_operand:SWI 1 "nonimmediate_operand" "0,0,0,rm,rjM,r")
+   (match_operand:SWI 2 "" 
",,,r,,")))]
+  "ix86_binary_operator_ok (MINUS, mode, operands, TARGET_APX_NDD)
+  && "
   "@
-  sub{}\t{%2, %0|%0, %2}
-  sub{}\t{%2, %0|%0, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd")
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd,apx_ndd")
(set_attr "type" "alu")
(set_attr "mode" "")])

@@ -11795,27 +11797,31 @@
}
[(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd_64,apx_ndd")])

-(define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm,r,r,r,r,r,?k")
+(define_subst_attr "nf_and_applied" "nf_subst"  "noapx_nf" "*")
+(define_subst_attr "nf_x64_and_applied" "nf_subst" "noapx_nf" "x64")
+
+(define_insn "*anddi_1"
+  [(set (match_operand:DI 0 "nonimmediate_operand" 
"=r,r,rm,r,r,r,r,r,r,?k")
  (and:DI
-  (match_operand:DI 1 "nonimmediate_operand" 
"%0,r,0,0,rm,rjM,r,qm,k")
-  (match_operand:DI 2 "x86_64_szext_general_operand" 
"Z,Z,re,m,r,e,m,L,k")))
-   (clobber (reg:CC FLAGS_REG))]
+ (match_operand:DI 1 "nonimmediate_operand" 
"%0,r,0,0,0,rm,rjM,r,qm,k")
+ (match_operand:DI 2 "x86_64_szext_general_operand" 
"Z,Z,r,e,m,r,e,m,L,k")))]
   "TARGET_64BIT
-   && ix86_binary_operator_ok (AND, DImode, operands, TARGET_APX_NDD)"
+   && ix86_binary_operator_ok (AND, DImode, operands, TARGET_APX_NDD)
+   && "
   "@
-   and{l}\t{%k2, %k0|%k0, %k2}
-   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
-   and{q}\t{%2, %0|%0, %2}
-   and{q}\t{%2, %0|%0, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{l}\t{%k2, %k0|%k0, %k2}
+   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
#
#"
-  [(set_attr "isa" "x64,apx_ndd,x64,x64,apx_ndd,apx_ndd,apx_ndd,x64,avx512bw")
-   (set_attr "type" "alu,alu,alu,alu,alu,alu,alu,imovx,msklog")
-   (set_attr "length_immediate" "*,*,*,*,*,*,*,0,*")
+  [(set_attr "isa" 
"x64,apx_ndd,x64,x64,x64,apx_ndd,apx_ndd,apx_ndd,,avx512bw")
+   (

[committed] libstdc++: Ensure std::variant relops convert to bool [PR115145]

2024-05-22 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Ensure that the result of comparing the variant alternatives is
converted to bool immediately rather than copied.
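
As a sketch of the failure mode (the types below are hypothetical, not the
PR's testcase): with an alternative whose operator== yields a bool-like
proxy, a deduced lambda return type hands the proxy to the internal
__compare helper, while the trailing "-> bool" converts it immediately:

    #include <variant>

    struct Eq { bool v; operator bool() const { return v; } };  // proxy result
    struct X  { int i; };
    Eq operator==(const X& a, const X& b) { return Eq{a.i == b.i}; }

    bool same(const std::variant<X>& l, const std::variant<X>& r)
    {
      // Each relop's lambda is now [](auto&& l, auto&& r) -> bool
      // { return l == r; }, so the Eq proxy becomes bool at the return
      // statement instead of being copied out of the lambda.
      return l == r;
    }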

libstdc++-v3/ChangeLog:

PR libstdc++/115145
* include/std/variant (operator==, operator!=, operator<)
(operator<=, operator>, operator>=): Add trailing-return-type to
lambda expressions to trigger conversion to bool.
* testsuite/20_util/variant/relops/115145.cc: New test.
---
 libstdc++-v3/include/std/variant  | 63 ++-
 .../20_util/variant/relops/115145.cc  | 36 +++
 2 files changed, 71 insertions(+), 28 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/variant/relops/115145.cc

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index cfb4bcdbcc9..371cbb90f54 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1271,10 +1271,11 @@ namespace __detail::__variant
 operator== [[nodiscard]] (const variant<_Types...>& __lhs,
  const variant<_Types...>& __rhs)
 {
-  return __detail::__variant::__compare(true, __lhs, __rhs,
-   [](auto&& __l, auto&& __r) {
- return __l == __r;
-   });
+  namespace __variant = __detail::__variant;
+  return __variant::__compare(true, __lhs, __rhs,
+ [](auto&& __l, auto&& __r) -> bool {
+   return __l == __r;
+ });
 }
 
   template
@@ -1286,10 +1287,11 @@ namespace __detail::__variant
 operator!= [[nodiscard]] (const variant<_Types...>& __lhs,
  const variant<_Types...>& __rhs)
 {
-  return __detail::__variant::__compare(true, __lhs, __rhs,
-   [](auto&& __l, auto&& __r) {
- return __l != __r;
-   });
+  namespace __variant = __detail::__variant;
+  return __variant::__compare(true, __lhs, __rhs,
+ [](auto&& __l, auto&& __r) -> bool {
+   return __l != __r;
+ });
 }
 
   template
@@ -1301,10 +1303,11 @@ namespace __detail::__variant
 operator< [[nodiscard]] (const variant<_Types...>& __lhs,
 const variant<_Types...>& __rhs)
 {
-  return __detail::__variant::__compare(true, __lhs, __rhs,
-   [](auto&& __l, auto&& __r) {
- return __l < __r;
-   });
+  namespace __variant = __detail::__variant;
+  return __variant::__compare(true, __lhs, __rhs,
+ [](auto&& __l, auto&& __r) -> bool {
+   return __l < __r;
+ });
 }
 
   template
@@ -1316,10 +1319,11 @@ namespace __detail::__variant
 operator<= [[nodiscard]] (const variant<_Types...>& __lhs,
  const variant<_Types...>& __rhs)
 {
-  return __detail::__variant::__compare(true, __lhs, __rhs,
-   [](auto&& __l, auto&& __r) {
- return __l <= __r;
-   });
+  namespace __variant = __detail::__variant;
+  return __variant::__compare(true, __lhs, __rhs,
+ [](auto&& __l, auto&& __r) -> bool {
+   return __l <= __r;
+ });
 }
 
   template
@@ -1331,10 +1335,11 @@ namespace __detail::__variant
 operator> [[nodiscard]] (const variant<_Types...>& __lhs,
 const variant<_Types...>& __rhs)
 {
-  return __detail::__variant::__compare(true, __lhs, __rhs,
-   [](auto&& __l, auto&& __r) {
- return __l > __r;
-   });
+  namespace __variant = __detail::__variant;
+  return __variant::__compare(true, __lhs, __rhs,
+ [](auto&& __l, auto&& __r) -> bool {
+   return __l > __r;
+ });
 }
 
   template
@@ -1346,10 +1351,11 @@ namespace __detail::__variant
 operator>= [[nodiscard]] (const variant<_Types...>& __lhs,
  const variant<_Types...>& __rhs)
 {
-  return __detail::__variant::__compare(true, __lhs, __rhs,
-   [](auto&& __l, auto&& __r) {
- return __l >= __r;
-  

Re: [PATCH] Add %[zt][diox] support to pretty-print

2024-05-22 Thread YunQiang Su
Jakub Jelinek  wrote on Sat, Feb 10, 2024 at 17:41:
>
> Hi!
>
> In the previous patch I haven't touched the gcc diagnostic routines,
> using HOST_SIZE_T_PRINT* for those is obviously undesirable because we
> want the strings to be translatable.  We already have %w[diox] for
> HOST_WIDE_INT arguments, this patch adds t and z modifiers for those.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-02-10  Jakub Jelinek  
>
> gcc/
> * pretty-print.cc (pp_integer_with_precision): Handle precision 3 for
> size_t and precision 4 for ptrdiff_t.  Formatting fix.
> (pp_format): Document %{t,z}{d,i,u,o,x}.  Implement t and z modifiers.
> Formatting fixes.
> (test_pp_format): Test t and z modifiers.
> * gcc.cc (read_specs): Use %td instead of %ld and casts to long.
> gcc/c-family/
> * c-format.cc (gcc_diag_length_specs): Add t and z modifiers.
> (PP_FORMAT_CHAR_TABLE, gcc_gfc_char_table): Add entries for t and
> z modifiers.
> gcc/fortran/
> * error.cc (error_print): Handle z and t modifiers on d, i and u.
> * check.cc (gfc_check_transfer): Use %zd instead of %ld and casts to
> long.
> * primary.cc (gfc_convert_to_structure_constructor): Use %td instead
> of %ld and casts to long.
>
> --- gcc/gcc.cc.jj   2024-02-09 14:54:09.141489744 +0100
> +++ gcc/gcc.cc  2024-02-09 22:04:37.655678742 +0100
> @@ -2410,8 +2410,7 @@ read_specs (const char *filename, bool m
>   if (*p1++ != '<' || p[-2] != '>')
> fatal_error (input_location,
>  "specs %%include syntax malformed after "
> -"%ld characters",
> -(long) (p1 - buffer + 1));
> +"%td characters", p1 - buffer + 1);
>

Should we defer using %td for gcc itself, since we may use an older
compiler to build gcc?
My major workstation is Debian Bookworm, which has GCC 12, and then I
get some warnings:

../../gcc/gcc.cc: In function ‘void read_specs(const char*, bool,
bool)’:
../../gcc/gcc.cc:2417:32: warning: unknown conversion type character
‘t’ in format [-Wformat=]
 2417 |  "%td characters", p1 - buffer +
1);
  |^
../../gcc/gcc.cc:2416:30: warning: too many arguments for format
[-Wformat-extra-args]
 2416 |  "specs %%include syntax malformed
after "
  |
^
 2417 |  "%td characters", p1 - buffer + 1);

>   p[-2] = '\0';
>   new_filename = find_a_file (&startfile_prefixes, p1, R_OK, 
> true);
> @@ -2431,8 +2430,7 @@ read_specs (const char *filename, bool m
>   if (*p1++ != '<' || p[-2] != '>')
> fatal_error (input_location,
>  "specs %%include syntax malformed after "
> -"%ld characters",
> -(long) (p1 - buffer + 1));
> +"%td characters", p1 - buffer + 1);
>
>   p[-2] = '\0';
>   new_filename = find_a_file (&startfile_prefixes, p1, R_OK, 
> true);
> @@ -2458,8 +2456,7 @@ read_specs (const char *filename, bool m
>   if (! ISALPHA ((unsigned char) *p1))
> fatal_error (input_location,
>  "specs %%rename syntax malformed after "
> -"%ld characters",
> -(long) (p1 - buffer));
> +"%td characters", p1 - buffer);
>
>   p2 = p1;
>   while (*p2 && !ISSPACE ((unsigned char) *p2))
> @@ -2468,8 +2465,7 @@ read_specs (const char *filename, bool m
>   if (*p2 != ' ' && *p2 != '\t')
> fatal_error (input_location,
>  "specs %%rename syntax malformed after "
> -"%ld characters",
> -(long) (p2 - buffer));
> +"%td characters", p2 - buffer);
>
>   name_len = p2 - p1;
>   *p2++ = '\0';
> @@ -2479,8 +2475,7 @@ read_specs (const char *filename, bool m
>   if (! ISALPHA ((unsigned char) *p2))
> fatal_error (input_location,
>  "specs %%rename syntax malformed after "
> -"%ld characters",
> -(long) (p2 - buffer));
> +"%td characters", p2 - buffer);
>
>   /* Get new spec name.  */
>   p3 = p2;
> @@ -2490,8 +2485,7 @@ read_specs (const char *filename, bool m
>   if (p3 != p - 1)
> fatal_error (input_location,
>  "specs %%rename syntax malformed after "
> -"%ld characters",
> -(long) (p3 - buffer));
> +   

Re: [PATCH] Add %[zt][diox] support to pretty-print

2024-05-22 Thread Jakub Jelinek
On Wed, May 22, 2024 at 05:05:30PM +0800, YunQiang Su wrote:
> > --- gcc/gcc.cc.jj   2024-02-09 14:54:09.141489744 +0100
> > +++ gcc/gcc.cc  2024-02-09 22:04:37.655678742 +0100
> > @@ -2410,8 +2410,7 @@ read_specs (const char *filename, bool m
> >   if (*p1++ != '<' || p[-2] != '>')
> > fatal_error (input_location,
> >  "specs %%include syntax malformed after "
> > -"%ld characters",
> > -(long) (p1 - buffer + 1));
> > +"%td characters", p1 - buffer + 1);
> >
> 
> Should we defer using %td for gcc itself, since we may use an older
> compiler to build gcc?
> My major workstation is Debian Bookworm, which has GCC 12, and then I
> get some warnings:

That is fine and expected.  During stage1 such warnings are intentionally
not fatal; only in stage2+, when we know it is the same version of gcc,
do we want them to be fatal.
Otherwise we could never add any new modifiers...
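
As an illustration (a sketch mirroring the call site patched above, not an
additional change), the new modifiers let host-width values be printed
without casts:

    /* Before: a cast was needed because only %ld was supported.  */
    fatal_error (input_location,
                 "specs %%include syntax malformed after %ld characters",
                 (long) (p1 - buffer + 1));

    /* After: %td takes the ptrdiff_t directly; %zu would do the same
       for a size_t.  */
    fatal_error (input_location,
                 "specs %%include syntax malformed after %td characters",
                 p1 - buffer + 1);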

Jakub



Re: [PATCH] libstdc++: Implement std::formatter<std::thread::id> without <sstream> [PR115099]

2024-05-22 Thread Jonathan Wakely
Pushed to trunk. Backport to gcc-14 to follow.

On Fri, 17 May 2024 at 14:45, Jonathan Wakely  wrote:
>
> Does anybody see any issue with the drive-by fixes to constrain
> std::formatter to only work for pointers and integers (since
> we don't know how to format pthread_t if it's an arbitrary struct, for
> example), and to cast pointers to const void* for output (because if
> pthread_t is char* then writing it to a stream would be bad! and we
> don't want to allow users to overload operator<< for pointers to opaque
> structs, for example). I don't think this will change anything in
> practice for common targets, where pthread_t is either an integer or
> void*.
>
> Tested x86_64-linux.
>
> -- >8 --
>
> The std::thread::id formatter uses std::basic_ostringstream without
> including <sstream>, which went unnoticed because the test for it uses
> a stringstream to check the output is correct.
>
> The fix implemented here is to stop using basic_ostringstream for
> formatting thread::id and just use std::format instead.
>
> As a drive-by fix, the formatter specialization is constrained to
> require that the thread::id::native_handle_type can be formatted, to
> avoid making the formatter ill-formed if the pthread_t type is not a
> pointer or integer. Since non-void pointers can't be formatted, ensure
> that we convert pointers to const void* for formatting. Make a similar
> change to the existing operator<< overload so that in the unlikely case
> that pthread_t is a typedef for char* we don't treat it as a
> null-terminated string when inserting into a stream.
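
(A minimal reproducer sketched from the description above; the patch's
actual testcase may differ:)

    #include <format>
    #include <thread>   // note: <sstream> deliberately not included

    std::string s = std::format("{}", std::this_thread::get_id());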
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/115099
> * include/bits/std_thread.h: Declare formatter as friend of
> thread::id.
> * include/std/thread (operator<<): Convert non-void pointers to
> void pointers for output.
> (formatter): Add constraint that thread::native_handle_type is a
> pointer or integer.
> (formatter::format): Reimplement without basic_ostringstream.
> * testsuite/30_threads/thread/id/output.cc: Check output
> compiles before <sstream> has been included.
> ---
>  libstdc++-v3/include/bits/std_thread.h| 11 -
>  libstdc++-v3/include/std/thread   | 43 ++-
>  .../testsuite/30_threads/thread/id/output.cc  | 21 -
>  3 files changed, 63 insertions(+), 12 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/std_thread.h 
> b/libstdc++-v3/include/bits/std_thread.h
> index 2d7df12650d..5817bfb29dd 100644
> --- a/libstdc++-v3/include/bits/std_thread.h
> +++ b/libstdc++-v3/include/bits/std_thread.h
> @@ -53,6 +53,10 @@ namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
> +#if __glibcxx_formatters
> +  template class formatter;
> +#endif
> +
>/** @addtogroup threads
> *  @{
> */
> @@ -117,13 +121,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
> friend basic_ostream<_CharT, _Traits>&
> operator<<(basic_ostream<_CharT, _Traits>& __out, id __id);
> +
> +#if __glibcxx_formatters
> +  friend formatter;
> +  friend formatter;
> +#endif
>  };
>
>private:
>  id _M_id;
>
>  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> -// 2097.  packaged_task constructors should be constrained
> +// 2097. packaged_task constructors should be constrained
>  // 3039. Unnecessary decay in thread and packaged_task
>  template
>using __not_same = __not_, thread>>;
> diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
> index 09ca3116e7f..e994d683bff 100644
> --- a/libstdc++-v3/include/std/thread
> +++ b/libstdc++-v3/include/std/thread
> @@ -42,10 +42,6 @@
>  # include  // std::stop_source, std::stop_token, std::nostopstate
>  #endif
>
> -#if __cplusplus > 202002L
> -# include 
> -#endif
> -
>  #include  // std::thread, get_id, yield
>  #include  // std::this_thread::sleep_for, 
> sleep_until
>
> @@ -53,6 +49,10 @@
>  #define __glibcxx_want_formatters
>  #include 
>
> +#if __cpp_lib_formatters
> +# include 
> +#endif
> +
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> @@ -104,10 +104,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  inline basic_ostream<_CharT, _Traits>&
>  operator<<(basic_ostream<_CharT, _Traits>& __out, thread::id __id)
>  {
> +  // Convert non-void pointers to const void* for formatted output.
> +  using __output_type
> +   = __conditional_t::value,
> + const void*,
> + thread::native_handle_type>;
> +
>if (__id == thread::id())
> return __out << "thread::id of a non-executing thread";
>else
> -   return __out << __id._M_thread;
> +   return __out << static_cast<__output_type>(__id._M_thread);
>  }
>/// @}
>
> @@ -287,8 +293,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #endif // __cpp_lib_jthread
>
>  #ifdef __cpp_lib_formatters // C++ >= 23
> -
>t

Re: [PATCH] Add %[zt][diox] support to pretty-print

2024-05-22 Thread YunQiang Su
Jakub Jelinek  wrote on Wed, May 22, 2024 at 17:14:
>
> On Wed, May 22, 2024 at 05:05:30PM +0800, YunQiang Su wrote:
> > > --- gcc/gcc.cc.jj   2024-02-09 14:54:09.141489744 +0100
> > > +++ gcc/gcc.cc  2024-02-09 22:04:37.655678742 +0100
> > > @@ -2410,8 +2410,7 @@ read_specs (const char *filename, bool m
> > >   if (*p1++ != '<' || p[-2] != '>')
> > > fatal_error (input_location,
> > >  "specs %%include syntax malformed after "
> > > -"%ld characters",
> > > -(long) (p1 - buffer + 1));
> > > +"%td characters", p1 - buffer + 1);
> > >
> >
> > Should we defer using %td for gcc itself, since we may use an older
> > compiler to build gcc?
> > My major workstation is Debian Bookworm, which has GCC 12, and then I
> > get some warnings:
>
> That is fine and expected.  During stage1 such warnings are intentionally
> not fatal; only in stage2+, when we know it is the same version of gcc,
> do we want them to be fatal.

The build may have only one stage in some cases.
For example, we may have a full binutils/libc stack and just build a cross-gcc.
For all target libraries, such as libgcc etc., it is OK, but for host
executables it will be a problem.

> Otherwise we could never add any new modifiers...
>
> Jakub
>


-- 
YunQiang Su


Re: [PATCH] Fix mixed input kind permute optimization

2024-05-22 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Biener  writes:
>> When change_vec_perm_layout runs into a permute combining two
>> nodes where one is invariant and one internal the partition of
>> one input can be -1 but the other might not be.  The following
>> supports this case by simply ignoring inputs with input partiton -1.
>>
>> I'm not sure this is correct but it avoids ICEing when accessing
>> that partitions layout for gcc.target/i386/pr98928.c with the
>> change to avoid splitting store dataref groups during SLP discovery.
>>
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu (ontop of
>> the SLP series).  The change can't break anything that's already
>> broken but I'm not sure this does the right thing - the testcase
>> has an uniform constant.  I'll try to come up with a better runtime
>> testcase tomorrow.  Hints as to where to correctly fix such case
>> appreciated.
>
> Famous last words, but yeah, it looks correct to me.  I think the
> routine in principle should have a free choice of which layout to
> choose for invariants (as long as it's consistent for all queries
> about the same node).  So it should just be a question of whether
> keeping the original layout is more likely to give a valid
> permutation, or whether going with out_layout_i would be better.
> I don't have a strong intuition either way.

BTW, I should have said that using a different layout from 0
would require compensating code in the materialize function.
So this is definitely the simplest and most direct fix.

Thanks,
Richard


RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-22 Thread Tamar Christina
> 
> Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
> (I'm open to other suggestions.)  Just looking for something that describes
> either the architecture or the end result that we want to achieve.
> And preferably something fairly short :)
> 
> avoid_* would be consistent with the existing "avoid_cross_loop_fma".
> 
> > +
> >  #undef AARCH64_EXTRA_TUNING_OPTION
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index
> bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5
> 6b46c74084ba7c3c 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
> AARCH64_FL_SM_OFF;
> >  enabled through +gcs.  */
> >  #define TARGET_GCS (AARCH64_ISA_GCS)
> >
> > +/*  Prefer different predicate registers for the output of a predicated 
> > operation
> over
> > +re-using an existing input predicate.  */
> > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
> > +&& (aarch64_tune_params.extra_tuning_flags \
> > +&
> AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
> >
> >  /* Standard register usage.  */
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index
> dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a
> 53473b478c5ddba82 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string
> "any"))
> >  ;; target-independent code.
> >  (define_attr "is_call" "no,yes" (const_string "no"))
> >
> > +;; Indicates whether we want to enable the pattern with an optional early
> > +;; clobber for SVE predicates.
> > +(define_attr "pred_clobber" "no,yes" (const_string "no"))
> > +
> >  ;; [For compatibility with Arm in pipeline models]
> >  ;; Attribute that specifies whether or not the instruction touches fp
> >  ;; registers.
> > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
> >  (define_attr "arch_enabled" "no,yes"
> >(if_then_else
> >  (ior
> > -   (eq_attr "arch" "any")
> > +   (and (eq_attr "arch" "any")
> > +(eq_attr "pred_clobber" "no"))
> >
> > (and (eq_attr "arch" "rcpc8_4")
> >  (match_test "AARCH64_ISA_RCPC8_4"))
> > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
> >  (match_test "TARGET_SVE"))
> >
> > (and (eq_attr "arch" "sme")
> > -(match_test "TARGET_SME")))
> > +(match_test "TARGET_SME"))
> > +
> > +   (and (eq_attr "pred_clobber" "yes")
> > +(match_test "TARGET_SVE_PRED_CLOBBER")))
> 
> IMO it'd be bettero handle pred_clobber separately from arch, as a new
> top-level AND:
> 
>   (and
> (ior
>   (eq_attr "pred_clobber" "no")
>   (match_test "!TARGET_..."))
> (ior
>   ...existing arch tests...))
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def
(AVOID_PRED_RMW): New.
* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
* config/aarch64/aarch64.md (pred_clobber): New.
(arch_enabled): Use it.

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 
d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb
 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
AVOID_CROSS_LOOP_FMA)
 
 AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
 
+/* Enable if the target prefers to use a fresh register for predicate outputs
+   rather than re-use an input predicate register.  */
+AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 enabled through +gcs.  */
 #define TARGET_GCS (AARCH64_ISA_GCS)
 
+/*  Prefer different predicate registers for the output of a predicated 
operation over
+re-using an existing input predicate.  */
+#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
+&& (aarch64_tune_params.extra_tuning_flags \
+& AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
 
 /* Standard register usage.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
dbde066f7478bec51a8703b017ea553aa98be309..52e5adba4172e14b794b5df9394e58ce49ef8b7f
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/con

[PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores.

2024-05-22 Thread Tamar Christina
Hi All,

This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
It is kept off for generic codegen.

Note the reason for the +sve pragma, even though these tests are in
aarch64-sve.exp: if the testsuite is run with SVE forced off, e.g.
-march=armv8-a+nosve, then the intrinsics end up being disabled, because
the -march is preferred over the -mcpu even though the -mcpu comes later.

This prevents the tests from failing in such runs.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add
AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
* config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add
AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
* config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add
AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred_clobber_1.c: New test.
* gcc.target/aarch64/sve/pred_clobber_2.c: New test.
* gcc.target/aarch64/sve/pred_clobber_3.c: New test.
* gcc.target/aarch64/sve/pred_clobber_4.c: New test.
* gcc.target/aarch64/sve/pred_clobber_5.c: New test.

---
diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h 
b/gcc/config/aarch64/tuning_models/neoversen2.h
index 
7e799bbe762fe862e31befed50e54040a7fd1f2f..be9a48ac3adc097f967c217fe09dcac194d7d14f
 100644
--- a/gcc/config/aarch64/tuning_models/neoversen2.h
+++ b/gcc/config/aarch64/tuning_models/neoversen2.h
@@ -236,7 +236,8 @@ static const struct tune_params neoversen2_tunings =
   (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
| AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
| AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS   /* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h 
b/gcc/config/aarch64/tuning_models/neoversev1.h
index 
9363f2ad98a5279cc99f2f9b1509ba921d582e84..0fc41ce6a41b3135fa06d2bda1f517fdf4f8dbcf
 100644
--- a/gcc/config/aarch64/tuning_models/neoversev1.h
+++ b/gcc/config/aarch64/tuning_models/neoversev1.h
@@ -227,7 +227,8 @@ static const struct tune_params neoversev1_tunings =
   (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
| AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
| AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
-   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h 
b/gcc/config/aarch64/tuning_models/neoversev2.h
index 
bc01ed767c9b690504eb98456402df5d9d64eee3..f76e4ef358f7dfb9c7d7b470ea7240eaa2120f8e
 100644
--- a/gcc/config/aarch64/tuning_models/neoversev2.h
+++ b/gcc/config/aarch64/tuning_models/neoversev2.h
@@ -236,7 +236,8 @@ static const struct tune_params neoversev2_tunings =
   (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
| AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
| AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS   /* stp_policy_model.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c
new file mode 100644
index 
..934a00a38531c5fd4139d99ff33414904b2c104f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=neoverse-n2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC target "+sve"
+
+#include 
+
+extern void use(svbool_t);
+
+/*
+** foo:
+** ...
+** ptrue   p([1-9][0-9]?).b, all
+** cmplo   p0.h, p\1/z, z0.h, z[0-9]+.h
+** ...
+*/
+void foo (svuint16_t a, uint16_t b)
+{
+svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b);
+use (p0);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c
new file mode 100644
index 
..58badb66a43b1ac50eeec153b9cac44fc831b145
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=neovers

[PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Tamar Christina
Hi All,

This patch adds new alternatives to the patterns which are affected.  The new
alternatives with the conditional early clobbers are added before the normal
ones in order for LRA to prefer them in the event that we have enough free
registers to accommodate them.

In case register pressure is too high, the normal alternatives will be
preferred before a reload is considered, as we would rather have the tie
than a spill.

Tests are in the next patch.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (and3,
@aarch64_pred__z, *3_cc,
*3_ptest, aarch64_pred__z,
*3_cc, *3_ptest,
aarch64_pred__z, *3_cc,
*3_ptest, @aarch64_pred_cmp,
*cmp_cc, *cmp_ptest,
@aarch64_pred_cmp_wide,
*aarch64_pred_cmp_wide_cc,
*aarch64_pred_cmp_wide_ptest, @aarch64_brk,
*aarch64_brk_cc, *aarch64_brk_ptest,
@aarch64_brk, *aarch64_brkn_cc, *aarch64_brkn_ptest,
*aarch64_brk_cc, *aarch64_brk_ptest,
aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
*aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber
alternative.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_): Likewise.

---
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c297428c85fe46
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z"
  (reg:VNx16BI FFRT_REGNUM)
  (match_operand:VNx16BI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1   ]
- [ Upa , Upa ] rdffr\t%0.b, %1/z
+  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
+ [ &Upa, Upa; yes ] rdffr\t%0.b, %1/z
+ [ ?Upa, Upa; yes ] ^
+ [ Upa , Upa; *   ] ^
   }
 )
 
@@ -1179,8 +1181,10 @@ (define_insn "*aarch64_rdffr_z_ptest"
  UNSPEC_PTEST))
(clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1   ]
- [ Upa , Upa ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
+ [ &Upa, Upa; yes ] rdffrs\t%0.b, %1/z
+ [ ?Upa, Upa; yes ] ^
+ [ Upa , Upa; *   ] ^
   }
 )
 
@@ -1195,8 +1199,10 @@ (define_insn "*aarch64_rdffr_ptest"
  UNSPEC_PTEST))
(clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1   ]
- [ Upa , Upa ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
+ [ &Upa, Upa; yes ] rdffrs\t%0.b, %1/z
+ [ ?Upa, Upa; yes ] ^
+ [ Upa , Upa; *   ] ^
   }
 )
 
@@ -1216,8 +1222,10 @@ (define_insn "*aarch64_rdffr_z_cc"
  (reg:VNx16BI FFRT_REGNUM)
  (match_dup 1)))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1   ]
- [ Upa , Upa ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
+ [ &Upa, Upa; yes ] rdffrs\t%0.b, %1/z
+ [ ?Upa, Upa; yes ] ^
+ [ Upa , Upa; *   ] ^
   }
 )
 
@@ -1233,8 +1241,10 @@ (define_insn "*aarch64_rdffr_cc"
(set (match_operand:VNx16BI 0 "register_operand")
(reg:VNx16BI FFRT_REGNUM))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1   ]
- [ Upa , Upa ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
+ [ &Upa, Upa; yes ] rdffrs\t%0.b, %1/z
+ [ ?Upa, Upa; yes ] ^
+ [ Upa , Upa; *   ] ^
   }
 )
 
@@ -6651,8 +6661,10 @@ (define_insn "and3"
(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
  (match_operand:PRED_ALL 2 "register_operand")))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2   ]
- [ Upa , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
+  {@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
+ [ &Upa, Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b
+ [ ?Upa, Upa, Upa; yes ] ^
+ [ Upa , Upa, Upa; *   ] ^
   }
 )
 
@@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred__z"
(match_operand:PRED_ALL 3 "register_operand"))
  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3   ]
- [ Upa , Upa, Upa, Upa ] \t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  ; attrs: pred_clobber ]
+ [ &Upa, Upa, Upa, Upa; yes ] \t%0.b, %1/z, 
%2.b, %3.b
+ [ ?Upa, Upa, Upa, Upa; yes ] ^
+ [ Upa , Upa, Upa, Upa; *   ] ^
   }
 )
 
@@ -6703,8 +6717,10 @@ (define_insn "*3_cc"
(and:PRED_ALL (LOGICAL:PRED_ALL (ma

Re: [PATCH] Add %[zt][diox] support to pretty-print

2024-05-22 Thread Jakub Jelinek
On Wed, May 22, 2024 at 05:23:33PM +0800, YunQiang Su wrote:
> Jakub Jelinek  wrote on Wed, May 22, 2024 at 17:14:
> >
> > On Wed, May 22, 2024 at 05:05:30PM +0800, YunQiang Su wrote:
> > > > --- gcc/gcc.cc.jj   2024-02-09 14:54:09.141489744 +0100
> > > > +++ gcc/gcc.cc  2024-02-09 22:04:37.655678742 +0100
> > > > @@ -2410,8 +2410,7 @@ read_specs (const char *filename, bool m
> > > >   if (*p1++ != '<' || p[-2] != '>')
> > > > fatal_error (input_location,
> > > >  "specs %%include syntax malformed after "
> > > > -"%ld characters",
> > > > -(long) (p1 - buffer + 1));
> > > > +"%td characters", p1 - buffer + 1);
> > > >
> > >
> > > Should we use %td later for gcc itself? Since we may use older
> > > compiler to build gcc.
> > > My major workstation is Debian Bookworm, which has GCC 12, and then I
> > > get some warnings:
> >
> > That is fine and expected.  During stage1 such warnings are intentionally
> > not fatal, only in stage2+ when we know it is the same version of gcc
> > we want those can be fatal.
> 
> The build may have only one stage in some cases.
> For example, we may have a full binutils/libc stack and just build a cross-gcc.
> For all target libraries, such as libgcc etc., it is OK, but for host
> executables it will be a problem.

That is still ok; it is just a warning about unknown gcc format specifiers.
At runtime the code from the compiler being built will be used, and that
handles those.  We have added dozens of these over the years; %td/%zd
certainly aren't an exception.  Just try to build with some older gcc
version, say 4.8.5, and you'll see far more such warnings.
But also, as recommended, you shouldn't be building a cross-gcc with an old
version of gcc; you should use the same version of the native compiler to
build the cross compiler.

https://gcc.gnu.org/install/build.html

"To build a cross compiler, we recommend first building and installing a native
compiler. You can then use the native GCC compiler to build the cross
compiler."

Jakub



Re: [PATCH] Fix mixed input kind permute optimization

2024-05-22 Thread Richard Biener
On Wed, 22 May 2024, Richard Sandiford wrote:

> Richard Sandiford  writes:
> > Richard Biener  writes:
> >> When change_vec_perm_layout runs into a permute combining two
> >> nodes where one is invariant and one internal the partition of
> >> one input can be -1 but the other might not be.  The following
> >> supports this case by simply ignoring inputs with input partiton -1.
> >>
> >> I'm not sure this is correct but it avoids ICEing when accessing
> >> that partitions layout for gcc.target/i386/pr98928.c with the
> >> change to avoid splitting store dataref groups during SLP discovery.
> >>
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (ontop of
> >> the SLP series).  The change can't break anything that's already
> >> broken but I'm not sure this does the right thing - the testcase
> >> has an uniform constant.  I'll try to come up with a better runtime
> >> testcase tomorrow.  Hints as to where to correctly fix such case
> >> appreciated.
> >
> > Famous last words, but yeah, it looks correct to me.  I think the
> > routine in principle should have a free choice of which layout to
> > choose for invariants (as long as it's consistent for all queries
> > about the same node).  So it should just be a question of whether
> > keeping the original layout is more likely to give a valid
> > permutation, or whether going with out_layout_i would be better.
> > I don't have a strong intuition either way.
> 
> BTW, I should have said that using a different layout from 0
> would require compensating code in the materialize function.
> So this is definitely the simplest and most direct fix.

Yeah, I guess we can improve on that later.  I'm going to push the
change after lunch together with the other two fixes - the ARM CI
discovered its share of testsuite fallout for the actual change
I'm going to look at.

Richard.


[PATCH] tree-optimization/115144 - improve sinking destination choice

2024-05-22 Thread Richard Biener
When sinking code closer to its uses we already try to minimize the
distance we move by inserting at the start of the basic-block.  The
following makes sure to sink closest to the control dependence
check of the region we want to sink to, and to ignore control
dependences that only guard exceptional code.
This somewhat restores the old profile check, but without requiring
nearly even probabilities.  The patch also makes sure not to give
up completely when the best sink location is one we do not want to
sink to, but to then choose the next best one.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu after
a minor fix.

PR tree-optimization/115144
* tree-ssa-sink.cc (do_not_sink): New function, split out
from ...
(select_best_block): Here.  First pick valid block to
sink to.  From that search for the best valid block,
avoiding sinking across conditions to exceptional code.

* gcc.dg/tree-ssa/ssa-sink-22.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c |  14 +++
 gcc/tree-ssa-sink.cc| 101 +---
 2 files changed, 82 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..e35626d4070
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink1-details" } */
+
+extern void abort (void);
+
+int foo (int x, int y, int f)
+{
+  int tem = x / y;
+  if (f)
+abort ();
+  return tem;
+}
+
+/* { dg-final { scan-tree-dump-not "Sinking" "sink1" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2188b7523c7..a06b43e61af 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -172,6 +172,38 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
   return commondom;
 }
 
+/* Return whether sinking STMT from EARLY_BB to BEST_BB should be avoided.  */
+
+static bool
+do_not_sink (gimple *stmt, basic_block early_bb, basic_block best_bb)
+{
+  /* Placing a statement before a setjmp-like function would be invalid
+ (it cannot be reevaluated when execution follows an abnormal edge).
+ If we selected a block with abnormal predecessors, just punt.  */
+  if (bb_has_abnormal_pred (best_bb))
+return true;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+ something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+  && empty_block_p (best_bb))
+return true;
+
+  /* Avoid turning an unconditional read into a conditional one when we
+ still might want to perform vectorization.  */
+  if (best_bb->loop_father == early_bb->loop_father
+  && loop_outer (best_bb->loop_father)
+  && !best_bb->loop_father->inner
+  && gimple_vuse (stmt)
+  && flag_tree_loop_vectorize
+  && !(cfun->curr_properties & PROP_loop_opts_done)
+  && dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, early_bb)
+  && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, 
best_bb))
+return true;
+
+  return false;
+}
+
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
statements.
@@ -185,54 +217,57 @@ select_best_block (basic_block early_bb,
   basic_block late_bb,
   gimple *stmt)
 {
+  /* First pick a block we do not disqualify.  */
+  while (late_bb != early_bb
+&& do_not_sink (stmt, early_bb, late_bb))
+late_bb = get_immediate_dominator (CDI_DOMINATORS, late_bb);
+
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-
   while (temp_bb != early_bb)
 {
   /* Walk up the dominator tree, hopefully we'll find a shallower
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 
+  /* Do not consider blocks we do not want to sink to.  */
+  if (temp_bb != early_bb && do_not_sink (stmt, early_bb, temp_bb))
+   ;
+
   /* If we've moved into a lower loop nest, then that becomes
 our best block.  */
-  if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
+  else if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
-}
 
-  /* Placing a statement before a setjmp-like function would be invalid
- (it cannot be reevaluated when execution follows an abnormal edge).
- If we selected a block with abnormal predecessors, just punt.  */
-  if (bb_has_abnormal_pred (best_bb))
-return early_bb;
-
-  /* If we found a shallower loop nest, then we always consider that
- a win.  This will always give us the most control dependent block
- within that loop nest.  */
-  if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
-retu

Re: [PATCH] Add %[zt][diox] support to pretty-print

2024-05-22 Thread YunQiang Su
Jakub Jelinek  wrote on Wed, May 22, 2024 at 17:33:
>
> On Wed, May 22, 2024 at 05:23:33PM +0800, YunQiang Su wrote:
> > Jakub Jelinek  wrote on Wed, May 22, 2024 at 17:14:
> > >
> > > On Wed, May 22, 2024 at 05:05:30PM +0800, YunQiang Su wrote:
> > > > > --- gcc/gcc.cc.jj   2024-02-09 14:54:09.141489744 +0100
> > > > > +++ gcc/gcc.cc  2024-02-09 22:04:37.655678742 +0100
> > > > > @@ -2410,8 +2410,7 @@ read_specs (const char *filename, bool m
> > > > >   if (*p1++ != '<' || p[-2] != '>')
> > > > > fatal_error (input_location,
> > > > >  "specs %%include syntax malformed after "
> > > > > -"%ld characters",
> > > > > -(long) (p1 - buffer + 1));
> > > > > +"%td characters", p1 - buffer + 1);
> > > > >
> > > >
> > > > Should we defer using %td for gcc itself, since we may use an older
> > > > compiler to build gcc?
> > > > My major workstation is Debian Bookworm, which has GCC 12, and then I
> > > > get some warnings:
> > >
> > > That is fine and expected.  During stage1 such warnings are intentionally
> > > not fatal; only in stage2+, when we know it is the same version of gcc,
> > > do we want them to be fatal.
> >
> > The build may have only one stage in some cases.
> > For example, we may have a full binutils/libc stack and just build a cross-gcc.
> > For all target libraries, such as libgcc etc., it is OK, but for host
> > executables it will be a problem.
>
> That is still ok; it is just a warning about unknown gcc format specifiers.
> At runtime the code from the compiler being built will be used, and that
> handles those.  We have added dozens of these over the years; %td/%zd
> certainly aren't an exception.  Just try to build with some older gcc
> version, say 4.8.5, and you'll see far more such warnings.

Thanks for your explanation.  It's OK for me if it can work well at runtime.

> But also, as recommended, you shouldn't be building a cross-gcc with an old
> version of gcc; you should use the same version of the native compiler to
> build the cross compiler.
>
> https://gcc.gnu.org/install/build.html
>
> "To build a cross compiler, we recommend first building and installing a 
> native
> compiler. You can then use the native GCC compiler to build the cross
> compiler."
>
> Jakub
>


-- 
YunQiang Su


[PATCH] RISC-V: Add Zfbfmin extension

2024-05-22 Thread Xiao Zeng
1 In the previous patch, the libcall for BF16 was implemented:


2 RISC-V provides the Zfbfmin extension, which covers the "Scalar BF16 Converts":


3 Implemented replacing the libcalls with Zfbfmin extension instructions.

4 Reused previous testcases in:


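As a quick illustration (an editorial sketch; the option string and the
exact instructions emitted are assumptions, not part of the patch), these
C conversions would exercise the new patterns:

    /* Compile with, e.g., -march=rv64ifd_zfbfmin (illustrative).  */
    __bf16 f (float x)  { return (__bf16) x; } /* truncsfbf2: fcvt.bf16.s */
    __bf16 g (double x) { return (__bf16) x; } /* truncdfbf2: DF -> SF -> BF */
    float  h (__bf16 x) { return (float) x;  } /* extendbfsf2: fcvt.s.bf16 */
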
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_output_move): Handle BFmode move
for zfbfmin.
* config/riscv/riscv.md (truncsfbf2): New pattern for BFmode.
(trunchfbf2): Ditto.
(truncdfbf2): Ditto.
(trunctfbf2): Ditto.
(extendbfsf2): Ditto.
(*movhf_hardfloat): Add BFmode.
(*mov_hardfloat): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin-bf16_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin-bf16_comparison.c: New test.
* gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c: New test.
---
 gcc/config/riscv/riscv.cc |  4 +-
 gcc/config/riscv/riscv.md | 75 +--
 .../riscv/zfbfmin-bf16_arithmetic.c   | 35 +
 .../riscv/zfbfmin-bf16_comparison.c   | 33 
 .../zfbfmin-bf16_float_libcall_convert.c  | 45 +++
 .../zfbfmin-bf16_integer_libcall_convert.c| 66 
 6 files changed, 249 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_arithmetic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_comparison.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d0c22058b8c..7c6bafedda3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4106,7 +4106,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.x.h\t%0,%1";
/* Using fmv.x.s + sign-extend to emulate fmv.x.h.  */
return "fmv.x.s\t%0,%1;slli\t%0,%0,16;srai\t%0,%0,16";
@@ -4162,7 +4162,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.h.x\t%0,%z1";
/* High 16 bits should be all-1, otherwise HW will treated
   as a n-bit canonical NaN, but isn't matter for softfloat.  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 78c16adee98..7fd2e3aa23e 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1763,6 +1763,57 @@
   [(set_attr "type" "fcvt")
(set_attr "mode" "HF")])
 
+(define_insn "truncsfbf2"
+  [(set (match_operand:BF0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:SF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  "fcvt.bf16.s\t%0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+;; The conversion of HF/DF/TF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __trunc[h|d|t]fbf2.
+(define_expand "trunchfbf2"
+  [(set (match_operand:BF0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:HF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, HFmode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, DFmode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+(define_expand "trunctfbf2"
+  [(set (match_operand:BF0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:TF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, TFmode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
 ;;
 ;;  
 ;;
@@ -1907,6 +1958,15 @@
   [(set_attr "type" "fcvt")
(set_attr "mode" "SF")])
 
+(define_insn "extendbfsf2"
+  [(set (match_operand:SF0 "register_operand" "=f")
+   (float_extend:SF
+  (match_operand:BF 1 "regi

Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This patch adds new alternatives to the patterns which are affected.  The new
> alternatives with the conditional early clobbers are added before the normal
> ones in order for LRA to prefer them in the event that we have enough free
> registers to accommodate them.
>
> In case register pressure is too high, the normal alternatives will be
> preferred before a reload is considered, as we would rather have the tie
> than a spill.
>
> Tests are in the next patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve.md (and3,
>   @aarch64_pred__z, *3_cc,
>   *3_ptest, aarch64_pred__z,
>   *3_cc, *3_ptest,
>   aarch64_pred__z, *3_cc,
>   *3_ptest, @aarch64_pred_cmp,
>   *cmp_cc, *cmp_ptest,
>   @aarch64_pred_cmp_wide,
>   *aarch64_pred_cmp_wide_cc,
>   *aarch64_pred_cmp_wide_ptest, @aarch64_brk,
>   *aarch64_brk_cc, *aarch64_brk_ptest,
>   @aarch64_brk, *aarch64_brkn_cc, *aarch64_brkn_ptest,
>   *aarch64_brk_cc, *aarch64_brk_ptest,
>   aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
>   *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber
>   alternative.
>   * config/aarch64/aarch64-sve2.md
>   (@aarch64_pred_): Likewise.
>
> ---
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c297428c85fe46
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z"
> (reg:VNx16BI FFRT_REGNUM)
> (match_operand:VNx16BI 1 "register_operand")))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  {@ [ cons: =0, 1   ]
> - [ Upa , Upa ] rdffr\t%0.b, %1/z
> +  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
> + [ &Upa, Upa; yes ] rdffr\t%0.b, %1/z
> + [ ?Upa, Upa; yes ] ^
> + [ Upa , Upa; *   ] ^
>}
>  )

Sorry for not explaining it very well, but in the previous review I suggested:

> The gather-like approach would be something like:
>
>  [ &Upa , Upl , w , ; yes ] 
> cmp\t%0., %1/z, %3., #%4
>  [ ?Upl , 0   , w , ; yes ] ^
>  [ Upa  , Upl , w , ; no  ] ^
>  [ &Upa , Upl , w , w; yes ] 
> cmp\t%0., %1/z, %3., %4.
>  [ ?Upl , 0   , w , w; yes ] ^
>  [ Upa  , Upl , w , w; no  ] ^
>
> with:
>
>   (define_attr "pred_clobber" "any,no,yes" (const_string "any"))

(with emphasis on the last line).  What I didn't say explicitly is
that "no" should require !TARGET_SVE_PRED_CLOBBER.

The premise of that review was that we shouldn't enable things like:

 [ Upa  , Upl , w , w; no  ] ^

for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber
alternative.  So we should enable either the pred_clobber=yes
alternatives or the pred_clobber=no alternatives, but not both.

The default "any" is then for other non-predicate instructions that
don't care about TARGET_SVE_PRED_CLOBBER either way.
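
A sketch of that condition in "arch_enabled" terms (an illustration of the
suggestion, with assumed formatting, not a tested change):

    (and
      (ior
        (eq_attr "pred_clobber" "any")
        (and (eq_attr "pred_clobber" "yes")
             (match_test "TARGET_SVE_PRED_CLOBBER"))
        (and (eq_attr "pred_clobber" "no")
             (match_test "!TARGET_SVE_PRED_CLOBBER")))
      (ior
        ...existing arch tests...))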

In contrast, this patch makes pred_clobber=yes enable the alternatives
that correctly describe the restriction (good!) but then also enables
the normal alternatives too, which IMO makes the semantics unclear.

Thanks,
Richard

>  
> @@ -1179,8 +1181,10 @@ (define_insn "*aarch64_rdffr_z_ptest"
> UNSPEC_PTEST))
> (clobber (match_scratch:VNx16BI 0))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  {@ [ cons: =0, 1   ]
> - [ Upa , Upa ] rdffrs\t%0.b, %1/z
> +  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
> + [ &Upa, Upa; yes ] rdffrs\t%0.b, %1/z
> + [ ?Upa, Upa; yes ] ^
> + [ Upa , Upa; *   ] ^
>}
>  )
>  
> @@ -1195,8 +1199,10 @@ (define_insn "*aarch64_rdffr_ptest"
> UNSPEC_PTEST))
> (clobber (match_scratch:VNx16BI 0))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  {@ [ cons: =0, 1   ]
> - [ Upa , Upa ] rdffrs\t%0.b, %1/z
> +  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
> + [ &Upa, Upa; yes ] rdffrs\t%0.b, %1/z
> + [ ?Upa, Upa; yes ] ^
> + [ Upa , Upa; *   ] ^
>}
>  )
>  
> @@ -1216,8 +1222,10 @@ (define_insn "*aarch64_rdffr_z_cc"
> (reg:VNx16BI FFRT_REGNUM)
> (match_dup 1)))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  {@ [ cons: =0, 1   ]
> - [ Upa , Upa ] rdffrs\t%0.b, %1/z
> +  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
> + [ &Upa, Upa; yes ] rdffrs\t%0.b, %1/z
> + [ ?Upa, Upa; yes ] ^
> + [ Upa , Upa; *   ] ^
>}
>  )
>  
> @@ -123

[PATCH v2 1/2] driver: Use <triplet>-as/ld/objcopy as final fallback instead of native ones for cross

2024-05-22 Thread YunQiang Su
If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain,
the final fallback is the system's `as/ld`.  In fact, we can first try
<triplet>-as/ld/objcopy before falling back to the native as/ld/objcopy.
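For illustration, a self-contained sketch of the probe this patch performs
(the triplet comes from DEFAULT_REAL_TARGET_MACHINE in the real code; the
helper name below is made up): run "<triplet>-as --version" through
libiberty's pex_one and treat a clean exit as "the prefixed tool exists".

#include <stddef.h>
#include "libiberty.h"

/* Return nonzero if PROG can be executed successfully with --version,
   i.e. the prefixed tool is present somewhere in PATH.  */
static int
prefixed_tool_exists (const char *prog)
{
  const char *argv[] = { prog, "--version", NULL };
  int status = 0, err = 0;
  const char *errmsg = pex_one (PEX_SEARCH, prog, (char **) argv, prog,
				NULL, NULL, &status, &err);
  return errmsg == NULL && status == 0 && err == 0;
}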

This patch is derived from Debian's patch:
  gcc-search-prefixed-as-ld.diff

gcc
* gcc.cc (execute): Look for <triplet>-as/ld/objcopy before falling
back to native as/ld/objcopy.
---
 gcc/gcc.cc | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 830a4700a87..3dc6348d761 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -3293,6 +3293,26 @@ execute (void)
   string = find_a_program(commands[0].prog);
   if (string)
commands[0].argv[0] = string;
+  else if (*cross_compile != '0'
+   && !strcmp (commands[0].argv[0], commands[0].prog)
+   && (!strcmp (commands[0].prog, "as")
+   || !strcmp (commands[0].prog, "ld")
+   || !strcmp (commands[0].prog, "objcopy")))
+   {
+ string = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
+   commands[0].prog, NULL);
+ const char *string_args[] = {string, "--version", NULL};
+ int exit_status = 0;
+ int err = 0;
+ const char *errmsg = pex_one (PEX_SEARCH, string,
+ CONST_CAST (char **, string_args), string,
+ NULL, NULL, &exit_status, &err);
+ if (errmsg == NULL && exit_status == 0 && err == 0)
+   {
+ commands[0].argv[0] = string;
+ commands[0].prog = string;
+   }
+   }
 }
 
   for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
-- 
2.39.2



[PATCH v2 2/2] driver: Search <triplet>-as/ld/objcopy before non-triple ones

2024-05-22 Thread YunQiang Su
When looking for as/ld/objcopy, `find_a_program/file_at_path` only
tries the raw name; it won't find the one with a <triplet>- prefix.

This patch is derived from Debian's patch:
gcc-search-prefixed-as-ld.diff

gcc
* gcc.cc (for_each_path): Add more space for the <triplet>- prefix.
(file_at_path): Search <triplet>-as/ld/objcopy before the
non-prefixed ones.
---
 gcc/gcc.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 3dc6348d761..0fa2eafea84 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -2820,6 +2820,8 @@ for_each_path (const struct path_prefix *paths,
{
  len = paths->max_len + extra_space + 1;
  len += MAX (MAX (suffix_len, multi_os_dir_len), multiarch_len);
+ /* triplet prefix for as, ld.  */
+ len += MAX (strlen (DEFAULT_REAL_TARGET_MACHINE), multiarch_len) + 2;
  path = XNEWVEC (char, len);
}
 
@@ -3033,6 +3035,17 @@ file_at_path (char *path, void *data)
   struct file_at_path_info *info = (struct file_at_path_info *) data;
   size_t len = strlen (path);
 
+  /* search for the -as / -ld / objcopy first.  */
+  if (! strcmp (info->name, "as") || ! strcmp (info->name, "ld")
+   || ! strcmp (info->name, "objcopy"))
+{
+  struct file_at_path_info prefix_info = *info;
+  prefix_info.name = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
+   info->name, NULL);
+  prefix_info.name_len = strlen (prefix_info.name);
+  if (file_at_path (path, &prefix_info))
+   return path;
+}
   memcpy (path + len, info->name, info->name_len);
   len += info->name_len;
 
-- 
2.39.2



Re: [Patch, aarch64, middle-end] v3: Move pair_fusion pass from aarch64 to middle-end

2024-05-22 Thread Alex Coplan
Hi Ajit,

You need to remove the header dependencies that are no longer required
for aarch64-ldp-fusion.o in t-aarch64 (not forgetting to update the
ChangeLog).  A few other minor nits below.

LGTM with those changes, but you'll need Richard S to approve.

Thanks a lot for doing this.

On 22/05/2024 00:16, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All comments are addressed.
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target-independent code is the generic code, with pure virtual
> functions to interface between target-independent and -dependent
> code.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target-independent code is the generic code, with pure virtual
> functions to interface between target-independent and -dependent
> code.
> 
> 2024-05-22  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * pair-fusion.h: Generic header code for load store pair fusion
>   that can be shared across different architectures.
>   * pair-fusion.cc: Generic source code implementation for
>   load store pair fusion that can be shared across different 
> architectures.
>   * Makefile.in: Add new object file pair-fusion.o.
>   * config/aarch64/aarch64-ldp-fusion.cc: Delete generic code and move it
>   to pair-fusion.cc in the middle-end.
>   * config/aarch64/t-aarch64: Add header file dependency on pair-fusion.h.
> ---
>  gcc/Makefile.in  |1 +
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 3298 +-
>  gcc/config/aarch64/t-aarch64 |2 +-
>  gcc/pair-fusion.cc   | 3013 
>  gcc/pair-fusion.h|  193 ++
>  5 files changed, 3286 insertions(+), 3221 deletions(-)
>  create mode 100644 gcc/pair-fusion.cc
>  create mode 100644 gcc/pair-fusion.h
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..643342f623d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1563,6 +1563,7 @@ OBJS = \
>   ipa-strub.o \
>   ipa.o \
>   ira.o \
> + pair-fusion.o \
>   ira-build.o \
>   ira-costs.o \
>   ira-conflicts.o \
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 085366cdf68..0af927231d3 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc

> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
> index 78713558e7d..bdada08be70 100644
> --- a/gcc/config/aarch64/t-aarch64
> +++ b/gcc/config/aarch64/t-aarch64
> @@ -203,7 +203,7 @@ aarch64-early-ra.o: 
> $(srcdir)/config/aarch64/aarch64-early-ra.cc \
>  aarch64-ldp-fusion.o: $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc \
>  $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \
>  $(RTL_SSA_H) cfgcleanup.h tree-pass.h ordered-hash-map.h tree-dfa.h \
> -fold-const.h tree-hash-traits.h print-tree.h
> +fold-const.h tree-hash-traits.h print-tree.h pair-fusion.h

So now you also need to remove the deps on the includes removed in the latest
version of the patch.

>   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>   $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc
>  
> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> new file mode 100644
> index 000..827b88cf2fc
> --- /dev/null
> +++ b/gcc/pair-fusion.cc
> @@ -0,0 +1,3013 @@
> +// Pass to fuse adjacent loads/stores into paired memory accesses.
> +// Copyright (C) 2024 Free Software Foundation, Inc.
> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it
> +// under the terms of the GNU General Public License as published by
> +// the Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +//
> +// GCC is distributed in the hope that it will be useful, but
> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// .
> +
> +#define INCLU

Re: [committed][wwwdocs] gcc-12/changes.html: Document RISC-V changes

2024-05-22 Thread Gerald Pfeifer
On Fri, 17 May 2024, Palmer Dabbelt wrote:
> Ya, I guess it's kind of an odd phrasing.  Maybe it should be something like

Yes, this would have helped me understand. Thank you.

>The vector and scalar crypto extensions are now accepted in ISA strings
>via the -march argument.  Note that enabling these extensions
>will only set the coorespending feature test macros and enable assembler
   ^
>support, they don't yet generate binaries with the instructions added in
>these extensions.

Are you going to push this change (with the typo marked above fixed and
maybe "added by these extensions" instead of "...in...")?

Gerald


Re: [PATCH v2] testsuite: Verify r0-r3 are extended with CMSE

2024-05-22 Thread Richard Earnshaw (lists)
On 06/05/2024 12:50, Torbjorn SVENSSON wrote:
> Hi,
> 
> Forgot to mention when I sent the patch that I would like to commit it to the 
> following branches:
> 
> - releases/gcc-11
> - releases/gcc-12
> - releases/gcc-13
> - releases/gcc-14
> - trunk
> 

Well you can [commit it to the release branches], but I'm not sure it's 
essential.  It seems pretty unlikely to me that this would regress on a release 
branch without having first regressed on trunk.

R.

> Kind regards,
> Torbjörn
> 
> On 2024-05-02 12:50, Torbjörn SVENSSON wrote:
>> Add regression test to the existing zero/sign extend tests for CMSE to
>> verify that r0, r1, r2 and r3 are properly extended, not just r0.
>>
>> boolCharShortEnumSecureFunc test is done using -O0 to ensure the
>> instructions are in a predictable order.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/arm/cmse/extend-param.c: Add regression test. Add
>>   -fshort-enums.
>> * gcc.target/arm/cmse/extend-return.c: Add -fshort-enums option.
>>
>> Signed-off-by: Torbjörn SVENSSON 
>> ---
>>   .../gcc.target/arm/cmse/extend-param.c    | 21 +++
>>   .../gcc.target/arm/cmse/extend-return.c   |  4 ++--
>>   2 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
>> b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>> index 01fac786238..d01ef87e0be 100644
>> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>> @@ -1,5 +1,5 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-mcmse" } */
>> +/* { dg-options "-mcmse -fshort-enums" } */
>>   /* { dg-final { check-function-bodies "**" "" "" } } */
>>     #include 
>> @@ -78,7 +78,6 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
>> (enum offset index) {
>>     if (index >= ARRAY_SIZE)
>>   return 0;
>>     return array[index];
>> -
>>   }
>>     /*
>> @@ -88,9 +87,23 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
>> (enum offset index) {
>>   **    ...
>>   */
>>   __attribute__((cmse_nonsecure_entry)) char boolSecureFunc (bool index) {
>> -
>>     if (index >= ARRAY_SIZE)
>>   return 0;
>>     return array[index];
>> +}
>>   -}
>> \ No newline at end of file
>> +/*
>> +**__acle_se_boolCharShortEnumSecureFunc:
>> +**    ...
>> +**    uxtb    r0, r0
>> +**    uxtb    r1, r1
>> +**    uxth    r2, r2
>> +**    uxtb    r3, r3
>> +**    ...
>> +*/
>> +__attribute__((cmse_nonsecure_entry,optimize(0))) char 
>> boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, enum 
>> offset d) {
>> +  size_t index = a + b + c + d;
>> +  if (index >= ARRAY_SIZE)
>> +    return 0;
>> +  return array[index];
>> +}
>> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
>> b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>> index cf731ed33df..081de0d699f 100644
>> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>> @@ -1,5 +1,5 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-mcmse" } */
>> +/* { dg-options "-mcmse -fshort-enums" } */
>>   /* { dg-final { check-function-bodies "**" "" "" } } */
>>     #include 
>> @@ -89,4 +89,4 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
>> (ns_enum_foo_t * ns_foo_p)
>>   unsigned char boolNonsecure0 (ns_bool_foo_t * ns_foo_p)
>>   {
>>     return ns_foo_p ();
>> -}
>> \ No newline at end of file
>> +}



Re: [PATCH 4/4] Testsuite updates

2024-05-22 Thread Richard Biener
On Tue, 21 May 2024, Richard Biener wrote:

> The gcc.dg/vect/slp-12a.c case is interesting as we currently split
> the 8 store group into lanes 0-5 which we SLP with an unroll factor
> of two (on x86-64 with SSE) and the remaining two lanes are using
> interleaving vectorization with a final unroll factor of four.  Thus
> we're using hybrid SLP within a single store group.  After the change
> we discover the same 0-5 lane SLP part as well as two single-lane
> parts feeding the full store group.  But that results in a load
> permutation that isn't supported (I have WIP patchs to rectify that).
> So we end up cancelling SLP and vectorizing the whole loop with
> interleaving which is IMO good and results in better code.
> 
> This is similar for gcc.target/i386/pr52252-atom.c where interleaving
> generates much better code than hybrid SLP.  I'm unsure how to update
> the testcase though.
> 
> gcc.dg/vect/slp-21.c runs into similar situations.  Note that when
> when analyzing SLP operations we discard an instance we currently
> force the full loop to have no SLP because hybrid detection is
> broken.  It's probably not worth fixing this at this moment.
> 
> For gcc.dg/vect/pr97428.c we are not splitting the 16 store group
> into two but merge the two 8 lane loads into one before doing the
> store and thus have only a single SLP instance.  A similar situation
> happens in gcc.dg/vect/slp-11c.c but the branches feeding the
> single SLP store only have a single lane.  Likewise for
> gcc.dg/vect/vect-complex-5.c and gcc.dg/vect/vect-gather-2.c.
> 
> gcc.dg/vect/slp-cond-1.c has an additional SLP vectorization
> with a SLP store group of size two but two single-lane branches.
> 
> gcc.target/i386/pr98928.c ICEs in SLP permute optimization
> because we don't expect a constant and internal branch to be
> merged with a permute node in
> vect_optimize_slp_pass::change_vec_perm_layout:4859 (the only
> permutes merging two SLP nodes are two-operator nodes right now).
> This still requires fixing.
> 
> The whole series has been bootstrapped and tested on 
> x86_64-unknown-linux-gnu with the gcc.target/i386/pr98928.c FAIL
> unfixed.
> 
> Comments welcome (and hello ARM CI), RISC-V and other arch
> testing appreciated.  Unless there are comments to the contrary
> I plan to push patch 1 and 2 tomorrow.

RISC-V CI didn't trigger (not sure what magic is required).  Both
ARM and AARCH64 show that the "Vectorizing stmts using SLP" scans are a bit
fragile because we sometimes cancel SLP because we want to use
load/store-lanes.

I have locally scrapped the SLP scanning for gcc.dg/vect/slp-21.c where
it doesn't really matter (and once we are finished with all-SLP it
won't matter anywhere).  I've conditionalized the outcome based on
vect_load_lanes for gcc.dg/vect/slp-11c.c and
gcc.dg/vect/slp-cond-1.c

On AARCH64 additionally gcc.target/aarch64/sve/mask_struct_store_4.c
ICEs, I have a fix for that.

gcc.target/aarch64/pr99873_2.c FAILs because with a single
SLP store group merged from two two-lane load groups we cancel
the SLP and want to use load/store-lanes.  I'll leave this
FAILing or shall I XFAIL it?

Thanks,
Richard.

> Thanks,
> Richard.
> 
>   * gcc.dg/vect/pr97428.c: Expect a single store SLP group.
>   * gcc.dg/vect/slp-11c.c: Likewise.
>   * gcc.dg/vect/vect-complex-5.c: Likewise.
>   * gcc.dg/vect/slp-12a.c: Do not expect SLP.
>   * gcc.dg/vect/slp-21.c: Likewise.
>   * gcc.dg/vect/slp-cond-1.c: Expect one more SLP.
>   * gcc.dg/vect/vect-gather-2.c: Expect SLP to be used.
>   * gcc.target/i386/pr52252-atom.c: XFAIL test for palignr.
> ---
>  gcc/testsuite/gcc.dg/vect/pr97428.c  |  2 +-
>  gcc/testsuite/gcc.dg/vect/slp-11c.c  |  5 +++--
>  gcc/testsuite/gcc.dg/vect/slp-12a.c  |  6 +-
>  gcc/testsuite/gcc.dg/vect/slp-21.c   | 19 +--
>  gcc/testsuite/gcc.dg/vect/slp-cond-1.c   |  2 +-
>  gcc/testsuite/gcc.dg/vect/vect-complex-5.c   |  2 +-
>  gcc/testsuite/gcc.dg/vect/vect-gather-2.c|  1 -
>  gcc/testsuite/gcc.target/i386/pr52252-atom.c |  3 ++-
>  8 files changed, 18 insertions(+), 22 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c 
> b/gcc/testsuite/gcc.dg/vect/pr97428.c
> index 60dd984cfd3..3cc9976c00c 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr97428.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
> @@ -44,5 +44,5 @@ void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
>  /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" 
> "vect" } } */
>  /* We're not able to peel & apply re-aligning to make accesses well-aligned 
> for !vect_hw_misalign,
> but we could by peeling the stores for alignment and applying re-aligning 
> loads.  */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { xfail { ! vect_hw_misalign } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { ! vect_hw_misalign } } } } */
>  /* { dg-final { scan-tr

RE: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, May 22, 2024 10:48 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to
> patterns
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This patch adds new alternatives to the patterns which are affected.  The 
> > new
> > alternatives with the conditional early clobbers are added before the normal
> > ones in order for LRA to prefer them in the event that we have enough free
> > registers to accommodate them.
> >
> > In case register pressure is too high the normal alternatives will be 
> > preferred
> > before a reload is considered as we rather have the tie than a spill.
> >
> > Tests are in the next patch.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-sve.md (and3,
> > @aarch64_pred__z, *3_cc,
> > *3_ptest, aarch64_pred__z,
> > *3_cc, *3_ptest,
> > aarch64_pred__z, *3_cc,
> > *3_ptest, @aarch64_pred_cmp,
> > *cmp_cc, *cmp_ptest,
> > @aarch64_pred_cmp_wide,
> > *aarch64_pred_cmp_wide_cc,
> > *aarch64_pred_cmp_wide_ptest,
> @aarch64_brk,
> > *aarch64_brk_cc, *aarch64_brk_ptest,
> > @aarch64_brk, *aarch64_brkn_cc, *aarch64_brkn_ptest,
> > *aarch64_brk_cc, *aarch64_brk_ptest,
> > aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
> > *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber
> > alternative.
> > * config/aarch64/aarch64-sve2.md
> > (@aarch64_pred_): Likewise.
> >
> > ---
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-
> sve.md
> > index
> e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c
> 297428c85fe46 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z"
> >   (reg:VNx16BI FFRT_REGNUM)
> >   (match_operand:VNx16BI 1 "register_operand")))]
> >"TARGET_SVE && TARGET_NON_STREAMING"
> > -  {@ [ cons: =0, 1   ]
> > - [ Upa , Upa ] rdffr\t%0.b, %1/z
> > +  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
> > + [ &Upa, Upa; yes ] rdffr\t%0.b, %1/z
> > + [ ?Upa, Upa; yes ] ^
> > + [ Upa , Upa; *   ] ^
> >}
> >  )
> 
> Sorry for not explaining it very well, but in the previous review I suggested:
> 
> > The gather-like approach would be something like:
> >
> >  [ &Upa , Upl , w , ; yes ]
> cmp\t%0., %1/z, %3., #%4
> >  [ ?Upl , 0   , w , ; yes ] ^
> >  [ Upa  , Upl , w , ; no  ] ^
> >  [ &Upa , Upl , w , w; yes ] 
> > cmp\t%0., %1/z,
> %3., %4.
> >  [ ?Upl , 0   , w , w; yes ] ^
> >  [ Upa  , Upl , w , w; no  ] ^
> >
> > with:
> >
> >   (define_attr "pred_clobber" "any,no,yes" (const_string "any"))
> 
> (with emphasis on the last line).  What I didn't say explicitly is
> that "no" should require !TARGET_SVE_PRED_CLOBBER.
> 
> The premise of that review was that we shouldn't enable things like:
> 
>  [ Upa  , Upl , w , w; no  ] ^
> 
> for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber
> alternative.  So we should enable either the pred_clobber=yes
> alternatives or the pred_clobber=no alternatives, but not both.
> 
> The default "any" is then for other non-predicate instructions that
> don't care about TARGET_SVE_PRED_CLOBBER either way.
> 
> In contrast, this patch makes pred_clobber=yes enable the alternatives
> that correctly describe the restriction (good!) but then also enables
> the normal alternatives too, which IMO makes the semantics unclear.

Sure, the reason I still had that is because this ICEs under high register
pressure:

  {@ [ cons: =0 , 1   , 3 , 4; attrs: pred_clobber ]
 [ &Upa , Upl , w , ; yes ] 
cmp\t%0., %1/z, %3., #%4
 [ ?Upa , 0   , w , ; yes ] ^
 [ Upa  , Upl , w , ; no  ] ^
 [ &Upa , Upl , w , w; yes ] 
cmp\t%0., %1/z, %3., %4.
 [ ?Upa , 0   , w , w; yes ] ^
 [ Upa  , Upl , w , w; no  ] ^
  }

So now in the `yes` case reload does:

 Considering alt=0 of insn 10:   (0) =&Upa  (1) Upl  (3) w  (4) vsd
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
0 Early clobber: reject++
Bad operand -- refuse
 Considering alt=1 of insn 10:   (0) ?Upa  (1) 0  (3) w  (4) vsd
Staticly defined alt reject+=6

Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-22 Thread Richard Sandiford
Evgeny Karpov  writes:
> This patch extracts the ix86 implementation for expanding a SYMBOL
> into its corresponding dllimport, far-address, or refptr symbol.
> It will be reused in the aarch64-w64-mingw32 target.
> The implementation is copied as is from i386/i386.cc with
> minor changes to follow to the code style.
>
> Also this patch replaces the original DLL import/export
> implementation in ix86 with mingw.
>
> gcc/ChangeLog:
>
>   * config.gcc: Add winnt-dll.o, which contains the DLL
>   import/export implementation.
>   * config/i386/cygming.h (SUB_TARGET_RECORD_STUB): Remove the
>   old implementation. Rename the required function to MinGW.
>   Rename it to a conditional function that will reuse the
>   MinGW implementation for COFF and nothing otherwise.
>   * config/i386/i386-expand.cc (ix86_expand_move): Likewise.
>   * config/i386/i386-expand.h (is_imported_p): Likewise.
>   (mingw_GOT_alias_set): Likewise.
>   (ix86_legitimize_pe_coff_symbol): Likewise.
>   * config/i386/i386-protos.h: Likewise.
>   * config/i386/i386.cc (is_imported_p): Likewise.
>   (ix86_legitimize_pe_coff_symbol): Likewise.
>   (ix86_GOT_alias_set): Likewise.
>   (legitimize_pic_address): Likewise.
>   (struct dllimport_hasher):
>   (GTY): Likewise.
>   (get_dllimport_decl): Likewise.
>   (legitimize_pe_coff_extern_decl): Likewise.
>   (legitimize_dllimport_symbol): Likewise.
>   (legitimize_pe_coff_symbol): Likewise.
>   (ix86_legitimize_address): Likewise.
>   * config/mingw/winnt.h (mingw_pe_record_stub): Likewise.
>   * config/mingw/winnt.cc (i386_pe_record_stub): Likewise.
>   (mingw_pe_record_stub): Likewise.
>   * config/mingw/t-cygming: Add the winnt-dll.o compilation.
>   * config/mingw/winnt-dll.cc: New file.

This looks good to me apart from a couple of very minor comments below,
but please get approval from the x86 maintainers as well.  In particular,
they might prefer to handle ix86_legitimize_pe_coff_symbol in some other way.

> [...]
> diff --git a/gcc/config/mingw/winnt-dll.cc b/gcc/config/mingw/winnt-dll.cc
> new file mode 100644
> index 000..349ade6f5c0
> --- /dev/null
> +++ b/gcc/config/mingw/winnt-dll.cc
> @@ -0,0 +1,229 @@
> +/* Expand a SYMBOL into its corresponding dllimport, far-address,
> +or refptr symbol.
> +Copyright (C) 2024 Free Software Foundation, Inc.

I suppose this should retain the range from the i386 file that the
code is moving from:

   Copyright (C) 1988-2024 Free Software Foundation, Inc.

> [...]
> diff --git a/gcc/config/mingw/winnt-dll.h b/gcc/config/mingw/winnt-dll.h
> new file mode 100644
> index 000..19c16e747a2
> --- /dev/null
> +++ b/gcc/config/mingw/winnt-dll.h
> @@ -0,0 +1,26 @@
> +/* Expand a SYMBOL into its corresponding dllimport, far-address,
> +or refptr symbol.
> +Copyright (C) 2024 Free Software Foundation, Inc.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +http://www.gnu.org/licenses/.  */
> +
> +#ifndef GCC_MINGW_WINNT_DLL_H
> +#define GCC_MINGW_WINNT_DLL_H
> +
> +extern bool is_imported_p (rtx x);
> +extern alias_set_type mingw_GOT_alias_set (void);
> +extern rtx legitimize_pe_coff_symbol (rtx addr, bool inreg);
> +
> +#endif
> \ No newline at end of file

Would be good to add the newline.

Thanks,
Richard


Re: [PATCH v2] testsuite: Verify r0-r3 are extended with CMSE

2024-05-22 Thread Torbjorn SVENSSON

Hello Richard,

Thanks for the reply.

From my point of view, at least the -fshort-enums part should be on all 
branches. Just to be clean, maybe it's easier to backport the entire patch?


Unless you have an objection, I would like to go ahead and just backport 
it to all branches.


Kind regards,
Torbjörn

On 2024-05-22 12:55, Richard Earnshaw (lists) wrote:

On 06/05/2024 12:50, Torbjorn SVENSSON wrote:

Hi,

Forgot to mention when I sent the patch that I would like to commit it to the 
following branches:

- releases/gcc-11
- releases/gcc-12
- releases/gcc-13
- releases/gcc-14
- trunk



Well you can [commit it to the release branches], but I'm not sure it's 
essential.  It seems pretty unlikely to me that this would regress on a release 
branch without having first regressed on trunk.

R.


Kind regards,
Torbjörn

On 2024-05-02 12:50, Torbjörn SVENSSON wrote:

Add regression test to the existing zero/sign extend tests for CMSE to
verify that r0, r1, r2 and r3 are properly extended, not just r0.

boolCharShortEnumSecureFunc test is done using -O0 to ensure the
instructions are in a predictable order.

gcc/testsuite/ChangeLog:

 * gcc.target/arm/cmse/extend-param.c: Add regression test. Add
   -fshort-enums.
 * gcc.target/arm/cmse/extend-return.c: Add -fshort-enums option.

Signed-off-by: Torbjörn SVENSSON 
---
   .../gcc.target/arm/cmse/extend-param.c    | 21 +++
   .../gcc.target/arm/cmse/extend-return.c   |  4 ++--
   2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
index 01fac786238..d01ef87e0be 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
@@ -1,5 +1,5 @@
   /* { dg-do compile } */
-/* { dg-options "-mcmse" } */
+/* { dg-options "-mcmse -fshort-enums" } */
   /* { dg-final { check-function-bodies "**" "" "" } } */
     #include 
@@ -78,7 +78,6 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
(enum offset index) {
     if (index >= ARRAY_SIZE)
   return 0;
     return array[index];
-
   }
     /*
@@ -88,9 +87,23 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
(enum offset index) {
   **    ...
   */
   __attribute__((cmse_nonsecure_entry)) char boolSecureFunc (bool index) {
-
     if (index >= ARRAY_SIZE)
   return 0;
     return array[index];
+}
   -}
\ No newline at end of file
+/*
+**__acle_se_boolCharShortEnumSecureFunc:
+**    ...
+**    uxtb    r0, r0
+**    uxtb    r1, r1
+**    uxth    r2, r2
+**    uxtb    r3, r3
+**    ...
+*/
+__attribute__((cmse_nonsecure_entry,optimize(0))) char 
boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, enum 
offset d) {
+  size_t index = a + b + c + d;
+  if (index >= ARRAY_SIZE)
+    return 0;
+  return array[index];
+}
diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
index cf731ed33df..081de0d699f 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
@@ -1,5 +1,5 @@
   /* { dg-do compile } */
-/* { dg-options "-mcmse" } */
+/* { dg-options "-mcmse -fshort-enums" } */
   /* { dg-final { check-function-bodies "**" "" "" } } */
     #include 
@@ -89,4 +89,4 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
(ns_enum_foo_t * ns_foo_p)
   unsigned char boolNonsecure0 (ns_bool_foo_t * ns_foo_p)
   {
     return ns_foo_p ();
-}
\ No newline at end of file
+}




Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Wednesday, May 22, 2024 10:48 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to
>> patterns
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > This patch adds new alternatives to the patterns which are affected.  The 
>> > new
>> > alternatives with the conditional early clobbers are added before the 
>> > normal
>> > ones in order for LRA to prefer them in the event that we have enough free
>> > registers to accommodate them.
>> >
>> > In case register pressure is too high the normal alternatives will be 
>> > preferred
>> > before a reload is considered as we rather have the tie than a spill.
>> >
>> > Tests are in the next patch.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >* config/aarch64/aarch64-sve.md (and3,
>> >@aarch64_pred__z, *3_cc,
>> >*3_ptest, aarch64_pred__z,
>> >*3_cc, *3_ptest,
>> >aarch64_pred__z, *3_cc,
>> >*3_ptest, @aarch64_pred_cmp,
>> >*cmp_cc, *cmp_ptest,
>> >@aarch64_pred_cmp_wide,
>> >*aarch64_pred_cmp_wide_cc,
>> >*aarch64_pred_cmp_wide_ptest,
>> @aarch64_brk,
>> >*aarch64_brk_cc, *aarch64_brk_ptest,
>> >@aarch64_brk, *aarch64_brkn_cc, *aarch64_brkn_ptest,
>> >*aarch64_brk_cc, *aarch64_brk_ptest,
>> >aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
>> >*aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber
>> >alternative.
>> >* config/aarch64/aarch64-sve2.md
>> >(@aarch64_pred_): Likewise.
>> >
>> > ---
>> > diff --git a/gcc/config/aarch64/aarch64-sve.md 
>> > b/gcc/config/aarch64/aarch64-
>> sve.md
>> > index
>> e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c
>> 297428c85fe46 100644
>> > --- a/gcc/config/aarch64/aarch64-sve.md
>> > +++ b/gcc/config/aarch64/aarch64-sve.md
>> > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z"
>> >  (reg:VNx16BI FFRT_REGNUM)
>> >  (match_operand:VNx16BI 1 "register_operand")))]
>> >"TARGET_SVE && TARGET_NON_STREAMING"
>> > -  {@ [ cons: =0, 1   ]
>> > - [ Upa , Upa ] rdffr\t%0.b, %1/z
>> > +  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
>> > + [ &Upa, Upa; yes ] rdffr\t%0.b, %1/z
>> > + [ ?Upa, Upa; yes ] ^
>> > + [ Upa , Upa; *   ] ^
>> >}
>> >  )
>> 
>> Sorry for not explaining it very well, but in the previous review I 
>> suggested:
>> 
>> > The gather-like approach would be something like:
>> >
>> >  [ &Upa , Upl , w , ; yes ]
>> cmp\t%0., %1/z, %3., #%4
>> >  [ ?Upl , 0   , w , ; yes ] ^
>> >  [ Upa  , Upl , w , ; no  ] ^
>> >  [ &Upa , Upl , w , w; yes ] 
>> > cmp\t%0., %1/z,
>> %3., %4.
>> >  [ ?Upl , 0   , w , w; yes ] ^
>> >  [ Upa  , Upl , w , w; no  ] ^
>> >
>> > with:
>> >
>> >   (define_attr "pred_clobber" "any,no,yes" (const_string "any"))
>> 
>> (with emphasis on the last line).  What I didn't say explicitly is
>> that "no" should require !TARGET_SVE_PRED_CLOBBER.
>> 
>> The premise of that review was that we shouldn't enable things like:
>> 
>>  [ Upa  , Upl , w , w; no  ] ^
>> 
>> for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber
>> alternative.  So we should enable either the pred_clobber=yes
>> alternatives or the pred_clobber=no alternatives, but not both.
>> 
>> The default "any" is then for other non-predicate instructions that
>> don't care about TARGET_SVE_PRED_CLOBBER either way.
>> 
>> In contrast, this patch makes pred_clobber=yes enable the alternatives
>> that correctly describe the restriction (good!) but then also enables
>> the normal alternatives too, which IMO makes the semantics unclear.
>
> Sure, the reason I still had that is because this ICEs under high register
> pressure:
>
>   {@ [ cons: =0 , 1   , 3 , 4; attrs: pred_clobber ]
>  [ &Upa , Upl , w , ; yes ] 
> cmp\t%0., %1/z, %3., #%4
>  [ ?Upa , 0   , w , ; yes ] ^
>  [ Upa  , Upl , w , ; no  ] ^
>  [ &Upa , Upl , w , w; yes ] 
> cmp\t%0., %1/z, %3., %4.
>  [ ?Upa , 0   , w , w; yes ] ^
>  [ Upa  , Upl , w , w; no  ] ^
>   }
>
> So now in the `yes` case reload does:
>
>  Considering alt=0 of insn 10:   (0) =&Upa  (1) Upl  (3) w  (4) vsd
> 0 Small class reload: reject+=3
> 0 Non input pseudo reload: reject++
> 0 Early clobber

[PATCH] LoongArch: Guard REGNO with REG_P in loongarch_expand_conditional_move [PR115169]

2024-05-22 Thread Xi Ruoyao
gcc/ChangeLog:

PR target/115169
* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Guard REGNO with REG_P.
---

Bootstrapped with --enable-checking=all.  Ok for trunk and 14?
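For context, a sketch of why the guard matters (as I understand the
checking machinery): with RTL checking enabled, REGNO verifies that its
operand really is a REG, so evaluating it on a non-register operand
(e.g. a constant fed to the cmove expander) ICEs.  Testing REG_P first
short-circuits the access:

  /* Unsafe under RTL checking when op0 or operands[2] is not a REG.  */
  if (REGNO (op0) == REGNO (operands[2]))
    ...

  /* Safe: REGNO is only evaluated for actual registers.  */
  if (REG_P (op0) && REG_P (operands[2])
      && REGNO (op0) == REGNO (operands[2]))
    ...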

 gcc/config/loongarch/loongarch.cc | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e7835ae34ae..1b6df6a4365 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5344,6 +5344,7 @@ loongarch_expand_conditional_move (rtx *operands)
   rtx op1_extend = op1;
 
   /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_op[2] = {false, false};
   bool promote_p = false;
   machine_mode mode = GET_MODE (operands[0]);
 
@@ -5351,9 +5352,15 @@ loongarch_expand_conditional_move (rtx *operands)
 loongarch_emit_float_compare (&code, &op0, &op1);
   else
 {
-  if ((REGNO (op0) == REGNO (operands[2])
-  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
- && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+  if (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)
+   {
+ promote_op[0] = (REG_P (op0) && REG_P (operands[2]) &&
+  REGNO (op0) == REGNO (operands[2]));
+ promote_op[1] = (REG_P (op1) && REG_P (operands[3]) &&
+  REGNO (op1) == REGNO (operands[3]));
+   }
+
+  if (promote_op[0] || promote_op[1])
{
  mode = word_mode;
  promote_p = true;
@@ -5395,7 +5402,7 @@ loongarch_expand_conditional_move (rtx *operands)
 
   if (promote_p)
{
- if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+ if (promote_op[0])
op2 = op0_extend;
  else
{
@@ -5403,7 +5410,7 @@ loongarch_expand_conditional_move (rtx *operands)
  op2 = force_reg (mode, op2);
}
 
- if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+ if (promote_op[1])
op3 = op1_extend;
  else
{
-- 
2.45.1



Re: [PATCH 4/4] Testsuite updates

2024-05-22 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, 21 May 2024, Richard Biener wrote:
>
>> The gcc.dg/vect/slp-12a.c case is interesting as we currently split
>> the 8 store group into lanes 0-5 which we SLP with an unroll factor
>> of two (on x86-64 with SSE) and the remaining two lanes are using
>> interleaving vectorization with a final unroll factor of four.  Thus
>> we're using hybrid SLP within a single store group.  After the change
>> we discover the same 0-5 lane SLP part as well as two single-lane
>> parts feeding the full store group.  But that results in a load
>> permutation that isn't supported (I have WIP patchs to rectify that).
>> So we end up cancelling SLP and vectorizing the whole loop with
>> interleaving which is IMO good and results in better code.
>> 
>> This is similar for gcc.target/i386/pr52252-atom.c where interleaving
>> generates much better code than hybrid SLP.  I'm unsure how to update
>> the testcase though.
>> 
>> gcc.dg/vect/slp-21.c runs into similar situations.  Note that when
>> when analyzing SLP operations we discard an instance we currently
>> force the full loop to have no SLP because hybrid detection is
>> broken.  It's probably not worth fixing this at this moment.
>> 
>> For gcc.dg/vect/pr97428.c we are not splitting the 16 store group
>> into two but merge the two 8 lane loads into one before doing the
>> store and thus have only a single SLP instance.  A similar situation
>> happens in gcc.dg/vect/slp-11c.c but the branches feeding the
>> single SLP store only have a single lane.  Likewise for
>> gcc.dg/vect/vect-complex-5.c and gcc.dg/vect/vect-gather-2.c.
>> 
>> gcc.dg/vect/slp-cond-1.c has an additional SLP vectorization
>> with a SLP store group of size two but two single-lane branches.
>> 
>> gcc.target/i386/pr98928.c ICEs in SLP permute optimization
>> because we don't expect a constant and internal branch to be
>> merged with a permute node in
>> vect_optimize_slp_pass::change_vec_perm_layout:4859 (the only
>> permutes merging two SLP nodes are two-operator nodes right now).
>> This still requires fixing.
>> 
>> The whole series has been bootstrapped and tested on 
>> x86_64-unknown-linux-gnu with the gcc.target/i386/pr98928.c FAIL
>> unfixed.
>> 
>> Comments welcome (and hello ARM CI), RISC-V and other arch
>> testing appreciated.  Unless there are comments to the contrary
>> I plan to push patch 1 and 2 tomorrow.
>
> RISC-V CI didn't trigger (not sure what magic is required).  Both
> ARM and AARCH64 show that the "Vectorizing stmts using SLP" scans are a bit
> fragile because we sometimes cancel SLP because we want to use
> load/store-lanes.
>
> I have locally scrapped the SLP scanning for gcc.dg/vect/slp-21.c where
> it doesn't really matter (and once we are finished with all-SLP it
> won't matter anywhere).  I've conditionalized the outcome based on
> vect_load_lanes for gcc.dg/vect/slp-11c.c and
> gcc.dg/vect/slp-cond-1.c
>
> On AARCH64 additionally gcc.target/aarch64/sve/mask_struct_store_4.c
> ICEs, I have a fix for that.
>
> gcc.target/aarch64/pr99873_2.c FAILs because with a single
> SLP store group merged from two two-lane load groups we cancel
> the SLP and want to use load/store-lanes.  I'll leave this
> FAILing or shall I XFAIL it?

Yeah, agree it's probably worth leaving it FAILing for now, since it
is something we should try to fix for GCC 15.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>> Thanks,
>> Richard.
>> 
>>  * gcc.dg/vect/pr97428.c: Expect a single store SLP group.
>>  * gcc.dg/vect/slp-11c.c: Likewise.
>>  * gcc.dg/vect/vect-complex-5.c: Likewise.
>>  * gcc.dg/vect/slp-12a.c: Do not expect SLP.
>>  * gcc.dg/vect/slp-21.c: Likewise.
>>  * gcc.dg/vect/slp-cond-1.c: Expect one more SLP.
>>  * gcc.dg/vect/vect-gather-2.c: Expect SLP to be used.
>>  * gcc.target/i386/pr52252-atom.c: XFAIL test for palignr.
>> ---
>>  gcc/testsuite/gcc.dg/vect/pr97428.c  |  2 +-
>>  gcc/testsuite/gcc.dg/vect/slp-11c.c  |  5 +++--
>>  gcc/testsuite/gcc.dg/vect/slp-12a.c  |  6 +-
>>  gcc/testsuite/gcc.dg/vect/slp-21.c   | 19 +--
>>  gcc/testsuite/gcc.dg/vect/slp-cond-1.c   |  2 +-
>>  gcc/testsuite/gcc.dg/vect/vect-complex-5.c   |  2 +-
>>  gcc/testsuite/gcc.dg/vect/vect-gather-2.c|  1 -
>>  gcc/testsuite/gcc.target/i386/pr52252-atom.c |  3 ++-
>>  8 files changed, 18 insertions(+), 22 deletions(-)
>> 
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c 
>> b/gcc/testsuite/gcc.dg/vect/pr97428.c
>> index 60dd984cfd3..3cc9976c00c 100644
>> --- a/gcc/testsuite/gcc.dg/vect/pr97428.c
>> +++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
>> @@ -44,5 +44,5 @@ void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
>>  /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" 
>> "vect" } } */
>>  /* We're not able to peel & apply re-aligning to make accesses well-aligned 
>> for !vect_hw_misalign,
>> but we could by peeling the stores for alignment and applying 
>> re-aligning loads.  */

Re: [PATCH v1 3/6] Rename functions for reuse in AArch64

2024-05-22 Thread Richard Sandiford
Evgeny Karpov  writes:
> This patch renames functions related to dllimport/dllexport
> and selectany functionality. These functions will be reused
> in the aarch64-w64-mingw32 target.
>
> gcc/ChangeLog:
>
>   * config/i386/cygming.h (mingw_pe_record_stub):
>   Rename functions in mingw folder which will be reused for
>   aarch64.
>   (TARGET_ASM_FILE_END): Update to new target-independent name.
>   (SUBTARGET_ATTRIBUTE_TABLE): Likewise.
>   (TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
>   (SUB_TARGET_RECORD_STUB): Likewise.
>   * config/i386/i386-protos.h (ix86_handle_selectany_attribute): Likewise.
>   (mingw_handle_selectany_attribute): Likewise.
>   (i386_pe_valid_dllimport_attribute_p): Likewise.
>   (mingw_pe_valid_dllimport_attribute_p): Likewise.
>   (i386_pe_file_end): Likewise.
>   (mingw_pe_file_end): Likewise.
>   (i386_pe_record_stub): Likewise.
>   (mingw_pe_record_stub): Likewise.
>   * config/mingw/winnt.cc (ix86_handle_selectany_attribute): Likewise.
>   (mingw_handle_selectany_attribute): Likewise.
>   (i386_pe_valid_dllimport_attribute_p): Likewise.
>   (mingw_pe_valid_dllimport_attribute_p): Likewise.
>   (i386_pe_record_stub): Likewise.
>   (mingw_pe_record_stub): Likewise.
>   (i386_pe_file_end): Likewise.
>   (mingw_pe_file_end): Likewise.
>   * config/mingw/winnt.h (mingw_handle_selectany_attribute):
>   Declare functionality that will be reused by multiple targets.
>   (mingw_pe_file_end): Likewise.
>   (mingw_pe_record_stub): Likewise.
>   (mingw_pe_valid_dllimport_attribute_p): Likewise.

Ok, but...

> [...]
> diff --git a/gcc/config/mingw/winnt.cc b/gcc/config/mingw/winnt.cc
> index 9901576ade0..a0b5950be2e 100644
> --- a/gcc/config/mingw/winnt.cc
> +++ b/gcc/config/mingw/winnt.cc
> @@ -71,7 +71,7 @@ ix86_handle_shared_attribute (tree *node, tree name, tree, 
> int,
>  /* Handle a "selectany" attribute;
> arguments as in struct attribute_spec.handler.  */
>  tree
> -ix86_handle_selectany_attribute (tree *node, tree name, tree, int,
> +mingw_handle_selectany_attribute (tree *node, tree name, tree, int,
>bool *no_add_attrs)

please reindent the parameters for the new name length.
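I.e. something like (whitespace only, no functional change):

tree
mingw_handle_selectany_attribute (tree *node, tree name, tree, int,
                                  bool *no_add_attrs)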

Thanks,
Richard


[PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-05-22 Thread Di Zhao OS
The test case is for targets that support FMA.  Previously
the "target" selector was missing in the dg-final command.

Tested on x86_64-pc-linux-gnu.

Thanks
Di Zhao

gcc/testsuite/ChangeLog:

* gcc.dg/pr110279-1.c: Add target selector.

---
 gcc/testsuite/gcc.dg/pr110279-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr110279-1.c 
b/gcc/testsuite/gcc.dg/pr110279-1.c
index a8c7257b28d..5436163bc78 100644
--- a/gcc/testsuite/gcc.dg/pr110279-1.c
+++ b/gcc/testsuite/gcc.dg/pr110279-1.c
@@ -64,4 +64,4 @@ foo3 (data_e a, data_e b, data_e c, data_e d)
   return result;
 }
 
-/* { dg-final { scan-tree-dump-times "Generated FMA" 3 "widening_mul"} } */
\ No newline at end of file
+/* { dg-final { scan-tree-dump-times "Generated FMA" 3 "widening_mul" { target 
{ { i?86-*-* x86_64-*-* } || { aarch64*-*-* } } } } } */
-- 
2.25.1


Re: [PATCH v2] testsuite: Verify r0-r3 are extended with CMSE

2024-05-22 Thread Richard Earnshaw (lists)
On 22/05/2024 12:14, Torbjorn SVENSSON wrote:
> Hello Richard,
> 
> Thanks for the reply.
> 
> From my point of view, at least the -fshort-enums part should be on all 
> branches. Just to be clean, maybe it's easier to backport the entire patch?

Yes, that's a fair point.  I was only thinking about the broadening of the test 
to the other argument registers when I said that.

So, just to be clear, OK all.

R.

> 
> Unless you have an objection, I would like to go ahead and just backport it 
> to all branches.
> 
> Kind regards,
> Torbjörn
> 
> On 2024-05-22 12:55, Richard Earnshaw (lists) wrote:
>> On 06/05/2024 12:50, Torbjorn SVENSSON wrote:
>>> Hi,
>>>
>>> Forgot to mention when I sent the patch that I would like to commit it to 
>>> the following branches:
>>>
>>> - releases/gcc-11
>>> - releases/gcc-12
>>> - releases/gcc-13
>>> - releases/gcc-14
>>> - trunk
>>>
>>
>> Well you can [commit it to the release branches], but I'm not sure it's 
>> essential.  It seems pretty unlikely to me that this would regress on a 
>> release branch without having first regressed on trunk.
>>
>> R.
>>
>>> Kind regards,
>>> Torbjörn
>>>
>>> On 2024-05-02 12:50, Torbjörn SVENSSON wrote:
 Add regression test to the existing zero/sign extend tests for CMSE to
 verify that r0, r1, r2 and r3 are properly extended, not just r0.

 boolCharShortEnumSecureFunc test is done using -O0 to ensure the
 instructions are in a predictable order.

 gcc/testsuite/ChangeLog:

  * gcc.target/arm/cmse/extend-param.c: Add regression test. Add
    -fshort-enums.
  * gcc.target/arm/cmse/extend-return.c: Add -fshort-enums option.

 Signed-off-by: Torbjörn SVENSSON 
 ---
    .../gcc.target/arm/cmse/extend-param.c    | 21 +++
    .../gcc.target/arm/cmse/extend-return.c   |  4 ++--
    2 files changed, 19 insertions(+), 6 deletions(-)

 diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
 b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
 index 01fac786238..d01ef87e0be 100644
 --- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
 +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
 @@ -1,5 +1,5 @@
    /* { dg-do compile } */
 -/* { dg-options "-mcmse" } */
 +/* { dg-options "-mcmse -fshort-enums" } */
    /* { dg-final { check-function-bodies "**" "" "" } } */
      #include 
 @@ -78,7 +78,6 @@ __attribute__((cmse_nonsecure_entry)) char 
 enumSecureFunc (enum offset index) {
  if (index >= ARRAY_SIZE)
    return 0;
  return array[index];
 -
    }
      /*
 @@ -88,9 +87,23 @@ __attribute__((cmse_nonsecure_entry)) char 
 enumSecureFunc (enum offset index) {
    **    ...
    */
    __attribute__((cmse_nonsecure_entry)) char boolSecureFunc (bool index) {
 -
  if (index >= ARRAY_SIZE)
    return 0;
  return array[index];
 +}
    -}
 \ No newline at end of file
 +/*
 +**__acle_se_boolCharShortEnumSecureFunc:
 +**    ...
 +**    uxtb    r0, r0
 +**    uxtb    r1, r1
 +**    uxth    r2, r2
 +**    uxtb    r3, r3
 +**    ...
 +*/
 +__attribute__((cmse_nonsecure_entry,optimize(0))) char 
 boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, 
 enum offset d) {
 +  size_t index = a + b + c + d;
 +  if (index >= ARRAY_SIZE)
 +    return 0;
 +  return array[index];
 +}
 diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
 b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
 index cf731ed33df..081de0d699f 100644
 --- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
 +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
 @@ -1,5 +1,5 @@
    /* { dg-do compile } */
 -/* { dg-options "-mcmse" } */
 +/* { dg-options "-mcmse -fshort-enums" } */
    /* { dg-final { check-function-bodies "**" "" "" } } */
      #include 
 @@ -89,4 +89,4 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
 (ns_enum_foo_t * ns_foo_p)
    unsigned char boolNonsecure0 (ns_bool_foo_t * ns_foo_p)
    {
  return ns_foo_p ();
 -}
 \ No newline at end of file
 +}
>>



Re: [PATCH v1 4/6] aarch64: Add selectany attribute handling

2024-05-22 Thread Richard Sandiford
Evgeny Karpov  writes:
> This patch extends the aarch64 attributes list with the selectany
> attribute for the aarch64-w64-mingw32 target and reuses the mingw
> implementation to handle it.
>
>   * config/aarch64/aarch64.cc:
>   Extend the aarch64 attributes list.
>   * config/aarch64/cygming.h (SUBTARGET_ATTRIBUTE_TABLE):
>   Define the selectany attribute.

Now that TARGET_ATTRIBUTE_TABLE is an array, it should in principle
be possible to define the attribute in winnt.cc and so avoid
duplicating the table entry.  That'd be a separate clean-up though.
I agree that for this series we should stick with the current approach.
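For the record, a hedged sketch of what that clean-up could look like
(variable names invented; not something this series needs to do):

/* Sketch only; field layout as in the current attribute_spec.  */
static const attribute_spec mingw_attributes[] =
{
  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
       affects_type_identity, handler, exclude } */
  { "selectany", 0, 0, true, false, false, false,
    mingw_handle_selectany_attribute, NULL },
};

const scoped_attribute_specs mingw_attribute_table =
  { "gnu", { mingw_attributes } };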

So the patch is ok, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.cc | 5 -
>  gcc/config/aarch64/cygming.h  | 3 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index c763a8a6298..19205927430 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -855,7 +855,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
> NULL },
>{ "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
>{ "SVE type",3, 3, false, true,  false, true,  NULL, NULL 
> },
> -  { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL }
> +  { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
> +#ifdef SUBTARGET_ATTRIBUTE_TABLE
> +  SUBTARGET_ATTRIBUTE_TABLE
> +#endif
>  };
>  
>  static const scoped_attribute_specs aarch64_gnu_attribute_table =
> diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
> index 0d048879311..76623153080 100644
> --- a/gcc/config/aarch64/cygming.h
> +++ b/gcc/config/aarch64/cygming.h
> @@ -154,6 +154,9 @@ still needed for compilation.  */
>  flag_stack_check = STATIC_BUILTIN_STACK_CHECK;   \
>} while (0)
>  
> +#define SUBTARGET_ATTRIBUTE_TABLE \
> +  { "selectany", 0, 0, true, false, false, false, \
> +mingw_handle_selectany_attribute, NULL }
>  
>  #define SUPPORTS_ONE_ONLY 1


Re: [PATCH v1 5/6] Adjust DLL import/export implementation for AArch64

2024-05-22 Thread Richard Sandiford
Evgeny Karpov  writes:
> The DLL import/export mingw implementation, originally from ix86, requires
> minor adjustments to be compatible with AArch64.
>
> gcc/ChangeLog:
>
>   * config/mingw/mingw32.h (defined): Use the correct DllMainCRTStartup
>   entry function.
>   * config/mingw/winnt-dll.cc (defined): Exclude ix86-related code.
> ---
>  gcc/config/mingw/mingw32.h| 2 +-
>  gcc/config/mingw/winnt-dll.cc | 4 
>  2 files changed, 5 insertions(+), 1 deletion(-)

Could we provide some abstractions here, rather than testing
CPU-specific macros directly?  E.g.:

>
> diff --git a/gcc/config/mingw/mingw32.h b/gcc/config/mingw/mingw32.h
> index 08f1b5f0696..efe777051b4 100644
> --- a/gcc/config/mingw/mingw32.h
> +++ b/gcc/config/mingw/mingw32.h
> @@ -79,7 +79,7 @@ along with GCC; see the file COPYING3.  If not see
>  #endif
>  
>  #undef SUB_LINK_ENTRY
> -#if TARGET_64BIT_DEFAULT
> +#if TARGET_64BIT_DEFAULT || defined (TARGET_AARCH64_MS_ABI)

it looks like this is equivalent to something like "HAVE_64BIT_POINTERS"
or something, which aarch64 could define to 1 and x86 could define
to TARGET_64BIT_DEFAULT.

The name is just a suggestion, based on not really knowing what the
macro selects.  Please use whatever makes most sense :)
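Concretely, a hedged sketch of the suggestion (macro name as above; the
final name is open, and each definition would live in the respective
header):

/* i386/cygming.h: */
#define HAVE_64BIT_POINTERS TARGET_64BIT_DEFAULT

/* aarch64/cygming.h: */
#define HAVE_64BIT_POINTERS 1

/* mingw/mingw32.h then only tests the one macro: */
#if HAVE_64BIT_POINTERS
#define SUB_LINK_ENTRY SUB_LINK_ENTRY64
#else
#define SUB_LINK_ENTRY SUB_LINK_ENTRY32
#endif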

>  #define SUB_LINK_ENTRY SUB_LINK_ENTRY64
>  #else
>  #define SUB_LINK_ENTRY SUB_LINK_ENTRY32
> diff --git a/gcc/config/mingw/winnt-dll.cc b/gcc/config/mingw/winnt-dll.cc
> index 349ade6f5c0..294361fab4c 100644
> --- a/gcc/config/mingw/winnt-dll.cc
> +++ b/gcc/config/mingw/winnt-dll.cc
> @@ -206,9 +206,13 @@ legitimize_pe_coff_symbol (rtx addr, bool inreg)
>   }
>  }
>  
> +#if !defined (TARGET_AARCH64_MS_ABI)
> +
>if (ix86_cmodel != CM_LARGE_PIC && ix86_cmodel != CM_MEDIUM_PIC)
>  return NULL_RTX;
>  
> +#endif
> +

Similarly here, it feels like there is a concept underlying this check.
Could we just use:

  if (!NEW_MACRO)
return NULL_RTX;

with NEW_MACRO describing the underlying property that is common to
medium x86 PIC, large x86 PIC, and aarch64.
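A hedged sketch of that idea (the macro name is purely illustrative):

/* i386: only the medium/large PIC code models need this.  */
#define PE_COFF_LEGITIMIZE_EXTERN_DECL \
  (ix86_cmodel == CM_LARGE_PIC || ix86_cmodel == CM_MEDIUM_PIC)

/* aarch64: always.  */
#define PE_COFF_LEGITIMIZE_EXTERN_DECL 1

/* winnt-dll.cc: */
  if (!PE_COFF_LEGITIMIZE_EXTERN_DECL)
    return NULL_RTX;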

Thanks,
Richard

>if (GET_CODE (addr) == SYMBOL_REF
>&& !is_imported_p (addr)
>&& SYMBOL_REF_EXTERNAL_P (addr)


Re: [PATCH] aarch64: Fold vget_high_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-22 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch is a follow-up of r15-697-ga2e4fe5a53cf75 to also fold vget_high_*
> intrinsics to BIT_FIELD_REF and remove the vget_high_* definitions from
> arm_neon.h to use the new intrinsics framework.
>
>   PR target/102171
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_HIGH_BUILTINS):
>   New macro to create definitions for all vget_high intrinsics.
>   (VGET_HIGH_BUILTIN): Likewise.
>   (enum aarch64_builtins): Add vget_high function codes.
>   (AARCH64_SIMD_VGET_LOW_BUILTINS): Delete duplicate macro.
>   (aarch64_general_fold_builtin): Fold vget_high calls.
>   * config/aarch64/aarch64-simd-builtins.def: Delete vget_high builtins.
>   * config/aarch64/aarch64-simd.md (aarch64_get_high): Delete.
>   (aarch64_vget_hi_halfv8bf): Likewise.
>   * config/aarch64/arm_neon.h (__attribute__): Delete.
>   (vget_high_f16): Likewise.
>   (vget_high_f32): Likewise.
>   (vget_high_f64): Likewise.
>   (vget_high_p8): Likewise.
>   (vget_high_p16): Likewise.
>   (vget_high_p64): Likewise.
>   (vget_high_s8): Likewise.
>   (vget_high_s16): Likewise.
>   (vget_high_s32): Likewise.
>   (vget_high_s64): Likewise.
>   (vget_high_u8): Likewise.
>   (vget_high_u16): Likewise.
>   (vget_high_u32): Likewise.
>   (vget_high_u64): Likewise.
>   (vget_high_bf16): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/vget_high_2.c: New test.
>   * gcc.target/aarch64/vget_high_2_be.c: New test.

OK, thanks.

Richard

> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-builtins.cc|  59 +++---
>  gcc/config/aarch64/aarch64-simd-builtins.def  |   6 -
>  gcc/config/aarch64/aarch64-simd.md|  22 
>  gcc/config/aarch64/arm_neon.h | 105 --
>  .../gcc.target/aarch64/vget_high_2.c  |  30 +
>  .../gcc.target/aarch64/vget_high_2_be.c   |  31 ++
>  6 files changed, 104 insertions(+), 149 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2_be.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 11b888016ed..f8eeccb554d 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -675,6 +675,23 @@ static aarch64_simd_builtin_datum 
> aarch64_simd_builtin_data[] = {
>VGET_LOW_BUILTIN(u64) \
>VGET_LOW_BUILTIN(bf16)
>  
> +#define AARCH64_SIMD_VGET_HIGH_BUILTINS \
> +  VGET_HIGH_BUILTIN(f16) \
> +  VGET_HIGH_BUILTIN(f32) \
> +  VGET_HIGH_BUILTIN(f64) \
> +  VGET_HIGH_BUILTIN(p8) \
> +  VGET_HIGH_BUILTIN(p16) \
> +  VGET_HIGH_BUILTIN(p64) \
> +  VGET_HIGH_BUILTIN(s8) \
> +  VGET_HIGH_BUILTIN(s16) \
> +  VGET_HIGH_BUILTIN(s32) \
> +  VGET_HIGH_BUILTIN(s64) \
> +  VGET_HIGH_BUILTIN(u8) \
> +  VGET_HIGH_BUILTIN(u16) \
> +  VGET_HIGH_BUILTIN(u32) \
> +  VGET_HIGH_BUILTIN(u64) \
> +  VGET_HIGH_BUILTIN(bf16)
> +
>  typedef struct
>  {
>const char *name;
> @@ -717,6 +734,9 @@ typedef struct
>  #define VGET_LOW_BUILTIN(A) \
>AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
>  
> +#define VGET_HIGH_BUILTIN(A) \
> +  AARCH64_SIMD_BUILTIN_VGET_HIGH_##A,
> +
>  #undef VAR1
>  #define VAR1(T, N, MAP, FLAG, A) \
>AARCH64_SIMD_BUILTIN_##T##_##N##A,
> @@ -753,6 +773,7 @@ enum aarch64_builtins
>/* SIMD intrinsic builtins.  */
>AARCH64_SIMD_VREINTERPRET_BUILTINS
>AARCH64_SIMD_VGET_LOW_BUILTINS
> +  AARCH64_SIMD_VGET_HIGH_BUILTINS
>/* ARMv8.3-A Pointer Authentication Builtins.  */
>AARCH64_PAUTH_BUILTIN_AUTIA1716,
>AARCH64_PAUTH_BUILTIN_PACIA1716,
> @@ -855,26 +876,21 @@ static aarch64_fcmla_laneq_builtin_datum 
> aarch64_fcmla_lane_builtin_data[] = {
> false \
>},
>  
> -#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> -  VGET_LOW_BUILTIN(f16) \
> -  VGET_LOW_BUILTIN(f32) \
> -  VGET_LOW_BUILTIN(f64) \
> -  VGET_LOW_BUILTIN(p8) \
> -  VGET_LOW_BUILTIN(p16) \
> -  VGET_LOW_BUILTIN(p64) \
> -  VGET_LOW_BUILTIN(s8) \
> -  VGET_LOW_BUILTIN(s16) \
> -  VGET_LOW_BUILTIN(s32) \
> -  VGET_LOW_BUILTIN(s64) \
> -  VGET_LOW_BUILTIN(u8) \
> -  VGET_LOW_BUILTIN(u16) \
> -  VGET_LOW_BUILTIN(u32) \
> -  VGET_LOW_BUILTIN(u64) \
> -  VGET_LOW_BUILTIN(bf16)
> +#undef VGET_HIGH_BUILTIN
> +#define VGET_HIGH_BUILTIN(A) \
> +  {"vget_high_" #A, \
> +   AARCH64_SIMD_BUILTIN_VGET_HIGH_##A, \
> +   2, \
> +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> +   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
> +   FLAG_AUTO_FP, \
> +   false \
> +  },
>  
>  static const aarch64_simd_intrinsic_datum aarch64_simd_intrinsic_data[] = {
>AARCH64_SIMD_VREINTERPRET_BUILTINS
>AARCH64_SIMD_VGET_LOW_BUILTINS
> +  AARCH64_SIMD_VGET_HIGH_BUILTINS
>  };
>  
>  
> @@ -3270,6 +3286,10 @@ aarch64_fold_builtin_lane_check (tree arg0, tree arg1, 
> tree arg2)
>  #define VGET_LOW_BU

RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Jivan Hakobyan
After commit 8367c996e55b2, several checks in the round_32.c test
started to fail.  The reason is that we now prevent DF->SI->DF rounding
on RV32, so instead of a conversion sequence we get calls to the
appropriate library functions.
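
For illustration, the kind of source the test covers looks roughly like
this (a sketch, not the actual test file):

/* On RV32 these can no longer be expanded via an FP->int->FP
   conversion sequence, so each becomes a call to the matching
   libm routine.  */
double do_ceil (double x)  { return __builtin_ceil (x); }   /* ceil  */
double do_floor (double x) { return __builtin_floor (x); }  /* floor */
double do_trunc (double x) { return __builtin_trunc (x); }  /* trunc */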


gcc/testsuite/ChangeLog:
* gcc.target/riscv/round_32.c: Fix test expectations for RV32.


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/testsuite/gcc.target/riscv/round_32.c b/gcc/testsuite/gcc.target/riscv/round_32.c
index 88ff77aff2e..b74be4e1103 100644
--- a/gcc/testsuite/gcc.target/riscv/round_32.c
+++ b/gcc/testsuite/gcc.target/riscv/round_32.c
@@ -7,17 +7,17 @@
 
 /* { dg-final { scan-assembler-times {\mfcvt.w.s} 15 } } */
 /* { dg-final { scan-assembler-times {\mfcvt.s.w} 5 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.d.w} 65 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.w.d} 15 } } */
-/* { dg-final { scan-assembler-times {,rup} 6 } } */
-/* { dg-final { scan-assembler-times {,rmm} 6 } } */
-/* { dg-final { scan-assembler-times {,rdn} 6 } } */
-/* { dg-final { scan-assembler-times {,rtz} 6 } } */
+/* { dg-final { scan-assembler-times {\mfcvt.d.w} 60 } } */
+/* { dg-final { scan-assembler-times {\mfcvt.w.d} 10 } } */
+/* { dg-final { scan-assembler-times {,rup} 5 } } */
+/* { dg-final { scan-assembler-times {,rmm} 5 } } */
+/* { dg-final { scan-assembler-times {,rdn} 5 } } */
+/* { dg-final { scan-assembler-times {,rtz} 5 } } */
 /* { dg-final { scan-assembler-not {\mfcvt.l.d} } } */
 /* { dg-final { scan-assembler-not {\mfcvt.d.l} } } */
-/* { dg-final { scan-assembler-not "\\sceil\\s" } } */
-/* { dg-final { scan-assembler-not "\\sfloor\\s" } } */
-/* { dg-final { scan-assembler-not "\\sround\\s" } } */
-/* { dg-final { scan-assembler-not "\\snearbyint\\s" } } */
-/* { dg-final { scan-assembler-not "\\srint\\s" } } */
-/* { dg-final { scan-assembler-not "\\stail\\s" } } */
+/* { dg-final { scan-assembler-times "\tceil\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\tfloor\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\tround\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\tnearbyint\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\ttrunc\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\stail\\s" 5 { target { no-opts "-O1" } } } } */


[PATCH 1/2][v2] Avoid splitting store dataref groups during SLP discovery

2024-05-22 Thread Richard Biener
The following avoids splitting store dataref groups during SLP
discovery but instead forces (eventually single-lane) consecutive
lane SLP discovery for all lanes of the group, creating VEC_PERM
SLP nodes merging them so the store will always cover the whole group.

With this for example

int x[1024], y[1024], z[1024], w[1024];
void foo (void)
{
  for (int i = 0; i < 256; i++)
{
  x[4*i+0] = y[2*i+0];
  x[4*i+1] = y[2*i+1];
  x[4*i+2] = z[i];
  x[4*i+3] = w[i];
}
}

which was previously using hybrid SLP can now be fully SLPed and
SSE code generated looks better (but of course you never know,
I didn't actually benchmark).  We of course need a VF of four here.

.L2:
movdqa  z(%rax), %xmm0
movdqa  w(%rax), %xmm4
movdqa  y(%rax,%rax), %xmm2
movdqa  y+16(%rax,%rax), %xmm1
movdqa  %xmm0, %xmm3
punpckhdq   %xmm4, %xmm0
punpckldq   %xmm4, %xmm3
movdqa  %xmm2, %xmm4
shufps  $238, %xmm3, %xmm2
movaps  %xmm2, x+16(,%rax,4)
movdqa  %xmm1, %xmm2
shufps  $68, %xmm3, %xmm4
shufps  $68, %xmm0, %xmm2
movaps  %xmm4, x(,%rax,4)
shufps  $238, %xmm0, %xmm1
movaps  %xmm2, x+32(,%rax,4)
movaps  %xmm1, x+48(,%rax,4)
addq$16, %rax
cmpq$1024, %rax
jne .L2

The extra permute nodes merging distinct branches of the SLP
tree might be unexpected for some code, esp. since
SLP_TREE_REPRESENTATIVE cannot be meaningfully set and we
cannot populate SLP_TREE_SCALAR_STMTS or SLP_TREE_SCALAR_OPS
consistently as we can have a mix of both.

The patch keeps the sub-trees formed from consecutive lanes but that's
in principle not necessary if we for example have an even/odd
split which now would result in N single-lane sub-trees.  That's
left for future improvements.

The interesting part is how VLA vector ISAs handle merging of
two vectors that's not trivial even/odd merging.  The strategy
of how to build the permute tree might need adjustments for that
(in the end splitting each branch to single lanes and then doing
even/odd merging would be the brute-force fallback).  Not sure
how much we can or should rely on the SLP optimize pass to handle
this.
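
As a purely scalar model of that brute-force fallback (illustrative
only, not GCC internals):

/* Interleave two streams of single lanes even/odd into one output.
   This models the final merge step once every branch has been split
   into single-lane sub-trees.  */
static void
merge_even_odd (const int *a, const int *b, int *out, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[2 * i]     = a[i];   /* even lanes from the first input  */
      out[2 * i + 1] = b[i];   /* odd lanes from the second input  */
    }
}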

* tree-vect-slp.cc (vect_build_slp_instance): Do not split
store dataref groups on loop SLP discovery failure but create
a single SLP instance for the stores but branch to SLP sub-trees
and merge with a series of VEC_PERM nodes.
---
 gcc/tree-vect-slp.cc | 247 ++-
 1 file changed, 221 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3f8209b43a7..1fbc7a672a7 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3468,12 +3468,7 @@ vect_build_slp_instance (vec_info *vinfo,
  return true;
}
 }
-  else
-{
-  /* Failed to SLP.  */
-  /* Free the allocated memory.  */
-  scalar_stmts.release ();
-}
+  /* Failed to SLP.  */
 
   stmt_vec_info stmt_info = stmt_info_;
   /* Try to break the group up into pieces.  */
@@ -3491,6 +3486,9 @@ vect_build_slp_instance (vec_info *vinfo,
   if (is_a  (vinfo)
  && (i > 1 && i < group_size))
{
+ /* Free the allocated memory.  */
+ scalar_stmts.release ();
+
  tree scalar_type
= TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
@@ -3535,38 +3533,235 @@ vect_build_slp_instance (vec_info *vinfo,
}
}
 
-  /* For loop vectorization split into arbitrary pieces of size > 1.  */
-  if (is_a  (vinfo)
- && (i > 1 && i < group_size)
- && !vect_slp_prefer_store_lanes_p (vinfo, stmt_info, group_size, i))
+  /* For loop vectorization split the RHS into arbitrary pieces of
+size >= 1.  */
+  else if (is_a  (vinfo)
+  && (i > 0 && i < group_size)
+  && !vect_slp_prefer_store_lanes_p (vinfo,
+ stmt_info, group_size, i))
{
- unsigned group1_size = i;
-
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "Splitting SLP group at stmt %u\n", i);
 
- stmt_vec_info rest = vect_split_slp_store_group (stmt_info,
-  group1_size);
- /* Loop vectorization cannot handle gaps in stores, make sure
-the split group appears as strided.  */
- STMT_VINFO_STRIDED_P (rest) = 1;
- DR_GROUP_GAP (rest) = 0;
- STMT_VINFO_STRIDED_P (stmt_info) = 1;
- DR_GROUP_GAP (stmt_info) = 0;
+ /* Analyze the stored values and pinch them together with
+a permute node so we can preserve the whole store group.  */
+ auto_vec rhs_nodes;
+
+ /* Calculate the unrolling factor based on

[PATCH 2/2][v2] RISC-V: Testsuite updates

2024-05-22 Thread Richard Biener
The gcc.dg/vect/slp-12a.c case is interesting as we currently split
the 8 store group into lanes 0-5 which we SLP with an unroll factor
of two (on x86-64 with SSE) and the remaining two lanes are using
interleaving vectorization with a final unroll factor of four.  Thus
we're using hybrid SLP within a single store group.  After the change
we discover the same 0-5 lane SLP part as well as two single-lane
parts feeding the full store group.  But that results in a load
permutation that isn't supported (I have WIP patches to rectify that).
So we end up cancelling SLP and vectorizing the whole loop with
interleaving which is IMO good and results in better code.

This is similar for gcc.target/i386/pr52252-atom.c where interleaving
generates much better code than hybrid SLP.  I'm unsure how to update
the testcase though.

gcc.dg/vect/slp-21.c runs into similar situations.  Note that when we
discard an instance while analyzing SLP operations, we currently force
the full loop to have no SLP because hybrid detection is broken.  It's
probably not worth fixing this at the moment.

For gcc.dg/vect/pr97428.c we are not splitting the 16-lane store group
into two, but instead merge the two 8-lane loads into one before doing
the store, and thus have only a single SLP instance.
happens in gcc.dg/vect/slp-11c.c but the branches feeding the
single SLP store only have a single lane.  Likewise for
gcc.dg/vect/vect-complex-5.c and gcc.dg/vect/vect-gather-2.c.

gcc.dg/vect/slp-cond-1.c has an additional SLP vectorization
with a SLP store group of size two but two single-lane branches.

gcc.target/i386/pr98928.c ICEs in SLP permute optimization
because we don't expect a constant and internal branch to be
merged with a permute node in
vect_optimize_slp_pass::change_vec_perm_layout:4859 (the only
permutes merging two SLP nodes are two-operator nodes right now).
This still requires fixing.

* gcc.dg/vect/pr97428.c: Expect a single store SLP group.
* gcc.dg/vect/slp-11c.c: Likewise, if !vect_load_lanes.
* gcc.dg/vect/vect-complex-5.c: Likewise.
* gcc.dg/vect/slp-12a.c: Do not expect SLP.
* gcc.dg/vect/slp-21.c: Remove not important scanning for SLP.
* gcc.dg/vect/slp-cond-1.c: Expect one more SLP if !vect_load_lanes.
* gcc.dg/vect/vect-gather-2.c: Expect SLP to be used.
* gcc.target/i386/pr52252-atom.c: XFAIL test for palignr.
---
 gcc/testsuite/gcc.dg/vect/pr97428.c  |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-11c.c  |  6 --
 gcc/testsuite/gcc.dg/vect/slp-12a.c  |  6 +-
 gcc/testsuite/gcc.dg/vect/slp-21.c   | 18 +++---
 gcc/testsuite/gcc.dg/vect/slp-cond-1.c   |  3 ++-
 gcc/testsuite/gcc.dg/vect/vect-complex-5.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-2.c|  1 -
 gcc/testsuite/gcc.target/i386/pr52252-atom.c |  3 ++-
 8 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c 
b/gcc/testsuite/gcc.dg/vect/pr97428.c
index 60dd984cfd3..3cc9976c00c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97428.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
@@ -44,5 +44,5 @@ void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
 /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" "vect" 
} } */
 /* We're not able to peel & apply re-aligning to make accesses well-aligned 
for !vect_hw_misalign,
but we could by peeling the stores for alignment and applying re-aligning 
loads.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
xfail { ! vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
xfail { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-not "gap of 6 elements" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c 
b/gcc/testsuite/gcc.dg/vect/slp-11c.c
index 0f680cd4e60..2e70fca39ba 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
@@ -13,7 +13,8 @@ main1 ()
   unsigned int in[N*8] = 
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
   float out[N*8];
 
-  /* Different operations - not SLPable.  */
+  /* Different operations - we SLP the store and split the group to two
+ single-lane branches.  */
   for (i = 0; i < N*4; i++)
 {
   out[i*2] = ((float) in[i*2] * 2 + 6) ;
@@ -44,4 +45,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
{ vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  
} } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 

[PING] Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity

2024-05-22 Thread Aleksandar Rakic
Hi!

I'd like to ping the following patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647966.html
a patch for the computation of the complexity of unsupported addressing
modes in ivopts.

This patch fixes the bug described at the following link:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429
It modifies the order of the complexity calculation.  By fixing the
complexities, the candidate selection is also fixed, which leads to
smaller code size.


Thanks

Aleksandar Rakić

Re: [PATCH] c++: canonicity of fn types w/ complex eh specs [PR115159]

2024-05-22 Thread Patrick Palka
On Tue, 21 May 2024, Jason Merrill wrote:

> On 5/21/24 21:55, Patrick Palka wrote:
> > On Tue, 21 May 2024, Jason Merrill wrote:
> > 
> > > On 5/21/24 17:27, Patrick Palka wrote:
> > > > On Tue, 21 May 2024, Jason Merrill wrote:
> > > > 
> > > > > On 5/21/24 15:36, Patrick Palka wrote:
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > > OK for trunk?
> > > > > > 
> > > > > > Alternatively, I considered fixing this by incrementing
> > > > > > comparing_specializations around the call to comp_except_specs in
> > > > > > cp_check_qualified_type, but generally for types whose identity
> > > > > > depends on whether comparing_specializations is set we need to
> > > > > > use structural equality anyway IIUC.
> > > > > 
> > > > > Why not both?
> > > > 
> > > > I figured the latter change isn't necessary/observable since
> > > > comparing_specializations would only make a difference for complex
> > > > exception specifications, and with this patch we won't even call
> > > > cp_check_qualified_type on a complex eh spec.
> > > 
> > > My concern is that if we're building a function type multiple times with
> > > the
> > > same noexcept-spec, this patch would mean creating multiple equivalent
> > > function types instead of reusing one already created for the same
> > > function.
> > > 
> > > > > > +  bool complex_p = (cr && cr != noexcept_true_spec
> > > > > > +   && !UNPARSED_NOEXCEPT_SPEC_P (cr));
> > > > > 
> > > > > Why treat unparsed specs differently from parsed ones?
> > > > 
> > > > Unparsed specs are unique according to cp_tree_equal, so in turn
> > > > function types with unparsed specs are unique, so it should be safe to
> > > > treat such types as canonical.  I'm not sure if this optimization
> > > > matters though; I'm happy to remove this case.
> > > 
> > > The idea that this optimization could make a difference raised the concern
> > > above.
> > 
> > Aha, makes sense.  To that end it seems we could strengthen the ce_exact
> > in comp_except_specs to require == instead of cp_tree_equal equality
> > when comparing two noexcept-specs; the only ce_exact callers are
> > cp_check_qualified_type and cxx_type_hash_eq, which should be fine with
> > that strengthening.  This way, we at least do try to reuse a variant if
> > the (complex or unparsed) noexcept-spec is exactly the same.
> 
> Sounds good.
> 
> Given that, we probably still want to move the canonical_eh_spec up in
> build_cp_fntype_variant, and pass that to cp_check_qualified_type?

And compare the canonical spec directly from cp_check_qualified_type
instead of using comp_except_specs?  Then IIUC for

  void f() throw(int);
  void g() throw(char);

we'd give g the same function type as f, which seems wrong?


> 
> > Like so?
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: canonicity of fn types w/ complex eh specs [PR115159]
> > 
> > Here the member functions QList::g and QList::h are given the same
> > function type since their exception specifications are equivalent
> > according to cp_tree_equal.  In doing so however this means that the
> > type of QList::h refers to a function parameter from QList::g, which
> > ends up confusing modules streaming.
> > 
> > I'm not sure if modules can be fixed to handle this situation, but
> > regardless it seems weird in principle that a function parameter can
> > escape in such a way.  The analogous situation with a trailing return
> > type and decltype
> > 
> >auto g(QList &other) -> decltype(f(other));
> >auto h(QList &other) -> decltype(f(other));
> > 
> > behaves better because we don't canonicalize decltype, and so the
> > function types of g and h are non-canonical and therefore not shared.
> > 
> > In light of this, it seems natural to treat function types with complex
> > eh specs as non-canonical as well so that each such function declaration
> > is given a unique function/method type node.  The main benefit of type
> > canonicalization is to speed up repeated type comparisons, but it should
> > rare for us to repeatedly compare two otherwise compatible function
> > types with complex exception specifications, so foregoing canonicalization
> > should not cause any problems.
> > 
> > To that end, this patch strengthens the ce_exact case of comp_except_specs
> > to require identity instead of equivalence of the exception specification
> > so that build_cp_fntype_variant doesn't reuse a variant when it shouldn't.
> > And in build_cp_fntype_variant we need to use structural equality for types
> > with a complex eh spec.  In turn we could simplify the code responsible
> > for adjusting unparsed eh spec variants.
> > 
> > PR c++/115159
> > 
> > gcc/cp/ChangeLog:
> > 
> > * tree.cc (build_cp_fntype_variant): Always use structural
> > equality for types with a complex exception specification.
> > (fixup_deferred_exception_variants): Always use structural
> > equality for adjusted variants.
> > * typeck.cc (comp_except_specs): R

Re: [PATCH v2] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-22 Thread Richard Biener
On Mon, May 20, 2024 at 1:00 PM  wrote:
>
> From: Pan Li 
>
> There are several match patterns for SAT-related cases, and there will
> be some duplicated code to check that the dest, op_0 and op_1 have the
> same tree type, i.e. ternary tree type matching.  Thus, extract one
> helper function to do this and avoid match code duplication.

I think it's more useful to add an overload to types_match with three
arguments and then use

 (if (INTEGRAL_TYPE_P (type)
  && types_match (type, TREE_TYPE (@0), TREE_TYPE (@1))
...
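
For reference, a minimal sketch of such a three-argument overload
(hypothetical; the actual implementation may differ):

static inline bool
types_match (tree t1, tree t2, tree t3)
{
  /* Reuse the existing two-argument overload pairwise.  */
  return types_match (t1, t2) && types_match (t1, t3);
}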

Richard.

> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (integer_types_ternary_match): New helper
> function to check ternary tree type matches or not.
> * gimple-match-head.cc (integer_types_ternary_match): Ditto but
> for match.
> * match.pd: Leverage above helper function to avoid code dup.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 17 +
>  gcc/gimple-match-head.cc  | 17 +
>  gcc/match.pd  | 25 +
>  3 files changed, 39 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..cdd48c7a5cc 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GENERIC.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P 
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..91f2e56b8ef 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GIMPLE.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P 
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..401b52e7573 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type
>
>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (gt @0 (plus:c @0 @1
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type
>
>  (match (usadd_right_part_2 @0 @1)
>   (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))

Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-22 Thread Richard Biener
On Wed, May 22, 2024 at 3:17 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different from the branchless version, we leverage the simplification
> to convert the branch version of SAT_ADD into branchless if and only if
> the backend supports IFN_SAT_ADD.  Thus, the backend has the ability to
> choose a branch or branchless implementation of .SAT_ADD.  For example,
> some targets can handle branchy code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

I'm not convinced we should match this during early if-conversion, should we?
The middle-end doesn't really know .SAT_ADD but some handling of
.ADD_OVERFLOW is present.

But please add a comment before the new pattern, esp. since it's
non-obvious that this is an improvement.

I suspect you rely on this form being recognized as .SAT_ADD later but
what prevents us from breaking this?  Why not convert it to .SAT_ADD
immediately?  If this is because the ISEL pass (or the widen-mult pass)
cannot handle PHIs, then I would suggest splitting out enough parts of
tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

> gcc/ChangeLog:
>
> * match.pd: Add new simplify to convert branch SAT_ADD into
> branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cff67c84498..2dc77a46e67 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type)
> +  && direct_internal_fn_supported_p (IFN_SAT_ADD, type, 
> OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))
> +
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>


Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow for branchless unsigned SAT_ADD

2024-05-22 Thread Richard Biener
On Sun, May 19, 2024 at 8:37 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the branchless form for unsigned
> SAT_ADD when leveraging __builtin_add_overflow.  For example as below:
>
> uint64_t sat_add_u(uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   uint64_t overflow = __builtin_add_overflow (x, y, &ret);
>
>   return (T)(-overflow) | ret;
> }
>
> Before this patch:
>
> uint64_t sat_add_u (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   long unsigned int _3;
>   __complex__ long unsigned int _6;
>   uint64_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _1 = REALPART_EXPR <_6>;
>   _2 = IMAGPART_EXPR <_6>;
>   _3 = -_2;
>   _8 = _1 | _3;
>   return _8;
> ;;succ:   EXIT
>
> }
>
> After this patch:
>
> uint64_t sat_add_u (uint64_t x, uint64_t y)
> {
>   uint64_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _8 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _8;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add SAT_ADD right part 2 for __builtin_add_overflow.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index b291e34bbe4..5328e846aff 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3064,6 +3064,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
>   (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1
>
> +(match (usadd_right_part_2 @0 @1)
> + (negate (imagpart (IFN_ADD_OVERFLOW:c @0 @1)))
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1
> +

Can you merge this with the patch that makes use of the
usadd_right_part_2 match?
It's difficult to review on its own.

>  /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> because the sub part of left_part_2 cannot work with right_part_1.
> For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> --
> 2.34.1
>


Re: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD

2024-05-22 Thread Richard Biener
On Mon, May 20, 2024 at 1:50 PM Tamar Christina  wrote:
>
> Hi Pan,
>
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Monday, May 20, 2024 12:01 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com; Pan Li
> > 
> > Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> >
> > From: Pan Li 
> >
> > This patch would like to support the branch form for unsigned
> > SAT_ADD.  For example as below:
> >
> > uint64_t
> > sat_add (uint64_t x, uint64_t y)
> > {
> >   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> > }
> >
> > Different from the branchless version, we leverage the simplification
> > to convert the branch version of SAT_ADD into branchless if and only
> > if the backend supports IFN_SAT_ADD.  Thus, the backend has the
> > ability to choose a branch or branchless implementation of .SAT_ADD.
> > For example, some targets can handle branchy code more optimally.
> >
> > When the target implement the IFN_SAT_ADD for unsigned and before this
> > patch:
> > uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   uint64_t _2;
> >   __complex__ long unsigned int _6;
> >   long unsigned int _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
> >   _1 = REALPART_EXPR <_6>;
> >   _7 = IMAGPART_EXPR <_6>;
> >   if (_7 == 0)
> > goto ; [65.00%]
> >   else
> > goto ; [35.00%]
> > ;;succ:   4
> > ;;3
> >
> > ;;   basic block 3, loop depth 0
> > ;;pred:   2
> > ;;succ:   4
> >
> > ;;   basic block 4, loop depth 0
> > ;;pred:   3
> > ;;2
> >   # _2 = PHI <18446744073709551615(3), _1(2)>
> >   return _2;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _9;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
> >   return _9;
> > ;;succ:   EXIT
> > }
> >
> > The below test suites are passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> > * The riscv fully regression test.
> >
> > gcc/ChangeLog:
> >
> >   * match.pd: Add new simplify to convert branch SAT_ADD into
> >   branchless,  if and only if backend implement the IFN.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/match.pd | 18 ++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0f9c34fa897..0547b57b3a3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +#if GIMPLE
> > +
> > +/* Simplify the branch version of SAT_ADD into branchless if and only if
> > +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> > +   ability to choose branch or branchless implementation of .SAT_ADD.  */

This comment or part of the description above should say this simplifies

   (x + y) >= x ? (x + y) : -1

as

  (x + y) | (-(typeof(x))((x + y) < x))
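
In plain C the two shapes correspond roughly to (a sketch; the function
names are illustrative):

#include <stdint.h>

uint64_t sat_add_branch (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;                  /* wraps on overflow */
  return sum >= x ? sum : (uint64_t) -1;
}

uint64_t sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;
  return sum | -(uint64_t) (sum < x);    /* -(sum < x) is 0 or all-ones */
}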

> > +(simplify
> > + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))
> > +
> > +(simplify
> > + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))

and this should probably be (gt @2 @0)?

This misses INTEGER_TYPE_P constraints and it's supposed to be only
for TYPE_UNSIGNED?

> > +
> > +#endif
>
> Thanks, this looks good to me!
>
> I'll leave it up to Richard to approve,
> Richard: The reason for the direct_internal_fn_supported_p is that some
> targets said that they currently handle the branch version better due to
> the lack of some types.  At the time I reasoned it's just a target
> expansion bug but didn't hear anything.
>
> To be honest, it feels to me like we should do this unconditionally, and
> just have the targets that get the faster branch version handle it during
> expand, since the patch series provides a canonicalized version now?

I'm not sure this is a good canonical form.

__imag .ADD_OVERFLOW (x, y) ? __real .ADD_OVERFLOW (x, y) : -1

would be better IMO.  It can be branch-less by using a COND_EXPR.
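
At the C level that canonical form corresponds roughly to (a sketch;
the function name is illustrative):

#include <stdint.h>

uint64_t sat_add_canonical (uint64_t x, uint64_t y)
{
  uint64_t sum;
  /* The COND_EXPR selects between the real part and the saturation
     value based on the overflow flag.  */
  return __builtin_add_overflow (x, y, &sum) ? (uint64_t) -1 : sum;
}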

> This means we can also better support targets that have the vector optab
> but not the scalar one, as the above check would fail for these targets.
>
> What do you think?
>
> Thanks,
> Tamar
>
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> > x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > --
> > 2.34.1
>


Re: [PATCH] c++: canonicity of fn types w/ complex eh specs [PR115159]

2024-05-22 Thread Jason Merrill

On 5/22/24 09:01, Patrick Palka wrote:

On Tue, 21 May 2024, Jason Merrill wrote:


On 5/21/24 21:55, Patrick Palka wrote:

On Tue, 21 May 2024, Jason Merrill wrote:


On 5/21/24 17:27, Patrick Palka wrote:

On Tue, 21 May 2024, Jason Merrill wrote:


On 5/21/24 15:36, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

Alternatively, I considered fixing this by incrementing
comparing_specializations around the call to comp_except_specs in
cp_check_qualified_type, but generally for types whose identity
depends on whether comparing_specializations is set we need to
use structural equality anyway IIUC.


Why not both?


I figured the latter change isn't necessary/observable since
comparing_specializations would only make a difference for complex
exception specifications, and with this patch we won't even call
cp_check_qualified_type on a complex eh spec.


My concern is that if we're building a function type multiple times with
the
same noexcept-spec, this patch would mean creating multiple equivalent
function types instead of reusing one already created for the same
function.


+  bool complex_p = (cr && cr != noexcept_true_spec
+   && !UNPARSED_NOEXCEPT_SPEC_P (cr));


Why treat unparsed specs differently from parsed ones?


Unparsed specs are unique according to cp_tree_equal, so in turn
function types with unparsed specs are unique, so it should be safe to
treat such types as canonical.  I'm not sure if this optimization
matters though; I'm happy to remove this case.


The idea that this optimization could make a difference raised the concern
above.


Aha, makes sense.  To that end it seems we could strengthen the ce_exact
in comp_except_specs to require == instead of cp_tree_equal equality
when comparing two noexcept-specs; the only ce_exact callers are
cp_check_qualified_type and cxx_type_hash_eq, which should be fine with
that strengthening.  This way, we at least do try to reuse a variant if
the (complex or unparsed) noexcept-spec is exactly the same.


Sounds good.

Given that, we probably still want to move the canonical_eh_spec up in
build_cp_fntype_variant, and pass that to cp_check_qualified_type?


And compare the canonical spec directly from cp_check_qualified_type
instead of using comp_except_specs?  Then IIUC for

   void f() throw(int);
   void g() throw(char);

we'd give g the same function type as f, which seems wrong?


Good point, I was confused about what canonical_eh_spec was doing.  Your 
last patch is OK.


Jason



Re: [PATCH 3/4] Avoid splitting store dataref groups during SLP discovery

2024-05-22 Thread Richard Biener
On Tue, 21 May 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following avoids splitting store dataref groups during SLP
> > discovery but instead forces (eventually single-lane) consecutive
> > lane SLP discovery for all lanes of the group, creating VEC_PERM
> > SLP nodes merging them so the store will always cover the whole group.
> >
> > With this for example
> >
> > int x[1024], y[1024], z[1024], w[1024];
> > void foo (void)
> > {
> >   for (int i = 0; i < 256; i++)
> > {
> >   x[4*i+0] = y[2*i+0];
> >   x[4*i+1] = y[2*i+1];
> >   x[4*i+2] = z[i];
> >   x[4*i+3] = w[i];
> > }
> > }
> >
> > which was previously using hybrid SLP can now be fully SLPed and
> 
> Nice!
> 
> > SSE code generated looks better (but of course you never know,
> > I didn't actually benchmark).  We of course need a VF of four here.
> >
> > .L2:
> > movdqa  z(%rax), %xmm0
> > movdqa  w(%rax), %xmm4
> > movdqa  y(%rax,%rax), %xmm2
> > movdqa  y+16(%rax,%rax), %xmm1
> > movdqa  %xmm0, %xmm3
> > punpckhdq   %xmm4, %xmm0
> > punpckldq   %xmm4, %xmm3
> > movdqa  %xmm2, %xmm4
> > shufps  $238, %xmm3, %xmm2
> > movaps  %xmm2, x+16(,%rax,4)
> > movdqa  %xmm1, %xmm2
> > shufps  $68, %xmm3, %xmm4
> > shufps  $68, %xmm0, %xmm2
> > movaps  %xmm4, x(,%rax,4)
> > shufps  $238, %xmm0, %xmm1
> > movaps  %xmm2, x+32(,%rax,4)
> > movaps  %xmm1, x+48(,%rax,4)
> > addq$16, %rax
> > cmpq$1024, %rax
> > jne .L2
> >
> > The extra permute nodes merging distinct branches of the SLP
> > tree might be unexpected for some code, esp. since
> > SLP_TREE_REPRESENTATIVE cannot be meaningfully set and we
> > cannot populate SLP_TREE_SCALAR_STMTS or SLP_TREE_SCALAR_OPS
> > consistently as we can have a mix of both.
> >
> > The patch keeps the sub-trees formed from consecutive lanes but that's
> > in principle not necessary if we for example have an even/odd
> > split which now would result in N single-lane sub-trees.  That's
> > left for future improvements.
> >
> > The interesting part is how VLA vector ISAs handle merging of
> > two vectors that's not trivial even/odd merging.  The strategy
> > of how to build the permute tree might need adjustments for that
> > (in the end splitting each branch to single lanes and then doing
> > even/odd merging would be the brute-force fallback).  Not sure
> > how much we can or should rely on the SLP optimize pass to handle
> > this.
> 
> Yeah, I think we'll have to play it by ear.  It might involve tweaking
> the order in which we "reduce" the VEC_PERM_EXPRs.  E.g. in the above
> example, my guess is that it would be better to reduce the z/w part
> first and then permute that with y, whereas it looks like the patch
> always goes left-to-right.

The patch recursively reduces the two inputs with the fewest lanes,
and within that from left-to-right.  That should keep us within the
bound of two input vectors for one output vector.  It should also
resemble classical interleaving when we have N single lanes.

> The patch LGTM FWIW.

I've sent out a v2 for the CIs and pushed the bugfix parts of the
series.  I hope to see that riscv isn't left with hundreds of FAILs
because of the change, and if that looks green I'll push and polish up
what I have for the load side.

> I suppose this does further hard-code the assumption that the vector
> type is uniquely determined by the element type (and so we can safely
> assume that everything has the same vector type as the first split node).
> But that's pretty much pervasive, and not easy to solve until we're
> serious about putting some infrastructre in place for it.  It just
> caught me out when reading vector code for the first time in a while :)
>
> (E.g. in the above example, the y vector could eventually be double the
> z & w vectors.)

Yeah, you might have noticed the RFC patch series I sent out last
year where I tried to get rid of this constraint.  I stopped implementing
when I figured it should work but doing all-SLP first really is
important.

Richard.
 
> Thanks,
> Richard
> 
> > * tree-vect-slp.cc (vect_build_slp_instance): Do not split
> > store dataref groups on loop SLP discovery failure but create
> > a single SLP instance for the stores but branch to SLP sub-trees
> > and merge with a series of VEC_PERM nodes.
> > ---
> >  gcc/tree-vect-slp.cc | 240 ++-
> >  1 file changed, 214 insertions(+), 26 deletions(-)
> >
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 43f2c153bf0..873748b0a72 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -3468,12 +3468,7 @@ vect_build_slp_instance (vec_info *vinfo,
> >   return true;
> > }
> >  }
> > -  else
> > -{
> > -  /* Failed to SLP.  */
> > -  /* Free the allocated memory.  */
> > -  scalar_stmts.rele

[PATCH RFC] c++: add module extensions

2024-05-22 Thread Jason Merrill
Tested x86_64-pc-linux-gnu.  Any thoughts about the mkdeps output?

-- 8< --

There is a trend in the broader C++ community to use a different extension
for module interface units, even though they are compiled in the same way as
other source files.  Let's also support these extensions.

.ixx is the MSVC standard, while the .c*m are supported by Clang.  libc++
standard headers use .cppm, as their other source files use .cpp.
Perhaps libstdc++ will use .ccm for parallel consistency?

One issue with .c++m is that libcpp/mkdeps.cc uses it for the phony
dependencies to express module dependencies, so I'm disabling that one for
now.  We probably want to change the extension that mkdeps uses to something
less likely to be an actual file, say .module? .c++-module?

gcc/cp/ChangeLog:

* lang-specs.h: Add module interface extensions.
---
 gcc/cp/lang-specs.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/cp/lang-specs.h b/gcc/cp/lang-specs.h
index 7a7f5ff0ab5..74b450fd66e 100644
--- a/gcc/cp/lang-specs.h
+++ b/gcc/cp/lang-specs.h
@@ -39,6 +39,14 @@ along with GCC; see the file COPYING3.  If not see
   {".HPP", "@c++-header", 0, 0, 0},
   {".tcc", "@c++-header", 0, 0, 0},
   {".hh",  "@c++-header", 0, 0, 0},
+  /* Module interface unit.  Do we also want a .C counterpart?
+ Skipping .c++m for now at least to avoid conflicts with .PHONY .c++m
+ files in mkdeps.cc output.  */
+  {".ixx", "@c++", 0, 0, 0}, /* MSVC */
+  {".cppm", "@c++", 0, 0, 0}, /* Clang/libc++ */
+  {".cxxm", "@c++", 0, 0, 0},
+  /* {".c++m", "@c++", 0, 0, 0}, */
+  {".ccm", "@c++", 0, 0, 0},
   {"@c++-header",
   "%{E|M|MM:cc1plus -E %{fmodules-ts:-fdirectives-only -fmodule-header}"
   "  %(cpp_options) %2 %(cpp_debug_options)}"

base-commit: 1a5e4dd83788ea4c049d354d83ad58a6a3d747e6
prerequisite-patch-id: 3c000c95725bc74cff0b0e33fac97055caa64e7e
-- 
2.44.0



RE: [PATCH v2] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-22 Thread Li, Pan2
Thanks Richard for the comments.

> I think it's more useful to add an overload to types_match with three
> arguments and then use

> (if (INTEGRAL_TYPE_P (type)
>   && types_match (type, TREE_TYPE (@0), TREE_TYPE (@1))

Sure thing, will try to add overloaded types_match here.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 22, 2024 9:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v2] Match: Extract integer_types_ternary_match helper to 
avoid code dup [NFC]

On Mon, May 20, 2024 at 1:00 PM  wrote:
>
> From: Pan Li 
>
> There are several match patterns for SAT-related cases, and there will
> be some duplicated code to check that the dest, op_0 and op_1 have the
> same tree type, i.e. ternary tree type matching.  Thus, extract one
> helper function to do this and avoid match code duplication.

I think it's more useful to add an overload to types_match with three
arguments and then use

 (if (INTEGRAL_TYPE_P (type)
  && types_match (type, TREE_TYPE (@0), TREE_TYPE (@1))
...

Richard.

> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (integer_types_ternary_match): New helper
> function to check ternary tree type matches or not.
> * gimple-match-head.cc (integer_types_ternary_match): Ditto but
> for match.
> * match.pd: Leverage above helper function to avoid code dup.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 17 +
>  gcc/gimple-match-head.cc  | 17 +
>  gcc/match.pd  | 25 +
>  3 files changed, 39 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..cdd48c7a5cc 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GENERIC.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P 
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..91f2e56b8ef 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GIMPLE.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P 
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..401b52e7573 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type
>
>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (i

Re: [PATCH] rs6000: Don't pass -many to the assembler [PR112868]

2024-05-22 Thread Peter Bergner
On 5/21/24 8:27 AM, jeevitha wrote:
> The following patch has been bootstrapped and regtested with default
> configuration [--enable-checking=yes] and with --enable-checking=release
> on powerpc64le-linux.
> 
> This patch removes passing the -many assembler option for release builds. Now,
> GCC no longer passes -many under any conditions to the assembler.
> 
> 2024-05-15  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/112868
>   * config/rs6000/rs6000.h (ASM_OPT_ANY): Removed Define.
>   (ASM_CPU_SPEC): Remove ASM_OPT_ANY usage.

You are missing a ChangeLog entry for the target-supports.exp change plus
there is no mention of why it's needed in the git log entry.

Otherwise, the rest LGTM.

Peter




Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-22 Thread Evgeny Karpov
Wednesday, May 22, 2024 1:06 PM
Richard Sandiford  wrote:

> This looks good to me apart from a couple of very minor comments below, but
> please get approval from the x86 maintainers as well.  In particular,
> they might prefer to handle ix86_legitimize_pe_coff_symbol in some other way.

Thanks, Richard, for the review!
The suggestions will be addressed in the next version.

Jan and Uros, could you please review the x86 refactoring for the mingw part? Thanks.

Regards,
Evgeny



RE: [PATCH v1 1/2] Match: Support __builtin_add_overflow for branchless unsigned SAT_ADD

2024-05-22 Thread Li, Pan2
Thanks Richard for the comments.  I will merge the remaining forms of
.SAT_ADD into one middle-end patch to give the full picture, and address
the comments as well.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 22, 2024 9:16 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow for 
branchless unsigned SAT_ADD

On Sun, May 19, 2024 at 8:37 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the branchless form for unsigned
> SAT_ADD when leveraging __builtin_add_overflow.  For example as below:
>
> uint64_t sat_add_u(uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   uint64_t overflow = __builtin_add_overflow (x, y, &ret);
>
>   return (T)(-overflow) | ret;
> }
>
> Before this patch:
>
> uint64_t sat_add_u (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   long unsigned int _3;
>   __complex__ long unsigned int _6;
>   uint64_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _1 = REALPART_EXPR <_6>;
>   _2 = IMAGPART_EXPR <_6>;
>   _3 = -_2;
>   _8 = _1 | _3;
>   return _8;
> ;;succ:   EXIT
>
> }
>
> After this patch:
>
> uint64_t sat_add_u (uint64_t x, uint64_t y)
> {
>   uint64_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _8 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _8;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add SAT_ADD right part 2 for __builtin_add_overflow.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index b291e34bbe4..5328e846aff 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3064,6 +3064,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
>   (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1
>
> +(match (usadd_right_part_2 @0 @1)
> + (negate (imagpart (IFN_ADD_OVERFLOW:c @0 @1)))
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1
> +

Can you merge this with the patch that makes use of the
usadd_right_part_2 match?
It's difficult to review on its own.

>  /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> because the sub part of left_part_2 cannot work with right_part_1.
> For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> --
> 2.34.1
>


Re: [PATCH] rs6000: Don't pass -many to the assembler [PR112868]

2024-05-22 Thread Segher Boessenkool
Hi!

On Wed, May 22, 2024 at 09:29:13AM -0500, Peter Bergner wrote:
> On 5/21/24 8:27 AM, jeevitha wrote:
> > The following patch has been bootstrapped and regtested with default
> > configuration [--enable-checking=yes] and with --enable-checking=release
> > on powerpc64le-linux.
> > 
> > This patch removes passing the -many assembler option for release builds.
> > Now, GCC no longer passes -many under any conditions to the assembler.

Why do we want that?  I cannot read minds.

> You are missing a ChangeLog entry for the target-supports.exp change plus
> there is no mention of why it's needed in the git log entry.

In the commit message you mean?  Yeah.  This info belongs in the commit
message.

Is the target-supports thing that Cell thing?  That does not belong here
at all.  If it wasn't simply a mistake, it should be a separate commit,
with a lot of explanation.


Segher


Re: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-05-22 Thread Jeff Law




On 5/22/24 5:46 AM, Di Zhao OS wrote:

The test case is for targets that support FMA.  Previously the
"target" selector was missing from the dg-final command.

Tested on x86_64-pc-linux-gnu.

Thanks
Di Zhao

gcc/testsuite/ChangeLog:

 * gcc.dg/pr110279-1.c: Add target selector.
Rather than list targets explicitly in the test, wouldn't it be better 
to have a common routine that could be used in other cases where we have 
a test that requires FMA?


So something similar to check_effective_target_scalar_all_fma?
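
For illustration, a test could then guard its scan with the shared
selector instead of an explicit target list (a sketch; the dump name
and count are placeholders):

/* { dg-final { scan-tree-dump-times "FMA" 1 "optimized" { target scalar_all_fma } } } */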


Jeff


Re: [PATCH 4/4] Testsuite updates

2024-05-22 Thread Jeff Law




On 5/22/24 4:58 AM, Richard Biener wrote:



RISC-V CI didn't trigger (not sure what magic is required).  Both
ARM and AARCH64 show that the "Vectorizing stmts using SLP" scans are a
bit fragile because we sometimes cancel SLP when we want to use
load/store-lanes.

The RISC-V tag on the subject line is the trigger.

Jeff


[x86_64 PATCH] Correct insn_cost of movabsq.

2024-05-22 Thread Roger Sayle
This single line patch fixes a strange quirk/glitch in i386's rtx_costs,
which considers an instruction loading a 64-bit constant to be significantly
cheaper than loading a 32-bit (or smaller) constant.

Consider the two functions:
unsigned long long foo() { return 0x0123456789abcdefULL; }
unsigned int bar() { return 10; }

and the corresponding lines from combine's dump file:
  insn_cost 1 for #: r98:DI=0x123456789abcdef
  insn_cost 4 for #: ax:SI=0xa

The same issue can be seen in -dP assembler output.
  movabsq $81985529216486895, %rax# 5  [c=1 l=10]  *movdi_internal/4

The problem is that pattern_cost's interpretation of rtx_costs contains
"return cost > 0 ? cost : COSTS_N_INSNS (1)" where a zero value (for
example a register or small immediate constant) is considered special,
and equivalent to a single instruction, but all other values are treated
as verbatim.  Hence to make x86_64's 10-byte long movabsq instruction
slightly more expensive than a simple constant, rtx_costs needs to
return COSTS_N_INSNS(1)+1 and not 1.  With this change, the insn_cost
of movabsq is the intended value 5:
  insn_cost 5 for #: r98:DI=0x123456789abcdef
and
  movabsq $81985529216486895, %rax# 5  [c=5 l=10]  *movdi_internal/4
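
Spelled out (illustrative; COSTS_N_INSNS (N) expands to N * 4):

  /* How the returned rtx_costs value is interpreted:
       0 -> treated as trivial, reported as COSTS_N_INSNS (1) == 4
       1 -> taken verbatim, i.e. cheaper than any single instruction
       COSTS_N_INSNS (1) + 1 == 5 -> slightly dearer than one insn.  */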


[I'd originally tried fixing this by adding an ix86_insn_cost target
hook, but the testsuite is very sensitive to the costing of insns].


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-05-22  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) :
A CONST_INT that isn't x86_64_immediate_operand requires an extra
(expensive) movabsq insn to load, so return COSTS_N_INSNS (1) + 1.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b4838b7..b4a9519 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21569,7 +21569,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
   if (x86_64_immediate_operand (x, VOIDmode))
*total = 0;
  else
-   *total = 1;
+   *total = COSTS_N_INSNS (1) + 1;
   return true;
 
 case CONST_DOUBLE:


Re: [PATCH] Fix PR rtl-optimization/115038

2024-05-22 Thread Jeff Law




On 5/20/24 1:13 AM, Eric Botcazou wrote:

Hi,

this is a regression present on mainline and the 14 branch in the form of an
ICE in seh_cfa_offset from config/i386/winnt.cc on the attached C++ testcase
compiled with -O2 -fno-omit-frame-pointer.

The problem directly comes from the -ffold-mem-offsets pass messing with
the prologue and the frame-related instructions, which is a no-no with SEH, so
the fix simply disconnects the pass in these circumstances, the question being
whether this should be done unconditionally as in the fix or only with SEH.
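
In essence the guard looks like this (a sketch; "def" is a hypothetical
name for the defining instruction, see the patch for the exact placement
in fold_offsets):

  /* Never fold offsets through frame-related definitions; SEH (and
     prologue bookkeeping in general) must see them unmodified.  */
  if (RTX_FRAME_RELATED_P (def))
    return 0;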

Tested on x86-64/Linux, OK for the mainline and 14 branch?


2024-05-20  Eric Botcazou  

PR rtl-optimization/115038
* fold-mem-offsets.cc (fold_offsets): Return 0 if the defining
instruction of the register is frame related.


2024-05-20  Eric Botcazou  

* g++.dg/opt/fmo1.C: New test.
lol.  I missed that you had already submitted this when I made my 
comment in the PR.


OK for the trunk and gcc-14 branch.

Jeff


Re: [x86_64 PATCH] Correct insn_cost of movabsq.

2024-05-22 Thread Uros Bizjak
On Wed, May 22, 2024 at 5:15 PM Roger Sayle  wrote:
>
> This single line patch fixes a strange quirk/glitch in i386's rtx_costs,
> which considers an instruction loading a 64-bit constant to be significantly
> cheaper than loading a 32-bit (or smaller) constant.
>
> Consider the two functions:
> unsigned long long foo() { return 0x0123456789abcdefULL; }
> unsigned int bar() { return 10; }
>
> and the corresponding lines from combine's dump file:
>   insn_cost 1 for #: r98:DI=0x123456789abcdef
>   insn_cost 4 for #: ax:SI=0xa
>
> The same issue can be seen in -dP assembler output.
>   movabsq $81985529216486895, %rax# 5  [c=1 l=10]  *movdi_internal/4
>
> The problem is that pattern_cost's interpretation of rtx_costs contains
> "return cost > 0 ? cost : COSTS_N_INSNS (1)" where a zero value (for
> example a register or small immediate constant) is considered special,
> and equivalent to a single instruction, but all other values are treated
> as verbatim.  Hence to make x86_64's 10-byte long movabsq instruction
> slightly more expensive than a simple constant, rtx_costs needs to
> return COSTS_N_INSNS(1)+1 and not 1.  With this change, the insn_cost
> of movabsq is the intended value 5:
>   insn_cost 5 for #: r98:DI=0x123456789abcdef
> and
>   movabsq $81985529216486895, %rax# 5  [c=5 l=10]  *movdi_internal/4
>
>
> [I'd originally tried fixing this by adding a ix86_insn_cost target
> hook, but the testsuite is very sensitive to the costing of insns].
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-05-22  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_rtx_costs) :
> A CONST_INT that isn't x86_64_immediate_operand requires an extra
> (expensive) movabsq insn to load, so return COSTS_N_INSNS (1) + 1.


Re: [PATCH] Fix auto deduction for template specialization scopes [114915].

2024-05-22 Thread Jason Merrill

Thanks for the patch!

Please review https://gcc.gnu.org/contribute.html for more details of 
the format patches should have.  In particular, you don't seem to have a 
copyright assignment on file with the FSF, so you'll need to either do 
that or certify that the contribution is under the DCO.


Also, you need a component tag (c++:) in the subject line, and ChangeLog 
entries in the commit message.  Note what contribute.html says about git 
gcc-commit-mklog, which makes that a lot simpler.


On 5/1/24 18:52, Seyed Sajad Kahani wrote:

When deducing auto for `adc_return_type`, `adc_variable_type`, and 
`adc_decomp_type` contexts (at the usage time), we try to resolve the outermost 
template arguments to be used for satisfaction. This is done by one of the 
following, depending on the scope:

1. Checking the `DECL_TEMPLATE_INFO` of the current function scope and 
extracting DECL_TI_ARGS from it for function scope deductions (pt.cc:31236).
2. Checking the `DECL_TEMPLATE_INFO` of the declaration (alongside with other 
conditions) for non-function scope variable declaration deductions 
(decl.cc:8527).

Then, we do not retrieve the deeper layers of the template arguments; instead, 
we fill the missing levels with dummy levels (pt.cc:31260).

The problem (that is shown in PR114915) is that we do not consider the case 
where the deduction happens in a template specialization scope. In this case, 
the type is not dependent on the outermost template arguments (which are the 
specialization arguments). Yet, we still resolve the outermost template 
arguments, and then the number of layers in the template arguments exceeds the 
number of levels in the type. This causes the missing levels to be negative. 
This leads to the rejection of valid code and ICEs (like segfault) in the 
release mode. In the debug mode, it is possible to show as an assertion failure 
(when creating a tree_vec with a negative size).
The code that generates the issue is added to the test suite as 
`g++.dg/cpp2a/concepts-placeholder14.C`.


This testcase could use more cases, like variable template 
specialization (both full and partial) and member functions where not 
all enclosing classes are fully specialized.



This patch fixes the issue by checking that the template usage, whose arguments 
are going to be used for satisfaction, is not a partial or explicit 
specialization (and therefore it is an implicit or explicit instantiation). 
This check is done in the two only places that affect the `outer_targs` for the 
mentioned contexts.


It seems like we want a function to use instead of DECL_TI_ARGS to get 
the args for parameters that are actually in scope in the definition 
that we're substituting into.  In the case of a full specialization, 
that would be NULL_TREE, but it's more complicated for partial 
specializations.


This function should probably go after outer_template_args in pt.cc.


One might ask why this is not implemented as a simple `missing_level > 0` 
check. The reason is that the recovery from the negative `missing_levels` will not 
be easy, and it is not clear how to recover from it. Therefore, it is better to 
prevent it from happening.


But you still have that check in the patch.  Would it be better as an 
assert?


Thanks,
Jason



Re: [PATCH] Fix auto deduction for template specialization scopes [114915].

2024-05-22 Thread Patrick Palka
On Wed, 22 May 2024, Jason Merrill wrote:

> Thanks for the patch!
> 
> Please review https://gcc.gnu.org/contribute.html for more details of the
> format patches should have.  In particular, you don't seem to have a copyright
> assignment on file with the FSF, so you'll need to either do that or certify
> that the contribution is under the DCO.
> 
> Also, you need a component tag (c++:) in the subject line, and ChangeLog
> entries in the commit message.  Note what contribute.html says about git
> gcc-commit-mklog, which makes that a lot simpler.
> 
> On 5/1/24 18:52, Seyed Sajad Kahani wrote:
> > When deducing auto for `adc_return_type`, `adc_variable_type`, and
> > `adc_decomp_type` contexts (at the usage time), we try to resolve the
> > outermost template arguments to be used for satisfaction. This is done by
> > one of the following, depending on the scope:
> > 
> > 1. Checking the `DECL_TEMPLATE_INFO` of the current function scope and
> > extracting DECL_TI_ARGS from it for function scope deductions (pt.cc:31236).
> > 2. Checking the `DECL_TEMPLATE_INFO` of the declaration (alongside with
> > other conditions) for non-function scope variable declaration deductions
> > (decl.cc:8527).
> > 
> > Then, we do not retrieve the deeper layers of the template arguments;
> > instead, we fill the missing levels with dummy levels (pt.cc:31260).
> > 
> > The problem (that is shown in PR114915) is that we do not consider the case
> > where the deduction happens in a template specialization scope. In this
> > case, the type is not dependent on the outermost template arguments (which
> > are the specialization arguments). Yet, we still resolve the outermost
> > template arguments, and then the number of layers in the template arguments
> > exceeds the number of levels in the type. This causes the missing levels to
> > be negative. This leads to the rejection of valid code and ICEs (like
> > segfault) in the release mode. In the debug mode, it is possible to show as
> > an assertion failure (when creating a tree_vec with a negative size).
> > The code that generates the issue is added to the test suite as
> > `g++.dg/cpp2a/concepts-placeholder14.C`.
> 
> This testcase could use more cases, like variable template specialization
> (both full and partial) and member functions where not all enclosing classes
> are fully specialized.

Note I think the latest version of the patch is
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651805.html
which has more test coverage and takes a more context oblivious approach
that keeps the innermost arguments if there's an excess, based on some
earlier discussion e.g.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650834.html
This should do the right thing at least until we implement explicit
specializations in template scope (CWG 727)

> 
> > This patch fixes the issue by checking that the template usage, whose
> > arguments are going to be used for satisfaction, is not a partial or
> > explicit specialization (and therefore it is an implicit or explicit
> > instantiation). This check is done in the two only places that affect the
> > `outer_targs` for the mentioned contexts.
> 
> It seems like we want a function to use instead of DECL_TI_ARGS to get the
> args for parameters that are actually in scope in the definition that we're
> substituting into.  In the case of a full specialization, that would be
> NULL_TREE, but it's more complicated for partial specializations.
> 
> This function should probably go after outer_template_args in pt.cc.
> 
> > One might ask why this is not implemented as a simple `missing_level > 0`
> > check. The reason is that the recovery from the negative `missing_levels`
> > will not be easy, and it is not clear how to recover from it. Therefore, it
> > is better to prevent it from happening.
> 
> But you still have that check in the patch.  Would it be better as an assert?
> 
> Thanks,
> Jason
> 
> 



Re: [PATCH] Fix auto deduction for template specialization scopes [114915].

2024-05-22 Thread Jason Merrill

On 5/22/24 12:48, Patrick Palka wrote:

On Wed, 22 May 2024, Jason Merrill wrote:


Thanks for the patch!

Please review https://gcc.gnu.org/contribute.html for more details of the
format patches should have.  In particular, you don't seem to have a copyright
assignment on file with the FSF, so you'll need to either do that or certify
that the contribution is under the DCO.

Also, you need a component tag (c++:) in the subject line, and ChangeLog
entries in the commit message.  Note what contribute.html says about git
gcc-commit-mklog, which makes that a lot simpler.

On 5/1/24 18:52, Seyed Sajad Kahani wrote:

When deducing auto for `adc_return_type`, `adc_variable_type`, and
`adc_decomp_type` contexts (at the usage time), we try to resolve the
outermost template arguments to be used for satisfaction. This is done by
one of the following, depending on the scope:

1. Checking the `DECL_TEMPLATE_INFO` of the current function scope and
extracting DECL_TI_ARGS from it for function scope deductions (pt.cc:31236).
2. Checking the `DECL_TEMPLATE_INFO` of the declaration (alongside with
other conditions) for non-function scope variable declaration deductions
(decl.cc:8527).

Then, we do not retrieve the deeper layers of the template arguments;
instead, we fill the missing levels with dummy levels (pt.cc:31260).

The problem (that is shown in PR114915) is that we do not consider the case
where the deduction happens in a template specialization scope. In this
case, the type is not dependent on the outermost template arguments (which
are the specialization arguments). Yet, we still resolve the outermost
template arguments, and then the number of layers in the template arguments
exceeds the number of levels in the type. This causes the missing levels to
be negative. This leads to the rejection of valid code and ICEs (like
segfault) in the release mode. In the debug mode, it is possible to show as
an assertion failure (when creating a tree_vec with a negative size).
The code that generates the issue is added to the test suite as
`g++.dg/cpp2a/concepts-placeholder14.C`.


This testcase could use more cases, like variable template specialization
(both full and partial) and member functions where not all enclosing classes
are fully specialized.


Note I think the latest version of the patch is
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651805.html


Oops, thanks!


which has more test coverage and takes a more context oblivious approach
that keeps the innermost arguments if there's an excess, based on some
earlier discussion e.g.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650834.html
This should do the right thing at least until we implement explicit
specializations in template scope (CWG 727)




This patch fixes the issue by checking that the template usage, whose
arguments are going to be used for satisfaction, is not a partial or
explicit specialization (and therefore it is an implicit or explicit
instantiation). This check is done in the two only places that affect the
`outer_targs` for the mentioned contexts.


It seems like we want a function to use instead of DECL_TI_ARGS to get the
args for parameters that are actually in scope in the definition that we're
substituting into.  In the case of a full specialization, that would be
NULL_TREE, but it's more complicated for partial specializations.

This function should probably go after outer_template_args in pt.cc.


One might ask why this is not implemented as a simple `missing_level > 0`
check. The reason is that the recovery from the negative `missing_levels`
will not be easy, and it is not clear how to recover from it. Therefore, it
is better to prevent it from happening.


But you still have that check in the patch.  Would it be better as an assert?

Thanks,
Jason








Re: [x86_64 PATCH] Correct insn_cost of movabsq.

2024-05-22 Thread Richard Biener



> Am 22.05.2024 um 17:30 schrieb Uros Bizjak :
> 
> On Wed, May 22, 2024 at 5:15 PM Roger Sayle  
> wrote:
>> 
>> This single line patch fixes a strange quirk/glitch in i386's rtx_costs,
>> which considers an instruction loading a 64-bit constant to be significantly
>> cheaper than loading a 32-bit (or smaller) constant.
>> 
>> Consider the two functions:
>> unsigned long long foo() { return 0x0123456789abcdefULL; }
>> unsigned int bar() { return 10; }
>> 
>> and the corresponding lines from combine's dump file:
>>  insn_cost 1 for #: r98:DI=0x123456789abcdef
>>  insn_cost 4 for #: ax:SI=0xa
>> 
>> The same issue can be seen in -dP assembler output.
>>  movabsq $81985529216486895, %rax# 5  [c=1 l=10]  *movdi_internal/4
>> 
>> The problem is that pattern_cost's interpretation of rtx_costs contains
>> "return cost > 0 ? cost : COSTS_N_INSNS (1)" where a zero value (for
>> example a register or small immediate constant) is considered special,
>> and equivalent to a single instruction, but all other values are treated
>> as verbatim.

A zero cost is interpreted as „not implemented“ and assigned a cost of 1, 
assuming a COSTS_N_INSNS basing.
IMO a bit bogus but I didn’t dare to argue further with Segher.

Richard 


>>  Hence to make x86_64's 10-byte long movabsq instruction
>> slightly more expensive than a simple constant, rtx_costs needs to
>> return COSTS_N_INSNS(1)+1 and not 1.  With this change, the insn_cost
>> of movabsq is the intended value 5:
>>  insn_cost 5 for #: r98:DI=0x123456789abcdef
>> and
>>  movabsq $81985529216486895, %rax# 5  [c=5 l=10]  *movdi_internal/4
>> 
>> 
>> [I'd originally tried fixing this by adding a ix86_insn_cost target
>> hook, but the testsuite is very sensitive to the costing of insns].
>> 
>> 
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check, both with and without --target_board=unix{-m32}
>> with no new failures.  Ok for mainline?
>> 
>> 
>> 2024-05-22  Roger Sayle  
>> 
>> gcc/ChangeLog
>>* config/i386/i386.cc (ix86_rtx_costs) :
>>A CONST_INT that isn't x86_64_immediate_operand requires an extra
>>(expensive) movabsq insn to load, so return COSTS_N_INSNS (1) + 1.

Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Jeff Law




On 5/22/24 6:47 AM, Jivan Hakobyan wrote:
After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.

The reason is that we prevent rounding DF->SI->DF on RV32, and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * gcc.target/riscv/round_32.c: Fix test.
I wonder if this test even makes sense for rv32 anymore given we can't 
do a DF->DI as a single instruction and DF->SI is going to give 
incorrect results.  So the underlying optimization to improve those 
rounding cases just doesn't apply to DF mode objects for rv32.


Thoughts?
Jeff
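
[For reference, a minimal sketch of the kind of source the optimization
targets; the function name is hypothetical:

double
round_to_int_and_back (double x)
{
  /* On rv64 the DF->DI step can be a single fcvt with the matching
     rounding mode; on rv32 there is no single-instruction DF->DI, so
     this becomes a libcall, and DF->SI would mis-handle values outside
     the SImode range.  */
  return (double) (long long) __builtin_round (x);
}]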



Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Palmer Dabbelt

On Wed, 22 May 2024 11:01:16 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 6:47 AM, Jivan Hakobyan wrote:

After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.
The reason is that we prevent rounding DF->SI->DF on RV32, and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * gcc.target/riscv/round_32.c: Fix test.

I wonder if this test even makes sense for rv32 anymore given we can't
do a DF->DI as a single instruction and DF->SI is going to give
incorrect results.  So the underlying optimization to improve those
rounding cases just doesn't apply to DF mode objects for rv32.

Thoughts?


Unless I'm missing something, we should still be able to do the 
float roundings on rv32?


I think with Zfa we'd also have testable sequences for the double/double 
and float/float roundings, which could be useful to test.  I'm not 
entirely sure there, though, as I always get a bit lost in which FP 
rounding flavors map down.


I'd also kicked off some runs trying to promote these to executable 
tests.  IIRC it was just DG stuff (maybe just adding a `dg-do run`?) 
but I don't know where I stashed the results...



Jeff


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-22 Thread Qing Zhao


> On May 22, 2024, at 03:38, Richard Biener  wrote:
> 
> On Tue, May 21, 2024 at 11:36 PM David Malcolm  wrote:
>> 
>> On Tue, 2024-05-21 at 15:13 +, Qing Zhao wrote:
>>> Thanks for the comments and suggestions.
>>> 
 On May 15, 2024, at 10:00, David Malcolm 
 wrote:
 
 On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> On Mon, 13 May 2024, Qing Zhao wrote:
> 
>> -Warray-bounds is an important option to enable the linux kernel to
>> keep array out-of-bound errors out of the source tree.
>> 
>> However, due to the false positive warnings reported in PR109071
>> (-Warray-bounds false positive warnings due to code duplication from
>> jump threading), -Warray-bounds=1 cannot be enabled by default.
>> 
>> Although it's impossible to eliminate all the false positive warnings
>> from -Warray-bounds=1 (see PR104355, "Misleading -Warray-bounds
>> documentation says 'always out of bounds'"), we should minimize the
>> false positive warnings in -Warray-bounds=1.
>> 
>> The root reason for the false positive warnings reported in
>> PR109071 is:
>> 
>> When the thread jump optimization tries to reduce the # of branches
>> inside the routine, sometimes it needs to duplicate the code and
>> split it into two conditional paths.  For example:
>> 
>> The original code:
>> 
>> void sparx5_set (int * ptr, struct nums * sg, int index)
>> {
>>   if (index >= 4)
>>     warn ();
>>   *ptr = 0;
>>   *val = sg->vals[index];
>>   if (index >= 4)
>>     warn ();
>>   *ptr = *val;
>> 
>>   return;
>> }
>> 
>> With the thread jump, the above becomes:
>> 
>> void sparx5_set (int * ptr, struct nums * sg, int index)
>> {
>>   if (index >= 4)
>>     {
>>       warn ();
>>       *ptr = 0;               // Code duplication since "warn" does return;
>>       *val = sg->vals[index]; // same for this line.
>>                               // In this path, since it's under the
>>                               // condition "index >= 4", the compiler
>>                               // knows the value of "index" is larger
>>                               // than 4, therefore the out-of-bound
>>                               // warning.
>>       warn ();
>>     }
>>   else
>>     {
>>       *ptr = 0;
>>       *val = sg->vals[index];
>>     }
>>   *ptr = *val;
>>   return;
>> }
>> 
>> We can see, after the thread jump optimization, the # of branches inside
>> the routine "sparx5_set" is reduced from 2 to 1; however, due to the
>> code duplication (which is needed for the correctness of the code), we
>> got a false positive out-of-bound warning.
>> 
>> In order to eliminate such false positive out-of-bound warnings,
>> 
>> A. Add one more flag for GIMPLE: is_splitted.
>> B. During the thread jump optimization, when the basic blocks are
>>    duplicated, mark all the STMTs inside the original and duplicated
>>    basic blocks as "is_splitted";
>> C. Inside the array bound checker, add the following new heuristic:
>> 
>> If
>>    1. the stmt is duplicated and split into two conditional paths;
>> +  2. the warning level < 2;
>> +  3. the current block is not dominating the exit block
>> Then do not report the warning.
>> 
>> The false positive warnings are moved from -Warray-bounds=1 to
>> -Warray-bounds=2 now.
>> 
>> Bootstrapped and regression tested on both x86 and aarch64.  Adjusted
>> -Warray-bounds-61.c due to the false positive warnings.
>> 
>> Let me know if you have any comments and suggestions.
> 
> At the last Cauldron I talked with David Malcolm about these kinds of
> issues and thought of, instead of suppressing diagnostics, recording
> how a block was duplicated.  For jump threading my idea was to record
> the condition that was proved true when entering the path and do this
> by recording the corresponding locations
>>> 
>>> Is only recording the location for the TRUE path enough?
>>> We might need to record the corresponding locations for both TRUE and
>>> FALSE paths since the VRP might be more accurate on both paths.
>>> Is only recording the location enough?
>>> Do we need to record the pointer to the original condition stmt?
>> 
>> Just to be clear: I don't plan to work on this myself (I have far too
>> much already to work on...); I'm assuming Richard Biener is
>> hoping/planning to implement this himself.
> 
> While I think some of this might be an improvement to the vast
> number of "false positive" diagnostics we have from (too) late diagnostic
> passes, I do not have the cycles to work on this.

I can study a 

Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Jeff Law




On 5/22/24 12:15 PM, Palmer Dabbelt wrote:

On Wed, 22 May 2024 11:01:16 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 6:47 AM, Jivan Hakobyan wrote:

After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.
The reason is that we prevent rounding DF->SI->DF on RV32, and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * gcc.target/riscv/round_32.c: Fix test.

I wonder if this test even makes sense for rv32 anymore given we can't
do a DF->DI as a single instruction and DF->SI is going to give
incorrect results.  So the underlying optimization to improve those
rounding cases just doesn't apply to DF mode objects for rv32.

Thoughts?


Unless I'm missing something, we should still be able to do the float 
roundings on rv32?
I initially thought that as well.  The problem is we don't have a DF->DI 
conversion instruction for rv32.  We can't use DF->SI as the range of 
representable values is wrong.





I think with Zfa we'd also have testable sequences for the double/double 
and float/float roundings, which could be useful to test.  I'm not 
entirely sure there, though, as I always get a bit lost in which FP 
rounding flavors map down.
Zfa is a different story as it has instructions with the proper 
semantics ;-)  We'd just emit those new instructions and wouldn't have 
to worry about the initial range test.





I'd also kicked off some runs trying to promote these to executable 
tests.  IIRC it was just DG stuff (maybe just adding a `dg-do run`?) 
but I don't know where I stashed the results...

Not a bad idea, particularly if we test the border cases.

jeff



Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Palmer Dabbelt

On Wed, 22 May 2024 12:02:26 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 12:15 PM, Palmer Dabbelt wrote:

On Wed, 22 May 2024 11:01:16 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 6:47 AM, Jivan Hakobyan wrote:

After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.
The reason is that we prevent rounding DF->SI->DF on RV32, and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * gcc.target/riscv/round_32.c: Fix test.

I wonder if this test even makes sense for rv32 anymore given we can't
do a DF->DI as a single instruction and DF->SI is going to give
incorrect results.  So the underlying optimization to improve those
rounding cases just doesn't apply to DF mode objects for rv32.

Thoughts?


Unless I'm missing something, we should still be able to do the float
roundings on rv32?

I initially thought that as well.  The problem is we don't have a DF->DI
conversion instruction for rv32.  We can't use DF->SI as the range of
representable values is wrong.


Ya, right.  I guess we'd need to be calling roundf(), not round(), for 
those?  So maybe we should adjust the tests to do that?



I think with Zfa we'd also have testable sequences for the double/double
and float/float roundings, which could be useful to test.  I'm not
entirely sure there, though, as I always get a bit lost in which FP
rounding flavors map down.

Zfa is a different story as it has instructions with the proper
semantics ;-)  We'd just emit those new instructions and wouldn't have
to worry about the initial range test.


and I guess that'd just be an entirely different set of scan-assembly 
sets than round_32 or round_64, so maybe it's not a reason to keep these 
around.



I'd also kicked off some runs trying to promote these to executable
tests.  IIRC it was just DG stuff (maybe just adding a `dg-do run`?)
but I don't know where I stashed the results...

Not a bad idea, particularly if we test the border cases.


Ya, makes sense -- I guess the current values aren't that exciting for 
execution, but we could just add some more interesting ones...



jeff
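
[A hedged sketch of what an executable border-case test might look like;
the values here are illustrative, not taken from the existing test:

#include <math.h>
extern void abort (void);

int
main (void)
{
  volatile double a = 2147483646.5;   /* Rounds to INT32_MAX.  */
  volatile double b = -2147483647.5;  /* Rounds to INT32_MIN.  */
  if ((int) round (a) != 2147483647)
    abort ();
  if ((int) round (b) != -2147483647 - 1)
    abort ();
  return 0;
}]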


Re: [PATCH v2] testsuite: Verify r0-r3 are extended with CMSE

2024-05-22 Thread Torbjorn SVENSSON

Hi,

I've now pushed the below change to the following branches with the 
corresponding commit id.


trunk: 9ddad76e98ac8f257f90b3814ed3c6ba78d0f3c7
releases/gcc-14: da3a6b0dda45bc676bb985d7940853b50803e11a
releases/gcc-13: 75d394c20b0ad85dfe8511324d61d13e453c9285
releases/gcc-12: d9c89402b54be4c15bb3c7bcce3465f534746204
releases/gcc-11: 08ca81e4b49bda153d678a372df7f7143a94f4ad

Kind regards,
Torbjörn


On 2024-05-22 13:54, Richard Earnshaw (lists) wrote:

On 22/05/2024 12:14, Torbjorn SVENSSON wrote:

Hello Richard,

Thanks for the reply.

From my point of view, at least the -fshort-enums part should be on all 
branches. Just to keep things clean, maybe it's easier to backport the entire patch?


Yes, that's a fair point.  I was only thinking about the broadening of the test 
to the other argument registers when I said that.

So, just to be clear, OK all.

R.



Unless you have an objection, I would like to go ahead and just backport it to 
all branches.

Kind regards,
Torbjörn

On 2024-05-22 12:55, Richard Earnshaw (lists) wrote:

On 06/05/2024 12:50, Torbjorn SVENSSON wrote:

Hi,

Forgot to mention when I sent the patch that I would like to commit it to the 
following branches:

- releases/gcc-11
- releases/gcc-12
- releases/gcc-13
- releases/gcc-14
- trunk



Well you can [commit it to the release branches], but I'm not sure it's 
essential.  It seems pretty unlikely to me that this would regress on a release 
branch without having first regressed on trunk.

R.


Kind regards,
Torbjörn

On 2024-05-02 12:50, Torbjörn SVENSSON wrote:

Add regression test to the existing zero/sign extend tests for CMSE to
verify that r0, r1, r2 and r3 are properly extended, not just r0.

boolCharShortEnumSecureFunc test is done using -O0 to ensure the
instructions are in a predictable order.

gcc/testsuite/ChangeLog:

  * gcc.target/arm/cmse/extend-param.c: Add regression test. Add
    -fshort-enums.
  * gcc.target/arm/cmse/extend-return.c: Add -fshort-enums option.

Signed-off-by: Torbjörn SVENSSON 
---
    .../gcc.target/arm/cmse/extend-param.c    | 21 +++
    .../gcc.target/arm/cmse/extend-return.c   |  4 ++--
    2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
index 01fac786238..d01ef87e0be 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
@@ -1,5 +1,5 @@
    /* { dg-do compile } */
-/* { dg-options "-mcmse" } */
+/* { dg-options "-mcmse -fshort-enums" } */
    /* { dg-final { check-function-bodies "**" "" "" } } */
      #include 
@@ -78,7 +78,6 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
(enum offset index) {
  if (index >= ARRAY_SIZE)
    return 0;
  return array[index];
-
    }
      /*
@@ -88,9 +87,23 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
(enum offset index) {
    **    ...
    */
    __attribute__((cmse_nonsecure_entry)) char boolSecureFunc (bool index) {
-
  if (index >= ARRAY_SIZE)
    return 0;
  return array[index];
+}
    -}
\ No newline at end of file
+/*
+**__acle_se_boolCharShortEnumSecureFunc:
+**    ...
+**    uxtb    r0, r0
+**    uxtb    r1, r1
+**    uxth    r2, r2
+**    uxtb    r3, r3
+**    ...
+*/
+__attribute__((cmse_nonsecure_entry,optimize(0))) char boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, enum offset d) {
+  size_t index = a + b + c + d;
+  if (index >= ARRAY_SIZE)
+    return 0;
+  return array[index];
+}
diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
index cf731ed33df..081de0d699f 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
@@ -1,5 +1,5 @@
    /* { dg-do compile } */
-/* { dg-options "-mcmse" } */
+/* { dg-options "-mcmse -fshort-enums" } */
    /* { dg-final { check-function-bodies "**" "" "" } } */
      #include 
@@ -89,4 +89,4 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
(ns_enum_foo_t * ns_foo_p)
    unsigned char boolNonsecure0 (ns_bool_foo_t * ns_foo_p)
    {
  return ns_foo_p ();
-}
\ No newline at end of file
+}






Re: [PATCH] aarch64: Fold vget_high_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-22 Thread Andrew Pinski
On Wed, May 22, 2024 at 5:28 AM Richard Sandiford
 wrote:
>
> Pengxuan Zheng  writes:
> > This patch is a follow-up of r15-697-ga2e4fe5a53cf75 to also fold 
> > vget_high_*
> > intrinsics to BIT_FILED_REF and remove the vget_high_* definitions from
> > arm_neon.h to use the new intrinsics framework.
> >
> >   PR target/102171
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-builtins.cc 
> > (AARCH64_SIMD_VGET_HIGH_BUILTINS):
> >   New macro to create definitions for all vget_high intrinsics.
> >   (VGET_HIGH_BUILTIN): Likewise.
> >   (enum aarch64_builtins): Add vget_high function codes.
> >   (AARCH64_SIMD_VGET_LOW_BUILTINS): Delete duplicate macro.
> >   (aarch64_general_fold_builtin): Fold vget_high calls.
> >   * config/aarch64/aarch64-simd-builtins.def: Delete vget_high builtins.
> >   * config/aarch64/aarch64-simd.md (aarch64_get_high<mode>): Delete.
> >   (aarch64_vget_hi_halfv8bf): Likewise.
> >   * config/aarch64/arm_neon.h (__attribute__): Delete.
> >   (vget_high_f16): Likewise.
> >   (vget_high_f32): Likewise.
> >   (vget_high_f64): Likewise.
> >   (vget_high_p8): Likewise.
> >   (vget_high_p16): Likewise.
> >   (vget_high_p64): Likewise.
> >   (vget_high_s8): Likewise.
> >   (vget_high_s16): Likewise.
> >   (vget_high_s32): Likewise.
> >   (vget_high_s64): Likewise.
> >   (vget_high_u8): Likewise.
> >   (vget_high_u16): Likewise.
> >   (vget_high_u32): Likewise.
> >   (vget_high_u64): Likewise.
> >   (vget_high_bf16): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/vget_high_2.c: New test.
> >   * gcc.target/aarch64/vget_high_2_be.c: New test.
>
> OK, thanks.

Pushed as r15-778-g1d1ef1c22752b3 .

Thanks,
Andrew


>
> Richard
>
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-builtins.cc|  59 +++---
> >  gcc/config/aarch64/aarch64-simd-builtins.def  |   6 -
> >  gcc/config/aarch64/aarch64-simd.md|  22 
> >  gcc/config/aarch64/arm_neon.h | 105 --
> >  .../gcc.target/aarch64/vget_high_2.c  |  30 +
> >  .../gcc.target/aarch64/vget_high_2_be.c   |  31 ++
> >  6 files changed, 104 insertions(+), 149 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2_be.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> > b/gcc/config/aarch64/aarch64-builtins.cc
> > index 11b888016ed..f8eeccb554d 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -675,6 +675,23 @@ static aarch64_simd_builtin_datum 
> > aarch64_simd_builtin_data[] = {
> >VGET_LOW_BUILTIN(u64) \
> >VGET_LOW_BUILTIN(bf16)
> >
> > +#define AARCH64_SIMD_VGET_HIGH_BUILTINS \
> > +  VGET_HIGH_BUILTIN(f16) \
> > +  VGET_HIGH_BUILTIN(f32) \
> > +  VGET_HIGH_BUILTIN(f64) \
> > +  VGET_HIGH_BUILTIN(p8) \
> > +  VGET_HIGH_BUILTIN(p16) \
> > +  VGET_HIGH_BUILTIN(p64) \
> > +  VGET_HIGH_BUILTIN(s8) \
> > +  VGET_HIGH_BUILTIN(s16) \
> > +  VGET_HIGH_BUILTIN(s32) \
> > +  VGET_HIGH_BUILTIN(s64) \
> > +  VGET_HIGH_BUILTIN(u8) \
> > +  VGET_HIGH_BUILTIN(u16) \
> > +  VGET_HIGH_BUILTIN(u32) \
> > +  VGET_HIGH_BUILTIN(u64) \
> > +  VGET_HIGH_BUILTIN(bf16)
> > +
> >  typedef struct
> >  {
> >const char *name;
> > @@ -717,6 +734,9 @@ typedef struct
> >  #define VGET_LOW_BUILTIN(A) \
> >AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> >
> > +#define VGET_HIGH_BUILTIN(A) \
> > +  AARCH64_SIMD_BUILTIN_VGET_HIGH_##A,
> > +
> >  #undef VAR1
> >  #define VAR1(T, N, MAP, FLAG, A) \
> >AARCH64_SIMD_BUILTIN_##T##_##N##A,
> > @@ -753,6 +773,7 @@ enum aarch64_builtins
> >/* SIMD intrinsic builtins.  */
> >AARCH64_SIMD_VREINTERPRET_BUILTINS
> >AARCH64_SIMD_VGET_LOW_BUILTINS
> > +  AARCH64_SIMD_VGET_HIGH_BUILTINS
> >/* ARMv8.3-A Pointer Authentication Builtins.  */
> >AARCH64_PAUTH_BUILTIN_AUTIA1716,
> >AARCH64_PAUTH_BUILTIN_PACIA1716,
> > @@ -855,26 +876,21 @@ static aarch64_fcmla_laneq_builtin_datum 
> > aarch64_fcmla_lane_builtin_data[] = {
> > false \
> >},
> >
> > -#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > -  VGET_LOW_BUILTIN(f16) \
> > -  VGET_LOW_BUILTIN(f32) \
> > -  VGET_LOW_BUILTIN(f64) \
> > -  VGET_LOW_BUILTIN(p8) \
> > -  VGET_LOW_BUILTIN(p16) \
> > -  VGET_LOW_BUILTIN(p64) \
> > -  VGET_LOW_BUILTIN(s8) \
> > -  VGET_LOW_BUILTIN(s16) \
> > -  VGET_LOW_BUILTIN(s32) \
> > -  VGET_LOW_BUILTIN(s64) \
> > -  VGET_LOW_BUILTIN(u8) \
> > -  VGET_LOW_BUILTIN(u16) \
> > -  VGET_LOW_BUILTIN(u32) \
> > -  VGET_LOW_BUILTIN(u64) \
> > -  VGET_LOW_BUILTIN(bf16)
> > +#undef VGET_HIGH_BUILTIN
> > +#define VGET_HIGH_BUILTIN(A) \
> > +  {"vget_high_" #A, \
> > +   AARCH64_SIMD_BUILTIN_VGET_HIGH_##A, \
> > +   2, \
> > +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> > +   { SIMD_INTR_QUAL(A), SIMD_INTR_QUA

Re: [PATCH v4] c++: fix constained auto deduction in templ spec scopes [PR114915]

2024-05-22 Thread Jason Merrill

OK, on the right patch this time I hope.

Looks like you still need either FSF copyright assignment or DCO 
certification per https://gcc.gnu.org/contribute.html#legal


On 5/15/24 13:27, Seyed Sajad Kahani wrote:

This patch resolves PR114915 by replacing the logic that fills in the
missing levels in do_auto_deduction in cp/pt.cc.


I miss the text in your original patch that explained the problem more.


The new approach now trims targs if its depth is greater than desired
(this will only happen in specific contexts), and still fills targs with
dummy levels if it has fewer levels than expected.

PR c++/114915


This line needs to start with a tab.


gcc/cp/ChangeLog:

* pt.cc (do_auto_deduction): Handle excess outer template
arguments during constrained auto satisfaction.


This one, too.  These issues are flagged by git gcc-verify, and are 
easier to avoid with git gcc-commit-mklog.



gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-placeholder14.C: New test.
* g++.dg/cpp2a/concepts-placeholder15.C: New test.


This test still needs a variable template partial specialization.

A few coding style nits below.


* g++.dg/cpp2a/concepts-placeholder16.C: New test.
---
  gcc/cp/pt.cc  | 20 ---
  .../g++.dg/cpp2a/concepts-placeholder14.C | 19 +++
  .../g++.dg/cpp2a/concepts-placeholder15.C | 15 +
  .../g++.dg/cpp2a/concepts-placeholder16.C | 33 +++
  4 files changed, 83 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder16.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 32640f8e9..ecfda67aa 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -31253,6 +31253,19 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
full_targs = add_outermost_template_args (tmpl, full_targs);
full_targs = add_to_template_args (full_targs, targs);
  
+  int want = TEMPLATE_TYPE_ORIG_LEVEL (auto_node);

+  int have = TMPL_ARGS_DEPTH (full_targs);
+
+  if (want < have)
+   {
+ // if a constrained auto is declared in an explicit specialization


We generally use C-style /* */ comments, that start with a capital 
letter and end with a period.



+ gcc_assert (context == adc_variable_type || context == adc_return_type
+ || context == adc_decomp_type);


The || should line up with the 'c' on the previous line.


+ tree trimmed_full_args = get_innermost_template_args
+   (full_targs, want);


We try to avoid having arguments to the left of the function name; here 
I'd start the new line with the = instead.



+ full_targs = trimmed_full_args;
+   }
+


Unnecessary tab on this line.


/* HACK: Compensate for callers not always communicating all levels of
 outer template arguments by filling in the outermost missing levels
 with dummy levels before checking satisfaction.  We'll still crash
@@ -31260,11 +31273,10 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
 these missing levels, but this hack otherwise allows us to handle a
 large subset of possible constraints (including all non-dependent
 constraints).  */
-  if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL (auto_node)
-   - TMPL_ARGS_DEPTH (full_targs)))
+  if (want > have)
{
- tree dummy_levels = make_tree_vec (missing_levels);
- for (int i = 0; i < missing_levels; ++i)
+ tree dummy_levels = make_tree_vec (want - have);
+ for (int i = 0; i < want - have; ++i)
TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0);
  full_targs = add_to_template_args (dummy_levels, full_targs);
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
new file mode 100644
index 0..fcdbd7608
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
@@ -0,0 +1,19 @@
+// PR c++/114915
+// { dg-do compile { target c++20 } }
+
+template<class T>
+concept C = __is_same(T, int);
+
+template<class T>
+void f() {
+}
+
+template<>
+void f<int>() {
+  C auto x = 1;
+}
+
+int main() {
+  f<int>();
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C
new file mode 100644
index 0..b4f73f407
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C
@@ -0,0 +1,15 @@
+// PR c++/114915
+// { dg-do compile { target c++20 } }
+
+template<class T, class U>
+concept C = __is_same(T, U);
+
+template<class T>
+int x = 0;
+
+template<>
+C<double> auto x<double> = 1.0;
+
+int main() {
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder16.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder16.C
new file mode

Re: [PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-22 Thread Peter Bergner
On 5/19/24 10:28 PM, HAO CHEN GUI wrote:
> +(define_expand "isinf<mode>2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
> +  DONE;
> +})

Is there a reason not to use the vsx_register_operand predicate for op1
which matches the predicate for the operand of the xststdcp pattern
we're passing op1 to?

Ditto for the other optab patches you've submitted.

Peter



[committed] libstdc++: Guard use of sized deallocation [PR114940]

2024-05-22 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Backport needed too.

-- >8 --

Clang does not enable -fsized-deallocation by default, which means it
can't compile our  and  headers.

Make the __cpp_lib_generator macro depend on the compiler-defined
__cpp_sized_deallocation macro, and change  to use unsized
deallocation when __cpp_sized_deallocation isn't defined.

libstdc++-v3/ChangeLog:

PR libstdc++/114940
* include/bits/version.def (generator): Depend on
__cpp_sized_deallocation.
* include/bits/version.h: Regenerate.
* include/std/stacktrace (_GLIBCXX_SIZED_DELETE): New macro.
(basic_stacktrace::_Impl::_M_deallocate): Use it.
---
 libstdc++-v3/include/bits/version.def |  2 +-
 libstdc++-v3/include/bits/version.h   |  2 +-
 libstdc++-v3/include/std/stacktrace   | 13 +++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index f0ba4f2bb3d..5cbc9d1a8d8 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1651,7 +1651,7 @@ ftms = {
   values = {
 v = 202207;
 cxxmin = 23;
-extra_cond = "__glibcxx_coroutine";
+extra_cond = "__glibcxx_coroutine && __cpp_sized_deallocation";
   };
 };
 
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index f30f51dcedc..164ebed4983 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1834,7 +1834,7 @@
 #undef __glibcxx_want_forward_like
 
 #if !defined(__cpp_lib_generator)
-# if (__cplusplus >= 202100L) && (__glibcxx_coroutine)
+# if (__cplusplus >= 202100L) && (__glibcxx_coroutine && 
__cpp_sized_deallocation)
 #  define __glibcxx_generator 202207L
 #  if defined(__glibcxx_want_all) || defined(__glibcxx_want_generator)
 #   define __cpp_lib_generator 202207L
diff --git a/libstdc++-v3/include/std/stacktrace 
b/libstdc++-v3/include/std/stacktrace
index d217d63af3b..962dbed7a41 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -551,6 +551,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
 # define _GLIBCXX_OPERATOR_NEW ::operator new
 # define _GLIBCXX_OPERATOR_DELETE ::operator delete
+#endif
+
+#if __cpp_sized_deallocation
+# define _GLIBCXX_SIZED_DELETE(T, p, n) \
+  _GLIBCXX_OPERATOR_DELETE((p), (n) * sizeof(T))
+#else
+# define _GLIBCXX_SIZED_DELETE(T, p, n) _GLIBCXX_OPERATOR_DELETE(p)
 #endif
 
// Precondition: _M_frames == nullptr && __n != 0
@@ -592,8 +599,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  if (_M_capacity)
{
  if constexpr (is_same_v<allocator_type, allocator<value_type>>)
-   _GLIBCXX_OPERATOR_DELETE (static_cast<void*>(_M_frames),
- _M_capacity * sizeof(value_type));
+   _GLIBCXX_SIZED_DELETE(value_type,
+ static_cast<void*>(_M_frames),
+ _M_capacity);
  else
__alloc.deallocate(_M_frames, _M_capacity);
  _M_frames = nullptr;
@@ -601,6 +609,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
}
 
+#undef _GLIBCXX_SIZED_DELETE
 #undef _GLIBCXX_OPERATOR_DELETE
 #undef _GLIBCXX_OPERATOR_NEW
 
-- 
2.45.1
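
[For background, a minimal sketch of the C++14 feature being guarded; a
hypothetical free-standing example, mirroring the macro's two branches:

#include <cstddef>
#include <new>

int
main ()
{
  void* raw = ::operator new (64);
  /* With -fsized-deallocation (__cpp_sized_deallocation defined), the
     sized form ::operator delete(void*, std::size_t) is usable and lets
     the allocator skip a size lookup; without it, only the unsized form
     ::operator delete(void*) is available.  */
#if __cpp_sized_deallocation
  ::operator delete (raw, 64);
#else
  ::operator delete (raw);
#endif
}]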



[committed] libstdc++: Add [[nodiscard]] to some std::locale functions

2024-05-22 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/locale_classes.h (locale::combine)
(locale::name, locale::operator==, locale::operator!=)
(locale::operator(), locale::classic): Add nodiscard
attribute.
* include/bits/locale_classes.tcc (has_facet, use_facet):
Likewise.
* testsuite/22_locale/locale/cons/12438.cc: Add dg-warning for
nodiscard diagnostic.
* testsuite/22_locale/locale/cons/2.cc: Cast use_facet
expression to void, to suppress diagnostic.
* testsuite/22_locale/locale/cons/unicode.cc: Likewise.
* testsuite/22_locale/locale/operations/2.cc: Add dg-warning.
---
 libstdc++-v3/include/bits/locale_classes.h  | 7 ++-
 libstdc++-v3/include/bits/locale_classes.tcc| 2 ++
 libstdc++-v3/testsuite/22_locale/locale/cons/12438.cc   | 2 +-
 libstdc++-v3/testsuite/22_locale/locale/cons/2.cc   | 2 +-
 libstdc++-v3/testsuite/22_locale/locale/cons/unicode.cc | 2 +-
 libstdc++-v3/testsuite/22_locale/locale/operations/2.cc | 2 +-
 6 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/locale_classes.h 
b/libstdc++-v3/include/bits/locale_classes.h
index a2e94217006..50a748066f1 100644
--- a/libstdc++-v3/include/bits/locale_classes.h
+++ b/libstdc++-v3/include/bits/locale_classes.h
@@ -240,6 +240,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  *  @throw  std::runtime_error if __other has no facet of type _Facet.
 */
 template<typename _Facet>
+  _GLIBCXX_NODISCARD
   locale
   combine(const locale& __other) const;
 
@@ -248,7 +249,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  *  @brief  Return locale name.
  *  @return  Locale name or "*" if unnamed.
 */
-_GLIBCXX_DEFAULT_ABI_TAG
+_GLIBCXX_NODISCARD _GLIBCXX_DEFAULT_ABI_TAG
 string
 name() const;
 
@@ -269,6 +270,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  *  @return  True if other and this refer to the same locale instance, are
  *  copies, or have the same name.  False otherwise.
 */
+_GLIBCXX_NODISCARD
 bool
 operator==(const locale& __other) const throw();
 
@@ -279,6 +281,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  *  @param  __other  The locale to compare against.
  *  @return  ! (*this == __other)
 */
+_GLIBCXX_NODISCARD
 bool
 operator!=(const locale& __other) const throw()
 { return !(this->operator==(__other)); }
@@ -300,6 +303,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  *  @return  True if collate<_Char> facet compares __s1 < __s2, else false.
 */
 template<typename _Char, typename _Traits, typename _Alloc>
+  _GLIBCXX_NODISCARD
   bool
   operator()(const basic_string<_Char, _Traits, _Alloc>& __s1,
 const basic_string<_Char, _Traits, _Alloc>& __s2) const;
@@ -321,6 +325,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 /**
  *  @brief  Return reference to the C locale.
 */
+_GLIBCXX_NODISCARD
 static const locale&
 classic();
 
diff --git a/libstdc++-v3/include/bits/locale_classes.tcc 
b/libstdc++-v3/include/bits/locale_classes.tcc
index 00eeb7dd9f8..c79574e58de 100644
--- a/libstdc++-v3/include/bits/locale_classes.tcc
+++ b/libstdc++-v3/include/bits/locale_classes.tcc
@@ -173,6 +173,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @return  true if @p __loc contains a facet of type _Facet, else false.
   */
   template<typename _Facet>
+_GLIBCXX_NODISCARD
 inline bool
 has_facet(const locale& __loc) _GLIBCXX_USE_NOEXCEPT
 {
@@ -202,6 +203,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wdangling-reference"
   template<typename _Facet>
+_GLIBCXX_NODISCARD
 inline const _Facet&
 use_facet(const locale& __loc)
 {
diff --git a/libstdc++-v3/testsuite/22_locale/locale/cons/12438.cc 
b/libstdc++-v3/testsuite/22_locale/locale/cons/12438.cc
index 7ff3a487745..4838e1ba693 100644
--- a/libstdc++-v3/testsuite/22_locale/locale/cons/12438.cc
+++ b/libstdc++-v3/testsuite/22_locale/locale/cons/12438.cc
@@ -45,7 +45,7 @@ void test01(int iters)
  locale loc2 = locale("");
  VERIFY( !has_facet(loc2) );
  
- loc1.combine(loc2);
+ loc1.combine(loc2); // { dg-warning "nodiscard" "" { target c++17 } }
  VERIFY( false );
}
   catch (std::runtime_error&)
diff --git a/libstdc++-v3/testsuite/22_locale/locale/cons/2.cc 
b/libstdc++-v3/testsuite/22_locale/locale/cons/2.cc
index 12478dbfdc2..dce150effea 100644
--- a/libstdc++-v3/testsuite/22_locale/locale/cons/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/locale/cons/2.cc
@@ -68,7 +68,7 @@ void test01()
 { VERIFY( false ); }
 
   try 
-{ use_facet(loc02); }
+{ (void) use_facet(loc02); }
   catch(bad_cast& obj)
 { VERIFY( true ); }
   catch(...)
diff --git a/libstdc++-v3/testsuite/22_locale/locale/cons/unicode.cc 
b/libstdc++-v3/testsuite/22_locale/locale/cons/unicode.cc
index 24af4142cd9..98d744de91e 100644
--- a/libstdc++-v3/testsuite/22_locale/l

[committed] libstdc++: Fix effects of combining locales [PR108323]

2024-05-22 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This fixes a bug in locale::combine where we fail to meet the standard's
requirement that the result is unnamed. It also implements two library
issues related to the names of combined locales (2295 and 3676).

libstdc++-v3/ChangeLog:

PR libstdc++/108323
* include/bits/locale_classes.tcc (locale(const locale&, Facet*)):
Return a copy of the first argument when the facet pointer is
null, as per LWG 2295.
(locale::combine): Ensure the result is unnamed.
* src/c++11/localename.cc (_M_replace_categories): Ignore
whether the second locale has a name when cat == none, as per
LWG 3676.
* src/c++98/locale.cc (_M_install_facet): Use __builtin_expect
to predict that the facet pointer is non-null.
* testsuite/22_locale/locale/cons/names.cc: New test.
---
 libstdc++-v3/include/bits/locale_classes.tcc  | 13 +++-
 libstdc++-v3/src/c++11/localename.cc  |  4 +-
 libstdc++-v3/src/c++98/locale.cc  |  2 +-
 .../testsuite/22_locale/locale/cons/names.cc  | 61 +++
 4 files changed, 77 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/locale/cons/names.cc

diff --git a/libstdc++-v3/include/bits/locale_classes.tcc 
b/libstdc++-v3/include/bits/locale_classes.tcc
index 63097582dec..00eeb7dd9f8 100644
--- a/libstdc++-v3/include/bits/locale_classes.tcc
+++ b/libstdc++-v3/include/bits/locale_classes.tcc
@@ -44,6 +44,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 locale::
 locale(const locale& __other, _Facet* __f)
 {
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2295. Locale name when the provided Facet is a nullptr
+  if (__builtin_expect(!__f, 0))
+   {
+ _M_impl = __other._M_impl;
+ _M_impl->_M_add_reference();
+ return;
+   }
+
   _M_impl = new _Impl(*__other._M_impl, 1);
 
   __try
@@ -72,6 +81,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __tmp->_M_remove_reference();
  __throw_exception_again;
}
+  delete[] __tmp->_M_names[0];
+  __tmp->_M_names[0] = 0;   // Unnamed.
   return locale(__tmp);
 }
 
@@ -163,7 +174,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   */
   template<typename _Facet>
 inline bool
-has_facet(const locale& __loc) throw()
+has_facet(const locale& __loc) _GLIBCXX_USE_NOEXCEPT
 {
 #if __cplusplus >= 201103L
   static_assert(__is_base_of(locale::facet, _Facet),
diff --git a/libstdc++-v3/src/c++11/localename.cc b/libstdc++-v3/src/c++11/localename.cc
index cde94ec6e19..909cf4c66d3 100644
--- a/libstdc++-v3/src/c++11/localename.cc
+++ b/libstdc++-v3/src/c++11/localename.cc
@@ -326,7 +326,9 @@ const int num_facets = (
   _M_replace_categories(const _Impl* __imp, category __cat)
   {
 category __mask = 1;
-if (!_M_names[0] || !__imp->_M_names[0])
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// 3676. Name of locale composed using std::locale::none
+if (!_M_names[0] || (__cat != none && !__imp->_M_names[0]))
   {
if (_M_names[0])
  {
diff --git a/libstdc++-v3/src/c++98/locale.cc b/libstdc++-v3/src/c++98/locale.cc
index 3749408115e..0e7533e1e15 100644
--- a/libstdc++-v3/src/c++98/locale.cc
+++ b/libstdc++-v3/src/c++98/locale.cc
@@ -323,7 +323,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   locale::_Impl::
   _M_install_facet(const locale::id* __idp, const facet* __fp)
   {
-if (__fp)
+if (__builtin_expect(__fp != 0, 1))
   {
size_t __index = __idp->_M_id();
 
diff --git a/libstdc++-v3/testsuite/22_locale/locale/cons/names.cc b/libstdc++-v3/testsuite/22_locale/locale/cons/names.cc
new file mode 100644
index 000..2a9cfe4c14d
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/locale/cons/names.cc
@@ -0,0 +1,61 @@
+// { dg-do run }
+
+#include <locale>
+#include <testsuite_hooks.h>
+
+void
+test_pr108323()
+{
+  std::locale named = std::locale::classic();
+  std::locale unnamed = named.combine<std::ctype<char> >(named);
+
+  // Bug libstdc++/108323 - combine does not change the locale name
+  VERIFY( unnamed.name() == "*" );
+}
+
+void
+test_lwg2295()
+{
+  std::locale named = std::locale::classic();
+  std::locale unnamed(named, &std::use_facet<std::ctype<char> >(named));
+  VERIFY( unnamed.name() == "*" );
+
+  // LWG 2295. Locale name when the provided Facet is a nullptr
+  std::locale loc(named, (std::ctype<char>*)0);
+  VERIFY( loc.name() != "*" );
+  VERIFY( loc.name() == named.name() );
+}
+
+void
+test_lwg3676()
+{
+  std::locale named = std::locale::classic();
+  std::locale unnamed = named.combine<std::ctype<char> >(named);
+  std::locale combo;
+
+  // LWG 3676. Name of locale composed using std::locale::none
+
+  combo = std::locale(named, named, std::locale::numeric);
+  VERIFY( combo.name() != "*" );
+  combo = std::locale(named, named, std::locale::none);
+  VERIFY( combo.name() != "*" );
+  combo = std::locale(named, unnamed, std::locale::numeric);
+  VERIFY( combo.name() == "*" );
+  combo = std::locale(named, unnamed, std::locale::none);
+  VERIFY( combo.name() != "*" );

libstdc++: the specialization atomic_ref<bool> should use the primary template

2024-05-22 Thread Lebrun-Grandie, Damien
See patch attached to this email.
Best,
Damien


0001-libstdc-the-specialization-atomic_ref-bool-should-us.patch
Description: 0001-libstdc-the-specialization-atomic_ref-bool-should-us.patch


[PATCH v4] Match: Add overloaded types_match to avoid code dup [NFC]

2024-05-22 Thread pan2 . li
From: Pan Li 

There are several match patterns for the SAT related cases,  and there
is some duplicated code checking that the dest, op_0 and op_1 have the
same tree type,  aka ternary tree type matches.  Thus,  add an overloaded
types_match function to do this and avoid match code duplication.

The below test suites are passed for this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 regression test.

gcc/ChangeLog:

* generic-match-head.cc (types_match): Add overloaded types_match
for 3 types.
* gimple-match-head.cc (types_match): Ditto.
* match.pd: Leverage overloaded types_match.

Signed-off-by: Pan Li 
---
 gcc/generic-match-head.cc | 14 ++
 gcc/gimple-match-head.cc  | 14 ++
 gcc/match.pd  | 30 ++
 3 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 0d3f648fe8d..8d8ecfaeb1d 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -59,6 +59,20 @@ types_match (tree t1, tree t2)
   return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
 }
 
+/* Routine to determine if the types T1, T2 and T3 are effectively
+   the same for GENERIC.  If T1, T2 or T3 is not a type, the test
+   applies to their TREE_TYPE.  */
+
+static inline bool
+types_match (tree t1, tree t2, tree t3)
+{
+  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
+  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
+  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
+
+  return types_match (t1, t2) && types_match (t2, t3);
+}
+
 /* Return if T has a single use.  For GENERIC, we assume this is
always true.  */
 
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 5f8a1a1ad8e..2b7f746ab13 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -79,6 +79,20 @@ types_match (tree t1, tree t2)
   return types_compatible_p (t1, t2);
 }
 
+/* Routine to determine if the types T1, T2 and T3 are effectively
+   the same for GIMPLE.  If T1, T2 or T3 is not a type, the test
+   applies to their TREE_TYPE.  */
+
+static inline bool
+types_match (tree t1, tree t2, tree t3)
+{
+  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
+  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
+  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
+
+  return types_match (t1, t2) && types_match (t2, t3);
+}
+
 /* Return if T has a single use.  For GIMPLE, we also allow any
non-SSA_NAME (ie constants) and zero uses to cope with uses
that aren't linked up yet.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index 35e3d82b131..7081d76d56a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3048,38 +3048,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned Saturation Add */
 (match (usadd_left_part_1 @0 @1)
  (plus:c @0 @1)
- (if (INTEGRAL_TYPE_P (type)
-      && TYPE_UNSIGNED (TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@1)))))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
 
 (match (usadd_left_part_2 @0 @1)
  (realpart (IFN_ADD_OVERFLOW:c @0 @1))
- (if (INTEGRAL_TYPE_P (type)
-      && TYPE_UNSIGNED (TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@1)))))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
 
 (match (usadd_right_part_1 @0 @1)
  (negate (convert (lt (plus:c @0 @1) @0)))
- (if (INTEGRAL_TYPE_P (type)
-      && TYPE_UNSIGNED (TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@1)))))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
 
 (match (usadd_right_part_1 @0 @1)
 (negate (convert (gt @0 (plus:c @0 @1))))
- (if (INTEGRAL_TYPE_P (type)
-      && TYPE_UNSIGNED (TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@1)))))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
 
 (match (usadd_right_part_2 @0 @1)
  (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
- (if (INTEGRAL_TYPE_P (type)
-      && TYPE_UNSIGNED (TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@0))
-      && types_match (type, TREE_TYPE (@1)))))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
 
 /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
because the sub part of left_part_2 cannot work with right_part_1.
-- 
2.34.1



Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-05-22 Thread Carl Love
Kewen:

On 5/13/24 22:44, Kewen.Lin wrote:
>> perform the same operation as setting a specific element in the vector in
>> C code.  For example:
>>
>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>   src_v4si[index] = int_val;
>>
>> The built-in actually generates more instructions than the inline C code
>> with no optimization but is identical with -O3 optimizations.
>>
>> All of the above built-ins that are removed do not have test cases and
>> are not documented.
>>
>> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
>> __builtin_vec_set_v2df are not removed as they are used in function
>> resolve_vec_insert() in file rs6000-c.cc.
> I think we can replace these calls with the equivalent gimple codes
> (early expanding it) and then we can get rid of these instances.

Hmm, going to need a little coaching here.  I am not sure how to do this.
Looks like I get to learn something new.

   Carl 


[PATCH] aarch64: testsuite: Explicitly add -mlittle-endian to vget_low_2.c

2024-05-22 Thread Pengxuan Zheng
vget_low_2.c is a test case for little-endian, but we missed the -mlittle-endian
flag in r15-697-ga2e4fe5a53cf75.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian.

Signed-off-by: Pengxuan Zheng 
---
 gcc/testsuite/gcc.target/aarch64/vget_low_2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
index 44414e1c043..93e9e664ee9 100644
--- a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fdump-tree-optimized -mlittle-endian" } */
 
#include <arm_neon.h>
 
-- 
2.17.1



[PATCH] missing reuire target has_arch_ppc64 for pr106550.c

2024-05-22 Thread Jiufu Guo
Hi,

Case pr106550.c is testing constant building for 64bit
register. So, this case requires target of has_arch_ppc64.

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff(Jiufu) Guo

---
 gcc/testsuite/gcc.target/powerpc/pr106550.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c
index 74e395331ab..146514b3adf 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr106550.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
@@ -1,6 +1,7 @@
 /* PR target/106550 */
 /* { dg-options "-O2 -mdejagnu-cpu=power10" } */
 /* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target has_arch_ppc64 } */
 
 void
 foo (unsigned long long *a)
-- 
2.43.0



Re: [V2 PATCH] Don't reduce estimated unrolled size for innermost loop at cunrolli.

2024-05-22 Thread Hongtao Liu
On Wed, May 22, 2024 at 1:07 PM liuhongt  wrote:
>
> >> Hard to find a default value satisfying all testcases.
> >> some require loop unroll with 7 insns increment, some don't want loop
> >> unroll w/ 5 insn increment.
> >> The original 2/3 reduction happened to meet all those testcases(or the
> >> testcases are constructed based on the old 2/3).
> >> Can we define the parameter as the size of the loop, below the size we
> >> still do the reduction, so the small loop can be unrolled?
>
> >Yeah, that's also a sensible possibility.  Does it work to have a parameter
> >for the unrolled body size?  Thus, amend the existing
> >--param max-completely-peeled-insns with a --param
> >max-completely-peeled-insns-nogrowth?
>
> Update V2:
> It's still hard to find a default value for the loop body size. So I move the
> 2 / 3 reduction from estimated_unrolled_size to try_unroll_loop_completely.
> For the check of body size shrink, the 2 / 3 reduction is added, so small
> loops can still be unrolled.
> For the check of the comparison between body size and
> param_max_completely_peeled_insns, 2 / 3 is conditionally added for
> loop->inner || !cunrolli.
> Then the patch avoids gcc testsuite regressions, and also prevents big inner
> loops from being completely unrolled at cunrolli.
The patch regressed arm-*-eabi for

FAIL: 3 regressions



regressions.sum:

=== gcc tests ===



Running gcc:gcc.dg/tree-ssa/tree-ssa.exp ...

FAIL: gcc.dg/tree-ssa/pr83403-1.c scan-tree-dump-times lim2 "Executing store motion of" 10

FAIL: gcc.dg/tree-ssa/pr83403-2.c scan-tree-dump-times lim2 "Executing store motion of" 10

=== gfortran tests ===



Running gfortran:gfortran.dg/dg.exp ...

FAIL: gfortran.dg/reassoc_4.f -O   scan-tree-dump-times reassoc1 "[0-9] \\* " 22

for 32-bit arm, estimate_num_insns_seq returns more for load/store of double.

The loop in pr83403-1.c
 198Estimating sizes for loop 4
 199 BB: 6, after_exit: 0
 200  size:   2 if (m_23 != 10)
 201   Exit condition will be eliminated in peeled copies.
 202   Exit condition will be eliminated in last copy.
 203   Constant conditional.
 204 BB: 5, after_exit: 1
 205  size:   1 _5 = n_24 * 10;
 206  size:   1 _6 = _5 + m_23;
 207  size:   1 _7 = _6 * 8;
 208  size:   1 _8 = C_35 + _7;
 209  size:   2 _9 = *_8;
 210  size:   1 _10 = k_25 * 20;
 211  size:   1 _11 = _10 + m_23;
 212  size:   1 _12 = _11 * 8;
 213  size:   1 _13 = A_31 + _12;
 214  size:   2 _14 = *_13;
 215  size:   1 _15 = n_24 * 20;
 216  size:   1 _16 = _15 + k_25;
 217  size:   1 _17 = _16 * 8;
 218  size:   1 _18 = B_33 + _17;
 219  size:   2 _19 = *_18;
 220  size:   1 _20 = _14 * _19;
 221  size:   1 _21 = _9 + _20;
 222  size:   2 *_8 = _21;
 223  size:   1 m_40 = m_23 + 1;
 224   Induction variable computation will be folded away.
 225size: 25-3, last_iteration: 2-2
 226  Loop size: 25
 227  Estimated size after unrolling: 220

For aarch64 and x86, it's ok

 198Estimating sizes for loop 4
 199 BB: 6, after_exit: 0
 200  size:   2 if (m_27 != 10)
 201   Exit condition will be eliminated in peeled copies.
 202   Exit condition will be eliminated in last copy.
 203   Constant conditional.
 204 BB: 5, after_exit: 1
 205  size:   1 _6 = n_28 * 10;
 206  size:   1 _7 = _6 + m_27;
 207  size:   0 _8 = (long unsigned int) _7;
 208  size:   1 _9 = _8 * 8;
 209  size:   1 _10 = C_39 + _9;
 210  size:   1 _11 = *_10;
 211  size:   1 _12 = k_29 * 20;
 212  size:   1 _13 = _12 + m_27;
 213  size:   0 _14 = (long unsigned int) _13;
 214  size:   1 _15 = _14 * 8;
 215  size:   1 _16 = A_35 + _15;
 216  size:   1 _17 = *_16;
 217  size:   1 _18 = n_28 * 20;
 218  size:   1 _19 = _18 + k_29;
 219  size:   0 _20 = (long unsigned int) _19;
 220  size:   1 _21 = _20 * 8;
 221  size:   1 _22 = B_37 + _21;
 222  size:   1 _23 = *_22;
 223  size:   1 _24 = _17 * _23;
 224  size:   1 _25 = _11 + _24;
 225  size:   1 *_10 = _25;
 226  size:   1 m_44 = m_27 + 1;
 227   Induction variable computation will be folded away.
 228size: 21-3, last_iteration: 2-2
 229  Loop size: 21
 230  Estimated size after unrolling: 180
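
(In other words, without the 2/3 reduction the estimate is (25 - 3) * 10
= 220 on 32-bit arm but (21 - 3) * 10 = 180 on aarch64/x86; against the
default --param max-completely-peeled-insns of 200, arm is over the limit
while the others are under it, which is presumably why only arm regresses.)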

>
> --
>
> For the innermost loop, after complete loop unrolling, it will most likely
> not be able to reduce the body size to 2/3. The current 2/3 reduction
> will make some of the larger loops completely unrolled during
> cunrolli, which will then result in them not being able to be
> vectorized. It also increases the register pressure. The patch moves
> the 2/3 reduction from estimated_unrolled_size to
> try_unroll_loop_completely.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR tree-optimization/112325
> * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Move the
> 2 / 3 loop body size reduction to ..
> (try_unroll_loop_completely): .. here, add it for the check of
> body size shrink, and the check of comparison against
> param_max_completely_peeled_insns when
> (!

Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-05-22 Thread Kewen.Lin
Hi Carl,

on 2024/5/23 08:29, Carl Love wrote:
> Kewen:
> 
> On 5/13/24 22:44, Kewen.Lin wrote:
>>> perform the same operation as setting a specific element in the vector in
>>> C code.  For example:
>>>
>>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>>   src_v4si[index] = int_val;
>>>
>>> The built-in actually generates more instructions than the inline C code
>>> with no optimization but is identical with -O3 optimizations.
>>>
>>> All of the above built-ins that are removed do not have test cases and
>>> are not documented.
>>>
>>> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
>>> __builtin_vec_set_v2df are not removed as they are used in function
>>> resolve_vec_insert() in file rs6000-c.cc.
>> I think we can replace these calls with the equivalent gimple codes
>> (early expanding it) and then we can get rid of these instances.
> 
> Hmm, going to need a little coaching here.  I am not sure how to do this.
> Looks like I get to learn something new.
> 

We have functions rs6000_gimple_fold.*_builtin to fold the builtins;
they fold (expand) a bif into equivalent gimple code, and what we want
here is similar, so you can refer to some of the implementations there.
For the expected gimple code, you can refer to what's generated for the
equivalent plain C code.  Feel free to let me know if you meet any
issues while trying, or if you prefer me to follow up on this.
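
Roughly, such an early expansion could look like the sketch below (a
hand-written illustration: the function name is made up, the bif-specific
dispatch is omitted, and big-endian element numbering is ignored):

/* Sketch: fold __builtin_vec_set_v4si (vec, val, idx) with a constant
   idx into the BIT_INSERT_EXPR that "vec[idx] = val" would produce.  */
static bool
fold_vec_set_sketch (gimple_stmt_iterator *gsi, gimple *stmt)
{
  tree vec = gimple_call_arg (stmt, 0);
  tree val = gimple_call_arg (stmt, 1);
  tree idx = gimple_call_arg (stmt, 2);
  if (TREE_CODE (idx) != INTEGER_CST)
    return false;  /* Variable index: keep the builtin for expand.  */
  unsigned HOST_WIDE_INT sz = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (val)));
  tree pos = bitsize_int (tree_to_uhwi (idx) * sz);
  gimple *g = gimple_build_assign (gimple_call_lhs (stmt),
                                   BIT_INSERT_EXPR, vec, val, pos);
  gimple_set_location (g, gimple_location (stmt));
  gsi_replace (gsi, g, true);
  return true;
}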

BR,
Kewen


Re: [PATCH] missing reuire target has_arch_ppc64 for pr106550.c

2024-05-22 Thread Kewen.Lin
Hi Jeff,

subject typo: s/reuire/require/

on 2024/5/23 09:11, Jiufu Guo wrote:
> Hi,
> 
> Case pr106550.c is testing constant building for 64bit
> register. So, this case requires target of has_arch_ppc64.
> 

Nit: Maybe add more comments saying it fails with -m32
without having the expected rldimi?  So it requires
has_arch_ppc64.

> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
> 

Missing a changelog entry here, maybe something like:

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106550.c: Adjust by requiring has_arch_ppc64
effective target.

> BR,
> Jeff(Jiufu) Guo
> 
> ---
>  gcc/testsuite/gcc.target/powerpc/pr106550.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> index 74e395331ab..146514b3adf 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr106550.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> @@ -1,6 +1,7 @@
>  /* PR target/106550 */
>  /* { dg-options "-O2 -mdejagnu-cpu=power10" } */
>  /* { dg-require-effective-target power10_ok } */

Nit: power10_ok can be dropped.

> +/* { dg-require-effective-target has_arch_ppc64 } */
OK with the nits above tweaked, thanks.

BR,
Kewen



[PATCH] .gitattributes: disable crlf translation

2024-05-22 Thread Peter Damianov
By default, git has the "autocrlf" """feature""" enabled. This causes the files
to have CRLF line endings when checked out on windows, which in the case of
configure, causes confusing errors like:

./gcc/configure: line 14: $'\r': command not found
./gcc/configure: line 29: syntax error near unexpected token `newline'
'/gcc/configure: line 29: ` ;;

when it is invoked.

Any files damaged in this way can be fixed with:
$ git config core.autocrlf false
$ git reset
$ git checkout .

But, it's better to simply avoid this problem in the first place.
This behavior is never helpful or desired for gcc.
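
For anyone checking whether a checkout is affected, standard git commands
can show the attribute and the line endings actually present in the index
and working tree (the path here is just an example):

$ git check-attr crlf text eol -- gcc/configure
$ git ls-files --eol gcc/configure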

Signed-off-by: Peter Damianov 
---
 .gitattributes | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitattributes b/.gitattributes
index e75bfc595bf..1e116987c98 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -8,3 +8,6 @@ ChangeLog whitespace=indent-with-non-tab,space-before-tab,trailing-space
 # Use together with git config diff.md.xfuncname '^\(define.*$'
 # which is run by contrib/gcc-git-customization.sh too.
 *.md diff=md
+
+# Disable lf -> crlf translation on windows.
+* -crlf
-- 
2.39.2



Re: [PATCH] AARCH64: Add Qualcomm oryon-1 core

2024-05-22 Thread Andrew Pinski
On Tue, May 14, 2024 at 10:27 AM Kyrill Tkachov
 wrote:
>
> Hi Andrew,
>
> On Fri, May 3, 2024 at 8:50 PM Andrew Pinski  wrote:
>>
>> This patch adds Qualcomm's new oryon-1 core; this is enough
>> to recognize the core and later on will add the tuning structure.
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-cores.def (oryon-1): New entry.
>> * config/aarch64/aarch64-tune.md: Regenerate.
>> * doc/invoke.texi  (AArch64 Options): Document oryon-1.
>>
>> Signed-off-by: Andrew Pinski 
>> Co-authored-by: Joel Jones 
>> Co-authored-by: Wei Zhao 
>> ---
>>  gcc/config/aarch64/aarch64-cores.def | 5 +
>>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>>  gcc/doc/invoke.texi  | 1 +
>>  3 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
>> index f69fc212d56..be60929e400 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -151,6 +151,11 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, V8_4A,  (SVE, I8MM, B
>>  /* Qualcomm ('Q') cores. */
>>  AARCH64_CORE("saphira", saphira,saphira,V8_4A,  (CRYPTO), saphira,   0x51, 0xC01, -1)
>>
>> +/* ARMv8.6-A Architecture Processors.  */
>> +
>> +/* Qualcomm ('Q') cores. */
>> +AARCH64_CORE("oryon-1", oryon1, cortexa57, V8_6A, (CRYPTO, SM4, SHA3, F16), cortexa72,   0x51, 0x001, -1)
>> +
>>  /* ARMv8-A big.LITTLE implementations.  */
>>
>>  AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, V8A,  (CRC), cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
>> diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
>> index abd3c9e0822..ba940f1c890 100644
>> --- a/gcc/config/aarch64/aarch64-tune.md
>> +++ b/gcc/config/aarch64/aarch64-tune.md
>> @@ -1,5 +1,5 @@
>>  ;; -*- buffer-read-only: t -*-
>>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>>  (define_attr "tune"
>> -   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
>> +   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
>> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -21323,6 +21323,7 @@ performance of the code.  Permissible values for this option are:
>>  @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
>>  @samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c},
>>  @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
>> +@samp{oyron-1},
>
>
> Typo in the name.
> LGTM with that fixed.

Thanks, pushed as r15-784-g01cfd601825014.

Thanks,
Andrew

> Thanks,
> Kyrill
>
>>
>>
>>  @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1},
>>  @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, @samp{qdf24xx},
>>  @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
>> --
>> 2.43.0
>>


RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-22 Thread Li, Pan2
Thanks Richard for reviewing.

> I'm not convinced we should match this during early if-conversion, should we?
> The middle-end doesn't really know .SAT_ADD but some handling of
> .ADD_OVERFLOW is present.

I tried to match the branch (aka cond) form in the widen-mult pass, similar
to the previous branchless form.
Unfortunately, the branch will already have been converted to a PHI by the
time widen-mult runs, thus v2 tries to bypass the PHI handling and convert
the branch form to the branchless form.

> But please add a comment before the new pattern, esp. since it's
> non-obvious that this is an improvent.

Sure thing.

> I suspect you rely on this form being recognized as .SAT_ADD later but
> what prevents us from breaking this?  Why not convert it to .SAT_ADD
> immediately?  If this is because the ISEL pass (or the widen-mult pass)
> cannot handle PHIs then I would suggest to split out enough parts of
> tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

Yes, this is sort of redundant, we can also convert it to .SAT_ADD immediately 
in match.pd before widen-mult.

Sorry, I may be confused here: for a branch form like the one below, what
transform should we perform in phiopt?
gimple_simplify_phiopt mostly leverages the simplifications in match.pd,
but we may hit the simplification in another, earlier pass.

Or we can leverage the branch version of the unsigned_integer_sat_add gimple
match in phiopt and generate the gimple call to .SAT_ADD there (mostly like
what we do in widen-mult).
Not sure if my understanding is correct or not, thanks again for the help.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint8_t);

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 22, 2024 9:14 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
unsigned SAT_ADD

On Wed, May 22, 2024 at 3:17 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different from the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend supports the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose a branch or branchless implementation of .SAT_ADD.
> For example,  some targets can take care of branch code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

I'm not convinced we should match this during early if-conversion, should we?
The middle-end doesn't really know .SAT_ADD but some handling of
.ADD_OVERFLOW is present.

But please add a comment before the new pattern, esp. since it's
non-obvious that this is an improvent.

I suspect you rely on this form being recognized as .SAT_ADD later but
what prevents us from breaking this?  Why not convert it to .SAT_ADD
immediately?  If this is because the ISEL pass (or the widen-mult pass)
cannot handle PHIs then I would suggest to split out enough parts of
tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

> gcc/ChangeLog:
>
> * match.pd: Add new simplify to convert branch SAT_ADD into
> branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cff67c84498..2dc77a46e67 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_par

[PATCH] Avoid vector -Wfree-nonheap-object warnings

2024-05-22 Thread François Dumont

As explained in this email:

https://gcc.gnu.org/pipermail/libstdc++/2024-April/058552.html

I ran into -Wfree-nonheap-object warnings because of my enhancements to the algos.

So here is a patch to extend the usage of the _Guard type to other parts of vector.
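
Below, for illustration only, is a minimal self-contained sketch of the
RAII-guard idiom the patch applies (hand-written example code, not
libstdc++ code): the guard owns the allocation until initialization
succeeds, so the cleanup path is visible to the compiler without an
explicit try/catch.

#include <cstddef>
#include <new>
#include <utility>

template<typename T>
struct alloc_guard
{
  T* storage;        // owned until release() is called
  std::size_t len;

  explicit alloc_guard(std::size_t n)
  : storage(static_cast<T*>(::operator new(n * sizeof(T)))), len(n) { }

  ~alloc_guard() { if (storage) ::operator delete(storage); }

  T* release() { return std::exchange(storage, nullptr); }

  alloc_guard(const alloc_guard&) = delete;
  alloc_guard& operator=(const alloc_guard&) = delete;
};

// Copies [first, last) into fresh storage; if a copy constructor throws,
// the guard's destructor deallocates.  (Destroying already-constructed
// elements is omitted for brevity.)
template<typename T, typename It>
T* allocate_and_copy(std::size_t n, It first, It last)
{
  alloc_guard<T> guard(n);
  T* out = guard.storage;
  for (; first != last; ++first)
    ::new (static_cast<void*>(out++)) T(*first);
  return guard.release();
}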


    libstdc++: Use RAII to replace try/catch blocks

    Move _Guard into the std::vector declaration and use it to guard all
    calls to vector _M_allocate.

    Doing so, the compiler has more visibility on what is done with the
    pointers and no longer raises the -Wfree-nonheap-object warning.

    libstdc++-v3/ChangeLog:

    * include/bits/vector.tcc (_Guard): Move...
    * include/bits/stl_vector.h: ...here.
    (_M_allocate_and_copy): Use latter.
    (_M_initialize_dispatch): Likewise and set _M_finish first from the
    result of __uninitialized_fill_n_a that can throw.
    (_M_range_initialize): Likewise.

Tested under Linux x86_64, ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 31169711a48..4ea74e3339a 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   clear() _GLIBCXX_NOEXCEPT
   { _M_erase_at_end(this->_M_impl._M_start); }
 
+private:
+  // RAII guard for allocated storage.
+  struct _Guard
+  {
+   pointer _M_storage; // Storage to deallocate
+   size_type _M_len;
+   _Base& _M_vect;
+
+   _GLIBCXX20_CONSTEXPR
+   _Guard(pointer __s, size_type __l, _Base& __vect)
+   : _M_storage(__s), _M_len(__l), _M_vect(__vect)
+   { }
+
+   _GLIBCXX20_CONSTEXPR
+   ~_Guard()
+   {
+ if (_M_storage)
+   _M_vect._M_deallocate(_M_storage, _M_len);
+   }
+
+   _GLIBCXX20_CONSTEXPR
+   pointer
+   _M_release()
+   {
+ pointer __res = _M_storage;
+ _M_storage = 0;
+ return __res;
+   }
+
+  private:
+   _Guard(const _Guard&);
+  };
+
 protected:
   /**
*  Memory expansion handler.  Uses the member allocation function to
@@ -1618,18 +1651,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_M_allocate_and_copy(size_type __n,
 _ForwardIterator __first, _ForwardIterator __last)
{
- pointer __result = this->_M_allocate(__n);
- __try
-   {
- std::__uninitialized_copy_a(__first, __last, __result,
- _M_get_Tp_allocator());
- return __result;
-   }
- __catch(...)
-   {
- _M_deallocate(__result, __n);
- __throw_exception_again;
-   }
+ _Guard __guard(this->_M_allocate(__n), __n, *this);
+ std::__uninitialized_copy_a
+   (__first, __last, __guard._M_storage, _M_get_Tp_allocator());
+ return __guard._M_release();
}
 
 
@@ -1642,13 +1667,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   // 438. Ambiguity in the "do the right thing" clause
   template
void
-   _M_initialize_dispatch(_Integer __n, _Integer __value, __true_type)
+   _M_initialize_dispatch(_Integer __int_n, _Integer __value, __true_type)
{
- this->_M_impl._M_start = _M_allocate(_S_check_init_len(
-   static_cast(__n), _M_get_Tp_allocator()));
- this->_M_impl._M_end_of_storage =
-   this->_M_impl._M_start + static_cast(__n);
- _M_fill_initialize(static_cast(__n), __value);
+ const size_type __n = static_cast(__int_n);
+ _Guard __guard(_M_allocate(_S_check_init_len(
+   __n, _M_get_Tp_allocator())), __n, *this);
+ this->_M_impl._M_finish = std::__uninitialized_fill_n_a
+   (__guard._M_storage, __n, __value, _M_get_Tp_allocator());
+ pointer __start = this->_M_impl._M_start = __guard._M_release();
+ this->_M_impl._M_end_of_storage = __start + __n;
}
 
   // Called by the range constructor to implement [23.1.1]/9
@@ -1690,17 +1717,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
std::forward_iterator_tag)
{
  const size_type __n = std::distance(__first, __last);
- this->_M_impl._M_start
-   = this->_M_allocate(_S_check_init_len(__n, _M_get_Tp_allocator()));
- this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
- this->_M_impl._M_finish =
-   std::__uninitialized_copy_a(__first, __last,
-   this->_M_impl._M_start,
-   _M_get_Tp_allocator());
+ _Guard __guard(this->_M_allocate(_S_check_init_len(
+   __n, _M_get_Tp_allocator())), __n, *this);
+ this->_M_impl._M_finish = std::__uninitialized_copy_a
+   (__first, __last, __guard._M_storage, _M_get_Tp_allocator());
+ pointer __s
