Re: [PATCH v3] MATCH: Simplify `a rrotate (32-b) -> a lrotate b` [PR109906]

2024-10-17 Thread Kyrylo Tkachov
Hi Eikansh,

> On 16 Oct 2024, at 18:23, Eikansh Gupta  wrote:
> 
> The pattern `a rrotate (32-b)` should be optimized to `a lrotate b`.
> The same is also true for `a lrotate (32-b)`. It can be optimized to
> `a rrotate b`.
> 
> This patch adds following patterns:
> a rrotate (32-b) -> a lrotate b
> a lrotate (32-b) -> a rrotate b
> 
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> PR tree-optimization/109906
> 
> gcc/ChangeLog:
> 
> * match.pd (a rrotate (32-b) -> a lrotate b): New pattern
> (a lrotate (32-b) -> a rrotate b): New pattern
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/tree-ssa/pr109906.c: New test.
> 
> Signed-off-by: Eikansh Gupta 
> ---
> gcc/match.pd |  9 ++
> gcc/testsuite/gcc.dg/tree-ssa/pr109906.c | 41 
> 2 files changed, 50 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109906.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5ec31ef6269..078ef050351 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4861,6 +4861,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>build_int_cst (TREE_TYPE (@1),
>   element_precision (type)), @1); }))
> 
> +/* a rrotate (32-b) -> a lrotate b */
> +/* a lrotate (32-b) -> a rrotate b */
> +(for rotate (lrotate rrotate)
> + orotate (rrotate lrotate)
> + (simplify
> +  (rotate @0 (minus INTEGER_CST@1 @2))
> +   (if (TYPE_PRECISION (TREE_TYPE (@0)) == wi::to_wide (@1))
> + (orotate @0 @2

There is already a transformation for lrotate (x, C) into rotate (x, SIZE - C) 
around line 4937 in match.pd
Isn’t there a risk that we enter an infinite recursion situation with this?

Thanks,
Kyrill


> +
> /* Turn (a OP c1) OP c2 into a OP (c1+c2).  */
> (for op (lrotate rrotate rshift lshift)
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109906.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr109906.c
> new file mode 100644
> index 000..9aa015d8c65
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109906.c
> @@ -0,0 +1,41 @@
> +/* PR tree-optimization/109906 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> +/* { dg-require-effective-target int32 } */
> +
> +/* Implementation of rotate right operation */
> +static inline
> +unsigned rrotate(unsigned x, int t)
> +{
> +  if (t >= 32) __builtin_unreachable();
> +  unsigned tl = x >> (t);
> +  unsigned th = x << (32 - t);
> +  return tl | th;
> +}
> +
> +/* Here rotate left is achieved by doing rotate right by (32 - x) */
> +unsigned rotateleft(unsigned t, int x)
> +{
> +  return rrotate (t, 32 - x);
> +}
> +
> +/* Implementation of rotate left operation */
> +static inline
> +unsigned lrotate(unsigned x, int t)
> +{
> +  if (t >= 32) __builtin_unreachable();
> +  unsigned tl = x << (t);
> +  unsigned th = x >> (32 - t);
> +  return tl | th;
> +}
> +
> +/* Here rotate right is achieved by doing rotate left by (32 - x) */
> +unsigned rotateright(unsigned t, int x)
> +{
> +  return lrotate (t, 32 - x);
> +}
> +
> +/* Shouldn't have instruction for (32 - x). */
> +/* { dg-final { scan-tree-dump-not "minus_expr" "optimized" } } */
> +/* { dg-final { scan-tree-dump "rrotate_expr" "optimized" } } */
> +/* { dg-final { scan-tree-dump "lrotate_expr" "optimized" } } */
> -- 
> 2.17.1
> 



Re: [PATCH v4] libstdc++: implement concatenation of strings and string_views

2024-10-17 Thread Giuseppe D'Angelo

Hello,

On 17/10/24 at 06:32, François Dumont wrote:

As a side note you should provide your patches as .txt files so that any
email client can render it without going through an editor.


Apologies for that. Do you mean I should use text/plain attachments 
instead of text/x-patch?




And regarding the patch, I wonder what the std::move is for on the
returned value ?

Like this one:

+    {
+  return std::move(__lhs.append(__rhs));
+    }

As it's a C&P, the question might not be for you, Giuseppe.


Well, I'll gladly give it a shot. :) I'm assuming you're talking about 
the operator+(string &&lhs, const char *rhs) overload, whose 
implementation I copied for operator+(string &&, string_view)?


By spec https://eel.is/c++draft/string.op.plus#2 it must do:


lhs.append(rhs);
return std::move(lhs);


The call to std::move in the specification is likely a pre-C++20 
remnant, before the adoption of P1825, which made lhs "implicitly 
movable". So in principle right now you could just write:



lhs.append(rhs);
return lhs;


Before P1825, `return lhs` would've caused a copy of `lhs`, not a move, 
even if `lhs` is a variable with automatic storage of type rvalue reference.


In C++23 this area has been further modified by P2266.

In any case both papers are new features, and are not to be treated as 
DRs. Therefore, since this operator+ overload's code has to be 
compatible with pre-C++20 semantics (the code is from C++11), the call 
to std::move must be left in there.¹


The behaviour described by the current wording can also be achieved by


return std::move(lhs.append(rhs))


(the actual implementation in libstdc++), because lhs.append returns a 
reference to lhs; so the implementation is completely equivalent to the 
specification in the Standard. Even if the standard were changed to do a 
plain `return lhs;`, libstdc++'s code would still be correct (we can't 
just `return lhs.append(rhs)`, as that would cause a copy; append 
returns an lvalue reference.)




¹ I just C&P'd the implementation of operator+(string, string_view) from 
the const char * overloads, and that includes these pre-C++26 patterns 
in the code. A prior review also noticed a "typedef" instead of a 
"using". In general, I'm really not sure whether I should favour consistency 
with the existing code or liberally apply refactorings to code meant 
to be compatible only with the latest standards (such as new features); 
any advice is appreciated :)


Thank you,
--
Giuseppe D'Angelo





Re: [PATCH 1/2] Disable -fbit-tests and -fjump-tables at -O0

2024-10-17 Thread Richard Biener
On Thu, Oct 17, 2024 at 2:51 AM Andi Kleen  wrote:
>
> From: Andi Kleen 

Instead of initializing with -1 can you Init(0) and add OPT_fjump_tables
and OPT_fbit_tests to the default_options_table table in opts.cc,
under OPT_LEVELS_1_PLUS_NOT_DEBUG I'd guess.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * common.opt: Enable -fbit-tests and -fjump-tables only at -O1.
> * tree-switch-conversion.h (jump_table_cluster::is_enabled):
>   Dito.
> ---
>  gcc/common.opt   | 4 ++--
>  gcc/tree-switch-conversion.h | 5 +++--
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 12b25ff486de..4af7a94fea42 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2189,11 +2189,11 @@ Common Var(flag_ivopts) Init(1) Optimization
>  Optimize induction variables on trees.
>
>  fjump-tables
> -Common Var(flag_jump_tables) Init(1) Optimization
> +Common Var(flag_jump_tables) Init(-1) Optimization
>  Use jump tables for sufficiently large switch statements.
>
>  fbit-tests
> -Common Var(flag_bit_tests) Init(1) Optimization
> +Common Var(flag_bit_tests) Init(-1) Optimization
>  Use bit tests for sufficiently large switch statements.
>
>  fkeep-inline-functions
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 6468995eb316..fbfd7ff7b3ff 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -442,7 +442,7 @@ public:
>/* Return whether bit test expansion is allowed.  */
>static inline bool is_enabled (void)
>{
> -return flag_bit_tests;
> +return flag_bit_tests >= 0 ? flag_bit_tests : (optimize >= 1);
>}
>
>/* True when the jump table handles an entire switch statement.  */
> @@ -524,7 +524,8 @@ bool jump_table_cluster::is_enabled (void)
>   over-ruled us, we really have no choice.  */
>if (!targetm.have_casesi () && !targetm.have_tablejump ())
>  return false;
> -  if (!flag_jump_tables)
> +  int flag = flag_jump_tables >= 0 ? flag_jump_tables : (optimize >= 1);
> +  if (!flag)
>  return false;
>  #ifndef ASM_OUTPUT_ADDR_DIFF_ELT
>if (flag_pic)
> --
> 2.46.2
>


Re: [PATCH] testsuite: Fix typos for AVX10.2 convert testcases

2024-10-17 Thread Hongtao Liu
On Thu, Oct 17, 2024 at 3:17 PM Haochen Jiang  wrote:
>
> From: Victor Rodriguez 
>
> Hi all,
>
> There are some typos in AVX10.2 vcvtne[,2]ph[b,h]f8[,s] testcases.
> They will lead to type mismatch.
>
> Previously they are not found due to the binutils did not checkin.
>
> Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> ---
>
> Fix typos related to types for vcvtne[,2]ph[b,h]f8[,s] testcases.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Fix typo.
> * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
> ---
>  .../gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c  | 10 +-
>  .../gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c | 10 +-
>  .../gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c  | 10 +-
>  .../gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c | 10 +-
>  .../gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c   | 10 +-
>  .../gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c  | 10 +-
>  .../gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c   | 10 +-
>  .../gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c  | 10 +-
>  8 files changed, 40 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c 
> b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
> index 0dd58ee710e..7e7865d64fe 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
> @@ -65,16 +65,16 @@ TEST (void)
>CALC(res_ref, src1.a, src2.a);
>
>res1.x = INTRINSIC (_cvtne2ph_pbf8) (src1.x, src2.x);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
>  abort ();
>
>res2.x = INTRINSIC (_mask_cvtne2ph_pbf8) (res2.x, mask, src1.x, src2.x);
> -  MASK_MERGE (h) (res_ref, mask, SIZE);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  MASK_MERGE (i_b) (res_ref, mask, SIZE);
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
>  abort ();
>
>res3.x = INTRINSIC (_maskz_cvtne2ph_pbf8) (mask, src1.x, src2.x);
> -  MASK_ZERO (h) (res_ref, mask, SIZE);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  MASK_ZERO (i_b) (res_ref, mask, SIZE);
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref))
>  abort ();
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c 
> b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
> index 5e3ea3e37a4..0ca0c420ff7 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
> @@ -65,16 +65,16 @@ TEST (void)
>CALC(res_ref, src1.a, src2.a);
>
>res1.x = INTRINSIC (_cvtnes2ph_pbf8) (src1.x, src2.x);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
>  abort ();
>
>res2.x = INTRINSIC (_mask_cvtnes2ph_pbf8) (res2.x, mask, src1.x, src2.x);
> -  MASK_MERGE (h) (res_ref, mask, SIZE);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  MASK_MERGE (i_b) (res_ref, mask, SIZE);
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
>  abort ();
>
>res3.x = INTRINSIC (_maskz_cvtnes2ph_pbf8) (mask, src1.x, src2.x);
> -  MASK_ZERO (h) (res_ref, mask, SIZE);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  MASK_ZERO (i_b) (res_ref, mask, SIZE);
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref))
>  abort ();
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c 
> b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
> index aa928b582b3..97afd395bb5 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
> @@ -65,16 +65,16 @@ TEST (void)
>CALC(res_ref, src1.a, src2.a);
>
>res1.x = INTRINSIC (_cvtne2ph_phf8) (src1.x, src2.x);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
>  abort ();
>
>res2.x = INTRINSIC (_mask_cvtne2ph_phf8) (res2.x, mask, src1.x, src2.x);
> -  MASK_MERGE (h) (res_ref, mask, SIZE);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> +  MASK_MERGE (i_b) (res_ref, mask, SIZE);
> +  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
>  abort ();
>
>res3.x = INTRINSIC (_maskz_cvtne2ph_phf8) (mask, src1.x, src2.x);
> -  MASK_ZERO (h) (res_ref, mask, SIZE);
> -  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
> 

Re: [PATCH] c++: Fix crash during NRV optimization with invalid input [PR117099]

2024-10-17 Thread Simon Martin
Hi Sam,

On 16 Oct 2024, at 22:06, Sam James wrote:

> Simon Martin  writes:
>
>> We ICE upon the following invalid code because we end up calling
>> finalize_nrv_r with a RETURN_EXPR with no operand.
>>
>> === cut here ===
>> struct X {
>>   ~X();
>> };
>> X test(bool b) {
>>   {
>> X x;
>> return x;
>>   }
>>   if (!(b)) return;
>> }
>> === cut here ===
>>
>> This patch fixes this by simply returning error_mark_node when 
>> detecting
>> a void return in a function returning non-void.
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/117099
>>
>> gcc/cp/ChangeLog:
>>
>>  * typeck.cc (check_return_expr): Return error_mark_node upon
>>  void return for function returning non-void.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/parse/crash77.C: New test.
>>
>> ---
>>  gcc/cp/typeck.cc |  1 +
>>  gcc/testsuite/g++.dg/parse/crash77.C | 14 ++
>>  2 files changed, 15 insertions(+)
>>  create mode 100644 gcc/testsuite/g++.dg/parse/crash77.C
>>
>> diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
>> index 71d879abef1..22a6ec9a185 100644
>> --- a/gcc/cp/typeck.cc
>> +++ b/gcc/cp/typeck.cc
>> @@ -11238,6 +11238,7 @@ check_return_expr (tree retval, bool 
>> *no_warning, bool *dangling)
>>   RETURN_EXPR to avoid control reaches end of non-void function
>>   warnings in tree-cfg.cc.  */
>>*no_warning = true;
>> +  return error_mark_node;
>>  }
>>/* Check for a return statement with a value in a function that
>>   isn't supposed to return a value.  */
>> diff --git a/gcc/testsuite/g++.dg/parse/crash77.C 
>> b/gcc/testsuite/g++.dg/parse/crash77.C
>> new file mode 100644
>> index 000..d3f0ae6a877
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/parse/crash77.C
>> @@ -0,0 +1,14 @@
>> +// PR c++/117099
>> +// { dg-compile }
>
> dg-do compile
>
Aarg, of course, thanks for spotting this! Fixed in the attached 
version.

>> +
>> +struct X {
>> +  ~X();
>> +};
>> +
>> +X test(bool b) {
>> +  {
>> +X x;
>> +return x;
>> +  }
>> +  if (!(b)) return; // { dg-error "return-statement with no value" }
>> +}
>> -- 
>> 2.44.0
>>
>
> BTW, the line-endings on this seem a bit odd. Did you use 
> git-send-email?
I did use git-send-email indeed. What oddities do you see with line 
endings?
cat -A over the patch file looks good.

Thanks, Simon
From 46fcb8cd0f89213b80f2815f6b7c2af064d7e86d Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Wed, 16 Oct 2024 15:47:12 +0200
Subject: [PATCH] c++: Fix crash during NRV optimization with invalid input 
[PR117099]

We ICE upon the following invalid code because we end up calling
finalize_nrv_r with a RETURN_EXPR with no operand.

=== cut here ===
struct X {
  ~X();
};
X test(bool b) {
  {
X x;
return x;
  }
  if (!(b)) return;
}
=== cut here ===

This patch fixes this by simply returning error_mark_node when detecting
a void return in a function returning non-void.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/117099

gcc/cp/ChangeLog:

* typeck.cc (check_return_expr): Return error_mark_node upon
void return for function returning non-void.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash77.C: New test.

---
 gcc/cp/typeck.cc |  1 +
 gcc/testsuite/g++.dg/parse/crash77.C | 14 ++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/parse/crash77.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 71d879abef1..22a6ec9a185 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -11238,6 +11238,7 @@ check_return_expr (tree retval, bool *no_warning, bool 
*dangling)
 RETURN_EXPR to avoid control reaches end of non-void function
 warnings in tree-cfg.cc.  */
   *no_warning = true;
+  return error_mark_node;
 }
   /* Check for a return statement with a value in a function that
  isn't supposed to return a value.  */
diff --git a/gcc/testsuite/g++.dg/parse/crash77.C 
b/gcc/testsuite/g++.dg/parse/crash77.C
new file mode 100644
index 000..912b997177c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash77.C
@@ -0,0 +1,14 @@
+// PR c++/117099
+// { dg-do compile }
+
+struct X {
+  ~X();
+};
+
+X test(bool b) {
+  {
+X x;
+return x;
+  } 
+  if (!(b)) return; // { dg-error "return-statement with no value" }
+}
-- 
2.44.0



Re: [PATCH 1/7] libstdc++: Refactor std::uninitialized_{copy, fill, fill_n} algos [PR68350]

2024-10-17 Thread Jonathan Wakely
On Thu, 17 Oct 2024 at 02:39, Patrick Palka  wrote:
>
> On Tue, 15 Oct 2024, Jonathan Wakely wrote:
>
> > This is v2 of
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665246.html
> > fixing some thinkos in uninitialized_{fill,fill_n}. We don't need to
> > worry about overwriting tail-padding in those algos, because we only use
> > memset for 1-byte integer types. So they have no tail padding that can
> > be reused anyway! So this changes __n > 1 to __n > 0 in a few places
> > (which fixes the problem that it was not actually filling anything for
> > the n==1 cases).
> >
> > Also simplify std::__to_address(__result++) to just __result++ because
> > we already have a pointer, and use std::to_address(result++) for a C++20
> > std::contiguous_iterator case, instead of addressof(*result++).
> >
> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > This refactors the std::uninitialized_copy, std::uninitialized_fill and
> > std::uninitialized_fill_n algorithms to directly perform memcpy/memset
> > optimizations instead of dispatching to std::copy/std::fill/std::fill_n.
> >
> > The reasons for this are:
> >
> > - Use 'if constexpr' to simplify and optimize compilation throughput, so
> >   dispatching to specialized class templates is only needed for C++98
> >   mode.
> > - Relax the conditions for using memcpy/memset, because the C++20 rules
> >   on implicit-lifetime types mean that we can rely on memcpy to begin
> >   lifetimes of trivially copyable types.  We don't need to require
> >   trivially default constructible, so don't need to limit the
> >   optimization to trivial types. See PR 68350 for more details.
> > - The conditions on non-overlapping ranges are stronger for
> >   std::uninitialized_copy than for std::copy so we can use memcpy instead
> >   of memmove, which might be a minor optimization.
> > - Avoid including  in .
> >   It only needs some iterator utilities from that file now, which belong
> >   in  anyway, so this moves them there.
> >
> > Several tests need changes to the diagnostics matched by dg-error
> > because we no longer use the __constructible() function that had a
> > static assert in. Now we just get straightforward errors for attempting
> > to use a deleted constructor.
> >
> > Two tests needed more significant changes to the actual expected results
> > of executing the tests, because they were checking for old behaviour
> > which was incorrect according to the standard.
> > 20_util/specialized_algorithms/uninitialized_copy/64476.cc was expecting
> > std::copy to be used for a call to std::uninitialized_copy involving two
> > trivially copyable types. That was incorrect behaviour, because a
> > non-trivial constructor should have been used, but using std::copy used
> > trivial default initialization followed by assignment.
> > 20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc was testing
> > the behaviour with a non-integral Size passed to uninitialized_fill_n,
> > but I wrote the test looking at the requirements of uninitialized_copy_n
> > which are not the same as uninitialized_fill_n. The former uses --n and
> > tests n > 0, but the latter just tests n-- (which will never be false
> > for a floating-point value with a fractional part).
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/68350
> >   PR libstdc++/93059
> >   * include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
> >   to ...
> >   * include/bits/stl_iterator.h: ... here.
> >   * include/bits/stl_uninitialized.h (__check_constructible)
> >   (_GLIBCXX_USE_ASSIGN_FOR_INIT): Remove.
> >   [C++98] (__unwrappable_niter): New trait.
> >   (__uninitialized_copy): Replace use of std::copy.
> >   (uninitialized_copy): Fix Doxygen comments. Open-code memcpy
> >   optimization for C++11 and later.
> >   (__uninitialized_fill): Replace use of std::fill.
> >   (uninitialized_fill): Fix Doxygen comments. Open-code memset
> >   optimization for C++11 and later.
> >   (__uninitialized_fill_n): Replace use of std::fill_n.
> >   (uninitialized_fill_n): Fix Doxygen comments. Open-code memset
> >   optimization for C++11 and later.
> >   * 
> > testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc:
> >   Adjust expected behaviour to match what the standard specifies.
> >   * 
> > testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc:
> >   Likewise.
> >   * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
> >   Adjust dg-error directives.
> >   * 
> > testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc:
> >   Likewise.
> >   * 
> > testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc:
> >   Likewise.
> >   * 
> > testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc:
> >   Likewise.
> >   * 
> > testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc:
> >   Likewise.

Re: [PATCH RFC] build: update bootstrap req to C++14

2024-10-17 Thread Richard Biener
On Wed, Oct 16, 2024 at 5:14 PM Jakub Jelinek  wrote:
>
> On Wed, Oct 16, 2024 at 11:04:32AM -0400, Jason Merrill wrote:
> > > Alternatively, systems (that care about Ada and D) running 4.7 could
> > > build 10.5, systems running 4.8 could build 11.5.
> >
> > Here's an updated patch.  I tested C++14 bootstrap again with 5.x compilers,
> > and Jakub's dwarf2asm change breaks on 5.3 due to PR69995, while 5.4
>
> The dwarf2asm and libcpp _cpp_trigraph_map changes were just optimizations,
> so if we wanted, we could just guard it with additional __GCC_PREREQ (5, 4)
> or similar.
>
> > successfully bootstraps.
> >
> > I also added the 9.5 recommendation.
>
> > From 87e90d3677a6211b5bb9fc6865b987203a819108 Mon Sep 17 00:00:00 2001
> > From: Jason Merrill 
> > Date: Tue, 17 Sep 2024 17:38:35 -0400
> > Subject: [PATCH] build: update bootstrap req to C++14
> > To: gcc-patches@gcc.gnu.org
> >
> > This implements my proposal to update our bootstrap requirement to C++14.
> > The big benefit of the change is the greater constexpr power, but C++14 also
> > added variable templates, generic lambdas, lambda init-capture, binary
> > literals, and numeric literal digit separators.
> >
> > C++14 was feature-complete in GCC 5, and became the default in GCC 6.  5.4.0
> > bootstraps trunk correctly; trunk stage1 built with 5.3.0 breaks in
> > eh_data_format_name due to PR69995.
> >
> > gcc/ChangeLog:
> >
> >   * doc/install.texi (Prerequisites): Update to C++14.
> >
> > ChangeLog:
> >
> >   * configure.ac: Update requirement to C++14.
> >   * configure: Regenerate.
>
> Ok from my side, but please give Richi and others a week to disagree before
> committing.

I'm fine with it.

Richard.

>
> Jakub
>


Re: [PATCH][LRA][PR116550] Reuse scratch registers generated by LRA

2024-10-17 Thread Denis Chertykov
чт, 17 окт. 2024 г. в 00:32, Vladimir Makarov :
>
>
> On 10/10/24 14:32, Denis Chertykov wrote:
> >
> > The patch is very simple.
> > On x86_64, it bootstraps+regtests fine.
> > Ok for trunk?
> >
> Sorry for the delay with the answer. I missed your patch and pinging it
> was the right thing to do.
>
> Thanks for the detail explanation of the problem which makes me easy to
> approve your patch.
>
> I don't expect that the patch will create some problems for other
> targets, but LRA patch behavior prediction can be very tricky.  So
> please still pay attention for possible issues on the other targets for
> couple days.
>
> The patch is ok to commit to the trunk.  Thank you for the patch, Denis.

commit e7393cbb

Denis.


Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

2024-10-17 Thread Richard Biener
On Thu, Oct 17, 2024 at 8:38 AM Li, Pan2  wrote:
>
> It has been quite a while since the last discussion.
> I revisited these materials recently and gave them a try in the RISC-V backend.
>
>1   │ void foo (int * __restrict a, int * __restrict b, int stride, int n)
>2   │ {
>3   │ for (int i = 0; i < n; i++)
>4   │   a[i*stride] = b[i*stride] + 100;
>5   │ }
>
> We will have an expand similar to the one below for VEC_SERIES_EXPR +
> MASK_LEN_GATHER_LOAD.
> There will be 8 insns after expand, which try_combine cannot handle
> (it combines at most 4 insns), if
> my understanding is correct.
>
> Thus, are there any other approaches instead of adding a new IFN? If we need to
> add a new IFN, can
> we leverage match.pd to try to match the MASK_LEN_GATHER_LOAD(base,
> VEC_SERIES_EXPR, ...)
> pattern and then emit the new IFN, as the sat alu patterns do.

Adding an optab (and direct internal fn) is fine I guess - it should be
modeled after the gather optab, with the vec_series left implicit and the
scalar stride passed directly.

Enabling it via match.pd looks possible, but also possibly sub-optimal for
the costing side in the vectorizer - supporting it directly in the
vectorizer can be done later though.

Richard.

> Thanks a lot.
>
>  316   │ ;; _58 = VEC_SERIES_EXPR <0, _57>;
>  317   │
>  318   │ (insn 17 16 18 (set (reg:DI 156 [ _56 ])
>  319   │ (ashiftrt:DI (reg:DI 141 [ _54 ])
>  320   │ (const_int 2 [0x2]))) -1
>  321   │  (expr_list:REG_EQUAL (div:DI (reg:DI 141 [ _54 ])
>  322   │ (const_int 4 [0x4]))
>  323   │ (nil)))
>  324   │
>  325   │ (insn 18 17 19 (set (reg:DI 158)
>  326   │ (unspec:DI [
>  327   │ (const_int 32 [0x20])
>  328   │ ] UNSPEC_VLMAX)) -1
>  329   │  (nil))
>  330   │
>  331   │ (insn 19 18 20 (set (reg:RVVM1SI 157)
>  332   │ (if_then_else:RVVM1SI (unspec:RVVMF32BI [
>  333   │ (const_vector:RVVMF32BI repeat [
>  334   │ (const_int 1 [0x1])
>  335   │ ])
>  336   │ (reg:DI 158)
>  337   │ (const_int 2 [0x2]) repeated x2
>  338   │ (const_int 1 [0x1])
>  339   │ (reg:SI 66 vl)
>  340   │ (reg:SI 67 vtype)
>  341   │ ] UNSPEC_VPREDICATE)
>  342   │ (vec_series:RVVM1SI (const_int 0 [0])
>  343   │ (const_int 1 [0x1]))
>  344   │ (unspec:RVVM1SI [
>  345   │ (reg:DI 0 zero)
>  346   │ ] UNSPEC_VUNDEF))) -1
>  347   │  (nil))
>  348   │
>  349   │ (insn 20 19 21 (set (reg:DI 160)
>  350   │ (unspec:DI [
>  351   │ (const_int 32 [0x20])
>  352   │ ] UNSPEC_VLMAX)) -1
>  353   │  (nil))
>  354   │
>  355   │ (insn 21 20 22 (set (reg:RVVM1SI 159)
>  356   │ (if_then_else:RVVM1SI (unspec:RVVMF32BI [
>  357   │ (const_vector:RVVMF32BI repeat [
>  358   │ (const_int 1 [0x1])
>  359   │ ])
>  360   │ (reg:DI 160)
>  361   │ (const_int 2 [0x2]) repeated x2
>  362   │ (const_int 1 [0x1])
>  363   │ (reg:SI 66 vl)
>  364   │ (reg:SI 67 vtype)
>  365   │ ] UNSPEC_VPREDICATE)
>  366   │ (mult:RVVM1SI (vec_duplicate:RVVM1SI (subreg:SI (reg:DI 
> 156 [ _56 ]) 0))
>  367   │ (reg:RVVM1SI 157))
>  368   │ (unspec:RVVM1SI [
>  369   │ (reg:DI 0 zero)
>  370   │ ] UNSPEC_VUNDEF))) -1
>  371   │  (nil))
>  ...
>  403   │ ;; vect__5.16_61 = .MASK_LEN_GATHER_LOAD (vectp_b.14_59, _58, 4, { 
> 0, ... }, { -1, ... }, _73, 0);
>  404   │
>  405   │ (insn 27 26 28 (set (reg:RVVM2DI 161)
>  406   │ (sign_extend:RVVM2DI (reg:RVVM1SI 145 [ _58 ]))) 
> "strided_ld-st.c":4:22 -1
>  407   │  (nil))
>  408   │
>  409   │ (insn 28 27 29 (set (reg:RVVM2DI 162)
>  410   │ (ashift:RVVM2DI (reg:RVVM2DI 161)
>  411   │ (const_int 2 [0x2]))) "strided_ld-st.c":4:22 -1
>  412   │  (nil))
>  413   │
>  414   │ (insn 29 28 0 (set (reg:RVVM1SI 146 [ vect__5.16 ])
>  415   │ (if_then_else:RVVM1SI (unspec:RVVMF32BI [
>  416   │ (const_vector:RVVMF32BI repeat [
>  417   │ (const_int 1 [0x1])
>  418   │ ])
>  419   │ (reg:DI 149 [ _73 ])
>  420   │ (const_int 2 [0x2]) repeated x2
>  421   │ (const_int 0 [0])
>  422   │ (reg:SI 66 vl)
>  423   │ (reg:SI 67 vtype)
>  424   │ ] UNSPEC_VPREDICATE)
>  425   │ (unspec:RVVM1SI [
>  426   │ (reg/v/f:DI 151 [ b ])
>  427   │ (mem:BLK (scratch) [0  A8])
>  428   │

Re: [PATCH 3/7] libstdc++: Inline memmove optimizations for std::copy etc. [PR115444]

2024-10-17 Thread Jonathan Wakely
On Thu, 17 Oct 2024, 03:04 Patrick Palka,  wrote:

> On Tue, 15 Oct 2024, Jonathan Wakely wrote:
>
> > This is a slightly different approach to C++98 compatibility than used
> > in patch 1/1 of this series for the uninitialized algos. It worked out a
> > bit cleaner this way for these algos, I think.
> >
> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > This removes all the __copy_move class template specializations that
> > decide how to optimize std::copy and std::copy_n. We can inline those
> > optimizations into the algorithms, using if-constexpr (and macros for
> > C++98 compatibility) and remove the code dispatching to the various
> > class template specializations.
> >
> > Doing this means we implement the optimization directly for std::copy_n
> > instead of deferring to std::copy, That avoids the unwanted consequence
> > of advancing the iterator in copy_n only to take the difference later to
> > get back to the length that we already had in copy_n originally (as
> > described in PR 115444).
> >
> > With the new flattened implementations, we can also lower contiguous
> > iterators to pointers in std::copy/std::copy_n/std::copy_backwards, so
> > that they benefit from the same memmove optimizations as pointers.
> > There's a subtlety though: contiguous iterators can potentially throw
> > exceptions to exit the algorithm early.  So we can only transform the
> > loop to memmove if dereferencing the iterator is noexcept. We don't
> > check that incrementing the iterator is noexcept because we advance the
> > contiguous iterators before using memmove, so that if incrementing would
> > throw, that happens first. I am writing a proposal (P3249R0) which would
> > make this unnecessary, so I hope we can drop the nothrow requirements
> > later.
> >
> > This change also solves PR 114817 by checking is_trivially_assignable
> > before optimizing copy/copy_n etc. to memmove. It's not enough to check
> > that the types are trivially copyable (a precondition for using memmove
> > at all), we also need to check that the specific assignment that would
> > be performed by the algorithm is also trivial. Replacing a non-trivial
> > assignment with memmove would be observable, so not allowed.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/115444
> >   PR libstdc++/114817
> >   * include/bits/stl_algo.h (__copy_n): Remove generic overload
> >   and overload for random access iterators.
> >   (copy_n): Inline generic version of __copy_n here. Do not defer
> >   to std::copy for random access iterators.
> >   * include/bits/stl_algobase.h (__copy_move): Remove.
> >   (__nothrow_contiguous_iterator, __memcpyable_iterators): New
> >   concepts.
> >   (__assign_one, _GLIBCXX_TO_ADDR, _GLIBCXX_ADVANCE): New helpers.
> >   (__copy_move_a2): Inline __copy_move logic and conditional
> >   memmove optimization into the most generic overload.
> >   (__copy_n_a): Likewise.
> >   (__copy_move_backward): Remove.
> >   (__copy_move_backward_a2): Inline __copy_move_backward logic and
> >   memmove optimization into the most generic overload.
> >   *
> testsuite/20_util/specialized_algorithms/uninitialized_copy/114817.cc:
> >   New test.
> >   *
> testsuite/20_util/specialized_algorithms/uninitialized_copy_n/114817.cc:
> >   New test.
> >   * testsuite/25_algorithms/copy/114817.cc: New test.
> >   * testsuite/25_algorithms/copy/115444.cc: New test.
> >   * testsuite/25_algorithms/copy_n/114817.cc: New test.
> > ---
> >  libstdc++-v3/include/bits/stl_algo.h  |  24 +-
> >  libstdc++-v3/include/bits/stl_algobase.h  | 426 +-
> >  .../uninitialized_copy/114817.cc  |  39 ++
> >  .../uninitialized_copy_n/114817.cc|  39 ++
> >  .../testsuite/25_algorithms/copy/114817.cc|  38 ++
> >  .../testsuite/25_algorithms/copy/115444.cc|  93 
> >  .../testsuite/25_algorithms/copy_n/114817.cc  |  38 ++
> >  7 files changed, 469 insertions(+), 228 deletions(-)
> >  create mode 100644
> libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/114817.cc
> >  create mode 100644
> libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy_n/114817.cc
> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/copy/114817.cc
> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/copy/115444.cc
> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/copy_n/114817.cc
> >
> > diff --git a/libstdc++-v3/include/bits/stl_algo.h
> b/libstdc++-v3/include/bits/stl_algo.h
> > index a1ef665506d..489ce7e14d2 100644
> > --- a/libstdc++-v3/include/bits/stl_algo.h
> > +++ b/libstdc++-v3/include/bits/stl_algo.h
> > @@ -665,25 +665,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >return __result;
> >  }
> >
> > -  template _OutputIterator>
> > -_GLIBCXX20_CONSTEXPR
> > -_OutputIterator
> > -__copy_n(_InputIterator __first, _Size __n,
> > -  _OutputIterato

[PATCH] testsuite: Fix typos for AVX10.2 convert testcases

2024-10-17 Thread Haochen Jiang
From: Victor Rodriguez 

Hi all,

There are some typos in the AVX10.2 vcvtne[,2]ph[b,h]f8[,s] testcases.
They will lead to type mismatches.

Previously they were not found because the binutils support had not been
checked in yet.

Ok for trunk?

Thx,
Haochen

---

Fix typos related to types for vcvtne[,2]ph[b,h]f8[,s] testcases.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Fix typo.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
---
 .../gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c  | 10 +-
 .../gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c | 10 +-
 .../gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c  | 10 +-
 .../gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c | 10 +-
 .../gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c   | 10 +-
 .../gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c  | 10 +-
 .../gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c   | 10 +-
 .../gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c  | 10 +-
 8 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
index 0dd58ee710e..7e7865d64fe 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c
@@ -65,16 +65,16 @@ TEST (void)
   CALC(res_ref, src1.a, src2.a);
 
   res1.x = INTRINSIC (_cvtne2ph_pbf8) (src1.x, src2.x);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
 abort ();
 
   res2.x = INTRINSIC (_mask_cvtne2ph_pbf8) (res2.x, mask, src1.x, src2.x);
-  MASK_MERGE (h) (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  MASK_MERGE (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
 abort ();
 
   res3.x = INTRINSIC (_maskz_cvtne2ph_pbf8) (mask, src1.x, src2.x);
-  MASK_ZERO (h) (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  MASK_ZERO (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref))
 abort ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
index 5e3ea3e37a4..0ca0c420ff7 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c
@@ -65,16 +65,16 @@ TEST (void)
   CALC(res_ref, src1.a, src2.a);
 
   res1.x = INTRINSIC (_cvtnes2ph_pbf8) (src1.x, src2.x);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
 abort ();
 
   res2.x = INTRINSIC (_mask_cvtnes2ph_pbf8) (res2.x, mask, src1.x, src2.x);
-  MASK_MERGE (h) (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  MASK_MERGE (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
 abort ();
 
   res3.x = INTRINSIC (_maskz_cvtnes2ph_pbf8) (mask, src1.x, src2.x);
-  MASK_ZERO (h) (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  MASK_ZERO (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref))
 abort ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
index aa928b582b3..97afd395bb5 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c
@@ -65,16 +65,16 @@ TEST (void)
   CALC(res_ref, src1.a, src2.a);
 
   res1.x = INTRINSIC (_cvtne2ph_phf8) (src1.x, src2.x);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
 abort ();
 
   res2.x = INTRINSIC (_mask_cvtne2ph_phf8) (res2.x, mask, src1.x, src2.x);
-  MASK_MERGE (h) (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  MASK_MERGE (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
 abort ();
 
   res3.x = INTRINSIC (_maskz_cvtne2ph_phf8) (mask, src1.x, src2.x);
-  MASK_ZERO (h) (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN, i_b) (res, res_ref))
+  MASK_ZERO (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref))
 abort ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c
index 891fb66

Re: SVE intrinsics: Fold constant operands for svlsl.

2024-10-17 Thread Kyrylo Tkachov
Hi Soumya

> On 17 Oct 2024, at 06:10, Soumya AR  wrote:
> 
> Hi Richard,
> 
> Thanks for the feedback. I’ve updated the patch with the suggested change.
> Ok for mainline?
> 
> Best,
> Soumya
> 
> > On 14 Oct 2024, at 6:40 PM, Richard Sandiford  
> > wrote:
> > 
> > 
> > Soumya AR  writes:
> >> This patch implements constant folding for svlsl. Test cases have been 
> >> added to
> >> check for the following cases:
> >> 
> >> Zero, merge, and don't care predication.
> >> Shift by 0.
> >> Shift by register width.
> >> Overflow shift on signed and unsigned integers.
> >> Shift on a negative integer.
> >> Maximum possible shift, eg. shift by 7 on an 8-bit integer.
> >> 
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> >> regression.
> >> OK for mainline?
> >> 
> >> Signed-off-by: Soumya AR 
> >> 
> >> gcc/ChangeLog:
> >> 
> >>  * config/aarch64/aarch64-sve-builtins-base.cc (svlsl_impl::fold):
> >>  Try constant folding.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >>  * gcc.target/aarch64/sve/const_fold_lsl_1.c: New test.
> >> 
> >> From 0cf5223e51623dcdbc47a06cbd17d927c74094e2 Mon Sep 17 00:00:00 2001
> >> From: Soumya AR 
> >> Date: Tue, 24 Sep 2024 09:09:32 +0530
> >> Subject: [PATCH] SVE intrinsics: Fold constant operands for svlsl.
> >> 
> >> This patch implements constant folding for svlsl. Test cases have been 
> >> added to
> >> check for the following cases:
> >> 
> >> Zero, merge, and don't care predication.
> >> Shift by 0.
> >> Shift by register width.
> >> Overflow shift on signed and unsigned integers.
> >> Shift on a negative integer.
> >> Maximum possible shift, eg. shift by 7 on an 8-bit integer.
> >> 
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> >> regression.
> >> OK for mainline?
> >> 
> >> Signed-off-by: Soumya AR 
> >> 
> >> gcc/ChangeLog:
> >> 
> >>  * config/aarch64/aarch64-sve-builtins-base.cc (svlsl_impl::fold):
> >>  Try constant folding.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >>  * gcc.target/aarch64/sve/const_fold_lsl_1.c: New test.
> >> ---
> >> .../aarch64/aarch64-sve-builtins-base.cc  |  15 +-
> >> .../gcc.target/aarch64/sve/const_fold_lsl_1.c | 133 ++
> >> 2 files changed, 147 insertions(+), 1 deletion(-)
> >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_lsl_1.c
> >> 
> >> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> >> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> index afce52a7e8d..be5d6eae525 100644
> >> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> @@ -1893,6 +1893,19 @@ public:
> >>   }
> >> };
> >> 
> >> +class svlsl_impl : public rtx_code_function
> >> +{
> >> +public:
> >> +  CONSTEXPR svlsl_impl ()
> >> +: rtx_code_function (ASHIFT, ASHIFT) {}
> >> +
> >> +  gimple *
> >> +  fold (gimple_folder &f) const override
> >> +  {
> >> +return f.fold_const_binary (LSHIFT_EXPR);
> >> +  }
> >> +};
> >> +
> > 
> > Sorry for the slow review.  I think we should also make aarch64_const_binop
> > return 0 for LSHIFT_EXPR when the shift is out of range, to match the
> > behaviour of the underlying instruction.
> > 
> > It looks good otherwise.
> > 
> > Thanks,
> > Richard
> > 
> 

In the test case:
+/*
+** s64_x_bit_width:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_bit_width (svbool_t pg) {
+return svlsl_n_s64_x (pg, svdup_s64 (5), 64); 
+}
+
+/*
+** s64_x_out_of_range:
+** mov z[0-9]+\.b, #0
+** ret
+*/

You’ll need to adjust the scan for register zeroing according to the upcoming 
changes from Tamar as per:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665669.html

The patch LGTM (but Richard should approve as it’s SVE-specific)

Thanks,
Kyrill 



Re: [PATCH 2/2] Only do switch bit test clustering when multiple labels point to same bb

2024-10-17 Thread Filip Kastl
Hi Andi,

This seems like a reasonable way to avoid the specific issue in PR117091 and
generally speed up switch lowering of switches with all cases unique.  I cannot
approve this but want to share some comments.

On Wed 2024-10-16 17:50:59, Andi Kleen wrote:
> diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
> index 96ce1c380a59..4151d1bb520e 100644
> --- a/gcc/gimple-if-to-switch.cc
> +++ b/gcc/gimple-if-to-switch.cc
> @@ -254,7 +254,7 @@ if_chain::is_beneficial ()
>else
>  output.release ();
>  
> -  output = bit_test_cluster::find_bit_tests (filtered_clusters);
> +  output = bit_test_cluster::find_bit_tests (filtered_clusters, 2);

Maybe it would be nicer not to have max_c as a parameter of find_bit_tests()
but instead guard the call to find_bit_tests() in analyze_switch_statement()
with max_c > 2.  Since we don't have max_c in if_chain::is_beneficial(),
passing max_c as a parameter forces the literal constant 2 into this call,
which may confuse people reading if_chain::is_beneficial().  But this is a
minor thing and I guess your way (max_c as a parameter) is also fine by me.

>r = output.length () < filtered_clusters.length ();
>if (r)
>  dump_clusters (&output, "BT can be built");
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index 00426d46..bb7b8cf215a3 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -1772,12 +1772,13 @@ jump_table_cluster::is_beneficial (const vec<cluster *> &,
>  }
>  
>  /* Find bit tests of given CLUSTERS, where all members of the vector
> -   are of type simple_cluster.  New clusters are returned.  */
> +   are of type simple_cluster. max_c is the max number of cases per label.

There should be two spaces before "max_c".  Also, it would be nice if MAX_C
were written in uppercase for consistency with other function comments in
tree-switch-conversion.cc.

> @@ -577,8 +577,9 @@ public:
>bool try_switch_expansion (vec &clusters);
>/* Compute the number of case labels that correspond to each outgoing edge 
> of
>   switch statement.  Record this information in the aux field of the edge.
> + Returns max number of cases per edge.
>   */

I would specify "approx max number" instead of "max number" here.

Otherwise looks good to me.

Cheers,
Filip Kastl


RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

2024-10-17 Thread Li, Pan2
Thanks Richard for comments.

> Enabling it via match.pd looks possible but also possibly sub-optimal
> for costing side on the
> vectorizer - supporting it directly in the vectorizer can be done later 
> though.

Sure, will have a try in v2.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, October 17, 2024 3:13 PM
To: Li, Pan2 
Cc: Richard Sandiford ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

On Thu, Oct 17, 2024 at 8:38 AM Li, Pan2  wrote:
>
> It has been quite a while since the last discussion.
> I revisited these materials recently and gave this a try in the RISC-V backend.
>
>1   │ void foo (int * __restrict a, int * __restrict b, int stride, int n)
>2   │ {
>3   │ for (int i = 0; i < n; i++)
>4   │   a[i*stride] = b[i*stride] + 100;
>5   │ }
>
> We get an expansion similar to the below for VEC_SERIES_EXPR +
> MASK_LEN_GATHER_LOAD.
> There are 8 insns after expand, which try_combine cannot handle
> (it combines at most 4 insns), if my understanding is correct.
>
> Thus, is there any other approach instead of adding a new IFN?  If we do need
> a new IFN, can we leverage match.pd to match the MASK_LEN_GATHER_LOAD(base,
> VEC_SERIES_EXPR, ...) pattern and then emit the new IFN, like the saturating
> ALU patterns do?

Adding an optab (and direct internal fn) is fine I guess - it should
be modeled after the
gather optab specifying the vec_series is implicit with the then scalar stride.

Enabling it via match.pd looks possible but also possibly sub-optimal
for costing side on the
vectorizer - supporting it directly in the vectorizer can be done later though.

Richard.

> Thanks a lot.
>
>  316   │ ;; _58 = VEC_SERIES_EXPR <0, _57>;
>  317   │
>  318   │ (insn 17 16 18 (set (reg:DI 156 [ _56 ])
>  319   │ (ashiftrt:DI (reg:DI 141 [ _54 ])
>  320   │ (const_int 2 [0x2]))) -1
>  321   │  (expr_list:REG_EQUAL (div:DI (reg:DI 141 [ _54 ])
>  322   │ (const_int 4 [0x4]))
>  323   │ (nil)))
>  324   │
>  325   │ (insn 18 17 19 (set (reg:DI 158)
>  326   │ (unspec:DI [
>  327   │ (const_int 32 [0x20])
>  328   │ ] UNSPEC_VLMAX)) -1
>  329   │  (nil))
>  330   │
>  331   │ (insn 19 18 20 (set (reg:RVVM1SI 157)
>  332   │ (if_then_else:RVVM1SI (unspec:RVVMF32BI [
>  333   │ (const_vector:RVVMF32BI repeat [
>  334   │ (const_int 1 [0x1])
>  335   │ ])
>  336   │ (reg:DI 158)
>  337   │ (const_int 2 [0x2]) repeated x2
>  338   │ (const_int 1 [0x1])
>  339   │ (reg:SI 66 vl)
>  340   │ (reg:SI 67 vtype)
>  341   │ ] UNSPEC_VPREDICATE)
>  342   │ (vec_series:RVVM1SI (const_int 0 [0])
>  343   │ (const_int 1 [0x1]))
>  344   │ (unspec:RVVM1SI [
>  345   │ (reg:DI 0 zero)
>  346   │ ] UNSPEC_VUNDEF))) -1
>  347   │  (nil))
>  348   │
>  349   │ (insn 20 19 21 (set (reg:DI 160)
>  350   │ (unspec:DI [
>  351   │ (const_int 32 [0x20])
>  352   │ ] UNSPEC_VLMAX)) -1
>  353   │  (nil))
>  354   │
>  355   │ (insn 21 20 22 (set (reg:RVVM1SI 159)
>  356   │ (if_then_else:RVVM1SI (unspec:RVVMF32BI [
>  357   │ (const_vector:RVVMF32BI repeat [
>  358   │ (const_int 1 [0x1])
>  359   │ ])
>  360   │ (reg:DI 160)
>  361   │ (const_int 2 [0x2]) repeated x2
>  362   │ (const_int 1 [0x1])
>  363   │ (reg:SI 66 vl)
>  364   │ (reg:SI 67 vtype)
>  365   │ ] UNSPEC_VPREDICATE)
>  366   │ (mult:RVVM1SI (vec_duplicate:RVVM1SI (subreg:SI (reg:DI 
> 156 [ _56 ]) 0))
>  367   │ (reg:RVVM1SI 157))
>  368   │ (unspec:RVVM1SI [
>  369   │ (reg:DI 0 zero)
>  370   │ ] UNSPEC_VUNDEF))) -1
>  371   │  (nil))
>  ...
>  403   │ ;; vect__5.16_61 = .MASK_LEN_GATHER_LOAD (vectp_b.14_59, _58, 4, { 
> 0, ... }, { -1, ... }, _73, 0);
>  404   │
>  405   │ (insn 27 26 28 (set (reg:RVVM2DI 161)
>  406   │ (sign_extend:RVVM2DI (reg:RVVM1SI 145 [ _58 ]))) 
> "strided_ld-st.c":4:22 -1
>  407   │  (nil))
>  408   │
>  409   │ (insn 28 27 29 (set (reg:RVVM2DI 162)
>  410   │ (ashift:RVVM2DI (reg:RVVM2DI 161)
>  411   │ (const_int 2 [0x2]))) "strided_ld-st.c":4:22 -1
>  412   │  (nil))
>  413   │
>  414   │ (insn 29 28 0 (set (reg:RVVM1SI 146 [ vect__5.16 ])
>  415   │ (if_then_else:RVVM1SI (unspec:RVVMF32BI [
>  416   │ (const_vector:RVVMF32BI repeat [
>  417   │

Re: [PATCH v4 2/7] OpenMP: middle-end support for dispatch + adjust_args

2024-10-17 Thread Tobias Burnus

Minor follow-up comments:

Paul-Antoine Arras wrote:

This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists of
emitting a call to `gomp_get_mapped_ptr` for the appropriate device.


...



@@ -4067,23 +4069,125 @@ gimplify_call_expr


...


+ // Mark mapped argument as device pointer to ensure
+ // idempotency in gimplification
+ gcc_assert (gimplify_omp_ctxp->code == OMP_DISPATCH);


Just an observation: reading the code flow, this assert looks as if it
should never fire. I think it is therefore a candidate for:
gcc_checking_assert

While gcc_assert is enabled by --enable-checking=assert_checking, which
is on by default, gcc_checking_assert is enabled by …=checking, which is 
enabled by default only when gcc/DEV-PHASE == experimental.


Thus, in releases this saves a few bytes (code + string) and µs 
execution time ...


* * *


+/* Gimplify an OMP_DISPATCH construct.  */
+
+static enum gimplify_status
+gimplify_omp_dispatch (tree *expr_p, gimple_seq *pre_p)
+{


...


+  // If device clause, adjust ICV
+  tree device
+= omp_find_clause (OMP_DISPATCH_CLAUSES (expr), OMP_CLAUSE_DEVICE);
+  tree saved_device_icv;
+  if (device)
+{


I think you should do:

if (device
  && (TREE_CODE (TREE_VALUE (device)) != INTEGER_CST
  || !wi::eq_p (device, -1 /* omp_initial_device */ )))


--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -80,6 +80,12 @@ DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_TEAM_NUM, 
"omp_get_team_num",
  BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
  DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_TEAMS, "omp_get_num_teams",
  BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_MAPPED_PTR, "omp_get_mapped_ptr",
+ BT_FN_PTR_CONST_PTR_INT, ATTR_NOTHROW_LEAF_LIST)


That's 'void *omp_get_mapped_ptr (const void *, int)'.

When using the function (the builtin), as far as the compiler knows, the
pointer passed as the first argument escapes, but we know it doesn't. Thus,
I think we want to use a 'fn spec' attribute here

→ builtin_fnspec in builtins.cc and attr-fnspec.h for the syntax.

Thanks,

Tobias


[PATCH] tree-optimization/117172 - single lane SLP for non-linear inductions

2024-10-17 Thread Richard Biener
The following adds single-lane SLP support for vectorizing non-linear
inductions.

This fixes a bunch of i386 specific testcases with --param vect-force-slp=1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/117172
* tree-vect-loop.cc (vectorizable_nonlinear_induction): Add
single-lane SLP support.
---
 gcc/tree-vect-loop.cc | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d1f1edc704c..50a1531f4c3 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10006,10 +10006,7 @@ vectorizable_nonlinear_induction (loop_vec_info 
loop_vinfo,
 
   gcc_assert (induction_type > vect_step_op_add);
 
-  if (slp_node)
-ncopies = 1;
-  else
-ncopies = vect_get_num_copies (loop_vinfo, vectype);
+  ncopies = vect_get_num_copies (loop_vinfo, slp_node, vectype);
   gcc_assert (ncopies >= 1);
 
   /* FORNOW. Only handle nonlinear induction in the same loop.  */
@@ -10024,9 +10021,10 @@ vectorizable_nonlinear_induction (loop_vec_info 
loop_vinfo,
   iv_loop = loop;
   gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
 
-  /* TODO: Support slp for nonlinear iv. There should be separate vector iv
- update for each iv and a permutation to generate wanted vector iv.  */
-  if (slp_node)
+  /* TODO: Support multi-lane SLP for nonlinear iv. There should be separate
+ vector iv update for each iv and a permutation to generate wanted
+ vector iv.  */
+  if (slp_node && SLP_TREE_LANES (slp_node) > 1)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -10237,8 +10235,13 @@ vectorizable_nonlinear_induction (loop_vec_info 
loop_vinfo,
   add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
   UNKNOWN_LOCATION);
 
-  STMT_VINFO_VEC_STMTS (stmt_info).safe_push (induction_phi);
-  *vec_stmt = induction_phi;
+  if (slp_node)
+slp_node->push_vec_def (induction_phi);
+  else
+{
+  STMT_VINFO_VEC_STMTS (stmt_info).safe_push (induction_phi);
+  *vec_stmt = induction_phi;
+}
 
   /* In case that vectorization factor (VF) is bigger than the number
  of elements that we can fit in a vectype (nunits), we have to generate
@@ -10268,7 +10271,10 @@ vectorizable_nonlinear_induction (loop_vec_info 
loop_vinfo,
  induction_type);
  gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
  new_stmt = SSA_NAME_DEF_STMT (vec_def);
- STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+ if (slp_node)
+   slp_node->push_vec_def (new_stmt);
+ else
+   STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
}
 }
 
-- 
2.43.0


Re: [PATCH 2/2] c++: constrained auto NTTP vs associated constraints

2024-10-17 Thread Patrick Palka
On Tue, 15 Oct 2024, Patrick Palka wrote:

> On Tue, 15 Oct 2024, Patrick Palka wrote:
> 
> > According to [temp.param]/11, the constraint on an auto NTTP is an
> > associated constraint and so should be checked as part of satisfaction
> > of the overall associated constraints rather than checked individually
> > during coercion/deduction.
> 
> By the way, I wonder if such associated constraints should be relevant for
> subsumption now?
> 
> template<class T> concept C = true;
> 
> template<class T> concept D = C<T> && true;
> 
> template<C auto V> void f(); // #1
> template<D auto V> void f(); // #2
> 
> int main() {
>   f<0>(); // still ambiguous?
> }
> 
> With this patch the above call is still ambiguous despite #2 now being
> more constrained than #1 because "more constrained" is only considered for
> function templates with the same signatures as per
> https://eel.is/c++draft/temp.func.order#6.2.2 and we deem their signatures
> to be different due to the different type-constraint.

I think I convinced myself that this example should be accepted, and the
way to go about that is to replace the constrained auto in the NTTP type
with an ordinary auto once we set its TEMPLATE_PARM_CONSTRAINTS.  That
way both templates have the same signature modulo associated constraints.

> 
> MSVC also rejects, but Clang accepts and selects #2.
> 
> > 
> > In order to implement this we mainly need to make handling of
> > constrained auto NTTPs go through finish_constrained_parameter so that
> > TEMPLATE_PARMS_CONSTRAINTS gets set on them.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constraint.cc (finish_shorthand_constraint): Add is_non_type
> > parameter for handling constrained (auto) NTTPS.
> > * cp-tree.h (do_auto_deduction): Adjust declaration.
> > (copy_template_args): Declare.
> > (finish_shorthand_constraint): Adjust declaration.
> > * parser.cc (cp_parser_constrained_type_template_parm): Inline
> > into its only caller and remove.
> > (cp_parser_constrained_non_type_template_parm): Likewise.
> > (finish_constrained_parameter): Simplify after the above.
> > (cp_parser_template_parameter): Dispatch to
> > finish_constrained_parameter for a constrained auto NTTP.
> > * pt.cc (process_template_parm): Pass is_non_type to
> > finish_shorthand_constraint.
> > (convert_template_argument): Adjust call to do_auto_deduction.
> > (copy_template_args): Remove static.
> > (unify): Adjust call to do_auto_deduction.
> > (make_constrained_placeholder_type): Return the type not the
> > TYPE_NAME for consistency with make_auto etc.
> > (do_auto_deduction): Remove now unused tmpl parameter.  Don't
> > check constraints on an auto NTTP even in a non-template
> > context.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-placeholder12.C: Adjust expected error
> > upon constrained auto NTTP satisfaction failure.
> > * g++.dg/cpp2a/concepts-pr97093.C: Likewise.
> > * g++.dg/cpp2a/concepts-template-parm2.C: Likewise.
> > * g++.dg/cpp2a/concepts-template-parm6.C: Likewise.
> > ---
> >  gcc/cp/constraint.cc  | 32 +--
> >  gcc/cp/cp-tree.h  |  6 +--
> >  gcc/cp/parser.cc  | 54 +++
> >  gcc/cp/pt.cc  | 35 +---
> >  .../g++.dg/cpp2a/concepts-placeholder12.C |  4 +-
> >  gcc/testsuite/g++.dg/cpp2a/concepts-pr97093.C |  2 +-
> >  .../g++.dg/cpp2a/concepts-template-parm2.C|  2 +-
> >  .../g++.dg/cpp2a/concepts-template-parm6.C|  2 +-
> >  8 files changed, 66 insertions(+), 71 deletions(-)
> > 
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 35be9cc2b41..9394bea8835 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -1189,7 +1189,7 @@ build_constrained_parameter (tree cnc, tree proto, 
> > tree args)
> > done only after the requires clause has been parsed (or not).  */
> >  
> >  tree
> > -finish_shorthand_constraint (tree decl, tree constr)
> > +finish_shorthand_constraint (tree decl, tree constr, bool is_non_type)
> >  {
> >/* No requirements means no constraints.  */
> >if (!constr)
> > @@ -1198,9 +1198,22 @@ finish_shorthand_constraint (tree decl, tree constr)
> >if (error_operand_p (constr))
> >  return NULL_TREE;
> >  
> > -  tree proto = CONSTRAINED_PARM_PROTOTYPE (constr);
> > -  tree con = CONSTRAINED_PARM_CONCEPT (constr);
> > -  tree args = CONSTRAINED_PARM_EXTRA_ARGS (constr);
> > +  tree proto, con, args;
> > +  if (is_non_type)
> > +{
> > +  tree id = PLACEHOLDER_TYPE_CONSTRAINTS (constr);
> > +  tree tmpl = TREE_OPERAND (id, 0);
> > +  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (tmpl);
> > +  proto = TREE_VALUE (TREE_VEC_ELT (parms, 0));
> > +  con = DECL_TEMPLATE_RESULT (tmpl);
> > +  args = TREE_OPERAND (id, 1);
> > +}
> > +  else
> > +{
> > +  proto = CONSTRAINED_PARM_P

[PATCH] RISC-V: override alignment of function/jump/loop

2024-10-17 Thread Wang Pengcheng
Just like what AArch64 has done.

Signed-off-by: Wang Pengcheng

gcc/ChangeLog:

* config/riscv/riscv.cc (struct riscv_tune_param): Add new
tune options.
(riscv_override_options_internal): Override the default alignment
when not optimizing for size.
---
gcc/config/riscv/riscv.cc | 15 +++
1 file changed, 15 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3ac40234345..7d6fc1429b5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -295,6 +295,9 @@ struct riscv_tune_param
bool overlap_op_by_pieces;
unsigned int fusible_ops;
const struct cpu_vector_cost *vec_costs;
+ const char *function_align = nullptr;
+ const char *jump_align = nullptr;
+ const char *loop_align = nullptr;
};


@@ -10283,6 +10286,18 @@ riscv_override_options_internal (struct
gcc_options *opts)
? &optimize_size_tune_info
: cpu->tune_param;

+ /* If not optimizing for size, set the default
+ alignment to what the target wants. */
+ if (!opts->x_optimize_size)
+ {
+ if (opts->x_flag_align_loops && !opts->x_str_align_loops)
+ opts->x_str_align_loops = tune_param->loop_align;
+ if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
+ opts->x_str_align_jumps = tune_param->jump_align;
+ if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+ opts->x_str_align_functions = tune_param->function_align;
+ }
+
/* Use -mtune's setting for slow_unaligned_access, even when optimizing
for size. For architectures that trap and emulate unaligned accesses,
the performance cost is too great, even for -Os. Similarly, if
-- 
2.39.5


Re: [PATCH] doc: remove outdated C++ Concepts section

2024-10-17 Thread Jason Merrill

On 10/15/24 2:05 PM, Patrick Palka wrote:

This was added as part of the initial Concepts TS implementation and
reflects an early version of the Concepts TS paper, which is very
different from standard C++20 concepts (and even from more recent
versions of the Concepts TS, support for which we deprecated in GCC 14
and removed for GCC 15).  So there's not much to salvage from this
section besides the __is_same trait documentation which we can
conveniently move to the previous Type Traits section.


OK.


gcc/ChangeLog:

* doc/extend.texi (C++ Concepts): Remove section.  Move
__is_same documentation to the previous Type Traits section.
---
  gcc/doc/extend.texi | 44 
  1 file changed, 44 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 302c3299ede..c1ab526e871 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29213,7 +29213,6 @@ Predefined Macros,cpp,The GNU C Preprocessor}).
  * C++ Attributes::  Variable, function, and type attributes for C++ only.
  * Function Multiversioning::   Declaring multiple function versions.
  * Type Traits:: Compiler support for type traits.
-* C++ Concepts::Improved support for generic programming.
  * Deprecated Features:: Things will disappear from G++.
  * Backwards Compatibility:: Compatibilities with earlier definitions of C++.
  @end menu
@@ -30090,49 +30089,6 @@ from @code{0} to @code{@var{length}-1}.  This is 
provided for
  efficient implementation of @code{std::make_integer_sequence}.
  @enddefbuiltin
  
-

-@node C++ Concepts
-@section C++ Concepts
-
-C++ concepts provide much-improved support for generic programming. In
-particular, they allow the specification of constraints on template arguments.
-The constraints are used to extend the usual overloading and partial
-specialization capabilities of the language, allowing generic data structures
-and algorithms to be ``refined'' based on their properties rather than their
-type names.
-
-The following keywords are reserved for concepts.
-
-@table @code
-@kindex assumes
-@item assumes
-States an expression as an assumption, and if possible, verifies that the
-assumption is valid. For example, @code{assume(n > 0)}.
-
-@kindex axiom
-@item axiom
-Introduces an axiom definition. Axioms introduce requirements on values.
-
-@kindex forall
-@item forall
-Introduces a universally quantified object in an axiom. For example,
-@code{forall (int n) n + 0 == n}.
-
-@kindex concept
-@item concept
-Introduces a concept definition. Concepts are sets of syntactic and semantic
-requirements on types and their values.
-
-@kindex requires
-@item requires
-Introduces constraints on template arguments or requirements for a member
-function of a class template.
-@end table
-
-The front end also exposes a number of internal mechanism that can be used
-to simplify the writing of type traits. Note that some of these traits are
-likely to be removed in the future.
-
  @defbuiltin{bool __is_same (@var{type1}, @var{type2})}
  A binary type trait: @code{true} whenever the @var{type1} and
  @var{type2} refer to the same type.




RE: [PATCH 4/4]middle-end: create the longest possible zero extend chain after overwidening

2024-10-17 Thread Richard Biener
On Tue, 15 Oct 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, October 15, 2024 1:42 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: Re: [PATCH 4/4]middle-end: create the longest possible zero extend 
> > chain
> > after overwidening
> > 
> > On Mon, 14 Oct 2024, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > Consider loops such as:
> > >
> > > void test9(unsigned char *x, long long *y, int n, unsigned char k) {
> > > for(int i = 0; i < n; i++) {
> > > y[i] = k + x[i];
> > > }
> > > }
> > >
> > > where today we generate:
> > >
> > > .L5:
> > > ldr q29, [x5], 16
> > > add x4, x4, 128
> > > uaddl   v1.8h, v29.8b, v30.8b
> > > uaddl2  v29.8h, v29.16b, v30.16b
> > > zip1v2.8h, v1.8h, v31.8h
> > > zip1v0.8h, v29.8h, v31.8h
> > > zip2v1.8h, v1.8h, v31.8h
> > > zip2v29.8h, v29.8h, v31.8h
> > > sxtlv25.2d, v2.2s
> > > sxtlv28.2d, v0.2s
> > > sxtlv27.2d, v1.2s
> > > sxtlv26.2d, v29.2s
> > > sxtl2   v2.2d, v2.4s
> > > sxtl2   v0.2d, v0.4s
> > > sxtl2   v1.2d, v1.4s
> > > sxtl2   v29.2d, v29.4s
> > > stp q25, q2, [x4, -128]
> > > stp q27, q1, [x4, -96]
> > > stp q28, q0, [x4, -64]
> > > stp q26, q29, [x4, -32]
> > > cmp x5, x6
> > > bne .L5
> > >
> > > Note how the zero extend from short to long is, halfway along the
> > > chain,
> > > transformed into a sign extend.  There are two problems with this:
> > >
> > >   1. sign extends are typically slower than zero extends on many uArches.
> > >   2. it prevents vectorizable_conversion from attempting to do a single 
> > > step
> > >  promotion.
> > >
> > > These sign extends happen due to the various range reduction optimizations
> > > and
> > > patterns we have, such as multiplication widening, etc.
> > >
> > > My first attempt to fix this was just updating the patterns to when the 
> > > original
> > > source is a zero extend, to not add the intermediate sign extend.
> > >
> > > However this behavior happens in many other places, and as new
> > > patterns get added the problem can be re-introduced.
> > >
> > > Instead I have added a new pattern vect_recog_zero_extend_chain_pattern 
> > > that
> > > attempts to simplify and extend an existing zero extend over multiple
> > > conversions statements.
> > >
> > > As an example, T3 a = (T3)(signed T2)(unsigned T1)x where bitsize T3 > T2 
> > > > T1
> > > gets transformed into T3 a = (T3)(signed T2)(unsigned T2)x.
> > >
> > > > The final cast to signed is kept so the types in the tree still match. It
> > > > will
> > > > be correctly elided later on.
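For concreteness, the cast chain described above can be written out as plain C++ (the widths uint8_t/int16_t/int64_t are chosen only for illustration; the pattern handles any bitsize T3 > T2 > T1, and both functions compute the same value for every input, which is what makes the rewrite safe):

```cpp
#include <cstdint>

// Before: T3 a = (T3)(signed T2)(unsigned T1)x -- the widening from
// int16_t to int64_t is a sign extend.
std::int64_t chain_before (std::uint8_t x)
{
  return (std::int64_t) (std::int16_t) (std::uint8_t) x;
}

// After: T3 a = (T3)(signed T2)(unsigned T2)x -- the widening is now a
// zero extend; the cast through the signed type only keeps the tree
// types matched and is elided later.
std::int64_t chain_after (std::uint8_t x)
{
  return (std::int64_t) (std::int16_t) (std::uint16_t) x;
}
```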
> > >
> > > This representation is the most optimal as vectorizable_conversion is 
> > > already
> > > able to decompose a long promotion into multiple steps if the target does 
> > > not
> > > support it in a single step.  More importantly it allows us to do proper 
> > > costing
> > > and support such conversions like (double)x, where bitsize(x) < int in an
> > > efficient manner.
> > >
> > > To do this I have used Ranger's on-demand analysis to perform the check 
> > > to see
> > > if an extension can be removed and extended to zero extend.  The reason 
> > > for this
> > > is that the vectorizer introduces several patterns that are not in the 
> > > IL,  but
> > > also lots of widening IFNs for which handling in a switch wouldn't be very
> > > future proof.
> > >
> > > I did try to do it without Ranger, but ranger had two benefits:
> > >
> > > 1.  It simplified the handling of the IL changes the vectorizer 
> > > introduces, and
> > > makes it future proof.
> > > 2.  Ranger has the advantage of doing the transformation in cases where it
> > knows
> > > that the top bits of the value are zero, which we wouldn't be able to
> > > tell
> > > by looking purely at statements.
> > > 3.  Ranger simplified the handling of corner cases.  Without it the 
> > > handling was
> > > quite complex and I wasn't very confident in its correctness.
> > >
> > > So I think ranger is the right way to go here...  With these changes the 
> > > above
> > > now generates:
> > >
> > > .L5:
> > > add x4, x4, 128
> > > ldr q26, [x5], 16
> > > uaddl   v2.8h, v26.8b, v31.8b
> > > uaddl2  v26.8h, v26.16b, v31.16b
> > > tbl v4.16b, {v2.16b}, v30.16b
> > > tbl v3.16b, {v2.16b}, v29.16b
> > > tbl v24.16b, {v2.16b}, v28.16b
> > > tbl v1.16b, {v26.16b}, v30.16b
> > > tbl v0.16b, {v26.16b}, v29.16b
> > > tbl v25.16b, {v26.16b}, v28.16b
> > > tbl v2.16b, {v2.16b}, v27.16b
> > > tbl v26.16b, {v26.16b}, v27.16b
> > > stp q4, q3, [x4, -128]
> > > stp q1, q0, [x4, -64]
> > > stp q24, q

[patch,avr,applied] Add test cases for PR116550

2024-10-17 Thread Georg-Johann Lay

Added two test cases for that PR.

Johann

--

rtl-optimization/116550 - Add test cases.

PR rtl-optimization/116550
gcc/testsuite/
* gcc.target/avr/torture/lra-pr116550-1.c: New file.
* gcc.target/avr/torture/lra-pr116550-2.c: New file.

diff --git a/gcc/testsuite/gcc.target/avr/torture/lra-pr116550-1.c b/gcc/testsuite/gcc.target/avr/torture/lra-pr116550-1.c
new file mode 100644
index 000..854698c1ec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/lra-pr116550-1.c
@@ -0,0 +1,216 @@
+/* { dg-additional-options { -std=c99 } } */
+
+typedef int SItype __attribute__ ((mode (SI)));
+typedef unsigned int USItype __attribute__ ((mode (SI)));
+
+
+typedef int DItype __attribute__ ((mode (DI)));
+typedef unsigned int UDItype __attribute__ ((mode (DI)));
+
+struct DWstruct {SItype low, high;};
+
+typedef union
+{
+  struct DWstruct s;
+  DItype ll;
+} DWunion;
+
+
+static inline __attribute__ ((__always_inline__))
+
+UDItype
+__udivmoddi4 (UDItype n, UDItype d, UDItype *rp)
+{
+  const DWunion nn = {.ll = n};
+  const DWunion dd = {.ll = d};
+  DWunion rr;
+  USItype d0, d1, n0, n1, n2;
+  USItype q0, q1;
+  USItype b, bm;
+
+  d0 = dd.s.low;
+  d1 = dd.s.high;
+  n0 = nn.s.low;
+  n1 = nn.s.high;
+
+  if (d1 == 0)
+{
+  if (d0 > n1)
+ {
+
+
+   ((bm) = __builtin_clzl (d0));
+
+   if (bm != 0)
+ {
+
+
+
+   d0 = d0 << bm;
+   n1 = (n1 << bm) | (n0 >> ((4 * 8) - bm));
+   n0 = n0 << bm;
+ }
+
+   do { USItype __d1, __d0, __q1, __q0; USItype __r1, __r0, __m; __d1 = ((USItype) (d0) >> ((4 * 8) / 2)); __d0 = ((USItype) (d0) & (((USItype) 1 << ((4 * 8) / 2)) - 1)); __r1 = (n1) % __d1; __q1 = (n1) / __d1; __m = (USItype) __q1 * __d0; __r1 = __r1 * ((USItype) 1 << ((4 * 8) / 2)) | ((USItype) (n0) >> ((4 * 8) / 2)); if (__r1 < __m) { __q1--, __r1 += (d0); if (__r1 >= (d0)) if (__r1 < __m) __q1--, __r1 += (d0); } __r1 -= __m; __r0 = __r1 % __d1; __q0 = __r1 / __d1; __m = (USItype) __q0 * __d0; __r0 = __r0 * ((USItype) 1 << ((4 * 8) / 2)) | ((USItype) (n0) & (((USItype) 1 << ((4 * 8) / 2)) - 1)); if (__r0 < __m) { __q0--, __r0 += (d0); if (__r0 >= (d0)) if (__r0 < __m) __q0--, __r0 += (d0); } __r0 -= __m; (q0) = (USItype) __q1 * ((USItype) 1 << ((4 * 8) / 2)) | __q0; (n0) = __r0; } while (0);
+   q1 = 0;
+
+
+ }
+  else
+ {
+
+
+   if (d0 == 0)
+ d0 = 1 / d0;
+
+   ((bm) = __builtin_clzl (d0));
+
+   if (bm == 0)
+ {
+
+
+
+
+
+
+
+   n1 -= d0;
+   q1 = 1;
+ }
+   else
+ {
+
+
+   b = (4 * 8) - bm;
+
+   d0 = d0 << bm;
+   n2 = n1 >> b;
+   n1 = (n1 << bm) | (n0 >> b);
+   n0 = n0 << bm;
+
+   do { USItype __d1, __d0, __q1, __q0; USItype __r1, __r0, __m; __d1 = ((USItype) (d0) >> ((4 * 8) / 2)); __d0 = ((USItype) (d0) & (((USItype) 1 << ((4 * 8) / 2)) - 1)); __r1 = (n2) % __d1; __q1 = (n2) / __d1; __m = (USItype) __q1 * __d0; __r1 = __r1 * ((USItype) 1 << ((4 * 8) / 2)) | ((USItype) (n1) >> ((4 * 8) / 2)); if (__r1 < __m) { __q1--, __r1 += (d0); if (__r1 >= (d0)) if (__r1 < __m) __q1--, __r1 += (d0); } __r1 -= __m; __r0 = __r1 % __d1; __q0 = __r1 / __d1; __m = (USItype) __q0 * __d0; __r0 = __r0 * ((USItype) 1 << ((4 * 8) / 2)) | ((USItype) (n1) & (((USItype) 1 << ((4 * 8) / 2)) - 1)); if (__r0 < __m) { __q0--, __r0 += (d0); if (__r0 >= (d0)) if (__r0 < __m) __q0--, __r0 += (d0); } __r0 -= __m; (q1) = (USItype) __q1 * ((USItype) 1 << ((4 * 8) / 2)) | __q0; (n1) = __r0; } while (0);
+ }
+
+
+
+   do { USItype __d1, __d0, __q1, __q0; USItype __r1, __r0, __m; __d1 = ((USItype) (d0) >> ((4 * 8) / 2)); __d0 = ((USItype) (d0) & (((USItype) 1 << ((4 * 8) / 2)) - 1)); __r1 = (n1) % __d1; __q1 = (n1) / __d1; __m = (USItype) __q1 * __d0; __r1 = __r1 * ((USItype) 1 << ((4 * 8) / 2)) | ((USItype) (n0) >> ((4 * 8) / 2)); if (__r1 < __m) { __q1--, __r1 += (d0); if (__r1 >= (d0)) if (__r1 < __m) __q1--, __r1 += (d0); } __r1 -= __m; __r0 = __r1 % __d1; __q0 = __r1 / __d1; __m = (USItype) __q0 * __d0; __r0 = __r0 * ((USItype) 1 << ((4 * 8) / 2)) | ((USItype) (n0) & (((USItype) 1 << ((4 * 8) / 2)) - 1)); if (__r0 < __m) { __q0--, __r0 += (d0); if (__r0 >= (d0)) if (__r0 < __m) __q0--, __r0 += (d0); } __r0 -= __m; (q0) = (USItype) __q1 * ((USItype) 1 << ((4 * 8) / 2)) | __q0; (n0) = __r0; } while (0);
+
+
+ }
+
+  if (rp != 0)
+ {
+   rr.s.low = n0 >> bm;
+   rr.s.high = 0;
+   *rp = rr.ll;
+ }
+}
+
+
+  else
+{
+  if (d1 > n1)
+ {
+
+
+   q0 = 0;
+   q1 = 0;
+
+
+   if (rp != 0)
+ {
+   rr.s.low = n0;
+   rr.s.high = n1;
+   *rp = rr.ll;
+ }
+ }
+  else
+ {
+
+
+   ((bm) = __builtin_clzl (d1));
+   if (bm == 0)
+ {
+
+   if (n1 > d1 || n0 >= d0)
+  {
+q0 = 1;
+do { USItype __x; __x = (n0) - (d0); (n1) = (n1) - (d1) - (__x > (n0));

Re: [PATCH] RISC-V:Auto vect for vector bf16

2024-10-17 Thread 钟居哲
+;; -------------------------------------------------------------------------
+;; - vfwmaccbf16
+;; -------------------------------------------------------------------------
+;; Combine extend + fma to widen_fma (vfwmacc)
+(define_insn_and_split "*widen_bf16_fma"
+  [(set (match_operand:VWEXTF_ZVFBF 0 "register_operand")
+(plus:VWEXTF_ZVFBF
+ (mult:VWEXTF_ZVFBF
+(float_extend:VWEXTF_ZVFBF
+ (match_operand: 2 "register_operand"))
+(float_extend:VWEXTF_ZVFBF
+ (match_operand: 3 "register_operand")))
+ (match_operand:VWEXTF_ZVFBF 1 "register_operand")))]
+  "TARGET_ZVFBFWMA && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
+riscv_vector::emit_vlmax_insn (code_for_pred_widen_bf16_mul (mode),
+  riscv_vector::WIDEN_TERNARY_OP_FRM_DYN, ops);
+DONE;
+  }
+  [(set_attr "type" "vfwmaccbf16")
+   (set_attr "mode" "")])

It should be in autovec-opt.md


juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-10-16 22:10
To: gcc-patches
CC: kito.cheng; juzhe.zhong; Feng Wang
Subject: [PATCH] RISC-V:Auto vect for vector bf16
This patch add auto-vect patterns for vector-bfloat16 extension.
Similar to vector extensions, these patterns can use vector
BF16 instructions to optimize the automatic vectorization of for loops.
gcc/ChangeLog:
 
* config/riscv/vector-bfloat16.md (extend2):
Add auto-vect pattern for vector-bfloat16.
(trunc2): Ditto.
(*widen_bf16_fma): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vfncvt-auto-vect.c: New test.
* gcc.target/riscv/rvv/autovec/vfwcvt-auto-vect.c: New test.
* gcc.target/riscv/rvv/autovec/vfwmacc-auto-vect.c: New test.
 
Signed-off-by: Feng Wang 
---
gcc/config/riscv/vector-bfloat16.md   | 144 --
.../riscv/rvv/autovec/vfncvt-auto-vect.c  |  19 +++
.../riscv/rvv/autovec/vfwcvt-auto-vect.c  |  19 +++
.../riscv/rvv/autovec/vfwmacc-auto-vect.c |  14 ++
4 files changed, 182 insertions(+), 14 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vfncvt-auto-vect.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vfwcvt-auto-vect.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vfwmacc-auto-vect.c
 
diff --git a/gcc/config/riscv/vector-bfloat16.md 
b/gcc/config/riscv/vector-bfloat16.md
index 562aa8ee5ed..e6482a83356 100644
--- a/gcc/config/riscv/vector-bfloat16.md
+++ b/gcc/config/riscv/vector-bfloat16.md
@@ -25,8 +25,24 @@
   (RVVMF2SF "TARGET_VECTOR_ELEN_BF_16 && TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
])
-(define_mode_attr V_FP32TOBF16_TRUNC [
+(define_mode_iterator VSF [
+  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
+  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
+])
+
+(define_mode_iterator VDF [
+  (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
+  (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
+])
+
+(define_mode_attr V_FPWIDETOBF16_TRUNC [
   (RVVM8SF "RVVM4BF") (RVVM4SF "RVVM2BF") (RVVM2SF "RVVM1BF") (RVVM1SF 
"RVVMF2BF") (RVVMF2SF "RVVMF4BF")
+  (RVVM8DF "RVVM2BF") (RVVM4DF "RVVM1BF") (RVVM2DF "RVVMF2BF") (RVVM1DF 
"RVVMF4BF")
+])
+
+(define_mode_attr v_fpwidetobf16_trunc [
+  (RVVM8SF "rvvm4bf") (RVVM4SF "rvvm2bf") (RVVM2SF "rvvm1bf") (RVVM1SF 
"rvvmf2bf") (RVVMF2SF "rvvmf4bf")
+  (RVVM8DF "rvvm2bf") (RVVM4DF "rvvm1bf") (RVVM2DF "rvvmf2bf") (RVVM1DF 
"rvvmf4bf")
])
(define_mode_attr VF32_SUBEL [
@@ -35,8 +51,8 @@
;; Zvfbfmin extension
(define_insn "@pred_trunc_to_bf16"
-  [(set (match_operand: 0 "register_operand"   "=vd, vd, 
vr, vr,  &vr,  &vr")
- (if_then_else:
+  [(set (match_operand: 0 "register_operand"   "=vd, vd, 
vr, vr,  &vr,  &vr")
+ (if_then_else:
(unspec:
  [(match_operand: 1 "vector_mask_operand"  " vm, 
vm,Wc1,Wc1,vmWc1,vmWc1")
   (match_operand 4 "vector_length_operand" " rK, rK, 
rK, rK,   rK,   rK")
@@ -47,13 +63,13 @@
   (reg:SI VL_REGNUM)
   (reg:SI VTYPE_REGNUM)
   (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
-   (float_truncate:
+   (float_truncate:
   (match_operand:VWEXTF_ZVFBF 3 "register_operand"  "  0,  0,  
0,  0,   vr,   vr"))
-   (match_operand: 2 "vector_merge_operand" " vu,  0, 
vu,  0,   vu,0")))]
+   (match_operand: 2 "vector_merge_operand" " vu,  
0, vu,  0,   vu,0")))]
   "TARGET_ZVFBFMIN"
   "vfncvtbf16.f.f.w\t%0,%3%p1"
   [(set_attr "type" "vfncvtbf16")
-   (set_attr "mode" "")
+   (set_attr "mode" "")
(set (attr "frm_mode")
(symbol_ref "riscv_vector::get_frm_mode (operands[8])"))])
@@ -69,12 +85,12 @@
  (reg:SI VL_REGNUM)
  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
   (float_extend:VWEXTF_ZVFBF
- (match_operand: 3 

RE: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR

2024-10-17 Thread Richard Biener
On Tue, 15 Oct 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, October 15, 2024 1:20 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH 1/4]middle-end: support multi-step zero-extends using
> > VEC_PERM_EXPR
> > 
> > On Tue, 15 Oct 2024, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Tuesday, October 15, 2024 12:13 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > > Subject: Re: [PATCH 1/4]middle-end: support multi-step zero-extends 
> > > > using
> > > > VEC_PERM_EXPR
> > > >
> > > > On Tue, 15 Oct 2024, Tamar Christina wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for the look,
> > > > >
> > > > > The 10/15/2024 09:54, Richard Biener wrote:
> > > > > > On Mon, 14 Oct 2024, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > This patch series adds support for a target to do a direct 
> > > > > > > conversion for
> > zero
> > > > > > > extends using permutes.
> > > > > > >
> > > > > > > To do this it uses a target hook use_permute_for_promotion which
> > > > > > > must be
> > > > > > > implemented by targets.  This hook is used to indicate:
> > > > > > >
> > > > > > >  1. can a target do this for the given modes.
> > > > > >
> > > > > > can_vec_perm_const_p?
> > > > > >
> > > > > > >  3. can the target convert between various vector modes with a
> > > > VIEW_CONVERT.
> > > > > >
> > > > > > We have modes_tieable_p for this I think.
> > > > > >
> > > > >
> > > > > Yes, though the reason I didn't use either of them was because they 
> > > > > are
> > reporting
> > > > > a capability of the backend.  In which case the hook, which is 
> > > > > already backend
> > > > > specific already should answer these two.
> > > > >
> > > > > I initially had these checks there, but they didn't seem to add 
> > > > > value, for
> > > > > promotions the masks are only dependent on the input and output modes.
> > So
> > > > they really
> > > > > don't change.
> > > > >
> > > > > When you have say a loop that does lots of conversions from say char 
> > > > > to int,
> > it
> > > > seemed
> > > > > like a waste to retest the same permute constants over and over again.
> > > > >
> > > > > I can add them back in if you prefer...
> > > > >
> > > > > > >  2. is it profitable for the target to do it.
> > > > > >
> > > > > > So you say the target can do both ways but both zip and tbl are
> > > > > > permute instructions so I really fail to see the point and why
> > > > > > the target itself doesn't choose to use tbl for unpack.
> > > > > >
> > > > > > Is the intent in the end to have VEC_PERM in the IL rather than
> > > > > > VEC_UNPACK_* so it combines with other VEC_PERMs?
> > > > > >
> > > > >
> > > > > Yes, and this happens quite often, e.g. load permutes or lane 
> > > > > shuffles etc.
> > > > > The reason for exposing them as VEC_PERM was to trigger further
> > optimizations.
> > > > >
> > > > > If you remember the ticket about LOAD_LANES, with this optimization 
> > > > > and an
> > > > open
> > > > > encoding of LOAD_LANES we stop using it in cases where theres a zero 
> > > > > extend
> > > > after
> > > > > the LOAD_LANES, because then you're doing effectively two permutes and
> > the
> > > > LOAD_LANES
> > > > > is no longer beneficial. There are other examples, load and replicate 
> > > > > etc.
> > > > >
> > > > > > That said, I'm not against supporting VEC_PERM code gen from
> > > > > > unsigned promotion but I don't see why we should do this when
> > > > > > the target advertises VEC_UNPACK_* support or direct conversion
> > > > > > support?
> > > > > >
> > > > > > Esp. with adding a "local" cost related hook which cannot take
> > > > > > into accout context.
> > > > > >
> > > > >
> > > > > To summarize a long story:
> > > > >
> > > > >   yes I open encode zero extends as permutes to allow further 
> > > > > optimizations.
> > > > One could convert
> > > > >   vec_unpacks to convert optabs and use that, but that is an opague 
> > > > > value
> > that
> > > > can't be further
> > > > >   optimized.
> > > > >
> > > > >   The hook isn't really a costing thing in the general sense. It's 
> > > > > literally just "do
> > you
> > > > want
> > > > >   permutes yes or no".  The reason it gets the modes is simply that I 
> > > > > don't
> > think a
> > > > single level
> > > > >   extend is worth it, but I can just change it to never try to do 
> > > > > this on more
> > than
> > > > one level.
> > > >
> > > > When you mention LOAD_LANES we do not expose "permutes" in them on
> > > > GIMPLE
> > > > either, so why should we for VEC_UNPACK_*.
> > >
> > > I think not exposing LOAD_LANES in GIMPLE *is* an actual mistake that I 
> > > hope to
> > correct in GCC-16.
> > > Or at least the time we pick LOAD_LANES is too early.  So I don't think 
> > > pointing to
> > this is a convincing
> > > argument.  It's 

[PATCH] Relax boolean processing in vect_maybe_update_slp_op_vectype

2024-10-17 Thread Richard Biener
The following makes VECTOR_BOOLEAN_TYPE_P processing consistent with
what we do without SLP.  The original motivation for rejecting
VECTOR_BOOLEAN_TYPE_P extern defs was bad code generation.  But
the non-SLP codepath happily goes along, yet always hits the
case of a uniform vector, and this case specifically we can now
code-generate optimally.  So the following allows single-lane
externs as well.

Requiring patterns to code-generate can have a bad influence on
the vectorization factor, though a prototype patch of mine shows
that generating vector compares externally isn't always trivial.

The patch fixes the gcc.dg/vect/vect-early-break_82.c FAIL on x86_64
when --param vect-force-slp=1 is in effect.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/117171
* tree-vect-stmts.cc (vect_maybe_update_slp_op_vectype):
Relax vect_external_def VECTOR_BOOLEAN_TYPE_P constraint.
---
 gcc/tree-vect-stmts.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 948726d51c5..7d7d1512bd4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14292,9 +14292,12 @@ vect_maybe_update_slp_op_vectype (slp_tree op, tree 
vectype)
   if (SLP_TREE_VECTYPE (op))
 return types_compatible_p (SLP_TREE_VECTYPE (op), vectype);
   /* For external defs refuse to produce VECTOR_BOOLEAN_TYPE_P, those
- should be handled by patters.  Allow vect_constant_def for now.  */
+ should be handled by patters.  Allow vect_constant_def for now
+ as well as the trivial single-lane uniform vect_external_def case
+ both of which we code-generate reasonably.  */
   if (VECTOR_BOOLEAN_TYPE_P (vectype)
-  && SLP_TREE_DEF_TYPE (op) == vect_external_def)
+  && SLP_TREE_DEF_TYPE (op) == vect_external_def
+  && SLP_TREE_LANES (op) > 1)
 return false;
   SLP_TREE_VECTYPE (op) = vectype;
   return true;
-- 
2.43.0


[PATCH] Add --param vect-force-slp=1 to VECT_ADDITIONAL_FLAGS

2024-10-17 Thread Richard Biener
This makes us also use --param vect-force-slp=1 in addition to
-flto where LTO is supported.  Note this only covers the subset
of tests not in one of the special naming-adds-option set.  Note
neither g++.dg nor gfortran.dg vect.exp has VECT_ADDITIONAL_FLAGS.

This is a request for comments - the test coverage from this isn't
too big (it'll also trigger the CI)

* gcc.dg/vect/vect.exp (VECT_ADDITIONAL_FLAGS): Add
--param vect-force-slp=1.
---
 gcc/testsuite/gcc.dg/vect/vect.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect.exp 
b/gcc/testsuite/gcc.dg/vect/vect.exp
index 54640d845a8..89699c12aef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect.exp
+++ b/gcc/testsuite/gcc.dg/vect/vect.exp
@@ -83,7 +83,7 @@ lappend DEFAULT_VECTCFLAGS "-fdump-tree-vect-details"
 lappend VECT_SLP_CFLAGS "-fdump-tree-slp-details"
 
 # Main loop.
-set VECT_ADDITIONAL_FLAGS [list ""]
+set VECT_ADDITIONAL_FLAGS [list "--param=vect-force-slp=1"]
 if { [check_effective_target_lto] } {
 lappend VECT_ADDITIONAL_FLAGS "-flto -ffat-lto-objects"
 }
-- 
2.43.0


Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support

2024-10-17 Thread Richard Sandiford
Richard Sandiford  writes:
> Christophe Lyon  writes:
>> When --enable-werror is enabled when running the top-level configure,
>> it passes --enable-werror-always to subdirs.  Some of them, like
>> libgcc, ignore it.
>>
>> This patch adds support for it, enabled only for aarch64, to avoid
>> breaking bootstrap for other targets.
>>
>> The patch also adds -Wno-prio-ctor-dtor to avoid a warning when compiling 
>> lse_init.c
>>
>>  libgcc/
>>  * Makefile.in (WERROR): New.
>>  * config/aarch64/t-aarch64: Handle WERROR. Always use
>>  -Wno-prio-ctor-dtor.
>>  * configure.ac: Add support for --enable-werror-always.
>>  * configure: Regenerate.
>> ---
>>  libgcc/Makefile.in  |  1 +
>>  libgcc/config/aarch64/t-aarch64 |  1 +
>>  libgcc/configure| 31 +++
>>  libgcc/configure.ac |  5 +
>>  4 files changed, 38 insertions(+)
>>
>> [...]
>> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
>> index 4e8c036990f..6b3ea2aea5c 100644
>> --- a/libgcc/configure.ac
>> +++ b/libgcc/configure.ac
>> @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4)
>>  sinclude(../config/gthr.m4)
>>  sinclude(../config/sjlj.m4)
>>  sinclude(../config/cet.m4)
>> +sinclude(../config/warnings.m4)
>>  
>>  AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
>>  AC_CONFIG_SRCDIR([static-object.mk])
>> @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT)
>>  # Determine what GCC version number to use in filesystem paths.
>>  GCC_BASE_VER
>>  
>> +# Only enable with --enable-werror-always until existing warnings are
>> +# corrected.
>> +ACX_PROG_CC_WARNINGS_ARE_ERRORS([manual])
>
> It looks like this is borrowed from libcpp and/or libdecnumber.
> Those are a bit different from libgcc in that they're host libraries
> that can be built with any supported compiler (including non-GCC ones).
> In contrast, libgcc can only be built with the corresponding version
> of GCC.  The usual restrictions on -Werror -- only use it during stages
> 2 and 3, or if the user explicitly passes --enable-werror -- don't apply
> in libgcc's case.  We should always be building with the "right" version
> of GCC (even for Canadian crosses) and so should always be able to use
> -Werror.
>
> So personally, I think we should just go with:
>
> diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
> index b70e7b94edd..ae1588ce307 100644
> --- a/libgcc/config/aarch64/t-aarch64
> +++ b/libgcc/config/aarch64/t-aarch64
> @@ -30,3 +30,4 @@ LIB2ADDEH += \
>   $(srcdir)/config/aarch64/__arm_za_disable.S
>  
>  SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
> +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor
>
> ...this, but with $(WERROR) replaced by -Werror.
>
> At least, it would be a good way of finding out if there's a case
> I've forgotten :)
>
> Let's see what others think though.

As per the later discussion, the t-aarch64 change described above is OK
for trunk, but anyone with commit access should feel free to revert it
if it breaks their build.  (Although please post a description of what
went wrong as well :))

Thanks for doing this.

Richard


Re: [PATCH 1/7] libstdc++: Refactor std::uninitialized_{copy, fill, fill_n} algos [PR68350]

2024-10-17 Thread Jonathan Wakely
On Thu, 17 Oct 2024 at 11:12, Jonathan Wakely  wrote:
>
> On Thu, 17 Oct 2024 at 02:39, Patrick Palka  wrote:
> >
> > On Tue, 15 Oct 2024, Jonathan Wakely wrote:
> >
> > > This is v2 of
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665246.html
> > > fixing some thinkos in uninitialized_{fill,fill_n}. We don't need to
> > > worry about overwriting tail-padding in those algos, because we only use
> > > memset for 1-byte integer types. So they have no tail padding that can
> > > be reused anyway! So this changes __n > 1 to __n > 0 in a few places
> > > (which fixes the problem that it was not actually filling anything for
> > > the n==1 cases).
> > >
> > > Also simplify std::__to_address(__result++) to just __result++ because
> > > we already have a pointer, and use std::to_address(result++) for a C++20
> > > std::contiguous_iterator case, instead of addressof(*result++).
> > >
> > > Tested x86_64-linux.
> > >
> > > -- >8 --
> > >
> > > This refactors the std::uninitialized_copy, std::uninitialized_fill and
> > > std::uninitialized_fill_n algorithms to directly perform memcpy/memset
> > > optimizations instead of dispatching to std::copy/std::fill/std::fill_n.
> > >
> > > The reasons for this are:
> > >
> > > - Use 'if constexpr' to simplify and optimize compilation throughput, so
> > >   dispatching to specialized class templates is only needed for C++98
> > >   mode.
> > > - Relax the conditions for using memcpy/memset, because the C++20 rules
> > >   on implicit-lifetime types mean that we can rely on memcpy to begin
> > >   lifetimes of trivially copyable types.  We don't need to require
> > >   trivially default constructible, so don't need to limit the
> > >   optimization to trivial types. See PR 68350 for more details.
> > > - The conditions on non-overlapping ranges are stronger for
> > >   std::uninitialized_copy than for std::copy so we can use memcpy instead
> > >   of memmove, which might be a minor optimization.
> > > - Avoid including  in .
> > >   It only needs some iterator utilities from that file now, which belong
> > >   in  anyway, so this moves them there.
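The direct memcpy path described in the bullets above can be sketched as follows. This is an illustrative sketch of the idea only, not the actual libstdc++ implementation (which also handles non-pointer iterators, exceptions, and constant evaluation); the helper name is invented:

```cpp
#include <cstring>
#include <new>
#include <type_traits>

// For trivially copyable types, C++20's implicit-lifetime rules let
// memcpy both begin the objects' lifetimes and copy their values, so no
// trivially-default-constructible requirement is needed.  The input and
// output ranges may not overlap, hence memcpy rather than memmove.
template <typename T>
T *sketch_uninitialized_copy (const T *first, const T *last, T *result)
{
  const std::size_t n = static_cast<std::size_t> (last - first);
  if constexpr (std::is_trivially_copyable_v<T>)
    {
      if (n != 0)
        std::memcpy (result, first, n * sizeof (T));
      return result + n;
    }
  else
    {
      // Non-trivial types: construct each element in place.
      for (; first != last; ++first, ++result)
        ::new (static_cast<void *> (result)) T (*first);
      return result;
    }
}
```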
> > >
> > > Several tests need changes to the diagnostics matched by dg-error
> > > because we no longer use the __constructible() function that had a
> > > static assert in. Now we just get straightforward errors for attempting
> > > to use a deleted constructor.
> > >
> > > Two tests needed more significant changes to the actual expected results
> > > of executing the tests, because they were checking for old behaviour
> > > which was incorrect according to the standard.
> > > 20_util/specialized_algorithms/uninitialized_copy/64476.cc was expecting
> > > std::copy to be used for a call to std::uninitialized_copy involving two
> > > trivially copyable types. That was incorrect behaviour, because a
> > > non-trivial constructor should have been used, but using std::copy used
> > > trivial default initialization followed by assignment.
> > > 20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc was testing
> > > the behaviour with a non-integral Size passed to uninitialized_fill_n,
> > > but I wrote the test looking at the requirements of uninitialized_copy_n
> > > which are not the same as uninitialized_fill_n. The former uses --n and
> > > tests n > 0, but the latter just tests n-- (which will never be false
> > > for a floating-point value with a fractional part).
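The two loop shapes contrasted above can be sketched standalone (helper names are invented for illustration; an iteration cap stands in for what would otherwise be an endless loop):

```cpp
// Loop shape of uninitialized_copy_n: test n > 0, then decrement.
template <typename Size>
int copy_n_iterations (Size n, int cap)
{
  int count = 0;
  while (n > 0 && count < cap)
    {
      ++count;
      --n;
    }
  return count;
}

// Loop shape of uninitialized_fill_n: just test n--.
template <typename Size>
int fill_n_iterations (Size n, int cap)
{
  int count = 0;
  // For a fractional Size, n steps 2.5 -> 1.5 -> 0.5 -> -0.5 -> ... and
  // never compares equal to zero, so only the cap stops the loop.
  while (n-- && count < cap)
    ++count;
  return count;
}
```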
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >   PR libstdc++/68350
> > >   PR libstdc++/93059
> > >   * include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
> > >   to ...
> > >   * include/bits/stl_iterator.h: ... here.
> > >   * include/bits/stl_uninitialized.h (__check_constructible)
> > >   (_GLIBCXX_USE_ASSIGN_FOR_INIT): Remove.
> > >   [C++98] (__unwrappable_niter): New trait.
> > >   (__uninitialized_copy): Replace use of std::copy.
> > >   (uninitialized_copy): Fix Doxygen comments. Open-code memcpy
> > >   optimization for C++11 and later.
> > >   (__uninitialized_fill): Replace use of std::fill.
> > >   (uninitialized_fill): Fix Doxygen comments. Open-code memset
> > >   optimization for C++11 and later.
> > >   (__uninitialized_fill_n): Replace use of std::fill_n.
> > >   (uninitialized_fill_n): Fix Doxygen comments. Open-code memset
> > >   optimization for C++11 and later.
> > >   * 
> > > testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc:
> > >   Adjust expected behaviour to match what the standard specifies.
> > >   * 
> > > testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc:
> > >   Likewise.
> > >   * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
> > >   Adjust dg-error directives.
> > >   * 
> > > testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc:
> > >   Likewise.
> > >   * 
> > > testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89

Re: [SH][committed] PR 113533

2024-10-17 Thread Oleg Endo
On Mon, 2024-10-14 at 11:37 +0900, Oleg Endo wrote:
> For memory loads/stores (that contain a MEM rtx) sh_rtx_costs would wrongly
> report a cost lower than 1 insn, which is not accurate, as it makes
> loads/stores appear cheaper than simple arithmetic insns.  The cost of a
> load/store insn is at least 1 insn plus the cost of the address expression
> (some addressing modes can be considered more expensive than others due to
> additional constraints).
> 
> Tested with make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-
> mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> CSiBE set shows a little bit of +/- code size movement due to some insn
> reordering.  Difficult to judge whether it's all good or bad.  Doesn't seem
> that significant.
> 
> Thanks to Roger for the original patch proposal.
> Committed to master.
> 

The previous patch had a typo.  Committed the attached amendment to master
after re-testing.
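The fixed typo is the classic stray-semicolon pattern: a semicolon after the first line of a multi-line expression silently turns the continuation into a discarded statement. A minimal sketch with made-up cost values, not the real sh.cc logic:

```cpp
static int address_cost ()
{
  return 3;
}

// With the stray semicolon, the assignment ends after the call and the
// "+ 1" on the next line is an expression statement whose value is
// silently discarded.
static int total_with_typo ()
{
  int total;
  total = address_cost ();
  + 1;
  return total;
}

// Without the semicolon, the addition is part of the assignment,
// matching the intent of the original patch.
static int total_fixed ()
{
  int total;
  total = address_cost ()
          + 1;
  return total;
}
```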

Best regards,
Oleg Endo
From 2390cbad85cbd122d4e58c94f7891d7c5fde49b3 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Thu, 17 Oct 2024 21:40:14 +0900
Subject: [PATCH] SH: Fix typo of commit b717c462b96e

gcc/ChangeLog:
	PR target/113533
	* config/sh/sh.cc (sh_rtx_costs): Delete wrong semicolon.
---
 gcc/config/sh/sh.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 6ad202f..f69ede0 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -3353,17 +3353,17 @@ sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 	  if (GET_CODE (xx) == SET && MEM_P (XEXP (xx, 0)))
 	{
 	  *total = sh_address_cost (XEXP (XEXP (xx, 0), 0), 
 	GET_MODE (XEXP (xx, 0)),
-	MEM_ADDR_SPACE (XEXP (xx, 0)), speed);
+	MEM_ADDR_SPACE (XEXP (xx, 0)), speed)
 		   + COSTS_N_INSNS (1);
 	  return true;
 	}
 	  if (GET_CODE (xx) == SET && MEM_P (XEXP (xx, 1)))
 	{
 	  *total = sh_address_cost (XEXP (XEXP (xx, 1), 0),
 	GET_MODE (XEXP (xx, 1)),
-	MEM_ADDR_SPACE (XEXP (xx, 1)), speed);
+	MEM_ADDR_SPACE (XEXP (xx, 1)), speed)
 		   + COSTS_N_INSNS (1);
 	  return true;
 	}
 	}
--
libgit2 1.7.2



[PATCH] [1/n] remove gcc.dg/vect special naming in favor of dg-additional-options

2024-10-17 Thread Richard Biener
This kicks off the removal of keying options on testcase names as
done in gcc.dg/vect; the appropriate way to do this is using
dg-additional-options.

Starting with two of the least used ones.

This causes the moved tests to be covered by VECT_ADDITIONAL_FLAGS
processing.

Tested on x86_64-unknown-linux-gnu, pushed.

* gcc.dg/vect/vect.exp: Process no-fast-math-* and
no-math-errno-* in the main set.
* gcc.dg/vect/no-fast-math-vect16.c: Add -fno-fast-math.
* gcc.dg/vect/no-math-errno-slp-32.c: Add -fno-math-errno.
* gcc.dg/vect/no-math-errno-vect-pow-1.c: Likewise.
---
 .../gcc.dg/vect/no-fast-math-vect16.c |  2 +-
 .../gcc.dg/vect/no-math-errno-slp-32.c|  1 +
 .../gcc.dg/vect/no-math-errno-vect-pow-1.c|  1 +
 gcc/testsuite/gcc.dg/vect/vect.exp| 20 ++-
 4 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c 
b/gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c
index 5f871289337..a3c530683d0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c
+++ b/gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c
@@ -1,5 +1,5 @@
 /* Disabling epilogues until we find a better way to deal with scans.  */
-/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-fno-fast-math --param vect-epilogues-nomask=0" } 
*/
 /* { dg-require-effective-target vect_float_strict } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/vect/no-math-errno-slp-32.c 
b/gcc/testsuite/gcc.dg/vect/no-math-errno-slp-32.c
index 18064cc3e87..0b16a1395e4 100644
--- a/gcc/testsuite/gcc.dg/vect/no-math-errno-slp-32.c
+++ b/gcc/testsuite/gcc.dg/vect/no-math-errno-slp-32.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-math-errno" } */
 
 double x[256];
 
diff --git a/gcc/testsuite/gcc.dg/vect/no-math-errno-vect-pow-1.c 
b/gcc/testsuite/gcc.dg/vect/no-math-errno-vect-pow-1.c
index 8e3989a3283..9794de78f2b 100644
--- a/gcc/testsuite/gcc.dg/vect/no-math-errno-vect-pow-1.c
+++ b/gcc/testsuite/gcc.dg/vect/no-math-errno-vect-pow-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-math-errno" } */
 
 double x[256];
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect.exp 
b/gcc/testsuite/gcc.dg/vect/vect.exp
index 4566e904eb9..54640d845a8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect.exp
+++ b/gcc/testsuite/gcc.dg/vect/vect.exp
@@ -100,6 +100,12 @@ foreach flags $VECT_ADDITIONAL_FLAGS {
 et-dg-runtest dg-runtest [lsort \
[glob -nocomplain $srcdir/$subdir/slp-*.\[cS\]]] \
$flags $DEFAULT_VECTCFLAGS
+et-dg-runtest dg-runtest [lsort \
+   [glob -nocomplain $srcdir/$subdir/no-fast-math-*.\[cS\]]] \
+   $flags $DEFAULT_VECTCFLAGS
+et-dg-runtest dg-runtest [lsort \
+   [glob -nocomplain $srcdir/$subdir/no-math-errno-*.\[cS\]]] \
+   $flags $DEFAULT_VECTCFLAGS
 et-dg-runtest dg-runtest [lsort \
[glob -nocomplain $srcdir/$subdir/bb-slp*.\[cS\]]] \
$flags $VECT_SLP_CFLAGS
@@ -131,20 +137,6 @@ et-dg-runtest dg-runtest [lsort \
[glob -nocomplain $srcdir/$subdir/fast-math-bb-slp-*.\[cS\]]] \
"" $VECT_SLP_CFLAGS
 
-# -fno-fast-math tests
-set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
-lappend DEFAULT_VECTCFLAGS "-fno-fast-math"
-et-dg-runtest dg-runtest [lsort \
-   [glob -nocomplain $srcdir/$subdir/no-fast-math-*.\[cS\]]] \
-   "" $DEFAULT_VECTCFLAGS
-
-# -fno-math-errno tests
-set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
-lappend DEFAULT_VECTCFLAGS "-fno-math-errno"
-et-dg-runtest dg-runtest [lsort \
-   [glob -nocomplain $srcdir/$subdir/no-math-errno-*.\[cS\]]] \
-   "" $DEFAULT_VECTCFLAGS
-
 # -fwrapv tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fwrapv"
-- 
2.43.0


[PATCH v3] AArch64: Fix copysign patterns

2024-10-17 Thread Wilco Dijkstra
The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand.  Since it is a floating point operation, having an integer alternative
makes no sense.  Change the expander to always use vector immediates which 
results
in better code and sharing of immediates between copysign and xorsign.

Passes bootstrap and regress, OK for commit?

gcc/Changelog:
* config/aarch64/aarch64.md (copysign3): Widen immediate to 
vector.
(copysign3_insn): Use VQ_INT_EQUIV in operand 3.
* config/aarch64/iterators.md (VQ_INT_EQUIV): New iterator.
(vq_int_equiv): Likewise.

testsuite/Changelog:
* gcc.target/aarch64/copysign_3.c: New test.
* gcc.target/aarch64/copysign_4.c: New test.
* gcc.target/aarch64/fneg-abs_2.c: Fixup test.
* gcc.target/aarch64/sve/fneg-abs_2.c: Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
c54b29cd64b9e0dc6c6d12735049386ccedc5408..71f9743df671b70e6a2d189f49de58995398abee
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7218,20 +7218,11 @@ (define_expand "lrint2"
 }
 )
 
-;; For copysign (x, y), we want to generate:
+;; For copysignf (x, y), we want to generate:
 ;;
-;;   LDR d2, #(1 << 63)
-;;   BSL v2.8b, [y], [x]
+;; moviv31.4s, 0x80, lsl 24
+;; bit v0.16b, v1.16b, v31.16b
 ;;
-;; or another, equivalent, sequence using one of BSL/BIT/BIF.  Because
-;; we expect these operations to nearly always operate on
-;; floating-point values, we do not want the operation to be
-;; simplified into a bit-field insert operation that operates on the
-;; integer side, since typically that would involve three inter-bank
-;; register copies.  As we do not expect copysign to be followed by
-;; other logical operations on the result, it seems preferable to keep
-;; this as an unspec operation, rather than exposing the underlying
-;; logic to the compiler.
 
 (define_expand "copysign3"
   [(match_operand:GPF 0 "register_operand")
@@ -7239,32 +7230,25 @@ (define_expand "copysign3"
(match_operand:GPF 2 "nonmemory_operand")]
   "TARGET_SIMD"
 {
-  rtx signbit_const = GEN_INT (HOST_WIDE_INT_M1U
-  << (GET_MODE_BITSIZE (mode) - 1));
-  /* copysign (x, -1) should instead be expanded as orr with the sign
- bit.  */
+  rtx sign = GEN_INT (HOST_WIDE_INT_M1U << (GET_MODE_BITSIZE (mode) - 
1));
+  rtx v_bitmask = gen_const_vec_duplicate (mode, sign);
+  v_bitmask = force_reg (mode, v_bitmask);
+
+  /* copysign (x, -1) should instead be expanded as orr with the signbit.  */
   rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
+
   if (GET_CODE (op2_elt) == CONST_DOUBLE
   && real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
 {
-  rtx v_bitmask
-   = force_reg (V2mode,
-gen_const_vec_duplicate (V2mode,
- signbit_const));
-
-  emit_insn (gen_iorv23 (
-   lowpart_subreg (V2mode, operands[0], mode),
-   lowpart_subreg (V2mode, operands[1], mode),
+  emit_insn (gen_ior3 (
+   lowpart_subreg (mode, operands[0], mode),
+   lowpart_subreg (mode, operands[1], mode),
v_bitmask));
   DONE;
 }
-
-  machine_mode int_mode = mode;
-  rtx bitmask = gen_reg_rtx (int_mode);
-  emit_move_insn (bitmask, signbit_const);
   operands[2] = force_reg (mode, operands[2]);
   emit_insn (gen_copysign3_insn (operands[0], operands[1], operands[2],
-  bitmask));
+  v_bitmask));
   DONE;
 }
 )
@@ -7273,23 +7257,21 @@ (define_insn "copysign3_insn"
   [(set (match_operand:GPF 0 "register_operand")
(unspec:GPF [(match_operand:GPF 1 "register_operand")
 (match_operand:GPF 2 "register_operand")
-(match_operand: 3 "register_operand")]
+(match_operand: 3 "register_operand")]
 UNSPEC_COPYSIGN))]
   "TARGET_SIMD"
   {@ [ cons: =0 , 1 , 2 , 3 ; attrs: type  ]
  [ w, w , w , 0 ; neon_bsl  ] bsl\t%0., %2., 
%1.
  [ w, 0 , w , w ; neon_bsl  ] bit\t%0., %2., 
%3.
  [ w, w , 0 , w ; neon_bsl  ] bif\t%0., %1., 
%3.
- [ r, r , 0 , X ; bfm  ] bfxil\t%0, %1, #0, 

   }
 )
 
-
-;; For xorsign (x, y), we want to generate:
+;; For xorsignf (x, y), we want to generate:
 ;;
-;; LDR   d2, #1<<63
-;; AND   v3.8B, v1.8B, v2.8B
-;; EOR   v0.8B, v0.8B, v3.8B
+;; moviv31.4s, 0x80, lsl 24
+;; and v31.16b, v31.16b, v1.16b
+;; eor v0.16b, v31.16b, v0.16b
 ;;
 
 (define_expand "@xorsign3"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
1322193b027c9ad1d45d5b5ebbeea0e8537615d3..a1ea66048b2a5066381194779ff4d2998228d8f7
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1889,6 +1889,1

[PATCH] libstdc++: Move std::__niter_base and std::__niter_wrap to stl_iterator.h

2024-10-17 Thread Jonathan Wakely
I've split this out of "Refactor std::uninitialized_{copy, fill, fill_n}"
because this part can be done separately. Call it [PATCH -1/7] if you
like :-)

This fixes the ordering problem that Patrick noticed in [PATCH 1/7], and
adds a test for it. It also updates the comments as was previously done
in [PATCH 2/7], which Patrick noted could have been done when moving the
functions into stl_iterator.h.

Note that the __niter_base overloads for reverse_iterator and
move_iterator call __niter_base unqualified, which means that in
contrast to all other uses of __niter_base they *do* use ADL to find the
next __niter_base to call. I think that's necessary so that it works for
both reverse_iterator> and the inverse order,
move_iterator>. I haven't changed that here, they
still use unqualified calls.

As a further change in this area, I think it would be possible (and
maybe nice) to remove __miter_base and replace the uses of it in
std::move_backward(I,I,O) and std::move(I,I,O). That's left for another
day.

Tested x86_64-linux.

-- >8 --

Move the functions for unwrapping and rewrapping __normal_iterator
objects to the same file as the definition of __normal_iterator itself.

This will allow a later commit to make use of std::__niter_base in other
headers without having to include all of .

libstdc++-v3/ChangeLog:

* include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
to ...
* include/bits/stl_iterator.h: ... here.
(__niter_base, __miter_base): Move all overloads to the end of
the header.
* testsuite/24_iterators/normal_iterator/wrapping.cc: New test.
---
 libstdc++-v3/include/bits/stl_algobase.h  |  45 --
 libstdc++-v3/include/bits/stl_iterator.h  | 138 +-
 .../24_iterators/normal_iterator/wrapping.cc  |  29 
 3 files changed, 132 insertions(+), 80 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/24_iterators/normal_iterator/wrapping.cc

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 384e5fdcdc9..751b7ad119b 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -308,51 +308,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __a;
 }
 
-  // Fallback implementation of the function in bits/stl_iterator.h used to
-  // remove the __normal_iterator wrapper. See copy, fill, ...
-  template
-_GLIBCXX20_CONSTEXPR
-inline _Iterator
-__niter_base(_Iterator __it)
-_GLIBCXX_NOEXCEPT_IF(std::is_nothrow_copy_constructible<_Iterator>::value)
-{ return __it; }
-
-#if __cplusplus < 201103L
-  template
-_Ite
-__niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
-std::random_access_iterator_tag>&);
-
- template
-_Ite
-__niter_base(const ::__gnu_debug::_Safe_iterator<
-::__gnu_cxx::__normal_iterator<_Ite, _Cont>, _Seq,
-std::random_access_iterator_tag>&);
-#else
-  template
-_GLIBCXX20_CONSTEXPR
-decltype(std::__niter_base(std::declval<_Ite>()))
-__niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
-std::random_access_iterator_tag>&)
-noexcept(std::is_nothrow_copy_constructible<_Ite>::value);
-#endif
-
-  // Reverse the __niter_base transformation to get a
-  // __normal_iterator back again (this assumes that __normal_iterator
-  // is only used to wrap random access iterators, like pointers).
-  template
-_GLIBCXX20_CONSTEXPR
-inline _From
-__niter_wrap(_From __from, _To __res)
-{ return __from + (std::__niter_base(__res) - std::__niter_base(__from)); }
-
-  // No need to wrap, iterator already has the right type.
-  template
-_GLIBCXX20_CONSTEXPR
-inline _Iterator
-__niter_wrap(const _Iterator&, _Iterator __res)
-{ return __res; }
-
   // All of these auxiliary structs serve two purposes.  (1) Replace
   // calls to copy with memmove whenever possible.  (Memmove, not memcpy,
   // because the input and output ranges are permitted to overlap.)
diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 28a600c81cb..be3fa6f7a34 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -654,24 +654,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #  endif // C++20
 # endif // __glibcxx_make_reverse_iterator
 
-  template
-_GLIBCXX20_CONSTEXPR
-auto
-__niter_base(reverse_iterator<_Iterator> __it)
--> decltype(__make_reverse_iterator(__niter_base(__it.base(
-{ return __make_reverse_iterator(__niter_base(__it.base())); }
-
   template
 struct __is_move_iterator >
   : __is_move_iterator<_Iterator>
 { };
-
-  template
-_GLIBCXX20_CONSTEXPR
-auto
-__miter_base(reverse_iterator<_Iterator> __it)
--> decltype(__make_reverse_iterator(__miter_base(__it.base(
-{ return __make_reverse_iterator(__miter_base(__it.base())); }
 #endif // C++11

Re: [PATCH] libstdc++: Move std::__niter_base and std::__niter_wrap to stl_iterator.h

2024-10-17 Thread Patrick Palka
On Thu, 17 Oct 2024, Jonathan Wakely wrote:

> I've split this out of "Refactor std::uninitialized_{copy, fill, fill_n}"
> because this part can be done separately. Call it [PATCH -1/7] if you
> like :-)
> 
> This fixes the ordering problem that Patrick noticed in [PATCH 1/7], and
> adds a test for it. It also updates the comments as was previously done
> in [PATCH 2/7], which Patrick noted could have been done when moving the
> functions into stl_iterator.h.

LGTM

> 
> Note that the __niter_base overloads for reverse_iterator and
> move_iterator call __niter_base unqualified, which means that in
> contrast to all other uses of __niter_base they *do* use ADL to find the
> next __niter_base to call. I think that's necessary so that it works for
> both reverse_iterator> and the inverse order,
> move_iterator>. I haven't changed that here, they
> still use unqualified calls.

IIUC since the overloads' constraints are mutually recursive it'd be
kind of awkward to avoid ADL.  Dunno how badly we want to avoid ADL
here, but I think one way would be to define the overloads as static
member functions and make the calls within the signature
dependently-scoped, e.g.

  struct __niter_base_overloads {
template
  _GLIBCXX20_CONSTEXPR
  static auto
  __niter_base(reverse_iterator<_Iterator> __it)
  -> decltype(__make_reverse_iterator(_Self::__niter_base(__it.base(
  { return __make_reverse_iterator(__niter_base(__it.base())); }

template
  _GLIBCXX20_CONSTEXPR
  static auto
  __niter_base(move_iterator<_Iterator> __it)
  -> decltype(make_move_iterator(_Self::__niter_base(__it.base(
  { return make_move_iterator(__niter_base(__it.base())); }

...
  };

> 
> As a further change in this area, I think it would be possible (and
> maybe nice) to remove __miter_base and replace the uses of it in
> std::move_backward(I,I,O) and std::move(I,I,O). That's left for another
> day.
> 
> Tested x86_64-linux.
> 
> -- >8 --
> 
> Move the functions for unwrapping and rewrapping __normal_iterator
> objects to the same file as the definition of __normal_iterator itself.
> 
> This will allow a later commit to make use of std::__niter_base in other
> headers without having to include all of .
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
>   to ...
>   * include/bits/stl_iterator.h: ... here.
>   (__niter_base, __miter_base): Move all overloads to the end of
>   the header.
>   * testsuite/24_iterators/normal_iterator/wrapping.cc: New test.
> ---
>  libstdc++-v3/include/bits/stl_algobase.h  |  45 --
>  libstdc++-v3/include/bits/stl_iterator.h  | 138 +-
>  .../24_iterators/normal_iterator/wrapping.cc  |  29 
>  3 files changed, 132 insertions(+), 80 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/24_iterators/normal_iterator/wrapping.cc
> 
> diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
> b/libstdc++-v3/include/bits/stl_algobase.h
> index 384e5fdcdc9..751b7ad119b 100644
> --- a/libstdc++-v3/include/bits/stl_algobase.h
> +++ b/libstdc++-v3/include/bits/stl_algobase.h
> @@ -308,51 +308,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>return __a;
>  }
>  
> -  // Fallback implementation of the function in bits/stl_iterator.h used to
> -  // remove the __normal_iterator wrapper. See copy, fill, ...
> -  template
> -_GLIBCXX20_CONSTEXPR
> -inline _Iterator
> -__niter_base(_Iterator __it)
> -
> _GLIBCXX_NOEXCEPT_IF(std::is_nothrow_copy_constructible<_Iterator>::value)
> -{ return __it; }
> -
> -#if __cplusplus < 201103L
> -  template
> -_Ite
> -__niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
> -  std::random_access_iterator_tag>&);
> -
> - template
> -_Ite
> -__niter_base(const ::__gnu_debug::_Safe_iterator<
> -  ::__gnu_cxx::__normal_iterator<_Ite, _Cont>, _Seq,
> -  std::random_access_iterator_tag>&);
> -#else
> -  template
> -_GLIBCXX20_CONSTEXPR
> -decltype(std::__niter_base(std::declval<_Ite>()))
> -__niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
> -  std::random_access_iterator_tag>&)
> -noexcept(std::is_nothrow_copy_constructible<_Ite>::value);
> -#endif
> -
> -  // Reverse the __niter_base transformation to get a
> -  // __normal_iterator back again (this assumes that __normal_iterator
> -  // is only used to wrap random access iterators, like pointers).
> -  template
> -_GLIBCXX20_CONSTEXPR
> -inline _From
> -__niter_wrap(_From __from, _To __res)
> -{ return __from + (std::__niter_base(__res) - 
> std::__niter_base(__from)); }
> -
> -  // No need to wrap, iterator already has the right type.
> -  template
> -_GLIBCXX20_CONSTEXPR
> -inline _Iterator
> -__niter_wrap(const _Iterator&, _Iterator __res)
> -{ return __res; }
> -
>// All of these auxiliary structs serve two purposes.  (1) Replac

libgo: fix for C23 nullptr keyword

2024-10-17 Thread Joseph Myers
Making GCC default to -std=gnu23 for C code produces Go test failures 
because of C code used by Go that uses a variable called nullptr, which is 
a keyword in C23.

I've submitted this fix upstream at 
https://github.com/golang/go/pull/69927 using the GitHub mirror workflow.  
Ian, once some form of such a fix is upstream, could you backport it to 
GCC's libgo?

diff --git a/libgo/go/runtime/testdata/testprogcgo/threadprof.go 
b/libgo/go/runtime/testdata/testprogcgo/threadprof.go
index d62d4b4be83..f61c51b8b62 100644
--- a/libgo/go/runtime/testdata/testprogcgo/threadprof.go
+++ b/libgo/go/runtime/testdata/testprogcgo/threadprof.go
@@ -36,10 +36,10 @@ __attribute__((constructor)) void issue9456() {
}
 }
 
-void **nullptr;
+void **nullpointer;
 
 void *crash(void *p) {
-   *nullptr = p;
+   *nullpointer = p;
return 0;
 }
 

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] rs6000, fix test builtins-1-p10-runnable.c

2024-10-17 Thread Carl Love

Ping 2


On 10/9/24 7:43 AM, Carl Love wrote:

Ping, FYI this is a fairly simple fix to a testcase.


On 10/3/24 8:11 AM, Carl Love wrote:

GCC maintainers:

The builtins-1-p10-runnable.c test has debugging inadvertently
enabled.  The test uses #ifdef to enable/disable the debugging.
Unfortunately, #define DEBUG was set to 0 to disable debugging
and enable the call to abort in case of error; since the debug code is
guarded by #ifdef, the #define should have been removed instead.
Additionally, a change to the expected output which was made for
testing purposes was not removed.  Hence, the test prints that
there was an error instead of calling abort.  The result is the test
does not get reported as failing.


This patch removes the #define DEBUG to enable the call to abort and 
restores the expected output to the correct value.  The patch was 
tested on a Power 10 without the #define DEBUG to verify that the 
test does fail with the incorrect expected value.  The correct 
expected value was then restored.  The test reports 19 expected 
passes and no errors.


Please let me know if this patch is acceptable for mainline. Thanks.

Carl


--- 



rs6000, fix test builtins-1-p10-runnable.c

The test has two issues:

1) The test should execute abort() if an error is found.
However, the test contains a #define DEBUG 0 which actually enables the
error prints instead of executing abort(), because the debug code is
protected by an #ifdef, not an #if.  The #define DEBUG needs to be
removed so the test will abort on an error.

2) The vec_i_expected output was tweaked to test that it would fail.
The test value was not removed.

By removing the #define DEBUG, the test fails and reports 1 failure.
Removing the intentionally wrong expected value results in the test
passing with no errors as expected.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-1-p10-runnable.c: Remove #define
    DEBUG.  Replace vec_i_expected value with correct value.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

index 222c8b3a409..3e8a1c736e3 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -25,8 +25,6 @@
 #include 
 #include 

-#define DEBUG 0
-
 #ifdef DEBUG
 #include 
 #endif
@@ -281,8 +279,7 @@ int main()
 /* Signed word multiply high */
 i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 
2147483648 };

 i_arg2 = (vector int){ 2, 3, 4, 5};
-    //    vec_i_expected = (vector int){-1, -2, -2, -3};
-    vec_i_expected = (vector int){1, -2, -2, -3};
+    vec_i_expected = (vector int){-1, -2, -2, -3};

 vec_i_result = vec_mulh (i_arg1, i_arg2);







Re: [PATCH ver2 0/4] rs6000, remove redundant built-ins and add more test cases

2024-10-17 Thread Carl Love

Ping 2


On 10/9/24 7:44 AM, Carl Love wrote:


Ping


On 10/1/24 8:12 AM, Carl Love wrote:


GCC maintainers:

The following version 2 of a series of patches for PowerPC removes 
some built-ins that are covered by existing overloaded built-ins. 
Additionally, there are patches to add missing testcases and 
documentation.  The original version of the patch series was posted 
on 8/7/2024.  It was originally reviewed by Kewen.


The patches have been updated per the review.  Note patches 2 and 3 
in the series were approved with minor changes.  I will post the 
entire series for review for completeness.


The patch series has been re-tested on Power 10 LE and BE with no 
regressions.


Please let me know if the patches are acceptable for mainline. Thanks.

    Carl






Re: [PATCH] AArch64: Remove redundant check in aarch64_simd_mov

2024-10-17 Thread Victor Do Nascimento

FWIW, I definitely agree about the spuriousness of the V2DI mode check.
While I can't approve, I can confirm it looks good.

Thanks,
Victor.

On 10/17/24 16:10, Wilco Dijkstra wrote:


The split condition in aarch64_simd_mov uses aarch64_simd_special_constant_p.  
While
doing the split, it checks the mode before calling 
aarch64_maybe_generate_simd_constant.
This is risky since it may result in unexpectedly calling aarch64_split_simd_move
instead
of aarch64_maybe_generate_simd_constant.  Since the mode is already checked, 
remove the
spurious explicit mode check.

Passes bootstrap & regress, OK for commit?

---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
18795a08b61da874a9e811822ed82e7eb9350bb4..5ac80103502112664528d37e3b8e24edc16eb932
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -208,7 +208,6 @@ (define_insn_and_split "*aarch64_simd_mov"
  else
{
if (FP_REGNUM_P (REGNO (operands[0]))
-   && mode == V2DImode
&& aarch64_maybe_generate_simd_constant (operands[0], operands[1],
 mode))
  ;



Re: [PATCH] Add fancy pointer support in std::map/set

2024-10-17 Thread Jonathan Wakely
On Thu, 17 Oct 2024 at 20:52, François Dumont  wrote:

> Here is an updated version that incorporates, I think, all your feedback.
> It's much cleaner indeed.
>

Thanks, I'll take a look tomorrow.


> It's also tested in C++98/17/23.
>
> I'm surprised that we do not need to consider potential
> allocator::const_pointer.
>
Do you mean consider the case where Alloc::const_pointer is not the same
type as rebinding 'pointer' to a const element type?

We don't need to consider that because we never get a 'const_pointer' from
the allocator, and we never need to pass a 'const_pointer' to the
allocator. The allocator's 'allocate' and 'deallocate' members both work
with the 'pointer' type, so we only need to use that type when interacting
with the allocator. For all the other uses, such as _Const_Node_ptr, what
we need is a pointer-to-const that's compatible with the allocator's
pointer type. It doesn't actually matter if it's the same type as
allocator_traits::const_pointer, because we don't need


> Is there a plan to deprecate it ?
>

No, although I think that would be possible. Nothing in the allocator
requirements ever uses that type or cares what it is, as long as it's
convertible to  'const_void_pointer', and 'pointer' is convertible to
'const_pointer'.


> And if not, should not alloc traits const_pointer be per default a rebind
> of pointer for const element_type like in the __add_const_to_ptr you made
> me add ? I can try to work on a patch for that if needed.
>

No, allocator_traits is defined correctly as the standard requires. If an
allocator A defines a 'A::const_pointer' typedef, then that is used. If
not, then 'allocator_traits::const_pointer' defaults to rebinding the
non-const 'allocator_traits::pointer' type.


[PATCH] c++: redundant hashing in register_specialization

2024-10-17 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

After r15-4050-g5dad738c1dd164 register_specialization needs to set
elt.hash to the (maybe) precomputed hash in order to avoid redundantly
rehashing.

gcc/cp/ChangeLog:

* pt.cc (register_specialization): Set elt.hash.
---
 gcc/cp/pt.cc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8b183a139d7..ec4313090bd 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1545,9 +1545,7 @@ register_specialization (tree spec, tree tmpl, tree args, 
bool is_friend,
   elt.tmpl = tmpl;
   elt.args = args;
   elt.spec = spec;
-
-  if (hash == 0)
-hash = spec_hasher::hash (&elt);
+  elt.hash = hash;
 
   spec_entry **slot = decl_specializations->find_slot (&elt, INSERT);
   if (*slot)
-- 
2.47.0.86.g15030f9556



Re: [PATCH v2] contrib/: Configure git-format-patch(1) to add To: gcc-patches@gcc.gnu.org

2024-10-17 Thread Eric Gallager
On Thu, Oct 17, 2024 at 10:54 AM Alejandro Colomar  wrote:
>
> Just like we already do for git-send-email(1).  In some cases, patches
> are prepared with git-format-patch(1), but are sent with a different
> program, or some flags to git-send-email(1) may accidentally inhibit the
> configuration.  By adding the TO in the email file, we make sure that
> gcc-patches@ will receive the patch.
>
> contrib/ChangeLog:
>
> * gcc-git-customization.sh: Configure git-format-patch(1) to add
> 'To: gcc-patches@gcc.gnu.org'.
>
> Cc: Eric Gallager 
> Signed-off-by: Alejandro Colomar 
> ---
>
> Hi!
>
> v2 changes:
>
> -  Fix comment.  [Eric]
>
> Cheers,
> Alex
>
>
> Range-diff against v1:
> 1:  0ee3f802637 ! 1:  2bd0e0f82bf contrib/: Configure git-format-patch(1) to 
> add To: gcc-patches@gcc.gnu.org
> @@ Commit message
>  * gcc-git-customization.sh: Configure git-format-patch(1) to 
> add
>  'To: gcc-patches@gcc.gnu.org'.
>
> +Cc: Eric Gallager 
>  Signed-off-by: Alejandro Colomar 
>
>   ## contrib/gcc-git-customization.sh ##
> -@@ contrib/gcc-git-customization.sh: git config diff.md.xfuncname 
> '^\(define.*$'
> +@@ contrib/gcc-git-customization.sh: git config alias.gcc-style '!f() {
> + # *.mddiff=md
> + git config diff.md.xfuncname '^\(define.*$'
>
> - # Tell git send-email where patches go.
> +-# Tell git send-email where patches go.
> ++# Tell git-format-patch(1)/git-send-email(1) where patches go.
>   # ??? Maybe also set sendemail.tocmd to guess from MAINTAINERS?
>  +git config format.to 'gcc-patches@gcc.gnu.org'
>   git config sendemail.to 'gcc-patches@gcc.gnu.org'
>
>  contrib/gcc-git-customization.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/gcc-git-customization.sh 
> b/contrib/gcc-git-customization.sh
> index 54bd35ea1aa..dd59bece1dc 100755
> --- a/contrib/gcc-git-customization.sh
> +++ b/contrib/gcc-git-customization.sh
> @@ -41,8 +41,9 @@ git config alias.gcc-style '!f() {
>  # *.mddiff=md
>  git config diff.md.xfuncname '^\(define.*$'
>
> -# Tell git send-email where patches go.
> +# Tell git-format-patch(1)/git-send-email(1) where patches go.
>  # ??? Maybe also set sendemail.tocmd to guess from MAINTAINERS?
> +git config format.to 'gcc-patches@gcc.gnu.org'
>  git config sendemail.to 'gcc-patches@gcc.gnu.org'
>
>  set_user=$(git config --get "user.name")
> --
> 2.45.2
>

LGTM, but my approval doesn't actually mean anything, so...


Re: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-17 Thread Hongtao Liu
On Fri, Oct 18, 2024 at 9:08 AM Antoni Boucher  wrote:
>
> Hi.
> This is a patch for the bug 116725.
> I'm not sure if it is a good fix, but it seems to do the job.
> If you have suggestions for better comments than what I wrote to explain
> what's happening, I'm open to them.

>@@ -7548,7 +7548,8 @@ (define_insn 
>"avx512fp16_vcvtph2_<
> [(match_operand: 1 "" 
> "")]
> UNSPEC_US_FIX_NOTRUNC))]
>   "TARGET_AVX512FP16 && "
>-  "vcvtph2\t{%1, 
>%0|%0, %1}"
>+;; %X1 so that we don't emit any *WORD PTR for -masm=intel.
>+  "vcvtph2\t{%1, 
>%0|%0, %X1}"
Could you define something like

 ;; Pointer size override for 16-bit upper-convert modes (Intel asm dialect)
 (define_mode_attr iptrh
  [(V32HI "") (V16SI "") (V8DI "")
   (V16HI "") (V8SI "") (V4DI "q")
   (V8HI "") (V4SI "q") (V2DI "k")])

And use
+  "vcvtph2\t{%1,
%0|%0, %1}"

>   [(set_attr "type" "ssecvt")
>(set_attr "prefix" "evex")
>(set_attr "mode" "")])
>@@ -29854,7 +29855,8 @@ (define_insn 
>"avx512dq_vmfpclass"
>  UNSPEC_FPCLASS)
>(const_int 1)))]
>"TARGET_AVX512DQ || VALID_AVX512FP16_REG_MODE(mode)"
>-   "vfpclass\t{%2, %1, 
>%0|%0, %1, %2}";
>+;; %X1 so that we don't emit any *WORD PTR for -masm=intel.
>+   "vfpclass\t{%2, %1, 
>%0|%0, %X1, %2}";

For scalar memory operand rewrite, we usually use , so
   "vfpclass\t{%2, %1,
%0|%0,
%1, %2}";




-- 
BR,
Hongtao


PR105361 Fix of testcase

2024-10-17 Thread Jerry D

Pushed as stated in the PR to cleanup the test case.

commit 6604a05fa27bc21c3409e767552daca3fcf43964 (HEAD -> master, 
origin/master, origin/HEAD)

Author: Jerry DeLisle 
Date:   Thu Oct 17 13:39:09 2024 -0700

Fortran: Add tolerance to real value comparisons.

gcc/testsuite/ChangeLog:

PR fortran/105361
* gfortran.dg/pr105361.f90: In the comparisons of
real values after a read, use a tolerance so that
subtle differences in results between different
architectures do not fail.



Re: [patch, fortran] Fix ICE with use of INT32 et al from ISO_FORTRAN_ENV

2024-10-17 Thread Jerry D

On 10/17/24 12:52 PM, Thomas Koenig wrote:

Hello world,

The attached patch fixes an ICE when an UNSIGNED-specific constant
is used from ISO_FORTRAN_ENV.  The error message is not particularly
great, it is

Error: Unsigned: The symbol 'uint32', referenced at (1), is not in the 
selected standard


but it is better than an ICE.

OK for trunk?


Looks good to me, yes OK.

Jerry



Best regards

 Thomas

gcc/fortran/ChangeLog:

     * error.cc (notify_std_msg): Handle GFC_STD_UNSIGNED.

gcc/testsuite/ChangeLog:

     * gfortran.dg/unsigned_37.f90: New test.




Re: [PATCH] Add fancy pointer support in std::map/set

2024-10-17 Thread François Dumont
Here is an updated version that addresses, I think, all your feedback. 
It's much cleaner indeed.


It's also tested in C++98/17/23.

I'm surprised that we do not need to consider a potential 
allocator::const_pointer. Is there a plan to deprecate it?


And if not, shouldn't alloc traits' const_pointer default to a rebind of 
pointer for a const element_type, like in the __add_const_to_ptr you made 
me add? I can try to work on a patch for that if needed.


    libstdc++: Add fancy pointer support in map and set

    Support fancy allocator pointer type in std::_Rb_tree<>.

    In case of fancy pointer type the container is now storing the 
pointer to

    _Rb_tree_pnode<> as a pointer to _Rb_tree_pnode_base<>.

    Many methods are adapted to take and return _Base_ptr in place of 
_Link_type

    which has been renamed into _Node_ptr.

    libstdc++-v3/ChangeLog:

    * include/bits/stl_tree.h
    (__add_const_to_ptr<>): New.
    (_Rb_tree_ptr_traits<>): New.
    (_Rb_tree_pnode_base<>): New.
    (_Rb_tree_node_base): Inherit from latter.
    (_Rb_tree_pheader): New.
    (_Rb_tree_header): Inherit from latter.
    (_Rb_tree_node_val): New.
    (_Rb_tree_node): Inherit from latter.
    (_Rb_tree_pnode): New.
    (_Rb_tree_iterator<>::_Link_type): Rename into...
    (_Rb_tree_iterator<>::_Node_type): ...this.
    (_Rb_tree_const_iterator<>::_Link_type): Rename into...
    (_Rb_tree_const_iterator<>::_Node_type): ...this.
    (_Rb_tree_const_iterator<>::_M_const_cast()): Adapt to use
    _Rb_tree_pnode_base<>::_M_self_ptr_nc method.
    (_Rb_tree_helpers<>): New.
    (_Rb_tree_piterator): New.
    (_Rb_tree_const_piterator): New.
    (_Rb_tree_node_traits<>): New.
    (_Rb_tree::_Node_base, _Rb_tree::_Node_type): New.
    (_Rb_tree): Adapt to generalize usage of _Base_ptr in place 
of _Link_type.
    * testsuite/23_containers/map/allocator/ext_ptr.cc: New 
test case.
    * testsuite/23_containers/multimap/allocator/ext_ptr.cc: 
New test case.
    * testsuite/23_containers/multiset/allocator/ext_ptr.cc: 
New test case.
    * testsuite/23_containers/set/allocator/ext_ptr.cc: New 
test case.


Ok to commit ?

François


On 09/10/2024 00:02, Jonathan Wakely wrote:



On Tue, 8 Oct 2024 at 22:50, Jonathan Wakely  
wrote:




On Thu, 1 Aug 2024 at 18:28, François Dumont
 wrote:

Hi

Here is a proposal to add fancy pointer support in
std::_Rb_tree container.

As you'll see there are still several usages of
pointer_traits<>::pointer_to. The ones in _M_header_ptr() are
unavoidable.


Yes, those are necessary.

The pointer_to use in _M_const_cast could be simplified by adding
a _M_self_ptr() member to _Rb_tree_pnode_base:

  _Base_ptr
  _M_self_ptr() _GLIBCXX_NOEXCEPT
  { return pointer_traits<_Base_ptr>::pointer_to(*this); }


I don't think it's needed (because _M_const_cast is only in C++11 
code, I think?) but you could define that for both C++11 and C++98:


#if __cplusplus >= 201103L
  _Base_ptr
  _M_self_ptr() noexcept
  { return pointer_traits<_Base_ptr>::pointer_to(*this); }
#else
  _Base_ptr _M_self_ptr() { return this; }
#endif

Then this works for both (if you replace the 'auto'):


  _Base_ptr
  _M_self_ptr_nc() const _GLIBCXX_NOEXCEPT
  {
    auto __self = const_cast<_Rb_tree_pnode_base*>(this);
    return __self->_M_self_ptr();
  }

 _Const_Base_ptr
  _M_self_ptr() const _GLIBCXX_NOEXCEPT
  { return pointer_traits<_Const_Base_ptr>::pointer_to(*this); }

Then _M_const_cast would do:

return iterator(_M_node->_M_self_ptr_nc());

The ones to extract a node or to return a node to the
allocator are more questionable. Are they fine ? Is there
another way to
mimic the static_cast<_Link_type> that can be done on raw
pointers with
fancy pointers ?


I think you can just do
static_cast<_Link_type>(_M_node->_M_self_ptr()) i.e. convert the
_Base_ptr to the derived _Link_type
(we should really rename _Link_type to _Node_ptr, I keep getting
confused by the fact that the actual links between nodes are
_Base_ptr, so what does _Link_type mean now?!)

Alternatively, you could add a _M_node_ptr() to the
_Rb_tree_pnode_base type:

template<typename _NodeT>
  __ptr_rebind<_BasePtr, _NodeT>
  _M_node_ptr() _GLIBCXX_NOEXCEPT
  {
    auto __node = static_cast<_NodeT*>(this);
    using _Node_ptr = __ptr_rebind<_BasePtr, _NodeT>;
    return pointer_traits<_Node_ptr>::pointer_to(*__node);
  }


You could add a C++98 version of this as well:

#if __cplusplus >= 201103L
 // as above
#else
  template<typename _NodeT>
  _NodeT*
  _M_node_ptr() _GLIBCXX_NOEXCEPT
  { return static_cast<_NodeT*>(this); }
#endif


Then the code to deallocate the n

[patch, fortran] Fix ICE with use of INT32 et al from ISO_FORTRAN_ENV

2024-10-17 Thread Thomas Koenig

Hello world,

The attached patch fixes an ICE when an UNSIGNED-specific constant
is used from ISO_FORTRAN_ENV.  The error message is not particularly
great, it is

Error: Unsigned: The symbol 'uint32', referenced at (1), is not in the 
selected standard


but it is better than an ICE.

OK for trunk?

Best regards

Thomas

gcc/fortran/ChangeLog:

* error.cc (notify_std_msg): Handle GFC_STD_UNSIGNED.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_37.f90: New test.diff --git a/gcc/fortran/error.cc b/gcc/fortran/error.cc
index d184ffd878a..afe2e49e499 100644
--- a/gcc/fortran/error.cc
+++ b/gcc/fortran/error.cc
@@ -362,6 +362,8 @@ notify_std_msg(int std)
 return _("Obsolescent feature:");
   else if (std & GFC_STD_F95_DEL)
 return _("Deleted feature:");
+  else if (std & GFC_STD_UNSIGNED)
+return _("Unsigned:");
   else
 gcc_unreachable ();
 }
diff --git a/gcc/testsuite/gfortran.dg/unsigned_37.f90 b/gcc/testsuite/gfortran.dg/unsigned_37.f90
new file mode 100644
index 000..b11f214336a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/unsigned_37.f90
@@ -0,0 +1,4 @@
+! { dg-do compile }
+program main
+  use iso_fortran_env, only : uint32 ! { dg-error "not in the selected standard" }
+end program main


Re: [PATCH] Add fancy pointer support in std::map/set

2024-10-17 Thread Jonathan Wakely
On Thu, 17 Oct 2024 at 21:39, Jonathan Wakely  wrote:

>
>
> On Thu, 17 Oct 2024 at 20:52, François Dumont 
> wrote:
>
>> Here is an updated version that compiles, I think, all your feedbacks.
>> It's much cleaner indeed.
>>
>
> Thanks, I'll take a look tomorrow.
>
>
>> It's also tested in C++98/17/23.
>>
>> I'm surprised that we do not need to consider potential
>> allocator::const_pointer.
>>
> Do you mean consider the case where Alloc::const_pointer is not the same
> type as rebinding 'pointer' to a const element type?
>
> We don't need to consider that because we never get a 'const_pointer' from
> the allocator, and we never need to pass a 'const_pointer' to the
> allocator. The allocator's 'allocate' and 'deallocate' members both work
> with the 'pointer' type, so we only need to use that type when interacting
> with the allocator. For all the other uses, such as _Const_Node_ptr, what
> we need is a pointer-to-const that's compatible with the allocator's
> pointer type. It doesn't actually matter if it's the same type as
> allocator_traits::const_pointer, because we don't need
>

Sorry, I sent the email before finishing that thought!

... we don't need to pass a const_pointer to anything, we only need it for
the container's own purposes.

But thinking about it some more, do we even need a const-pointer for the
container?  Currently the const_iterator stores a const-pointer, and some
members like _M_root() and _M_leftmost() return a const-pointer. But they
don't need to. The nodes are all pointed to by a non-const _Base_ptr, none
of the storage managed by the container is const. We could just use the
non-const pointers everywhere, which would make things much simpler!

The const_iterator stores a const_pointer, and returns a const-pointer from
operator->(), so maybe _Rb_tree_const_piterator should take the allocator's
const_pointer as its template argument, instead of the non-const ValPtr. So
the trait would take two pointer types, but the const one would only be
used for the const iterator:

template<typename _ValPtr, typename _CValPtr>
  struct _Rb_tree_node_traits
  {
using _Node_base = _Rb_tree_pnode_base<_ValPtr>;
using _Node_type = _Rb_tree_pnode<_ValPtr>;
using _Header_t = _Rb_tree_pheader<_Node_base>;
using _Iterator_t = _Rb_tree_piterator<_ValPtr>;
using _Const_iterator_t = _Rb_tree_const_piterator<_CValPtr>;
  };

Would that work? I can experiment with that if you like.



>
>
>> Is there a plan to deprecate it ?
>>
>
> No, although I think that would be possible. Nothing in the allocator
> requirements ever uses that type or cares what it is, as long as it's
> convertible to  'const_void_pointer', and 'pointer' is convertible to
> 'const_pointer'.
>
>
>> And if not, should not alloc traits const_pointer be per default a rebind
>> of pointer for const element_type like in the __add_const_to_ptr you made
>> me add ? I can try to work on a patch for that if needed.
>>
>
> No, allocator_traits is defined correctly as the standard requires. If an
> allocator A defines a 'A::const_pointer' typedef, then that is used. If
> not, then 'allocator_traits::const_pointer' defaults to rebinding the
> non-const 'allocator_traits::pointer' type.
>
>
>


[PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-17 Thread Antoni Boucher

Hi.
This is a patch for bug 116725.
I'm not sure whether it is a good fix, but it seems to do the job.
If you have suggestions for comments that would better explain what's 
happening, I'm open to them.


Here are the tests results:

=== gcc Summary ===

# of expected passes		208652
# of unexpected failures	120
# of unexpected successes	25
# of expected failures		1545
# of unsupported tests		3518

=== g++ Summary ===

# of expected passes		234412
# of unexpected failures	41
# of expected failures		2215
# of unsupported tests		2061

They are the same on master:

# Comparing directories
## Dir1=/home/bouanto/tmp/master/: 2 sum files
## Dir2=/home/bouanto/tmp/branch/: 2 sum files

# Comparing 2 common sum files
## /bin/sh ./contrib/compare_tests  /tmp/gxx-sum1.29654 /tmp/gxx-sum2.29654
New tests that PASS (1 tests):

gcc: gcc.target/i386/pr116725.c (test for excess errors)

Old tests that passed, that have disappeared (136 tests): (Eeek!)

g++: c-c++-common/tsan/atomic_stack.c   -O0  execution test
[…]

Old tests that failed, that have disappeared (69 tests): (Eeek!)

g++: c-c++-common/tsan/atomic_stack.c   -O0  output pattern test
[…]

# No differences found in 2 common sum files

Thanks.From 20ba6ec63d29b5d1ac93bdb9d461527eaf8962f5 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Mon, 23 Sep 2024 18:58:47 -0400
Subject: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2*
 instructions

This only happens when using -masm=intel.

gcc/ChangeLog:
PR target/116725
* config/i386/sse.md: Fix asm generation.

gcc/testsuite/ChangeLog:
PR target/116725
* gcc.target/i386/pr116725.c: Add test using those AVX builtins.
---
 gcc/config/i386/sse.md   |  6 ++--
 gcc/testsuite/gcc.target/i386/pr116725.c | 40 
 2 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116725.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 685bce3094a..040f3ed3d14 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -7548,7 +7548,8 @@ (define_insn "avx512fp16_vcvtph2_<
 	   [(match_operand: 1 "" "")]
 	   UNSPEC_US_FIX_NOTRUNC))]
   "TARGET_AVX512FP16 && "
-  "vcvtph2\t{%1, %0|%0, %1}"
+;; %X1 so that we don't emit any *WORD PTR for -masm=intel.
+  "vcvtph2\t{%1, %0|%0, %X1}"
   [(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -29854,7 +29855,8 @@ (define_insn "avx512dq_vmfpclass"
 	UNSPEC_FPCLASS)
 	  (const_int 1)))]
"TARGET_AVX512DQ || VALID_AVX512FP16_REG_MODE(mode)"
-   "vfpclass\t{%2, %1, %0|%0, %1, %2}";
+;; %X1 so that we don't emit any *WORD PTR for -masm=intel.
+   "vfpclass\t{%2, %1, %0|%0, %X1, %2}";
   [(set_attr "type" "sse")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
diff --git a/gcc/testsuite/gcc.target/i386/pr116725.c b/gcc/testsuite/gcc.target/i386/pr116725.c
new file mode 100644
index 000..9e5070e16e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116725.c
@@ -0,0 +1,40 @@
+/* PR gcc/116725 */
+/* { dg-do assemble } */
+/* { dg-options "-masm=intel -mavx512dq -mavx512fp16 -mavx512vl" } */
+/* { dg-require-effective-target masm_intel } */
+
+#include 
+
+typedef double __m128d __attribute__ ((__vector_size__ (16)));
+typedef float __m128f __attribute__ ((__vector_size__ (16)));
+typedef int __v16si __attribute__ ((__vector_size__ (64)));
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32)));
+typedef long long __m512i __attribute__((__vector_size__(64)));
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef int __v4si __attribute__ ((__vector_size__ (16)));
+typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+
+int main(void) {
+__m128d vec = {1.0, 2.0};
+char res = __builtin_ia32_fpclasssd_mask(vec, 1, 1);
+printf("%d\n", res);
+
+__m128f vec2 = {1.0, 2.0, 3.0, 4.0};
+char res2 = __builtin_ia32_fpcla_mask(vec2, 1, 1);
+printf("%d\n", res2);
+
+__m128h vec3 = {2.0, 1.0, 3.0};
+__v4si vec4 = {};
+__v4si res3 = __builtin_ia32_vcvtph2dq128_mask(vec3, vec4, -1);
+printf("%d\n", res3[0]);
+
+__v4si res4 = __builtin_ia32_vcvtph2udq128_mask(vec3, vec4, -1);
+printf("%d\n", res4[0]);
+
+__m128i vec5 = {};
+__m128i res5 = __builtin_ia32_vcvtph2qq128_mask(vec3, vec5, -1);
+printf("%d\n", res5[0]);
+
+__m128i res6 = __builtin_ia32_vcvtph2uqq128_mask(vec3, vec5, -1);
+printf("%d\n", res6[0]);
+}
-- 
2.47.0



Re: [PATCH v11] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]

2024-10-17 Thread Nicolas Boulenguez
> > > It may be surprising to have the RTEMS file used by other OS. The
> > > original comment should have mentionned that in the first place, but
> > > the file was only used with RTEMS. With your change, the file is
> > > effectively shared, so it would be best to rename it.
> > 
> > Could you please suggest an appropriate file name?  This may be
> > obvious for you, but with my limited knowledge of GNAT internals, the
> > diff between s-osprim__rtems.adb and __unix/optide.adb is not
> > sufficient to guess why a separate implementation is/was required.
> 
> Would it be possible to drop it altogether and use s-osprim__posix.adb 
> instead?
> Otherwise what's the remaining difference between
> s-osprim__posix.adb and s-osprim__rtems.adb? The difference should
> help us find a proper name based on properties of the file.

At first glance, it seems possible and desirable to merge _posix and
_rtems, but working on this right now would be counter-productive.

I suggest to apply patches 1-2 and fix PR114065 first.

Then the cosmetic changes in patches 3-6 (and possibly a trivial
backport of 8).

After that, we will have all the time in the long winter afternoons to
discuss file names in patch 7.

> Can you post also a diff between version 11 and version 12? It's not
> practical to review the complete changes from scratch at this stage,
> the patch is too big.

The diffs between the git version branches are attached.
diff --git a/gcc/ada/doc/gnat_rm/the_gnat_library.rst b/gcc/ada/doc/gnat_rm/the_gnat_library.rst
index bcec49f..58c790f 100644
--- a/gcc/ada/doc/gnat_rm/the_gnat_library.rst
+++ b/gcc/ada/doc/gnat_rm/the_gnat_library.rst
@@ -674,6 +674,8 @@ Machine-specific implementations are available in some cases.
 
 Extends the facilities provided by ``Ada.Calendar`` to include handling
 of days of the week, an extended ``Split`` and ``Time_Of`` capability.
+Also provides conversion of ``Ada.Calendar.Duration`` values to and from the
+C ``timeval`` format.
 
 .. _`GNAT.Calendar.Time_IO_(g-catiio.ads)`:
 
diff --git a/gcc/ada/libgnarl/s-osinte__android.ads b/gcc/ada/libgnarl/s-osinte__android.ads
index ee3a5dc..ecf4a32 100644
--- a/gcc/ada/libgnarl/s-osinte__android.ads
+++ b/gcc/ada/libgnarl/s-osinte__android.ads
@@ -207,7 +207,7 @@ package System.OS_Interface is
type clockid_t is new int;
 
function clock_gettime
- (clock_id : clockid_t; tp : access C_time.timespec) return int;
+ (clock_id : clockid_t; tp : access C_Time.timespec) return int;
pragma Import (C, clock_gettime, "clock_gettime");
 
function clock_getres
diff --git a/gcc/ada/libgnat/a-calcon.ads b/gcc/ada/libgnat/a-calcon.ads
index ad4ca64..196028e 100644
--- a/gcc/ada/libgnat/a-calcon.ads
+++ b/gcc/ada/libgnat/a-calcon.ads
@@ -30,9 +30,10 @@
 --
 
 --  This package provides various routines for conversion between Ada and Unix
---  time models - Time, Duration and struct tm.
+--  time models - Time, Duration, struct tm and struct timespec.
 
 with Interfaces.C;
+with System.C_Time;
 
 package Ada.Calendar.Conversions is
 
@@ -67,6 +68,20 @@ package Ada.Calendar.Conversions is
--  the input values are out of the defined ranges or if tm_sec equals 60
--  and the instance in time is not a leap second occurrence.
 
+   function To_Duration
+ (tv_sec  : System.C_Time.Tv_Sec_Long;
+  tv_nsec : System.C_Time.Tv_Nsec_Long)
+ return System.C_Time.Non_Negative_Duration
+   renames System.C_Time.To_Duration;
+   --  Deprecated.  Please use C_Time directly.
+
+   procedure To_Struct_Timespec
+ (D   : System.C_Time.Non_Negative_Duration;
+  tv_sec  : out System.C_Time.Tv_Sec_Long;
+  tv_nsec : out System.C_Time.Tv_Nsec_Long)
+   renames System.C_Time.To_Struct_Timespec;
+   --  Deprecated.  Please use C_Time directly.
+
procedure To_Struct_Tm
  (T   : Time;
   tm_year : out Interfaces.C.int;
diff --git a/gcc/ada/libgnat/g-calend.ads b/gcc/ada/libgnat/g-calend.ads
index f317ab4..7791943 100644
--- a/gcc/ada/libgnat/g-calend.ads
+++ b/gcc/ada/libgnat/g-calend.ads
@@ -40,6 +40,7 @@
 --  Day_Of_Week, Day_In_Year and Week_In_Year.
 
 with Ada.Calendar.Formatting;
+with System.C_Time;
 
 package GNAT.Calendar is
 
@@ -144,6 +145,19 @@ package GNAT.Calendar is
--  Return the week number as defined in ISO 8601 along with the year in
--  which the week occurs.
 
+   subtype timeval is System.C_Time.timeval;
+   --  Deprecated.  Please use C_Time directly.
+
+   function To_Duration (T : not null access timeval)
+return System.C_Time.Non_Negative_Duration
+ with Inline
+   is (System.C_Time.To_Duration (T.all));
+   --  Deprecated.  Please use C_Time directly.
+
+   function To_Timeval (D : System.C_Time.Non_Negative_Duration) return timeval
+   renames System.C_Time.To_Timeval;
+   --  Deprecated.  Please use C_Time directly.
+
 private
 
function Julian_Day
diff --git a/gcc/ada/libgnat/

[PATCH v2 07/12] libgomp, AArch64: Test OpenMP user-defined reductions with SVE types.

2024-10-17 Thread Tejas Belagod
This patch tests user-defined reductions on various constructs with objects
of SVE type.

libgomp/ChangeLog:

* testsuite/libgomp.target/aarch64/udr-sve.c: New.
---
 .../libgomp.target/aarch64/udr-sve.c  | 108 ++
 1 file changed, 108 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.target/aarch64/udr-sve.c

diff --git a/libgomp/testsuite/libgomp.target/aarch64/udr-sve.c 
b/libgomp/testsuite/libgomp.target/aarch64/udr-sve.c
new file mode 100644
index 000..749f3c2123b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.target/aarch64/udr-sve.c
@@ -0,0 +1,108 @@
+/* { dg-do run } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+#pragma omp declare reduction (+:svint32_t: omp_out = svadd_s32_z 
(svptrue_b32(), omp_in, omp_out)) \
+   initializer (omp_priv = svindex_s32 (0, 0))
+
+int __attribute__((noipa))
+parallel_reduction ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int i = 0;
+  int64_t res;
+
+  #pragma omp parallel reduction (+:va, i)
+{
+  va = svld1_s32 (svptrue_b32 (), a);
+  i++;
+}
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != i * 8)
+__builtin_abort ();
+
+  return 0;
+}
+
+int  __attribute__((noipa))
+for_reduction ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int j;
+  int64_t res;
+
+  #pragma omp parallel for reduction (+:va)
+  for (j = 0; j < 8; j++)
+va = svld1_s32 (svptrue_b32 (), a);
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != 64)
+__builtin_abort ();
+
+  return 0;
+}
+
+int __attribute__((noipa))
+simd_reduction ()
+{
+  int a[8];
+  svint32_t va = svindex_s32 (0, 0);
+  int i = 0;
+  int j;
+  int64_t res = 0;
+
+  for (j = 0; j < 8; j++)
+a[j] = 1;
+
+  #pragma omp simd reduction (+:va, i)
+  for (j = 0; j < 16; j++)
+va = svld1_s32 (svptrue_b32 (), a);
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != 8)
+__builtin_abort ();
+
+  return 0;
+}
+
+int __attribute__((noipa))
+inscan_reduction_incl ()
+{
+  svint32_t va = svindex_s32 (0, 0);
+  int j;
+  int64_t res = 0;
+
+  #pragma omp parallel
+  #pragma omp for reduction (inscan,+:va) firstprivate (res) lastprivate (res)
+  for (j = 0; j < 8; j++)
+{
+  va = svindex_s32 (1, 0);
+  #pragma omp scan inclusive (va)
+  res += svaddv_s32 (svptrue_b32 (), va);
+}
+
+  if (res != 64)
+__builtin_abort ();
+
+  return 0;
+}
+
+int
+main ()
+{
+  parallel_reduction ();
+  for_reduction ();
+  simd_reduction ();
+  inscan_reduction_incl ();
+
+  return 0;
+}
-- 
2.25.1



[PATCH v2 01/12] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-10-17 Thread Tejas Belagod
Currently, poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
poly-int structures by address, to avoid the copy overhead.

gcc/ChangeLog
* omp-low.c (use_pointer_for_field): Use pointer if the OMP data
structure's field type is a poly-int.
---
 gcc/omp-low.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index da2051b0279..6b3853ed528 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -466,7 +466,8 @@ static bool
 use_pointer_for_field (tree decl, omp_context *shared_ctx)
 {
   if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
-  || TYPE_ATOMIC (TREE_TYPE (decl)))
+  || TYPE_ATOMIC (TREE_TYPE (decl))
+  || POLY_INT_CST_P (DECL_SIZE (decl)))
 return true;
 
   /* We can only use copy-in/copy-out semantics for shared variables
-- 
2.25.1



[PATCH v2 03/12] [tree] Add function to strip pointer type and get down to the actual pointee type.

2024-10-17 Thread Tejas Belagod
Add a function to traverse down the pointer layers to the pointee type.

gcc/ChangeLog:
* tree.h (strip_pointer_types): New.
---
 gcc/tree.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree.h b/gcc/tree.h
index 75efc760a16..e2b4dd36444 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4992,6 +4992,15 @@ strip_array_types (tree type)
   return type;
 }
 
+inline const_tree
+strip_pointer_types (const_tree type)
+{
+  while (POINTER_TYPE_P (type))
+type = TREE_TYPE (type);
+
+  return type;
+}
+
 /* Desription of the reason why the argument of valid_constant_size_p
is not a valid size.  */
 enum cst_size_error {
-- 
2.25.1



[PATCH v2 10/12] AArch64: Diagnose OpenMP linear clause for SVE type objects.

2024-10-17 Thread Tejas Belagod
This patch tests that the linear clause, when applied to SVE type objects,
is diagnosed as expected.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/linear.c: New test.
---
 .../gcc.target/aarch64/sve/omp/linear.c   | 85 +++
 1 file changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c
new file mode 100644
index 000..9391b981056
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+int a[256];
+
+__attribute__((noinline, noclone)) int
+f1 (svint32_t va, int i)
+{
+  #pragma omp parallel for linear (va: 8) linear (i: 4) /* { dg-error {linear 
clause applied to non-integral non-pointer variable with type 'svint32_t'} } */
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  va = svindex_s32 (0,1);
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f2 (svbool_t p, int i)
+{
+  #pragma omp parallel for linear (p: 1) linear (i: 4) /* { dg-error {linear 
clause applied to non-integral non-pointer variable with type 'svbool_t'} } */
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  p = svptrue_b32 ();
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f3 (svint32_t va, int i)
+{
+  #pragma omp simd linear (va: 8) linear (i: 4) /* { dg-error {linear clause 
applied to non-integral non-pointer variable with type 'svint32_t'} } */
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  va = svindex_s32 (0,1);
+}
+  return i;
+}
+
+#pragma omp declare simd linear (va: 8) /* { dg-error {linear clause applied 
to non-integral non-pointer variable with type 'svint32_t'} } */
+__attribute__((noinline, noclone)) int
+f4 (svint32_t va, int i)
+{
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  va = svindex_s32 (0,1);
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f5 (svbool_t p, int i)
+{
+  #pragma omp simd linear (p: 1) linear (i: 4) /* { dg-error {linear clause 
applied to non-integral non-pointer variable with type 'svbool_t'} } */
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  p = svptrue_b32 ();
+}
+  return i;
+}
+
+#pragma omp declare simd linear (p: 8) /* { dg-error {linear clause applied to 
non-integral non-pointer variable with type 'svbool_t'} } */
+__attribute__((noinline, noclone)) int
+f6 (svbool_t p, int i)
+{
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  p = svptrue_b32 ();
+}
+  return i;
+}
+
-- 
2.25.1



[PATCH v2 06/12] libgomp, AArch64: Test OpenMP threadprivate clause on SVE type.

2024-10-17 Thread Tejas Belagod
This patch adds a test ensuring that the threadprivate clause works for SVE
type objects.

libgomp/ChangeLog:

* testsuite/libgomp.target/aarch64/threadprivate.c: New test.
---
 .../libgomp.target/aarch64/threadprivate.c| 48 +++
 1 file changed, 48 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.target/aarch64/threadprivate.c

diff --git a/libgomp/testsuite/libgomp.target/aarch64/threadprivate.c 
b/libgomp/testsuite/libgomp.target/aarch64/threadprivate.c
new file mode 100644
index 000..3b10201fdd0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.target/aarch64/threadprivate.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+typedef __SVInt32_t v8si __attribute__((arm_sve_vector_bits(256)));
+
+v8si vec1;
+#pragma omp threadprivate (vec1)
+
+void  __attribute__((noipa))
+foo ()
+{
+  int64_t res = 0;
+
+  vec1 = svindex_s32 (1, 0);
+
+#pragma omp parallel copyin (vec1) firstprivate (res) num_threads(10)
+  {
+res = svaddv_s32 (svptrue_b32 (), vec1);
+
+#pragma omp barrier
+if (res != 8LL)
+  __builtin_abort ();
+  }
+
+  return;
+}
+
+int
+main()
+{
+  int64_t res = 0;
+
+#pragma omp parallel firstprivate (res) num_threads(10)
+  {
+vec1 = svindex_s32 (1, 0);
+res = svaddv_s32 (svptrue_b32 (), vec1);
+
+#pragma omp barrier
+if (res != 8LL)
+  __builtin_abort ();
+  }
+
+  foo ();
+
+  return 0;
+}
-- 
2.25.1



[PATCH v2 11/12] libgomp, AArch64: Test OpenMP depend clause and its variations on SVE types

2024-10-17 Thread Tejas Belagod
This patch adds a test for the depend clause and its various dependency
variations with SVE type objects.

libgomp/ChangeLog:

* testsuite/libgomp.target/aarch64/depend-1.c: New.
---
 .../libgomp.target/aarch64/depend-1.c | 223 ++
 1 file changed, 223 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.target/aarch64/depend-1.c

diff --git a/libgomp/testsuite/libgomp.target/aarch64/depend-1.c 
b/libgomp/testsuite/libgomp.target/aarch64/depend-1.c
new file mode 100644
index 000..8ac97685727
--- /dev/null
+++ b/libgomp/testsuite/libgomp.target/aarch64/depend-1.c
@@ -0,0 +1,223 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+int zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0};
+int ones[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
+int twos[8] = { 2, 2, 2, 2, 2, 2, 2, 2 };
+
+void
+dep (void)
+{
+  svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+
+  #pragma omp parallel
+  #pragma omp single
+  {
+#pragma omp task shared (x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort ();
+  }
+}
+
+void
+dep2 (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task shared (x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort ();
+#pragma omp taskwait
+  }
+}
+
+void
+dep3 (void)
+{
+  #pragma omp parallel
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp single
+{
+  #pragma omp task shared (x) depend(out: x)
+  x = svld1_s32 (svptrue_b32 (), twos);
+  #pragma omp task shared (x) depend(in: x)
+  if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+   __builtin_abort ();
+}
+  }
+}
+
+void
+firstpriv (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+  __builtin_abort ();
+  }
+}
+
+void
+antidep (void)
+{
+  svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+  #pragma omp parallel
+  #pragma omp single
+  {
+#pragma omp task shared(x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+  __builtin_abort ();
+#pragma omp task shared(x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+  }
+}
+
+void
+antidep2 (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp taskgroup
+{
+  #pragma omp task shared(x) depend(in: x)
+  if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+   __builtin_abort ();
+  #pragma omp task shared(x) depend(out: x)
+  x = svld1_s32 (svptrue_b32 (), twos);
+}
+  }
+}
+
+void
+antidep3 (void)
+{
+  #pragma omp parallel
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp single
+{
+  #pragma omp task shared(x) depend(in: x)
+  if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+   __builtin_abort ();
+  #pragma omp task shared(x) depend(out: x)
+  x = svld1_s32 (svptrue_b32 (), twos);
+}
+  }
+}
+
+
+void
+outdep (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), zeros);
+#pragma omp task shared(x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task shared(x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp taskwait
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort ();
+  }
+}
+
+void
+concurrent (void)
+{
+  svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+  #pragma omp parallel
+  #pragma omp single
+  {
+#pragma omp task shared (x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort ();
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort ();
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort ();
+  }
+}
+
+void
+concurrent2 (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task shared (x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), tw

[PATCH v2 00/12] AArch64/OpenMP: Test SVE ACLE types with various OpenMP constructs.

2024-10-17 Thread Tejas Belagod
The following patch series is reworked from its first version based on Jakub's
review comments in  
  https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659540.html

The changes in v2: 

1. Moved all execute tests to under libgomp/testsuite/libgomp.target/aarch64/.
2. Retained gcc/testsuite/gcc.target/aarch64/sve/omp/ for compile tests.
3. Handled offloading SVE types differently based on sizeless and fixed-size
   types.  Also added more tests to check for VLA and VLS types.
4. Made tests more representative of real-world scenarios.
5. Converted some compile tests to execute tests.
6. For user-defined reductions, I have removed task and taskloop tests for now.
   I need to understand the constructs better before adding meaningful tests.
7. One known failure: declare simd uniform clones a function to a variant
   to support a particular type in the clause.  This fails on SVE with a
   'decl without prototype' error.  It is unclear how this ought to be handled.
   I went ahead and posted the rest of the series as I didn't want this issue
   to block the rest of the patches.

The following patch series handles various scenarios with OpenMP and SVE types.
The starting point for the series follows a suggestion from Jakub to cover all 
the possible scenarios that could arise when OMP constructs/clauses etc are 
used with SVE ACLE types.  Here are a few instances that this patch series tests
and in some cases fixes the expected output.  This patch series does not follow
a formal definition or a spec of how OMP interacts with SVE ACLE types, so it is
more of a proposed behaviour.  Comments and discussion welcome.

This list is not exhaustive, but covers most scenarios of how SVE ACLE types
ought to interact with OMP constructs/clauses.

1. Poly-int structures that represent variable-sized objects and OMP runtime.

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

2. SVE ACLE types in OMP Shared clauses.

We test the behaviour where SVE ACLE type objects are shared into an OMP
region via the following methods:
  a. Explicit Shared clause on SVE ACLE type objects.
  b. Implicit shared clause.
  c. Implicit shared with default clause.
  d. SVE ACLE types in the presence of predetermined (static) shared objects.

The associated tests ensure that all such shared objects are passed by address
into the OMP runtime.  There are runtime tests to verify the functional
correctness of the change.

3. [tree] Add function to strip pointer type and get down to the actual
pointee type.

Adds a support function in tree.h to strip pointer types to drill down to the
pointee type.

4. Offloading and SVE ACLE types.

The target clause in OpenMP is used to offload loop kernels to accelerator
peripherals.  target's 'map' clause is used to move data from and to the
accelerator.  When the data is a sizeless SVE type, it may be unsuitable due to
various reasons, e.g. the two SVE targets may not agree on vector size or
some targets don't support variable vector size.  This makes sizeless SVE types
unsuitable for use in OMP's 'map' clause.  We diagnose all such cases and issue
errors where appropriate.  The cases we cover in this patch are:

  a. Implicitly-mapped SVE ACLE types in OMP target regions are diagnosed.
  b. Explicitly-mapped SVE ACLE types in OMP target regions using map clause
 are diagnosed.
  c. Explicitly-mapped SVE ACLE types of various directions - to, from, tofrom
 in the map clause are diagnosed.
  d. target enter and exit data clauses with map on SVE ACLE types are
 diagnosed.
  e. target data map with alloc on SVE ACLE types are diagnosed.
  f. target update from clause on SVE ACLE types are diagnosed.
  g. target private firstprivate with SVE ACLE types are diagnosed.
  h. All combinations of target with work-sharing constructs like parallel,
 loop, simd, teams, distribute etc are also diagnosed when SVE ACLE types
 are involved.

For fixed-size SVE vector types (e.g. fixed by the arm_sve_vector_bits
attribute), we don't diagnose.  Fixed-size vectors are allowed to be used in OMP
offloading constructs and clauses.  The only caveat is that the LTO streamers
that handle streaming in the offloaded bytecode are expected to check for
matching vector sizes and diagnose mismatches, as the attribute sizes are also
streamed out.

5. Lastprivate and SVE ACLE types.

Various OpenMP lastprivate clause scenarios with SVE object types are
diagnosed.  Worksharing constructs like sections, for, distribute bind to an
implicit outer parallel region in whose scope SVE ACLE types are declared and
are therefore default private.  The lastprivate clause list with SVE ACLE type
object items are diagnosed in this scenario.

Execute tests have been added for checking functional behaviour.

6. Threadprivate on SVE ACLE type objects.

We ensure threadprivate SVE ACLE type objects are support

[PATCH v2 12/12] AArch64: Diagnose SVE type objects when applied to OpenMP doacross clause.

2024-10-17 Thread Tejas Belagod
This patch tests that SVE type objects applied to the doacross clause are
correctly diagnosed.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/doacross.c: New test.
---
 .../gcc.target/aarch64/sve/omp/doacross.c | 22 +++
 1 file changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c
new file mode 100644
index 000..dc5020d53f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+int a[256];
+
+__attribute__((noinline, noclone)) int
+f1 (svint32_t va)
+{
+  int j;
+  #pragma omp for ordered (1)
+  for (j = 16; j < 64; j++)
+{
+  #pragma omp ordered doacross(sink: va) /* { dg-error {variable 'va' is not an iteration of outermost loop 1, expected 'j'} } */
+  a[j - 1] = j + svaddv_s32 (svptrue_b32 (), va);
+  #pragma omp ordered doacross(source: omp_cur_iteration)
+  j += 4;
+  va = svindex_s32 (0,1);
+}
+  return j;
+}
-- 
2.25.1



[PATCH v2 05/12] libgomp, AArch64: Test OpenMP lastprivate clause for various constructs.

2024-10-17 Thread Tejas Belagod
This patch tests the OpenMP lastprivate clause with SVE object types in
various construct contexts.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/lastprivate.c: New test.

libgomp/ChangeLog:

* testsuite/libgomp.target/aarch64/lastprivate.c: New test.
---
 .../gcc.target/aarch64/sve/omp/lastprivate.c  |  94 ++
 .../libgomp.target/aarch64/lastprivate.c  | 162 ++
 2 files changed, 256 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c
 create mode 100644 libgomp/testsuite/libgomp.target/aarch64/lastprivate.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c
new file mode 100644
index 000..8f89c68647b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+#define N 8
+
+#ifndef CONSTRUCT
+#define CONSTRUCT
+#endif
+
+svint32_t __attribute__ ((noinline))
+omp_lastprivate_sections ()
+{
+
+  int a[N], b[N], c[N];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < N; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+/* This worksharing construct binds to an implicit outer parallel region in
+whose scope va is declared and therefore is default private.  This causes
+the lastprivate clause list item va to be diagnosed as private in the outer
+context.  Similarly for constructs for and distribute.  */
+#pragma omp sections lastprivate (va) /* { dg-error {lastprivate variable 'va' is private in outer context} } */
+{
+  #pragma omp section
+  vb = svld1_s32 (svptrue_b32 (), b);
+  #pragma omp section
+  vc = svld1_s32 (svptrue_b32 (), c);
+  #pragma omp section
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+}
+
+  return va;
+}
+
+svint32_t __attribute__ ((noinline))
+omp_lastprivate_for ()
+{
+
+  int a[N], b[N], c[N];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < N; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+#pragma omp for lastprivate (va) /* { dg-error {lastprivate variable 'va' is private in outer context} } */
+  for (i = 0; i < 1; i++)
+{
+  vb = svld1_s32 (svptrue_b32 (), b);
+  vc = svld1_s32 (svptrue_b32 (), c);
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+}
+
+  return va;
+}
+
+svint32_t __attribute__ ((noinline))
+omp_lastprivate_distribute ()
+{
+
+  int a[N], b[N], c[N];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < N; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+#pragma omp distribute lastprivate (va) /* { dg-error {lastprivate variable 'va' is private in outer context} } */
+  for (i = 0; i < 1; i++)
+{
+  vb = svld1_s32 (svptrue_b32 (), b);
+  vc = svld1_s32 (svptrue_b32 (), c);
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+}
+
+  return va;
+}
diff --git a/libgomp/testsuite/libgomp.target/aarch64/lastprivate.c 
b/libgomp/testsuite/libgomp.target/aarch64/lastprivate.c
new file mode 100644
index 000..da3c4d64d70
--- /dev/null
+++ b/libgomp/testsuite/libgomp.target/aarch64/lastprivate.c
@@ -0,0 +1,162 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+#ifndef CONSTRUCT
+#define CONSTRUCT
+#endif
+
+void  __attribute__ ((noinline))
+omp_lastprivate_sections ()
+{
+
+  int a[8], b[8], c[8];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < 8; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+#pragma omp parallel
+#pragma omp sections lastprivate (vb, vc)
+{
+  #pragma omp section
+  vb = svld1_s32 (svptrue_b32 (), b);
+  #pragma omp section
+  vc = svld1_s32 (svptrue_b32 (), c);
+}
+
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+  svst1_s32 (svptrue_b32 (), a, va);
+
+  for (i = 0; i < 8; i++)
+if (a[i] != b[i] + c[i])
+  __builtin_abort ();
+}
+
+void __attribute__ ((noinline))
+omp_lastprivate_for ()
+{
+
+  int a[32], b[32], c[32];
+  int aa[8], bb[8], cc[8];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < 32; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+#pragma omp parallel
+#pragma omp for lastprivate (va, vb, vc)
+  for (i = 0; i < 4; i++)
+{
+  vb = svld1_s32 (svptrue_b32 (), b + i * 8);
+  vc = svld1_s32 (svptrue_b32 (), c + i * 8);
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+  svst1_s32 (svptrue_b32 (), a + i * 8, va);
+}
+
+  svst1_s32 (svptrue_b32 (), aa, va);
+  svst1_s32 (svptrue_b32 (), bb, vb);
+  svst1_s32 (svptrue_b32 (), cc, vc);
+  for (i = 0; i < 8; i++)
+if (aa[i] != bb[i] + cc[i])
+  __builtin_abort ();
+
+  for (i = 0; i < 32; i++)
+if (a[i] != b[i] + c[i])
+  __builti

Re: [PATCH] i386: Refactor get_intel_cpu

2024-10-17 Thread Uros Bizjak
On Fri, Oct 18, 2024 at 4:56 AM Haochen Jiang  wrote:
>
> Hi all,
>
> ISE054 has just been disclosed and you can find doc from here:
>
> https://cdrdv2.intel.com/v1/dl/getContent/671368
>
> From ISE, it shows that we will have family 0x13 for Diamond Rapids.
> Therefore, we need to refactor the get_intel_cpu to accept new families.
> Also I did some reorder in the switch for clearness by putting earlier
> added products on top for search convenience.

You can post "git diff -w" patch to see what the patch really does
without drowning the real change in whitespace changes.

> Bootstrapped and tested on x86_64-pc-linux-gnu. Ok for trunk?
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_intel_cpu): Refactor the
> function for future expansion on different family.

LGTM.

Thanks,
Uros.

> ---
>  gcc/common/config/i386/cpuinfo.h | 587 +++
>  1 file changed, 292 insertions(+), 295 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index 2ae383eb6ab..e3eb6e9d250 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -343,301 +343,298 @@ get_intel_cpu (struct __processor_model *cpu_model,
>  {
>const char *cpu = NULL;
>
> -  /* Parse family and model only for model 6. */
> -  if (cpu_model2->__cpu_family != 0x6)
> -return cpu;
> -
> -  switch (cpu_model2->__cpu_model)
> -{
> -case 0x1c:
> -case 0x26:
> -  /* Bonnell.  */
> -  cpu = "bonnell";
> -  CHECK___builtin_cpu_is ("atom");
> -  cpu_model->__cpu_type = INTEL_BONNELL;
> -  break;
> -case 0x37:
> -case 0x4a:
> -case 0x4d:
> -case 0x5d:
> -  /* Silvermont.  */
> -case 0x4c:
> -case 0x5a:
> -case 0x75:
> -  /* Airmont.  */
> -  cpu = "silvermont";
> -  CHECK___builtin_cpu_is ("silvermont");
> -  cpu_model->__cpu_type = INTEL_SILVERMONT;
> -  break;
> -case 0x5c:
> -case 0x5f:
> -  /* Goldmont.  */
> -  cpu = "goldmont";
> -  CHECK___builtin_cpu_is ("goldmont");
> -  cpu_model->__cpu_type = INTEL_GOLDMONT;
> -  break;
> -case 0x7a:
> -  /* Goldmont Plus.  */
> -  cpu = "goldmont-plus";
> -  CHECK___builtin_cpu_is ("goldmont-plus");
> -  cpu_model->__cpu_type = INTEL_GOLDMONT_PLUS;
> -  break;
> -case 0x86:
> -case 0x96:
> -case 0x9c:
> -  /* Tremont.  */
> -  cpu = "tremont";
> -  CHECK___builtin_cpu_is ("tremont");
> -  cpu_model->__cpu_type = INTEL_TREMONT;
> -  break;
> -case 0x1a:
> -case 0x1e:
> -case 0x1f:
> -case 0x2e:
> -  /* Nehalem.  */
> -  cpu = "nehalem";
> -  CHECK___builtin_cpu_is ("corei7");
> -  CHECK___builtin_cpu_is ("nehalem");
> -  cpu_model->__cpu_type = INTEL_COREI7;
> -  cpu_model->__cpu_subtype = INTEL_COREI7_NEHALEM;
> -  break;
> -case 0x25:
> -case 0x2c:
> -case 0x2f:
> -  /* Westmere.  */
> -  cpu = "westmere";
> -  CHECK___builtin_cpu_is ("corei7");
> -  CHECK___builtin_cpu_is ("westmere");
> -  cpu_model->__cpu_type = INTEL_COREI7;
> -  cpu_model->__cpu_subtype = INTEL_COREI7_WESTMERE;
> -  break;
> -case 0x2a:
> -case 0x2d:
> -  /* Sandy Bridge.  */
> -  cpu = "sandybridge";
> -  CHECK___builtin_cpu_is ("corei7");
> -  CHECK___builtin_cpu_is ("sandybridge");
> -  cpu_model->__cpu_type = INTEL_COREI7;
> -  cpu_model->__cpu_subtype = INTEL_COREI7_SANDYBRIDGE;
> -  break;
> -case 0x3a:
> -case 0x3e:
> -  /* Ivy Bridge.  */
> -  cpu = "ivybridge";
> -  CHECK___builtin_cpu_is ("corei7");
> -  CHECK___builtin_cpu_is ("ivybridge");
> -  cpu_model->__cpu_type = INTEL_COREI7;
> -  cpu_model->__cpu_subtype = INTEL_COREI7_IVYBRIDGE;
> -  break;
> -case 0x3c:
> -case 0x3f:
> -case 0x45:
> -case 0x46:
> -  /* Haswell.  */
> -  cpu = "haswell";
> -  CHECK___builtin_cpu_is ("corei7");
> -  CHECK___builtin_cpu_is ("haswell");
> -  cpu_model->__cpu_type = INTEL_COREI7;
> -  cpu_model->__cpu_subtype = INTEL_COREI7_HASWELL;
> -  break;
> -case 0x3d:
> -case 0x47:
> -case 0x4f:
> -case 0x56:
> -  /* Broadwell.  */
> -  cpu = "broadwell";
> -  CHECK___builtin_cpu_is ("corei7");
> -  CHECK___builtin_cpu_is ("broadwell");
> -  cpu_model->__cpu_type = INTEL_COREI7;
> -  cpu_model->__cpu_subtype = INTEL_COREI7_BROADWELL;
> -  break;
> -case 0x4e:
> -case 0x5e:
> -  /* Skylake.  */
> -case 0x8e:
> -case 0x9e:
> -  /* Kaby Lake.  */
> -case 0xa5:
> -case 0xa6:
> -  /* Comet Lake.  */
> -  cpu = "skylake";
> -  CHECK___builtin_cpu_is ("corei7");
> -  CHECK___builtin_cpu_is ("skylake");
> -  cpu_model->__cpu_type = INTEL_COREI7;
> -  cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
> -  break;
> -case 0xa7:
> -  /* Rocket

RE: [PATCH] i386: Refactor get_intel_cpu

2024-10-17 Thread Jiang, Haochen
> From: Uros Bizjak 
> Sent: Friday, October 18, 2024 2:05 PM
> 
> On Fri, Oct 18, 2024 at 4:56 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > ISE054 has just been disclosed and you can find doc from here:
> >
> > https://cdrdv2.intel.com/v1/dl/getContent/671368
> >
> > From ISE, it shows that we will have family 0x13 for Diamond Rapids.
> > Therefore, we need to refactor the get_intel_cpu to accept new families.
> > Also I did some reorder in the switch for clearness by putting earlier
> > added products on top for search convenience.
> 
> You can post "git diff -w" patch to see what the patch really does
> without drowning the real change in whitespace changes.
> 

That is a good idea. The change after using git diff -w:

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 2ae383eb6ab..e3eb6e9d250 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -343,10 +343,8 @@ get_intel_cpu (struct __processor_model *cpu_model,
 {
   const char *cpu = NULL;

-  /* Parse family and model only for model 6. */
-  if (cpu_model2->__cpu_family != 0x6)
-return cpu;
-
+  /* Parse family and model for family 0x6.  */
+  if (cpu_model2->__cpu_family == 0x6)
 switch (cpu_model2->__cpu_model)
   {
   case 0x1c:
@@ -390,6 +388,15 @@ get_intel_cpu (struct __processor_model *cpu_model,
CHECK___builtin_cpu_is ("tremont");
cpu_model->__cpu_type = INTEL_TREMONT;
break;
+  case 0x17:
+  case 0x1d:
+   /* Penryn.  */
+  case 0x0f:
+   /* Merom.  */
+   cpu = "core2";
+   CHECK___builtin_cpu_is ("core2");
+   cpu_model->__cpu_type = INTEL_CORE2;
+   break;
   case 0x1a:
   case 0x1e:
   case 0x1f:
@@ -466,14 +473,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
break;
-case 0xa7:
-  /* Rocket Lake.  */
-  cpu = "rocketlake";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("rocketlake");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_ROCKETLAKE;
-  break;
   case 0x55:
CHECK___builtin_cpu_is ("corei7");
cpu_model->__cpu_type = INTEL_COREI7;
@@ -509,6 +508,16 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_CANNONLAKE;
break;
+  case 0x7e:
+  case 0x7d:
+  case 0x9d:
+   /* Ice Lake client.  */
+   cpu = "icelake-client";
+   CHECK___builtin_cpu_is ("corei7");
+   CHECK___builtin_cpu_is ("icelake-client");
+   cpu_model->__cpu_type = INTEL_COREI7;
+   cpu_model->__cpu_subtype = INTEL_COREI7_ICELAKE_CLIENT;
+   break;
   case 0x6a:
   case 0x6c:
/* Ice Lake server.  */
@@ -518,15 +527,13 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_ICELAKE_SERVER;
break;
-case 0x7e:
-case 0x7d:
-case 0x9d:
-   /* Ice Lake client.  */
-  cpu = "icelake-client";
+  case 0xa7:
+   /* Rocket Lake.  */
+   cpu = "rocketlake";
CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("icelake-client");
+   CHECK___builtin_cpu_is ("rocketlake");
cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_ICELAKE_CLIENT;
+   cpu_model->__cpu_subtype = INTEL_COREI7_ROCKETLAKE;
break;
   case 0x8c:
   case 0x8d:
@@ -537,7 +544,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_TIGERLAKE;
break;
-
   case 0xbe:
/* Alder Lake N, E-core only.  */
   case 0x97:
@@ -626,15 +632,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
cpu_model->__cpu_type = INTEL_COREI7;
cpu_model->__cpu_subtype = INTEL_COREI7_PANTHERLAKE;
break;
-case 0x17:
-case 0x1d:
-  /* Penryn.  */
-case 0x0f:
-  /* Merom.  */
-  cpu = "core2";
-  CHECK___builtin_cpu_is ("core2");
-  cpu_model->__cpu_type = INTEL_CORE2;
-  break;
   default:
break;
   }

Thx,
Haochen


[PATCH v2 02/12] libgomp, AArch64: Add test cases for SVE types in OpenMP shared clause.

2024-10-17 Thread Tejas Belagod
This patch adds a test scaffold for OpenMP compile tests under the gcc.target
testsuite.  It also adds a target tests directory libgomp.target along with an
SVE execution test.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/gomp.exp: New scaffold.

libgomp/ChangeLog:

* testsuite/libgomp.target/aarch64/aarch64.exp: New scaffold.
* testsuite/libgomp.target/aarch64/shared.c: New test.
---
 .../gcc.target/aarch64/sve/omp/gomp.exp   |  46 +
 .../libgomp.target/aarch64/aarch64.exp|  57 ++
 .../testsuite/libgomp.target/aarch64/shared.c | 186 ++
 3 files changed, 289 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/gomp.exp
 create mode 100644 libgomp/testsuite/libgomp.target/aarch64/aarch64.exp
 create mode 100644 libgomp/testsuite/libgomp.target/aarch64/shared.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/gomp.exp 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/gomp.exp
new file mode 100644
index 000..279df81cf89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/gomp.exp
@@ -0,0 +1,46 @@
+# Copyright (C) 2006-2024 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+if ![check_effective_target_fopenmp] {
+  return
+}
+
+if { [check_effective_target_aarch64_sve] } {
+set sve_flags ""
+} else {
+set sve_flags "-march=armv8.2-a+sve"
+}
+
+# Main loop.
+dg-runtest [lsort [find $srcdir/$subdir *.c]] "$sve_flags -fopenmp" ""
+
+# All done.
+dg-finish
diff --git a/libgomp/testsuite/libgomp.target/aarch64/aarch64.exp 
b/libgomp/testsuite/libgomp.target/aarch64/aarch64.exp
new file mode 100644
index 000..828cc06c65b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.target/aarch64/aarch64.exp
@@ -0,0 +1,57 @@
+# Copyright (C) 2006-2024 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# Load support procs.
+load_lib libgomp-dg.exp
+load_gcc_lib gcc-dg.exp
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+  return
+}
+
+lappend ALWAYS_CFLAGS "compiler=$GCC_UNDER_TEST"
+
+if { [check_effective_target_aarch64_sve] } {
+set sve_flags ""
+} else {
+set sve_flags "-march=armv8.2-a+sve"
+}
+
+# Initialize `dg'.
+dg-init
+
+#if ![check_effective_target_fopenmp] {
+#  return
+#}
+
+# Turn on OpenMP.
+lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
+
+# Gather a list of all tests.
+set tests [lsort [find $srcdir/$subdir *.c]]
+
+set ld_library_path $always_ld_library_path
+append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
+set_ld_library_path_env_vars
+
+# Main loop.
+dg-runtest $tests "" $sve_flags
+
+# All done.
+dg-finish
diff --git a/libgomp/testsuite/libgomp.target/aarch64/shared.c 
b/libgomp/testsuite/libgomp.target/aarch64/shared.c
new file mode 100644
index 000..3f380d95da4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.target/aarch64/shared.c
@@ -0,0 +1,186 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 -fdump-tree-ompexp" } */
+
+#include 
+#include 
+#include 
+#include 
+
+svint32_t
+__attribute__ ((noinline))
+explicit_shared (svint32_t a, svint32_t b, svbool_t p)
+{
+
+#pragma omp parallel shared (a, b, p) num_threads (1)
+  {
+/* 'a', 'b' and 'p' are explicitly shared.  */
+a = svadd_s32_z (p, a, b);
+  }
+
+#pragma omp parallel shared (a, b, p) num_threads (1)
+  {
+a = svadd_s32_z (p, a, b);
+  }
+
+  return a;
+}
+
+svin

[PATCH v2 04/12] AArch64: Diagnose OpenMP offloading when SVE types involved.

2024-10-17 Thread Tejas Belagod
The target clause in OpenMP is used to offload loop kernels to accelerator
peripherals.  target's 'map' clause is used to move data from and to the
accelerator.  When the data is an SVE type, it may not be suitable because of
various reasons, e.g. the two SVE targets may not agree on vector size or
some targets don't support variable vector size.  This makes SVE unsuitable
for use in OMP's 'map' clause.  This patch diagnoses all such cases and issues
an error where SVE types are not suitable.

Co-authored-by: Andrea Corallo 

gcc/ChangeLog:

* target.h (type_context_kind): Add new context kinds for target 
clauses.
* config/aarch64/aarch64-sve-builtins.cc (verify_type_context): Diagnose
SVE types for a given OpenMP context.
(omp_type_context): New.
* gimplify.cc (omp_notice_variable):  Diagnose implicitly-mapped SVE
objects in OpenMP regions.
(gimplify_scan_omp_clauses): Diagnose SVE types for various target
clauses.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/offload.c: New test.
* gcc.target/aarch64/sve/omp/offload-parallel-loop.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-parallel.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-simd.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams-distribute-simd.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams-distribute.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams-loop.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams.c: Likewise.
* gcc.target/aarch64/sve/omp/target-device.c: Likewise.
* gcc.target/aarch64/sve/omp/target-link.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc|  52 +-
 gcc/gimplify.cc   |  34 +-
 gcc/target.h  |  19 +-
 .../aarch64/sve/omp/offload-parallel-loop.c   | 442 +
 .../aarch64/sve/omp/offload-parallel.c| 376 +++
 .../gcc.target/aarch64/sve/omp/offload-simd.c | 442 +
 .../sve/omp/offload-teams-distribute-simd.c   | 442 +
 .../sve/omp/offload-teams-distribute.c| 442 +
 .../aarch64/sve/omp/offload-teams-loop.c  | 442 +
 .../aarch64/sve/omp/offload-teams.c   | 365 ++
 .../gcc.target/aarch64/sve/omp/offload.c  | 452 ++
 .../aarch64/sve/omp/target-device.c   | 186 +++
 .../gcc.target/aarch64/sve/omp/target-link.c  |  54 +++
 13 files changed, 3745 insertions(+), 3 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-parallel-loop.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-parallel.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-simd.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-distribute-simd.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-distribute.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-loop.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/target-device.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/target-link.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index e7c703c987e..2c169ea3806 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -4956,12 +4956,35 @@ handle_arm_sve_vector_bits_attribute (tree *node, tree, 
tree args, int,
   return NULL_TREE;
 }
 
+
+/* Return true if CONTEXT is an OpenMP context kind.  */
+
+static bool
+omp_type_context (type_context_kind context)
+{
+  switch (context)
+{
+case TCTX_OMP_MAP:
+case TCTX_OMP_MAP_IMP_REF:
+case TCTX_OMP_PRIVATE:
+case TCTX_OMP_FIRSTPRIVATE:
+case TCTX_OMP_DEVICE_ADDR:
+  return true;
+default:
+  return false;
+}
+}
+
 /* Implement TARGET_VERIFY_TYPE_CONTEXT for SVE types.  */
 bool
 verify_type_context (location_t loc, type_context_kind context,
 const_tree type, bool silent_p)
 {
-  if (!sizeless_type_p (type))
+  const_tree tmp = type;
+  if (omp_type_context (context) && POINTER_TYPE_P (type))
+tmp = strip_pointer_types (tmp);
+
+  if (!sizeless_type_p (tmp))
 return true;
 
   switch (context)
@@ -5021,6 +5044,33 @@ verify_type_context (location_t loc, type_context_kind 
context,
   if (!silent_p)
error_at (loc, "capture by copy of SVE type %qT", type);
   return false;
+
+case TCTX_OMP_MAP:
+  if (!silent_p)
+   error_at (loc, "SVE type %qT not allowed in map clause", type);
+  return false;
+
+case TCTX_OMP_MAP_IMP_REF:
+  /* The diagnosis is done in the caller.  */
+  return false;
+
+case TC

[PATCH v2 08/12] libgomp, AArch64: Test OpenMP uniform clause on SVE types.

2024-10-17 Thread Tejas Belagod
This patch tests whether the simd uniform clause works with SVE types in simd
regions.

libgomp/ChangeLog:

* testsuite/libgomp.target/aarch64/simd-uniform.c: New test.
---
 .../libgomp.target/aarch64/simd-uniform.c | 83 +++
 1 file changed, 83 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.target/aarch64/simd-uniform.c

diff --git a/libgomp/testsuite/libgomp.target/aarch64/simd-uniform.c 
b/libgomp/testsuite/libgomp.target/aarch64/simd-uniform.c
new file mode 100644
index 000..48a8a91b004
--- /dev/null
+++ b/libgomp/testsuite/libgomp.target/aarch64/simd-uniform.c
@@ -0,0 +1,83 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
+
+#include 
+
+#define N 256
+
+void
+init (int *a, int *a_ref, int *b, int n)
+{
+   int i;
+   for (i = 0; i < N; i++)
+   {
+  a[i] = i;
+  a_ref[i] = i;
+  b[i] = N-i;
+   }
+}
+
+void vec_add (svint32_t ones, int *a, int *b, int i, int64_t sz);
+
+#pragma omp declare simd uniform(ones, a, b, sz) linear (i)
+void
+vec_add (svint32_t ones, int *a, int *b, int i, int64_t sz)
+{
+   svint32_t tmp;
+   svint32_t va, vb;
+
+   va = svld1_s32 (svptrue_b32 (), a + i * sz);
+   vb = svld1_s32 (svptrue_b32 (), b + i * sz);
+   tmp = svadd_s32_z (svptrue_b32 (), va, vb);
+   tmp = svadd_s32_z (svptrue_b32 (), tmp, ones);
+   svst1_s32 (svptrue_b32 (), a + i * sz, tmp);
+}
+
+void
+work (int *a, int *b, int n)
+{
+  int i;
+  int64_t sz = svcntw ();
+
+  #pragma omp simd
+  for (i = 0; i < n/sz; i++)
+{
+  svint32_t va, vb;
+  svint32_t ones = svdup_n_s32 (1);
+  vec_add (ones, a, b, i, sz);
+}
+}
+
+void
+work_ref (int *a, int *b, int n)
+{
+   int i;
+   for ( i = 0; i < n; i++ ) {
+ a[i] = a[i] + b[i] + 1;
+   }
+}
+
+void
+check (int *a, int *b)
+{
+  int i;
+  for (i = 0; i < N; i++)
+if (a[i] != b[i])
+  __builtin_abort ();
+}
+
+int
+main ()
+{
+   int i;
+   int a[N], a_ref[N], b[N];
+
+   init(a, a_ref, b, N);
+
+   work(a, b, N );
+   work_ref(a_ref, b, N );
+
+   check(a, a_ref);
+
+   return 0;
+}
-- 
2.25.1



Re: [PATCH v2 00/12] AArch64/OpenMP: Test SVE ACLE types with various OpenMP constructs.

2024-10-17 Thread Tejas Belagod

Hi Jakub,

Just wanted to add that I'm sorry for the delay in respinning the 
patchset - I was caught up with another piece of work. Thanks for the 
reviews so far and thank you for your patience.


Thanks,
Tejas.

On 10/18/24 11:52 AM, Tejas Belagod wrote:

The following patch series is reworked from its first version based on Jakub's
review comments in
   https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659540.html

The changes in v2:

1. Moved all execute tests to under libgomp/testsuite/libgomp.target/aarch64/.
2. Retained gcc/testsuite/gcc.target/aarch64/sve/omp/ for compile tests.
3. Handled offloading SVE types differently based on sizeless and fixed-size
types.  Also added more tests to check for VLA and VLS types.
4. Made tests more representative of real-world scenarios.
5. Converted some compile tests to execute tests.
6. For user-defined reductions, I have removed task and taskloop tests for now.
I need to understand the constructs better before adding meaningful tests.
7. One known failure: declare simd uniform clones a function into a variant
to support a particular type in the clause.  This fails on SVE with a
'decl without prototype' error.  It is unclear how this ought to be handled.
I went ahead and posted the rest of the series as I didn't want this issue
to block the rest of the patches.

The following patch series handles various scenarios with OpenMP and SVE types.
The starting point for the series follows a suggestion from Jakub to cover all
the possible scenarios that could arise when OMP constructs/clauses etc. are
used with SVE ACLE types.  Here are a few instances that this patch series tests
and in some cases fixes the expected output.  This patch series does not follow
a formal definition or a spec of how OMP interacts with SVE ACLE types, so it's
more of a proposed behaviour.  Comments and discussion welcome.

This list is not exhaustive, but covers most scenarios of how SVE ACLE types
ought to interact with OMP constructs/clauses.

1. Poly-int structures that represent variable-sized objects and OMP runtime.

Currently, poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on that by passing
poly-int structures by address, to avoid the copy overhead.

2. SVE ACLE types in OMP Shared clauses.

We test the behaviour where SVE ACLE type objects are shared into an OMP
region via the following methods:
   a. Explicit Shared clause on SVE ACLE type objects.
   b. Implicit shared clause.
   c. Implicit shared with default clause.
   d. SVE ACLE types in the presence of predetermined (static) shared objects.

The associated tests ensure that all such shared objects are passed by address
into the OMP runtime.  There are runtime tests to verify the functional
correctness of the change.

3. [tree] Add function to strip pointer types and get down to the actual
pointee type.

Adds a support function in tree.h to strip pointer types and drill down to
the pointee type.

4. Offloading and SVE ACLE types.

The target clause in OpenMP is used to offload loop kernels to accelerator
peripherals.  target's 'map' clause is used to move data to and from the
accelerator.  When the data is a sizeless SVE type, it may be unsuitable for
various reasons, i.e. the two SVE targets may not agree on vector size, or
some targets don't support variable vector sizes.  This makes sizeless SVE
types unsuitable for use in OMP's 'map' clause.  We diagnose all such cases
and issue errors where appropriate.  The cases we cover in this patch are:

   a. Implicitly-mapped SVE ACLE types in OMP target regions are diagnosed.
   b. Explicitly-mapped SVE ACLE types in OMP target regions using map clause
  are diagnosed.
   c. Explicitly-mapped SVE ACLE types of various directions - to, from, tofrom
  in the map clause are diagnosed.
   d. target enter and exit data clauses with map on SVE ACLE types are
  diagnosed.
   e. target data map with alloc on SVE ACLE types are diagnosed.
   f. target update from clause on SVE ACLE types are diagnosed.
   g. target private firstprivate with SVE ACLE types are diagnosed.
   h. All combinations of target with work-sharing constructs like parallel,
  loop, simd, teams, distribute etc are also diagnosed when SVE ACLE types
  are involved.

For fixed-size SVE vector types (e.g. fixed by the arm_sve_vector_bits
attribute), we don't diagnose.  Fixed-size vectors are allowed to be used in
OMP offloading constructs and clauses.  The only caveat is that the LTO
streamers that handle streaming in the offloaded bytecode are expected to
check for matching vector sizes and diagnose mismatches, as the attribute
sizes are also streamed out.

5. Lastprivate and SVE ACLE types.

Various OpenMP lastprivate clause scenarios with SVE object types are
diagnosed.  Worksharing constructs like sections, for, distribute bind to an
implicit outer parallel region in whose scope SVE ACLE types are declared and
are therefore d

[PATCH] i386: Refactor get_intel_cpu

2024-10-17 Thread Haochen Jiang
Hi all,

ISE054 has just been disclosed and you can find doc from here:

https://cdrdv2.intel.com/v1/dl/getContent/671368

From the ISE, it shows that we will have family 0x13 for Diamond Rapids.
Therefore, we need to refactor get_intel_cpu to accept new families.
I also reordered the switch for clarity, putting earlier-added products
on top for search convenience.

Bootstrapped and tested on x86_64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Refactor the
function for future expansion on different family.
---
 gcc/common/config/i386/cpuinfo.h | 587 +++
 1 file changed, 292 insertions(+), 295 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 2ae383eb6ab..e3eb6e9d250 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -343,301 +343,298 @@ get_intel_cpu (struct __processor_model *cpu_model,
 {
   const char *cpu = NULL;
 
-  /* Parse family and model only for model 6. */
-  if (cpu_model2->__cpu_family != 0x6)
-return cpu;
-
-  switch (cpu_model2->__cpu_model)
-{
-case 0x1c:
-case 0x26:
-  /* Bonnell.  */
-  cpu = "bonnell";
-  CHECK___builtin_cpu_is ("atom");
-  cpu_model->__cpu_type = INTEL_BONNELL;
-  break;
-case 0x37:
-case 0x4a:
-case 0x4d:
-case 0x5d:
-  /* Silvermont.  */
-case 0x4c:
-case 0x5a:
-case 0x75:
-  /* Airmont.  */
-  cpu = "silvermont";
-  CHECK___builtin_cpu_is ("silvermont");
-  cpu_model->__cpu_type = INTEL_SILVERMONT;
-  break;
-case 0x5c:
-case 0x5f:
-  /* Goldmont.  */
-  cpu = "goldmont";
-  CHECK___builtin_cpu_is ("goldmont");
-  cpu_model->__cpu_type = INTEL_GOLDMONT;
-  break;
-case 0x7a:
-  /* Goldmont Plus.  */
-  cpu = "goldmont-plus";
-  CHECK___builtin_cpu_is ("goldmont-plus");
-  cpu_model->__cpu_type = INTEL_GOLDMONT_PLUS;
-  break;
-case 0x86:
-case 0x96:
-case 0x9c:
-  /* Tremont.  */
-  cpu = "tremont";
-  CHECK___builtin_cpu_is ("tremont");
-  cpu_model->__cpu_type = INTEL_TREMONT;
-  break;
-case 0x1a:
-case 0x1e:
-case 0x1f:
-case 0x2e:
-  /* Nehalem.  */
-  cpu = "nehalem";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("nehalem");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_NEHALEM;
-  break;
-case 0x25:
-case 0x2c:
-case 0x2f:
-  /* Westmere.  */
-  cpu = "westmere";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("westmere");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_WESTMERE;
-  break;
-case 0x2a:
-case 0x2d:
-  /* Sandy Bridge.  */
-  cpu = "sandybridge";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("sandybridge");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_SANDYBRIDGE;
-  break;
-case 0x3a:
-case 0x3e:
-  /* Ivy Bridge.  */
-  cpu = "ivybridge";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("ivybridge");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_IVYBRIDGE;
-  break;
-case 0x3c:
-case 0x3f:
-case 0x45:
-case 0x46:
-  /* Haswell.  */
-  cpu = "haswell";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("haswell");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_HASWELL;
-  break;
-case 0x3d:
-case 0x47:
-case 0x4f:
-case 0x56:
-  /* Broadwell.  */
-  cpu = "broadwell";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("broadwell");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_BROADWELL;
-  break;
-case 0x4e:
-case 0x5e:
-  /* Skylake.  */
-case 0x8e:
-case 0x9e:
-  /* Kaby Lake.  */
-case 0xa5:
-case 0xa6:
-  /* Comet Lake.  */
-  cpu = "skylake";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("skylake");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
-  break;
-case 0xa7:
-  /* Rocket Lake.  */
-  cpu = "rocketlake";
-  CHECK___builtin_cpu_is ("corei7");
-  CHECK___builtin_cpu_is ("rocketlake");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  cpu_model->__cpu_subtype = INTEL_COREI7_ROCKETLAKE;
-  break;
-case 0x55:
-  CHECK___builtin_cpu_is ("corei7");
-  cpu_model->__cpu_type = INTEL_COREI7;
-  if (has_cpu_feature (cpu_model, cpu_features2,
-  FEATURE_AVX512BF16))
-   {
- /* Cooper Lake.  */
- cpu = "co

Re: Frontend access to target features (was Re: [PATCH] libgccjit: Add ability to get CPU features)

2024-10-17 Thread Antoni Boucher

Hi.
Thanks for the review, David!

I talked to Arthur and he's OK with having a file to include in both 
gccrs and libgccjit.


I sent the patch to gccrs to move the code in a new file that we can 
include in both frontends: https://github.com/Rust-GCC/gccrs/pull/3195


I also renamed gcc_jit_target_info_supports_128bit_int to
gcc_jit_target_info_supports_target_dependent_type because a subsequent
patch will allow checking whether other types, like _Float16 and _Float128,
are supported.


Here's the patch for libgccjit updated to include this file.

Thanks.

Le 2024-06-26 à 17 h 55, David Malcolm a écrit :

On Sun, 2024-03-10 at 12:05 +0100, Iain Buclaw wrote:

Excerpts from David Malcolm's message of März 5, 2024 4:09 pm:

On Thu, 2023-11-09 at 19:33 -0500, Antoni Boucher wrote:

Hi.
See answers below.

On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote:

On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:

Hi.
This patch adds support for getting the CPU features in
libgccjit
(bug
112466)

There's a TODO in the test:
I'm not sure how to test that gcc_jit_target_info_arch
returns
the
correct value since it is dependant on the CPU.
Any idea on how to improve this?

Also, I created a CStringHash to be able to have a
std::unordered_set. Is there any built-in way
of
doing
this?


Thanks for the patch.

Some high-level questions:

Is this specifically about detecting capabilities of the host
that
libgccjit is currently running on? or how the target was
configured
when libgccjit was built?


I'm less sure about this part. I'll need to do more tests.



One of the benefits of libgccjit is that, in theory, we support
all
of
the targets that GCC already supports.  Does this patch change
that,
or
is this more about giving client code the ability to determine
capabilities of the specific host being compiled for?


This should not change that. If it does, this is a bug.



I'm nervous about having per-target jit code.  Presumably
there's a
reason that we can't reuse existing target logic here - can you
please
describe what the problem is.  I see that the ChangeLog has:


 * config/i386/i386-jit.cc: New file.


where i386-jit.cc has almost 200 lines of nontrivial code.
Where
did
this come from?  Did you base it on existing code in our source
tree,
making modifications to fit the new internal API, or did you
write
it
from scratch?  In either case, how onerous would this be for
other
targets?


This was mostly copied from the same code done for the Rust and D
frontends.
See this commit and the following:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b1c06fd9723453dd2b2ec306684cb806dc2b4fbb
The equivalent to i386-jit.cc is there:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=22e3557e2d52f129f2bbfdc98688b945dba28dc9


[CCing Iain and Arthur re those patches; for reference, the patch
being
discussed is attached to :
https://gcc.gnu.org/pipermail/jit/2024q1/001792.html ]

One of my concerns about this patch is that we seem to be gaining
code
that's per-(frontend x config) which seems to be copied and pasted
with
a search and replace, which could lead to an M*N explosion.



That's certainly the case with the configure/make rules, which I think were
themselves originally copied from the {cpu_type}-protos.h machinery.

It might be worth pointing out that the c-family of front-ends don't
have separate headers because their per-target macros are defined in
{cpu_type}.h directly - for better or worse.


Is there any real difference between the per-config code for the
different frontends, or should there be a general "enumerate all
features of the target" hook that's independent of the frontend?
(but
perhaps calls into it).



As far as I understand, the configure parts should all be identical
between tm_p, tm_d, tm_rust, ..., so would benefit from being
templated
to aid any other front-ends adding in their own per target hooks.


Am I right in thinking that (rustc with default LLVM backend) has
some
set of feature strings that both (rustc with rustc_codegen_gcc) and
gccrs are trying to emulate?  If so, is it presumably a goal that
libgccjit gives identical results to gccrs?  If so, would it be
crazy
for libgccjit to consume e.g. config/i386/i386-rust.cc ?


I don't know whether libgccjit can just pull in directly the
implementation of the rust target hooks here.


Sorry for the delay in responding.

I don't want to be in the business of maintaining a copy of the per-
target code for "jit", and I think it makes sense for libgccjit to
return identical information compared to gccrs.

So I think it would be ideal for jit to share code with rust for this,
rather than do a one-time copy-and-paste followed by a ongoing "keep
things updated" treadmill.

Presumably there would be Makefile.in issues given that e.g. Makefile
has i386-rust.o listed in:

# Target specific, Rust specific object file
RUST_TARGET_OBJS= i386-rust.o linux-rust.o

One approach might be to move e.g. the i386-rust.cc code into, say,  a
i386-rust-and-jit.inc file an

Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support

2024-10-17 Thread Christophe Lyon
On Thu, 17 Oct 2024 at 15:06, Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > Christophe Lyon  writes:
> >> When --enable-werror is enabled when running the top-level configure,
> >> it passes --enable-werror-always to subdirs.  Some of them, like
> >> libgcc, ignore it.
> >>
> >> This patch adds support for it, enabled only for aarch64, to avoid
> >> breaking bootstrap for other targets.
> >>
> >> The patch also adds -Wno-prio-ctor-dtor to avoid a warning when compiling 
> >> lse_init.c
> >>
> >>  libgcc/
> >>  * Makefile.in (WERROR): New.
> >>  * config/aarch64/t-aarch64: Handle WERROR. Always use
> >>  -Wno-prio-ctor-dtor.
> >>  * configure.ac: Add support for --enable-werror-always.
> >>  * configure: Regenerate.
> >> ---
> >>  libgcc/Makefile.in  |  1 +
> >>  libgcc/config/aarch64/t-aarch64 |  1 +
> >>  libgcc/configure| 31 +++
> >>  libgcc/configure.ac |  5 +
> >>  4 files changed, 38 insertions(+)
> >>
> >> [...]
> >> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> >> index 4e8c036990f..6b3ea2aea5c 100644
> >> --- a/libgcc/configure.ac
> >> +++ b/libgcc/configure.ac
> >> @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4)
> >>  sinclude(../config/gthr.m4)
> >>  sinclude(../config/sjlj.m4)
> >>  sinclude(../config/cet.m4)
> >> +sinclude(../config/warnings.m4)
> >>
> >>  AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
> >>  AC_CONFIG_SRCDIR([static-object.mk])
> >> @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT)
> >>  # Determine what GCC version number to use in filesystem paths.
> >>  GCC_BASE_VER
> >>
> >> +# Only enable with --enable-werror-always until existing warnings are
> >> +# corrected.
> >> +ACX_PROG_CC_WARNINGS_ARE_ERRORS([manual])
> >
> > It looks like this is borrowed from libcpp and/or libdecnumber.
> > Those are a bit different from libgcc in that they're host libraries
> > that can be built with any supported compiler (including non-GCC ones).
> > In constrast, libgcc can only be built with the corresponding version
> > of GCC.  The usual restrictions on -Werror -- only use it during stages
> > 2 and 3, or if the user explicitly passes --enable-werror -- don't apply
> > in libgcc's case.  We should always be building with the "right" version
> > of GCC (even for Canadian crosses) and so should always be able to use
> > -Werror.
> >
> > So personally, I think we should just go with:
> >
> > diff --git a/libgcc/config/aarch64/t-aarch64 
> > b/libgcc/config/aarch64/t-aarch64
> > index b70e7b94edd..ae1588ce307 100644
> > --- a/libgcc/config/aarch64/t-aarch64
> > +++ b/libgcc/config/aarch64/t-aarch64
> > @@ -30,3 +30,4 @@ LIB2ADDEH += \
> >   $(srcdir)/config/aarch64/__arm_za_disable.S
> >
> >  SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
> > +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor
> >
> > ...this, but with $(WERROR) replaced by -Werror.
> >
> > At least, it would be a good way of finding out if there's a case
> > I've forgotten :)
> >
> > Let's see what others think though.
>
> As per the later discussion, the t-aarch64 change described above is OK
> for trunk, but anyone with commit access should feel free to revert it
> if it breaks their build.  (Although please post a description of what
> went wrong as well :))
>
> Thanks for doing this.
>

Thanks, find attached what I'm pushing.

Christophe

> Richard
From 71c7b446b98aa51294d79c45e37f1564668a1f3a Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 3 Oct 2024 16:02:55 +
Subject: [PATCH v2 3/3] aarch64: libgcc: Use -Werror

This patch adds -Werror to LIBGCC2_CFLAGS so that aarch64 can catch
warnings during bootstrap, while not impacting other targets.

The patch also adds -Wno-prio-ctor-dtor to avoid a warning when
compiling lse_init.c

	libgcc/
	* config/aarch64/t-aarch64: Always use -Werror
	-Wno-prio-ctor-dtor.
---
 libgcc/config/aarch64/t-aarch64 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
index b70e7b94edd..d500bd0de67 100644
--- a/libgcc/config/aarch64/t-aarch64
+++ b/libgcc/config/aarch64/t-aarch64
@@ -30,3 +30,4 @@ LIB2ADDEH += \
 	$(srcdir)/config/aarch64/__arm_za_disable.S
 
 SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
+LIBGCC2_CFLAGS += -Werror -Wno-prio-ctor-dtor
-- 
2.34.1



Re: [PATCH] SVE intrinsics: Add fold_active_lanes_to method to refactor svmul and svdiv.

2024-10-17 Thread Richard Sandiford
Jennifer Schmitz  writes:
>> On 16 Oct 2024, at 21:16, Richard Sandiford  
>> wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> Jennifer Schmitz  writes:
>>> As suggested in
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html,
>>> this patch adds the method gimple_folder::fold_active_lanes_to (tree X).
>>> This method folds active lanes to X and sets inactive lanes according to
>>> the predication, returning a new gimple statement. That makes folding of
>>> SVE intrinsics easier and reduces code duplication in the
>>> svxxx_impl::fold implementations.
>>> Using this new method, svdiv_impl::fold and svmul_impl::fold were 
>>> refactored.
>>> Additionally, the method was used for two optimizations:
>>> 1) Fold svdiv to the dividend, if the divisor is all ones and
>>> 2) for svmul, if one of the operands is all ones, fold to the other operand.
>>> Both optimizations were previously applied to _x and _m predication on
>>> the RTL level, but not for _z, where svdiv/svmul were still being used.
>>> For both optimization, codegen was improved by this patch, for example by
>>> skipping sel instructions with all-same operands and replacing sel
>>> instructions by mov instructions.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>>> 
>>> gcc/
>>>  * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>>  Refactor using fold_active_lanes_to and fold to dividend, is the
>>>  divisor is all ones.
>>>  (svmul_impl::fold): Refactor using fold_active_lanes_to and fold
>>>  to the other operand, if one of the operands is all ones.
>>>  * config/aarch64/aarch64-sve-builtins.h: Declare
>>>  gimple_folder::fold_active_lanes_to (tree).
>>>  * config/aarch64/aarch64-sve-builtins.cc
>>>  (gimple_folder::fold_actives_lanes_to): Add new method to fold
>>>  actives lanes to given argument and setting inactives lanes
>>>  according to the predication.
>>> 
>>> gcc/testsuite/
>>>  * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected outcome.
>>>  * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>  * gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>  * gcc.target/aarch64/sve/mul_const_run.c: Likewise.
>> 
>> Thanks, this looks great.  Just one comment on the tests:
>> 
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c 
>>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>> index d5a23bf0726..521f8bb4758 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>> @@ -57,7 +57,6 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
>>> 
>>> /*
>>> ** div_1_s32_m_tied1:
>>> -**   sel z0\.s, p0, z0\.s, z0\.s
>>> **   ret
>>> */
>>> TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
>>> @@ -66,7 +65,7 @@ TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
>>> 
>>> /*
>>> ** div_1_s32_m_untied:
>>> -**   sel z0\.s, p0, z1\.s, z1\.s
>>> +**   mov z0\.d, z1\.d
>>> **   ret
>>> */
>>> TEST_UNIFORM_Z (div_1_s32_m_untied, svint32_t,
>>> @@ -217,9 +216,8 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, 
>>> int32_t,
>>> 
>>> /*
>>> ** div_1_s32_z_tied1:
>>> -**   mov (z[0-9]+\.s), #1
>>> -**   movprfx z0\.s, p0/z, z0\.s
>>> -**   sdivz0\.s, p0/m, z0\.s, \1
>>> +**   mov (z[0-9]+)\.b, #0
>>> +**   sel z0\.s, p0, z0\.s, \1\.s
>>> **   ret
>>> */
>>> TEST_UNIFORM_Z (div_1_s32_z_tied1, svint32_t,
>> 
>> Tamar will soon push a patch to change how we generate zeros.
>> Part of that will involve rewriting existing patterns to be more
>> forgiving about the exact instruction that is used to zero a register.
>> 
>> The new preferred way of matching zeros is:
>> 
>> **  movi?   [vdz]([0-9]+)\.(?:[0-9]*[bhsd])?, #?0
>> 
>> (yeah, it's a bit of mouthful).  Could you change all the tests
>> to use that?  The regexp only captures the register number, so uses
>> of \1 etc. will need to become z\1.
>> 
>> OK with that change.  But would you mind waiting until Tamar pushes
>> his patch ("AArch64: use movi d0, #0 to clear SVE registers instead
>> of mov z0.d, #0"), just to make sure that the tests work with that?
>> 
> Thanks for the review. Sur

[committed] tree-object-size: Fall back to wholesize for non-const offset

2024-10-17 Thread Siddhesh Poyarekar
Sorry, I had missed calling the test case itself, so I fixed that up,
rebased on master and committed.

-- >8 --

Don't bail out early if the offset to a pointer in __builtin_object_size
is a variable; return the wholesize instead, since that is a better
fallback for the maximum estimate.  This should keep checks in place for
fortified functions, constraining overflows at least to some extent.

gcc/ChangeLog:

PR middle-end/77608
* tree-object-size.cc (plus_stmt_object_size): Drop check for
constant offset.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-1.c (test12): New test.
(main): Call it.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/testsuite/gcc.dg/builtin-object-size-1.c | 21 
 gcc/tree-object-size.cc  |  6 +++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
index d6d13c5ef7a..6161adbd128 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
@@ -712,6 +712,25 @@ test11 (void)
 }
 #endif
 
+void
+__attribute__ ((noinline))
+test12 (unsigned off)
+{
+  char *buf2 = malloc (10);
+  char *p;
+  size_t t;
+
+  p = &buf2[off];
+
+#ifdef __builtin_object_size
+  if (__builtin_object_size (p, 0) != 10 - off)
+FAIL ();
+#else
+  if (__builtin_object_size (p, 0) != 10)
+FAIL ();
+#endif
+}
+
 int
 main (void)
 {
@@ -730,5 +749,7 @@ main (void)
 #ifndef SKIP_STRNDUP
   test11 ();
 #endif
+  test12 (0);
+  test12 (2);
   DONE ();
 }
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 78faae7ad0d..0e4bf84fd11 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -1501,8 +1501,7 @@ plus_stmt_object_size (struct object_size_info *osi, tree 
var, gimple *stmt)
 return false;
 
   /* Handle PTR + OFFSET here.  */
-  if (size_valid_p (op1, object_size_type)
-  && (TREE_CODE (op0) == SSA_NAME || TREE_CODE (op0) == ADDR_EXPR))
+  if ((TREE_CODE (op0) == SSA_NAME || TREE_CODE (op0) == ADDR_EXPR))
 {
   if (TREE_CODE (op0) == SSA_NAME)
{
@@ -1528,7 +1527,8 @@ plus_stmt_object_size (struct object_size_info *osi, tree 
var, gimple *stmt)
;
   else if ((object_size_type & OST_DYNAMIC)
   || bytes != wholesize
-  || compare_tree_int (op1, offset_limit) <= 0)
+  || (size_valid_p (op1, object_size_type)
+  && compare_tree_int (op1, offset_limit) <= 0))
bytes = size_for_offset (bytes, op1, wholesize);
   /* In the static case, with a negative offset, the best estimate for
 minimum size is size_unknown but for maximum size, the wholesize is a
-- 
2.46.0



[PATCH v2] c++: Fix crash during NRV optimization with invalid input [PR117099, PR117129]

2024-10-17 Thread Simon Martin
Hi,

The issue reported in PR117129 is pretty similar to the one in PR117099,
so here's an updated version of the patch that fixes both issues, by
ensuring that finish_return_expr sets current_function_return_value to
error_mark_node in case of an error with said return value.

Successfully tested on x86_64-pc-linux-gnu. OK for trunk?

Thanks, SimonFrom ff01d18d97893ef65259f9063794945d5062cf4e Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Wed, 16 Oct 2024 15:47:12 +0200
Subject: [PATCH] c++: Fix crash during NRV optimization with invalid input 
[PR117099, PR117129]

PR117099 and PR117129 are ICEs upon invalid code that happen when NRVO
is activated, and both due to the fact that we don't consistently set
current_function_return_value to error_mark_node upon error in
finish_return_expr.

This patch fixes this inconsistency which fixes both cases since we skip
calling finalize_nrv when current_function_return_value is
error_mark_node.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/117099
PR c++/117129

gcc/cp/ChangeLog:

* typeck.cc (check_return_expr): Upon error, set
current_function_return_value to error_mark_node.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash77.C: New test.
* g++.dg/parse/crash78.C: New test.

---
 gcc/cp/typeck.cc | 11 ---
 gcc/testsuite/g++.dg/parse/crash77.C | 14 ++
 gcc/testsuite/g++.dg/parse/crash78.C | 15 +++
 3 files changed, 37 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/crash77.C
 create mode 100644 gcc/testsuite/g++.dg/parse/crash78.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 71d879abef1..485e8b347bb 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -11232,8 +11232,9 @@ check_return_expr (tree retval, bool *no_warning, bool 
*dangling)
   if (functype != error_mark_node)
permerror (input_location, "return-statement with no value, in "
   "function returning %qT", valtype);
-  /* Remember that this function did return.  */
+  /* Remember that this function returns an error.  */
   current_function_returns_value = 1;
+  retval = error_mark_node;
   /* And signal caller that TREE_NO_WARNING should be set on the
 RETURN_EXPR to avoid control reaches end of non-void function
 warnings in tree-cfg.cc.  */
@@ -11442,9 +11443,13 @@ check_return_expr (tree retval, bool *no_warning, bool 
*dangling)
   tf_warning_or_error);
   retval = convert (valtype, retval);
 
-  /* If the conversion failed, treat this just like `return;'.  */
+  /* If the conversion failed, treat this just like
+`return ;'.  */
   if (retval == error_mark_node)
-   return retval;
+   {
+ current_function_return_value = error_mark_node;
+ return retval;
+   }
   /* We can't initialize a register from a AGGR_INIT_EXPR.  */
   else if (! cfun->returns_struct
   && TREE_CODE (retval) == TARGET_EXPR
diff --git a/gcc/testsuite/g++.dg/parse/crash77.C 
b/gcc/testsuite/g++.dg/parse/crash77.C
new file mode 100644
index 000..8c3dddc13d2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash77.C
@@ -0,0 +1,14 @@
+// PR c++/117099
+// { dg-do "compile" }
+
+struct X {
+  ~X();
+};
+
+X test(bool b) {
+  {
+X x;
+return x;
+  } 
+  if (!(b)) return; // { dg-error "return-statement with no value" }
+}
diff --git a/gcc/testsuite/g++.dg/parse/crash78.C 
b/gcc/testsuite/g++.dg/parse/crash78.C
new file mode 100644
index 000..08d62af39c6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash78.C
@@ -0,0 +1,15 @@
+// PR c++/117129
+// { dg-do "compile" { target c++11 } }
+
+struct Noncopyable {
+  Noncopyable();
+  Noncopyable(const Noncopyable &) = delete; // { dg-note "declared here" }
+  virtual ~Noncopyable();
+};
+Noncopyable nrvo() { 
+  {
+Noncopyable A;
+return A; // { dg-error "use of deleted function" }
+ // { dg-note "display considered" "" { target *-*-* } .-1 }
+  }
+}
-- 
2.44.0



Re: [PATCH v2] contrib/: Configure git-format-patch(1) to add To: gcc-patches@gcc.gnu.org

2024-10-17 Thread Alejandro Colomar
On Thu, Oct 17, 2024 at 04:54:04PM GMT, Alejandro Colomar wrote:
> Just like we already do for git-send-email(1).  In some cases, patches
> are prepared with git-format-patch(1), but are sent with a different
> program, or some flags to git-send-email(1) may accidentally inhibit the
> configuration.  By adding the TO in the email file, we make sure that
> gcc-patches@ will receive the patch.
> 
> contrib/ChangeLog:
> 
>   * gcc-git-customization.sh: Configure git-format-patch(1) to add
>   'To: gcc-patches@gcc.gnu.org'.
> 
> Cc: Eric Gallager 
> Signed-off-by: Alejandro Colomar 
> ---
> 
> Hi!
> 
> v2 changes:
> 
> -  Fix comment.  [Eric]
> 
> Cheers,
> Alex

BTW, this patch will need to be applied first, to allow the Cc: tag.  Please apply it too.  :)

Have a lovely day!
Alex

> 
> 
> Range-diff against v1:
> 1:  0ee3f802637 ! 1:  2bd0e0f82bf contrib/: Configure git-format-patch(1) to 
> add To: gcc-patches@gcc.gnu.org
> @@ Commit message
>  * gcc-git-customization.sh: Configure git-format-patch(1) to 
> add
>  'To: gcc-patches@gcc.gnu.org'.
>  
> +Cc: Eric Gallager 
>  Signed-off-by: Alejandro Colomar 
>  
>   ## contrib/gcc-git-customization.sh ##
> -@@ contrib/gcc-git-customization.sh: git config diff.md.xfuncname 
> '^\(define.*$'
> +@@ contrib/gcc-git-customization.sh: git config alias.gcc-style '!f() {
> + # *.mddiff=md
> + git config diff.md.xfuncname '^\(define.*$'
>   
> - # Tell git send-email where patches go.
> +-# Tell git send-email where patches go.
> ++# Tell git-format-patch(1)/git-send-email(1) where patches go.
>   # ??? Maybe also set sendemail.tocmd to guess from MAINTAINERS?
>  +git config format.to 'gcc-patches@gcc.gnu.org'
>   git config sendemail.to 'gcc-patches@gcc.gnu.org'
> 
>  contrib/gcc-git-customization.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/contrib/gcc-git-customization.sh 
> b/contrib/gcc-git-customization.sh
> index 54bd35ea1aa..dd59bece1dc 100755
> --- a/contrib/gcc-git-customization.sh
> +++ b/contrib/gcc-git-customization.sh
> @@ -41,8 +41,9 @@ git config alias.gcc-style '!f() {
>  # *.mddiff=md
>  git config diff.md.xfuncname '^\(define.*$'
>  
> -# Tell git send-email where patches go.
> +# Tell git-format-patch(1)/git-send-email(1) where patches go.
>  # ??? Maybe also set sendemail.tocmd to guess from MAINTAINERS?
> +git config format.to 'gcc-patches@gcc.gnu.org'
>  git config sendemail.to 'gcc-patches@gcc.gnu.org'
>  
>  set_user=$(git config --get "user.name")
> -- 
> 2.45.2
> 



-- 



signature.asc
Description: PGP signature


[PATCH] AArch64: Remove redundant check in aarch64_simd_mov

2024-10-17 Thread Wilco Dijkstra

The split condition in aarch64_simd_mov uses aarch64_simd_special_constant_p.
While doing the split, it checks the mode before calling
aarch64_maybe_generate_simd_constant.  This is risky since it may result in
unexpectedly calling aarch64_split_simd_move instead of
aarch64_maybe_generate_simd_constant.  Since the mode is already checked,
remove the spurious explicit mode check.

Passes bootstrap & regress, OK for commit?

---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
18795a08b61da874a9e811822ed82e7eb9350bb4..5ac80103502112664528d37e3b8e24edc16eb932
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -208,7 +208,6 @@ (define_insn_and_split "*aarch64_simd_mov<VQMOV:mode>"
 else
   {
	if (FP_REGNUM_P (REGNO (operands[0]))
-	   && <MODE>mode == V2DImode
	   && aarch64_maybe_generate_simd_constant (operands[0], operands[1],
						    <MODE>mode))
  ;



RE: [PATCH 1/2] Reduce lane utilization in VEC_PERM_EXPRs for two_operator nodes

2024-10-17 Thread Tamar Christina
Hi Christoph,

> -Original Message-
> From: Christoph Müllner 
> Sent: Tuesday, October 15, 2024 3:57 PM
> To: gcc-patches@gcc.gnu.org; Philipp Tomsich ; Tamar
> Christina ; Richard Biener 
> Cc: Jeff Law ; Robin Dapp ;
> Christoph Müllner 
> Subject: [PATCH 1/2] Reduce lane utilization in VEC_PERM_EXPRs for
> two_operator nodes
> 
> When two_operator SLP nodes are built, the VEC_PERM_EXPR that merges the
> result
> selects a lane only based on the operator found. If the input nodes have
> duplicate elements, there may be more than one way to choose. This commit
> changes the policy to reuse an existing lane with the result that we can
> possibly free up lanes entirely.
> 
> For example, given two vectors with duplicates:
>   A = {a1, a1, a2, a2}
>   B = {b1, b1, b3, b2}
> 
> A two_operator node with operators +, -, +, - is currently built as:
>   RES = VEC_PERM_EXPR(0, 5, 2, 7)
> With this patch, the permutation becomes:
>   RES = VEC_PERM_EXPR(0, 4, 2, 6)
> Lanes 0 and 2 are reused and lanes 1 and 3 are not utilized anymore.
> 
> The direct effect of this change can be seen in the AArch64 test case,
> where the simpler permutation allows to lower to a TRN1 instead of an
> expensive TBL.
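As an aside, the lane-selection policy described above can be sketched outside the vectorizer.  The types and names below are illustrative stand-ins for the gimple accessors (gimple_assign_rhs_code/rhs1/rhs2), not GCC code:

```cpp
#include <cassert>

// Stand-in for one scalar statement per lane: operator code plus the
// two operand ids.
struct lane_stmt { char code; int rhs1, rhs2; };

// Vector ONE applies code0 to every lane's operands, vector TWO the
// other operator.  A lane whose operator is code0 selects lane i of
// vector ONE; otherwise it selects a lane of vector TWO, preferring an
// earlier lane with identical operands so fewer distinct lanes stay live.
int pick_perm_index (const lane_stmt *stmts, int n, int i, char code0)
{
  if (stmts[i].code == code0)
    return i;                     // lane i of vector ONE
  int idx = i;
  for (int alt = i - 1; alt >= 0; alt--)
    if (stmts[alt].code == code0
	&& stmts[alt].rhs1 == stmts[i].rhs1
	&& stmts[alt].rhs2 == stmts[i].rhs2)
      {
	idx = alt;                // reuse an already-used lane
	break;
      }
  return n + idx;                 // lane idx of vector TWO
}
```

For t0 = s0+s1, t1 = s0-s1, t2 = s2+s3, t3 = s2-s3 this yields { 0, 4, 2, 6 } rather than { 0, 5, 2, 7 }, matching the example above.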

That makes sense, so this is trying to make the permutes EVEN/EVEN or
ODD/ODD if possible.  Just wondering if

> 
> Bootstrapped and reg-tested on AArch64 (C, C++, Fortran).
> 
> Manolis Tsamis was the patch's initial author before I took it over.
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.cc: Reduce lane utilization in VEC_PERM_EXPRs for
> two_operators
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-perm-13.c: New test.
>   * gcc.target/aarch64/sve/slp-perm-13.c: New test.
> 
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-perm-13.c   | 29 +++
>  .../gcc.target/aarch64/sve/slp-perm-13.c  |  4 +++
>  gcc/tree-vect-slp.cc  | 21 +-
>  3 files changed, 53 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-perm-13.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
> b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
> new file mode 100644
> index 000..08639e72fb0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -fdump-tree-slp2-details" } */
> +
> +#define LOAD_VEC(e0, e1, e2, e3, p) \
> +int e0 = p[0]; \
> +int e1 = p[1]; \
> +int e2 = p[2]; \
> +int e3 = p[3];
> +
> +#define STORE_VEC(p, e0, e1, e2, e3) \
> +p[0] = e0; \
> +p[1] = e1; \
> +p[2] = e2; \
> +p[3] = e3;
> +
> +void
> +foo (int *p)
> +{
> +  LOAD_VEC(s0, s1, s2, s3, p);
> +
> +  int t0 = s0 + s1;
> +  int t1 = s0 - s1;
> +  int t2 = s2 + s3;
> +  int t3 = s2 - s3;
> +
> +  STORE_VEC(p, t0, t1, t2, t3);
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 4, 2, 6 }" "slp2" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> new file mode 100644
> index 000..f5839f273e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> @@ -0,0 +1,4 @@
> +#include "../../../gcc.dg/vect/slp-perm-13.c"
> +
> +/* { dg-final { scan-assembler-not {\ttbl\t} } } */
> +
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 16332e0b6d7..8794c94ef90 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -2921,7 +2921,26 @@ fail:
> gassign *ostmt = as_a <gassign *> (ostmt_info->stmt);
> if (gimple_assign_rhs_code (ostmt) != code0)
>   {
> -   SLP_TREE_LANE_PERMUTATION (node).safe_push (std::make_pair (1,
> i));
> +   /* If the current element can be found in another lane that has
> +  been used previously then use that one instead.  This can
> +  happen when the ONE and TWO contain duplicate elements and
> +  reduces the number of 'active' lanes.  */
> +   int idx = i;
> +   for (int alt_idx = (int) i - 1; alt_idx >= 0; alt_idx--)
> + {
> +   gassign *alt_stmt = as_a <gassign *> (stmts[alt_idx]->stmt);
> +   if (gimple_assign_rhs_code (alt_stmt) == code0
> +   && gimple_assign_rhs1 (ostmt)
> + == gimple_assign_rhs1 (alt_stmt)
> +   && gimple_assign_rhs2 (ostmt)
> + == gimple_assign_rhs2 (alt_stmt))
> + {

.. we may want operand_equal_p (ostmt, alt_stmt, 0) etc here.

This looks good to me, though I can't approve.

Lets see what Richi thinks.

Cheers,
Tamar

> +   idx = alt_idx;
> +   break;
> + }
> + }
> +   SLP_TREE_LANE_PERMUTATION (node)
> + .safe_push (std::make_pair (1, idx));
> ocode = gimple_assign_rh

[PATCH v2] contrib/: Configure git-format-patch(1) to add To: gcc-patches@gcc.gnu.org

2024-10-17 Thread Alejandro Colomar
Just like we already do for git-send-email(1).  In some cases, patches
are prepared with git-format-patch(1), but are sent with a different
program, or some flags to git-send-email(1) may accidentally inhibit the
configuration.  By adding the TO in the email file, we make sure that
gcc-patches@ will receive the patch.

contrib/ChangeLog:

* gcc-git-customization.sh: Configure git-format-patch(1) to add
'To: gcc-patches@gcc.gnu.org'.

Cc: Eric Gallager 
Signed-off-by: Alejandro Colomar 
---

Hi!

v2 changes:

-  Fix comment.  [Eric]

Cheers,
Alex


Range-diff against v1:
1:  0ee3f802637 ! 1:  2bd0e0f82bf contrib/: Configure git-format-patch(1) to 
add To: gcc-patches@gcc.gnu.org
@@ Commit message
 * gcc-git-customization.sh: Configure git-format-patch(1) to 
add
 'To: gcc-patches@gcc.gnu.org'.
 
+Cc: Eric Gallager 
 Signed-off-by: Alejandro Colomar 
 
  ## contrib/gcc-git-customization.sh ##
-@@ contrib/gcc-git-customization.sh: git config diff.md.xfuncname 
'^\(define.*$'
+@@ contrib/gcc-git-customization.sh: git config alias.gcc-style '!f() {
+ # *.mddiff=md
+ git config diff.md.xfuncname '^\(define.*$'
  
- # Tell git send-email where patches go.
+-# Tell git send-email where patches go.
++# Tell git-format-patch(1)/git-send-email(1) where patches go.
  # ??? Maybe also set sendemail.tocmd to guess from MAINTAINERS?
 +git config format.to 'gcc-patches@gcc.gnu.org'
  git config sendemail.to 'gcc-patches@gcc.gnu.org'

 contrib/gcc-git-customization.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh
index 54bd35ea1aa..dd59bece1dc 100755
--- a/contrib/gcc-git-customization.sh
+++ b/contrib/gcc-git-customization.sh
@@ -41,8 +41,9 @@ git config alias.gcc-style '!f() {
 # *.mddiff=md
 git config diff.md.xfuncname '^\(define.*$'
 
-# Tell git send-email where patches go.
+# Tell git-format-patch(1)/git-send-email(1) where patches go.
 # ??? Maybe also set sendemail.tocmd to guess from MAINTAINERS?
+git config format.to 'gcc-patches@gcc.gnu.org'
 git config sendemail.to 'gcc-patches@gcc.gnu.org'
 
 set_user=$(git config --get "user.name")
-- 
2.45.2





RE: [PATCH 2/2] Add a new permute optimization step in SLP

2024-10-17 Thread Tamar Christina
Hi Christoph,

> -Original Message-
> From: Christoph Müllner 
> Sent: Tuesday, October 15, 2024 3:57 PM
> To: gcc-patches@gcc.gnu.org; Philipp Tomsich ; Tamar
> Christina ; Richard Biener 
> Cc: Jeff Law ; Robin Dapp ;
> Christoph Müllner 
> Subject: [PATCH 2/2] Add a new permute optimization step in SLP
> 
> This commit adds a new permute optimization step after running SLP
> vectorization.
> Although there are existing places where individual or nested permutes
> can be optimized, there are cases where independent permutes can be optimized,
> which cannot be expressed in the current pattern matching framework.
> The optimization step is run at the end so that permutes from completely 
> different
> SLP builds can be optimized.
> 
> The initial optimizations implemented can detect some cases where different
> "select permutes" (permutes that only use some of the incoming vector lanes)
> can be co-located in a single permute. This can optimize some cases where
> two_operator SLP nodes have duplicate elements.
> 
> Bootstrapped and reg-tested on AArch64 (C, C++, Fortran).
> 
> Manolis Tsamis was the patch's initial author before I took it over.
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.cc (get_tree_def): Return the definition of a name.
>   (recognise_perm_binop_perm_pattern): Helper function.
>   (vect_slp_optimize_permutes): New permute optimization step.
>   (vect_slp_function): Run the new permute optimization step.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-perm-14.c: New test.
>   * gcc.target/aarch64/sve/slp-perm-14.c: New test.
> 
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-perm-14.c   |  42 +++
>  .../gcc.target/aarch64/sve/slp-perm-14.c  |   3 +
>  gcc/tree-vect-slp.cc  | 248 ++
>  3 files changed, 293 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-perm-14.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
> b/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
> new file mode 100644
> index 000..f56e3982a62
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -fdump-tree-slp1-details" } */
> +
> +#include <stdint.h>
> +
> +#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
> +int t0 = s0 + s1;\
> +int t1 = s0 - s1;\
> +int t2 = s2 + s3;\
> +int t3 = s2 - s3;\
> +d0 = t0 + t2;\
> +d1 = t1 + t3;\
> +d2 = t0 - t2;\
> +d3 = t1 - t3;\
> +}
> +
> +int
> +x264_pixel_satd_8x4_simplified (uint8_t *pix1, int i_pix1, uint8_t *pix2, int
> i_pix2)
> +{
> +  uint32_t tmp[4][4];
> +  uint32_t a0, a1, a2, a3;
> +  int sum = 0;
> +
> +  for (int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2)
> +{
> +  a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
> +  a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
> +  a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
> +  a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
> +  HADAMARD4(tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0, a1, a2, a3);
> +}
> +
> +  for (int i = 0; i < 4; i++)
> +{
> +  HADAMARD4(a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
> +  sum += a0 + a1 + a2 + a3;
> +}
> +
> +  return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "slp1" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c
> b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c
> new file mode 100644
> index 000..4e0d5175be8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c
> @@ -0,0 +1,3 @@
> +#include "../../../gcc.dg/vect/slp-perm-14.c"
> +
> +/* { dg-final { scan-assembler-not {\ttbl\t} } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 8794c94ef90..4bf5ccb9cdf 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -9478,6 +9478,252 @@ vect_slp_if_converted_bb (basic_block bb, loop_p
> orig_loop)
>return vect_slp_bbs (bbs, orig_loop);
>  }
> 
> +/* If NAME is an SSA_NAME defined by an assignment, return that assignment.
> +   If SINGLE_USE_ONLY is true and NAME has multiple uses, return NULL.  */
> +
> +static gassign *
> +get_tree_def (tree name, bool single_use_only)
> +{
> +  if (TREE_CODE (name) != SSA_NAME)
> +return NULL;
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> +
> +  if (single_use_only && !has_single_use (name))
> +return NULL;
> +
> +  if (!is_gimple_assign (def_stmt))
> +return NULL;

Probably cheaper to test this before the single_use. But..
> +
> +  return dyn_cast <gassign *> (def_stmt);

Why not just do the dyn_cast, assign that to a result, and check
if the dyn_cast succeeded?  That saves you the second check of the type.

> +}
> +
> +/* Helper function for vect_slp_optimiz

Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-10-17 Thread Richard Sandiford
Jennifer Schmitz  writes:
> [...]
> Looking at the diff of the vect dumps (below is a section of the diff for 
> strided_store_2.c), it seemed odd that vec_to_scalar operations cost 0 now, 
> instead of the previous cost of 2:
>
> +strided_store_1.c:38:151: note:=== vectorizable_operation ===
> +strided_store_1.c:38:151: note:vect_model_simple_cost: inside_cost = 1, 
> prologue_cost  = 0 .
> +strided_store_1.c:38:151: note:   ==> examining statement: *_6 = _7;
> +strided_store_1.c:38:151: note:   vect_is_simple_use: operand _3 + 1.0e+0, 
> type of def:internal
> +strided_store_1.c:38:151: note:   Vectorizing an unaligned access.
> +Applying pattern match.pd:236, generic-match-9.cc:4128
> +Applying pattern match.pd:5285, generic-match-10.cc:4234
> +strided_store_1.c:38:151: note:   vect_model_store_cost: inside_cost = 12, 
> prologue_cost = 0 .
>  *_2 1 times unaligned_load (misalign -1) costs 1 in body
> -_3 + 1.0e+0 1 times scalar_to_vec costs 1 in prologue
>  _3 + 1.0e+0 1 times vector_stmt costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> + 1 times vector_load costs 1 in prologue
> +_7 1 times vec_to_scalar costs 0 in body
>  _7 1 times scalar_store costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> +_7 1 times vec_to_scalar costs 0 in body
>  _7 1 times scalar_store costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> +_7 1 times vec_to_scalar costs 0 in body
>  _7 1 times scalar_store costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> +_7 1 times vec_to_scalar costs 0 in body
>  _7 1 times scalar_store costs 1 in body
>
> Although the aarch64_use_new_vector_costs_p flag was used in multiple places 
> in aarch64.cc, the location that causes this behavior is this one:
> unsigned
> aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
>  stmt_vec_info stmt_info, slp_tree,
>  tree vectype, int misalign,
>  vect_cost_model_location where)
> {
>   [...]
>   /* Try to get a more accurate cost by looking at STMT_INFO instead
>  of just looking at KIND.  */
> -  if (stmt_info && aarch64_use_new_vector_costs_p ())
> +  if (stmt_info)
> {
>   /* If we scalarize a strided store, the vectorizer costs one
>  vec_to_scalar for each element.  However, we can store the first
>  element using an FP store without a separate extract step.  */
>   if (vect_is_store_elt_extraction (kind, stmt_info))
> count -= 1;
>
>   stmt_cost = aarch64_detect_scalar_stmt_subtype (m_vinfo, kind,
>   stmt_info, stmt_cost);
>
>   if (vectype && m_vec_flags)
> stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
> stmt_info, vectype,
> where, stmt_cost);
> }
>   [...]
>   return record_stmt_cost (stmt_info, where, (count * stmt_cost).ceil ());
> }
>
> Previously, for mtune=generic, this function returned a cost of 2 for a 
> vec_to_scalar operation in the vect body. Now "if (stmt_info)" is entered and 
> "if (vect_is_store_elt_extraction (kind, stmt_info))" evaluates to true, 
> which sets the count to 0 and leads to a return value of 0.

At the time the code was written, a scalarised store would be costed
using one vec_to_scalar call into the backend, with the count parameter
set to the number of elements being stored.  The "count -= 1" was
supposed to lop off the leading element extraction, since we can store
lane 0 as a normal FP store.

The target-independent costing was later reworked so that it costs
each operation individually:

  for (i = 0; i < nstores; i++)
{
  if (costing_p)
{
  /* Only need vector extracting when there are more
 than one stores.  */
  if (nstores > 1)
inside_cost
  += record_stmt_cost (cost_vec, 1, vec_to_scalar,
   stmt_info, 0, vect_body);
  /* Take a single lane vector type store as scalar
 store to avoid ICE like 110776.  */
  if (VECTOR_TYPE_P (ltype)
  && known_ne (TYPE_VECTOR_SUBPARTS (ltype), 1U))
n_adjacent_stores++;
  else
inside_cost
  += record_stmt_cost (cost_vec, 1, scalar_store,
   stmt_info, 0, vect_body);
  continue;
}

Unfortunately, there's no easy way of telling whether a particular call
is part of a group, and if so, which member of the group it is.

I suppose we could give up on the attempt to be (somewhat) 

Re: [PATCH 2/2] c++: constrained auto NTTP vs associated constraints

2024-10-17 Thread Patrick Palka
On Thu, 17 Oct 2024, Patrick Palka wrote:

> On Tue, 15 Oct 2024, Patrick Palka wrote:
> 
> > On Tue, 15 Oct 2024, Patrick Palka wrote:
> > 
> > > According to [temp.param]/11, the constraint on an auto NTTP is an
> > > associated constraint and so should be checked as part of satisfaction
> > > of the overall associated constraints rather than checked individually
> > > during coercion/deduction.
> > 
> > By the way, I wonder if such associated constraints should be relevant for
> > subsumption now?
> > 
> > template<class T> concept C = true;
> > 
> > template<class T> concept D = C<T> && true;
> > 
> > template<C auto N> void f(); // #1
> > template<D auto N> void f(); // #2
> > 
> > int main() {
> >   f<0>(); // still ambiguous?
> > }
> > 
> > With this patch the above call is still ambiguous despite #2 now being
> > more constrained than #1 because "more constrained" is only considered for
> > function templates with the same signatures as per
> > https://eel.is/c++draft/temp.func.order#6.2.2 and we deem their signatures
> > to be different due to the different type-constraint.
> 
> I think I convinced myself that this example should be accepted, and the
> way to go about that is to replace the constrained auto in the NTTP type
> with an ordinary auto once we set its TEMPLATE_PARM_CONSTRAINTS.  That
> way both templates have the same signature modulo associated constraints.

Here is v2 which implements this in finish_constrained_parameter.  Now
we can assert that do_auto_deduction doesn't see a constrained auto
during adc_unify deduction!

-- >8 --

Subject: [PATCH v2 2/2] c++: constrained auto NTTP vs associated constraints

According to [temp.param]/11, the constraint on an auto NTTP is an
associated constraint and so should be checked as part of satisfaction
of the overall associated constraints rather than checked individually
during coercion/deduction.

In order to implement this we mainly need to make handling of
constrained auto NTTPs go through finish_constrained_parameter so that
TEMPLATE_PARM_CONSTRAINTS gets set on them, and teach
finish_shorthand_constraint how to build an associated constraint
corresponding to the written type-constraint.

gcc/cp/ChangeLog:

* constraint.cc (finish_shorthand_constraint): Add is_non_type
parameter for handling constrained (auto) NTTPS.
* cp-tree.h (do_auto_deduction): Adjust declaration.
(copy_template_args): Declare.
(finish_shorthand_constraint): Adjust declaration.
* mangle.cc (write_template_param_decl): Obtain constraints of
an (auto) NTTP through TEMPLATE_PARM_CONSTRAINTS instead of
PLACEHOLDER_TYPE_CONSTRAINTS.
* parser.cc (cp_parser_constrained_type_template_parm): Inline
into its only caller and remove.
(cp_parser_constrained_non_type_template_parm): Likewise.
(finish_constrained_parameter): Simplify after the above.  Replace
the type of a constrained (auto) NTTP with an ordinary auto.
(cp_parser_template_parameter): Dispatch to
finish_constrained_parameter for a constrained auto NTTP.
* pt.cc (process_template_parm): Pass is_non_type to
finish_shorthand_constraint.
(convert_template_argument): Adjust call to do_auto_deduction.
(copy_template_args): Remove static.
(unify): Adjust call to do_auto_deduction.
(make_constrained_placeholder_type): Return the type not the
TYPE_NAME for consistency with make_auto etc.
(do_auto_deduction): Remove now unused tmpl parameter.  Assert
we no longer see constrained autos during coercion/deduction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-placeholder12.C: Adjust expected error
upon constrained auto NTTP satisfaction failure.
* g++.dg/cpp2a/concepts-pr97093.C: Likewise.
* g++.dg/cpp2a/concepts-template-parm2.C: Likewise.
* g++.dg/cpp2a/concepts-template-parm6.C: Likewise.
* g++.dg/cpp2a/concepts-template-parm12.C: New test.
---
 gcc/cp/constraint.cc  | 32 --
 gcc/cp/cp-tree.h  |  6 +-
 gcc/cp/mangle.cc  |  2 +-
 gcc/cp/parser.cc  | 64 ---
 gcc/cp/pt.cc  | 38 +--
 .../g++.dg/cpp2a/concepts-placeholder12.C |  4 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-pr97093.C |  2 +-
 .../g++.dg/cpp2a/concepts-template-parm12.C   | 22 +++
 .../g++.dg/cpp2a/concepts-template-parm2.C|  2 +-
 .../g++.dg/cpp2a/concepts-template-parm6.C|  2 +-
 10 files changed, 101 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-template-parm12.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 35be9cc2b41..9394bea8835 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1189,7 +1189,7 @@ build_constrained_parameter (tree cnc, tree proto, tree 
args)
done only after t

Re: [PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-10-17 Thread Richard Sandiford
Evgeny Karpov  writes:
> Thursday, September 19, 2024
> Richard Sandiford  wrote:
>
>>> For instance:
>>> float __attribute__((aligned (32))) large_aligned_array[3];
>>>
>>> BIGGEST_ALIGNMENT could be up to 512 bits on x64.
>>> This patch has been added to cover this case without needing to
>>> change the FFmpeg code.
>>
>> What goes wrong if we don't do this?  I'm not sure from the description
>> whether it's a correctness fix, a performance fix, or whether it's about
>> avoiding wasted space.
>
> It is a correctness fix.

But could you explain what goes wrong if you don't do this?
(I realise it might be very obvious when you've seen it happen :)
But I'm genuinely unsure.)

Why do we ignore the alignment if it is less than BIGGEST_ALIGNMENT?

>>> +#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGNMENT)  \
>>> +  { \
>>> +    unsigned HOST_WIDE_INT rounded = MAX ((SIZE), 1); \
>>> +    unsigned HOST_WIDE_INT alignment = MAX ((ALIGNMENT), 
>>> BIGGEST_ALIGNMENT); \
>>> +    rounded += (alignment / BITS_PER_UNIT) - 1; \
>>> +    rounded = (rounded / (alignment / BITS_PER_UNIT) \
>>> +  * (alignment / BITS_PER_UNIT)); \
>>
>> There's a ROUND_UP macro that could be used here.
>
> Here is the patch after applying ROUND_UP.
>
> Regards,
> Evgeny
>
> diff --git a/gcc/config/aarch64/aarch64-coff.h 
> b/gcc/config/aarch64/aarch64-coff.h
> index 3c8aed806c9..1a45256ebfe 100644
> --- a/gcc/config/aarch64/aarch64-coff.h
> +++ b/gcc/config/aarch64/aarch64-coff.h
> @@ -58,6 +58,13 @@
>assemble_name ((FILE), (NAME)),  \
>fprintf ((FILE), "," HOST_WIDE_INT_PRINT_DEC  "\n", (ROUNDED)))
>
> +#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGNMENT)  \
> +  {\
> +unsigned rounded = ROUND_UP (MAX ((SIZE), 1),  \
> +  MAX ((ALIGNMENT), BIGGEST_ALIGNMENT) / BITS_PER_UNIT);   \

Better to use "auto" rather than "unsigned".

Otherwise it looks good modulo the questions above.

Thanks,
Richard

> +ASM_OUTPUT_LOCAL (FILE, NAME, SIZE, rounded);  \
> +  }
> +
>  #define ASM_OUTPUT_SKIP(STREAM, NBYTES)\
>fprintf (STREAM, "\t.space\t%d  // skip\n", (int) (NBYTES))
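The rounding the macro performs can be checked in isolation.  The definitions below are local stand-ins for GCC's BITS_PER_UNIT, MAX, and ROUND_UP, purely for illustration:

```cpp
#include <cassert>

// Local stand-ins mirroring the macro's effect: round the local's size
// up to a multiple of max(ALIGNMENT, BIGGEST_ALIGNMENT), converted from
// bits to bytes.
const unsigned long BITS_PER_UNIT = 8;

unsigned long round_up (unsigned long x, unsigned long f)
{
  return (x + f - 1) / f * f;   // round x up to a multiple of f
}

unsigned long rounded_local_size (unsigned long size,
				  unsigned long align_bits,
				  unsigned long biggest_align_bits)
{
  unsigned long align_bytes
    = (align_bits > biggest_align_bits ? align_bits : biggest_align_bits)
      / BITS_PER_UNIT;
  return round_up (size > 1 ? size : 1, align_bytes);  // MAX (SIZE, 1)
}
```

For the earlier example of `float __attribute__((aligned (32))) arr[3]` (12 bytes, a 256-bit alignment request) with an assumed 128-bit BIGGEST_ALIGNMENT, this gives 32 bytes.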


Re: [PATCH v1 3/4] aarch64: improve assembly debug comments for build attributes

2024-10-17 Thread Richard Sandiford
Matthieu Longo  writes:
> On 2024-10-08 18:45, Richard Sandiford wrote:
>> However...
>> 
>>> +  return s;
>> 
>> ...we are unfortunately limited to C++11 constexprs, so I think this needs
>> to be:
>> 
>>return (t == uleb128 ? "ULEB128"
>>: t == asciz ? "asciz"
>>: nullptr);
>> 
>> if we want it to be treated as a constant for all builds.
>> 
>
> Fixed in the next revision.
> FYI we might have C++14 soon :) => 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665644.html

Great!  Looking forward to finally being able to stop working around this :)

Of course, it'd be OK to change back to a switch statement once the
transition has happened (in stage 1).

>>> +class aeabi_subsection
>>> +{
>>> +  public:
>>> +aeabi_subsection (const char *name, bool optional):
>>> +  name_(name),
>>> +  optional_(optional),
>>> +  avtype_(details::deduce_attr_av_type (T_val{}))
>> 
>> Formatting nit, should be indented as:
>> 
>> class aeabi_subsection
>> {
>> public:
>>aeabi_subsection (const char *name, bool optional)
>>  : name_ (name)
>>...
>> 
>> But the usual GCC style is to use an "m_" prefix for private members,
>> rather than a "_" suffix.
>> 
>
> Fixed.
>
For "public:", Richard Earnshaw recommended not following the recommended
GNU style, as it breaks the mklog script, and keeping one space before it.

OK, that's fair.

Thanks,
Richard


Re: [PATCH 5/5] libgm2/libm2pim/wrapc.cc: Define NULL as nullptr

2024-10-17 Thread Gaius Mulley
Alejandro Colomar  writes:

> For internal C++ code, unconditionally define NULL as nullptr.
> We already require a C++11 compiler to bootstrap GCC anyway.
>
> Link: 
> Signed-off-by: Alejandro Colomar 
> ---
>  libgm2/libm2pim/wrapc.cc | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/libgm2/libm2pim/wrapc.cc b/libgm2/libm2pim/wrapc.cc
> index 5c31f1e2687..cdd1cf0d0fe 100644
> --- a/libgm2/libm2pim/wrapc.cc
> +++ b/libgm2/libm2pim/wrapc.cc
> @@ -63,10 +63,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #include 
>  #endif
>  
> -/* Define a generic NULL if one hasn't already been defined.  */
> -
>  #if !defined(NULL)
> -#define NULL 0
> +#define NULL nullptr
>  #endif
>  
>  /* strtime returns the address of a string which describes the

lgtm - thanks!
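For what it's worth, a minimal sketch (not from the patch; the overload names are made up) of why nullptr is preferable to 0 here: with NULL defined as 0, a null argument can silently select an integer overload, while nullptr always selects the pointer one:

```cpp
#include <cassert>
#include <cstddef>

// Two overloads; names are illustrative only.
inline int pick (int)          { return 1; }  // what a literal 0 (old NULL) selects
inline int pick (const char *) { return 2; }  // what nullptr selects
```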