Re: [PATCHv2, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-06 Thread HAO CHEN GUI
Hi Jeff,

On 2024/8/5 23:11, Jeff Law wrote:
> We'll probably need Richard S. or someone else to chime in on the actual 
> patch, but yea, if they can leverage stp, it's likely going to be better than 
> actual vectors.
> 
> Do we have a testcase for this issue or was it something you just happened to 
> notice?

I will refine the patch and ask for Richard's advice.

The automated CI checks every submitted patch for aarch64 regressions.
I received its report after sending my "clear by pieces" patch; it listed
the following regressions:

FAIL: gcc.target/aarch64/auto-init-padding-11.c scan-assembler stp\txzr, xzr,
FAIL: gcc.target/aarch64/auto-init-padding-5.c scan-assembler-times stp\txzr, xzr, 2
FAIL: gcc.target/aarch64/memset-corner-cases.c check-function-bodies set0scalar
FAIL: gcc.target/aarch64/memset-q-reg.c check-function-bodies set128bitszero
FAIL: gcc.target/aarch64/memset-q-reg.c check-function-bodies set256bitszero

I then wrote this patch and made sure it fixes all of these regressions.

Thanks
Gui Haochen
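
For readers unfamiliar with these tests, here is a minimal sketch of the kind of
code they exercise: small fixed-size blocks cleared to zero. The function names
clear16 and clear32 are hypothetical and are not taken from the GCC testsuite;
the actual test sources may differ. Whether a given compiler emits
"stp xzr, xzr" for them depends on the GCC version and options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: clear a 16-byte block.  On aarch64 this is a
   candidate for a single store-pair of the zero register.  */
void clear16 (unsigned char *p)
{
  assert (p != NULL);
  memset (p, 0, 16);
}

/* Hypothetical helper: clear a 32-byte block.  On aarch64 this is a
   candidate for two zero-register store-pairs or for vector stores.  */
void clear32 (unsigned char *p)
{
  assert (p != NULL);
  memset (p, 0, 32);
}
```

Compiling such functions with -O2 -S for an aarch64 target and reading the
assembly is one way to see which form the expander chose.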


Re: [x86_64 PATCH] Support memory destinations and wide immediate constants in STV.

2024-08-06 Thread Uros Bizjak
On Mon, Aug 5, 2024 at 5:50 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> Very many thanks for the quick review and approval.  Here's another.
>
> This patch implements two improvements/refinements to the i386 backend's
> Scalar-To-Vector (STV) pass.  The first is to support memory destinations
> in binary logic operations, and the second is to provide more accurate
> costs/gains for (wide) immediate constants in binary logic operations.

Please do not mix together changes made for different reasons, as
advised in "Contributing to GCC" [1], section "Submitting Patches".

[1] https://gcc.gnu.org/contribute.html

Uros.

>
> A motivating example is gcc.target/i386/movti-2.c:
>
> __int128 m;
> void foo()
> {
> m &= ((__int128)0x0123456789abcdefULL<<64) | 0x0123456789abcdefULL;
> }
>
> for which STV1 currently generates a warning/error:
> > r100 has non convertible use in insn 6
>
> (insn 5 2 6 2 (set (reg:TI 100)
>         (const_wide_int 0x123456789abcdef0123456789abcdef)) "movti-2.c":7:7 87 {*movti_internal}
>      (nil))
> (insn 6 5 0 2 (parallel [
>             (set (mem/c:TI (symbol_ref:DI ("m") [flags 0x2]  <var_decl 0x7f36d1c27c60 m>) [1 m+0 S16 A128])
>                 (and:TI (mem/c:TI (symbol_ref:DI ("m") [flags 0x2]  <var_decl 0x7f36d1c27c60 m>) [1 m+0 S16 A128])
>                     (reg:TI 100)))
>             (clobber (reg:CC 17 flags))
>         ]) "movti-2.c":7:7 645 {*andti3_doubleword}
>      (expr_list:REG_DEAD (reg:TI 100)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
>
> and therefore generates the following scalar code with -O2 -mavx
>
> foo:    movabsq $81985529216486895, %rax
>         andq    %rax, m(%rip)
>         andq    %rax, m+8(%rip)
>         ret
>
> with this patch we now support read-modify-write instructions (as STV
> candidates), splitting them into explicit read-modify instructions
> followed by an explicit write instruction.  Hence, we now produce
> (when not optimizing for size):
>
> foo:    movabsq $81985529216486895, %rax
>         vmovq   %rax, %xmm0
>         vpunpcklqdq     %xmm0, %xmm0, %xmm0
>         vpand   m(%rip), %xmm0, %xmm0
>         vmovdqa %xmm0, m(%rip)
>         ret
>
> This code also handles the const_wide_int in example above, correcting
> the costs/gains when the hi/lo words are the same.  One minor complication
> is that the middle-end assumes (when generating memset) that SSE constants
> will be shared/amortized across multiple consecutive writes.  Hence to
> avoid testsuite regressions, we add a heuristic that considers an immediate
> constant to be very cheap, if that same immediate value occurs in the
> previous instruction or in the instruction after.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-08-05  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.cc (timode_immed_const_gain): New
> function to determine the gain/cost on a CONST_WIDE_INT.
> (local_duplicate_constant_p): Helper function to see if the
> same immediate constant appears in the previous or next insn.
> (timode_scalar_chain::compute_convert_gain): Fix whitespace.
> : Provide more accurate estimates using
> timode_immed_const_gain and local_duplicate_constant_p.
> : Handle MEM_P (dst) and CONSTANT_SCALAR_INT_P (src).
> (timode_scalar_to_vector_candidate_p): Support the first operand
> of AND, IOR and XOR being MEM_P (i.e. a read-modify-write insn).
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/movti-2.c: Change dg-options to -Os.
> * gcc.target/i386/movti-4.c: Expected output of original movti-2.c.
>
>
> Thanks again,
> Roger
> --
>
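
The hi/lo-word observation in the quoted description can be illustrated with a
small sketch. It models a 128-bit value as two 64-bit halves and shows that,
when both halves of the constant are equal, ANDing with one duplicated 64-bit
immediate gives the same result as the two separate scalar ANDs. The type
u128_t and the function names are hypothetical; this is not code from
i386-features.cc and it says nothing about the actual cost model.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value modelled as two 64-bit halves (low word first).  */
typedef struct { uint64_t lo, hi; } u128_t;

/* True when the high and low words are identical, so a single 64-bit
   immediate can be materialised once and reused for both halves.  */
static int halves_equal (u128_t c)
{
  return c.lo == c.hi;
}

/* Scalar form: one 64-bit AND per half, like the two andq instructions.  */
static u128_t and_scalar (u128_t m, u128_t c)
{
  u128_t r = { m.lo & c.lo, m.hi & c.hi };
  return r;
}

/* Vector-style form: duplicate one 64-bit value into both halves (in the
   spirit of vmovq + vpunpcklqdq), then AND the full 128 bits.  */
static u128_t and_broadcast (u128_t m, uint64_t imm)
{
  u128_t c = { imm, imm };
  assert (halves_equal (c));
  return and_scalar (m, c);
}
```

With m = { ~0, 0xf0f0f0f0f0f0f0f0 } and the immediate 0x0123456789abcdef, both
forms produce the same pair of words.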


[PATCH] wide-int: Fix up mul_internal overflow checking [PR116224]

2024-08-06 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled, because wi::mul for (_BitInt(65))-15
times (_BitInt(65))-15 computes the right value (_BitInt(65))225, but
sets *overflow to wi::OVF_UNKNOWN, as if it overflowed when it didn't.

Even signed operands are unpacked as unsigned but because they are
implicitly sign-extended from the represented value (the operands
obviously have len==1), we get
0xfffffff1, 0xffffffff, 0x1, 0x0
in both u and v (0x1 because that is exactly 65 bits).
We then multiply these.  In the next step, because both the high and
overflow handling expect the high half to start at a limb boundary,
the bits of the result starting with bit 65 are shifted up by 63, so
that the bits relevant for high/need_overflow start at the half of the
4th half wide int limb.
Because both operands are negative that part is then adjusted.

The reason mul_internal says there is overflow is because of the unspecified
garbage in the most significant bits of the result which the adjusting
doesn't clean up.  65 bit multiplication needs 65 bits of result and 65 bits
of the high part, can't produce more, so the following patch fixes it by
checking for the overflow only in those first 65 bits of the high part, not
anything beyond that.  If it was a highpart multiply, we'd have ignored that
as well (canonized).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-08-06  Jakub Jelinek  

PR tree-optimization/116224
* wide-int.cc (wi::mul_internal): If prec isn't multiple of
HOST_BITS_PER_WIDE_INT, for need_overflow checking only look at
the least significant prec bits starting with r[half_blocks_needed].

* gcc.dg/torture/bitint-72.c: New test.

--- gcc/wide-int.cc.jj  2024-02-24 12:45:28.701248718 +0100
+++ gcc/wide-int.cc 2024-08-05 21:59:06.254441894 +0200
@@ -1574,7 +1574,24 @@ wi::mul_internal (HOST_WIDE_INT *val, co
           top &= mask;
         }
 
-      for (i = half_blocks_needed; i < half_blocks_needed * 2; i++)
+      unsigned int end = half_blocks_needed * 2;
+      shift = prec % HOST_BITS_PER_WIDE_INT;
+      if (shift)
+        {
+          /* For overflow checking only look at the first prec bits
+             starting with r[half_blocks_needed].  */
+          if (shift <= HOST_BITS_PER_HALF_WIDE_INT)
+            --end;
+          shift %= HOST_BITS_PER_HALF_WIDE_INT;
+          if (shift)
+            {
+              if (top)
+                r[end - 1] |= ((~(unsigned HOST_HALF_WIDE_INT) 0) << shift);
+              else
+                r[end - 1] &= (((unsigned HOST_HALF_WIDE_INT) 1) << shift) - 1;
+            }
+        }
+      for (i = half_blocks_needed; i < end; i++)
         if (((HOST_WIDE_INT)(r[i] & mask)) != top)
           /* FIXME: Signed overflow type is not implemented yet.  */
           *overflow = (sgn == UNSIGNED) ? wi::OVF_OVERFLOW : wi::OVF_UNKNOWN;
--- gcc/testsuite/gcc.dg/torture/bitint-72.c.jj 2024-08-05 15:28:34.924687922 +0200
+++ gcc/testsuite/gcc.dg/torture/bitint-72.c    2024-08-05 15:28:26.049805114 +0200
@@ -0,0 +1,28 @@
+/* PR tree-optimization/116224 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 65
+#define N 65
+#else
+#define N 63
+#endif
+
+signed char g;
+
+int
+foo (signed char c, int i, _BitInt(N) b)
+{
+  __builtin_memmove (&g, &b, 1);
+  return b / i / c;
+}
+
+int
+main ()
+{
+  int x = foo (-15, -15, 900);
+  if (x != 4)
+    __builtin_abort ();
+}

Jakub
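
The limb layout and the masking step described above can be sketched in plain
C. This is only an illustration under stated assumptions: half limbs are taken
to be 32 bits wide, and the names unpack65 and fill_above are hypothetical, not
functions from wide-int.cc. The first helper shows how a sign-extended 65-bit
value splits into four half limbs; the second mirrors the idea of forcing the
bits at or above `shift` in the last inspected half limb to match the expected
sign, so that stale bits beyond the precision cannot look like overflow.

```c
#include <assert.h>
#include <stdint.h>

#define HALF_BITS 32u   /* assumed width of one half limb */

/* Split a value that fits in int64_t, viewed as a sign-extended 65-bit
   number, into four 32-bit half limbs (least significant first).  Only
   bit 64 lives in the third half limb, hence the single sign bit.  */
static void unpack65 (int64_t v, uint32_t out[4])
{
  uint64_t u = (uint64_t) v;
  out[0] = (uint32_t) u;
  out[1] = (uint32_t) (u >> HALF_BITS);
  out[2] = v < 0 ? 1u : 0u;
  out[3] = 0u;
}

/* Force every bit at position >= shift to all ones (negative) or all
   zeros (non-negative), keeping the low `shift` bits of the limb.  */
static uint32_t fill_above (uint32_t limb, unsigned shift, int negative)
{
  assert (shift > 0 && shift < HALF_BITS);
  if (negative)
    return limb | (~(uint32_t) 0 << shift);
  return limb & (((uint32_t) 1 << shift) - 1);
}
```

For -15 the first helper yields the four half limbs listed in the description,
and fill_above with shift 1 shows how only one meaningful bit survives in a
half limb that holds bit 64.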



Re: [PATCH] wide-int: Fix up mul_internal overflow checking [PR116224]

2024-08-06 Thread Richard Biener
On Tue, 6 Aug 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled, because wi::mul for (_BitInt(65))-15
> times (_BitInt(65))-15 computes the right value (_BitInt(65))225, but
> sets *overflow to wi::OVF_UNKNOWN as that it overflowed when it didn't.
> 
> Even signed operands are unpacked as unsigned but because they are
> implicitly sign-extended from the represented value (the operands
> obviously have len==1), we get
> 0xfffffff1, 0xffffffff, 0x1, 0x0
> in both u and v (0x1 because that is exactly 65 bits).
> We then multiply these.  Next step is because both the high and
> overflow handling expects the high half to start at a limb boundary
> the bits of the result starting with bit 65 are shifted up by 63 such
> that the bits relevant for high/need_overflow start at the half of the
> 4th half wide int limb.
> Because both operands are negative that part is then adjusted.
> 
> The reason mul_internal says there is overflow is because of the unspecified
> garbage in the most significant bits of the result which the adjusting
> doesn't clean up.  65 bit multiplication needs 65 bits of result and 65 bits
> of the high part, can't produce more, so the following patch fixes it by
> checking for the overflow only in those first 65 bits of the high part, not
> anything beyond that.  If it was a highpart multiply, we'd have ignored that
> as well (canonized).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM.

Richard.


-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] wide-int: Fix up mul_internal overflow checking [PR116224]

2024-08-06 Thread Sam James
Jakub Jelinek  writes:

> Hi!
>
> The following testcase is miscompiled, because wi::mul for (_BitInt(65))-15
> times (_BitInt(65))-15 computes the right value (_BitInt(65))225, but
> sets *overflow to wi::OVF_UNKNOWN as that it overflowed when it didn't.
>
> Even signed operands are unpacked as unsigned but because they are
> implicitly sign-extended from the represented value (the operands
> obviously have len==1), we get
> 0xfffffff1, 0xffffffff, 0x1, 0x0
> in both u and v (0x1 because that is exactly 65 bits).
> We then multiply these.  Next step is because both the high and
> overflow handling expects the high half to start at a limb boundary
> the bits of the result starting with bit 65 are shifted up by 63 such
> that the bits relevant for high/need_overflow start at the half of the
> 4th half wide int limb.
> Because both operands are negative that part is then adjusted.
>
> The reason mul_internal says there is overflow is because of the unspecified
> garbage in the most significant bits of the result which the adjusting
> doesn't clean up.  65 bit multiplication needs 65 bits of result and 65 bits
> of the high part, can't produce more, so the following patch fixes it by
> checking for the overflow only in those first 65 bits of the high part, not
> anything beyond that.  If it was a highpart multiply, we'd have ignored that
> as well (canonized).

Nit: canonicalized. to canonize is to become a saint :)

thanks,
sam


[COMMITTED 3/9] ada: Use fully qualified in the runtime library

2024-08-06 Thread Marc Poulhiès
From: Viljar Indus 

gcc/ada/

* libgnarl/s-taprop__mingw.adb: Use fully qualified names
to avoid ambiguity.
* libgnarl/s-taprop__posix.adb: Likewise.
* libgnarl/s-taprop__qnx.adb: Likewise.
* libgnarl/s-taprop__rtems.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-taprop__mingw.adb |  2 +-
 gcc/ada/libgnarl/s-taprop__posix.adb |  2 +-
 gcc/ada/libgnarl/s-taprop__qnx.adb   | 16 
 gcc/ada/libgnarl/s-taprop__rtems.adb |  4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/libgnarl/s-taprop__mingw.adb b/gcc/ada/libgnarl/s-taprop__mingw.adb
index f77d71970b8..8c7f59f1c5d 100644
--- a/gcc/ada/libgnarl/s-taprop__mingw.adb
+++ b/gcc/ada/libgnarl/s-taprop__mingw.adb
@@ -1035,7 +1035,7 @@ package body System.Task_Primitives.Operations is
---
 
function RT_Resolution return Duration is
-  Ticks_Per_Second : aliased LARGE_INTEGER;
+  Ticks_Per_Second : aliased System.OS_Interface.LARGE_INTEGER;
begin
   QueryPerformanceFrequency (Ticks_Per_Second'Access);
   return Duration (1.0 / Ticks_Per_Second);
diff --git a/gcc/ada/libgnarl/s-taprop__posix.adb b/gcc/ada/libgnarl/s-taprop__posix.adb
index fb70aaf4976..3d76679ad4a 100644
--- a/gcc/ada/libgnarl/s-taprop__posix.adb
+++ b/gcc/ada/libgnarl/s-taprop__posix.adb
@@ -209,7 +209,7 @@ package body System.Task_Primitives.Operations is
  new Ada.Unchecked_Conversion (Task_Id, System.Address);
 
function GNAT_pthread_condattr_setup
- (attr : access pthread_condattr_t) return int;
+ (attr : access pthread_condattr_t) return Interfaces.C.int;
pragma Import (C,
  GNAT_pthread_condattr_setup, "__gnat_pthread_condattr_setup");
 
diff --git a/gcc/ada/libgnarl/s-taprop__qnx.adb b/gcc/ada/libgnarl/s-taprop__qnx.adb
index f475c05c562..39e6983f438 100644
--- a/gcc/ada/libgnarl/s-taprop__qnx.adb
+++ b/gcc/ada/libgnarl/s-taprop__qnx.adb
@@ -119,7 +119,7 @@ package body System.Task_Primitives.Operations is
 
function Initialize_Lock
  (L: not null access RTS_Lock;
-  Prio : Any_Priority) return int;
+  Prio : Any_Priority) return Interfaces.C.int;
--  Initialize the lock L. If Ceiling_Support is True, then set the ceiling
--  to Prio. Returns 0 for success, or ENOMEM for out-of-memory.
 
@@ -220,7 +220,7 @@ package body System.Task_Primitives.Operations is
  new Ada.Unchecked_Conversion (Task_Id, System.Address);
 
function GNAT_pthread_condattr_setup
- (attr : access pthread_condattr_t) return int;
+ (attr : access pthread_condattr_t) return Interfaces.C.int;
pragma Import (C,
  GNAT_pthread_condattr_setup, "__gnat_pthread_condattr_setup");
 
@@ -333,11 +333,11 @@ package body System.Task_Primitives.Operations is
 
function Initialize_Lock
  (L: not null access RTS_Lock;
-  Prio : Any_Priority) return int
+  Prio : Any_Priority) return Interfaces.C.int
is
   Attributes : aliased pthread_mutexattr_t;
-  Result : int;
-  Result_2   : aliased int;
+  Result : Interfaces.C.int;
+  Result_2   : aliased Interfaces.C.int;
 
begin
   Result := pthread_mutexattr_init (Attributes'Access);
@@ -425,9 +425,9 @@ package body System.Task_Primitives.Operations is
  (L : not null access Lock; Ceiling_Violation : out Boolean)
is
   Self: constant pthread_t := pthread_self;
-  Result  : int;
-  Policy  : aliased int;
-  Ceiling : aliased int;
+  Result  : Interfaces.C.int;
+  Policy  : aliased Interfaces.C.int;
+  Ceiling : aliased Interfaces.C.int;
   Sched   : aliased struct_sched_param;
 
begin
diff --git a/gcc/ada/libgnarl/s-taprop__rtems.adb b/gcc/ada/libgnarl/s-taprop__rtems.adb
index ea8422cb454..0a33c194ec1 100644
--- a/gcc/ada/libgnarl/s-taprop__rtems.adb
+++ b/gcc/ada/libgnarl/s-taprop__rtems.adb
@@ -200,7 +200,7 @@ package body System.Task_Primitives.Operations is
  new Ada.Unchecked_Conversion (Task_Id, System.Address);
 
function GNAT_pthread_condattr_setup
- (attr : access pthread_condattr_t) return int;
+ (attr : access pthread_condattr_t) return Interfaces.C.int;
pragma Import (C,
  GNAT_pthread_condattr_setup, "__gnat_pthread_condattr_setup");
 
@@ -304,7 +304,7 @@ package body System.Task_Primitives.Operations is
  Res :=
mprotect
  (Stack_Base - (Stack_Base mod Page_Size) + Page_Size,
-  size_t (Page_Size),
+  Interfaces.C.size_t (Page_Size),
   prot => (if On then PROT_ON else PROT_OFF));
  pragma Assert (Res = 0);
   end if;
-- 
2.45.2



[COMMITTED 2/9] ada: Fix propagation of SPARK_Mode for renaming-as-body

2024-08-06 Thread Marc Poulhiès
From: Yannick Moy 

The value of SPARK_Mode associated with a renaming-as-body might
not be the correct one, when the private part of the package containing
the declaration has SPARK_Mode Off while the public part has SPARK_Mode
On. This may lead to analysis of code by GNATprove that should not be
analyzed.

gcc/ada/

* freeze.adb (Build_Renamed_Body): Propagate SPARK_Pragma to body
built from renaming, so that locally relevant value is taken into
account.
* sem_ch6.adb (Analyze_Expression_Function): Propagate
SPARK_Pragma to body built from expression function, so that
locally relevant value is taken into account.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb  | 7 +++
 gcc/ada/sem_ch6.adb | 9 +
 2 files changed, 16 insertions(+)

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index c8d20d020c7..a947018052c 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -586,6 +586,13 @@ package body Freeze is
  Next (Param_Spec);
   end loop;
 
+  --  Copy SPARK pragma from renaming declaration
+
+  Set_SPARK_Pragma
+(Defining_Unit_Name (Spec), SPARK_Pragma (New_S));
+  Set_SPARK_Pragma_Inherited
+(Defining_Unit_Name (Spec), SPARK_Pragma_Inherited (New_S));
+
   --  In GNATprove, prefer to generate an expression function whenever
   --  possible, to benefit from the more precise analysis in that case
   --  (as if an implicit postcondition had been generated).
diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index 0988fad97e8..d3912ffc9d5 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -333,6 +333,15 @@ package body Sem_Ch6 is
   New_Spec := Copy_Subprogram_Spec (Spec);
   Prev := Current_Entity_In_Scope (Defining_Entity (Spec));
 
+  --  Copy SPARK pragma from expression function
+
+  Set_SPARK_Pragma
+(Defining_Unit_Name (New_Spec),
+ SPARK_Pragma (Defining_Unit_Name (Spec)));
+  Set_SPARK_Pragma_Inherited
+(Defining_Unit_Name (New_Spec),
+ SPARK_Pragma_Inherited (Defining_Unit_Name (Spec)));
+
   --  If there are previous overloadable entities with the same name,
   --  check whether any of them is completed by the expression function.
   --  In a generic context a formal subprogram has no completion.
-- 
2.45.2



[COMMITTED 1/9] ada: Reject use-clause conflicts in the run-time library

2024-08-06 Thread Marc Poulhiès
From: Bob Duff 

This patch fixes a bug where GNAT would fail to detect certain
errors when compiling the run-time library.  In particular, if
two overloaded homographs are both directly visible, it would
pick one, rather than complaining about the ambiguity.

The problem was that some special-purpose code in Sem_Ch8 was trying
to make a user name take precedence over some run-time library
declaration that (incorrectly) appears to be visible because of
rtsfind. The solution is to disable that code while compiling
the run-time library itself.

In addition, we fix the newly-found errors in the run-time library.

gcc/ada/

* sem_ch8.adb (Find_Direct_Name): Disable the special-purpose code
when we are actually compiling the run-time library itself.
* libgnarl/a-exetim__posix.adb: Fix newly-found use-clause
conflicts.
* libgnat/a-direct.adb: Likewise.
* libgnat/a-nbnbin.adb: Likewise.
* libgnat/a-timoio__128.adb: Likewise.
* libgnat/a-timoio.adb: Likewise.
* libgnat/a-wtmoio__128.adb: Likewise.
* libgnat/a-wtmoio.adb: Likewise.
* libgnat/a-ztmoio__128.adb: Likewise.
* libgnat/a-ztmoio.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/a-exetim__posix.adb | 4 ++--
 gcc/ada/libgnat/a-direct.adb | 4 ++--
 gcc/ada/libgnat/a-nbnbin.adb | 3 ++-
 gcc/ada/libgnat/a-timoio.adb | 5 +
 gcc/ada/libgnat/a-timoio__128.adb| 8 
 gcc/ada/libgnat/a-wtmoio.adb | 5 +
 gcc/ada/libgnat/a-wtmoio__128.adb| 8 
 gcc/ada/libgnat/a-ztmoio.adb | 5 +
 gcc/ada/libgnat/a-ztmoio__128.adb| 8 
 gcc/ada/sem_ch8.adb  | 4 +++-
 10 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnarl/a-exetim__posix.adb b/gcc/ada/libgnarl/a-exetim__posix.adb
index 05c55c567fa..6f3eecb2fe6 100644
--- a/gcc/ada/libgnarl/a-exetim__posix.adb
+++ b/gcc/ada/libgnarl/a-exetim__posix.adb
@@ -113,14 +113,14 @@ package body Ada.Execution_Time is
   function clock_gettime
 (clock_id : Interfaces.C.int;
  tp   : access timespec)
- return int;
+ return Interfaces.C.int;
   pragma Import (C, clock_gettime, "clock_gettime");
   --  Function from the POSIX.1b Realtime Extensions library
 
   function pthread_getcpuclockid
 (tid   : Thread_Id;
  clock_id  : access Interfaces.C.int)
- return int;
+ return Interfaces.C.int;
   pragma Import (C, pthread_getcpuclockid, "pthread_getcpuclockid");
   --  Function from the Thread CPU-Time Clocks option
 
diff --git a/gcc/ada/libgnat/a-direct.adb b/gcc/ada/libgnat/a-direct.adb
index adff12277e8..fbf249cd35e 100644
--- a/gcc/ada/libgnat/a-direct.adb
+++ b/gcc/ada/libgnat/a-direct.adb
@@ -1292,7 +1292,7 @@ package body Ada.Directories is
   Dir_Pointer  : Dir_Type_Value;
   File_Name_Addr   : Address;
   File_Name_Len: aliased Integer;
-  Pattern_Regex: Regexp;
+  Pattern_Regex: System.Regexp.Regexp;
 
   Call_Result  : Integer;
   pragma Warnings (Off, Call_Result);
@@ -1377,7 +1377,7 @@ package body Ada.Directories is
  Compose (Directory, File_Name) & ASCII.NUL;
   Path   : String renames
  Path_C (Path_C'First .. Path_C'Last - 1);
-  Attr   : aliased File_Attributes;
+  Attr   : aliased System.File_Attributes.File_Attributes;
   Exists : Integer;
   Error  : Integer;
 
diff --git a/gcc/ada/libgnat/a-nbnbin.adb b/gcc/ada/libgnat/a-nbnbin.adb
index 91074cfbc5c..2d140a49e53 100644
--- a/gcc/ada/libgnat/a-nbnbin.adb
+++ b/gcc/ada/libgnat/a-nbnbin.adb
@@ -69,7 +69,8 @@ package body Ada.Numerics.Big_Numbers.Big_Integers is
package Bignums is new System.Generic_Bignums
  (Bignum, Allocate_Bignum, Free_Bignum, To_Bignum);
 
-   use Bignums, System;
+   use System, Bignums;
+   subtype Bignum is Bignums.Bignum;
 
function Get_Bignum (Arg : Big_Integer) return Bignum is
  (if Arg.Value.C = System.Null_Address
diff --git a/gcc/ada/libgnat/a-timoio.adb b/gcc/ada/libgnat/a-timoio.adb
index 65222c1ea0d..eec92e3959a 100644
--- a/gcc/ada/libgnat/a-timoio.adb
+++ b/gcc/ada/libgnat/a-timoio.adb
@@ -36,11 +36,14 @@ with System.Img_LLB; use System.Img_LLB;
 with System.Img_LLU; use System.Img_LLU;
 with System.Img_LLW; use System.Img_LLW;
 with System.Img_WIU; use System.Img_WIU;
+with System.Unsigned_Types;
 with System.Val_Uns; use System.Val_Uns;
 with System.Val_LLU; use System.Val_LLU;
 
 package body Ada.Text_IO.Modular_IO is
 
+   subtype Unsigned is System.Unsigned_Types.Unsigned;
+
package Aux_Uns is new
  Ada.Text_IO.Integer_Aux
(Unsigned,
@@ -49,6 +52,8 @@ package body Ada.Text_IO.Modular_IO is
 Set_Image_Width_Unsigned,
 Set_Image_Based_Unsigned);
 
+   subtype Lo

[COMMITTED 7/9] ada: GNAT-LLVM compiler crash on container aggregates with iterators

2024-08-06 Thread Marc Poulhiès
From: Gary Dismukes 

Recent fixes for container aggregates with iterated element associations
exposed a latent bug with loops that are wrapped in blocks, where the loop
entity's scope was not adjusted to reflect the new enclosing block scope.

gcc/ada/

* sem_ch5.adb (Analyze_Loop_Statement.Wrap_Loop_Statement): Remove
the loop Entity_Id from its old scope and insert it in the new
block scope that wraps it.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch5.adb | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
index d44a12d1dd1..30fee6e6500 100644
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -3800,8 +3800,9 @@ package body Sem_Ch5 is
  procedure Wrap_Loop_Statement (Manage_Sec_Stack : Boolean) is
 Loc : constant Source_Ptr := Sloc (N);
 
-Blk: Node_Id;
-Blk_Id : Entity_Id;
+Blk : Node_Id;
+Blk_Id  : Entity_Id;
+Loop_Id : constant Entity_Id := Entity (Identifier (N));
 
  begin
 Blk :=
@@ -3816,6 +3817,12 @@ package body Sem_Ch5 is
 
 Rewrite (N, Blk);
 Analyze (N);
+
+--  Transfer the loop entity from its old scope to the new block
+--  scope.
+
+Remove_Entity (Loop_Id);
+Append_Entity (Loop_Id, Blk_Id);
  end Wrap_Loop_Statement;
 
  --  Local variables
-- 
2.45.2



[COMMITTED 9/9] ada: Fix error in GNATprove inlining with array concatenation

2024-08-06 Thread Marc Poulhiès
From: Yannick Moy 

Wrong interpretation of the type of the concatenation can lead to a
spurious error in GNATprove when inlining code. Now fixed.

gcc/ada/

* sem_ch4.adb (Analyze_Concatenation_Rest): Do not add a wrong
interpretation of the concatenation, using the type of the operand
already recognized as of the element type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch4.adb | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index fc3a2a43c3c..9b77a81e43e 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -1995,6 +1995,7 @@ package body Sem_Ch4 is
   (Root_Type (LT) = Standard_String
  or else Scope (LT) /= Standard_Standard)
   and then Etype (R) = Any_String
+  and then not Is_Component_Left_Opnd (N)
 then
Add_One_Interp (N, Op_Id, LT);
 
@@ -2002,6 +2003,7 @@ package body Sem_Ch4 is
   (Root_Type (RT) = Standard_String
  or else Scope (RT) /= Standard_Standard)
   and then Etype (L) = Any_String
+  and then not Is_Component_Right_Opnd (N)
 then
Add_One_Interp (N, Op_Id, RT);
 
-- 
2.45.2



[COMMITTED 6/9] ada: Spurious error on the default value of a derived scalar type

2024-08-06 Thread Marc Poulhiès
From: Javier Miranda 

When the aspect Default_Value is inherited by a derived scalar
type, and both the parent type T and the derived type DT are
declared in the same scope, a spurious error may be reported.
This occurs if a subprogram declared in the same scope has a
parameter of type DT with a default value, leading the compiler
to incorrectly flag the default value specified in the aspect
of type T as having the wrong type.

gcc/ada/

* freeze.adb (Freeze_Entity): For scalar derived types that
inherit the aspect Default_Value, do not analyze and resolve the
inherited aspect, as the type of the aspect remains the parent
type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index a947018052c..7d5be6b6744 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -7820,7 +7820,24 @@ package body Freeze is
  --  type itself, and we treat Default_Component_Value similarly for
  --  the sake of uniformity).
 
- if Is_First_Subtype (E) and then Has_Default_Aspect (E) then
+ --  But for an inherited Default_Value aspect specification, the type
+ --  of the aspect remains the parent type. RM 3.3.1(11.1), a dynamic
+ --  semantics rule, says "The implicit initial value for a scalar
+ --  subtype that has the Default_Value aspect specified is the value
+ --  of that aspect converted to the nominal subtype". For an inherited
+ --  Default_Value aspect specification, no conversion is evaluated at
+ --  the point of the derived type declaration.
+
+ if Is_First_Subtype (E)
+   and then Has_Default_Aspect (E)
+   and then
+ (not Is_Scalar_Type (E)
+or else
+  not Is_Derived_Type (E)
+or else
+  Default_Aspect_Value (E)
+/= Default_Aspect_Value (Etype (Base_Type (E))))
+ then
 declare
Nam : Name_Id;
Exp : Node_Id;
-- 
2.45.2



[COMMITTED 4/9] ada: Assert failure in repinfo

2024-08-06 Thread Marc Poulhiès
From: Javier Miranda 

Using switch gnatR4, the frontend crashes when generating information
for a private record type.

gcc/ada/

* repinfo.adb (List_Record_Info): Handle private record types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/repinfo.adb | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/repinfo.adb b/gcc/ada/repinfo.adb
index 7dada5358f7..c08a232a3ab 100644
--- a/gcc/ada/repinfo.adb
+++ b/gcc/ada/repinfo.adb
@@ -521,7 +521,11 @@ package body Repinfo is
 
elsif Is_Record_Type (E) then
   if List_Representation_Info >= 1 then
- List_Record_Info (E, Bytes_Big_Endian);
+ if Is_Private_Type (E) then
+List_Record_Info (Full_View (E), Bytes_Big_Endian);
+ else
+List_Record_Info (E, Bytes_Big_Endian);
+ end if;
 
  --  Recurse into entities local to a record type
 
-- 
2.45.2



[r15-2739 Regression] FAIL: gfortran.dg/class_transformational_1.f90 -O3 -g (test for excess errors) on Linux/x86_64

2024-08-06 Thread haochen.jiang
On Linux/x86_64,

4cb07a38233aadb4b389a6e5236c95f52241b6e0 is the first bad commit
commit 4cb07a38233aadb4b389a6e5236c95f52241b6e0
Author: Paul Thomas 
Date:   Tue Aug 6 06:42:27 2024 +0100

Fortran: Fix class transformational intrinsic calls [PR102689]

caused

FAIL: gfortran.dg/class_transformational_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in make_ssa_name_fn, at tree-ssanames.cc:355)
FAIL: gfortran.dg/class_transformational_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/class_transformational_1.f90   -O3 -g  (internal compiler error: in make_ssa_name_fn, at tree-ssanames.cc:355)
FAIL: gfortran.dg/class_transformational_1.f90   -O3 -g  (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2739/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/class_transformational_1.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/class_transformational_1.f90 --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[COMMITTED 8/9] ada: Implement type inference for generic parameters

2024-08-06 Thread Marc Poulhiès
From: Bob Duff 

...based on previous work that added Gen_Assocs_Rec.
Minor cleanup of that previous work.

gcc/ada/

* sem_ch12.adb: Implement type inference for generic parameters.
(Maybe_Infer_One): Forbid inference of anonymous subtypes and
types.
(Inference_Reason): Fix comment.
* debug.adb: Document -gnatd_I switch.
* errout.ads: Document that Empty is not allowed for "&".
* errout.adb (Set_Msg_Insertion_Node): Minor: Do not allow
Error_Msg_Node_1 = Empty for "&". Use "in" instead of multiple
"=". Improve comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/debug.adb|   5 +-
 gcc/ada/errout.adb   |  23 +--
 gcc/ada/errout.ads   |  11 +-
 gcc/ada/sem_ch12.adb | 482 +--
 4 files changed, 485 insertions(+), 36 deletions(-)

diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb
index d2546bec1b5..fcd04dfb93b 100644
--- a/gcc/ada/debug.adb
+++ b/gcc/ada/debug.adb
@@ -173,7 +173,7 @@ package body Debug is
--  d_F  Encode full invocation paths in ALI files
--  d_G
--  d_H
-   --  d_I
+   --  d_I  Note generic formal type inference
--  d_J
--  d_K  (Reserved) Enable reporting a warning on known-problem issues
--  d_L  Output trace information on elaboration checking
@@ -1029,6 +1029,9 @@ package body Debug is
--   an external target, offering additional information to GNATBIND for
--   purposes of error diagnostics.
 
+   --  d_I  Generic formal type inference: print a "note:" message for each
+   --   actual type that is inferred, or could be inferred.
+
--  d_K  (Reserved) Enable reporting a warning on known-problem issues of
--   previous releases. No action performed in the wavefront.
 
diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
index c6534fe2a76..c8d87f0f9bb 100644
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3866,18 +3866,13 @@ package body Errout is

 
procedure Set_Msg_Insertion_Node is
+  pragma Assert (Present (Error_Msg_Node_1));
   K : Node_Kind;
 
begin
-  Suppress_Message :=
-Error_Msg_Node_1 = Error
-  or else Error_Msg_Node_1 = Any_Type;
+  Suppress_Message := Error_Msg_Node_1 in Error | Any_Type;
 
-  if Error_Msg_Node_1 = Empty then
- Set_Msg_Blank_Conditional;
- Set_Msg_Str ("");
-
-  elsif Error_Msg_Node_1 = Error then
+  if Error_Msg_Node_1 = Error then
  Set_Msg_Blank;
  Set_Msg_Str ("<error>");
 
@@ -3898,15 +3893,11 @@ package body Errout is
 
  K := Nkind (Error_Msg_Node_1);
 
- --  If we have operator case, skip quotes since name of operator
- --  itself will supply the required quotations. An operator can be an
- --  applied use in an expression or an explicit operator symbol, or an
- --  identifier whose name indicates it is an operator.
+ --  Skip quotes in the operator case, because the operator will supply
+ --  the required quotes.
 
- if K in N_Op
-   or else K = N_Operator_Symbol
-   or else K = N_Defining_Operator_Symbol
-   or else ((K = N_Identifier or else K = N_Defining_Identifier)
+ if K in N_Op | N_Operator_Symbol | N_Defining_Operator_Symbol
+   or else (K in N_Identifier | N_Defining_Identifier
   and then Is_Operator_Name (Chars (Error_Msg_Node_1)))
  then
 Set_Msg_Node (Error_Msg_Node_1);
diff --git a/gcc/ada/errout.ads b/gcc/ada/errout.ads
index f0e3f5d0b7c..2b0410ae690 100644
--- a/gcc/ada/errout.ads
+++ b/gcc/ada/errout.ads
@@ -173,12 +173,11 @@ package Errout is
--  obtained from the Sloc field of the given node or nodes. If no Sloc
--  is available (happens e.g. for nodes in package Standard), then the
--  default case (see Scans spec) is used. The nodes to be used are
-   --  stored in Error_Msg_Node_1, Error_Msg_Node_2. No insertion occurs
-   --  for the Empty node, and the Error node results in the insertion of
-   --  the characters <error>. In addition, if the special global variable
-   --  Error_Msg_Qual_Level is non-zero, then the reference will include
-   --  up to the given number of levels of qualification, using the scope
-   --  chain.
+   --  stored in Error_Msg_Node_1, Error_Msg_Node_2, which must not be
+   --  Empty. The Error node results in the insertion of "<error>". In
+   --  addition, if the special global variable Error_Msg_Qual_Level is
+   --  non-zero, then the reference will include up to the given number of
+   --  levels of qualification, using the scope chain.
--
--  Note: the special names _xxx (xxx = Pre/Post/Invariant) are changed
--  to insert the string xxx'Class into the message.
diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 25821cb7695..0f8792c3a82 100644
--- a/gcc/ada/sem_ch

[COMMITTED 5/9] ada: Use fully qualified in more library files

2024-08-06 Thread Marc Poulhiès
From: Viljar Indus 

gcc/ada/

* libgnarl/s-interr__hwint.adb: Use fully qualified names to avoid
ambiguity.
* libgnarl/s-taprop__qnx.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-interr__hwint.adb | 11 ++-
 gcc/ada/libgnarl/s-taprop__qnx.adb   |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnarl/s-interr__hwint.adb 
b/gcc/ada/libgnarl/s-interr__hwint.adb
index 12dde452ff4..0cccf6fd294 100644
--- a/gcc/ada/libgnarl/s-interr__hwint.adb
+++ b/gcc/ada/libgnarl/s-interr__hwint.adb
@@ -482,9 +482,10 @@ package body System.Interrupts is
   Handler   : System.OS_Interface.Interrupt_Handler)
is
   Vec : constant Interrupt_Vector :=
-  Interrupt_Number_To_Vector (int (Interrupt));
+  Interrupt_Number_To_Vector
+(Interfaces.C.int (Interrupt));
 
-  Status : int;
+  Status : Interfaces.C.int;
 
begin
   --  Only install umbrella handler when no Ada handler has already been
@@ -613,7 +614,7 @@ package body System.Interrupts is
procedure Notify_Interrupt (Param : System.Address) is
   Interrupt : constant Interrupt_ID := Interrupt_ID (Param);
   Id: constant Binary_Semaphore_Id := Semaphore_ID_Map (Interrupt);
-  Status: int;
+  Status: Interfaces.C.int;
begin
   if Id /= 0 then
  Status := Binary_Semaphore_Release (Id);
@@ -744,7 +745,7 @@ package body System.Interrupts is
   
 
   procedure Unbind_Handler (Interrupt : Interrupt_ID) is
- Status : int;
+ Status : Interfaces.C.int;
 
   begin
  --  Flush server task off semaphore, allowing it to terminate
@@ -1024,7 +1025,7 @@ package body System.Interrupts is
   Tmp_Handler : Parameterless_Handler;
   Tmp_ID  : Task_Id;
   Tmp_Entry_Index : Task_Entry_Index;
-  Status  : int;
+  Status  : Interfaces.C.int;
 
begin
   Semaphore_ID_Map (Interrupt) := Int_Sema;
diff --git a/gcc/ada/libgnarl/s-taprop__qnx.adb 
b/gcc/ada/libgnarl/s-taprop__qnx.adb
index 39e6983f438..d6680b58dba 100644
--- a/gcc/ada/libgnarl/s-taprop__qnx.adb
+++ b/gcc/ada/libgnarl/s-taprop__qnx.adb
@@ -300,7 +300,7 @@ package body System.Task_Primitives.Operations is
  Res :=
mprotect
  (Stack_Base - (Stack_Base mod Page_Size) + Page_Size,
-  size_t (Page_Size),
+  Interfaces.C.size_t (Page_Size),
   prot => (if On then PROT_ON else PROT_OFF));
  pragma Assert (Res = 0);
   end if;
-- 
2.45.2



[PATCH] testsuite: Fix up pr116037.c test [PR116245]

2024-08-06 Thread Jakub Jelinek
Hi!

The test FAILs on big endian targets, because VV is a vector of unsigned 
__int128
and VC vector of unsigned char and so ((VC) vv)[0] is 0x01 on little endian
but 0xff on big endian and PDP endian.
As I believe it is intentional to test it as it is written on little endian,
the following patch just adds another case for big endian and for other
endians instead of figuring out what exactly to fetch it fetches the whole
unsigned __int128 and casts it to unsigned char.  Not that pdp11 has
__int128 support...

Tested on x86_64-linux and powerpc64-linux, ok for trunk?

2024-08-06  Jakub Jelinek  

PR rtl-optimization/116037
PR testsuite/116245
* gcc.dg/torture/pr116037.c (foo): Fix up for big and middle endian.

--- gcc/testsuite/gcc.dg/torture/pr116037.c.jj  2024-07-25 21:34:56.190147936 
+0200
+++ gcc/testsuite/gcc.dg/torture/pr116037.c 2024-08-06 10:58:56.621762156 
+0200
@@ -16,7 +16,13 @@ VL vl;
 VV
 foo (unsigned long long x, VV vv)
 {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   x &= -((VC) vv)[0];
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  x &= -((VC) vv)[sizeof (__int128) - 1];
+#else
+  x &= -(unsigned char) (vv[0]);
+#endif
   vi *= (VI) (VS){ -vs[0], vc[0], vs[1], vi[7], vs[7], vl[7], x, vi[5] };
   return x + vv;
 }

Jakub
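
For illustration, the byte-order dependence described above can be reproduced
with a scalar instead of a vector.  This is only a sketch under assumptions:
the helper names low_byte_via_memory and low_byte_via_cast are made up for
this example and do not appear in the test, and it relies on the predefined
__BYTE_ORDER__ macros in the same way the patch does.

```cpp
#include <cassert>
#include <cstring>

/* Return the least significant byte of V by looking at its in-memory
   representation.  The byte index depends on the target byte order,
   which is why the test needs separate little and big endian cases.  */

unsigned char
low_byte_via_memory (unsigned long long v)
{
  unsigned char bytes[sizeof v];
  std::memcpy (bytes, &v, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return bytes[0];
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  return bytes[sizeof v - 1];
#else
  /* Other byte orders: fall back to a value conversion.  */
  return (unsigned char) v;
#endif
}

/* Return the least significant byte of V by value conversion, which
   does not depend on the byte order at all.  */

unsigned char
low_byte_via_cast (unsigned long long v)
{
  return (unsigned char) v;
}
```

Both helpers should agree on any target; only the memory-based one needs to
know where the low byte lives.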



Re: [PATCH] testsuite: Fix up pr116037.c test [PR116245]

2024-08-06 Thread Richard Biener
On Tue, 6 Aug 2024, Jakub Jelinek wrote:

> Hi!
> 
> The test FAILs on big endian targets, because VV is a vector of unsigned 
> __int128
> and VC vector of unsigned char and so ((VC) vv)[0] is 0x01 on little endian
> but 0xff on big endian and PDP endian.
> As I believe it is intentional to test it as it is written on little endian,
> the following patch just adds another case for big endian and for other
> endians instead of figuring out what exactly to fetch it fetches the whole
> unsigned __int128 and casts it to unsigned char.  Not that pdp11 has
> __int128 support...
> 
> Tested on x86_64-linux and powerpc64-linux, ok for trunk?

OK.

> 2024-08-06  Jakub Jelinek  
> 
>   PR rtl-optimization/116037
>   PR testsuite/116245
>   * gcc.dg/torture/pr116037.c (foo): Fix up for big and middle endian.
> 
> --- gcc/testsuite/gcc.dg/torture/pr116037.c.jj2024-07-25 
> 21:34:56.190147936 +0200
> +++ gcc/testsuite/gcc.dg/torture/pr116037.c   2024-08-06 10:58:56.621762156 
> +0200
> @@ -16,7 +16,13 @@ VL vl;
>  VV
>  foo (unsigned long long x, VV vv)
>  {
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>x &= -((VC) vv)[0];
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +  x &= -((VC) vv)[sizeof (__int128) - 1];
> +#else
> +  x &= -(unsigned char) (vv[0]);
> +#endif
>vi *= (VI) (VS){ -vs[0], vc[0], vs[1], vi[7], vs[7], vl[7], x, vi[5] };
>return x + vv;
>  }
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V: Minimal support for Zimop extension.

2024-08-06 Thread Nick Clifton

Hi Jeff,


2.43 was released over the weekend.  Is it possible to let it be supported after 
2.44? cc Nick and jan.

I don't think it's critical enough to backport to 2.43.  I'd just put it on the 
trunk so that it's available in 2.44.


It might be worth adding it to the 2.43 branch as well.  It is looking
like there will be need to create a point release this time as several
other last-minute problems have been uncovered and fixed just too late
to make it into the 2.43 release.

Cheers
  Nick




[PATCH] c++: Improve fixits for incorrect explicit instantiations

2024-08-06 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

When the '<>' is forgotten on an explicit specialisation, the fixit
hint suggests adding 'template <>', but naively applying it causes
nonsense results like 'template template <> struct S {};'.

Instead check if we're currently parsing an explicit instantiation, and
if so inform about the issue (an instantiation cannot have a class body)
and suggest a fixit of simply '<>' to create a specialisation instead.
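
As a reminder of the distinction the new diagnostic relies on, here is a
small sketch.  The member name 'kind' and the argument types are illustrative
only and are not taken from the new test.

```cpp
#include <cassert>

// Primary template.
template <typename T>
struct S
{
  static constexpr int kind = 0;
};

// Explicit specialization: 'template <>' followed by a class body.
template <>
struct S<int>
{
  static constexpr int kind = 1;
};

// Explicit instantiation: 'template' with no '<>' and no class body.
// It instantiates the primary template for 'long'; giving it a body
// is the error the patch diagnoses.
template struct S<long>;
```

Adding a body to the last declaration without also adding '<>' after
'template' is the situation where the fixit should suggest '<>' rather than
'template <>'.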

gcc/cp/ChangeLog:

* parser.cc (cp_parser_class_head): Clarify error message for
explicit instantiations.

gcc/testsuite/ChangeLog:

* g++.dg/template/explicit-instantiation9.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/parser.cc  | 19 ++-
 .../g++.dg/template/explicit-instantiation9.C |  6 ++
 2 files changed, 20 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/explicit-instantiation9.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index eb102dea829..4f2ad8201b7 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -27729,11 +27729,20 @@ cp_parser_class_head (cp_parser* parser,
  class_head_start_location,
  get_finish (type_start_token->location));
   rich_location richloc (line_table, reported_loc);
-  richloc.add_fixit_insert_before (class_head_start_location,
-   "template <> ");
-  error_at (&richloc,
-   "an explicit specialization must be preceded by"
-   " %<template <>%>");
+  if (processing_explicit_instantiation)
+   {
+ richloc.add_fixit_insert_before ("<> ");
+ error_at (&richloc,
+   "an explicit instantiation cannot have a definition;"
+   " use %<<>%> to declare a specialization");
+   }
+  else
+   {
+ richloc.add_fixit_insert_before ("template <> ");
+ error_at (&richloc,
+   "an explicit specialization must be preceded by"
+   " %<template <>%>");
+   }
   invalid_explicit_specialization_p = true;
   /* Take the same action that would have been taken by
 cp_parser_explicit_specialization.  */
diff --git a/gcc/testsuite/g++.dg/template/explicit-instantiation9.C 
b/gcc/testsuite/g++.dg/template/explicit-instantiation9.C
new file mode 100644
index 000..c4400226ef8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/explicit-instantiation9.C
@@ -0,0 +1,6 @@
+// Fixits for specialisations are not valid for instantiations
+
+template 
+struct S {};
+
+template struct S {};  // { dg-error "explicit instantiation cannot have 
a definition" }
-- 
2.43.2



Re: [PATCH v1 1/3] aarch64: store signing key and signing method in DWARF _Unwind_FrameState

2024-08-06 Thread Richard Sandiford
Sorry for the slow review.

Matthieu Longo  writes:
> This patch is only a refactoring of the existing implementation
> of PAuth and returned-address signing. The existing behavior is
> preserved.
>
> _Unwind_FrameState already contains several CIE and FDE information
> (see the attributes below the comment "The information we care
> about from the CIE/FDE" in libgcc/unwind-dw2.h).
> The patch aims at moving the information from DWARF CIE (signing
> key stored in the augmentation string) and FDE (the used signing
> method) into _Unwind_FrameState along the already-stored CIE and
> FDE information.
> Note: those information have to be saved in frame_state_reg_info
> instead of _Unwind_FrameState as they need to be savable by
> DW_CFA_remember_state and restorable by DW_CFA_restore_state, that
> both rely on the attribute "prev".
>
> Those new information in _Unwind_FrameState simplifies the look-up
> of the signing key when the return address is demangled. It also
> allows future signing methods to be easily added.
>
> _Unwind_FrameState is not a part of the public API of libunwind,
> so the change is backward compatible.
>
> A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT
> allows to reset values (if needed) in the frame state and unwind
> context before changing the frame state to the caller context.
>
> A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER
> isolates the architecture-specific augmentation strings in AArch64
> backend, and allows others architectures to reuse augmentation
> strings that would have clashed with AArch64 DWARF extensions.
>
> aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and
> DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h
> were documented to clarify where the value of the RA state register
> is stored (FS and CONTEXT respectively).

The abstraction generally looks good.  My main comment is that,
if we are putting the new behaviour behind target macros (which I
agree is a good idea), could we also have a target macro to abstract:

> +#if defined(__aarch64__) && !defined (__ILP32__)
> +unsigned char signing_key;
> +#endif

?  E.g. something like:

#ifdef MD_CFI_STATE
MD_CFI_STATE md;
#endif

with aarch64 defining:

#define MD_CFI_STATE struct { unsigned char signing_key; }

(Names and organisation are just suggestions.)
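
To make the suggestion concrete, here is a minimal sketch of how such a macro
could embed target state in a generic structure.  Everything except
MD_CFI_STATE and signing_key is hypothetical: frame_state_sketch is a
simplified stand-in, not the real libgcc frame state layout, and
get_signing_key is an invented accessor.

```cpp
#include <cassert>

// Hypothetical target definition, as in the suggestion above.
#define MD_CFI_STATE struct { unsigned char signing_key; }

// Simplified stand-in for the generic per-frame register state.
struct frame_state_sketch
{
  int cfa_offset;   // Placeholder for the target-independent fields.
#ifdef MD_CFI_STATE
  MD_CFI_STATE md;  // Target-specific state, opaque to generic code.
#endif
};

// Invented accessor: only target code needs to look inside 'md'.
unsigned char
get_signing_key (const frame_state_sketch &fs)
{
#ifdef MD_CFI_STATE
  return fs.md.signing_key;
#else
  (void) fs;
  return 0;
#endif
}
```

With this shape the generic header never mentions __aarch64__; targets that
do not define the macro simply get no extra member.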

It might be good to try running the patch through

  contrib/check_GNU_style.py

since the GCC coding standards are quite picky about formatting and
stylistic issues.  The main ones I spotted were:

- In files that already use block comments, all comments should be
  block comments rather than // comments.  The comments should end
  with ".  " (full stop and two spaces).

- Block comments are formatted as:

/* Line 1
   Line 2.  */

  rather than as:

/* Line 1
 * Line 2.  */

- Function names are generally all lowercase (e.g. "ra" rather than "RA").

- In function calls, there should be a space between the function and
  the opening "("

- For pointer types, there should be a space before a "*" (or string of
  "*"s), but no space afterwards.

- "const" qualifiers generally go before the type that they qualify,
  rather than afterwards.

- The line width should be 80 characters or fewer.  (The patch was pretty
  good about this, but there were a couple of long lines).

- In multi-line conditions, the ||s and &&s go at the beginning of lines,
  rather than at the end.  (Same for infix operators in general.)
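
As a rough illustration of several of the points above in one place, here is
a short function written in that style.  The function itself
(has_byte_in_range) is invented for this example and has nothing to do with
the patch.

```cpp
#include <cassert>
#include <cstddef>

/* Return true if BUF is non-null and at least one of its LEN bytes
   lies in the inclusive range [LO, HI].  */

bool
has_byte_in_range (const unsigned char *buf, std::size_t len,
                   unsigned char lo, unsigned char hi)
{
  /* Operators go at the start of continuation lines.  */
  if (buf == NULL
      || len == 0)
    return false;

  for (std::size_t i = 0; i < len; i++)
    if (buf[i] >= lo
        && buf[i] <= hi)
      return true;

  return false;
}
```

Note the block comment ending in ".  */", the lowercase function name, the
space before "(" in the definition, "const" before the type, and "*" attached
to the parameter name.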

More detailed comments below.

>
> libgcc/ChangeLog:
>
>   * config/aarch64/aarch64-unwind.h
>   (AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register.
>   (aarch64_RA_signing_method_t): The diversifiers used to sign a
>   function's return address.
>   (aarch64_pointer_auth_key): The key used to sign a function's
>   return address.
>   (aarch64_cie_signed_with_b_key): Deleted as the signing key is
>   available now in _Unwind_FrameState.
>   (MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string
>   handler for architecture extensions.
>   (MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension
>   initialization routine for DWARF frame state and context before
>   execution of DWARF instructions.
>   (aarch64_context_RA_state_get): Read RA state register from CONTEXT.
>   (aarch64_RA_state_get): Read RA state register from FS.
>   (aarch64_RA_state_set): Write RA state register into FS.
>   (aarch64_RA_state_toggle): Toggle RA state register in FS.
>   (aarch64_cie_aug_handler): Handler AArch64 augmentation strings.
>   (aarch64_arch_extension_frame_init): Initialize defaults for the
>   signing key (PAUTH_KEY_A), and RA state register (RA_no_signing).
>   (aarch64_demangle_return_addr): Rely on the frame registers and
>   the signing_key attribute in _Unwind_FrameState.
>   * unwind-dw2-execute_cfa.h:
>   Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__
>   instead of DW_CFA_GNU_window_save.
>   (DW_CFA_AARCH64_negate_ra_state

Re: [PATCH v1 2/3] libgcc: hide CIE and FDE data for DWARF architecture extensions behind a handler.

2024-08-06 Thread Richard Sandiford
Matthieu Longo  writes:
> This patch provides a new handler MD_ARCH_FRAME_STATE_T to hide an
> architecture-specific structure containing CIE and FDE data related
> to DWARF architecture extensions.
>
> Hiding the architecture-specific attributes behind a handler has the
> following benefits:
> 1. isolating those data from the generic ones in _Unwind_FrameState
> 2. avoiding casts to custom types.
> 3. preserving typing information when debugging with GDB, and so
>facilitating their printing.
>
> This approach required to add a new header md-unwind-def.h included at
> the top of libgcc/unwind-dw2.h, and redirecting to the corresponding
> architecture header via a symbolic link.
>
> An obvious drawback is the increase in complexity with macros, and
> headers. It also caused a split of architecture definitions between
> md-unwind-def.h (types definitions used in unwind-dw2.h) and
> md-unwind.h (local types definitions and handlers implementations).
> The naming of md-unwind.h with .h extension is a bit misleading as
> the file is only included in the middle of unwind-dw2.c. Changing
> this naming would require modification of others backends, which I
> preferred to abstain from. Overall the benefits are worth the added
> complexity from my perspective.

Sorry, I should have read 2/3 before making the suggestion in the
previous review.  I agree that it makes sense to separate this change
out, given that it involves a new header file.

It'd be good to update the comment in no-unwind.h:

/* Dummy header for targets without a definition of
   MD_FALLBACK_FRAME_STATE_FOR.  */

LGTM otherwise, thanks.

On patch 3: IMO it's better to post the regenerated files as part
of the same patch, so that each patch is self-contained.

Richard

> libgcc/ChangeLog:
>
> * Makefile.in: New target for symbolic link to md-unwind-def.h
> * config.host: New parameter md_unwind_def_header. Set it to
> aarch64/aarch64-unwind-def.h for AArch64 targets, or no-unwind.h
> by default.
> * config/aarch64/aarch64-unwind.h
> (aarch64_pointer_auth_key): Move to aarch64-unwind-def.h
> (aarch64_cie_aug_handler): Update.
> (aarch64_arch_extension_frame_init): Update.
> (aarch64_demangle_return_addr): Update.
> * configure.ac: New substitute variable md_unwind_def_header.
> * unwind-dw2.h (defined): MD_ARCH_FRAME_STATE_T.
> * config/aarch64/aarch64-unwind-def.h: New file.
> ---
>  libgcc/Makefile.in |  6 +++-
>  libgcc/config.host | 13 +--
>  libgcc/config/aarch64/aarch64-unwind-def.h | 41 ++
>  libgcc/config/aarch64/aarch64-unwind.h | 14 +++-
>  libgcc/configure.ac|  1 +
>  libgcc/unwind-dw2.h|  6 ++--
>  6 files changed, 67 insertions(+), 14 deletions(-)
>  create mode 100644 libgcc/config/aarch64/aarch64-unwind-def.h
>
> diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> index 0e46e9ef768..ffc45f21267 100644
> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -47,6 +47,7 @@ with_aix_soname = @with_aix_soname@
>  solaris_ld_v2_maps = @solaris_ld_v2_maps@
>  enable_execute_stack = @enable_execute_stack@
>  unwind_header = @unwind_header@
> +md_unwind_def_header = @md_unwind_def_header@
>  md_unwind_header = @md_unwind_header@
>  sfp_machine_header = @sfp_machine_header@
>  thread_header = @thread_header@
> @@ -358,13 +359,16 @@ SHLIBUNWIND_INSTALL =
>  
>  
>  # Create links to files specified in config.host.
> -LIBGCC_LINKS = enable-execute-stack.c unwind.h md-unwind-support.h \
> +LIBGCC_LINKS = enable-execute-stack.c \
> +   unwind.h md-unwind-def.h md-unwind-support.h \
> sfp-machine.h gthr-default.h
>  
>  enable-execute-stack.c: $(srcdir)/$(enable_execute_stack)
>   -$(LN_S) $< $@
>  unwind.h: $(srcdir)/$(unwind_header)
>   -$(LN_S) $< $@
> +md-unwind-def.h: $(srcdir)/config/$(md_unwind_def_header)
> + -$(LN_S) $< $@
>  md-unwind-support.h: $(srcdir)/config/$(md_unwind_header)
>   -$(LN_S) $< $@
>  sfp-machine.h: $(srcdir)/config/$(sfp_machine_header)
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 9fae51d4ce7..61825e72fe4 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -51,8 +51,10 @@
>  #If either is set, EXTRA_PARTS and
>  #EXTRA_MULTILIB_PARTS inherited from the GCC
>  #subdirectory will be ignored.
> -#  md_unwind_header  The name of a header file defining
> -#MD_FALLBACK_FRAME_STATE_FOR.
> +#  md_unwind_def_header The name of a header file defining 
> architecture-specific
> +#frame information types for unwinding.
> +#  md_unwind_header  The name of a header file defining architecture-specific
> +#handlers used in the unwinder.
>  #  sfp_machine_headerThe name of a sfp-machine.h header file for 
> soft-fp.
>  #Defaults to "$cp

Re: [RFC][PATCH] SVE intrinsics: Fold svdiv (svptrue, x, x) to ones

2024-08-06 Thread Kyrylo Tkachov


> On 5 Aug 2024, at 18:00, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>>> On 5 Aug 2024, at 12:01, Richard Sandiford  
>>> wrote:
>>> 
>>> Jennifer Schmitz  writes:
>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>> 1) the predicate is svptrue and
>>>> 2) dividend and divisor are equal.
>>>> This is implemented in the gimple_folder for signed and unsigned
>>>> integers. Corresponding test cases were added to the existing test
>>>> suites.
>>>> 
>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>> regression.
>>>> OK for mainline?
>>>> 
>>>> Please also advise whether it makes sense to implement the same 
>>>> optimization
>>>> for float types and if so, under which conditions?
>>> 
>>> I think we should instead use const_binop to try to fold the division
>>> whenever the predicate is all-true, or if the function uses _x predication.
>>> (As a follow-on, we could handle _z and _m too, using VEC_COND_EXPR.)
>>> 
>> 
>> From what I can see const_binop only works on constant arguments.
> 
> Yeah, it only produces a result for constant arguments.  I see now
> that that isn't the case that the patch is interested in, sorry.
> 
>> Is fold_binary a better interface to use ? I think it’d hook into the 
>> match.pd machinery for divisions at some point.
> 
> We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
> (I realise we'd be using it only to test whether the result is constant,
> but even so.)
> 
> Have you (plural) come across a case where svdiv is used with equal
> non-constant arguments?  If it's just being done on first principles
> then how about starting with const_binop instead?  If possible, it'd be
> good to structure it so that we can reuse the code for svadd, svmul,
> svsub, etc.

We’ve had a bit of internal discussion on this to get our ducks in a row.
We are interested in having more powerful folding of SVE intrinsics generally 
and we’d like some advice on how best to approach this.
Prathamesh suggested adding code to fold intrinsics to standard GIMPLE codes 
where possible when they are _x-predicated or have a ptrue predicate. Hopefully 
that would allow us to get all the match.pd and fold-const.cc optimizations
“for free”.
Would that be a reasonable direction rather than adding custom folding code to 
individual intrinsics such as svdiv?
We’d need to ensure that the midend knows how to expand such GIMPLE codes with 
VLA types and that the required folding rules exist in match.pd (though maybe 
they work already for VLA types?)

Thanks,
Kyrill
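
On the question quoted in this thread about extending the x / x fold to float
types, a scalar sketch may help show why it cannot be unconditional.  This is
plain C++ under IEEE 754 arithmetic, not SVE intrinsics, and the helper name
self_division_is_one is invented for the example.

```cpp
#include <cassert>
#include <limits>

// Return true if X / X evaluates to exactly 1.0.  Under IEEE 754 this
// fails for zeros, infinities and NaNs, where the quotient is NaN.
bool
self_division_is_one (double x)
{
  return x / x == 1.0;
}
```

So a floating-point version of the fold would presumably have to be limited
to cases where such inputs can be ruled out or are allowed to be ignored.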

> 
> Thanks,
> Richard
> 
> 
>> Thanks,
>> Kyrill
>> 
>>> We shouldn't need to vet the arguments, since const_binop does that itself.
>>> Using const_binop should also get the conditions right for floating-point
>>> divisions.
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> 
 
>>>> Signed-off-by: Jennifer Schmitz 
>>>> 
>>>> gcc/
>>>> 
>>>> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>>> Add optimization.
>>>> 
>>>> gcc/testsuite/
>>>> 
>>>> * gcc.target/aarch64/sve/acle/asm/div_s32.c: New test.
>>>> * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>>>> * gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
>>>> * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>> 
>>>> From 43913cfa47b31d055a0456c863a30e3e44acc2f0 Mon Sep 17 00:00:00 2001
>>>> From: Jennifer Schmitz 
>>>> Date: Fri, 2 Aug 2024 06:41:09 -0700
>>>> Subject: [PATCH] SVE intrinsics: Fold svdiv (svptrue, x, x) to ones
>>>> 
>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>> 1) the predicate is svptrue and
>>>> 2) dividend and divisor are equal.
>>>> This is implemented in the gimple_folder for signed and unsigned
>>>> integers. Corresponding test cases were added to the existing test
>>>> suites.
>>>> 
>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>> regression.
>>>> OK for mainline?
>>>> 
>>>> Signed-off-by: Jennifer Schmitz 
>>>> 
>>>> gcc/
>>>> 
>>>> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>>> Add optimization.
>>>> 
>>>> gcc/testsuite/
>>>> 
>>>> * gcc.target/aarch64/sve/acle/asm/div_s32.c: New test.
>>>> * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>>>> * gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
>>>> * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>> ---
>>>> .../aarch64/aarch64-sve-builtins-base.cc  | 19 ++---
>>>> .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 27 +++
>>>> .../gcc.target/aarch64/sve/acle/asm/div_s64.c | 27 +++
>>>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c | 27 +++
>>>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c | 27 +++
>>>> 5 files changed, 124 insert

Re: [PATCH v2] Rearrange SLP nodes with duplicate statements. [PR98138]

2024-08-06 Thread Manolis Tsamis
Pinging this for a review and/or further feedback.

Thanks,
Manolis
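
For readers skimming the thread, the rearrangement described in the quoted
patch can be sketched outside the vectorizer as follows.  This is only an
illustration under assumptions: statements are modelled as plain integers,
merged_operands and merge_operands are invented names, and both operand lists
are assumed to have the same length.  It is not the code in
tree-vect-slp.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct merged_operands
{
  // Every distinct statement exactly once, in first-seen order.
  std::vector<int> unique;
  // perm[i][lane] is the index in 'unique' of operand i's statement
  // for that lane; it plays the role of a vec_perm selector.
  std::vector<std::size_t> perm[2];
};

// Merge two operand lists into one duplicate-free list and record,
// for each operand, the permutation that recreates its original order.
merged_operands
merge_operands (const std::vector<int> &op0, const std::vector<int> &op1)
{
  merged_operands res;
  std::unordered_map<int, std::size_t> seen;
  const std::vector<int> *ops[2] = { &op0, &op1 };

  for (std::size_t lane = 0; lane < op0.size (); ++lane)
    for (int i = 0; i < 2; ++i)
      {
        int stmt = (*ops[i])[lane];
        auto it = seen.find (stmt);
        if (it == seen.end ())
          {
            // First occurrence: give it the next free slot.
            it = seen.emplace (stmt, res.unique.size ()).first;
            res.unique.push_back (stmt);
          }
        res.perm[i].push_back (it->second);
      }

  return res;
}
```

For operands {A, B, A, B} and {C, D, C, D} this yields one node with four
distinct statements plus two index vectors, instead of two nodes that each
contain duplicates.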

On Wed, Jun 26, 2024 at 3:06 PM Manolis Tsamis  wrote:
>
> This change checks when a two_operators SLP node has multiple occurrences of
> the same statement (e.g. {A, B, A, B, ...}) and tries to rearrange the 
> operands
> so that there are no duplicates. Two vec_perm expressions are then introduced
> to recreate the original ordering. These duplicates can appear due to how
> two_operators nodes are handled, and they prevent vectorization in some cases.
>
> This targets the vectorization of the SPEC2017 x264 pixel_satd functions.
> In some processors a larger than 10% improvement on x264 has been observed.
>
> See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> gcc/ChangeLog:
>
> * tree-vect-slp.cc: Avoid duplicates in two_operators nodes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vect-slp-two-operator.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
> Changes in v2:
> - Do not use predefined patterns; support rearrangement of arbitrary
> node orderings.
> - Only apply for two_operators nodes.
> - Recurse with single SLP operand instead of two duplicated ones.
> - Refactoring of code.
>
>  .../aarch64/vect-slp-two-operator.c   |  36 ++
>  gcc/tree-vect-slp.cc  | 114 ++
>  2 files changed, 150 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> new file mode 100644
> index 000..b6b093ffc34
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect 
> -fdump-tree-vect-details" } */
> +
> +typedef unsigned char uint8_t;
> +typedef unsigned int uint32_t;
> +
> +#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
> +int t0 = s0 + s1;\
> +int t1 = s0 - s1;\
> +int t2 = s2 + s3;\
> +int t3 = s2 - s3;\
> +d0 = t0 + t2;\
> +d1 = t1 + t3;\
> +d2 = t0 - t2;\
> +d3 = t1 - t3;\
> +}
> +
> +void sink(uint32_t tmp[4][4]);
> +
> +int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int 
> i_pix2 )
> +{
> +uint32_t tmp[4][4];
> +int sum = 0;
> +for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
> +{
> +uint32_t a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
> +uint32_t a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
> +uint32_t a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
> +uint32_t a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
> +HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 );
> +}
> +sink(tmp);
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index b47b7e8c979..60d0d388dff 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -2420,6 +2420,95 @@ out:
>}
>swap = NULL;
>
> +  bool has_two_operators_perm = false;
> +  auto_vec two_op_perm_indices[2];
> +  vec two_op_scalar_stmts[2] = {vNULL, vNULL};
> +
> +  if (two_operators && oprnds_info.length () == 2 && group_size > 2)
> +{
> +  unsigned idx = 0;
> +  hash_map seen;
> +  vec new_oprnds_info
> +   = vect_create_oprnd_info (1, group_size);
> +  bool success = true;
> +
> +  enum tree_code code = ERROR_MARK;
> +  if (oprnds_info[0]->def_stmts[0]
> + && is_a (oprnds_info[0]->def_stmts[0]->stmt))
> +   code = gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt);
> +
> +  for (unsigned j = 0; j < group_size; ++j)
> +   {
> + FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
> +   {
> + stmt_vec_info stmt_info = oprnd_info->def_stmts[j];
> + if (!stmt_info || !stmt_info->stmt
> + || !is_a (stmt_info->stmt)
> + || gimple_assign_rhs_code (stmt_info->stmt) != code
> + || skip_args[i])
> +   {
> + success = false;
> + break;
> +   }
> +
> + bool exists;
> + unsigned &stmt_idx
> +   = seen.get_or_insert (stmt_info->stmt, &exists);
> +
> + if (!exists)
> +   {
> + new_oprnds_info[0]->def_stmts.safe_push (stmt_info);
> + new_oprnds_info[0]->ops.safe_push (oprnd_info->ops[j]);
> + stmt_idx = idx;
> + idx++;
> +   }
> +
> + two_op_perm_indices[i].safe_push (stmt_idx);
> +   }
> +
> + if (!success)
> +   break;
> +   }
> +
> + 

Re: [RFC][PATCH] SVE intrinsics: Fold svdiv (svptrue, x, x) to ones

2024-08-06 Thread Richard Sandiford
Kyrylo Tkachov  writes:
>> On 5 Aug 2024, at 18:00, Richard Sandiford  wrote:
>> 
>> Kyrylo Tkachov  writes:
>>>> On 5 Aug 2024, at 12:01, Richard Sandiford wrote:
>>>>
>>>> Jennifer Schmitz writes:
>>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>>> 1) the predicate is svptrue and
>>>>> 2) dividend and divisor are equal.
>>>>> This is implemented in the gimple_folder for signed and unsigned
>>>>> integers. Corresponding test cases were added to the existing test
>>>>> suites.
>>>>>
>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>>> regression.
>>>>> OK for mainline?
>>>>>
>>>>> Please also advise whether it makes sense to implement the same
>>>>> optimization for float types and if so, under which conditions?
>>>>
>>>> I think we should instead use const_binop to try to fold the division
>>>> whenever the predicate is all-true, or if the function uses _x predication.
>>>> (As a follow-on, we could handle _z and _m too, using VEC_COND_EXPR.)
>>>>
>>> 
>>> From what I can see const_binop only works on constant arguments.
>> 
>> Yeah, it only produces a result for constant arguments.  I see now
>> that that isn't the case that the patch is interested in, sorry.
>> 
>>> Is fold_binary a better interface to use ? I think it’d hook into the 
>>> match.pd machinery for divisions at some point.
>> 
>> We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
>> (I realise we'd be using it only to test whether the result is constant,
>> but even so.)
>> 
>> Have you (plural) come across a case where svdiv is used with equal
>> non-constant arguments?  If it's just being done on first principles
>> then how about starting with const_binop instead?  If possible, it'd be
>> good to structure it so that we can reuse the code for svadd, svmul,
>> svsub, etc.
>
> We’ve had a bit of internal discussion on this to get our ducks in a row.
> We are interested in having more powerful folding of SVE intrinsics generally 
> and we’d like some advice on how best to approach this.
> Prathamesh suggested adding code to fold intrinsics to standard GIMPLE codes 
> where possible when they are _x-predicated or have a ptrue predicate. 
> Hopefully that would allow us to get all the match.pd and fold-const.cc 
>  optimizations “for free”.
> Would that be a reasonable direction rather than adding custom folding code 
> to individual intrinsics such as svdiv?
> We’d need to ensure that the midend knows how to expand such GIMPLE codes 
> with VLA types and that the required folding rules exist in match.pd (though 
> maybe they work already for VLA types?)

Expansion shouldn't be a problem, since we already rely on that for
autovectorisation.

But I think this comes back to what we discussed earlier, in the context
of whether we should replace divisions by constants with multi-instruction
alternatives.  My comment there was:

  I'm a bit uneasy about going that far.  I suppose it comes down to a
  question about what intrinsics are for.  Are they for describing an
  algorithm, or for hand-optimising a specific implementation of the
  algorithm?  IMO it's the latter.

  If people want to write out a calculation in natural arithmetic, it
  would be better to write the algorithm in scalar code and let the
  vectoriser handle it.  That gives the opportunity for many more
  optimisations than just this one.

  Intrinsics are about giving programmers direct, architecture-level
  control over how something is implemented.  I've seen Arm's library
  teams go to great lengths to work out which out of a choice of
  instruction sequences is the best one, even though the sequences in
  question would look functionally equivalent to a smart-enough compiler.

  So part of the work of using intrinsics is to figure out what the best
  sequence is.  And IMO, part of the contract is that the compiler
  shouldn't interfere with the programmer's choices too much.  If the
  compiler makes a change, it must be very confident that it is a win for
  the function as a whole.

  Replacing one division with one shift is fine, as an aid to the programmer.
  It removes the need for (say) templated functions to check for that case
  manually.  Constant folding is fine too, for similar reasons.  In these
  cases, there's not really a cost/benefit choice to be made between
  different expansions.  One choice is objectively better in all
  realistic situations.
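
A minimal sketch of the "one division, one shift" case, assuming unsigned
operands.  The helper names are hypothetical and are not SVE intrinsics or
GCC functions; the point is only that both forms give the same value, so
there is no cost/benefit trade-off to weigh:

```c
#include <stdint.h>

/* Hypothetical helper: divide X by 2**K with a real division.
   Assumes 0 <= K < 32.  */
uint32_t
div_pow2 (uint32_t x, unsigned k)
{
  return x / ((uint32_t) 1 << k);
}

/* Hypothetical helper: the same value computed with a single shift.
   Assumes unsigned X and 0 <= K < 32.  */
uint32_t
shr_pow2 (uint32_t x, unsigned k)
{
  return x >> k;
}
```

For signed operands the two forms differ on negative inputs (division rounds
towards zero, an arithmetic shift rounds towards minus infinity), so the
sketch deliberately sticks to the unsigned case.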

  But when it comes to general constants, there are many different choices
  that could be made when deciding which constants should be open-coded
  and which shouldn't.  IMO we should leave the choice to the programmer
  in those cases.  If the compiler gets it wrong, there will be no way
  for the programmer to force the compiler's hand ("no, when I say svdiv

[PATCH] tree-optimization/116241 - ICE with SLP condition reduction

2024-08-06 Thread Richard Biener
When there's a conversion in front of a SLP condition reduction the
code following the reduc-idx SLP chain fails because it assumes
there's only COND_EXPRs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116241
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Handle
non-COND_EXPR nodes in SLP reduction chain following.

* g++.dg/vect/pr116241.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr116241.cc | 13 +
 gcc/tree-vect-loop.cc | 15 +--
 2 files changed, 22 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr116241.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr116241.cc 
b/gcc/testsuite/g++.dg/vect/pr116241.cc
new file mode 100644
index 000..7ab1ade2533
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr116241.cc
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+short var_27;
+long test_var_5;
+int test_var_6;
+void test(short arr_11[][4][24])
+{
+  for (bool i_6 = 0;;)
+for (int i_7; i_7;)
+  for (int i_8; i_8 < test_var_5; i_8 += 1)
+var_27 *= test_var_6 && arr_11[2][1][i_8];
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 856ce491c3e..6456220cdc9 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6075,6 +6075,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  while (cond_node != slp_node_instance->reduc_phis)
{
  stmt_vec_info cond_info = SLP_TREE_REPRESENTATIVE (cond_node);
+ int slp_reduc_idx;
  if (gimple_assign_rhs_code (cond_info->stmt) == COND_EXPR)
{
  gimple *vec_stmt
@@ -6083,13 +6084,15 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  ccompares.safe_push
(std::make_pair (gimple_assign_rhs1 (vec_stmt),
 STMT_VINFO_REDUC_IDX (cond_info) == 2));
-   }
- /* ???  We probably want to have REDUC_IDX on the SLP node?
-We have both three and four children COND_EXPR nodes
-dependent on whether the comparison is still embedded
-as GENERIC.  So work backwards.  */
- int slp_reduc_idx = (SLP_TREE_CHILDREN (cond_node).length () - 3
+ /* ???  We probably want to have REDUC_IDX on the SLP node?
+We have both three and four children COND_EXPR nodes
+dependent on whether the comparison is still embedded
+as GENERIC.  So work backwards.  */
+ slp_reduc_idx = (SLP_TREE_CHILDREN (cond_node).length () - 3
   + STMT_VINFO_REDUC_IDX (cond_info));
+   }
+ else
+   slp_reduc_idx = STMT_VINFO_REDUC_IDX (cond_info);
  cond_node = SLP_TREE_CHILDREN (cond_node)[slp_reduc_idx];
}
}
-- 
2.43.0


Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:16AM +0530, Tejas Belagod wrote:
> Currently poly-int type structures are passed by value to OpenMP runtime
> functions for shared clauses etc.  This patch improves on this by passing
> around poly-int structures by address to avoid copy-overhead.
> 
> gcc/ChangeLog
>   * omp-low.c (use_pointer_for_field): Use pointer if the OMP data
>   structure's field type is a poly-int.

LGTM.

> diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
> index 1a65229cc37..b15607f4ef5 100644
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -466,7 +466,8 @@ static bool
>  use_pointer_for_field (tree decl, omp_context *shared_ctx)
>  {
>if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
> -  || TYPE_ATOMIC (TREE_TYPE (decl)))
> +  || TYPE_ATOMIC (TREE_TYPE (decl))
> +  || POLY_INT_CST_P (DECL_SIZE (decl)))
>  return true;
>  
>/* We can only use copy-in/copy-out semantics for shared variables
> -- 
> 2.25.1

Jakub



Re: [PATCH 02/11] AArch64: Add test cases for SVE types in OpenMP shared clause.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:17AM +0530, Tejas Belagod wrote:
> This patch tests various shared clauses with SVE types.  It also adds a test
> scaffold to run OpenMP tests under the gcc.target testsuite.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp: New scaffold.
>   * gcc/testsuite/gcc.target/aarch64/sve/omp/shared.c: New test.

I'd suggest gcc.target/aarch64/sve/gomp/gomp.exp.
E.g. when doing quick testing of OpenMP patches, I'm just doing
make check-gcc RUNTESTFLAGS='gomp.exp goacc.exp goacc-gomp.exp'
+ libgomp testing, so I think it would be useful not to have to remember
other *.exp names.  And gomp instead of omp as the directory name would
also be consistent with what is used elsewhere: gcc.dg/gomp/, g++.dg/gomp/,
gfortran.dg/gomp/, c-c++-common/gomp/ etc.

On the other hand, as for the driver name, gomp.exp doesn't add the
libgomp/.libs/ directory to the search path etc.; runtime tests generally
go into libgomp.

So, maybe this would better go into
libgomp/testsuite/libgomp.target/aarch64/aarch64.exp

Jakub



Re: [RFC][PATCH] SVE intrinsics: Fold svdiv (svptrue, x, x) to ones

2024-08-06 Thread Kyrylo Tkachov


> On 6 Aug 2024, at 12:44, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>>> On 5 Aug 2024, at 18:00, Richard Sandiford  
>>> wrote:
>>> 
>>> Kyrylo Tkachov  writes:
>>>>> On 5 Aug 2024, at 12:01, Richard Sandiford wrote:
>>>>>
>>>>> Jennifer Schmitz writes:
>>>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>>>> 1) the predicate is svptrue and
>>>>>> 2) dividend and divisor are equal.
>>>>>> This is implemented in the gimple_folder for signed and unsigned
>>>>>> integers. Corresponding test cases were added to the existing test
>>>>>> suites.
>>>>>>
>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>>>> regression.
>>>>>> OK for mainline?
>>>>>>
>>>>>> Please also advise whether it makes sense to implement the same
>>>>>> optimization for float types and if so, under which conditions?
>>>>>
>>>>> I think we should instead use const_binop to try to fold the division
>>>>> whenever the predicate is all-true, or if the function uses _x
>>>>> predication.
>>>>> (As a follow-on, we could handle _z and _m too, using VEC_COND_EXPR.)
>>>>>
>>>>
>>>> From what I can see const_binop only works on constant arguments.
>>> 
>>> Yeah, it only produces a result for constant arguments.  I see now
>>> that that isn't the case that the patch is interested in, sorry.
>>> 
>>>> Is fold_binary a better interface to use? I think it’d hook into the
>>>> match.pd machinery for divisions at some point.
>>> 
>>> We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
>>> (I realise we'd be using it only to test whether the result is constant,
>>> but even so.)
>>> 
>>> Have you (plural) come across a case where svdiv is used with equal
>>> non-constant arguments?  If it's just being done on first principles
>>> then how about starting with const_binop instead?  If possible, it'd be
>>> good to structure it so that we can reuse the code for svadd, svmul,
>>> svsub, etc.
>> 
>> We’ve had a bit of internal discussion on this to get our ducks in a row.
>> We are interested in having more powerful folding of SVE intrinsics 
>> generally and we’d like some advice on how best to approach this.
>> Prathamesh suggested adding code to fold intrinsics to standard GIMPLE codes 
>> where possible when they are _x-predicated or have a ptrue predicate. 
>> Hopefully that would allow us to get all the match.pd and fold-const.cc 
>>  optimizations “for free”.
>> Would that be a reasonable direction rather than adding custom folding code 
>> to individual intrinsics such as svdiv?
>> We’d need to ensure that the midend knows how to expand such GIMPLE codes 
>> with VLA types and that the required folding rules exist in match.pd (though 
>> maybe they work already for VLA types?)
> 
> Expansion shouldn't be a problem, since we already rely on that for
> autovectorisation.
> 
> But I think this comes back to what we discussed earlier, in the context
> of whether we should replace divisions by constants with multi-instruction
> alternatives.  My comment there was:
> 
>  I'm a bit uneasy about going that far.  I suppose it comes down to a
>  question about what intrinsics are for.  Are they for describing an
>  algorithm, or for hand-optimising a specific implementation of the
>  algorithm?  IMO it's the latter.
> 
>  If people want to write out a calculation in natural arithmetic, it
>  would be better to write the algorithm in scalar code and let the
>  vectoriser handle it.  That gives the opportunity for many more
>  optimisations than just this one.
> 
>  Intrinsics are about giving programmers direct, architecture-level
>  control over how something is implemented.  I've seen Arm's library
>  teams go to great lengths to work out which out of a choice of
>  instruction sequences is the best one, even though the sequences in
>  question would look functionally equivalent to a smart-enough compiler.
> 
>  So part of the work of using intrinsics is to figure out what the best
>  sequence is.  And IMO, part of the contract is that the compiler
>  shouldn't interfere with the programmer's choices too much.  If the
>  compiler makes a change, it must be very confident that it is a win for
>  the function as a whole.
> 
>  Replacing one division with one shift is fine, as an aid to the programmer.
>  It removes the need for (say) templated functions to check for that case
>  manually.  Constant folding is fine too, for similar reasons.  In these
>  cases, there's not really a cost/benefit choice to be made between
>  different expansions.  One choice is objectively better in all
>  realistic situations.
> 
>  But when it comes to general constants, there are many different choices
>  that could be made when deciding 

Re: [PATCH 03/11] AArch64: Diagnose OpenMP offloading when SVE types involved.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:18AM +0530, Tejas Belagod wrote:
> The target clause in OpenMP is used to offload loop kernels to accelerator
> peripherals.  target's 'map' clause is used to move data from and to the
> accelerator.  When the data is of SVE type, it may not be suitable for
> various reasons, e.g. the two SVE targets may not agree on vector size or
> some targets don't support variable vector size.  This makes SVE unsuitable
> for use in OMP's 'map' clause.  This patch diagnoses all such cases and issues
> an error where SVE types are not suitable.

I had never heard of verify_type_context before; it seems to be an
AArch64/RISC-V-only thing, so I'll defer to Richard S. here.
The question of where to put the testcase remains.

Jakub



Re: [PATCH 04/11] AArch64: Test OpenMP lastprivate clause for various constructs.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:19AM +0530, Tejas Belagod wrote:
> This patch tests various OpenMP lastprivate clause with SVE object types in
> various construct contexts.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/omp/lastprivate.c: New test.

This is a dg-do compile test, so it could be in gcc.target/aarch64/sve/gomp/

Generally I'd suggest not mixing erroneous constructs with correct ones,
so that the correct ones can be tested up to assembly generation (or at
runtime if needed), which might not be possible when there are errors in
the same testcase.
Furthermore, I think it would be useful to actually test the behavior
of lastprivate even for the constructs where you have an error now, so
in lastprivate-1.c include say #pragma omp parallel sections lastprivate (va)
rather than just #pragma omp sections lastprivate (va), or the latter
if it is nested inside of #pragma omp parallel (and va is shared there).

> +/* This worksharing construct binds to an implicit outer parallel region in
> +whose scope va is declared and therefore is default private.  This causes
> +the lastprivate clause list item va to be diagnosed as private in the 
> outer
> +context.  Similarly for constructs for and distribute.  */
> +#pragma omp sections lastprivate (va) /* { dg-error {lastprivate variable 
> 'va' is private in outer context} } */
> +{
> +  #pragma omp section
> +  vb = svld1_s32 (svptrue_b32 (), b);
> +  #pragma omp section
> +  vc = svld1_s32 (svptrue_b32 (), c);
> +  #pragma omp section
> +  va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +}

Note, unless you know this is included in an implicit parallel region with a
single thread, this isn't really valid.  With the sections directive the user
is explicitly saying that there are no dependencies between the 3 sections
here, which is clearly not the case: the first two compute variables which
the third one uses.
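
A minimal sketch of a sections construct without cross-section dependencies,
assuming plain int in place of svint32_t so that it builds on any target (the
function name is hypothetical).  Each section reads only the read-only inputs,
and lastprivate copies out the value assigned by the lexically last section:

```c
/* Hypothetical example: no section uses a value computed by another
   section, so the sections may run in any order or concurrently.  */
int
sections_lastprivate (const int *b, const int *c)
{
  int va = 0;

#pragma omp parallel sections lastprivate (va)
  {
#pragma omp section
    va = b[0] - c[0];
#pragma omp section
    va = b[0] + c[0];	/* Lexically last section: this value is copied out.  */
  }

  return va;
}
```

The sketch also compiles without -fopenmp, in which case the pragmas are
ignored and the two assignments simply run in order, giving the same result.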

> +  int a[N], b[N], c[N];
> +  svint32_t va, vb, vc;
> +  int i;
> +
> +#pragma omp parallel for
> +  for (i = 0; i < N; i++)
> +{
> +  b[i] = i;
> +  c[i] = i + 1;
> +}
> +
> +#pragma omp for lastprivate (va) /* { dg-error {lastprivate variable 'va' is 
> private in outer context} } */
> +  for (i = 0; i < 1; i++)
> +{
> +  vb = svld1_s32 (svptrue_b32 (), b);
> +  vc = svld1_s32 (svptrue_b32 (), c);
> +  va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +}

vb and vc are shared, so if this were inside a parallel region, it again
wouldn't be valid: there would be a data race (assuming more than one
iteration; a single iteration is kind of weird).
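
A minimal sketch of how the race can be avoided, again assuming plain int
instead of SVE types and a hypothetical function name: temporaries declared
inside the loop body are private to each iteration, so only the lastprivate
variable is written back, from the sequentially last iteration.

```c
/* Hypothetical example: TB and TC live inside the loop body, so no two
   threads ever write the same temporary.  */
int
for_lastprivate (const int *b, const int *c, int n)
{
  int last = 0;
  int i;

#pragma omp parallel for lastprivate (last)
  for (i = 0; i < n; i++)
    {
      int tb = b[i];
      int tc = c[i];
      last = tb + tc;
    }

  return last;
}
```

For n greater than zero the function returns b[n - 1] + c[n - 1], with or
without -fopenmp.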
> +
> +  return va;
> +}
> +
> +svint32_t __attribute__ ((noinline))
> +omp_lastprivate_simd ()
> +{
> +
> +  int a[N], b[N], c[N];
> +  svint32_t va, vb, vc;
> +  int i;
> +
> +#pragma omp parallel for
> +  for (i = 0; i < N; i++)
> +{
> +  b[i] = i;
> +  c[i] = i + 1;
> +}
> +
> +#pragma omp simd lastprivate (va)
> +  for (i = 0; i < 1; i++)
> +{
> +  vb = svld1_s32 (svptrue_b32 (), b);
> +  vc = svld1_s32 (svptrue_b32 (), c);
> +  va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +}

Similarly, these really aren't good examples IMHO of what a user should
actually write.

Jakub



Re: [PATCH 05/11] AArch64: Test OpenMP threadprivate clause on SVE type.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:20AM +0530, Tejas Belagod wrote:
> This patch adds a test for ensuring threadprivate clause works for SVE type
> objects.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/sve/omp/threadprivate.c: New test.

I guess this would be better as a runtime testcase.

Jakub



[PATCH] genoutput: Accelerate the place_operands function.

2024-08-06 Thread Xianmiao Qu
With the increase in the number of modes and patterns for some
backend architectures, the place_operands function becomes a
bottleneck in the speed of genoutput, and may even become a
bottleneck in the overall speed of building the GCC project.
This patch aims to accelerate the place_operands function.
The optimizations it includes are:
1. Use a hash table to store operand information,
   improving the lookup time for the first operand.
2. Move the mode comparison to the beginning so that most strcmp calls
   are avoided.

I tested the speed improvements for the following backends:

  Backend   Improvement ratio
  x86_64    197.9%
  aarch64   954.5%
  riscv     2578.6%

If the build machine is slow, then this improvement can save a lot of time.

I tested the genoutput output for x86_64/aarch64/riscv backends,
and there was no difference compared to before the optimization,
so this shouldn't introduce any functional issues.
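
A minimal sketch of the idea behind optimization 2, assuming a simplified
stand-in for operand_data (the struct and function names here are hypothetical
and are not the ones used in genoutput.cc): the cheap integer comparison runs
first, so the string comparisons are only reached when the modes already match.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for genoutput's operand_data.  */
struct operand_key
{
  const char *predicate;
  const char *constraint;
  int mode;
};

/* Return 1 if A and B describe the same operand, 0 otherwise.
   The mode test is a single integer compare and rejects most
   mismatches before any strcmp call is made.  */
int
operand_key_equal (const struct operand_key *a, const struct operand_key *b)
{
  if (a->mode != b->mode)
    return 0;
  if (strcmp (a->predicate, b->predicate) != 0)
    return 0;
  return strcmp (a->constraint, b->constraint) == 0;
}
```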
---
 gcc/genoutput.cc | 101 ---
 1 file changed, 95 insertions(+), 6 deletions(-)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index efd81766bb5b..456d96112cfb 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -112,6 +112,8 @@ static int next_operand_number = 1;
 struct operand_data
 {
   struct operand_data *next;
+  /* Point to the next member with the same hash value in the hash table.  */
+  struct operand_data *eq_next;
   int index;
   const char *predicate;
   const char *constraint;
@@ -127,11 +129,12 @@ struct operand_data
 
 static struct operand_data null_operand =
 {
-  0, 0, "", "", E_VOIDmode, 0, 0, 0, 0, 0
+  0, 0, 0, "", "", E_VOIDmode, 0, 0, 0, 0, 0
 };
 
 static struct operand_data *odata = &null_operand;
 static struct operand_data **odata_end = &null_operand.next;
+static htab_t operand_data_table;
 
 /* Must match the constants in recog.h.  */
 
@@ -180,6 +183,11 @@ static void place_operands (class data *);
 static void process_template (class data *, const char *);
 static void validate_insn_alternatives (class data *);
 static void validate_insn_operands (class data *);
+static hashval_t hash_struct_operand_data (const void *);
+static int eq_struct_operand_data (const void *, const void *);
+static void insert_operand_data (struct operand_data *);
+static struct operand_data *lookup_operand_data (struct operand_data *);
+static void init_operand_data_table (void);
 
 class constraint_data
 {
@@ -532,6 +540,13 @@ compare_operands (struct operand_data *d0, struct 
operand_data *d1)
 {
   const char *p0, *p1;
 
+  /* On one hand, comparing strings for predicate and constraint
+ is time-consuming, and on the other hand, the probability of
+ different modes is relatively high. Therefore, checking the mode
+ first can speed up the execution of the program.  */
+  if (d0->mode != d1->mode)
+return 0;
+
   p0 = d0->predicate;
   if (!p0)
 p0 = "";
@@ -550,9 +565,6 @@ compare_operands (struct operand_data *d0, struct 
operand_data *d1)
   if (strcmp (p0, p1) != 0)
 return 0;
 
-  if (d0->mode != d1->mode)
-return 0;
-
   if (d0->strict_low != d1->strict_low)
 return 0;
 
@@ -577,9 +589,9 @@ place_operands (class data *d)
   return;
 }
 
+  od = lookup_operand_data (&d->operand[0]);
   /* Brute force substring search.  */
-  for (od = odata, i = 0; od; od = od->next, i = 0)
-if (compare_operands (od, &d->operand[0]))
+  for (i = 0; od; od = od->eq_next, i = 0)
   {
od2 = od->next;
i = 1;
@@ -605,6 +617,7 @@ place_operands (class data *d)
   *odata_end = od2;
   odata_end = &od2->next;
   od2->index = next_operand_number++;
+  insert_operand_data (od2);
 }
   *odata_end = NULL;
   return;
@@ -1049,6 +1062,7 @@ main (int argc, const char **argv)
   progname = "genoutput";
 
   init_insn_for_nothing ();
+  init_operand_data_table ();
 
   if (!init_rtx_reader_args (argc, argv))
 return (FATAL_EXIT_CODE);
@@ -1224,3 +1238,78 @@ mdep_constraint_len (const char *s, file_location loc, 
int opno)
   message_at (loc, "note:  in operand %d", opno);
   return 1; /* safe */
 }
+
+/* Helper to Hash a struct operand_data.  */
+
+static hashval_t
+hash_struct_operand_data (const void *ptr)
+{
+  const struct operand_data *d = (const struct operand_data *) ptr;
+  const char *pred, *cons;
+  hashval_t hash;
+
+  pred = d->predicate;
+  if (!pred)
+pred = "";
+  hash = htab_hash_string (pred);
+
+  cons = d->constraint;
+  if (!cons)
+cons = "";
+  hash = iterative_hash (cons, strlen (cons), hash);
+
+  hash = iterative_hash_object (d->mode, hash);
+  hash = iterative_hash_object (d->strict_low, hash);
+  hash = iterative_hash_object (d->eliminable, hash);
+  return hash;
+}
+
+/* Equality function of the operand_data hash table.  */
+
+static int
+eq_struct_operand_data (const void *p1, const void *p2)
+{
+  const struct operand_data *d1 = (const struct operand_data *) p1;
+  const struct operand_data *d2 = (const struct operand_data *) p2;
+
+  return compare_oper

Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-06 Thread Richard Biener
On Tue, Aug 6, 2024 at 3:21 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> It looks like the plus will have additional convert to unsigned in int8 and 
> int16, see below example in test.c.006t.gimple.
> And we need these convert ops in one matching pattern to cover all int scalar 
> types.

Ah, yeah - that's the usual (premature) frontend optimization to
shorten operations after the standard-mandated conversion
(to 'int' in this case).
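
A minimal sketch of the two forms, assuming a 16-bit unsigned short and GCC's
modulo behaviour for the implementation-defined conversion back to int16_t
(the function names are hypothetical).  The first function spells out the
promotion to int, the second mirrors the shortened unsigned addition seen in
the GIMPLE dump; both give the same result:

```c
#include <stdint.h>

/* The promoted form: convert both operands to int, add, convert back.  */
int16_t
add_i16_promoted (int16_t a, int16_t b)
{
  int sum = (int) a + (int) b;
  return (int16_t) sum;
}

/* The shortened form: add in unsigned short (wrapping), convert back.  */
int16_t
add_i16_narrowed (int16_t a, int16_t b)
{
  unsigned short ua = (unsigned short) a;
  unsigned short ub = (unsigned short) b;
  unsigned short sum = (unsigned short) (ua + ub);
  return (int16_t) sum;
}
```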

> I am not sure if there is a better way here, given that a convert in the
> matching pattern is not very elegant.
>
> int16_t
> add_i16 (int16_t a, int16_t b)
> {
>   int16_t sum = a + b;
>   return sum;
> }
>
> int32_t
> add_i32 (int32_t a, int32_t b)
> {
>   int32_t sum = a + b;
>   return sum;
> }
>
> --- 006t.gimple ---
> int16_t add_i16 (int16_t a, int16_t b)
> {
>   int16_t D.2815;
>   int16_t sum;
>
>   a.0_1 = (unsigned short) a;
>   b.1_2 = (unsigned short) b;
>   _3 = a.0_1 + b.1_2;
>   sum = (int16_t) _3;
>   D.2815 = sum;
>   return D.2815;
> }
>
> int32_t add_i32 (int32_t a, int32_t b)
> {
>   int32_t D.2817;
>   int32_t sum;
>
>   sum = a + b;
>   D.2817 = sum;
>   return D.2817;
> }
>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Monday, August 5, 2024 9:52 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> Thanks Richard for comments.
>
> > The convert looks odd to me given @0 is involved in both & operands.
>
> The convert is introduced because the GIMPLE IL for int8_t is somewhat
> different from that for int32_t or int64_t.
> There are some additional ops that convert to unsigned for the plus, see
> lines 8-9 and lines 22-23 below.
> But we do not see similar GIMPLE IL for int32_t and int64_t. To reconcile the
> types from int8_t to int64_t, I added the convert here.
>
> Or maybe I have made some mistake in the example; let me revisit it and send
> v2 if there are no surprises.
>
>4   │ __attribute__((noinline))
>5   │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y)
>6   │ {
>7   │   int8_t sum;
>8   │   unsigned char x.1_1;
>9   │   unsigned char y.2_2;
>   10   │   unsigned char _3;
>   11   │   signed char _4;
>   12   │   signed char _5;
>   13   │   int8_t _6;
>   14   │   _Bool _11;
>   15   │   signed char _12;
>   16   │   signed char _13;
>   17   │   signed char _14;
>   18   │   signed char _22;
>   19   │   signed char _23;
>   20   │
>   21   │[local count: 1073741822]:
>   22   │   x.1_1 = (unsigned char) x_7(D);
>   23   │   y.2_2 = (unsigned char) y_8(D);
>   24   │   _3 = x.1_1 + y.2_2;
>   25   │   sum_9 = (int8_t) _3;
>   26   │   _4 = x_7(D) ^ y_8(D);
>   27   │   _5 = x_7(D) ^ sum_9;
>   28   │   _23 = ~_4;
>   29   │   _22 = _5 & _23;
>   30   │   if (_22 < 0)
>   31   │ goto ; [41.00%]
>   32   │   else
>   33   │ goto ; [59.00%]
>   34   │
>   35   │[local count: 259738146]:
>   36   │   _11 = x_7(D) < 0;
>   37   │   _12 = (signed char) _11;
>   38   │   _13 = -_12;
>   39   │   _14 = _13 ^ 127;
>   40   │
>   41   │[local count: 1073741824]:
>   42   │   # _6 = PHI <_14(3), sum_9(2)>
>   43   │   return _6;
>   44   │
>   45   │ }
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, August 5, 2024 7:16 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> On Mon, Aug 5, 2024 at 9:14 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support the form 1 of the scalar signed
> > integer .SAT_ADD.  Aka below example:
> >
> > Form 1:
> >   #define DEF_SAT_S_ADD_FMT_1(T) \
> >   T __attribute__((noinline))\
> >   sat_s_add_##T##_fmt_1 (T x, T y)   \
> >   {  \
> > T min = (T)1u << (sizeof (T) * 8 - 1);   \
> > T max = min - 1; \
> > return (x ^ y) < 0   \
> >   ? (T)(x + y)   \
> >   : ((T)(x + y) ^ x) >= 0\
> > ? (T)(x + y) \
> > : x < 0 ? min : max; \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_1 (int64_t)
> >
> > We can tell the difference before and after this patch if backend
> > implemented the ssadd3 pattern similar as below.
> >
> > Before this patch:
> >4   │ __attribute__((noinline))
> >5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
> >6   │ {
> >7   │   long int _1;
> >8   │   long int _2;
> >9   │   long int _3;
> >   10   │   int64_t _4;
> >   11   │   long int _7;
> >   12   │   _Bool _9;
> >   13   │   long int _10;
> >   14   │   long int _11;
> >   15   │   long int _12;
> >   16   │   long int _13;
> >   17   │
> 

Re: [PATCH 06/11] AArch64: Test OpenMP user-defined reductions with SVE types.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:21AM +0530, Tejas Belagod wrote:
> This patch tests user-defined reductions on various constructs with objects
> of SVE type.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/sve/omp/udr-sve.c: New test.
> ---
>  .../gcc.target/aarch64/sve/omp/udr-sve.c  | 166 ++
>  1 file changed, 166 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
> new file mode 100644
> index 000..049fbee9056
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
> @@ -0,0 +1,166 @@
> +/* { dg-do run } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */
> +
> +#include 
> +
> +#pragma omp declare reduction (+:svint32_t: omp_out = svadd_s32_z 
> (svptrue_b32(), omp_in, omp_out))

Don't you need an initializer clause, or does zero initialization work for
this type?
In any case, it would be useful to also test with a non-trivial initializer
clause.
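
A minimal sketch of a user-defined reduction where the initializer clause
matters, assuming a plain four-lane struct as a stand-in for an SVE vector
(all names below are hypothetical).  For a lane-wise maximum, zero-initialized
private copies would give wrong results on all-negative data, so the private
copies are set to INT_MIN instead:

```c
#include <limits.h>

#define LANES 4

/* Hypothetical stand-in for a vector type.  */
typedef struct { int lane[LANES]; } vec4;

/* Identity element of the lane-wise maximum.  */
static void
vec4_set_min (vec4 *v)
{
  for (int k = 0; k < LANES; k++)
    v->lane[k] = INT_MIN;
}

/* Combine: OUT = max (OUT, IN) in every lane.  */
static void
vec4_max_into (vec4 *out, const vec4 *in)
{
  for (int k = 0; k < LANES; k++)
    if (in->lane[k] > out->lane[k])
      out->lane[k] = in->lane[k];
}

#pragma omp declare reduction (vmax : vec4 : vec4_max_into (&omp_out, &omp_in)) \
  initializer (vec4_set_min (&omp_priv))

/* Lane-wise maximum over N rows.  */
vec4
rowwise_max (const vec4 *rows, int n)
{
  vec4 acc;
  vec4_set_min (&acc);

#pragma omp parallel for reduction (vmax : acc)
  for (int i = 0; i < n; i++)
    vec4_max_into (&acc, &rows[i]);

  return acc;
}
```

Because the initializer is the identity of the combiner, the result does not
depend on how many threads take part, and the code gives the same answer when
built without -fopenmp.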

> +
> +int parallel_reduction ()

Function name should go on next line (in various places).

> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};

The space before , rather than after it is weird.

> +int for_reduction ()
> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  svint32_t va = svld1_s32 (svptrue_b32 (), b);
> +  int i = 0;
> +  int j;
> +  int64_t res;
> +
> +  #pragma omp parallel for reduction (+:va, i)
> +  for (j = 0; j < 8; j++)
> +{
> +  va = svld1_s32 (svptrue_b32 (), a);
> +  i++;

Why the i at all?  The loop has 8 iterations, no need
to have a reduction for that, just assume it must be 8 * 8 later.
Note, for the #pragma omp parallel case that was needed (or you could query
omp_get_num_threads ();).

> +  /* The list includes va that is already vectorized, so the only impact here
> + is on the scalar variable i.  OMP spec says only scalar variables are
> + allowed in the list.  Should non-scalars be diagnosed?  */

Why should it be diagnosed?  The implementation can always choose not to
vectorize; vectorization factor 1 is valid, and that is really what we use
for various cases we can't deal with (e.g. VLA types) or if the vectorizer
gives up.  The standard doesn't talk about vectorization at all, just about
single instruction multiple data, and by data there it means whatever the user
wrote, so a single instruction handling multiple SVE vectors if the user wrote
that.  It is fine if we just handle that as vf=1.

> +  #pragma omp simd reduction (+:va, i)
> +  for (j = 0; j < 8; j++)
> +{
> +  va = svld1_s32 (svptrue_b32 (), a);
> +  i++;

Again, why the i reduction?  The loop has 8 iterations, so it will be always
8 at the end.
> +}
> +
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
> +  if (res != i)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> +
> +int taskloop_reduction ()
> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  svint32_t va = svld1_s32 (svptrue_b32 (), b);
> +  int i = 0;
> +  int j;
> +  int64_t res;
> +
> +  #pragma omp taskloop reduction (+:va, i)
> +  for (j = 0; j < 8; j++)
> +{
> +  svint32_t tva = svld1_s32 (svptrue_b32 (), a);
> +  #pragma omp in_reduction (+: va)

This isn't a valid OpenMP pragma.
-Wunknown-pragmas should be diagnosing that.
What do you want to achieve?
in_reduction is valid on task, target and taskloop constructs,
but requires the var to be either a task_reduction or taskloop's
reduction on an outer construct.
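
For reference, a minimal sketch of that pairing.  It uses a plain int
instead of svint32_t so that it builds without SVE (that substitution and
the function name sum_with_tasks are assumptions for illustration, not part
of the patch).  The taskgroup's task_reduction registers sum as a reduction
variable, and each task's in_reduction then participates in it.

```c
/* Hypothetical example, not taken from the patch under review.
   Each task adds one element; the per-task copies of SUM are combined
   when the taskgroup ends.  */
int
sum_with_tasks (const int *a, int n)
{
  int sum = 0;

  #pragma omp parallel
  #pragma omp single
  #pragma omp taskgroup task_reduction (+: sum)
  {
    for (int j = 0; j < n; j++)
      {
        #pragma omp task in_reduction (+: sum) firstprivate (j)
        sum += a[j];
      }
  }

  return sum;
}
```

Without -fopenmp the pragmas are ignored and the loop simply runs
serially, which gives the same sum.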

> +  va = svadd_s32_z (svptrue_b32 (), tva, va);

Just note that the different iterations of the taskloop could be
in different tasks (though, testing taskloop without some #pragma omp
parallel surrounding it is again kind of pointless unless you really
want to test behavior in that implicit parallel case).
And, if it is separate tasks, e.g. 3rd iteration might not see the
2nd iteration's va but its own cleared one, they'll be only combined later.


> +  i++;
> +}
> +
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
> +  if (res != i * 8)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> +
> +int task_reduction ()
> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  svint32_t va = svld1_s32 (svptrue_b32 (), b);
> +  int i = 0;
> +  int j;
> +  int64_t res;
> +
> +  #pragma omp parallel reduction (task,+:va)
> +  {
> +va = svadd_s32_z (svptrue_b32 (), svld1_s32 (svptrue_b32 (), a), va);
> +i++;
> +  }

The use of task modifier doesn't make much sense here, if you want to test
it, you'd need to create tasks and use in_reduction (+:va) on it.
> +
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
> +  if (res != i * 8)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> +
> +int inscan_reduction_incl ()
> +{
> +  

[PATCH v2] c++/modules: Ensure deduction guides are always reachable [PR115231]

2024-08-06 Thread Nathaniel Shead
On Fri, Jul 26, 2024 at 01:17:57PM -0400, Jason Merrill wrote:
> On 7/26/24 12:52 AM, Nathaniel Shead wrote:
> > On Tue, Jul 23, 2024 at 04:17:22PM -0400, Jason Merrill wrote:
> > > On 6/15/24 10:29 PM, Nathaniel Shead wrote:
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > > 
> > > > This probably isn't the most efficient approach, since we need to do
> > > > name lookup to find deduction guides for a type which will also
> > > > potentially do a bunch of pointless lazy loading from imported modules,
> > > > but I wasn't able to work out a better approach without completely
> > > > reworking how deduction guides are stored and represented.
> > > 
> > > Indeed.  We likely want to find them more directly from the template; it's
> > > not clear to me that DECL_INITIAL is used for TEMPLATE_DECL, or we could 
> > > put
> > > them in an internal attribute or a separate hash table.
> > > 

I had a go at exploring some other representations of deduction guides
but I didn't really land anywhere; most things I tried would require
reimplementing a lot of the merging logic handled by name lookup
currently.  Potentially just having an additional hash table for
deduction guides maintained by pushdecl would work but that felt like
unnecessary duplication; I'm not sure that the benefits outweigh the
cost of the name lookup.  (But I haven't measured this.)

> > > > -- >8 --
> > > > 
> > > > Deduction guides are represented as 'normal' functions currently, and
> > > > have no special handling in modules.  However, this causes some issues;
> > > > by [temp.deduct.guide] a deduction guide is not found by normal name
> > > > lookup and instead all reachable deduction guides for a class template
> > > > should be considered, but this does not happen currently.
> > > > 
> > > > To solve this, this patch ensures that all deduction guides are
> > > > considered exported to ensure that they are always visible to importers
> > > > if they are reachable.  Another alternative here would be to add a new
> > > > kind of "all reachable" flag to name lookup, but that is complicated by
> > > > some difficulties in handling GM entities; this may be a better way to
> > > > go if more kinds of entities end up needing this handling, however.
> > > > 
> > > > Another issue here is that because deduction guides are "unrelated"
> > > > functions, they will usually get discarded from the GMF, so this patch
> > > > ensures that when finding dependencies, GMF deduction guides will also
> > > > have bindings created.  We do this in find_dependencies so that we don't
> > > > unnecessarily create bindings for GMF deduction guides that are never
> > > > reached; for consistency we do this for *all* deduction guides, not just
> > > > GM ones.
> > > 
> > > If you fixed the dependency calculation, why do they also need to be
> > > exported?
> > 
> > Deduction guides aren't found using normal name lookup, but any
> > reachable deduction guide must be considered.  This means that even if
> > the module interface exports no declarations whatsoever, a deduction
> > guide declared in the module purview must still be considered by
> > importers.
> 
> Ah, I was missing the name lookup issue.
> 
> > The other option I've considered is adding a new "ANY_REACHABLE" flag to
> > name lookup which would also consider non-exported reachable decls.  On
> > further consideration I might actually go this way; I've been thinking
> > about how to resolve some issues adjacent to supporting textual
> > redefinitions that I believe this will be necessary for anyway, and we
> > can probably use this in tsubst_friend_class as well rather than the
> > current relatively ad-hoc solution.
> 
> There's also my hack in lookup_elaborated_type for ABI namespace types.
> 
> I'm not sure that it should be necessary to do this for redefinitions,
> though; what's the advantage over merging in check_module_override (apart
> from that needing to be fixed)?
> 
> > That said, I've realised that this patch isn't completely sufficient
> > anyway; consider:
> > 
> >// m.cpp
> >module;
> >template  struct S;
> >export module M;
> >S(int) -> S;
> > 
> >// x.cpp
> >template  struct S { S(int); };
> >import M;
> >int main() {
> >  S s(10);  // should be S s;
> >}
> > 
> > This patch doesn't correctly handle this case yet, we need to also
> > consider cases where only the deduction guide is in purview.
> 
> Indeed.
> 
> Jason
> 

Here's a small update to the patch which fixes the above bug by ensuring
that dependencies are created for purview deduction guides rather than
just being skipped entirely, and includes that testcase with dguide-3.

This patch is still using force-exported bindings rather than attempting
to do anything more clever with name lookup just yet.

Bootstrapped and regtested (so far just modules.exp) on
x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?

-- >8 --

Deduction guides are represented as 

Re: [PATCH 07/11] AArch64: Test OpenMP uniform clause on SVE types.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:22AM +0530, Tejas Belagod wrote:
> This patch tests if simd uniform clause works with SVE types in simd regions.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/sve/omp/simd-uniform.c: New test.
> ---
>  .../gcc.target/aarch64/sve/omp/simd-uniform.c | 71 +++
>  1 file changed, 71 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c
> new file mode 100644
> index 000..6256ce9fdc1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c
> @@ -0,0 +1,71 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */
> +
> +#include 
> +
> +#define N 256
> +
> +void init(int *a, int *a_ref, int *b, int n)
> +{
> +   int i;
> +   for ( i=0; i<n; i++ )
> +   {
> +  a[i] = i;
> +  a_ref[i] = i;
> +  b[i] = N-i;
> +   }
> +}
> +
> +#pragma omp declare simd uniform(a, b, sz) linear (i)
> +void vec_add(int *a, int *b, int i, int64_t sz)

I don't see how this tests anything relevant to SVE types.
That would be an argument which is svint32_t or something
mentioned in uniform clause and using it in the function.

Jakub



Re: [PATCH 08/11] AArch64: Test OpenMP simd aligned clause with SVE types.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:23AM +0530, Tejas Belagod wrote:
> This patch tests the simd aligned clause and its interaction with SVE types.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/sve/omp/simd-aligned.c: New test.
> ---
>  .../gcc.target/aarch64/sve/omp/simd-aligned.c | 50 +++
>  1 file changed, 50 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c
> new file mode 100644
> index 000..6c75bb5a714
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c
> @@ -0,0 +1,50 @@
> +/* { dg-do run } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */
> +#include 
> +
> +#define N 256
> +
> +int a[N] __attribute__((aligned (64)));
> +int b[N] __attribute__((aligned (64)));
> +
> +
> +__attribute((noipa))
> +void foo (int *p, int *q)
> +{
> +   svint32_t va, vb, vc;
> +   int i;
> +   uint64_t sz = svcntw ();
> +
> +#pragma omp simd aligned(p, q : 64) private (va, vb, vc) nontemporal (va, 
> vb, vc)

The testcase suggests the test is about aligned clause, but it tests it only
on something not related to SVE; it tests nontemporal clause on those.

Sure, testing nontemporal clause is useful, but given the test name, it
might be useful to also test aligned clause.
For C the argument must have array or pointer type, for C++ that or
reference to array or pointer type.
So, you might want to test svint32_t * in that clause, or svint32_t array
(if the latter is possible).  Of course, you'd need to arrange for the
svint32_t to be aligned corresponding to the alignment passed in.

Jakub



[PATCH] gimple ssa: Put SCCOPY logic into a class

2024-08-06 Thread Filip Kastl
Hello everybody,

In pr113054[1] Andrew said that he doesn't like the 'dead_stmts' static
variable I used when implementing the sccopy pass.  We agreed that wrapping
the relevant code from the pass in a class would be most likely the best
solution.  Here is a patch that does exactly that.  I waited until stage 1 to
submit it.

Bootstrapped and regtested on x86_64.  Is the patch ok to be pushed to trunk?

Cheers,
Filip Kastl


[1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113054


-- 8< --


Currently the main logic of the sccopy pass is implemented as static
functions.  This patch instead puts the code into a class.  This also
gets rid of a global variable (dead_stmts).

gcc/ChangeLog:

* gimple-ssa-sccopy.cc (class scc_copy_prop): New class.
(replace_scc_by_value): Put into...
(scc_copy_prop::replace_scc_by_value): ...scc_copy_prop.
(sccopy_visit_op): Put into...
(scc_copy_prop::visit_op): ...scc_copy_prop.
(sccopy_propagate): Put into...
(scc_copy_prop::propagate): ...scc_copy_prop.
(init_sccopy): Replace by...
(scc_copy_prop::scc_copy_prop): ...the constructor.
(finalize_sccopy): Replace by...
(scc_copy_prop::~scc_copy_prop): ...the destructor.
(pass_sccopy::execute): Use scc_copy_prop.

Signed-off-by: Filip Kastl 
---
 gcc/gimple-ssa-sccopy.cc | 66 ++--
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index 191a4c0b451..d9eaeab4abb 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -94,11 +94,6 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace {
 
-/* Bitmap tracking statements which were propagated to be removed at the end of
-   the pass.  */
-
-static bitmap dead_stmts;
-
 /* State of vertex during SCC discovery.
 
unvisited  Vertex hasn't yet been popped from worklist.
@@ -459,11 +454,33 @@ get_all_stmt_may_generate_copy (void)
   return result;
 }
 
+/* SCC copy propagation
+
+   'scc_copy_prop::propagate ()' is the main function of this pass.  */
+
+class scc_copy_prop
+{
+public:
+  scc_copy_prop ();
+  ~scc_copy_prop ();
+  void propagate ();
+
+private:
+  /* Bitmap tracking statements which were propagated so that they can be
+ removed at the end of the pass.  */
+  bitmap dead_stmts;
+
+  void visit_op (tree op, hash_set &outer_ops,
+   hash_set &scc_set, bool &is_inner,
+   tree &last_outer_op);
+  void replace_scc_by_value (vec scc, tree val);
+};
+
 /* For each statement from given SCC, replace its usages by value
VAL.  */
 
-static void
-replace_scc_by_value (vec scc, tree val)
+void
+scc_copy_prop::replace_scc_by_value (vec scc, tree val)
 {
   for (gimple *stmt : scc)
 {
@@ -476,12 +493,12 @@ replace_scc_by_value (vec scc, tree val)
 fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
 }
 
-/* Part of 'sccopy_propagate ()'.  */
+/* Part of 'scc_copy_prop::propagate ()'.  */
 
-static void
-sccopy_visit_op (tree op, hash_set &outer_ops,
-hash_set &scc_set, bool &is_inner,
-tree &last_outer_op)
+void
+scc_copy_prop::visit_op (tree op, hash_set &outer_ops,
+hash_set &scc_set, bool &is_inner,
+tree &last_outer_op)
 {
   bool op_in_scc = false;
 
@@ -539,8 +556,8 @@ sccopy_visit_op (tree op, hash_set &outer_ops,
  Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
  Section 3.2.  */
 
-static void
-sccopy_propagate ()
+void
+scc_copy_prop::propagate ()
 {
   auto_vec useful_stmts = get_all_stmt_may_generate_copy ();
   scc_discovery discovery;
@@ -575,14 +592,12 @@ sccopy_propagate ()
for (j = 0; j < gimple_phi_num_args (phi); j++)
  {
op = gimple_phi_arg_def (phi, j);
-   sccopy_visit_op (op, outer_ops, scc_set, is_inner,
-  last_outer_op);
+   visit_op (op, outer_ops, scc_set, is_inner, last_outer_op);
  }
break;
  case GIMPLE_ASSIGN:
op = gimple_assign_rhs1 (stmt);
-   sccopy_visit_op (op, outer_ops, scc_set, is_inner,
-  last_outer_op);
+   visit_op (op, outer_ops, scc_set, is_inner, last_outer_op);
break;
  default:
gcc_unreachable ();
@@ -613,19 +628,13 @@ sccopy_propagate ()
 }
 }
 
-/* Called when pass execution starts.  */
-
-static void
-init_sccopy (void)
+scc_copy_prop::scc_copy_prop ()
 {
   /* For propagated statements.  */
   dead_stmts = BITMAP_ALLOC (NULL);
 }
 
-/* Called before pass execution ends.  */
-
-static void
-finalize_sccopy (void)
+scc_copy_prop::~scc_copy_prop ()
 {
   /* Remove all propagated statements.  */
   simple_dce_from_worklist (dead_stmts);
@@ -668,9 +

Re: [PATCH 09/11] AArch64: Diagnose OpenMP linear clause for SVE type objects.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:24AM +0530, Tejas Belagod wrote:
> This patch tests whether SVE object types applied to a linear clause are
> diagnosed as expected.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/sve/omp/linear.c: New test.
> ---
>  .../gcc.target/aarch64/sve/omp/linear.c   | 33 +++
>  1 file changed, 33 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c
> new file mode 100644
> index 000..77b823a73d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */
> +
> +#include 
> +
> +int a[256];
> +
> +__attribute__((noinline, noclone)) int
> +f1 (svint32_t va, int i)
> +{
> +  #pragma omp parallel for linear (va: 8) linear (i: 4) /* { dg-error 
> {linear clause applied to non-integral non-pointer variable with type 
> 'svint32_t'} } */
> +  for (int j = 16; j < 64; j++)
> +{
> +  a[i] = j;
> +  i += 4;
> +  va = svindex_s32 (0,1);
> +}
> +  return i;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f2 (svbool_t p, int i)
> +{
> +  #pragma omp parallel for linear (p: 0) linear (i: 4) /* { dg-error {linear 
> clause applied to non-integral non-pointer variable with type 'svbool_t'} } */
> +  for (int j = 16; j < 64; j++)
> +{
> +  a[i] = j;
> +  i += 4;
> +  p = svptrue_b32 ();
> +}
> +  return i;
> +}

This should be also tested for other constructs which accept linear clause,
notably simd and declare simd.
Note, linear-step 0 is weird, I think better test with 1.
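
As a rough scalar analogue of the two extra constructs (a sketch only: it
uses int where the real tests would use svint32_t and expect a dg-error,
and the names offset_at and fill are hypothetical), with linear-step 1:

```c
/* Hypothetical scalar analogue of the suggested simd / declare simd
   coverage; the SVE versions would be compile-only diagnostics tests.  */
#pragma omp declare simd uniform (base) linear (i: 1)
int
offset_at (int base, int i)
{
  return base + i;
}

int
fill (int *a, int n)
{
  int i = 0;

  /* I advances by exactly the declared linear step on every iteration.  */
  #pragma omp simd linear (i: 1)
  for (int j = 0; j < n; j++)
    {
      a[j] = offset_at (16, i);
      i += 1;
    }

  return i;
}
```

With n == 8 this stores 16 .. 23 into a and returns 8, with or without
-fopenmp.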

Jakub



Re: [PATCH 10/11] AArch64: Test OpenMP depend clause and its variations on SVE types

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:25AM +0530, Tejas Belagod wrote:
> This patch adds a test to test depend clause and its various dependency
> variations with SVE type objects.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/sve/omp/depend-1.c: New test.
> ---
>  .../gcc.target/aarch64/sve/omp/depend-1.c | 223 ++
>  1 file changed, 223 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c
> new file mode 100644
> index 000..734c20fb9ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c
> @@ -0,0 +1,223 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */
> +
> +#include 
> +
> +int zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0};
> +int ones[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
> +int twos[8] = { 2, 2, 2, 2, 2, 2, 2, 2 };
> +
> +void
> +dep (void)
> +{
> +  svint32_t x = svld1_s32 (svptrue_b32 (), ones);
> +
> +  #pragma omp parallel
> +  #pragma omp single
> +  {
> +#pragma omp task shared (x) depend(out: x)
> +x = svld1_s32 (svptrue_b32 (), twos);
> +#pragma omp task shared (x) depend(in: x)
> +if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
> +  __builtin_abort  ();

Why the 2 spaces before () (several times)?
Otherwise LGTM, but see the earlier questions on where to put the tests.

Jakub



Re: [PATCH 11/11] AArch64: Diagnose SVE type objects when applied to OpenMP doacross clause.

2024-08-06 Thread Jakub Jelinek
On Mon, May 27, 2024 at 10:36:26AM +0530, Tejas Belagod wrote:
> This patch tests if SVE type objects when applied to doacross clause are
> correctly diagnosed.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/sve/omp/doacross.c: New test.
> ---
>  .../gcc.target/aarch64/sve/omp/doacross.c | 22 +++
>  1 file changed, 22 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c
> new file mode 100644
> index 000..a311887926b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */
> +
> +#include 
> +
> +int a[256];
> +
> +__attribute__((noinline, noclone)) int
> +f1 (svint32_t va)
> +{
> +  int j;
> +  #pragma omp for ordered (1)
> +  for (j = 16; j < 64; j++)
> +{
> +  #pragma omp ordered doacross(sink: va) /* { dg-error {variable 'va' is 
> not an iteration of outermost loop 1, expected 'j'} } */
> +  a[j - 1] = j + svaddv_s32 (svptrue_b32 (), va);
> +  #pragma omp ordered doacross(source: omp_cur_iteration)
> +  j += 4;
> +  va = svindex_s32 (0,1);
> +}
> +  return j;
> +}

Ok pending the test placement.
You don't need -fdump-tree-ompexp for anything though, do you?

Jakub



[RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Alejandro Colomar
Hi!

v4:

-  Only evaluate the operand if the top array is VLA.  Inner VLAs are
   ignored.  [Joseph, Martin]
   This proved very useful for compile-time diagnostics, since we have
   more cases that are constant expressions.
-  Document the evaluation rules, which are unique to this operator
   (similar to sizeof, but we ignore inner VLAs).
-  Add tests to the testsuite.  [Joseph]
-  Swap diagnostic cases preference, to give more meaningful
   diagnostics.  [Martin]
-  Document that Xavier was the first one to suggest this feature, and
   provide a link to the mail thread where that happened.
   BTW, while reading that discussion from 2 years ago, I see that the
   value of this operator was questioned.  Below is a rationale to
   defend it.
-  Document that Martin's help has been crucial for implementing this,
   with 'Co-developed-by'.  Would you mind confirming that I can use
   that tag?
-  CC += Kees, Qing, Jens
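
The sizeof half of the evaluation-rules comparison above can be checked
with standard C alone.  A small sketch (function names are hypothetical;
it relies on VLA support, which is optional in C11 but provided by GCC)
showing that sizeof evaluates its operand only when the operand has
variable length array type:

```c
#include <stddef.h>

/* The operand is a VLA type, so the size expression n++ is evaluated.  */
int
vla_operand_is_evaluated (void)
{
  int n = 3;
  size_t sz = sizeof (int [n++]);

  (void) sz;
  return n;
}

/* The operand is an expression of non-VLA type, so i++ is not evaluated.  */
int
fixed_operand_is_not_evaluated (void)
{
  int arr[8] = { 0 };
  int i = 0;
  size_t sz = sizeof (arr[i++]);

  (void) sz;
  return i;
}
```

As described in the list above, __lengthof__ is meant to differ from this
only in that it looks at the top-level array and ignores inner VLAs.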

Rationale:

-  While compiler extensions already allow implementing ARRAY_SIZE()
   (), there's still no
   way to get the length of a function parameter which uses array
   notation.  While this first implementation doesn't support those yet
   (because there are some issues that need to be fixed first), the plan
   is to add support to those.  This would be a huge step towards arrays
   being first-class citizens in C.  In those cases, it would reduce the
   chance of programmer errors.  See for example
   .  That entire class of bugs
   would be over, _and_ programs would become simpler.
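
To make the limitation concrete, here is a minimal sketch (the function
names are hypothetical, and the ARRAY_SIZE definition shown is the common
sizeof-based one, assumed here for illustration).  It uses C11 _Generic to
show that a parameter written with array notation is really a pointer,
which is why a sizeof-based macro cannot recover its length:

```c
#define ARRAY_SIZE(a)  (sizeof (a) / sizeof ((a)[0]))

/* A real array: ARRAY_SIZE sees the whole object.  */
int
local_array_length (void)
{
  int a[8] = { 0 };

  return (int) ARRAY_SIZE (a);
}

/* Array notation in a parameter: the declaration is adjusted to 'int *a',
   so &a has type 'int **' rather than 'int (*)[8]'.  Returns 1 when that
   adjustment has happened.  */
int
parameter_is_pointer (int a[8])
{
  return _Generic (&a, int **: 1, default: 0);
}
```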

Some specific questions or concerns:

-  The tests seem to work as expected if I compile them manually, and
   run (the one that should be run) as a normal program.  The one that
   should not be run also gives the expected diagnostics.
   Can anyone give advice on why it's not running well under the test
   suite?

-  I don't like the fact that [*][n] is internally implemented exactly
   like [0][n], which makes them indistinguishable.  All other cases of
   [0] return a constant expression of value 0, but [0][n] must return a
   variable 0, to keep support for [*][n].
   Could you please change the way [*][n] (and thus [*]) is represented
   internally so that it can be differentiated from [0]?
   Do you have in mind any other way that would be a viable
   implementation of [*] that would allow distinguishing [0][n] and
   [*][n]?  Maybe making it to have one node instead of zero and mark
   that node specially?

At the bottom of this email is a range-diff against v3.

And below is a test program I used while developing the feature.  It is
quite similar to what's on the test suite (patch 4/4), since those are
based on this one.

It has comments where I'd like more diagnostics, but those are not the
responsibility of this feature.  Some are the fault of the representation
for [*], and others are already being worked on by Martin.  There are
also comments on code that causes compile-time errors as expected
(wanted).  Some assertions about evaluation of the operand are commented
out because due to the problems with [*][n] and [0][n] we have more
evaluation than I'd like.  However, those are only with [0], which is
not yet well supported by GCC, so we don't need to worry much for now.

The program below also compares with sizeof and alignof, which the
test-suite tests do not.

Have a lovely day!
Alex

$ cat len.c 
#include 
#include 
#include 


#define memberof(T, member)   \
( \
(T){}.member  \
)


struct s {
int x;
int y[8];
int z[];
};


struct s2 {
int x;
int z[] __attribute__((counted_by(x)));
};


extern int x[];


void array(void);
void incomplete_err(int inc[]);
void unspecified_err(void);
void vla(void);
void member_array(void);
void fam_err(void);
void vla_eval(void);
void in_vla_noeval(void);
void in_vla_noeval2(void);
void array_noeval(void);
void vla_eval2(void);
void matrix_0(void);
void matrix_fixed(void);
void matrix_vla(void);
void f_fixed(void);
void f_zero(void);
void f_vla(void);
void f_star(void);


int
main(int argc, char *argv[argc + 1])
{
(void) argv;

// Wishlist:
//n = lengthof(argv);
//printf("lengthof(argv) == %zu\n", n);

array();
incomplete_err(&argc);
unspecified_err();
vla();
member_array();
fam_err();
vla_eval();
 

[RFC v4 1/4] gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()

2024-08-06 Thread Alejandro Colomar
The old name was misleading.

While at it, also rename some temporary variables that are used with
this function, for consistency.

Link: 
https://inbox.sourceware.org/gcc-patches/9fffd80-dca-2c7e-14b-6c9b509a7...@redhat.com/T/#m2f661c67c8f7b2c405c8c7fc3152dd85dc729120
Cc: Gabriel Ravier 
Cc: Martin Uecker 
Cc: Joseph Myers 
Cc: Xavier Del Campo Romero 
Cc: Jakub Jelinek 

gcc/ChangeLog:

* tree.cc (array_type_nelts): Rename function ...
(array_type_nelts_minus_one): ... to this name.  The old name
was misleading.
* tree.h (array_type_nelts): Rename function ...
(array_type_nelts_minus_one): ... to this name.  The old name
was misleading.
* expr.cc (count_type_elements):
Rename array_type_nelts() => array_type_nelts_minus_one()
* config/aarch64/aarch64.cc
(pure_scalable_type_info::analyze_array): Likewise.
* config/i386/i386.cc (ix86_canonical_va_list_type): Likewise.

gcc/c/ChangeLog:

* c-decl.cc (one_element_array_type_p, get_parm_array_spec):
Rename array_type_nelts() => array_type_nelts_minus_one()
* c-fold.cc (c_fold_array_ref): Likewise.

gcc/cp/ChangeLog:

* decl.cc (reshape_init_array):
Rename array_type_nelts() => array_type_nelts_minus_one()
* init.cc (build_zero_init_1): Likewise.
(build_value_init_noctor): Likewise.
(build_vec_init): Likewise.
(build_delete): Likewise.
* lambda.cc (add_capture): Likewise.
* tree.cc (array_type_nelts_top): Likewise.

gcc/fortran/ChangeLog:

* trans-array.cc (structure_alloc_comps):
Rename array_type_nelts() => array_type_nelts_minus_one()
* trans-openmp.cc (gfc_walk_alloc_comps): Likewise.
(gfc_omp_clause_linear_ctor): Likewise.

gcc/rust/ChangeLog:

* backend/rust-tree.cc (array_type_nelts_top):
Rename array_type_nelts() => array_type_nelts_minus_one()

Suggested-by: Richard Biener 
Signed-off-by: Alejandro Colomar 
---
 gcc/c/c-decl.cc   | 10 +-
 gcc/c/c-fold.cc   |  7 ---
 gcc/config/aarch64/aarch64.cc |  2 +-
 gcc/config/i386/i386.cc   |  2 +-
 gcc/cp/decl.cc|  2 +-
 gcc/cp/init.cc|  8 
 gcc/cp/lambda.cc  |  3 ++-
 gcc/cp/tree.cc|  2 +-
 gcc/expr.cc   |  8 
 gcc/fortran/trans-array.cc|  2 +-
 gcc/fortran/trans-openmp.cc   |  4 ++--
 gcc/rust/backend/rust-tree.cc |  2 +-
 gcc/tree.cc   |  4 ++--
 gcc/tree.h|  2 +-
 14 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 97f1d346835..4dced430d1f 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5309,7 +5309,7 @@ one_element_array_type_p (const_tree type)
 {
   if (TREE_CODE (type) != ARRAY_TYPE)
 return false;
-  return integer_zerop (array_type_nelts (type));
+  return integer_zerop (array_type_nelts_minus_one (type));
 }
 
 /* Determine whether TYPE is a zero-length array type "[0]".  */
@@ -6257,15 +6257,15 @@ get_parm_array_spec (const struct c_parm *parm, tree 
attrs)
  for (tree type = parm->specs->type; TREE_CODE (type) == ARRAY_TYPE;
   type = TREE_TYPE (type))
{
- tree nelts = array_type_nelts (type);
- if (error_operand_p (nelts))
+ tree nelts_minus_one = array_type_nelts_minus_one (type);
+ if (error_operand_p (nelts_minus_one))
return attrs;
- if (TREE_CODE (nelts) != INTEGER_CST)
+ if (TREE_CODE (nelts_minus_one) != INTEGER_CST)
{
  /* Each variable VLA bound is represented by the dollar
 sign.  */
  spec += "$";
- tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
+ tpbnds = tree_cons (NULL_TREE, nelts_minus_one, tpbnds);
}
}
  tpbnds = nreverse (tpbnds);
diff --git a/gcc/c/c-fold.cc b/gcc/c/c-fold.cc
index 57b67c74bd8..9ea174f79c4 100644
--- a/gcc/c/c-fold.cc
+++ b/gcc/c/c-fold.cc
@@ -73,11 +73,12 @@ c_fold_array_ref (tree type, tree ary, tree index)
   unsigned elem_nchars = (TYPE_PRECISION (elem_type)
  / TYPE_PRECISION (char_type_node));
   unsigned len = (unsigned) TREE_STRING_LENGTH (ary) / elem_nchars;
-  tree nelts = array_type_nelts (TREE_TYPE (ary));
+  tree nelts_minus_one = array_type_nelts_minus_one (TREE_TYPE (ary));
   bool dummy1 = true, dummy2 = true;
-  nelts = c_fully_fold_internal (nelts, true, &dummy1, &dummy2, false, false);
+  nelts_minus_one = c_fully_fold_internal (nelts_minus_one, true, &dummy1,
+  &dummy2, false, false);
   unsigned HOST_WIDE_INT i = tree_to_uhwi (index);
-  if (!tree_int_cst_le (index, nelts)
+  if (!tree_int_cst_le (index, nelts_minus_one)
   || i >= len
   || i + elem_nchars > len)
   

[RFC v4 2/4] Merge definitions of array_type_nelts_top()

2024-08-06 Thread Alejandro Colomar
There were two identical definitions, and neither of them is available
where they are needed for implementing __lengthof__().  Merge them, and
provide the single definition in gcc/tree.{h,cc}, where it's available
for __lengthof__().

Signed-off-by: Alejandro Colomar 
---
 gcc/cp/cp-tree.h  |  1 -
 gcc/cp/tree.cc| 13 -
 gcc/rust/backend/rust-tree.cc | 13 -
 gcc/rust/backend/rust-tree.h  |  2 --
 gcc/tree.cc   | 13 +
 gcc/tree.h|  1 +
 6 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c1a371bc721..e6c1c63f872 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8099,7 +8099,6 @@ extern tree build_exception_variant   (tree, 
tree);
 extern void fixup_deferred_exception_variants   (tree, tree);
 extern tree bind_template_template_parm(tree, tree);
 extern tree array_type_nelts_total (tree);
-extern tree array_type_nelts_top   (tree);
 extern bool array_of_unknown_bound_p   (const_tree);
 extern tree break_out_target_exprs (tree, bool = false);
 extern tree build_ctor_subob_ref   (tree, tree, tree);
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 3baeb8fa252..1f3ecff1a21 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -3071,19 +3071,6 @@ cxx_print_statistics (void)
 depth_reached);
 }
 
-/* Return, as an INTEGER_CST node, the number of elements for TYPE
-   (which is an ARRAY_TYPE).  This counts only elements of the top
-   array.  */
-
-tree
-array_type_nelts_top (tree type)
-{
-  return fold_build2_loc (input_location,
- PLUS_EXPR, sizetype,
- array_type_nelts_minus_one (type),
- size_one_node);
-}
-
 /* Return, as an INTEGER_CST node, the number of elements for TYPE
(which is an ARRAY_TYPE).  This one is a recursive count of all
ARRAY_TYPEs that are clumped together.  */
diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
index a2c12204667..dd8eda84f9b 100644
--- a/gcc/rust/backend/rust-tree.cc
+++ b/gcc/rust/backend/rust-tree.cc
@@ -859,19 +859,6 @@ is_empty_class (tree type)
   return CLASSTYPE_EMPTY_P (type);
 }
 
-// forked from gcc/cp/tree.cc array_type_nelts_top
-
-/* Return, as an INTEGER_CST node, the number of elements for TYPE
-   (which is an ARRAY_TYPE).  This counts only elements of the top
-   array.  */
-
-tree
-array_type_nelts_top (tree type)
-{
-  return fold_build2_loc (input_location, PLUS_EXPR, sizetype,
- array_type_nelts_minus_one (type), size_one_node);
-}
-
 // forked from gcc/cp/tree.cc builtin_valid_in_constant_expr_p
 
 /* Test whether DECL is a builtin that may appear in a
diff --git a/gcc/rust/backend/rust-tree.h b/gcc/rust/backend/rust-tree.h
index 26c8b653ac6..e597c3ab81d 100644
--- a/gcc/rust/backend/rust-tree.h
+++ b/gcc/rust/backend/rust-tree.h
@@ -2993,8 +2993,6 @@ extern location_t rs_expr_location (const_tree);
 extern int
 is_empty_class (tree type);
 
-extern tree array_type_nelts_top (tree);
-
 extern bool
 is_really_empty_class (tree, bool);
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index dcaccc4c362..cbbc7627ad6 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -3729,6 +3729,19 @@ array_type_nelts_minus_one (const_tree type)
  ? max
  : fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, min));
 }
+
+/* Return, as an INTEGER_CST node, the number of elements for TYPE
+   (which is an ARRAY_TYPE).  This counts only elements of the top
+   array.  */
+
+tree
+array_type_nelts_top (tree type)
+{
+  return fold_build2_loc (input_location,
+ PLUS_EXPR, sizetype,
+ array_type_nelts_minus_one (type),
+ size_one_node);
+}
 
 /* If arg is static -- a reference to an object in static storage -- then
return the object.  This is not the same as the C meaning of `static'.
diff --git a/gcc/tree.h b/gcc/tree.h
index fdddbcf408e..a6c46440b1a 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4922,6 +4922,7 @@ extern tree build_method_type (tree, tree);
 extern tree build_offset_type (tree, tree);
 extern tree build_complex_type (tree, bool named = false);
 extern tree array_type_nelts_minus_one (const_tree);
+extern tree array_type_nelts_top (tree);
 
 extern tree value_member (tree, tree);
 extern tree purpose_member (const_tree, tree);
-- 
2.45.2





[RFC v4 4/4] testsuite: Add tests for __lengthof__

2024-08-06 Thread Alejandro Colomar
I've compiled those files manually, and they behave as expected.  But
within the test-suite, they don't seem to work:

FAIL: gcc.dg/lengthof-compile.c (test for excess errors)
FAIL: gcc.dg/lengthof.c (test for excess errors)

Signed-off-by: Alejandro Colomar 
---
 gcc/testsuite/gcc.dg/lengthof-compile.c |  48 +
 gcc/testsuite/gcc.dg/lengthof.c | 126 
 2 files changed, 174 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/lengthof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/lengthof.c

diff --git a/gcc/testsuite/gcc.dg/lengthof-compile.c 
b/gcc/testsuite/gcc.dg/lengthof-compile.c
new file mode 100644
index 000..b5ca8978a99
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lengthof-compile.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+
+extern int x[];
+
+void
+incomplete(int p[])
+{
+  unsigned n;
+
+  n = __lengthof__(x);  /* { dg-error "incomplete" } */
+
+  /* We want to support the following one in the future,
+ but for now it should fail.  */
+  n = __lengthof__(p);  /* { dg-error "invalid" } */
+}
+
+void
+fam(void)
+{
+  struct {
+int x;
+int fam[];
+  } s;
+  unsigned n;
+
+  n = __lengthof__(s.fam); /* { dg-error "incomplete" } */
+}
+
+void fix_fix(int i, char (*a)[3][5], int (*x)[__lengthof__(*a)]);
+void fix_var(int i, char (*a)[3][i], int (*x)[__lengthof__(*a)]);
+void fix_uns(int i, char (*a)[3][*], int (*x)[__lengthof__(*a)]);
+
+void
+func(void)
+{
+  int  i3[3];
+  int  i5[5];
+  char c35[3][5];
+
+  fix_fix(5, &c35, &i3);
+  fix_fix(5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
+
+  fix_var(5, &c35, &i3);
+  fix_var(5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
+
+  fix_uns(5, &c35, &i3);
+  fix_uns(5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
+}
diff --git a/gcc/testsuite/gcc.dg/lengthof.c b/gcc/testsuite/gcc.dg/lengthof.c
new file mode 100644
index 000..6aec558749c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lengthof.c
@@ -0,0 +1,126 @@
+/* { dg-do run } */
+
+#undef NDEBUG
+#include <assert.h>
+
+void
+array(void)
+{
+  short a[7];
+
+  assert(__lengthof__(a) == 7);
+  assert(__lengthof__(long [0]) == 0);
+  assert(__lengthof__(unsigned [99]) == 99);
+}
+
+void
+vla(void)
+{
+  unsigned n;
+
+  n = 99;
+  assert(__lengthof__(short [n - 10]) == 99 - 10);
+
+  int v[n / 2];
+  assert(__lengthof__(v) == 99 / 2);
+
+  n = 0;
+  int z[n];
+  assert(__lengthof__(z) == 0);
+}
+
+void
+member(void)
+{
+  struct {
+int a[8];
+  } s;
+
+  assert(__lengthof__(s.a) == 8);
+}
+
+void
+vla_eval(void)
+{
+  int i;
+
+  i = 7;
+  assert(__lengthof__(struct {int x;}[i++]) == 7);
+  assert(i == 7 + 1);
+
+  int v[i];
+  int (*p)[i];
+  p = &v;
+  assert(__lengthof__(*p++) == i);
+  assert(p - 1 == &v);
+}
+
+void
+inner_vla_noeval(void)
+{
+  int i;
+
+  i = 3;
+  assert(__lengthof__(struct {int x[i++];}[3]) == 3);
+  assert(i == 3);
+}
+
+void
+array_noeval(void)
+{
+  long a[5];
+  long (*p)[__lengthof__(a)];
+
+  p = &a;
+  assert(__lengthof__(*p++) == 5);
+  assert(p == &a);
+}
+
+void
+matrix_zero(void)
+{
+  int i;
+
+  assert(__lengthof__(int [0][4]) == 0);
+  i = 3;
+  assert(__lengthof__(int [0][i]) == 0);
+}
+
+void
+matrix_fixed(void)
+{
+  int i;
+
+  assert(__lengthof__(int [7][4]) == 7);
+  i = 3;
+  assert(__lengthof__(int [7][i]) == 7);
+}
+
+void
+matrix_vla(void)
+{
+  int i, j;
+
+  i = 7;
+  assert(__lengthof__(int [i++][4]) == 7);
+  assert(i == 7 + 1);
+
+  i = 9;
+  j = 3;
+  assert(__lengthof__(int [i++][j]) == 9);
+  assert(i == 9 + 1);
+}
+
+int
+main(void)
+{
+  array();
+  vla();
+  member();
+  vla_eval();
+  inner_vla_noeval();
+  array_noeval();
+  matrix_zero();
+  matrix_fixed();
+  matrix_vla();
+}
-- 
2.45.2





[RFC v4 3/4] c: Add __lengthof__() operator (n2529)

2024-08-06 Thread Alejandro Colomar
This operator is similar to sizeof() but can only be applied to an
array, and returns its length (number of elements).

FUTURE DIRECTIONS:

We could make it work with array parameters to functions, and
somehow magically return the length designator of the array,
regardless of it being really a pointer.
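
As a rough illustration of what the operator is meant to replace (a sketch
only, not part of the patch): the usual sizeof-based macro, called ARRAY_LEN
here as a hypothetical name, yields the top-level element count of a true
array, while a function that takes an array parameter still needs the length
passed separately, because the parameter is adjusted to a pointer.  The
helper functions are likewise made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: the classic sizeof-based
   element count.  It is only meaningful for true arrays.  */
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* An array parameter is adjusted to a pointer, so the caller has to
   pass the length explicitly.  */
static long
sum (size_t n, const int p[])
{
  long total = 0;

  for (size_t i = 0; i < n; i++)
    total += p[i];
  return total;
}

/* Only the top-level dimension is counted: 7, not 7 * 4.  */
static size_t
top_len_matrix (void)
{
  int m[7][4];

  return ARRAY_LEN (m);
}

/* The length is taken where the array type is still visible.  */
static long
sum_seven (void)
{
  int a[7] = {1, 2, 3, 4, 5, 6, 7};

  return sum (ARRAY_LEN (a), a);
}
```

The point of the sketch is only that the length must be computed where the
array type is still known; inside sum() that information is gone.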

Link: 
Link: 
Suggested-by: Xavier Del Campo Romero 
Co-developed-by: Martin Uecker 
Cc: Gabriel Ravier 
Cc: Joseph Myers 
Cc: Jakub Jelinek 
Cc: Kees Cook 
Cc: Qing Zhao 
Cc: Jens Gustedt 
Signed-off-by: Alejandro Colomar 
---
 gcc/c-family/c-common.cc  |  26 +
 gcc/c-family/c-common.def |   3 +
 gcc/c-family/c-common.h   |   2 +
 gcc/c/c-decl.cc   |  20 +--
 gcc/c/c-parser.cc |  61 +++-
 gcc/c/c-tree.h|   4 ++
 gcc/c/c-typeck.cc | 114 --
 gcc/cp/operators.def  |   1 +
 gcc/doc/extend.texi   |  27 +
 gcc/target.h  |   3 +
 10 files changed, 237 insertions(+), 24 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e7e371fd26f..9f5feb83345 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -465,6 +465,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__inline",RID_INLINE, 0 },
   { "__inline__",  RID_INLINE, 0 },
   { "__label__",   RID_LABEL,  0 },
+  { "__lengthof__",RID_LENGTHOF, 0 },
   { "__null",  RID_NULL,   0 },
   { "__real",  RID_REALPART,   0 },
   { "__real__",RID_REALPART,   0 },
@@ -4070,6 +4071,31 @@ c_alignof_expr (location_t loc, tree expr)
 
   return fold_convert_loc (loc, size_type_node, t);
 }
+
+/* Implement the lengthof keyword: Return the length of an array,
+   that is, the number of elements in the array.  */
+
+tree
+c_lengthof_type (location_t loc, tree type)
+{
+  enum tree_code type_code;
+
+  type_code = TREE_CODE (type);
+  if (type_code != ARRAY_TYPE)
+{
+  error_at (loc, "invalid application of %<lengthof%> to type %qT", type);
+  return error_mark_node;
+}
+  if (!COMPLETE_TYPE_P (type))
+{
+  error_at (loc,
+   "invalid application of %<lengthof%> to incomplete type %qT",
+   type);
+  return error_mark_node;
+}
+
+  return array_type_nelts_top (type);
+}
 
 /* Handle C and C++ default attributes.  */
 
diff --git a/gcc/c-family/c-common.def b/gcc/c-family/c-common.def
index 5de96e5d4a8..6d162f67104 100644
--- a/gcc/c-family/c-common.def
+++ b/gcc/c-family/c-common.def
@@ -50,6 +50,9 @@ DEFTREECODE (EXCESS_PRECISION_EXPR, "excess_precision_expr", 
tcc_expression, 1)
number.  */
 DEFTREECODE (USERDEF_LITERAL, "userdef_literal", tcc_exceptional, 3)
 
+/* Represents a 'lengthof' expression.  */
+DEFTREECODE (LENGTHOF_EXPR, "lengthof_expr", tcc_expression, 1)
+
 /* Represents a 'sizeof' expression during C++ template expansion,
or for the purpose of -Wsizeof-pointer-memaccess warning.  */
 DEFTREECODE (SIZEOF_EXPR, "sizeof_expr", tcc_expression, 1)
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ccaea27c2b9..f815a4cf3bc 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -105,6 +105,7 @@ enum rid
 
   /* C extensions */
   RID_ASM,   RID_TYPEOF,   RID_TYPEOF_UNQUAL, RID_ALIGNOF,  RID_ATTRIBUTE,
+  RID_LENGTHOF,
   RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,   RID_BUILTIN_SHUFFLE,
@@ -885,6 +886,7 @@ extern tree c_common_truthvalue_conversion (location_t, 
tree);
 extern void c_apply_type_quals_to_decl (int, tree);
 extern tree c_sizeof_or_alignof_type (location_t, tree, bool, bool, int);
 extern tree c_alignof_expr (location_t, tree);
+extern tree c_lengthof_type (location_t, tree);
 /* Print an error message for invalid operands to arith operation CODE.
NOP_EXPR is used as a special case (see truthvalue_conversion).  */
 extern void binary_op_error (rich_location *, enum tree_code, tree, tree);
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 4dced430d1f..790c58b2558 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -8937,12 +8937,16 @@ start_struct (location_t loc, enum tree_code code, tree 
name,
  within a statement expr used within sizeof, et. al.  This is not
  terribly serious as C++ doesn't permit statement exprs within
  sizeof anyhow.  */
-  if (warn_cxx_compat && (in_sizeof || in_typeof || in_alignof))
+  if (warn_cxx_compat && (in_sizeof || in_typeof || in_alignof || in_lengthof))
 warning_at (loc, OPT_Wc___compat,
"defining type in %qs expression is invalid in C++",
(in_sizeof
 ? "sizeof"
-: (in_typeof ? "typeof" : "alignof")));
+: (in_typeof
+   ?

Re: [RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Alejandro Colomar
On Tue, Aug 06, 2024 at 02:22:38PM GMT, Alejandro Colomar wrote:
> Hi!
> 
> v4:
> 
> -  Only evaluate the operand if the top array is VLA.  Inner VLAs are
>ignored.  [Joseph, Martin]
>This proved very useful for compile-time diagnostics, since we have
>more cases that are constant expressions.
> -  Document the evaluation rules, which are unique to this operator
>(similar to sizeof, but we ignore inner VLAs).
> -  Add tests to the testsuite.  [Joseph]
> -  Swap diagnostic cases preference, to give more meaningful
>diagnostics.  [Martin]
> -  Document that Xavier was the first one to suggest this feature, and
>provide a link to the mail thread where that happened.
>BTW, while reading that discussion from 2 years ago, I see that the

Self-correction: s/2/4/

>value of this operator was questioned.  Below is a rationale to
>defend it.
> -  Document that Martin's help has been crucial for implementing this,
>with 'Co-developed-by'.  Would you mind confirming that I can use
>that tag?
> -  CC += Kees, Qing, Jens
> 
> Rationale:
> 
> -  While compiler extensions already allow implementing ARRAY_SIZE()
>(), there's still no
>way to get the length of a function parameter which uses array
>notation.  While this first implementation doesn't support those yet
>(because there are some issues that need to be fixed first), the plan
>is to add support to those.  This would be a huge step towards arrays
>being first-class citizens in C.  In those cases, it would reduce the
>chance of programmer errors.  See for example
>.  That entire class of bugs
>would be over, _and_ programs would become simpler.
> 
> Some specific questions or concerns:
> 
> -  The tests seem to work as expected if I compile them manually, and
>run (the one that should be run) as a normal program.  The one that
>should not be run also gives the expected diagnostics.
>Can anyone give advice on why they're not running well under the test
>suite?
> 
> -  I don't like the fact that [*][n] is internally implemented exactly
>like [0][n], which makes them indistinguishable.  All other cases of
>[0] return a constant expression of value 0, but [0][n] must return a
>variable 0, to keep support for [*][n].
>Could you please change the way [*][n] (and thus [*]) is represented
>internally so that it can be differentiated from [0]?
>Do you have in mind any other way that would be a viable
>implementation of [*] that would allow distinguishing [0][n] and
>[*][n]?  Maybe making it to have one node instead of zero and mark
>that node specially?
> 
> At the bottom of this email is a range-diff against v3.
> 
> And below is a test program I used while developing the feature.  It is
> quite similar to what's on the test suite (patch 4/4), since those are
> based on this one.
> 
> It has comments where I'd like more diagnostics, but those are not the
> responsibility of this feature.  Some are the fault of the representation
> for [*], and others are already being worked on by Martin.  There are
> also comments on code that causes compile-time errors as expected
> (wanted).  Some assertions about evaluation of the operand are commented
> out because due to the problems with [*][n] and [0][n] we have more
> evaluation than I'd like.  However, those are only with [0], which is
> not yet well supported by GCC, so we don't need to worry much for now.
> 
> The program below also compares with sizeof and alignof, which the
> test-suite tests do not.
> 
> Have a lovely day!
> Alex
> 
>   $ cat len.c 
>   #include 
>   #include 
>   #include 
> 
> 
>   #define memberof(T, member)   \
>   ( \
>   (T){}.member  \
>   )
> 
> 
>   struct s {
>   int x;
>   int y[8];
>   int z[];
>   };
> 
> 
>   struct s2 {
>   int x;
>   int z[] __attribute__((counted_by(x)));
>   };
> 
> 
>   extern int x[];
> 
> 
>   void array(void);
>   void incomplete_err(int inc[]);
>   void unspecified_err(void);
>   void vla(void);
>   void member_array(void);
>   void fam_err(void);
>   void vla_eval(void);
>   void in_vla_noeval(void);
>   void in_vla_noeval2(void);
>   void array_noeval(void);
>   void vla_eval2(void);
>   void matrix_0(void);
>   void matrix_fixed(void);
>   void matrix_vla(void);
>   void f_fixed(void);
>   void f_zero(void);
>   void f_vla(void);
>   void f_star(void);
> 
> 
>   int
>   main(int argc, char *argv[argc + 1])
>   {
>   (void) argv;
> 
>   // Wishlist:
>   //n = lengthof(argv);
> 

[PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode precision [PR116202]

2024-08-06 Thread pan2 . li
From: Pan Li 

The .SAT_TRUNC vect pattern recognition is only valid when the lhs type
has the precision of its mode.  For example, in the code below, QImode
with 1 bit of precision (like _Bool) is invalid.

g_12 = (long unsigned int) _2;
_13 = MIN_EXPR ;
_3 = (_Bool) _13;

The above pattern cannot be recognized as .SAT_TRUNC (g_12) because the
dest only has 1 bit of precision in QImode.  In other words, the type
doesn't have the precision of its mode.
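
To make the mismatch concrete, here is a minimal sketch (plain C, not GCC
internals; both function names are made up for illustration).  A saturating
truncation to an 8-bit mode clamps at 255, whereas the source above clamps
at 1, the maximum of a type with 1 bit of precision.  Treating the latter as
the former would store 255 where 1 is expected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Saturating truncation to 8 bits: clamps at the mode's maximum, 255.  */
static uint8_t
sat_trunc_u8 (unsigned long x)
{
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}

/* What the MIN_EXPR plus the conversion to _Bool computes: clamp at 1,
   the maximum value of a type with 1 bit of precision.  */
static bool
clamp_to_bool (unsigned long x)
{
  return (bool) (x < 1 ? x : 1);
}
```

For a large input such as (unsigned long) -6, the first function yields 255
and the second yields 1, which is why the pattern must not fire for _Bool.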

The tests below passed for this patch.
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

PR target/116202

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_sat_trunc_pattern): Add the
type_has_mode_precision_p check for the lhs type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr116202-run-1.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/base/pr116202-run-1.c   | 24 +++
 gcc/tree-vect-patterns.cc |  5 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
new file mode 100644
index 000..d150f20b5d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */
+
+int b[24];
+_Bool c[24];
+
+int main() {
+  for (int f = 0; f < 4; ++f)
+b[f] = 6;
+
+  for (int f = 0; f < 24; f += 4)
+c[f] = ({
+  int g = ({
+unsigned long g = -b[f];
+1 < g ? 1 : g;
+  });
+  g;
+});
+
+  if (c[0] != 1)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4674a16d15f..74f80587b0e 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4695,11 +4695,12 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, 
stmt_vec_info stmt_vinfo,
 
   tree ops[1];
   tree lhs = gimple_assign_lhs (last_stmt);
+  tree otype = TREE_TYPE (lhs);
 
-  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))
+  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)
+  && type_has_mode_precision_p (otype))
 {
   tree itype = TREE_TYPE (ops[0]);
-  tree otype = TREE_TYPE (lhs);
   tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
   tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
   internal_fn fn = IFN_SAT_TRUNC;
-- 
2.43.0



RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-06 Thread Li, Pan2
> Ah, yeah - that's the usual (premature) frontend optimization to
> shorten operations after the standard
> mandated standard conversion (to 'int' in this case).

Thanks Richard for confirmation, let me refine the matching in v2.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, August 6, 2024 7:50 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

On Tue, Aug 6, 2024 at 3:21 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> It looks like the plus gets an additional convert to unsigned for int8 and
> int16; see the example below from test.c.006t.gimple.
> We need these convert ops in one matching pattern to cover all int scalar
> types.

Ah, yeah - that's the usual (premature) frontend optimization to
shorten operations after the standard
mandated standard conversion (to 'int' in this case).
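
For readers following along, a small sketch of the two shapes in plain C
(function names are hypothetical).  The first mirrors the source, where the
operands are promoted to int; the second mirrors the shortened GIMPLE, where
the addition is carried out in unsigned short.  This assumes the usual
32-bit int.  Note that converting an out-of-range value to int16_t is
implementation-defined in C (GCC wraps modulo 2^16), so the check is best
done on sums that fit.

```c
#include <stdint.h>

/* As written in the source: operands are promoted to int, and the result
   is converted back to int16_t.  */
static int16_t
add_i16_plain (int16_t a, int16_t b)
{
  return (int16_t) (a + b);
}

/* As the shortened GIMPLE does it: convert both operands to the unsigned
   16-bit type, add with wraparound, then convert the sum back.  */
static int16_t
add_i16_shortened (int16_t a, int16_t b)
{
  uint16_t ua = (uint16_t) a;
  uint16_t ub = (uint16_t) b;
  uint16_t sum = (uint16_t) (ua + ub);

  return (int16_t) sum;
}
```

Both functions agree whenever the mathematical sum fits in int16_t, which is
what allows the frontend to shorten the operation.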

> I am not sure if there is a better way here, given that a convert in the
> matching pattern is not very elegant.
>
> int16_t
> add_i16 (int16_t a, int16_t b)
> {
>   int16_t sum = a + b;
>   return sum;
> }
>
> int32_t
> add_i32 (int32_t a, int32_t b)
> {
>   int32_t sum = a + b;
>   return sum;
> }
>
> --- 006t.gimple ---
> int16_t add_i16 (int16_t a, int16_t b)
> {
>   int16_t D.2815;
>   int16_t sum;
>
>   a.0_1 = (unsigned short) a;
>   b.1_2 = (unsigned short) b;
>   _3 = a.0_1 + b.1_2;
>   sum = (int16_t) _3;
>   D.2815 = sum;
>   return D.2815;
> }
>
> int32_t add_i32 (int32_t a, int32_t b)
> {
>   int32_t D.2817;
>   int32_t sum;
>
>   sum = a + b;
>   D.2817 = sum;
>   return D.2817;
> }
>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Monday, August 5, 2024 9:52 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> Thanks Richard for comments.
>
> > The convert looks odd to me given @0 is involved in both & operands.
>
> The convert is introduced because the GIMPLE IL for int8_t is somewhat
> different from that for int32_t or int64_t.
> There are some additional ops that convert to unsigned for the plus; see
> lines 8-9 and lines 22-23 below.
> We cannot see similar GIMPLE IL for int32_t and int64_t.  To reconcile the
> types from int8_t to int64_t, I added the convert here.
>
> Or maybe I made a mistake in the example; let me revisit it and send v2
> if there are no surprises.
>
>4   │ __attribute__((noinline))
>5   │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y)
>6   │ {
>7   │   int8_t sum;
>8   │   unsigned char x.1_1;
>9   │   unsigned char y.2_2;
>   10   │   unsigned char _3;
>   11   │   signed char _4;
>   12   │   signed char _5;
>   13   │   int8_t _6;
>   14   │   _Bool _11;
>   15   │   signed char _12;
>   16   │   signed char _13;
>   17   │   signed char _14;
>   18   │   signed char _22;
>   19   │   signed char _23;
>   20   │
>   21   │[local count: 1073741822]:
>   22   │   x.1_1 = (unsigned char) x_7(D);
>   23   │   y.2_2 = (unsigned char) y_8(D);
>   24   │   _3 = x.1_1 + y.2_2;
>   25   │   sum_9 = (int8_t) _3;
>   26   │   _4 = x_7(D) ^ y_8(D);
>   27   │   _5 = x_7(D) ^ sum_9;
>   28   │   _23 = ~_4;
>   29   │   _22 = _5 & _23;
>   30   │   if (_22 < 0)
>   31   │ goto ; [41.00%]
>   32   │   else
>   33   │ goto ; [59.00%]
>   34   │
>   35   │[local count: 259738146]:
>   36   │   _11 = x_7(D) < 0;
>   37   │   _12 = (signed char) _11;
>   38   │   _13 = -_12;
>   39   │   _14 = _13 ^ 127;
>   40   │
>   41   │[local count: 1073741824]:
>   42   │   # _6 = PHI <_14(3), sum_9(2)>
>   43   │   return _6;
>   44   │
>   45   │ }
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, August 5, 2024 7:16 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> On Mon, Aug 5, 2024 at 9:14 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support the form 1 of the scalar signed
> > integer .SAT_ADD.  Aka below example:
> >
> > Form 1:
> >   #define DEF_SAT_S_ADD_FMT_1(T) \
> >   T __attribute__((noinline))\
> >   sat_s_add_##T##_fmt_1 (T x, T y)   \
> >   {  \
> > T min = (T)1u << (sizeof (T) * 8 - 1);   \
> > T max = min - 1; \
> > return (x ^ y) < 0   \
> >   ? (T)(x + y)   \
> >   : ((T)(x + y) ^ x) >= 0\
> > ? (T)(x + y) \
> > : x < 0 ? min : max; \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_1 (int64_t)
> >
> > We can tell the differenc

[PATCH] tree-optimization/116166 - forward jump-threading going wild

2024-08-06 Thread Richard Biener
Currently the forward threader isn't limited as to the search space
it explores.  Now that it uses path-ranger to simplify the conditions
it runs into, it has become pretty slow for degenerate cases like
compiling insn-emit.cc for RISC-V, especially when compiling for
a host with LOGICAL_OP_NON_SHORT_CIRCUIT disabled.

The following makes the forward threader honor the search space
limit I introduced for the backward threader.  This reduces
compile-time from minutes to seconds for the testcase in PR116166.
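
The shape of the change can be sketched as follows (a simplified
illustration in C, not the threader code; struct node, search and
chain_reaches_target are made-up names).  One budget is created at the top
level and shared by every recursive step, so the total work per starting
edge is bounded no matter how the recursion branches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block on a candidate path.  */
struct node
{
  struct node *next;
  bool target;
};

enum { MAX_NODES = 16 };

/* Walk the chain looking for the target.  Every visited node consumes one
   unit of the shared budget; once it is exhausted the search gives up.  */
static bool
search (const struct node *n, unsigned *limit)
{
  if (n == NULL || *limit == 0)
    return false;
  --*limit;

  if (n->target)
    return true;
  return search (n->next, limit);
}

/* Build a chain of LENGTH nodes whose last node is the target, then
   search it with the given budget.  */
static bool
chain_reaches_target (unsigned length, unsigned budget)
{
  struct node nodes[MAX_NODES];

  if (length == 0 || length > MAX_NODES)
    return false;
  for (unsigned i = 0; i < length; i++)
    {
      nodes[i].next = (i + 1 < length) ? &nodes[i + 1] : NULL;
      nodes[i].target = (i + 1 == length);
    }
  return search (&nodes[0], &budget);
}
```

With a chain of five nodes, a budget of five or more finds the target while
a budget of three gives up early; the patch passes the limit by reference
for the same reason the sketch passes a pointer.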

Note this wasn't necessary before we had ranger, but with ranger
the work we do is quadratic in the length of the threading path
we build up (the same is true for the backwards threader).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR tree-optimization/116166
* tree-ssa-threadedge.h (jump_threader::thread_around_empty_blocks):
Add limit parameter.
(jump_threader::thread_through_normal_block): Likewise.
* tree-ssa-threadedge.cc (jump_threader::thread_around_empty_blocks):
Honor and decrement limit parameter.
(jump_threader::thread_through_normal_block): Likewise.
(jump_threader::thread_across_edge): Initialize limit from
param_max_jump_thread_paths and pass it down to workers.
---
 gcc/tree-ssa-threadedge.cc | 30 ++
 gcc/tree-ssa-threadedge.h  |  4 ++--
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-threadedge.cc b/gcc/tree-ssa-threadedge.cc
index 7f82639b8ec..0aa2aa85143 100644
--- a/gcc/tree-ssa-threadedge.cc
+++ b/gcc/tree-ssa-threadedge.cc
@@ -786,13 +786,17 @@ propagate_threaded_block_debug_into (basic_block dest, 
basic_block src)
 bool
 jump_threader::thread_around_empty_blocks (vec *path,
   edge taken_edge,
-  bitmap visited)
+  bitmap visited, unsigned &limit)
 {
   basic_block bb = taken_edge->dest;
   gimple_stmt_iterator gsi;
   gimple *stmt;
   tree cond;
 
+  if (limit == 0)
+return false;
+  --limit;
+
   /* The key property of these blocks is that they need not be duplicated
  when threading.  Thus they cannot have visible side effects such
  as PHI nodes.  */
@@ -830,7 +834,8 @@ jump_threader::thread_around_empty_blocks 
(vec *path,
  m_registry->push_edge (path, taken_edge, EDGE_NO_COPY_SRC_BLOCK);
  m_state->append_path (taken_edge->dest);
  bitmap_set_bit (visited, taken_edge->dest->index);
- return thread_around_empty_blocks (path, taken_edge, visited);
+ return thread_around_empty_blocks (path, taken_edge, visited,
+limit);
}
}
 
@@ -872,7 +877,7 @@ jump_threader::thread_around_empty_blocks 
(vec *path,
   m_registry->push_edge (path, taken_edge, EDGE_NO_COPY_SRC_BLOCK);
   m_state->append_path (taken_edge->dest);
 
-  thread_around_empty_blocks (path, taken_edge, visited);
+  thread_around_empty_blocks (path, taken_edge, visited, limit);
   return true;
 }
 
@@ -899,8 +904,13 @@ jump_threader::thread_around_empty_blocks 
(vec *path,
 
 int
 jump_threader::thread_through_normal_block (vec *path,
-   edge e, bitmap visited)
+   edge e, bitmap visited,
+   unsigned &limit)
 {
+  if (limit == 0)
+return 0;
+  limit--;
+
   m_state->register_equivs_edge (e);
 
   /* PHIs create temporary equivalences.
@@ -989,7 +999,7 @@ jump_threader::thread_through_normal_block 
(vec *path,
 visited.  This may be overly conservative.  */
  bitmap_set_bit (visited, dest->index);
  bitmap_set_bit (visited, e->dest->index);
- thread_around_empty_blocks (path, taken_edge, visited);
+ thread_around_empty_blocks (path, taken_edge, visited, limit);
  return 1;
}
 }
@@ -1075,9 +1085,12 @@ jump_threader::thread_across_edge (edge e)
   bitmap_set_bit (visited, e->src->index);
   bitmap_set_bit (visited, e->dest->index);
 
+  /* Limit search space.  */
+  unsigned limit = param_max_jump_thread_paths;
+
   int threaded = 0;
   if ((e->flags & EDGE_DFS_BACK) == 0)
-threaded = thread_through_normal_block (path, e, visited);
+threaded = thread_through_normal_block (path, e, visited, limit);
 
   if (threaded > 0)
 {
@@ -1148,11 +1161,12 @@ jump_threader::thread_across_edge (edge e)
m_registry->push_edge (path, e, EDGE_START_JUMP_THREAD);
m_registry->push_edge (path, taken_edge, EDGE_COPY_SRC_JOINER_BLOCK);
 
-   found = thread_around_empty_blocks (path, taken_edge, visited);
+   found = thread_around_empty_blocks (path, taken_edge, visited, limit);
 
if (!found)
  found = thread_through_normal_block (path,
-   

Re: [PATCH] gimple ssa: Put SCCOPY logic into a class

2024-08-06 Thread Richard Biener
On Tue, 6 Aug 2024, Filip Kastl wrote:

> Hello everybody,
> 
> In pr113054[1] Andrew said that he doesn't like the 'dead_stmts' static
> variable I used when implementing the sccopy pass.  We agreed that wrapping
> the relevant code from the pass in a class would be most likely the best
> solution.  Here is a patch that does exactly that.  I waited until stage 1 to
> submit it.
> 
> Bootstrapped and regtested on x86_64.  Is the patch ok to be pushed to trunk?

OK.

Richard.

> Cheers,
> Filip Kastl
> 
> 
> [1]
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113054
> 
> 
> -- 8< --
> 
> 
> Currently the main logic of the sccopy pass is implemented as static
> functions.  This patch instead puts the code into a class.  This also
> gets rid of a global variable (dead_stmts).
> 
> gcc/ChangeLog:
> 
>   * gimple-ssa-sccopy.cc (class scc_copy_prop): New class.
>   (replace_scc_by_value): Put into...
>   (scc_copy_prop::replace_scc_by_value): ...scc_copy_prop.
>   (sccopy_visit_op): Put into...
>   (scc_copy_prop::visit_op): ...scc_copy_prop.
>   (sccopy_propagate): Put into...
>   (scc_copy_prop::propagate): ...scc_copy_prop.
>   (init_sccopy): Replace by...
>   (scc_copy_prop::scc_copy_prop): ...the constructor.
>   (finalize_sccopy): Replace by...
>   (scc_copy_prop::~scc_copy_prop): ...the destructor.
>   (pass_sccopy::execute): Use scc_copy_prop.
> 
> Signed-off-by: Filip Kastl 
> ---
>  gcc/gimple-ssa-sccopy.cc | 66 ++--
>  1 file changed, 37 insertions(+), 29 deletions(-)
> 
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> index 191a4c0b451..d9eaeab4abb 100644
> --- a/gcc/gimple-ssa-sccopy.cc
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -94,11 +94,6 @@ along with GCC; see the file COPYING3.  If not see
>  
>  namespace {
>  
> -/* Bitmap tracking statements which were propagated to be removed at the end 
> of
> -   the pass.  */
> -
> -static bitmap dead_stmts;
> -
>  /* State of vertex during SCC discovery.
>  
> unvisited  Vertex hasn't yet been popped from worklist.
> @@ -459,11 +454,33 @@ get_all_stmt_may_generate_copy (void)
>return result;
>  }
>  
> +/* SCC copy propagation
> +
> +   'scc_copy_prop::propagate ()' is the main function of this pass.  */
> +
> +class scc_copy_prop
> +{
> +public:
> +  scc_copy_prop ();
> +  ~scc_copy_prop ();
> +  void propagate ();
> +
> +private:
> +  /* Bitmap tracking statements which were propagated so that they can be
> + removed at the end of the pass.  */
> +  bitmap dead_stmts;
> +
> +  void visit_op (tree op, hash_set &outer_ops,
> + hash_set &scc_set, bool &is_inner,
> + tree &last_outer_op);
> +  void replace_scc_by_value (vec scc, tree val);
> +};
> +
>  /* For each statement from given SCC, replace its usages by value
> VAL.  */
>  
> -static void
> -replace_scc_by_value (vec scc, tree val)
> +void
> +scc_copy_prop::replace_scc_by_value (vec scc, tree val)
>  {
>for (gimple *stmt : scc)
>  {
> @@ -476,12 +493,12 @@ replace_scc_by_value (vec scc, tree val)
>  fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
>  }
>  
> -/* Part of 'sccopy_propagate ()'.  */
> +/* Part of 'scc_copy_prop::propagate ()'.  */
>  
> -static void
> -sccopy_visit_op (tree op, hash_set &outer_ops,
> -  hash_set &scc_set, bool &is_inner,
> -  tree &last_outer_op)
> +void
> +scc_copy_prop::visit_op (tree op, hash_set &outer_ops,
> +  hash_set &scc_set, bool &is_inner,
> +  tree &last_outer_op)
>  {
>bool op_in_scc = false;
>  
> @@ -539,8 +556,8 @@ sccopy_visit_op (tree op, hash_set &outer_ops,
>   Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
>   Section 3.2.  */
>  
> -static void
> -sccopy_propagate ()
> +void
> +scc_copy_prop::propagate ()
>  {
>auto_vec useful_stmts = get_all_stmt_may_generate_copy ();
>scc_discovery discovery;
> @@ -575,14 +592,12 @@ sccopy_propagate ()
>   for (j = 0; j < gimple_phi_num_args (phi); j++)
> {
>   op = gimple_phi_arg_def (phi, j);
> - sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> -last_outer_op);
> + visit_op (op, outer_ops, scc_set, is_inner, last_outer_op);
> }
>   break;
> case GIMPLE_ASSIGN:
>   op = gimple_assign_rhs1 (stmt);
> - sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> -last_outer_op);
> + visit_op (op, outer_ops, scc_set, is_inner, last_outer_op);
>   break;
> default:
>   gcc_unreachable ();
> @@ -613,19 +628,13 @@ sccopy_propagate ()
>  }
>  }
>  
> -/* Called when pass execution starts.  */
> -
> -static void
> -init_sccopy (void)
> +scc_copy_prop::scc_copy_prop ()
>  {
>

Re: [PATCH] gimple ssa: Put SCCOPY logic into a class

2024-08-06 Thread Filip Kastl
On Tue 2024-08-06 15:14:32, Richard Biener wrote:
> On Tue, 6 Aug 2024, Filip Kastl wrote:
> 
> > Hello everybody,
> > 
> > In pr113054[1] Andrew said that he doesn't like the 'dead_stmts' static
> > variable I used when implementing the sccopy pass.  We agreed that wrapping
> > the relevant code from the pass in a class would be most likely the best
> > solution.  Here is a patch that does exactly that.  I waited until stage 1 
> > to
> > submit it.
> > 
> > Bootstrapped and regtested on x86_64.  Is the patch ok to be pushed to 
> > trunk?
> 
> OK.
> 
> Richard.
> 

Thanks, pushed.

Filip


Re: [PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode precision [PR116202]

2024-08-06 Thread Richard Biener
On Tue, Aug 6, 2024 at 2:59 PM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC vect pattern recognition is only valid when the lhs type
> has the precision of its mode.  For example, in the code below, QImode
> with 1 bit of precision (like _Bool) is invalid.
>
> g_12 = (long unsigned int) _2;
> _13 = MIN_EXPR ;
> _3 = (_Bool) _13;
>
> The above pattern cannot be recognized as .SAT_TRUNC (g_12) because the
> dest only has 1 bit of precision in QImode.  In other words, the type
> doesn't have the precision of its mode.
>
> The tests below passed for this patch.
> 1. The rv64gcv full regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 full regression tests.

OK

> PR target/116202
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_trunc_pattern): Add the
> type_has_mode_precision_p check for the lhs type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  .../riscv/rvv/base/pr116202-run-1.c   | 24 +++
>  gcc/tree-vect-patterns.cc |  5 ++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> new file mode 100644
> index 000..d150f20b5d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */
> +
> +int b[24];
> +_Bool c[24];
> +
> +int main() {
> +  for (int f = 0; f < 4; ++f)
> +b[f] = 6;
> +
> +  for (int f = 0; f < 24; f += 4)
> +c[f] = ({
> +  int g = ({
> +unsigned long g = -b[f];
> +1 < g ? 1 : g;
> +  });
> +  g;
> +});
> +
> +  if (c[0] != 1)
> +__builtin_abort ();
> +}
> +
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 4674a16d15f..74f80587b0e 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4695,11 +4695,12 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>
>tree ops[1];
>tree lhs = gimple_assign_lhs (last_stmt);
> +  tree otype = TREE_TYPE (lhs);
>
> -  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))
> +  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)
> +  && type_has_mode_precision_p (otype))
>  {
>tree itype = TREE_TYPE (ops[0]);
> -  tree otype = TREE_TYPE (lhs);
>tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
>tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
>internal_fn fn = IFN_SAT_TRUNC;
> --
> 2.43.0
>
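
A small sketch of the point made in the commit message, under stated assumptions: has_mode_precision, sat_trunc_u8 and clamp_to_bool are hypothetical helpers written for this note, not GCC functions. They model a type whose value bits fill its machine mode (8 of 8 bits) versus _Bool (1 of 8 bits), and show that an 8-bit saturating truncation is not a substitute for the clamp-to-1 computation in the quoted GIMPLE.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a type has "mode precision" when its number of value
// bits equals the bit size of the machine mode that holds it.
constexpr bool has_mode_precision (unsigned precision_bits, unsigned mode_bits)
{
  return precision_bits == mode_bits;
}

// Unsigned saturating truncation from 64 bits to a full 8-bit destination.
constexpr std::uint8_t sat_trunc_u8 (std::uint64_t x)
{
  return static_cast<std::uint8_t> (x > UINT8_MAX ? UINT8_MAX : x);
}

// What the quoted statements compute: clamp to 1, then convert to a boolean.
constexpr bool clamp_to_bool (std::uint64_t x)
{
  return (x < 1 ? x : 1) != 0;
}

static_assert (has_mode_precision (8, 8), "an 8-bit type fills QImode");
static_assert (!has_mode_precision (1, 8), "_Bool uses 1 bit of QImode");

inline void demo_sat_trunc ()
{
  // The 8-bit saturation clamps at 255, not at 1.
  assert (sat_trunc_u8 (300) == 255);
  assert (sat_trunc_u8 (7) == 7);
  assert (clamp_to_bool (300));
  assert (!clamp_to_bool (0));
}
```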


Re: [PATCH] RISC-V: Minimal support for Zimop extension.

2024-08-06 Thread Jeff Law




On 8/6/24 3:31 AM, Nick Clifton wrote:

Hi Jeff,

2.43 was released over a weekend.  Is it possible to let it be 
supported after 2.44? cc Nick and jan.
I don't think it's critical enough to backport to 2.43.  I'd just put 
it on the trunk so that it's available in 2.44.


It might be worth adding it to the 2.43 branch as well.  It is looking
like there will be need to create a point release this time as several
other last-minute problems have been uncovered and fixed just too late
to make it into the 2.43 release.
I certainly wouldn't object.  It'll make my life marginally easier as 
we're carrying Lyut's version as one of the very few remaining local 
changes to binutils+gdb.


Jeff



Re: [RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Martin Uecker
Am Dienstag, dem 06.08.2024 um 14:22 +0200 schrieb Alejandro Colomar:
> Hi!
> 
> -  The tests seem to work as expected if I compile them manually, and
>run (the one that should be run) as a normal program.  The one that
>should not be run also gives the expected diagnostics.
>Can anyone give advice of why it's not running well under the test
>suite?

What is the output?  You get an additional warning / error.

> 
> -  I don't like the fact that [*][n] is internally implemented exactly
>like [0][n], which makes them indistinguishable.  All other cases of
>[0] return a constant expression of value 0, but [0][n] must return a
>variable 0, to keep support for [*][n].
>Could you please change the way [*][n] (and thus [*]) is represented
>internally so that it can be differentiated from [0]?
>Do you have in mind any other way that would be a viable
>implementation of [*] that would allow distinguishing [0][n] and
>[*][n]?  Maybe making it to have one node instead of zero and mark
>that node specially?

The C++ frontend encodes zero-sized arrays using a range of [0,-1]. 
I have a half-finished patch which implements this for the C FE.


Martin
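
A rough sketch of why an explicit [0,-1] index range can help here: with a recorded range, the length is max - min + 1, which is 0 for [0,-1], while a type with no recorded upper bound remains available to stand for an unspecified size. The struct and function names below are hypothetical and do not mirror GCC's tree representation; mapping "no upper bound" to [*] is an assumption of this sketch only.

```cpp
#include <cassert>
#include <optional>

// Hypothetical index-range description of an array type.
struct array_domain
{
  long min;                  // lowest valid index, normally 0
  std::optional<long> max;   // highest valid index; empty = not recorded
};

// Number of elements, or nothing when the upper bound is not recorded.
inline std::optional<long> length_of (const array_domain &d)
{
  if (!d.max)
    return std::nullopt;
  return *d.max - d.min + 1;
}

inline void demo_array_domain ()
{
  array_domain zero_sized{0, -1L};            // [0]: range [0, -1]
  array_domain three{0, 2L};                  // [3]: range [0, 2]
  array_domain unspecified{0, std::nullopt};  // assumed encoding for [*]

  assert (length_of (zero_sized) == 0);
  assert (length_of (three) == 3);
  assert (!length_of (unspecified).has_value ());
}
```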



Re: [PATCH] c++: remove function/var concepts code

2024-08-06 Thread Marek Polacek
On Mon, Aug 05, 2024 at 09:56:20PM -0400, Patrick Palka wrote:
> On Fri, 2 Aug 2024, Marek Polacek wrote:
> 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu.  Comments?
> > 
> > -- >8 --
> > This patch removes vestigial Concepts TS code as discussed in
> > .
> 
> Yay!  FWIW I think we can also remove the concept_check_p checks in
> 
> cxx_eval_call_expression
> cxx_eval_outermost_constant_expr
> cp_genericize_r 
> check_noexcept_r

Thanks, I'm testing a patch.
 
> And perhaps we could rename *concept_check* to *concept_id* throughout
> to match the standard terminology.

Maybe...

Marek



Re: [PATCHv2, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-06 Thread Jeff Law




On 8/6/24 1:20 AM, HAO CHEN GUI wrote:

Hi Jeff,

在 2024/8/5 23:11, Jeff Law 写道:

We'll probably need Richard S. or someone else to chime in on the actual patch, 
but yea, if they can leverage stp, it's likely going to be better than actual 
vectors.

Do we have a testcase for this issue or was it something you just happened to 
notice?


I will refine the patch and ask for Richard's advice.

The auto CI detects the aarch64 regression cases for all submitted patches.
I received its report after sending my "clear by pieces" patch and got
following regression cases.

FAIL: gcc.target/aarch64/auto-init-padding-11.c scan-assembler stp\txzr, xzr,
FAIL: gcc.target/aarch64/auto-init-padding-5.c scan-assembler-times stp\txzr, xzr, 2
FAIL: gcc.target/aarch64/memset-corner-cases.c check-function-bodies set0scalar
FAIL: gcc.target/aarch64/memset-q-reg.c check-function-bodies set128bitszero
FAIL: gcc.target/aarch64/memset-q-reg.c check-function-bodies set256bitszero

Then I made the patch and make sure all regression cases can be fixed.

Thanks for clarifying.  Yea, we definitely want to fix this as part of 
the kit then.  Richard S. is the right person to start with WRT the 
aarch64 patch.


Jeff



[PATCH] c++: fold calls to std::forward_like [PR96780]

2024-08-06 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk?

-- >8 --

This extends our folding of cast-like standard library functions
to also include C++23's std::forward_like.

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : Fold calls
to std::forward_like as well.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: Test std::forward_like folding.
---
 gcc/cp/cp-gimplify.cc  | 3 ++-
 gcc/testsuite/g++.dg/opt/pr96780.C | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index b88c3b7f370..3db9657ae93 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3316,7 +3316,8 @@ cp_fold (tree x, fold_flags_t flags)
|| id_equal (DECL_NAME (callee), "addressof")
/* This addressof equivalent is used heavily in libstdc++.  */
|| id_equal (DECL_NAME (callee), "__addressof")
-   || id_equal (DECL_NAME (callee), "as_const")))
+   || id_equal (DECL_NAME (callee), "as_const")
+   || id_equal (DECL_NAME (callee), "forward_like")))
  {
r = CALL_EXPR_ARG (x, 0);
/* Check that the return and argument types are sane before
diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C 
b/gcc/testsuite/g++.dg/opt/pr96780.C
index 61e11855eeb..a29cda8b836 100644
--- a/gcc/testsuite/g++.dg/opt/pr96780.C
+++ b/gcc/testsuite/g++.dg/opt/pr96780.C
@@ -29,6 +29,10 @@ void f() {
   auto&& x11 = std::as_const(a);
   auto&& x12 = std::as_const(ca);
 #endif
+#if __cpp_lib_forward_like
+  auto&& x13 = std::forward_like(a);
+  auto&& x14 = std::forward_like(ca);
+#endif
 }
 
 // { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
@@ -36,3 +40,4 @@ void f() {
 // { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
 // { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
 // { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::forward_like" "gimple" } }
-- 
2.46.0.39.g891ee3b9db
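
As background for "cast-like", a minimal sketch: each of the library calls below hands back a reference (or pointer) to the very same object, which is what makes folding the call into a plain cast possible. The function name and the template arguments chosen for std::forward_like are this sketch's own; the forward_like part is only compiled when the library advertises __cpp_lib_forward_like (C++23).

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns true when every cast-like call yields the original object.
inline bool cast_like_calls_preserve_identity ()
{
  int a = 42;
  const int ca = 7;

  int &&moved = std::move (a);
  int &forwarded = std::forward<int &> (a);
  const int &as_c = std::as_const (a);
  const int &&moved_c = std::move (ca);

  bool same = &moved == &a
              && &forwarded == &a
              && &as_c == &a
              && std::addressof (a) == &a
              && &moved_c == &ca;

#if defined(__cpp_lib_forward_like)
  // C++23 only: the explicit template argument supplies the cv/ref
  // qualifiers to apply; the one used here is an arbitrary choice.
  auto &&fl = std::forward_like<const int &> (a);
  same = same && &fl == &a;
#endif
  return same;
}

inline void demo_cast_like ()
{
  assert (cast_like_calls_preserve_identity ());
}
```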



Re: [PATCH] c++: fold calls to std::forward_like [PR96780]

2024-08-06 Thread Marek Polacek
On Tue, Aug 06, 2024 at 10:01:22AM -0400, Patrick Palka wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
> look OK for trunk?

Looks simple & good.

Reviewed-by: Marek Polacek 
 
> -- >8 --
> 
> This extends our folding of cast-like standard library functions
> to also include C++23's std::forward_like.
> 
>   PR c++/96780
> 
> gcc/cp/ChangeLog:
> 
>   * cp-gimplify.cc (cp_fold) : Fold calls
>   to std::forward_like as well.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/opt/pr96780.C: Test std::forward_like folding.
> ---
>  gcc/cp/cp-gimplify.cc  | 3 ++-
>  gcc/testsuite/g++.dg/opt/pr96780.C | 5 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index b88c3b7f370..3db9657ae93 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -3316,7 +3316,8 @@ cp_fold (tree x, fold_flags_t flags)
>   || id_equal (DECL_NAME (callee), "addressof")
>   /* This addressof equivalent is used heavily in libstdc++.  */
>   || id_equal (DECL_NAME (callee), "__addressof")
> - || id_equal (DECL_NAME (callee), "as_const")))
> + || id_equal (DECL_NAME (callee), "as_const")
> + || id_equal (DECL_NAME (callee), "forward_like")))
> {
>   r = CALL_EXPR_ARG (x, 0);
>   /* Check that the return and argument types are sane before
> diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C 
> b/gcc/testsuite/g++.dg/opt/pr96780.C
> index 61e11855eeb..a29cda8b836 100644
> --- a/gcc/testsuite/g++.dg/opt/pr96780.C
> +++ b/gcc/testsuite/g++.dg/opt/pr96780.C
> @@ -29,6 +29,10 @@ void f() {
>auto&& x11 = std::as_const(a);
>auto&& x12 = std::as_const(ca);
>  #endif
> +#if __cpp_lib_forward_like
> +  auto&& x13 = std::forward_like(a);
> +  auto&& x14 = std::forward_like(ca);
> +#endif
>  }
>  
>  // { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
> @@ -36,3 +40,4 @@ void f() {
>  // { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
>  // { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
>  // { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::forward_like" "gimple" } }
> -- 
> 2.46.0.39.g891ee3b9db
> 

Marek



[PATCH v1 0/4] dwarf2: add hooks for architecture-specific CFIs

2024-08-06 Thread Matthieu Longo
Architecture-specific CFI directives are currently declared and processed among 
other architecture-independent CFI directives in gcc/dwarf2* files. This 
approach creates confusion, specifically when DWARF instructions in the vendor 
space share the same instruction code.
Such a clash currently happens between DW_CFA_GNU_window_save (used on SPARC) 
and DW_CFA_AARCH64_negate_ra_state (used on AArch64), which both have the 
instruction code 0x2d. As a result, AArch64 compilers generate a SPARC CFI 
directive (.cfi_window_save) instead of .cfi_negate_ra_state, contrary to what 
is expected in [1].

1. Rename REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE

This patch renames:
- dwarf2out_frame_debug_cfa_toggle_ra_mangle to 
dwarf2out_frame_debug_cfa_negate_ra_state,
- REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE,
as the naming was misleading.
The word "toggle" suggested a binary state, whereas this register stores the 
mangling state (that can be more than 2 states) for the return address on 
AArch64.

2. dwarf2: add hooks for architecture-specific CFIs

This refactoring does not completely solve the problem, but improves the 
situation by moving some of the processing of those directives (more 
specifically their output in the assembly) to the backend via 2 target hooks:
- DW_CFI_OPRND1_DESC: parse the first operand of the directive (if any).
- OUTPUT_CFI_DIRECTIVE: output the CFI directive as a string.
Only the AArch64 and SPARC backends are affected.

3. aarch64 testsuite: explain expectations for pr94515*

PR94515's tests in the AArch64 G++ testsuite were lacking documentation. They 
are now thoroughly documented.

4. dwarf2: store the RA state in CFI row

On AArch64, the RA state informs the unwinder whether the return address is 
mangled, and if so, how. This information is encoded in a boolean in the CFI 
row. This binary approach cannot express more complex configurations, such as 
PAuth_LR, introduced in Armv9.5-A.
This patch addresses this limitation by replacing the boolean with an enum.
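
To make item 4 concrete, a minimal sketch of the boolean-to-enum change. The enumerator names RA_no_signing and RA_signing_SP are taken from the patch; ra_signing_method, cfi_row_sketch and negate_ra_state are simplified, hypothetical stand-ins for the real declarations in gcc/dwarf2cfi.cc.

```cpp
#include <cassert>

// The two states named in the patch.  Unlike a bool, an enum leaves room
// for further signing methods to be added later.
enum ra_signing_method
{
  RA_no_signing = 0x0,
  RA_signing_SP = 0x1
};

// Hypothetical, reduced stand-in for the CFI row.
struct cfi_row_sketch
{
  ra_signing_method ra_state = RA_no_signing;
};

// Models the effect of one negate_ra_state directive on the row.
inline void negate_ra_state (cfi_row_sketch &row)
{
  row.ra_state = (row.ra_state == RA_no_signing) ? RA_signing_SP
                                                 : RA_no_signing;
}

inline void demo_ra_state ()
{
  cfi_row_sketch row;
  negate_ra_state (row);
  assert (row.ra_state == RA_signing_SP);
  negate_ra_state (row);
  assert (row.ra_state == RA_no_signing);
}
```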


References:
[1] DWARF for the Arm 64-bit Architecture (AArch64) --> 
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst

## Testing

Built for target aarch64-unknown-linux-gnu and ran GCC's & G++'s testsuites for 
AArch64.
Built GCC stage 1 for target sparc64-unknown-linux-gnu.


Ok for master? I don't have commit access so I need someone to commit on my 
behalf.

Regards,
Matthieu.


Matthieu Longo (4):
  Rename REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE
  dwarf2: add hooks for architecture-specific CFIs
  aarch64 testsuite: explain expectections for pr94515* tests
  dwarf2: store the RA state in CFI row

 gcc/combine-stack-adj.cc |  2 +-
 gcc/config/aarch64/aarch64.cc| 36 -
 gcc/config/sparc/sparc.cc| 36 +
 gcc/coretypes.h  |  6 +++
 gcc/doc/tm.texi  | 28 ++
 gcc/doc/tm.texi.in   | 17 ++
 gcc/dwarf2cfi.cc | 57 ++--
 gcc/dwarf2out.cc | 13 +++--
 gcc/dwarf2out.h  | 10 ++--
 gcc/reg-notes.def|  8 +--
 gcc/target.def   | 20 +++
 gcc/testsuite/g++.target/aarch64/pr94515-1.C | 14 +++--
 gcc/testsuite/g++.target/aarch64/pr94515-2.C | 30 ---
 libffi/include/ffi_cfi.h |  2 +
 libgcc/config/aarch64/aarch64-asm.h  |  4 +-
 libitm/config/aarch64/sjlj.S | 10 ++--
 16 files changed, 233 insertions(+), 60 deletions(-)

-- 
2.46.0



[PATCH v1 1/4] Rename REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE

2024-08-06 Thread Matthieu Longo
The current name REG_CFA_TOGGLE_RA_MANGLE is not representative of what
it really is, i.e. a register to represent several states, not only a
binary one. Same for dwarf2out_frame_debug_cfa_toggle_ra_mangle.

gcc/ChangeLog:

* combine-stack-adj.cc
(no_unhandled_cfa): Rename.
* config/aarch64/aarch64.cc
(aarch64_expand_prologue): Rename.
(aarch64_expand_epilogue): Rename.
* dwarf2cfi.cc
(dwarf2out_frame_debug_cfa_toggle_ra_mangle): Rename this...
(dwarf2out_frame_debug_cfa_negate_ra_state): To this.
(dwarf2out_frame_debug): Rename.
* reg-notes.def (REG_CFA_NOTE): Rename REG_CFA_TOGGLE_RA_MANGLE.
---
 gcc/combine-stack-adj.cc  | 2 +-
 gcc/config/aarch64/aarch64.cc | 4 ++--
 gcc/dwarf2cfi.cc  | 8 
 gcc/reg-notes.def | 8 
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/combine-stack-adj.cc b/gcc/combine-stack-adj.cc
index 2da9bf2bc1e..367d3b66b74 100644
--- a/gcc/combine-stack-adj.cc
+++ b/gcc/combine-stack-adj.cc
@@ -212,7 +212,7 @@ no_unhandled_cfa (rtx_insn *insn)
   case REG_CFA_SET_VDRAP:
   case REG_CFA_WINDOW_SAVE:
   case REG_CFA_FLUSH_QUEUE:
-  case REG_CFA_TOGGLE_RA_MANGLE:
+  case REG_CFA_NEGATE_RA_STATE:
return false;
   }
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e0cf382998c..0af5d85c36f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -9606,7 +9606,7 @@ aarch64_expand_prologue (void)
  default:
gcc_unreachable ();
}
-  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
+  add_reg_note (insn, REG_CFA_NEGATE_RA_STATE, const0_rtx);
   RTX_FRAME_RELATED_P (insn) = 1;
 }
 
@@ -10027,7 +10027,7 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
  default:
gcc_unreachable ();
}
-  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
+  add_reg_note (insn, REG_CFA_NEGATE_RA_STATE, const0_rtx);
   RTX_FRAME_RELATED_P (insn) = 1;
 }
 
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index 1231b5bb5f0..4ad9acbd6fd 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -1547,13 +1547,13 @@ dwarf2out_frame_debug_cfa_window_save (void)
   cur_row->window_save = true;
 }
 
-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_TOGGLE_RA_MANGLE.
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_NEGATE_RA_STATE.
Note: DW_CFA_GNU_window_save dwarf opcode is reused for toggling RA mangle
state, this is a target specific operation on AArch64 and can only be used
on other targets if they don't use the window save operation otherwise.  */
 
 static void
-dwarf2out_frame_debug_cfa_toggle_ra_mangle (void)
+dwarf2out_frame_debug_cfa_negate_ra_state (void)
 {
   dw_cfi_ref cfi = new_cfi ();
 
@@ -2341,8 +2341,8 @@ dwarf2out_frame_debug (rtx_insn *insn)
handled_one = true;
break;
 
-  case REG_CFA_TOGGLE_RA_MANGLE:
-   dwarf2out_frame_debug_cfa_toggle_ra_mangle ();
+  case REG_CFA_NEGATE_RA_STATE:
+   dwarf2out_frame_debug_cfa_negate_ra_state ();
handled_one = true;
break;
 
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index 5b878fb2a1c..ddcf16b68be 100644
--- a/gcc/reg-notes.def
+++ b/gcc/reg-notes.def
@@ -180,10 +180,10 @@ REG_CFA_NOTE (CFA_WINDOW_SAVE)
the rest of the compiler as a CALL_INSN.  */
 REG_CFA_NOTE (CFA_FLUSH_QUEUE)
 
-/* Attached to insns that are RTX_FRAME_RELATED_P, toggling the mangling status
-   of return address.  Currently it's only used by AArch64.  The argument is
-   ignored.  */
-REG_CFA_NOTE (CFA_TOGGLE_RA_MANGLE)
+/* Attached to insns that are RTX_FRAME_RELATED_P, indicating an authentication
+   of the return address. Currently it's only used by AArch64.
+   The argument is ignored.  */
+REG_CFA_NOTE (CFA_NEGATE_RA_STATE)
 
 /* Indicates what exception region an INSN belongs in.  This is used
to indicate what region to which a call may throw.  REGION 0
-- 
2.46.0



[PATCH v1 3/4] aarch64 testsuite: explain expectections for pr94515* tests

2024-08-06 Thread Matthieu Longo
gcc/testsuite/ChangeLog:

* g++.target/aarch64/pr94515-1.C: Improve test documentation.
* g++.target/aarch64/pr94515-2.C: Same.
---
 gcc/testsuite/g++.target/aarch64/pr94515-1.C |  8 ++
 gcc/testsuite/g++.target/aarch64/pr94515-2.C | 28 +++-
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.target/aarch64/pr94515-1.C 
b/gcc/testsuite/g++.target/aarch64/pr94515-1.C
index d5c114a83a8..359039e1753 100644
--- a/gcc/testsuite/g++.target/aarch64/pr94515-1.C
+++ b/gcc/testsuite/g++.target/aarch64/pr94515-1.C
@@ -15,12 +15,20 @@ void unwind (void)
 __attribute__((noinline, noipa, target("branch-protection=pac-ret")))
 int test (int z)
 {
+  // paciasp -> cfi_negate_ra_state: RA_no_signing -> RA_signing_SP
   if (z) {
 asm volatile ("":::"x20","x21");
 unwind ();
+// autiasp -> cfi_negate_ra_state: RA_signing_SP -> RA_no_signing
 return 1;
   } else {
+// 2nd cfi_negate_ra_state because the CFI directives are processed linearily.
+// At this point, the unwinder would believe that the address is not signed
+// due to the previous return. That's why the compiler has to emit second
+// cfi_negate_ra_state to mean that the return address is still signed.
+// cfi_negate_ra_state: RA_no_signing -> RA_signing_SP
 unwind ();
+// autiasp -> cfi_negate_ra_state: RA_signing_SP -> RA_no_signing
 return 2;
   }
 }
diff --git a/gcc/testsuite/g++.target/aarch64/pr94515-2.C 
b/gcc/testsuite/g++.target/aarch64/pr94515-2.C
index f4abeed..af6b128b8fd 100644
--- a/gcc/testsuite/g++.target/aarch64/pr94515-2.C
+++ b/gcc/testsuite/g++.target/aarch64/pr94515-2.C
@@ -6,6 +6,7 @@
 volatile int zero = 0;
 int global = 0;
 
+/* This is a leaf function, so no .cfi_negate_ra_state directive is expected.  */
 __attribute__((noinline))
 int bar(void)
 {
@@ -13,29 +14,44 @@ int bar(void)
   return 0;
 }
 
+/* This function does not return normally, so the address is signed but no
+ * authentication code is emitted. It means that only one CFI directive is
+ * supposed to be emitted at signing time.  */
 __attribute__((noinline, noreturn))
 void unwind (void)
 {
   throw 42;
 }
 
+/* This function has several return instructions, and alternates different RA
+ * states. 4 .cfi_negate_ra_state and a .cfi_remember_state/.cfi_restore_state
+ * should be emitted.
+ */
 __attribute__((noinline, noipa))
 int test(int x)
 {
-  if (x==1) return 2; /* This return path may not use the stack.  */
+  // This return path may not use the stack. This means that the return address
+  // won't be signed.
+  if (x==1) return 2;
+
+  // All the return paths of the code below must have RA mangle state set, and
+  // the return address must be signed.
   int y = bar();
   if (y > global) global=y;
-  if (y==3) unwind(); /* This return path must have RA mangle state set.  */
-  return 0;
+  if (y==3) unwind(); // authentication of the return address is not required.
+  return 0; // authentication of the return address is required.
 }
 
+/* This function requires only 2 .cfi_negate_ra_state.  */
 int main ()
 {
+  // paciasp -> cfi_negate_ra_state: RA_no_signing -> RA_signing_SP
   try {
 test (zero);
-__builtin_abort ();
+__builtin_abort (); // authentication of the return address is not required.
   } catch (...) {
+// autiasp -> cfi_negate_ra_state: RA_signing_SP -> RA_no_signing
 return 0;
   }
-  __builtin_abort ();
-}
+  __builtin_abort (); // authentication of the return address is not required.
+}
\ No newline at end of file
-- 
2.46.0
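
The comments added to pr94515-1.C rely on CFI directives being processed in layout order rather than along control flow. A hypothetical model of that, written for this note only: unwinder_view walks the instructions in address order and toggles the assumed RA state wherever a .cfi_negate_ra_state is placed. The instruction sequence in the demo is a simplified guess at the shape of test (int z), not actual compiler output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class insn { paciasp, autiasp, ret, other };

// 'negate_at' lists the indices of instructions that are followed by a
// .cfi_negate_ra_state.  The result gives, for each instruction, the state
// an unwinder would assume before executing it (true = address signed).
inline std::vector<bool>
unwinder_view (const std::vector<insn> &code,
               const std::vector<std::size_t> &negate_at)
{
  std::vector<bool> view;
  bool is_signed = false;
  for (std::size_t i = 0; i < code.size (); ++i)
    {
      view.push_back (is_signed);
      for (std::size_t n : negate_at)
        if (n == i)
          is_signed = !is_signed;
    }
  return view;
}

inline void demo_linear_cfi ()
{
  // 0: paciasp, 1: if-branch body, 2: autiasp, 3: ret,
  // 4: else-branch body, 5: autiasp, 6: ret
  std::vector<insn> code = {insn::paciasp, insn::other, insn::autiasp,
                            insn::ret,     insn::other, insn::autiasp,
                            insn::ret};

  // With the extra directive after the first ret, the else branch is
  // seen as signed and the final ret as not signed.
  std::vector<bool> with_extra = unwinder_view (code, {0, 2, 3, 5});
  assert (with_extra[4] == true);
  assert (with_extra[6] == false);

  // Without it, the state left behind by the first return path is
  // carried into the else branch, which is wrong.
  std::vector<bool> without_extra = unwinder_view (code, {0, 2, 5});
  assert (without_extra[4] == false);
}
```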



[PATCH v1 2/4] dwarf2: add hooks for architecture-specific CFIs

2024-08-06 Thread Matthieu Longo
Architecture-specific CFI directives are currently declared and processed
among other architecture-independent CFI directives in gcc/dwarf2* files.
This approach creates confusion, specifically when DWARF instructions in
the vendor space share the same instruction code.

Such a clash currently happens between DW_CFA_GNU_window_save (used on
SPARC) and DW_CFA_AARCH64_negate_ra_state (used on AArch64), which both
have the instruction code 0x2d.
As a result, AArch64 compilers generate a SPARC CFI directive
(.cfi_window_save) instead of .cfi_negate_ra_state, contrary to what is
expected in
[DWARF for the Arm 64-bit Architecture (AArch64)](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst).

This refactoring does not completely solve the problem, but improves the
situation by moving some of the processing of those directives (more
specifically their output in the assembly) to the backend via 2 target
hooks:
- DW_CFI_OPRND1_DESC: parse the first operand of the directive (if any).
- OUTPUT_CFI_DIRECTIVE: output the CFI directive as a string.
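
A minimal sketch of why a per-target hook resolves the clash described above: the opcode value 0x2d alone is ambiguous, so something target-specific has to pick the directive name. The constants carry a _sketch suffix and target_arch / output_cfi_directive_sketch are hypothetical; the real hooks take GCC's own CFI types and are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Both vendor opcodes share the value 0x2d, as stated in the description.
constexpr unsigned DW_CFA_GNU_window_save_sketch = 0x2d;
constexpr unsigned DW_CFA_AARCH64_negate_ra_state_sketch = 0x2d;
static_assert (DW_CFA_GNU_window_save_sketch
               == DW_CFA_AARCH64_negate_ra_state_sketch,
               "the two opcodes cannot be told apart by value");

enum class target_arch { aarch64, sparc, other };

// Hypothetical per-target hook: returns the assembler directive for a
// vendor opcode, or an empty string when the target does not claim it.
inline std::string
output_cfi_directive_sketch (target_arch arch, unsigned opc)
{
  if (opc == DW_CFA_GNU_window_save_sketch)
    {
      switch (arch)
        {
        case target_arch::aarch64: return ".cfi_negate_ra_state";
        case target_arch::sparc:   return ".cfi_window_save";
        case target_arch::other:   break;
        }
    }
  return "";
}

inline void demo_cfi_hook ()
{
  assert (output_cfi_directive_sketch (target_arch::aarch64, 0x2d)
          == ".cfi_negate_ra_state");
  assert (output_cfi_directive_sketch (target_arch::sparc, 0x2d)
          == ".cfi_window_save");
}
```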

Additionally, this patch also contains a renaming of an enum used for
return address mangling on AArch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_output_cfi_directive): New hook for CFI directives.
(aarch64_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* config/sparc/sparc.cc
(sparc_output_cfi_directive): New hook for CFI directives.
(sparc_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* coretypes.h
(struct dw_cfi_node): Forward declaration of CFI type from
gcc/dwarf2out.h.
(enum dw_cfi_oprnd_type): Same.
* doc/tm.texi: Regenerated from doc/tm.texi.in.
* doc/tm.texi.in: Add doc for OUTPUT_CFI_DIRECTIVE and
DW_CFI_OPRND1_DESC.
* dwarf2cfi.cc
(struct dw_cfi_row): Update the description for window_save
and ra_mangled.
(cfi_equal_p): Adapt parameter of dw_cfi_oprnd1_desc.
(dwarf2out_frame_debug_cfa_negate_ra_state): Use AArch64 CFI
directive instead of the SPARC one.
(change_cfi_row): Use the right CFI directive's name for RA
mangling.
(output_cfi): Remove explicit architecture-specific CFI
directive DW_CFA_GNU_window_save that falls into default case.
(output_cfi_directive): Use target hook as default.
* dwarf2out.cc (dw_cfi_oprnd1_desc): Use target hook as default.
* dwarf2out.h (enum dw_cfi_oprnd_type): specify underlying type
of enum to allow forward declaration.
(dw_cfi_oprnd1_desc): Change type of parameter.
(output_cfi_directive): Use dw_cfi_ref instead of struct
dw_cfi_node *.
* target.def: Documentation for new hooks.

libffi/ChangeLog:

* include/ffi_cfi.h (cfi_negate_ra_state): Declare AArch64 cfi
directive.

libgcc/ChangeLog:

* config/aarch64/aarch64-asm.h (PACIASP): Replace SPARC CFI
directive by AArch64 one.
(AUTIASP): Same.

libitm/ChangeLog:

* config/aarch64/sjlj.S: Replace SPARC CFI directive by
AArch64 one.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/pr94515-1.C: Replace SPARC CFI directive by
AArch64 one.
* g++.target/aarch64/pr94515-2.C: Same.
---
 gcc/config/aarch64/aarch64.cc| 32 +
 gcc/config/sparc/sparc.cc| 36 
 gcc/coretypes.h  |  6 
 gcc/doc/tm.texi  | 28 +++
 gcc/doc/tm.texi.in   | 17 +
 gcc/dwarf2cfi.cc | 29 ++--
 gcc/dwarf2out.cc | 13 ---
 gcc/dwarf2out.h  | 10 +++---
 gcc/target.def   | 20 +++
 gcc/testsuite/g++.target/aarch64/pr94515-1.C |  6 ++--
 gcc/testsuite/g++.target/aarch64/pr94515-2.C |  2 +-
 libffi/include/ffi_cfi.h |  2 ++
 libgcc/config/aarch64/aarch64-asm.h  |  4 +--
 libitm/config/aarch64/sjlj.S | 10 +++---
 14 files changed, 176 insertions(+), 39 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 0af5d85c36f..1f87779f40a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -59,6 +59,7 @@
 #include "opts.h"
 #include "gimplify.h"
 #include "dwarf2.h"
+#include "dwarf2out.h"
 #include "gimple-iterator.h"
 #include "tree-vectorizer.h"
 #include "aarch64-cost-tables.h"
@@ -1449,6 +1450,31 @@ aarch64_dwarf_frame_reg_mode (int regno)
   return default_dwarf_frame_reg_mode (regno);
 }
 
+/* Implement TARGET_OUTPU

[PATCH v1 4/4] dwarf2: store the RA state in CFI row

2024-08-06 Thread Matthieu Longo
On AArch64, the RA state informs the unwinder whether the return address
is mangled, and if so, how. This information is encoded in a boolean in
the CFI row. This binary approach cannot express more complex
configurations, such as PAuth_LR, introduced in Armv9.5-A.

This patch addresses this limitation by replacing the boolean with an enum.

gcc/ChangeLog:

* dwarf2cfi.cc
(struct dw_cfi_row): Declare a new enum type to replace ra_mangled.
(cfi_row_equal_p): Use ra_state instead of ra_mangled.
(dwarf2out_frame_debug_cfa_negate_ra_state): Same.
(change_cfi_row): Same.
---
 gcc/dwarf2cfi.cc | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index 6c80e0b17bd..023f61fb712 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -57,6 +57,15 @@ along with GCC; see the file COPYING3.  If not see
 #define DEFAULT_INCOMING_FRAME_SP_OFFSET INCOMING_FRAME_SP_OFFSET
 #endif
 
+
+/* Signing method used for return address authentication.
+   (AArch64 extension)*/
+typedef enum
+{
+  RA_no_signing = 0x0,
+  RA_signing_SP = 0x1,
+} RA_signing_method_t;
+
 /* A collected description of an entire row of the abstract CFI table.  */
 struct GTY(()) dw_cfi_row
 {
@@ -74,8 +83,8 @@ struct GTY(()) dw_cfi_row
   bool window_save;
 
   /* Aarch64 extension for DW_CFA_AARCH64_negate_ra_state.
- True if the return address is in a mangled state.  */
-  bool ra_mangled;
+ Enum which stores the return address state.  */
+  RA_signing_method_t ra_state;
 };
 
 /* The caller's ORIG_REG is saved in SAVED_IN_REG.  */
@@ -859,7 +868,7 @@ cfi_row_equal_p (dw_cfi_row *a, dw_cfi_row *b)
   if (a->window_save != b->window_save)
 return false;
 
-  if (a->ra_mangled != b->ra_mangled)
+  if (a->ra_state != b->ra_state)
 return false;
 
   return true;
@@ -1556,8 +1565,11 @@ dwarf2out_frame_debug_cfa_negate_ra_state (void)
 {
   dw_cfi_ref cfi = new_cfi ();
   cfi->dw_cfi_opc = DW_CFA_AARCH64_negate_ra_state;
+  cur_row->ra_state =
+ (cur_row->ra_state == RA_no_signing)
+ ? RA_signing_SP
+ : RA_no_signing;
   add_cfi (cfi);
-  cur_row->ra_mangled = !cur_row->ra_mangled;
 }
 
 /* Record call frame debugging information for an expression EXPR,
@@ -2414,12 +2426,12 @@ change_cfi_row (dw_cfi_row *old_row, dw_cfi_row 
*new_row)
 {
   dw_cfi_ref cfi = new_cfi ();
 
-  gcc_assert (!old_row->ra_mangled && !new_row->ra_mangled);
+  gcc_assert (!old_row->ra_state && !new_row->ra_state);
   cfi->dw_cfi_opc = DW_CFA_GNU_window_save;
   add_cfi (cfi);
 }
 
-  if (old_row->ra_mangled != new_row->ra_mangled)
+  if (old_row->ra_state != new_row->ra_state)
 {
   dw_cfi_ref cfi = new_cfi ();
 
-- 
2.46.0



RE: [PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode precision [PR116202]

2024-08-06 Thread Li, Pan2
> OK.

Thanks Richard.

I just noticed that we can put type_has_mode_precision_p as the first condition
to avoid unnecessary pattern matching (which is expensive).  I will commit with
this change if the test suite shows no surprises.

From:
> +  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)
> +  && type_has_mode_precision_p (otype))

To:
> +  if (type_has_mode_precision_p (otype)
> +  && gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))

Pan
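
The reordering above relies on && evaluating left to right and stopping at the first false operand. A minimal sketch with hypothetical stand-ins (cheap_check, expensive_match and recognize are not GCC functions) that counts how often the expensive side actually runs:

```cpp
#include <cassert>

// Counts invocations of the expensive predicate.
inline int expensive_calls = 0;

// Stand-in for the heavy pattern matcher.
inline bool expensive_match (int value)
{
  ++expensive_calls;
  return value % 2 == 0;
}

// Stand-in for the cheap type check.
inline bool cheap_check (int value) { return value > 0; }

// Cheap test first: && short-circuits, so the matcher is skipped whenever
// the cheap test already fails.
inline bool recognize (int value)
{
  return cheap_check (value) && expensive_match (value);
}

inline void demo_short_circuit ()
{
  expensive_calls = 0;
  assert (!recognize (-4));        // rejected by the cheap check
  assert (expensive_calls == 0);   // the matcher never ran
  assert (recognize (4));
  assert (expensive_calls == 1);
}
```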

-----Original Message-----
From: Richard Biener  
Sent: Tuesday, August 6, 2024 9:26 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode 
precision [PR116202]

On Tue, Aug 6, 2024 at 2:59 PM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC vect pattern recognition is valid when the lhs type has
> its mode precision.  For example, as shown below, QImode with 1-bit precision
> like _Bool is invalid here.
>
> g_12 = (long unsigned int) _2;
> _13 = MIN_EXPR ;
> _3 = (_Bool) _13;
>
> The above pattern cannot be recognized as .SAT_TRUNC (g_12) because the dest
> only has 1-bit precision in QImode.  In other words, the type doesn't have
> the mode precision.
>
> The tests below passed for this patch.
> 1. The rv64gcv full regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 full regression tests.

OK

> PR target/116202
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_trunc_pattern): Add the
> type_has_mode_precision_p check for the lhs type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  .../riscv/rvv/base/pr116202-run-1.c   | 24 +++
>  gcc/tree-vect-patterns.cc |  5 ++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> new file mode 100644
> index 000..d150f20b5d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */
> +
> +int b[24];
> +_Bool c[24];
> +
> +int main() {
> +  for (int f = 0; f < 4; ++f)
> +b[f] = 6;
> +
> +  for (int f = 0; f < 24; f += 4)
> +c[f] = ({
> +  int g = ({
> +unsigned long g = -b[f];
> +1 < g ? 1 : g;
> +  });
> +  g;
> +});
> +
> +  if (c[0] != 1)
> +__builtin_abort ();
> +}
> +
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 4674a16d15f..74f80587b0e 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4695,11 +4695,12 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>
>tree ops[1];
>tree lhs = gimple_assign_lhs (last_stmt);
> +  tree otype = TREE_TYPE (lhs);
>
> -  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))
> +  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)
> +  && type_has_mode_precision_p (otype))
>  {
>tree itype = TREE_TYPE (ops[0]);
> -  tree otype = TREE_TYPE (lhs);
>tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
>tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
>internal_fn fn = IFN_SAT_TRUNC;
> --
> 2.43.0
>


Re: [RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Alejandro Colomar
Hi Martin,

On Tue, Aug 06, 2024 at 03:37:13PM GMT, Martin Uecker wrote:
> Am Dienstag, dem 06.08.2024 um 14:22 +0200 schrieb Alejandro Colomar:
> > Hi!
> > 
> > -  The tests seem to work as expected if I compile them manually, and
> >run (the one that should be run) as a normal program.  The one that
> >should not be run also gives the expected diagnostics.
> >Can anyone give advice of why it's not running well under the test
> >suite?
> 
> What is the output?  You get an additional warning / error.

$ /opt/local/gnu/gcc/lengthof/bin/gcc 
gcc/testsuite/gcc.dg/lengthof-compile.c 
gcc/testsuite/gcc.dg/lengthof-compile.c: In function ‘incomplete’:
gcc/testsuite/gcc.dg/lengthof-compile.c:10:19: error: invalid 
application of ‘lengthof’ to incomplete type ‘int[]’
   10 |   n = __lengthof__(x);  /* { dg-error "incomplete" } */
  |   ^
gcc/testsuite/gcc.dg/lengthof-compile.c:14:19: error: invalid 
application of ‘lengthof’ to type ‘int *’
   14 |   n = __lengthof__(p);  /* { dg-error "invalid" } */
  |   ^
gcc/testsuite/gcc.dg/lengthof-compile.c: In function ‘fam’:
gcc/testsuite/gcc.dg/lengthof-compile.c:26:19: error: invalid 
application of ‘lengthof’ to incomplete type ‘int[]’
   26 |   n = __lengthof__(s.fam); /* { dg-error "incomplete" } */
  |   ^
gcc/testsuite/gcc.dg/lengthof-compile.c: In function ‘func’:
gcc/testsuite/gcc.dg/lengthof-compile.c:41:20: error: passing argument 
3 of ‘fix_fix’ from incompatible pointer type [-Wincompatible-pointer-types]
   41 |   fix_fix(5, &c35, &i5); /* { dg-error 
"incompatible-pointer-types" } */
  |^~~
  ||
  |int (*)[5]
gcc/testsuite/gcc.dg/lengthof-compile.c:29:44: note: expected ‘int 
(*)[3]’ but argument is of type ‘int (*)[5]’
   29 | void fix_fix(int i, char (*a)[3][5], int 
(*x)[__lengthof__(*a)]);
  |  ~~^~~~
gcc/testsuite/gcc.dg/lengthof-compile.c:44:20: error: passing argument 
3 of ‘fix_var’ from incompatible pointer type [-Wincompatible-pointer-types]
   44 |   fix_var(5, &c35, &i5); /* { dg-error 
"incompatible-pointer-types" } */
  |^~~
  ||
  |int (*)[5]
gcc/testsuite/gcc.dg/lengthof-compile.c:30:44: note: expected ‘int 
(*)[3]’ but argument is of type ‘int (*)[5]’
   30 | void fix_var(int i, char (*a)[3][i], int 
(*x)[__lengthof__(*a)]);
  |  ~~^~~~
gcc/testsuite/gcc.dg/lengthof-compile.c:47:20: error: passing argument 
3 of ‘fix_uns’ from incompatible pointer type [-Wincompatible-pointer-types]
   47 |   fix_uns(5, &c35, &i5); /* { dg-error 
"incompatible-pointer-types" } */
  |^~~
  ||
  |int (*)[5]
gcc/testsuite/gcc.dg/lengthof-compile.c:31:44: note: expected ‘int 
(*)[3]’ but argument is of type ‘int (*)[5]’
   31 | void fix_uns(int i, char (*a)[3][*], int 
(*x)[__lengthof__(*a)]);
  |  ~~^~~~
$ /opt/local/gnu/gcc/lengthof/bin/gcc 
gcc/testsuite/gcc.dg/lengthof-compile.c |& grep ' error: '
gcc/testsuite/gcc.dg/lengthof-compile.c:10:19: error: invalid 
application of ‘lengthof’ to incomplete type ‘int[]’
gcc/testsuite/gcc.dg/lengthof-compile.c:14:19: error: invalid 
application of ‘lengthof’ to type ‘int *’
gcc/testsuite/gcc.dg/lengthof-compile.c:26:19: error: invalid 
application of ‘lengthof’ to incomplete type ‘int[]’
gcc/testsuite/gcc.dg/lengthof-compile.c:41:20: error: passing argument 
3 of ‘fix_fix’ from incompatible pointer type [-Wincompatible-pointer-types]
gcc/testsuite/gcc.dg/lengthof-compile.c:44:20: error: passing argument 
3 of ‘fix_var’ from incompatible pointer type [-Wincompatible-pointer-types]
gcc/testsuite/gcc.dg/lengthof-compile.c:47:20: error: passing argument 
3 of ‘fix_uns’ from incompatible pointer type [-Wincompatible-pointer-types]
$ /opt/local/gnu/gcc/lengthof/bin/gcc 
gcc/testsuite/gcc.dg/lengthof-compile.c |& grep ' error: ' | wc -l
6

I count 6, which is what I expect:

$ grep dg-error gcc/testsuite/gcc.dg/lengthof-compile.c
  n = __lengthof__(x);  /* { dg-error "incomplete" } */
  n = __lengthof__(p);  /* { dg-error "invalid" } */
  n = __lengthof__(s.fam); /* { dg-error "incomplete" } */
  fix_fix(5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
  fix_var(5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
  fix_u

Re: [PATCH v5 1/1] RISC-V: Add support for XCVbitmanip extension in CV32E40P

2024-08-06 Thread Jeff Law




On 8/4/24 12:35 PM, Mary Bennett wrote:

Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
   Mary Bennett 
   Nandni Jamnadas 
   Pietra Ferreira 
   Charlie Keaney
   Jessica Mills
   Craig Blackmore 
   Simon Cook 
   Jeremy Bennett 
   Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add XCVbitmanip.
* config/riscv/constraints.md: Likewise.
* config/riscv/corev.def: Likewise.
* config/riscv/corev.md: Likewise.
* config/riscv/predicates.md: Likewise.
* config/riscv/riscv-builtins.cc (AVAIL): Likewise.
* config/riscv/riscv-ftypes.def: Likewise.
* config/riscv/riscv.opt: Likewise.
* doc/extend.texi: Add XCVbitmanip builtin documentation.
* doc/sourcebuild.texi: Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bitmanip-compile-bclr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bclrr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bitrev.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bset.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bsetr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-clb.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-cnt.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extract.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractu.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractur.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-ff1.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-fl1.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-insert.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-insertr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-ror.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bclr.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bitrev.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bset.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-extract.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-extractu.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-insert.c: New test.
* lib/target-supports.exp: Add proc for the XCVbitmanip extension.
---





@@ -2651,3 +2653,189 @@
  }
[(set_attr "type" "branch")
 (set_attr "mode" "none")])
+
+;; XCVBITMANIP builtins
+
+(define_insn "riscv_cv_bitmanip_extract"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+(sign_extract:SI
+  (match_operand:SI 1 "register_operand" "r,r")
+  (ashiftrt:SI
+(match_operand:HI 2 "bit_extract_operand" "CV_bit_si10,r")
+(const_int 5))
+  (plus:SI
+(and:SI
+  (match_dup 2)
+  (const_int 31))
+(const_int 1))))]
+
+  "TARGET_XCVBITMANIP && !TARGET_64BIT"
+  "@
+   cv.extract\t%0,%1,%Z2,%W2
+   cv.extractr\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")
+  (set_attr "mode" "SI")])
+
+(define_insn "riscv_cv_bitmanip_extractu"
I would combine this with the previous pattern.  There's already a code 
iterator "any_extract" you can use in the RTL template.  And there's a 
 code attribute that could be extended to handle sign/zero extract so 
that you conditionally generate the "u" in the assembly code.


Why aren't the position and size arguments just registers?  I didn't see 
anything in the spec that would suggest that we'd want to match those 
shift and plus expressions in the pos/size arguments to the extraction. 
But given the insert/clear/set patterns are using the same basic 
structure I think I must be misunderstanding something.


*If* the semantics of the thead extensions are compatible with the Zbs 
extension, then we ought to be able to just reuse the existing patterns. 
 So for example a bclr has two major forms:




(define_insn "*bclr"
  [(set (match_operand:X 0 "register_operand" "=r")
(and:X (rotate:X (const_int -2)
 (match_operand:QI 2 "register_operand" "r"))
   (match_operand:X 1 "register_operand" "r")))]
  "TARGET_ZBS"
  "bclr\t%0,%1,%2"
  [(set_attr "type" "bitmanip")])

(define_insn "*bclri"
  [(set (match_operand:X 0 "register_operand" "=r")
(and:X (match_operand:X 1 "register_operand" "r")
   (match_operand:X 2 "not_single_bit_mask_operand" "DnS")))]
  "TARGET_ZBS"
  "bclri\t%0,%1,%T2"
  [(set_attr "type" "bitmanip")])
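
For readers less used to the RTL, the first form relies on the fact that rotating the constant -2 (all ones except bit 0) left by n gives a mask with only bit n cleared. Below is a minimal C sketch of that identity, assuming a 64-bit X mode. The helper names are made up for illustration and are not GCC functions.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n (taken modulo 64).  Hypothetical helper,
   written so that n == 0 does not cause an undefined shift by 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return (x << n) | (x >> ((64 - n) & 63));
}

/* bclr spelled the way the RTL above does: (and (rotate -2 n) x).  */
static uint64_t
bclr_rotate (uint64_t x, unsigned n)
{
  return x & rotl64 ((uint64_t) -2, n);
}

/* bclr as one would normally write it in C.  */
static uint64_t
bclr_plain (uint64_t x, unsigned n)
{
  return x & ~((uint64_t) 1 << (n & 63));
}
```

Whether the CORE-V instructions can share these patterns still depends on their semantics matching Zbs, which this sketch does not establish.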


There's other variants and if the semantics are compatible I'd think 
we'd want to take advantage of the work that's already been done to 
recognize the other forms that combine might generate such as:




;; Yet another form of a bset/bclr that can be created by combine.
(define_insn "*bsetclr_zero_extract"
  [(set (zero_extract:X (match_operand:X 0 "register_operand" "+r")
   

Re: [PATCH v1 4/4] dwarf2: store the RA state in CFI row

2024-08-06 Thread Jakub Jelinek
On Tue, Aug 06, 2024 at 03:07:44PM +0100, Matthieu Longo wrote:
> On AArch64, the RA state informs the unwinder whether the return address
> is mangled and how, or not. This information is encoded in a boolean in
> the CFI row. This binary approach prevents from expressing more complex
> configuration, as it is the case with PAuth_LR introduced in Armv9.5-A.
> 
> This patch addresses this limitation by replacing the boolean by an enum.

Formatting nits.
> 
> gcc/ChangeLog:
> 
> * dwarf2cfi.cc
> (struct dw_cfi_row): Declare a new enum type to replace ra_mangled.
> (cfi_row_equal_p): Use ra_state instead of ra_mangled.
> (dwarf2out_frame_debug_cfa_negate_ra_state): Same.
> (change_cfi_row): Same.
> ---
>  gcc/dwarf2cfi.cc | 24 ++--
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
> index 6c80e0b17bd..023f61fb712 100644
> --- a/gcc/dwarf2cfi.cc
> +++ b/gcc/dwarf2cfi.cc
> @@ -57,6 +57,15 @@ along with GCC; see the file COPYING3.  If not see
>  #define DEFAULT_INCOMING_FRAME_SP_OFFSET INCOMING_FRAME_SP_OFFSET
>  #endif
>  
> +
> +/* Signing method used for return address authentication.
> +   (AArch64 extension)*/

Dot and two spaces missing before */

> +typedef enum
> +{
> +  RA_no_signing = 0x0,
> +  RA_signing_SP = 0x1,
> +} RA_signing_method_t;

I think it should be ra_* and *_sp
> @@ -1556,8 +1565,11 @@ dwarf2out_frame_debug_cfa_negate_ra_state (void)
>  {
>dw_cfi_ref cfi = new_cfi ();
>cfi->dw_cfi_opc = DW_CFA_AARCH64_negate_ra_state;
> +  cur_row->ra_state =
> +   (cur_row->ra_state == RA_no_signing)
> +   ? RA_signing_SP
> +   : RA_no_signing;

This is wrongly formatted.  = shouldn't be at the end of line.
Better
  cur_row->ra_state
= (cur_row->ra_state == ra_no_signing
   ? ra_signing_sp : ra_no_signing);

Jakub
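
Put together, the nits above might come out roughly as follows. This is only a sketch of the suggested naming and layout, not the committed code: the type name is kept as in the original patch, and `toggle_ra_state` is a hypothetical helper standing in for the assignment to `cur_row->ra_state`.

```c
/* Signing method used for return address authentication.
   (AArch64 extension).  */
typedef enum
{
  ra_no_signing = 0x0,
  ra_signing_sp = 0x1
} RA_signing_method_t;

/* Hypothetical helper: flip between the two states.  The operator stays
   off the end of the line and the conditional is parenthesized.  */
static RA_signing_method_t
toggle_ra_state (RA_signing_method_t state)
{
  return (state == ra_no_signing
          ? ra_signing_sp : ra_no_signing);
}
```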



[PATCH] Support if conversion for switches

2024-08-06 Thread Andi Kleen
The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.

Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
These are switches that only have a single non default block,
and no ranges. They are handled similar to if in if conversion.
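
To make the shape concrete, here is a hedged C sketch of the kind of loop this is about: a switch with a single non-default block, and a hand-written branch-free equivalent of roughly what if conversion is aiming for. The function names are invented for illustration, and the second function is not what the pass literally emits.

```c
#include <stddef.h>

/* Loop body contains a switch with one non-default block and no ranges.  */
int
count_switch (const char *s, size_t n)
{
  int c = 0;
  for (size_t i = 0; i < n; i++)
    switch (s[i])
      {
      case ',':
      case '|':
        c++;
      }
  return c;
}

/* Branch-free form: the case labels become an OR of equality tests
   that predicates the increment.  */
int
count_predicated (const char *s, size_t n)
{
  int c = 0;
  for (size_t i = 0; i < n; i++)
    c += (s[i] == ',') | (s[i] == '|');
  return c;
}
```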

Some notes:

In theory this handles switches with case ranges, but it seems
for the simple "one target label" switch case that is supported
here these are always optimized by the cfg passes to COND,
so this case is latent.

This makes the vect-bitfield-read-1-not test fail. The test
checks for a bitfield analysis failing, but it actually
relied on the ifcvt erroring out early because the test
is using a switch. The if conversion still does not
work because the switch is not in a form that this
patch can handle, but it fails much later and the bitfield
analysis succeeds, which makes the test fail. I marked
it xfail because it doesn't seem to be testing what it wants
to test.

gcc/ChangeLog:

PR tree-opt/115866
* tree-if-conv.cc (if_convertible_switch_p): New function.
(if_convertible_stmt_p): Check for switch.
(get_loop_body_in_if_conv_order): Handle switch.
(predicate_bbs): Likewise.
(predicate_statements): Likewise.
(remove_conditions_and_labels): Likewise.
(ifcvt_split_critical_edges): Likewise.
(ifcvt_local_dce): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
* gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
* gcc.dg/vect/vect-switch-search-line-fast.c: New test.
* gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
---
 .../gcc.dg/vect/vect-bitfield-read-1-not.c|   2 +-
 .../gcc.dg/vect/vect-switch-ifcvt-1.c | 107 ++
 .../gcc.dg/vect/vect-switch-ifcvt-2.c |  28 +
 .../vect/vect-switch-search-line-fast.c   |  17 +++
 gcc/tree-if-conv.cc   |  90 ++-
 5 files changed, 238 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
index 0d91067ebb2..85f4de8464a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+/* { dg-final { scan-tree-dump-times "Bitfield OK to lower." 0 "ifcvt" { xfail 
*-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
new file mode 100644
index 000..0b06d3c84a7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
@@ -0,0 +1,107 @@
+/* { dg-require-effective-target vect_int } */
+
+extern void abort (void);
+
+int
+f1 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f2 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+   {
+ switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f3 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
+ c++;
+  s++;
+}
+  return c;
+}
+
+
+int
+f4 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
+   c++;
+  s++;
+}
+  return c;
+}
+
+#define CHECK(f, str, res) \
+  __builtin_strcpy(buf, str); n = f(buf); if (n != res) abort();
+
+int
+main ()
+{
+  int n;
+  char buf[64];
+
+  CHECK (f1, ",,", 10);
+  CHECK (f1, "||", 10);
+  CHECK (f1, "aa", 0);
+  CHECK (f1, "", 0);
+  CHECK (f1, ",|,|xx", 4);
+
+  CHECK (f2, ",|,|xx", 4);
+  CHECK (f2, ",|,|xx", 4);
+  CHECK (f2, ",|,|xx", 4);
+  CHECK (f2, ",|,|xx", 4);
+
+  CHECK (f3, ",|,|xx", 4);
+  CHECK (f3, ",|,|xx", 4);
+  CHECK (f3, ",|,|xx", 4);
+  CHECK (f3, ",|,|xx", 4);
+
+  CHECK (f4, ",|,|xx", 4);
+  CHECK (f4, ",|,|xx", 4);
+  CHECK (f4, ",|,|xx", 4);
+  CHECK (f4, ",|,|xx", 4);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect"  } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-switc

[wwwdocs] gcc-15: Mention c++ header dependency changes () in porting_to.html

2024-08-06 Thread Filip Kastl
Hi,

I recently had to add '#include ' to a program to compile it with the
latest trunk GCC due to

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659176.html

So I thought I might create the GCC 15 porting_to.html page and add an entry
about this (I just copied the entry from GCC 14 porting_to.html).

Ha!  As I'm writing this I noticed that actually Jonathan predicted this and
suggested a corresponding porting_to.html entry.  Well, here it is :).

Validated with the W3 Validator.  Is this ok to be pushed?

Cheers,
Filip Kastl


-- 8< --


---
 htdocs/gcc-15/changes.html|  3 +-
 htdocs/gcc-15/porting_to.html | 53 +++
 2 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 htdocs/gcc-15/porting_to.html

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index bfd98496..0d2c0aaf 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -17,9 +17,8 @@
 
 This page is a "brief" summary of some of the huge number of improvements
 in GCC 15.
-
 
diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html
new file mode 100644
index ..43c3b302
--- /dev/null
+++ b/htdocs/gcc-15/porting_to.html
@@ -0,0 +1,53 @@
+
+
+
+
+
+Porting to GCC 15
+https://gcc.gnu.org/gcc.css";>
+
+
+
+Porting to GCC 15
+
+
+The GCC 15 release series differs from previous GCC releases in
+a number of ways. Some of these are a result
+of bug fixing, and some old behaviors have been intentionally changed
+to support new standards, or relaxed in standards-conforming ways to
+facilitate compilation or run-time performance.
+
+
+
+Some of these changes are user visible and can cause grief when
+porting to GCC 15. This document is an effort to identify common issues
+and provide solutions. Let us know if you have suggestions for improvements!
+
+
+Note: GCC 15 has not been released yet, so this document is
+a work-in-progress.
+
+
+
+C++ language issues
+
+Header dependency changes
+Some C++ Standard Library headers have been changed to no longer include
+other headers that were being used internally by the library.
+As such, C++ programs that used standard library components without
+including the right headers will no longer compile.
+
+
+The following headers are used less widely in libstdc++ and may need to
+be included explicitly when compiling with GCC 15:
+
+
+ 
+  (for std::int8_t, std::int32_t etc.)
+
+
+
+
+
+
+
-- 
2.45.2



Re: [PATCH] ASAN: call initialize_sanitizer_builtins for hwasan [PR115205]

2024-08-06 Thread Andrew Pinski
On Tue, May 28, 2024 at 8:28 PM Andrew Pinski  wrote:
>
> Sometimes initialize_sanitizer_builtins is not called before emitting
> the asan builtins with hwasan. In the case of the bug report, there
> was a path with the fortran front-end where it was not called.
> So let's call it in asan_instrument before calling transform_statements.
>
> Built and tested for aarch64-linux-gnu with no regressions.

Ping? Another duplicate of the bug report came in too.

>
> gcc/ChangeLog:
>
> PR sanitizer/115205
> * asan.cc (asan_instrument): Call initialize_sanitizer_builtins
> for hwasan.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/asan.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/asan.cc b/gcc/asan.cc
> index 9e0f51b1477..c684ca6d366 100644
> --- a/gcc/asan.cc
> +++ b/gcc/asan.cc
> @@ -4276,6 +4276,7 @@ asan_instrument (void)
>  {
>if (hwasan_sanitize_p ())
>  {
> +  initialize_sanitizer_builtins ();
>transform_statements ();
>return 0;
>  }
> --
> 2.43.0
>


Re: [RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Martin Uecker
Am Dienstag, dem 06.08.2024 um 16:12 +0200 schrieb Alejandro Colomar:
> Hi Martin,
> 
> On Tue, Aug 06, 2024 at 03:37:13PM GMT, Martin Uecker wrote:
> > Am Dienstag, dem 06.08.2024 um 14:22 +0200 schrieb Alejandro Colomar:
> > > Hi!
> > > 
> > > -  The tests seem to work as expected if I compile them manually, and
> > >run (the one that should be run) as a normal program.  The one that
> > >should not be run also gives the expected diagnostics.
> > >Can anyone give advice of why it's not running well under the test
> > >suite?
> > 
> > What is the output?  You get an additional warning / error.
> 
>   $ /opt/local/gnu/gcc/lengthof/bin/gcc 
> gcc/testsuite/gcc.dg/lengthof-compile.c 
>   gcc/testsuite/gcc.dg/lengthof-compile.c: In function ‘incomplete’:
>   gcc/testsuite/gcc.dg/lengthof-compile.c:10:19: error: invalid 
> application of ‘lengthof’ to incomplete type ‘int[]’
>  10 |   n = __lengthof__(x);  /* { dg-error "incomplete" } */
> |   ^
>   gcc/testsuite/gcc.dg/lengthof-compile.c:14:19: error: invalid 
> application of ‘lengthof’ to type ‘int *’
>  14 |   n = __lengthof__(p);  /* { dg-error "invalid" } */
> |   ^
>   gcc/testsuite/gcc.dg/lengthof-compile.c: In function ‘fam’:
>   gcc/testsuite/gcc.dg/lengthof-compile.c:26:19: error: invalid 
> application of ‘lengthof’ to incomplete type ‘int[]’
>  26 |   n = __lengthof__(s.fam); /* { dg-error "incomplete" } */
> |   ^
>   gcc/testsuite/gcc.dg/lengthof-compile.c: In function ‘func’:
>   gcc/testsuite/gcc.dg/lengthof-compile.c:41:20: error: passing argument 
> 3 of ‘fix_fix’ from incompatible pointer type [-Wincompatible-pointer-types]
>  41 |   fix_fix(5, &c35, &i5); /* { dg-error 
> "incompatible-pointer-types" } */
> |^~~
> ||
> |int (*)[5]
>   gcc/testsuite/gcc.dg/lengthof-compile.c:29:44: note: expected ‘int 
> (*)[3]’ but argument is of type ‘int (*)[5]’
>  29 | void fix_fix(int i, char (*a)[3][5], int 
> (*x)[__lengthof__(*a)]);
> |  ~~^~~~
>   gcc/testsuite/gcc.dg/lengthof-compile.c:44:20: error: passing argument 
> 3 of ‘fix_var’ from incompatible pointer type [-Wincompatible-pointer-types]
>  44 |   fix_var(5, &c35, &i5); /* { dg-error 
> "incompatible-pointer-types" } */
> |^~~
> ||
> |int (*)[5]
>   gcc/testsuite/gcc.dg/lengthof-compile.c:30:44: note: expected ‘int 
> (*)[3]’ but argument is of type ‘int (*)[5]’
>  30 | void fix_var(int i, char (*a)[3][i], int 
> (*x)[__lengthof__(*a)]);
> |  ~~^~~~
>   gcc/testsuite/gcc.dg/lengthof-compile.c:47:20: error: passing argument 
> 3 of ‘fix_uns’ from incompatible pointer type [-Wincompatible-pointer-types]
>  47 |   fix_uns(5, &c35, &i5); /* { dg-error 
> "incompatible-pointer-types" } */
> |^~~
> ||
> |int (*)[5]
>   gcc/testsuite/gcc.dg/lengthof-compile.c:31:44: note: expected ‘int 
> (*)[3]’ but argument is of type ‘int (*)[5]’
>  31 | void fix_uns(int i, char (*a)[3][*], int 
> (*x)[__lengthof__(*a)]);
> |  ~~^~~~
>   $ /opt/local/gnu/gcc/lengthof/bin/gcc 
> gcc/testsuite/gcc.dg/lengthof-compile.c |& grep ' error: '
>   gcc/testsuite/gcc.dg/lengthof-compile.c:10:19: error: invalid 
> application of ‘lengthof’ to incomplete type ‘int[]’
>   gcc/testsuite/gcc.dg/lengthof-compile.c:14:19: error: invalid 
> application of ‘lengthof’ to type ‘int *’
>   gcc/testsuite/gcc.dg/lengthof-compile.c:26:19: error: invalid 
> application of ‘lengthof’ to incomplete type ‘int[]’
>   gcc/testsuite/gcc.dg/lengthof-compile.c:41:20: error: passing argument 
> 3 of ‘fix_fix’ from incompatible pointer type [-Wincompatible-pointer-types]
>   gcc/testsuite/gcc.dg/lengthof-compile.c:44:20: error: passing argument 
> 3 of ‘fix_var’ from incompatible pointer type [-Wincompatible-pointer-types]
>   gcc/testsuite/gcc.dg/lengthof-compile.c:47:20: error: passing argument 
> 3 of ‘fix_uns’ from incompatible pointer type [-Wincompatible-pointer-types]
>   $ /opt/local/gnu/gcc/lengthof/bin/gcc 
> gcc/testsuite/gcc.dg/lengthof-compile.c |& grep ' error: ' | wc -l
>   6
> 
> I count 6, which is what I expect:
> 
>   $ grep dg-error gcc/testsuite/gcc.dg/lengthof-compile.c
> n = __lengthof__(x);  /* { dg-error "incomplete" } */
> n = __lengthof__(p);  /* { dg-error "invalid" } */
> n = __lengthof__(s.fam); /* { dg-error "incomplete" } */
> fix_fix(5, &c35

Re: [wwwdocs] gcc-15: Mention c++ header dependency changes () in porting_to.html

2024-08-06 Thread Gerald Pfeifer
On Tue, 6 Aug 2024, Filip Kastl wrote:
> So I thought I might create the GCC 15 porting_to.html page and add an 
> entry about this (I just copied the entry from GCC 14 porting_to.html).

Nice.

> Ha!  As I'm writing this I noticed that actually Jonathan predicted this 
> and suggested a corresponding porting_to.html entry.  Well, here it is :).

Great minds think alike, they say. :-)

> Validated with the W3 Validator.  Is this ok to be pushed?

Looks good to me, just one question to Jonathan as native speaker and one 
observation:

> +
> +The following headers are used less widely in libstdc++ and may need to
> +be included explicitly when compiling with GCC 15:

Is "in libstdc++" the best option, or maybe "by ..." or "within ..."?

> +
> + 
> +  (for std::int8_t, std::int32_t etc.)
> +

The text reads "headers", alas there only appears to be one right now?
So "header is" (singular)?

Gerald


Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-08-06 Thread Carl Love

Steve:

Agreed the documentation only specifies unsigned char argument for the 
two built-ins.


Do you think we should add support for signed char arguments in addition to 
the documented unsigned char arguments?


Do you see any situations where a user might want to have both signed 
and unsigned char arguments for the two built-ins?


Thanks.

    Carl

On 8/5/24 2:12 PM, Steven Munroe wrote:

Looking at the latest version of the Power Vector Intrinsic 
Programming Reference (Revision 2.0.0_prd, Bill slipped this to me for 
review), I see that



vec_test_lsbb_all_ones


vec_test_lsbb_all_zeros

both specify vector unsigned char, only.

On Mon, Aug 5, 2024 at 1:15 AM Kewen.Lin  wrote:

on 2024/8/3 05:48, Peter Bergner wrote:
> On 7/31/24 10:21 PM, Kewen.Lin wrote:
>> on 2024/8/1 01:52, Carl Love wrote:
>>> Yes, I noticed that the built-ins were defined as overloaded
but only had one definition.   Did seem odd to me.
>>>
 either is with "vector unsigned char" as argument type, but
the corresponding instance
 prototype in builtin table is with "vector signed char". 
It's inconsistent and weird,
 I think we can just update the prototype in builtin table
with "vector unsigned char"
 and remove the entries in overload table.  It can be a follow
up patch.
>>>
>>> I didn't notice that it was signed in the instance prototype
but unsigned in the overloaded definition. That is definitely
inconsistent.
>>>
>>> That said, should we just go ahead and support both signed and
unsigned argument versions of the all ones and all zeros built-ins?
>>
>> Good question, I thought about that but found openxl only
supports the unsigned version
>> so I felt it's probably better to keep consistent with it.  But
I'm fine for either, if
>> we decide to extend it to cover both signed and unsigned, we
should notify openxl team
>> to extend it as well.
>>
>> openxl doc links:
>>
>>

https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-ones
>>

https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-zeros
>
> If it makes sense to support vector signed char rather than only
the vector unsigned char,
> then I'm fine adding support for it.  It almost seems since we
tried adding an overload
> for it, that that was our intention (to support both signed and
unsigned) and we just
> had a bug so only unsigned was supported?

Good question but I'm not sure, it could be an oversight without
adding one more instance
for overloading, or adopting some useless code (only for
overloading) for a single instance.
I found it's introduced by r11-2437-gcf5d0fc2d1adcd, CC'ed Will as
he contributed this.

BR,
Kewen

>
> CC'ing Steve since he noticed the missing documentation when he was trying to
> use the built-ins.  Steve, do you see a need to also support vector signed char
> with these built-ins?
>
> Peter
>
>





Re: [RFC v3 3/3] c: Add __lengthof__() operator

2024-08-06 Thread Qing Zhao


On Aug 5, 2024, at 16:59, Alejandro Colomar  wrote:


The “counted-by” attribute currently is not in the TYPE system, and
we plan to add it into the TYPE system later through the language
standard (or a GCC extension).  If that happens, then both the
“sizeof” and the “__lengthof__” operators should automatically
evaluate the “size” or the “length” of the expr through its TYPE.
(Just as with the current VLA: its size and length are already in the TYPE,
therefore both “sizeof” and “__lengthof__” should evaluate the VLA.)

I'm curious; how do you plan to make counted_by as part of the type
system?  I've read the paper for using a .identifier length designator
(n3188; ),
but that's a constant, and doesn't use an attribute.

The “counted_by” attribute is only a temporary and practical solution at this
moment to build a direct relationship between the length of the array and the
array itself at the source code level, without touching the TYPE system at all.
The final plan is similar to the solution in the paper you referred to above.

i.e, currently, with “counted_by” attribute:

struct foo {
  unsigned int count;
  char array [] __attribute__ ((counted_by (count)));
};

Later, when the relationship is built into TYPE, the above will become:

struct foo {
  unsigned int count;
  char array [.count];
};

That will be the cleanest solution to this problem.
However, it might take much longer to finally get into the compiler.
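
For anyone who wants to try the attribute form today, a minimal sketch follows. It assumes a compiler that implements `counted_by` (recent GCC or Clang) and falls back to a plain flexible array member otherwise. `COUNTED_BY` and `foo_new` are names made up for this example.

```c
#include <stdlib.h>
#include <string.h>

#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Expand to the attribute only where the compiler knows it.  */
#if __has_attribute (counted_by)
# define COUNTED_BY(m) __attribute__ ((counted_by (m)))
#else
# define COUNTED_BY(m)
#endif

struct foo {
  unsigned int count;
  char array[] COUNTED_BY (count);
};

/* Hypothetical allocator: room for COUNT trailing bytes.  */
struct foo *
foo_new (unsigned int count)
{
  struct foo *p = malloc (sizeof (struct foo) + count);
  if (p == NULL)
    return NULL;
  p->count = count;   /* Set the counter before touching the array.  */
  memset (p->array, 0, count);
  return p;
}
```

The attribute only annotates the member. The array's type is still incomplete, which is the limitation the `[.count]` syntax is meant to lift.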

Thanks.

Qing


Re: [RFC] libstdc++: Replace Ryu with Teju Jagua for float.

2024-08-06 Thread Andi Kleen
Cassio Neri  writes:

> Implement the template function teju_jagua which finds the shortest
> representation of a floating-point number. The floating-point type is a
> template parameter and the implementation is generic enough to handle all
> floating-point types of interest, namely, IEEE 754, std::bfloat16_t,
> x86 80-bit and IBM128.

So the only benefit is performance, right? Then the patch
should come with some performance numbers showing how it is better
than the old code. Also, how did you validate that it works
correctly?

-Andi


Re: [PATCH] c++: fold calls to std::forward_like [PR96780]

2024-08-06 Thread Jason Merrill

On 8/6/24 10:01 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK  for trunk?


I might add it after std::forward instead of at the bottom?  OK either way.


-- >8 --

This extends our folding of cast-like standard library functions
to also include C++23's std::forward_like.

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : Fold calls
to std::forward_like as well.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: Test std::forward_like folding.
---
  gcc/cp/cp-gimplify.cc  | 3 ++-
  gcc/testsuite/g++.dg/opt/pr96780.C | 5 +
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index b88c3b7f370..3db9657ae93 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3316,7 +3316,8 @@ cp_fold (tree x, fold_flags_t flags)
|| id_equal (DECL_NAME (callee), "addressof")
/* This addressof equivalent is used heavily in libstdc++.  */
|| id_equal (DECL_NAME (callee), "__addressof")
-   || id_equal (DECL_NAME (callee), "as_const")))
+   || id_equal (DECL_NAME (callee), "as_const")
+   || id_equal (DECL_NAME (callee), "forward_like")))
  {
r = CALL_EXPR_ARG (x, 0);
/* Check that the return and argument types are sane before
diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C 
b/gcc/testsuite/g++.dg/opt/pr96780.C
index 61e11855eeb..a29cda8b836 100644
--- a/gcc/testsuite/g++.dg/opt/pr96780.C
+++ b/gcc/testsuite/g++.dg/opt/pr96780.C
@@ -29,6 +29,10 @@ void f() {
auto&& x11 = std::as_const(a);
auto&& x12 = std::as_const(ca);
  #endif
+#if __cpp_lib_forward_like
+  auto&& x13 = std::forward_like(a);
+  auto&& x14 = std::forward_like(ca);
+#endif
  }
  
  // { dg-final { scan-tree-dump-not "= std::move" "gimple" } }

@@ -36,3 +40,4 @@ void f() {
  // { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
  // { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
  // { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::forward_like" "gimple" } }




Re: [RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Alejandro Colomar
Hi Martin,

On Tue, Aug 06, 2024 at 04:43:27PM GMT, Martin Uecker wrote:
> > When running `make check -j24 -Orecurse |& tee log`, this is what I see:
> > 
> > FAIL: gcc.dg/lengthof-compile.c (test for excess errors)
> > 
> > Is there any way to see more details?
> 
> See gcc/testsuite/gcc/gcc.log 

Ahhh, thanks!  It seems it was only the obvious C90-compat warnings that
I need to turn off.  It all seems good after that.

FAIL: gcc.dg/lengthof-compile.c (test for excess errors)
Excess errors:
/home/alx/src/gnu/gcc/len/gcc/testsuite/gcc.dg/lengthof-compile.c:22:9: error: 
ISO C90 does not support flexible array members [-Wpedantic]
/home/alx/src/gnu/gcc/len/gcc/testsuite/gcc.dg/lengthof-compile.c:30:1: error: 
ISO C90 forbids variable length array 'a' [-Wvla]
/home/alx/src/gnu/gcc/len/gcc/testsuite/gcc.dg/lengthof-compile.c:31:33: error: 
ISO C90 does not support '[*]' array declarators [-Wpedantic]

> 
> There are also *.sum files which you can diff against a build
> without your patch to see whether there are any regressions.

Good.  I'll check.

> > > > -  I don't like the fact that [*][n] is internally implemented exactly
> > > >like [0][n], which makes them indistinguishable.  All other cases of
> > > >[0] return a constant expression of value 0, but [0][n] must return a
> > > >variable 0, to keep support for [*][n].
> > > >Could you please change the way [*][n] (and thus [*]) is represented
> > > >internally so that it can be differentiated from [0]?
> > > >Do you have in mind any other way that would be a viable
> > > >implementation of [*] that would allow distinguishing [0][n] and
> > > >[*][n]?  Maybe making it to have one node instead of zero and mark
> > > >that node specially?
> > > 
> > > The C++ frontend encodes zero-sized arrays using a range of [0,-1]. 
> > > I have a half-finished patch which implements this for the C FE.
> > 
> > Thanks!  I guess your patch will be merged before mine, so please ping
> > me when that happens so I update mine for it.
> 
> Not sure about this...

:)

> > 
> > BTW, do you allow me to use Co-developed-by: you?
> 
> ok,

Thanks!

Cheers,
Alex

> Martin
> > 
> > Have a lovely day!
> > Alex



Re: [RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Qing Zhao
Hi, Alex,
I noticed that all 4 versions of your patches and the corresponding 
discussion are in the same email thread, which makes them very inconvenient to read. 
Can you start a new email thread for each new version of the patch? 
(i.e., please do not reply to the previous version when you have a new version of 
the patch.)
Some more questions and comments below:

On Aug 6, 2024, at 08:22, Alejandro Colomar  wrote:

Hi!

v4:

-  Only evaluate the operand if the top array is VLA.  Inner VLAs are
  ignored.  [Joseph, Martin]
  This proved very useful for compile-time diagnostics, since we have
  more cases that are constant expressions.
-  Document the evaluation rules, which are unique to this operator
  (similar to sizeof, but we ignore inner VLAs).
-  Add tests to the testsuite.  [Joseph]
-  Swap diagnostic cases preference, to give more meaningful
  diagnostics.  [Martin]
-  Document that Xavier was the first one to suggest this feature, and
  provide a link to the mail thread where that happened.
  BTW, while reading that discussion from 2 years ago, I see that the
  value of this operator was questioned.  Below is a rationale to
  defend it.
I briefly read the two links you provided as the background of your patch:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2529.pdf

This is a proposal submitted on 6/4/2020, do you know the current status of 
this proposal?

https://inbox.sourceware.org/gcc/m8s4oqy--...@tutanota.com/T/

This is some discussion within GCC community on this proposal around the same 
time (the end of May of 2020, before the submission date of the proposal).

From the discussion, I didn’t see a consistent positive opinion on the proposal 
itself.

So, I am wondering whether you have any new background information since then? 
What’s the major motivation to bring up this proposal again this time after 4 
years?

-  Document that Martin's help has been crucial for implementing this,
  with 'Co-developed-by'.  Would you mind confirming that I can use
  that tag?
-  CC += Kees, Qing, Jens

Rationale:

-  While compiler extensions already allow implementing ARRAY_SIZE()
  (), there's still no
  way to get the length of a function parameter which uses array
  notation.


Is this one of the major benefits of the new __lengthof__ operator?
If so, do you have any rough idea of how to implement this (i.e., the length of a 
function parameter array)?


 While this first implementation doesn't support those yet
  (because there are some issues that need to be fixed first), the plan
  is to add support to those.


What kind of issues are these? What’s the plan to resolve them?


 This would be a huge step towards arrays
  being first-class citizens in C.  In those cases, it would reduce the
  chance of programmer errors.  See for example
  .  That entire class of bugs
  would be over, _and_ programs would become simpler.

Some specific questions or concerns:

-  The tests seem to work as expected if I compile them manually, and
  run (the one that should be run) as a normal program.  The one that
  should not be run also gives the expected diagnostics.
  Can anyone give advice on why it's not running well under the test
  suite?

You might want to check some existing test cases in GCC’s testsuite first to 
see what kind of directives you are missing in your test case (for example, 
any test case in gcc/testsuite/gcc.dg/).

The documentation of the test suite is here:
https://gcc.gnu.org/onlinedocs/gccint/Testsuites.html

Adding test cases correctly to GCC’s testsuite is very important for 
any patch.  Adding them at the beginning of development is also very 
important and will save you a lot of time.

Qing


-  I don't like the fact that [*][n] is internally implemented exactly
  like [0][n], which makes them indistinguishable.  All other cases of
  [0] return a constant expression of value 0, but [0][n] must return a
  variable 0, to keep support for [*][n].
  Could you please change the way [*][n] (and thus [*]) is represented
  internally so that it can be differentiated from [0]?
  Do you have in mind any other way that would be a viable
  implementation of [*] that would allow distinguishing [0][n] and
  [*][n]?  Maybe making it to have one node instead of zero and mark
  that node specially?

At the bottom of this email is a range-diff against v3.

And below is a test program I used while developing the feature.  It is
quite similar to what's on the test suite (patch 4/4), since those are
based on this one.

It has comments where I'd like more diagnostics, but those are not the
responsibility of this feature.  Some are the fault of the representation
for [*], and others are already being worked on by Martin.  There are
also comments on code that causes compile-time errors as expected
(wanted).  Some assertions about evaluation of the operand are commented
out because due to the problems with [*][n]

[PATCH] RISC-V: Fix format-diag warning from improperly formatted url

2024-08-06 Thread Patrick O'Neill
gcc/ChangeLog:

PR target/116152
* config/riscv/riscv.cc (riscv_option_override): Fix url
  formatting.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-9.c: Update testcase.

Co-authored-by: Jakub Jelinek 
Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc.
---
 gcc/config/riscv/riscv.cc | 4 ++--
 gcc/testsuite/gcc.target/riscv/predef-9.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b005af62e61..3f7eec8d69e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9826,8 +9826,8 @@ riscv_option_override (void)
   if (riscv_abi == ABI_LP64E)
 {
   if (warning (OPT_Wdeprecated, "LP64E ABI is marked for deprecation in 
GCC"))
-   inform (UNKNOWN_LOCATION, "If you need LP64E please notify the GCC "
-   "project via https://gcc.gnu.org/PR116152");
+   inform (UNKNOWN_LOCATION, "if you need LP64E please notify the GCC "
+   "project via %{PR116152%}", "https://gcc.gnu.org/PR116152");
 }

   /* Zfinx require abi ilp32, ilp32e, lp64 or lp64e.  */
diff --git a/gcc/testsuite/gcc.target/riscv/predef-9.c 
b/gcc/testsuite/gcc.target/riscv/predef-9.c
index 0d9488529ea..b173d5df57f 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-9.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-9.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64em -mabi=lp64e -mno-div -mcmodel=medlow" } */
 /* { dg-warning "LP64E ABI is marked for deprecation in GCC" "" { target *-*-* 
} 0 } */
-/* { dg-note "If you need LP64E please notify the GCC project via 
https://gcc.gnu.org/PR116152" "" { target *-*-* } 0 } */
+/* { dg-note "if you need LP64E please notify the GCC project via PR116152" "" 
{ target *-*-* } 0 } */

 int main () {
 #if !defined(__riscv)
--
2.34.1



[PATCH] c++: further concept_check_p clean-up

2024-08-06 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Patrick noticed a few more concept_check_p checks that can be removed
now.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Remove concept_check_p check.
(cxx_eval_outermost_constant_expr): Likewise.
* cp-gimplify.cc (cp_genericize_r) : Likewise.
* except.cc (check_noexcept_r): Likewise.
---
 gcc/cp/constexpr.cc   | 20 ++--
 gcc/cp/cp-gimplify.cc |  9 -
 gcc/cp/except.cc  |  2 --
 3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8d994f0ee53..b0adbb9036d 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2797,10 +2797,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  value_cat lval,
  bool *non_constant_p, bool *overflow_p)
 {
-  /* Handle concept checks separately.  */
-  if (concept_check_p (t))
-return evaluate_concept_check (t);
-
   location_t loc = cp_expr_loc_or_input_loc (t);
   tree fun = get_function_named_in_call (t);
   constexpr_call new_call
@@ -8774,16 +8770,12 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
   || TREE_CODE (t) == AGGR_INIT_EXPR
   || TREE_CODE (t) == TARGET_EXPR))
 {
-  /* For non-concept checks, determine if it is consteval.  */
-  if (!concept_check_p (t))
-   {
- tree x = t;
- if (TREE_CODE (x) == TARGET_EXPR)
-   x = TARGET_EXPR_INITIAL (x);
- tree fndecl = cp_get_callee_fndecl_nofold (x);
- if (fndecl && DECL_IMMEDIATE_FUNCTION_P (fndecl))
-   is_consteval = true;
-   }
+  tree x = t;
+  if (TREE_CODE (x) == TARGET_EXPR)
+   x = TARGET_EXPR_INITIAL (x);
+  tree fndecl = cp_get_callee_fndecl_nofold (x);
+  if (fndecl && DECL_IMMEDIATE_FUNCTION_P (fndecl))
+   is_consteval = true;
 }
   if (AGGREGATE_TYPE_P (type) || VECTOR_TYPE_P (type))
 {
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 0c589eeaaec..003e68f1ea7 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2092,15 +2092,6 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void 
*data)
   break;
 
 case CALL_EXPR:
-  /* Evaluate function concept checks instead of treating them as
-normal functions.  */
-  if (concept_check_p (stmt))
-   {
- *stmt_p = evaluate_concept_check (stmt);
- * walk_subtrees = 0;
- break;
-   }
-
   if (!wtd->no_sanitize_p
  && sanitize_flags_p ((SANITIZE_NULL
| SANITIZE_ALIGNMENT | SANITIZE_VPTR)))
diff --git a/gcc/cp/except.cc b/gcc/cp/except.cc
index 3c69ab69502..0231bd2507d 100644
--- a/gcc/cp/except.cc
+++ b/gcc/cp/except.cc
@@ -1074,8 +1074,6 @@ check_noexcept_r (tree *tp, int *walk_subtrees, void *)
 
  We could use TREE_NOTHROW (t) for !TREE_PUBLIC fns, though... */
   tree fn = cp_get_callee (t);
-  if (concept_check_p (fn))
-   return NULL_TREE;
   tree type = TREE_TYPE (fn);
   gcc_assert (INDIRECT_TYPE_P (type));
   type = TREE_TYPE (type);

base-commit: 180625ae72b3f733813a360fae4f0d6ce79eccdc
-- 
2.45.2



Re: [PATCH] RISC-V: Fix format-diag warning from improperly formatted url

2024-08-06 Thread Palmer Dabbelt

On Tue, 06 Aug 2024 09:07:26 PDT (-0700), Patrick O'Neill wrote:

gcc/ChangeLog:

PR target/116152
* config/riscv/riscv.cc (riscv_option_override): Fix url
  formatting.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-9.c: Update testcase.

Co-authored-by: Jakub Jelinek 
Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc.
---
 gcc/config/riscv/riscv.cc | 4 ++--
 gcc/testsuite/gcc.target/riscv/predef-9.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b005af62e61..3f7eec8d69e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9826,8 +9826,8 @@ riscv_option_override (void)
   if (riscv_abi == ABI_LP64E)
 {
   if (warning (OPT_Wdeprecated, "LP64E ABI is marked for deprecation in 
GCC"))
-   inform (UNKNOWN_LOCATION, "If you need LP64E please notify the GCC "
-   "project via https://gcc.gnu.org/PR116152");
+   inform (UNKNOWN_LOCATION, "if you need LP64E please notify the GCC "
+   "project via %{PR116152%}", "https://gcc.gnu.org/PR116152");
 }

   /* Zfinx require abi ilp32, ilp32e, lp64 or lp64e.  */
diff --git a/gcc/testsuite/gcc.target/riscv/predef-9.c 
b/gcc/testsuite/gcc.target/riscv/predef-9.c
index 0d9488529ea..b173d5df57f 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-9.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-9.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64em -mabi=lp64e -mno-div -mcmodel=medlow" } */
 /* { dg-warning "LP64E ABI is marked for deprecation in GCC" "" { target *-*-* 
} 0 } */
-/* { dg-note "If you need LP64E please notify the GCC project via 
https://gcc.gnu.org/PR116152" "" { target *-*-* } 0 } */
+/* { dg-note "if you need LP64E please notify the GCC project via PR116152" "" 
{ target *-*-* } 0 } */

 int main () {
 #if !defined(__riscv)


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

Thanks!


[Committed] RISC-V: Fix format-diag warning from improperly formatted url

2024-08-06 Thread Patrick O'Neill



On 8/6/24 09:12, Palmer Dabbelt wrote:

On Tue, 06 Aug 2024 09:07:26 PDT (-0700), Patrick O'Neill wrote:

gcc/ChangeLog:

PR target/116152
* config/riscv/riscv.cc (riscv_option_override): Fix url
  formatting.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-9.c: Update testcase.

Co-authored-by: Jakub Jelinek 
Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc.
---
 gcc/config/riscv/riscv.cc | 4 ++--
 gcc/testsuite/gcc.target/riscv/predef-9.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b005af62e61..3f7eec8d69e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9826,8 +9826,8 @@ riscv_option_override (void)
   if (riscv_abi == ABI_LP64E)
 {
   if (warning (OPT_Wdeprecated, "LP64E ABI is marked for 
deprecation in GCC"))
-    inform (UNKNOWN_LOCATION, "If you need LP64E please notify the 
GCC "

-    "project via https://gcc.gnu.org/PR116152");
+    inform (UNKNOWN_LOCATION, "if you need LP64E please notify the 
GCC "

+    "project via %{PR116152%}", "https://gcc.gnu.org/PR116152");
 }

   /* Zfinx require abi ilp32, ilp32e, lp64 or lp64e.  */
diff --git a/gcc/testsuite/gcc.target/riscv/predef-9.c 
b/gcc/testsuite/gcc.target/riscv/predef-9.c

index 0d9488529ea..b173d5df57f 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-9.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-9.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64em -mabi=lp64e -mno-div -mcmodel=medlow" 
} */
 /* { dg-warning "LP64E ABI is marked for deprecation in GCC" "" { 
target *-*-* } 0 } */
-/* { dg-note "If you need LP64E please notify the GCC project via 
https://gcc.gnu.org/PR116152" "" { target *-*-* } 0 } */
+/* { dg-note "if you need LP64E please notify the GCC project via 
PR116152" "" { target *-*-* } 0 } */


 int main () {
 #if !defined(__riscv)


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

Thanks!


Committed - thanks!

Patrick



[PATCH 0/3] libcpp: improve x86 vectorized helpers

2024-08-06 Thread Alexander Monakov
Hello!

As discussed, I'm sending patches that reimplement our SSE4.2 search_line_fast
helper with SSSE3, and then add the corresponding AVX2 helper. They are on top
of Andi's "Remove MMX code path in lexer" patch, which was approved, but not
committed yet (Andi, can you push your own patch?).

Apparently the branch where we find a possible EOL and return from the function
is poorly predictable, hence a small win from AVX2 use (wider vectors => fewer
mispredicts).

I'm also attaching here a microbenchmark for testing all variants in isolation.

Alexander

search-line-bench.tgz
Description: application/gzip


[PATCH 1/3] libcpp: configure: check for AVX2 instead of SSE4

2024-08-06 Thread Alexander Monakov
Upcoming patches first lower the required Binutils ISA support from SSE4.2
to SSSE3, then raise it to AVX2. Instead of fiddling with detection twice,
just bump our configure check to AVX2 immediately: if somebody happens to
build GCC without AVX2 support in the assembler, they will get the SSE2
vectorized lexer, which is not too slow.

libcpp/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for AVX2 instead of SSE4.2.
* lex.cc: Adjust for changed config macro.
---
 libcpp/config.in| 6 +++---
 libcpp/configure| 4 ++--
 libcpp/configure.ac | 6 +++---
 libcpp/lex.cc   | 2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/libcpp/config.in b/libcpp/config.in
index 253ef03a3d..a0ca9e4df4 100644
--- a/libcpp/config.in
+++ b/libcpp/config.in
@@ -35,6 +35,9 @@
*/
 #undef HAVE_ALLOCA_H
 
+/* Define to 1 if you can assemble AVX2 insns. */
+#undef HAVE_AVX2
+
 /* Define to 1 if you have the Mac OS X function
CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
 #undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES
@@ -210,9 +213,6 @@
 /* Define to 1 if you have the `putc_unlocked' function. */
 #undef HAVE_PUTC_UNLOCKED
 
-/* Define to 1 if you can assemble SSE4 insns. */
-#undef HAVE_SSE4
-
 /* Define to 1 if you have the  header file. */
 #undef HAVE_STDDEF_H
 
diff --git a/libcpp/configure b/libcpp/configure
index 32d6aaa306..74af097620 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -9140,14 +9140,14 @@ case $target in
 int
 main ()
 {
-asm ("pcmpestri %0, %%xmm0, %%xmm1" : : "i"(0))
+asm ("vpshufb %ymm0, %ymm1, %ymm2")
   ;
   return 0;
 }
 _ACEOF
 if ac_fn_c_try_compile "$LINENO"; then :
 
-$as_echo "#define HAVE_SSE4 1" >>confdefs.h
+$as_echo "#define HAVE_AVX2 1" >>confdefs.h
 
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index b883fec776..cfefb63552 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -197,9 +197,9 @@ fi
 
 case $target in
   i?86-* | x86_64-*)
-AC_TRY_COMPILE([], [asm ("pcmpestri %0, %%xmm0, %%xmm1" : : "i"(0))],
-  [AC_DEFINE([HAVE_SSE4], [1],
-[Define to 1 if you can assemble SSE4 insns.])])
+AC_TRY_COMPILE([], [asm ("vpshufb %ymm0, %ymm1, %ymm2")],
+  [AC_DEFINE([HAVE_AVX2], [1],
+[Define to 1 if you can assemble AVX2 insns.])])
 esac
 
 # Enable --enable-host-shared.
diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 1591dcdf15..fa9c03614c 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -344,7 +344,7 @@ search_line_sse2 (const uchar *s, const uchar *end 
ATTRIBUTE_UNUSED)
   return (const uchar *)p + found;
 }
 
-#ifdef HAVE_SSE4
+#ifdef HAVE_AVX2
 /* A version of the fast scanner using SSE 4.2 vectorized string insns.  */
 
 static const uchar *
-- 
2.44.0



[PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-06 Thread Alexander Monakov
Since the characters we are searching for (CR, LF, '\', '?') all have
distinct ASCII codes mod 16, PSHUFB can help match them all at once.
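
The matching idea can be modelled in portable scalar code. The sketch below is not the patch's implementation: it emulates what PSHUFB, a byte-wise compare and PMOVMSKB do on one 16-byte chunk, using the same lookup table as the patch. The helper names (pshufb_byte, match_mask16, first_match16) are hypothetical. Entry 0 of the table is 1 so that a NUL byte, which selects entry 0, never compares equal to itself.

```cpp
#include <cassert>

// Same table as the patch: each searched character C sits at index C % 16.
// Index 0 holds 1 so that a NUL input byte never matches.
static const unsigned char lut[16] = {
  1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?'
};

// Scalar model of one PSHUFB result byte: an index with the high bit set
// yields 0; otherwise the low four bits select a table entry.
static unsigned char pshufb_byte (unsigned char index)
{
  return (index & 0x80) ? 0 : lut[index & 15];
}

// Model of "shuffle, compare with the input, collect a bit per byte":
// bit i of the result is set when chunk[i] is '\n', '\r', '\\' or '?'.
static unsigned match_mask16 (const unsigned char *chunk)
{
  unsigned mask = 0;
  for (int i = 0; i < 16; i++)
    if (pshufb_byte (chunk[i]) == chunk[i])
      mask |= 1u << i;
  return mask;
}

// Index of the first matching byte in the chunk, or 16 if there is none.
// This plays the role of the count-trailing-zeros step on the mask.
static int first_match16 (const unsigned char *chunk)
{
  unsigned mask = match_mask16 (chunk);
  if (mask == 0)
    return 16;
  int i = 0;
  while ((mask & 1) == 0)
    {
      mask >>= 1;
      i++;
    }
  return i;
}

// Small sample: "int x; // a?b\n" followed by two NUL padding bytes.
static const unsigned char sample_chunk[16] = {
  'i', 'n', 't', ' ', 'x', ';', ' ', '/', '/', ' ', 'a', '?', 'b', '\n', 0, 0
};
```

For sample_chunk, the mask has bits 11 and 13 set ('?' and '\n'), and the trailing NUL bytes do not match because table entry 0 is 1 rather than 0.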

libcpp/ChangeLog:

* lex.cc (search_line_sse42): Replace with...
(search_line_ssse3): ... this new function.  Adjust the use...
(init_vectorized_lexer): ... here.
---
 libcpp/lex.cc | 118 --
 1 file changed, 46 insertions(+), 72 deletions(-)

diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index fa9c03614c..815b8abd29 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -345,84 +345,58 @@ search_line_sse2 (const uchar *s, const uchar *end 
ATTRIBUTE_UNUSED)
 }
 
 #ifdef HAVE_AVX2
-/* A version of the fast scanner using SSE 4.2 vectorized string insns.  */
+/* A version of the fast scanner using SSSE3 shuffle (PSHUFB) insns.  */
 
 static const uchar *
-#ifndef __SSE4_2__
-__attribute__((__target__("sse4.2")))
+#ifndef __SSSE3__
+__attribute__((__target__("ssse3")))
 #endif
-search_line_sse42 (const uchar *s, const uchar *end)
+search_line_ssse3 (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
 {
   typedef char v16qi __attribute__ ((__vector_size__ (16)));
-  static const v16qi search = { '\n', '\r', '?', '\\' };
-
-  uintptr_t si = (uintptr_t)s;
-  uintptr_t index;
-
-  /* Check for unaligned input.  */
-  if (si & 15)
-{
-  v16qi sv;
-
-  if (__builtin_expect (end - s < 16, 0)
- && __builtin_expect ((si & 0xfff) > 0xff0, 0))
-   {
- /* There are less than 16 bytes left in the buffer, and less
-than 16 bytes left on the page.  Reading 16 bytes at this
-point might generate a spurious page fault.  Defer to the
-SSE2 implementation, which already handles alignment.  */
- return search_line_sse2 (s, end);
-   }
-
-  /* ??? The builtin doesn't understand that the PCMPESTRI read from
-memory need not be aligned.  */
-  sv = __builtin_ia32_loaddqu ((const char *) s);
-  index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
-
-  if (__builtin_expect (index < 16, 0))
-   goto found;
-
-  /* Advance the pointer to an aligned address.  We will re-scan a
-few bytes, but we no longer need care for reading past the
-end of a page, since we're guaranteed a match.  */
-  s = (const uchar *)((si + 15) & -16);
-}
-
-  /* Main loop, processing 16 bytes at a time.  */
-#ifdef __GCC_ASM_FLAG_OUTPUTS__
-  while (1)
+  typedef v16qi v16qi_u __attribute__ ((__aligned__ (1)));
+  /* Helper vector for pshufb-based matching:
+ each character C we're searching for is at position (C % 16).  */
+  v16qi lut = { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?' };
+  static_assert ('\n' == 10 && '\r' == 13 && '\\' == 92 && '?' == 63);
+
+  int found;
+  /* Process three 16-byte chunks per iteration.  */
+  for (; ; s += 48)
 {
-  char f;
-
-  /* By using inline assembly instead of the builtin,
-we can use the result, as well as the flags set.  */
-  __asm ("%vpcmpestri\t$0, %2, %3"
-: "=c"(index), "=@ccc"(f)
-: "m"(*s), "x"(search), "a"(4), "d"(16));
-  if (f)
-   break;
-  
-  s += 16;
+  v16qi data, t;
+  /* Unaligned load.  Reading beyond the final newline is safe, since
+files.cc:read_file_guts pads the allocation.  */
+  data = *(const v16qi_u *)s;
+  /* Prevent propagation into pshufb and pcmp as memory operand.  */
+  __asm__ ("" : "+x" (data));
+  t = __builtin_ia32_pshufb128 (lut, data);
+  if ((found = __builtin_ia32_pmovmskb128 (t == data)))
+   goto done;
+  /* Second chunk.  */
+  data = *(const v16qi_u *)(s + 16);
+  __asm__ ("" : "+x" (data));
+  t = __builtin_ia32_pshufb128 (lut, data);
+  if ((found = __builtin_ia32_pmovmskb128 (t == data)))
+   goto add_16;
+  /* Third chunk.  */
+  data = *(const v16qi_u *)(s + 32);
+  __asm__ ("" : "+x" (data));
+  t = __builtin_ia32_pshufb128 (lut, data);
+  if ((found = __builtin_ia32_pmovmskb128 (t == data)))
+   goto add_32;
 }
-#else
-  s -= 16;
-  /* By doing the whole loop in inline assembly,
- we can make proper use of the flags set.  */
-  __asm (  ".balign 16\n"
-   "0: add $16, %1\n"
-   "   %vpcmpestri\t$0, (%1), %2\n"
-   "   jnc 0b"
-   : "=&c"(index), "+r"(s)
-   : "x"(search), "a"(4), "d"(16));
-#endif
-
- found:
-  return s + index;
+add_32:
+  s += 16;
+add_16:
+  s += 16;
+done:
+  return s + __builtin_ctz (found);
 }
 
 #else
-/* Work around out-dated assemblers without sse4 support.  */
-#define search_line_sse42 search_line_sse2
+/* Work around out-dated assemblers without SSSE3 support.  */
+#define search_line_ssse3 search_line_sse2
 #endif
 
 /* Check the CPU capabilities.  */
@@ -440,18 +414,18 @@ init_vectorized_lexer (void)
   search_line_fast_type impl = search_line_acc_char;
   int minimum = 0;
 
-#if defi

[PATCH 3/3] libcpp: add AVX2 helper

2024-08-06 Thread Alexander Monakov
Use the same PSHUFB-based matching as in the SSSE3 helper, just 2x
wider.

Directly use the new helper if __AVX2__ is defined. It makes the other
helpers unused, so mark them inline to prevent warnings.

Rewrite and simplify init_vectorized_lexer.
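
One detail of "2x wider" is worth spelling out: the 256-bit VPSHUFB shuffles each 128-bit half independently, so the 16-entry table has to be repeated in both halves of the 32-byte vector. The scalar sketch below models that behaviour; it is an illustration under that assumption, not code from the patch, and all names in it (vpshufb256_model, lut_repeated, lut_low_only, sample_match_count) are hypothetical.

```cpp
#include <cassert>

// Scalar model of 256-bit VPSHUFB: byte i of the result is taken from the
// 16 table bytes that live in the same 128-bit half as byte i.
static void vpshufb256_model (const unsigned char table[32],
                              const unsigned char index[32],
                              unsigned char out[32])
{
  for (int i = 0; i < 32; i++)
    {
      int lane_base = i & 16;   // 0 for the low half, 16 for the high half
      out[i] = (index[i] & 0x80) ? 0 : table[lane_base + (index[i] & 15)];
    }
}

// Table repeated in both halves, as in the patch.
static const unsigned char lut_repeated[32] = {
  1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?',
  1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?'
};

// Table present only in the low half (upper 16 bytes are zero); shown only
// to illustrate what would go wrong without the repetition.
static const unsigned char lut_low_only[32] = {
  1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?'
};

// Counts matching bytes in a fixed sample that has '\n' at offset 3 (low
// half) and '?' at offset 20 (high half); every other byte is 'a'.
static int sample_match_count (const unsigned char table[32])
{
  unsigned char data[32], shuffled[32];
  for (int i = 0; i < 32; i++)
    data[i] = 'a';
  data[3] = '\n';
  data[20] = '?';

  vpshufb256_model (table, data, shuffled);

  int matches = 0;
  for (int i = 0; i < 32; i++)
    if (shuffled[i] == data[i])
      matches++;
  return matches;
}
```

With lut_repeated both special characters are found; with lut_low_only the '?' in the high half is missed, because its lookup lands in the zeroed upper 16 table bytes.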

libcpp/ChangeLog:

* files.cc (read_file_guts): Bump padding to 32 if HAVE_AVX2.
* lex.cc (search_line_acc_char): Mark inline, not "unused".
(search_line_sse2): Mark inline.
(search_line_ssse3): Ditto.
(search_line_avx2): New function.
(init_vectorized_lexer): Reimplement.
---
 libcpp/files.cc |  15 +++
 libcpp/lex.cc   | 111 
 2 files changed, 92 insertions(+), 34 deletions(-)

diff --git a/libcpp/files.cc b/libcpp/files.cc
index 78f56e30bd..3df070d035 100644
--- a/libcpp/files.cc
+++ b/libcpp/files.cc
@@ -693,7 +693,7 @@ static bool
 read_file_guts (cpp_reader *pfile, _cpp_file *file, location_t loc,
const char *input_charset)
 {
-  ssize_t size, total, count;
+  ssize_t size, pad, total, count;
   uchar *buf;
   bool regular;
 
@@ -732,11 +732,10 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file, 
location_t loc,
the majority of C source files.  */
 size = 8 * 1024;
 
-  /* The + 16 here is space for the final '\n' and 15 bytes of padding,
- used to quiet warnings from valgrind or Address Sanitizer, when the
- optimized lexer accesses aligned 16-byte memory chunks, including
- the bytes after the malloced, area, and stops lexing on '\n'.  */
-  buf = XNEWVEC (uchar, size + 16);
+  pad = HAVE_AVX2 ? 32 : 16;
+  /* The '+ PAD' here is space for the final '\n' and PAD-1 bytes of padding,
+ allowing search_line_fast to use (possibly misaligned) vector loads.  */
+  buf = XNEWVEC (uchar, size + pad);
   total = 0;
   while ((count = read (file->fd, buf + total, size - total)) > 0)
 {
@@ -747,7 +746,7 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file, 
location_t loc,
  if (regular)
break;
  size *= 2;
- buf = XRESIZEVEC (uchar, buf, size + 16);
+ buf = XRESIZEVEC (uchar, buf, size + pad);
}
 }
 
@@ -765,7 +764,7 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file, 
location_t loc,
 
   file->buffer = _cpp_convert_input (pfile,
 input_charset,
-buf, size + 16, total,
+buf, size + pad, total,
 &file->buffer_start,
 &file->st.st_size);
   file->buffer_valid = file->buffer;
diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 815b8abd29..c336281658 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -225,10 +225,7 @@ acc_char_index (word_type cmp ATTRIBUTE_UNUSED,
and branches without increasing the number of arithmetic operations.
It's almost certainly going to be a win with 64-bit word size.  */
 
-static const uchar * search_line_acc_char (const uchar *, const uchar *)
-  ATTRIBUTE_UNUSED;
-
-static const uchar *
+static inline const uchar *
 search_line_acc_char (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
 {
   const word_type repl_nl = acc_char_replicate ('\n');
@@ -293,7 +290,7 @@ static const char repl_chars[4][16] 
__attribute__((aligned(16))) = {
 
 /* A version of the fast scanner using SSE2 vectorized byte compare insns.  */
 
-static const uchar *
+static inline const uchar *
 #ifndef __SSE2__
 __attribute__((__target__("sse2")))
 #endif
@@ -345,9 +342,9 @@ search_line_sse2 (const uchar *s, const uchar *end 
ATTRIBUTE_UNUSED)
 }
 
 #ifdef HAVE_AVX2
-/* A version of the fast scanner using SSSE3 shuffle (PSHUFB) insns.  */
+/* Variants of the fast scanner using SSSE3 shuffle (PSHUFB) insns.  */
 
-static const uchar *
+static inline const uchar *
 #ifndef __SSSE3__
 __attribute__((__target__("ssse3")))
 #endif
@@ -394,44 +391,106 @@ done:
   return s + __builtin_ctz (found);
 }
 
+static inline const uchar *
+#ifndef __AVX2__
+__attribute__((__target__("avx2")))
+#endif
+search_line_avx2 (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
+{
+  typedef char v32qi __attribute__ ((__vector_size__ (32)));
+  typedef v32qi v32qi_u __attribute__ ((__aligned__ (1)));
+  v32qi lut = {
+1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?',
+1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?'
+  };
+
+  int found;
+  /* Process three 32-byte chunks per iteration.  */
+  for (; ; s += 96)
+{
+  v32qi data, t;
+  data = *(const v32qi_u *)s;
+  __asm__ ("" : "+x" (data));
+  t = __builtin_ia32_pshufb256 (lut, data);
+  if ((found = __builtin_ia32_pmovmskb256 (t == data)))
+   goto done;
+  /* Second chunk.  */
+  data = *(const v32qi_u *)(s + 32);
+  __asm__ ("" : "+x" (data));
+  t = __builtin_ia32_pshufb256 (lut, data);
+  if ((found = __builtin_ia32_pmovmskb256 (t == data)))
+   goto add_32;
+  /*

Re: [RFC v4 0/4] c: Add __lengthof__ operator

2024-08-06 Thread Alejandro Colomar
[CC += David, Florian, Andreas]

On Tue, Aug 06, 2024 at 03:59:11PM GMT, Qing Zhao wrote:
> Hi, Alex,

Hi Qing,

> I noticed that all 4 versions of your patches and the
> corresponding discussion are in the same email thread, which makes
> them very inconvenient to read. Can you start a new email thread for
> each new version of the patch? (i.e., please do not reply to the
> previous version when you have a new version of the patch.)

Hmmm; I have the opposite opinion in projects that I maintain.  I prefer
when successive iterations of the same patch set are replies to the same
thread, which makes it easy to go back to the previous iterations.

Is there consensus in gcc-patches@ that I should start new threads for
each revision?

I very much prefer to keep using a single thread to keep my sanity, but
I'll do whatever gcc-patches@ maintainers prefer.

> Some more questions and comments below:

Could you please use a quoting character?  I find it hard to distinguish
the quoted parts from your own.

> I briefly read the two links you provided as the background of your patch:
> 
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2529.pdf
> 
> This is a proposal submitted on 6/4/2020, do you know the current
> status of this proposal?

I CCed the author of that proposal, but he didn't say anything.  The
proposal probably died.

> 
> https://inbox.sourceware.org/gcc/m8s4oqy--...@tutanota.com/T/
> 
> This is some discussion within GCC community on this proposal around
> the same time (the end of May of 2020, before the submission date of
> the proposal).
> 
> From the discussion, I didn’t see a consistent positive opinion on the
> proposal itself.

I've read it.  The feedback was basically that _Lengthof() would be
redundant with ARRAY_SIZE() for those careful enough to use it, and a
dead language feature for the cowboys that don't like seat belts.

However, it didn't take into account the possibility of including array
length information in function parameters declared with array notation,
which is a net improvement for everyone.
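
To make the limitation concrete, here is a small sketch of why an ARRAY_SIZE()-style macro cannot help with parameters today. It is written in C++ only so that the parameter's type can be checked with decltype; the adjustment of array parameters to pointers is the same in C. The function names param_is_pointer and local_array_length are hypothetical, and the macro is the usual sizeof-based definition rather than any particular project's.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Classic element-count macro; only meaningful when ARR is a real array.
#define ARRAY_SIZE(arr) (sizeof (arr) / sizeof ((arr)[0]))

// The parameter is written with array notation, but its type is adjusted
// to 'int *', so the 10 is not available inside the function.
static bool param_is_pointer (int a[10])
{
  (void) a;  // only the declared type is inspected
  return std::is_same<decltype (a), int *>::value;
}

// For a genuine array object the length is still known to the compiler.
static std::size_t local_array_length ()
{
  int a[10] = { 0 };
  (void) a;
  return ARRAY_SIZE (a);
}
```

Here param_is_pointer accepts even a null pointer, and applying ARRAY_SIZE to its parameter would silently divide the size of a pointer by the size of an int, which is the class of mistake the thread is concerned with.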

I've CCed David (the author of 2020's negative feedback) in case he has
any comments.

From what I've seen in these 4 revisions, feedback is not bad.  I've
also been discussing several array features lately, and it seems like
the way forward.  Hopefully, the general opinion has changed.

BTW, the linux kernel is starting to use macros that magically get the
array length:

This is also what shadow utils is doing (done by me).  By having
__lengthof__ work on function parameters, these macros will be usable in
more places.

> So, I am wondering whether you have any new background information
> since then? What’s the major motivation to bring up this proposal
> again this time after 4 years?

When I proposed this a couple of years ago (before knowing about n2529),
there was some positive feedback.  I didn't have the time back then to
implement it, but I have now.  So far, I've only seen positive feedback
about it.

> Rationale:
> 
> -  While compiler extensions already allow implementing ARRAY_SIZE()
>   (), there's still no
>   way to get the length of a function parameter which uses array
>   notation.
> 
> 
> Is this one of the major benefits of this new __lengthof__ operator?

I'd say the main one, yes.  As in, I think C++ might not have been
invented, or might not have developed its own arrays, if we had had this
functionality in C back then.

> If so, any rough idea now on how to implement this (i.e, the length of
> a function parameter array).

By implementing the function parameters as actual arrays inside the
compiler instead of just pointers.



Martin is working on several array features at the moment.

>  While this first implementation doesn't support those yet
>   (because there are some issues that need to be fixed first), the plan
>   is to add support to those.
> 
> 
> What kind of issues are they? What’s the plan to resolve those issues?

n2906.  When n2906 is implemented by Martin, __lengthof__ will be able
to work on function parameters.  He may be able to talk more about it.

> -  The tests seem to work as expected if I compile them manually, and
>   run (the one that should be run) as a normal program.  The one that
>   should not be run also gives the expected diagnostics.
>   Can anyone give advice of why it's not running well under the test
>   suite?
> 
> You might want to check some existing testing cases in GCC’s testsuite
> first to see what kind of directives you are missing in your test
> case. (For example, any testing case in gcc/testsuite/gcc.dg/).

It was some spurious warnings.  Martin helped me with those, and it's
already solved in my working copy.

Have a lovely day!
Alex



Re: [RESEND PATCH v5 1/3] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets

2024-08-06 Thread Philipp Tomsich
Sam, Jakub & Robin,

We had an "OK for trunk" from Jeff for v4 (see
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656907.html) and
it has been two more weeks for this RESEND.
I'll push this by end of this week unless I hear otherwise.

Thanks,
Philipp.


On Fri, 26 Jul 2024 at 12:50, Sam James  wrote:
>
> Manolis Tsamis  writes:
>
> > This is an extension of what was done in PR106590.
>
> FWIW, I think that if a bug is worth mentioning in the commit message,
> it's worth tagging so the hooks pick it up (as you get a nice
> reverse-mapping then if anyone is looking at it and wondering if a
> follow-up occurred).
>
> CC'd Jakub too given he wrote that commit and maybe he wants to review.
>
> Fixed Robin's email in CC list too.
>
> > [...]
>
> thanks,
> sam


Re: [PATCH] c++: further concept_check_p clean-up

2024-08-06 Thread Patrick Palka
On Tue, 6 Aug 2024, Marek Polacek wrote:

> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

LGTM

> 
> -- >8 --
> Patrick noticed a few more concept_check_p checks that can be removed
> now.
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (cxx_eval_call_expression): Remove concept_check_p check.
>   (cxx_eval_outermost_constant_expr): Likewise.
>   * cp-gimplify.cc (cp_genericize_r) : Likewise.
>   * except.cc (check_noexcept_r): Likewise.
> ---
>  gcc/cp/constexpr.cc   | 20 ++--
>  gcc/cp/cp-gimplify.cc |  9 -
>  gcc/cp/except.cc  |  2 --
>  3 files changed, 6 insertions(+), 25 deletions(-)
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index 8d994f0ee53..b0adbb9036d 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -2797,10 +2797,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
> value_cat lval,
> bool *non_constant_p, bool *overflow_p)
>  {
> -  /* Handle concept checks separately.  */
> -  if (concept_check_p (t))
> -return evaluate_concept_check (t);
> -
>location_t loc = cp_expr_loc_or_input_loc (t);
>tree fun = get_function_named_in_call (t);
>constexpr_call new_call
> @@ -8774,16 +8770,12 @@ cxx_eval_outermost_constant_expr (tree t, bool 
> allow_non_constant,
>  || TREE_CODE (t) == AGGR_INIT_EXPR
>  || TREE_CODE (t) == TARGET_EXPR))
>  {
> -  /* For non-concept checks, determine if it is consteval.  */
> -  if (!concept_check_p (t))
> - {
> -   tree x = t;
> -   if (TREE_CODE (x) == TARGET_EXPR)
> - x = TARGET_EXPR_INITIAL (x);
> -   tree fndecl = cp_get_callee_fndecl_nofold (x);
> -   if (fndecl && DECL_IMMEDIATE_FUNCTION_P (fndecl))
> - is_consteval = true;
> - }
> +  tree x = t;
> +  if (TREE_CODE (x) == TARGET_EXPR)
> + x = TARGET_EXPR_INITIAL (x);
> +  tree fndecl = cp_get_callee_fndecl_nofold (x);
> +  if (fndecl && DECL_IMMEDIATE_FUNCTION_P (fndecl))
> + is_consteval = true;
>  }
>if (AGGREGATE_TYPE_P (type) || VECTOR_TYPE_P (type))
>  {
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index 0c589eeaaec..003e68f1ea7 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -2092,15 +2092,6 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, 
> void *data)
>break;
>  
>  case CALL_EXPR:
> -  /* Evaluate function concept checks instead of treating them as
> -  normal functions.  */
> -  if (concept_check_p (stmt))
> - {
> -   *stmt_p = evaluate_concept_check (stmt);
> -   * walk_subtrees = 0;
> -   break;
> - }
> -
>if (!wtd->no_sanitize_p
> && sanitize_flags_p ((SANITIZE_NULL
>   | SANITIZE_ALIGNMENT | SANITIZE_VPTR)))
> diff --git a/gcc/cp/except.cc b/gcc/cp/except.cc
> index 3c69ab69502..0231bd2507d 100644
> --- a/gcc/cp/except.cc
> +++ b/gcc/cp/except.cc
> @@ -1074,8 +1074,6 @@ check_noexcept_r (tree *tp, int *walk_subtrees, void *)
>  
>   We could use TREE_NOTHROW (t) for !TREE_PUBLIC fns, though... */
>tree fn = cp_get_callee (t);
> -  if (concept_check_p (fn))
> - return NULL_TREE;
>tree type = TREE_TYPE (fn);
>gcc_assert (INDIRECT_TYPE_P (type));
>type = TREE_TYPE (type);
> 
> base-commit: 180625ae72b3f733813a360fae4f0d6ce79eccdc
> -- 
> 2.45.2
> 
> 



[pushed] c++: zero-init and class nttp [PR94568]

2024-08-06 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

A zero-initializer should not reflect the constness of what it's
initializing, as it does not for initializers with different syntax.

This does have mangling implications for rare C++20 code, but it seems
infeasible to make the mangling depend on -fabi-version while fixing the
semantic bug, and C++20 is still experimental anyway.

PR c++/94568

gcc/cp/ChangeLog:

* init.cc (build_zero_init_1): Call cv_unqualified.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class36.C: Remove xfail.
* g++.dg/cpp2a/nontype-class37.C: Remove xfail.
* g++.dg/cpp1z/nontype-auto26.C: New test.
---
 gcc/cp/init.cc   |  3 ++
 gcc/testsuite/g++.dg/cpp1z/nontype-auto26.C  | 29 
 gcc/testsuite/g++.dg/cpp2a/nontype-class36.C |  2 +-
 gcc/testsuite/g++.dg/cpp2a/nontype-class37.C |  2 +-
 4 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto26.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index de82152bd1d..20373d26988 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -173,6 +173,9 @@ build_zero_init_1 (tree type, tree nelts, bool 
static_storage_p,
 
   gcc_assert (nelts == NULL_TREE || TREE_CODE (nelts) == INTEGER_CST);
 
+  /* An initializer is unqualified.  */
+  type = cv_unqualified (type);
+
   if (type == error_mark_node)
 ;
   else if (static_storage_p && zero_init_p (type))
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto26.C 
b/gcc/testsuite/g++.dg/cpp1z/nontype-auto26.C
new file mode 100644
index 000..9abe54ed974
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype-auto26.C
@@ -0,0 +1,29 @@
+// PR c++/94568
+// { dg-do compile { target c++20 } }
+
+struct A;
+typedef int A::*MemPtr;
+
+struct B { MemPtr p; };
+
+static constexpr MemPtr mp { };
+
+template  struct X { };
+
+typedef X XB;
+typedef XXB;
+
+struct C { int a[2]; };
+template  struct D { };
+
+constexpr const int i0 = 0;
+constexpr const int i_{ };
+
+static_assert (i0 == i_);
+
+// typedef D   DC01;
+// typedef D  DC01;
+typedef D  DC01;
+
+// { dg-final { scan-assembler "_Z1f1DIXtl1CtlA2_iLi0ELi1E" } }
+void f(DC01) {}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class36.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class36.C
index 8371eb96621..2e6d76cd43a 100644
--- a/gcc/testsuite/g++.dg/cpp2a/nontype-class36.C
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class36.C
@@ -59,7 +59,7 @@ typedef X XB00p;
 typedef X XB00p;
 typedef X   XB00p;
 typedef X XB00p;
-typedef X XB00p;  // { dg-bogus 
"conflicting declaration" "pr94568" { xfail *-*-* } }
+typedef X XB00p;  // { dg-bogus 
"conflicting declaration" "pr94568" }
 
 static const constexpr MemFuncPtr mfp0 = { 0 };
 static const constexpr MemFuncPtr mfpn = { nullptr };
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class37.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class37.C
index f5e9826d243..dc054a9939a 100644
--- a/gcc/testsuite/g++.dg/cpp2a/nontype-class37.C
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class37.C
@@ -77,4 +77,4 @@ typedef DDC01;
 typedef D DC01;
 typedef D  DC01;
 typedef D DC01;
-typedef D  DC01;   // { dg-bogus "conflicting 
declaration" "pr94567" { xfail *-*-* } }
+typedef D  DC01;   // { dg-bogus "conflicting 
declaration" "pr94568" }

base-commit: 69093fd8aa682a1b906e80b3c5f10956e692b7c4
-- 
2.45.2
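
The intent of the commit message above can be illustrated with a scalar
analogue.  This is only a sketch under that assumption: the PR itself concerns
class-type template arguments, which are deliberately not reproduced here, and
the template D and the function name are illustrative.

```cpp
#include <cassert>
#include <type_traits>

template <int N> struct D { };

constexpr const int i0 = 0;   // initialized from a literal
constexpr const int i_{ };    // value-initialized, i.e. zero-initialized

// Both constants hold the same value, and the constness of the variable
// they initialize is not part of that value.
static_assert (i0 == i_, "both constants are zero");

// Hypothetical check: both spellings name the same specialization.
bool same_specialization ()
{
  return std::is_same<D<i0>, D<i_>>::value;
}
```

The new test in the patch asserts the same kind of equivalence for the
class-type case, where the zero-initializer previously carried the const
qualifier of the object being initialized.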



Re: [PATCH] tree-optimization/116166 - forward jump-threading going wild

2024-08-06 Thread Andrew MacLeod



On 8/6/24 09:12, Richard Biener wrote:

Currently the forward threader isn't limited as to the search space
it explores, and with it now using path-ranger for simplifying the
conditions it runs into, it became pretty slow for degenerate cases
like compiling insn-emit.cc for RISC-V, esp. when compiling for
a host with LOGICAL_OP_NON_SHORT_CIRCUIT disabled.

The following makes the forward threader honor the search space
limit I introduced for the backward threader.  This reduces
compile-time from minutes to seconds for the testcase in PR116166.

Note this wasn't necessary before we had ranger but with ranger
the work we do is quadratic in the length of the threading path
we build up (the same is true for the backwards threader).
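
As a toy illustration of what a search space limit buys, the sketch below
walks every path of a small graph and stops once a fixed budget of visited
blocks is used up.  It is not GCC code: toy_cfg, path_search and the helper
functions are hypothetical names, and the real threader's limit and data
structures may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: succs[b] lists the successor blocks of block b.
struct toy_cfg
{
  std::vector<std::vector<int>> succs;
};

// Build a chain of N blocks where each block has two edges to the next one,
// so the number of distinct paths doubles at every step.
toy_cfg make_ladder (int n)
{
  toy_cfg cfg;
  cfg.succs.resize (n);
  for (int b = 0; b + 1 < n; ++b)
    cfg.succs[b] = { b + 1, b + 1 };
  return cfg;
}

// Depth-first walk over all paths that gives up once BUDGET blocks
// have been visited.
struct path_search
{
  const toy_cfg &cfg;
  std::size_t budget;
  std::size_t visited;
  bool gave_up;

  path_search (const toy_cfg &c, std::size_t limit)
    : cfg (c), budget (limit), visited (0), gave_up (false) {}

  void walk (int bb)
  {
    if (gave_up)
      return;
    if (visited >= budget)
      {
        gave_up = true;
        return;
      }
    ++visited;
    for (int succ : cfg.succs[bb])
      walk (succ);
  }
};

// Convenience wrappers: run a search from block 0 and report the outcome.
std::size_t blocks_visited (int n, std::size_t budget)
{
  toy_cfg cfg = make_ladder (n);
  path_search search (cfg, budget);
  search.walk (0);
  return search.visited;
}

bool search_gave_up (int n, std::size_t budget)
{
  toy_cfg cfg = make_ladder (n);
  path_search search (cfg, budget);
  search.walk (0);
  return search.gave_up;
}
```

On a four-block ladder the walk finishes well within a budget of 1000, while
on a thirty-block ladder the same budget cuts off a walk that would otherwise
visit over a billion blocks.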


There's probably some work that can be done in the path processing space
using the new gori_on_edge interface I introduced for the fast VRP pass.


// These APIs are used to query GORI if there are ranges generated on an edge.

// GORI_ON_EDGE is used to get all the ranges at once (returned in an
// ssa_cache structure).
// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = NULL);

With this, the threader and path calculator can get and collect all the
outgoing ranges for a block in linear time and just keep them, and
decide what it wants to use.  I suspect for really large CFGs, we'd
want to substitute an alternative ssa_cache implementation, something
like the sbr_sparse_bitmap class ranger's cache uses, which compresses
the size of the vector so it isn't a vector over all the ssa-names, and
at the same time limit it to a max of 14 outgoing ranges.


No one has had any time to investigate that yet.



Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

OK with me.

Andrew



[PATCH 1/2] c++: alias and non-type template parm [PR116223]

2024-08-06 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk and 14.

-- 8< --

My r14-8291 for PR112632 introduced IMPLICIT_CONV_EXPR_FORCED to express
conversions to the type of an alias template parameter.  In this example,
that broke deduction of X in the call to foo, so let's teach deduction to
look through it.

PR c++/116223
PR c++/112632

gcc/cp/ChangeLog:

* pt.cc (deducible_expression): Also look through
IMPLICIT_CONV_EXPR_FORCED.
(unify): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype-auto25.C: New test.
---
 gcc/cp/pt.cc|  6 +-
 gcc/testsuite/g++.dg/cpp1z/nontype-auto25.C | 18 ++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto25.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 35a9c5619f9..cf65b347f6c 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23031,6 +23031,8 @@ deducible_expression (tree expr)
   /* Strip implicit conversions and implicit INDIRECT_REFs.  */
   while (CONVERT_EXPR_P (expr)
 || TREE_CODE (expr) == VIEW_CONVERT_EXPR
+|| (TREE_CODE (expr) == IMPLICIT_CONV_EXPR
+&& IMPLICIT_CONV_EXPR_FORCED (expr))
 || REFERENCE_REF_P (expr))
 expr = TREE_OPERAND (expr, 0);
   return (TREE_CODE (expr) == TEMPLATE_PARM_INDEX);
@@ -24560,7 +24562,9 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
  signedness is the only information lost, and I think that will be
  okay.  VIEW_CONVERT_EXPR can appear with class NTTP, thanks to
  finish_id_expression_1, and are also OK.  */
-  while (CONVERT_EXPR_P (parm) || TREE_CODE (parm) == VIEW_CONVERT_EXPR)
+  while (CONVERT_EXPR_P (parm) || TREE_CODE (parm) == VIEW_CONVERT_EXPR
+|| (TREE_CODE (parm) == IMPLICIT_CONV_EXPR
+&& IMPLICIT_CONV_EXPR_FORCED (parm)))
 parm = TREE_OPERAND (parm, 0);
 
   if (arg == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto25.C 
b/gcc/testsuite/g++.dg/cpp1z/nontype-auto25.C
new file mode 100644
index 000..36b38b48ec2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype-auto25.C
@@ -0,0 +1,18 @@
+// PR c++/116223
+// { dg-do compile { target c++17 } }
+
+template  struct A { int value = T; };
+
+template  using B = A;
+
+template 
+void foo(B& mat) noexcept
+{
+  //std::cout << mat.value << "\n";
+}
+
+int main()
+{
+A<2> mat;
+foo(mat);
+}

base-commit: 352c21c8a22a48d34cbd2fbfe398ee12c0a1d681
-- 
2.45.2



[PATCH 2/2] c++: more non-type template parms [PR116223]

2024-08-06 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk (not 14).

-- 8< --

Building on the last patch, deduction should probably look through all
IMPLICIT_CONV_EXPR like we do other conversions.

One resulting regression turned out to be due to PR94568, fixed separately.

The one other regression was for a seeming mismatch between a function and
its address, handled here.  Before this change we treated the
IMPLICIT_CONV_EXPR as dependent because the template parameter has dependent
type.

PR c++/116223

gcc/cp/ChangeLog:

* pt.cc (deducible_expression): Strip all IMPLICIT_CONV_EXPR.
(unify): Likewise.  Handle resulting function/addr mismatch.
---
 gcc/cp/pt.cc | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index cf65b347f6c..677ed7d1289 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23031,8 +23031,7 @@ deducible_expression (tree expr)
   /* Strip implicit conversions and implicit INDIRECT_REFs.  */
   while (CONVERT_EXPR_P (expr)
 || TREE_CODE (expr) == VIEW_CONVERT_EXPR
-|| (TREE_CODE (expr) == IMPLICIT_CONV_EXPR
-&& IMPLICIT_CONV_EXPR_FORCED (expr))
+|| TREE_CODE (expr) == IMPLICIT_CONV_EXPR
 || REFERENCE_REF_P (expr))
 expr = TREE_OPERAND (expr, 0);
   return (TREE_CODE (expr) == TEMPLATE_PARM_INDEX);
@@ -24563,8 +24562,7 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
  okay.  VIEW_CONVERT_EXPR can appear with class NTTP, thanks to
  finish_id_expression_1, and are also OK.  */
   while (CONVERT_EXPR_P (parm) || TREE_CODE (parm) == VIEW_CONVERT_EXPR
-|| (TREE_CODE (parm) == IMPLICIT_CONV_EXPR
-&& IMPLICIT_CONV_EXPR_FORCED (parm)))
+|| TREE_CODE (parm) == IMPLICIT_CONV_EXPR)
 parm = TREE_OPERAND (parm, 0);
 
   if (arg == error_mark_node)
@@ -24578,6 +24576,12 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
   if (parm == any_targ_node || arg == any_targ_node)
 return unify_success (explain_p);
 
+  /* Stripping IMPLICIT_CONV_EXPR above can produce this mismatch
+ (g++.dg/abi/mangle57.C).  */
+  if (TREE_CODE (parm) == FUNCTION_DECL
+  && TREE_CODE (arg) == ADDR_EXPR)
+arg = TREE_OPERAND (arg, 0);
+
   /* If PARM uses template parameters, then we can't bail out here,
  even if ARG == PARM, since we won't record unifications for the
  template parameters.  We might need them if we're trying to
-- 
2.45.2



[PATCH] rust: avoid clobbering LIBS

2024-08-06 Thread Marc Poulhiès
Save LIBS around calls to AC_SEARCH_LIBS to avoid clobbering $LIBS.

ChangeLog:

* configure: Regenerate.
* configure.ac: Save LIBS around calls to AC_SEARCH_LIBS.

Signed-off-by: Marc Poulhiès 
Reviewed-by: Thomas Schwinge 
Tested-by: Thomas Schwinge 
---
Hello,

This has already been merged in our github repository: 
https://github.com/Rust-GCC/gccrs/pull/3121
When testing on Ubuntu 20.04, I (and Thomas, thanks for testing) get:
S["CRAB1_LIBS"]="-lpthread -ldl "
S["LIBS"]=""

So LIBS correctly stays unmodified by our calls to AC_SEARCH_LIBS.

Ok for master?

Marc

 configure| 15 ---
 configure.ac | 15 ---
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/configure b/configure
index 51bf1d1add1..e9583f2ba0c 100755
--- a/configure
+++ b/configure
@@ -8878,9 +8878,12 @@ fi
 
 # Rust requires -ldl and -lpthread if you are using an old glibc that does not 
include them by
 # default, so we check for them here
-
+# We are doing the test here and not in the gcc/configure to be able to nicely 
disable the
+# build of the Rust frontend in case a dep is missing.
 missing_rust_dynlibs=none
 
+save_LIBS="$LIBS"
+LIBS=
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing 
dlopen" >&5
 $as_echo_n "checking for library containing dlopen... " >&6; }
 if ${ac_cv_search_dlopen+:} false; then :
@@ -8993,16 +8996,14 @@ if test "$ac_res" != no; then :
 
 fi
 
+CRAB1_LIBS="$LIBS"
+LIBS="$save_LIBS"
 
-if test "$ac_cv_search_dlopen" = -ldl; then
-CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test "$ac_cv_search_dlopen" = no; then
+if test "$ac_cv_search_dlopen" = no; then
 missing_rust_dynlibs="libdl"
 fi
 
-if test "$ac_cv_search_pthread_create" = -lpthread; then
-CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test "$ac_cv_search_pthread_create" = no; then
+if test "$ac_cv_search_pthread_create" = no; then
 missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
 fi
 
diff --git a/configure.ac b/configure.ac
index 20457005e29..f61dbe64a94 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2039,21 +2039,22 @@ AC_SUBST(PICFLAG)
 
 # Rust requires -ldl and -lpthread if you are using an old glibc that does not 
include them by
 # default, so we check for them here
-
+# We are doing the test here and not in the gcc/configure to be able to nicely 
disable the
+# build of the Rust frontend in case a dep is missing.
 missing_rust_dynlibs=none
 
+save_LIBS="$LIBS"
+LIBS=
 AC_SEARCH_LIBS([dlopen], [dl])
 AC_SEARCH_LIBS([pthread_create], [pthread])
+CRAB1_LIBS="$LIBS"
+LIBS="$save_LIBS"
 
-if test "$ac_cv_search_dlopen" = -ldl; then
-CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test "$ac_cv_search_dlopen" = no; then
+if test "$ac_cv_search_dlopen" = no; then
 missing_rust_dynlibs="libdl"
 fi
 
-if test "$ac_cv_search_pthread_create" = -lpthread; then
-CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test "$ac_cv_search_pthread_create" = no; then
+if test "$ac_cv_search_pthread_create" = no; then
 missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
 fi
 
-- 
2.42.0



Re: [PATCH] c++: further concept_check_p clean-up

2024-08-06 Thread Jason Merrill

On 8/6/24 12:09 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
Patrick noticed a few more concept_check_p checks that can be removed
now.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Remove concept_check_p check.
(cxx_eval_outermost_constant_expr): Likewise.
* cp-gimplify.cc (cp_genericize_r) : Likewise.
* except.cc (check_noexcept_r): Likewise.
---
  gcc/cp/constexpr.cc   | 20 ++--
  gcc/cp/cp-gimplify.cc |  9 -
  gcc/cp/except.cc  |  2 --
  3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8d994f0ee53..b0adbb9036d 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2797,10 +2797,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  value_cat lval,
  bool *non_constant_p, bool *overflow_p)
  {
-  /* Handle concept checks separately.  */
-  if (concept_check_p (t))
-return evaluate_concept_check (t);
-
location_t loc = cp_expr_loc_or_input_loc (t);
tree fun = get_function_named_in_call (t);
constexpr_call new_call
@@ -8774,16 +8770,12 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
   || TREE_CODE (t) == AGGR_INIT_EXPR
   || TREE_CODE (t) == TARGET_EXPR))
  {
-  /* For non-concept checks, determine if it is consteval.  */
-  if (!concept_check_p (t))
-   {
- tree x = t;
- if (TREE_CODE (x) == TARGET_EXPR)
-   x = TARGET_EXPR_INITIAL (x);
- tree fndecl = cp_get_callee_fndecl_nofold (x);
- if (fndecl && DECL_IMMEDIATE_FUNCTION_P (fndecl))
-   is_consteval = true;
-   }
+  tree x = t;
+  if (TREE_CODE (x) == TARGET_EXPR)
+   x = TARGET_EXPR_INITIAL (x);
+  tree fndecl = cp_get_callee_fndecl_nofold (x);
+  if (fndecl && DECL_IMMEDIATE_FUNCTION_P (fndecl))
+   is_consteval = true;
  }
if (AGGREGATE_TYPE_P (type) || VECTOR_TYPE_P (type))
  {
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 0c589eeaaec..003e68f1ea7 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2092,15 +2092,6 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void 
*data)
break;
  
  case CALL_EXPR:

-  /* Evaluate function concept checks instead of treating them as
-normal functions.  */
-  if (concept_check_p (stmt))
-   {
- *stmt_p = evaluate_concept_check (stmt);
- * walk_subtrees = 0;
- break;
-   }
-
if (!wtd->no_sanitize_p
  && sanitize_flags_p ((SANITIZE_NULL
| SANITIZE_ALIGNMENT | SANITIZE_VPTR)))
diff --git a/gcc/cp/except.cc b/gcc/cp/except.cc
index 3c69ab69502..0231bd2507d 100644
--- a/gcc/cp/except.cc
+++ b/gcc/cp/except.cc
@@ -1074,8 +1074,6 @@ check_noexcept_r (tree *tp, int *walk_subtrees, void *)
  
   We could use TREE_NOTHROW (t) for !TREE_PUBLIC fns, though... */

tree fn = cp_get_callee (t);
-  if (concept_check_p (fn))
-   return NULL_TREE;
tree type = TREE_TYPE (fn);
gcc_assert (INDIRECT_TYPE_P (type));
type = TREE_TYPE (type);

base-commit: 180625ae72b3f733813a360fae4f0d6ce79eccdc




Re: [RFC] libstdc++: Replace Ryu with Teju Jagua for float.

2024-08-06 Thread Jonathan Wakely
On Tue, 6 Aug 2024, 17:28 Andi Kleen,  wrote:

> Cassio Neri  writes:
>
> > Implement the template function teju_jagua which finds the shortest
> > representation of a floating-point number. The floating-point type is a
> > template parameter and the implementation is generic enough to handle all
> > floating-point types of interest, namely, IEEE 754, std::bfloat16_t,
> > x86 80-bit and IBM128.
>
> So the only benefit is performance, right?


The functions should be as fast as possible.

The algorithm is also simpler though.

> So the patch
> should come with some performance numbers how it is better
> than the old code. Also how did you validate that it works
> correctly?
>

See the talks linked to in
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659362.html
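
To make "shortest representation" concrete, the following sketch uses the
standard std::to_chars interface, whose plain floating-point overload is
specified to produce the fewest characters that still round-trip.  It says
nothing about how Ryu or Teju Jagua work internally; the helper names are
illustrative, and a library with floating-point std::to_chars support
(C++17) is assumed.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: format D with the shortest round-trip output.
std::string shortest (double d)
{
  char buf[64];
  std::to_chars_result res = std::to_chars (buf, buf + sizeof buf, d);
  assert (res.ec == std::errc ());   // 64 bytes is ample for a double
  return std::string (buf, res.ptr);
}

// Hypothetical helper: parsing the shortest output gives back D exactly.
bool round_trips (double d)
{
  return std::stod (shortest (d)) == d;
}
```

Any replacement algorithm has to preserve this observable behaviour, so
output comparisons like these are one natural way to validate it next to
the performance numbers requested above.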



> -Andi
>


Re: [PATCH] c++: Improve fixits for incorrect explicit instantiations

2024-08-06 Thread Jason Merrill

On 8/6/24 5:55 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

When forgetting the '<>' on an explicit specialisation, the suggested
fixit hint suggests to add 'template <>', but naively applying will
cause nonsense results like 'template template <> struct S {};'.

Instead check if we're currently parsing an explicit instantiation, and
if so inform about the issue (an instantiation cannot have a class body)
and suggest a fixit of simply '<>' to create a specialisation instead.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_class_head): Clarify error message for
explicit instantiations.

gcc/testsuite/ChangeLog:

* g++.dg/template/explicit-instantiation9.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/parser.cc  | 19 ++-
  .../g++.dg/template/explicit-instantiation9.C |  6 ++
  2 files changed, 20 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/explicit-instantiation9.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index eb102dea829..4f2ad8201b7 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -27729,11 +27729,20 @@ cp_parser_class_head (cp_parser* parser,
   class_head_start_location,
   get_finish (type_start_token->location));
rich_location richloc (line_table, reported_loc);
-  richloc.add_fixit_insert_before (class_head_start_location,
-   "template <> ");
-  error_at (&richloc,
-   "an explicit specialization must be preceded by"
-   " %%>");
+  if (processing_explicit_instantiation)
+   {
+ richloc.add_fixit_insert_before ("<> ");
+ error_at (&richloc,
+   "an explicit instantiation cannot have a definition;"
+   " use %%> to declare a specialization");
+   }
+  else
+   {
+ richloc.add_fixit_insert_before ("template <> ");
+ error_at (&richloc,
+   "an explicit specialization must be preceded by"
+   " %%>");
+   }
invalid_explicit_specialization_p = true;
/* Take the same action that would have been taken by
 cp_parser_explicit_specialization.  */
diff --git a/gcc/testsuite/g++.dg/template/explicit-instantiation9.C 
b/gcc/testsuite/g++.dg/template/explicit-instantiation9.C
new file mode 100644
index 000..c4400226ef8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/explicit-instantiation9.C
@@ -0,0 +1,6 @@
+// Fixits for specialisations are not valid for instantiations
+
+template 
+struct S {};
+
+template struct S {};  // { dg-error "explicit instantiation cannot have a 
definition" }
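
The distinction the new diagnostic relies on can be sketched as follows.  The
template S mirrors the name used in the test, while the kind member and the
two accessor functions are added here purely for illustration.

```cpp
#include <cassert>

template <typename T>
struct S
{
  static constexpr int kind = 0;   // primary template
};

// Explicit instantiation: names an existing template and has no class body.
template struct S<int>;

// Explicit specialization: 'template <>' followed by a new definition.
template <>
struct S<long>
{
  static constexpr int kind = 1;
};

int kind_of_int ()  { return S<int>::kind; }
int kind_of_long () { return S<long>::kind; }
```

Adding a body to the explicit instantiation line is the mistake the patch
targets; inserting "<>" after "template" there turns it into a valid
specialization, which is what the revised fix-it suggests.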




Re: [PATCH] c++: permit errors inside uninstantiated templates [PR116064]

2024-08-06 Thread Jason Merrill

On 8/5/24 6:09 PM, Patrick Palka wrote:

On Mon, 5 Aug 2024, Jason Merrill wrote:


On 8/5/24 3:47 PM, Patrick Palka wrote:

On Mon, 5 Aug 2024, Jason Merrill wrote:


On 8/5/24 1:14 PM, Patrick Palka wrote:

On Mon, 5 Aug 2024, Jason Merrill wrote:


On 8/2/24 4:18 PM, Patrick Palka wrote:

On Fri, 2 Aug 2024, Patrick Palka wrote:


On Fri, 2 Aug 2024, Jason Merrill wrote:


On 8/1/24 2:52 PM, Patrick Palka wrote:

In recent versions of GCC we've been diagnosing more and more kinds of
errors inside a template ahead of time.  This is a largely good thing
because it catches bugs, typos, dead code etc sooner.

But if the template never gets instantiated then such errors are
harmless, and can be inconvenient to work around if say the code in
question is third party and in maintenence mode.  So it'd be useful to


"maintenance"


Fixed
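
As background for the discussion, here is a minimal sketch of the existing
split between errors that are checked at definition time and those deferred
to instantiation.  The function names are hypothetical and do not come from
the patch.

```cpp
#include <cassert>

// Dependent expression: whether T has no_such_member() is unknown until
// instantiation, so this uninstantiated definition is accepted as is.
template <typename T>
int call_missing (T t)
{
  return t.no_such_member ();
}

// An ordinary template that does get instantiated.
template <typename T>
T twice (T t)
{
  return t + t;
}

int use ()
{
  return twice (21);
}
```

A non-dependent mistake in a template body, such as a call to an undeclared
function, is by contrast diagnosed when the template is defined even if it is
never instantiated; that is the class of errors the patch proposes to
downgrade under -fpermissive.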




diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index d80bac822ba..0bb0a482e28 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -165,6 +165,58 @@ class cxx_format_postprocessor : public
format_postprocessor
deferred_printed_type m_type_b;
  };
  +/* A map from TEMPLATE_DECL to the location of the first
error (if
any)
+   within the template that we permissivly downgraded to a
warning.
*/


"permissively"


Fixed




+relaxed_template_errors_t *relaxed_template_errors;
+
+/* Callback function
diagnostic_context::m_adjust_diagnostic_info.
+
+   In -fpermissive mode we downgrade errors within a template
to
+   warnings, and only issue an error if we later need to
instantiate
+   the template.  */
+
+static void
+cp_adjust_diagnostic_info (diagnostic_context *context,
+  diagnostic_info *diagnostic)
+{
+  tree ti;
+  if (diagnostic->kind == DK_ERROR
+  && context->m_permissive
+  && !current_instantiation ()
+  && in_template_context
+  && (ti = get_template_info (current_scope (
+{
+  if (!relaxed_template_errors)
+   relaxed_template_errors = new
relaxed_template_errors_t;
+
+  tree tmpl = TI_TEMPLATE (ti);
+  if (!relaxed_template_errors->get (tmpl))
+   relaxed_template_errors->put (tmpl,
diagnostic->richloc->get_loc ());
+  diagnostic->kind = DK_WARNING;


Rather than check m_permissive directly and downgrade to DK_WARNING, how
about downgrading to DK_PERMERROR?  That way people will get the
[-fpermissive] clue.

...though I suppose DK_PERMERROR doesn't work where you call this hook in
report_diagnostic, at which point we've already reassigned it into
DK_WARNING or DK_ERROR in diagnostic_impl.

But we could still set diagnostic->option_index even for DK_ERROR, whether
to context->m_opt_permissive or to its own warning flag, perhaps
-Wno-template-body?

Fixed by adding an enabled-by-default -Wtemplate-body flag and setting
option_index to it for each downgraded error.  Thus -fpermissive
-Wno-template-body would suppress the downgraded warnings entirely, and
only issue a generic error upon instantiation of the erroneous template.


... or did you have in mind to set option_index even when not using
-fpermissive so that eligible non-downgraded errors get the
[-fpermissive] or [-Wtemplate-body] hint as well?


Yes.


IMHO I'm not sure that'd be worth the extra noise since the vast
majority of users appreciate and expect errors to get diagnosed inside
templates.


But people trying to build legacy code should appreciate the pointer for
how to make it compile, as with other permerrors.


And on second thought I'm not sure what extra value a new warning flag
adds either.  I can't think of a good reason why one would use
-fpermissive -Wno-template-body?


One would use -Wno-template-body (or -Wno-error=template-body) without
-fpermissive, like with the various permerror_opt cases.


Since compiling legacy/unmaintained code is the only plausible use case,
why have a dedicated warning flag instead of just recommending
-fpermissive when compiling legacy code?  I don't quite understand the
motivation for adding a new permerror_opt flag for this class of errors.


It seems to me an interesting class of errors, but I don't mind leaving it
under just -fpermissive if you prefer.


-Wnarrowing is an existing permerror_opt flag, but I can imagine it's
useful to pass -Wno-error=narrowing etc when incrementally migrating
C / C++98 code to modern C++ where you don't want any conformance errors
allowed by -fpermissive to sneak in.  So being able to narrowly control
this class of errors seems useful, so a dedicated flag makes sense.

But there's no parallel for -Wtemplate-body here, since by assumption
the code base is unmaintained / immutable.  Otherwise the more proper
fix would be to just fix and/or delete the uninstantiated erroneous
template.  If say you're #including a legacy header that has such
errors, then doing #pragma GCC diagnostic "-fpermissive -w" around
the #include should be totally fine too.


I just realized #pragma GCC diagnostic warning "-fpermissive" etc
doesn't actually work since -f

[Committed 1/3] RISC-V: Fix comment typos

2024-08-06 Thread Patrick O'Neill



On 8/5/24 20:19, Jeff Law wrote:



On 8/5/24 4:29 PM, Patrick O'Neill wrote:
This fixes most of the typos I found when reading various parts of the
RISC-V backend.

Comment typos are always OK to fix under the "obvious" rule.  No need to
wait for an ACK.

Jeff


Committed.

Patrick


