Re: [PATCH] cse: Fix up record_jump_equiv checks [PR117095]

2024-12-13 Thread Jeff Law




On 12/13/24 8:20 AM, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled on s390x-linux with -O2 -march=z15.
The problem happens during cse2, which sees in an extended basic block
(jump_insn 217 78 216 10 (parallel [
 (set (pc)
 (if_then_else (ne (reg:SI 165)
 (const_int 1 [0x1]))
 (label_ref 216)
 (pc)))
 (set (reg:SI 165)
 (plus:SI (reg:SI 165)
 (const_int -1 [0xffffffffffffffff])))
 (clobber (scratch:SI))
 (clobber (reg:CC 33 %cc))
 ]) "t.c":14:17 discrim 1 2192 {doloop_si64}
  (int_list:REG_BR_PROB 955630228 (nil))
  -> 216)
...
(insn 99 98 100 12 (set (reg:SI 138)
 (const_int 1 [0x1])) "t.c":9:31 1507 {*movsi_zarch}
  (nil))
(insn 100 99 103 12 (parallel [
 (set (reg:SI 137)
 (minus:SI (reg:SI 138)
 (subreg:SI (reg:HI 135 [ a ]) 0)))
 (clobber (reg:CC 33 %cc))
 ]) "t.c":9:31 1904 {*subsi3}
  (expr_list:REG_DEAD (reg:SI 138)
 (expr_list:REG_DEAD (reg:HI 135 [ a ])
 (expr_list:REG_UNUSED (reg:CC 33 %cc)
 (nil)
I don't really see the connection between (reg 165) and (reg 138), but I 
don't think it matters enough to dive into.





This optimization isn't correct here though, because the JUMP_INSN has
multiple sets.  Before r0-77890, record_jump_equiv was called from
cse_insn guarded on n_sets == 1 && any_condjump_p (insn), so it wouldn't
be done on the above JUMP_INSN where n_sets == 2.  But since that change
it has been guarded with single_set (insn) && any_condjump_p (insn), and
that is true because of the REG_UNUSED note.  Looking at that note is
inappropriate in CSE though, because the whole intent of the pass is to
extend the lifetimes of the pseudos if an equivalence is found, so the fact
that there is a REG_UNUSED note for (reg:SI 165) and that the reg isn't used
later doesn't imply that it won't be used after the optimization.

Exactly.  CSE inherently trashes the meaning of REG_UNUSED notes.




The patch below adds a !multiple_sets (insn) check instead of replacing the
single_set (insn) check with it, because apparently any_condjump_p uses
pc_set, which supports the following cases:
- PATTERN is a SET to PC (that is single_set (insn) &&
  !multiple_sets (insn));
- PATTERN is a PARALLEL with a single SET to PC and some CLOBBERs
  (likewise);
- PATTERN is a PARALLEL with two or more SETs where the first one is a SET
  to PC (that could be single_set (insn) with REG_UNUSED notes, but is not
  !multiple_sets (insn));
- PATTERN is UNSPEC/UNSPEC_VOLATILE with a SET inside of it.
For the last case !multiple_sets (insn) will be true, but IMHO we shouldn't
try to derive anything from those because we haven't checked the rest of
the UNSPEC* and we don't really know what it does.

Right.
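
To make the difference between the two guards concrete, here is a minimal
sketch in C++.  It is a toy model, not GCC code: ToySet, ToyInsn and the
toy_* functions are hypothetical stand-ins for the RTL predicates discussed
above, and the note-sensitive check is deliberately simplified.

```cpp
#include <vector>

// Simplified stand-in for one SET inside an insn pattern (not GCC's rtx).
struct ToySet {
  bool sets_pc;      // destination is (pc)
  bool dest_unused;  // a REG_UNUSED note exists for the destination
};

// Toy insn: only the list of SETs is modelled; CLOBBERs are ignored.
struct ToyInsn {
  std::vector<ToySet> sets;
};

// Counts SETs and ignores notes, in the spirit of multiple_sets.
bool toy_multiple_sets(const ToyInsn &insn) {
  return insn.sets.size() > 1;
}

// Note-sensitive check, in the spirit of single_set: SETs whose
// destination is marked unused are skipped when counting.
bool toy_single_set(const ToyInsn &insn) {
  if (insn.sets.size() == 1)
    return true;
  int live = 0;
  for (const ToySet &s : insn.sets)
    if (!s.dest_unused)
      ++live;
  return live == 1;
}

// Like pc_set, only the first SET is inspected.
bool toy_any_condjump(const ToyInsn &insn) {
  return !insn.sets.empty() && insn.sets[0].sets_pc;
}

// Guard before the patch: fooled by the REG_UNUSED note.
bool old_guard(const ToyInsn &insn) {
  return toy_single_set(insn) && toy_any_condjump(insn);
}

// Guard after the patch: additionally requires a single SET.
bool new_guard(const ToyInsn &insn) {
  return old_guard(insn) && !toy_multiple_sets(insn);
}
```

For an insn shaped like the doloop_si64 jump (a SET to pc plus a decrement
whose destination carries REG_UNUSED), old_guard accepts it and new_guard
rejects it, while a plain conditional jump passes both.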



Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux, ok
for trunk?

2024-12-13  Jakub Jelinek  

PR rtl-optimization/117095
* cse.cc (cse_extended_basic_block): Don't call record_jump_equiv
if multiple_sets (insn).

* gcc.c-torture/execute/pr117095.c: New test.

OK

jeff



Re: [PATCH v2] RISC-V: Increase cost for vec_construct [PR118019].

2024-12-13 Thread 钟居哲
OK.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-12-13 23:20
To: gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com
Subject: [PATCH v2] RISC-V: Increase cost for vec_construct [PR118019].
Hi,
 
for a generic vec_construct from scalar elements we need to load each
scalar element and move it over to a vector register.
Right now we only use a cost of 1 per element.
 
This patch instead adds the register-move cost to the scalar_to_vec cost and
multiplies the sum by the number of elements in the vector.
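
As a rough illustration of the arithmetic, here is a small C++ sketch.
ToyCosts and both functions are hypothetical; the field names only echo the
ones in the patch, and any numbers plugged in are invented, not taken from a
real tuning table.

```cpp
// Hypothetical cost table; values are supplied by the caller.
struct ToyCosts {
  int gr2vr;          // GPR -> vector register move
  int fr2vr;          // FPR -> vector register move
  int scalar_to_vec;  // scalar-to-vector statement cost
};

// Previous behaviour: a flat cost of 1 per element.
int old_vec_construct_cost(int nelts) {
  return nelts;
}

// New behaviour: per-element register move plus scalar_to_vec.
int new_vec_construct_cost(const ToyCosts &c, bool fp, int nelts) {
  const int regmove = fp ? c.fr2vr : c.gr2vr;
  return (regmove + c.scalar_to_vec) * nelts;
}
```

With an invented table of gr2vr = 2 and scalar_to_vec = 1, a 16-element
integer vector goes from a cost of 16 to (2 + 1) * 16 = 48, which makes
element-wise construction look correspondingly less attractive to the
vectorizer.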
 
Regtested on rv64gcv_zvl512b.
 
Changes from V1:
- Added a test case.
 
Regards
Robin
 
 
PR target/118019
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_builtin_vectorization_cost):
Increase vec_construct cost.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr118019.c: New test.
---
gcc/config/riscv/riscv.cc |  8 ++-
.../gcc.target/riscv/rvv/autovec/pr118019.c   | 52 +++
2 files changed, 59 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index be2ebf9d9c0..aa8a4562d9a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12263,7 +12263,13 @@ riscv_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
   return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 case vec_construct:
-  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
+ {
+   /* TODO: This is too pessimistic in case we can splat.  */
+   int regmove_cost = fp ? costs->regmove->FR2VR
+ : costs->regmove->GR2VR;
+   return (regmove_cost + common_costs->scalar_to_vec_cost)
+ * estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
+ }
 default:
   gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c
new file mode 100644
index 00000000000..b1431d123bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl512b -mstrict-align -mvector-strict-align" } */
+
+/* Make sure we do not construct the vector element-wise despite
+   slow misaligned scalar and vector accesses.  */
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+
+#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3)  \
+  {                                                \
+    int t0 = s0 + s1;                              \
+    int t1 = s0 - s1;                              \
+    int t2 = s2 + s3;                              \
+    int t3 = s2 - s3;                              \
+    d0 = t0 + t2;                                  \
+    d2 = t0 - t2;                                  \
+    d1 = t1 + t3;                                  \
+    d3 = t1 - t3;                                  \
+  }
+
+uint32_t
+abs2 (uint32_t a)
+{
+  uint32_t s = ((a >> 15) & 0x10001) * 0xffff;
+  return (a + s) ^ s;
+}
+
+int
+x264_pixel_satd_8x4 (uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2)
+{
+  uint32_t tmp[4][4];
+  uint32_t a0, a1, a2, a3;
+  int sum = 0;
+  for (int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2)
+{
+  a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
+  a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
+  a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
+  a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
+  HADAMARD4 (tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0, a1, a2, a3);
+}
+  for (int i = 0; i < 4; i++)
+{
+  HADAMARD4 (a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
+  sum += abs2 (a0) + abs2 (a1) + abs2 (a2) + abs2 (a3);
+}
+  return (((uint16_t) sum) + ((uint32_t) sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-assembler-not "lbu" } } */
-- 
2.47.1
 
 


[COMMITTED 15/20] ada: Fix code indentation

2024-12-13 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix uncontroversial coding style violations detected by an experiment with
tree-sitter indentation support in Emacs.

gcc/ada/ChangeLog:

* atree.adb, diagnostics-pretty_emitter.adb,
diagnostics-utils.adb, einfo-utils.adb, errout.adb, exp_aggr.adb,
exp_ch3.adb, exp_ch5.adb, exp_ch6.adb, exp_ch7.adb, exp_imgv.adb,
exp_pakd.adb, exp_prag.adb, exp_unst.adb, exp_util.adb, gnatchop.adb,
gnatlink.adb, inline.adb, itypes.adb, osint.adb, rtsfind.adb,
sem_aggr.adb, sem_ch10.adb, sem_ch12.adb, sem_ch13.adb, sem_ch3.adb,
sem_ch4.adb, sem_dim.adb, sem_elab.adb, sem_prag.adb, sem_util.adb,
sprint.adb, switch-m.adb, table.adb: Fix code indentation.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/atree.adb  |  2 +-
 gcc/ada/diagnostics-pretty_emitter.adb | 30 +-
 gcc/ada/diagnostics-utils.adb  |  8 ++---
 gcc/ada/einfo-utils.adb| 10 +++---
 gcc/ada/errout.adb |  8 ++---
 gcc/ada/exp_aggr.adb   | 26 
 gcc/ada/exp_ch3.adb| 42 -
 gcc/ada/exp_ch5.adb|  4 +--
 gcc/ada/exp_ch6.adb| 14 -
 gcc/ada/exp_ch7.adb|  7 +++--
 gcc/ada/exp_imgv.adb   |  2 +-
 gcc/ada/exp_pakd.adb   |  8 ++---
 gcc/ada/exp_prag.adb   |  8 ++---
 gcc/ada/exp_unst.adb   | 16 +-
 gcc/ada/exp_util.adb   | 42 -
 gcc/ada/gnatchop.adb   |  2 +-
 gcc/ada/gnatlink.adb   | 16 +-
 gcc/ada/inline.adb | 10 +++---
 gcc/ada/itypes.adb |  6 ++--
 gcc/ada/osint.adb  |  2 +-
 gcc/ada/rtsfind.adb|  2 +-
 gcc/ada/sem_aggr.adb   | 14 -
 gcc/ada/sem_ch10.adb   | 36 ++---
 gcc/ada/sem_ch12.adb   | 25 ---
 gcc/ada/sem_ch13.adb   | 26 
 gcc/ada/sem_ch3.adb| 10 +++---
 gcc/ada/sem_ch4.adb| 20 ++--
 gcc/ada/sem_dim.adb|  8 ++---
 gcc/ada/sem_elab.adb   | 20 ++--
 gcc/ada/sem_prag.adb   | 16 +-
 gcc/ada/sem_util.adb   | 43 +-
 gcc/ada/sprint.adb |  4 +--
 gcc/ada/switch-m.adb   | 16 +-
 gcc/ada/table.adb  |  4 +--
 34 files changed, 254 insertions(+), 253 deletions(-)

diff --git a/gcc/ada/atree.adb b/gcc/ada/atree.adb
index 416097bb272..8cc22394b0c 100644
--- a/gcc/ada/atree.adb
+++ b/gcc/ada/atree.adb
@@ -2444,7 +2444,7 @@ package body Atree is
---
 
function Internal_Traverse_With_Parent
-  (Node : Node_Id) return Traverse_Final_Result
+ (Node : Node_Id) return Traverse_Final_Result
is
   Tail_Recursion_Counter : Natural := 0;
 
diff --git a/gcc/ada/diagnostics-pretty_emitter.adb b/gcc/ada/diagnostics-pretty_emitter.adb
index df27a5c6fde..e376ae12803 100644
--- a/gcc/ada/diagnostics-pretty_emitter.adb
+++ b/gcc/ada/diagnostics-pretty_emitter.adb
@@ -163,14 +163,14 @@ package body Diagnostics.Pretty_Emitter is
  (Intersecting_Labels : Labeled_Span_List);
 
function Get_Line_End
-  (Buf : Source_Buffer_Ptr;
-   Loc : Source_Ptr) return Source_Ptr;
+ (Buf : Source_Buffer_Ptr;
+  Loc : Source_Ptr) return Source_Ptr;
--  Get the source location for the end of the line (LF) in Buf for Loc. If
--  Loc is past the end of Buf already, return Buf'Last.
 
function Get_Line_Start
-  (Buf : Source_Buffer_Ptr;
-   Loc : Source_Ptr) return Source_Ptr;
+ (Buf : Source_Buffer_Ptr;
+  Loc : Source_Ptr) return Source_Ptr;
--  Get the source location for the start of the line in Buf for Loc
 
function Get_First_Line_Char
@@ -187,22 +187,22 @@ package body Diagnostics.Pretty_Emitter is
--  Width digits.
 
procedure Write_Buffer
-  (Buf   : Source_Buffer_Ptr;
-   First : Source_Ptr;
-   Last  : Source_Ptr);
+ (Buf   : Source_Buffer_Ptr;
+  First : Source_Ptr;
+  Last  : Source_Ptr);
--  Output the characters from First to Last position in Buf, using
--  Write_Buffer_Char.
 
procedure Write_Buffer_Char
-  (Buf : Source_Buffer_Ptr;
-   Loc : Source_Ptr);
+ (Buf : Source_Buffer_Ptr;
+  Loc : Source_Ptr);
--  Output the characters at position Loc in Buf, translating ASCII.HT
--  in a suitable number of spaces so that the output is not modified
--  by starting in a different column that 1.
 
procedure Write_Line_Marker
-  (Num   : Pos;
-   Width : Positive);
+ (Num   : Pos;
+  Width : Positive);
 
procedure Write_

[COMMITTED 02/20] ada: Remove implicit assumption in the double case

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

The assumption is fulfilled in all the instantiations of the package, but
it should not be made in the generic code.

gcc/ada/ChangeLog:

* libgnat/s-imager.adb (Set_Image_Real): In the case where a double
integer is needed, do not implicitly assume that it can contain up to
'Digits of the floating-point type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-imager.adb | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/libgnat/s-imager.adb b/gcc/ada/libgnat/s-imager.adb
index 89f9c1b020a..f30478843a8 100644
--- a/gcc/ada/libgnat/s-imager.adb
+++ b/gcc/ada/libgnat/s-imager.adb
@@ -432,30 +432,39 @@ package body System.Image_R is
 
  --  Otherwise, do the conversion in two steps
 
- else pragma Assert (X <= 10.0 ** Num'Digits * Num (Uns'Last));
+ else
 declare
-   Y : constant Uns := To_Unsigned (X / Powten (Num'Digits));
+   Halfdigs : constant Natural := Maxdigs / 2;
 
-   Buf : String (1 .. Num'Digits);
+   Buf : String (1 .. Halfdigs);
Len : Natural;
+   Y   : Uns;
 
 begin
+   --  Compute upper Halfdigs stripped from leading zeros
+
+   Y := To_Unsigned (X / Powten (Halfdigs));
Set_Image_Unsigned (Y, Digs, Ndigs);
 
-   X := X - From_Unsigned (Y) * Powten (Num'Digits);
+   --  Compute lower Halfdigs stripped from leading zeros
 
Len := 0;
+   X := X - From_Unsigned (Y) * Powten (Halfdigs);
Set_Image_Unsigned (To_Unsigned (X), Buf, Len);
+   pragma Assert (Len <= Halfdigs);
+
+   --  Concatenate unmodified upper part with zero-padded
+   --  lower part up to Halfdigs.
 
-   for J in 1 .. Num'Digits - Len loop
+   for J in 1 .. Halfdigs - Len loop
   Digs (Ndigs + J) := '0';
end loop;
 
for J in 1 .. Len loop
-  Digs (Ndigs + Num'Digits - Len + J) := Buf (J);
+  Digs (Ndigs + Halfdigs - Len + J) := Buf (J);
end loop;
 
-   Ndigs := Ndigs + Num'Digits;
+   Ndigs := Ndigs + Halfdigs;
 end;
  end if;
   end if;
-- 
2.43.0
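
The two-step conversion in the hunk above can be sketched in C++ with plain
integers.  This is only an illustration of the split-and-pad idea, assuming
halfdigs >= 1 and a value that fits in 64 bits; pow10u and image_two_halves
are hypothetical names, not part of the GNAT runtime.

```cpp
#include <string>

// 10**n for small n (hypothetical helper standing in for Powten).
unsigned long long pow10u(unsigned n) {
  unsigned long long p = 1;
  while (n-- > 0)
    p *= 10;
  return p;
}

// Render x as decimal digits in two steps: the upper part without
// leading zeros, then the lower part zero-padded to halfdigs digits.
// Assumes halfdigs >= 1.
std::string image_two_halves(unsigned long long x, unsigned halfdigs) {
  const unsigned long long scale = pow10u(halfdigs);
  const std::string upper = std::to_string(x / scale);
  const std::string lower = std::to_string(x % scale);
  // x % scale < 10**halfdigs, so lower never exceeds halfdigs digits;
  // this mirrors the Len <= Halfdigs assertion in the patch.
  const std::string padding(halfdigs - lower.size(), '0');
  return upper + padding + lower;
}
```

For instance, image_two_halves (1234500067, 5) splits into an upper part of
12345 and a lower part of 67, and the padding restores the three zeros that
the lower conversion dropped.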



[COMMITTED 14/20] ada: Fix fixed point text-io when subtype has dynamic range

2024-12-13 Thread Marc Poulhiès
When the fixed point subtype has dynamic range, for example in the
context of a generic procedure Test where Fixed_Type is a type formal:

  procedure Test (Low, High : Fixed_Type) is
type New_Subtype is new Fixed_Type range Low .. High;
package New_Io is new Text_IO.Fixed_IO (New_Subtype);

the compiler would complain with:
 non-static universal integer value out of range

Have the check use the Base type for checking what integer type can be
used. If a given integer type can be used for a base type, it can
also be used for any of its subtypes.

gcc/ada/ChangeLog:

* libgnat/a-tifiio.adb (OK_Get_32): Use 'Base.
(OK_Put_32, OK_Get_64, OK_Put_64): Likewise.
* libgnat/a-tifiio__128.adb (OK_Get_32, OK_Put_32, OK_Get_64)
(OK_Put_64, OK_Get_128, OK_Put_128): Likewise.
* libgnat/a-wtfiio.adb (OK_Get_32): Likewise.
(OK_Put_32, OK_Get_64, OK_Put_64): Likewise.
* libgnat/a-wtfiio__128.adb (OK_Get_32, OK_Put_32, OK_Get_64)
(OK_Put_64, OK_Get_128, OK_Put_128): Likewise.
* libgnat/a-ztfiio.adb (OK_Get_32): Likewise.
(OK_Put_32, OK_Get_64, OK_Put_64): Likewise.
* libgnat/a-ztfiio__128.adb (OK_Get_32, OK_Put_32, OK_Get_64)
(OK_Put_64, OK_Get_128, OK_Put_128): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-tifiio.adb  | 48 -
 gcc/ada/libgnat/a-tifiio__128.adb | 72 ++-
 gcc/ada/libgnat/a-wtfiio.adb  | 48 -
 gcc/ada/libgnat/a-wtfiio__128.adb | 72 ++-
 gcc/ada/libgnat/a-ztfiio.adb  | 48 -
 gcc/ada/libgnat/a-ztfiio__128.adb | 72 ++-
 6 files changed, 210 insertions(+), 150 deletions(-)

diff --git a/gcc/ada/libgnat/a-tifiio.adb b/gcc/ada/libgnat/a-tifiio.adb
index 7358d123313..e4642185b00 100644
--- a/gcc/ada/libgnat/a-tifiio.adb
+++ b/gcc/ada/libgnat/a-tifiio.adb
@@ -191,51 +191,59 @@ package body Ada.Text_IO.Fixed_IO with SPARK_Mode => Off is
OK_Get_32 : constant Boolean :=
  Num'Base'Object_Size <= 32
and then
- ((Num'Small_Numerator = 1 and then Num'Small_Denominator <= 2**31)
+ ((Num'Base'Small_Numerator = 1
+and then Num'Base'Small_Denominator <= 2**31)
or else
-  (Num'Small_Denominator = 1 and then Num'Small_Numerator <= 2**31)
+  (Num'Base'Small_Denominator = 1
+and then Num'Base'Small_Numerator <= 2**31)
or else
-  (Num'Small_Numerator <= 2**27
-and then Num'Small_Denominator <= 2**27));
+  (Num'Base'Small_Numerator <= 2**27
+and then Num'Base'Small_Denominator <= 2**27));
--  These conditions are derived from the prerequisites of System.Value_F
 
OK_Put_32 : constant Boolean :=
  Num'Base'Object_Size <= 32
and then
- ((Num'Small_Numerator = 1 and then Num'Small_Denominator <= 2**31)
+ ((Num'Base'Small_Numerator = 1
+and then Num'Base'Small_Denominator <= 2**31)
or else
-  (Num'Small_Denominator = 1 and then Num'Small_Numerator <= 2**31)
+  (Num'Base'Small_Denominator = 1
+and then Num'Base'Small_Numerator <= 2**31)
or else
-  (Num'Small_Numerator < Num'Small_Denominator
-and then Num'Small_Denominator <= 2**27)
+  (Num'Base'Small_Numerator < Num'Base'Small_Denominator
+and then Num'Base'Small_Denominator <= 2**27)
or else
-  (Num'Small_Denominator < Num'Small_Numerator
-and then Num'Small_Numerator <= 2**25));
+  (Num'Base'Small_Denominator < Num'Base'Small_Numerator
+and then Num'Base'Small_Numerator <= 2**25));
--  These conditions are derived from the prerequisites of System.Image_F
 
OK_Get_64 : constant Boolean :=
  Num'Base'Object_Size <= 64
and then
- ((Num'Small_Numerator = 1 and then Num'Small_Denominator <= 2**63)
+ ((Num'Base'Small_Numerator = 1
+and then Num'Base'Small_Denominator <= 2**63)
or else
-  (Num'Small_Denominator = 1 and then Num'Small_Numerator <= 2**63)
+  (Num'Base'Small_Denominator = 1
+and then Num'Base'Small_Numerator <= 2**63)
or else
-  (Num'Small_Numerator <= 2**59
-and then Num'Small_Denominator <= 2**59));
+  (Num'Base'Small_Numerator <= 2**59
+and then Num'Base'Small_Denominator <= 2**59));
--  These conditions are derived from the prerequisites of System.Value_F
 
OK_Put_64 : constant Boolean :=
  Num'Base'Object_Size <= 64
and then
- ((Num'Small_Numerator = 1 and then Num'Small_Denominator <= 2**63)
+ ((Num'Base'Small_Numerator = 1
+and then Num'Base'Small_Denominator <= 2**63)
or else
-  (Num'Small_Denominator = 1 and then Num'Small_Numerator <= 2**63)
+   

[COMMITTED 10/20] ada: Elide copy for calls in allocators for nonlimited by-reference types

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

This prevents a temporary from being created on the primary stack to hold
the result of the function calls before it is copied to the newly allocated
memory in the nonlimited by-reference case.

That's already not done in the nonlimited non-by-reference case and there is
no reason to do it in the former case either.  The main issue is the call to
Remove_Side_Effects in Expand_Allocator_Expression, but its only purpose is
to cover the problematic processing done in Build_Allocate_Deallocate_Proc
on (part of) the expression; once this is fixed, the call is unnecessary.

The change also contains another small fix to deal with the corner case of
allocators for access-to-access types.

gcc/ada/ChangeLog:

* exp_ch4.adb (Expand_Allocator_Expression): Do not preventively
call Remove_Side_Effects on the expression in the nonlimited
by-reference case.  Always call Build_Allocate_Deallocate_Proc
in the default case.
* exp_ch6.adb (Expand_Ctrl_Function_Call): Bail out if the call
is the qualified expression of an allocator.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Replace all the
calls to Relocate_Node by calls to Duplicate_Subexpr_No_Checks.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb  | 44 ++--
 gcc/ada/exp_ch6.adb  | 10 ++
 gcc/ada/exp_util.adb | 18 --
 3 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 8c1faf415e1..6d8aa0e6eeb 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -820,19 +820,6 @@ package body Exp_Ch4 is
  --  We analyze by hand the new internal allocator to avoid any
  --  recursion and inappropriate call to Initialize.
 
- --  We don't want to remove side effects when the expression must be
- --  built in place and we don't need it when there is no storage pool
- --  or this is a return/secondary stack allocation.
-
- if not Aggr_In_Place
-   and then not Delayed_Cond_Expr
-   and then Present (Storage_Pool (N))
-   and then not Is_RTE (Storage_Pool (N), RE_RS_Pool)
-   and then not Is_RTE (Storage_Pool (N), RE_SS_Pool)
- then
-Remove_Side_Effects (Exp);
- end if;
-
  Temp := Make_Temporary (Loc, 'P', N);
 
  --  For a class wide allocation generate the following code:
@@ -1079,6 +1066,8 @@ package body Exp_Ch4 is
 Displace_Allocator_Pointer (N);
  end if;
 
+  --  Case of aggregate built in place
+
   elsif Aggr_In_Place then
  Temp := Make_Temporary (Loc, 'P', N);
  Build_Aggregate_In_Place (Temp, PtrT);
@@ -1099,26 +1088,29 @@ package body Exp_Ch4 is
  Analyze_And_Resolve (N, PtrT);
  Apply_Predicate_Check (N, T, Deref => True);
 
-  elsif Is_Access_Type (T) and then Can_Never_Be_Null (T) then
- Install_Null_Excluding_Check (Exp);
+  --  Default case
 
-  elsif Is_Access_Type (DesigT)
-and then Nkind (Exp) = N_Allocator
-and then Nkind (Expression (Exp)) /= N_Qualified_Expression
-  then
- --  Apply constraint to designated subtype indication
+  else
+ if Is_Access_Type (T) and then Can_Never_Be_Null (T) then
+Install_Null_Excluding_Check (Exp);
+ end if;
 
- Apply_Constraint_Check
-   (Expression (Exp), Designated_Type (DesigT), No_Sliding => True);
+ if Is_Access_Type (DesigT)
+   and then Nkind (Exp) = N_Allocator
+   and then Nkind (Expression (Exp)) /= N_Qualified_Expression
+ then
+--  Apply constraint to designated subtype indication
 
- if Nkind (Expression (Exp)) = N_Raise_Constraint_Error then
+Apply_Constraint_Check
+  (Expression (Exp), Designated_Type (DesigT), No_Sliding => True);
 
---  Propagate constraint_error to enclosing allocator
+--  Propagate Constraint_Error to enclosing allocator
 
-Rewrite (Exp, New_Copy (Expression (Exp)));
+if Nkind (Expression (Exp)) = N_Raise_Constraint_Error then
+   Rewrite (Exp, New_Copy (Expression (Exp)));
+end if;
  end if;
 
-  else
  Build_Allocate_Deallocate_Proc (N);
 
  --  For an access-to-unconstrained-packed-array type, build an
diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index 945f44630d1..751c5f4b5cd 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -5392,6 +5392,16 @@ package body Exp_Ch6 is
  return;
   end if;
 
+  --  The same optimization: if the returned value is used to initialize a
+  --  dynamically allocated object, then no need to copy/readjust/finalize,
+  --  we can initialize it in place.
+
+  if Nkind (Par) = N_Qualified_Expression
+and then Nkind (Par

[COMMITTED 16/20] ada: Fix indentation in record component declarations

2024-12-13 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup.

gcc/ada/ChangeLog:

* exp_aggr.adb (Case_Bounds): Fix indentation.
* sem_case.adb (Choice_Bounds): Likewise.
* libgnat/s-dourea.ads (Double_T): Likewise.
* libgnat/s-excmac__arm.ads (Cleanup_Cache_Type): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb  | 6 +++---
 gcc/ada/libgnat/s-dourea.ads  | 2 +-
 gcc/ada/libgnat/s-excmac__arm.ads | 2 +-
 gcc/ada/sem_case.adb  | 6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 092e67c8a81..9aabd58b2a9 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -79,9 +79,9 @@ with Warnsw; use Warnsw;
 package body Exp_Aggr is
 
type Case_Bounds is record
- Choice_Lo   : Node_Id;
- Choice_Hi   : Node_Id;
- Choice_Node : Node_Id;
+  Choice_Lo   : Node_Id;
+  Choice_Hi   : Node_Id;
+  Choice_Node : Node_Id;
end record;
 
type Case_Table_Type is array (Nat range <>) of Case_Bounds;
diff --git a/gcc/ada/libgnat/s-dourea.ads b/gcc/ada/libgnat/s-dourea.ads
index 5112228f687..61f974c7eef 100644
--- a/gcc/ada/libgnat/s-dourea.ads
+++ b/gcc/ada/libgnat/s-dourea.ads
@@ -43,7 +43,7 @@ package System.Double_Real is
pragma Pure;
 
type Double_T is record
- Hi, Lo : Num;
+  Hi, Lo : Num;
end record;
 
function To_Double (N : Num) return Double_T is ((Hi => N, Lo => 0.0));
diff --git a/gcc/ada/libgnat/s-excmac__arm.ads b/gcc/ada/libgnat/s-excmac__arm.ads
index 463191d6b42..d6792d3ca6f 100644
--- a/gcc/ada/libgnat/s-excmac__arm.ads
+++ b/gcc/ada/libgnat/s-excmac__arm.ads
@@ -114,7 +114,7 @@ package System.Exceptions.Machine is
end record;
 
type Cleanup_Cache_Type is record
- Bitpattern : uint32_t_array (0 .. 3);
+  Bitpattern : uint32_t_array (0 .. 3);
end record;
 
type Pr_Cache_Type is record
diff --git a/gcc/ada/sem_case.adb b/gcc/ada/sem_case.adb
index 9d197870414..b85afede98b 100644
--- a/gcc/ada/sem_case.adb
+++ b/gcc/ada/sem_case.adb
@@ -58,9 +58,9 @@ with GNAT.Sets;
 package body Sem_Case is
 
type Choice_Bounds is record
- Lo   : Node_Id;
- Hi   : Node_Id;
- Node : Node_Id;
+  Lo   : Node_Id;
+  Hi   : Node_Id;
+  Node : Node_Id;
end record;
--  Represent one choice bounds entry with Lo and Hi values, Node points
--  to the choice node itself.
-- 
2.43.0



[COMMITTED 07/20] ada: Fix typo in reference manual

2024-12-13 Thread Marc Poulhiès
From: Ronan Desplanques 

gcc/ada/ChangeLog:

* doc/gnat_rm/gnat_language_extensions.rst: Fix typo.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/doc/gnat_rm/gnat_language_extensions.rst | 2 +-
 gcc/ada/gnat_rm.texi | 2 +-
 gcc/ada/gnat_ugn.texi| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/gnat_language_extensions.rst b/gcc/ada/doc/gnat_rm/gnat_language_extensions.rst
index 32fa6fb8e8b..4e7f9fae602 100644
--- a/gcc/ada/doc/gnat_rm/gnat_language_extensions.rst
+++ b/gcc/ada/doc/gnat_rm/gnat_language_extensions.rst
@@ -1516,7 +1516,7 @@ No_Raise aspect
 
 The ``No_Raise`` aspect can be applied to a subprogram to declare that this subprogram is not
 expected to raise any exceptions. Should an exception still occur during the execution of
-this subpropgram, ``Program_Error`` is raised.
+this subprogram, ``Program_Error`` is raised.
 
 New specification for ``Ada.Finalization.Controlled``
 ^
diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
index e2e2c310524..adced897ad5 100644
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -30742,7 +30742,7 @@ heap-allocated objects
 
 The @code{No_Raise} aspect can be applied to a subprogram to declare that this subprogram is not
 expected to raise any exceptions. Should an exception still occur during the execution of
-this subpropgram, @code{Program_Error} is raised.
+this subprogram, @code{Program_Error} is raised.
 
 @menu
 * New specification for Ada.Finalization.Controlled: New specification for Ada Finalization Controlled. 
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index d6c87ef5098..662fe1c1642 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -29839,8 +29839,8 @@ to permit their use in free software.
 
 @printindex ge
 
-@anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
 @anchor{d2}@w{  }
+@anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
 
 @c %**end of body
 @bye
-- 
2.43.0



[COMMITTED 17/20] ada: Improve expansion of nested conditional expressions in return statements

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

This arranges for nested conditional expressions in simple return statements
to have their expansion delayed until the returns are distributed into their
dependent expressions.  This comprises the case of the elsif part of an if
expression present in the source code.

This also distributes qualified expressions into the dependent expressions
of conditional expressions, although this seems to occur rarely in practice.

gcc/ada/ChangeLog:

* exp_aggr.ads (Is_Delayed_Conditional_Expression): Move to...
* exp_aggr.adb (Is_Delayed_Conditional_Expression): Move to...
(Convert_To_Assignments): Use Delay_Conditional_Expressions_Between.
* exp_ch3.adb (Expand_N_Object_Declaration): Reset the Analyzed flag
by means of Unanalyze_Delayed_Conditional_Expression.
* exp_ch4.adb (Expand_N_Case_Expression): Likewise.  Delay expanding
the expression if it is in the context of a simple return statement.
(Expand_N_If_Expression): Likewise.
(Expand_N_Qualified_Expression): Fold identical operand.  Distribute
the expression into an operand that is a conditional expression with
expansion delayed.
(Process_Transient_In_Expression): Also test the parent node for the
presence of a simple return statement.
* exp_ch6.adb (Expand_Ctrl_Function_Call): Test the unconditional
parent node for the presence of a simple return statement.
* exp_util.ads (Delayed Expansion): New description.
(Delay_Conditional_Expressions_Between): New procedure.
(Is_Delayed_Conditional_Expression): ...here.
(Unanalyze_Delayed_Conditional_Expression): New procedure.
(Unconditional_Parent): New function.
* exp_util.adb (Find_Hook_Context): Take into account conditional
statements coming from conditional expressions.
(Within_Conditional_Expression): Likewise.
(Delay_Conditional_Expressions_Between): New procedure.
(Is_Delayed_Conditional_Expression): ...here.
(Unanalyze_Delayed_Conditional_Expression): New procedure.
(Unconditional_Parent): New function.
* sinfo.ads (Expansion_Delayed): Adjust description.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb |  32 +--
 gcc/ada/exp_aggr.ads |   4 -
 gcc/ada/exp_ch3.adb  |   5 +-
 gcc/ada/exp_ch4.adb  | 200 +--
 gcc/ada/exp_ch6.adb  |  10 +--
 gcc/ada/exp_util.adb | 112 +++-
 gcc/ada/exp_util.ads |  56 
 gcc/ada/sinfo.ads|   4 +-
 8 files changed, 332 insertions(+), 91 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 9aabd58b2a9..344e4d10c5f 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -4217,7 +4217,7 @@ package body Exp_Aggr is
   --  First, climb the parent chain, looking through qualified expressions
   --  and dependent expressions of conditional expressions.
 
-  while True loop
+  loop
  case Nkind (Parent_Node) is
 when N_Case_Expression_Alternative =>
null;
@@ -4276,25 +4276,13 @@ package body Exp_Aggr is
 
  or else Is_Build_In_Place_Aggregate_Return (Parent_Node)
   then
- Node := N;
-
  --  Mark the aggregate, as well as all the intermediate conditional
  --  expressions, as having expansion delayed. This will block the
  --  usual (bottom-up) expansion of the marked nodes and replace it
  --  with a top-down expansion from the parent node.
 
- while Node /= Parent_Node loop
-if Nkind (Node) in N_Aggregate
- | N_Case_Expression
- | N_Extension_Aggregate
- | N_If_Expression
-then
-   Set_Expansion_Delayed (Node);
-end if;
-
-Node := Parent (Node);
- end loop;
-
+ Set_Expansion_Delayed (N);
+ Delay_Conditional_Expressions_Between (N, Parent_Node);
  return;
   end if;
 
@@ -8650,7 +8638,7 @@ package body Exp_Aggr is
   --  expansion has been delayed, analyze it again and expand it.
 
   if Is_Delayed_Conditional_Expression (Expression (Init_Stmt)) then
- Set_Analyzed (Expression (Init_Stmt), False);
+ Unanalyze_Delayed_Conditional_Expression (Expression (Init_Stmt));
   end if;
 
   Append_To (Blk_Stmts, Init_Stmt);
@@ -8765,18 +8753,6 @@ package body Exp_Aggr is
 and then Expansion_Delayed (Unqual_N);
end Is_Delayed_Aggregate;
 
-   ---
-   -- Is_Delayed_Conditional_Expression --
-   ---
-
-   function Is_Delayed_Conditional_Expression (N : Node_Id) return Boolean is
-  Unqual_N : constant Node_Id := Unqualify (N);
-
-   begin
-  return Nkind (Unqual_N) in N_Case_Expression | N_If_Expression
-and

[COMMITTED 20/20] ada: Fix internal error on packed record with 0-size component

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

The problem is that the order of components listed in a constant CONSTRUCTOR
does not match that of the associated record type.

gcc/ada/ChangeLog:

* gcc-interface/utils2.cc (compare_elmt_bitpos): Deal specially with
0-sized components when the bit position is the same.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/utils2.cc | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/gcc-interface/utils2.cc b/gcc/ada/gcc-interface/utils2.cc
index 8eebf593596..e91f67d7e45 100644
--- a/gcc/ada/gcc-interface/utils2.cc
+++ b/gcc/ada/gcc-interface/utils2.cc
@@ -2113,7 +2113,23 @@ compare_elmt_bitpos (const void *rt1, const void *rt2)
   const int ret
 = tree_int_cst_compare (bit_position (field1), bit_position (field2));
 
-  return ret ? ret : (int) (DECL_UID (field1) - DECL_UID (field2));
+  if (ret)
+return ret;
+
+  /* The bit position can be the same if one of the fields has zero size.
+ In this case, if the other has nonzero size, put the former first to
+ match the layout done by components_to_record.  Otherwise, preserve
+ the order of the source code.  */
+
+  const bool field1_zero_size = integer_zerop (DECL_SIZE (field1));
+  const bool field2_zero_size = integer_zerop (DECL_SIZE (field2));
+
+  if (field1_zero_size && !field2_zero_size)
+return -1;
+  else if (!field1_zero_size && field2_zero_size)
+return 1;
+  else
+return (int) (DECL_UID (field1) - DECL_UID (field2));
 }
 
 /* Return a CONSTRUCTOR of TYPE whose elements are V.  */
-- 
2.43.0
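
For readers who want to try the ordering rule outside the compiler, here is a minimal standalone C++ sketch of the tie-break described above. It is an illustration under stated assumptions, not the GCC code: `Field`, `field_before` and `sorted_uids` are hypothetical names, and `uid` merely stands in for DECL_UID as a proxy for source order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a FIELD_DECL: only the three properties that
// the comparison looks at.
struct Field {
  long bitpos;   // bit position within the record
  long size;     // size in bits; 0 for a zero-sized component
  unsigned uid;  // declaration order, standing in for DECL_UID
};

// Returns true if A must be placed before B.
static bool field_before(const Field &a, const Field &b) {
  if (a.bitpos != b.bitpos)
    return a.bitpos < b.bitpos;
  const bool a_zero = a.size == 0;
  const bool b_zero = b.size == 0;
  if (a_zero != b_zero)
    return a_zero;       // same position: the zero-sized field goes first
  return a.uid < b.uid;  // otherwise preserve the source order
}

// Sorts a copy of FIELDS and returns the resulting sequence of uids.
static std::vector<unsigned> sorted_uids(std::vector<Field> fields) {
  std::sort(fields.begin(), fields.end(), field_before);
  std::vector<unsigned> uids;
  for (const Field &f : fields)
    uids.push_back(f.uid);
  return uids;
}
```

With fields {bitpos 0, size 32, uid 1}, {bitpos 32, size 8, uid 2} and {bitpos 32, size 0, uid 3}, the zero-sized field with uid 3 is placed before the field with uid 2 even though it was declared later.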



[Patch] C++: reject OpenMP directives in constexpr functions

2024-12-13 Thread Tobias Burnus

OpenMP states for C++:

"Directives may not appear in constexpr functions or in constant expressions."

There is some support for this already in GCC, but not for [[omp::decl]]-type
of directives and it also doesn't work that well. For example, for the
newly added testcase, the result with the patch is simple and clear:

error: OpenMP directives may not appear in ‘constexpr’ functions

without the patch:

error: uninitialized variable ‘i’ in ‘constexpr’ function
error: uninitialized variable ‘i’ in ‘constexpr’ function
sorry, unimplemented: ‘#pragma omp allocate’ not yet supported
sorry, unimplemented: ‘#pragma omp allocate’ not yet supported
error: ‘constexpr int f()’ called in a constant expression
error: ‘constexpr int g()’ called in a constant expression

Note: I think OpenACC has a similar issue but as the specification
is silent about it, the patch only handles OpenMP.
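
The check in the patch boils down to a range test on the pragma kind, combined with a test of whether the current function is declared constexpr. A minimal standalone sketch, assuming a made-up and much shorter enumeration than the real one (all `PK_*` names and both helper functions are hypothetical), shows why OpenACC kinds are not affected:

```cpp
#include <cassert>

// Hypothetical pragma kinds. The only property relied upon is that all
// OpenMP kinds sit between the two markers.
enum PragmaKind : unsigned {
  PK_NONE,
  PK_OACC_ATOMIC,
  PK_OACC_ROUTINE,
  PK_OMP_START,                    // marker: first OpenMP kind
  PK_OMP_ALLOCATE = PK_OMP_START,
  PK_OMP_FOR,
  PK_OMP_PARALLEL,
  PK_OMP_LAST = PK_OMP_PARALLEL,   // marker: last OpenMP kind
  PK_GCC_IVDEP
};

// True if ID lies in the OpenMP range of the enumeration.
constexpr bool is_omp_kind(unsigned id) {
  return id >= PK_OMP_START && id <= PK_OMP_LAST;
}

// True if the directive has to be rejected with an error.
constexpr bool rejected_in_constexpr(bool in_constexpr_function, unsigned id) {
  return in_constexpr_function && is_omp_kind(id);
}
```

In this sketch `rejected_in_constexpr (true, PK_OACC_ATOMIC)` is false, which mirrors the decision to leave OpenACC alone.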

* * *

I have not touched the 'case OMP_...:' in constexpr.cc, added in
previous patches; in principle, those should be now unreachable
and could be removed.
I also have not included any OpenACC pragmas, even though they have
the same issue. (However, contrary to OpenMP, the OpenACC spec is
silent about constexpr.)

* * *

Comments, suggestions, concerns?

Tobias
C++: reject OpenMP directives in constexpr functions

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_construct, cp_parser_pragma): Reject
	OpenMP expressions in constexpr functions.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/pr108607.C: Update dg-error.
	* g++.dg/gomp/pr79664.C: Update dg-error.
	* g++.dg/gomp/omp-constexpr.C: New test.

 gcc/cp/parser.cc  | 24 -
 gcc/testsuite/g++.dg/gomp/omp-constexpr.C | 45 +++
 gcc/testsuite/g++.dg/gomp/pr108607.C  | 16 +--
 gcc/testsuite/g++.dg/gomp/pr79664.C   | 38 +-
 4 files changed, 95 insertions(+), 28 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 15a5253b50d..88641c373e2 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -52071,7 +52071,18 @@ cp_parser_omp_construct (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
   char p_name[sizeof "#pragma omp teams distribute parallel for simd"];
   omp_clause_mask mask (0);
 
-  switch (cp_parser_pragma_kind (pragma_tok))
+  unsigned int id = cp_parser_pragma_kind (pragma_tok);
+  if (current_function_decl
+  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
+  && id >= PRAGMA_OMP__START_
+  && id <= PRAGMA_OMP__LAST_)
+{
+  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"OpenMP directives may not appear in %<constexpr%> functions");
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return;
+}
+  switch (id)
 {
 case PRAGMA_OACC_ATOMIC:
   cp_parser_omp_atomic (parser, pragma_tok, true);
@@ -52596,6 +52607,17 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
   return false;
 }
+  if (current_function_decl
+  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
+  && id >= PRAGMA_OMP__START_
+  && id <= PRAGMA_OMP__LAST_)
+{
+  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"OpenMP directives may not appear in %<constexpr%> functions");
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return false;
+}
+
   if (id != PRAGMA_OMP_DECLARE && id != PRAGMA_OACC_ROUTINE)
 cp_ensure_no_omp_declare_simd (parser);
   switch (id)
diff --git a/gcc/testsuite/g++.dg/gomp/omp-constexpr.C b/gcc/testsuite/g++.dg/gomp/omp-constexpr.C
new file mode 100644
index 000..0d984d8609b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/omp-constexpr.C
@@ -0,0 +1,45 @@
+// { dg-do compile { target c++11 } }
+
+constexpr int
+f ()
+{
+  int a = 42;
+  #pragma omp parallel for simd  /* { dg-error "OpenMP directives may not appear in 'constexpr' functions" }  */
+  for (int i=0; i < 10; i++)
+a += i;
+  return a;
+} // { dg-error "not a return-statement" "" { target c++11_down } }
+
+constexpr int
+g ()
+{
+  int a = 42;
+  [[omp::sequence(omp::directive(parallel),omp::directive(for))]]  /* { dg-error "OpenMP directives may not appear in 'constexpr' functions" }  */
+  for (int i=0; i < 10; i++)
+a += i;
+  return a;
+} // { dg-error "not a return-statement" "" { target c++11_down } }
+
+constexpr int
+h ()
+{
+  int a = 42;
+  #pragma omp allocate(a) align(128)  /* { dg-error "OpenMP directives may not appear in 'constexpr' functions" }  */
+  return a;
+} // { dg-error "not a return-statement" "" { target c++11_down } }
+
+constexpr int
+i ()
+{
+  int a [[omp::decl(allocate, align(128))]] = 42;  /* { dg-error "OpenMP directives may not appear in 'constexpr' functions" }  */
+  return a;
+} // { dg-error "not a return-statement" "" { target c++11_down } }
+
+
+
+int main() {
+  static constexpr int a = f ();  // { dg-error "called in a constant expression" "" { target c++11

Re: [PATCH v1] RISC-V: Make vector strided load alias all other memories

2024-12-13 Thread Robin Dapp
OK.

-- 
Regards
 Robin



[COMMITTED 01/20] ada: Adjust cut-off for scaling of floating-point numbers

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

The value needs to take into account denormals and encompass Maxdigs.

gcc/ada/ChangeLog:

* libgnat/s-imager.adb (Maxscaling): Change to Natural constant and
add Maxdigs to value.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-imager.adb | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/libgnat/s-imager.adb b/gcc/ada/libgnat/s-imager.adb
index d19fda3b613..89f9c1b020a 100644
--- a/gcc/ada/libgnat/s-imager.adb
+++ b/gcc/ada/libgnat/s-imager.adb
@@ -49,14 +49,13 @@ package body System.Image_R is
 
Maxdigs : constant Natural := 2 * Natural'Min (Uns'Width - 2, Num'Digits);
 
-   Maxscaling : constant := 5000;
-   --  Max decimal scaling required during conversion of floating-point
-   --  numbers to decimal. This is used to defend against infinite
-   --  looping in the conversion, as can be caused by erroneous executions.
-   --  The largest exponent used on any current system is 2**16383, which
-   --  is approximately 10**4932, and the highest number of decimal digits
-   --  is about 35 for 128-bit floating-point formats, so 5000 leaves
-   --  enough room for scaling such values
+   Maxscaling : constant Natural := 5000 + Maxdigs;
+   --  Maximum decimal scaling required during conversion of floating-point
+   --  numbers to decimal. This is used to defend against infinite looping
+   --  during the conversion, that could be caused by erroneous execution.
+   --  The largest decimal exponent in absolute value used on any current
+   --  system is 4966 (denormals of IEEE binary128) and we scale up to the
+   --  Maxdigs exponent during the conversion.
 
package Double_Real is new System.Double_Real (Num);
use type Double_Real.Double_T;
-- 
2.43.0
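
The figure of 4966 in the new comment can be checked with a little arithmetic: the smallest denormal of IEEE binary128 is 2**-16494, and 16494 * log10(2) is about 4965.19. The following standalone C++ sketch does the computation. The function names are hypothetical, the Maxdigs value of 36 is an arbitrary example, and adding Maxdigs on top is my reading of "we scale up to the Maxdigs exponent", not something taken from s-imager.adb.

```cpp
#include <cassert>
#include <cmath>

// Magnitude N of the decimal exponent needed to write 2**-binary_exp
// in the form d.ddd * 10**-N.
static int decimal_exponent_magnitude(int binary_exp) {
  return static_cast<int>(std::ceil(binary_exp * std::log10(2.0)));
}

// Assumed worst-case scaling: the decimal exponent plus Maxdigs digits.
static int required_scaling(int binary_exp, int maxdigs) {
  return decimal_exponent_magnitude(binary_exp) + maxdigs;
}
```

Under these assumptions, `required_scaling (16494, 36)` exceeds the old fixed cut-off of 5000, whereas 5000 + Maxdigs always leaves room.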



[COMMITTED 03/20] ada: Further work in semantic analysis of iterated component associations

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

This finishes up the transition to preanalysis of a copy of the expression
for iterated component associations in all contexts, thus voiding the need
to clean things up afterward.

However, this requires a larger cleanup in semantic analysis of aggregates,
in particular for others choices, which are currently skipped in Sem_Aggr,
with Exp_Aggr trying to patch things up afterward but leaving some legality
loopholes in the end.  That's why this makes sure that all the expressions
appearing in aggregates are either analyzed or preanalyzed by Sem_Aggr, as
documented in the spec of Sem, modulo the copy in an iteration context.

gcc/ada/ChangeLog:

* exp_aggr.adb (Build_Array_Aggr_Code): Remove obsolete comment.
(Convert_To_Positional): Remove Ctyp local variable.
(Is_Static_Element): Remove Dims parameter and do not preanalyze the
expression there.
(Expand_Array_Aggregate): Make Ctyp a constant.
(Compute_Others_Present): Do not preanalyze the expression there.
* sem_aggr.adb (Resolve_Array_Aggregate): New Ctyp constant.  Use it
throughout the procedure to denote the component type.
(Resolve_Aggr_Expr): Always preanalyze a copy of the expression in
an iteration context.  Preanalyze it directly when the expander is
active and the choice may cover multiple components.  Otherwise,
fully analyze it.
Do not reanalyze an iterated component association with an others
choice either when there are positional components.
(Resolve_Iterated_Component_Association): Do not remove references
from the expression after invoking Resolve_Aggr_Expr on it.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb |  54 ---
 gcc/ada/sem_aggr.adb | 158 ---
 2 files changed, 86 insertions(+), 126 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index c01011cc1fb..c93554347ad 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -1645,8 +1645,7 @@ package body Exp_Aggr is
  if Is_Iterated_Component then
 
 --  Create a new scope for the loop variable so that the
---  following Gen_Assign (that ends up calling
---  Preanalyze_And_Resolve) can correctly find it.
+--  following Gen_Assign can correctly find it.
 
 Ent := New_Internal_Entity (E_Loop,
  Current_Scope, Loc, 'L');
@@ -4410,7 +4409,6 @@ package body Exp_Aggr is
   Dims : constant Nat := Number_Dimensions (Typ);
   Max_Others_Replicate : constant Nat := Max_Aggregate_Size (N);
 
-  Ctyp  : Entity_Id := Component_Type (Typ);
   Static_Components : Boolean   := True;
 
   procedure Check_Static_Components;
@@ -4430,7 +4428,7 @@ package body Exp_Aggr is
   --  Return True if the aggregate N is flat (which is not trivial in the
   --  case of multidimensional aggregates).
 
-  function Is_Static_Element (N : Node_Id; Dims : Nat) return Boolean;
+  function Is_Static_Element (N : Node_Id) return Boolean;
   --  Return True if N, an element of a component association list, i.e.
   --  N_Component_Association or N_Iterated_Component_Association, has a
   --  compile-time known value and can be passed as is to the back-end
@@ -4474,7 +4472,7 @@ package body Exp_Aggr is
  then
 Assoc := First (Component_Associations (N));
 while Present (Assoc) loop
-   if not Is_Static_Element (Assoc, Dims) then
+   if not Is_Static_Element (Assoc) then
   Static_Components := False;
   exit;
end if;
@@ -4699,7 +4697,7 @@ package body Exp_Aggr is
   --  only if either the element is static or is
   --  an aggregate (we already know it is OK).
 
-  elsif not Is_Static_Element (Elmt, Dims)
+  elsif not Is_Static_Element (Elmt)
 and then Nkind (Expr) /= N_Aggregate
   then
  return False;
@@ -4856,7 +4854,7 @@ package body Exp_Aggr is
   -- Is_Static_Element --
   ---
 
-  function Is_Static_Element (N : Node_Id; Dims : Nat) return Boolean is
+  function Is_Static_Element (N : Node_Id) return Boolean is
  Expr : constant Node_Id := Expression (N);
 
   begin
@@ -4874,14 +4872,6 @@ package body Exp_Aggr is
  then
 return True;
 
- --  However, one may write static expressions that are syntactically
- --  ambiguous, so preanalyze the expression before checking it again,
- --  but only at the innermost level for a multidimensional array.
-
- elsif Dims = 1 then
-Preanalyze_And_Resolve (Ex

[COMMITTED 05/20] ada: Fix documentation of Ada.Real_Time.Timing_Events

2024-12-13 Thread Marc Poulhiès
From: Ronan Desplanques 

The GNAT reference manual stated that GNAT did not implement this
language-defined package, but GNAT in fact does offer an implementation
of it.

gcc/ada/ChangeLog:

* doc/gnat_rm/standard_library_routines.rst: Fix documentation.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/doc/gnat_rm/standard_library_routines.rst | 3 ++-
 gcc/ada/gnat_rm.texi  | 3 ++-
 gcc/ada/gnat_ugn.texi | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/standard_library_routines.rst b/gcc/ada/doc/gnat_rm/standard_library_routines.rst
index 2e7642652b2..a595be5c4f2 100644
--- a/gcc/ada/doc/gnat_rm/standard_library_routines.rst
+++ b/gcc/ada/doc/gnat_rm/standard_library_routines.rst
@@ -383,7 +383,8 @@ the unit is not implemented.
   then such a backward jump may occur.
 
 ``Ada.Real_Time.Timing_Events`` *(D.15)*
-  Not implemented in GNAT.
+  This package allows procedures to be executed at a specified time without
+  the use of a task or a delay statement.
 
 ``Ada.Sequential_IO`` *(A.8.1)*
   This package provides input-output facilities for sequential files,
diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
index ee22978b27c..e2e2c310524 100644
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -21376,7 +21376,8 @@ then such a backward jump may occur.
 
 @item @code{Ada.Real_Time.Timing_Events} `(D.15)'
 
-Not implemented in GNAT.
+This package allows procedures to be executed at a specified time without
+the use of a task or a delay statement.
 
 @item @code{Ada.Sequential_IO} `(A.8.1)'
 
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 662fe1c1642..d6c87ef5098 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -29839,8 +29839,8 @@ to permit their use in free software.
 
 @printindex ge
 
-@anchor{d2}@w{  }
 @anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
+@anchor{d2}@w{  }
 
 @c %**end of body
 @bye
-- 
2.43.0



[COMMITTED 06/20] ada: Fix dangling reference with user-defined indexing of function call

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

This happens with a noncontrolled type because the user-defined indexing is
expanded into a function call that binds the lifetime of the original call
to its return value.  The temporary must be created explicitly in this case,
so that the front-end can control its lifetime.

gcc/ada/ChangeLog:

* exp_ch6.adb (Expand_Call_Helper): Also create a temporary in the
case of a noncontrolled user-defined indexing.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch6.adb | 36 
 1 file changed, 36 insertions(+)

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index 20ce7a5b239..945f44630d1 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -5314,6 +5314,42 @@ package body Exp_Ch6 is
 Establish_Transient_Scope
   (Call_Node, Needs_Secondary_Stack (Etype (Call_Node)));
  end if;
+
+  --  Functions returning noncontrolled objects that may be subject to
+  --  user-defined indexing also need special attention. The problem
+  --  is that, when a call to such a function is directly passed as an
+  --  actual in a call to the Constant_Indexing function, the latter
+  --  call effectively binds the lifetime of the actual to that of its
+  --  return value, thus extending it beyond the call. This cannot be
+  --  directly supported by code generators, for which the lifetime of
+  --  temporaries created for actuals ends immediately after the call.
+  --  Therefore we force the creation of a temporary in this case, as
+  --  the above code would have done in the controlled case; note that,
+  --  in this latter case, the temporary cannot be finalized just after
+  --  the call as would naturally be done, and Is_Finalizable_Transient
+  --  also has a special processing for it (see Is_Indexed_Container).
+
+  elsif Nkind (Call_Node) = N_Function_Call
+and then Nkind (Parent (Call_Node)) = N_Function_Call
+  then
+ declare
+Aspect : constant Node_Id :=
+  Find_Value_Of_Aspect
+(Etype (Call_Node), Aspect_Constant_Indexing);
+
+ begin
+if Present (Aspect)
+  and then Is_Entity_Name (Name (Parent (Call_Node)))
+  and then Entity (Name (Parent (Call_Node))) = Entity (Aspect)
+then
+   --  Resolution is now finished, make sure we don't start
+   --  analysis again because of the duplication.
+
+   Set_Analyzed (Call_Node);
+
+   Remove_Side_Effects (Call_Node);
+end if;
+ end;
   end if;
end Expand_Call_Helper;
 
-- 
2.43.0
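
As a loose analogy only (C++ lifetime rules differ from Ada's), the hazard resembles returning a reference into an unnamed temporary, and the fix resembles giving that temporary a name and a scope. The sketch below is standalone and all names in it are hypothetical; `element` plays the role of a Constant_Indexing function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Container {
  std::vector<int> data;
};

// Stand-in for the function call whose result is indexed.
static Container make_container() { return Container{{10, 20, 30}}; }

// Stand-in for a Constant_Indexing function: the returned reference is
// valid only while the container argument is alive.
static const int &element(const Container &c, std::size_t i) {
  return c.data[i];
}

static int second_element() {
  // Hazard: `const int &r = element (make_container (), 1);` would leave r
  // dangling once the unnamed temporary dies at the end of the expression.
  const Container tmp = make_container();  // explicit temporary with a scope
  const int &r = element(tmp, 1);
  return r;  // tmp is still alive here
}
```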



[COMMITTED 04/20] ada: Exclude library units from gnatcov instrumentation

2024-12-13 Thread Marc Poulhiès
From: Ronan Desplanques 

Before this patch, we instrumented code that's only used during the
build process to generate more code. This patch marks the
code-generating code so it's not instrumented for coverage.

gcc/ada/ChangeLog:

* gnat2.gpr: Add library units to coverage exclusion list.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gnat2.gpr | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/gnat2.gpr b/gcc/ada/gnat2.gpr
index df648465812..9d9f3b55057 100644
--- a/gcc/ada/gnat2.gpr
+++ b/gcc/ada/gnat2.gpr
@@ -49,6 +49,23 @@ project Gnat2 is
  --  of fresh source files from the run-time library. We need gnatcov to not instrument
  --  those files, so we add the clause below. It's unknown why only putting "GNAT" is sufficient???
  --  We also pull in GNAT.Lists for example, but specifying it here triggers a warning.
-  for Excluded_Units use ("Gnat");
+  Overridden_Runtime_Units := ("GNAT");
+
+  --  We don't want to instrument code generation tools
+  Codegen_Units :=
+("Gen_IL",
+ "Gen_IL.Gen",
+ "Gen_IL.Fields",
+ "Gen_IL.Gen.Gen_Entities",
+ "Gen_IL.Gen.Gen_Nodes",
+ "Gen_IL.Internals",
+ "Gen_IL.Main",
+ "Gen_IL.Types",
+ "XSnamesT",
+ "XUtil",
+ "XOSCons",
+ "XLeaps");
+
+  for Excluded_Units use Overridden_Runtime_Units & Codegen_Units;
end Coverage;
 end Gnat2;
-- 
2.43.0



[COMMITTED 11/20] ada: Remove unused parameter from volatile type queries

2024-12-13 Thread Marc Poulhiès
From: Piotr Trojanek 

Routines Is_Effectively_Volatile and Is_Effectively_Volatile_For_Reading
were always called with the Ignore_Protected parameter set to False (or it
was passed unmodified on recursive calls), so this parameter wasn't
actually needed.

Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* sem_util.adb (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Remove Ignore_Protected
parameter.
(Is_Effectively_Volatile_Object,
Is_Effectively_Volatile_Object_For_Reading): Remove
single-parameter wrappers that are needed to instantiate
generic subprogram.
* sem_util.ads (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Remove parameter; adjust
comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 35 ++-
 gcc/ada/sem_util.ads | 15 ++-
 2 files changed, 12 insertions(+), 38 deletions(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 30c1a5236ae..dea27dc8d6b 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -16727,9 +16727,7 @@ package body Sem_Util is
-- Is_Effectively_Volatile --
-
 
-   function Is_Effectively_Volatile
- (Id   : Entity_Id;
-  Ignore_Protected : Boolean := False) return Boolean is
+   function Is_Effectively_Volatile (Id : Entity_Id) return Boolean is
begin
   if Is_Type (Id) then
 
@@ -16760,15 +16758,13 @@ package body Sem_Util is
   --  private type may be missing in case of error.
 
   return Present (Anc)
-and then Is_Effectively_Volatile
-  (Component_Type (Anc), Ignore_Protected);
+and then Is_Effectively_Volatile (Component_Type (Anc));
end;
 end if;
 
- --  A protected type is always volatile unless Ignore_Protected is
- --  True.
+ --  A protected type is always volatile
 
- elsif Is_Protected_Type (Id) and then not Ignore_Protected then
+ elsif Is_Protected_Type (Id) then
 return True;
 
  --  A descendant of Ada.Synchronous_Task_Control.Suspension_Object is
@@ -16794,7 +16790,7 @@ package body Sem_Util is
 and then not
   (Ekind (Id) = E_Variable and then No_Caching_Enabled (Id)))
  or else Has_Volatile_Components (Id)
- or else Is_Effectively_Volatile (Etype (Id), Ignore_Protected);
+ or else Is_Effectively_Volatile (Etype (Id));
   end if;
end Is_Effectively_Volatile;
 
@@ -16803,19 +16799,15 @@ package body Sem_Util is
-
 
function Is_Effectively_Volatile_For_Reading
- (Id   : Entity_Id;
-  Ignore_Protected : Boolean := False) return Boolean
+ (Id : Entity_Id) return Boolean
is
begin
-  --  A concurrent type is effectively volatile for reading, except for a
-  --  protected type when Ignore_Protected is True.
+  --  A concurrent type is effectively volatile for reading
 
-  if Is_Task_Type (Id)
-or else (Is_Protected_Type (Id) and then not Ignore_Protected)
-  then
+  if Is_Concurrent_Type (Id) then
  return True;
 
-  elsif Is_Effectively_Volatile (Id, Ignore_Protected) then
+  elsif Is_Effectively_Volatile (Id) then
 
 --  Other volatile types and objects are effectively volatile for
 --  reading when they have property Async_Writers or Effective_Reads
@@ -16845,7 +16837,7 @@ package body Sem_Util is
 
return Present (Anc)
  and then Is_Effectively_Volatile_For_Reading
-   (Component_Type (Anc), Ignore_Protected);
+   (Component_Type (Anc));
 end;
  end if;
   end if;
@@ -16859,9 +16851,6 @@ package body Sem_Util is

 
function Is_Effectively_Volatile_Object (N : Node_Id) return Boolean is
-  function Is_Effectively_Volatile (E : Entity_Id) return Boolean is
- (Is_Effectively_Volatile (E, Ignore_Protected => False));
-
   function Is_Effectively_Volatile_Object_Inst
   is new Is_Effectively_Volatile_Object_Shared (Is_Effectively_Volatile);
begin
@@ -16875,10 +16864,6 @@ package body Sem_Util is
function Is_Effectively_Volatile_Object_For_Reading
  (N : Node_Id) return Boolean
is
-  function Is_Effectively_Volatile_For_Reading
-(E : Entity_Id) return Boolean
-  is (Is_Effectively_Volatile_For_Reading (E, Ignore_Protected => False));
-
   function Is_Effectively_Volatile_Object_For_Reading_Inst
   is new Is_Effectively_Volatile_Object_Shared
 (Is_Effectively_Volatile_For_Reading);
diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
index 2b9ba5f494c..dda031f3516 100644
--- a/gcc/ada/sem_util.ads
++

[COMMITTED 08/20] ada: Fix breakage of GNATprove introduced by latest change

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

gcc/ada/ChangeLog:

* sem_aggr.adb (Resolve_Aggr_Expr): Always perform a full analysis
of the expression in SPARK mode.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 8cc00ad3b27..3a82e6620c5 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -2057,13 +2057,14 @@ package body Sem_Aggr is
 --  In an iterated context, preanalyze a copy of the expression to
 --  verify legality. We use a copy because the expression will be
 --  analyzed anew when the enclosing aggregate is expanded and the
---  construct is rewritten as a loop with a new index variable.
+--  construct is rewritten as a loop with a new iteration variable.
+--  This does not apply to SPARK mode, where expansion is skipped.
 
 --  If the parent is a component association, we also temporarily
 --  point its Expression field to the copy, because analysis may
 --  expect this invariant to hold.
 
-if Iterated_Expr then
+if Iterated_Expr and then not GNATprove_Mode then
declare
   In_Assoc : constant Boolean :=
 Nkind (Parent (Expr)) in N_Component_Association
-- 
2.43.0



[COMMITTED 09/20] ada: Remove last call to Preanalyze_And_Resolve from Exp_Aggr

2024-12-13 Thread Marc Poulhiès
From: Eric Botcazou 

All the expressions are now at least preanalyzed in a non-iterated context,
so we do not need to redo it in Aggr_Assignment_OK_For_Backend, given that
Is_OK_Aggregate explicitly rejects iterated component associations.

gcc/ada/ChangeLog:

* exp_aggr.adb (Aggr_Assignment_OK_For_Backend): Do not call again
Preanalyze_And_Resolve on the expression.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index c93554347ad..c0218c9e3dc 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -534,10 +534,7 @@ package body Exp_Aggr is
   end if;
 
   --  If the expression has side effects (e.g. contains calls with
-  --  potential side effects) reject as well. We only preanalyze the
-  --  expression to prevent the removal of intended side effects.
-
-  Preanalyze_And_Resolve (Expr, Ctyp);
+  --  potential side effects), then reject it as well.
 
   if not Side_Effect_Free (Expr) then
  return False;
-- 
2.43.0



[PATCH] RISC-V: Increase cost for vec_construct [PR118019].

2024-12-13 Thread Robin Dapp
Hi,

for a generic vec_construct from scalar elements we need
to load each scalar element and move it over to a vector register.
This patch adds the register-move cost to the scalar_to_vec cost and
multiplies the sum by the number of elements in the vector.

This helps vectorization of e.g. x264 SATD with the default
-mvector-strict-align.

Regtested on rv64gcv_zvl512b.

Regards
 Robin

PR target/118019

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_builtin_vectorization_cost):
Increase vec_construct cost.
---
 gcc/config/riscv/riscv.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index be2ebf9d9c0..aa8a4562d9a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12263,7 +12263,13 @@ riscv_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
   return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 
 case vec_construct:
-  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
+   {
+ /* TODO: This is too pessimistic in case we can splat.  */
+ int regmove_cost = fp ? costs->regmove->FR2VR
+   : costs->regmove->GR2VR;
+ return (regmove_cost + common_costs->scalar_to_vec_cost)
+   * estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
+   }
 
 default:
   gcc_unreachable ();
-- 
2.47.1
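
To make the effect of the new formula concrete, here is a standalone C++ sketch that compares the old and the new cost. The `Costs` structure and the numbers used with it are invented for illustration and do not correspond to any real tuning table.

```cpp
#include <cassert>

// Hypothetical cost table.
struct Costs {
  int gr2vr;          // general-purpose register -> vector register move
  int fr2vr;          // floating-point register -> vector register move
  int scalar_to_vec;  // cost of a scalar-to-vector statement
};

// Old behaviour: one unit per element.
constexpr int old_vec_construct_cost(int nelts) { return nelts; }

// New behaviour: (register move + scalar_to_vec) per element.
constexpr int new_vec_construct_cost(const Costs &c, bool fp, int nelts) {
  return ((fp ? c.fr2vr : c.gr2vr) + c.scalar_to_vec) * nelts;
}
```

With the made-up costs {gr2vr 2, fr2vr 3, scalar_to_vec 1} and 8 elements, the integer case goes from 8 to 24 and the floating-point case to 32.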



Re: [PATCH] RISC-V: Increase cost for vec_construct [PR118019].

2024-12-13 Thread ??????
Could you add a testcase and a tree dump check?

 --Reply to Message--
 On Fri, Dec 13, 2024 19:47 PM Robin Dapp

[Fortran, Patch, PR117347, v1] Fix array constructor not resolved in associate

2024-12-13 Thread Andre Vehreschild
Hi all,

the attached patch fixes a reject-valid of an array constructor in an associate
by resolving the array constructor before parsing the associate block. I am not
100% sure that this is the right place to do it. But given that there is
already a special case there before the patch, I propose to do the resolve
there.

Regtests OK on x86_64-pc-linux-gnu / F41. OK for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 0c5315e0d70da9dd107e6057716ff0d4ce89dc9b Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 13 Dec 2024 09:06:11 +0100
Subject: [PATCH] Fortran: Fix associate with derived type array constructor
 [PR117347]

gcc/fortran/ChangeLog:

	PR fortran/117347

	* primary.cc (gfc_match_varspec): Resolve array constructors in
	associate before parsing the block of the associate.

gcc/testsuite/ChangeLog:

	* gfortran.dg/associate_71.f90: New test.
---
 gcc/fortran/primary.cc |  6 +
 gcc/testsuite/gfortran.dg/associate_71.f90 | 28 ++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/associate_71.f90

diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 1db27929eeb..8d6195303a2 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -2285,6 +2285,12 @@ gfc_match_varspec (gfc_expr *primary, int equiv_flag, bool sub_flag,
   if (tgt_expr->rank)
 	sym->ts.u.derived = tgt_expr->ts.u.derived;
 }
+  else if (sym->ts.type == BT_UNKNOWN && sym->assoc && !sym->assoc->dangling
+	   && tgt_expr->expr_type == EXPR_ARRAY)
+{
+  gcc_assert (gfc_resolve_expr (tgt_expr));
+  sym->ts = tgt_expr->ts;
+}

   peeked_char = gfc_peek_ascii_char ();
   if ((inferred_type && !sym->as && peeked_char == '(')
diff --git a/gcc/testsuite/gfortran.dg/associate_71.f90 b/gcc/testsuite/gfortran.dg/associate_71.f90
new file mode 100644
index 000..716d1b8ff61
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_71.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+!
+! Check that pr117347 is fixed.
+! Contributed by Ivan Pribec  
+
+program pr117347
+  implicit none
+
+  type :: point
+ real :: x = 42.
+  end type point
+
+  type(point) :: mypoint
+  real:: pi(1)
+  associate (points =>  mypoint ) ! accepted
+pi(:) = points% x
+  end associate
+  if (any(pi /= 42)) stop 1
+  associate (points => (mypoint)) ! accepted
+pi(:) = points% x
+  end associate
+  if (any(pi /= 42)) stop 2
+  associate (points => [mypoint]) ! REJECTED
+pi(:) = points% x
+  end associate
+  if (any(pi /= 42)) stop 3
+end program
+
--
2.47.1



[COMMITTED 13/20] ada: Refactor code of Check_Ambiguous_Call and Valid_Conversion

2024-12-13 Thread Marc Poulhiès
From: Javier Miranda 

gcc/ada/ChangeLog:

* sem_res.adb (Report_Ambiguous_Argument): Code cleanup.
(Resolve): Code cleanup.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_res.adb | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index cd75508021c..948ed940481 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -2435,14 +2435,7 @@ package body Sem_Res is
 
 Get_First_Interp (Name (Arg), I, It);
 while Present (It.Nam) loop
-   Error_Msg_Sloc := Sloc (It.Nam);
-
-   if Nkind (Parent (It.Nam)) = N_Full_Type_Declaration then
-  Error_Msg_N ("interpretation (inherited) #!", Arg);
-   else
-  Error_Msg_N ("interpretation #!", Arg);
-   end if;
-
+   Report_Interpretation (Arg, It.Nam, It.Typ);
Get_Next_Interp (I, It);
 end loop;
  end if;
@@ -2823,13 +2816,10 @@ package body Sem_Res is
 
 Ambiguous := True;
 
-if Nkind (Parent (Seen)) = N_Full_Type_Declaration then
-   Error_Msg_N
- ("\\possible interpretation (inherited)#!", N);
-else
-   Error_Msg_N -- CODEFIX
- ("\\possible interpretation#!", N);
-end if;
+Report_Interpretation
+  (N   => N,
+   Nam => Seen,
+   Typ => Etype (Seen));
 
 if Nkind (N) in N_Subprogram_Call
   and then Present (Parameter_Associations (N))
@@ -2912,8 +2902,8 @@ package body Sem_Res is
  elsif
Nkind (Parent (It.Nam)) = N_Full_Type_Declaration
  then
-Error_Msg_N
-  ("\\possible interpretation (inherited)#!", N);
+Report_Interpretation (N, It.Nam, It.Typ);
+
  else
 Error_Msg_N -- CODEFIX
   ("\\possible interpretation#!", N);
-- 
2.43.0



[COMMITTED 12/20] ada: Implement new rules about effectively volatile types in SPARK

2024-12-13 Thread Marc Poulhiès
From: Piotr Trojanek 

New rules make record types effectively volatile based on the effective
volatility of their components; same for effectively volatile for
reading. Now volatility composition for records works like volatility
composition for arrays.
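
The two composition rules can be sketched on a toy type model. The following standalone C++ code (C++17, because of the recursive `std::vector<Type>` member) is a simplified illustration, not a transcription of sem_util.adb: `Type`, `scalar`, `record_of` and both predicates are hypothetical, and protected types, Suspension_Object and the Comes_From_Source filter are left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified type model.
struct Type {
  enum Kind { Scalar, Array, Record } kind;
  bool is_volatile;              // scalars only: declared volatile
  bool volatile_for_reading;     // scalars only: e.g. Async_Writers set
  std::vector<Type> components;  // Array: one entry; Record: any number
};

static Type scalar(bool vol, bool reading) {
  return Type{Type::Scalar, vol, reading, {}};
}

static Type record_of(std::vector<Type> comps) {
  return Type{Type::Record, false, false, std::move(comps)};
}

// A record is effectively volatile if it has at least one component and
// all of its components are effectively volatile.
static bool effectively_volatile(const Type &t) {
  switch (t.kind) {
  case Type::Scalar:
    return t.is_volatile;
  case Type::Array:
    return !t.components.empty() && effectively_volatile(t.components.front());
  case Type::Record: {
    bool has_vol_comp = false;
    for (const Type &c : t.components) {
      if (!effectively_volatile(c))
        return false;
      has_vol_comp = true;
    }
    return has_vol_comp;
  }
  }
  return false;
}

// An effectively volatile record is effectively volatile for reading if
// at least one component is.
static bool effectively_volatile_for_reading(const Type &t) {
  if (!effectively_volatile(t))
    return false;
  switch (t.kind) {
  case Type::Scalar:
    return t.volatile_for_reading;
  case Type::Array:
    return effectively_volatile_for_reading(t.components.front());
  case Type::Record:
    for (const Type &c : t.components)
      if (effectively_volatile_for_reading(c))
        return true;
    return false;
  }
  return false;
}
```

Note the asymmetry: "all components" for effective volatility, "at least one component" for effective volatility for reading.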

gcc/ada/ChangeLog:

* sem_util.adb (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Implement new rule for
record types.
* sem_util.ads (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Adjust comments.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 49 
 gcc/ada/sem_util.ads |  4 
 2 files changed, 53 insertions(+)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index dea27dc8d6b..0b4a2965ad8 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -16728,6 +16728,8 @@ package body Sem_Util is
-
 
function Is_Effectively_Volatile (Id : Entity_Id) return Boolean is
+  Comp : Entity_Id;
+  Has_Vol_Comp : Boolean := False;
begin
   if Is_Type (Id) then
 
@@ -16773,6 +16775,35 @@ package body Sem_Util is
  elsif Is_Descendant_Of_Suspension_Object (Id) then
 return True;
 
+ --  A record type for which all components have an effectively
+ --  volatile type.
+
+ elsif Is_Record_Type (Id) then
+
+--  Inspect all components defined in the scope of the type,
+--  looking for those whose type is not effectively volatile.
+
+Comp := First_Component (Id);
+while Present (Comp) loop
+   if Comes_From_Source (Comp) then
+  if Is_Effectively_Volatile (Etype (Comp)) then
+ Has_Vol_Comp := True;
+
+  --  The component is not effecively volatile
+
+  else
+ return False;
+  end if;
+   end if;
+
+   Next_Component (Comp);
+end loop;
+
+--  If we get here, then all components are of an effectively
+--  volatile type.
+
+return Has_Vol_Comp;
+
  --  Otherwise the type is not effectively volatile
 
  else
@@ -16801,6 +16832,7 @@ package body Sem_Util is
function Is_Effectively_Volatile_For_Reading
  (Id : Entity_Id) return Boolean
is
+  Comp : Entity_Id;
begin
   --  A concurrent type is effectively volatile for reading
 
@@ -16839,6 +16871,23 @@ package body Sem_Util is
  and then Is_Effectively_Volatile_For_Reading
(Component_Type (Anc));
 end;
+
+ --  In addition, a record type is effectively volatile for reading
+ --  if at least one component has an effectively volatile type for
+ --  reading.
+
+ elsif Is_Record_Type (Id) then
+Comp := First_Component (Id);
+while Present (Comp) loop
+   if Comes_From_Source (Comp)
+ and then Is_Effectively_Volatile_For_Reading (Etype (Comp))
+   then
+  return True;
+   end if;
+   Next_Component (Comp);
+end loop;
+
+return False;
  end if;
   end if;
 
diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
index dda031f3516..a809cdbaa07 100644
--- a/gcc/ada/sem_util.ads
+++ b/gcc/ada/sem_util.ads
@@ -1968,6 +1968,8 @@ package Sem_Util is
--* Volatile without No_Caching
--* An array type subject to aspect Volatile_Components
--* An array type whose component type is effectively volatile
+   --* A record type for which all components have an effectively volatile
+   --  type
--* A protected type
--* Descendant of type Ada.Synchronous_Task_Control.Suspension_Object
 
@@ -1982,6 +1984,8 @@ package Sem_Util is
--  Async_Writers and Effective_Reads set to False
--* An array type whose component type is effectively volatile for
--  reading
+   --* A record type for which at least one component has an effectively
+   --  volatile type for reading
--* A protected type
--* Descendant of type Ada.Synchronous_Task_Control.Suspension_Object
 
-- 
2.43.0



Re: [PATCH] RISC-V: Emit vector shift pattern for const_vector [PR117353].

2024-12-13 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-12-12 18:43
To: gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com
Subject: [PATCH] RISC-V: Emit vector shift pattern for const_vector [PR117353].
Hi,
 
in PR117353 and PR117878 we expand a const vector during reload.  For
this we use an unpredicated left shift.  Normally an insn like this is
split but as we introduce it late and cannot create pseudos anymore
it remains unpredicated and is not recognized by the vsetvl pass (where
we expect all insns to be in predicated RVV format).
 
This patch directly emits a predicated shift instead.  We could
distinguish between !lra_in_progress and lra_in_progress and emit
an unpredicated shift in the former case but we're not very likely
to optimize it anyway so it doesn't seem worth it.
 
Regtested on rv64gcv_zvl512b and waiting for the CI.
 
Regards
Robin
 
PR target/117353
PR target/117878
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_const_vector): Use predicated
instead of simple shift.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr117353.c: New test.
---
gcc/config/riscv/riscv-v.cc   |  8 +++--
.../gcc.target/riscv/rvv/autovec/pr117353.c   | 29 +++
2 files changed, 34 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117353.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5c14c77068f..417c36a7587 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1439,9 +1439,11 @@ expand_const_vector (rtx target, rtx src)
  rtx shift_count
= gen_int_mode (exact_log2 (builder.npatterns ()),
builder.inner_mode ());
-   rtx tmp1 = expand_simple_binop (builder.mode (), LSHIFTRT,
- vid, shift_count, NULL_RTX,
- false, OPTAB_DIRECT);
+   rtx tmp1 = gen_reg_rtx (builder.mode ());
+   rtx shift_ops[] = {tmp1, vid, shift_count};
+   emit_vlmax_insn (code_for_pred_scalar
+(LSHIFTRT, builder.mode ()), BINARY_OP,
+shift_ops);
  /* Step 3: Generate tmp2 = tmp1 * step.  */
  rtx tmp2 = gen_reg_rtx (builder.mode ());
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117353.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117353.c
new file mode 100644
index 000..135a00194c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117353.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gcv_zvl256b -mabi=lp64d" } */
+
+int *b;
+
+inline void c (char *d, int e)
+{
+  d[0] = 0;
+  d[1] = e;
+}
+
+void f ();
+
+void h ()
+{
+  for (;;)
+{
+  char *a;
+  long g = 8;
+  while (g)
+ {
+   c (a, *b);
+   b++;
+   a += 2;
+   g--;
+ }
+  f ();
+}
+}
-- 
2.47.1
 
 


[COMMITTED 18/20] ada: Cleanup preanalysis of static expressions

2024-12-13 Thread Marc Poulhiès
From: Javier Miranda 

During preanalysis, the frontend does not generate freeze nodes.
The exception to this rule occurs during the preanalysis of default
and per-object expressions, where static expressions are frozen.

A patch merged six years ago to address an issue in this area introduced
additional complexity and confusion regarding the frontend's behavior in
such cases. The purpose of this patch is to revert that change, simplifying
the support for the preanalysis of static expressions to make it cleaner
and easier to understand.

gcc/ada/ChangeLog:

* sem.ads (Inside_Preanalysis_Without_Freezing): Removed.
* sem.adb (Semantics): Remove Inside_Preanalysis_Without_Freezing.
* sem_ch6.adb (Preanalyze_Formal_Expression): Removed.
* sem_ch3.ads (Preanalyze_Assert_Expression): Add documentation.
(Preanalyze_Spec_Expression): Add documentation.
* sem_ch3.adb (Preanalyze_Assert_Expression): Code cleanup.
(Preanalyze_Default_Expression): Code cleanup.
* sem_res.ads (Preanalyze_With_Freezing_And_Resolve): Removed.
* sem_res.adb (Preanalyze_With_Freezing_And_Resolve): Removed.
(Preanalyze_And_Resolve): Code cleanup.
* freeze.adb (Freeze_Entity): No freeze under strict preanalysis.
(Freeze_Expression): Code cleanup.
(Freeze_Expr_Types): Replace call to Preanalyze_Spec_Expression by
strict preanalysis during preanalysis of a duplicate of the
expression performed to have available the minimum decoration
to locate referenced unfrozen types.
* sem_aggr.adb (Resolve_Array_Aggregate): Minor code cleanup.
* sem_attr.adb (Resolve_Attribute): Add documentation.
* sem_ch13.adb (Resolve_Aspect_Expressions[Aspect_Default_Value]):
Replace call to Preanalyze_Spec_Expression by Preanalyze_And_Resolve.
(Resolve_Aspect_Expressions[Aspect_Default_Component_Value]): Ditto.
* sem_ch8.adb (Set_Entity_Or_Discriminal): Code cleanup.
* sem_prag.adb (Analyze_Initial_Condition_In_Decl_Part): Replace
call to Preanalyze_Assert_Expression by call to Preanalyze_And_Resolve.
(Analyze_Pre_Post_Condition): Replace call to Preanalyze_Spec_Expression
by call to Preanalyze_Assert_Expression.
* sem_util.ads (In_Pragma_Expression): Adding a formal to extend the
functionality of this subprogram.
(Within_Static_Expression): New subprogram.
* sem_util.adb (In_Pragma_Expression): Ditto.
(Within_Static_Expression): Ditto.
* checks.adb (Install_Null_Excluding_Check): No check during 
preanalysis.
(Install_Primitive_Elaboration_Check): Ditto.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb   | 13 --
 gcc/ada/freeze.adb   | 46 ---
 gcc/ada/sem.adb  | 11 -
 gcc/ada/sem.ads  | 19 +++
 gcc/ada/sem_aggr.adb | 17 -
 gcc/ada/sem_attr.adb |  3 ++-
 gcc/ada/sem_ch13.adb |  6 ++---
 gcc/ada/sem_ch3.adb  | 21 
 gcc/ada/sem_ch3.ads  | 10 
 gcc/ada/sem_ch6.adb  | 22 +++--
 gcc/ada/sem_ch8.adb  | 14 +++
 gcc/ada/sem_prag.adb |  4 ++--
 gcc/ada/sem_res.adb  | 50 +++---
 gcc/ada/sem_res.ads  |  3 ---
 gcc/ada/sem_util.adb | 57 
 gcc/ada/sem_util.ads | 12 --
 16 files changed, 144 insertions(+), 164 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 1ec49924c9b..c30c99b31aa 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -8405,11 +8405,15 @@ package body Checks is
 
   if Inside_A_Generic then
  return;
-  end if;
+
+  --  No check during preanalysis
+
+  elsif Preanalysis_Active then
+ return;
 
   --  No check needed if known to be non-null
 
-  if Known_Non_Null (N) then
+  elsif Known_Non_Null (N) then
  return;
   end if;
 
@@ -8569,6 +8573,11 @@ package body Checks is
   if GNATprove_Mode then
  return;
 
+  --  No check during preanalysis
+
+  elsif Preanalysis_Active then
+ return;
+
   --  Do not generate an elaboration check if all checks have been
   --  suppressed.
 
diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index dae1d9afcde..c36f626cc8c 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -6445,9 +6445,11 @@ package body Freeze is
 
  goto Leave;
 
-  --  Do not freeze if we are preanalyzing without freezing
+  --  Do not freeze under strict preanalysis
 
-  elsif Inside_Preanalysis_Without_Freezing > 0 then
+  elsif Preanalysis_Active
+and then not Within_Spec_Static_Expression (N)
+  then
  Result := No_List;
  goto Leave;
 
@@ -8532,29 +8534,21 @@ package body Freeze is
 
   if Must_Not_Freeze (N) then
  return;
-  end if;
 
-  --  If expression is non-

[COMMITTED 19/20] ada: Pass artificial_p to create_type_decl

2024-12-13 Thread Marc Poulhiès
From: Tom Tromey 

The recent "nameless types" change to gcc-interface caused the gdb
pretty-printer for VSS to fail.  This happens because one call to
create_type_decl unconditionally passes "true" as the "artificial_p"
parameter.  This patch changes this call to pass the entity's
local artificial_p value instead.  This makes sense, I think, because
the type decl being created for debug purposes (as the comment says)
is there to represent the relevant entity from the source.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Pass artificial_p to
create_type_decl.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 024bf456bc9..8f20de2c9b7 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -4558,8 +4558,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, 
bool definition)
 false, definition, false);
 
  if (gnu_type != orig_type && !gnu_decl)
-   create_type_decl (gnu_entity_name, orig_type, true, debug_info_p,
- gnat_entity);
+   create_type_decl (gnu_entity_name, orig_type, artificial_p,
+ debug_info_p, gnat_entity);
}
 
   /* Now set the RM size of the type.  We cannot do it before padding
-- 
2.43.0



[PATCH] i386: Add vec_fm{addsub,subadd}v2sf4 patterns [PR116979]

2024-12-13 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the addition of vec_addsubv2sf3 expander caused
the testcase to be vectorized and no longer to use fma.
The following patch adds new expanders so that it can be vectorized
again with the alternating add/sub fma instructions.
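
For readers unfamiliar with the alternating add/sub form, the following scalar C++ sketch models what a fmaddsub-style operation computes per lane and why it suits complex multiplication. The function names are illustrative only, and the model does not reproduce the single rounding of a real fused operation.

```cpp
#include <array>
#include <cstddef>

// Scalar model of an alternating multiply-add/sub: even lanes compute
// a*b - c, odd lanes compute a*b + c.  Illustrative only; a real FMA
// rounds once, whereas this model rounds the product separately.
template <std::size_t N>
std::array<float, N> fmaddsub_model(const std::array<float, N>& a,
                                    const std::array<float, N>& b,
                                    const std::array<float, N>& c) {
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    const float product = a[i] * b[i];
    r[i] = (i % 2 == 0) ? product - c[i] : product + c[i];
  }
  return r;
}

// (ar + ai*i) * (br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i,
// which maps onto one alternating operation over two lanes.
std::array<float, 2> complex_mul_model(float ar, float ai, float br, float bi) {
  const std::array<float, 2> a = {ar, ar};
  const std::array<float, 2> b = {br, bi};
  const std::array<float, 2> c = {ai * bi, ai * br};
  return fmaddsub_model<2>(a, b, c);
}
```

Lane 0 holds the real part and lane 1 the imaginary part, matching the layout of a two-element complex float.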

There is some bug on the slp cost computation side which causes it
not to count some scalar multiplication costs, but I think the patch
is desirable anyway before that is fixed and the testcase for now just
uses -fvect-cost-model=unlimited.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-12-13  Jakub Jelinek  

PR target/116979
* config/i386/mmx.md (vec_fmaddsubv2sf4, vec_fmsubaddv2sf4): New
define_expand patterns.

* gcc.target/i386/pr116979.c: New test.

--- gcc/config/i386/mmx.md.jj   2024-12-12 19:46:50.651306295 +0100
+++ gcc/config/i386/mmx.md  2024-12-12 20:15:39.502007436 +0100
@@ -1132,6 +1132,54 @@ (define_expand "vec_addsubv2sf3"
   DONE;
 })
 
+(define_expand "vec_fmaddsubv2sf4"
+  [(match_operand:V2SF 0 "register_operand")
+   (match_operand:V2SF 1 "nonimmediate_operand")
+   (match_operand:V2SF 2 "nonimmediate_operand")
+   (match_operand:V2SF 3 "nonimmediate_operand")]
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
+   && TARGET_MMX_WITH_SSE
+   && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V4SFmode);
+  rtx op2 = gen_reg_rtx (V4SFmode);
+  rtx op1 = gen_reg_rtx (V4SFmode);
+  rtx op0 = gen_reg_rtx (V4SFmode);
+
+  emit_insn (gen_movq_v2sf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v2sf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v2sf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_vec_fmaddsubv4sf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V2SFmode, op0, V4SFmode));
+  DONE;
+})
+
+(define_expand "vec_fmsubaddv2sf4"
+  [(match_operand:V2SF 0 "register_operand")
+   (match_operand:V2SF 1 "nonimmediate_operand")
+   (match_operand:V2SF 2 "nonimmediate_operand")
+   (match_operand:V2SF 3 "nonimmediate_operand")]
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
+   && TARGET_MMX_WITH_SSE
+   && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V4SFmode);
+  rtx op2 = gen_reg_rtx (V4SFmode);
+  rtx op1 = gen_reg_rtx (V4SFmode);
+  rtx op0 = gen_reg_rtx (V4SFmode);
+
+  emit_insn (gen_movq_v2sf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v2sf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v2sf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_vec_fmsubaddv4sf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V2SFmode, op0, V4SFmode));
+  DONE;
+})
+
 ;
 ;;
 ;; Parallel single-precision floating point comparisons
--- gcc/testsuite/gcc.target/i386/pr116979.c.jj 2024-12-12 20:19:18.179934902 
+0100
+++ gcc/testsuite/gcc.target/i386/pr116979.c2024-12-12 20:21:31.685059095 
+0100
@@ -0,0 +1,24 @@
+/* PR target/116979 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited" } */
+/* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)pd" } } */
+/* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)ps" { target lp64 } } 
} */
+
+struct S { __complex__ float f; };
+struct T { __complex__ double f; };
+
+struct S
+foo (const struct S *a, const struct S *b)
+{
+  struct S r;
+  r.f = a->f * b->f;
+  return r;
+}
+
+struct T
+bar (const struct T *a, const struct T *b)
+{
+  struct T r;
+  r.f = a->f * b->f;
+  return r;
+}

Jakub



[committed] testsuite: Fix typo in directive names

2024-12-13 Thread Jakub Jelinek
Hi!

Some directives in the test were #errror rather than #error.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to
trunk as obvious.

2024-12-13  Jakub Jelinek  

* c-c++-common/cpp/embed-1.c: Use #error rather than #errror.

--- gcc/testsuite/c-c++-common/cpp/embed-1.c.jj 2024-09-12 23:12:54.030424131 
+0200
+++ gcc/testsuite/c-c++-common/cpp/embed-1.c2024-12-12 22:18:33.800427075 
+0100
@@ -51,14 +51,14 @@
 #endif
 
 #if __has_embed ("embed-1.c" limit (0)) != __STDC_EMBED_EMPTY__
-#errror "__has_embed fail"
+#error "__has_embed fail"
 #endif
 
 #define E1 "embed-1.c"
 #define E2 limit (
 #define E3 1)
 #if __has_embed (E1 E2 E3) != __STDC_EMBED_FOUND__
-#errror "__has_embed fail"
+#error "__has_embed fail"
 #endif
 
 #if __has_embed () != __STDC_EMBED_NOT_FOUND__
@@ -77,19 +77,19 @@ E6 \
 #endif
 
 #if __has_embed ("embed-1.inc") != __STDC_EMBED_FOUND__
-#errror "__has_embed fail"
+#error "__has_embed fail"
 #endif
 
 #if __has_embed ( "embed-1.inc" __limit__ ( 7 - 7 ) ) != __STDC_EMBED_EMPTY__
-#errror "__has_embed fail"
+#error "__has_embed fail"
 #endif
 
 #if __has_embed () != __STDC_EMBED_FOUND__
-#errror "__has_embed fail"
+#error "__has_embed fail"
 #endif
 
 #if __has_embed ( limit(0)) != __STDC_EMBED_EMPTY__
-#errror "__has_embed fail"
+#error "__has_embed fail"
 #endif
 
 #if __has_embed ("../empty.h") != __STDC_EMBED_EMPTY__

Jakub



Re: [PATCH] i386: Add vec_fm{addsub,subadd}v2sf4 patterns [PR116979]

2024-12-13 Thread Uros Bizjak
On Fri, Dec 13, 2024 at 9:57 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, the addition of vec_addsubv2sf3 expander caused
> the testcase to be vectorized and no longer to use fma.
> The following patch adds new expanders so that it can be vectorized
> again with the alternating add/sub fma instructions.
>
> There is some bug on the slp cost computation side which causes it
> not to count some scalar multiplication costs, but I think the patch
> is desirable anyway before that is fixed and the testcase for now just
> uses -fvect-cost-model=unlimited.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-12-13  Jakub Jelinek  
>
> PR target/116979
> * config/i386/mmx.md (vec_fmaddsubv2sf4, vec_fmsubaddv2sf4): New
> define_expand patterns.
>
> * gcc.target/i386/pr116979.c: New test.

OK with a small test adjustment (scan string target).

Thanks,
Uros.

>
> --- gcc/config/i386/mmx.md.jj   2024-12-12 19:46:50.651306295 +0100
> +++ gcc/config/i386/mmx.md  2024-12-12 20:15:39.502007436 +0100
> @@ -1132,6 +1132,54 @@ (define_expand "vec_addsubv2sf3"
>DONE;
>  })
>
> +(define_expand "vec_fmaddsubv2sf4"
> +  [(match_operand:V2SF 0 "register_operand")
> +   (match_operand:V2SF 1 "nonimmediate_operand")
> +   (match_operand:V2SF 2 "nonimmediate_operand")
> +   (match_operand:V2SF 3 "nonimmediate_operand")]
> +  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
> +   && TARGET_MMX_WITH_SSE
> +   && ix86_partial_vec_fp_math"
> +{
> +  rtx op3 = gen_reg_rtx (V4SFmode);
> +  rtx op2 = gen_reg_rtx (V4SFmode);
> +  rtx op1 = gen_reg_rtx (V4SFmode);
> +  rtx op0 = gen_reg_rtx (V4SFmode);
> +
> +  emit_insn (gen_movq_v2sf_to_sse (op3, operands[3]));
> +  emit_insn (gen_movq_v2sf_to_sse (op2, operands[2]));
> +  emit_insn (gen_movq_v2sf_to_sse (op1, operands[1]));
> +
> +  emit_insn (gen_vec_fmaddsubv4sf4 (op0, op1, op2, op3));
> +
> +  emit_move_insn (operands[0], lowpart_subreg (V2SFmode, op0, V4SFmode));
> +  DONE;
> +})
> +
> +(define_expand "vec_fmsubaddv2sf4"
> +  [(match_operand:V2SF 0 "register_operand")
> +   (match_operand:V2SF 1 "nonimmediate_operand")
> +   (match_operand:V2SF 2 "nonimmediate_operand")
> +   (match_operand:V2SF 3 "nonimmediate_operand")]
> +  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
> +   && TARGET_MMX_WITH_SSE
> +   && ix86_partial_vec_fp_math"
> +{
> +  rtx op3 = gen_reg_rtx (V4SFmode);
> +  rtx op2 = gen_reg_rtx (V4SFmode);
> +  rtx op1 = gen_reg_rtx (V4SFmode);
> +  rtx op0 = gen_reg_rtx (V4SFmode);
> +
> +  emit_insn (gen_movq_v2sf_to_sse (op3, operands[3]));
> +  emit_insn (gen_movq_v2sf_to_sse (op2, operands[2]));
> +  emit_insn (gen_movq_v2sf_to_sse (op1, operands[1]));
> +
> +  emit_insn (gen_vec_fmsubaddv4sf4 (op0, op1, op2, op3));
> +
> +  emit_move_insn (operands[0], lowpart_subreg (V2SFmode, op0, V4SFmode));
> +  DONE;
> +})
> +
>  ;
>  ;;
>  ;; Parallel single-precision floating point comparisons
> --- gcc/testsuite/gcc.target/i386/pr116979.c.jj 2024-12-12 20:19:18.179934902 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr116979.c2024-12-12 20:21:31.685059095 
> +0100
> @@ -0,0 +1,24 @@
> +/* PR target/116979 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited" } */
> +/* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)pd" } } */
> +/* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)ps" { target lp64 } 
> } } */

/* { dg-final { scan-assembler "..." { target { ! ia32 } } } } */

x32 is TARGET_MMX_WITH_SSE ilp32 target and is able to vectorize the
testcase as well.

> +
> +struct S { __complex__ float f; };
> +struct T { __complex__ double f; };
> +
> +struct S
> +foo (const struct S *a, const struct S *b)
> +{
> +  struct S r;
> +  r.f = a->f * b->f;
> +  return r;
> +}
> +
> +struct T
> +bar (const struct T *a, const struct T *b)
> +{
> +  struct T r;
> +  r.f = a->f * b->f;
> +  return r;
> +}
>
> Jakub
>


Re: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types

2024-12-13 Thread Jennifer Schmitz


> On 11 Dec 2024, at 13:01, Richard Sandiford  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Jennifer Schmitz  writes:
>> As follow-up to
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>> this patch implements folding of svmul by -1 to svneg for
>> unsigned SVE vector types. The key idea is to reuse the existing code that
>> does this fold for signed types and feed it as callback to a helper function
>> that adds the necessary type conversions.
>> 
>> For example, for the test case
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>  return svmul_n_u64_x (pg, x, -1);
>> }
>> 
>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>  svint64_t D.12921;
>>  svint64_t D.12920;
>>  svuint64_t D.12919;
>> 
>>  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>  D.12921 = svneg_s64_x (pg, D.12920);
>>  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>  goto ;
>>  :
>>  return D.12919;
>> }
>> 
>> In general, the new helper gimple_folder::convert_and_fold
>> - takes a target type and a function pointer,
>> - converts the lhs and all non-boolean vector types to the target type,
>> - passes the converted lhs and arguments to the callback,
>> - receives the new gimple statement from the callback function,
>> - adds the necessary view converts to the gimple sequence,
>> - and returns the new call.
>> 
>> Because all arguments are converted to the same target types, the helper
>> function is only suitable for folding calls whose arguments are all of
>> the same type. If necessary, this could be extended to convert the
>> arguments to different types differentially.
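
The convert-then-fold idea described above can be sketched in plain C++ on ordinary vectors. This is only an analogy under stated assumptions: `view_convert`, `neg_signed`, `convert_and_fold` and `mul_by_minus_one` are hypothetical names for this sketch, not the GCC functions, and predication is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

using SignedVec = std::vector<std::int64_t>;
using UnsignedVec = std::vector<std::uint64_t>;

// Bit-preserving reinterpretation, the analogue of a VIEW_CONVERT_EXPR.
template <typename To, typename From>
std::vector<To> view_convert(const std::vector<From>& in) {
  static_assert(sizeof(To) == sizeof(From), "element sizes must match");
  std::vector<To> out(in.size());
  if (!in.empty())
    std::memcpy(out.data(), in.data(), in.size() * sizeof(From));
  return out;
}

// Wrapping signed negation.  It is computed on the bit pattern in unsigned
// arithmetic so that negating INT64_MIN is well defined in C++.
SignedVec neg_signed(const SignedVec& x) {
  SignedVec r(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    std::uint64_t bits;
    std::memcpy(&bits, &x[i], sizeof bits);
    bits = 0 - bits;
    std::memcpy(&r[i], &bits, sizeof bits);
  }
  return r;
}

// Convert the operand to the signed view, run the signed fold, convert back.
UnsignedVec convert_and_fold(
    const UnsignedVec& x,
    const std::function<SignedVec(const SignedVec&)>& fold) {
  return view_convert<std::uint64_t>(fold(view_convert<std::int64_t>(x)));
}

// Unsigned multiplication by -1 (i.e. by UINT64_MAX) expressed as a negation.
UnsignedVec mul_by_minus_one(const UnsignedVec& x) {
  return convert_and_fold(x, neg_signed);
}
```

The fold is valid because multiplication by -1 and negation agree modulo 2^64, so the signedness of the element type does not affect the resulting bits.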
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/ChangeLog:
>> 
>>  * config/aarch64/aarch64-sve-builtins-base.cc
>>  (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>  function and pass to gimple_folder::convert_and_fold to enable
>>  the transform for unsigned types.
>>  * config/aarch64/aarch64-sve-builtins.cc
>>  (gimple_folder::convert_and_fold): New function that converts
>>  operands to target type before calling callback function, adding the
>>  necessary conversion statements.
>>  * config/aarch64/aarch64-sve-builtins.h
>>  (gimple_folder::convert_and_fold): Declare function.
>>  (signed_type_suffix_index): Return type_suffix_index of signed
>>  vector type for given width.
>>  (function_instance::signed_type): Return signed vector type for
>>  given width.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>  expected outcome.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc  | 70 +--
>> gcc/config/aarch64/aarch64-sve-builtins.cc| 43 
>> gcc/config/aarch64/aarch64-sve-builtins.h | 31 
>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>> 7 files changed, 153 insertions(+), 34 deletions(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 87e9909b55a..52401a8c57a 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -2092,33 +2092,61 @@ public:
>>   return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>> 
>> /* If one of the operands is all integer -1, fold to svneg.  */
>> -tree pg = gimple_call_arg (f.call, 0);
>> -tree negated_op = NULL;
>> -if (integer_minus_onep (op2))
>> -  negated_op = op1;
>> -else if (integer_minus_onep (op1))
>> -  negated_op = op2;
>> -if (!f.type_suffix (0).unsigned_p && negated_op)
>> +if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>   {
>> - function_instance instance ("svneg", functions::svneg,
>> - shapes::unary, MODE_none,
>> - f.type_suffix_ids, GROUP_none, f.pred);
>> - gcall *call = f.redirect_call (instance);
>> - unsigned offset_index = 0;
>> - if (f.pred == PRED_m)
>> + auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
>> + vec &args_conv) -> gimple *
>>{
>> - offset_index = 1;
>> - gimple_call_set_arg (call, 0, op1);
>> -   }
>> - else
>> -   gimple_set_num_ops (call, 5);
>> - gimple_call_set_arg (call, offset_index, pg);
>> - gimple_call_set_arg (call, offset_index + 1, negated_op);

[Fortran, Patch, PR114612, v1] Fix missing deep-copy for allocatable components of derived types having cycles.

2024-12-13 Thread Andre Vehreschild
Hi all,

attached patch fixes deep-copying (or rather its former absence) for
allocatable components of derived types having cyclic dependencies.
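
To make the expected semantics concrete, here is a small C++ analogy of a self-referential type with an owning component. It is a sketch of what "deep copy" means for `b = a`, not a description of how gfortran generates code; `Node` and `clone` are names chosen for this example.

```cpp
#include <memory>

// C++ analogue of a derived type with an allocatable component of its own
// type.  Copying must duplicate the owned node rather than share it.
struct Node {
  int val = 0;
  std::unique_ptr<Node> next;

  Node() = default;
  Node(const Node& other) : val(other.val), next(clone(other.next)) {}
  Node& operator=(const Node& other) {
    if (this != &other) {
      val = other.val;
      next = clone(other.next);
    }
    return *this;
  }

 private:
  // Recursively copy the chain; an unallocated component stays unallocated.
  static std::unique_ptr<Node> clone(const std::unique_ptr<Node>& p) {
    if (!p) return nullptr;
    return std::make_unique<Node>(*p);
  }
};
```

After `b = a`, changing or releasing `b.next` must leave `a.next` untouched, which is the property the new test checks with `loc` and `allocated`.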

Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 4721060d14920335c1b50816d93196c847064ebe Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 13 Dec 2024 12:07:01 +0100
Subject: [PATCH] Fortran: Ensure deep copy of allocatable components in cyclic
 types [PR114612]

gcc/fortran/ChangeLog:

	PR fortran/114612

	* trans-array.cc (structure_alloc_comps): Ensure deep copy is
	also done for types having cycles.

gcc/testsuite/ChangeLog:

	* gfortran.dg/alloc_comp_deep_copy_4.f03: New test.
---
 gcc/fortran/trans-array.cc|  7 ++---
 .../gfortran.dg/alloc_comp_deep_copy_4.f03| 29 +++
 2 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 366127d5651..bec14ec254c 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -10583,10 +10583,9 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, tree dest,
 	   false, false, NULL_TREE, NULL_TREE);
 	  gfc_add_expr_to_block (&fnblock, tmp);
 	}
-	  else if ((c->attr.allocatable)
-		&& !c->attr.proc_pointer && !same_type
-		&& (!(cmp_has_alloc_comps && c->as) || c->attr.codimension
-			|| caf_in_coarray (caf_mode)))
+	  else if (c->attr.allocatable && !c->attr.proc_pointer
+		   && (!(cmp_has_alloc_comps && c->as) || c->attr.codimension
+		   || caf_in_coarray (caf_mode)))
 	{
 	  rank = c->as ? c->as->rank : 0;
 	  if (c->attr.codimension)
diff --git a/gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03 b/gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03
new file mode 100644
index 000..3c445be032f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03
@@ -0,0 +1,29 @@
+!{ dg-do run }
+!
+! Contributed Vladimir Terzi  
+! Check that deep-copy for b=a works.
+
+program pr114672
+type node
+integer::val
+type(node),allocatable::next
+end type
+
+type(node)::a,b
+
+allocate(a%next)
+a%val=1
+a%next%val=2
+!print*,a%val,a%next%val
+b=a
+b%val=3
+b%next%val=4
+if (loc(b) == loc(a)) stop 1
+if (loc(b%next) == loc(a%next)) stop 2
+!print*,a%val,a%next%val
+deallocate(b%next)
+if (.NOT. allocated(a%next)) stop 3
+!print*,a%val,a%next%val
+deallocate(a%next)
+end
+
--
2.47.1



[PATCH v2] c++: Disallow decomposition of lambda bases [PR90321]

2024-12-13 Thread Nathaniel Shead
On Thu, Nov 21, 2024 at 04:01:02PM -0500, Marek Polacek wrote:
> On Thu, Nov 07, 2024 at 09:48:52PM +1100, Nathaniel Shead wrote:
> > Bootstrapped and lightly regtested on x86_64-pc-linux-gnu (so far just
> > dg.exp), OK for trunk if full regtest succeeds?
> > 
> > -- >8 --
> > 
> > Decomposition of lambda closure types is not allowed by
> > [dcl.struct.bind] p6, since members of a closure have no name.
> > 
> > r244909 made this an error, but missed the case where a lambda is used
> > as a base.  This patch moves the check to find_decomp_class_base to
> > handle this case.
> > 
> > As a drive-by improvement, we also slightly improve the diagnostics to
> > indicate why a base class was being inspected.  Ideally the diagnostic
> > would point directly at the relevant base, but there doesn't seem to be
> > an easy way to get this location just from the binfo so I don't worry
> > about that here.
> > 
> > PR c++/90321
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl.cc (find_decomp_class_base): Check for decomposing a
> > lambda closure type.  Report base class chains if needed.
> > (cp_finish_decomp): Remove no-longer-needed check.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1z/decomp62.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >  gcc/cp/decl.cc| 20 ++--
> >  gcc/testsuite/g++.dg/cpp1z/decomp62.C | 12 
> >  2 files changed, 26 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp62.C
> > 
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index 0e4533c6fab..87480dca1ac 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -9268,6 +9268,14 @@ cp_finish_decl (tree decl, tree init, bool 
> > init_const_expr_p,
> >  static tree
> >  find_decomp_class_base (location_t loc, tree type, tree ret)
> >  {
> > +  if (LAMBDA_TYPE_P (type))
> > +{
> 
> Missing auto_diagnostic_group d; here?
> 

Thanks, fixed.

> > +  error_at (loc, "cannot decompose lambda closure type %qT", type);
> > +  inform (DECL_SOURCE_LOCATION (TYPE_NAME (type)),
> > + "lambda declared here");
> > +  return error_mark_node;
> > +}
> > +
> >bool member_seen = false;
> >for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> >  if (TREE_CODE (field) != FIELD_DECL
> > @@ -9310,9 +9318,14 @@ find_decomp_class_base (location_t loc, tree type, 
> > tree ret)
> >for (binfo = TYPE_BINFO (type), i = 0;
> > BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
> >  {
> > +  auto_diagnostic_group d;
> >tree t = find_decomp_class_base (loc, TREE_TYPE (base_binfo), ret);
> >if (t == error_mark_node)
> > -   return error_mark_node;
> > +   {
> > + inform (DECL_SOURCE_LOCATION (TYPE_NAME (type)),
> 
> location_of might be nicer.
> 

Yeah, I agree, thanks.  Here's an updated version of the patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Decomposition of lambda closure types is not allowed by
[dcl.struct.bind] p6, since members of a closure have no name.

r244909 made this an error, but missed the case where a lambda is used
as a base.  This patch moves the check to find_decomp_class_base to
handle this case.
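
A small C++17 sketch of the situation, for illustration only: the names are made up for this example, and the rejected declaration is left in a comment so that the snippet still compiles.

```cpp
// Decomposing a class whose data members all live in one ordinary base
// class is fine.
struct Base { int x; int y; };
struct Derived : Base {};

// A closure type can also be used as a base class.  Its captures are
// unnamed members, so there is nothing for a structured binding to name.
auto closure = [a = 1, b = 2] { return a + b; };
struct FromClosure : decltype(closure) {};

int sum_of_bindings() {
  Derived d{{3, 4}};
  auto [x, y] = d;  // OK: binds to Base::x and Base::y
  return x + y;
}

// The case this patch is meant to diagnose (kept as a comment):
//   auto [p, q] = FromClosure{closure};
```

With the patch applied, the commented-out declaration is expected to get the "cannot decompose lambda closure type" error, plus a note explaining which base was being inspected.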

As a drive-by improvement, we also slightly improve the diagnostics to
indicate why a base class was being inspected.  Ideally the diagnostic
would point directly at the relevant base, but there doesn't seem to be
an easy way to get this location just from the binfo so I don't worry
about that here.

PR c++/90321

gcc/cp/ChangeLog:

* decl.cc (find_decomp_class_base): Check for decomposing a
lambda closure type.  Report base class chains if needed.
(cp_finish_decomp): Remove no-longer-needed check.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/decomp62.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Marek Polacek 
---
 gcc/cp/decl.cc| 19 +--
 gcc/testsuite/g++.dg/cpp1z/decomp62.C | 12 
 2 files changed, 25 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp62.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4ba6e3784ca..a1b9957a9be 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -9405,6 +9405,14 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
 static tree
 find_decomp_class_base (location_t loc, tree type, tree ret)
 {
+  if (LAMBDA_TYPE_P (type))
+{
+  auto_diagnostic_group d;
+  error_at (loc, "cannot decompose lambda closure type %qT", type);
+  inform (location_of (type), "lambda declared here");
+  return error_mark_node;
+}
+
   bool member_seen = false;
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 if (TREE_CODE (field) != FIELD_DECL
@@ -9447,9 +9455,13 @@ find_decomp_class_base (location_t loc, tree type, tree 
ret)
   for (binfo = TYPE_BINFO (type), i = 0;
BINF

Re: [PATCH] replace atoi with strtoul in opts.cc, lto-wrapper.c, lto/lto.c [PR114542]

2024-12-13 Thread Heiko Eißfeldt

I think I overlooked something, so please ignore that patch. I will
prepare a v2 soon.

On 12/9/24 8:13 PM, Heiko Eißfeldt wrote:

Straightforward replacements of atoi() with strtoul() in order to avoid UB
and detect invalid argument values.
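
As a rough illustration of the idea (not the code from the patch), a
checked conversion built on strtoul could look like the sketch below.
The helper name parse_unsigned and its exact rejection rules are
assumptions made for this example only.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse a non-negative decimal integer.  Returns
// false on empty input, a leading sign or whitespace, trailing junk, or
// a value that does not fit in unsigned long.
inline bool parse_unsigned (const char *arg, unsigned long *out)
{
  if (arg == nullptr || *arg < '0' || *arg > '9')
    return false;   // strtoul itself would accept "-1" or " 7"

  errno = 0;
  char *end = nullptr;
  unsigned long value = std::strtoul (arg, &end, 10);

  if (errno == ERANGE)   // out of range: strtoul returned ULONG_MAX
    return false;
  if (*end != '\0')      // trailing characters after the digits
    return false;

  *out = value;
  return true;
}
```

Unlike atoi, whose behavior is undefined when the value is out of
range, this reports every failure to the caller, who can then issue a
diagnostic.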

Tested with x86_64-pc-linux-gnu.
2024-12-09 Heiko Eißfeldt 

    PR lto/114542
    * lto-wrapper.cc (run_gcc): Use strtoul with ERANGE check
    instead of atoi.
    * lto/lto.cc (do_whole_program_analysis): Ditto.
    * opts.cc (common_handle_option): Ditto.

    * gcc.dg/pr114542.c: New test case.




Re: [PATCH] c++: Only prune capture proxies for constant variables at instantiation time [PR114292]

2024-12-13 Thread Simon Martin
Hi Marek,

On 13 Dec 2024, at 0:44, Marek Polacek wrote:

> On Thu, Dec 12, 2024 at 07:07:38PM +, Simon Martin wrote:
>> We currently ICE upon the following valid (under -Wno-vla) code
>>
>> === cut here ===
>> void f(int c) {
>>   constexpr int r = 4;
>>   [&](auto) { int t[r * c]; }(0);
>> }
>> === cut here ===
>>
>> The problem is that when parsing the lambda body, and more specifically
>> the multiplication, we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED
>> even though the replacement of r by 4 is "undone" by the call to
>> build_min_non_dep in build_x_binary_op. This makes prune_lambda_captures
>
> Ah yeah, because build_min_non_dep gets the original operands.
>
>> remove the proxy declaration while it should not, and we trip on an
>> assert at instantiation time.
>>
>> This patch fixes the ICE by making sure that lambdas are only marked as
>> LAMBDA_EXPR_CAPTURE_OPTIMIZED when they're instantiated (I tried other
>> strategies like not undoing constant folding in build_min_non_dep, but
>> it is pretty intrusive and breaks lots of things).
>
> I've tried that too and I also ran into a number of issues.  I also tried
> checking p_t_d in prune_lambda_capture since it already says "Don't bother
> pruning in a template" but that doesn't work, either.
Yeah, the fundamental “problem” is that for lambdas that are not 
within a template, we generate the closure before instantiating the 
lambda function, so prune_lambda_capture thinks (without my patch) that 
captures to constants have been folded out, which might be the case 
(e.g. with ok_3 in the test I added) or not (e.g. in this PR) depending 
on whether they’re part of an expression for which build_min_non_dep 
has been called.
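
To make the two situations concrete, here is a small sketch (the
function names are hypothetical and unrelated to the ok_N functions in
the test).  In the first lambda only the value of r is read, so an
implementation may fold it to 4 and drop the capture; in the second the
address of r is taken, which odr-uses it, so the capture has to stay.

```cpp
#include <cassert>

// Only the value of the constant is needed inside the lambda, so the
// compiler is free to replace r by 4 and not keep a capture for it.
inline int scaled (int c)
{
  constexpr int r = 4;
  auto times_r = [&] (auto x) { return r * x; };
  return times_r (c);
}

// Taking the address of r odr-uses it, so a capture is required here
// and must not be pruned.
inline int through_pointer (int c)
{
  constexpr int r = 4;
  auto times_r = [&] (auto x) { const int *p = &r; return *p * x; };
  return times_r (c);
}
```

Both functions return the same value; the difference is only in whether
the closure needs to refer to r at all.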

>> The test I added also shows that we don't always optimize out captures
>> to constants for lambdas that are not within a template (see ok_2 for
>> example, or ok_3 that unlike ok_2 "regresses" a bit with my patch) - I'm
>> curious if we consider it a problem or not? If so, I can try to fix this
>> in a follow-up patch.
>
> Since "P0588R1: Simplifying implicit lambda capture" they are captures,
> though [expr.prim.lambda.capture] gives us license to optimize the
> captures if not ODR-used.  I couldn't find a test where this patch
> would be an ABI change.
Thanks for the pointer! Yeah, whether we optimise out the captures or
not remains legit.

And if you look specifically at ok_3 from the test case (without the
patch, the capture is optimised out; with it, it no longer is):
  - I don’t think it’s an ABI change since we don’t do anything
with captures in mangle.cc.
  - Since captures are only relevant within the function calling them,
we don’t have any issue if we link together object files generated
without the patch and object files generated with the patch.

>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/114292
>>
>> gcc/cp/ChangeLog:
>>
>>  * cp-tree.h (mark_const_var_capture_optimized): Declare.
>>  * expr.cc (mark_use): Call mark_const_var_capture_optimized.
>>  * lambda.cc (mark_const_var_capture_optimized): New. Only set
>>  LAMBDA_EXPR_CAPTURE_OPTIMIZED at lambda instantiation time.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/cpp1y/lambda-ice4.C: New test.
>>
>> ---
>>  gcc/cp/cp-tree.h |  1 +
>>  gcc/cp/expr.cc   | 10 ++
>>  gcc/cp/lambda.cc | 13 +++
>>  gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C | 44 
>> 
>>  4 files changed, 60 insertions(+), 8 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
>>
>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>> index c5e0fc5c440..ce050032fdb 100644
>> --- a/gcc/cp/cp-tree.h
>> +++ b/gcc/cp/cp-tree.h
>> @@ -8058,6 +8058,7 @@ extern bool is_constant_capture_proxy   (tree);
>>  extern void register_capture_members(tree);
>>  extern tree lambda_expr_this_capture(tree, int);
>>  extern void maybe_generic_this_capture  (tree, tree);
>> +extern void mark_const_var_capture_optimized(void);
>>  extern tree maybe_resolve_dummy (tree, bool);
>>  extern tree current_nonlambda_function  (void);
>>  extern tree nonlambda_method_basetype   (void);
>> diff --git a/gcc/cp/expr.cc b/gcc/cp/expr.cc
>> index de4991e616c..d6a2454c46e 100644
>> --- a/gcc/cp/expr.cc
>> +++ b/gcc/cp/expr.cc
>> @@ -120,10 +120,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,

>>  {
>>tree val = RECUR (cap);
>>if (!is_capture_proxy (val))
>> -{
>> -  tree l = current_lambda_expr ();
>> -  LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>> -}
>> +mark_const_var_capture_optimized ();
>>return val;
>>  }
>>  }
>> @@ -171,10 +168,7 @@ mark_use (tree expr, bool rvalue_p, bool read_

[PATCH 1/2] RISC-V: Update Xsfvfnrclip implementation.

2024-12-13 Thread Jiawei
gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (expand_floattype): New func.
(main):
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_XFQF_OPS):
(vint8mf8_t):
(vint8mf4_t):
(vint8mf2_t):
(vint8m1_t):
(vint8m2_t):
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_XFQF_OPS):
(rvv_arg_type_info::get_xfqf_float_type):
* config/riscv/riscv-vector-builtins.def (xfqf_vector):
(xfqf_float):
* config/riscv/riscv-vector-builtins.h (struct rvv_arg_type_info):
* config/riscv/sifive-vector.md:
* config/riscv/vector-iterators.md:

---
 gcc/config/riscv/genrvv-type-indexer.cc   | 17 ++
 .../riscv/riscv-vector-builtins-types.def | 13 
 gcc/config/riscv/riscv-vector-builtins.cc | 33 +++
 gcc/config/riscv/riscv-vector-builtins.def|  4 ++-
 gcc/config/riscv/riscv-vector-builtins.h  |  1 +
 gcc/config/riscv/sifive-vector.md | 10 +++---
 gcc/config/riscv/vector-iterators.md  | 25 +++---
 7 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc
index e1eee34237a..a2974269adc 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -164,6 +164,18 @@ floattype (unsigned sew, int lmul_log2)
   return mode.str ();
 }
 
+std::string
+expand_floattype (unsigned sew, int lmul_log2, unsigned nf)
+{
+  if (sew != 8 || nf != 1
+  || (!valid_type (sew * 4, lmul_log2 + 2, /*float_t*/ true)))
+return "INVALID";
+
+  std::stringstream mode;
+  mode << "vfloat" << sew * 4 << to_lmul (lmul_log2 + 2) << "_t";
+  return mode.str ();
+}
+
 std::string
 floattype (unsigned sew, int lmul_log2, unsigned nf)
 {
@@ -276,6 +288,7 @@ main (int argc, const char **argv)
   fprintf (fp, "  /*QLMUL1*/ INVALID,\n");
   fprintf (fp, "  /*QLMUL1_SIGNED*/ INVALID,\n");
   fprintf (fp, "  /*QLMUL1_UNSIGNED*/ INVALID,\n");
+  fprintf (fp, "  /*XFQF*/ INVALID,\n");
   for (unsigned eew : {8, 16, 32, 64})
fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
 
@@ -384,6 +397,8 @@ main (int argc, const char **argv)
 inttype (8, /*lmul_log2*/ 0, false).c_str ());
fprintf (fp, "  /*QLMUL1_UNSIGNED*/ %s,\n",
 inttype (8, /*lmul_log2*/ 0, true).c_str ());
+   fprintf (fp, "  /*XFQF*/ %s,\n",
+expand_floattype (sew, lmul_log2, nf).c_str ());
for (unsigned eew : {8, 16, 32, 64})
  {
if (eew == sew)
@@ -473,6 +488,7 @@ main (int argc, const char **argv)
 bfloat16_wide_type (/*lmul_log2*/ 0).c_str ());
fprintf (fp, "  /*QLMUL1_SIGNED*/ INVALID,\n");
fprintf (fp, "  /*QLMUL1_UNSIGNED*/ INVALID,\n");
+   fprintf (fp, "  /*XFQF*/ INVALID,\n");
for (unsigned eew : {8, 16, 32, 64})
  fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
 
@@ -558,6 +574,7 @@ main (int argc, const char **argv)
   floattype (sew / 4, /*lmul_log2*/ 0).c_str ());
  fprintf (fp, "  /*QLMUL1_SIGNED*/ INVALID,\n");
  fprintf (fp, "  /*QLMUL1_UNSIGNED*/ INVALID,\n");
+ fprintf (fp, "  /*XFQF*/ INVALID,\n");
  for (unsigned eew : {8, 16, 32, 64})
fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index 96412bfd1a5..df55b6a8823 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -363,6 +363,12 @@ along with GCC; see the file COPYING3. If not see
 #define DEF_RVV_QMACC_OPS(TYPE, REQUIRE)
 #endif
 
+/* Use "DEF_RVV_XFQF_OPS" macro include signed integer which will
+   be iterated and registered as intrinsic functions.  */
+#ifndef DEF_RVV_XFQF_OPS
+#define DEF_RVV_XFQF_OPS(TYPE, REQUIRE)
+#endif
+
 DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint8mf4_t, 0)
 DEF_RVV_I_OPS (vint8mf2_t, 0)
@@ -1451,6 +1457,12 @@ DEF_RVV_QMACC_OPS (vint32m2_t, 0)
 DEF_RVV_QMACC_OPS (vint32m4_t, 0)
 DEF_RVV_QMACC_OPS (vint32m8_t, 0)
 
+DEF_RVV_XFQF_OPS (vint8mf8_t, 0)
+DEF_RVV_XFQF_OPS (vint8mf4_t, 0)
+DEF_RVV_XFQF_OPS (vint8mf2_t, 0)
+DEF_RVV_XFQF_OPS (vint8m1_t, 0)
+DEF_RVV_XFQF_OPS (vint8m2_t, 0)
+
 #undef DEF_RVV_I_OPS
 #undef DEF_RVV_U_OPS
 #undef DEF_RVV_F_OPS
@@ -1506,3 +1518,4 @@ DEF_RVV_QMACC_OPS (vint32m8_t, 0)
 #undef DEF_RVV_CRYPTO_SEW64_OPS
 #undef DEF_RVV_F32_OPS
 #undef DEF_RVV_QMACC_OPS
+#undef DEF_RVV_XFQF_OPS
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index b9b9d33adab..37c9f71fa85 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -551,6 +551,12 @@ static const rvv_type_info qmacc_ops[]

Re: [Patch] C++: reject OpenMP directives in constexpr functions

2024-12-13 Thread Jakub Jelinek
On Fri, Dec 13, 2024 at 01:15:36PM +0100, Tobias Burnus wrote:
> OpenMP states for C++:
> 
> "Directives may not appear in constexpr functions or in constant expressions."
> 
> There is some support for this already in GCC, but not for [[omp::decl]]-type
> of directives and it also doesn't work that well. For example, for the
> newly added testcase, the result with the patch is simple and clear:
> 
> error: OpenMP directives may not appear in ‘constexpr’ functions
> 
> without the patch:
> 
> error: uninitialized variable ‘i’ in ‘constexpr’ function
> error: uninitialized variable ‘i’ in ‘constexpr’ function
> sorry, unimplemented: ‘#pragma omp allocate’ not yet supported
> sorry, unimplemented: ‘#pragma omp allocate’ not yet supported
> error: ‘constexpr int f()’ called in a constant expression
> error: ‘constexpr int g()’ called in a constant expression
> 
> Note: I think OpenACC has a similar issue but as the specification
> is silent about it, the patch only handles OpenMP.
> 
> * * *
> 
> I have not touched the 'case OMP_...:' in constexpr.cc, added in
> previous patches; in principle, those should be now unreachable
> and could be removed.
> I also have not included any OpenACC pragmas, even though they have
> the same issue. (However, contrary to OpenMP, the OpenACC spec is
> silent about constexpr.)
> 
> * * *
> 
> Comments, suggestions, concerns?

LGTM, though I think the restriction goes against the direction of C++
development over recent years.
Pretty much no constructs are disallowed outright in constexpr functions
anymore; a few exceptions are merely still disallowed when actually
encountered during constant evaluation.

So, I think the above restriction should be replaced for OpenMP 6.1 or
whatever version is planned next.
Generally, we want to allow something like
constexpr
type foo (...)
{
  if consteval {
...
  } else {
#pragma omp ...
...
  }
}
or even just if (some_cond) { #pragma omp ... } in there,
as long as some_cond doesn't evaluate to true at compile time.
So, similar to how more recent C++ versions handle, say, asm statements
in constexpr functions (or, before C++26, e.g. throwing exceptions):
if you encounter it during evaluation, it is not a constant expression;
if you don't, it is not a bug.
And, I don't see why declarative directives should be a problem in constexpr
functions.
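
The exception case mentioned above can be sketched as follows; the
function checked_div is a hypothetical example, not something from the
patch.  The throw is permitted in the constexpr function and only
prevents constant evaluation when it is actually reached.

```cpp
#include <cassert>
#include <stdexcept>

constexpr int checked_div (int a, int b)
{
  if (b == 0)
    throw std::invalid_argument ("division by zero");
  return a / b;
}

// The throw is never reached for these arguments, so the call is a
// constant expression.
static_assert (checked_div (6, 3) == 2, "evaluated at compile time");

// checked_div (1, 0) is still a valid call, but only at run time; using
// it where a constant expression is required would be rejected.
```

The suggestion is that an OpenMP directive in a constexpr function could
eventually be treated the same way.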

Jakub



[PATCH 2/2] RISC-V: Update Xsfvqmacc and Xsfvfnrclip's testcases

2024-12-13 Thread Jiawei
From: Liao Shihua 

---
 gcc/config/riscv/vector.md|  7 ++-
 .../riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c  | 60 ++
 .../riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c | 63 ++-
 .../riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c | 16 +
 .../riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c | 16 +
 .../riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c   | 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c   | 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c| 16 +
 .../riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c| 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c   | 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c   | 17 +
 11 files changed, 259 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 58406f3d17c..d24916d2caf 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -56,7 +56,8 @@
  
vssegtux,vssegtox,vlsegdff,vandn,vbrev,vbrev8,vrev8,vcpop,vclz,vctz,vrol,\
  
vror,vwsll,vclmul,vclmulh,vghsh,vgmul,vaesef,vaesem,vaesdf,vaesdm,\
  
vaeskf1,vaeskf2,vaesz,vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c,\
- vfncvtbf16,vfwcvtbf16,vfwmaccbf16")
+ vfncvtbf16,vfwcvtbf16,vfwmaccbf16,\
+ sf_vqmacc,sf_vfnrclip")
 (const_string "true")]
(const_string "false")))
 
@@ -893,7 +894,7 @@
  
vfredo,vfwredu,vfwredo,vslideup,vslidedown,vislide1up,\
  
vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
  
vlsegds,vlsegdux,vlsegdox,vandn,vrol,vror,vwsll,vclmul,vclmulh,\
- vfwmaccbf16")
+ vfwmaccbf16,sf_vqmacc,sf_vfnrclip")
   (symbol_ref "riscv_vector::get_ta(operands[6])")
 
 (eq_attr "type" "vimuladd,vfmuladd")
@@ -924,7 +925,7 @@
  vfwalu,vfwmul,vfsgnj,vfcmp,vslideup,vslidedown,\
  
vislide1up,vislide1down,vfslide1up,vfslide1down,vgather,\
  
viwmuladd,vfwmuladd,vlsegds,vlsegdux,vlsegdox,vandn,vrol,\
-  vror,vwsll,vclmul,vclmulh,vfwmaccbf16")
+ 
vror,vwsll,vclmul,vclmulh,vfwmaccbf16,sf_vqmacc,sf_vfnrclip")
   (symbol_ref "riscv_vector::get_ma(operands[7])")
 
 (eq_attr "type" "vimuladd,vfmuladd")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
index 813f7860f64..a4193b5aea9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
@@ -7,6 +7,7 @@
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf8+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -17,6 +18,7 @@ vint8mf8_t test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t(vfloat32mf2_t vs2, float rs1
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf4_vint8mf4_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf4+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -27,6 +29,7 @@ vint8mf4_t test_sf_vfnrclip_x_f_qf_i8mf4_vint8mf4_t(vfloat32m1_t vs2, float rs1,
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf2_vint8mf2_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf2+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -37,6 +40,7 @@ vint8mf2_t test_sf_vfnrclip_x_f_qf_i8mf2_vint8mf2_t(vfloat32m2_t vs2, float rs1,
 /*
 ** test_sf_vfnrclip_x_f_qf_i8m1_vint8m1_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,m1+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -47,6 +51,7 @@ vint8m1_t test_sf_vfnrclip_x_f_qf_i8m1_vint8m1_t(vfloat32m4_t vs2, float rs1, si
 /*
 ** test_sf_vfnrclip_x_f_qf_i8m2_vint8m2_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,m2+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -57,6 +62,7 @@ vint8m2_t test_sf_vfnrclip_x_f_qf_i8m2_vint8m2_t(vfloat32m8_t vs2, float rs1, si
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf8_m_vint8mf8_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf8+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
 ** ...
 */
@@ -67,6 +73,7 @@ vint8mf8_t test_sf_vfnrclip_x_f_qf_i8mf8_m_vint8mf8_t(vbool64_t mask, vfloat32mf
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf4_m_vint8mf4_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf4+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
 ** ...
 */
@@ -77,6 +84,7 @@ vint8mf4_t test_sf_vfnrclip_x_f_qf_i8mf4_m_vint8mf4_t(vbool32_t mask, vfloat32m1
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf2_m_vint8mf2_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf2+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
 ** ...
 */
@@ -87,6 +95,7 @@ vint8mf2_t test_sf_vfnrclip_x_f_qf_i8mf2_m_vint8mf2_t(vbool16_t mask, vfloat32m2
 /*
 ** test_sf_vfnrclip_x_

[PATCH v3 2/2] RISC-V: Update Xsfvqmacc and Xsfvfnrclip's testcases

2024-12-13 Thread Jiawei
From: Liao Shihua 

Update Sifive Xsfvqmacc and Xsfvfnrclip extension's testcases.

Version log:
Synchronize LMUL settings with the return type.

gcc/ChangeLog:

* config/riscv/vector.md: New attr set.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c: Add vsetivli 
checking.
* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: Ditto.

---
 gcc/config/riscv/vector.md|  7 ++-
 .../riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c  | 60 ++
 .../riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c | 63 ++-
 .../riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c | 16 +
 .../riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c | 16 +
 .../riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c   | 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c   | 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c| 16 +
 .../riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c| 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c   | 17 +
 .../riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c   | 17 +
 11 files changed, 259 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 58406f3d17c..d24916d2caf 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -56,7 +56,8 @@
  
vssegtux,vssegtox,vlsegdff,vandn,vbrev,vbrev8,vrev8,vcpop,vclz,vctz,vrol,\
  
vror,vwsll,vclmul,vclmulh,vghsh,vgmul,vaesef,vaesem,vaesdf,vaesdm,\
  
vaeskf1,vaeskf2,vaesz,vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c,\
- vfncvtbf16,vfwcvtbf16,vfwmaccbf16")
+ vfncvtbf16,vfwcvtbf16,vfwmaccbf16,\
+ sf_vqmacc,sf_vfnrclip")
 (const_string "true")]
(const_string "false")))
 
@@ -893,7 +894,7 @@
  
vfredo,vfwredu,vfwredo,vslideup,vslidedown,vislide1up,\
  
vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
  
vlsegds,vlsegdux,vlsegdox,vandn,vrol,vror,vwsll,vclmul,vclmulh,\
- vfwmaccbf16")
+ vfwmaccbf16,sf_vqmacc,sf_vfnrclip")
   (symbol_ref "riscv_vector::get_ta(operands[6])")
 
 (eq_attr "type" "vimuladd,vfmuladd")
@@ -924,7 +925,7 @@
  vfwalu,vfwmul,vfsgnj,vfcmp,vslideup,vslidedown,\
  
vislide1up,vislide1down,vfslide1up,vfslide1down,vgather,\
  
viwmuladd,vfwmuladd,vlsegds,vlsegdux,vlsegdox,vandn,vrol,\
-  vror,vwsll,vclmul,vclmulh,vfwmaccbf16")
+ 
vror,vwsll,vclmul,vclmulh,vfwmaccbf16,sf_vqmacc,sf_vfnrclip")
   (symbol_ref "riscv_vector::get_ma(operands[7])")
 
 (eq_attr "type" "vimuladd,vfmuladd")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
index 813f7860f64..a4193b5aea9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
@@ -7,6 +7,7 @@
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf8+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -17,6 +18,7 @@ vint8mf8_t test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t(vfloat32mf2_t vs2, float rs1
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf4_vint8mf4_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf4+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -27,6 +29,7 @@ vint8mf4_t test_sf_vfnrclip_x_f_qf_i8mf4_vint8mf4_t(vfloat32m1_t vs2, float rs1,
 /*
 ** test_sf_vfnrclip_x_f_qf_i8mf2_vint8mf2_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,mf2+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -37,6 +40,7 @@ vint8mf2_t test_sf_vfnrclip_x_f_qf_i8mf2_vint8mf2_t(vfloat32m2_t vs2, float rs1,
 /*
 ** test_sf_vfnrclip_x_f_qf_i8m1_vint8m1_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,m1+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -47,6 +51,7 @@ vint8m1_t test_sf_vfnrclip_x_f_qf_i8m1_vint8m1_t(vfloat32m4_t vs2, float rs1, si
 /*
 ** test_sf_vfnrclip_x_f_qf_i8m2_vint8m2_t:
 ** ...
+** vsetivli\s+zero+,0+,e8+,m2+,ta+,ma+
 ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
 ** ...
 */
@@ -57,

[PATCH v3 1/2] RISC-V: Update Xsfvfnrclip implementation.

2024-12-13 Thread Jiawei
Update implementation of Xsfvfnrclip, using return type as iterator.
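
As a rough sketch of the type mapping involved (a simplified stand-in,
not the generator code itself): for an 8-bit integer result type, the
float source type has four times the element width and four times the
LMUL.  The helper names lmul_suffix and quad_float_type are hypothetical,
and the validity check is reduced to an upper bound on LMUL.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: format log2(LMUL) as an RVV type suffix,
// e.g. -1 -> "mf2", 0 -> "m1", 3 -> "m8".
inline std::string lmul_suffix (int lmul_log2)
{
  if (lmul_log2 >= 0)
    return "m" + std::to_string (1 << lmul_log2);
  return "mf" + std::to_string (1 << -lmul_log2);
}

// Sketch: map an int8 result type (given by SEW and log2(LMUL)) to the
// float32 source type with 4x SEW and 4x LMUL.
inline std::string quad_float_type (unsigned sew, int lmul_log2)
{
  int wide_lmul_log2 = lmul_log2 + 2;
  if (sew != 8 || wide_lmul_log2 > 3)   // LMUL may not exceed 8
    return "INVALID";
  return "vfloat" + std::to_string (sew * 4)
	 + lmul_suffix (wide_lmul_log2) + "_t";
}
```

For example, a vint8mf8_t result pairs with a vfloat32mf2_t source and a
vint8m2_t result with vfloat32m8_t, matching the signatures used in the
testcases.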

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (expand_floattype): New func.
(main): New type.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_XFQF_OPS): New def.
(vint8mf8_t): Ditto.
(vint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vint8m2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_XFQF_OPS): Ditto.
(rvv_arg_type_info::get_xfqf_float_type): Ditto.
* config/riscv/riscv-vector-builtins.def (xfqf_vector): Ditto.
(xfqf_float): Ditto.
* config/riscv/riscv-vector-builtins.h (struct rvv_arg_type_info): New
function prototype.
* config/riscv/sifive-vector.md: Update iterator.
* config/riscv/vector-iterators.md: Ditto.

---
 gcc/config/riscv/genrvv-type-indexer.cc   | 17 ++
 .../riscv/riscv-vector-builtins-types.def | 13 
 gcc/config/riscv/riscv-vector-builtins.cc | 33 +++
 gcc/config/riscv/riscv-vector-builtins.def|  4 ++-
 gcc/config/riscv/riscv-vector-builtins.h  |  1 +
 gcc/config/riscv/sifive-vector.md | 10 +++---
 gcc/config/riscv/vector-iterators.md  | 25 +++---
 7 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc
index e1eee34237a..a2974269adc 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -164,6 +164,18 @@ floattype (unsigned sew, int lmul_log2)
   return mode.str ();
 }
 
+std::string
+expand_floattype (unsigned sew, int lmul_log2, unsigned nf)
+{
+  if (sew != 8 || nf != 1
+  || (!valid_type (sew * 4, lmul_log2 + 2, /*float_t*/ true)))
+return "INVALID";
+
+  std::stringstream mode;
+  mode << "vfloat" << sew * 4 << to_lmul (lmul_log2 + 2) << "_t";
+  return mode.str ();
+}
+
 std::string
 floattype (unsigned sew, int lmul_log2, unsigned nf)
 {
@@ -276,6 +288,7 @@ main (int argc, const char **argv)
   fprintf (fp, "  /*QLMUL1*/ INVALID,\n");
   fprintf (fp, "  /*QLMUL1_SIGNED*/ INVALID,\n");
   fprintf (fp, "  /*QLMUL1_UNSIGNED*/ INVALID,\n");
+  fprintf (fp, "  /*XFQF*/ INVALID,\n");
   for (unsigned eew : {8, 16, 32, 64})
fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
 
@@ -384,6 +397,8 @@ main (int argc, const char **argv)
 inttype (8, /*lmul_log2*/ 0, false).c_str ());
fprintf (fp, "  /*QLMUL1_UNSIGNED*/ %s,\n",
 inttype (8, /*lmul_log2*/ 0, true).c_str ());
+   fprintf (fp, "  /*XFQF*/ %s,\n",
+expand_floattype (sew, lmul_log2, nf).c_str ());
for (unsigned eew : {8, 16, 32, 64})
  {
if (eew == sew)
@@ -473,6 +488,7 @@ main (int argc, const char **argv)
 bfloat16_wide_type (/*lmul_log2*/ 0).c_str ());
fprintf (fp, "  /*QLMUL1_SIGNED*/ INVALID,\n");
fprintf (fp, "  /*QLMUL1_UNSIGNED*/ INVALID,\n");
+   fprintf (fp, "  /*XFQF*/ INVALID,\n");
for (unsigned eew : {8, 16, 32, 64})
  fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
 
@@ -558,6 +574,7 @@ main (int argc, const char **argv)
   floattype (sew / 4, /*lmul_log2*/ 0).c_str ());
  fprintf (fp, "  /*QLMUL1_SIGNED*/ INVALID,\n");
  fprintf (fp, "  /*QLMUL1_UNSIGNED*/ INVALID,\n");
+ fprintf (fp, "  /*XFQF*/ INVALID,\n");
  for (unsigned eew : {8, 16, 32, 64})
fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index 96412bfd1a5..df55b6a8823 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -363,6 +363,12 @@ along with GCC; see the file COPYING3. If not see
 #define DEF_RVV_QMACC_OPS(TYPE, REQUIRE)
 #endif
 
+/* Use "DEF_RVV_XFQF_OPS" macro include signed integer which will
+   be iterated and registered as intrinsic functions.  */
+#ifndef DEF_RVV_XFQF_OPS
+#define DEF_RVV_XFQF_OPS(TYPE, REQUIRE)
+#endif
+
 DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint8mf4_t, 0)
 DEF_RVV_I_OPS (vint8mf2_t, 0)
@@ -1451,6 +1457,12 @@ DEF_RVV_QMACC_OPS (vint32m2_t, 0)
 DEF_RVV_QMACC_OPS (vint32m4_t, 0)
 DEF_RVV_QMACC_OPS (vint32m8_t, 0)
 
+DEF_RVV_XFQF_OPS (vint8mf8_t, 0)
+DEF_RVV_XFQF_OPS (vint8mf4_t, 0)
+DEF_RVV_XFQF_OPS (vint8mf2_t, 0)
+DEF_RVV_XFQF_OPS (vint8m1_t, 0)
+DEF_RVV_XFQF_OPS (vint8m2_t, 0)
+
 #undef DEF_RVV_I_OPS
 #undef DEF_RVV_U_OPS
 #undef DEF_RVV_F_OPS
@@ -1506,3 +1518,4 @@ DEF_RVV_QMACC_OPS (vint32m8_t, 0)
 #undef DEF_RVV_CRYPTO_SEW64_OPS
 #undef DEF_RVV_F32_OPS
 #undef DEF_RVV_QMACC_OPS
+#undef DEF_RVV_XFQF_OPS
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-ve

Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-12-13 Thread Richard Biener
On Thu, Dec 12, 2024 at 5:27 PM Jennifer Schmitz  wrote:
>
>
>
> > On 6 Dec 2024, at 08:41, Jennifer Schmitz  wrote:
> >
> >
> >
> >> On 5 Dec 2024, at 20:07, Richard Sandiford  
> >> wrote:
> >>
> >>
> >> Jennifer Schmitz  writes:
>  On 5 Dec 2024, at 11:44, Richard Biener  wrote:
> 
> 
>  On Thu, 5 Dec 2024, Jennifer Schmitz wrote:
> 
> >
> >
> >> On 17 Oct 2024, at 19:23, Richard Sandiford 
> >>  wrote:
> >>
> >>
> >> Jennifer Schmitz  writes:
> >>> [...]
> >>> Looking at the diff of the vect dumps (below is a section of the diff 
> >>> for strided_store_2.c), it seemed odd that vec_to_scalar operations 
> >>> cost 0 now, instead of the previous cost of 2:
> >>>
> >>> +strided_store_1.c:38:151: note:=== vectorizable_operation ===
> >>> +strided_store_1.c:38:151: note:vect_model_simple_cost: 
> >>> inside_cost = 1, prologue_cost  = 0 .
> >>> +strided_store_1.c:38:151: note:   ==> examining statement: *_6 = _7;
> >>> +strided_store_1.c:38:151: note:   vect_is_simple_use: operand _3 + 
> >>> 1.0e+0, type of def:internal
> >>> +strided_store_1.c:38:151: note:   Vectorizing an unaligned access.
> >>> +Applying pattern match.pd:236, generic-match-9.cc:4128
> >>> +Applying pattern match.pd:5285, generic-match-10.cc:4234
> >>> +strided_store_1.c:38:151: note:   vect_model_store_cost: inside_cost 
> >>> = 12, prologue_cost = 0 .
> >>> *_2 1 times unaligned_load (misalign -1) costs 1 in body
> >>> -_3 + 1.0e+0 1 times scalar_to_vec costs 1 in prologue
> >>> _3 + 1.0e+0 1 times vector_stmt costs 1 in body
> >>> -_7 1 times vec_to_scalar costs 2 in body
> >>> + 1 times vector_load costs 1 in prologue
> >>> +_7 1 times vec_to_scalar costs 0 in body
> >>> _7 1 times scalar_store costs 1 in body
> >>> -_7 1 times vec_to_scalar costs 2 in body
> >>> +_7 1 times vec_to_scalar costs 0 in body
> >>> _7 1 times scalar_store costs 1 in body
> >>> -_7 1 times vec_to_scalar costs 2 in body
> >>> +_7 1 times vec_to_scalar costs 0 in body
> >>> _7 1 times scalar_store costs 1 in body
> >>> -_7 1 times vec_to_scalar costs 2 in body
> >>> +_7 1 times vec_to_scalar costs 0 in body
> >>> _7 1 times scalar_store costs 1 in body
> >>>
> >>> Although the aarch64_use_new_vector_costs_p flag was used in multiple 
> >>> places in aarch64.cc, the location that causes this behavior is this 
> >>> one:
> >>> unsigned
> >>> aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt 
> >>> kind,
> >>>  stmt_vec_info stmt_info, slp_tree,
> >>>  tree vectype, int misalign,
> >>>  vect_cost_model_location where)
> >>> {
> >>> [...]
> >>> /* Try to get a more accurate cost by looking at STMT_INFO instead
> >>>  of just looking at KIND.  */
> >>> -  if (stmt_info && aarch64_use_new_vector_costs_p ())
> >>> +  if (stmt_info)
> >>> {
> >>>   /* If we scalarize a strided store, the vectorizer costs one
> >>>  vec_to_scalar for each element.  However, we can store the first
> >>>  element using an FP store without a separate extract step.  */
> >>>   if (vect_is_store_elt_extraction (kind, stmt_info))
> >>> count -= 1;
> >>>
> >>>   stmt_cost = aarch64_detect_scalar_stmt_subtype (m_vinfo, kind,
> >>>   stmt_info, 
> >>> stmt_cost);
> >>>
> >>>   if (vectype && m_vec_flags)
> >>> stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
> >>> stmt_info, 
> >>> vectype,
> >>> where, stmt_cost);
> >>> }
> >>> [...]
> >>> return record_stmt_cost (stmt_info, where, (count * stmt_cost).ceil 
> >>> ());
> >>> }
> >>>
> >>> Previously, for mtune=generic, this function returned a cost of 2 for 
> >>> a vec_to_scalar operation in the vect body. Now "if (stmt_info)" is 
> >>> entered and "if (vect_is_store_elt_extraction (kind, stmt_info))" 
> >>> evaluates to true, which sets the count to 0 and leads to a return 
> >>> value of 0.
> >>
> >> At the time the code was written, a scalarised store would be costed
> >> using one vec_to_scalar call into the backend, with the count parameter
> >> set to the number of elements being stored.  The "count -= 1" was
> >> supposed to lop off the leading element extraction, since we can store
> >> lane 0 as a normal FP store.
> >>
> >> The target-independent co

[PATCH] RISC-V: optimization by converting to LUI operands with LUI_AFTER_COMMON_LEADING_SHIFT

2024-12-13 Thread Oliver Kozul
The patch optimizes code generation for comparisons of the form
X & C1 == C2. When the bitwise AND mask is stored in the lower 20 bits
it can be left shifted so it behaves as a LUI operand instead,
saving an addi instruction while loading.
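
The arithmetic behind this can be sketched as follows.  The constants
and function names are hypothetical, chosen only for illustration, and
the actual pattern in the patch may differ in detail.  When the mask and
the compared value both fit in the low 20 bits, shifting everything left
by 12 preserves the result of the comparison, and the shifted constants
have their low 12 bits clear, which is the form a single lui can
materialize.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration only.
constexpr std::uint32_t kMask = 0x000FFFFFu;    // C1: low 20 bits
constexpr std::uint32_t kValue = 0x00012345u;   // C2
constexpr unsigned kShift = 12;

// After shifting, the low 12 bits of both constants are zero.
static_assert (((kMask << kShift) & 0xFFFu) == 0, "mask is lui-shaped");
static_assert (((kValue << kShift) & 0xFFFu) == 0, "value is lui-shaped");

// Original form: (X & C1) == C2.
inline bool matches_direct (std::uint32_t x)
{
  return (x & kMask) == kValue;
}

// Shifted form: ((X << s) & (C1 << s)) == (C2 << s).  No bits of
// (X & C1) are lost because it occupies only the low 20 bits.
inline bool matches_shifted (std::uint32_t x)
{
  return ((x << kShift) & (kMask << kShift)) == (kValue << kShift);
}
```

Both functions agree for every 32-bit input, since the bits of X above
bit 19 are masked off in the first form and shifted out in the second.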

2024-12-13  Oliver Kozul  

  PR target/114087

gcc/ChangeLog:

  * config/riscv/riscv.h (COMMON_LEADING_ZEROS): New macro.
  (LUI_AFTER_COMMON_LEADING_SHIFT): New macro.
  * config/riscv/riscv.md (*lui_constraint_ashift): New pattern.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/pr114087-3.c: New test.



---
 gcc/config/riscv/riscv.h| 18 +++
 gcc/config/riscv/riscv.md   | 35 +
 gcc/testsuite/gcc.target/riscv/pr114087-3.c | 10 ++
 3 files changed, 63 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr114087-3.c

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 09de74667a9..92850a52251 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -677,6 +677,24 @@ enum reg_class
   (SMALL_OPERAND ((VAL1) >> COMMON_TRAILING_ZEROS (VAL1, VAL2))
\
&& SMALL_OPERAND ((VAL2) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)))
 
+/* Returns the smaller (common) number of leading zeros for VAL1 and VAL2.  */
+#define COMMON_LEADING_ZEROS(VAL1, VAL2)   \
+  (clz_hwi (VAL1) < clz_hwi (VAL2) \
+   ? clz_hwi (VAL1)\
+   : clz_hwi (VAL2))
+
+/* Returns true if both VAL1 and VAL2 are LUI_OPERANDs after shifting by
+   the common number of leading zeros (-1 to account for sign).  */
+#define LUI_AFTER_COMMON_LEADING_SHIFT(VAL1, VAL2) \
+  ((LUI_OPERAND \
+  ((VAL1) << (COMMON_LEADING_ZEROS (VAL1, VAL2) - 1))  \
+   && LUI_OPERAND \
+   ((VAL2) << (COMMON_LEADING_ZEROS (VAL1, VAL2) - 1))) \
+   || ((LUI_OPERAND \
+   ((VAL1) << (COMMON_LEADING_ZEROS (VAL1, VAL2) - 33)) \
+   && LUI_OPERAND \
+   ((VAL2) << (COMMON_LEADING_ZEROS (VAL1, VAL2) - 33)
+
 /* Stack layout; function entry, exit and calling.  */
 
 #define STACK_GROWS_DOWNWARD 1
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3a4cd1d93a0..a44caa6908d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -858,6 +858,41 @@
   [(set_attr "type" "arith")
(set_attr "mode" "SI")])
 
+(define_insn_and_split "*lui_constraint_ashift"
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+(plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+(match_operand 2 "const_int_operand"))
+(match_operand 3 "const_int_operand")))
+(clobber (match_scratch:X 4 "=&r"))]
+  "!LUI_OPERAND (INTVAL (operands[2]))
+  && !LUI_OPERAND (-INTVAL (operands[3]))
+  && !SMALL_OPERAND (INTVAL (operands[2]))
+  && !SMALL_OPERAND (-INTVAL (operands[3]))
+  && LUI_AFTER_COMMON_LEADING_SHIFT (INTVAL (operands[2]),
+  -INTVAL (operands[3]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:X (match_dup 1) (match_dup 5)))
+   (set (match_dup 4) (match_dup 6))
+   (set (match_dup 0) (and:X (match_dup 0) (match_dup 4)))
+   (set (match_dup 4) (match_dup 7))
+   (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))]
+  {
+ HOST_WIDE_INT mask = INTVAL (operands[2]);
+HOST_WIDE_INT val = -INTVAL (operands[3]);
+int leading_shift = COMMON_LEADING_ZEROS (mask, val) - 1;
+
+if (TARGET_64BIT && leading_shift > 32)
+{
+  leading_shift -= 32;
+}
+
+operands[5] = GEN_INT (leading_shift);
+operands[6] = GEN_INT (mask << leading_shift);
+operands[7] = GEN_INT (val << leading_shift);
+  }
+[(set_attr "type" "arith")])
+
 ;;
 ;;  
 ;;
diff --git a/gcc/testsuite/gcc.target/riscv/pr114087-3.c 
b/gcc/testsuite/gcc.target/riscv/pr114087-3.c
new file mode 100644
index 000..d93fb354c25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr114087-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+
+int pred3a(int x) {
+  return ((x & 0x0005) == 0x00045014);
+}
+
+/* { dg-final { scan-assembler  {slli\s*[a-x0-9]+,\s*[a-x0-9]+,\s*12}} } */
\ No newline at end of file
-- 
2.43.0



Re: [PATCH] Apply lambda section attributes to static thunks

2024-12-13 Thread Campbell Suter
On 14/12/2024 04:34, Marek Polacek wrote:
>> @@ -1376,6 +1377,13 @@ maybe_add_lambda_conv_op (tree type)
>>if (generic_lambda_p)
>>  fn = add_inherited_template_parms (fn, DECL_TI_TEMPLATE (callop));
>>  
>> +  if (lookup_attribute ("section", DECL_ATTRIBUTES (callop)))
>> +{
>> +  duplicate_one_attribute(&DECL_ATTRIBUTES (fn),
>> +DECL_ATTRIBUTES (callop), "section");
>> +  set_decl_section_name (statfn, callop);
>> +}
> 
> duplicate_one_attribute does two lookups, but we just looked up the
> callop attr, and we just built up fn so it shouldn't have any attrs.
> Thus I wonder if writing it like this would be a little neater:
> 
>   if (tree a = lookup_attribute ("section", DECL_ATTRIBUTES (callop)))
> {
>   DECL_ATTRIBUTES (fn) = attr_chainon (DECL_ATTRIBUTES (fn), a);
>   set_decl_section_name (fn, callop);
> }

I'm regtesting that now, then I'll send it in as a V2 of this patch?

> Sadly I see that just the set_decl_section_name call wouldn't be enough...

Right - the problem I found was that functions are only allowed to have both
a section (per decl_section_name) and a comdat group if they've got a section
attribute. Setting the attribute seemed a lot cleaner than adding a check
at the time of validation.


Re: The COBOL front end, in 8 notes

2024-12-13 Thread Andi Kleen
"James K. Lowden"  writes:

> The following 8 patches constitute the 80 files needed to build and
> document the COBOL front end.  They assume that following exist:
>
> gcc/cobol/ChangeLog
> libgcobol/ChangeLog
>
> The messages are grouped by files in a more or less logical order,
> but groups are somewhat arbitrary.  The primary constraint afaik is to
> keep them from getting too big, fsvo $too.  We have:
>
>   460K hdr  header files
>   484K par  the parser
>   760K gen  GENERIC interface
>   556K cbl  other supporting C++ files
>   432K cfg  libgcobol/configure
>   788K lib  libgcobol, all of it
>72K doc  man pages, for now
>24K bld  "meta" files, such a gcc/cobol/Make-lang.in

How would it be regression tested?


-Andi


RE: [PATCH 7/7]AArch64: Implement vector concat of partial SVE vectors

2024-12-13 Thread Tamar Christina
> >  ;; 2 element quad vector modes.
> >  (define_mode_iterator VQ_2E [V2DI V2DF])
> >
> > @@ -1678,7 +1686,15 @@ (define_mode_attr VHALF [(V8QI "V4QI")  (V16QI
> "V8QI")
> >  (V2DI "DI")(V2SF  "SF")
> >  (V4SF "V2SF")  (V4HF "V2HF")
> >  (V8HF "V4HF")  (V2DF  "DF")
> > -(V8BF "V4BF")])
> > +(V8BF "V4BF")
> > +(VNx16QI "VNx8QI") (VNx8QI "VNx4QI")
> > +(VNx4QI "VNx2QI")  (VNx2QI "QI")
> > +(VNx8HI "VNx4HI")  (VNx4HI "VNx2HI") (VNx2HI "HI")
> > +(VNx8HF "VNx4HF")  (VNx4HF "VNx2HF") (VNx2HF "HF")
> > +(VNx8BF "VNx4BF")  (VNx4BF "VNx2BF") (VNx2BF "BF")
> > +(VNx4SI "VNx2SI")  (VNx2SI "SI")
> > +(VNx4SF "VNx2SF")  (VNx2SF "SF")
> > +(VNx2DI "DI")  (VNx2DF "DF")])
> 
> Are the x2 entries necessary, given that the new uses are restricted
> to NO2E?
> 

No, but I wanted to keep the symmetry with the Adv. SIMD modes.  Since the
mode attributes don't really control the number of alternatives, I thought it
would be better to have the attributes be "fully" defined rather than only the
subset I use.

gcc/ChangeLog:

PR target/96342
* config/aarch64/aarch64-sve.md (vec_init): New.
(@aarch64_pack_partial): New.
* config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init_subvector): 
New.
* config/aarch64/iterators.md (SVE_NO2E): New.
(VHALF, Vhalf): Add SVE partial vectors.

gcc/testsuite/ChangeLog:

PR target/96342
* gcc.target/aarch64/vect-simd-clone-2.c: New test.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 with no issues.

Ok for master?

Thanks,
Tamar

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
a72ca2a500d394598268c6adfe717eed94a304b3..8ed4221dbe5c49db97b37f186365fa391900eadb
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2839,6 +2839,16 @@ (define_expand "vec_init"
   }
 )
 
+(define_expand "vec_init"
+  [(match_operand:SVE_NO2E 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_SVE"
+  {
+aarch64_sve_expand_vector_init (operands[0], operands[1]);
+DONE;
+  }
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.a
 (define_insn "vec_shl_insert_"
   [(set (match_operand:SVE_FULL 0 "register_operand")
@@ -9289,6 +9299,19 @@ (define_insn "vec_pack_trunc_"
   "uzp1\t%0., %1., %2."
 )
 
+;; Integer partial pack packing two partial SVE types into a single full SVE
+;; type of the same element type.  Use UZP1 on the wider type, which discards
+;; the high part of each wide element.  This allows to concat SVE partial types
+;; into a wider vector.
+(define_insn "@aarch64_pack_partial"
+  [(set (match_operand:SVE_NO2E 0 "register_operand" "=w")
+   (vec_concat:SVE_NO2E
+ (match_operand: 1 "register_operand" "w")
+ (match_operand: 2 "register_operand" "w")))]
+  "TARGET_SVE"
+  "uzp1\t%0., %1., %2."
+)
+
 ;; -
 ;;  [INT<-INT] Unpacks
 ;; -
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
de4c0a0783912b54ac35d7c818c24574b27a4ca0..40214e318f3c4e30e619d96073b253887c973efc
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24859,6 +24859,17 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals)
 v.quick_push (XVECEXP (vals, 0, i));
   v.finalize ();
 
+  /* If we have two elements and are concatting vector.  */
+  machine_mode elem_mode = GET_MODE (v.elt (0));
+  if (nelts == 2 && VECTOR_MODE_P (elem_mode))
+{
+  /* We've failed expansion using a dup.  Try using a cheeky truncate. */
+  rtx arg0 = force_reg (elem_mode, v.elt(0));
+  rtx arg1 = force_reg (elem_mode, v.elt(1));
+  emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
+  return;
+}
+
   /* If neither sub-vectors of v could be initialized specially,
  then use INSR to insert all elements from v into TARGET.
  ??? This might not be optimal for vectors with large
@@ -24870,6 +24881,30 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals)
 aarch64_sve_expand_vector_init_insert_elems (target, v, nelts);
 }
 
+/* Initialize register TARGET from the two vector subelements in PARALLEL
+   rtx VALS.  */
+
+void
+aarch64_sve_expand_vector_init_subvector (rtx target, rtx vals)
+{
+  machine_mode mode = GET_MODE (target);
+  int nelts = XVECLEN (vals, 0);
+
+  gcc_assert (nelts == 2);
+
+  rtx arg0 = XVECEXP (vals, 0, 0);
+  rtx arg1 = XVECEXP (vals, 0, 1);
+
+  /* If we have two elements and are concatting vector.  */
+  machine_mode elem_mode = GET

Re: [PATCH] vect: Use proper vectype for costing vec_construct [PR118019].

2024-12-13 Thread Robin Dapp
> If you want to fix this now, before re-architecting how we do costing, then
> IMO the only sensible way is to add an extra member to slp_tree indicating
> the element type used.

I see, then it might not be worth it - I guess it can wait until GCC 16.


Re: [PATCH] COBOL 8/8 bld: "meta" files, such a gcc/cobol/Make-lang.in

2024-12-13 Thread Marc
Hello James,

> gcc/ChangeLog
>   * common.opt: Add libgcobol module and cobol language.
>   * dwarf2out.cc: New file.

A minor comment from a quick look: I think the ChangeLog entry for
dwarf2out is not correct, as the file already exists.

> gcc/cobol/ChangeLog
>   * config-lang.in: New file.

> diff --git a/gcc/cobol/config-lang.in b/gcc/cobol/config-lang.in
> new file mode 100644
> index 000..e18cb7fdba0
> --- /dev/null
> +++ b/gcc/cobol/config-lang.in

But the file gcc/cobol/config-lang.in is already "New" in your
PATCH 4/8... Is it possible that something went wrong when creating the
various patches? Applying the patch fails when it tries to create the
file again...

I may be doing something wrong (I'm not sure I'm saving the patch correctly
before applying it), but I can't apply PATCH 5/8. I get:

 error: corrupt patch at line 208

Are your patches supposed to be a perfect match with your git
repository? I'm still unable to use the "make install" target when doing
the nightly build for Compiler Explorer (see
https://github.com/compiler-explorer/compiler-explorer/issues/6725).

I initially wanted to apply the patches to check if I had the same
issue...

Marc


Re: [PATCH] COBOL 8/8 bld: "meta" files, such a gcc/cobol/Make-lang.in

2024-12-13 Thread David Malcolm
On Thu, 2024-12-12 at 12:58 -0500, James K. Lowden wrote:

[...snip...]


> diff --git a/gcc/cobol/lang.opt b/gcc/cobol/lang.opt
> new file mode 100644
> index 000..1739dd9fdf3
> --- /dev/null
> +++ b/gcc/cobol/lang.opt
> @@ -0,0 +1,144 @@
> +; lang.opt -- Options for the gcc Cobol front end.
> +
> +; Copyright (C) 2021-2024 Free Software Foundation, Inc.
> +;
> +; This file is part of GCC.
> +;
> +; GCC is free software; you can redistribute it and/or modify it
> under
> +; the terms of the GNU General Public License as published by the
> Free
> +; Software Foundation; either version 3, or (at your option) any
> later
> +; version.
> +;
> +; GCC is distributed in the hope that it will be useful, but WITHOUT
> ANY
> +; WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License
> +; for more details.
> +;
> +; You should have received a copy of the GNU General Public License
> +; along with GCC; see the file COPYING3.  If not see
> +; .
> +
> +; See the GCC internals manual for a description of this file's
> format.
> +
> +; Please try to keep this file in ASCII collating order.
> +
> +Language
> +Cobol
> +
> +D
> +Cobol Joined Separate
> +; Documented in c.opt
> +
> +E
> +Cobol
> +; Documented in c.opt
> +
> +I
> +Cobol Joined Separate
> +;;  -I  Add copybook search directory
> +; Documented in c.opt
> +
> +dialect
> +Cobol Joined Separate Enum(dialect_type) EnumBitSet
> Var(cobol_dialect)
> +Accept COBOL constructs used by non-ISO compilers
> +
> +Enum
> +Name(dialect_type) Type(int) UnknownError(Unrecognized COBOL dialect
> name: %qs)
> +
> +EnumValue
> +Enum(dialect_type) String(gcc) Value(0x04) Canonical
> +
> +EnumValue
> +Enum(dialect_type) String(ibm) Value(0x01)
> +
> +EnumValue
> +Enum(dialect_type) String(mf)  Value(0x02)
> +
> +EnumValue
> +Enum(dialect_type) String(gnu) Value(0x04)
> +
> +fcobol-exceptions
> +Cobol Joined Separate Var(cobol_exceptions)
> +-fcobol-exceptions=   Enable some exceptions by default
> +
> +copyext
> +Cobol Joined Separate Var(cobol_copyext) Init(0)
> +Define alternative implicit copybook filename extension
> +
> +fdefaultbyte
> +Cobol RejectNegative Joined Separate UInteger
> Var(cobol_default_byte)
> +Set Working-Storage data items to the supplied value
> +
> +fflex-debug
> +Cobol Var(yy_flex_debug, 1) Init(0)
> +Enable Cobol lex debugging
> +
> +ffixed-form
> +Cobol RejectNegative
> +Assume that the source file is fixed form.
> +
> +fsyntax-only
> +Cobol RejectNegative
> +; Documented in c.opt
> +
> +ffree-form
> +Cobol RejectNegative
> +Assume that the source file is free form.
> +
> +findicator-column
> +Cobol RejectNegative Joined Separate UInteger Var(indicator_column)
> Init(0) IntegerRange(0, 8)
> +-findicator-column=   Column after which Region A begins
> +
> +finternal-ebcdic
> +Cobol Var(cobol_ebcdic, 1) Init(0)
> +-finternal-ebcdicInternal processing is in EBCDIC Code Page
> 1140
> +
> +fmax-errors
> +Cobol Joined Separate
> +; Documented in C
> +
> +fstatic-call
> +Cobol Var(cobol_static_call, 1) Init(1)
> +Enable/disable static linkage for CALL literals
> +
> +ftrace-debug
> +Cobol Var(cobol_trace_debug, 1) Init(0)
> +Enable Cobol parser debugging
> +
> +fyacc-debug
> +Cobol Var(yy_debug, 1) Init(0)
> +Enable Cobol yacc debugging
> +
> +preprocess
> +Cobol Joined Separate Var(cobol_preprocess)
> +preprocess  before compiling
> +
> +iprefix
> +Cobol Joined Separate
> +; Documented in C
> +
> +include
> +Cobol Joined Separate Var(cobol_include)
> +; Documented in C
> +
> +isysroot
> +Cobol Joined Separate
> +; Documented in C
> +
> +isystem
> +Cobol Joined Separate
> +; Documented in C
> +
> +main
> +Cobol
> +-mainThe first program-id in the next source file is called by a
> generated main() entry point
> +
> +main=
> +Cobol Joined Var(cobol_main_string)
> +-main= source_file/PROGRAM-ID is called by the
> generated main()
> +
> +nomain
> +Cobol
> +-nomain  No main() function is created from COBOL source
> files
> +
> +; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/cobol/lang.opt.urls b/gcc/cobol/lang.opt.urls
> new file mode 100644
> index 000..9fd69a37a6b
> --- /dev/null
> +++ b/gcc/cobol/lang.opt.urls
> @@ -0,0 +1,29 @@
> +; Copied by Dubner from gcc/rust/ so that compilation could proceed
> +; Autogenerated by regenerate-opt-urls.py from gcc/rust/lang.opt and
> generated HTML

You should be able to regenerate this via
  "make regenerate-opt-urls"

This runs a Python script that walks over the .opt files and the
generated html documentation and for each .opt file makes a .opt.urls
file mapping the options to links into the generated HTML via UrlSuffix
directives.

It looks like from patch 7 that your documentation is in the form of man
pages in hand-written roff or similar.  Am I right in thinking that
there isn't any HTML documentation?  If so, then I'd expect the
genera

Re: The COBOL front end, in 8 notes

2024-12-13 Thread David Malcolm
On Thu, 2024-12-12 at 12:56 -0500, James K. Lowden wrote:
> The following 8 patches constitute the 80 files needed to build and
> document the COBOL front end.  They assume that following exist:
> 
>     gcc/cobol/ChangeLog
>     libgcobol/ChangeLog
> 
> The messages are grouped by files in a more or less logical order,
> but groups are somewhat arbitrary.  The primary constraint afaik is
> to
> keep them from getting too big, fsvo $too.  We have:
> 
>   460K hdr  header files
>   484K par  the parser
>   760K gen  GENERIC interface
>   556K cbl  other supporting C++ files
>   432K cfg  libgcobol/configure
>   788K lib  libgcobol, all of it
>72K doc  man pages, for now
>24K bld  "meta" files, such a gcc/cobol/Make-lang.in
> 
> Except for "bld", these all contain new files, can be applied in any
> order.  
> 
> If you would like the patches smaller or larger, I'm happy to
> rearrange
> them.  Some of exceed the 400 KB mail limit, but I'm assured they'll
> be
> moderated through.  
> 
> This patchset excludes tests.  While we do have tests, it's not clear
> how or if to add them to gcc.  They use a combination of (largely)
> 3rd
> party sources and GNU Autotest.
> 
> A word about C style, always a lively topic.  For any files already
> present in gcc, the existing style was followed, and any variation
> from
> it is unintentional.  Files related to the parser use K&R style.  The
> GENERIC interface and runtime library use Whitesmiths style.  All C++
> code uses spaces for indentation.  
> 
> The COBOL front end has been and is being written by two guys with
> decades of experience.  We hope the code is a testament to that
> experience.  Our relatively recent experience, these last four years,
> is that it has been more productive to keep using the styles to which
> we've long become accustomed.  The position of curly braces is hardly
> any hindrance to read another's code, but it's a burden to write that
> way. We think, 83,068 lines later, the proof of the pudding is in the
> eating.  
> 
> Thank you for your kind consideration of our work.

Please forgive me if you've already said this elsewhere, but is this
work available in a public git repo somewhere?

Thanks
Dave



[PATCH] vect: Use proper vectype for costing vec_construct [PR118019].

2024-12-13 Thread Robin Dapp
Hi,

in VMAT_STRIDED_SLP we're likely to select a different vectype with
fewer elements for vector construction.  After loading it is
re-interpreted as the proper vectype.
When checking costs we use the original vectype with more elements
leading to wrong costing in case vector construction is dependent
on the number of elements.

This patch makes a temporary copy of stmt_info and slp_node, changes
their vectype and passes them to record_stmt_cost in case we chose
a different load/construction vectype.

Bootstrapped and regtested on x86, aarch64 and power10.
Regtested on rv64gcv.

Regards
 Robin

PR target/118019

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Use construction/load
vectype for costing.
---
 gcc/tree-vect-stmts.cc | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index be1139a423c..6ac1e97c4c1 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10810,9 +10810,26 @@ vectorizable_load (vec_info *vinfo,
  if (nloads > 1)
{
  if (costing_p)
-   inside_cost += record_stmt_cost (cost_vec, 1, vec_construct,
-stmt_info, slp_node, 0,
-vect_body);
+   {
+ if (lvectype != vectype)
+   {
+ /* If we chose a different vectype for vector
+construction make sure to use it for costing.  */
+ stmt_vec_info stmt_info_copy = stmt_info;
+ stmt_info_copy->vectype = lvectype;
+ slp_tree slp_node_copy = slp_node;
+ slp_node_copy->vectype = lvectype;
+ inside_cost
+   += record_stmt_cost (cost_vec, 1, vec_construct,
+stmt_info_copy, slp_node_copy,
+0, vect_body);
+   }
+
+ else
+   inside_cost += record_stmt_cost (cost_vec, 1, vec_construct,
+stmt_info, slp_node, 0,
+vect_body);
+   }
  else
{
  tree vec_inv = build_constructor (lvectype, v);
-- 
2.47.1



Re: [PATCH] vect: Use proper vectype for costing vec_construct [PR118019].

2024-12-13 Thread Richard Biener



> On 13.12.2024 at 20:49, Robin Dapp wrote:
> 
> Hi,
> 
> in VMAT_STRIDED_SLP we're likely to select a different vectype with
> fewer elements for vector construction.  After loading it is
> re-interpreted as the proper vectype.
> When checking costs we use the original vectype with more elements
> leading to wrong costing in case vector construction is dependent
> on the number of elements.
> 
> This patch makes a temporary copy of stmt_info and slp_node, changes
> their vectype and passes them to record_stmt_cost in case we chose
> a different load/construction vectype.
> 
> Bootstrapped and regtested on x86, aarch64 and power10.
> Regtested on rv64gcv.

Either this just makes a copy of the pointer, or it will end up with stack
variables that have gone out of scope by the time costs are recorded later.

If you want to fix this now, before re-architecting how we do costing, then IMO
the only sensible way is to add an extra member to slp_tree indicating the
element type used.
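
To illustrate the first alternative, here is a minimal sketch. It assumes the handle types are plain pointer typedefs; `stmt_info_rec` and `stmt_info_ptr` are hypothetical stand-ins rather than the real vectorizer types, and both functions are made up for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the record behind a statement handle.
struct stmt_info_rec {
    int vectype;
};

// Hypothetical handle type: a pointer typedef, so copying it copies
// only the pointer.
typedef stmt_info_rec *stmt_info_ptr;

// Mirrors the pattern in the patch: "copy" the handle, then change a field.
// The write goes through to the caller's object.
static int cost_with_handle_copy(stmt_info_ptr info, int lvectype) {
    stmt_info_ptr copy = info;   // copies the pointer only
    copy->vectype = lvectype;    // modifies the original record
    return copy->vectype;
}

// Copies the record itself.  The original is untouched, but the copy is a
// local object that is gone once this function returns, so nothing may keep
// a pointer to it.
static int cost_with_record_copy(stmt_info_ptr info, int lvectype) {
    stmt_info_rec copy = *info;
    copy.vectype = lvectype;
    return copy.vectype;
}
```

In the first function the caller sees its record changed afterwards; in the second it does not, at the price of the lifetime problem described above.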

Richard 

> Regards
> Robin
> 
>PR target/118019
> 
> gcc/ChangeLog:
> 
>* tree-vect-stmts.cc (vectorizable_load): Use construction/load
>vectype for costing.
> ---
> gcc/tree-vect-stmts.cc | 23 ---
> 1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index be1139a423c..6ac1e97c4c1 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -10810,9 +10810,26 @@ vectorizable_load (vec_info *vinfo,
>  if (nloads > 1)
>{
>  if (costing_p)
> -inside_cost += record_stmt_cost (cost_vec, 1, vec_construct,
> - stmt_info, slp_node, 0,
> - vect_body);
> +{
> +  if (lvectype != vectype)
> +{
> +  /* If we chose a different vectype for vector
> + construction make sure to use it for costing.  */
> +  stmt_vec_info stmt_info_copy = stmt_info;
> +  stmt_info_copy->vectype = lvectype;
> +  slp_tree slp_node_copy = slp_node;
> +  slp_node_copy->vectype = lvectype;
> +  inside_cost
> ++= record_stmt_cost (cost_vec, 1, vec_construct,
> + stmt_info_copy, slp_node_copy,
> + 0, vect_body);
> +}
> +
> +  else
> +inside_cost += record_stmt_cost (cost_vec, 1, vec_construct,
> + stmt_info, slp_node, 0,
> + vect_body);
> +}
>  else
>{
>  tree vec_inv = build_constructor (lvectype, v);
> --
> 2.47.1
> 


Re: [PATCH] c: special-case some "bool" errors with C23 [PR117629]

2024-12-13 Thread Sam James
Sam James  writes:

> David Malcolm  writes:
>
>> This patch attempts to provide better error messages for
>> code compiled with C23 that hasn't been updated for
>> "bool", "true", and "false" becoming keywords (based on
>> a brief review of the Gentoo bug tracker links given at
>> https://gcc.gnu.org/pipermail/gcc/2024-November/245185.html).
>>
>> [...]
>
> Thanks a lot David -- I'm going to give it a spin on some codebases over
> the weekend.
>
> I have seen some other instances with constexpr, static_assert, and
> unreachable, but that looks like it might be easy to add on top of this
> and maybe I could have a go at doing that after.

The diagnostics are significantly better -- and importantly, people
receiving reports from our testing containing these errors seem to
understand the issue more readily now. Thanks!


Re: [PATCH v2] c++: Disallow decomposition of lambda bases [PR90321]

2024-12-13 Thread Jason Merrill

On 12/13/24 4:49 AM, Nathaniel Shead wrote:

On Thu, Nov 21, 2024 at 04:01:02PM -0500, Marek Polacek wrote:

On Thu, Nov 07, 2024 at 09:48:52PM +1100, Nathaniel Shead wrote:

Bootstrapped and lightly regtested on x86_64-pc-linux-gnu (so far just
dg.exp), OK for trunk if full regtest succeeds?

-- >8 --

Decomposition of lambda closure types is not allowed by
[dcl.struct.bind] p6, since members of a closure have no name.

r244909 made this an error, but missed the case where a lambda is used
as a base.  This patch moves the check to find_decomp_class_base to
handle this case.

As a drive-by improvement, we also slightly improve the diagnostics to
indicate why a base class was being inspected.  Ideally the diagnostic
would point directly at the relevant base, but there doesn't seem to be
an easy way to get this location just from the binfo so I don't worry
about that here.

PR c++/90321

gcc/cp/ChangeLog:

* decl.cc (find_decomp_class_base): Check for decomposing a
lambda closure type.  Report base class chains if needed.
(cp_finish_decomp): Remove no-longer-needed check.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/decomp62.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl.cc| 20 ++--
  gcc/testsuite/g++.dg/cpp1z/decomp62.C | 12 
  2 files changed, 26 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp62.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 0e4533c6fab..87480dca1ac 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -9268,6 +9268,14 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
  static tree
  find_decomp_class_base (location_t loc, tree type, tree ret)
  {
+  if (LAMBDA_TYPE_P (type))
+{


Missing auto_diagnostic_group d; here?



Thanks, fixed.


+  error_at (loc, "cannot decompose lambda closure type %qT", type);
+  inform (DECL_SOURCE_LOCATION (TYPE_NAME (type)),
+ "lambda declared here");
+  return error_mark_node;
+}
+
bool member_seen = false;
for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
  if (TREE_CODE (field) != FIELD_DECL
@@ -9310,9 +9318,14 @@ find_decomp_class_base (location_t loc, tree type, tree 
ret)
for (binfo = TYPE_BINFO (type), i = 0;
 BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
  {
+  auto_diagnostic_group d;
tree t = find_decomp_class_base (loc, TREE_TYPE (base_binfo), ret);
if (t == error_mark_node)
-   return error_mark_node;
+   {
+ inform (DECL_SOURCE_LOCATION (TYPE_NAME (type)),


location_of might be nicer.



Yeah, I agree, thanks.  Here's an updated version of the patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

Decomposition of lambda closure types is not allowed by
[dcl.struct.bind] p6, since members of a closure have no name.

r244909 made this an error, but missed the case where a lambda is used
as a base.  This patch moves the check to find_decomp_class_base to
handle this case.

As a drive-by improvement, we also slightly improve the diagnostics to
indicate why a base class was being inspected.  Ideally the diagnostic
would point directly at the relevant base, but there doesn't seem to be
an easy way to get this location just from the binfo so I don't worry
about that here.

PR c++/90321

gcc/cp/ChangeLog:

* decl.cc (find_decomp_class_base): Check for decomposing a
lambda closure type.  Report base class chains if needed.
(cp_finish_decomp): Remove no-longer-needed check.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/decomp62.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Marek Polacek 
---
  gcc/cp/decl.cc| 19 +--
  gcc/testsuite/g++.dg/cpp1z/decomp62.C | 12 
  2 files changed, 25 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp62.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4ba6e3784ca..a1b9957a9be 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -9405,6 +9405,14 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
  static tree
  find_decomp_class_base (location_t loc, tree type, tree ret)
  {
+  if (LAMBDA_TYPE_P (type))
+{
+  auto_diagnostic_group d;
+  error_at (loc, "cannot decompose lambda closure type %qT", type);
+  inform (location_of (type), "lambda declared here");
+  return error_mark_node;
+}
+
bool member_seen = false;
for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
  if (TREE_CODE (field) != FIELD_DECL
@@ -9447,9 +9455,13 @@ find_decomp_class_base (location_t loc, tree type, tree 
ret)
for (binfo = TYPE_BINFO (type), i = 0;
 BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
  {
+  auto_diagnostic_group d;
tree t = find_decomp_class_base (loc, TREE_TYPE (base_binfo), ret);
   

Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-12-13 Thread Jennifer Schmitz


> On 13 Dec 2024, at 13:40, Richard Biener  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Thu, Dec 12, 2024 at 5:27 PM Jennifer Schmitz  wrote:
>> 
>> 
>> 
>>> On 6 Dec 2024, at 08:41, Jennifer Schmitz  wrote:
>>> 
>>> 
>>> 
 On 5 Dec 2024, at 20:07, Richard Sandiford  
 wrote:
 
 External email: Use caution opening links or attachments
 
 
 Jennifer Schmitz  writes:
>> On 5 Dec 2024, at 11:44, Richard Biener  wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> On Thu, 5 Dec 2024, Jennifer Schmitz wrote:
>> 
>>> 
>>> 
 On 17 Oct 2024, at 19:23, Richard Sandiford 
  wrote:
 
 External email: Use caution opening links or attachments
 
 
 Jennifer Schmitz  writes:
> [...]
> Looking at the diff of the vect dumps (below is a section of the diff 
> for strided_store_2.c), it seemed odd that vec_to_scalar operations 
> cost 0 now, instead of the previous cost of 2:
> 
> +strided_store_1.c:38:151: note:=== vectorizable_operation ===
> +strided_store_1.c:38:151: note:vect_model_simple_cost: 
> inside_cost = 1, prologue_cost  = 0 .
> +strided_store_1.c:38:151: note:   ==> examining statement: *_6 = _7;
> +strided_store_1.c:38:151: note:   vect_is_simple_use: operand _3 + 
> 1.0e+0, type of def:internal
> +strided_store_1.c:38:151: note:   Vectorizing an unaligned access.
> +Applying pattern match.pd:236, generic-match-9.cc:4128
> +Applying pattern match.pd:5285, generic-match-10.cc:4234
> +strided_store_1.c:38:151: note:   vect_model_store_cost: inside_cost 
> = 12, prologue_cost = 0 .
> *_2 1 times unaligned_load (misalign -1) costs 1 in body
> -_3 + 1.0e+0 1 times scalar_to_vec costs 1 in prologue
> _3 + 1.0e+0 1 times vector_stmt costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> + 1 times vector_load costs 1 in prologue
> +_7 1 times vec_to_scalar costs 0 in body
> _7 1 times scalar_store costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> +_7 1 times vec_to_scalar costs 0 in body
> _7 1 times scalar_store costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> +_7 1 times vec_to_scalar costs 0 in body
> _7 1 times scalar_store costs 1 in body
> -_7 1 times vec_to_scalar costs 2 in body
> +_7 1 times vec_to_scalar costs 0 in body
> _7 1 times scalar_store costs 1 in body
> 
> Although the aarch64_use_new_vector_costs_p flag was used in multiple 
> places in aarch64.cc, the location that causes this behavior is this 
> one:
> unsigned
> aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
>                                      stmt_vec_info stmt_info, slp_tree,
>                                      tree vectype, int misalign,
>                                      vect_cost_model_location where)
> {
> [...]
>   /* Try to get a more accurate cost by looking at STMT_INFO instead
>      of just looking at KIND.  */
> -  if (stmt_info && aarch64_use_new_vector_costs_p ())
> +  if (stmt_info)
>     {
>       /* If we scalarize a strided store, the vectorizer costs one
>          vec_to_scalar for each element.  However, we can store the first
>          element using an FP store without a separate extract step.  */
>       if (vect_is_store_elt_extraction (kind, stmt_info))
>         count -= 1;
> 
>       stmt_cost = aarch64_detect_scalar_stmt_subtype (m_vinfo, kind,
>                                                       stmt_info, stmt_cost);
> 
>       if (vectype && m_vec_flags)
>         stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
>                                                         stmt_info, vectype,
>                                                         where, stmt_cost);
>     }
> [...]
>   return record_stmt_cost (stmt_info, where, (count * stmt_cost).ceil ());
> }
> 
> Previously, for mtune=generic, this function returned a cost of 2 for 
> a vec_to_scalar operation in the vect body. Now "if (stmt_info)" is 
> entered and "if (vect_is_store_elt_extraction (kind, stmt_info))" 
> evaluates to true, which sets the count to 0 and leads to a return 
> value of 0.
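
The arithmetic described above can be sketched with a toy model. This is not GCC code: adjusted_cost and its parameters are hypothetical names, and plain integers stand in for GCC's fractional cost type. It only illustrates why a per-element call with count == 1 ends up costing 0, while a single call covering all elements would still be charged for the remaining extractions.

```cpp
#include <cassert>

// Toy model (assumption): integer costs instead of GCC's fractional costs.
// Mirrors the "count -= 1" adjustment from the quoted function.
int adjusted_cost (int count, int stmt_cost, bool is_store_elt_extraction)
{
  if (is_store_elt_extraction)
    count -= 1;               // the first element needs no separate extract
  return count * stmt_cost;
}

void example_usage ()
{
  // One call per element (count == 1): every call is reduced to cost 0.
  assert (adjusted_cost (1, 2, true) == 0);
  // One call for four elements: only the first extraction is dropped.
  assert (adjusted_cost (4, 2, true) == 6);
  // Statements that are not store element extractions are unaffected.
  assert (adjusted_cost (1, 2, false) == 2);
}
```

Under this model, four per-element calls sum to 0, whereas one call with count == 4 yields 6, which matches the difference visible in the dump.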
 
 At the time the code was written, a scalarised store would be costed
 using one vec_to_scalar call into the backend, with the count parameter
 set to the number of elements being stored.  The "count -= 1" was
 supposed to lop off the lead

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-12-13 Thread Qing Zhao


> On Dec 12, 2024, at 17:36, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 12.12.2024 um 13:59 -0800 schrieb Bill Wendling:
>> 
>> 
>> So, it’s the correct behavior for the counted_by attribute for FAM based 
>> on our previous discussion and agreement.
> 
> If it is indeed that the value of p->count last stored before p->array is
> *referenced* which counts, then everything is well.
 
 Yes.  For FAM, every “reference” to p->array will be converted to a call to 
 (*.ACCESS_WITH_SIZE (p->array, &p->count, …))
>>> 
>>> Can you remind why we have to pass the address of p->count, i.e. &p->count
>>> instead of its value?
>>> 
>> So that if we change the value of p->count it will be reflected in
>> future checks.
>> 
>> p->count = n;
>> p->array[3] = x;
>> // ...
>> p->count = m;
>> p->array[3] = y;
>> 
>> We would want the last "p->array[3]" to be checked against the new
>> value of p->count rather than the original value.
> 
> But wouldn't a new call to .ACCESS_WITH_SIZE be inserted at the second
> access to p->array?
Yes, a call to .ACCESS_WITH_SIZE is inserted for every reference to the pointer 
array. 
For the following test case:

struct annotated {
  int b;
  int c[] __attribute__ ((counted_by (b)));
} *p_array_annotated;

int main(int argc, char *argv[])
{
  p_array_annotated
    = (struct annotated *)malloc (sizeof (struct annotated)
                                  + (10 * sizeof (int)));
  p_array_annotated->b = 10;

  p_array_annotated->c[9] = 2;
  p_array_annotated->b = 20;
  p_array_annotated->c[15] = 2;
  return 0;
}


The IR for “main” is:

{
  p_array_annotated = (struct annotated *) malloc (44);
  p_array_annotated->b = 10;
  (*.ACCESS_WITH_SIZE ((int *) &p_array_annotated->c, &p_array_annotated->b, 1, 
0, -1, 0B))[9] = 2;
  p_array_annotated->b = 20;
  (*.ACCESS_WITH_SIZE ((int *) &p_array_annotated->c, &p_array_annotated->b, 1, 
0, -1, 0B))[15] = 2;
  return 0;
}

From the above IR, we can see that &p_array_annotated->b (the address of 
the counted-by object corresponding to the array object p_array_annotated->c) 
is passed to every call to .ACCESS_WITH_SIZE.  Doing this allows the value of 
p_array_annotated->b to be updated between two references to the same pointer 
array.  (If I remember correctly, this is mainly to support the following 
feature for FAM:

 One important feature of the attribute is, a reference to the
 flexible array member field uses the latest value assigned to the
 field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' uses 'val1' as the number of the elements in
 'p->array', and 'ref2' uses 'val2' as the number of elements in
 'p->array’.
)
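
The difference between passing the address of the count and passing its value can be sketched in ordinary code. This is only an illustration under stated assumptions: .ACCESS_WITH_SIZE is a compiler-internal function that user code cannot call, and the helper names below (in_bounds_via_address, in_bounds_via_snapshot) are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for a structure with a counted array.
struct annotated
{
  int count;
  int array[32];
};

// Reads the count through its address at the time of the reference,
// so a later store to the count is seen by later checks.
bool in_bounds_via_address (const int *count_ptr, int index)
{
  return index >= 0 && index < *count_ptr;
}

// Uses a count value captured earlier; later stores are not seen.
bool in_bounds_via_snapshot (int count_snapshot, int index)
{
  return index >= 0 && index < count_snapshot;
}

void example_usage ()
{
  annotated p = {};
  p.count = 10;
  int snapshot = p.count;
  assert (in_bounds_via_address (&p.count, 9));    // ref1 uses count == 10

  p.count = 20;
  assert (in_bounds_via_address (&p.count, 15));   // ref2 uses count == 20
  assert (!in_bounds_via_snapshot (snapshot, 15)); // stale value rejects it
}
```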


Qing

> 
>> 
 
 The count value for p->array is  *(&p->count), which is guaranteed to be 
 the last stored value of the address of p->count before the current 
 reference to p->array.
 
 Similarly, for the pointer array,  every “reference” to p->pa will be 
 converted as a call to .ACCESS_WITH_SIZE(p->pa, &p->count…). The count 
 value of the pointer array p->pa is *(&p->count), which is also guaranteed 
 to be the last stored value of the address of p->count before the current 
 reference to p->pa.
 
> Somehow I thought for FAMs it is the value p->count last stored before
> p->array is *accessed* (possibly indirectly via another pointer).  
> Probably
> it was just me being confused.
> 
>> 
>> However, as you pointed out, when the “counted_by” attribute is extended 
>> to  the pointer field, this feature will be problematic.
>> And we need to add the following additional new requirement for the 
>> “counted_by” attribute of pointer field:
>> 
>> p->count and  p->array  can only be changed by changing the whole 
>> structure at the same time.
> 
> Actually, I am then not even sure we need this requirement. My point was 
> only that
> setting the whole structure at the time should work correctly, i.e. 
> without changing
> the bounds for old pointers which were stored in the struct previously.  
> With the
> semantics  above it seems this case also works automatically.
 
 For pointer field with counted_by attribute, if the p->count and p->pa are 
 not set together when changing the whole structure, then for example:
 
 struct annotated {
  int b;
  int *c __attribute__ ((counted_by (b)));
 };
 
 /* Note: this routine only allocates the space for the pointer array field, 
 but does NOT set the counted_by field.  */
 struct annotated __attribute__((__noinline__)) setup (int attr_count)
 {
  struct annotated p_array_annotated;
  p_array_annotated.c = (int *) malloc (sizeof (int) * attr_c

Re: [PATCH] Apply lambda section attributes to static thunks

2024-12-13 Thread Marek Polacek
On Fri, Dec 13, 2024 at 04:52:58PM +1300, Campbell Suter wrote:
> Each lambda that can be converted to a plain function pointer has a
> thunk generated for it, which invokes the body of the lambda function.
> 
> When a section attribute is added to a lambda function, it only applies
> to the body of the lambda function, and not the thunk. When a lambda is
> only ever used by converting it to a function pointer, the body of the
> lambda is inlined into this thunk. As a result, the section attribute
> is effectively ignored: the function it applied to is gone, and the thunk
> does not have the section attribute applied to it either.
> 
> This patch checks if a section attribute is present on a lambda, and
> applies it to the thunk.
> 
> The motivation for this change is embedded devices where most code is
> executed from flash, but code which must execute while the device is
> being reprogrammed can be moved to RAM by placing it in a different
> section.
> 
> This patch was tested with bootstrapping on x86-64 under WSL, and
> the newly added test was also run on 32-bit ARM.

Hi, thanks for the patch.
 
> gcc/cp/ChangeLog:
> 
>   * lambda.cc (maybe_add_lambda_conv_op): Don't ignore section
>   attributes on lambda functions which are converted to plain
>   function pointers.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/attr-section-lambda.C: New test.
> 
> Signed-off-by: Campbell Suter 
> ---
>  gcc/cp/lambda.cc  |  8 
>  .../g++.dg/ext/attr-section-lambda.C  | 42 +++
>  2 files changed, 50 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section-lambda.C
> 
> diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
> index d8a15d97d..e8937cc0d 100644
> --- a/gcc/cp/lambda.cc
> +++ b/gcc/cp/lambda.cc
> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify.h"
>  #include "target.h"
>  #include "decl.h"
> +#include "attribs.h"
>  
>  /* Constructor for a lambda expression.  */
>  
> @@ -1376,6 +1377,13 @@ maybe_add_lambda_conv_op (tree type)
>if (generic_lambda_p)
>  fn = add_inherited_template_parms (fn, DECL_TI_TEMPLATE (callop));
>  
> +  if (lookup_attribute ("section", DECL_ATTRIBUTES (callop)))
> +{
> +  duplicate_one_attribute(&DECL_ATTRIBUTES (fn),
> + DECL_ATTRIBUTES (callop), "section");
> +  set_decl_section_name (statfn, callop);
> +}

duplicate_one_attribute does two lookups, but we just looked up the
callop attr, and we just built up fn so it shouldn't have any attrs.
Thus I wonder if writing it like this would be a little neater:

  if (tree a = lookup_attribute ("section", DECL_ATTRIBUTES (callop)))
{
  DECL_ATTRIBUTES (fn) = attr_chainon (DECL_ATTRIBUTES (fn), a);
  set_decl_section_name (fn, callop);
}

Sadly I see that just the set_decl_section_name call wouldn't be enough...

Marek
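
For readers unfamiliar with the use case, here is a minimal sketch of a captureless lambda that is only ever used through its function pointer conversion, which is the situation where the thunk ends up holding the inlined body. SKETCH_RAM_SECTION, handler_fn and make_handler are hypothetical names; the macro is left empty so the sketch stays portable, and the attribute spelling in the comment is an assumption about a GCC ELF target rather than something taken from the patch's test.

```cpp
#include <cassert>

// Hypothetical placeholder.  On a GCC ELF target this might expand to
// something like __attribute__ ((section (".ramfunc"))); it is empty here.
#define SKETCH_RAM_SECTION

using handler_fn = int (*) (int);

handler_fn make_handler ()
{
  // The attribute, when present, is written on the lambda's call operator.
  auto body = [] (int x) SKETCH_RAM_SECTION { return x + 1; };
  // Implicit conversion to a plain function pointer; the pointer refers to
  // the compiler-generated static thunk, not to operator() itself.
  return body;
}

void example_usage ()
{
  handler_fn fn = make_handler ();
  assert (fn (41) == 42);
}
```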



Re: [PATCH] Introduce -flto-partition=locality

2024-12-13 Thread Kyrylo Tkachov
Ping.
Thanks,
Kyrill

> On 28 Nov 2024, at 11:22, Kyrylo Tkachov  wrote:
> 
> Ping.
> 
>> On 15 Nov 2024, at 17:04, Kyrylo Tkachov  wrote:
>> 
>> Hi all,
>> 
>> This is a patch submission following-up from the RFC at:
>> https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html
>> The patch is rebased and retested against current trunk, some debugging code
>> removed, comments improved and some fixes added as we've done more
>> testing.
>> 
>> >8-
>> Implement partitioning and cloning in the callgraph to help locality.
>> A new -flto-partition=locality flag is used to enable this.
>> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc
>> The optimization has two components:
>> * Partitioning the callgraph so as to group callers and callees that 
>> frequently
>> call each other in the same partition
>> * Cloning functions that straddle multiple callchains and allowing each clone
>> to be local to the partition of its callchain.
>> 
>> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
>> It creates a partitioning plan and does the prerequisite cloning.
>> The partitioning is then implemented during the existing LTO partitioning 
>> pass.
>> 
>> To guide these locality heuristics we use PGO data.
>> In the absence of PGO data we use a static heuristic that uses the 
>> accumulated
>> estimated edge frequencies of the callees for each function to guide the
>> reordering.
>> We are investigating some more elaborate static heuristics, in particular 
>> using
>> the demangled C++ names to group template instantiations together.
>> This is promising but we are working out some kinks in the implementation
>> currently and want to send that out as a follow-up once we're more confident
>> in it.
>> 
>> A new bootstrap-lto-locality bootstrap config is added that allows us to test
>> this on GCC itself with either static or PGO heuristics.
>> GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).
>> 
>> With this optimization we are seeing good performance gains on some large
>> internal workloads that stress the parts of the processor that are sensitive
>> to code locality, but we'd appreciate wider performance evaluation.
>> 
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Ok for mainline?
>> Thanks,
>> Kyrill
>> 
>> Signed-off-by: Prachi Godbole 
>> Co-authored-by: Kyrylo Tkachov 
>> 
>>   config/ChangeLog:
>>* bootstrap-lto-locality.mk: New file.
>> 
>>gcc/ChangeLog:
>>   * Makefile.in (OBJS): Add ipa-locality-cloning.o
>>   (GTFILES): Add ipa-locality-cloning.cc dependency.
>>   * common.opt (lto_partition_model): Add locality value.
>>   * flag-types.h (lto_partition_model): Add LTO_PARTITION_LOCALITY 
>> value.
>>   (enum lto_locality_cloning_model): Define.
>>   * lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping 
>> of node
>>   and index.
>>   * params.opt (lto_locality_cloning_model): New enum.
>>   (lto-partition-locality-cloning): New param.
>>   (lto-partition-locality-frequency-cutoff): Likewise.
>>   (lto-partition-locality-size-cutoff): Likewise.
>>   (lto-max-locality-partition): Likewise.
>>   * passes.def: Add pass_ipa_locality_cloning.
>>   * timevar.def (TV_IPA_LC): New timevar.
>>   * tree-pass.h (make_pass_ipa_locality_cloning): Declare.
>>   * ipa-locality-cloning.cc: New file.
>>   * ipa-locality-cloning.h: New file.
>> 
>> gcc/lto/ChangeLog:
>>* lto-partition.cc: Include ipa-locality-cloning.h
>>   (add_node_references_to_partition): Define.
>>   (create_partition): Likewise.
>>   (lto_locality_map): Likewise.
>>   (lto_promote_cross_file_statics): Add extra dumping.
>>   * lto-partition.h (lto_locality_map): Declare.
>>   * lto.cc (do_whole_program_analysis): Handle 
>> LTO_PARTITION_LOCALITY.
>> 
>> <0001-Introduce-flto-partition-locality.patch>
> 
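
As a rough illustration of the first component (grouping callers and callees that call each other frequently), the following toy sketch greedily merges functions along the hottest call edges, subject to a size cap. It is not the algorithm in ipa-locality-cloning.cc; CallEdge, group_by_hot_edges and the size cap parameter are hypothetical, and cloning is not modelled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical call edge with an estimated or profiled frequency.
struct CallEdge
{
  int caller;
  int callee;
  long frequency;
};

// Returns a partition representative for each function index.
std::vector<int> group_by_hot_edges (const std::vector<int> &sizes,
                                     std::vector<CallEdge> edges,
                                     int max_partition_size)
{
  const int n = static_cast<int> (sizes.size ());
  std::vector<int> parent (n);
  std::iota (parent.begin (), parent.end (), 0);
  std::vector<int> part_size (sizes);

  // Union-find lookup with path halving.
  auto find = [&] (int x) {
    while (parent[x] != x)
      {
        parent[x] = parent[parent[x]];
        x = parent[x];
      }
    return x;
  };

  // Visit the hottest edges first.
  std::sort (edges.begin (), edges.end (),
             [] (const CallEdge &a, const CallEdge &b)
             { return a.frequency > b.frequency; });

  for (const CallEdge &e : edges)
    {
      int a = find (e.caller);
      int b = find (e.callee);
      if (a == b || part_size[a] + part_size[b] > max_partition_size)
        continue;
      parent[b] = a;               // merge callee's group into caller's
      part_size[a] += part_size[b];
    }

  std::vector<int> result (n);
  for (int i = 0; i < n; ++i)
    result[i] = find (i);
  return result;
}

void example_usage ()
{
  // Four unit-sized functions, partitions capped at two functions.
  std::vector<int> parts
    = group_by_hot_edges ({1, 1, 1, 1},
                          {{0, 1, 100}, {1, 2, 50}, {2, 3, 10}}, 2);
  assert (parts[0] == parts[1]);
  assert (parts[2] == parts[3]);
  assert (parts[0] != parts[2]);
}
```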



[committed] libstdc++: Fix -Wsign-compare warning in

2024-12-13 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/bits/regex.tcc: Fix -Wsign-compare warning.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/regex.tcc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/regex.tcc 
b/libstdc++-v3/include/bits/regex.tcc
index 5cf217ef777..5d2584e9d6d 100644
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -444,7 +444,7 @@ namespace __detail
  __num *= 10;
  __num += __traits.value(*__next++, 10);
}
- if (0 <= __num && __num < this->size())
+ if (0 <= __num && size_t(__num) < this->size())
__output(__num);
}
  else
-- 
2.47.1
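
The pattern used in the fix can be shown in isolation. This is a minimal sketch with a hypothetical helper, index_in_range: the signed value is first checked to be non-negative, and only then converted to an unsigned type for the comparison against a size, so no mixed-sign comparison remains.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if NUM is a valid index into V.
// The explicit conversion is safe because 0 <= num is tested first.
bool index_in_range (long num, const std::vector<int> &v)
{
  return 0 <= num && std::size_t (num) < v.size ();
}

void example_usage ()
{
  std::vector<int> v (3);
  assert (index_in_range (2, v));
  assert (!index_in_range (3, v));
  assert (!index_in_range (-1, v));
}
```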



The COBOL front end, in 8 notes

2024-12-13 Thread James K. Lowden
The following 8 patches constitute the 80 files needed to build and
document the COBOL front end.  They assume that the following exist:

gcc/cobol/ChangeLog
libgcobol/ChangeLog

The messages are grouped by files in a more or less logical order,
but groups are somewhat arbitrary.  The primary constraint afaik is to
keep them from getting too big, fsvo $too.  We have:

460K hdr  header files
484K par  the parser
760K gen  GENERIC interface
556K cbl  other supporting C++ files
432K cfg  libgcobol/configure
788K lib  libgcobol, all of it
 72K doc  man pages, for now
 24K bld  "meta" files, such as gcc/cobol/Make-lang.in

Except for "bld", these all contain new files and can be applied in any
order.

If you would like the patches smaller or larger, I'm happy to rearrange
them.  Some of them exceed the 400 KB mail limit, but I'm assured they'll be
moderated through.

This patchset excludes tests.  While we do have tests, it's not clear
how or if to add them to gcc.  They use a combination of (largely) 3rd
party sources and GNU Autotest.

A word about C style, always a lively topic.  For any files already
present in gcc, the existing style was followed, and any variation from
it is unintentional.  Files related to the parser use K&R style.  The
GENERIC interface and runtime library use Whitesmiths style.  All C++
code uses spaces for indentation.  

The COBOL front end has been and is being written by two guys with
decades of experience.  We hope the code is a testament to that
experience.  Our relatively recent experience, these last four years,
is that it has been more productive to keep using the styles to which
we've long become accustomed.  The position of curly braces is hardly
any hindrance to read another's code, but it's a burden to write that
way. We think, 83,068 lines later, the proof of the pudding is in the
eating.  

Thank you for your kind consideration of our work.

--jkl



Re: [Fortran, Patch, PR117347, v1] Fix array constructor not resolved in associate

2024-12-13 Thread Harald Anlauf

Hi Andre,

while the patch works with the reduced testcase, it runs into the
newly added gcc_assert() when trying the original testcase in the PR.

I also wonder if this use of gcc_assert() is a good idea or good style:

+  gcc_assert (gfc_resolve_expr (tgt_expr));

Since gcc_assert is a macro, and its precise definition depends on
configuration and could possibly be defined to be a no-op, I suggest
to evaluate arguments with side-effects outside and pass the
return code to gcc_assert.  (There are also many other ways to handle
this situation.)
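
The concern can be illustrated with a small sketch. SKETCH_ASSERT is a hypothetical macro that models a configuration in which assertion checks are compiled out; it is an assumption for illustration, not the definition of gcc_assert. The point is only that a call with side effects placed inside such a macro may never run, whereas evaluating it first and asserting on the result always runs it.

```cpp
#include <cassert>

// Hypothetical macro: models a build where checks are disabled and the
// argument is never evaluated.
#define SKETCH_ASSERT(expr) ((void) 0)

static int resolve_calls = 0;

// Stand-in for a function with a side effect and a success flag.
bool resolve_expr_sketch ()
{
  ++resolve_calls;
  return true;
}

// Fragile: the side effect disappears together with the check.
void fragile ()
{
  SKETCH_ASSERT (resolve_expr_sketch ());
}

// Robust: the call always happens; only the check can be compiled out.
void robust ()
{
  bool ok = resolve_expr_sketch ();
  SKETCH_ASSERT (ok);
  (void) ok;
}

void example_usage ()
{
  fragile ();
  assert (resolve_calls == 0);
  robust ();
  assert (resolve_calls == 1);
}
```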

Then removing the gcc_assert around the gfc_resolve_expr() avoids
the ICE, but restores the reported error.

So not OK yet.  Sorry!

Thanks,
Harald


Am 13.12.24 um 10:10 schrieb Andre Vehreschild:

Hi all,

the attached patch fixes a reject-valid of an array constructor in an associate by
resolving the array constructor before parsing the associate-block. I am not
100% sure if that is the right place to do this. But given that there is
already a special casing before the patch, I just propose to do the resolve
there.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de




[libstdc++] Optimize string constructors

2024-12-13 Thread Jan Hubicka
Hi,
this patch improves code generation on string constructors.  We currently have
_M_construct, which takes two iterators as parameters (begin/end pointers to
another string) and produces a new string.  This patch adds a special case of
the constructor where, instead of beginning/end pointers, we readily know the
string size, and also a special case when we know that the source is 0
terminated.  This happens commonly when producing string copies. Moreover,
ipa-prop is currently not able to propagate the information that beg-end is a
known constant (the copied string size), which makes it impossible for the
inliner to spot the common case where the string size is known to be shorter
than 15 bytes and fits in the local buffer.

Finally I made the new constructor inline. Because it is explicitly instantiated
without C++20 constexpr we do not produce an implicit instantiation (as required
by the standard), which prevents inlining, ipa-modref and any other IPA analysis
from happening.  I think we need to make many of the other functions inline, since
optimization across string manipulation is quite important. There is PR94960
to track this issue.
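
The idea of constructing from a block of known size, with a compile-time flag saying whether the source is already 0 terminated, can be sketched outside libstdc++. The class below is a hypothetical toy (tiny_string, construct and local_capacity are illustrative names), not the std::basic_string implementation; it only shows how a known size lets the code choose the local buffer directly and how the terminated case can copy the terminator along with the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy string with a small local buffer; an illustration only.
class tiny_string
{
public:
  static constexpr std::size_t local_capacity = 15;

  tiny_string () : m_data (m_local), m_size (0) { m_local[0] = '\0'; }
  tiny_string (const tiny_string &) = delete;
  tiny_string &operator= (const tiny_string &) = delete;
  ~tiny_string () { release (); }

  // Construct from N characters at SRC.  If Terminated is true, SRC[N] is
  // assumed to be '\0' and is copied together with the characters.
  template<bool Terminated>
  void construct (const char *src, std::size_t n)
  {
    release ();
    m_data = n > local_capacity ? new char[n + 1] : m_local;
    if (Terminated)
      std::memcpy (m_data, src, n + 1);
    else
      {
        std::memcpy (m_data, src, n);
        m_data[n] = '\0';
      }
    m_size = n;
  }

  const char *c_str () const { return m_data; }
  std::size_t size () const { return m_size; }
  bool is_local () const { return m_data == m_local; }

private:
  void release ()
  {
    if (m_data != m_local)
      delete[] m_data;
    m_data = m_local;
  }

  char m_local[local_capacity + 1];
  char *m_data;
  std::size_t m_size;
};

void example_usage ()
{
  tiny_string s;
  s.construct<true> ("abc", 3);              // literal is 0 terminated
  assert (s.is_local () && std::strcmp (s.c_str (), "abc") == 0);

  const char raw[] = {'x', 'y', 'z', '!'};   // no terminator in the source
  s.construct<false> (raw, 3);
  assert (s.size () == 3 && std::strcmp (s.c_str (), "xyz") == 0);
}
```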

Bootstrapped/regtested x86_64-linux, OK?

libstdc++-v3/ChangeLog:

PR tree-optimization/103827
PR tree-optimization/80331
PR tree-optimization/87502

* config/abi/pre/gnu.ver: Add version for _M_construct.
* include/bits/basic_string.h (basic_string::_M_construct): 
Declare.
(basic_string constructors): Use it.
* include/bits/basic_string.tcc (basic_string::_M_construct): 
New template.
* src/c++11/string-inst.cc: Instantiate S::_M_construct.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr103827.C: New test.
* g++.dg/tree-ssa/pr80331.C: New test.
* g++.dg/tree-ssa/pr87502.C: New test.

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr103827.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr103827.C
new file mode 100644
index 000..6059fe514b1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr103827.C
@@ -0,0 +1,22 @@
+// { dg-do compile }
+// { dg-options "-O1 -fdump-tree-optimized" }
+struct foo
+{
+  int a;
+  void bar() const;
+  ~foo()
+  {
+if (a != 42)
+  __builtin_abort ();
+  }
+};
+__attribute__ ((noinline))
+void test(const struct foo a)
+{
+int b = a.a;
+a.bar();
+if (a.a != b)
+  __builtin_printf ("optimize me away");
+}
+
+/* { dg-final { scan-tree-dump-not "optimize me away" "optimized" } } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80331.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr80331.C
new file mode 100644
index 000..85034504f2f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80331.C
@@ -0,0 +1,8 @@
+// { dg-do compile }
+// { dg-additional-options "-O2 -fdump-tree-optimized" }
+#include <string>
+int sain() {
+  const std::string remove_me("remove_me");
+  return 0;
+}
+// { dg-final { scan-tree-dump-not "remove_me" "optimized" } }
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr87502.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr87502.C
new file mode 100644
index 000..7975432597d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr87502.C
@@ -0,0 +1,16 @@
+// { dg-do compile }
+// { dg-additional-options "-O2 -fdump-tree-optimized" }
+#include <string>
+
+
+__attribute__ ((pure))
+extern int foo (const std::string &);
+
+int
+bar ()
+{
+  return foo ("abc") + foo (std::string("abc"));
+}
+// We used to add terminating zero explicitely instead of using fact
+// that memcpy source is already 0 terminated.
+// { dg-final { scan-tree-dump-not "remove_me" "= 0;" } }
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index ae79b371d80..75a6ade1373 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2540,6 +2540,9 @@ GLIBCXX_3.4.34 {
 
_ZNSt8__format25__locale_encoding_to_utf8ERKSt6localeSt17basic_string_viewIcSt11char_traitsIcEEPv;
 # __sso_string constructor and destructor
 _ZNSt12__sso_string[CD][12]Ev;
+# void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<bool>(char const*, unsigned long)
+# and wide char version
+
_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_M_constructILb[01]EEEvPK[cw]m;
 } GLIBCXX_3.4.33;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 8369c24d3ae..effc22b8dc9 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -341,6 +341,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   void
   _M_construct(size_type __req, _CharT __c);
 
+  // Construct using block of memory of known size.
+  // If _Terminated is true assume that source is already 0 terminated.
+  template<bool _Terminated>
+   _GLIBCXX20_CONSTEXPR inline
+   void
+   _M_construct(const _CharT *__c, size_type __n);
+
   _GLIBCXX20_CONSTEXPR
   allocator_type&
   _M_get_allocator()
@@ -561,8 +568,7 @@ _GLIBCXX_

[committed] libstdc++: Fix -Wreorder warning in

2024-12-13 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/pstl/parallel_backend_tbb.h (__merge_func): Fix order
of mem-initializers.
---

Tested x86_64-linux. Pushed to trunk.

There are still lots of -Wunknown-pragmas/-Wsign-compare/-Wunused
warnings in this header though.

 libstdc++-v3/include/pstl/parallel_backend_tbb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/pstl/parallel_backend_tbb.h 
b/libstdc++-v3/include/pstl/parallel_backend_tbb.h
index 96e4b709fbe..bb6fa8f18e8 100644
--- a/libstdc++-v3/include/pstl/parallel_backend_tbb.h
+++ b/libstdc++-v3/include/pstl/parallel_backend_tbb.h
@@ -834,7 +834,7 @@ class __merge_func
 __merge_func(_SizeType __xs, _SizeType __xe, _SizeType __ys, _SizeType 
__ye, _SizeType __zs, _Compare __comp,
  _Cleanup, _LeafMerge __leaf_merge, _SizeType __nsort, 
_RandomAccessIterator1 __x_beg,
  _RandomAccessIterator2 __z_beg, bool __x_orig, bool __y_orig, 
bool __root)
-: _M_xs(__xs), _M_xe(__xe), _M_ys(__ys), _M_ye(__ye), _M_zs(__zs), 
_M_x_beg(__x_beg), _M_z_beg(__z_beg),
+: _M_x_beg(__x_beg), _M_z_beg(__z_beg), _M_xs(__xs), _M_xe(__xe), 
_M_ys(__ys), _M_ye(__ye), _M_zs(__zs),
   _M_comp(__comp), _M_leaf_merge(__leaf_merge), _M_nsort(__nsort), 
_root(__root),
   _x_orig(__x_orig), _y_orig(__y_orig), _split(false)
 {
-- 
2.47.1
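
As background for the warning: non-static data members are initialized in the order they are declared in the class, regardless of the order written in the mem-initializer list, and -Wreorder flags lists whose order differs from the declaration order. The sketch below uses hypothetical types (tracer, merge_state_sketch) and writes the initializers in declaration order, as the fix does.

```cpp
#include <cassert>
#include <string>

static std::string init_log;

// Records the order in which members are initialized.
struct tracer
{
  explicit tracer (char tag) { init_log += tag; }
};

struct merge_state_sketch
{
  tracer x_beg;   // declared first, so initialized first
  tracer xs;      // declared second, so initialized second

  // Mem-initializers listed in declaration order, which keeps the list
  // consistent with what actually happens at run time.
  merge_state_sketch () : x_beg ('b'), xs ('s') {}
};

void example_usage ()
{
  init_log.clear ();
  merge_state_sketch m;
  (void) m;
  assert (init_log == "bs");
}
```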



[PATCH] COBOL 7/8 doc: man pages, for now

2024-12-13 Thread James K. Lowden
>From 64bcb34e12371f61a8958645e1668e0ac2704391doc.patch 4 Oct 2024 12:01:22 
>-0400
From: "James K. Lowden" 
Date: Thu 12 Dec 2024 06:28:05 PM EST
Subject: [PATCH]  Add 'cobol' to 4 files

gcc/cobol/ChangeLog
* gcobc: New file.
* gcobol.1: New file.
* gcobol.3: New file.
* help.gen: New file.

---
gcc/cobol/gcobc | 
+++-
gcc/cobol/gcobol.1 | 
++-
gcc/cobol/gcobol.3 | 
-
gcc/cobol/help.gen | +++
4 files changed, 2356 insertions(+), 4 deletions(-)
diff --git a/gcc/cobol/gcobc b/gcc/cobol/gcobc
new file mode 100755
index 000..9afd8fd93fe
--- /dev/null
+++ b/gcc/cobol/gcobc
@@ -0,0 +1,443 @@
+#! /bin/sh -e
+
+#
+# COPYRIGHT
+# The gcobc program is in public domain.
+# If it breaks then you get to keep both pieces.
+#
+# This file emulates the GnuCOBOL cobc compiler to a limited degree.
+# For options that can be "mapped" (see migration-guide.1), it accepts
+# cobc options, changing them to the gcobol equivalents.  Options not
+# recognized by the script are passed verbatim to gcobol, which will
+# reject them unless of course they are gcobol options.
+#
+# User-defined variables, and their defaults:
+#
+# Variable Default Effect 
+# echo  none   If defined, echo the gcobol command 
+# gcobcxnone   Produce verbose messages
+# gcobol   ./gcobolName of the gcobol binary
+# GCOBCUDF PREFIX/share/cobol/udf/Location of UDFs to be prepended to 
input
+#
+# By default, this script includes all files in $GCOBCUDF.  To defeat
+# that behavior, use GCOBCUDF=none.
+#
+# A list of supported options is produced with "gcobc -HELP". 
+#
+## Maintainer note. In modifying this file, the following may make
+## your life easier:
+##
+##  - To force the script to exit, either set exit_status to 1, or call
+##the error function.
+##  - As handled options are added, add them to the HELP here-doc.
+##  - The compiler can produce only one kind of output.  In this
+##script, that's known by $mode.  Options that affect the type of
+##output set the mode variable.  Everything else is appended to the
+##opts variable.
+##
+
+if [ "$COBCPY" ]
+then
+copydir="-I$COBCPY"
+fi
+
+if [ "$COB_COPY_DIR" ]
+then
+copydir="-I$COB_COPY_DIR"
+fi
+
+udf_default="${0%/*}/../share/cobol/udf"
+udfdir="${GCOBCUDF:-$udf_default}"
+
+if [ -d "$udfdir" ]
+then
+for F in "$udfdir"/*
+do
+if [ -f $F ]
+then
+includes="$includes -include $F "
+fi
+done
+else
+if [ "$GCOBCUDF" -a "$GCOBCUDF" != "none" ]
+then
+echo warning: no such directory: "'$GCOBCUDF'"
+fi
+fi

[PATCH] COBOL 8/8 bld: "meta" files, such a gcc/cobol/Make-lang.in

2024-12-13 Thread James K. Lowden
>From 64bcb34e12371f61a8958645e1668e0ac2704391bld.patch 4 Oct 2024 12:01:22 
>-0400
From: "James K. Lowden" 
Date: Thu 12 Dec 2024 06:28:06 PM EST
Subject: [PATCH]  Add 'cobol' to 10 files

ChangeLog
* Makefile.def: Add libgcobol module and cobol language.
* configure: Regenerate.
* configure.ac: Add libgcobol module and cobol language.

gcc/ChangeLog
* common.opt: Add libgcobol module and cobol language.
* dwarf2out.cc: New file.

gcc/cobol/ChangeLog
* LICENSE: New file.
* Make-lang.in: New file.
* config-lang.in: New file.
* lang.opt: New file.
* lang.opt.urls: New file.

---
Makefile.def | ++-
configure | ++--
configure.ac | +-
gcc/cobol/LICENSE | +-
gcc/cobol/Make-lang.in | 
-
gcc/cobol/config-lang.in | +-
gcc/cobol/lang.opt | 
-
gcc/cobol/lang.opt.urls | +-
gcc/common.opt | -
gcc/dwarf2out.cc | +
10 files changed, 493 insertions(+), 11 deletions(-)
diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..1192e852c7a 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -209,6 +209,7 @@ target_modules = { module= libgomp; bootstrap= true; 
lib_path=.libs; };
 target_modules = { module= libitm; lib_path=.libs; };
 target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
 target_modules = { module= libgrust; };
+target_modules = { module= libgcobol; };
 
 // These are (some of) the make targets to be done in each subdirectory.
 // Not all; these are the ones which don't have special options.
@@ -324,6 +325,7 @@ flags_to_pass = { flag= CXXFLAGS_FOR_TARGET ; };
 flags_to_pass = { flag= DLLTOOL_FOR_TARGET ; };
 flags_to_pass = { flag= DSYMUTIL_FOR_TARGET ; };
 flags_to_pass = { flag= FLAGS_FOR_TARGET ; };
+flags_to_pass = { flag= GCOBOL_FOR_TARGET ; };
 flags_to_pass = { flag= GFORTRAN_FOR_TARGET ; };
 flags_to_pass = { flag= GOC_FOR_TARGET ; };
 flags_to_pass = { flag= GOCFLAGS_FOR_TARGET ; };
@@ -655,6 +657,7 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; 
no_c=true; };
 // built newlib on some targets (e.g. Cygwin).  It still needs
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
+lang_env_dependencies = { module=libgcobol; cxx=true; };
 
 dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
@@ -690,6 +693,7 @@ dependencies = { module=install-target-libvtv; 
on=install-target-libgcc; };
 dependencies = { module=install-target-libitm; on=install-target-libgcc; };
 dependencies = { module=install-target-libobjc; on=install-target-libgcc; };
 dependencies = { module=install-target-libstdc++-v3; on=install-target-libgcc; 
};
+dependencies = { module=install-target-libgcobol; 
on=install-target-libstdc++-v3; };
 
 // Target modules in the 'src' repository.
 lang_env_dependencies = { module=libtermcap; };
@@ -727,6 +731,8 @@ languages = { language=d;   gcc-check-target=check-d;
lib-check-target=check-target-libphobos; };
 languages = { language=jit;gcc-check-target=check-jit; };
 languages = { language=rust;   gcc-check-target=check-rust; };
+languages = { language=cobol;  gcc-check-target=check-cobol;
+   lib-check-target=check-target-libgcobol; };
 
 // Toplevel bootstrap
 bootstrap_stage = { id=1 ; };
diff --git a/configure b/configure
index 51bf1d1add1..2a8f0cadc0e 100755
--- a/configure
+++ b/configure
@@ -775,6 +775,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -949,6 +950,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE}'
@@ -1201,6 +1203,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | -

[committed] libstdc++: Fix -Wmisleading-indentation warning in testcase

2024-12-13 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* testsuite/26_numerics/random/random_device/entropy.cc: Fix
indentation to avoid -Wmisleading-indentation warning.
---

Tested x86_64-linux. Pushed to trunk.

 .../testsuite/26_numerics/random/random_device/entropy.cc   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc 
b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
index 9f529f5d814..a6bebb39a9e 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
@@ -30,7 +30,7 @@ test01()
   VERIFY( entropy == max );
 }
 
-for (auto token : { "getentropy", "arc4random" })
+  for (auto token : { "getentropy", "arc4random" })
 if (__gnu_test::random_device_available(token))
 {
   const double entropy = std::random_device(token).entropy();
-- 
2.47.1



[committed] libstdc++: Swap expressions in noexcept-specifier of ranges::not_equal_to

2024-12-13 Thread Jonathan Wakely
Although this should never make a difference for sensible code, we
should really make the expression in the noexcept-specifier match the
expression in the function body.

libstdc++-v3/ChangeLog:

* include/bits/ranges_cmp.h (not_equal_to): Make order of
expressions in noexcept-specifier match the body.
* testsuite/20_util/function_objects/range.cmp/not_equal_to.cc:
Check noexcept.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/ranges_cmp.h  |  2 +-
 .../function_objects/range.cmp/not_equal_to.cc  | 17 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/ranges_cmp.h 
b/libstdc++-v3/include/bits/ranges_cmp.h
index 8425016288c..b1a33f48d02 100644
--- a/libstdc++-v3/include/bits/ranges_cmp.h
+++ b/libstdc++-v3/include/bits/ranges_cmp.h
@@ -99,7 +99,7 @@ namespace ranges
   requires equality_comparable_with<_Tp, _Up>
   constexpr bool
   operator()(_Tp&& __t, _Up&& __u) const
-  noexcept(noexcept(std::declval<_Up>() == std::declval<_Tp>()))
+  noexcept(noexcept(std::declval<_Tp>() == std::declval<_Up>()))
   { return !equal_to{}(std::forward<_Tp>(__t), std::forward<_Up>(__u)); }
 
 using is_transparent = __is_transparent;
diff --git 
a/libstdc++-v3/testsuite/20_util/function_objects/range.cmp/not_equal_to.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/range.cmp/not_equal_to.cc
index 5b4f3cb32ff..1b5167f9783 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/range.cmp/not_equal_to.cc
+++ b/libstdc++-v3/testsuite/20_util/function_objects/range.cmp/not_equal_to.cc
@@ -68,9 +68,26 @@ test02()
   VERIFY( ! f(x, x) );
 }
 
+struct A
+{
+  bool operator==(const A&) const noexcept { return true; }
+  bool operator==(A&&) const { return true; }
+};
+
+void
+test03()
+{
+  const A a{};
+  static_assert( noexcept(a == a) );
+  static_assert( ! noexcept(a == A{}) );
+  static_assert( noexcept(std::ranges::not_equal_to{}(a, a)) );
+  static_assert( ! noexcept(std::ranges::not_equal_to{}(a, A{})) );
+}
+
 int
 main()
 {
   test01();
   test02();
+  test03();
 }
-- 
2.47.1



[committed] libstdc++: Fix uninitialized data in std::basic_spanbuf::seekoff

2024-12-13 Thread Jonathan Wakely
I noticed a -Wmaybe-uninitialized warning for this function, which turns
out to be correct. If the caller passes a valid std::ios_base::seekdir
value then there's no problem, but if they pass std::seekdir(999) then
we don't initialize the __base variable before adding it to __off.

Rather than initialize it to an arbitrary value, we should return an
error.

Also add [[unlikely]] attributes to the paths that return an error.
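
The control flow after the fix can be sketched outside the library as follows. This is only an illustration: checked_seek and its parameters are hypothetical names, -1 stands in for the error return, and __builtin_add_overflow is the GCC/Clang builtin that the patch itself uses.

```cpp
#include <ios>

// Hypothetical stand-alone model of the offset computation, not the
// libstdc++ code.  Returns the new position, or -1 on any error.
inline std::streamoff
checked_seek(std::ios_base::seekdir way, std::streamoff off,
             std::streamoff cur, std::streamoff size)
{
  std::streamoff base;
  if (way == std::ios_base::beg)
    base = 0;
  else if (way == std::ios_base::cur)
    base = cur;
  else if (way == std::ios_base::end)
    base = size;
  else
    return -1; // invalid seekdir: fail instead of reading an unset base

  if (__builtin_add_overflow(base, off, &off))
    return -1; // base + off overflowed

  if (off < 0 || off > size)
    return -1; // result lies outside the buffer

  return off;
}
```

Every path that reaches the addition has assigned base, so there is nothing left for -Wmaybe-uninitialized to complain about.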

libstdc++-v3/ChangeLog:

* include/std/spanstream (basic_spanbuf::seekoff): Return an
error for invalid seekdir values.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/spanstream | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/spanstream 
b/libstdc++-v3/include/std/spanstream
index 98ad3fa856a..23a340a746e 100644
--- a/libstdc++-v3/include/std/spanstream
+++ b/libstdc++-v3/include/std/spanstream
@@ -168,7 +168,7 @@ template
}
   else
{
- off_type __base;
+ off_type __base{};
  __which &= (ios_base::in|ios_base::out);
 
  if (__which == ios_base::out)
@@ -182,11 +182,13 @@ template
}
  else if (__way == ios_base::end)
__base = _M_buf.size();
-
- if (__builtin_add_overflow(__base, __off, &__off))
+ else /* way is not ios::beg, ios::cur, or ios::end */ [[unlikely]]
return __ret;
 
- if (__off < 0 || __off > _M_buf.size())
+ if (__builtin_add_overflow(__base, __off, &__off)) [[unlikely]]
+   return __ret;
+
+ if (__off < 0 || __off > _M_buf.size()) [[unlikely]]
return __ret;
 
  if (__which & ios_base::in)
-- 
2.47.1



[PATCH] libstdc++: Move std::basic_ostream to new internal header

2024-12-13 Thread Jonathan Wakely
This adds <bits/ostream.h> so that other headers don't need to include
all of <ostream>, which pulls in all of <format> since C++23 (for the
std::print and std::println overloads in <ostream>). This new header
allows the constrained operator<< in <bits/unique_ptr.h> to be defined
without all of std::format being compiled.

We could also replace  with  in all of
, , , and . That seems more
likely to cause problems for users who might be expecting  to
define std::endl, for example. Although the standard doesn't guarantee
that, it is more reasonable than expecting  to define it! We can
look into making those changes for GCC 16.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/unique_ptr.h: Include bits/ostream.h instead of
ostream.
* include/std/ostream: Include new header.
* include/bits/ostream.h: New file.
---

Tested x86_64-linux. Any objections?

 libstdc++-v3/include/Makefile.am   |   1 +
 libstdc++-v3/include/Makefile.in   |   1 +
 libstdc++-v3/include/bits/ostream.h| 817 +
 libstdc++-v3/include/bits/unique_ptr.h |   2 +-
 libstdc++-v3/include/std/ostream   | 763 +--
 5 files changed, 821 insertions(+), 763 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/ostream.h

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 6efd3cd5f1c..07f3f027c82 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -135,6 +135,7 @@ bits_freestanding = \
${bits_srcdir}/memoryfwd.h \
${bits_srcdir}/monostate.h \
${bits_srcdir}/move.h \
+   ${bits_srcdir}/ostream.h \
${bits_srcdir}/out_ptr.h \
${bits_srcdir}/predefined_ops.h \
${bits_srcdir}/parse_numbers.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 3b5f93ce185..25fc5a27a2b 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -490,6 +490,7 @@ bits_freestanding = \
${bits_srcdir}/memoryfwd.h \
${bits_srcdir}/monostate.h \
${bits_srcdir}/move.h \
+   ${bits_srcdir}/ostream.h \
${bits_srcdir}/out_ptr.h \
${bits_srcdir}/predefined_ops.h \
${bits_srcdir}/parse_numbers.h \
diff --git a/libstdc++-v3/include/bits/ostream.h 
b/libstdc++-v3/include/bits/ostream.h
new file mode 100644
index 000..b63b8dc51aa
--- /dev/null
+++ b/libstdc++-v3/include/bits/ostream.h
@@ -0,0 +1,817 @@
+// Output streams -*- C++ -*-
+
+// Copyright (C) 1997-2024 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/ostream.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{ostream}
+ */
+
+//
+// ISO C++ 14882: 27.6.2  Output streams
+//
+
+#ifndef _GLIBCXX_OSTREAM_H
+#define _GLIBCXX_OSTREAM_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#include  // iostreams
+
+#include 
+#include 
+#if __cplusplus > 202002L
+# include 
+#endif
+
+# define __glibcxx_want_print
+#include  // __glibcxx_syncbuf
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /**
+   *  @brief  Template class basic_ostream.
+   *  @ingroup io
+   *
+   *  @tparam _CharT  Type of character stream.
+   *  @tparam _Traits  Traits for character type, defaults to
+   *   char_traits<_CharT>.
+   *
+   *  This is the base class for all output streams.  It provides text
+   *  formatting of all builtin types, and communicates with any class
+   *  derived from basic_streambuf to do the actual output.
+  */
+  template
+class basic_ostream : virtual public basic_ios<_CharT, _Traits>
+{
+public:
+  // Types (inherited from basic_ios):
+  typedef _CharT   char_type;
+  typedef typename _Traits::int_type   int_type;
+  typedef typename _Traits::pos_type   pos_type;
+ 

Re: [PATCH v4] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2024-12-13 Thread Christophe Lyon
On Tue, 10 Dec 2024 at 13:14, Richard Earnshaw (lists)
 wrote:
>
> On 09/12/2024 21:11, Christophe Lyon wrote:
> > In this PR, we have to handle a case where MVE predicates are supplied
> > as a const_int, where individual predicates have illegal boolean
> > values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
> > fix the constant (any non-zero value is converted to all 1s) and emit
> > a warning.
> >
> > On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
> > instruction level, but end-users should describe lanes rather than
> > bytes (so all bytes of a true-predicated lane should be '1'), see the
> > section on MVE intrinsics in the Arm ACLE specification.
> >
> > Since force_lowpart_subreg cannot handle const_int (because they have VOID 
> > mode),
> > use gen_lowpart on them, force_lowpart_subreg otherwise.
> >
> > 2024-11-20  Christophe Lyon  
> >   Jakub Jelinek  
> >
> >   PR target/114801
> >   gcc/
> >   * config/arm/arm-mve-builtins.cc
> >   (function_expander::add_input_operand): Handle CONST_INT
> >   predicates.
> >
> >   gcc/testsuite/
> >   * gcc.target/arm/mve/pr108443.c: Update predicate constant.
> >   * gcc.target/arm/mve/pr108443-run.c: Likewise.
> >   * gcc.target/arm/mve/pr114801.c: New test.
>
> Thanks, that looks much better.  OK, assuming no regressions.
>

Indeed, thanks for your suggestions.
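
For readers following the thread, the canonicalization described in the quoted commit message (any non-zero bit in a lane turns the whole lane into all ones) can be sketched as below. This is a plain loop written for illustration, not the mask-and-shift code from the patch. canonicalize_predicate and bits_per_lane are hypothetical names, with 2 bits per lane standing for V8BI and 4 for V4BI over a 16-bit predicate.

```cpp
#include <cstdint>

// Illustrative only: widen every lane that has at least one bit set
// to all ones.  bits_per_lane is assumed to be 2 or 4.
inline std::uint16_t
canonicalize_predicate(std::uint16_t p, unsigned bits_per_lane)
{
  const unsigned lane_mask = (1u << bits_per_lane) - 1;
  unsigned out = 0;
  for (unsigned shift = 0; shift < 16; shift += bits_per_lane)
    if ((p >> shift) & lane_mask)
      out |= lane_mask << shift;
  return static_cast<std::uint16_t>(out);
}
```

With 4 bits per lane, 0x00CC becomes 0x00FF, which is consistent with the constant change in the pr108443 tests.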

OK to backport to gcc-14  after a while?

Thanks,

Christophe


> R.
>
> > ---
> >   gcc/config/arm/arm-mve-builtins.cc| 32 ++-
> >   .../gcc.target/arm/mve/pr108443-run.c |  2 +-
> >   gcc/testsuite/gcc.target/arm/mve/pr108443.c   |  4 +-
> >   gcc/testsuite/gcc.target/arm/mve/pr114801.c   | 39 +++
> >   4 files changed, 73 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c
> >
> > diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> > b/gcc/config/arm/arm-mve-builtins.cc
> > index 8570e18fd96..3c3d30bd0de 100644
> > --- a/gcc/config/arm/arm-mve-builtins.cc
> > +++ b/gcc/config/arm/arm-mve-builtins.cc
> > @@ -2358,7 +2358,37 @@ function_expander::add_input_operand (insn_code 
> > icode, rtx x)
> > mode = GET_MODE (x);
> >   }
> > else if (VALID_MVE_PRED_MODE (mode))
> > -x = gen_lowpart (mode, x);
> > +{
> > +  if (CONST_INT_P (x))
> > + {
> > +   if (mode == V8BImode || mode == V4BImode)
> > + {
> > +   /* In V8BI or V4BI each element has 2 or 4 bits, if those bits
> > +  aren't all the same, gen_lowpart might ICE.  Canonicalize all
> > +  the 2 or 4 bits to all ones if any of them is non-zero.  V8BI
> > +  and V4BI multi-bit masks are interpreted byte-by-byte at
> > +  instruction level, but such constants should describe lanes,
> > +  rather than bytes.  See the section on MVE intrinsics in the
> > +  Arm ACLE specification.  */
> > +   unsigned HOST_WIDE_INT xi = UINTVAL (x);
> > +   xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
> > +   if (mode == V4BImode)
> > + xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
> > +   if (xi != UINTVAL (x))
> > + warning_at (location, 0, "constant predicate argument %d"
> > + " (%wx) does not map to %d lane numbers,"
> > + " converted to %wx",
> > + opno, UINTVAL (x) & 0x,
> > + mode == V8BImode ? 8 : 4,
> > + xi & 0x);
> > +
> > +   x = gen_int_mode (xi, HImode);
> > + }
> > +   x = gen_lowpart (mode, x);
> > + }
> > +  else
> > + x = force_lowpart_subreg (mode, x, GET_MODE (x));
> > +}
> >
> > m_ops.safe_grow (m_ops.length () + 1, true);
> > create_input_operand (&m_ops.last (), x, mode);
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c 
> > b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> > index cb4b45bd305..b894f019b8b 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> > @@ -16,7 +16,7 @@ __attribute__ ((noipa)) partial_write (uint32_t *a, 
> > uint32x4_t v, unsigned short
> >
> >   int main (void)
> >   {
> > -  unsigned short p = 0x00CC;
> > +  unsigned short p = 0x00FF;
> > uint32_t a[] = {0, 0, 0, 0};
> > uint32_t b[] = {0, 0, 0, 0};
> > uint32x4_t v = vdupq_n_u32 (0xU);
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443.c 
> > b/gcc/testsuite/gcc.target/arm/mve/pr108443.c
> > index c5fbfa4a1bb..0c0e2dd6eb8 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/pr108443.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/pr108443.c
> > @@ -7,8 +7,8 @@
> >   void
> >   __attribute__ ((noipa)) partial_write_cst (uint32_t *a, uint32x4_t v)
> >   {
> > -  vstrwq_p_u32 (a, v, 0x00CC);
> > +  vstrwq_p_u32 (a, v, 0x00FF

[PATCH] cse: Fix up record_jump_equiv checks [PR117095]

2024-12-13 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled on s390x-linux with -O2 -march=z15.
The problem happens during cse2, which sees in an extended basic block
(jump_insn 217 78 216 10 (parallel [
(set (pc)
(if_then_else (ne (reg:SI 165)
(const_int 1 [0x1]))
(label_ref 216)
(pc)))
(set (reg:SI 165)
(plus:SI (reg:SI 165)
(const_int -1 [0xffffffffffffffff])))
(clobber (scratch:SI))
(clobber (reg:CC 33 %cc))
]) "t.c":14:17 discrim 1 2192 {doloop_si64}
 (int_list:REG_BR_PROB 955630228 (nil))
 -> 216)
...
(insn 99 98 100 12 (set (reg:SI 138)
(const_int 1 [0x1])) "t.c":9:31 1507 {*movsi_zarch}
 (nil))
(insn 100 99 103 12 (parallel [
(set (reg:SI 137)
(minus:SI (reg:SI 138)
(subreg:SI (reg:HI 135 [ a ]) 0)))
(clobber (reg:CC 33 %cc))
]) "t.c":9:31 1904 {*subsi3}
 (expr_list:REG_DEAD (reg:SI 138)
(expr_list:REG_DEAD (reg:HI 135 [ a ])
(expr_list:REG_UNUSED (reg:CC 33 %cc)
(nil)
Note, cse2 has df_note_add_problem () before df_analyze, which adds
 (expr_list:REG_UNUSED (reg:SI 165)
(expr_list:REG_UNUSED (reg:CC 33 %cc)
notes to the first insn (correctly so, %cc is clobbered there and pseudo
165 isn't used after the insn).
Now, cse_extended_basic_block has an extra optimization on conditional
jumps, where it records equivalence on the edge which continues in the ebb.
Here it sees that (ne (reg:SI 165) (const_int 1)) is false on the edge and
remembers that pseudo 165 is comparison equivalent to (const_int 1),
so on insn 100 it decides to replace (reg:SI 138) with (reg:SI 165).

This optimization isn't correct here though, because the JUMP_INSN has
multiple sets.  Before r0-77890 record_jump_equiv has been called from
cse_insn guarded on n_sets == 1 && any_condjump_p (insn), so it wouldn't
be done on the above JUMP_INSN where n_sets == 2.  But since that change
it is guarded with single_set (insn) && any_condjump_p (insn) and that
is true because of the REG_UNUSED note.  Looking at that note is
inappropriate in CSE though, because the whole intent of the pass is to
extend the lifetimes of the pseudos if equivalence is found, so the fact
that there is REG_UNUSED note for (reg:SI 165) and that the reg isn't used
later doesn't imply that it won't be used after the optimization.
So, unless we manage to process the other sets on the JUMP_INSN (it wouldn't
be terribly hard in this exact case, the doloop insn decreases the register
by 1 and so we could just record equivalence to (const_int 0) instead, but
generally it might be hard), we should IMHO just punt if there are multiple
sets.
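
To make the problem concrete, here is a toy model of the doloop jump_insn above. The struct and function names are made up for illustration and have nothing to do with GCC's internals. The comparison and the decrement happen in the same insn, so on the fall-through edge the register was 1 before the insn but is 0 after it.

```cpp
// Toy model of the doloop pattern: compare the old value with 1 and
// decrement the register in parallel.
struct doloop_result
{
  bool branch_taken; // models (ne (reg) (const_int 1))
  int reg_after;     // models (plus (reg) (const_int -1))
};

inline doloop_result
doloop (int reg_before)
{
  doloop_result r = { reg_before != 1, reg_before - 1 };
  return r;
}
```

doloop (1) falls through with reg_after == 0, so recording "reg == 1" on that edge is wrong; in this particular case (const_int 0) would be the correct equivalence.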

The patch below adds a !multiple_sets (insn) check rather than replacing the
single_set (insn) check with it, because apparently any_condjump_p uses
pc_set, which supports the following cases:
 - PATTERN is a SET to PC (that is single_set (insn) && !multiple_sets (insn));
 - PATTERN is a PARALLEL with a single SET to PC and some CLOBBERs (likewise);
 - PATTERN is a PARALLEL with two or more SETs where the first one is a SET
   to PC (that could be single_set (insn) with REG_UNUSED notes, but is not
   !multiple_sets (insn));
 - PATTERN is UNSPEC/UNSPEC_VOLATILE with a SET inside of it.
For the last case !multiple_sets (insn) will be true, but IMHO we shouldn't
try to derive anything from those because we haven't checked the rest of the
UNSPEC* and we don't really know what it does.

Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux, ok
for trunk?

2024-12-13  Jakub Jelinek  

PR rtl-optimization/117095
* cse.cc (cse_extended_basic_block): Don't call record_jump_equiv
if multiple_sets (insn).

* gcc.c-torture/execute/pr117095.c: New test.

--- gcc/cse.cc.jj   2024-12-07 11:35:49.305442089 +0100
+++ gcc/cse.cc  2024-12-12 13:27:44.909611907 +0100
@@ -6629,7 +6629,15 @@ cse_extended_basic_block (struct cse_bas
  && EDGE_COUNT (bb->succs) == 2
  && JUMP_P (insn)
  && single_set (insn)
- && any_condjump_p (insn))
+ && any_condjump_p (insn)
+ /* single_set may return non-NULL even for multiple sets
+if there are REG_UNUSED notes.  record_jump_equiv only
+looks at pc_set and doesn't consider other sets that
+could affect the value, and the recorded equivalence
+can extend the lifetime of the compared REG, so use
+also !multiple_sets check to verify it is exactly one
+set.  */
+ && !multiple_sets (insn))
{
  basic_block next_bb = ebb_data->path[path_entry + 1].bb;
  bool taken = (next_bb == BRANCH_EDGE (bb)->dest);
--- gcc/testsuite/gcc.c-torture/execute/pr117095.c.jj   2024-12-12 
13:34:43.305695130 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr117095.c  2024-12-12 
13:38:25.339557318 +0100

[PATCH v2] RISC-V: Increase cost for vec_construct [PR118019].

2024-12-13 Thread Robin Dapp
Hi,

for a generic vec_construct from scalar elements we need to load each
scalar element and move it over to a vector register.
Right now we only use a cost of 1 per element.

This patch uses register-move cost as well as scalar_to_vec and multiplies it
with the number of elements in the vector instead.
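
As a rough sketch of the new formula (a hypothetical stand-alone function; in the patch the two per-element costs come from the tuning tables and the element count from the vector type):

```cpp
// Hypothetical stand-alone version of the new vec_construct cost:
// every element pays a register move plus a scalar-to-vector step.
inline int
vec_construct_cost (int regmove_cost, int scalar_to_vec_cost, int nelts)
{
  return (regmove_cost + scalar_to_vec_cost) * nelts;
}
```

With made-up costs of 2 and 1 and a 16-element vector this gives 48, whereas the old formula would have returned 16.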

Regtested on rv64gcv_zvl512b.

Changes from V1:
 - Added a test case.

Regards
 Robin


PR target/118019

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_builtin_vectorization_cost):
Increase vec_construct cost.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118019.c: New test.
---
 gcc/config/riscv/riscv.cc |  8 ++-
 .../gcc.target/riscv/rvv/autovec/pr118019.c   | 52 +++
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index be2ebf9d9c0..aa8a4562d9a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12263,7 +12263,13 @@ riscv_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
   return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 
 case vec_construct:
-  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
+   {
+ /* TODO: This is too pessimistic in case we can splat.  */
+ int regmove_cost = fp ? costs->regmove->FR2VR
+   : costs->regmove->GR2VR;
+ return (regmove_cost + common_costs->scalar_to_vec_cost)
+   * estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
+   }
 
 default:
   gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c
new file mode 100644
index 000..b1431d123bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl512b -mstrict-align 
-mvector-strict-align" } */
+
+/* Make sure we do not construct the vector element-wise despite
+   slow misaligned scalar and vector accesses.  */
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+
+#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3)  
\
+  {
\
+int t0 = s0 + s1;  
\
+int t1 = s0 - s1;  
\
+int t2 = s2 + s3;  
\
+int t3 = s2 - s3;  
\
+d0 = t0 + t2;  
\
+d2 = t0 - t2;  
\
+d1 = t1 + t3;  
\
+d3 = t1 - t3;  
\
+  }
+
+uint32_t
+abs2 (uint32_t a)
+{
+  uint32_t s = ((a >> 15) & 0x10001) * 0x;
+  return (a + s) ^ s;
+}
+
+int
+x264_pixel_satd_8x4 (uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2)
+{
+  uint32_t tmp[4][4];
+  uint32_t a0, a1, a2, a3;
+  int sum = 0;
+  for (int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2)
+{
+  a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
+  a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
+  a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
+  a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
+  HADAMARD4 (tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0, a1, a2, a3);
+}
+  for (int i = 0; i < 4; i++)
+{
+  HADAMARD4 (a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
+  sum += abs2 (a0) + abs2 (a1) + abs2 (a2) + abs2 (a3);
+}
+  return (((uint16_t) sum) + ((uint32_t) sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-assembler-not "lbu" } } */
-- 
2.47.1



[PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop conditions

2024-12-13 Thread Tamar Christina
Hi All,

We are currently generating a loop which has more comparisons than you'd
typically need, as the probabilities on the small size loop are such that it
assumes the likely case is that an element is not found.

This again generates a pattern that's harder for branch predictors to follow,
but also just generates more instructions for what one could say is the
typical case: that your hashtable contains the entry you are looking for.

This patch adds a __builtin_expect to indicate that it's likely that you'd
find the element that's being searched for.

The second change is in _M_find_before_node where at the moment the loop
is optimized for the case where we don't do any iterations.

A simple testcase is:

#include 

bool foo (int **a, int n, int val, int *tkn)
{
for (int i = 0; i < n; i++)
{
if (!a[i] || a[i]==tkn)
  return false;

if (*a[i] == val)
  return true;
}
}

which generates:

foo:
cmp w1, 0
ble .L1
add x1, x0, w1, uxtw 3
b   .L4
.L9:
ldr w4, [x4]
cmp w4, w2
beq .L6
cmp x0, x1
beq .L1
.L4:
ldr x4, [x0]
add x0, x0, 8
cmp x4, 0
ccmpx4, x3, 4, ne
bne .L9
mov w0, 0
.L1:
ret
.L6:
mov w0, 1
ret

i.e. BB rotation makes it generate an unconditional branch to a conditional
branch. However this method is only called when the size is above a certain
threshold, and so it's likely that we have to do that first iteration.

Adding:

#include 

bool foo (int **a, int n, int val, int *tkn)
{
for (int i = 0; i < n; i++)
{
if (__builtin_expect(!a[i] || a[i]==tkn, 0))
  return false;

if (*a[i] == val)
  return true;
}
}

to indicate that we will likely do an iteration more generates:

foo:
cmp w1, 0
ble .L1
add x1, x0, w1, uxtw 3
.L4:
ldr x4, [x0]
add x0, x0, 8
cmp x4, 0
ccmpx4, x3, 4, ne
beq .L5
ldr w4, [x4]
cmp w4, w2
beq .L6
cmp x0, x1
bne .L4
.L1:
ret
.L5:
mov w0, 0
ret
.L6:
mov w0, 1
ret

which results in ~0-20% extra on top of the previous patch.

In table form:

+---++---+--++
| Group | Benchmark  | Size  | % Inline | % Unlikely all |
+---++---+--++
| Find  | unord_M_equals(__k, __code, *__p))
return __prev_p;
 
- if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
+ if (__builtin_expect (!__p->_M_nxt || 
_M_bucket_index(*__p->_M_next()) != __bkt, 0))
break;
  __prev_p = __p;
}
@@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (this->_M_equals_tr(__k, __code, *__p))
  return __prev_p;
 
-   if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
+   if (__builtin_expect (!__p->_M_nxt || 
_M_bucket_index(*__p->_M_next()) != __bkt, 0))
  break;
__prev_p = __p;
  }
@@ -2228,7 +2228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   pointer_to(const_cast<__node_base&>(_M_before_begin));
  while (__loc._M_before->_M_nxt)
{
- if (this->_M_key_equals(__k, *__loc._M_node()))
+ if (__builtin_expect (this->_M_key_equals(__k, *__loc._M_node()), 
1))
return __loc;
  __loc._M_before = __loc._M_before->_M_nxt;
}




-- 
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index e791e52ec329277474f3218d8a44cd37ded14ac3..8101d868d0c5f7ac4f97931affcf71d826c88094 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  if (this->_M_equals(__k, __code, *__p))
 	return __prev_p;
 
-	  if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
+	  if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
 	break;
 	  __prev_p = __p;
 	}
@@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	if (this->_M_equals_tr(__k, __code, *__p))
 	  return __prev_p;
 
-	if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
+	if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
 	  break;
 	__prev_p = __p;
 	  }
@@ -2228,7 +2228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   pointer_to(const_cast<__node_base&>(_M_before_begin));
 	  while (__loc._M_before->_M_nxt)
 	{
-	  if (this->_M_key_equals(__k, *__loc._M_node()))
+	  if (__builtin_expect (this->_M_key_equals(__k, *__loc._M_node()), 1))
 		return __loc;
 	  __loc._M_before = __

[PATCH 1/2][libstdc++]: Add inline keyword to _M_locate

2024-12-13 Thread Tamar Christina
Hi All,

In GCC 12 there was a ~40% regression in the performance of hashmap->find.

This regression came about accidentally:

Before GCC 12 the find function was small enough that IPA would inline it even
though it wasn't marked inline.  In GCC-12 an optimization was added to perform
a linear search when the entries in the hashmap are small.

This increased the size of the function enough that IPA would no longer inline.
Inlining had two benefits:

1.  The return value is a reference, so it has to be returned and dereferenced
even though the search loop may have already dereferenced it.
2.  The pattern is a hard pattern to track for branch predictors.  This causes
a large number of branch misses if the value is immediately checked and
branched on, i.e. if (a != m.end()), which is a common pattern.

The patch fixes both these issues by adding the inline keyword to _M_locate
to allow the inliner to consider inlining again.
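
The shape of the change, reduced to a small hypothetical class template (small_table and locate are illustrative names, not libstdc++ ones), is simply an out-of-class member definition that gains the inline keyword. The keyword does not force inlining; it is a hint that GCC's inliner takes into account in its heuristics.

```cpp
#include <cstddef>
#include <vector>

template<typename T>
class small_table
{
public:
  void insert(const T& v) { elems_.push_back(v); }

  // Returns a pointer to the matching element, or nullptr.
  const T* locate(const T& k) const;

private:
  std::vector<T> elems_;
};

// Out-of-class definition, marked inline in the same way as the patch
// marks _M_locate.
template<typename T>
inline const T*
small_table<T>::locate(const T& k) const
{
  for (std::size_t i = 0; i < elems_.size(); ++i)
    if (elems_[i] == k)
      return &elems_[i];
  return nullptr;
}
```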

This and the other patches have been run through several benchmarks where
the size, number of elements searched for and type (reference vs value) etc.
were tested.

The change shows no statistical regression, but an average find improvement of
~27% and a range between ~10-60% improvements.  A selection of the results:

+---++---+--+
| Group | Benchmark  | Size  | % Inline |
+---++---+--+
| Find  | unord
-auto
+inline auto
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
   _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
 _M_locate(const key_type& __k) const




-- 
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index b8bd8c2f41816a800bab9b3589fe609b16285ad1..e791e52ec329277474f3218d8a44cd37ded14ac3 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -2213,7 +2213,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   typename _ExtractKey, typename _Equal,
 	   typename _Hash, typename _RangeHash, typename _Unused,
 	   typename _RehashPolicy, typename _Traits>
-auto
+inline auto
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	   _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
 _M_locate(const key_type& __k) const





[PATCH 4/7 v4] lto: Implement ltrans cache

2024-12-13 Thread Michal Jires
On Thu, 2024-12-12 at 15:48:19 +, Jan Hubicka wrote:
> fgetc has kind of non-trivial overhead.  For non-MMAP systems (is
> Windows such?), I think allocating some buffer, say 64K
> and doing fread/memcmp is probably better.
Ok, changed to fread/memcmp fallback.

> Isn't std::string always 0 terminated?
It appears so, removed the push_back.

> Patch is OK, but please update the fgetc based file compare.

---

This patch implements Incremental LTO as ltrans cache.

Stored are pairs of ltrans input/output files and input file hash.
File locking is used to allow multiple GCC instances to use the same cache.
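
The fread/memcmp fallback mentioned in the review exchange above could look roughly like this. It is a sketch only: streams_identical is a hypothetical name, it takes already-open streams rather than filenames, and the 64K buffer size is the figure suggested in the review, not necessarily what the patch uses.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Compare two open streams chunk by chunk.  Returns true only if both
// have the same length and contents and no read error occurred.
inline bool
streams_identical(std::FILE *a, std::FILE *b)
{
  const std::size_t chunk = 64 * 1024;
  std::vector<char> buf_a(chunk), buf_b(chunk);

  for (;;)
    {
      std::size_t na = std::fread(buf_a.data(), 1, chunk, a);
      std::size_t nb = std::fread(buf_b.data(), 1, chunk, b);

      // Different amounts read means different lengths (or an error).
      if (na != nb || std::memcmp(buf_a.data(), buf_b.data(), na) != 0)
        return false;

      // A short read means both streams are exhausted or failed.
      if (na < chunk)
        return !std::ferror(a) && !std::ferror(b);
    }
}
```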

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/common.opt  |   8 +
 gcc/lto-ltrans-cache.cc | 437 
 gcc/lto-ltrans-cache.h  | 144 +
 gcc/lto-opts.cc |   2 +
 gcc/lto-wrapper.cc  | 164 +--
 6 files changed, 745 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index bb82d402ed0..bca3e94aec8 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1879,7 +1879,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2541,7 +2541,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/common.opt b/gcc/common.opt
index a42537c5f1e..0afcac0fa1c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2236,6 +2236,14 @@ flto=
 Common RejectNegative Joined Var(flag_lto)
 Link-time optimization with number of parallel jobs or jobserver.
 
+flto-incremental=
+Common Joined Var(flag_lto_incremental)
+Enable incremental LTO, with its cache in given directory.
+
+flto-incremental-cache-size=
+Common Joined RejectNegative UInteger Var(flag_lto_incremental_cache_size) 
Init(2048)
+Number of cache entries in incremental LTO after which to prune old entries.
+
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model) UnknownError(unknown 
LTO partitioning model %qs)
 
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 000..c3e26f84072
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,437 @@
+/* File caching.
+   Copyright (C) 2023-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#define INCLUDE_ALGORITHM
+#define INCLUDE_STRING
+#define INCLUDE_ARRAY
+#define INCLUDE_MAP
+#define INCLUDE_VECTOR
+#include "config.h"
+#include "system.h"
+#include "sha1.h"
+#include "lto-ltrans-cache.h"
+
+static const checksum_t NULL_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns NULL_CHECKSUM if not
+   possible.  */
+static checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+return NULL_CHECKSUM;
+
+  checksum_t result = NULL_CHECKSUM;
+
+  int ret = sha1_stream (file, &result);
+
+  if (ret)
+result = NULL_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files.  */
+static bool
+files_identical (char const *first_filename, char const *second_filename)
+{
+  bool ret = true;
+
+#if HAVE_MMAP_FILE
+  struct stat st;
+  if (stat (first_filename, &st) < 0 || !

[PATCH 3/4] libstdc++: Use alias-declarations in bits/hashtable_policy.h

2024-12-13 Thread Jonathan Wakely
This file is only for C++11 and later, so replace typedefs with
alias-declarations for clarity. Also remove redundant std::
qualification on size_t, ptrdiff_t etc.

We can also remove the result_type, first_argument_type and
second_argument_type typedefs from the range hashers. We don't need
those types to follow the C++98 adaptable function object protocol.
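
A minimal before/after sketch of the kind of rewrite this patch performs (old_style_iterator and new_style_iterator are hypothetical types, not the ones in hashtable_policy.h). Both forms name exactly the same types; the alias-declaration only changes how the declaration reads.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

template<typename Value>
struct old_style_iterator
{
  typedef Value                     value_type;
  typedef std::ptrdiff_t            difference_type;
  typedef std::forward_iterator_tag iterator_category;
  typedef const value_type*         pointer;
  typedef const value_type&         reference;
};

template<typename Value>
struct new_style_iterator
{
  using value_type        = Value;
  using difference_type   = std::ptrdiff_t;
  using iterator_category = std::forward_iterator_tag;
  using pointer           = const value_type*;
  using reference         = const value_type&;
};

static_assert(std::is_same<old_style_iterator<int>::pointer,
                           new_style_iterator<int>::pointer>::value,
              "typedef and alias-declaration name the same type");
```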

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h: Replace typedefs with
alias-declarations. Remove redundant std:: qualification.
(_Mod_range_hashing, _Mask_range_hashing): Remove adaptable
function object typedefs.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/hashtable_policy.h | 167 +--
 1 file changed, 78 insertions(+), 89 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index caf8f82cc24..b7788eb5bd7 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -276,7 +276,7 @@ namespace __detail
   template
 struct _Hashtable_hash_traits
 {
-  static constexpr std::size_t
+  static constexpr size_t
   __small_size_threshold() noexcept
   { return std::__is_fast_hash<_Hash>::value ? 0 : 20; }
 };
@@ -306,7 +306,7 @@ namespace __detail
   template
 struct _Hash_node_value_base
 {
-  typedef _Value value_type;
+  using value_type = _Value;
 
   __gnu_cxx::__aligned_buffer<_Value> _M_storage;
 
@@ -343,7 +343,7 @@ namespace __detail
*/
   template<>
 struct _Hash_node_code_cache
-{ std::size_t  _M_hash_code; };
+{ size_t  _M_hash_code; };
 
   template
 struct _Hash_node_value
@@ -403,9 +403,9 @@ namespace __detail
   using __node_type = typename __base_type::__node_type;
 
 public:
-  using value_type = _Value;
-  using difference_type = std::ptrdiff_t;
-  using iterator_category = std::forward_iterator_tag;
+  using value_type= _Value;
+  using difference_type   = ptrdiff_t;
+  using iterator_category = forward_iterator_tag;
 
   using pointer = __conditional_t<__constant_iterators,
  const value_type*, value_type*>;
@@ -474,12 +474,12 @@ namespace __detail
= _Node_iterator<_Value, __constant_iterators, __cache>;
 
 public:
-  typedef _Value   value_type;
-  typedef std::ptrdiff_t   difference_type;
-  typedef std::forward_iterator_tag   iterator_category;
+  using value_type= _Value;
+  using difference_type   = ptrdiff_t;
+  using iterator_category = forward_iterator_tag;
 
-  typedef const value_type*pointer;
-  typedef const value_type&reference;
+  using pointer = const value_type*;
+  using reference = const value_type&;
 
   _Node_const_iterator() = default;
 
@@ -577,13 +577,8 @@ namespace __detail
   /// into the range [0, N).
   struct _Mod_range_hashing
   {
-typedef std::size_t first_argument_type;
-typedef std::size_t second_argument_type;
-typedef std::size_t result_type;
-
-result_type
-operator()(first_argument_type __num,
-  second_argument_type __den) const noexcept
+size_t
+operator()(size_t __num, size_t __den) const noexcept
 { return __num % __den; }
   };
 
@@ -609,12 +604,12 @@ namespace __detail
 
 // Return a bucket size no smaller than n.
 // TODO: 'const' qualifier is kept for abi compatibility reason.
-std::size_t
-_M_next_bkt(std::size_t __n) const;
+size_t
+_M_next_bkt(size_t __n) const;
 
 // Return a bucket count appropriate for n elements
-std::size_t
-_M_bkt_for_elements(std::size_t __n) const
+size_t
+_M_bkt_for_elements(size_t __n) const
 { return __builtin_ceil(__n / (double)_M_max_load_factor); }
 
 // __n_bkt is current bucket count, __n_elt is current element count,
@@ -622,11 +617,11 @@ namespace __detail
 // increase bucket count?  If so, return make_pair(true, n), where n
 // is the new bucket count.  If not, return make_pair(false, 0).
 // TODO: 'const' qualifier is kept for abi compatibility reason.
-std::pair
-_M_need_rehash(std::size_t __n_bkt, std::size_t __n_elt,
-  std::size_t __n_ins) const;
+std::pair
+_M_need_rehash(size_t __n_bkt, size_t __n_elt,
+  size_t __n_ins) const;
 
-typedef std::size_t _State;
+using _State = size_t;
 
 _State
 _M_state() const
@@ -640,30 +635,25 @@ namespace __detail
 _M_reset(_State __state)
 { _M_next_resize = __state; }
 
-static const std::size_t _S_growth_factor = 2;
+static const size_t _S_growth_factor = 2;
 
 float  _M_max_load_factor;
 
 // TODO: 'mutable' kept for abi compatibility reason.
-mutable std::size_t_

[PATCH 4/4] libstdc++: Initialize all members of hashtable local iterators

2024-12-13 Thread Jonathan Wakely
Currently the _M_bucket members are left uninitialized for
default-initialized local iterators, and then copy construction copies
indeterminate values. We should just ensure they're initialized on
construction.

Setting them to zero makes default-initialization consistent with
value-initialization and avoids indeterminate values.

For the _Local_iterator_base<..., false> specialization we preserve the
existing behaviour of setting _M_bucket_count to -1 in the default
constructor, as a sentinel value to indicate there's no hash object
present.
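
A minimal sketch of the idea, assuming a simplified stand-in class
(local_iter_base and its member names are hypothetical): with default
member-initializers, default-initialization and value-initialization yield
the same member values, and the -1 sentinel is kept without a user-written
constructor body.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simplification of a local iterator base class.
struct local_iter_base
{
  local_iter_base() = default;  // members come from the initializers below

  std::size_t bucket = 0;                      // no longer indeterminate
  std::size_t bucket_count = std::size_t(-1);  // sentinel: no hash object
};

// Returns true if default- and value-initialization agree.
inline bool default_and_value_init_agree()
{
  local_iter_base a;    // default-initialization
  local_iter_base b{};  // value-initialization
  return a.bucket == b.bucket && a.bucket_count == b.bucket_count;
}
```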

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Local_iterator_base): Use
default member-initializers.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/hashtable_policy.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index b7788eb5bd7..6f46a5796ba 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1156,8 +1156,8 @@ namespace __detail
  }
   }
 
-  size_t _M_bucket;
-  size_t _M_bucket_count;
+  size_t _M_bucket = 0;
+  size_t _M_bucket_count = 0;
 
 public:
   size_t
@@ -1194,7 +1194,7 @@ namespace __detail
   using __hash_obj_storage = _Hash_obj_storage<_Hash>;
   using __node_iter_base = _Node_iterator_base<_Value, false>;
 
-  _Local_iterator_base() : _M_bucket_count(-1) { }
+  _Local_iterator_base() = default;
 
   _Local_iterator_base(const __hash_code_base& __base,
   _Hash_node<_Value, false>* __p,
@@ -1242,8 +1242,8 @@ namespace __detail
  }
   }
 
-  size_t _M_bucket;
-  size_t _M_bucket_count;
+  size_t _M_bucket = 0;
+  size_t _M_bucket_count = -1;
 
   void
   _M_init(const _Hash& __h)
-- 
2.47.1



Re: [PATCH] RISC-V: optimization by converting to LUI operands with LUI_AFTER_COMMON_LEADING_SHIFT

2024-12-13 Thread Jeff Law




On 12/13/24 5:42 AM, Oliver Kozul wrote:

The patch optimizes code generation for comparisons of the form
X & C1 == C2. When the bitwise AND mask is stored in the lower 20 bits
it can be left shifted so it behaves as a LUI operand instead,
saving an addi instruction while loading.

2024-12-13  Oliver Kozul  

  PR target/114087

gcc/ChangeLog:

  * config/riscv/riscv.h (COMMON_LEADING_ZEROS): New macro.
  (LUI_AFTER_COMMON_LEADING_SHIFT): New macro.
  * config/riscv/riscv.md (*lui_constraint_ashift): New 
pattern.


gcc/testsuite/ChangeLog:

  * gcc.target/riscv/pr114087-3.c: New test.





patch-example3-v1.txt

---
  gcc/config/riscv/riscv.h| 18 +++
  gcc/config/riscv/riscv.md   | 35 +
  gcc/testsuite/gcc.target/riscv/pr114087-3.c | 10 ++
  3 files changed, 63 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr114087-3.c

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 09de74667a9..92850a52251 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h



diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3a4cd1d93a0..a44caa6908d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -858,6 +858,41 @@
[(set_attr "type" "arith")
 (set_attr "mode" "SI")])
  
+(define_insn_and_split "*lui_constraint_ashift"

+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+(plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+(match_operand 2 "const_int_operand"))
+(match_operand 3 "const_int_operand")))
+(clobber (match_scratch:X 4 "=&r"))]

So more of a nit.  We would write this as:

(define_insn_and_split "*lui_constraint_ashift"
  [(set (match_operand:ANYI 0 "register_operand" "=r")
(plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
 (match_operand 2 "const_int_operand"))
   (match_operand 3 "const_int_operand")))
(clobber (match_scratch:X 4 "=&r"))]

That makes it easier to see which operands go with which operator.



+  "!LUI_OPERAND (INTVAL (operands[2]))
+  && !LUI_OPERAND (-INTVAL (operands[3]))
+  && !SMALL_OPERAND (INTVAL (operands[2]))
+  && !SMALL_OPERAND (-INTVAL (operands[3]))
+  && LUI_AFTER_COMMON_LEADING_SHIFT (INTVAL (operands[2]),
+  -INTVAL (operands[3]))"

Similarly we could make the -INTVAL (operands[3]) line up under the
INTVAL on the prior line.



+  [(set (match_dup 0) (ashift:X (match_dup 1) (match_dup 5)))
+   (set (match_dup 4) (match_dup 6))
+   (set (match_dup 0) (and:X (match_dup 0) (match_dup 4)))
+   (set (match_dup 4) (match_dup 7))
+   (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))]
+  {
+ HOST_WIDE_INT mask = INTVAL (operands[2]);
+HOST_WIDE_INT val = -INTVAL (operands[3]);
+int leading_shift = COMMON_LEADING_ZEROS (mask, val) - 1;
+
+if (TARGET_64BIT && leading_shift > 32)
+{
+  leading_shift -= 32;
+}
+
+operands[5] = GEN_INT (leading_shift);
+operands[6] = GEN_INT (mask << leading_shift);
+operands[7] = GEN_INT (val << leading_shift);
+  }
So it's safe to overwrite the operands[] array entries.  It's not really 
important here, but worth remembering that you don't really need to use 
new entries in the operands array.


I think the bigger question here is: don't you need to do a right shift 
of the result to preserve the semantics of the original insn?


Concretely it matches this RTL:


(set (reg:DI 146)
(plus:DI (and:DI (reg:DI 150 [ x ])
(const_int 349525 [0x55555]))
(const_int -282644 [0xfffffffffffbafec])))


If (reg:DI 150) has the value 0x1 when this insn executes, then the 
result would be 0xfffffffffffbafed.


mask = 0x55555
val = 0x45013
leading_shift = (44 - 1) - 32 = 11

That results in the following new values in the operands array
operands[5] = 11
operands[6] = 0x2aaaa800
operands[7] = 0x22809800

The RTL we generate
op0 = (0x1 << 11)  == 0x800
op4 = 0x2aaaa800
op0 = op0 & op4 == 0x800 & 0x2aaaa800 == 0x800
op4 = 0x22809800
op0 = op0 - op4 == 0x800 - 0x22809800 == 0xffffffffdd7f7000 or 0xdd7f7000

Which looks nothing like the intended result of 0xfffffffffffbafed

Even if you arithmetic right shift your result 11 places you end up with 
the wrong answer:  0xfffffffffffbafee.  It's much closer, but still 
incorrect.
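
One way to check the missing-shift concern is a small host-side model.  The
sketch below is only an illustration under stated assumptions: the function
names are made up, the constants are the decimal values from the RTL dump
(not the hex values in the walk-through), and the shift count of 11 is taken
from the walk-through.  It models the original insn and the split sequence
as posted, with no final right shift.

```cpp
#include <cassert>
#include <cstdint>

// Decimal constants copied from the RTL dump.
constexpr std::int64_t mask   = 349525;   // operands[2]
constexpr std::int64_t addend = -282644;  // operands[3]
constexpr int leading_shift   = 11;       // value used in the walk-through

// Semantics of the original insn: (x & mask) + addend.
inline std::int64_t original_insn(std::int64_t x)
{ return (x & mask) + addend; }

// Semantics of the split sequence as posted (no final right shift).
// Intended for small non-negative x so the shifts stay well defined.
inline std::int64_t split_sequence(std::int64_t x)
{
  std::int64_t op0 = x << leading_shift;     // ashift
  std::int64_t op4 = mask << leading_shift;  // operands[6]
  op0 &= op4;                                // and
  op4 = (-addend) << leading_shift;          // operands[7]
  return op0 - op4;                          // minus
}
```

Comparing original_insn (1) with split_sequence (1) shows the two disagree
for the input used above.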


Also note there are failures in the pre-commit CI bot.   Of particular 
concern would be the pr102511.c failure.  I'll also

[PATCH] c++: modules: Fix 32-bit overflow with 64-bit location_t [PR117970]

2024-12-13 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117970

This fixes one place in module.cc that I missed updating for 64-bit
location_t in r15-5616. bootstrap + regtested on x86-64 + aarch64. Also
verified that the regression reported on the PR is fixed on that specific
configuration. OK to push? Thanks!

-Lewis

-- >8 --

With the move to 64-bit location_t in r15-6016, I missed a spot in module.cc
where a location_t was still being stored in a 32-bit int. Fixed.
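
To illustrate the failure mode, here is a sketch with hypothetical stand-in
types and function names (not the real module.cc code): a count above
2^32 - 1 loses its high bits when it passes through a 32-bit parameter and
survives when the parameter is 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the narrow and wide parameter types.
using narrow_count_t = std::uint32_t;
using wide_count_t   = std::uint64_t;

// Old-style signature: the argument is truncated to 32 bits on entry.
inline wide_count_t received_by_old_signature(narrow_count_t n) { return n; }

// New-style signature: the full 64-bit value is preserved.
inline wide_count_t received_by_new_signature(wide_count_t n) { return n; }

// True if a value round-trips through the narrow parameter unchanged.
inline bool value_survives_old_signature(wide_count_t n)
{ return received_by_old_signature(static_cast<narrow_count_t>(n)) == n; }
```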

The xtreme-header* tests in modules.exp were still passing fine on lots of
architectures that were tested (x86-64, i686, aarch64, sparc, riscv64), but
the PR shows that they were failing in some particular risc-v multilib
configurations. They pass now.

gcc/cp/ChangeLog:

* module.cc (module_state::read_ordinary_maps): Change argument to
line_map_uint_t instead of unsigned int.
---
 gcc/cp/module.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c3800b0f125..f2a4fb16c07 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3823,7 +3823,7 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state {
 
   void write_ordinary_maps (elf_out *to, range_t &,
bool, unsigned *crc_ptr);
-  bool read_ordinary_maps (unsigned, unsigned);
+  bool read_ordinary_maps (line_map_uint_t, unsigned);
   void write_macro_maps (elf_out *to, range_t &, unsigned *crc_ptr);
   bool read_macro_maps (line_map_uint_t);
 
@@ -17093,7 +17093,8 @@ module_state::write_macro_maps (elf_out *to, range_t &info, unsigned *crc_p)
 }
 
 bool
-module_state::read_ordinary_maps (unsigned num_ord_locs, unsigned range_bits)
+module_state::read_ordinary_maps (line_map_uint_t num_ord_locs,
+ unsigned range_bits)
 {
   bytes_in sec;
 


Re: [PATCH] cse: Fix up record_jump_equiv checks [PR117095]

2024-12-13 Thread Jakub Jelinek
On Fri, Dec 13, 2024 at 04:00:38PM -0700, Jeff Law wrote:
> On 12/13/24 8:20 AM, Jakub Jelinek wrote:
> > The following testcase is miscompiled on s390x-linux with -O2 -march=z15.
> > The problem happens during cse2, which sees in an extended basic block
> > (jump_insn 217 78 216 10 (parallel [
> >  (set (pc)
> >  (if_then_else (ne (reg:SI 165)
> >  (const_int 1 [0x1]))
> >  (label_ref 216)
> >  (pc)))
> >  (set (reg:SI 165)
> >  (plus:SI (reg:SI 165)
> >  (const_int -1 [0x])))
> >  (clobber (scratch:SI))
> >  (clobber (reg:CC 33 %cc))
> >  ]) "t.c":14:17 discrim 1 2192 {doloop_si64}
> >   (int_list:REG_BR_PROB 955630228 (nil))
> >   -> 216)
> > ...
> > (insn 99 98 100 12 (set (reg:SI 138)
> >  (const_int 1 [0x1])) "t.c":9:31 1507 {*movsi_zarch}
> >   (nil))
> > (insn 100 99 103 12 (parallel [
> >  (set (reg:SI 137)
> >  (minus:SI (reg:SI 138)
> >  (subreg:SI (reg:HI 135 [ a ]) 0)))
> >  (clobber (reg:CC 33 %cc))
> >  ]) "t.c":9:31 1904 {*subsi3}
> >   (expr_list:REG_DEAD (reg:SI 138)
> >  (expr_list:REG_DEAD (reg:HI 135 [ a ])
> >  (expr_list:REG_UNUSED (reg:CC 33 %cc)
> >  (nil)
> I don't really see the connection between (reg 165) and (reg 138), but I
> don't think it matters enough to dive into.

The ebb continues after jump_insn 217 when the branch is not taken,
i.e. if (ne (reg:SI 165) (const_int 1 [0x1])) is false.  As it ignored the
r165 -= 1 part of the insn, it recorded (reg:SI 165) must be const1_rtx
and (reg:SI 138) is set to the same value, so let's use the older holder
of SImode 1 constant instead of a new one...
Except that (reg:SI 165) is actually 0 after the doloop insn stops
iterating.
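
A toy model of that insn, purely as an illustration (the struct and function
names are hypothetical, and this is ordinary C++ rather than anything CSE
sees): the comparison tests the old counter value and the decrement happens
in the same step, so on the fall-through path the counter is 0, not 1.

```cpp
#include <cassert>

// Result of one doloop-style step.
struct doloop_result
{
  bool branch_taken;   // outcome of (ne counter 1) on the old value
  int counter_after;   // counter after the parallel decrement
};

// Model of the two sets in the parallel: compare, then decrement.
inline doloop_result doloop_step(int counter)
{
  doloop_result r;
  r.branch_taken = (counter != 1);
  r.counter_after = counter - 1;
  return r;
}

// Iterate until the branch falls through; assumes counter >= 1.
inline int run_until_fallthrough(int counter)
{
  for (;;)
    {
      doloop_result r = doloop_step(counter);
      counter = r.counter_after;
      if (!r.branch_taken)
        return counter;
    }
}
```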

Jakub



[PATCH 1/4] libstdc++: Further simplify _Hashtable inheritance hierarchy

2024-12-13 Thread Jonathan Wakely
The main change here is using [[no_unique_address]] instead of the Empty
Base-class Optimization. Using the attribute allows us to use data
members instead of base-classes. That simplifies the inheritance
hierarchy, which means less work for the compiler. It also means that
ADL has fewer associated classes and associated namespaces to consider,
further reducing the work the compiler has to do.

Reducing the differences between the _Hashtable_ebo_helper primary
template and the partial specialization means we no longer need to use
member functions to access the stored object, because it's now always a
data member called _M_obj.  This means we can also remove a number of
other helper functions that were using those member functions to access
the object, for example we can swap the _Hash and _Equal objects
directly in _Hashtable::swap instead of calling _Hashtable_base::_M_swap
which then calls _Hash_code_base::_M_swap.

Although [[no_unique_address]] would allow us to reduce the size for
empty types that are also 'final', doing so would be an ABI break
because those types were previously excluded from using the EBO. So we
still need the _Hashtable_ebo_helper class template and a partial
specialization, so that we only use the attribute under exactly the same
conditions as we previously used the EBO. This could be avoided with a
non-standard [[no_unique_address(expr)]] attribute that took a boolean
condition, or with reflection and token sequence injection, but we don't
have either of those things.

Because _Hashtable_ebo_helper is no longer used as a base-class we don't
need to disambiguate possible identical bases, so it doesn't need an
integral non-type template parameter.
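
For readers less familiar with the two techniques, here is a minimal sketch
using hypothetical types (none of these names are from libstdc++).  It
contrasts storing an empty functor as a private base, as a data member with
the attribute, and as a plain data member.  The exact sizes are
ABI-dependent, so the sketch only assumes that the attribute never makes
the object larger than a plain member does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stateless functor.
struct empty_hash
{
  std::size_t operator()(int v) const noexcept
  { return static_cast<std::size_t>(v); }
};

// Older approach: derive from the empty type (EBO).
struct with_ebo : private empty_hash
{
  int payload = 0;
  const empty_hash& hash() const { return *this; }
};

// Newer approach: a potentially-overlapping data member.
struct with_attribute
{
  [[no_unique_address]] empty_hash obj;
  int payload = 0;
};

// For comparison: a plain member occupies at least one byte.
struct with_plain_member
{
  empty_hash obj;
  int payload = 0;
};
```

With the data-member form the functor is reached as a named member, which is
the property the description above relies on.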

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::swap): Swap hash
function and equality predicate here. Inline allocator swap
instead of using __alloc_on_swap.
* include/bits/hashtable_policy.h (_Hashtable_ebo_helper):
Replace EBO with no_unique_address attribute. Remove NTTP.
(_Hash_code_base): Replace base class with data member using
no_unique_address attribute.
(_Hash_code_base::_M_swap): Remove.
(_Hash_code_base::_M_hash): Remove.
(_Hashtable_base): Replace base class with data member using
no_unique_address attribute.
(_Hashtable_base::_M_swap): Remove.
(_Hashtable_alloc): Replace base class with data member using
no_unique_address attribute.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/hashtable.h|  16 ++-
 libstdc++-v3/include/bits/hashtable_policy.h | 126 +--
 2 files changed, 42 insertions(+), 100 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index b8bd8c2f418..2dc24985d58 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1829,12 +1829,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 noexcept(__and_<__is_nothrow_swappable<_Hash>,
__is_nothrow_swappable<_Equal>>::value)
 {
-  // The only base class with member variables is hash_code_base.
-  // We define _Hash_code_base::_M_swap because different
-  // specializations have different members.
-  this->_M_swap(__x);
+  using std::swap;
+  swap(__hash_code_base::_M_hash._M_obj,
+  __x.__hash_code_base::_M_hash._M_obj);
+  swap(__hashtable_base::_M_equal._M_obj,
+  __x.__hashtable_base::_M_equal._M_obj);
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
+  if constexpr (__node_alloc_traits::propagate_on_container_swap::value)
+   swap(this->_M_node_allocator(), __x._M_node_allocator());
+#pragma GCC diagnostic pop
 
-  std::__alloc_on_swap(this->_M_node_allocator(), __x._M_node_allocator());
   std::swap(_M_rehash_policy, __x._M_rehash_policy);
 
   // Deal properly with potentially moved instances.
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 6769399bd4d..8b3b7ba2682 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1015,46 +1015,24 @@ namespace __detail
   /**
*  Primary class template _Hashtable_ebo_helper.
*
-   *  Helper class using EBO when it is not forbidden (the type is not
-   *  final) and when it is worth it (the type is empty.)
+   *  Helper class using [[no_unique_address]] to reduce object size.
*/
-  template
-struct _Hashtable_ebo_helper;
-
-  /// Specialization using EBO.
-  template
-struct _Hashtable_ebo_helper<_Nm, _Tp, true>
-: private _Tp
+struct _Hashtable_ebo_helper
 {
-  _Hashtable_ebo_helper() noexcept(noexcept(_Tp())) : _Tp() { }
-
-  template
-   _Hashtable_ebo_helper(_OtherTp&& __tp)
-   : _Tp(std::forward<_OtherTp>(__tp))
-   { }
-
-  const _Tp& _M_cget() const { return

[PATCH 2/4] libstdc++: Simplify storage of hasher in local iterators

2024-12-13 Thread Jonathan Wakely
The fix for PR libstdc++/56267 (relating to the lifetime of the hash
object stored in a local iterator) has undefined behaviour, as it relies
on being able to call a member function on an empty object that never
started its lifetime. Although the member function probably doesn't care
about the empty object's state, this is still technically undefined
because there is no object of that type at that address. It's also
possible that the hash object would have a stricter alignment than the
_Hash_code_storage object, so that the reinterpret_cast would produce a
misaligned pointer.

This fix replaces _Local_iterator_base's _Hash_code_storage base-class
with a new class template containing a potentially-overlapping (i.e.
[[no_unique_address]]) union member.  This means that we always have
storage of the correct type, and it can be initialized/destroyed when
required. We no longer need a reinterpret_cast that gives us a pointer
that we should not dereference.

It would be nice if we could just use a union containing the _Hash
object as a data member of _Local_iterator_base, but that would be an
ABI change. The _Hash_code_storage that contains the _Hash object is the
first base-class, before the _Node_iterator_base base-class. Making the
union a data member of _Local_iterator_base would make it come after the
_Node_iterator_base base instead of before it, altering the layout.

Since we're changing _Hash_code_storage anyway, we can replace it with a
new class template that stores the _Hash object itself in the union,
rather than a _Hash_code_base that holds the _Hash. This removes an
unnecessary level of indirection in the class hierarchy. This change
requires the effects of _Hash_code_base::_M_bucket_index to be inlined
into the _Local_iterator_base::_M_incr function, but that's easy.

We don't need separate specializations of _Hash_obj_storage for an empty
hash function and a non-empty one. Using [[no_unique_address]] gives us
an empty base-class when possible.
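
The union technique can be sketched in isolation as follows.  This is a
simplified illustration with hypothetical names (hash_obj_storage,
seeded_hash, demo), and it omits the attribute for brevity: the union
provides correctly typed and aligned storage, while construction and
destruction of the member happen explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

template<typename Hash>
struct hash_obj_storage  // hypothetical name
{
  union uninit_storage
  {
    uninit_storage() noexcept { }  // does not construct h
    ~uninit_storage() { }          // does not destroy h
    Hash h;
  };

  uninit_storage u;

  // Start the lifetime of the stored hash object.
  void construct(const Hash& from)
  { ::new (static_cast<void*>(std::addressof(u.h))) Hash(from); }

  // End the lifetime of the stored hash object.
  void destroy() noexcept
  { u.h.~Hash(); }
};

// A hash functor that is not default-constructible.
struct seeded_hash
{
  explicit seeded_hash(std::size_t s) : seed(s) { }
  std::size_t operator()(std::size_t v) const noexcept { return v ^ seed; }
  std::size_t seed;
};

inline std::size_t demo()
{
  hash_obj_storage<seeded_hash> s;  // OK: no seeded_hash exists yet
  s.construct(seeded_hash(0x10));
  std::size_t r = s.u.h(0x01);
  s.destroy();
  return r;
}
```

The storage type stays default-constructible even though seeded_hash is not,
and no reinterpret_cast is involved.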

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Hash_code_storage): Remove.
(_Hash_obj_storage): New class template. Store the hash
function as a union member instead of using a byte buffer.
(_Local_iterator_base): Use _Hash_obj_storage instead of
_Hash_code_storage, adjust members that construct and destroy
the hash object.
(_Local_iterator_base::_M_incr): Calculate bucket index.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/hashtable_policy.h | 68 +++-
 1 file changed, 25 insertions(+), 43 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 8b3b7ba2682..caf8f82cc24 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1174,55 +1174,34 @@ namespace __detail
   _M_get_bucket() const { return _M_bucket; }  // for debug mode
 };
 
-  // Uninitialized storage for a _Hash_code_base.
-  // This type is DefaultConstructible and Assignable even if the
-  // _Hash_code_base type isn't, so that _Local_iterator_base<..., false>
-  // can be DefaultConstructible and Assignable.
-  template::value>
-struct _Hash_code_storage
+  // Uninitialized storage for a _Hash object in a local iterator.
+  // This type is DefaultConstructible even if the _Hash type isn't,
+  // so that _Local_iterator_base<..., false> can be DefaultConstructible.
+  template
+struct _Hash_obj_storage
 {
-  __gnu_cxx::__aligned_buffer<_Tp> _M_storage;
+  union _Uninit_storage
+  {
+   _Uninit_storage() noexcept { }
+   ~_Uninit_storage() { }
 
-  _Tp*
-  _M_h() { return _M_storage._M_ptr(); }
+   [[__no_unique_address__]] _Hash _M_h;
+  };
 
-  const _Tp*
-  _M_h() const { return _M_storage._M_ptr(); }
+  [[__no_unique_address__]] _Uninit_storage _M_u;
 };
 
-  // Empty partial specialization for empty _Hash_code_base types.
-  template
-struct _Hash_code_storage<_Tp, true>
-{
-  static_assert( std::is_empty<_Tp>::value, "Type must be empty" );
-
-  // As _Tp is an empty type there will be no bytes written/read through
-  // the cast pointer, so no strict-aliasing violation.
-  _Tp*
-  _M_h() { return reinterpret_cast<_Tp*>(this); }
-
-  const _Tp*
-  _M_h() const { return reinterpret_cast(this); }
-};
-
-  template
-using __hash_code_for_local_iter
-  = _Hash_code_storage<_Hash_code_base<_Key, _Value, _ExtractKey,
-  _Hash, _RangeHash, _Unused, false>>;
-
   // Partial specialization used when hash codes are not cached
   template
 struct _Local_iterator_base<_Key, _Value, _ExtractKey,
_Hash, _RangeHash, _Unused, false>
-: __hash_code_for_local_iter<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
-_Unused>
-, _Node_iterator_base<_Value, false>
+

[PATCH] openmp: Add support for non-constant iterator parameters in map, to and from clauses

2024-12-13 Thread Kwok Cheung Yeung
This patch builds on the previous patch series implementing OpenMP 
iterators at: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670333.html


This patch removes the limitation that the lower and upper bounds and 
strides of the iterator must be compile-time constants - they can now be 
anything that results in an integer expression.


This means that the internal arrays used to hold the expanded clause 
expressions must now be dynamically allocated instead of statically 
(static arrays are still used if the iteration count is determined to be 
a compile-time constant). Calls to malloc the arrays are generated just 
before the loops created to expand the iterators, and calls to free them 
are generated after the target statement. The malloc calls are added in 
the omplower stage rather than when the loops are being built during 
Gimplification because clauses can still get moved around and removed at 
that point.


Kwok

From d277a9539ad78fae9eb97a156949320623060a0d Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Thu, 12 Dec 2024 21:22:20 +
Subject: [PATCH] openmp: Add support for non-constant iterator parameters in
 map, to and from clauses

This patch enables support for using non-constant expressions when specifying
iterators in the map clause of target constructs and to/from clauses of
target update constructs.

2024-12-10  Kwok Cheung Yeung  

gcc/
* gimplify.cc (build_omp_iterators_loops): Change type of elements
array to pointer of pointers if array length is non-constant, and
assign size with indirect reference.  Reorder elements added to
iterator vector and add element containing array length.
* omp-low.cc (lower_omp_map_iterator_expr): Reorder elements read
from iterator vector.  If elements field is a pointer type, assign
using pointer arithmetic followed by indirect reference, and return
the field directly.
(lower_omp_map_iterator_size): Reorder elements read from iterator
vector.  If elements field is a pointer type, assign using pointer
arithmetic followed by indirect reference.
(allocate_omp_iterator_elems): New.
(free_omp_iterator_elems): New.
(lower_omp_target): Call allocate_omp_iterator_elems before inserting
loops sequence, and call free_omp_iterator_elems afterwards.
* tree-pretty-print.cc (dump_omp_iterators): Print extra elements in
iterator vector.

gcc/testsuite/
* c-c++-common/gomp/target-map-iterators-3.c: Update expected Gimple
output.
* c-c++-common/gomp/target-map-iterators-5.c: New.
* c-c++-common/gomp/target-update-iterators-3.c: Update expected
Gimple output.
* gfortran.dg/gomp/target-map-iterators-3.f90: Likewise.
* gfortran.dg/gomp/target-map-iterators-5.f90: New.
* gfortran.dg/gomp/target-update-iterators-3.f90: Update expected
Gimple output.

libgomp/
* testsuite/libgomp.c-c++-common/target-map-iterators-4.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-5.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-4.c: New.
* testsuite/libgomp.fortran/target-map-iterators-4.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-5.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-4.f90: New.
---
 gcc/gimplify.cc   |  34 +++---
 gcc/omp-low.cc| 100 --
 .../gomp/target-map-iterators-3.c |   8 +-
 .../gomp/target-map-iterators-5.c |  14 +++
 .../gomp/target-update-iterators-3.c  |   4 +-
 .../gomp/target-map-iterators-3.f90   |   8 +-
 .../gomp/target-map-iterators-5.f90   |  21 
 .../gomp/target-update-iterators-3.f90|   6 +-
 gcc/tree-pretty-print.cc  |   6 +-
 .../target-map-iterators-4.c  |  48 +
 .../target-map-iterators-5.c  |  59 +++
 .../target-update-iterators-4.c   |  66 
 .../target-map-iterators-4.f90|  48 +
 .../target-map-iterators-5.f90|  61 +++
 .../target-update-iterators-4.f90 |  70 
 15 files changed, 509 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-map-iterators-5.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-5.f90
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-iterators-4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-iterators-5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-4.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-5.f90
 create mode 100644 
libgomp/test

Re: [PATCH 1/2][libstdc++]: Add inline keyword to _M_locate

2024-12-13 Thread Jonathan Wakely
On Fri, 13 Dec 2024 at 17:12, Tamar Christina  wrote:
>
> Hi All,
>
> In GCC 12 there was a ~40% regression in the performance of hashmap->find.
>
> This regression came about accidentally:
>
> Before GCC 12 the find function was small enough that IPA would inline it even
> though it wasn't marked inline.  In GCC-12 an optimization was added to 
> perform
> a linear search when the entries in the hashmap are small.
>
> This increased the size of the function enough that IPA would no longer 
> inline.
> Inlining had two benefits:
>
> 1.  The return value is a reference, so it has to be returned and dereferenced
> even though the search loop may have already dereferenced it.
> 2.  The pattern is a hard pattern to track for branch predictors.  This causes
> a large number of branch misses if the value is immediately checked and
> branched on. i.e. if (a != m.end()) which is a common pattern.
>
> The patch fixes both these issues by adding the inline keyword to _M_locate
> to allow the inliner to consider inlining again.
>
> This and the other patches have been run through several benchmarks where
> the size, number of elements searched for and type (reference vs value) etc
> were tested.
>
> The change shows no statistical regression, but an average find improvement of
> ~27% and a range between ~10-60% improvements.  A selection of the results:
>
> +-------+-----------+------+----------+
> | Group | Benchmark | Size | % Inline |
> +-------+-----------+------+----------+
> [Benchmark rows were not preserved in the archived copy.]
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and possible backports?

Thanks for the thorough benchmarking.

OK for trunk.

A different patch will be needed for the release branches because
_M_locate only exists on trunk.

>
> Thanks,
> Tamar
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/hashtable.h: Inline _M_locate.
>
> ---
> diff --git a/libstdc++-v3/include/bits/hashtable.h 
> b/libstdc++-v3/include/bits/hashtable.h
> index 
> b8bd8c2f41816a800bab9b3589fe609b16285ad1..e791e52ec329277474f3218d8a44cd37ded14ac3
>  100644
> --- a/libstdc++-v3/include/bits/hashtable.h
> +++ b/libstdc++-v3/include/bits/hashtable.h
> @@ -2213,7 +2213,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>typename _ExtractKey, typename _Equal,
>typename _Hash, typename _RangeHash, typename _Unused,
>typename _RehashPolicy, typename _Traits>
> -auto
> +inline auto
>  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>_Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>  _M_locate(const key_type& __k) const
>
>
>
>
> --



Re: [PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop conditions

2024-12-13 Thread Jonathan Wakely
On Fri, 13 Dec 2024 at 17:13, Tamar Christina  wrote:
>
> Hi All,
>
> We are currently generating a loop which has more comparisons than you'd
> typically need as the probabilities on the small size loop are such that it
> assumes the likely case is that an element is not found.
>
> This again generates a pattern that's harder for branch predictors to follow,
> but also just generates more instructions for what one could say is the
> typical case: That your hashtable contains the entry you are looking for.
>
> This patch adds a __builtin_expect to indicate that it's likely that you'd
> find the element that's being searched for.
>
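To make the first change concrete, here is a minimal sketch of a small linear
scan where the match is hinted as the likely outcome. This is not the
libstdc++ code: find_small and its parameters are hypothetical names, and it
assumes a compiler that provides __builtin_expect (GCC or Clang).

```cpp
#include <cstddef>

// Hypothetical stand-in for a small-size linear scan; not libstdc++ code.
// Assumes GCC or Clang, which provide __builtin_expect.
const int* find_small(const int* first, std::size_t n, int key)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      // Hint to the compiler that a hit is the expected case, so the
      // "found" path can be laid out as the fall-through path.
      if (__builtin_expect(first[i] == key, 1))
        return first + i;
    }
  return nullptr; // Key is not present.
}
```

The hint only affects block layout and branch probabilities; the function
returns the same result with or without it.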
> The second change is in _M_find_before_node where at the moment the loop
> is optimized for the case where we don't do any iterations.
>
> A simple testcase is:
>
> #include 
>
> bool foo (int **a, int n, int val, int *tkn)
> {
> for (int i = 0; i < n; i++)
> {
> if (!a[i] || a[i]==tkn)
>   return false;
>
> if (*a[i] == val)
>   return true;
> }
> }
>
> which generates:
>
> foo:
> cmp w1, 0
> ble .L1
> add x1, x0, w1, uxtw 3
> b   .L4
> .L9:
> ldr w4, [x4]
> cmp w4, w2
> beq .L6
> cmp x0, x1
> beq .L1
> .L4:
> ldr x4, [x0]
> add x0, x0, 8
> cmp x4, 0
> ccmp x4, x3, 4, ne
> bne .L9
> mov w0, 0
> .L1:
> ret
> .L6:
> mov w0, 1
> ret
>
> i.e. BB rotation makes it generate an unconditional branch to a conditional
> branch. However this method is only called when the size is above a certain
> threshold, and so it's likely that we have to do that first iteration.
>
> Adding:
>
> #include 
>
> bool foo (int **a, int n, int val, int *tkn)
> {
> for (int i = 0; i < n; i++)
> {
> if (__builtin_expect(!a[i] || a[i]==tkn, 0))
>   return false;
>
> if (*a[i] == val)
>   return true;
> }
> }
>
> to indicate that we will likely do an iteration more generates:
>
> foo:
> cmp w1, 0
> ble .L1
> add x1, x0, w1, uxtw 3
> .L4:
> ldr x4, [x0]
> add x0, x0, 8
> cmp x4, 0
> ccmp x4, x3, 4, ne
> beq .L5
> ldr w4, [x4]
> cmp w4, w2
> beq .L6
> cmp x0, x1
> bne .L4
> .L1:
> ret
> .L5:
> mov w0, 0
> ret
> .L6:
> mov w0, 1
> ret
>
> which results in ~0-20% extra on top of the previous patch.
>
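For reference, the same hint applied to a bucket-chain walk shaped like the
_M_find_before_node hunk in the diff further down could look roughly like
this. It is a sketch only: Node, its fields and find_before are hypothetical
names, and __builtin_expect assumes GCC or Clang.

```cpp
#include <cstddef>

// Hypothetical node type: each node records which bucket it belongs to.
struct Node
{
  int key;
  std::size_t bucket;
  Node* next;
};

// Returns the node *before* the one holding `key` within bucket `bkt`,
// or nullptr if the key is not in that bucket.  `before_begin` is the
// node that precedes the first node of the bucket.
Node* find_before(Node* before_begin, std::size_t bkt, int key)
{
  if (!before_begin || !before_begin->next)
    return nullptr;

  Node* prev = before_begin;
  for (Node* p = prev->next;; p = p->next)
    {
      if (p->key == key)
        return prev;

      // Leaving the bucket (end of list or next node in another bucket)
      // is hinted as unlikely, so the loop back-edge is the hot path.
      if (__builtin_expect(!p->next || p->next->bucket != bkt, 0))
        break;
      prev = p;
    }
  return nullptr;
}
```

With the hint set to 0 the compiler is told that the loop normally continues,
which is the layout the second assembly listing above shows for the testcase.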
> In table form:
>
> +---++---+--++
> | Group | Benchmark  | Size  | % Inline | % Unlikely all |
> +---++---+--++
> [benchmark rows lost in archive extraction]
> +---++---+--++
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and possible backports?
>
> Thanks,
> Tamar
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/hashtable.h (_M_locate): Make it likely that we find an
> element.
> (_M_find_before_node): Make it likely that the map has at least one
> entry and so we do at least one iteration.
>
> ---
> diff --git a/libstdc++-v3/include/bits/hashtable.h 
> b/libstdc++-v3/include/bits/hashtable.h
> index 
> e791e52ec329277474f3218d8a44cd37ded14ac3..8101d868d0c5f7ac4f97931affcf71d826c88094
>  100644
> --- a/libstdc++-v3/include/bits/hashtable.h
> +++ b/libstdc++-v3/include/bits/hashtable.h
> @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   if (this->_M_equals(__k, __code, *__p))
> return __prev_p;
>
> - if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> + if (__builtin_expect (!__p->_M_nxt || 
> _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> break;
>   __prev_p = __p;
> }
> @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> if (this->_M_equals_tr(__k, __code, *__p))
>   return __prev_p;
>
> -   if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> +   if (__builtin_expect (!__p->_M_nxt || 
> _M_bucket_index(*__p->_M_n

Re: [PATCH 1/2][libstdc++]: Add inline keyword to _M_locate

2024-12-13 Thread Jonathan Wakely
N.B. All libstdc++ patches should be sent to the libstdc++ list (not
just to the names in the MAINTAINERS file).

On Sat, 14 Dec 2024 at 01:01, Jonathan Wakely  wrote:
>
> On Fri, 13 Dec 2024 at 17:12, Tamar Christina  wrote:
> >
> > Hi All,
> >
> > In GCC 12 there was a ~40% regression in the performance of hashmap->find.
> >
> > This regression came about accidentally:
> >
> > Before GCC 12 the find function was small enough that IPA would inline it 
> > even
> > though it wasn't marked inline.  In GCC-12 an optimization was added to 
> > perform
> > a linear search when the entries in the hashmap are small.
> >
> > This increased the size of the function enough that IPA would no longer 
> > inline.
> > Inlining had two benefits:
> >
> > 1.  The return value is a reference, so it has to be returned and 
> > dereferenced
> > even though the search loop may have already dereferenced it.
> > 2.  The pattern is a hard pattern to track for branch predictors.  This 
> > causes
> > a large number of branch misses if the value is immediately checked and
> > branched on. i.e. if (a != m.end()) which is a common pattern.
> >
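The "find, then compare against end()" pattern mentioned in point 2 is the
usual calling shape, for example as below. lookup_or is a hypothetical helper
written for illustration, not something from the patch.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper showing the common pattern: the iterator returned by
// find() is immediately compared against end() and branched on.
int lookup_or(const std::unordered_map<std::string, int>& m,
              const std::string& key, int fallback)
{
  auto it = m.find(key);
  if (it != m.end())   // the branch that depends on the search result
    return it->second; // dereference of the returned iterator
  return fallback;
}
```

When find() is inlined into a caller like this, the compiler can see the
search loop and the end() check together instead of across a call boundary.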
> > The patch fixes both these issues by adding the inline keyword to _M_locate
> > to allow the inliner to consider inlining again.
> >
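A toy version of the change, for anyone who wants to try it outside the
library: an out-of-class definition of a member function of a class template,
with inline added in front of the auto return type. TinyTable and locate are
hypothetical names. As far as I understand it, the keyword is not needed for
linkage here (member functions of class templates can already be defined in
headers); its purpose in the patch is to change how GCC's inliner treats the
function.

```cpp
#include <vector>

// Hypothetical miniature of the situation in hashtable.h: the member is
// declared in the class and defined out of class further down the header.
template <typename Key>
class TinyTable
{
public:
  void insert(const Key& k) { keys_.push_back(k); }

  // Returns a pointer to the stored key, or nullptr if it is absent.
  auto locate(const Key& k) const -> const Key*;

private:
  std::vector<Key> keys_;
};

// Out-of-class definition; `inline` is the only addition, mirroring the
// "-auto" / "+inline auto" lines of the diff.
template <typename Key>
inline auto
TinyTable<Key>::locate(const Key& k) const -> const Key*
{
  for (const Key& candidate : keys_)
    if (candidate == k)
      return &candidate;
  return nullptr;
}
```

Whether the call is actually inlined still depends on the inliner's size
limits and the optimization level.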
> > This and the other patches have been run through several benchmarks where
> > the size, number of elements searched for and type (reference vs value) etc
> > were tested.
> >
> > The change shows no statistical regression, but an average find improvement 
> > of
> > ~27% and a range between ~10-60% improvements.  A selection of the results:
> >
> > +---++---+--+
> > | Group | Benchmark  | Size  | % Inline |
> > +---++---+--+
> > [benchmark rows lost in archive extraction]
> > +---++---+--+
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? and possible backports?
>
> Thanks for the thorough benchmarking.
>
> OK for trunk.
>
> A different patch will be needed for the release branches because
> _M_locate only exists on trunk.
>
> >
> > Thanks,
> > Tamar
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/hashtable.h: Inline _M_locate.
> >
> > ---
> > diff --git a/libstdc++-v3/include/bits/hashtable.h 
> > b/libstdc++-v3/include/bits/hashtable.h
> > index 
> > b8bd8c2f41816a800bab9b3589fe609b16285ad1..e791e52ec329277474f3218d8a44cd37ded14ac3
> >  100644
> > --- a/libstdc++-v3/include/bits/hashtable.h
> > +++ b/libstdc++-v3/include/bits/hashtable.h
> > @@ -2213,7 +2213,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >typename _ExtractKey, typename _Equal,
> >typename _Hash, typename _RangeHash, typename _Unused,
> >typename _RehashPolicy, typename _Traits>
> > -auto
> > +inline auto
> >  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >_Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >  _M_locate(const key_type& __k) const
> >
> >
> >
> >
> > --