[PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-17 Thread Huanghui Nie
Hi.

While implementing a hash table modeled on the C++ STL one, I noticed that
when the STL hash table erases elements and the first element erased is the
begin element, the before-begin node is assigned redundantly. This creates
unnecessary performance overhead.


First, let’s see the code implementation:

In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned when
&_M_before_begin == _M_buckets[__bkt]. That also means
_M_buckets[__bkt]->_M_nxt is assigned under some conditions.
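
For context, the relevant part of _M_remove_bucket_begin currently looks
roughly like this (simplified sketch; the diff below shows the real code):

      if (!__next_n || __next_bkt != __bkt)
	{
	  // Bucket is now empty.
	  // First update next bucket if any.
	  if (__next_n)
	    _M_buckets[__next_bkt] = _M_buckets[__bkt];

	  // Second update before begin node if necessary
	  // (this is the assignment the patch removes).
	  if (&_M_before_begin == _M_buckets[__bkt])
	    _M_before_begin._M_nxt = __next_n;

	  _M_buckets[__bkt] = nullptr;
	}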

_M_remove_bucket_begin is called by _M_erase and _M_extract_node:

   1. Case _M_erase a range: _M_remove_bucket_begin is called in a for loop
   whenever __is_bucket_begin is true. If __is_bucket_begin is true and
   &_M_before_begin == _M_buckets[__bkt], then __prev_n must be
   &_M_before_begin, and _M_erase always assigns __prev_n->_M_nxt. That
   means _M_before_begin._M_nxt is always assigned anyway whenever
   _M_remove_bucket_begin is called with &_M_before_begin ==
   _M_buckets[__bkt], so there is no need to assign _M_before_begin._M_nxt
   in _M_remove_bucket_begin.
   2. Other cases: _M_remove_bucket_begin is called when __prev_n ==
   _M_buckets[__bkt], and __prev_n->_M_nxt is always assigned afterwards in
   _M_erase and _M_extract_node. That means _M_buckets[__bkt]->_M_nxt is
   always assigned, so there is no need to assign _M_buckets[__bkt]->_M_nxt
   in _M_remove_bucket_begin.

In summary, there’s no need to check &_M_before_begin == _M_buckets[__bkt]
and assign _M_before_begin._M_nxt in _M_remove_bucket_begin.
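
To make the redundancy concrete, here is a minimal sketch of the erase path
(not the exact libstdc++ code): whenever _M_remove_bucket_begin sees
&_M_before_begin == _M_buckets[__bkt], the caller's __prev_n is that same
before-begin node and is relinked right after the call anyway:

      // Sketch of the caller (_M_erase), assuming __prev_n == _M_buckets[__bkt].
      if (__prev_n == _M_buckets[__bkt])
	_M_remove_bucket_begin (__bkt, __n->_M_next (), __next_bkt);
      ...
      __prev_n->_M_nxt = __n->_M_nxt;  // when __prev_n is &_M_before_begin,
				       // this already updates its _M_nxt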


Then let’s see the responsibility of each method:

The hash table in the C++ STL is composed of hash buckets and a node list.
The _M_erase and _M_extract_node methods are responsible for updating the
node list; the _M_remove_bucket_begin method only needs to update the hash
buckets. Updating _M_before_begin is part of updating the node list, so
_M_remove_bucket_begin does not need to update _M_before_begin.


Existing tests listed below cover this change:

23_containers/unordered_set/allocator/copy.cc
23_containers/unordered_set/allocator/copy_assign.cc
23_containers/unordered_set/allocator/move.cc
23_containers/unordered_set/allocator/move_assign.cc
23_containers/unordered_set/allocator/swap.cc
23_containers/unordered_set/erase/1.cc
23_containers/unordered_set/erase/24061-set.cc
23_containers/unordered_set/modifiers/extract.cc
23_containers/unordered_set/operations/count.cc
23_containers/unordered_set/requirements/exception/basic.cc
23_containers/unordered_map/allocator/copy.cc
23_containers/unordered_map/allocator/copy_assign.cc
23_containers/unordered_map/allocator/move.cc
23_containers/unordered_map/allocator/move_assign.cc
23_containers/unordered_map/allocator/swap.cc
23_containers/unordered_map/erase/1.cc
23_containers/unordered_map/erase/24061-map.cc
23_containers/unordered_map/modifiers/extract.cc
23_containers/unordered_map/modifiers/move_assign.cc
23_containers/unordered_map/operations/count.cc
23_containers/unordered_map/requirements/exception/basic.cc


Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?


---

ChangeLog:

libstdc++: hashtable: No need to update before begin node in
_M_remove_bucket_begin

2024-01-16  Huanghui Nie  

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (_M_remove_bucket_begin): Do not update
	the before-begin node; the callers already update it.

---


diff --git a/libstdc++-v3/include/bits/hashtable.h
b/libstdc++-v3/include/bits/hashtable.h
index b48610036fa..6056639e663 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	if (!__next_n || __next_bkt != __bkt)
 	  {
 	    // Bucket is now empty
-	    // First update next bucket if any
+	    // Update next bucket if any
 	    if (__next_n)
 	      _M_buckets[__next_bkt] = _M_buckets[__bkt];
 
-	    // Second update before begin node if necessary
-	    if (&_M_before_begin == _M_buckets[__bkt])
-	      _M_before_begin._M_nxt = __next_n;
 	    _M_buckets[__bkt] = nullptr;
 	  }
       }


Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-17 Thread Huanghui Nie
Thanks. Done.

2024年1月17日(水) 12:39 Sam James :

>
> Huanghui Nie  writes:
>
> > Hi.
>
> Please CC the libstdc++ ML for libstdc++ patches, per
>
> https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html#list.patches
> .
>
> > [...]
>
>


Re: [PATCH] libsanitizer: Replace memcpy with internal version in sanitizer_common

2024-01-17 Thread Daniel Cederman

On 2024-01-16 15:44, Jakub Jelinek wrote:

On Tue, Jan 16, 2024 at 03:11:39PM +0100, Daniel Cederman wrote:

When GCC is configured with --enable-target-optspace the compiler generates
a memcpy call in the Symbolizer constructor in sanitizer_symbolizer.cpp
when compiling for SPARC V8. Add HAVE_AS_SYM_ASSIGN to replace it with a
call to __sanitizer_internal_memcpy.

libsanitizer/ChangeLog:

* sanitizer_common/Makefile.am (DEFS): Add @AS_SYM_ASSIGN_DEFS@.
* sanitizer_common/Makefile.in: Regenerate.


Ok.

Jakub



We have only been granted write approval for the SPARC port. Is it
ok to push this anyway or could you help us with it?

/Daniel


Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-17 Thread Jonathan Wakely
On Wed, 17 Jan 2024, 08:14 Huanghui Nie via Gcc,  wrote:

> Thanks. Done.
>

And don't CC the main gcc@ list, that's not for patch discussion. And if
you CC the right list, you don't need to CC the individual maintainers.

Anyway, it's on the right list now so we'll review it there, thanks.



> 2024年1月17日(水) 12:39 Sam James :
>
> >
> > Huanghui Nie  writes:
> >
> > > Hi.
> >
> > Please CC the libstdc++ ML for libstdc++ patches, per
> >
> >
> https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html#list.patches
> > .
> >
> > > [...]
> >
> >
>


[PATCH v2] Fix __builtin_nested_func_ptr_{created, deleted} symbol versions [PR113402]

2024-01-17 Thread Iain Sandoe
Tested on x86_64, aarch64 Darwin21 (which default to heap-based trampolines)
and on x86_64 Darwin19 and Linux (which default to executable stack
trampolines).
OK for trunk?
Iain

--- 8< ---

The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version, the following patch fixes that.

As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*.
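
For reference, the kind of code that reaches these entry points is a GNU C
nested function whose address escapes, compiled with -ftrampoline-impl=heap
(illustrative example only; exact codegen depends on target and options):

  int apply (int (*fn) (int), int x) { return fn (x); }

  int
  outer (int k)
  {
    int add_k (int v) { return v + k; }  /* GNU C nested function */
    /* Taking the address forces a trampoline; with heap trampolines this
       emits calls to the created/deleted helpers renamed above.  */
    return apply (add_k, 5);
  }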

PR libgcc/113402

gcc/ChangeLog:

* builtins.def
(BUILT_IN_NESTED_PTR_CREATED): Rename __builtin_nested_func_ptr_created
to __gcc_nested_func_ptr_created.
(BUILT_IN_NESTED_PTR_DELETED): Rename __builtin_nested_func_ptr_deleted
to __gcc_nested_func_ptr_deleted.
* doc/invoke.texi: Likewise.
* tree.cc (build_common_builtin_nodes): Likewise.

libgcc/ChangeLog:

* config/aarch64/heap-trampoline.c: Rename
__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
* config/i386/heap-trampoline.c: Likewise.
* libgcc2.h: Likewise.
* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
__gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted from this symbol version to ...
(GCC_14.0.0): ... this one.

Signed-off-by: Iain Sandoe 
Co-authored-by: Jakub Jelinek  
---
 gcc/builtins.def| 4 ++--
 gcc/doc/invoke.texi | 4 ++--
 gcc/tree.cc | 8 
 libgcc/config/aarch64/heap-trampoline.c | 8 
 libgcc/config/i386/heap-trampoline.c| 8 
 libgcc/libgcc-std.ver.in| 5 ++---
 libgcc/libgcc2.h| 4 ++--
 7 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 4d97ca0eec9..e8a88ee8bf7 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1084,8 +1084,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, 
"__builtin_adjust_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
-DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, 
"__builtin_nested_func_ptr_created")
-DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, 
"__builtin_nested_func_ptr_deleted")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, "__gcc_nested_func_ptr_created")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, "__gcc_nested_func_ptr_deleted")
 
 /* Implementing __builtin_setjmp.  */
 DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16e31a3c6db..9727f1de71d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -19450,8 +19450,8 @@ for nested functions.
 By default, trampolines are generated on stack.  However, certain platforms
 (such as the Apple M1) do not permit an executable stack.  Compiling with
 @option{-ftrampoline-impl=heap} generate calls to
-@code{__builtin_nested_func_ptr_created} and
-@code{__builtin_nested_func_ptr_deleted} in order to allocate and
+@code{__gcc_nested_func_ptr_created} and
+@code{__gcc_nested_func_ptr_deleted} in order to allocate and
 deallocate trampoline space on the executable heap.  These functions are
 implemented in libgcc, and will only be provided on specific targets:
 x86_64 Darwin, x86_64 and aarch64 Linux.  @emph{PLEASE NOTE}: Heap
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 8aee3ef18d8..6fa99ad7fe4 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9934,15 +9934,15 @@ build_common_builtin_nodes (void)
ptr_type_node, // void *func
ptr_ptr_type_node, // void **dst
NULL_TREE);
-  local_define_builtin ("__builtin_nested_func_ptr_created", ftype,
+  local_define_builtin ("__gcc_nested_func_ptr_created", ftype,
BUILT_IN_NESTED_PTR_CREATED,
-   "__builtin_nested_func_ptr_created", ECF_NOTHROW);
+   "__gcc_nested_func_ptr_created", ECF_NOTHROW);
 
   ftype = build_function_type_list (void_type_node,
NULL_TREE);
-  local_define_builtin ("__builtin_nested_func_ptr_deleted", ftype,
+  local_define_builtin ("__gcc_nested_func_ptr_deleted", ftype,
BUILT_IN_NESTED_PTR_DELETED,
-   "__builtin_nested_func_ptr_deleted", ECF_NOTHROW);
+   "__gcc_nested_func_ptr_deleted", ECF_NOTHROW);
 
   ftype = build_function_type_list (void_type_node,
ptr_type_node, ptr_type_node, NULL_TREE);
diff --git a/libgcc/config/aarch64/heap-trampoline.c 
b/libgcc/config/aarch64/heap-trampoline.c
index f22233987ca..2041fe6aa39 100644
--- a/libgcc/config/aarch64/heap-trampoline.c
+++ b

Re: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Alex Coplan
Hi Andrew,

On 16/01/2024 19:29, Andrew Pinski wrote:
> So the problem here is that aarch64_ldp_reg_operand will allow all subregs,
> even a subreg of a lo_sum.
> When LRA tries to fix that up, everything breaks. So the fix is to change
> the check to only allow a reg or a subreg of a reg.

Thanks a lot for tracking this down, I really appreciate having some help with
the bug-fixing.  Sorry for not getting to it sooner myself, I'm working on
PR113089 which ended up taking longer than expected to fix.

> 
> Note the tendency here is to use register_operand, but that checks the mode
> of the register, and we need to allow mismatched modes for this predicate
> for now.

Yeah, due to the design of the patterns using special predicates we need
to allow a mode mismatch with the contextual mode.

The patch broadly LGTM (although I can't approve), but I've left a
couple of minor comments below.

> 
> Built and tested for aarch64-linux-gnu with no regressions
> (Also tested with the LD/ST pair pass back on).
> 
>   PR target/113221
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg,
>   only allow REG operands isntead of allowing all.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.c-torture/compile/pr113221-1.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/predicates.md |  8 +++-
>  gcc/testsuite/gcc.c-torture/compile/pr113221-1.c | 12 
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> 
> diff --git a/gcc/config/aarch64/predicates.md 
> b/gcc/config/aarch64/predicates.md
> index 8a204e48bb5..256268517d8 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -313,7 +313,13 @@ (define_predicate "pmode_plus_operator"
>  
>  (define_special_predicate "aarch64_ldp_reg_operand"
>(and
> -(match_code "reg,subreg")
> +(ior
> +  (match_code "reg")
> +  (and
> +   (match_code "subreg")
> +   (match_test "GET_CODE (SUBREG_REG (op)) == REG")

This could be just REG_P (SUBREG_REG (op)) in the match_test.

> +  )
> +)

I think it would be more in keeping with the style in the rest of the file to
have the closing parens on the same line as the SUBREG_REG match_test.
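
i.e. the subreg arm would then read something like (just illustrative, not
the committed form):

    (ior
      (match_code "reg")
      (and (match_code "subreg")
	   (match_test "REG_P (SUBREG_REG (op))")))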

>  (match_test "aarch64_ldpstp_operand_mode_p (GET_MODE (op))")
>  (ior
>(match_test "mode == VOIDmode")
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> new file mode 100644
> index 000..152a510786e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-options "-fno-move-loop-invariants -funroll-all-loops" } */

Does this need to be dg-additional-options?  Naively I would expect the
dg-options clause to override the torture options (and potentially any
options provided in RUNTESTFLAGS, e.g. to re-enable the ldp/stp pass).
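
i.e. something like:

  /* { dg-additional-options "-fno-move-loop-invariants -funroll-all-loops" } */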

Thanks again for the patch, and apologies for the oversight on my part: I'd
missed that register_operand also checks the code inside the subreg.

Alex

> +/* PR target/113221 */
> +/* This used to ICE after the `load/store pair fusion pass` was added
> +   due to the predicate aarch64_ldp_reg_operand allowing too much. */
> +
> +
> +void bar();
> +void foo(int* b) {
> +  for (;;)
> +*b++ = (long)bar;
> +}
> +
> -- 
> 2.39.3
> 


[PATCH v1] RISC-V: Fix asm checks regression due to recent middle-end change

2024-01-17 Thread pan2 . li
From: Pan Li 

A recent middle-end change results in some asm check failures.
This patch fixes the asm checks by adjusting the expected match counts.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/shift-1.c: Fix asm check
count.
* gcc.target/riscv/rvv/autovec/vls/shift-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
index e57a0b6bdf3..cb5a1dbc9ff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, int64_t, >>)
 DEF_OP_VV (shift, 256, int64_t, >>)
 DEF_OP_VV (shift, 512, int64_t, >>)
 
-/* { dg-final { scan-assembler-times 
{vsra\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 39 } } */
+/* { dg-final { scan-assembler-times 
{vsra\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
index 9d1fa64232c..e626a52c2d8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, uint64_t, >>)
 DEF_OP_VV (shift, 256, uint64_t, >>)
 DEF_OP_VV (shift, 512, uint64_t, >>)
 
-/* { dg-final { scan-assembler-times 
{vsrl\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 39 } } */
+/* { dg-final { scan-assembler-times 
{vsrl\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
index 8de1b9c0c41..244bee02e55 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, int64_t, <<)
 DEF_OP_VV (shift, 256, int64_t, <<)
 DEF_OP_VV (shift, 512, int64_t, <<)
 
-/* { dg-final { scan-assembler-times 
{vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 46 } } */
+/* { dg-final { scan-assembler-times 
{vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 47 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
-- 
2.34.1



Re: [PATCH v1] RISC-V: Fix asm checks regression due to recent middle-end change

2024-01-17 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-01-17 17:00
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Fix asm checks regression due to recent middle-end 
change
From: Pan Li 
 
A recent middle-end change results in some asm check failures.
This patch fixes the asm checks by adjusting the expected match counts.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/shift-1.c: Fix asm check
count.
* gcc.target/riscv/rvv/autovec/vls/shift-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
index e57a0b6bdf3..cb5a1dbc9ff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, int64_t, >>)
DEF_OP_VV (shift, 256, int64_t, >>)
DEF_OP_VV (shift, 512, int64_t, >>)
-/* { dg-final { scan-assembler-times 
{vsra\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 39 } } */
+/* { dg-final { scan-assembler-times 
{vsra\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
/* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
index 9d1fa64232c..e626a52c2d8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, uint64_t, >>)
DEF_OP_VV (shift, 256, uint64_t, >>)
DEF_OP_VV (shift, 512, uint64_t, >>)
-/* { dg-final { scan-assembler-times 
{vsrl\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 39 } } */
+/* { dg-final { scan-assembler-times 
{vsrl\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
/* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
index 8de1b9c0c41..244bee02e55 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, int64_t, <<)
DEF_OP_VV (shift, 256, int64_t, <<)
DEF_OP_VV (shift, 512, int64_t, <<)
-/* { dg-final { scan-assembler-times 
{vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 46 } } */
+/* { dg-final { scan-assembler-times 
{vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 47 } } */
/* { dg-final { scan-assembler-not {csrr} } } */
-- 
2.34.1
 
 


RE: [PATCH v1] RISC-V: Fix asm checks regression due to recent middle-end change

2024-01-17 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, January 17, 2024 5:02 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Fix asm checks regression due to recent 
middle-end change

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-01-17 17:00
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Fix asm checks regression due to recent middle-end 
change
From: Pan Li <pan2...@intel.com>

A recent middle-end change results in some asm check failures.
This patch fixes the asm checks by adjusting the expected match counts.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/shift-1.c: Fix asm check
count.
* gcc.target/riscv/rvv/autovec/vls/shift-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
index e57a0b6bdf3..cb5a1dbc9ff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-1.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, int64_t, >>)
DEF_OP_VV (shift, 256, int64_t, >>)
DEF_OP_VV (shift, 512, int64_t, >>)
-/* { dg-final { scan-assembler-times 
{vsra\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 39 } } */
+/* { dg-final { scan-assembler-times 
{vsra\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
/* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
index 9d1fa64232c..e626a52c2d8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-2.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, uint64_t, >>)
DEF_OP_VV (shift, 256, uint64_t, >>)
DEF_OP_VV (shift, 512, uint64_t, >>)
-/* { dg-final { scan-assembler-times 
{vsrl\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 39 } } */
+/* { dg-final { scan-assembler-times 
{vsrl\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
/* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
index 8de1b9c0c41..244bee02e55 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
@@ -53,5 +53,5 @@ DEF_OP_VV (shift, 128, int64_t, <<)
DEF_OP_VV (shift, 256, int64_t, <<)
DEF_OP_VV (shift, 512, int64_t, <<)
-/* { dg-final { scan-assembler-times 
{vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 46 } } */
+/* { dg-final { scan-assembler-times 
{vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 47 } } */
/* { dg-final { scan-assembler-not {csrr} } } */
--
2.34.1




Re: [PATCH v2] Fix __builtin_nested_func_ptr_{created, deleted} symbol versions [PR113402]

2024-01-17 Thread Iain Sandoe



> On 17 Jan 2024, at 08:55, Iain Sandoe  wrote:
> 
> Tested on x86_64, aarch64 Darwin21 (which default to heap-based trampolines)
> and on x86_64 Darwin19 and Linux (which default to executable stack
> trampolines).
> OK for trunk?

Hmm.. maybe this is not right and the builtins should still be named __builtin 
(with
the fallback function only renamed) or alternatively, add these as libfuncs 
only?

> Iain
> 
> --- 8< ---
> 
> The symbols for the functions supporting heap-based trampolines were
> exported at an incorrect symbol version, the following patch fixes that.
> 
> As requested in the PR, this also renames __builtin_nested_func_ptr* to
> __gcc_nested_func_ptr*.
> 
>   PR libgcc/113402
> 
> gcc/ChangeLog:
> 
>   * builtins.def
>   (BUILT_IN_NESTED_PTR_CREATED): Rename __builtin_nested_func_ptr_created
>   to __gcc_nested_func_ptr_created.
>   (BUILT_IN_NESTED_PTR_DELETED): Rename __builtin_nested_func_ptr_deleted
>   to __gcc_nested_func_ptr_deleted.
>   * doc/invoke.texi: Likewise.
>   * tree.cc (build_common_builtin_nodes): Likewise.
> 
> libgcc/ChangeLog:
> 
>   * config/aarch64/heap-trampoline.c: Rename
>   __builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
>   __builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
>   * config/i386/heap-trampoline.c: Likewise.
>   * libgcc2.h: Likewise.
>   * libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
>   __gcc_nested_func_ptr_created and
>   __gcc_nested_func_ptr_deleted from this symbol version to ...
>   (GCC_14.0.0): ... this one.
> 
> Signed-off-by: Iain Sandoe 
> Co-authored-by: Jakub Jelinek  
> ---
> gcc/builtins.def| 4 ++--
> gcc/doc/invoke.texi | 4 ++--
> gcc/tree.cc | 8 
> libgcc/config/aarch64/heap-trampoline.c | 8 
> libgcc/config/i386/heap-trampoline.c| 8 
> libgcc/libgcc-std.ver.in| 5 ++---
> libgcc/libgcc2.h| 4 ++--
> 7 files changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index 4d97ca0eec9..e8a88ee8bf7 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -1084,8 +1084,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, 
> "__builtin_adjust_trampoline")
> DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
> DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
> DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
> -DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, 
> "__builtin_nested_func_ptr_created")
> -DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, 
> "__builtin_nested_func_ptr_deleted")
> +DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, 
> "__gcc_nested_func_ptr_created")
> +DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, 
> "__gcc_nested_func_ptr_deleted")
> 
> /* Implementing __builtin_setjmp.  */
> DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 16e31a3c6db..9727f1de71d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -19450,8 +19450,8 @@ for nested functions.
> By default, trampolines are generated on stack.  However, certain platforms
> (such as the Apple M1) do not permit an executable stack.  Compiling with
> @option{-ftrampoline-impl=heap} generate calls to
> -@code{__builtin_nested_func_ptr_created} and
> -@code{__builtin_nested_func_ptr_deleted} in order to allocate and
> +@code{__gcc_nested_func_ptr_created} and
> +@code{__gcc_nested_func_ptr_deleted} in order to allocate and
> deallocate trampoline space on the executable heap.  These functions are
> implemented in libgcc, and will only be provided on specific targets:
> x86_64 Darwin, x86_64 and aarch64 Linux.  @emph{PLEASE NOTE}: Heap
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 8aee3ef18d8..6fa99ad7fe4 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -9934,15 +9934,15 @@ build_common_builtin_nodes (void)
>   ptr_type_node, // void *func
>   ptr_ptr_type_node, // void **dst
>   NULL_TREE);
> -  local_define_builtin ("__builtin_nested_func_ptr_created", ftype,
> +  local_define_builtin ("__gcc_nested_func_ptr_created", ftype,
>   BUILT_IN_NESTED_PTR_CREATED,
> - "__builtin_nested_func_ptr_created", ECF_NOTHROW);
> + "__gcc_nested_func_ptr_created", ECF_NOTHROW);
> 
>   ftype = build_function_type_list (void_type_node,
>   NULL_TREE);
> -  local_define_builtin ("__builtin_nested_func_ptr_deleted", ftype,
> +  local_define_builtin ("__gcc_nested_func_ptr_deleted", ftype,
>   BUILT_IN_NESTED_PTR_DELETED,
> - "__builtin_nested_func_ptr_deleted", ECF_NOTHROW);
> + "_

Re: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Jakub Jelinek
On Tue, Jan 16, 2024 at 07:29:04PM -0800, Andrew Pinski wrote:
> So the problem here is that aarch64_ldp_reg_operand will allow all subregs,
> even a subreg of a lo_sum.
> When LRA tries to fix that up, everything breaks. So the fix is to change
> the check to only allow a reg or a subreg of a reg.
> 
> Note the tendency here is to use register_operand, but that checks the mode
> of the register, and we need to allow mismatched modes for this predicate
> for now.
> 
> Built and tested for aarch64-linux-gnu with no regressions
> (Also tested with the LD/ST pair pass back on).
> 
>   PR target/113221
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg,
>   only allow REG operands isntead of allowing all.

s/isntead/instead/
Otherwise I defer to AArch64 maintainers.

Jakub



Re: [PATCH] libsanitizer: Replace memcpy with internal version in sanitizer_common

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 09:17:09AM +0100, Daniel Cederman wrote:
> On 2024-01-16 15:44, Jakub Jelinek wrote:
> > On Tue, Jan 16, 2024 at 03:11:39PM +0100, Daniel Cederman wrote:
> > > When GCC is configured with --enable-target-optspace the compiler 
> > > generates
> > > a memcpy call in the Symbolizer constructor in sanitizer_symbolizer.cpp
> > > when compiling for SPARC V8. Add HAVE_AS_SYM_ASSIGN to replace it with a
> > > call to __sanitizer_internal_memcpy.
> > > 
> > > libsanitizer/ChangeLog:
> > > 
> > >   * sanitizer_common/Makefile.am (DEFS): Add @AS_SYM_ASSIGN_DEFS@.
> > >   * sanitizer_common/Makefile.in: Regenerate.
> > 
> > Ok.
> 
> We have only been granted write approval for the SPARC port. Is it
> ok to push this anyway or could you help us with it?

I don't think anyone has write access only to a part of the git repository.
So, if you are Write After Approval (which seems you are according to
MAINTAINERS), you should be able to commit any patch approved by
maintainers or reviewers.

Jakub



Re: [PATCH v2] Fix __builtin_nested_func_ptr_{created,deleted} symbol versions [PR113402]

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 09:04:08AM +, Iain Sandoe wrote:
> > On 17 Jan 2024, at 08:55, Iain Sandoe  wrote:
> > 
> > Tested on x86_64, aarch64 Darwin21 (which default to heap-based trampolines)
> > and on x86_64 Darwin19 and Linux (which default to executable stack
> > trampolines).
> > OK for trunk?
> 
> Hmm.. maybe this is not right and the builtins should still be named 
> __builtin (with
> the fallback function only renamed) or alternatively, add these as libfuncs 
> only?
> 
> > Iain
> > 
> > --- 8< ---
> > 
> > The symbols for the functions supporting heap-based trampolines were
> > exported at an incorrect symbol version, the following patch fixes that.
> > 
> > As requested in the PR, this also renames __builtin_nested_func_ptr* to
> > __gcc_nested_func_ptr*.
> > 
> > PR libgcc/113402
> > 
> > gcc/ChangeLog:
> > 
> > * builtins.def
> > (BUILT_IN_NESTED_PTR_CREATED): Rename __builtin_nested_func_ptr_created
> > to __gcc_nested_func_ptr_created.
> > (BUILT_IN_NESTED_PTR_DELETED): Rename __builtin_nested_func_ptr_deleted
> > to __gcc_nested_func_ptr_deleted.

The normal way would be to call the builtins in the compiler
__builtin___gcc_nested_func_ptr_*
and expand them to the __gcc_nested_func_ptr_* calls.
See e.g. __builtin___clear_cache.
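i.e. roughly (hypothetical sketch only, mirroring how __builtin___clear_cache
maps to __clear_cache; the exact change is not shown in this thread):

  local_define_builtin ("__builtin___gcc_nested_func_ptr_created", ftype,
			BUILT_IN_NESTED_PTR_CREATED,
			"__gcc_nested_func_ptr_created", ECF_NOTHROW);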

Jakub



[PATCH] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
This patch fixes a SPEC2017 cam4 output mismatch caused by a missing
"has compatible" check during conflict vsetvl fusion.

Buggy assembly before this patch:

.L69:
	vsetvli	a5,s1,e8,mf4,ta,ma	-> buggy vsetvl
	vsetivli	zero,8,e8,mf2,ta,ma
	vmv.v.i	v1,0
	vse8.v	v1,0(a5)
	j	.L37
.L68:
	vsetvli	a5,s1,e8,mf4,ta,ma	-> buggy vsetvl
	vsetivli	zero,8,e8,mf2,ta,ma
	addi	a3,a5,8
	vmv.v.i	v1,0
	vse8.v	v1,0(a5)
	vse8.v	v1,0(a3)
	addi	a4,a4,-16
	li	a3,8
	bltu	a4,a3,.L37
	j	.L69
.L67:
	vsetivli	zero,8,e8,mf2,ta,ma
	vmv.v.i	v1,0
	vse8.v	v1,0(a5)
	addi	a5,sp,56
	vse8.v	v1,0(a5)
	addi	s4,sp,64
	addi	a3,sp,72
	vse8.v	v1,0(s4)
	vse8.v	v1,0(a3)
	addi	a4,a4,-32
	li	a3,16
	bltu	a4,a3,.L36
	j	.L68

After this patch:

.L63:
	ble	s1,zero,.L49
	slli	a4,s1,3
	li	a3,32
	addi	a5,sp,48
	bltu	a4,a3,.L62
	vsetivli	zero,8,e8,mf2,ta,ma
	vmv.v.i	v1,0
	vse8.v	v1,0(a5)
	addi	a5,sp,56
	vse8.v	v1,0(a5)
	addi	s4,sp,64
	addi	a3,sp,72
	vse8.v	v1,0(s4)
	addi	a4,a4,-32
	addi	a5,sp,80
	vse8.v	v1,0(a3)
.L35:
	li	a3,16
	bltu	a4,a3,.L36
	addi	a3,a5,8
	vmv.v.i	v1,0
	addi	a4,a4,-16
	vse8.v	v1,0(a5)
	addi	a5,a5,16
	vse8.v	v1,0(a3)
.L36:
	li	a3,8
	bltu	a4,a3,.L37
	vmv.v.i	v1,0
	vse8.v	v1,0(a5)

Tested on both RV32 and RV64 with no regressions. OK for trunk?

PR target/113429

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix bug of conflict vsetvl fusion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/spec2017_cam4/ppgrid.mod: New test.
* gcc.target/riscv/rvv/fortran/spec2017_cam4/shr_kind_mod.mod: New test.
* gcc.target/riscv/rvv/fortran/spec2017_cam4/pr113429.f90: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-4.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-5.c: Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc  |  39 ---
 .../rvv/fortran/spec2017_cam4/ppgrid.mod  | Bin 0 -> 296 bytes
 .../rvv/fortran/spec2017_cam4/pr113429.f90| 110 ++
 .../fortran/spec2017_cam4/shr_kind_mod.mod| Bin 0 -> 499 bytes
 .../gcc.target/riscv/rvv/rvv-fortran.exp  |   2 +
 .../riscv/rvv/vsetvl/vlmax_conflict-4.c   |   5 +-
 .../riscv/rvv/vsetvl/vlmax_conflict-5.c   |  10 +-
 7 files changed, 140 insertions(+), 26 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/ppgrid.mod
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/pr113429.f90
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/shr_kind_mod.mod

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index df7ed149388..76e3d2eb471 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2254,6 +2254,22 @@ private:
 return true;
   }
 
+  bool has_compatible_reaching_vsetvl_p (vsetvl_info info)
+  {
+unsigned int index;
+sbitmap_iterator sbi;
+EXECUTE_IF_SET_IN_BITMAP (m_vsetvl_def_in[info.get_bb ()->index ()], 0,
+ index, sbi)
+  {
+   const auto prev_info = *m_vsetvl_def_exprs[index];
+   if (!prev_info.valid_p ())
+ continue;
+   if (m_dem.compatible_p (prev_info, info))
+ return true;
+  }
+return false;
+  }
+
   bool preds_all_same_avl_and_ratio_p (const vsetvl_info &curr_info)
   {
 gcc_assert (
@@ -3075,22 +3091,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
{
  vsetvl_info new_curr_info = curr_info;
  new_curr_info.set_bb (crtl->ssa->bb (eg->dest));
- bool has_compatible_p = false;
- unsigned int def_expr_index;
- sbitmap_iterator sbi2;
- EXECUTE_IF_SET_IN_BITMAP (
-   m_vsetvl_def_in[new_curr_info.get_bb ()->index ()], 0,
-   def_expr_index, sbi2)
-   {
- vsetvl_info &prev_info = *m_vsetvl_def_exprs[def_expr_index];
- if (!prev_info.valid_p ())
-   continue;
- if (m_dem.compatible_p (prev_info, new_curr_info))
-   {
- has_compatible_p = true;
- break;
-   }
-   }
+ bool has_compatible_p
+   = has_compatible_reaching_vsetvl_p (new_curr_info);
  if (!has_compatible_p)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3146,7 +3148,10 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
   

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-17 Thread Ajit Agarwal
Hello Kewen:

On 17/01/24 12:32 pm, Kewen.Lin wrote:
> on 2024/1/16 06:22, Ajit Agarwal wrote:
>> Hello Richard:
>>
>> On 15/01/24 6:25 pm, Ajit Agarwal wrote:
>>>
>>>
>>> On 15/01/24 6:14 pm, Ajit Agarwal wrote:
 Hello Richard:

 On 15/01/24 3:03 pm, Richard Biener wrote:
> On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal  
> wrote:
>>
>> Hello All:
>>
>> This patch add the vecload pass to replace adjacent memory accesses lxv 
>> with lxvp
>> instructions. This pass is added before ira pass.
>>
>> The vecload pass removes one of the two adjacent defining lxv (load)
>> instructions and replaces them with lxvp.
>> Due to the removal of one of the defining loads, the allocno has only uses
>> but no defs.
>>
>> Due to this IRA pass doesn't assign register pairs like registers in 
>> sequence.
>> Changes are made in IRA register allocator to assign sequential 
>> registers to
>> adjacent loads.
>>
>> Some of the registers are cleared and are not set as profitable 
>> registers due
>> to zero cost is greater than negative costs and checks are added to 
>> compare
>> positive costs.
>>
>> LRA register is changed not to reassign them to different register and 
>> form
>> the sequential register pairs intact.
>>
>> contrib/check_GNU_style.sh run on patch looks good.
>>
>> Bootstrapped and regtested for powerpc64-linux-gnu.
>>
>> Spec2017 benchmarks are run and I get impressive benefits for some of 
>> the FP
>> benchmarks.
> i
> I want to point out the aarch64 target recently got a ld/st fusion
> pass which sounds
> related.  It would be nice to have at least common infrastructure for
> this (the aarch64
> one also looks quite more powerful)
> 
> Thanks Richi for pointing out this pass.  Yeah, it would be nice if we can
> share something common.  CC the author Alex as well in case he has more
> insightful
> comments.
> 

 The load/store fusion pass in aarch64 is scheduled to run before the
 peephole2 pass and after the register allocator pass. In our case, if we do
 it after the register allocator, then we should keep the register assigned
 to the lower-offset load, and the other load that is adjacent to the
 previous load with an offset difference of 16 is removed.

 Then we are left with one load with lower offset and register assigned 
 by register allocator for lower offset load should be lower than other
 adjacent load. If not, we need to change it to lower register and 
 propagate them with all the uses of the variable. Similary for other
 adjacent load that we are removing, register needs to be propagated to
 all the uses.

 In that case we are doing the work of register allocator. In most of our
 example testcases the lower offset load is assigned greater register 
 than other adjacent load by register allocator and hence we are left
 with propagating them always and almost redoing the register allocator
 work.

 Is it same/okay to use load/store fusion pass as on aarch64 for our cases
 considering the above scenario.

 Please let me know what do you think. 
>>
>> I have gone through the implementation of ld/st fusion in aarch64.
>>
>> Here is my understanding:
>>
>> First of all, it is my mistake that I mentioned in my earlier mail that
>> this pass runs before peephole2 after the RA pass.
>>
>> The pass actually runs both early before the RA pass (before early-remat)
>> and also after the RA pass before peephole2.
>>
>> This pass fuses 2 ldr instructions with adjacent accesses into one ldp
>> instruction.
>>
>> The assembly syntax of ldp instruction is
>>
>> ldp w3, w7, [x0]
>>
>> It loads [X0] into w3 and [X0+4] into W7.
>>
>> Both registers that form the pair are named explicitly in the ldp
>> instruction and need not be in sequential order (the second register does
>> not have to be W3+1 when the first is W3).
>>
>> That's why the pass works before the RA pass: the pattern has both defs
>> and does not require a sequential order like first_reg and then
>> first_reg+1; they can be any valid registers.
>>
>>
>> But in lxvp instructions:
>>
>> lxv vs32, 0(r2)
>> lxv vs45, 16(r2)
>>
>> When we combine above lxv instruction into lxvp, lxvp instruction
>> becomes
>>
>> lxvp vs32, 0(r2)
>>
>> wherein r2+0 is loaded into vs32 and r2+16 is loaded into the vs33
>> register (sequential registers); vs33 is implicit in the lxvp instruction.
>> This is a mandatory requirement of the lxvp instruction and cannot be any
>> other sequence: the register numbers must differ by exactly 1.
> 
> Note that the first register number in the pair should be even, it
> means the so-called sequential order should be X, X + 1 (X is even).
> This is also the reason why we preferred this pairing to be done
> before RA (can catch more opportunities).
> 
>>
>> All the uses of r45 has to be propagated with r33.
> 
> I think you meant s/r45/vs45/ and s/r33/v

[commit] Sanitizer/MIPS: Use $t9 for preemptible function call

2024-01-17 Thread YunQiang Su
From: YunQiang Su 

Currently, almost all of the shared libraries on MIPS rely on $t9
to get the address of the current function, instead of PCREL instructions,
even on MIPSr6. So we have to set $t9 properly.

To get the address of a preemptible function, we need the help of the GOT.
MIPS/O32 has .cpload, which generates 3 instructions to set up the GOT
pointer.
For __mips64, we can get the GOT by:

lui $t8, %hi(%neg(%gp_rel(SANITIZER_STRINGIFY(TRAMPOLINE(func)))))
daddu $t8, $t8, $t9
daddiu $t8, $t8, %lo(%neg(%gp_rel(SANITIZER_STRINGIFY(TRAMPOLINE(func)))))

And then get the address of __interceptor_func, and jump to it:

ld $t9, %got_disp(__interceptor_" SANITIZER_STRINGIFY(func) ")($t8)
jr $t9

Upstream-Commit: 0a64367a72f1634321f5051221f05f2f364bd882

libsanitizer

* interception/interception.h (substitution_##func_name):
Use macro C_ASM_TAIL_CALL.
* sanitizer_common/sanitizer_asm.h: Define C_ASM_TAIL_CALL
for MIPS with help of t9.
---
 libsanitizer/interception/interception.h  |  5 ++--
 libsanitizer/sanitizer_common/sanitizer_asm.h | 23 +++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/interception/interception.h 
b/libsanitizer/interception/interception.h
index 9d8b60b2eef..58e969378a9 100644
--- a/libsanitizer/interception/interception.h
+++ b/libsanitizer/interception/interception.h
@@ -205,8 +205,9 @@ const interpose_substitution substitution_##func_name[] 
\
  ASM_TYPE_FUNCTION_STR "\n"
\
SANITIZER_STRINGIFY(TRAMPOLINE(func)) ":\n" 
\
SANITIZER_STRINGIFY(CFI_STARTPROC) "\n" 
\
-   SANITIZER_STRINGIFY(ASM_TAIL_CALL) " __interceptor_"
\
- SANITIZER_STRINGIFY(ASM_PREEMPTIBLE_SYM(func)) "\n"   
\
+   C_ASM_TAIL_CALL(SANITIZER_STRINGIFY(TRAMPOLINE(func)),  
\
+   "__interceptor_"
\
+ SANITIZER_STRINGIFY(ASM_PREEMPTIBLE_SYM(func))) "\n"  
\
SANITIZER_STRINGIFY(CFI_ENDPROC) "\n"   
\
".size  " SANITIZER_STRINGIFY(TRAMPOLINE(func)) ", "
\
 ".-" SANITIZER_STRINGIFY(TRAMPOLINE(func)) "\n"
\
diff --git a/libsanitizer/sanitizer_common/sanitizer_asm.h 
b/libsanitizer/sanitizer_common/sanitizer_asm.h
index bbb18cfbdf1..3af66a4e449 100644
--- a/libsanitizer/sanitizer_common/sanitizer_asm.h
+++ b/libsanitizer/sanitizer_common/sanitizer_asm.h
@@ -53,6 +53,29 @@
 # define ASM_TAIL_CALL tail
 #endif
 
+// Currently, almost all of the shared libraries rely on the value of
+// $t9 to get the address of current function, instead of PCREL, even
+// on MIPSr6. To be compatiable with them, we have to set $t9 properly.
+// MIPS uses GOT to get the address of preemptible functions.
+#if defined(__mips64)
+#  define C_ASM_TAIL_CALL(t_func, i_func)   \
+"lui $t8, %hi(%neg(%gp_rel(" t_func ")))\n" \
+"daddu $t8, $t8, $t9\n" \
+"daddiu $t8, $t8, %lo(%neg(%gp_rel(" t_func ")))\n" \
+"ld $t9, %got_disp(" i_func ")($t8)\n"  \
+"jr $t9\n"
+#elif defined(__mips__)
+#  define C_ASM_TAIL_CALL(t_func, i_func)   \
+".setnoreorder\n"   \
+".cpload $t9\n" \
+".setreorder\n" \
+"lw $t9, %got(" i_func ")($gp)\n"   \
+"jr $t9\n"
+#elif defined(ASM_TAIL_CALL)
+#  define C_ASM_TAIL_CALL(t_func, i_func)   \
+SANITIZER_STRINGIFY(ASM_TAIL_CALL) " " i_func
+#endif
+
 #if defined(__ELF__) && defined(__x86_64__) || defined(__i386__) || \
 defined(__riscv)
 # define ASM_PREEMPTIBLE_SYM(sym) sym@plt
-- 
2.39.2



[PATCH] ipa-strub: Fix handling of _BitInt returns [PR113406]

2024-01-17 Thread Jakub Jelinek
Hi!

Seems pass_ipa_strub::execute contains a copy of the expand_thunk
code I've changed for _BitInt in r14-6805 PR112941 - larger _BitInts
are aggregate_value_p even when they are is_gimple_reg_type.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2024-01-17  Jakub Jelinek  

PR middle-end/113406
* ipa-strub.cc (pass_ipa_strub::execute): Check aggregate_value_p
regardless of whether is_gimple_reg_type (restype) or not.

* gcc.dg/bitint-70.c: New test.

--- gcc/ipa-strub.cc.jj 2024-01-03 11:51:28.374775006 +0100
+++ gcc/ipa-strub.cc2024-01-16 10:51:03.987463928 +0100
@@ -3174,21 +3174,16 @@ pass_ipa_strub::execute (function *)
   resdecl,
   build_int_cst (TREE_TYPE (resdecl), 0));
  }
-   else if (!is_gimple_reg_type (restype))
+   else if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
  {
-   if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
- {
-   restmp = resdecl;
+   restmp = resdecl;
 
-   if (VAR_P (restmp))
- {
-   add_local_decl (cfun, restmp);
-   BLOCK_VARS (DECL_INITIAL (current_function_decl))
- = restmp;
- }
+   if (VAR_P (restmp))
+ {
+   add_local_decl (cfun, restmp);
+   BLOCK_VARS (DECL_INITIAL (current_function_decl))
+ = restmp;
  }
-   else
- restmp = create_tmp_var (restype, "retval");
  }
else
  restmp = create_tmp_reg (restype, "retval");
--- gcc/testsuite/gcc.dg/bitint-70.c.jj 2024-01-16 11:01:48.300524130 +0100
+++ gcc/testsuite/gcc.dg/bitint-70.c2024-01-16 11:01:19.456924333 +0100
@@ -0,0 +1,14 @@
+/* PR middle-end/113406 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -fstrub=internal" } */
+/* { dg-require-effective-target strub } */
+
+#if __BITINT_MAXWIDTH__ >= 146
+_BitInt(146)
+#else
+_BitInt(16)
+#endif
+foo (void)
+{
+  return 0;
+}

Jakub



[PATCH] lower-bitint: Fix up VIEW_CONVERT_EXPR handling [PR113408]

2024-01-17 Thread Jakub Jelinek
Hi!

Unlike NOP_EXPR/CONVERT_EXPR which are GIMPLE_UNARY_RHS, VIEW_CONVERT_EXPR
is GIMPLE_SINGLE_RHS and so gimple_assign_rhs1 contains the operand wrapped
in VIEW_CONVERT_EXPR tree.

So, to handle it like other casts we need to look through it.
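
For example (illustrative GIMPLE), given

  _2 = VIEW_CONVERT_EXPR<_BitInt(713)>(x_1);

gimple_assign_rhs1 returns the whole VIEW_CONVERT_EXPR tree, so the value to
cast is TREE_OPERAND (rhs1, 0), i.e. x_1, which is what handle_cast expects.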

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-17  Jakub Jelinek  

PR tree-optimization/113408
* gimple-lower-bitint.cc (bitint_large_huge::handle_stmt): For
VIEW_CONVERT_EXPR, pass TREE_OPERAND (rhs1, 0) rather than rhs1
to handle_cast.

* gcc.dg/bitint-71.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-15 17:34:00.0 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-16 12:32:56.617721208 +0100
@@ -1975,9 +1975,12 @@ bitint_large_huge::handle_stmt (gimple *
case INTEGER_CST:
  return handle_operand (gimple_assign_rhs1 (stmt), idx);
CASE_CONVERT:
-   case VIEW_CONVERT_EXPR:
  return handle_cast (TREE_TYPE (gimple_assign_lhs (stmt)),
  gimple_assign_rhs1 (stmt), idx);
+   case VIEW_CONVERT_EXPR:
+ return handle_cast (TREE_TYPE (gimple_assign_lhs (stmt)),
+ TREE_OPERAND (gimple_assign_rhs1 (stmt), 0),
+ idx);
default:
  break;
}
--- gcc/testsuite/gcc.dg/bitint-71.c.jj 2024-01-16 12:38:16.679239526 +0100
+++ gcc/testsuite/gcc.dg/bitint-71.c2024-01-16 12:37:24.724967020 +0100
@@ -0,0 +1,18 @@
+/* PR tree-optimization/113408 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 713
+struct A { _BitInt(713) b; } g;
+#else
+struct A { _BitInt(49) b; } g;
+#endif
+int f;
+
+void
+foo (void)
+{
+  struct A j = g;
+  if (j.b)
+f = 0;
+}

Jakub



Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-17 Thread Xi Ruoyao
On Wed, 2024-01-17 at 17:38 +0800, chenglulu wrote:
> 
> On 2024/1/13 at 9:05 PM, Xi Ruoyao wrote:
> > On Saturday, 2024-01-13 at 15:01 +0800, chenglulu wrote:
> > > On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote:
> > > > On Friday, 2024-01-12 at 09:46 +0800, chenglulu wrote:
> > > > 
> > > > > > I found an issue bootstrapping GCC with -mcmodel=extreme in 
> > > > > > BOOT_CFLAGS:
> > > > > > we need a target hook to tell the generic code
> > > > > > UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or 
> > > > > > we'll
> > > > > > see millions lines of messages like
> > > > > > 
> > > > > > ../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
> > > > > > UNSPEC_LA_PCREL_64_PART1 (42) found in variable location
> > > > > I build GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't 
> > > > > reproduced the problem you mentioned.
> > > > > 
> > > > >   $ ../configure --host=loongarch64-linux-gnu 
> > > > > --target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
> > > > >   --with-arch=loongarch64 --with-abi=lp64d --enable-tls 
> > > > > --enable-languages=c,c++,fortran,lto --enable-plugin \
> > > > >   --disable-multilib --disable-host-shared --enable-bootstrap 
> > > > > --enable-checking=release
> > > > >   $ make BOOT_FLAGS="-mcmodel=extreme"
> > > > > 
> > > > > What did I do wrong?:-(
> > > > BOOT_CFLAGS, not BOOT_FLAGS :).
> > > > 
> > > This is so strange. My compilation here stopped due to syntax problems,
> > > 
> > > and I still haven't reproduced the information you mentioned about
> > > UNSPEC_LA_PCREL_64_PART1.
> > I used:
> > 
> > ../gcc/configure --with-system-zlib --disable-fixincludes \
> >   --enable-default-ssp --enable-default-pie \
> >   --disable-werror --disable-multilib \
> >   --prefix=/home/xry111/gcc-dev
> > 
> > and then
> > 
> > make STAGE1_{C,CXX}FLAGS="-O2 -g" -j8 \
> >   BOOT_{C,CXX}FLAGS="-O2 -g -mcmodel=extreme" &| tee gcc-build.log
> > 
> > I guess "-g" is needed to reproduce the issue as well as the messages
> > were produced in dwarf generation.
> > 
> I have reproduced this problem, and it can be solved by adding a hook.
> 
> But unfortunately, when using '-mcmodel=extreme -mexplicit-relocs=always'
> to test SPEC2006 403.gcc, an error occurs. Other benchmarks have not been
> tested yet.
> 
> I roughly debugged it, and the problem should be this:
> 
> The problem is that the address used by the instruction 'ldx.d $r12, $r25,
> $r6' is wrong.
> 
> Wrong assembly:
> 
>     5826 pcalau12i   $r13,%got_pc_hi20(recog_data)
>   5827 addi.d  $r12,$r0,%got_pc_lo12(recog_data)
>   5828 lu32i.d $r12,%got64_pc_lo20(recog_data)
>   5829 lu52i.d $r12,$r12,%got64_pc_hi12(recog_data)
>   5830 ldx.d   $r12,$r13,$r12
>   5831 ld.b    $r8,$r12,997
>   5832 .loc 1 829 18 discriminator 1 view .LVU1527
>   5833 ble $r8,$r0,.L476
>   5834 ld.d    $r6,$r3,16
>   5835 ld.d    $r9,$r3,88
>   5836 .LBB189 = .
>   5837 .loc 1 839 24 view .LVU1528
>   5838 alsl.d  $r7,$r19,$r19,2
>   5839 ldx.d   $r12,$r25,$r6
>   5840 addi.d  $r17,$r3,120
>   5841 .LBE189 = .
>   5842 .loc 1 829 18 discriminator 1 view .LVU1529
>   5843 or  $r13,$r0,$r0
>   5844 addi.d  $r4,$r12,992
> 
> Assembly that works fine using macros:
> 
> 3040 la.global   $r12,$r13,recog_data
> 3041 ld.b    $r9,$r12,997
> 3042 ble $r9,$r0,.L475
> 3043 alsl.d  $r5,$r16,$r16,2
> 3044 la.global   $r15,$r17,recog_data
> 3045 addi.d  $r4,$r12,992
> 3046 addi.d  $r18,$r3,48
> 3047 or  $r13,$r0,$r0
> 
> Comparing the assembly, we can see that lines 5844 and 3045 perform the
> same operation, but there is a problem with the base-address register
> optimization at line 5844.
> 
> regrename.c.283r.loop2_init:
> 
> (insn 6 497 2741 34 (set (reg:DI 180 [ ivtmp.713D.15724 ])
>  (const_int 0 [0])) "regrename.c":829:18 discrim 1 156 
> {*movdi_64bit}
> (nil))
> (insn 2741 6 2744 34 (parallel [
>  (set (reg:DI 1502)
>  (unspec:DI [
>  (symbol_ref:DI ("recog_data") [flags 0xc0]  
> )
>  ] UNSPEC_LA_PCREL_64_PART1))
>  (set (reg/f:DI 1479)
>  (unspec:DI [
>  (symbol_ref:DI ("recog_data") [flags 0xc0]  
> )
>  ] UNSPEC_LA_PCREL_64_PART2))
>  ]) -1
>   (expr_list:REG_UNUSED (reg/f:DI 1479)
> (nil)))
> (insn 2744 2741 2745 34 (set (reg/f:DI 1503)
>  (mem:DI (plus:DI (reg/f:DI 1479)
>  (reg:DI 1502)) [0  S8 A8])) 156 {*movdi_64bit}
>   (expr_list:REG_EQUAL (symbol_ref:DI ("recog_data") [flags 0xc0] 
> )
> (nil)))
> 
> 
> Virtual register 1479 will be used in insn 2744, but register 1479 was
> assigned the REG_UNUSED attribute in the previous instruction.
> 
> The attached file is the wrong file.

[PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Monk Chiang
This allows the backend to generate movcc instructions when the target
machine has a movcc pattern.

branchless-cond.c needs to be updated since some target machines have
conditional move instructions, so on those targets the expression is no
longer converted to the branchless form.
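
For illustration, this is the shape of source the patterns apply to
(mirroring the existing branchless-cond.c tests); on a movcc target the
select is now kept instead of being rewritten to ((typeof(y))zero_one * z) | y:

  unsigned int f (unsigned int x, unsigned int y, unsigned int z)
  {
    return ((x & 1) == 0) ? y : z | y;
  }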

gcc/ChangeLog:
PR target/113095
* match.pd (`(zero_one == 0) ? y : z  y`,
`(zero_one != 0) ? z  y : y`): Do not match to branchless
expression, if target machine has conditional move pattern.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
---
 gcc/match.pd  | 30 +--
 .../gcc.dg/tree-ssa/branchless-cond.c |  6 ++--
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index e42ecaf9ec7..a1f90b1cd41 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (mult (convert:type @0) @2) @1
+   (with {
+  bool can_movecc_p = false;
+  if (can_conditionally_move_p (TYPE_MODE (type)))
+   can_movecc_p = true;
+
+  /* Some target only support word_mode for movcc pattern, if type can
+extend to word_mode then use conditional move. Even if there is a
+extend instruction, the cost is lower than branchless.  */
+  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+ && can_conditionally_move_p (word_mode))
+   can_movecc_p = true;
+}
+(if (!can_movecc_p)
+ (op (mult (convert:type @0) @2) @1))
 
 /* (zero_one != 0) ? z  y : y -> ((typeof(y))zero_one * z)  y */
 (for op (bit_xor bit_ior plus)
@@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (mult (convert:type @0) @2) @1
+   (with {
+  bool can_movecc_p = false;
+  if (can_conditionally_move_p (TYPE_MODE (type)))
+   can_movecc_p = true;
+
+  /* Some target only support word_mode for movcc pattern, if type can
+extend to word_mode then use conditional move. Even if there is a
+extend instruction, the cost is lower than branchless.  */
+  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+ && can_conditionally_move_p (word_mode))
+   can_movecc_p = true;
+}
+(if (!can_movecc_p)
+ (op (mult (convert:type @0) @2) @1))
 
 /* ?: Value replacement. */
 /* a == 0 ? b : b + a  -> b + a */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
index e063dc4bb5f..c002ed97364 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
   return ((x & 1) != 0) ? z | y : y;
 }
 
-/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { 
"aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" 
} } } } */
+/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { 
"aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" 
} } } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* 
alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
-- 
2.40.1



[committed] openmp: Add OpenMP _BitInt support [PR113409]

2024-01-17 Thread Jakub Jelinek
Hi!

The following patch adds support for _BitInt iterators of OpenMP canonical
loops (with the preexisting limitation that when not using compile time
static scheduling the iterators in the library are at most unsigned long long
or signed long, so one can't in the runtime/dynamic/guided etc. cases iterate
more than what those types can represent, like is the case of e.g. __int128
iterators too) and the testcase also covers linear/reduction clauses for them.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2024-01-17  Jakub Jelinek  

PR middle-end/113409
* omp-general.cc (omp_adjust_for_condition): Handle BITINT_TYPE like
INTEGER_TYPE.
(omp_extract_for_data): Use build_bitint_type rather than
build_nonstandard_integer_type if either iter_type or loop->v type
is BITINT_TYPE.
* omp-expand.cc (expand_omp_for_generic,
expand_omp_taskloop_for_outer, expand_omp_taskloop_for_inner): Handle
BITINT_TYPE like INTEGER_TYPE.

* testsuite/libgomp.c/bitint-1.c: New test.

--- gcc/omp-general.cc.jj   2024-01-04 09:10:56.590914073 +0100
+++ gcc/omp-general.cc  2024-01-16 16:08:15.160663134 +0100
@@ -115,7 +115,8 @@ omp_adjust_for_condition (location_t loc
 
 case NE_EXPR:
   gcc_assert (TREE_CODE (step) == INTEGER_CST);
-  if (TREE_CODE (TREE_TYPE (v)) == INTEGER_TYPE)
+  if (TREE_CODE (TREE_TYPE (v)) == INTEGER_TYPE
+ || TREE_CODE (TREE_TYPE (v)) == BITINT_TYPE)
{
  if (integer_onep (step))
*cond_code = LT_EXPR;
@@ -409,6 +410,7 @@ omp_extract_for_data (gomp_for *for_stmt
   loop->v = gimple_omp_for_index (for_stmt, i);
   gcc_assert (SSA_VAR_P (loop->v));
   gcc_assert (TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE
+ || TREE_CODE (TREE_TYPE (loop->v)) == BITINT_TYPE
  || TREE_CODE (TREE_TYPE (loop->v)) == POINTER_TYPE);
   var = TREE_CODE (loop->v) == SSA_NAME ? SSA_NAME_VAR (loop->v) : loop->v;
   loop->n1 = gimple_omp_for_initial (for_stmt, i);
@@ -479,9 +481,17 @@ omp_extract_for_data (gomp_for *for_stmt
  else if (i == 0
   || TYPE_PRECISION (iter_type)
  < TYPE_PRECISION (TREE_TYPE (loop->v)))
-   iter_type
- = build_nonstandard_integer_type
- (TYPE_PRECISION (TREE_TYPE (loop->v)), 1);
+   {
+ if (TREE_CODE (iter_type) == BITINT_TYPE
+ || TREE_CODE (TREE_TYPE (loop->v)) == BITINT_TYPE)
+   iter_type
+ = build_bitint_type (TYPE_PRECISION (TREE_TYPE (loop->v)),
+  1);
+ else
+   iter_type
+ = build_nonstandard_integer_type
+   (TYPE_PRECISION (TREE_TYPE (loop->v)), 1);
+   }
}
   else if (iter_type != long_long_unsigned_type_node)
{
@@ -747,7 +757,8 @@ omp_extract_for_data (gomp_for *for_stmt
  if (t && integer_zerop (t))
count = build_zero_cst (long_long_unsigned_type_node);
  else if ((i == 0 || count != NULL_TREE)
-  && TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE
+  && (TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE
+  || TREE_CODE (TREE_TYPE (loop->v)) == BITINT_TYPE)
   && TREE_CONSTANT (loop->n1)
   && TREE_CONSTANT (loop->n2)
   && TREE_CODE (loop->step) == INTEGER_CST)
--- gcc/omp-expand.cc.jj2024-01-03 11:51:39.095626210 +0100
+++ gcc/omp-expand.cc   2024-01-16 13:17:47.367928336 +0100
@@ -4075,7 +4075,7 @@ expand_omp_for_generic (struct omp_regio
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
-  && TREE_CODE (type) == INTEGER_TYPE
+  && (TREE_CODE (type) == INTEGER_TYPE || TREE_CODE (type) == BITINT_TYPE)
   && !TYPE_UNSIGNED (type)
   && fd->ordered == 0)
 {
@@ -7191,7 +7191,7 @@ expand_omp_taskloop_for_outer (struct om
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
-  && TREE_CODE (type) == INTEGER_TYPE
+  && (TREE_CODE (type) == INTEGER_TYPE || TREE_CODE (type) == BITINT_TYPE)
   && !TYPE_UNSIGNED (type))
 {
   tree n1, n2;
@@ -7352,7 +7352,7 @@ expand_omp_taskloop_for_inner (struct om
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
-  && TREE_CODE (type) == INTEGER_TYPE
+  && (TREE_CODE (type) == INTEGER_TYPE || TREE_CODE (type) == BITINT_TYPE)
   && !TYPE_UNSIGNED (type))
 {
   tree n1, n2;
--- libgomp/testsuite/libgomp.c/bitint-1.c.jj   2024-01-16 13:47:24.880153301 
+0100
+++ libgomp/testsuite/libgomp.c/bitint-1.c  2024-01-16 16:04:43.242609845 
+0100
@@ -0,0 +1,65 @@
+/* PR middle-end/113409 */
+/* { dg-do run { target bitint } } */
+
+extern void abort (void)

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-17 Thread chenglulu



On 2024/1/17 at 17:50, Xi Ruoyao wrote:

On Wed, 2024-01-17 at 17:38 +0800, chenglulu wrote:

On 2024/1/13 at 21:05, Xi Ruoyao wrote:

On Saturday, 2024-01-13 at 15:01 +0800, chenglulu wrote:

On 2024/1/12 at 19:42, Xi Ruoyao wrote:

On Friday, 2024-01-12 at 09:46 +0800, chenglulu wrote:


I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
we need a target hook to tell the generic code
UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll
see millions of lines of messages like

../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
UNSPEC_LA_PCREL_64_PART1 (42) found in variable location

I build GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't reproduced the 
problem you mentioned.

   $ ../configure --host=loongarch64-linux-gnu 
--target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
   --with-arch=loongarch64 --with-abi=lp64d --enable-tls 
--enable-languages=c,c++,fortran,lto --enable-plugin \
   --disable-multilib --disable-host-shared --enable-bootstrap 
--enable-checking=release
   $ make BOOT_FLAGS="-mcmodel=extreme"

What did I do wrong?:-(

BOOT_CFLAGS, not BOOT_FLAGS :).


This is so strange. My compilation here stopped due to syntax problems,

and I still haven't reproduced the information you mentioned about
UNSPEC_LA_PCREL_64_PART1.

I used:

../gcc/configure --with-system-zlib --disable-fixincludes \
   --enable-default-ssp --enable-default-pie \
   --disable-werror --disable-multilib \
   --prefix=/home/xry111/gcc-dev

and then

make STAGE1_{C,CXX}FLAGS="-O2 -g" -j8 \
   BOOT_{C,CXX}FLAGS="-O2 -g -mcmodel=extreme" |& tee gcc-build.log

I guess "-g" is needed to reproduce the issue as well as the messages
were produced in dwarf generation.


I have reproduced this problem, and it can be solved by adding a hook.

But unfortunately, when using '-mcmodel=extreme -mexplicit-relocs=always'

to test spec2006 403.gcc, an error will occur. Others have not been
tested yet.

I roughly debugged it, and the problem should be this:

The problem is that the address of the instruction ‘ldx.d $r12, $r25,
$r6’ is wrong.

Wrong assembly:

     5826 pcalau12i   $r13,%got_pc_hi20(recog_data)
   5827 addi.d  $r12,$r0,%got_pc_lo12(recog_data)
   5828 lu32i.d $r12,%got64_pc_lo20(recog_data)
   5829 lu52i.d $r12,$r12,%got64_pc_hi12(recog_data)
   5830 ldx.d   $r12,$r13,$r12
   5831 ld.b    $r8,$r12,997
   5832 .loc 1 829 18 discriminator 1 view .LVU1527
   5833 ble $r8,$r0,.L476
   5834 ld.d    $r6,$r3,16
   5835 ld.d    $r9,$r3,88
   5836 .LBB189 = .
   5837 .loc 1 839 24 view .LVU1528
   5838 alsl.d  $r7,$r19,$r19,2
   5839 ldx.d   $r12,$r25,$r6
   5840 addi.d  $r17,$r3,120
   5841 .LBE189 = .
   5842 .loc 1 829 18 discriminator 1 view .LVU1529
   5843 or  $r13,$r0,$r0
   5844 addi.d  $r4,$r12,992

Assembly that works fine using macros:

3040 la.global   $r12,$r13,recog_data
3041 ld.b    $r9,$r12,997
3042 ble $r9,$r0,.L475
3043 alsl.d  $r5,$r16,$r16,2
3044 la.global   $r15,$r17,recog_data
3045 addi.d  $r4,$r12,992
3046 addi.d  $r18,$r3,48
3047 or  $r13,$r0,$r0

Comparing the assembly, we can see that lines 5844 and 3045 have the
same function,

but there is a problem with the base address register optimization at
line 5844.

regrename.c.283r.loop2_init:

(insn 6 497 2741 34 (set (reg:DI 180 [ ivtmp.713D.15724 ])
  (const_int 0 [0])) "regrename.c":829:18 discrim 1 156
{*movdi_64bit}
(nil))
(insn 2741 6 2744 34 (parallel [
  (set (reg:DI 1502)
  (unspec:DI [
  (symbol_ref:DI ("recog_data") [flags 0xc0]
)
  ] UNSPEC_LA_PCREL_64_PART1))
  (set (reg/f:DI 1479)
  (unspec:DI [
  (symbol_ref:DI ("recog_data") [flags 0xc0]
)
  ] UNSPEC_LA_PCREL_64_PART2))
  ]) -1
   (expr_list:REG_UNUSED (reg/f:DI 1479)
(nil)))
(insn 2744 2741 2745 34 (set (reg/f:DI 1503)
  (mem:DI (plus:DI (reg/f:DI 1479)
  (reg:DI 1502)) [0  S8 A8])) 156 {*movdi_64bit}
   (expr_list:REG_EQUAL (symbol_ref:DI ("recog_data") [flags 0xc0]
)
(nil)))


Virtual register 1479 will be used in insn 2744, but register 1479 was
assigned the REG_UNUSED attribute in the previous instruction.

The attached file is the wrong file.
The compilation command is as follows:

$ ./gcc/cc1 -fpreprocessed regrename.i -quiet -dp -dumpbase regrename.c
-dumpbase-ext .c -mno-relax -mabi=lp64d -march=loongarch64 -mfpu=64
-msimd=lasx -mcmodel=extreme -mtune=loongarch64 -g3 -O2
-Wno-int-conversion -Wno-implicit-int -Wno-implicit-function-declaration
-Wno-incompatible-pointer-types -version -o regrename.s
-mexplicit-relocs=always -fdump-rtl-all-all

I've seen some "guality" test failure

[PATCH] lower-bitint: Avoid overlap between destinations and sources in libgcc calls [PR113421]

2024-01-17 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled because the bitint lowering emits a
  .MULBITINT (&a, 1024, &a, 1024, &x, 1024);
call.  The bug is in the overlap between the destination and source: that is
something the libgcc routines don't handle, as they use the source arrays
during the entire algorithm which computes the destination array(s).
For the mapping of SSA_NAMEs to VAR_DECLs the code already handles that
correctly, but the check whether a load from memory can be used directly
without a temporary, even when earlier we decided to merge the
multiplication/division/modulo etc. with a store, didn't.

The following patch implements that.
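
To see why the overlap matters, consider a minimal sketch (not the actual
libgcc code) of a schoolbook limb multiplication; any routine of this shape
keeps reading the source limb arrays while it is already writing the result,
so the destination must not alias either source, and a call like
.MULBITINT (&a, 1024, &a, 1024, &x, 1024) breaks exactly that assumption.

/* Hypothetical sketch only, assuming 64-bit limbs; r must have room for
   2 * n limbs and must not overlap u or v.  */
void
limb_mul (unsigned long *r, const unsigned long *u, const unsigned long *v,
          int n)
{
  for (int i = 0; i < 2 * n; i++)
    r[i] = 0;
  for (int i = 0; i < n; i++)
    {
      unsigned long carry = 0;
      for (int j = 0; j < n; j++)
        {
          /* If r aliased u or v, this would read limbs that the stores
             below already clobbered in earlier iterations.  */
          unsigned __int128 t
            = (unsigned __int128) u[i] * v[j] + r[i + j] + carry;
          r[i + j] = (unsigned long) t;
          carry = (unsigned long) (t >> 64);
        }
      r[i + n] = carry;
    }
}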

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-17  Jakub Jelinek  

PR tree-optimization/113421
* gimple-lower-bitint.cc (stmt_needs_operand_addr): Adjust function
comment.
(bitint_dom_walker::before_dom_children): Add g temporary to simplify
formatting.  Start at vop rather than cvop even if stmt is a store
and needs_operand_addr.

* gcc.dg/torture/bitint-50.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-16 12:32:56.617721208 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-16 17:33:04.046476302 +0100
@@ -5455,7 +5455,8 @@ vuse_eq (ao_ref *, tree vuse1, void *dat
 
 /* Return true if STMT uses a library function and needs to take
address of its inputs.  We need to avoid bit-fields in those
-   cases.  */
+   cases.  Similarly, we need to avoid overlap between destination
+   and source limb arrays.  */
 
 bool
 stmt_needs_operand_addr (gimple *stmt)
@@ -5574,7 +5575,8 @@ bitint_dom_walker::before_dom_children (
  else if (!bitmap_bit_p (m_loads, SSA_NAME_VERSION (s)))
continue;
 
- tree rhs1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s));
+ gimple *g = SSA_NAME_DEF_STMT (s);
+ tree rhs1 = gimple_assign_rhs1 (g);
  if (needs_operand_addr
  && TREE_CODE (rhs1) == COMPONENT_REF
  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (rhs1, 1)))
@@ -5596,15 +5598,14 @@ bitint_dom_walker::before_dom_children (
 
  ao_ref ref;
  ao_ref_init (&ref, rhs1);
- tree lvop = gimple_vuse (SSA_NAME_DEF_STMT (s));
+ tree lvop = gimple_vuse (g);
  unsigned limit = 64;
  tree vuse = cvop;
  if (vop != cvop
  && is_gimple_assign (stmt)
  && gimple_store_p (stmt)
- && !operand_equal_p (lhs,
-  gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s)),
-  0))
+ && (needs_operand_addr
+ || !operand_equal_p (lhs, gimple_assign_rhs1 (g), 0)))
vuse = vop;
  if (vuse != lvop
  && walk_non_aliased_vuses (&ref, vuse, false, vuse_eq,
--- gcc/testsuite/gcc.dg/torture/bitint-50.c.jj 2024-01-16 17:35:16.084622119 
+0100
+++ gcc/testsuite/gcc.dg/torture/bitint-50.c2024-01-16 17:35:06.701753879 
+0100
@@ -0,0 +1,31 @@
+/* PR tree-optimization/113421 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 1024
+unsigned _BitInt(1024) a = -5wb;
+
+__attribute__((noipa)) void
+foo (unsigned _BitInt(1024) x)
+{
+  a *= x;
+}
+#else
+int a = 30;
+
+void
+foo (int)
+{
+}
+#endif
+
+int
+main ()
+{
+  foo (-6wb);
+  if (a != 30wb)
+__builtin_abort ();
+  return 0;
+}

Jakub



[PATCH] libstdc++: Update baseline symbols for riscv64-linux

2024-01-17 Thread Andreas Schwab
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.
---
 .../abi/post/riscv64-linux-gnu/baseline_symbols.txt  | 9 +
 1 file changed, 9 insertions(+)

diff --git 
a/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt 
b/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt
index 5ee7f5a0460..a37a0b9a0c9 100644
--- a/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt
@@ -497,7 +497,12 @@ FUNC:_ZNKSt11__timepunctIwE7_M_daysEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11__timepunctIwE8_M_am_pmEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11__timepunctIwE9_M_monthsEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11logic_error4whatEv@@GLIBCXX_3.4
+FUNC:_ZNKSt12__basic_fileIcE13native_handleEv@@GLIBCXX_3.4.32
 FUNC:_ZNKSt12__basic_fileIcE7is_openEv@@GLIBCXX_3.4
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem7__cxx114_DirELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
 FUNC:_ZNKSt12bad_weak_ptr4whatEv@@GLIBCXX_3.4.15
 FUNC:_ZNKSt12future_error4whatEv@@GLIBCXX_3.4.14
 FUNC:_ZNKSt12strstreambuf6pcountEv@@GLIBCXX_3.4
@@ -3210,6 +3215,7 @@ 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_disposeEv@@GLIBCX
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEmmPKcm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_S_compareEmm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_M_capacityEm@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_S_allocateERS3_m@@GLIBCXX_3.4.32
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcOS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC2EPcOS3_@@GLIBCXX_3.4.23
@@ -3362,6 +3368,7 @@ 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_disposeEv@@GLIBCX
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_replaceEmmPKwm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_S_compareEmm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_M_capacityEm@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_S_allocateERS3_m@@GLIBCXX_3.4.32
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwOS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC2EPwOS3_@@GLIBCXX_3.4.23
@@ -4523,6 +4530,7 @@ FUNC:__cxa_allocate_exception@@CXXABI_1.3
 FUNC:__cxa_bad_cast@@CXXABI_1.3
 FUNC:__cxa_bad_typeid@@CXXABI_1.3
 FUNC:__cxa_begin_catch@@CXXABI_1.3
+FUNC:__cxa_call_terminate@@CXXABI_1.3.15
 FUNC:__cxa_call_unexpected@@CXXABI_1.3
 FUNC:__cxa_current_exception_type@@CXXABI_1.3
 FUNC:__cxa_deleted_virtual@@CXXABI_1.3.6
@@ -4566,6 +4574,7 @@ OBJECT:0:CXXABI_1.3.11
 OBJECT:0:CXXABI_1.3.12
 OBJECT:0:CXXABI_1.3.13
 OBJECT:0:CXXABI_1.3.14
+OBJECT:0:CXXABI_1.3.15
 OBJECT:0:CXXABI_1.3.2
 OBJECT:0:CXXABI_1.3.3
 OBJECT:0:CXXABI_1.3.4
-- 
2.43.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


RE: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Kyrylo Tkachov



> -Original Message-
> From: Andrew Pinski 
> Sent: Wednesday, January 17, 2024 3:29 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Alex Coplan ; Andrew Pinski
> 
> Subject: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow
> all subreg [PR113221]
> 
> So the problem here is that aarch64_ldp_reg_operand will allow all subregs,
> even a subreg of a lo_sum.
> When LRA tries to fix that up, all things break. So the fix is to change the
> check to only allow reg and subreg of regs.
> 
> Note the tendency here is to use register_operand, but that checks the mode
> of the register, and we need to allow mismatched modes for this predicate
> for now.
> 
> Built and tested for aarch64-linux-gnu with no regressions
> (Also tested with the LD/ST pair pass back on).

Ok with the comments from Alex addressed.
Thanks,
Kyrill

> 
>   PR target/113221
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg,
>   only allow REG operands instead of allowing all.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.c-torture/compile/pr113221-1.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/predicates.md |  8 +++-
>  gcc/testsuite/gcc.c-torture/compile/pr113221-1.c | 12 
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> 
> diff --git a/gcc/config/aarch64/predicates.md
> b/gcc/config/aarch64/predicates.md
> index 8a204e48bb5..256268517d8 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -313,7 +313,13 @@ (define_predicate "pmode_plus_operator"
> 
>  (define_special_predicate "aarch64_ldp_reg_operand"
>(and
> -(match_code "reg,subreg")
> +(ior
> +  (match_code "reg")
> +  (and
> +   (match_code "subreg")
> +   (match_test "GET_CODE (SUBREG_REG (op)) == REG")
> +  )
> +)
>  (match_test "aarch64_ldpstp_operand_mode_p (GET_MODE (op))")
>  (ior
>(match_test "mode == VOIDmode")
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> new file mode 100644
> index 000..152a510786e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-options "-fno-move-loop-invariants -funroll-all-loops" } */
> +/* PR target/113221 */
> +/* This used to ICE after the `load/store pair fusion pass` was added
> +   due to the predicate aarch64_ldp_reg_operand allowing too much. */
> +
> +
> +void bar();
> +void foo(int* b) {
> +  for (;;)
> +*b++ = (long)bar;
> +}
> +
> --
> 2.39.3



[committed] Fix comment typos

2024-01-17 Thread Jakub Jelinek
Hi!

When looking at PR113410, I found a comment typo and just searched for
the same typo elsewhere and found some typos in the comments which had
that typo as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to
trunk as obvious.

2024-01-17  Jakub Jelinek  

* tree-into-ssa.cc (pass_build_ssa::gate): Fix comment typo,
funcions -> functions, and use were instead of was.
* gengtype.cc (dump_typekind): Fix comment typos, funcion -> function
and guaranteee -> guarantee.
* attribs.h (struct attr_access): Fix comment typo funcion -> function.

--- gcc/tree-into-ssa.cc.jj 2024-01-03 11:51:34.128695146 +0100
+++ gcc/tree-into-ssa.cc2024-01-16 18:57:38.136438943 +0100
@@ -2499,7 +2499,7 @@ public:
   /* opt_pass methods: */
   bool gate (function *fun) final override
 {
-  /* Do nothing for funcions that was produced already in SSA form.  */
+  /* Do nothing for functions that were produced already in SSA form.  */
   return !(fun->curr_properties & PROP_ssa);
 }
 
--- gcc/gengtype.cc.jj  2024-01-03 11:51:23.314845233 +0100
+++ gcc/gengtype.cc 2024-01-16 18:56:57.383009291 +0100
@@ -4718,8 +4718,8 @@ write_roots (pair_p variables, bool emit
 }
 
 /* Prints not-as-ugly version of a typename of T to OF.  Trades the uniquness
-   guaranteee for somewhat increased readability.  If name conflicts do happen,
-   this funcion will have to be adjusted to be more like
+   guarantee for somewhat increased readability.  If name conflicts do happen,
+   this function will have to be adjusted to be more like
output_mangled_typename.  */
 
 #define INDENT 2
--- gcc/attribs.h.jj2024-01-03 11:51:24.200832936 +0100
+++ gcc/attribs.h   2024-01-16 19:08:27.507350364 +0100
@@ -324,7 +324,7 @@ struct attr_access
  in TREE_VALUE and their positions in the argument list (stored
  in TREE_PURPOSE).  Each expression may be a PARM_DECL or some
  other DECL (for ordinary variables), or an EXPR for other
- expressions (e.g., funcion calls).  */
+ expressions (e.g., function calls).  */
   tree size;
 
   /* The zero-based position of each of the formal function arguments.

Jakub



[PATCH] gimple-ssa-warn-access: Cast huge params to sizetype before using them in maybe_check_access_sizes [PR113410]

2024-01-17 Thread Jakub Jelinek
Hi!

When a VLA is created with some very high precision size expression
(say __int128, or _BitInt(65535) etc.), we cast it to sizetype, because
we can't have arrays longer than what can be expressed in sizetype.

But the maybe_check_access_sizes code, when trying to determine ranges,
wasn't doing this and was using fixed buffers for the sizes.  While
__int128 could still be handled (it fits into the buffers), obviously
arbitrary _BitInt parameter ranges can't; they can take up to almost
20KB per number.  It doesn't make sense to print such ranges though:
no array can be larger than sizetype precision, and
ranger's range_of_expr can handle NOP_EXPRs/CONVERT_EXPRs wrapping a
PARM_DECL just fine, so the following patch just casts the excessively
large counters for the range determination purposes to sizetype.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-17  Jakub Jelinek  

PR middle-end/113410
* gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes):
If access_nelts is integral with larger precision than sizetype,
fold_convert it to sizetype.

* gcc.dg/bitint-72.c: New test.

--- gcc/gimple-ssa-warn-access.cc.jj2024-01-03 11:51:30.087751231 +0100
+++ gcc/gimple-ssa-warn-access.cc   2024-01-16 19:25:35.408958088 +0100
@@ -3406,6 +3406,15 @@ pass_waccess::maybe_check_access_sizes (
   else
access_nelts = rwm->get (sizidx)->size;
 
+  /* If access_nelts is e.g. a PARM_DECL with larger precision than
+sizetype, such as __int128 or _BitInt(34123) parameters,
+cast it to sizetype.  */
+  if (access_nelts
+ && INTEGRAL_TYPE_P (TREE_TYPE (access_nelts))
+ && (TYPE_PRECISION (TREE_TYPE (access_nelts))
+ > TYPE_PRECISION (sizetype)))
+   access_nelts = fold_convert (sizetype, access_nelts);
+
   /* Format the value or range to avoid an explosion of messages.  */
   char sizstr[80];
   tree sizrng[2] = { size_zero_node, build_all_ones_cst (sizetype) };
--- gcc/testsuite/gcc.dg/bitint-72.c.jj 2024-01-16 19:31:33.839938120 +0100
+++ gcc/testsuite/gcc.dg/bitint-72.c2024-01-16 19:31:06.000328741 +0100
@@ -0,0 +1,16 @@
+/* PR middle-end/113410 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23" } */
+
+#if __BITINT_MAXWIDTH__ >= 905
+void bar (_BitInt(905) n, int[n]);
+#else
+void bar (int n, int[n]);
+#endif
+
+void
+foo (int n)
+{
+  int buf[n];
+  bar (n, buf);
+}

Jakub



Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-17 Thread Huanghui Nie
I'm sorry for CCing the gcc@ list. I'll wait for your review there.
Thanks.

On Wed, Jan 17, 2024 at 16:18, Jonathan Wakely wrote:

>
>
> On Wed, 17 Jan 2024, 08:14 Huanghui Nie via Gcc,  wrote:
>
>> Thanks. Done.
>>
>
> And don't CC the main gcc@ list, that's not for patch discussion. And if
> you CC the right list, you don't need to CC the individual maintainers.
>
> Anyway, it's on the right list now so we'll review it there, thanks.
>
>
>
>> On Wed, Jan 17, 2024 at 12:39, Sam James wrote:
>>
>> >
>> > Huanghui Nie  writes:
>> >
>> > > Hi.
>> >
>> > Please CC the libstdc++ LM for libstdc++ patches, per
>> >
>> >
>> https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html#list.patches
>> > .
>> >
>> > > [...]
>> >
>> >
>>
>


[committed] testsuite: Add testcase for already fixed PR [PR110251]

2024-01-17 Thread Jakub Jelinek
Hi!

This testcase started to hang at -O3 with r13-4208 and got fixed
with r14-2097.

Regtested on x86_64-linux and i686-linux, committed to trunk as obvious.

2024-01-17  Jakub Jelinek  

PR tree-optimization/110251
* gcc.c-torture/compile/pr110251.c: New test.

--- gcc/testsuite/gcc.c-torture/compile/pr110251.c.jj   2024-01-16 
20:39:50.605210933 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr110251.c  2024-01-16 
20:39:43.568310057 +0100
@@ -0,0 +1,27 @@
+/* PR tree-optimization/110251 */
+
+int a, b;
+signed char c;
+
+int
+foo (int e)
+{
+  if (e >= 'a')
+return e;
+}
+
+int
+bar (unsigned short e)
+{
+  for (; e; a++)
+e &= e - 1;
+}
+
+void
+baz (void)
+{
+  while (c < 1)
+;
+  for (; bar (c - 1); b = foo (c))
+;
+}

Jakub



Re: [PATCH 01/14] c++: Implement __is_integral built-in trait

2024-01-17 Thread Joseph Myers
On Wed, 17 Jan 2024, Jonathan Wakely wrote:

> So we can remove the dependency on __STRICT_ISO__ for 128-bit integer
> types, and implementing std::is_integral with a built-in seems like
> the perfect time to do that. But that seems like stage 1 material, as
> we need to go through the library and see what needs to change.

As noted on IRC, for C23 there would also be library issues in making 
__int128 an extended integer type.  If it's an extended integer type, then 
C23 would require <stdint.h> to define int128_t, uint128_t, int_least128_t 
and uint_least128_t, along with the macros INT128_WIDTH, UINT128_WIDTH, 
INT_LEAST128_WIDTH, UINT_LEAST128_WIDTH (trivial), and INT128_C and 
UINT128_C (require an integer constant suffix), and INT128_MAX, 
INT128_MIN, UINT128_MAX, INT_LEAST128_MAX, INT_LEAST128_MIN, 
UINT_LEAST128_MAX (most simply defined using an integer constant suffix, 
though don't strictly require one).  And  would have to define 
all the printf and scanf format macros for int128_t, uint128_t, 
int_least128_t and uint_least128_t - so library support would be needed 
for those (the format macros themselves should probably expand to "w128d" 
and similar, a C23 feature already supported for narrower types by glibc 
and by GCC format checking, rather than inventing new features there).

So because an extended integer type (without padding bits) in C23 is 
expected to have all the library support from <stdint.h> and <inttypes.h>, 
you need integer constant suffixes and printf/scanf support before you can 
declare __int128 an extended integer type for C23.
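
To give a rough idea of the <stdint.h> side alone, a hypothetical sketch
(not proposed glibc or libstdc++ code; the INT128_C/INT128_MIN/INT128_MAX
family is left out because it hinges on an integer constant suffix that does
not exist yet) could look like:

/* Hypothetical sketch, assuming __int128 were made an extended integer
   type without padding bits; the names are the C23-mandated ones listed
   above.  */
typedef __int128          int128_t;
typedef unsigned __int128 uint128_t;
typedef int128_t          int_least128_t;
typedef uint128_t         uint_least128_t;

#define INT128_WIDTH        128
#define UINT128_WIDTH       128
#define INT_LEAST128_WIDTH  128
#define UINT_LEAST128_WIDTH 128

/* INT128_C/UINT128_C and the MIN/MAX/LEAST limits would additionally need
   a new integer constant suffix, and <inttypes.h> would need the
   corresponding printf/scanf format macros plus C library support.  */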

(If adding printf and scanf support for int128_t to glibc, it probably 
makes sense to add user-visible functions such as strtoi128 at the same 
time - no such functions are in the standard, but something like them 
would be needed internally as part of the scanf implementation, and it's 
likely they would be useful for users as well.)

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH] libgcc: fix SEH C++ rethrow semantics [PR113337]

2024-01-17 Thread Matteo Italia
SEH _Unwind_Resume_or_Rethrow invokes abort directly if
_Unwind_RaiseException doesn't manage to find a handler for the rethrown
exception; this is incorrect, as in this case std::terminate should be
invoked, allowing an application-provided terminate handler to handle
the situation instead of straight crashing the application through
abort.

The bug can be demonstrated with this simple test case:
===
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <exception>

static void custom_terminate_handler() {
fprintf(stderr, "custom_terminate_handler invoked\n");
std::exit(1);
}

int main(int argc, char *argv[]) {
std::set_terminate(&custom_terminate_handler);
if (argc < 2) return 1;
const char *mode = argv[1];
fprintf(stderr, "%s\n", mode);
if (strcmp(mode, "throw") == 0) {
throw std::exception();
} else if (strcmp(mode, "rethrow") == 0) {
try {
throw std::exception();
} catch (...) {
throw;
}
} else {
return 1;
}
return 0;
}
===

On all gcc builds with non-SEH exceptions, this will print
"custom_terminate_handler invoked" whether launched as ./a.out throw or
as ./a.out rethrow; on SEH builds it will instead work as expected only
with ./a.exe throw, but will crash with the "built-in" abort message
with ./a.exe rethrow.

This patch fixes the problem by forwarding the error code back to the
caller (__cxa_rethrow), which calls std::terminate if
_Unwind_Resume_or_Rethrow returns.

The change makes the code path coherent with SEH _Unwind_RaiseException,
and with the generic _Unwind_Resume_or_Rethrow from libgcc/unwind.inc
(used for SjLj and Dw2 exception backend).

libgcc/ChangeLog:

* unwind-seh.c (_Unwind_Resume_or_Rethrow): Forward the
_Unwind_RaiseException return code back to the caller instead of
calling abort, allowing __cxa_rethrow to invoke std::terminate
in case of an uncaught rethrown exception.
---
 libgcc/unwind-seh.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgcc/unwind-seh.c b/libgcc/unwind-seh.c
index 8ef0257b616..f1b8f5a8519 100644
--- a/libgcc/unwind-seh.c
+++ b/libgcc/unwind-seh.c
@@ -395,9 +395,9 @@ _Unwind_Reason_Code
 _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc)
 {
   if (exc->private_[0] == 0)
-_Unwind_RaiseException (exc);
-  else
-_Unwind_ForcedUnwind_Phase2 (exc);
+return _Unwind_RaiseException (exc);
+
+  _Unwind_ForcedUnwind_Phase2 (exc);
   abort ();
 }
 
-- 
2.34.1



Re: [PATCH] libstdc++: Do not use CTAD for _Utf32_view alias template

2024-01-17 Thread Jonathan Wakely
On Tue, 16 Jan 2024 at 21:28, Jonathan Wakely wrote:
>
> Tested aarch64-linux. I plan to push this to fix an error when using
> trunk with Clang.

Pushed.

>
> -- >8 --
>
> We were relying on P1814R0 (CTAD for alias templates) which isn't
> supported by Clang. We can just not use CTAD and provide an explicit
> template argument list for _Utf32_view.
>
> Ideally we'd define a deduction guide for _Grapheme_cluster_view that
> uses views::all_t to properly convert non-views to views, but all_t is
> defined in  and we don't want to include all of that in
> . So just make it require a view for now, which can be
> cheaply copied.
>
> Although it's not needed yet, it would also be more correct to
> specialize enable_borrowed_range for the views in <bits/unicode.h>.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/unicode.h (_Grapheme_cluster_view): Require view.
> Do not use CTAD for _Utf32_view.
> (__format_width, __truncate): Do not use CTAD.
> (enable_borrowed_range<_Utf_view>): Define specialization.
> (enable_borrowed_range<_Grapheme_cluster_view>): Likewise.
> ---
>  libstdc++-v3/include/bits/unicode.h | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/unicode.h 
> b/libstdc++-v3/include/bits/unicode.h
> index f1b2b359bdf..d35c83d0090 100644
> --- a/libstdc++-v3/include/bits/unicode.h
> +++ b/libstdc++-v3/include/bits/unicode.h
> @@ -714,15 +714,15 @@ inline namespace __v15_1_0
>};
>
>// Split a range into extended grapheme clusters.
> -  template
> +  template requires ranges::view<_View>
>  class _Grapheme_cluster_view
>  : public ranges::view_interface<_Grapheme_cluster_view<_View>>
>  {
>  public:
>
>constexpr
> -  _Grapheme_cluster_view(const _View& __v)
> -  : _M_begin(_Utf32_view(__v).begin())
> +  _Grapheme_cluster_view(_View __v)
> +  : _M_begin(_Utf32_view<_View>(std::move(__v)).begin())
>{ }
>
>constexpr auto begin() const { return _M_begin; }
> @@ -946,7 +946,7 @@ inline namespace __v15_1_0
>  {
>if (__s.empty()) [[unlikely]]
> return 0;
> -  _Grapheme_cluster_view __gc(__s);
> +  _Grapheme_cluster_view> __gc(__s);
>auto __it = __gc.begin();
>const auto __end = __gc.end();
>size_t __n = __it.width();
> @@ -964,7 +964,7 @@ inline namespace __v15_1_0
>if (__s.empty()) [[unlikely]]
> return 0;
>
> -  _Grapheme_cluster_view __gc(__s);
> +  _Grapheme_cluster_view> __gc(__s);
>auto __it = __gc.begin();
>const auto __end = __gc.end();
>size_t __n = __it.width();
> @@ -1058,6 +1058,19 @@ inline namespace __v15_1_0
>
>  } // namespace __unicode
>
> +namespace ranges
> +{
> +  template
> +inline constexpr bool
> +enable_borrowed_range>
> +  = enable_borrowed_range<_Range>;
> +
> +  template
> +inline constexpr bool
> +enable_borrowed_range>
> +  = enable_borrowed_range<_Range>;
> +} // namespace ranges
> +
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace std
>  #endif // C++20
> --
> 2.43.0
>



Re: [PATCH] ipa-strub: Fix handling of _BitInt returns [PR113406]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> Seems pass_ipa_strub::execute contains a copy of the expand_thunk
> code I've changed for _BitInt in r14-6805 PR112941 - larger _BitInts
> are aggregate_value_p even when they are is_gimple_reg_type.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

OK.

> 2024-01-17  Jakub Jelinek  
> 
>   PR middle-end/113406
>   * ipa-strub.cc (pass_ipa_strub::execute): Check aggregate_value_p
>   regardless of whether is_gimple_reg_type (restype) or not.
> 
>   * gcc.dg/bitint-70.c: New test.
> 
> --- gcc/ipa-strub.cc.jj   2024-01-03 11:51:28.374775006 +0100
> +++ gcc/ipa-strub.cc  2024-01-16 10:51:03.987463928 +0100
> @@ -3174,21 +3174,16 @@ pass_ipa_strub::execute (function *)
>  resdecl,
>  build_int_cst (TREE_TYPE (resdecl), 0));
> }
> - else if (!is_gimple_reg_type (restype))
> + else if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
> {
> - if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
> -   {
> - restmp = resdecl;
> + restmp = resdecl;
>  
> - if (VAR_P (restmp))
> -   {
> - add_local_decl (cfun, restmp);
> - BLOCK_VARS (DECL_INITIAL (current_function_decl))
> -   = restmp;
> -   }
> + if (VAR_P (restmp))
> +   {
> + add_local_decl (cfun, restmp);
> + BLOCK_VARS (DECL_INITIAL (current_function_decl))
> +   = restmp;
> }
> - else
> -   restmp = create_tmp_var (restype, "retval");
> }
>   else
> restmp = create_tmp_reg (restype, "retval");
> --- gcc/testsuite/gcc.dg/bitint-70.c.jj   2024-01-16 11:01:48.300524130 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-70.c  2024-01-16 11:01:19.456924333 +0100
> @@ -0,0 +1,14 @@
> +/* PR middle-end/113406 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -fstrub=internal" } */
> +/* { dg-require-effective-target strub } */
> +
> +#if __BITINT_MAXWIDTH__ >= 146
> +_BitInt(146)
> +#else
> +_BitInt(16)
> +#endif
> +foo (void)
> +{
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] lower-bitint: Fix up VIEW_CONVERT_EXPR handling [PR113408]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> Unlike NOP_EXPR/CONVERT_EXPR which are GIMPLE_UNARY_RHS, VIEW_CONVERT_EXPR
> is GIMPLE_SINGLE_RHS and so gimple_assign_rhs1 contains the operand wrapped
> in VIEW_CONVERT_EXPR tree.
> 
> So, to handle it like other casts we need to look through it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-01-17  Jakub Jelinek  
> 
>   PR tree-optimization/113408
>   * gimple-lower-bitint.cc (bitint_large_huge::handle_stmt): For
>   VIEW_CONVERT_EXPR, pass TREE_OPERAND (rhs1, 0) rather than rhs1
>   to handle_cast.
> 
>   * gcc.dg/bitint-71.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-15 17:34:00.0 +0100
> +++ gcc/gimple-lower-bitint.cc2024-01-16 12:32:56.617721208 +0100
> @@ -1975,9 +1975,12 @@ bitint_large_huge::handle_stmt (gimple *
>   case INTEGER_CST:
> return handle_operand (gimple_assign_rhs1 (stmt), idx);
>   CASE_CONVERT:
> - case VIEW_CONVERT_EXPR:
> return handle_cast (TREE_TYPE (gimple_assign_lhs (stmt)),
> gimple_assign_rhs1 (stmt), idx);
> + case VIEW_CONVERT_EXPR:
> +   return handle_cast (TREE_TYPE (gimple_assign_lhs (stmt)),
> +   TREE_OPERAND (gimple_assign_rhs1 (stmt), 0),
> +   idx);
>   default:
> break;
>   }
> --- gcc/testsuite/gcc.dg/bitint-71.c.jj   2024-01-16 12:38:16.679239526 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-71.c  2024-01-16 12:37:24.724967020 +0100
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/113408 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 713
> +struct A { _BitInt(713) b; } g;
> +#else
> +struct A { _BitInt(49) b; } g;
> +#endif
> +int f;
> +
> +void
> +foo (void)
> +{
> +  struct A j = g;
> +  if (j.b)
> +f = 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[committed v5] libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

2024-01-17 Thread Jonathan Wakely
Here's the final version that I pushed.

Tested aarch64-linux, x86_64-linux.
commit df0a668b784556fe4317317d58961652d93d53de
Author: Jonathan Wakely 
Date:   Mon Jan 15 15:42:50 2024

libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.

The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).

Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values.  They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.

This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory).  Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled).  In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.

Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.

libstdc++-v3/ChangeLog:

PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.

Reviewed-by: Ulrich Drepper 
Reviewed-by: Patrick Palka 

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index e7cbf0fcf96..f9ba7ef744b 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
   # Keep these sync'd with the list in Makefile.am.  The first provides an
   # expandable list at autoconf time; the second provides an expandable list
   # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/c++23 src/filesystem src/libbacktrace src/experimental 
doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/c++23 src/c++26 src/filesystem src/libbacktrace 
src/experimental doc po testsuite python])
   SUBDIRS='glibcxx_SUBDIRS'
 
   # These need to be absolute paths, yet at the same time need to
@@ -5821,6 +5821,34 @@ AC_LANG_SAVE
   AC_LANG_RESTORE
 ])
 
+dnl
+dnl 

Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Monk Chiang wrote:

> This allows the backend to generate movcc instructions if the target
> machine has a movcc pattern.
> 
> branchless-cond.c needs to be updated since some target machines have
> conditional move instructions, and the expression will not change to a
> branchless expression.

While I agree this pattern should possibly be applied during RTL
expansion or instruction selection, on x86, which also has movcc,
the multiplication is cheaper.  So I don't think this is the way to go.

I'd rather revert the change than try to "fix" it this way?

Thanks,
Richard.

> gcc/ChangeLog:
>   PR target/113095
>   * match.pd (`(zero_one == 0) ? y : z  y`,
>   `(zero_one != 0) ? z  y : y`): Do not match to branchless
>   expression, if target machine has conditional move pattern.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
> ---
>  gcc/match.pd  | 30 +--
>  .../gcc.dg/tree-ssa/branchless-cond.c |  6 ++--
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e42ecaf9ec7..a1f90b1cd41 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (type)
> && TYPE_PRECISION (type) > 1
> && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> -   (op (mult (convert:type @0) @2) @1
> +   (with {
> +  bool can_movecc_p = false;
> +  if (can_conditionally_move_p (TYPE_MODE (type)))
> + can_movecc_p = true;
> +
> +  /* Some target only support word_mode for movcc pattern, if type can
> +  extend to word_mode then use conditional move. Even if there is a
> +  extend instruction, the cost is lower than branchless.  */
> +  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +   && can_conditionally_move_p (word_mode))
> + can_movecc_p = true;
> +}
> +(if (!can_movecc_p)
> + (op (mult (convert:type @0) @2) @1))
>  
>  /* (zero_one != 0) ? z  y : y -> ((typeof(y))zero_one * z)  y */
>  (for op (bit_xor bit_ior plus)
> @@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (type)
> && TYPE_PRECISION (type) > 1
> && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> -   (op (mult (convert:type @0) @2) @1
> +   (with {
> +  bool can_movecc_p = false;
> +  if (can_conditionally_move_p (TYPE_MODE (type)))
> + can_movecc_p = true;
> +
> +  /* Some target only support word_mode for movcc pattern, if type can
> +  extend to word_mode then use conditional move. Even if there is a
> +  extend instruction, the cost is lower than branchless.  */
> +  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +   && can_conditionally_move_p (word_mode))
> + can_movecc_p = true;
> +}
> +(if (!can_movecc_p)
> + (op (mult (convert:type @0) @2) @1))
>  
>  /* ?: Value replacement. */
>  /* a == 0 ? b : b + a  -> b + a */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> index e063dc4bb5f..c002ed97364 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
>return ((x & 1) != 0) ? z | y : y;
>  }
>  
> -/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { 
> "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* 
> nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { 
> "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* 
> nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* 
> alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] gimple-ssa-warn-access: Cast huge params to sizetype before using them in maybe_check_access_sizes [PR113410]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> When a VLA is created with some very high precision size expression
> (say __int128, or _BitInt(65535) etc.), we cast it to sizetype, because
> we can't have arrays longer than what can be expressed in sizetype.
> 
> But the maybe_check_access_sizes code when trying to determine ranges
> wasn't doing this but was using fixed buffers for the sizes.  While
> __int128 could still be handled (fit into the buffers), obviously
> arbitrary _BitInt parameter ranges can't, they can be in the range of
> up to almost 20KB per number.  It doesn't make sense to print such
> ranges though, no array can be larger than sizetype precision, and
> ranger's range_of_expr can handle NOP_EXPRs/CONVERT_EXPRs wrapping a
> PARM_DECL just fine, so the following patch just casts the excessively
> large counters for the range determination purposes to sizetype.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-01-17  Jakub Jelinek  
> 
>   PR middle-end/113410
>   * gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes):
>   If access_nelts is integral with larger precision than sizetype,
>   fold_convert it to sizetype.
> 
>   * gcc.dg/bitint-72.c: New test.
> 
> --- gcc/gimple-ssa-warn-access.cc.jj  2024-01-03 11:51:30.087751231 +0100
> +++ gcc/gimple-ssa-warn-access.cc 2024-01-16 19:25:35.408958088 +0100
> @@ -3406,6 +3406,15 @@ pass_waccess::maybe_check_access_sizes (
>else
>   access_nelts = rwm->get (sizidx)->size;
>  
> +  /* If access_nelts is e.g. a PARM_DECL with larger precision than
> +  sizetype, such as __int128 or _BitInt(34123) parameters,
> +  cast it to sizetype.  */
> +  if (access_nelts
> +   && INTEGRAL_TYPE_P (TREE_TYPE (access_nelts))
> +   && (TYPE_PRECISION (TREE_TYPE (access_nelts))
> +   > TYPE_PRECISION (sizetype)))
> + access_nelts = fold_convert (sizetype, access_nelts);
> +
>/* Format the value or range to avoid an explosion of messages.  */
>char sizstr[80];
>tree sizrng[2] = { size_zero_node, build_all_ones_cst (sizetype) };
> --- gcc/testsuite/gcc.dg/bitint-72.c.jj   2024-01-16 19:31:33.839938120 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-72.c  2024-01-16 19:31:06.000328741 +0100
> @@ -0,0 +1,16 @@
> +/* PR middle-end/113410 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 905
> +void bar (_BitInt(905) n, int[n]);
> +#else
> +void bar (int n, int[n]);
> +#endif
> +
> +void
> +foo (int n)
> +{
> +  int buf[n];
> +  bar (n, buf);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Fix merging of value predictors

2024-01-17 Thread Jan Hubicka
Hi,
expr_expected_value is doing some guesswork when it is merging two or more
independent value predictions, either in a PHI node or in a binary operation.
Since we do not know how the predictions interact with each other, we cannot
really merge the values precisely.

The previous logic merged the prediction and picked the later predictor
(since predict.def is sorted by reliability).  This however leads to trouble
with __builtin_expect_with_probability since it is special-cased as a predictor
with custom probabilities.  If this predictor is downgraded to something else,
we ICE since we have a prediction given by a predictor that is not expected
to have a custom probability.
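
As a (hypothetical) reduced illustration of the PHI case: the two incoming
values below carry value predictions from different predictors, and the
merged prediction for the PHI result is what used to end up attributed to
the special-cased predictor:

/* Hypothetical reduced example; g is just an opaque source of a value.  */
extern int g (void);

int
f (int a)
{
  int x;
  if (a)
    x = __builtin_expect_with_probability (g (), 1, 0.9);
  else
    x = __builtin_expect (g (), 1);
  /* The PHI for x merges two value predictions coming from different
     predictors; with this patch the combined prediction is attributed to
     the new PRED_COMBINED_VALUE_PREDICTIONS_PHI predictor.  */
  return x == 1 ? g () : 0;
}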

This patch fixes it by inventing new predictors PRED_COMBINED_VALUE_PREDICTIONS
and PRED_COMBINED_VALUE_PREDICTIONS_PHI, which also allow custom values but
are considered less reliable than __builtin_expect_with_probability (they
are combined by DS theory rather than by first match).  This is less likely
to lead to very stupid decisions if combining does not work as expected.

I also updated the code to be a bit more careful about merging values and to not
downgrade the precision when unnecessary (as tested by the new testcases).

Bootstrapped/regtested on x86_64-linux, will commit it tomorrow if there are
no complaints.

2024-01-17  Jan Hubicka 
Jakub Jelinek 

PR tree-optimization/110852

gcc/ChangeLog:

* predict.cc (expr_expected_value_1):
(get_predictor_value):
* predict.def (PRED_COMBINED_VALUE_PREDICTIONS):
(PRED_COMBINED_VALUE_PREDICTIONS_PHI):

gcc/testsuite/ChangeLog:

* gcc.dg/predict-18.c:
* gcc.dg/predict-23.c: New test.
* gcc.dg/tree-ssa/predict-1.c: New test.
* gcc.dg/tree-ssa/predict-2.c: New test.
* gcc.dg/tree-ssa/predict-3.c: New test.

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 84cbe3ffc61..f9d73c5eb1a 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -2404,44 +2404,78 @@ expr_expected_value_1 (tree type, tree op0, enum 
tree_code code,
   if (!bitmap_set_bit (visited, SSA_NAME_VERSION (op0)))
return NULL;
 
-  if (gimple_code (def) == GIMPLE_PHI)
+  if (gphi *phi = dyn_cast <gphi *> (def))
{
  /* All the arguments of the PHI node must have the same constant
 length.  */
- int i, n = gimple_phi_num_args (def);
- tree val = NULL, new_val;
+ int i, n = gimple_phi_num_args (phi);
+ tree val = NULL;
+ bool has_nonzero_edge = false;
+
+ /* If we already proved that given edge is unlikely, we do not need
+to handle merging of the probabilities.  */
+ for (i = 0; i < n && !has_nonzero_edge; i++)
+   {
+ tree arg = PHI_ARG_DEF (phi, i);
+ if (arg == PHI_RESULT (phi))
+   continue;
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (!cnt.initialized_p () || cnt.nonzero_p ())
+   has_nonzero_edge = true;
+   }
 
  for (i = 0; i < n; i++)
{
- tree arg = PHI_ARG_DEF (def, i);
+ tree arg = PHI_ARG_DEF (phi, i);
  enum br_predictor predictor2;
 
- /* If this PHI has itself as an argument, we cannot
-determine the string length of this argument.  However,
-if we can find an expected constant value for the other
-PHI args then we can still be sure that this is
-likely a constant.  So be optimistic and just
-continue with the next argument.  */
- if (arg == PHI_RESULT (def))
+ /* Skip self-referring parameters, since they does not change
+expected value.  */
+ if (arg == PHI_RESULT (phi))
continue;
 
+ /* Skip edges which we already predicted as executing
+zero times.  */
+ if (has_nonzero_edge)
+   {
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (cnt.initialized_p () && !cnt.nonzero_p ())
+   continue;
+   }
  HOST_WIDE_INT probability2;
- new_val = expr_expected_value (arg, visited, &predictor2,
-&probability2);
+ tree new_val = expr_expected_value (arg, visited, &predictor2,
+ &probability2);
+ /* If we know nothing about value, give up.  */
+ if (!new_val)
+   return NULL;
 
- /* It is difficult to combine value predictors.  Simply assume
-that later predictor is weaker and take its prediction.  */
- if (*predictor < predictor2)
+ /* If this is a first edge, trust its prediction.  */
+ if (!val)
{
+ val = new_val;
  *pred

Re: Fix merging of value predictors

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 01:45:18PM +0100, Jan Hubicka wrote:
> Hi,
> expr_expected_value is doing some guesswork when it is merging two or more
> independent value predictions either in PHI node or in binary operation.
> Since we do not know how the predictions interact with each other, we can
> not really merge the values precisely.
> 
> The previous logic merged the prediciton and picked the later predictor
> (since predict.def is sorted by reliability). This however leads to troubles
> with __builtin_expect_with_probability since it is special cased as a 
> predictor
> with custom probabilities.  If this predictor is downgraded to something else,
> we ICE since we have prediction given by predictor that is not expected
> to have customprobability.
> 
> This patch fixies it by inventing new predictors 
> PRED_COMBINED_VALUE_PREDICTIONS
> and PRED_COMBINED_VALUE_PREDICTIONS_PHI which also allows custom values but
> are considered less reliable then __builtin_expect_with_probability (they
> are combined by ds theory rather then by first match).  This is less likely
> going to lead to very stupid decisions if combining does not work as expected.
> 
> I also updated the code to be bit more careful about merging values and do not
> downgrade the precision when unnecesary (as tested by new testcases).
> 
> Bootstrapped/regtested x86_64-linux, will commit it tomorrow if there are
> no complains.
> 
> 2024-01-17  Jan Hubicka 
>   Jakub Jelinek 

2 spaces before < rather than 1.
> 
>   PR tree-optimization/110852
> 
> gcc/ChangeLog:
> 
>   * predict.cc (expr_expected_value_1):
>   (get_predictor_value):
>   * predict.def (PRED_COMBINED_VALUE_PREDICTIONS):
>   (PRED_COMBINED_VALUE_PREDICTIONS_PHI):
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/predict-18.c:

Please fill in what has changed, both for predict-18.c and predict.{cc,def}
changes.

> @@ -2613,24 +2658,40 @@ expr_expected_value_1 (tree type, tree op0, enum 
> tree_code code,
> if (!nop1)
>   nop1 = op1;
>}
> +  /* We already checked if folding one of arguments to constant is good
> +  enough.  Consequently failing to fold both means that we will not
> +  succeed determinging the value.  */

s/determinging/determining/

Otherwise LGTM.

Jakub



Re: Add -falign-all-functions

2024-01-17 Thread Jan Hubicka
> > +falign-all-functions
> > +Common Var(flag_align_all_functions) Optimization
> > +Align the start of functions.
> 
> all functions
> 
> or maybe "of every function."?

Fixed, thanks!
> > +@opindex falign-all-functions=@var{n}
> > +@item -falign-all-functions
> > +Specify minimal alignment for function entry. Unlike 
> > @option{-falign-functions}
> > +this alignment is applied also to all functions (even those considered 
> > cold).
> > +The alignment is also not affected by @option{-flimit-function-alignment}
> > +
> 
> For functions with two entries (like on powerpc), which entry does this
> apply to?  I suppose the external ABI entry, not the local one?  But
> how does this then help to align the patchable entry (the common
> local entry should be aligned?).  Should we align _both_ entries?

To be honest I did not really know we actually would like to patch
alternative entry points.
The function alignment is always produced before the start of the function,
so the first entry point wins and the other entry point is not aligned.

Aligning later labels needs to go through the label alignment code, since
theoretically some targets need to do relaxation over it.

In final.cc we do not apply function alignment to those labels.
I guess this makes sense because if we align for performance, we
probably do not want the alternate entry point to be aligned since it
appears close to the original one.  I can add that to compute_alignment:
test whether the label is an alternative entry point and add alignment.
I wonder if that is the desired behaviour though, and whether this code
path is even used?

I know this was originally added to support i386 register passing
conventions and stack alignment via an alternative entry point, but it was
never really used that way.  There was also a plan to support Fortran
alternative entry points.

Looking at what rs6000 does, it seems not to use the RTL representation
of alternative entry points.  It seems that:
 1) call assemble_start_function, which
    a) outputs the function alignment
    b) outputs the start label
    c) calls print_patchable_function_entry
 2) call final_start_function, which calls output_function_prologue.
In rs6000 there is a second call to
rs6000_print_patchable_function_entry
So there is no target-independent place where the alignment can be added,
and I would say it is up to the rs6000 maintainers to decide what is right
here :)
> 
> >  @opindex falign-labels
> >  @item -falign-labels
> >  @itemx -falign-labels=@var{n}
> > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> >  Align loops to a power-of-two boundary.  If the loops are executed
> >  many times, this makes up for any execution of the dummy padding
> >  instructions.
> > +This is an optimization of code performance and alignment is ignored for
> > +loops considered cold.
> >  
> >  If @option{-falign-labels} is greater than this value, then its value
> >  is used instead.
> > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> >  Align branch targets to a power-of-two boundary, for branch targets
> >  where the targets can only be reached by jumping.  In this case,
> >  no dummy operations need be executed.
> > +This is an optimization of code performance and alignment is ignored for
> > +jumps considered cold.
> >  
> >  If @option{-falign-labels} is greater than this value, then its value
> >  is used instead.
> > @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} and 
> > optimization
> >  options should be specified at compile time and during the final link.
> >  It is recommended that you compile all the files participating in the
> >  same link with the same options and also specify those options at
> > -link time.  
> > +link time.
> >  For example:
> >  
> >  @smallexample
> > diff --git a/gcc/flags.h b/gcc/flags.h
> > index e4bafa310d6..ecf4fb9e846 100644
> > --- a/gcc/flags.h
> > +++ b/gcc/flags.h
> > @@ -89,6 +89,7 @@ public:
> >align_flags x_align_jumps;
> >align_flags x_align_labels;
> >align_flags x_align_functions;
> > +  align_flags x_align_all_functions;
> >  };
> >  
> >  extern class target_flag_state default_target_flag_state;
> > @@ -98,10 +99,11 @@ extern class target_flag_state *this_target_flag_state;
> >  #define this_target_flag_state (&default_target_flag_state)
> >  #endif
> >  
> > -#define align_loops (this_target_flag_state->x_align_loops)
> > -#define align_jumps (this_target_flag_state->x_align_jumps)
> > -#define align_labels(this_target_flag_state->x_align_labels)
> > -#define align_functions (this_target_flag_state->x_align_functions)
> > +#define align_loops(this_target_flag_state->x_align_loops)
> > +#define align_jumps(this_target_flag_state->x_align_jumps)
> > +#define align_labels   (this_target_flag_state->x_align_labels)
> > +#define align_functions(this_target_flag_state->x_align_functions)
> > +#define align_all_functions (this_target_flag_state->x_align_all

Re: Fix merging of value predictors

2024-01-17 Thread Jan Hubicka
> 
> Please fill in what has changed, both for predict-18.c and predict.{cc,def}
> changes.

Sorry, I re-generated the patch after fixing some typos and forgot to
copy over the changelog.
> 
> > @@ -2613,24 +2658,40 @@ expr_expected_value_1 (tree type, tree op0, enum 
> > tree_code code,
> >   if (!nop1)
> > nop1 = op1;
> >  }
> > +  /* We already checked if folding one of arguments to constant is good
> > +enough.  Consequently failing to fold both means that we will not
> > +succeed determinging the value.  */
> 
> s/determinging/determining/
Fixed.  I am re-testing the following and will commit if it succeeds (on
x86_64-linux)

2024-01-17  Jan Hubicka  
Jakub Jelinek  

PR tree-optimization/110852

gcc/ChangeLog:

* predict.cc (expr_expected_value_1): Fix profile merging of PHI and
binary operations
(get_predictor_value): Handle PRED_COMBINED_VALUE_PREDICTIONS and
PRED_COMBINED_VALUE_PREDICTIONS_PHI
* predict.def (PRED_COMBINED_VALUE_PREDICTIONS): New predictor.
(PRED_COMBINED_VALUE_PREDICTIONS_PHI): New predictor.

gcc/testsuite/ChangeLog:

* gcc.dg/predict-18.c: Update template to expect combined value 
predictor.
* gcc.dg/predict-23.c: New test.
* gcc.dg/tree-ssa/predict-1.c: New test.
* gcc.dg/tree-ssa/predict-2.c: New test.
* gcc.dg/tree-ssa/predict-3.c: New test.

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 84cbe3ffc61..c1c48bf3df1 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -2404,44 +2404,78 @@ expr_expected_value_1 (tree type, tree op0, enum 
tree_code code,
   if (!bitmap_set_bit (visited, SSA_NAME_VERSION (op0)))
return NULL;
 
-  if (gimple_code (def) == GIMPLE_PHI)
+  if (gphi *phi = dyn_cast  (def))
{
  /* All the arguments of the PHI node must have the same constant
 length.  */
- int i, n = gimple_phi_num_args (def);
- tree val = NULL, new_val;
+ int i, n = gimple_phi_num_args (phi);
+ tree val = NULL;
+ bool has_nonzero_edge = false;
+
+ /* If we already proved that given edge is unlikely, we do not need
+to handle merging of the probabilities.  */
+ for (i = 0; i < n && !has_nonzero_edge; i++)
+   {
+ tree arg = PHI_ARG_DEF (phi, i);
+ if (arg == PHI_RESULT (phi))
+   continue;
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (!cnt.initialized_p () || cnt.nonzero_p ())
+   has_nonzero_edge = true;
+   }
 
  for (i = 0; i < n; i++)
{
- tree arg = PHI_ARG_DEF (def, i);
+ tree arg = PHI_ARG_DEF (phi, i);
  enum br_predictor predictor2;
 
- /* If this PHI has itself as an argument, we cannot
-determine the string length of this argument.  However,
-if we can find an expected constant value for the other
-PHI args then we can still be sure that this is
-likely a constant.  So be optimistic and just
-continue with the next argument.  */
- if (arg == PHI_RESULT (def))
+ /* Skip self-referring parameters, since they does not change
+expected value.  */
+ if (arg == PHI_RESULT (phi))
continue;
 
+ /* Skip edges which we already predicted as executing
+zero times.  */
+ if (has_nonzero_edge)
+   {
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (cnt.initialized_p () && !cnt.nonzero_p ())
+   continue;
+   }
  HOST_WIDE_INT probability2;
- new_val = expr_expected_value (arg, visited, &predictor2,
-&probability2);
+ tree new_val = expr_expected_value (arg, visited, &predictor2,
+ &probability2);
+ /* If we know nothing about value, give up.  */
+ if (!new_val)
+   return NULL;
 
- /* It is difficult to combine value predictors.  Simply assume
-that later predictor is weaker and take its prediction.  */
- if (*predictor < predictor2)
+ /* If this is a first edge, trust its prediction.  */
+ if (!val)
{
+ val = new_val;
  *predictor = predictor2;
  *probability = probability2;
+ continue;
}
- if (!new_val)
-   return NULL;
- if (!val)
-   val = new_val;
- else if (!operand_equal_p (val, new_val, false))
+ /* If there are two different values, give up.  */
+ if (!operand_equal_p (val, new_val, f

Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-17 Thread Lipeng Zhu




On 1/3/2024 5:14 PM, Lipeng Zhu wrote:



On 2023/12/21 19:42, Thomas Schwinge wrote:

Hi!

On 2023-12-13T21:52:29+0100, I wrote:

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:

On 2023/12/12 1:45, H.J. Lu wrote:
On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  
wrote:

On 2023/12/9 23:23, Jakub Jelinek wrote:

On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:

This patch tries to introduce an rwlock and split the read/write access
to the unit_root tree and unit_cache with the rwlock instead of the mutex
to increase CPU efficiency. In the get_gfc_unit function, the
percentage of calls that step into the insert_unit function is around 30%;
in most instances, we can get the unit in the phase of reading the
unit_cache or unit_root tree. So splitting the read/write phases with an
rwlock is an approach to make it more parallel.

BTW, the IPC metric gains around 9x on our test server with
220 cores. The benchmark we used is
https://github.com/rwesson/NEAT
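
A minimal sketch of the read-mostly locking pattern described above, using
C++17 std::shared_mutex as a stand-in for the POSIX rwlock; the names
(gfc_unit, unit_cache, get_unit) are only illustrative and this is not the
libgfortran code:

#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

struct gfc_unit { int unit_number; };

static std::shared_mutex cache_lock;
static std::unordered_map<int, std::unique_ptr<gfc_unit>> unit_cache;

gfc_unit *
get_unit (int n)
{
  {
    /* Fast path: most lookups only need the shared (read) lock and can
       run in parallel.  */
    std::shared_lock<std::shared_mutex> rd (cache_lock);
    auto it = unit_cache.find (n);
    if (it != unit_cache.end ())
      return it->second.get ();
  }
  /* Slow path (around 30% of calls in the workload cited above): take the
     exclusive (write) lock, re-check, then insert.  */
  std::unique_lock<std::shared_mutex> wr (cache_lock);
  auto &slot = unit_cache[n];
  if (!slot)
    slot.reset (new gfc_unit{n});
  return slot.get ();
}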



Ok for trunk, thanks.



Thanks! Looking forward to landing to trunk.



Pushed for you.



I've just filed 
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' 
execution test timeouts".

Would you be able to look into that?


See my update in there.


Grüße
  Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 
201, 80634 München; Gesellschaft mit beschränkter Haftung; 
Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: 
München; Registergericht München, HRB 106955




Updated in https://gcc.gnu.org/PR113005. Could you help verify whether the
draft patch fixes the execution test timeout issue on your side?




Hi Thomas,

Any feedback from your side?

Regards,
Lipeng Zhu





Re: Add -falign-all-functions

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > > +falign-all-functions
> > > +Common Var(flag_align_all_functions) Optimization
> > > +Align the start of functions.
> > 
> > all functions
> > 
> > or maybe "of every function."?
> 
> Fixed, thanks!
> > > +@opindex falign-all-functions=@var{n}
> > > +@item -falign-all-functions
> > > +Specify minimal alignment for function entry. Unlike 
> > > @option{-falign-functions}
> > > +this alignment is applied also to all functions (even those considered 
> > > cold).
> > > +The alignment is also not affected by @option{-flimit-function-alignment}
> > > +
> > 
> > For functions with two entries (like on powerpc), which entry does this
> > apply to?  I suppose the external ABI entry, not the local one?  But
> > how does this then help to align the patchable entry (the common
> > local entry should be aligned?).  Should we align _both_ entries?
> 
> To be honest I did not really know we actually would like to patch
> alternative entry points.
> The function alignment is always produced before the start of the function,
> so the first entry point wins and the other entry point is not aligned.
> 
> Aligning later labels needs to go through the label alignment code, since
> theoretically some targets need to do relaxation over it.
> 
> In final.cc we do not apply function alignment to those labels.
> I guess this makes sense because if we align for performance, we
> probably do not want the alternate entry point to be aligned since it
> appears close to the original one.  I can add that to compute_alignment:
> test if the label is an alternative entry point and add alignment.
> I wonder if that is the desired behaviour though, and whether this code
> path is even used?
> 
> I know this was originally added to support i386 register passing
> conventions and stack alignment via alternative entry points, but it was
> never really used that way.  Also there was a plan to support Fortran
> alternative entry points.
> 
> Looking at what rs6000 does, it seems to not use the RTL representation
> of alternative entry points.  It seems that it:
>  1) calls assemble_start_function, which
>     a) outputs the function alignment
>     b) outputs the start label
>     c) calls print_patchable_function_entry
>  2) calls final_start_function, which calls output_function_prologue.
> In rs6000 there is a second call to
> rs6000_print_patchable_function_entry.
> So there is no target-independent place where alignment can be added,
> so I would say it is up to rs6000 maintainers to decide what is right
> here :)

Fair enough ...

> > 
> > >  @opindex falign-labels
> > >  @item -falign-labels
> > >  @itemx -falign-labels=@var{n}
> > > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> > >  Align loops to a power-of-two boundary.  If the loops are executed
> > >  many times, this makes up for any execution of the dummy padding
> > >  instructions.
> > > +This is an optimization of code performance and alignment is ignored for
> > > +loops considered cold.
> > >  
> > >  If @option{-falign-labels} is greater than this value, then its value
> > >  is used instead.
> > > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> > >  Align branch targets to a power-of-two boundary, for branch targets
> > >  where the targets can only be reached by jumping.  In this case,
> > >  no dummy operations need be executed.
> > > +This is an optimization of code performance and alignment is ignored for
> > > +jumps considered cold.
> > >  
> > >  If @option{-falign-labels} is greater than this value, then its value
> > >  is used instead.
> > > @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} 
> > > and optimization
> > >  options should be specified at compile time and during the final link.
> > >  It is recommended that you compile all the files participating in the
> > >  same link with the same options and also specify those options at
> > > -link time.  
> > > +link time.
> > >  For example:
> > >  
> > >  @smallexample
> > > diff --git a/gcc/flags.h b/gcc/flags.h
> > > index e4bafa310d6..ecf4fb9e846 100644
> > > --- a/gcc/flags.h
> > > +++ b/gcc/flags.h
> > > @@ -89,6 +89,7 @@ public:
> > >align_flags x_align_jumps;
> > >align_flags x_align_labels;
> > >align_flags x_align_functions;
> > > +  align_flags x_align_all_functions;
> > >  };
> > >  
> > >  extern class target_flag_state default_target_flag_state;
> > > @@ -98,10 +99,11 @@ extern class target_flag_state 
> > > *this_target_flag_state;
> > >  #define this_target_flag_state (&default_target_flag_state)
> > >  #endif
> > >  
> > > -#define align_loops   (this_target_flag_state->x_align_loops)
> > > -#define align_jumps   (this_target_flag_state->x_align_jumps)
> > > -#define align_labels  (this_target_flag_state->x_align_labels)
> > > -#define align_functions   (this_target_flag_state->x_align_functions)
> > > +#define align_loops  (this_target_flag_state->x_align_loops)
> > > +#define align_jumps  

Re: [PATCH] lower-bitint: Avoid overlap between destinations and sources in libgcc calls [PR113421]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled because the bitint lowering emits a
>   .MULBITINT (&a, 1024, &a, 1024, &x, 1024);
> call.  The bug is in the overlap between the destination and source; that is
> something the libgcc routines don't handle, as they use the source arrays
> during the entire algorithm which computes the destination array(s).
> For the mapping of SSA_NAMEs to VAR_DECLs the code already supports that
> correctly, but the check whether a load from memory can be used directly
> without a temporary, even when earlier we decided to merge the
> multiplication/division/modulo etc. with a store, didn't.
> 
> The following patch implements that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-01-17  Jakub Jelinek  
> 
>   PR tree-optimization/113421
>   * gimple-lower-bitint.cc (stmt_needs_operand_addr): Adjust function
>   comment.
>   (bitint_dom_walker::before_dom_children): Add g temporary to simplify
>   formatting.  Start at vop rather than cvop even if stmt is a store
>   and needs_operand_addr.
> 
>   * gcc.dg/torture/bitint-50.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-16 12:32:56.617721208 +0100
> +++ gcc/gimple-lower-bitint.cc2024-01-16 17:33:04.046476302 +0100
> @@ -5455,7 +5455,8 @@ vuse_eq (ao_ref *, tree vuse1, void *dat
>  
>  /* Return true if STMT uses a library function and needs to take
> address of its inputs.  We need to avoid bit-fields in those
> -   cases.  */
> +   cases.  Similarly, we need to avoid overlap between destination
> +   and source limb arrays.  */
>  
>  bool
>  stmt_needs_operand_addr (gimple *stmt)
> @@ -5574,7 +5575,8 @@ bitint_dom_walker::before_dom_children (
> else if (!bitmap_bit_p (m_loads, SSA_NAME_VERSION (s)))
>   continue;
>  
> -   tree rhs1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s));
> +   gimple *g = SSA_NAME_DEF_STMT (s);
> +   tree rhs1 = gimple_assign_rhs1 (g);
> if (needs_operand_addr
> && TREE_CODE (rhs1) == COMPONENT_REF
> && DECL_BIT_FIELD_TYPE (TREE_OPERAND (rhs1, 1)))
> @@ -5596,15 +5598,14 @@ bitint_dom_walker::before_dom_children (
>  
> ao_ref ref;
> ao_ref_init (&ref, rhs1);
> -   tree lvop = gimple_vuse (SSA_NAME_DEF_STMT (s));
> +   tree lvop = gimple_vuse (g);
> unsigned limit = 64;
> tree vuse = cvop;
> if (vop != cvop
> && is_gimple_assign (stmt)
> && gimple_store_p (stmt)
> -   && !operand_equal_p (lhs,
> -gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s)),
> -0))
> +   && (needs_operand_addr
> +   || !operand_equal_p (lhs, gimple_assign_rhs1 (g), 0)))
>   vuse = vop;
> if (vuse != lvop
> && walk_non_aliased_vuses (&ref, vuse, false, vuse_eq,
> --- gcc/testsuite/gcc.dg/torture/bitint-50.c.jj   2024-01-16 
> 17:35:16.084622119 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-50.c  2024-01-16 17:35:06.701753879 
> +0100
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/113421 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23 -pedantic-errors" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 1024
> +unsigned _BitInt(1024) a = -5wb;
> +
> +__attribute__((noipa)) void
> +foo (unsigned _BitInt(1024) x)
> +{
> +  a *= x;
> +}
> +#else
> +int a = 30;
> +
> +void
> +foo (int)
> +{
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +  foo (-6wb);
> +  if (a != 30wb)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Add -falign-all-functions

2024-01-17 Thread Jan Hubicka
> 
> I meant the new option might be named -fmin-function-alignment=
> rather than -falign-all-functions because of how it should
> override all other options.

I was also pondering both names.  -falign-all-functions has the
advantage that it is similar to all the other alignment flags, which are
all called -falign-XXX,

but both options are fine for me.
> 
> Otherwise is there an updated patch to look at?

I will prepare one.  So shall I drop the max-skip support for alignment
and rename the flag?

Honza
> 
> Richard.
> 
> > > -flimit-function-alignment should not have an effect on it
> > > and even very small functions should be aligned.
> > 
> > I write that it is not affected by limit-function-alignment
> > @opindex falign-all-functions=@var{n}
> > @item -falign-all-functions
> > Specify minimal alignment for function entry. Unlike 
> > @option{-falign-functions}
> > this alignment is applied also to all functions (even those considered 
> > cold).
> > The alignment is also not affected by @option{-flimit-function-alignment}
> > 
> > Because indeed that would break the atomicity of updates.
> 
> 
> 
> > Honza
> > > 
> > > Richard.
> > > 
> > > > +}
> > > > +
> > > >/* Handle a user-specified function alignment.
> > > >   Note that we still need to align to DECL_ALIGN, as above,
> > > >   because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at 
> > > > all.  */
> > > > 
> > > 
> > > -- 
> > > Richard Biener 
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Robin Dapp
Hi Juzhe,

the change itself is OK but I don't think we should add binary
files like this.  Even if not ideal, if you want to go forward
IMHO let's skip the test for now and add it at a (not much) later
time.

> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/ppgrid.mod 
> b/gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/ppgrid.mod
> new file mode 100644
> index 
> ..cb021390ccd758e75c3ad11b33da93e5fba9dd25
> GIT binary patch
> literal 296
> zcmV+@0oVQ?iwFP!01J#p3PlGTRhVT6q@2zl{=@`s-tfVf)tq@kHSA}j8Hz3S;
> z@Yh?$U?jR7j0cyt>GwAM+UHHbPVT~3#av=jq`S4ohpx6+k%JCBiloxd?>fb@DmEy~
> zRh6Yz%d*TqwV7`iu`C;Z(McQF#DqT#2lPd+lGk1SMnM}A6Hp9cSqmNq{B|nvAn#@P
> zCGE%0NIhX&LksHou~f}
> z%QY-XvEF`Xig?UtLYWiJK(&GdvtL8`p`0sj4z|eo?bN0F=wJgq8=k>8<#-V;obgE;
> u

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-17 Thread Michael Matz
Hello,

On Wed, 17 Jan 2024, Ajit Agarwal wrote:

> > first is even, since OOmode is only ok for even vsx register and its
> > size makes it take two consecutive vsx registers.
> > 
> > Hi Peter, is my understanding correct?
> > 
> 
> I tried all the combinations in the past; RA is not allocating sequential 
> registers. I don't see any such code in RA that generates sequential 
> registers.

See HARD_REGNO_NREGS.  If you form a pseudo of a mode that's larger than a 
native-sized hardreg (and the target is correctly set up) then the RA will 
allocate the correct number of hardregs (consecutively) for this pseudo.  
This is what Kewen was referring to by mentioning the OOmode for the new 
hypothetical pseudo.  The individual parts of such pseudo will then need 
to use subreg to access them.

So, when you work before RA you simply will transform this (I'm going to 
use SImode and DImode for demonstration):

   (set (reg:SI x) (mem:SI (addr)))
   (set (reg:SI y) (mem:SI (addr+4)))
   ...
   ( ...use1... (reg:SI x))
   ( ...use2... (reg:SI y))

into this:

   (set (reg:DI z) (mem:DI (addr)))
   ...
   ( ...use1... (subreg:SI (reg:DI z) 0))
   ( ...use2... (subreg:SI (reg:DI z) 4))

For this to work the target needs to accept the (subreg...) in certain 
operands of instruction patterns, which I assume was what Kewen also 
referred to.  The register allocator will then assign hardregs X and X+1 
to the pseudo-reg 'z'.  (Assuming that DImode is okay for hardreg X, and 
HARD_REGNO_NREGS says that it needs two hardregs to hold DImode).

It will also replace the subregs by their appropriate concrete hardreg.

It seems your problems stem from trying to place your new pass somewhere 
within the register-allocation pipeline, rather than simply completely 
before.


Ciao,
Michael.


Re: Add -falign-all-functions

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > 
> > I meant the new option might be named -fmin-function-alignment=
> > rather than -falign-all-functions because of how it should
> > override all other options.
> 
> I was also pondering both names.  -falign-all-functions has the
> advantage that it is similar to all the other alignment flags, which are
> all called -falign-XXX,
> 
> but both options are fine for me.
> > 
> > Otherwise is there an updated patch to look at?
> 
> I will prepare one.  So shall I drop the max-skip support for alignment
> and rename the flag?

Yes.

> Honza
> > 
> > Richard.
> > 
> > > > -flimit-function-alignment should not have an effect on it
> > > > and even very small functions should be aligned.
> > > 
> > > I write that it is not affected by limit-function-alignment
> > > @opindex falign-all-functions=@var{n}
> > > @item -falign-all-functions
> > > Specify minimal alignment for function entry. Unlike 
> > > @option{-falign-functions}
> > > this alignment is applied also to all functions (even those considered 
> > > cold).
> > > The alignment is also not affected by @option{-flimit-function-alignment}
> > > 
> > > Because indeed that would break the atomicity of updates.
> > 
> > 
> > 
> > > Honza
> > > > 
> > > > Richard.
> > > > 
> > > > > +}
> > > > > +
> > > > >/* Handle a user-specified function alignment.
> > > > >   Note that we still need to align to DECL_ALIGN, as above,
> > > > >   because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at 
> > > > > all.  */
> > > > > 
> > > > 
> > > > -- 
> > > > Richard Biener 
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG 
> > > > Nuernberg)
> > > 
> > 
> > -- 
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH V2] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
This patch fixes the SPEC2017 cam4 mismatch issue, which is due to a missing
"has compatible" check for conflict vsetvl fusion.

Buggy assembler before this patch:

.L69:
vsetvli  a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i  v1,0
vse8.v   v1,0(a5)
j        .L37
.L68:
vsetvli  a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
addi     a3,a5,8
vmv.v.i  v1,0
vse8.v   v1,0(a5)
vse8.v   v1,0(a3)
addi     a4,a4,-16
li       a3,8
bltu     a4,a3,.L37
j        .L69
.L67:
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i  v1,0
vse8.v   v1,0(a5)
addi     a5,sp,56
vse8.v   v1,0(a5)
addi     s4,sp,64
addi     a3,sp,72
vse8.v   v1,0(s4)
vse8.v   v1,0(a3)
addi     a4,a4,-32
li       a3,16
bltu     a4,a3,.L36
j        .L68

After this patch:

.L63:
ble      s1,zero,.L49
slli     a4,s1,3
li       a3,32
addi     a5,sp,48
bltu     a4,a3,.L62
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i  v1,0
vse8.v   v1,0(a5)
addi     a5,sp,56
vse8.v   v1,0(a5)
addi     s4,sp,64
addi     a3,sp,72
vse8.v   v1,0(s4)
addi     a4,a4,-32
addi     a5,sp,80
vse8.v   v1,0(a3)
.L35:
li       a3,16
bltu     a4,a3,.L36
addi     a3,a5,8
vmv.v.i  v1,0
addi     a4,a4,-16
vse8.v   v1,0(a5)
addi     a5,a5,16
vse8.v   v1,0(a3)
.L36:
li       a3,8
bltu     a4,a3,.L37
vmv.v.i  v1,0
vse8.v   v1,0(a5)

Tested on both RV32/RV64 with no regressions.  OK for trunk?

PR target/113429

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
Fix conflict vsetvl fusion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-4.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-5.c: Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 39 +++
 .../riscv/rvv/vsetvl/vlmax_conflict-4.c   |  5 +--
 .../riscv/rvv/vsetvl/vlmax_conflict-5.c   | 10 ++---
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index df7ed149388..76e3d2eb471 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2254,6 +2254,22 @@ private:
 return true;
   }
 
+  bool has_compatible_reaching_vsetvl_p (vsetvl_info info)
+  {
+unsigned int index;
+sbitmap_iterator sbi;
+EXECUTE_IF_SET_IN_BITMAP (m_vsetvl_def_in[info.get_bb ()->index ()], 0,
+ index, sbi)
+  {
+   const auto prev_info = *m_vsetvl_def_exprs[index];
+   if (!prev_info.valid_p ())
+ continue;
+   if (m_dem.compatible_p (prev_info, info))
+ return true;
+  }
+return false;
+  }
+
   bool preds_all_same_avl_and_ratio_p (const vsetvl_info &curr_info)
   {
 gcc_assert (
@@ -3075,22 +3091,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
{
  vsetvl_info new_curr_info = curr_info;
  new_curr_info.set_bb (crtl->ssa->bb (eg->dest));
- bool has_compatible_p = false;
- unsigned int def_expr_index;
- sbitmap_iterator sbi2;
- EXECUTE_IF_SET_IN_BITMAP (
-   m_vsetvl_def_in[new_curr_info.get_bb ()->index ()], 0,
-   def_expr_index, sbi2)
-   {
- vsetvl_info &prev_info = *m_vsetvl_def_exprs[def_expr_index];
- if (!prev_info.valid_p ())
-   continue;
- if (m_dem.compatible_p (prev_info, new_curr_info))
-   {
- has_compatible_p = true;
- break;
-   }
-   }
+ bool has_compatible_p
+   = has_compatible_reaching_vsetvl_p (new_curr_info);
  if (!has_compatible_p)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3146,7 +3148,10 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
   && !m_dem.compatible_p (prev_info, curr_info))
{
  /* Cancel lift up if probabilities are equal.  */
- if (successors_probability_equal_p (eg->src))
+ if (successors_probability_equal_p (eg->src)
+ || (dest_block_info.probability
+   > src_block_info.probability
+ && !has_compatible_reaching_vsetvl_p (curr_info)))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
{
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlm

Re: [PATCH V2] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Robin Dapp
OK.

Regards
 Robin



[PATCH] aarch64: Fix eh_return for -mtrack-speculation [PR112987]

2024-01-17 Thread Szabolcs Nagy
A recent commit introduced a conditional branch in eh_return epilogues
that is not compatible with speculation tracking:

  commit 426fddcbdad6746fe70e031f707fb07f55dfb405
  Author: Szabolcs Nagy 
  CommitDate: 2023-11-27 15:52:48 +

  aarch64: Use br instead of ret for eh_return

gcc/ChangeLog:

PR target/112987
* config/aarch64/aarch64.cc (aarch64_expand_epilogue): Use
explicit compare and separate jump with speculation tracking.
---
 gcc/config/aarch64/aarch64.cc | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e6bd3fd0bb4..e6de62dc02a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -9879,7 +9879,17 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
 is just as correct as retaining the CFA from the body
 of the function.  Therefore, do nothing special.  */
   rtx label = gen_label_rtx ();
-  rtx x = gen_rtx_EQ (VOIDmode, EH_RETURN_TAKEN_RTX, const0_rtx);
+  rtx x;
+  if (aarch64_track_speculation)
+   {
+ /* Emit an explicit compare, so cc can be tracked.  */
+ rtx cc_reg = aarch64_gen_compare_reg (EQ,
+   EH_RETURN_TAKEN_RTX,
+   const0_rtx);
+ x = gen_rtx_EQ (GET_MODE (cc_reg), cc_reg, const0_rtx);
+   }
+  else
+   x = gen_rtx_EQ (VOIDmode, EH_RETURN_TAKEN_RTX, const0_rtx);
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   rtx jump = emit_jump_insn (gen_rtx_SET (pc_rtx, x));
-- 
2.25.1



Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Jeff Law




On 1/17/24 05:14, Richard Biener wrote:

On Wed, 17 Jan 2024, Monk Chiang wrote:


This allows the backend to generate movcc instructions, if the target
machine has a movcc pattern.

branchless-cond.c needs to be updated since some target machines have
conditional move instructions, and the expression will not be changed to
a branchless expression.


While I agree this pattern should possibly be applied during RTL
expansion or instruction selection, on x86, which also has movcc,
the multiplication is cheaper.  So I don't think this is the way to go.

I'd rather revert the change than try to "fix" it this way?
WRT reverting -- the patch in question's sole purpose was to enable 
branchless sequences for that very same code.  Reverting would regress 
performance on a variety of micro-architectures.  IIUC, the issue is 
that the SiFive part in question has a fusion which allows it to do the 
branchy sequence cheaply.


ISTM this really needs to be addressed during expansion and most likely 
with a RISC-V target twiddle for the micro-archs which have 
short-forward-branch optimizations.
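
For reference, a purely illustrative sketch of the general shape of a
select-constant-or-zero rewrite like the one being discussed, in a branchy
form and a multiplication-based branchless form; this is not necessarily
GCC's exact pattern or the branchless-cond.c testcase, and the constants
are made up:

int
select_branchy (unsigned x)
{
  /* Conditional form: a branch, a movcc, or a short-forward-branch
     sequence, depending on the target.  */
  return (x & 4096) ? 13 : 0;
}

int
select_branchless (unsigned x)
{
  /* Branchless form: extract the tested bit and multiply by the
     constant; equivalent to the function above.  */
  return (int) ((x >> 12) & 1) * 13;
}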


jeff


Re: [PATCH 1/4] rtl-ssa: Run finalize_new_accesses forwards [PR113070]

2024-01-17 Thread Jeff Law




On 1/13/24 08:43, Alex Coplan wrote:

The next patch in this series exposes an interface for creating new uses
in RTL-SSA.  The intent is that new user-created uses can consume new
user-created defs in the same change group.  This is so that we can
correctly update uses of memory when inserting a new store pair insn in
the aarch64 load/store pair fusion pass (the affected uses need to
consume the new store pair insn).

As it stands, finalize_new_accesses is called as part of the backwards
insn placement loop within change_insns, but if we want new uses to be
able to depend on new defs in the same change group, we need
finalize_new_accesses to be called on earlier insns first.  This is so
that when we process temporary uses and turn them into permanent uses,
we can follow the last_def link on the temporary def to ensure we end up
with a permanent use consuming a permanent def.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/113070
* rtl-ssa/changes.cc (function_info::change_insns): Split out the call
to finalize_new_accesses from the backwards placement loop, run it
forwards in a separate loop.
So just to be explicit -- given this is adjusting the rtl-ssa 
infrastructure, I was going to let Richard S. own the review side -- he 
knows that code better than I.


Jeff


Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-17 Thread Martin Jambor
Hi,
On Wed, Jan 10 2024, Jakub Jelinek wrote:
> Hi!
>
> As changed in other parts of the compiler, using
> build_nonstandard_integer_type is not appropriate for arbitrary precisions,
> especially if the precision comes from a BITINT_TYPE or something based on
> that, build_nonstandard_integer_type relies on some integral mode being
> supported that can support the precision.
>
> The following patch uses build_bitint_type instead for BITINT_TYPE
> precisions.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, it would be good if we were able to punt on the optimization
> (but this code doesn't seem to be able to punt, so it needs to be done
> somewhere earlier) at least in cases where building it would be invalid.
> E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive),
> but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION).
> I've tried to replace 513 with 65532 in the testcase and it didn't ICE,
> so maybe it ran into some other SRA limit.

Thank you very much for the patch.  Regarding punting, did you mean for
all BITINT_TYPEs or just for big ones, like you did when you fixed PR
11333 (thanks for that too) or something entirely else?

Martin

>
> 2024-01-10  Jakub Jelinek  
>
>   PR tree-optimization/113120
>   * tree-sra.cc (analyze_access_subtree): For BITINT_TYPE
>   with root->size TYPE_PRECISION don't build anything new.
>   Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type
>   rather than build_nonstandard_integer_type.
>
>   * gcc.dg/bitint-63.c: New test.


Re: [PATCH 1/4] rtl-ssa: Run finalize_new_accesses forwards [PR113070]

2024-01-17 Thread Alex Coplan
On 17/01/2024 07:42, Jeff Law wrote:
> 
> 
> On 1/13/24 08:43, Alex Coplan wrote:
> > The next patch in this series exposes an interface for creating new uses
> > in RTL-SSA.  The intent is that new user-created uses can consume new
> > user-created defs in the same change group.  This is so that we can
> > correctly update uses of memory when inserting a new store pair insn in
> > the aarch64 load/store pair fusion pass (the affected uses need to
> > consume the new store pair insn).
> > 
> > As it stands, finalize_new_accesses is called as part of the backwards
> > insn placement loop within change_insns, but if we want new uses to be
> > able to depend on new defs in the same change group, we need
> > finalize_new_accesses to be called on earlier insns first.  This is so
> > that when we process temporary uses and turn them into permanent uses,
> > we can follow the last_def link on the temporary def to ensure we end up
> > with a permanent use consuming a permanent def.
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> > 
> > Thanks,
> > Alex
> > 
> > gcc/ChangeLog:
> > 
> > PR target/113070
> > * rtl-ssa/changes.cc (function_info::change_insns): Split out the call
> > to finalize_new_accesses from the backwards placement loop, run it
> > forwards in a separate loop.
> So just to be explicit -- given this is adjusting the rtl-ssa
> infrastructure, I was going to let Richard S. own the review side -- he
> knows that code better than I.

Yeah, that's fine, thanks.  Richard is away this week but back on Monday, so
hopefully he can take a look at it then.

Alex

> 
> Jeff


Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 03:46:44PM +0100, Martin Jambor wrote:
> > Note, it would be good if we were able to punt on the optimization
> > (but this code doesn't seem to be able to punt, so it needs to be done
> > somewhere earlier) at least in cases where building it would be invalid.
> > E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive),
> > but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION).
> > I've tried to replace 513 with 65532 in the testcase and it didn't ICE,
> > so maybe it ran into some other SRA limit.
> 
> Thank you very much for the patch.  Regarding punting, did you mean for
> all BITINT_TYPEs or just for big ones, like you did when you fixed PR
> 11333 (thanks for that too) or something entirely else?

I meant what I did in PR113330, but I still wonder if we really need to use
a root->size which is a multiple of BITS_PER_UNIT (or words or whatever it
actually is), at least on little endian if the _BitInt starts at the start
of a memory object.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113408#c1
for more details; I wonder if it just couldn't use _BitInt(713) in there
directly rather than _BitInt(768).

Jakub



[PATCH] aarch64: Check the ldp/stp policy model correctly when mem ops are reversed.

2024-01-17 Thread Manos Anagnostakis
The current ldp/stp policy framework implementation was missing cases where
the memory operands were reversed. Therefore the call to the framework function
is moved after the lower-mem check, with the suitable parameters. This also
removes the mode parameter of aarch64_operands_ok_for_ldpstp, which becomes
unused and triggers a warning during bootstrap.

gcc/ChangeLog:

* config/aarch64/aarch64-ldpstp.md: Remove unused mode.
* config/aarch64/aarch64-protos.h (aarch64_operands_ok_for_ldpstp):
Likewise.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Call on framework moved later.

Signed-off-by: Manos Anagnostakis 
Co-Authored-By: Manolis Tsamis 
---
 gcc/config/aarch64/aarch64-ldpstp.md | 22 +++---
 gcc/config/aarch64/aarch64-protos.h  |  2 +-
 gcc/config/aarch64/aarch64.cc| 18 +-
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldpstp.md 
b/gcc/config/aarch64/aarch64-ldpstp.md
index b668fa8e2a6..b7c0bf05cd1 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -23,7 +23,7 @@
(match_operand:GPI 1 "memory_operand" ""))
(set (match_operand:GPI 2 "register_operand" "")
(match_operand:GPI 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true);
@@ -35,7 +35,7 @@
(match_operand:GPI 1 "aarch64_reg_or_zero" ""))
(set (match_operand:GPI 2 "memory_operand" "")
(match_operand:GPI 3 "aarch64_reg_or_zero" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, false)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, false);
@@ -47,7 +47,7 @@
(match_operand:GPF 1 "memory_operand" ""))
(set (match_operand:GPF 2 "register_operand" "")
(match_operand:GPF 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true);
@@ -59,7 +59,7 @@
(match_operand:GPF 1 "aarch64_reg_or_fp_zero" ""))
(set (match_operand:GPF 2 "memory_operand" "")
(match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, false)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, false);
@@ -71,7 +71,7 @@
(match_operand:DREG 1 "memory_operand" ""))
(set (match_operand:DREG2 2 "register_operand" "")
(match_operand:DREG2 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true);
@@ -83,7 +83,7 @@
(match_operand:DREG 1 "register_operand" ""))
(set (match_operand:DREG2 2 "memory_operand" "")
(match_operand:DREG2 3 "register_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, false)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, false);
@@ -96,7 +96,7 @@
(set (match_operand:VQ2 2 "register_operand" "")
(match_operand:VQ2 3 "memory_operand" ""))]
   "TARGET_FLOAT
-   && aarch64_operands_ok_for_ldpstp (operands, true, mode)
+   && aarch64_operands_ok_for_ldpstp (operands, true)
&& (aarch64_tune_params.extra_tuning_flags
& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
   [(const_int 0)]
@@ -111,7 +111,7 @@
(set (match_operand:VQ2 2 "memory_operand" "")
(match_operand:VQ2 3 "register_operand" ""))]
   "TARGET_FLOAT
-   && aarch64_operands_ok_for_ldpstp (operands, false, mode)
+   && aarch64_operands_ok_for_ldpstp (operands, false)
&& (aarch64_tune_params.extra_tuning_flags
& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
   [(const_int 0)]
@@ -128,7 +128,7 @@
(sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
(set (match_operand:DI 2 "register_operand" "")
(sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true, SIGN_EXTEND);
@@ -140,7 +140,7 @@
(zero_extend:DI (match_operand:SI 1 "memory_operand" "")))
(set (match_operand:DI 2 "register_operand" "")
(zero_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true, ZERO_EXTEND);
@@ -162,7 +162,7 @@
(match_operand:DSX 1 "aarch64_reg_zero_or_fp_zero" ""))
(set

Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Richard Biener
On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
 wrote:
>
> > On Jan 17, 2024, at 10:51, Richard Biener  
> > wrote:
> >
> > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> >>> Hi Vladimir,
> >>> Hi Jeff,
> >>>
> >>> Richard and Alexander have reviewed this patch and [I assume] have no
> >>> further comments.  OK to merge?
> >> I think the question is whether or not we're too late.  I know that
> >> Richard S has held off on his late-combine pass and I'm holding off on
> >> the ext-dce work due to the fact that we're well past stage1 close.
> >>
> >> I think the release managers ought to have the final say on this.
> >
> > I'm fine with this now, it doesn't change code generation.
>
> Thanks, Richard.
>
> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
> stage1 opens.

This seems to have caused a compare-debug bootstrap issue on x86_64-linux,

gcc/fortran/f95-lang.o differs

does n_mem_deps or n_inc_deps include debug insns?

Richard.

> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>


Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Maxim Kuvyrkov
> On Jan 17, 2024, at 19:02, Richard Biener  wrote:
> 
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>  wrote:
>> 
>>> On Jan 17, 2024, at 10:51, Richard Biener  
>>> wrote:
>>> 
>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
 
 
 
 On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> Hi Vladimir,
> Hi Jeff,
> 
> Richard and Alexander have reviewed this patch and [I assume] have no
> further comments.  OK to merge?
 I think the question is whether or not we're too late.  I know that
 Richard S has held off on his late-combine pass and I'm holding off on
 the ext-dce work due to the fact that we're well past stage1 close.
 
 I think the release managers ought to have the final say on this.
>>> 
>>> I'm fine with this now, it doesn't change code generation.
>> 
>> Thanks, Richard.
>> 
>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
>> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
>> stage1 opens.
> 
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
> 
> gcc/fortran/f95-lang.o differs
> 
> does n_mem_deps or n_inc_deps include debug insns?

Thanks, investigating.

--
Maxim Kuvyrkov
https://www.linaro.org



[PATCH v1] Fix compare-debug bootstrap failure

2024-01-17 Thread Maxim Kuvyrkov
... caused by the scheduler fix for PR96388 and PR111554.

This patch adjusts the decision in sched-deps.cc:find_inc() to use the
length of the dependency lists excluding any DEBUG_INSN instructions.

gcc/ChangeLog:

* haifa-sched.cc (dep_list_size): Make global.
* sched-deps.cc (find_inc): Use dep_list_size() instead of sd_lists_size().
* sched-int.h (dep_list_size): Declare.
---
 gcc/haifa-sched.cc | 8 ++--
 gcc/sched-deps.cc  | 6 +++---
 gcc/sched-int.h| 2 ++
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 49ee589aed7..1bc610f9a5f 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1560,8 +1560,7 @@ contributes_to_priority_p (dep_t dep)
 }
 
 /* Compute the number of nondebug deps in list LIST for INSN.  */
-
-static int
+int
 dep_list_size (rtx_insn *insn, sd_list_types_def list)
 {
   sd_iterator_def sd_it;
@@ -1571,6 +1570,11 @@ dep_list_size (rtx_insn *insn, sd_list_types_def list)
   if (!MAY_HAVE_DEBUG_INSNS)
 return sd_lists_size (insn, list);
 
+  /* TODO: We should split normal and debug insns into separate SD_LIST_*
+ sub-lists, and then we'll be able to use something like
+ sd_lists_size(insn, list & SD_LIST_NON_DEBUG)
+ instead of walking dependencies below.  */
+
   FOR_EACH_DEP (insn, list, sd_it, dep)
 {
   if (DEBUG_INSN_P (DEP_CON (dep)))
diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 0615007c560..5034e664e5e 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4791,7 +4791,7 @@ find_inc (struct mem_inc_info *mii, bool backwards)
   sd_iterator_def sd_it;
   dep_t dep;
   sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
-  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
+  int n_mem_deps = dep_list_size (mii->mem_insn, mem_deps);
 
   sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
@@ -4808,12 +4808,12 @@ find_inc (struct mem_inc_info *mii, bool backwards)
   if (backwards)
{
  inc_cand = pro;
- n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+ n_inc_deps = dep_list_size (inc_cand, SD_LIST_BACK);
}
   else
{
  inc_cand = con;
- n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+ n_inc_deps = dep_list_size (inc_cand, SD_LIST_FORW);
}
 
   /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index ab784fe0d17..4df092013e9 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1677,6 +1677,8 @@ extern void sd_copy_back_deps (rtx_insn *, rtx_insn *, 
bool);
 extern void sd_delete_dep (sd_iterator_def);
 extern void sd_debug_lists (rtx, sd_list_types_def);
 
+extern int dep_list_size (rtx_insn *, sd_list_types_def);
+
 /* Macros and declarations for scheduling fusion.  */
 #define FUSION_MAX_PRIORITY (INT_MAX)
 extern bool sched_fusion;
-- 
2.34.1



Re: [PATCH] c++: address of NTTP object as targ [PR113242]

2024-01-17 Thread Patrick Palka
On Mon, 15 Jan 2024, Jason Merrill wrote:

> On 1/5/24 11:50, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > for trunk and perhaps 13?
> > 
> > -- >8 --
> > 
> > invalid_tparm_referent_p was rejecting using the address of a class NTTP
> > object as a template argument, but this should be fine.
> 
> Hmm, I suppose so; https://eel.is/c++draft/temp#param-8 saying "No two
> template parameter objects are template-argument-equivalent" suggests there
> can be only one.  And clang/msvc allow it.
> 
> > PR c++/113242
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (invalid_tparm_referent_p) : Suppress
> > DECL_ARTIFICIAL rejection test for class NTTP objects.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/nontype-class61.C: New test.
> > ---
> >   gcc/cp/pt.cc |  3 ++-
> >   gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
> >   2 files changed, 29 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 154ac76cb65..8c7d178328d 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -7219,7 +7219,8 @@ invalid_tparm_referent_p (tree type, tree expr,
> > tsubst_flags_t complain)
> >* a string literal (5.13.5),
> >* the result of a typeid expression (8.2.8), or
> >* a predefined __func__ variable (11.4.1).  */
> > -   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
> > +   else if (VAR_P (decl) && !DECL_NTTP_OBJECT_P (decl)
> > +&& DECL_ARTIFICIAL (decl))
> 
> If now some artificial variables are OK and others are not, perhaps we should
> enumerate them either way and abort if it's one we haven't specifically
> considered.

Sounds good, like so?  Shall we backport this patch or the original
patch to the 13 branch?

-- >8 --

Subject: [PATCH] c++: address of class NTTP object as targ [PR113242]

invalid_tparm_referent_p was rejecting using the address of a class NTTP
object as a template argument, but this should be fine.

This patch fixes this by refining the DECL_ARTIFICIAL rejection test to
check specifically for the kinds of artificial variables we want to
exclude.

PR c++/113242

gcc/cp/ChangeLog:

* pt.cc (invalid_tparm_referent_p) : Refine
DECL_ARTIFICIAL rejection test.  Assert that C++20 template
parameter objects are the only artificial variables we accept.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class61.C: New test.
---
 gcc/cp/pt.cc | 13 +++---
 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
 2 files changed, 37 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b6117231de1..885c297450e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7212,12 +7212,14 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
/* C++17: For a non-type template-parameter of reference or pointer
   type, the value of the constant expression shall not refer to (or
   for a pointer type, shall not be the address of):
-  * a subobject (4.5),
+  * a subobject (4.5), (relaxed in C++20)
   * a temporary object (15.2),
-  * a string literal (5.13.5),
+  * a string literal (5.13.5), (we diagnose this early in
+convert_nontype_argument)
   * the result of a typeid expression (8.2.8), or
   * a predefined __func__ variable (11.4.1).  */
-   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+   else if (VAR_P (decl)
+&& (DECL_TINFO_P (decl) || DECL_FNAME_P (decl)))
  {
if (complain & tf_error)
  error ("the address of %qD is not a valid template argument",
@@ -7242,6 +7244,11 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
 decl);
return true;
  }
+
+   /* The only artificial variables we do accept are C++20
+  template parameter objects.   */
+   if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+ gcc_checking_assert (DECL_NTTP_OBJECT_P (decl));
   }
   break;
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
new file mode 100644
index 000..90805a05ecf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
@@ -0,0 +1,27 @@
+// PR c++/113242
+// { dg-do compile { target c++20 } }
+
+struct wrapper {
+  int n;
+};
+
+template
+void f1() {
+  static_assert(X.n == 42);
+}
+
+template
+void f2() {
+  static_assert(X->n == 42);
+}
+
+template
+void g() {
+  f1();
+  f2<&X>();
+}
+
+int main() {
+  constexpr wrapper X = {42};
+  g();
+}
-- 
2.43.0.367.g186b115d30



Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Maxim Kuvyrkov
> On Jan 17, 2024, at 19:05, Maxim Kuvyrkov  wrote:
> 
>> On Jan 17, 2024, at 19:02, Richard Biener  wrote:
>> 
>> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>>  wrote:
>>> 
 On Jan 17, 2024, at 10:51, Richard Biener  
 wrote:
 
 On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
> 
> 
> 
> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>> Hi Vladimir,
>> Hi Jeff,
>> 
>> Richard and Alexander have reviewed this patch and [I assume] have no
>> further comments.  OK to merge?
> I think the question is whether or not we're too late.  I know that
> Richard S has held off on his late-combine pass and I'm holding off on
> the ext-dce work due to the fact that we're well past stage1 close.
> 
> I think the release managers ought to have the final say on this.
 
 I'm fine with this now, it doesn't change code generation.
>>> 
>>> Thanks, Richard.
>>> 
>>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
>>> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
>>> stage1 opens.
>> 
>> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>> 
>> gcc/fortran/f95-lang.o differs
>> 
>> does n_mem_deps or n_inc_deps include debug insns?
> 
> Thanks, investigating.

Hi Richard,

Yes, both n_mem_deps and n_inc_deps include debug insns.  I posted a patch for
this in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643267.html and
am testing it now.

If you prefer, I can revert the fix for PR96388 and PR111554.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org




Re: [PATCH v3] c++/modules: Fix handling of extern templates in modules [PR112820]

2024-01-17 Thread Jason Merrill

On 1/17/24 01:33, Nathaniel Shead wrote:

On Mon, Jan 15, 2024 at 06:10:55PM -0500, Jason Merrill wrote:

Under what circumstances does it make sense for CLASSTYPE_INTERFACE_ONLY to
be set in the context of modules, anyway?  We probably want to propagate it
for things in the global module so that various libstdc++ explicit
instantiations work the same with import std.

For a class imported from a named module, this ties into the earlier 
discussion about vtables and inlines that hasn't been resolved yet in the ABI 
committee.  But it's certainly significantly interface-like.  And I would
expect maybe_suppress_debug_info to suppress the debug info for such a class
on the assumption that the module unit has the needed debug info.

Jason



Here's another approach for this patch. This still only fixes the
specific issues in the PR; I think vtable handling etc. should wait till
stage 1 because it involves a lot of messing around in decl2.cc.

As mentioned in the commit message, after thinking more about it I don't
think we (in general) want to propagate CLASSTYPE_INTERFACE_ONLY, even
for declarations in the GMF. This makes sense to me because typically it
can only be accurately determined at the end of the TU, which we haven't
yet arrived at after importing. For instance, take a polymorphic class in
the GMF without a key method, which we import from a module and then
proceed to define the key method for later on in this TU.
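
A minimal sketch of that scenario, with hypothetical header, module and
class names (not taken from the testsuite):

// base.h -- hypothetical header, textually included into the GMF
struct Base
{
  virtual void f ();          // declared but not defined here: no key method yet
  virtual ~Base () = default;
};

// m.cppm -- hypothetical module interface unit
module;
#include "base.h"             // Base lands in the global module fragment
export module m;
export Base *make_base ();

// user.cc -- a TU that includes base.h and imports m
#include "base.h"
import m;
// Only here, later in this TU, does the key method get defined, so the
// interface-ness of Base could not have been decided at import time.
void Base::f () {}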


That sounds right for a module implementation unit or the GMF.


Bootstrapped and partially regtested on x86_64-pc-linux-gnu (so far only
modules.exp): OK for trunk if full regtesting passes?


Please add a reference to ABI issue 170 
(https://github.com/itanium-cxx-abi/cxx-abi/issues/170).  OK with that 
change if Nathan doesn't have any further comments this week.



-- >8 --

Currently, extern templates are detected by looking for the
DECL_EXTERNAL flag on a TYPE_DECL. However, this is incorrect:
TYPE_DECLs don't actually set this flag, and it happens to work by
coincidence due to TYPE_DECL_SUPPRESS_DEBUG happening to use the same
underlying bit. This however causes issues with other TYPE_DECLs that
also happen to have suppressed debug information.

Instead, this patch reworks the logic so CLASSTYPE_INTERFACE_ONLY is
always emitted into the module BMI and can then be used to check for an
extern template correctly.

Otherwise, for other declarations we want to redetermine this: even for
declarations from the GMF, we may change our mind on whether to import
or export depending on decisions made later in the TU after importing, so
we shouldn't decide this now, or necessarily reuse what the module we'd
imported had decided.

PR c++/112820
PR c++/102607

gcc/cp/ChangeLog:

* module.cc (trees_out::lang_type_bools): Write interface_only
and interface_unknown.
(trees_in::lang_type_bools): Read the above flags.
(trees_in::decl_value): Reset CLASSTYPE_INTERFACE_* except for
extern templates.
(trees_in::read_class_def): Remove buggy extern template
handling.

gcc/testsuite/ChangeLog:

* g++.dg/modules/debug-2_a.C: New test.
* g++.dg/modules/debug-2_b.C: New test.
* g++.dg/modules/debug-2_c.C: New test.
* g++.dg/modules/debug-3_a.C: New test.
* g++.dg/modules/debug-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc | 36 +---
  gcc/testsuite/g++.dg/modules/debug-2_a.C |  9 ++
  gcc/testsuite/g++.dg/modules/debug-2_b.C |  8 ++
  gcc/testsuite/g++.dg/modules/debug-2_c.C |  9 ++
  gcc/testsuite/g++.dg/modules/debug-3_a.C |  8 ++
  gcc/testsuite/g++.dg/modules/debug-3_b.C |  9 ++
  6 files changed, 63 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 350ad15dc62..efc1d532a6e 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5806,10 +5806,8 @@ trees_out::lang_type_bools (tree t)
  
WB ((lang->gets_delete >> 0) & 1);

WB ((lang->gets_delete >> 1) & 1);
-  // Interfaceness is recalculated upon reading.  May have to revisit?
-  // How do dllexport and dllimport interact across a module?
-  // lang->interface_only
-  // lang->interface_unknown
+  WB (lang->interface_only);
+  WB (lang->interface_unknown);
WB (lang->contains_empty_class_p);
WB (lang->anon_aggr);
WB (lang->non_zero_init);
@@ -5877,9 +5875,8 @@ trees_in::lang_type_bools (tree t)
v = b () << 0;
v |= b () << 1;
lang->gets_delete = v;
-  // lang->interface_only
-  // lang->interface_unknown
-  lang->interface_unknown = true; // Redetermine interface
+  RB (lang->interface_only);
+  RB 

Re: Add -falign-all-functions

2024-01-17 Thread Jan Hubicka
> On Wed, 17 Jan 2024, Jan Hubicka wrote:
> 
> > > 
> > > I meant the new option might be named -fmin-function-alignment=
> > > rather than -falign-all-functions because of how it should
> > > override all other options.
> > 
> > I was also pondering about both names.  -falign-all-functions has the
> > advantage that it is similar to all the other alignment flags that are
> > all called -falign-XXX
> > 
> > but both options are finte for me.
> > > 
> > > Otherwise is there an updated patch to look at?
> > 
> > I will prepare one.  So shall I drop the max-skip support for alignment
> > and rename the flag?
> 
> Yes.
OK, here is the updated version.
Bootstrapped/regtested on x86_64-linux, OK?

gcc/ChangeLog:

* common.opt (flimit-function-alignment): Reorder so file is
alphabetically ordered.
(fmin-function-alignment=): New flag.
* doc/invoke.texi (-fmin-function-alignment): Document.
(-falign-jumps,-falign-labels): Document that this is an optimization
bypassed in cold code.
* varasm.cc (assemble_start_function): Honor -fmin-function-alignment.
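For illustration, the new flag would be used like this (hypothetical invocation):

  gcc -O2 -falign-functions=32 -fmin-function-alignment=16 foo.c

i.e. hot functions still get the -falign-functions alignment, while every
function, including cold ones, is guaranteed at least 16-byte alignment.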

diff --git a/gcc/common.opt b/gcc/common.opt
index 5f0a101bccb..6e85853f086 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1040,9 +1040,6 @@ Align the start of functions.
 falign-functions=
 Common RejectNegative Joined Var(str_align_functions) Optimization
 
-flimit-function-alignment
-Common Var(flag_limit_function_alignment) Optimization Init(0)
-
 falign-jumps
 Common Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
@@ -2277,6 +2274,10 @@ fmessage-length=
 Common RejectNegative Joined UInteger
 -fmessage-length=  Limit diagnostics to  characters per 
line.  0 suppresses line-wrapping.
 
+fmin-function-alignment=
+Common Joined RejectNegative UInteger Var(flag_min_function_alignment) 
Optimization
+Align the start of every function.
+
 fmodulo-sched
 Common Var(flag_modulo_sched) Optimization
 Perform SMS based modulo scheduling before the first scheduling pass.
@@ -2601,6 +2602,9 @@ starts and when the destructor finishes.
 flifetime-dse=
 Common Joined RejectNegative UInteger Var(flag_lifetime_dse) Optimization 
IntegerRange(0, 2)
 
+flimit-function-alignment
+Common Var(flag_limit_function_alignment) Optimization Init(0)
+
 flive-patching
 Common RejectNegative Alias(flive-patching=,inline-clone) Optimization
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43fd3c3a3cd..456374d9446 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -546,6 +546,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
 -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
 -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
+-fmin-function-alignment=[@var{n}]
 -fno-allocation-dce -fallow-store-data-races
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}]
 -fauto-inc-dec  -fbranch-probabilities
@@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two 
greater than or
 equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
 least the first @var{m} bytes of the function can be fetched by the CPU
 without crossing an @var{n}-byte alignment boundary.
+This is an optimization of code performance and alignment is ignored for
+functions considered cold.  If alignment is required for all functions,
+use @option{-fmin-function-alignment}.
 
 If @var{m} is not specified, it defaults to @var{n}.
 
@@ -14240,6 +14244,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
 Align loops to a power-of-two boundary.  If the loops are executed
 many times, this makes up for any execution of the dummy padding
 instructions.
+This is an optimization of code performance and alignment is ignored for
+loops considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14262,6 +14268,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
 Align branch targets to a power-of-two boundary, for branch targets
 where the targets can only be reached by jumping.  In this case,
 no dummy operations need be executed.
+This is an optimization of code performance and alignment is ignored for
+jumps considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14275,6 +14283,14 @@ The maximum allowed @var{n} option value is 65536.
 
 Enabled at levels @option{-O2}, @option{-O3}.
 
+@opindex fmin-function-alignment=@var{n}
+@item -fmin-function-alignment
+Specify minimal alignment of functions to the next power-of-two greater than or
+equal to @var{n}. Unlike @option{-falign-functions} this alignment is applied
+also to all functions (even those considered cold).  The alignment is also not
+affected by @option{-flimit-function-alignment}
+
+
 @opindex fno-allocation-dce
 @item -fno-allocation-dce
 Do not remove unused C++ allocations in dead code elimination.
@@ -14371,7 +14387,7 @@ To use the link-time

Re: [PATCH] c++/modules: Prevent overwriting arguments for duplicates [PR112588]

2024-01-17 Thread Jason Merrill

On 1/8/24 12:04, Patrick Palka wrote:

On Mon, 8 Jan 2024, Nathaniel Shead wrote:


On Sat, Jan 06, 2024 at 05:32:37PM -0500, Nathan Sidwell wrote:

I'm not sure about this; there was clearly a reason I did it the way it is,
but perhaps that reasoning became obsolete -- something about an existing
declaration and reading in a definition, maybe?


So I took a bit of a closer look, and this is actually a regression,
seemingly starting with r13-3134-g09df0d8b14dda6. I haven't looked more
closely at the actual change yet, though, to see whether it implies a
different fix.


Interesting..  FWIW I applied your patch to the gcc 12 release branch,
which doesn't have r13-3134, and there were no modules testsuite
regressions there either, which at least suggests that this maybe_dup
logic isn't directly related to the optimization that r13-3134 removed.

Your patch also seems to fix PR99244 (which AFAICT is not a regression)


It seems to me we always want the DECL_ARGUMENTS corresponding to the 
actual definition we're using, which since "installing" is true, is the 
new definition.  In duplicate_decls when we merge a new definition into 
an old declaration, we give the old declaration the new DECL_ARGUMENTS.


The patch is OK.


On 11/22/23 06:33, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

When merging duplicate instantiations of function templates, currently
read_function_def overwrites the arguments with that of the existing
duplicate. This is problematic, however, since this means that the
PARM_DECLs in the body of the function definition no longer match with
the PARM_DECLs in the argument list, which causes issues when it comes
to generating RTL.

There doesn't seem to be any reason to do this replacement, so this
patch removes that logic.

PR c++/112588

gcc/cp/ChangeLog:

* module.cc (trees_in::read_function_def): Don't overwrite
arguments.

gcc/testsuite/ChangeLog:

* g++.dg/modules/merge-16.h: New test.
* g++.dg/modules/merge-16_a.C: New test.
* g++.dg/modules/merge-16_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/module.cc  |  2 --
   gcc/testsuite/g++.dg/modules/merge-16.h   | 10 ++
   gcc/testsuite/g++.dg/modules/merge-16_a.C |  7 +++
   gcc/testsuite/g++.dg/modules/merge-16_b.C |  5 +
   4 files changed, 22 insertions(+), 2 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16.h
   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_a.C
   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4f5b6e2747a..2520ab659cc 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11665,8 +11665,6 @@ trees_in::read_function_def (tree decl, tree 
maybe_template)
 DECL_RESULT (decl) = result;
 DECL_INITIAL (decl) = initial;
 DECL_SAVED_TREE (decl) = saved;
-  if (maybe_dup)
-   DECL_ARGUMENTS (decl) = DECL_ARGUMENTS (maybe_dup);
 if (context)
SET_DECL_FRIEND_CONTEXT (decl, context);
diff --git a/gcc/testsuite/g++.dg/modules/merge-16.h 
b/gcc/testsuite/g++.dg/modules/merge-16.h
new file mode 100644
index 000..fdb38551103
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/merge-16.h
@@ -0,0 +1,10 @@
+// PR c++/112588
+
+void f(int*);
+
+template 
+struct S {
+  void g(int n) { f(&n); }
+};
+
+template struct S;


If we use a partial specialization here instead (which would have disabled
the removed optimization, demonstrating how fragile/inconsistent it was)

   void f(int*);

   template 
   struct S { };

   template
   struct S {
 void g(int n) { f(&n); }
   };

   template struct S;

then the ICE appears earlier, since GCC 12 instead of 13.


diff --git a/gcc/testsuite/g++.dg/modules/merge-16_a.C 
b/gcc/testsuite/g++.dg/modules/merge-16_a.C
new file mode 100644
index 000..c243224c875
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/merge-16_a.C
@@ -0,0 +1,7 @@
+// PR c++/112588
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi merge16 }
+
+module;
+#include "merge-16.h"
+export module merge16;
diff --git a/gcc/testsuite/g++.dg/modules/merge-16_b.C 
b/gcc/testsuite/g++.dg/modules/merge-16_b.C
new file mode 100644
index 000..8c7b1f0511f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/merge-16_b.C
@@ -0,0 +1,5 @@
+// PR c++/112588
+// { dg-additional-options "-fmodules-ts" }
+
+#include "merge-16.h"
+import merge16;


--
Nathan Sidwell










Re: [PATCH] c++: address of NTTP object as targ [PR113242]

2024-01-17 Thread Jason Merrill

On 1/17/24 10:43, Patrick Palka wrote:

On Mon, 15 Jan 2024, Jason Merrill wrote:

On 1/5/24 11:50, Patrick Palka wrote:


invalid_tparm_referent_p was rejecting using the address of a class NTTP
object as a template argument, but this should be fine.


Hmm, I suppose so; https://eel.is/c++draft/temp#param-8 saying "No two
template parameter objects are template-argument-equivalent" suggests there
can be only one.  And clang/msvc allow it.


+   else if (VAR_P (decl) && !DECL_NTTP_OBJECT_P (decl)
+&& DECL_ARTIFICIAL (decl))


If now some artificial variables are OK and others are not, perhaps we should
enumerate them either way and abort if it's one we haven't specifically
considered.


Sounds good, like so?  Shall we backport this patch or the original
patch to the 13 branch?


Hmm, looks like this patch changes the non-checking default behavior 
from reject to accept; maybe just add a checking_assert (tinfo || fname) 
to your original patch?  OK with that change, for trunk and 13.



-- >8 --

Subject: [PATCH] c++: address of class NTTP object as targ [PR113242]

invalid_tparm_referent_p was rejecting using the address of a class NTTP
object as a template argument, but this should be fine.

This patch fixes this by refining the DECL_ARTIFICIAL rejection test to
check specifically for the kinds of artificial variables we want to
exclude.

PR c++/113242

gcc/cp/ChangeLog:

* pt.cc (invalid_tparm_referent_p) : Refine
DECL_ARTIFICIAL rejection test.  Assert that C++20 template
parameter objects are the only artificial variables we accept.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class61.C: New test.
---
  gcc/cp/pt.cc | 13 +++---
  gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
  2 files changed, 37 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b6117231de1..885c297450e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7212,12 +7212,14 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
/* C++17: For a non-type template-parameter of reference or pointer
   type, the value of the constant expression shall not refer to (or
   for a pointer type, shall not be the address of):
-  * a subobject (4.5),
+  * a subobject (4.5), (relaxed in C++20)
   * a temporary object (15.2),
-  * a string literal (5.13.5),
+  * a string literal (5.13.5), (we diagnose this early in
+convert_nontype_argument)
   * the result of a typeid expression (8.2.8), or
   * a predefined __func__ variable (11.4.1).  */
-   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+   else if (VAR_P (decl)
+&& (DECL_TINFO_P (decl) || DECL_FNAME_P (decl)))
  {
if (complain & tf_error)
  error ("the address of %qD is not a valid template argument",
@@ -7242,6 +7244,11 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
 decl);
return true;
  }
+
+   /* The only artificial variables we do accept are C++20
+  template parameter objects.   */
+   if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+ gcc_checking_assert (DECL_NTTP_OBJECT_P (decl));
}
break;
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C

new file mode 100644
index 000..90805a05ecf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
@@ -0,0 +1,27 @@
+// PR c++/113242
+// { dg-do compile { target c++20 } }
+
+struct wrapper {
+  int n;
+};
+
+template
+void f1() {
+  static_assert(X.n == 42);
+}
+
+template
+void f2() {
+  static_assert(X->n == 42);
+}
+
+template
+void g() {
+  f1();
+  f2<&X>();
+}
+
+int main() {
+  constexpr wrapper X = {42};
+  g();
+}




Re: [PATCH v1] Fix compare-debug bootstrap failure

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 03:40:20PM +, Maxim Kuvyrkov wrote:
> ... caused by scheduler fix for PR96388 and PR111554.
> 
> This patch adjusts decision sched-deps.cc:find_inc() to use
> length of dependency lists sans any DEBUG_INSN instructions.
> 
> gcc/ChangeLog:
> 

Please mention
PR bootstrap/113445
here

>   * haifa-sched.cc (dep_list_size): Make global.
>   * sched-deps.cc (find_inc): Use instead of sd_lists_size().
>   * sched-int.h (dep_list_size): Declare.

and include some testcase from the PR into the testsuite.
Otherwise LGTM.

Jakub



RE: [PATCH v2 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2024-01-17 Thread Kyrylo Tkachov
Hi Andre,

> -Original Message-
> From: Andre Vieira 
> Sent: Friday, January 5, 2024 5:52 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Stam Markianos-Wright
> 
> Subject: [PATCH v2 2/2] arm: Add support for MVE Tail-Predicated Low Overhead
> Loops
> 
> Respin after comments on first version.

I think I'm nitpicking some code style and implementation points rather than
diving deep into the algorithms; I think those were okay the last time I looked
at this, some time ago.

+/* Return true if INSN is a MVE instruction that is VPT-predicable, but in
+   its unpredicated form, or if it is predicated, but on a predicate other
+   than VPR_REG.  */
+
+static bool
+arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate (rtx_insn *insn,
+ rtx vpr_reg)
+{
+  rtx insn_vpr_reg_operand;
+  if (MVE_VPT_UNPREDICATED_INSN_P (insn)
+  || (MVE_VPT_PREDICATED_INSN_P (insn)
+ && (insn_vpr_reg_operand = arm_get_required_vpr_reg_param (insn))
+ && !rtx_equal_p (vpr_reg, insn_vpr_reg_operand)))
+return true;
+  else
+return false;
+}
+
+/* Return true if INSN is a MVE instruction that is VPT-predicable and is
+   predicated on VPR_REG.  */
+
+static bool
+arm_mve_vec_insn_is_predicated_with_this_predicate (rtx_insn *insn,
+   rtx vpr_reg)
+{
+  rtx insn_vpr_reg_operand;
+  if (MVE_VPT_PREDICATED_INSN_P (insn)
+  && (insn_vpr_reg_operand = arm_get_required_vpr_reg_param (insn))
+  && rtx_equal_p (vpr_reg, insn_vpr_reg_operand))
+return true;
+  else
+return false;
+}

These two functions seem to have an "if (condition) return true; else return 
false;" structure that we try to avoid. How about:

  rtx insn_vpr_reg_operand = MVE_VPT_PREDICATED_INSN_P (insn)
    ? arm_get_required_vpr_reg_param (insn) : NULL_RTX;
  return insn_vpr_reg_operand && rtx_equal_p (vpr_reg, insn_vpr_reg_operand);
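The first helper could then be shaped the same way, e.g. (untested sketch,
keeping the original semantics):

  static bool
  arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate (rtx_insn *insn,
                                                             rtx vpr_reg)
  {
    if (MVE_VPT_UNPREDICATED_INSN_P (insn))
      return true;
    /* Predicated: fetch the VPR operand, if any, and compare.  */
    rtx insn_vpr_reg_operand = MVE_VPT_PREDICATED_INSN_P (insn)
      ? arm_get_required_vpr_reg_param (insn) : NULL_RTX;
    return insn_vpr_reg_operand
           && !rtx_equal_p (vpr_reg, insn_vpr_reg_operand);
  }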


+static bool
+arm_is_mve_across_vector_insn (rtx_insn* insn)
+{
+  df_ref insn_defs = NULL;
+  if (!MVE_VPT_PREDICABLE_INSN_P (insn))
+return false;
+
+  bool is_across_vector = false;
+  FOR_EACH_INSN_DEF (insn_defs, insn)
+if (!VALID_MVE_MODE (GET_MODE (DF_REF_REG (insn_defs)))
+   && !arm_get_required_vpr_reg_ret_val (insn))
+  is_across_vector = true;
+

You can just return true here immediately, no need to set is_across_vector

+  return is_across_vector;

... and you can return false here, avoiding the need for is_across_vector 
entirely
+}
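i.e. roughly (untested):

  static bool
  arm_is_mve_across_vector_insn (rtx_insn* insn)
  {
    df_ref insn_defs = NULL;
    if (!MVE_VPT_PREDICABLE_INSN_P (insn))
      return false;

    FOR_EACH_INSN_DEF (insn_defs, insn)
      if (!VALID_MVE_MODE (GET_MODE (DF_REF_REG (insn_defs)))
          && !arm_get_required_vpr_reg_ret_val (insn))
        return true;

    return false;
  }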

+static bool
+arm_mve_check_reg_origin_is_num_elems (basic_block body, rtx reg, rtx 
vctp_step)
+{
+  /* Ok, we now know the loop starts from zero and increments by one.
+ Now just show that the max value of the counter came from an
+ appropriate ASHIFRT expr of the correct amount.  */
+  basic_block pre_loop_bb = body->prev_bb;
+  while (pre_loop_bb && BB_END (pre_loop_bb)
+&& !df_bb_regno_only_def_find (pre_loop_bb, REGNO (reg)))
+pre_loop_bb = pre_loop_bb->prev_bb;
+
+  df_ref counter_max_last_def = df_bb_regno_only_def_find (pre_loop_bb, REGNO 
(reg));
+  if (!counter_max_last_def)
+return false;
+  rtx counter_max_last_set = single_set (DF_REF_INSN (counter_max_last_def));
+  if (!counter_max_last_set)
+return false;
+
+  /* If we encounter a simple SET from a REG, follow it through.  */
+  if (REG_P (SET_SRC (counter_max_last_set)))
+return arm_mve_check_reg_origin_is_num_elems
+(pre_loop_bb->next_bb, SET_SRC (counter_max_last_set), vctp_step);
+
+  /* If we encounter a SET from an IF_THEN_ELSE where one of the operands is a
+ constant and the other is a REG, follow through to that REG.  */
+  if (GET_CODE (SET_SRC (counter_max_last_set)) == IF_THEN_ELSE
+  && REG_P (XEXP (SET_SRC (counter_max_last_set), 1))
+  && CONST_INT_P (XEXP (SET_SRC (counter_max_last_set), 2)))
+return arm_mve_check_reg_origin_is_num_elems
+(pre_loop_bb->next_bb, XEXP (SET_SRC (counter_max_last_set), 1), 
vctp_step);
+
+  if (GET_CODE (SET_SRC (counter_max_last_set)) == ASHIFTRT
+  && CONST_INT_P (XEXP (SET_SRC (counter_max_last_set), 1))
+  && ((1 << INTVAL (XEXP (SET_SRC (counter_max_last_set), 1)))
+  == abs (INTVAL (vctp_step

I'm a bit concerned here with using abs() for HOST_WIDE_INT values that are 
compared to other HOST_WIDE_INT values.
abs () will implicitly cast the argument and return an int. We should use the 
abs_hwi function defined in hwint.h. It may not cause problems in practice 
given the ranges involved, but better safe than sorry at this stage.
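I.e. something like (assuming vctp_step is known to be a CONST_INT at that
point):

  && ((HOST_WIDE_INT_1 << INTVAL (XEXP (SET_SRC (counter_max_last_set), 1)))
      == abs_hwi (INTVAL (vctp_step)))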

Looks decent to me otherwise, and an impressive piece of work, thanks.
I'd give Richard an opportunity to comment next week when he's back before 
committing though.
Thanks,
Kyrill


[PATCH] Avoid ICE on m68k -fzero-call-used-regs -fpic [PR110934]

2024-01-17 Thread Mikael Pettersson
PR110934 is a problem on m68k where -fzero-call-used-regs -fpic ICEs
when clearing an FP register.

The generic code generates an XFmode move of zero to that register,
which becomes an XFmode load from initialized data, which due to -fpic
uses a non-constant address, which the backend rejects.  The
zero-call-used-regs pass runs very late, after register allocation and
frame layout, and at that point we can't allow new uses of the PIC
register or new pseudos.

To clear an FP register on m68k it's enough to do the move in SFmode,
but the generic code can't be told to do that, so this patch updates
m68k to use its own TARGET_ZERO_CALL_USED_REGS.

Bootstrapped and regression tested on m68k-linux-gnu.

Ok for master? (I don't have commit rights.)

gcc/

PR target/110934
* config/m68k/m68k.cc (m68k_zero_call_used_regs): New function.
(TARGET_ZERO_CALL_USED_REGS): Define.

gcc/testsuite/

PR target/110934
* gcc.target/m68k/pr110934.c: New test.
---
 gcc/config/m68k/m68k.cc  | 46 
 gcc/testsuite/gcc.target/m68k/pr110934.c |  9 +
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/m68k/pr110934.c

diff --git a/gcc/config/m68k/m68k.cc b/gcc/config/m68k/m68k.cc
index e9325686b92..72a29d772ea 100644
--- a/gcc/config/m68k/m68k.cc
+++ b/gcc/config/m68k/m68k.cc
@@ -197,6 +197,7 @@ static bool m68k_modes_tieable_p (machine_mode, 
machine_mode);
 static machine_mode m68k_promote_function_mode (const_tree, machine_mode,
int *, const_tree, int);
 static void m68k_asm_final_postscan_insn (FILE *, rtx_insn *insn, rtx [], int);
+static HARD_REG_SET m68k_zero_call_used_regs (HARD_REG_SET 
need_zeroed_hardregs);
 
 /* Initialize the GCC target structure.  */
 
@@ -361,6 +362,9 @@ static void m68k_asm_final_postscan_insn (FILE *, rtx_insn 
*insn, rtx [], int);
 #undef TARGET_ASM_FINAL_POSTSCAN_INSN
 #define TARGET_ASM_FINAL_POSTSCAN_INSN m68k_asm_final_postscan_insn
 
+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS m68k_zero_call_used_regs
+
 TARGET_GNU_ATTRIBUTES (m68k_attribute_table,
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
@@ -7166,4 +7170,46 @@ m68k_promote_function_mode (const_tree type, 
machine_mode mode,
   return mode;
 }
 
+/* Implement TARGET_ZERO_CALL_USED_REGS.  */
+
+static HARD_REG_SET
+m68k_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  rtx zero_fpreg = NULL_RTX;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+  {
+   rtx reg, zero;
+
+   if (INT_REGNO_P (regno))
+ {
+   reg = regno_reg_rtx[regno];
+   zero = CONST0_RTX (SImode);
+ }
+   else if (FP_REGNO_P (regno))
+ {
+   reg = gen_raw_REG (SFmode, regno);
+   if (zero_fpreg == NULL_RTX)
+ {
+   /* On the 040/060 clearing an FP reg loads a large
+  immediate.  To reduce code size use the first
+  cleared FP reg to clear remaining ones.  Don't do
+  this on cores which use fmovecr.  */
+   zero = CONST0_RTX (SFmode);
+   if (TUNE_68040_60)
+ zero_fpreg = reg;
+ }
+   else
+ zero = zero_fpreg;
+ }
+   else
+ gcc_unreachable ();
+
+   emit_move_insn (reg, zero);
+  }
+
+  return need_zeroed_hardregs;
+}
+
 #include "gt-m68k.h"
diff --git a/gcc/testsuite/gcc.target/m68k/pr110934.c 
b/gcc/testsuite/gcc.target/m68k/pr110934.c
new file mode 100644
index 000..8c21d46f660
--- /dev/null
+++ b/gcc/testsuite/gcc.target/m68k/pr110934.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fzero-call-used-regs=used -fpic -O2" } */
+
+extern double clobber_fp0 (void);
+
+void foo (void)
+{
+  clobber_fp0 ();
+}
-- 
2.43.0



Re: Disable FMADD in chains for Zen4 and generic

2024-01-17 Thread Jan Hubicka
> Can we backport the patch(at least the generic part) to
> GCC11/GCC12/GCC13 release branch?

Yes, the periodic testers have picked up the change and, as far as I can tell,
there are no surprises.

Thanks,
Honza
> > > >
> > > >  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 
> > > > 512bit or
> > > > smaller FMA chain.  */
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> 
> 
> 
> -- 
> BR,
> Hongtao


Fix handling of X86_TUNE_AVOID_512FMA_CHAINS

2024-01-17 Thread Jan Hubicka
Hi,
I have noticed a quite bad pasto in the handling of X86_TUNE_AVOID_512FMA_CHAINS.
At the moment it is ignored, and X86_TUNE_AVOID_256FMA_CHAINS controls the
512-bit case too.  This patch fixes it; we may want to re-check how that works
on AVX512 machines.
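(For anyone who wants to experiment on such machines, the affected limit can
also be forced by hand -- assuming I have the param spelling right -- with
something like:

  gcc -O2 -mtune=znver4 --param=avoid-fma-max-bits=512 ...)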

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_option_override_internal): Fix
handling of X86_TUNE_AVOID_512FMA_CHAINS.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 3605c2c53fb..b6f634e9a32 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3248,7 +3248,7 @@ ix86_option_override_internal (bool main_args_p,
   = (cf_protection_level) (opts->x_flag_cf_protection | CF_SET);
 }
 
-  if (ix86_tune_features [X86_TUNE_AVOID_256FMA_CHAINS])
+  if (ix86_tune_features [X86_TUNE_AVOID_512FMA_CHAINS])
 SET_OPTION_IF_UNSET (opts, opts_set, param_avoid_fma_max_bits, 512);
   else if (ix86_tune_features [X86_TUNE_AVOID_256FMA_CHAINS])
 SET_OPTION_IF_UNSET (opts, opts_set, param_avoid_fma_max_bits, 256);


Remove accidental hack in ipa_polymorphic_call_context::set_by_invariant

2024-01-17 Thread Jan Hubicka
Hi,
I managed to commit a hack setting offset to 0 in
ipa_polymorphic_call_context::set_by_invariant.  This makes it give up on
multiple inheritance, but most likely won't give bad code since the other
base will be of a different type.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* ipa-polymorphic-call.cc 
(ipa_polymorphic_call_context::set_by_invariant): Remove
accidental hack reseting offset.

diff --git a/gcc/ipa-polymorphic-call.cc b/gcc/ipa-polymorphic-call.cc
index 8667059abee..81de6d7fc33 100644
--- a/gcc/ipa-polymorphic-call.cc
+++ b/gcc/ipa-polymorphic-call.cc
@@ -766,7 +766,6 @@ ipa_polymorphic_call_context::set_by_invariant (tree cst,
   tree base;
 
   invalid = false;
-  off = 0;
   clear_outer_type (otr_type);
 
   if (TREE_CODE (cst) != ADDR_EXPR)


[PATCH] sra: Disqualify bases of operands of asm gotos

2024-01-17 Thread Martin Jambor
Hi,

PR 110422 shows that SRA can ICE assuming there is a single edge
outgoing from a block terminated with an asm goto.  We need that for
BB-terminating statements so that any adjustments they make to the
aggregates can be copied over to their replacements.  Because we can't
have that after ASM gotos, we need to punt.

Bootstrapped and tested on x86_64-linux, OK for master?  It will need
some tweaking for release branches; is it in principle OK for them too
(after testing)?

Thanks,

Martin


gcc/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* tree-sra.cc (scan_function): Disqualify bases of operands of asm
gotos.

gcc/testsuite/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr110422.c | 10 +
 gcc/tree-sra.cc | 29 -
 2 files changed, 33 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110422.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110422.c 
b/gcc/testsuite/gcc.dg/torture/pr110422.c
new file mode 100644
index 000..2e171a7a19e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110422.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+struct T { int x; };
+int foo(void) {
+  struct T v;
+  asm goto("" : "+r"(v.x) : : : lab);
+  return 0;
+lab:
+  return -5;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 6a1141b7377..f8e71ec48b9 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1559,15 +1559,32 @@ scan_function (void)
case GIMPLE_ASM:
  {
gasm *asm_stmt = as_a <gasm *> (stmt);
-   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+   if (stmt_ends_bb_p (asm_stmt)
+   && !single_succ_p (gimple_bb (asm_stmt)))
  {
-   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
-   ret |= build_access_from_expr (t, asm_stmt, false);
+   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
+   disqualify_base_of_expr (t, "OP of asm goto.");
+ }
+   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
+   disqualify_base_of_expr (t, "OP of asm goto.");
+ }
  }
-   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+   else
  {
-   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
-   ret |= build_access_from_expr (t, asm_stmt, true);
+   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
+   ret |= build_access_from_expr (t, asm_stmt, false);
+ }
+   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
+   ret |= build_access_from_expr (t, asm_stmt, true);
+ }
  }
  }
  break;
-- 
2.43.0



Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread H.J. Lu
On Wed, Jan 17, 2024 at 7:02 AM Richard Biener
 wrote:
>
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>  wrote:
> >
> > > On Jan 17, 2024, at 10:51, Richard Biener  
> > > wrote:
> > >
> > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
> > >>
> > >>
> > >>
> > >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> > >>> Hi Vladimir,
> > >>> Hi Jeff,
> > >>>
> > >>> Richard and Alexander have reviewed this patch and [I assume] have no
> > >>> further comments.  OK to merge?
> > >> I think the question is whether or not we're too late.  I know that
> > >> Richard S has held off on his late-combine pass and I'm holding off on
> > >> the ext-dce work due to the fact that we're well past stage1 close.
> > >>
> > >> I think the release managers ought to have the final say on this.
> > >
> > > I'm fine with this now, it doesn't change code generation.
> >
> > Thanks, Richard.
> >
> > I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
> > cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
> > stage1 opens.
>
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>
> gcc/fortran/f95-lang.o differs
>
> does n_mem_deps or n_inc_deps include debug insns?
>
> Richard.

FWIW, I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113456

-- 
H.J.


RE: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Alex Coplan 
> Sent: Wednesday, January 17, 2024 12:59 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to
> allow all subreg [PR113221]
> 
> Hi Andrew,
> 
> On 16/01/2024 19:29, Andrew Pinski wrote:
> > So the problem here is that aarch64_ldp_reg_operand will all subreg even
> subreg of lo_sum.
> > When LRA tries to fix that up, all things break. So the fix is to
> > change the check to only allow reg and subreg of regs.
> 
> Thanks a lot for tracking this down, I really appreciate having some help with
> the bug-fixing.  Sorry for not getting to it sooner myself, I'm working on
> PR113089 which ended up taking longer than expected to fix.
> 
> >
> > Note the tendancy here is to use register_operand but that checks the
> > mode of the register but we need to allow a mismatch modes for this
> predicate for now.
> 
> Yeah, due to the design of the patterns using special predicates we need to
> allow a mode mismatch with the contextual mode.
> 
> The patch broadly LGTM (although I can't approve), but I've left a couple of
> minor comments below.
> 
> >
> > Built and tested for aarch64-linux-gnu with no regressions (Also
> > tested with the LD/ST pair pass back on).
> >
> > PR target/113221
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/predicates.md (aarch64_ldp_reg_operand): For
> subreg,
> > only allow REG operands isntead of allowing all.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/compile/pr113221-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/config/aarch64/predicates.md |  8 +++-
> >  gcc/testsuite/gcc.c-torture/compile/pr113221-1.c | 12 
> >  2 files changed, 19 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> >
> > diff --git a/gcc/config/aarch64/predicates.md
> > b/gcc/config/aarch64/predicates.md
> > index 8a204e48bb5..256268517d8 100644
> > --- a/gcc/config/aarch64/predicates.md
> > +++ b/gcc/config/aarch64/predicates.md
> > @@ -313,7 +313,13 @@ (define_predicate "pmode_plus_operator"
> >
> >  (define_special_predicate "aarch64_ldp_reg_operand"
> >(and
> > -(match_code "reg,subreg")
> > +(ior
> > +  (match_code "reg")
> > +  (and
> > +   (match_code "subreg")
> > +   (match_test "GET_CODE (SUBREG_REG (op)) == REG")
> 
> This could be just REG_P (SUBREG_REG (op)) in the match_test.
> 
> > +  )
> > +)
> 
> I think it would be more in keeping with the style in the rest of the file to 
> have
> the closing parens on the same line as the SUBREG_REG match_test.
> 
> >  (match_test "aarch64_ldpstp_operand_mode_p (GET_MODE (op))")
> >  (ior
> >(match_test "mode == VOIDmode") diff --git
> > a/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> > new file mode 100644
> > index 000..152a510786e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-options "-fno-move-loop-invariants -funroll-all-loops" } */
> 
> Does this need to be dg-additional-options?  Naively I would expect the dg-
> options clause to override the torture options (and potentially any options
> provided in RUNTESTFLAGS, e.g. to re-enable the ldp/stp pass).

I just checked my testsuite run and the answer for this is no, it does not need
to be dg-additional-options in this case.
dg-options does not override the torture options but rather puts them after
those.
As far as I understand it, dg-additional-options makes it easier to have
different options added per target, but in this case we don't need that.

Will update the patch with the rest of the changes and push it in the next few
hours.
I did notice an issue with the testcase though: I need to cast to __SIZE_TYPE__
instead of long to allow it to work with targets that are neither ILP32 nor
LP64.  I will fix that too.

Thanks,
Andrew Pinski

> 
> Thanks again for the patch, and apologies for the oversight on my part: I'd
> missed that register_operand also checks the code inside the subreg.
> 
> Alex
> 
> > +/* PR target/113221 */
> > +/* This used to ICE after the `load/store pair fusion pass` was added
> > +   due to the predicate aarch64_ldp_reg_operand allowing too much. */
> > +
> > +
> > +void bar();
> > +void foo(int* b) {
> > +  for (;;)
> > +*b++ = (long)bar;
> > +}
> > +
> > --
> > 2.39.3
> >


[COMITTED 2/2] RISC-V: fix some vsetvl debug info in pass's Phase 2 code [NFC]

2024-01-17 Thread Vineet Gupta
When staring at the VSETVL pass for PR/113429, I spotted some minor
improvements.

1. For readability, remove a redundant condition check in the Phase 2
   function earliest_fuse_vsetvl_info ().
2. Add the iteration count to debug prints in the same function.

gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (earliest_fuse_vsetvl_info):
Remove redundant checks in else condition for readability.
(earliest_fuse_vsetvl_info): Print iteration count in debug
prints.
(earliest_fuse_vsetvl_info): Fix misleading vsetvl info
dump details in certain cases.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-vsetvl.cc | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 78a2f7b38faf..41d4b80648f6 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2343,7 +2343,7 @@ public:
   void compute_lcm_local_properties ();
 
   void fuse_local_vsetvl_info ();
-  bool earliest_fuse_vsetvl_info ();
+  bool earliest_fuse_vsetvl_info (int iter);
   void pre_global_vsetvl_info ();
   void emit_vsetvl ();
   void cleaup ();
@@ -2961,7 +2961,7 @@ pre_vsetvl::fuse_local_vsetvl_info ()
 
 
 bool
-pre_vsetvl::earliest_fuse_vsetvl_info ()
+pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
 {
   compute_avl_def_data ();
   compute_vsetvl_def_data ();
@@ -2984,7 +2984,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  fprintf (dump_file, "\n  Compute LCM earliest insert data:\n\n");
+  fprintf (dump_file, "\n  Compute LCM earliest insert data (lift 
%d):\n\n",
+  iter);
   fprintf (dump_file, "Expression List (%u):\n", num_exprs);
   for (unsigned i = 0; i < num_exprs; i++)
{
@@ -3032,7 +3033,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  fprintf (dump_file, "Fused global info result:\n");
+  fprintf (dump_file, "Fused global info result (lift %d):\n", iter);
 }
 
   bool changed = false;
@@ -3142,8 +3143,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (src_block_info.has_info ())
src_block_info.probability += dest_block_info.probability;
}
- else if (src_block_info.has_info ()
-  && !m_dem.compatible_p (prev_info, curr_info))
+ else
{
  /* Cancel lift up if probabilities are equal.  */
  if (successors_probability_equal_p (eg->src))
@@ -3151,11 +3151,11 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file,
-  "  Change empty bb %u to from:",
+  "  Reset bb %u:",
   eg->src->index);
  prev_info.dump (dump_file, "");
  fprintf (dump_file,
-  "to (higher probability):");
+  "due to (same probability):");
  curr_info.dump (dump_file, "");
}
  src_block_info.set_empty_info ();
@@ -3170,7 +3170,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file,
-  "  Change empty bb %u to from:",
+  "  Change bb %u from:",
   eg->src->index);
  prev_info.dump (dump_file, "");
  fprintf (dump_file,
@@ -3627,7 +3627,7 @@ pass_vsetvl::lazy_vsetvl ()
 {
   if (dump_file)
fprintf (dump_file, "  Try lift up %d.\n\n", fused_count);
-  changed = pre.earliest_fuse_vsetvl_info ();
+  changed = pre.earliest_fuse_vsetvl_info (fused_count);
   fused_count += 1;
   } while (changed);
 
-- 
2.34.1



[COMITTED 1/2] RISC-V: RVV: add toggle to control vsetvl pass behavior

2024-01-17 Thread Vineet Gupta
RVV requires VSET?VL? instructions to dynamically configure VLEN at
runtime. There's a custom pass to do that: it has a simple mode, which
generates a VSETVL for each V insn, and a lazy/optimal mode, which
uses LCM dataflow to move VSETVLs around and identify/delete the
redundant ones.

Currently simple mode is the default for !optimize invocations, while
lazy mode is the default otherwise.

This patch allows simple mode to be forced via a toggle, independent of
the optimization level. A lot of gcc developers are currently doing this
in some form in their local setups, as issues are expected in the initial
phase of autovec development. It makes sense to provide this facility
upstream. It could potentially also be used by distro builders for any
quick workarounds for autovec bugs in the future.
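For example (hypothetical cross-compiler invocation):

  riscv64-unknown-linux-gnu-gcc -O2 -march=rv64gcv --param=vsetvl-strategy=simple test.c

forces a VSETVL per vector instruction even at -O2.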

gcc/ChangeLog:
* config/riscv/riscv.opt: New -param=vsetvl-strategy.
* config/riscv/riscv-opts.h: New enum vsetvl_strategy_enum.
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::pre_global_vsetvl_info): Use vsetvl_strategy.
(pass_vsetvl::execute): Use vsetvl_strategy.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-opts.h|  9 +
 gcc/config/riscv/riscv-vsetvl.cc |  2 +-
 gcc/config/riscv/riscv.opt   | 14 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff4406ab8eaf..ca57dddf1d9a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -116,6 +116,15 @@ enum stringop_strategy_enum {
   STRATEGY_AUTO = STRATEGY_SCALAR | STRATEGY_VECTOR
 };
 
+/* Behavior of VSETVL Pass.  */
+enum vsetvl_strategy_enum {
+  /* Simple: Insert a vsetvl* instruction for each Vector instruction.  */
+  VSETVL_SIMPLE = 1,
+  /* Optimized: Run LCM dataflow analysis to reduce vsetvl* insns and
+ delete any redundant ones generated in the process.  */
+  VSETVL_OPT = 2
+};
+
 #define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && 
TARGET_64BIT))
 
 /* Bit of riscv_zvl_flags will set contintuly, N-1 bit will set if N-bit is
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index df7ed149388a..78a2f7b38faf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3671,7 +3671,7 @@ pass_vsetvl::execute (function *)
   if (!has_vector_insn (cfun))
 return 0;
 
-  if (!optimize)
+  if (!optimize || vsetvl_strategy & VSETVL_SIMPLE)
 simple_vsetvl ();
   else
 lazy_vsetvl ();
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 44ed6d69da29..fd4f1a4df206 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -546,6 +546,20 @@ Target Undocumented Bool Var(riscv_vector_abi) Init(0)
 Enable the use of vector registers for function arguments and return value.
 This is an experimental switch and may be subject to change in the future.
 
+Enum
+Name(vsetvl_strategy) Type(enum vsetvl_strategy_enum)
+Valid arguments to -param=vsetvl-strategy=:
+
+EnumValue
+Enum(vsetvl_strategy) String(simple) Value(VSETVL_SIMPLE)
+
+EnumValue
+Enum(vsetvl_strategy) String(optim) Value(VSETVL_OPT)
+
+-param=vsetvl-strategy=
+Target Undocumented RejectNegative Joined Enum(vsetvl_strategy) 
Var(vsetvl_strategy) Init(VSETVL_OPT)
+-param=vsetvl-strategy=Set the optimization level of VSETVL 
insert pass.
+
 Enum
 Name(stringop_strategy) Type(enum stringop_strategy_enum)
 Valid arguments to -mstringop-strategy=:
-- 
2.34.1



Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-17 Thread François Dumont

Hi

Looks like a great finding to me; this is indeed a useless check, thanks!

Do you have any figures on the performance enhancement? It might help to
get proper approval, as gcc is currently in dev stage 4, that is to say,
only bug fixes normally.
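For instance, figures from a micro-benchmark along these lines (just a sketch
of what I mean, not something taken from the submission) would already help,
since erasing from the begin iterator is exactly the path touched:

  #include <chrono>
  #include <cstdio>
  #include <unordered_set>

  int main()
  {
    std::unordered_set<int> s;
    for (int i = 0; i < 1000000; ++i)
      s.insert(i);

    auto t0 = std::chrono::steady_clock::now();
    while (!s.empty())
      s.erase(s.begin());   // the erased element is always the begin element
    auto t1 = std::chrono::steady_clock::now();

    std::printf("erase loop: %lld ms\n", (long long)
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
  }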


François

On 17/01/2024 09:11, Huanghui Nie wrote:


Hi.

When I implemented a hash table with reference to the C++ STL, I found 
that when the hash table in the C++ STL deletes elements, if the first 
element deleted is the begin element, the before begin node is 
repeatedly assigned. This creates unnecessary performance overhead.



First, let’s see the code implementation:

In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned when 
&_M_before_begin == _M_buckets[__bkt]. That also means 
_M_buckets[__bkt]->_M_nxt is assigned under some conditions.


_M_remove_bucket_begin is called by _M_erase and _M_extract_node:

 1. Case _M_erase a range: _M_remove_bucket_begin is called in a for
loop when __is_bucket_begin is true. And if __is_bucket_begin is
true and &_M_before_begin == _M_buckets[__bkt], __prev_n must be
&_M_before_begin. __prev_n->_M_nxt is always assigned in _M_erase.
That means _M_before_begin._M_nxt is always assigned, if
_M_remove_bucket_begin is called and &_M_before_begin ==
_M_buckets[__bkt]. So there’s no need to assign
_M_before_begin._M_nxt in _M_remove_bucket_begin.
 2. Other cases: _M_remove_bucket_begin is called when __prev_n ==
_M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in
_M_erase and _M_before_begin. That means _M_buckets[__bkt]->_M_nxt
is always assigned. So there's no need to assign
_M_buckets[__bkt]->_M_nxt in _M_remove_bucket_begin.

In summary, there’s no need to check &_M_before_begin == 
_M_buckets[__bkt] and assign _M_before_begin._M_nxt in 
_M_remove_bucket_begin.



Then let’s see the responsibility of each method:

The hash table in the C++ STL is composed of hash buckets and a node 
list. The update of the node list is responsible for _M_erase and 
_M_extract_node method. _M_remove_bucket_begin method only needs to 
update the hash buckets. The update of _M_before_begin belongs to the 
update of the node list. So _M_remove_bucket_begin doesn’t need to 
update _M_before_begin.



Existing tests listed below cover this change:

23_containers/unordered_set/allocator/copy.cc

23_containers/unordered_set/allocator/copy_assign.cc

23_containers/unordered_set/allocator/move.cc

23_containers/unordered_set/allocator/move_assign.cc

23_containers/unordered_set/allocator/swap.cc

23_containers/unordered_set/erase/1.cc

23_containers/unordered_set/erase/24061-set.cc

23_containers/unordered_set/modifiers/extract.cc

23_containers/unordered_set/operations/count.cc

23_containers/unordered_set/requirements/exception/basic.cc

23_containers/unordered_map/allocator/copy.cc

23_containers/unordered_map/allocator/copy_assign.cc

23_containers/unordered_map/allocator/move.cc

23_containers/unordered_map/allocator/move_assign.cc

23_containers/unordered_map/allocator/swap.cc

23_containers/unordered_map/erase/1.cc

23_containers/unordered_map/erase/24061-map.cc

23_containers/unordered_map/modifiers/extract.cc

23_containers/unordered_map/modifiers/move_assign.cc

23_containers/unordered_map/operations/count.cc

23_containers/unordered_map/requirements/exception/basic.cc


Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?


---

ChangeLog:


libstdc++: hashtable: No need to update before begin node in 
_M_remove_bucket_begin



2024-01-16Huanghui Nie


gcc/

* libstdc++-v3/include/bits/hashtable.h


---


diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h


index b48610036fa..6056639e663 100644

--- a/libstdc++-v3/include/bits/hashtable.h

+++ b/libstdc++-v3/include/bits/hashtable.h

@@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

      if (!__next_n || __next_bkt != __bkt)

        {

          // Bucket is now empty

-         // First update next bucket if any

+         // Update next bucket if any

          if (__next_n)

            _M_buckets[__next_bkt] = _M_buckets[__bkt];

-         // Second update before begin node if necessary

-         if (&_M_before_begin == _M_buckets[__bkt])

-           _M_before_begin._M_nxt = __next_n;

          _M_buckets[__bkt] = nullptr;

        }

    }



[COMMITTEDv2] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Andrew Pinski
So the problem here is that aarch64_ldp_reg_operand would allow any subreg, even
a subreg of a lo_sum.
When LRA tries to fix that up, all things break. So the fix is to change the
check to only allow REGs and subregs of REGs.

Note the tendency here would be to use register_operand, but that checks the mode
of the register, whereas we need to allow mismatched modes for this predicate for
now.

Committed as approved.
Built and tested for aarch64-linux-gnu with no regressions
(Also tested with the LD/ST pair pass back on).

PR target/113221

gcc/ChangeLog:

* config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg,
only allow REG operands instead of allowing all.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr113221-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/predicates.md |  6 +-
 gcc/testsuite/gcc.c-torture/compile/pr113221-1.c | 12 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr113221-1.c

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 8a204e48bb5..b895f5dcb86 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -313,7 +313,11 @@ (define_predicate "pmode_plus_operator"
 
 (define_special_predicate "aarch64_ldp_reg_operand"
   (and
-(match_code "reg,subreg")
+(ior
+  (match_code "reg")
+  (and
+   (match_code "subreg")
+   (match_test "REG_P (SUBREG_REG (op))")))
 (match_test "aarch64_ldpstp_operand_mode_p (GET_MODE (op))")
 (ior
   (match_test "mode == VOIDmode")
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
new file mode 100644
index 000..942fa5eea88
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
@@ -0,0 +1,12 @@
+/* { dg-options "-fno-move-loop-invariants -funroll-all-loops" } */
+/* PR target/113221 */
+/* This used to ICE after the `load/store pair fusion pass` was added
+   due to the predicate aarch64_ldp_reg_operand allowing too much. */
+
+
+void bar();
+void foo(int* b) {
+  for (;;)
+*b++ = (__SIZE_TYPE__)bar;
+}
+
-- 
2.39.3



[COMMITTED] Clean up documentation for -Wstrict-flex-arrays [PR111659]

2024-01-17 Thread Sandra Loosemore
gcc/ChangeLog
PR middle-end/111659
* doc/extend.texi (Common Variable Attributes): Fix long lines
in documentation of strict_flex_array + other minor copy-editing.
Add a cross-reference to -Wstrict-flex-arrays.
* doc/invoke.texi (Option Summary): Fix whitespace in tables
before -fstrict-flex-arrays and -Wstrict-flex-arrays.
(C Dialect Options): Combine the docs for the two
-fstrict-flex-arrays forms into a single entry.  Note this option
is for C/C++ only.  Add a cross-reference to -Wstrict-flex-arrays.
(Warning Options): Note -Wstrict-flex-arrays is for C/C++ only.
Minor copy-editing.  Add cross references to the strict_flex_array
attribute and -fstrict-flex-arrays option.  Add note that this
option depends on -ftree-vrp.
---
 gcc/doc/extend.texi | 30 +++---
 gcc/doc/invoke.texi | 51 ++---
 2 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89e823629e3..91f0b669b9e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7790,18 +7790,24 @@ are treated as flexible array members. @var{level}=3 is 
the strictest level,
 only when the trailing array is declared as a flexible array member per C99
 standard onwards (@samp{[]}), it is treated as a flexible array member.
 
-There are two more levels in between 0 and 3, which are provided to support
-older codes that use GCC zero-length array extension (@samp{[0]}) or 
one-element
-array as flexible array members (@samp{[1]}):
-When @var{level} is 1, the trailing array is treated as a flexible array member
-when it is declared as either @samp{[]}, @samp{[0]}, or @samp{[1]};
-When @var{level} is 2, the trailing array is treated as a flexible array member
-when it is declared as either @samp{[]}, or @samp{[0]}.
-
-This attribute can be used with or without the @option{-fstrict-flex-arrays}.
-When both the attribute and the option present at the same time, the level of
-the strictness for the specific trailing array field is determined by the
-attribute.
+There are two more levels in between 0 and 3, which are provided to
+support older codes that use GCC zero-length array extension
+(@samp{[0]}) or one-element array as flexible array members
+(@samp{[1]}).  When @var{level} is 1, the trailing array is treated as
+a flexible array member when it is declared as either @samp{[]},
+@samp{[0]}, or @samp{[1]}; When @var{level} is 2, the trailing array
+is treated as a flexible array member when it is declared as either
+@samp{[]}, or @samp{[0]}.
+
+This attribute can be used with or without the
+@option{-fstrict-flex-arrays} command-line option.  When both the
+attribute and the option are present at the same time, the level of
+the strictness for the specific trailing array field is determined by
+the attribute.
+
+The @code{strict_flex_array} attribute interacts with the
+@option{-Wstrict-flex-arrays} option.  @xref{Warning Options}, for more
+information.
 
 @cindex @code{alloc_size} variable attribute
 @item alloc_size (@var{position})
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43fd3c3a3cd..a537be66736 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -207,7 +207,7 @@ in the following sections.
 -fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone@r{[}=@var{device-type}@r{]}
 -fpermitted-flt-eval-methods=@var{standard}
 -fplan9-extensions  -fsigned-bitfields  -funsigned-bitfields
--fsigned-char  -funsigned-char -fstrict-flex-arrays[=@var{n}]
+-fsigned-char  -funsigned-char  -fstrict-flex-arrays[=@var{n}]
 -fsso-struct=@var{endianness}}
 
 @item C++ Language Options
@@ -405,7 +405,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wstrict-aliasing=n  -Wstrict-overflow  -Wstrict-overflow=@var{n}
 -Wstring-compare
 -Wno-stringop-overflow -Wno-stringop-overread
--Wno-stringop-truncation -Wstrict-flex-arrays
+-Wno-stringop-truncation  -Wstrict-flex-arrays
 -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{|}malloc@r{]}
 -Wswitch  -Wno-switch-bool  -Wswitch-default  -Wswitch-enum
 -Wno-switch-outside-range  -Wno-switch-unreachable  -Wsync-nand
@@ -2945,22 +2945,22 @@ is always just like one of those two.
 
 @opindex fstrict-flex-arrays
 @opindex fno-strict-flex-arrays
-@item -fstrict-flex-arrays
-Control when to treat the trailing array of a structure as a flexible array
-member for the purpose of accessing the elements of such an array.
-The positive form is equivalent to @option{-fstrict-flex-arrays=3}, which is 
the
-strictest.  A trailing array is treated as a flexible array member only when it
-is declared as a flexible array member per C99 standard onwards.
-The negative form is equivalent to @option{-fstrict-flex-arrays=0}, which is 
the
-least strict.  All trailing arrays of structures are treated as flexible array
-members.
-
 @opindex fstrict-flex-arrays=@var{level}
-@item -fstrict-flex-arrays=

[COMMITTED] Re-alphabetize attribute tables in extend.texi.

2024-01-17 Thread Sandra Loosemore
These sections used to be alphabetized, but when I was working on the
fix for PR111659 I noticed documentation for some newer attributes had
been inserted at random places in the tables instead of maintaining
alphabetical order.  There's no change to content here, just moving
blocks of text around.

gcc/ChangeLog
* doc/extend.texi (Common Function Attributes): Re-alphabetize
the table.
(Common Variable Attributes): Likewise.
(Common Type Attributes): Likewise.
---
 gcc/doc/extend.texi | 857 ++--
 1 file changed, 430 insertions(+), 427 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 91f0b669b9e..d1893ad860c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3028,19 +3028,6 @@ types (@pxref{Variable Attributes}, @pxref{Type 
Attributes}.)
 The message attached to the attribute is affected by the setting of
 the @option{-fmessage-length} option.
 
-@cindex @code{unavailable} function attribute
-@item unavailable
-@itemx unavailable (@var{msg})
-The @code{unavailable} attribute results in an error if the function
-is used anywhere in the source file.  This is useful when identifying
-functions that have been removed from a particular variation of an
-interface.  Other than emitting an error rather than a warning, the
-@code{unavailable} attribute behaves in the same manner as
-@code{deprecated}.
-
-The @code{unavailable} attribute can also be used for variables and
-types (@pxref{Variable Attributes}, @pxref{Type Attributes}.)
-
 @cindex @code{error} function attribute
 @cindex @code{warning} function attribute
 @item error ("@var{message}")
@@ -3666,6 +3653,10 @@ This attribute locally overrides the 
@option{-fstack-limit-register}
 and @option{-fstack-limit-symbol} command-line options; it has the effect
 of disabling stack limit checking in the function it applies to.
 
+@cindex @code{no_stack_protector} function attribute
+@item no_stack_protector
+This attribute prevents stack protection code for the function.
+
 @cindex @code{noclone} function attribute
 @item noclone
 This function attribute prevents a function from being considered for
@@ -3761,63 +3752,6 @@ my_memcpy (void *dest, const void *src, size_t len)
 __attribute__((nonnull));
 @end smallexample
 
-@cindex @code{null_terminated_string_arg} function attribute
-@item null_terminated_string_arg
-@itemx null_terminated_string_arg (@var{N})
-The @code{null_terminated_string_arg} attribute may be applied to a
-function that takes a @code{char *} or @code{const char *} at
-referenced argument @var{N}.
-
-It indicates that the passed argument must be a C-style null-terminated
-string.  Specifically, the presence of the attribute implies that, if
-the pointer is non-null, the function may scan through the referenced
-buffer looking for the first zero byte.
-
-In particular, when the analyzer is enabled (via @option{-fanalyzer}),
-if the pointer is non-null, it will simulate scanning for the first
-zero byte in the referenced buffer, and potentially emit
-@option{-Wanalyzer-use-of-uninitialized-value}
-or @option{-Wanalyzer-out-of-bounds} on improperly terminated buffers.
-
-For example, given the following:
-
-@smallexample
-char *example_1 (const char *p)
-  __attribute__((null_terminated_string_arg (1)));
-@end smallexample
-
-the analyzer will check that any non-null pointers passed to the function
-are validly terminated.
-
-If the parameter must be non-null, it is appropriate to use both this
-attribute and the attribute @code{nonnull}, such as in:
-
-@smallexample
-extern char *example_2 (const char *p)
-  __attribute__((null_terminated_string_arg (1),
- nonnull (1)));
-@end smallexample
-
-See the @code{nonnull} attribute for more information and
-caveats.
-
-If the pointer argument is also referred to by an @code{access} attribute on 
the
-function with @var{access-mode} either @code{read_only} or @code{read_write}
-and the latter attribute has the optional @var{size-index} argument
-referring to a size argument, this expressses the maximum size of the access.
-For example, given:
-
-@smallexample
-extern char *example_fn (const char *p, size_t n)
-  __attribute__((null_terminated_string_arg (1),
- access (read_only, 1, 2),
- nonnull (1)));
-@end smallexample
-
-the analyzer will require the first parameter to be non-null, and either
-be validly null-terminated, or validly readable up to the size specified by
-the second parameter.
-
 @cindex @code{noplt} function attribute
 @item noplt
 The @code{noplt} attribute is the counterpart to option @option{-fno-plt}.
@@ -3896,6 +3830,63 @@ the standard C library can be guaranteed not to throw an 
exception
 with the notable exceptions of @code{qsort} and @code{bsearch} that
 take function pointer arguments.
 
+@cindex @code{null_terminated_string_arg} function attribute
+@item null_terminated_string_arg
+@itemx null_terminated_string_ar
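
For reference, a minimal usage sketch of the no_stack_protector attribute
documented in one of the hunks above.  Only the attribute name comes from the
patch; the function and its body are made-up illustration:

  #include <string.h>

  /* Hypothetical hot path exempted from stack-protector instrumentation;
     built with e.g. -O2 -fstack-protector-strong, this function gets no
     canary while the rest of the translation unit still does.  */
  __attribute__((no_stack_protector))
  void copy_fixed (char *dst, const char *src)
  {
    char tmp[64];                     /* would normally trigger a canary */
    strncpy (tmp, src, sizeof (tmp) - 1);
    tmp[sizeof (tmp) - 1] = '\0';
    strcpy (dst, tmp);
  }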

[PATCH] modula2: Many powerpc platforms do _not_ have support for IEEE754 long double [PR111956]

2024-01-17 Thread Gaius Mulley


ok for master ?

Bootstrapped on power8 (cfarm135), power9 (cfarm120) and
x86_64-linux-gnu.

---

This patch corrects commit
r14-4149-g81d5ca0b9b8431f1bd7a5ec8a2c94f04bb0cf032 which assumed
all powerpc platforms would have IEEE754 long double.  The patch
ensures that cc1gm2 obtains the default IEEE754 long double availability
from the configure generated tm_defines.  The user command
line switches -mabi=ibmlongdouble and -mabi=ieeelongdouble are implemented
to override the configuration defaults.

gcc/m2/ChangeLog:

PR modula2/111956
* Make-lang.in (host_mc_longreal): Remove.
* configure: Regenerate.
* configure.ac (M2C_LONGREAL_FLOAT128): Remove.
(M2C_LONGREAL_PPC64LE): Remove.
* gm2-compiler/M2Options.def (SetIBMLongDouble): New procedure.
(GetIBMLongDouble): New procedure function.
(SetIEEELongDouble): New procedure.
(GetIEEELongDouble): New procedure function.
* gm2-compiler/M2Options.mod (SetIBMLongDouble): New procedure.
(GetIBMLongDouble): New procedure function.
(SetIEEELongDouble): New procedure.
(GetIEEELongDouble): New procedure function.
(InitializeLongDoubleFlags): New procedure called during
module block initialization.
* gm2-gcc/m2configure.cc: Remove duplicate includes.
(m2configure_M2CLongRealFloat128): Remove.
(m2configure_M2CLongRealIBM128): Remove.
(m2configure_M2CLongRealLongDouble): Remove.
(m2configure_M2CLongRealLongDoublePPC64LE): Remove.
(m2configure_TargetIEEEQuadDefault): New function.
* gm2-gcc/m2configure.def (M2CLongRealFloat128): Remove.
(M2CLongRealIBM128): Remove.
(M2CLongRealLongDouble): Remove.
(M2CLongRealLongDoublePPC64LE): Remove.
(TargetIEEEQuadDefault): New function.
* gm2-gcc/m2configure.h (m2configure_M2CLongRealFloat128): Remove.
(m2configure_M2CLongRealIBM128): Remove.
(m2configure_M2CLongRealLongDouble): Remove.
(m2configure_M2CLongRealLongDoublePPC64LE): Remove.
(m2configure_TargetIEEEQuadDefault): New function.
* gm2-gcc/m2options.h (M2Options_SetIBMLongDouble): New prototype.
(M2Options_GetIBMLongDouble): New prototype.
(M2Options_SetIEEELongDouble): New prototype.
(M2Options_GetIEEELongDouble): New prototype.
* gm2-gcc/m2type.cc (build_m2_long_real_node): Re-implement using
results of M2Options_GetIBMLongDouble and M2Options_GetIEEELongDouble.
* gm2-lang.cc (gm2_langhook_handle_option): Add case
OPT_mabi_ibmlongdouble and call M2Options_SetIBMLongDouble.
Add case OPT_mabi_ieeelongdouble and call M2Options_SetIEEELongDouble.
* gm2config.aci.in: Regenerate.
* gm2spec.cc (lang_specific_driver): Remove block defined by
M2C_LONGREAL_PPC64LE.
Remove case OPT_mabi_ibmlongdouble.
Remove case OPT_mabi_ieeelongdouble.

libgm2/ChangeLog:

PR modula2/111956
* Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* Makefile.in: Regenerate.
* libm2cor/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2pim/Makefile.in: Regenerate.

---

diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
index d7bc7362bbf..45bfa933dca 100644
--- a/gcc/m2/Make-lang.in
+++ b/gcc/m2/Make-lang.in
@@ -98,9 +98,6 @@ GM2_PROG_DEP=gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext)
 
 include m2/config-make
 
-# Determine if float128 should represent the Modula-2 type LONGREAL.
-host_mc_longreal := $(if $(strip $(filter 
powerpc64le%,$(host))),--longreal=__float128)
-
 LIBSTDCXX=../$(TARGET_SUBDIR)/libstdc++-v3/src/.libs/libstdc++.a
 
 PGE=m2/pge$(exeext)
@@ -474,8 +471,7 @@ MC_ARGS= --olang=c++ \
  -I$(srcdir)/m2/gm2-gcc \
  --quiet \
  $(MC_COPYRIGHT) \
- --gcc-config-system \
- $(host_mc_longreal)
+ --gcc-config-system
 
 MCDEPS=m2/boot-bin/mc$(exeext)
 
diff --git a/gcc/m2/configure b/gcc/m2/configure
index f62f3d8729c..46530970785 100755
--- a/gcc/m2/configure
+++ b/gcc/m2/configure
@@ -3646,24 +3646,6 @@ $as_echo "#define HAVE_OPENDIR 1" >>confdefs.h
 fi
 
 
-case $target in #(
-  powerpc64le*) :
-
-$as_echo "#define M2C_LONGREAL_FLOAT128 1" >>confdefs.h
- ;; #(
-  *) :
- ;;
-esac
-
-case $target in #(
-  powerpc64le*) :
-
-$as_echo "#define M2C_LONGREAL_PPC64LE 1" >>confdefs.h
- ;; #(
-  *) :
- ;;
-esac
-
 ac_config_headers="$ac_config_headers gm2config.aci"
 
 cat >confcache <<\_ACEOF
diff --git a/gcc/m2/configure.ac b/gcc/m2/configure.ac
index efcca628068..15be50936f7 100644

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-17 Thread Alexandre Oliva
David,

On Jan  7, 2024, "Kewen.Lin"  wrote:

> As PR113100 shows, the unbiasing introduced by r14-6737 can
> cause the scrubbing to overrun and screw some critical data
> on stack like saved toc base consequently cause segfault on
> Power.

I suppose this problem that Kewen fixed (thanks) was what caused you to
install commit r14-6838.  According to posted test results, strub worked
on AIX until Dec 20, when the fixes for sparc that broke strub on ppc
went in.

I can't seem to find the email in which you posted the patch, and I'd
have appreciated if you'd copied me.  I wouldn't have missed it for so
long if you had.  Since I couldn't find that patch, I'm responding in
this thread instead.

The r14-6838 patch is actually very very broken.  Disabling strub on a
target is not a matter of changing only the testsuite.  Your additions
to the tests even broke the strub-unsupported testcases, that tested
exactly the feature that enables ports to disable strub in a way that
informs users in case they attempt to use it.

I'd thus like to revert that patch.

Kewen's patch needs a little additional cleanup, that I'm preparing now,
to restore fully-functioning strub on sparc32.

Please let me know in case you observe any other problems related with
strub.  I'd be happy to fix them, but I can only do so once I'm aware of
them.

In case the reversal or the upcoming cleanup has any negative impact,
please make sure you let me know.

Thanks,

Happy GNU Year!

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[Committed V3] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
V3: Rebase to trunk and commit it.

This patch fixes a SPEC2017 cam4 mismatch issue caused by a missing has-compatible
check for conflict vsetvl fusion.

Buggy assembler before this patch:

.L69:
vsetvli a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
j   .L37
.L68:
vsetvli a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
addi    a3,a5,8
vmv.v.i v1,0
vse8.v  v1,0(a5)
vse8.v  v1,0(a3)
addi    a4,a4,-16
li  a3,8
bltu    a4,a3,.L37
j   .L69
.L67:
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
addi    a5,sp,56
vse8.v  v1,0(a5)
addi    s4,sp,64
addi    a3,sp,72
vse8.v  v1,0(s4)
vse8.v  v1,0(a3)
addi    a4,a4,-32
li  a3,16
bltu    a4,a3,.L36
j   .L68

After this patch:

.L63:
ble s1,zero,.L49
slli    a4,s1,3
li  a3,32
addi    a5,sp,48
bltu    a4,a3,.L62
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
addi    a5,sp,56
vse8.v  v1,0(a5)
addi    s4,sp,64
addi    a3,sp,72
vse8.v  v1,0(s4)
addi    a4,a4,-32
addi    a5,sp,80
vse8.v  v1,0(a3)
.L35:
li  a3,16
bltu    a4,a3,.L36
addi    a3,a5,8
vmv.v.i v1,0
addi    a4,a4,-16
vse8.v  v1,0(a5)
addi    a5,a5,16
vse8.v  v1,0(a3)
.L36:
li  a3,8
bltu    a4,a3,.L37
vmv.v.i v1,0
vse8.v  v1,0(a5)

Tested on both RV32/RV64 no regression, Ok for trunk ?

PR target/113429

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-4.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-5.c: Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 43 +++
 .../riscv/rvv/vsetvl/vlmax_conflict-4.c   |  5 +--
 .../riscv/rvv/vsetvl/vlmax_conflict-5.c   | 10 ++---
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 41d4b80648f..2067073185f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2254,6 +2254,22 @@ private:
 return true;
   }
 
+  bool has_compatible_reaching_vsetvl_p (vsetvl_info info)
+  {
+unsigned int index;
+sbitmap_iterator sbi;
+EXECUTE_IF_SET_IN_BITMAP (m_vsetvl_def_in[info.get_bb ()->index ()], 0,
+ index, sbi)
+  {
+   const auto prev_info = *m_vsetvl_def_exprs[index];
+   if (!prev_info.valid_p ())
+ continue;
+   if (m_dem.compatible_p (prev_info, info))
+ return true;
+  }
+return false;
+  }
+
   bool preds_all_same_avl_and_ratio_p (const vsetvl_info &curr_info)
   {
 gcc_assert (
@@ -3076,22 +3092,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
{
  vsetvl_info new_curr_info = curr_info;
  new_curr_info.set_bb (crtl->ssa->bb (eg->dest));
- bool has_compatible_p = false;
- unsigned int def_expr_index;
- sbitmap_iterator sbi2;
- EXECUTE_IF_SET_IN_BITMAP (
-   m_vsetvl_def_in[new_curr_info.get_bb ()->index ()], 0,
-   def_expr_index, sbi2)
-   {
- vsetvl_info &prev_info = *m_vsetvl_def_exprs[def_expr_index];
- if (!prev_info.valid_p ())
-   continue;
- if (m_dem.compatible_p (prev_info, new_curr_info))
-   {
- has_compatible_p = true;
- break;
-   }
-   }
+ bool has_compatible_p
+   = has_compatible_reaching_vsetvl_p (new_curr_info);
  if (!has_compatible_p)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3146,7 +3148,10 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
  else
{
  /* Cancel lift up if probabilities are equal.  */
- if (successors_probability_equal_p (eg->src))
+ if (successors_probability_equal_p (eg->src)
+ || (dest_block_info.probability
+   > src_block_info.probability
+ && !has_compatible_reaching_vsetvl_p (curr_info)))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -3154,8 +3159,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
   "  Reset bb %u:",

[PATCH] c++: ICE when xobj is not the first parm [PR113389]

2024-01-17 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In grokdeclarator/cdk_function the comment says that the find_xobj_parm
lambda clears TREE_PURPOSE so that we can correctly detect an xobj that
is not the first parameter.  That's all good, but we should also clear
the TREE_PURPOSE once we've given the error, otherwise we crash later in
check_default_argument because the 'this' TREE_PURPOSE lacks a type.

PR c++/113389

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator) : Set TREE_PURPOSE to
NULL_TREE when emitting an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics10.C: New test.
---
 gcc/cp/decl.cc  | 1 +
 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C | 8 
 2 files changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 322e48dee2e..3e41fd4fa31 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13391,6 +13391,7 @@ grokdeclarator (const cp_declarator *declarator,
  if (TREE_PURPOSE (parm) != this_identifier)
continue;
  bad_xobj_parm_encountered = true;
+ TREE_PURPOSE (parm) = NULL_TREE;
  gcc_rich_location bad_xobj_parm
(DECL_SOURCE_LOCATION (TREE_VALUE (parm)));
  error_at (&bad_xobj_parm,
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C 
b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
new file mode 100644
index 000..354823db166
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
@@ -0,0 +1,8 @@
+// PR c++/113389
+// { dg-do compile { target c++23 } }
+
+struct A {
+  void foo(A, this A); // { dg-error "only the first parameter" }
+  void qux(A, this A,  // { dg-error "only the first parameter" }
+  this A); // { dg-error "only the first parameter" }
+};

base-commit: 4a8430c8c3abb1c2c14274105b3a621100f251a2
-- 
2.43.0



Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-17 Thread David Edelsohn
If the fixes remove the failures on AIX, then the patch to disable the
tests also can be reverted.

Thanks, David


On Wed, Jan 17, 2024 at 8:06 PM Alexandre Oliva  wrote:

> David,
>
> On Jan  7, 2024, "Kewen.Lin"  wrote:
>
> > As PR113100 shows, the unbiasing introduced by r14-6737 can
> > cause the scrubbing to overrun and screw some critical data
> > on stack like saved toc base consequently cause segfault on
> > Power.
>
> I suppose this problem that Kewen fixed (thanks) was what caused you to
> install commit r14-6838.  According to posted test results, strub worked
> on AIX until Dec 20, when the fixes for sparc that broke strub on ppc
> went in.
>
> I can't seem to find the email in which you posted the patch, and I'd
> have appreciated if you'd copied me.  I wouldn't have missed it for so
> long if you had.  Since I couldn't find that patch, I'm responding in
> this thread instead.
>
> The r14-6838 patch is actually very very broken.  Disabling strub on a
> target is not a matter of changing only the testsuite.  Your additions
> to the tests even broke the strub-unsupported testcases, that tested
> exactly the feature that enables ports to disable strub in a way that
> informs users in case they attempt to use it.
>
> I'd thus like to revert that patch.
>
> Kewen's patch needs a little additional cleanup, that I'm preparing now,
> to restore fully-functioning strub on sparc32.
>
> Please let me know in case you observe any other problems related with
> strub.  I'd be happy to fix them, but I can only do so once I'm aware of
> them.
>
> In case the reversal or the upcoming cleanup has any negative impact,
> please make sure you let me know.
>
> Thanks,
>
> Happy GNU Year!
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
>


[COMMITTED] Document negative forms of -Wtsan and -Wxor-used-as-pow [PR110847]

2024-01-17 Thread Sandra Loosemore
These warnings are enabled by default, thus the manual should document the
negative (-Wno-) form instead of the positive form.

gcc/ChangeLog
PR middle-end/110847
* doc/invoke.texi (Option Summary): Document negative forms of
-Wtsan and -Wxor-used-as-pow.
(Warning Options): Likewise.
---
 gcc/doc/invoke.texi | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a537be66736..4d43dda9839 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -410,7 +410,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wswitch  -Wno-switch-bool  -Wswitch-default  -Wswitch-enum
 -Wno-switch-outside-range  -Wno-switch-unreachable  -Wsync-nand
 -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs
--Wtrivial-auto-var-init -Wtsan -Wtype-limits  -Wundef
+-Wtrivial-auto-var-init  -Wno-tsan  -Wtype-limits  -Wundef
 -Wuninitialized  -Wunknown-pragmas
 -Wunsuffixed-float-constants  -Wunused
 -Wunused-but-set-parameter  -Wunused-but-set-variable
@@ -424,7 +424,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wvector-operation-performance
 -Wvla  -Wvla-larger-than=@var{byte-size}  -Wno-vla-larger-than
 -Wvolatile-register-var  -Wwrite-strings
--Wxor-used-as-pow
+-Wno-xor-used-as-pow
 -Wzero-length-bounds}
 
 @item Static Analyzer Options
@@ -9090,14 +9090,13 @@ This warning is enabled by default.
 
 @opindex Wtsan
 @opindex Wno-tsan
-@item -Wtsan
-Warn about unsupported features in ThreadSanitizer.
+@item -Wno-tsan
+
+Disable warnings about unsupported features in ThreadSanitizer.
 
 ThreadSanitizer does not support @code{std::atomic_thread_fence} and
 can report false positives.
 
-This warning is enabled by default.
-
 @opindex Wtype-limits
 @opindex Wno-type-limits
 @item -Wtype-limits
@@ -10434,17 +10433,18 @@ and/or writes to register variables.  This warning is 
enabled by
 
 @opindex Wxor-used-as-pow
 @opindex Wno-xor-used-as-pow
-@item -Wxor-used-as-pow @r{(C, C++, Objective-C and Objective-C++ only)}
-Warn about uses of @code{^}, the exclusive or operator, where it appears
-the user meant exponentiation.  Specifically, the warning occurs when the
+@item -Wno-xor-used-as-pow @r{(C, C++, Objective-C and Objective-C++ only)}
+Disable warnings about uses of @code{^}, the exclusive or operator,
+where it appears the code meant exponentiation.
+Specifically, the warning occurs when the
 left-hand side is the decimal constant 2 or 10 and the right-hand side
 is also a decimal constant.
 
 In C and C++, @code{^} means exclusive or, whereas in some other languages
 (e.g. TeX and some versions of BASIC) it means exponentiation.
 
-This warning is enabled by default.  It can be silenced by converting one
-of the operands to hexadecimal.
+This warning can be silenced by converting one of the operands to
+hexadecimal as well as by compiling with @option{-Wno-xor-used-as-pow}.
 
 @opindex Wdisabled-optimization
 @opindex Wno-disabled-optimization
-- 
2.31.1
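
As a concrete illustration of the two warnings touched above (example code
only, not taken from the patch):

  #include <atomic>

  int probably_wrong = 2 ^ 8;    /* ^ is exclusive-or (value 10), so
                                    -Wxor-used-as-pow warns by default  */
  int silenced       = 0x2 ^ 8;  /* a hexadecimal operand silences it   */
  int power_of_two   = 1 << 8;   /* the usual way to write 2 to the 8th */

  void fence ()
  {
    /* With -fsanitize=thread, -Wtsan warns by default because
       ThreadSanitizer does not support atomic_thread_fence.  */
    std::atomic_thread_fence (std::memory_order_seq_cst);
  }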



[PATCH] libstdc++: Fix constexpr _Safe_iterator in C++20 mode

2024-01-17 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

Some _Safe_iterator member functions define a variable of non-literal
type __gnu_cxx::__scoped_lock, which automatically disqualifies them from
being constexpr in C++20 mode even if that code path is never constant
evaluated.  This restriction was lifted by P2242R3 for C++23, but we
need to work around it in C++20 mode.  To that end this patch defines
a pair of macros that encapsulate the lambda-based workaround mentioned
in that paper and uses them to make the functions valid C++20 constexpr
functions.  The augmented std::vector test element_access/constexpr.cc
now successfully compiles in C++20 mode with -D_GLIBCXX_DEBUG (and it
tests all modified member functions).

libstdc++-v3/ChangeLog:

* include/debug/safe_base.h (_Safe_sequence_base::_M_swap):
Remove _GLIBCXX20_CONSTEXPR.
* include/debug/safe_iterator.h 
(_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN):
(_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END): Define.
(_Safe_iterator::operator=): Use them around the code path that
defines a variable of type __gnu_cxx::__scoped_lock.
(_Safe_iterator::operator++): Likewise.
(_Safe_iterator::operator--): Likewise.
(_Safe_iterator::operator+=): Likewise.
(_Safe_iterator::operator-=): Likewise.
* testsuite/23_containers/vector/element_access/constexpr.cc
(test_iterators): Also test copy and move assignment.
* testsuite/std/ranges/adaptors/all.cc (test08) [_GLIBCXX_DEBUG]:
Use std::vector unconditionally.
---
 libstdc++-v3/include/debug/safe_base.h|  1 -
 libstdc++-v3/include/debug/safe_iterator.h| 48 ++-
 .../vector/element_access/constexpr.cc|  2 +
 .../testsuite/std/ranges/adaptors/all.cc  |  4 --
 4 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/libstdc++-v3/include/debug/safe_base.h 
b/libstdc++-v3/include/debug/safe_base.h
index 107fef3cb02..d5fbe4b1320 100644
--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -268,7 +268,6 @@ namespace __gnu_debug
  *  operation is complete all iterators that originally referenced
  *  one container now reference the other container.
  */
-_GLIBCXX20_CONSTEXPR
 void
 _M_swap(_Safe_sequence_base& __x) _GLIBCXX_USE_NOEXCEPT;
 
diff --git a/libstdc++-v3/include/debug/safe_iterator.h 
b/libstdc++-v3/include/debug/safe_iterator.h
index 1bc7c904ee0..929fd9b0ade 100644
--- a/libstdc++-v3/include/debug/safe_iterator.h
+++ b/libstdc++-v3/include/debug/safe_iterator.h
@@ -65,6 +65,20 @@
   _GLIBCXX_DEBUG_VERIFY_OPERANDS(_Lhs, _Rhs, __msg_distance_bad,   \
 __msg_distance_different)
 
+// This pair of macros helps with writing valid C++20 constexpr functions that
+// contain a non-constexpr code path that defines a non-literal variable, which
+// was otherwise disallowed until P2242R3 for C++23.  We use them below for
+// __gnu_cxx::__scoped_lock so that the containing functions are still
+// considered valid C++20 constexpr functions.
+
+#if __cplusplus >= 202002L && __cpp_constexpr < 202110L
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN [&]() -> void { do
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END while(false); }();
+#else
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
+#endif
+
 namespace __gnu_debug
 {
   /** Helper struct to deal with sequence offering a before_begin
@@ -266,11 +280,11 @@ namespace __gnu_debug
  ._M_iterator(__x, "other"));
 
if (this->_M_sequence && this->_M_sequence == __x._M_sequence)
- {
+ _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
base() = __x.base();
_M_version = __x._M_sequence->_M_version;
- }
+ } _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
else
  {
_M_detach();
@@ -306,11 +320,11 @@ namespace __gnu_debug
  return *this;
 
if (this->_M_sequence && this->_M_sequence == __x._M_sequence)
- {
+ _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
base() = __x.base();
_M_version = __x._M_sequence->_M_version;
- }
+ } _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
else
  {
_M_detach();
@@ -378,8 +392,10 @@ namespace __gnu_debug
_GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
  _M_message(__msg_bad_inc)
  ._M_iterator(*this, "this"));
-   __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
-   ++base();
+   _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
+ __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+ ++base();
+   }
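
A standalone sketch of the lambda-based workaround that the two new macros
encapsulate; everything below is made-up illustration (std::mutex standing in
for the sequence mutex), not libstdc++ code.  The point is that before
P2242R3 a constexpr function may not itself define a variable of non-literal
type, but it may call a non-constexpr lambda that does so on a path that is
never constant-evaluated:

  #include <mutex>
  #include <type_traits>

  std::mutex counter_lock;   // made-up run-time-only shared state
  int runtime_counter = 0;

  constexpr int bump (int i)
  {
    if (!std::is_constant_evaluated ())
      // The scoped lock lives in the lambda's body, so the enclosing
      // constexpr function contains no non-literal variable definition.
      [&]() -> void {
        std::lock_guard<std::mutex> l (counter_lock);
        ++runtime_counter;
      } ();
    return i + 1;
  }

  static_assert (bump (1) == 2);   // the locking path is never taken here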

Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Monk Chiang
Thanks for your advice!! I agree it should be fixed in the RISC-V backend
during expansion.


On Wed, Jan 17, 2024 at 10:37 PM Jeff Law  wrote:

>
>
> On 1/17/24 05:14, Richard Biener wrote:
> > On Wed, 17 Jan 2024, Monk Chiang wrote:
> >
> >> This allows the backend to generate movcc instructions, if target
> >> machine has movcc pattern.
> >>
> >> branchless-cond.c needs to be updated since some target machines have
> >> conditional move instructions, and the expression will not be changed to a
> >> branchless expression.
> >
> > While I agree this pattern should possibly be applied during RTL
> > expansion or instruction selection on x86 which also has movcc
> > the multiplication is cheaper.  So I don't think this is the way to
> > go.
> >
> > I'd rather revert the change than trying to "fix" it this way?
> WRT reverting -- the patch in question's sole purpose was to enable
> branchless sequences for that very same code.  Reverting would regress
> performance on a variety of micro-architectures.  IIUC, the issue is
> that the SiFive part in question has a fusion which allows it to do the
> branchy sequence cheaply.
>
> ISTM this really needs to be addressed during expansion and most likely
> with a RISC-V target twiddle for the micro-archs which have
> short-forward-branch optimizations.
>
> jeff
>


Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Palmer Dabbelt

On Wed, 17 Jan 2024 19:19:58 PST (-0800), monk.chi...@sifive.com wrote:

Thanks for your advice!! I agree it should be fixed in the RISC-V backend
during expansion.


On Wed, Jan 17, 2024 at 10:37 PM Jeff Law  wrote:




On 1/17/24 05:14, Richard Biener wrote:
> On Wed, 17 Jan 2024, Monk Chiang wrote:
>
>> This allows the backend to generate movcc instructions, if target
>> machine has movcc pattern.
>>
>> branchless-cond.c needs to be updated since some target machines have
>> conditional move instructions, and the expression will not be changed to a
>> branchless expression.
>
> While I agree this pattern should possibly be applied during RTL
> expansion or instruction selection on x86 which also has movcc
> the multiplication is cheaper.  So I don't think this is the way to
> go.
>
> I'd rather revert the change than trying to "fix" it this way?
WRT reverting -- the patch in question's sole purpose was to enable
branchless sequences for that very same code.  Reverting would regress
performance on a variety of micro-architectures.  IIUC, the issue is
that the SiFive part in question has a fusion which allows it to do the
branchy sequence cheaply.

ISTM this really needs to be addressed during expansion and most likely
with a RISC-V target twiddle for the micro-archs which have
short-forward-branch optimizations.


IIRC I ran into some of these middle-end interactions a year or two ago 
and determined that we'd need middle-end changes to get this working 
smoothly -- essentially replacing the expander checks for a MOVCC insn  
with some sort of costing.


Without that, we're just going to end up with some missed optimizations 
that favor one way or the other.




jeff



Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-17 Thread Greg McGary
On Tue, Jan 16, 2024 at 11:44 PM Richard Biener 
wrote:

> > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary  wrote:
> > >
> > > The sign bit of a sign-extending load cannot be known until runtime,
> > > so don't attempt to simplify it in the combiner.
> >
> It feels like this papers over an issue downstream?

While the code comment is true, perhaps it obscures the primary intent,
which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined
to expand into a single memory-load instruction and no simplification is
possible, so why waste time with further analysis or transformation? There
are plenty of other conditions that also short circuit to "do nothing" and
this seems just as straightforward as those others. Efforts to catch this
further downstream add gratuitous complexity.

G
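
For context, a trivial example of the kind of access that becomes a
sign-extending load (illustrative code, not the PR's testcase):

  /* Loading a narrower signed value into a wider type corresponds to a
     (sign_extend (mem ...)) at the RTL level; on RISC-V this is e.g. an
     lh instruction, whose result depends on the loaded sign bit.  */
  long widen (const short *p)
  {
    return *p;
  }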


Re: [PATCH] hwasan: Check if Intel LAM_U57 is enabled

2024-01-17 Thread Hongtao Liu
On Wed, Jan 10, 2024 at 12:47 AM H.J. Lu  wrote:
>
> When -fsanitize=hwaddress is used, libhwasan will try to enable LAM_U57
> in the startup code.  Update the target check to enable hwaddress tests
> if LAM_U57 is enabled.  Also compile hwaddress tests with -mlam=u57 on
> x86-64 since hwasan requires LAM_U57 on x86-64.
I've tested it on LAM-enabled SRF, and it passed all hwasan testcases
except the ones below:

FAIL: c-c++-common/hwasan/alloca-outside-caught.c   -O0  output pattern test
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O1
scan-assembler-times bl
s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2
scan-assembler-times bl
s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O3 -g
scan-assembler-times bl
s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -Os
scan-assembler-times bl
s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times bl
s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times bl
s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/vararray-outside-caught.c   -O0  output pattern test

Basically they're testcase issues; the testcases need to be adjusted
for x86.  I'll commit a separate patch for those after this commit is
upstream.
I've also tested the patch on LAM-unsupported platforms; all
hwasan testcases show up as unsupported.
So the patch LGTM.

>
> * lib/hwasan-dg.exp (check_effective_target_hwaddress_exec):
> Return 1 if Intel LAM_U57 is enabled.
> (hwasan_init): Add -mlam=u57 on x86-64.
> ---
>  gcc/testsuite/lib/hwasan-dg.exp | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/lib/hwasan-dg.exp b/gcc/testsuite/lib/hwasan-dg.exp
> index e9c5ef6524d..76057502ee6 100644
> --- a/gcc/testsuite/lib/hwasan-dg.exp
> +++ b/gcc/testsuite/lib/hwasan-dg.exp
> @@ -44,11 +44,25 @@ proc check_effective_target_hwaddress_exec {} {
> #ifdef __cplusplus
> extern "C" {
> #endif
> +   extern int arch_prctl (int, unsigned long int *);
> extern int prctl(int, unsigned long, unsigned long, unsigned long, 
> unsigned long);
> #ifdef __cplusplus
> }
> #endif
> int main (void) {
> +   #ifdef __x86_64__
> +   # ifdef __LP64__
> +   #  define ARCH_GET_UNTAG_MASK 0x4001
> +   #  define LAM_U57_MASK (0x3fULL << 57)
> + unsigned long mask = 0;
> + if (arch_prctl(ARCH_GET_UNTAG_MASK, &mask) != 0)
> +   return 1;
> + if (mask != ~LAM_U57_MASK)
> +   return 1;
> + return 0;
> +   # endif
> + return 1;
> +   #else
> #define PR_SET_TAGGED_ADDR_CTRL 55
> #define PR_GET_TAGGED_ADDR_CTRL 56
> #define PR_TAGGED_ADDR_ENABLE (1UL << 0)
> @@ -58,6 +72,7 @@ proc check_effective_target_hwaddress_exec {} {
>   || !prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0))
> return 1;
>   return 0;
> +   #endif
> }
>  }] {
> return 0;
> @@ -102,6 +117,10 @@ proc hwasan_init { args } {
>
>  setenv HWASAN_OPTIONS "random_tags=0"
>
> +if [istarget x86_64-*-*] {
> +  set target_hwasan_flags "-mlam=u57"
> +}
> +
>  set link_flags ""
>  if ![is_remote host] {
> if [info exists TOOL_OPTIONS] {
> @@ -119,12 +138,12 @@ proc hwasan_init { args } {
>  if [info exists ALWAYS_CXXFLAGS] {
> set hwasan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS
> set ALWAYS_CXXFLAGS [concat "{ldflags=$link_flags}" $ALWAYS_CXXFLAGS]
> -   set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=hwaddress 
> --param hwasan-random-frame-tag=0 -g $include_flags}" $ALWAYS_CXXFLAGS]
> +   set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=hwaddress 
> $target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags}" 
> $ALWAYS_CXXFLAGS]
>  } else {
> if [info exists TEST_ALWAYS_FLAGS] {
> -   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress --param 
> hwasan-random-frame-tag=0 -g $include_flags $TEST_ALWAYS_FLAGS"
> +   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress 
> $target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags 
> $TEST_ALWAYS_FLAGS"
> } else {
> -   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress --param 
> hwasan-random-frame-tag=0 -g $include_flags"
> +   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress 
> $target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags"
> }
>  }
>  }
> --
> 2.43.0
>


-- 
BR,
Hongtao


Re: [PATCH] c++: ICE when xobj is not the first parm [PR113389]

2024-01-17 Thread Jason Merrill

On 1/17/24 20:17, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
In grokdeclarator/cdk_function the comment says that the find_xobj_parm
lambda clears TREE_PURPOSE so that we can correctly detect an xobj that
is not the first parameter.  That's all good, but we should also clear
the TREE_PURPOSE once we've given the error, otherwise we crash later in
check_default_argument because the 'this' TREE_PURPOSE lacks a type.

PR c++/113389

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator) : Set TREE_PURPOSE to
NULL_TREE when emitting an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics10.C: New test.
---
  gcc/cp/decl.cc  | 1 +
  gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C | 8 
  2 files changed, 9 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 322e48dee2e..3e41fd4fa31 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13391,6 +13391,7 @@ grokdeclarator (const cp_declarator *declarator,
  if (TREE_PURPOSE (parm) != this_identifier)
continue;
  bad_xobj_parm_encountered = true;
+ TREE_PURPOSE (parm) = NULL_TREE;
  gcc_rich_location bad_xobj_parm
(DECL_SOURCE_LOCATION (TREE_VALUE (parm)));
  error_at (&bad_xobj_parm,
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C 
b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
new file mode 100644
index 000..354823db166
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
@@ -0,0 +1,8 @@
+// PR c++/113389
+// { dg-do compile { target c++23 } }
+
+struct A {
+  void foo(A, this A); // { dg-error "only the first parameter" }
+  void qux(A, this A,  // { dg-error "only the first parameter" }
+  this A); // { dg-error "only the first parameter" }
+};

base-commit: 4a8430c8c3abb1c2c14274105b3a621100f251a2




[committed] testsuite, rs6000: Adjust fold-vec-extract-char.p7.c [PR111850]

2024-01-17 Thread Kewen.Lin
Hi,

As PR101169 comment #c4 shows, the previous addi count
update on fold-vec-extract-char.p7.c covered up a sub-optimal
code generation issue.  On trunk, the fold-mem-offsets pass helps to
recover the best code sequence, so this patch reverts
the count back to the original, which matches the
optimal addi count.

Tested well on powerpc64-linux-gnu P8/P9,
powerpc64le-linux-gnu P9/P10 and powerpc-ibm-aix.

Pushed as r14-8201.

PR testsuite/111850

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/fold-vec-extract-char.p7.c: Update the
checking count of addi to 6.
---
 gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
index 29a8aa84db2..42599c214e4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
@@ -11,7 +11,7 @@
 /* one extsb (extend sign-bit) instruction generated for each test against
unsigned types */

-/* { dg-final { scan-assembler-times {\maddi\M} 9 } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mli\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mstxvw4x\M|\mstvx\M|\mstxv\M} 6 } } */
 /* -m32 target uses rlwinm in place of rldicl. */
--
2.34.1


Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-17 Thread Kewen.Lin
Hi David,

on 2024/1/18 09:27, David Edelsohn wrote:
> If the fixes remove the failures on AIX, then the patch to disable the tests 
> also can be reverted.
> 

Since I didn't see strub-unsupported*.c fail on ppc64 linux, to ensure it's
related, I reverted your commit r14-6838 and my fix r14-7089 locally and expected
to see those test cases fail on aix, but they passed.  Then I tried to reset
the repo to r14-6275, which added those test cases, expecting to see them fail,
but they still passed.  I'm not sure if I missed something in the testing; could you
kindly double check whether those test cases started to fail from r14-6275 on your
env, or from some other specific commit?  Or maybe directly verify whether they can pass
on latest trunk with r14-6838 reverted, just to ensure the reverting matches
our expectation.  Thanks in advance!

btw, the command I used to test on aix is:
make check-gcc RUNTESTFLAGS="--target_board=unix'{-m64,-m32}' 
dg.exp=strub-unsupported*.c"

BR,
Kewen
 
> Thanks, David
> 
> 
> On Wed, Jan 17, 2024 at 8:06 PM Alexandre Oliva  > wrote:
> 
> David,
> 
> On Jan  7, 2024, "Kewen.Lin"  > wrote:
> 
> > As PR113100 shows, the unbiasing introduced by r14-6737 can
> > cause the scrubbing to overrun and screw some critical data
> > on stack like saved toc base consequently cause segfault on
> > Power.
> 
> I suppose this problem that Kewen fixed (thanks) was what caused you to
> install commit r14-6838.  According to posted test results, strub worked
> on AIX until Dec 20, when the fixes for sparc that broke strub on ppc
> went in.
> 
> I can't seem to find the email in which you posted the patch, and I'd
> have appreciated if you'd copied me.  I wouldn't have missed it for so
> long if you had.  Since I couldn't find that patch, I'm responding in
> this thread instead.
> 
> The r14-6838 patch is actually very very broken.  Disabling strub on a
> target is not a matter of changing only the testsuite.  Your additions
> to the tests even broke the strub-unsupported testcases, that tested
> exactly the feature that enables ports to disable strub in a way that
> informs users in case they attempt to use it.
> 
> I'd thus like to revert that patch.
> 
> Kewen's patch needs a little additional cleanup, that I'm preparing now,
> to restore fully-functioning strub on sparc32.
> 
> Please let me know in case you observe any other problems related with
> strub.  I'd be happy to fix them, but I can only do so once I'm aware of
> them.
> 
> In case the reversal or the upcoming cleanup has any negative impact,
> please make sure you let me know.
> 
> Thanks,
> 
> Happy GNU Year!
> 
> -- 
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/ 
> 
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
> 


[PATCH] i386: Default to -mcet-switch [PR104816]

2024-01-17 Thread Fangrui Song
When -fcf-protection=branch is used, with the current -mno-cet-switch
default, a NOTRACK indirect jump is generated for jump tables, which can
target a non-ENDBR instruction.  However, the overwhelming opinion is to
avoid NOTRACK (PR104816) to improve safety.  Projects such as the Linux
kernel and Xen even specify -fno-jump-tables to avoid NOTRACK.  Therefore,
let's change the default.

Note, for `default: __builtin_unreachable()`, LLVM AArch64 even made a
decision (https://reviews.llvm.org/D155485) to keep the range check,
which can otherwise be optimized out.  This reinforces the opinion that
people want protection for jump tables.

#define DO A(0) A(1) A(2) A(3) A(4) A(5) A(6) A(7) A(8) A(9) A(10) A(11) 
A(12) A(13)
#define A(i) void bar##i();
DO
#undef A
void ext();
void foo(int i) {
  switch (i) {
#define A(i) case i: bar##i(); break;
DO
// -mbranch-protection=bti causes Clang AArch64 to keep the i <= 13 range 
check
  default: __builtin_unreachable();
  }
  ext();
}

gcc/ChangeLog:

PR target/104816
* config/i386/i386.opt: Default to -mcet-switch.
* doc/invoke.texi: Update doc.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cet-switch-1.c: Add -mno-cet-switch.
* gcc.target/i386/cet-switch-2.c: Remove -mcet-switch to check the
  default.
---
 gcc/config/i386/i386.opt |  2 +-
 gcc/doc/invoke.texi  | 19 +--
 gcc/testsuite/gcc.target/i386/cet-switch-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/cet-switch-2.c |  2 +-
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 5b4f1bff25f..0e168f3c07a 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1074,7 +1074,7 @@ Enable shadow stack built-in functions from Control-flow 
Enforcement
 Technology (CET).
 
 mcet-switch
-Target Var(flag_cet_switch) Init(0)
+Target Var(flag_cet_switch) Init(1)
 Turn on CET instrumentation for switch statements that use a jump table and
 an indirect jump.
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16e31a3c6db..720be71f8fa 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1455,7 +1455,7 @@ See RS/6000 and PowerPC Options.
 -msse4a  -m3dnow  -m3dnowa  -mpopcnt  -mabm  -mbmi  -mtbm  -mfma4  -mxop
 -madx  -mlzcnt  -mbmi2  -mfxsr  -mxsave  -mxsaveopt  -mrtm  -mhle  -mlwp
 -mmwaitx  -mclzero  -mpku  -mthreads  -mgfni  -mvaes  -mwaitpkg
--mshstk -mmanual-endbr -mcet-switch -mforce-indirect-call
+-mshstk -mmanual-endbr -mno-cet-switch -mforce-indirect-call
 -mavx512vbmi2 -mavx512bf16 -menqcmd
 -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid
@@ -34886,16 +34886,15 @@ function attribute. This is useful when used with the 
option
 @option{-fcf-protection=branch} to control ENDBR insertion at the
 function entry.
 
+@opindex mno-cet-switch
 @opindex mcet-switch
-@item -mcet-switch
-By default, CET instrumentation is turned off on switch statements that
-use a jump table and indirect branch track is disabled.  Since jump
-tables are stored in read-only memory, this does not result in a direct
-loss of hardening.  But if the jump table index is attacker-controlled,
-the indirect jump may not be constrained by CET.  This option turns on
-CET instrumentation to enable indirect branch track for switch statements
-with jump tables which leads to the jump targets reachable via any indirect
-jumps.
+@item -mno-cet-switch
+When @option{-fcf-protection=branch} is enabled, by default, switch statements
+that use a jump table are instrumented to use ENDBR instructions and constrain
+the indirect jump with CET to protect against an attacker-controlled jump table
+index.  @option{-mno-cet-switch} generates a NOTRACK indirect jump and removes
+ENDBR instructions, which may make the jump table smaller at the cost of an
+unprotected indirect jump.
 
 @opindex mcall-ms2sysv-xlogues
 @opindex mno-call-ms2sysv-xlogues
diff --git a/gcc/testsuite/gcc.target/i386/cet-switch-1.c 
b/gcc/testsuite/gcc.target/i386/cet-switch-1.c
index afe5adc2f3d..4931c3ad1d2 100644
--- a/gcc/testsuite/gcc.target/i386/cet-switch-1.c
+++ b/gcc/testsuite/gcc.target/i386/cet-switch-1.c
@@ -1,6 +1,6 @@
 /* Verify that CET works.  */
 /* { dg-do compile } */
-/* { dg-options "-O -fcf-protection" } */
+/* { dg-options "-O -fcf-protection -mno-cet-switch" } */
 /* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
 /* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "notrack jmp\[ \t]+\[*]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/cet-switch-2.c 
b/gcc/testsuite/gcc.target/i386/cet-switch-2.c
index 69ddc6fd5b7..11578d1a30c 100644
--- a/gcc/testsuite/gcc.target/i386/cet-switch-2.c
+++ b/gcc/testsuite/gcc.target/i386/cet-switch-2.c
@@ -1,6 +1,6 @@
 /* Verify th
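
A sketch of the kind of switch this option is about, with the flags taken
from the patch and its testcases (the generated code depends on target and
compiler version; treat the comments as assumptions, not verified output):

  /* Build with e.g. -O2 -fcf-protection=branch.
     With the new -mcet-switch default, the jump-table dispatch stays a
     tracked indirect jump and the case labels are instrumented with ENDBR;
     adding -mno-cet-switch restores the old "notrack jmp" form, as checked
     by gcc.target/i386/cet-switch-1.c above.  */
  extern void handle (int);

  void dispatch (int i)
  {
    switch (i)   /* dense cases, normally enough for a jump table at -O2 */
      {
      case 0: handle (10); break;
      case 1: handle (11); break;
      case 2: handle (12); break;
      case 3: handle (13); break;
      case 4: handle (14); break;
      case 5: handle (15); break;
      case 6: handle (16); break;
      case 7: handle (17); break;
      }
  }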

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-17 Thread chenglulu





gcc.dg/tree-ssa/scev-16.c is OK to move
gcc.dg/pr104992.c should simply add -fno-tree-vectorize to the used
options and remove the vect_* stuff


Hi Richard:

I have a question. I don't understand the purpose of adding 
'-fno-tree-vectorize' here.


Thanks!



Re: [PATCH] sra: Disqualify bases of operands of asm gotos

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Martin Jambor wrote:

> Hi,
> 
> PR 110422 shows that SRA can ICE assuming there is a single edge
> outgoing from a block terminated with an asm goto.  We need that for
> BB-terminating statements so that any adjustments they make to the
> aggregates can be copied over to their replacements.  Because we can't
> have that after ASM gotos, we need to punt.
> 
> Bootstrapped and tested on x86_64-linux, OK for master?  It will need
> some tweaking for release branches, is it in principle OK for them too
> (after testing)?

OK.

> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2024-01-17  Martin Jambor  
> 
>   PR tree-optimization/110422
>   * tree-sra.cc (scan_function): Disqualify bases of operands of asm
>   gotos.
> 
> gcc/testsuite/ChangeLog:
> 
> 2024-01-17  Martin Jambor  
> 
>   PR tree-optimization/110422
>   * gcc.dg/torture/pr110422.c: New test.
> ---
>  gcc/testsuite/gcc.dg/torture/pr110422.c | 10 +
>  gcc/tree-sra.cc | 29 -
>  2 files changed, 33 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr110422.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr110422.c 
> b/gcc/testsuite/gcc.dg/torture/pr110422.c
> new file mode 100644
> index 000..2e171a7a19e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr110422.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +
> +struct T { int x; };
> +int foo(void) {
> +  struct T v;
> +  asm goto("" : "+r"(v.x) : : : lab);
> +  return 0;
> +lab:
> +  return -5;
> +}
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index 6a1141b7377..f8e71ec48b9 100644
> --- a/gcc/tree-sra.cc
> +++ b/gcc/tree-sra.cc
> @@ -1559,15 +1559,32 @@ scan_function (void)
>   case GIMPLE_ASM:
> {
>   gasm *asm_stmt = as_a  (stmt);
> - for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
> + if (stmt_ends_bb_p (asm_stmt)
> + && !single_succ_p (gimple_bb (asm_stmt)))
> {
> - t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> - ret |= build_access_from_expr (t, asm_stmt, false);
> + for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> + disqualify_base_of_expr (t, "OP of asm goto.");
> +   }
> + for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
> + disqualify_base_of_expr (t, "OP of asm goto.");
> +   }
> }
> - for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
> + else
> {
> - t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
> - ret |= build_access_from_expr (t, asm_stmt, true);
> + for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> + ret |= build_access_from_expr (t, asm_stmt, false);
> +   }
> + for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
> + ret |= build_access_from_expr (t, asm_stmt, true);
> +   }
> }
> }
> break;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH v2] test regression fix: Add !vect128 for variable length targets of bb-slp-subgroups-3.c

2024-01-17 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-subgroups-3.c: Add !vect128.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
index fb719915db7..d1d79125731 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
@@ -42,7 +42,7 @@ main (int argc, char **argv)
 /* Because we disable the cost model, targets with variable-length
vectors can end up vectorizing the store to a[0..7] on its own.
With the cost model we do something sensible.  */
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { 
target { ! amdgcn-*-* } xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { 
target { ! amdgcn-*-* } xfail { vect_variable_length && { ! vect128 } } } } } */
 
 /* amdgcn can do this in one vector.  */
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { 
target amdgcn-*-* } } } */
-- 
2.36.3



Re: [PATCH] modula2: Many powerpc platforms do _not_ have support for IEEE754 long double [PR111956]

2024-01-17 Thread Richard Biener
On Thu, Jan 18, 2024 at 1:58 AM Gaius Mulley  wrote:
>
>
> ok for master ?
>
> Bootstrapped on power8 (cfarm135), power9 (cfarm120) and
> x86_64-linux-gnu.

OK.

I wonder what this does to the libm2 ABI?

> ---
>
> This patch corrects commit
> r14-4149-g81d5ca0b9b8431f1bd7a5ec8a2c94f04bb0cf032 which assumed
> all powerpc platforms would have IEEE754 long double.  The patch
> ensures that cc1gm2 obtains the default IEEE754 long double availability
> from the configure generated tm_defines.  The user command
> line switches -mabi=ibmlongdouble and -mabi=ieeelongdouble are implemented
> to override the configuration defaults.
>
> gcc/m2/ChangeLog:
>
> PR modula2/111956
> * Make-lang.in (host_mc_longreal): Remove.
> * configure: Regenerate.
> * configure.ac (M2C_LONGREAL_FLOAT128): Remove.
> (M2C_LONGREAL_PPC64LE): Remove.
> * gm2-compiler/M2Options.def (SetIBMLongDouble): New procedure.
> (GetIBMLongDouble): New procedure function.
> (SetIEEELongDouble): New procedure.
> (GetIEEELongDouble): New procedure function.
> * gm2-compiler/M2Options.mod (SetIBMLongDouble): New procedure.
> (GetIBMLongDouble): New procedure function.
> (SetIEEELongDouble): New procedure.
> (GetIEEELongDouble): New procedure function.
> (InitializeLongDoubleFlags): New procedure called during
> module block initialization.
> * gm2-gcc/m2configure.cc: Remove duplicate includes.
> (m2configure_M2CLongRealFloat128): Remove.
> (m2configure_M2CLongRealIBM128): Remove.
> (m2configure_M2CLongRealLongDouble): Remove.
> (m2configure_M2CLongRealLongDoublePPC64LE): Remove.
> (m2configure_TargetIEEEQuadDefault): New function.
> * gm2-gcc/m2configure.def (M2CLongRealFloat128): Remove.
> (M2CLongRealIBM128): Remove.
> (M2CLongRealLongDouble): Remove.
> (M2CLongRealLongDoublePPC64LE): Remove.
> (TargetIEEEQuadDefault): New function.
> * gm2-gcc/m2configure.h (m2configure_M2CLongRealFloat128): Remove.
> (m2configure_M2CLongRealIBM128): Remove.
> (m2configure_M2CLongRealLongDouble): Remove.
> (m2configure_M2CLongRealLongDoublePPC64LE): Remove.
> (m2configure_TargetIEEEQuadDefault): New function.
> * gm2-gcc/m2options.h (M2Options_SetIBMLongDouble): New prototype.
> (M2Options_GetIBMLongDouble): New prototype.
> (M2Options_SetIEEELongDouble): New prototype.
> (M2Options_GetIEEELongDouble): New prototype.
> * gm2-gcc/m2type.cc (build_m2_long_real_node): Re-implement using
> results of M2Options_GetIBMLongDouble and M2Options_GetIEEELongDouble.
> * gm2-lang.cc (gm2_langhook_handle_option): Add case
> OPT_mabi_ibmlongdouble and call M2Options_SetIBMLongDouble.
> Add case OPT_mabi_ieeelongdouble and call M2Options_SetIEEELongDouble.
> * gm2config.aci.in: Regenerate.
> * gm2spec.cc (lang_specific_driver): Remove block defined by
> M2C_LONGREAL_PPC64LE.
> Remove case OPT_mabi_ibmlongdouble.
> Remove case OPT_mabi_ieeelongdouble.
>
> libgm2/ChangeLog:
>
> PR modula2/111956
> * Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * Makefile.in: Regenerate.
> * libm2cor/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2cor/Makefile.in: Regenerate.
> * libm2iso/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2iso/Makefile.in: Regenerate.
> * libm2log/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2log/Makefile.in: Regenerate.
> * libm2min/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2min/Makefile.in: Regenerate.
> * libm2pim/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2pim/Makefile.in: Regenerate.
>
> ---
>
> diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
> index d7bc7362bbf..45bfa933dca 100644
> --- a/gcc/m2/Make-lang.in
> +++ b/gcc/m2/Make-lang.in
> @@ -98,9 +98,6 @@ GM2_PROG_DEP=gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext)
>
>  include m2/config-make
>
> -# Determine if float128 should represent the Modula-2 type LONGREAL.
> -host_mc_longreal := $(if $(strip $(filter 
> powerpc64le%,$(host))),--longreal=__float128)
> -
>  LIBSTDCXX=../$(TARGET_SUBDIR)/libstdc++-v3/src/.libs/libstdc++.a
>
>  PGE=m2/pge$(exeext)
> @@ -474,8 +471,7 @@ MC_ARGS= --olang=c++ \
>   -I$(srcdir)/m2/gm2-gcc \
>   --quiet \
>   $(MC_COPYRIGHT) \
> - --gcc-config-system \
> - $(host_mc_longreal)
> + --gcc-config-system
>
>  MCDEPS=m2/boot-bin/mc$(exeext)
>
> diff --git a/gcc/m2/configure b/gcc/m2/configure
> index f62f3d8729c..46530970785 100755
> --- a/gcc/m2/configure
> +++ b/gcc/m2/configure
> @@ -3646,24 +3646,6 @@ $as_echo "#define HAVE_OPENDIR 1" >>confdefs.h
>  fi
>
>
> -case $target in #(
> -  powerpc64le*) :
> -
> -$as_echo "#define M2C_LONGREAL_FLOAT128 1" >>confdefs.h
> - ;; #(
> -  *) :
> 
