Re: [PING] [PATCH] i386: Implement Thread Local Storage on Windows

2024-10-07 Thread Julian Waters
Thanks for the patch! You certainly worked that out faster than I could
create a reproducer. It's a bit late for me now, so I'll have to try it out
tomorrow. Note however that in the final patch I will be only doing TLS for
mingw32.h and not cygming.h. The reason for this is that Cygwin likely
cannot support Windows TLS since it lacks the special TLS slot marker
_tls_index. Essentially, only the MinGW runtime supports Windows TLS;
whatever runtime Cygwin uses does not, so implementing this for Cygwin as
well would not work out.

best regards,
Julian

On Mon, Oct 7, 2024 at 11:44 PM Eric Botcazou  wrote:

> > I'm not quite sure what you mean by a testcase, but when compiling gcc
> > itself, when libgomp/libgcc (Can't remember which) is being compiled, gcc
> > will spit out invalid assembly that looks something like
> >
> > movabsq $8+__gcov_indirect_call@secrel32, %rax
>
> Tentative patch attached which should work better.  AFAICS the problematic
> assembly is generated because of the legitimate_pic_operand_p hunk, so I
> removed it (which yields an ICE) and added some massaging code to
> compensate
> in legitimize_pic_address.
>
> --
> Eric Botcazou


Re: [PATCH 2/3] Release expanded template argument vector

2024-10-07 Thread Jason Merrill

On 10/7/24 10:26 AM, Patrick Palka wrote:

On Mon, 7 Oct 2024, Jason Merrill wrote:


On 10/7/24 9:58 AM, Patrick Palka wrote:

On Sat, 5 Oct 2024, Jason Merrill wrote:


On 10/4/24 11:00 AM, Patrick Palka wrote:

On Thu, 3 Oct 2024, Jason Merrill wrote:


On 10/3/24 12:38 PM, Jason Merrill wrote:

On 10/2/24 7:50 AM, Richard Biener wrote:

This reduces peak memory usage by 20% for a specific testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It's very ugly so I'd appreciate suggestions on how to handle such
situations better?


I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.


OK, apparently that was both too clever and not clever enough.  Replacing it
with this one that's much closer to yours.

Jason



From: Jason Merrill 
Date: Thu, 3 Oct 2024 16:31:00 -0400
Subject: [PATCH] c++: free garbage vec in coerce_template_parms
To: gcc-patches@gcc.gnu.org

coerce_template_parms can create two different vecs for the inner template
arguments, new_inner_args and (potentially) the result of
expand_template_argument_pack.  One or the other, or possibly both, end up
being garbage: in the typical case, the expanded vec is garbage because it's
only used as the source for convert_template_argument.  In some dependent
cases, the new vec is garbage because we decide to return the original args
instead.  In these cases, ggc_free the garbage vec to reduce the memory
overhead of overload resolution.

gcc/cp/ChangeLog:

* pt.cc (coerce_template_parms): Free garbage vecs.

Co-authored-by: Richard Biener 
---
gcc/cp/pt.cc | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 20affcd65a2..4ceae1d38de 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms,
{
  /* We don't know how many args we have yet, just use the
 unconverted (and still packed) ones for now.  */
+ ggc_free (new_inner_args);
  new_inner_args = orig_inner_args;
  arg_idx = nargs;
  break;
@@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms,
  = make_pack_expansion (conv, complain);
/* We don't know how many args we have yet, just
- use the unconverted ones for now.  */
+use the unconverted (but unpacked) ones for now.  */
+ ggc_free (new_inner_args);


I'm a bit worried about these ggc_frees.  If an earlier template
parameter is a constrained auto NTTP then new_inner_args/new_args could
have been captured by the satisfaction cache during coercion for that
argument, and so we'd be freeing a vector that's still live?


It seems like for e.g.

template  concept NotInt = !__is_same (T, int);
template  struct A { };
template  using B = A<'x', Ts...>;

we don't check satisfaction until after we're done coercing, because of

if (processing_template_decl && context == adc_unify)
  /* Constraints will be checked after deduction.  */;

in do_auto_deduction.


Ah, I wonder why we pass/use adc_unify from both unify and
convert_template_argument..  That early exit makes sense for unify
but not for convert_template_argument since it prevents us from
checking constrained auto NTTPs during ahead of time coercion:

template
concept C = T::value;

template
struct A { };

template
void f() {
  A<0> a; // no constraint error
}

A<0> a; // constraint error

I guess we'd ideally want to fix/implement this?  At which point the
ggc_free's of new_inner_args would be unsafe I think..


I don't think that would be an improvement; the C constraint is an associated
constraint of A (https://eel.is/c++draft/temp#constr.decl-3.3.1).


I thought C in 'template' is an associated constraint but not in
'template'?  At least that's how GCC currently behaves.


I think that's wrong: https://eel.is/c++draft/temp#param-11


Why don't we check satisfaction of A<0> when parsing f?


We do but it's considered to have no associated constraints since C is
'attached' to the auto rather than to A currently.


Ah, yes, I think I've run into trouble from that before.

Jason



Re: [PATCH v2] c: ICE in build_counted_by_ref [PR116735]

2024-10-07 Thread Qing Zhao
Thanks for the review.

Just pushed as:

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9a17e6d03c6ed53e3b2dfd2c3ff9b1066ffa97b9

Qing
> On Oct 7, 2024, at 11:44, Marek Polacek  wrote:
> 
> On Wed, Oct 02, 2024 at 10:20:11PM +, Qing Zhao wrote:
>> From: qing zhao 
>> 
>> Hi, this is the 2nd version of the patch. 
>> Compared to the 1st version, the major changes are to address Marek and
>> Jakub's comments.
>> 
>> bootstrapped and regression tested on both x86 and aarch64.
>> Okay for committing?
> 
> Ok, thanks.  (Sorry, was on vacation last week.)
> 
> Marek
> 



Re: [PATCH v13 0/4] c: Add __lengthof__ operator

2024-10-07 Thread Joseph Myers
Patches 1, 2 and 3 logically have nothing to do with this feature.  I'll 
wait for them to be reviewed so that we only have a single-patch series, 
before doing final review of the main patch.

Since the feature was accepted as _Lengthof, that's the form that should 
be added to GCC; no __lengthof__ variant needed.  In general in GCC, 
although not strictly required by the standard in this case, we use 
pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a 
new C2Y feature that's not in C23 (if -pedantic with a pre-C2Y standard, 
or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to 
verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 
-pedantic, no diagnostic with -std=c23 -pedantic-errors 
-Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning 
with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23 handles 
that logic, you just need the pedwarn_c23 call and the tests for those 
various cases.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Дилян Палаузов
Hello Jakub,

does [[noreturn]] optimize the generated [[noreturn]] function itself, or does it 
optimize the calls to the [[noreturn]] function?  If the latter, the 
optimizations are done based on the function declaration, irrespective of the 
function body.

Greetings
  Дилян


-Original Message-
From: Jakub Jelinek 
Reply-To: Jakub Jelinek 
To: Дилян Палаузов 
Cc: Andrew Pinski , gcc-patches@gcc.gnu.org
Subject: Re: Why -Wsuggest-attribute=noreturn does not apply to static?
Date: 07/10/24 11:43:24

On Mon, Oct 07, 2024 at 10:21:05AM +0300, Дилян Палаузов wrote:
> do you mean that optimizations for [[noreturn]] functions can only be done 
> when the functions are called from other TU?

No, but if it is a static function called from within the same TU, then the
[[noreturn]] attribute doesn't help; the compiler can discover that itself
(and if it can't, it wouldn't suggest it).

> How about a function with __attribute__ ((visibility ("hidden"))) during LTO 
> linking,
> aren’t there all functions considered more or less to be in the same TU?  Do 
> the [[noreturn]] optimizations still apply?

The [[noreturn]] optimizations always apply: if you call a function proven
to be noreturn (from attributes or callee analysis), then the compiler can
optimize away code after the call etc.
The attribute can be useful for non-static but hidden functions: callers
can be optimized earlier and don't rely on LTO propagation if the user marks it
explicitly.

> What happens to a static [[noreturn]] function, if a pointer to it is passed 
> to other TU and that other TU calls the function?

Nothing.  The [[noreturn]] attribute is a function declaration attribute; it
doesn't apply to function pointers, so users can't mark a function pointer
as a pointer to a noreturn function.  An indirect call will only be optimized
if the indirection is optimized out and turned into a direct call; then
the compiler knows whether it calls a noreturn function or not.
But marking a static function [[noreturn]], even if you take a pointer
to it and pass it to other TUs, doesn't help in any way: the other TUs
will still see a function pointer which can't be marked and will not know
that it only ever calls a noreturn function.
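
To make the direct-call versus function-pointer distinction concrete, here is
a minimal sketch.  The names fail, checked_div, handler_t and checked_div_via
are made up for illustration; they are not taken from GCC or from any code in
this thread.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: it never returns normally, it always throws.
[[noreturn]] void fail(const char *msg) {
  throw std::runtime_error(msg);
}

// Direct call: the [[noreturn]] declaration is visible to the compiler, so
// the end of the function is known to be unreachable and no trailing
// return statement is needed.
int checked_div(int a, int b) {
  if (b != 0)
    return a / b;
  fail("division by zero");
}

// Indirect call: the attribute is not part of the pointer type, so as far
// as this type says, the callee may return.
using handler_t = void (*)(const char *);

int checked_div_via(handler_t on_error, int a, int b) {
  if (b == 0) {
    on_error("division by zero");
    return 0;  // still required: the callee could return normally
  }
  return a / b;
}
```

Passing fail as the handler_t argument compiles because the attribute does not
change the function's type, which is exactly why the information is lost at
the call through the pointer.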

Jakub




[committed] gcc: Remove executable permissions of testcases and *.md files

2024-10-07 Thread Jakub Jelinek
Hi!

I've noticed some files were marked as executable, as can be
seen with
find . \( -name \*.[chSC] -o -name \*.md -o -name \*.cc \) -a -perm /111 | 
xargs ls -l

This commit fixes that.

Committed as obvious.

2024-10-07  Jakub Jelinek  

gcc/
* config/riscv/vector-crypto.md: Remove executable permissions.
gcc/testsuite/
* gcc.target/aarch64/uxtl-combine-1.c: Remove executable permissions.
* gcc.target/aarch64/uxtl-combine-2.c: Likewise.
* gcc.target/aarch64/uxtl-combine-3.c: Likewise.
* gcc.target/aarch64/uxtl-combine-4.c: Likewise.
* gcc.target/aarch64/uxtl-combine-5.c: Likewise.
* gcc.target/aarch64/uxtl-combine-6.c: Likewise.
* gcc.target/gcn/complex.c: Likewise.
* gcc.target/i386/avx2-bf16-vec-absneg.c: Likewise.
* gcc.target/i386/avx512f-bf16-vec-absneg.c: Likewise.
* gcc.target/i386/pr104371-2.c: Likewise.
* gcc.target/i386/pr115146.c: Likewise.
* gcc.target/i386/vpermt2-special-bf16-shufflue.c: Likewise.
* g++.target/i386/pr107563-a.C: Likewise.
* g++.target/i386/pr107563-b.C: Likewise.

diff --git a/gcc/config/riscv/vector-crypto.md 
b/gcc/config/riscv/vector-crypto.md
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/g++.target/i386/pr107563-a.C 
b/gcc/testsuite/g++.target/i386/pr107563-a.C
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/g++.target/i386/pr107563-b.C 
b/gcc/testsuite/g++.target/i386/pr107563-b.C
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/aarch64/uxtl-combine-1.c 
b/gcc/testsuite/gcc.target/aarch64/uxtl-combine-1.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/aarch64/uxtl-combine-2.c 
b/gcc/testsuite/gcc.target/aarch64/uxtl-combine-2.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/aarch64/uxtl-combine-3.c 
b/gcc/testsuite/gcc.target/aarch64/uxtl-combine-3.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/aarch64/uxtl-combine-4.c 
b/gcc/testsuite/gcc.target/aarch64/uxtl-combine-4.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/aarch64/uxtl-combine-5.c 
b/gcc/testsuite/gcc.target/aarch64/uxtl-combine-5.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/aarch64/uxtl-combine-6.c 
b/gcc/testsuite/gcc.target/aarch64/uxtl-combine-6.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/gcn/complex.c 
b/gcc/testsuite/gcc.target/gcn/complex.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/i386/avx2-bf16-vec-absneg.c 
b/gcc/testsuite/gcc.target/i386/avx2-bf16-vec-absneg.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-bf16-vec-absneg.c 
b/gcc/testsuite/gcc.target/i386/avx512f-bf16-vec-absneg.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/i386/pr104371-2.c 
b/gcc/testsuite/gcc.target/i386/pr104371-2.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/i386/pr115146.c 
b/gcc/testsuite/gcc.target/i386/pr115146.c
old mode 100755
new mode 100644
diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c 
b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
old mode 100755
new mode 100644

Jakub



Re: [RFC PATCH] RISC-V: Implement riscv_minimal_hwprobe_feature_bits

2024-10-07 Thread Kito Cheng
I suggest not handling the extension implication rules. That would
simplify the logic and reduce the cost of the run-time checks.

Also, you need to consider situations where an extension can't be detected.

Lastly, I would like to defer this until the run-time resolver patch lands, but
you are welcome to send another version of the RFC patch :P
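
As a rough sketch of the first two points (not the patch's code): only the
extensions named explicitly are looked up, nothing implied is expanded, and an
extension without a table entry is reported to the caller.  The struct layout
follows the quoted patch below; the table contents, lookup_ext and
set_feature_bits are hypothetical, and the bit positions are placeholders
rather than the real C-API assignments.

```cpp
#include <cassert>
#include <cstring>

#define RISCV_FEATURE_BITS_LENGTH 2
struct riscv_feature_bits {
  unsigned length;
  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
};

struct ext_bitmask_entry {
  const char *ext;
  int groupid;
  int bit_position;
};

// Placeholder group ids and bit positions, for illustration only.
static const ext_bitmask_entry ext_bitmask_table[] = {
  {"v", 0, 1},
  {"zve32x", 0, 2},
};

// Returns the table entry for EXT, or nullptr if it has no feature bit.
static const ext_bitmask_entry *lookup_ext(const char *ext) {
  for (const ext_bitmask_entry &e : ext_bitmask_table)
    if (std::strcmp(e.ext, ext) == 0)
      return &e;
  return nullptr;
}

// Sets one bit per explicitly requested extension; implied extensions are
// deliberately not expanded.  Returns false if any extension is unknown.
bool set_feature_bits(const char *const *exts, int n, riscv_feature_bits *out) {
  out->length = RISCV_FEATURE_BITS_LENGTH;
  for (int i = 0; i < RISCV_FEATURE_BITS_LENGTH; ++i)
    out->features[i] = 0;
  bool all_known = true;
  for (int i = 0; i < n; ++i) {
    const ext_bitmask_entry *e = lookup_ext(exts[i]);
    if (!e) {
      all_known = false;
      continue;
    }
    out->features[e->groupid] |= 1ULL << e->bit_position;
  }
  return all_known;
}
```

With this shape, requesting only "v" never sets the Zve32x bit, so a
dispatcher built from the result would not probe for Zve32x on an older
kernel.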


On Sun, Oct 6, 2024 at 2:21 AM Yangyu Chen  wrote:
>
> This patch implements the riscv_minimal_hwprobe_feature_bits feature
> for the RISC-V target. The feature bits are defined in the previous
> patch [1] to provide bitmasks of ISA extensions that are defined in the RISC-V
> C-API. Thus, we need a function to generate the feature bits for IFUNC
> resolver to dispatch between different functions based on the hardware
> features. The final version of the target_clones support on RISC-V is
> still under development, I am working on it.
>
> The minimal feature bits are the earliest extensions that appeared in
> the Linux hwprobe and cover the given ISA string. This allows older kernels,
> which cannot probe some implied extensions, to run the FMV dispatcher
> correctly.
>
> For example, V implies Zve32x, but Zve32x has only been exposed by the Linux
> kernel since v6.11. If we use the ISA string directly to generate the FMV
> dispatcher for functions with the "arch=+v" extension, then because V implies
> Zve32x, the FMV dispatcher will check whether the Zve32x extension is supported
> by the host. If the Linux kernel is older than v6.11, the FMV dispatcher
> will fail to detect the Zve32x extension even though it is already implied by
> the V extension, and so will fail to dispatch to the correct
> function.
>
> Thus, we need to generate the minimal feature bits to cover the given
> ISA string to allow the FMV dispatcher to work correctly on older
> kernels.
>
> [1] 
> https://patchwork.sourceware.org/project/gcc/patch/20241003182256.1765569-1-chenyan...@isrc.iscas.ac.cn/
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (struct riscv_ext_bitmask_table_t): New struct.
> (riscv_minimal_hwprobe_feature_bits): New function.
> * config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H):
> (riscv_minimal_hwprobe_feature_bits): Declare the function.
> * common/config/riscv/feature_bits.h: New file.
> ---
>  gcc/common/config/riscv/feature_bits.h  |  33 ++
>  gcc/common/config/riscv/riscv-common.cc | 144 
>  gcc/config/riscv/riscv-subset.h |   4 +
>  3 files changed, 181 insertions(+)
>  create mode 100644 gcc/common/config/riscv/feature_bits.h
>
> diff --git a/gcc/common/config/riscv/feature_bits.h 
> b/gcc/common/config/riscv/feature_bits.h
> new file mode 100644
> index 000..c6c6d983edb
> --- /dev/null
> +++ b/gcc/common/config/riscv/feature_bits.h

Rename to riscv_feature_bits.h to prevent a potential file name conflict;
also move it to gcc/config/riscv/ rather than gcc/common/config/riscv/

> @@ -0,0 +1,33 @@
> +/* Definition of RISC-V feature bits corresponding to
> +   libgcc/config/riscv/feature_bits.c
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +

This needs a header guard, something like #ifndef GCC_RISCV_FEATURE_BITS_H

> +#define RISCV_FEATURE_BITS_LENGTH 2
> +struct riscv_feature_bits {
> +  unsigned length;
> +  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
> +};
> +
> +#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
> +
> +struct riscv_vendor_feature_bits {
> +  unsigned vendorID;
> +  unsigned length;
> +  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
> +};
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index bd42fd01532..9f343782ae6 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  #include 
>  #include 
> +#include 
>
>  #define INCLUDE_STRING
>  #define INCLUDE_SET
> @@ -1754,6 +1755,75 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>{NULL, NULL, 0}
>  };
>
> +/* Types for recording extension to RISC-V C-API bitmask.  */
> +struct riscv_ext_bitmask_table_t {
> +  const char *ext;
> +  int groupid;
> +  int bit_position;
> +};
> +
> +/* Mapping table between extension to RISC-V C-API extension bitmask.
> +   This table should sort 

Re: Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Xi Ruoyao
On Mon, 2024-10-07 at 14:25 +0200, Jakub Jelinek wrote:
> On Mon, Oct 07, 2024 at 03:05:56PM +0300, Дилян Палаузов wrote:
> > does [[noreturn]] optimize the generated [[noreturn]] function itself, or
> > it optimizes the calls to the [[noreturn]] function?  Hence, in the latter
> > case optimizations are done based on function declaration, irrespective of
> > function body.
> 
> Of course the latter, that is the whole point of the attribute.
> In the definition of [[noreturn]] function itself, all it can do is
> warn if the function does return anyway.

Technically it also turns the return statements in the function body
into __builtin_unreachable(), if we call this an "optimization."  In C++
it's done for non-void function even without [[noreturn]] as allowed by
the C++ standard.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Xi Ruoyao
On Mon, 2024-10-07 at 20:37 +0800, Xi Ruoyao wrote:
> On Mon, 2024-10-07 at 14:25 +0200, Jakub Jelinek wrote:
> > On Mon, Oct 07, 2024 at 03:05:56PM +0300, Дилян Палаузов wrote:
> > > does [[noreturn]] optimize the generated [[noreturn]] function itself, or
> > > it optimizes the calls to the [[noreturn]] function?  Hence, in the latter
> > > case optimizations are done based on function declaration, irrespective of
> > > function body.
> > 
> > Of course the latter, that is the whole point of the attribute.
> > In the definition of [[noreturn]] function itself, all it can do is
> > warn if the function does return anyway.
> 
> Technically it also turns the return statements in the function body
> into __builtin_unreachable(), if we call this an "optimization."  In C++
> it's done for non-void function even without [[noreturn]] as allowed by
> the C++ standard.

Sorry, for C++ I only mean the implicit return at the end of function.

Also gcc-patches isn't a correct list for discussing this.  It should be
in gcc-help instead.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] libcpp: Use constexpr for _cpp_trigraph_map initialization for C++14

2024-10-07 Thread Jakub Jelinek
Hi!

The _cpp_trigraph_map initialization used to be done for C99+ using
designated initializers, but it can't be done that way for C++ because
array designators are just an extension in C++ and don't allow skipping
any elements or going backwards.

But we can get the same effect using a C++14 constexpr constructor.
With the following patch we get rid of the runtime initialization
and the array can be in .rodata.
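
Stripped of the libcpp macros, the technique looks roughly like the sketch
below.  The names trigraph_map_s and trigraph_map_d are stand-ins for
illustration, not the identifiers used in the patch.

```cpp
#include <cassert>
#include <climits>

// The table lives in a struct whose C++14 constexpr constructor first
// zero-initializes the array and then fills selected slots in any order.
struct trigraph_map_s {
  unsigned char map[UCHAR_MAX + 1];
  constexpr trigraph_map_s() : map{} {
    map['='] = '#';
    map[')'] = ']';
    map['!'] = '|';
    map['('] = '[';
    map['\''] = '^';
    map['>'] = '}';
    map['/'] = '\\';
    map['<'] = '{';
    map['-'] = '~';
  }
};

// A constexpr object is constant-initialized, so no runtime init code is
// needed for it.
constexpr trigraph_map_s trigraph_map_d{};

// The contents are usable in constant expressions.
static_assert(trigraph_map_d.map['='] == '#', "??= maps to #");
static_assert(trigraph_map_d.map['a'] == 0, "unlisted slots stay zero");
```

Because the assignments are ordinary statements in the constructor body, they
can skip indices and appear in any order, which is what array designators in
C++ do not permit.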

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-18  Jakub Jelinek  

* internal.h (_cpp_trigraph_map_s): New type for C++14 or later.
(_cpp_trigraph_map_d): New variable for C++14 or later.
(_cpp_trigraph_map): Define to _cpp_trigraph_map_d.map for C++14 or
later.
* init.cc (init_trigraph_map): Define to nothing for C++14 or later.
(TRIGRAPH_MAP, END, s): Define differently for C++14 or later.

--- libcpp/internal.h.jj   2024-09-12 18:16:49.993409101 +0200
+++ libcpp/internal.h   2024-09-18 09:45:36.832570227 +0200
@@ -666,6 +666,12 @@ struct cpp_embed_params
compiler that supports C99.  */
 #if HAVE_DESIGNATED_INITIALIZERS
 extern const unsigned char _cpp_trigraph_map[UCHAR_MAX + 1];
+#elif __cpp_constexpr >= 201304L
+extern const struct _cpp_trigraph_map_s {
+  unsigned char map[UCHAR_MAX + 1];
+  constexpr _cpp_trigraph_map_s ();
+} _cpp_trigraph_map_d;
+#define _cpp_trigraph_map _cpp_trigraph_map_d.map
 #else
 extern unsigned char _cpp_trigraph_map[UCHAR_MAX + 1];
 #endif
--- libcpp/init.cc.jj   2024-09-13 16:09:32.701455021 +0200
+++ libcpp/init.cc  2024-09-18 09:49:43.671189585 +0200
@@ -41,8 +41,8 @@ static void read_original_directory (cpp
 static void post_options (cpp_reader *);
 
 /* If we have designated initializers (GCC >2.7) these tables can be
-   initialized, constant data.  Otherwise, they have to be filled in at
-   runtime.  */
+   initialized, constant data.  Similarly for C++14 and later.
+   Otherwise, they have to be filled in at runtime.  */
 #if HAVE_DESIGNATED_INITIALIZERS
 
 #define init_trigraph_map()  /* Nothing.  */
@@ -52,6 +52,15 @@ __extension__ const uchar _cpp_trigraph_
 #define END };
 #define s(p, v) [p] = v,
 
+#elif __cpp_constexpr >= 201304L
+
+#define init_trigraph_map()  /* Nothing.  */
+#define TRIGRAPH_MAP \
+constexpr _cpp_trigraph_map_s::_cpp_trigraph_map_s () : map {} {
+#define END } \
+constexpr _cpp_trigraph_map_s _cpp_trigraph_map_d;
+#define s(p, v) map[p] = v;
+
 #else
 
 #define TRIGRAPH_MAP uchar _cpp_trigraph_map[UCHAR_MAX + 1] = { 0 }; \

Jakub



C++ patch ping

2024-10-07 Thread Jakub Jelinek
Hi!

I'd like to ping 16 C++ patches:

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662750.html
  CWG 2867 - Order of initialization for structured bindings - rest of 
implementation [PR115769]

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661904.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661905.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661906.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662330.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662331.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662333.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662334.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662336.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662379.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662380.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662381.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662545.html
  P2552R3 - On the ignorability of standard attributes - series [PR110345]

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663390.html
  Use type_id_in_expr_sentinel in 6 further spots in the parser

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659836.html
  c++: Attempt to implement C++26 P3034R1 - Module Declarations Shouldn't be 
Macros [PR114461]

Jakub



libcpp patch ping

2024-10-07 Thread Jakub Jelinek
Hi!

I'd like to ping a few libcpp patches:

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662960.html
  - [PATCH] libcpp, genmatch: Use gcc_diag instead of printf for libcpp 
diagnostics
The genmatch side is approved; the libcpp side remains

https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664663.html
  - [PATCH] libcpp: Use constexpr for _cpp_trigraph_map initialization for C++14
Couldn't find this in gcc-patches archives, so bounced it again;
therefore not really sure if it reached at least directly CCed people or
not

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663388.html
  - [PATCH] libcpp, v2: Add -Wtrailing-whitespace= warning
Not sure about the kinds for the option, given -Wleading-whitespace=
uses plural and this option singular and -Wleading-whitespace= spaces
means literally just ' ' characters, while space in
-Wtrailing-whitespace= was ' ', '\t', '\v' and '\f'; so category;
perhaps just use any and blanks?

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663734.html
  - [PATCH] libcpp: Add -Wleading-whitespace= warning

Thanks

Jakub



Re: [PATCH 2/3] Release expanded template argument vector

2024-10-07 Thread Jason Merrill

On 10/7/24 9:58 AM, Patrick Palka wrote:

On Sat, 5 Oct 2024, Jason Merrill wrote:


On 10/4/24 11:00 AM, Patrick Palka wrote:

On Thu, 3 Oct 2024, Jason Merrill wrote:


On 10/3/24 12:38 PM, Jason Merrill wrote:

On 10/2/24 7:50 AM, Richard Biener wrote:

This reduces peak memory usage by 20% for a specific testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It's very ugly so I'd appreciate suggestions on how to handle such
situations better?


I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.


OK, apparently that was both too clever and not clever enough.  Replacing it
with this one that's much closer to yours.

Jason



From: Jason Merrill 
Date: Thu, 3 Oct 2024 16:31:00 -0400
Subject: [PATCH] c++: free garbage vec in coerce_template_parms
To: gcc-patches@gcc.gnu.org

coerce_template_parms can create two different vecs for the inner template
arguments, new_inner_args and (potentially) the result of
expand_template_argument_pack.  One or the other, or possibly both, end up
being garbage: in the typical case, the expanded vec is garbage because it's
only used as the source for convert_template_argument.  In some dependent
cases, the new vec is garbage because we decide to return the original args
instead.  In these cases, ggc_free the garbage vec to reduce the memory
overhead of overload resolution.

gcc/cp/ChangeLog:

* pt.cc (coerce_template_parms): Free garbage vecs.

Co-authored-by: Richard Biener 
---
   gcc/cp/pt.cc | 10 +-
   1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 20affcd65a2..4ceae1d38de 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms,
{
  /* We don't know how many args we have yet, just use the
 unconverted (and still packed) ones for now.  */
+ ggc_free (new_inner_args);
  new_inner_args = orig_inner_args;
  arg_idx = nargs;
  break;
@@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms,
  = make_pack_expansion (conv, complain);
   /* We don't know how many args we have yet, just
- use the unconverted ones for now.  */
+use the unconverted (but unpacked) ones for now.  */
+ ggc_free (new_inner_args);


I'm a bit worried about these ggc_frees.  If an earlier template
parameter is a constrained auto NTTP then new_inner_args/new_args could
have been captured by the satisfaction cache during coercion for that
argument, and so we'd be freeing a vector that's still live?


It seems like for e.g.

template  concept NotInt = !__is_same (T, int);
template  struct A { };
template  using B = A<'x', Ts...>;

we don't check satisfaction until after we're done coercing, because of

   if (processing_template_decl && context == adc_unify)
 /* Constraints will be checked after deduction.  */;

in do_auto_deduction.


Ah, I wonder why we pass/use adc_unify from both unify and
convert_template_argument..  That early exit makes sense for unify
but not for convert_template_argument since it prevents us from
checking constrained auto NTTPs during ahead of time coercion:

   template
   concept C = T::value;

   template
   struct A { };

   template
   void f() {
 A<0> a; // no constraint error
   }

   A<0> a; // constraint error

I guess we'd ideally want to fix/implement this?  At which point the
ggc_free's of new_inner_args would be unsafe I think..


I don't think that would be an improvement; the C constraint is an 
associated constraint of A (https://eel.is/c++draft/temp#constr.decl-3.3.1).


Why don't we check satisfaction of A<0> when parsing f?

Jason



[r15-4104 Regression] FAIL: gfortran.dg/gomp/allocate-static.f90 -Os (test for excess errors) on Linux/x86_64

2024-10-07 Thread haochen.jiang
On Linux/x86_64,

a8caeaacf499d58ba7ceabc311b7b71ca806f740 is the first bad commit
commit a8caeaacf499d58ba7ceabc311b7b71ca806f740
Author: Tobias Burnus 
Date:   Mon Oct 7 10:45:14 2024 +0200

OpenMP: Allocate directive for static vars, clean up

caused

FAIL: gfortran.dg/gomp/allocate-static.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/gomp/allocate-static.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/gomp/allocate-static.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/gomp/allocate-static.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gfortran.dg/gomp/allocate-static.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/gomp/allocate-static.f90   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-4104/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-static.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-static.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-static.f90 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-static.f90 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH 2/3] Release expanded template argument vector

2024-10-07 Thread Patrick Palka
On Mon, 7 Oct 2024, Jason Merrill wrote:

> On 10/7/24 9:58 AM, Patrick Palka wrote:
> > On Sat, 5 Oct 2024, Jason Merrill wrote:
> > 
> > > On 10/4/24 11:00 AM, Patrick Palka wrote:
> > > > On Thu, 3 Oct 2024, Jason Merrill wrote:
> > > > 
> > > > > On 10/3/24 12:38 PM, Jason Merrill wrote:
> > > > > > On 10/2/24 7:50 AM, Richard Biener wrote:
> > > > > > > This reduces peak memory usage by 20% for a specific testcase.
> > > > > > > 
> > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > > 
> > > > > > > It's very ugly so I'd appreciate suggestions on how to handle such
> > > > > > > situations better?
> > > > > > 
> > > > > > I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.
> > > > > 
> > > > > OK, apparently that was both too clever and not clever enough.
> > > > > Replacing
> > > > > it
> > > > > with this one that's much closer to yours.
> > > > > 
> > > > > Jason
> > > > 
> > > > > From: Jason Merrill 
> > > > > Date: Thu, 3 Oct 2024 16:31:00 -0400
> > > > > Subject: [PATCH] c++: free garbage vec in coerce_template_parms
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > 
> > > > > > coerce_template_parms can create two different vecs for the inner
> > > > > > template arguments, new_inner_args and (potentially) the result of
> > > > > > expand_template_argument_pack.  One or the other, or possibly both,
> > > > > > end up being garbage: in the typical case, the expanded vec is
> > > > > > garbage because it's only used as the source for
> > > > > > convert_template_argument.  In some dependent cases, the new vec is
> > > > > > garbage because we decide to return the original args instead.  In
> > > > > > these cases, ggc_free the garbage vec to reduce the memory overhead
> > > > > > of overload resolution.
> > > > > 
> > > > > gcc/cp/ChangeLog:
> > > > > 
> > > > >   * pt.cc (coerce_template_parms): Free garbage vecs.
> > > > > 
> > > > > Co-authored-by: Richard Biener 
> > > > > ---
> > > > >gcc/cp/pt.cc | 10 +-
> > > > >1 file changed, 9 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > > > index 20affcd65a2..4ceae1d38de 100644
> > > > > --- a/gcc/cp/pt.cc
> > > > > +++ b/gcc/cp/pt.cc
> > > > > @@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms,
> > > > >   {
> > > > > /* We don't know how many args we have yet, just use the
> > > > >unconverted (and still packed) ones for now.  */
> > > > > +   ggc_free (new_inner_args);
> > > > > new_inner_args = orig_inner_args;
> > > > > arg_idx = nargs;
> > > > > break;
> > > > > @@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms,
> > > > > = make_pack_expansion (conv, complain);
> > > > >/* We don't know how many args we have yet, just
> > > > > - use the unconverted ones for now.  */
> > > > > +  use the unconverted (but unpacked) ones for now.  */
> > > > > +   ggc_free (new_inner_args);
> > > > 
> > > > I'm a bit worried about these ggc_frees.  If an earlier template
> > > > parameter is a constrained auto NTTP then new_inner_args/new_args could
> > > > have been captured by the satisfaction cache during coercion for that
> > > > argument, and so we'd be freeing a vector that's still live?
> > > 
> > > It seems like for e.g.
> > > 
> > > template  concept NotInt = !__is_same (T, int);
> > > template  struct A { };
> > > template  using B = A<'x', Ts...>;
> > > 
> > > we don't check satisfaction until after we're done coercing, because of
> > > 
> > >if (processing_template_decl && context == adc_unify)
> > >  /* Constraints will be checked after deduction.  */;
> > > 
> > > in do_auto_deduction.
> > 
> > Ah, I wonder why we pass/use adc_unify from both unify and
> > convert_template_argument..  That early exit makes sense for unify
> > but not for convert_template_argument since it prevents us from
> > checking constrained auto NTTPs during ahead of time coercion:
> > 
> > >template<class T>
> > >concept C = T::value;
> > > 
> > >template<C auto>
> > >struct A { };
> > > 
> > >template<class T>
> > >void f() {
> >  A<0> a; // no constraint error
> >}
> > 
> >A<0> a; // constraint error
> > 
> > I guess we'd ideally want to fix/implement this?  At which point the
> > ggc_free's of new_inner_args would be unsafe I think..
> 
> I don't think that would be an improvement; the C constraint is an associated
> constraint of A (https://eel.is/c++draft/temp#constr.decl-3.3.1).

I thought C in 'template<C T>' is an associated constraint but not in
'template<C auto>'?  At least that's how GCC currently behaves.

> 
> Why don't we check satisfaction of A<0> when parsing f?

We do but it's considered to have no associated constraints since C is
'attached' to the auto rather than to A currently.

> 
> Jason
> 
> 



#embed patch ping

2024-10-07 Thread Jakub Jelinek
Hi!

I'd like to ping a few #embed related patches:

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657053.html
  - [PATCH] libcpp, c, middle-end: Optimize initializers using #embed in C

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658137.html
  - [PATCH] libcpp, c++: Optimize initializers using #embed in C++

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659332.html
  - [PATCH] c: Speed up compilation of large char array initializers when not 
using #embed

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659333.html
  - [PATCH] c++: Speed up compilation of large char array initializers when not 
using #embed

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659331.html
  - [PATCH] gimplify: Small RAW_DATA_CST gimplification fix
This particular one is actually approved but the prerequisites are
not, so I'm including it for completeness

Jakub



Re: [Ping*3, Patch, Fortran, 77871, v1] Allow for class typed coarray parameter as dummy [PR77871]

2024-10-07 Thread Andre Vehreschild
Hi all,

this patch somehow slipped my attention. Anyone for a review? Third time ping!

Rebased to current mainline.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre

On Wed, 18 Sep 2024 12:30:23 +0200
Andre Vehreschild  wrote:

> Hi all,
>
> back from my holidays and still no review.  PING PING!
>
> Rebased to current mainline.
>
> Regtested ok on x86_64-pc-linux-gnu / F39. Ok for mainline?
>
> Regards,
>   Andre
>
> On Wed, 21 Aug 2024 13:43:52 +0200
> Andre Vehreschild  wrote:
>
> > Hi all,
> >
> > pinging this patch for the first time.
> >
> > Rebased and regtested ok on x86_64-pc-linux-gnu / Fedora 39. Ok for
> > mainline?
> >
> > - Andre
> >
> > On Thu, 15 Aug 2024 14:39:25 +0200
> > Andre Vehreschild  wrote:
> >
> > > Hi all,
> > >
> > > attached patch fixes another regression on coarrays. This time for class
> > > typed coarrays as dummys.
> > >
> > > Regtested ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
> > >
> > > Regards,
> > >   Andre
> > > --
> > > Andre Vehreschild * Email: vehre ad gmx dot de
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
From 48e77542f0e3342c5da31ecce1b229fa3fbbdaa2 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 15 Aug 2024 13:49:49 +0200
Subject: [PATCH] [Fortran] Allow for class type coarray parameters. [PR77871]

gcc/fortran/ChangeLog:

	PR fortran/77871

	* trans-expr.cc (gfc_conv_derived_to_class): Assign token when
	converting a coarray to class.
	(gfc_get_tree_for_caf_expr): For classes get the caf decl from
	the saved descriptor.
	(gfc_get_caf_token_offset): Assert that coarray=lib is set and
	cover more cases where the tree having the coarray token can be.
	* trans-intrinsic.cc (gfc_conv_intrinsic_caf_get): Use unified
	test for pointers.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/dummy_3.f90: New test.
---
 gcc/fortran/trans-expr.cc | 36 ---
 gcc/fortran/trans-intrinsic.cc|  2 +-
 gcc/testsuite/gfortran.dg/coarray/dummy_3.f90 | 33 +
 3 files changed, 58 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/dummy_3.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 9f223a1314a..4065ea2a735 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -810,6 +810,16 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym,
   /* Now set the data field.  */
   ctree = gfc_class_data_get (var);

+  if (flag_coarray == GFC_FCOARRAY_LIB && CLASS_DATA (fsym)->attr.codimension)
+{
+  tree token;
+  tmp = gfc_get_tree_for_caf_expr (e);
+  if (POINTER_TYPE_P (TREE_TYPE (tmp)))
+	tmp = build_fold_indirect_ref (tmp);
+  gfc_get_caf_token_offset (parmse, &token, nullptr, tmp, NULL_TREE, e);
+  gfc_add_modify (&parmse->pre, gfc_conv_descriptor_token (ctree), token);
+}
+
   if (optional)
 cond_optional = gfc_conv_expr_present (e->symtree->n.sym);

@@ -2344,6 +2354,10 @@ gfc_get_tree_for_caf_expr (gfc_expr *expr)

   if (expr->symtree->n.sym->ts.type == BT_CLASS)
 {
+  if (DECL_P (caf_decl) && DECL_LANG_SPECIFIC (caf_decl)
+	  && GFC_DECL_SAVED_DESCRIPTOR (caf_decl))
+	caf_decl = GFC_DECL_SAVED_DESCRIPTOR (caf_decl);
+
   if (expr->ref && expr->ref->type == REF_ARRAY)
 	{
 	  caf_decl = gfc_class_data_get (caf_decl);
@@ -2408,16 +2422,12 @@ gfc_get_caf_token_offset (gfc_se *se, tree *token, tree *offset, tree caf_decl,
 {
   tree tmp;

+  gcc_assert (flag_coarray == GFC_FCOARRAY_LIB);
+
   /* Coarray token.  */
   if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (caf_decl)))
-{
-  gcc_assert (GFC_TYPE_ARRAY_AKIND (TREE_TYPE (caf_decl))
-		== GFC_ARRAY_ALLOCATABLE
-		  || expr->symtree->n.sym->attr.select_type_temporary
-		  || expr->symtree->n.sym->assoc);
   *token = gfc_conv_descriptor_token (caf_decl);
-}
-  else if (DECL_LANG_SPECIFIC (caf_decl)
+  else if (DECL_P (caf_decl) && DECL_LANG_SPECIFIC (caf_decl)
 	   && GFC_DECL_TOKEN (caf_decl) != NULL_TREE)
 *token = GFC_DECL_TOKEN (caf_decl);
   else
@@ -2435,7 +2445,7 @@ gfc_get_caf_token_offset (gfc_se *se, tree *token, tree *offset, tree caf_decl,
   && (GFC_TYPE_ARRAY_AKIND (TREE_TYPE (caf_decl)) == GFC_ARRAY_ALLOCATABLE
 	  || GFC_TYPE_ARRAY_AKIND (TREE_TYPE (caf_decl)) == GFC_ARRAY_POINTER))
 *offset = build_int_cst (gfc_array_index_type, 0);
-  else if (DECL_LANG_SPECIFIC (caf_decl)
+  else if (DECL_P (caf_decl) && DECL_LANG_SPECIFIC (caf_decl)
 	   && GFC_DECL_CAF_OFFSET (caf_decl) != NULL_TREE)
 *offset = GFC_DECL_CAF_OFFSET (caf_decl);
   else if (GFC_TYPE_ARRAY_CAF_OFFSET (TREE_TYPE (caf_decl)) != NULL_TREE)
@@ -2502,11 +2512,13 @@ gfc_get_caf_token_offset (gfc_se *se, tree *token, tree *offset, tree caf_decl,
 }
   else if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (caf_decl)))
 tmp = gfc_con

Re: RFC: C++ and C23 zero initialization of padding bits

2024-10-07 Thread Joseph Myers
On Sat, 28 Sep 2024, Jakub Jelinek wrote:

> I'd hope that structure assignment is element-wise copying and so doesn't
> need to preserve those bits.  What about memcpy, or *(unsigned char *),
> or for C++ std::bit_cast inspection of the bits (constexpr for C++ or not)?

In C, memcpy or *(unsigned char *) should allow the zeroed padding bits to 
be observed.

Structure assignment does not need to preserve the bits.  Assigning to any 
structure member after initialization also may change padding bits.
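
(A minimal C++ sketch of the byte-wise inspection mentioned above, for illustration only. The `Padded` type and the helper names are made up, and the layout of `Padded` is implementation-defined. memset is used to zero the object because what `{}` initialization does to padding is exactly the question under discussion.)

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical example type; on common ABIs there are padding bytes
// between `c` and `i`, but that is implementation-defined.
struct Padded {
  char c;
  int i;
};

// Count the non-zero bytes of an object representation by reading it
// through unsigned char, which is always permitted.
std::size_t count_nonzero_bytes(const void *p, std::size_t n) {
  const unsigned char *bytes = static_cast<const unsigned char *>(p);
  std::size_t count = 0;
  for (std::size_t k = 0; k < n; ++k)
    if (bytes[k] != 0)
      ++count;
  return count;
}

// Zero the whole representation (padding included), copy it out with
// memcpy, and inspect the copy: memcpy carries the padding bytes along.
std::size_t nonzero_after_memset_and_memcpy() {
  Padded s;
  std::memset(&s, 0, sizeof s);
  unsigned char copy[sizeof(Padded)];
  std::memcpy(copy, &s, sizeof s);
  return count_nonzero_bytes(copy, sizeof copy);
}
```

The sketch deliberately avoids member assignment or structure assignment after the memset, since those are the operations that need not preserve the padding bits.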

(There was a proposal for more-defined evaluation order that was discussed 
at the WG14 meeting in Strasbourg in January - which received 
along-the-lines support but didn't come back in Minneapolis - that would 
have the effect of requiring initializer evaluations and stores in 
corresponding members in a particular order, effectively making automatic 
storage duration initializers more like a sequence of assignments.  It 
didn't say anything about what the effects would be on padding bits.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v6] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-10-07 Thread Jason Merrill

On 10/7/24 11:27 AM, Simon Martin wrote:

Hi Jason,

On 17 Sep 2024, at 18:41, Jason Merrill wrote:


On 9/17/24 10:38 AM, Simon Martin wrote:

Hi Jason,

Apologies for the back and forth and thanks for your patience!


No worries.


On 5 Sep 2024, at 19:00, Jason Merrill wrote:


On 9/5/24 7:02 AM, Simon Martin wrote:

Hi Jason,

On 4 Sep 2024, at 18:09, Jason Merrill wrote:


On 9/1/24 2:51 PM, Simon Martin wrote:

Hi Jason,

On 26 Aug 2024, at 19:23, Jason Merrill wrote:


On 8/25/24 12:37 PM, Simon Martin wrote:

On 24 Aug 2024, at 23:59, Simon Martin wrote:

On 24 Aug 2024, at 15:13, Jason Merrill wrote:


On 8/23/24 12:44 PM, Simon Martin wrote:

We currently emit an incorrect -Woverloaded-virtual warning upon the
following test case

=== cut here ===
struct A {
virtual operator int() { return 42; }
virtual operator char() = 0;
};
struct B : public A {
operator char() { return 'A'; }
};
=== cut here ===

The problem is that warn_hidden relies on get_basefndecls to find the
methods in A possibly hidden by B's operator char(), and gets both the
conversion operator to int and to char. It eventually wrongly concludes
that the conversion to int is hidden.

This patch fixes this by filtering out conversion operators to different
types from the list returned by get_basefndecls.


Hmm, same_signature_p already tries to handle comparing conversion
operators, why isn't that working?


It does indeed.

However, `ovl_range (fns)` does not only contain `char B::operator()` -
for which `any_override` gets true - but also `conv_op_marker` - for
which `any_override` gets false, causing `seen_non_override` to get to
true. Because of that, we run the last loop, that will emit a warning
for all `base_fndecls` (except `char B::operator()` that has been
removed).

We could test `fndecl` and `base_fndecls[k]` against `conv_op_marker` in
the loop, but we’d still need to inspect the “converting to” type in the
last loop (for when `warn_overloaded_virtual` is 2). This would make the
code much more complex than the current patch.


Makes sense.


It would however probably be better if `get_basefndecls` only returned
the right conversion operator, not all of them. I’ll draft another
version of the patch that does that and submit it in this thread.


I have explored my suggestion further and it actually ends up more
complicated than the initial patch.


Yeah, you'd need to do lookup again for each member of fns.


Please find attached a new revision to fix the reported issue, as well
as new ones I discovered while testing with -Woverloaded-virtual=2.

It’s pretty close to the initial patch, but (1) adds a missing
“continue;” (2) fixes a location problem when -Woverloaded-virtual==2
(3) adds more test cases. The commit log is also more comprehensive, and
should describe well the various problems and why the patch is correct.



+   if (IDENTIFIER_CONV_OP_P (name)
+   && !same_type_p (DECL_CONV_FN_TYPE (fndecl),
+DECL_CONV_FN_TYPE (base_fndecls[k])))
+ {
+   base_fndecls[k] = NULL_TREE;
+   continue;
+ }


So this removes base_fndecls[k] if it doesn't return the same type as
fndecl.  But what if there's another conversion op in fns that does
return the same type as base_fndecls[k]?

If I add an operator int() to both base and derived in
Woverloaded-virt7.C, the warning disappears.


That was an issue indeed. I’ve reworked the patch, and came up with
the attached latest version. It explicitly keeps track both of
overloaded and of hidden base methods (and the “hiding method” for
the latter), and uses those instead of juggling with bools and
nullified base_decls.

On top of fixing the issue the PR reports, it fixes a few that I
came across while investigating:
- wrongly emitting the warning if the base method is not virtual (the
  lines added to Woverloaded-virt1.C would cause a warning without the
  patch)
- wrongly emitting the warning when the derived class method is a
  template, which is wrong since template members don’t override virtual
  base methods (see the change in pr61945.C)


This change seems wrong to me; the warning is documented as "Warn when
a function declaration hides virtual functions from a base class," and
templates can certainly hide virtual base methods, as indeed they do
in that testcase.

Gasp, you’re right. The updated patch fixes this by simply working
from the TEMPLATE_TEMPLATE_RESULT of TEMPLATE_DECL; so pr61945.C warns
again (after changing the signature so that it actually hides the base
class; it was not before, hence the warning was actually incorrect).


It was hiding the base function before, the warning was correct;
hiding is based on name, not signature.  Only overriding depends on
the signature.
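
A small standalone illustration of that distinction (hypothetical `Base`/`Derived` types, not taken from the testsuite): `Derived::f` has a different signature, so it hides `Base::f` by name without overriding it, while `Derived::g` matches the signature and overrides.

```cpp
struct Base {
  virtual ~Base() = default;
  virtual int f(int) { return 1; }
  virtual int g(int) { return 10; }
};

struct Derived : Base {
  // Same name, different parameter type: hides Base::f in Derived's
  // scope but does not override it.
  int f(double) { return 2; }
  // Same name and signature: overrides Base::g.
  int g(int) override { return 20; }
};

// Name lookup in Derived finds only Derived::f(double); the int argument
// is converted to double and Base::f(int) is never considered.
int call_f_on_derived() {
  Derived d;
  return d.f(1);
}

// Through a Base reference, lookup finds Base::f(int); since nothing in
// Derived overrides it, virtual dispatch still ends up in Base::f.
int call_f_via_base() {
  Derived d;
  Base &b = d;
  return b.f(1);
}

// g is overridden, so virtual dispatch reaches Derived::g.
int call_g_via_base() {
  Derived d;
  Base &b = d;
  return b.g(1);
}
```

With -Woverloaded-virtual, the declaration of `Derived::f` is the kind of hiding the warning is documented to report.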

Indeed. The mistake in the last patch was to assume that
same_sig

Ping #2: [PATCH] PR 99293: Optimize splat of a V2DF/V2DI extract with constant element

2024-10-07 Thread Michael Meissner
This patch seems to have been overlooked.

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663101.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [Fortran, Patch, PR51815, v3] Fix parsing of substring refs in coarrays.

2024-10-07 Thread Harald Anlauf

Hi Andre,

On 10/7/24 11:04, Andre Vehreschild wrote:

Hi Harald,

thank you for your input. I still have some small nits to discuss to make
everyone happy. Therefore:


this seems to go into the right direction - except that I am not a
great fan of gfc_error_now, as that tries to paper over deficiencies
in error recovery.


Me neither, but when I remove the gfc_error_now() and only do


  if (gfc_peek_ascii_char () == '(')
    return MATCH_ERROR;


as you proposed, then no error is given for:

character(:), allocatable :: x[:]
character(:), allocatable :: c
c = x(:)(2:5)

I.e. nothing at all.


hmmm, without the hunk in question I do get:

4 |   c = x(:)(2:5)
  |   1
Error: Unclassifiable statement at (1)


which is the same when doing a return MATCH_ERROR;

When I simply use:

  if (gfc_peek_ascii_char () == '(')
    {
      gfc_error ("Unexpected array/substring ref at %C");
      return MATCH_ERROR;
    }

this already generates:

4 |   c = x(:)(2:5)
  |   1
Error: Unexpected array/substring ref at (1)


Therefore at the moment I prefer to stick to the initial solution with the
gfc_error_now, which not only gives an error in the associate, but also when
one just does an array/substring-ref outside of parentheses. And I like the
new error message, because I consider it more helpful than just a syntax error
or the invalid association target message. What do you think?


The motivation for my asking is based on the following naive thinking
(assuming that x is of type character):

x(:)(2:5)    ! could be a rank mismatch when x is an array
x[1](:)(2:5) ! is always a syntax error
x(:)[1](2:5) ! could be diagnosed as a rank mismatch

That is of course wishful thinking on my side.  No compiler
matches this completely, and diagnosing a syntax error is
certainly acceptable behavior.  (Some other brand shows funny
diagnostics coming likely from the resolution phase).


Is there a reason that you do not check the return value of
gfc_match_array_ref?


What am I to do with the result? We are in an error case independent of the
result of gfc_match_array_ref. The intention of using that routine here was to
digest the unexpected input and allow for (easier|better) error recovery.


Do you have an example that shows the use of gfc_match_array_ref here?
Commenting it out doesn't seem to make a difference in the error case
here, unless I missed something.

Maybe I should just put a comment on it, to make it more clear. Or is there
another way to help the parser recover from an error?


Well, I am not the expert to answer that.  Without gfc_error_now,
we're more likely seeing errors coming from the parsing of the
associate, and here I would point to Paul as the one with the most
experience.  I would hope that the parsing of associate would see
if an error was issued for the associate target and allow that error
to be emitted.


Sorry for the additional round. But this error has been around for so long,
that it doesn't matter, if we need another day to come up with a solution.


Indeed!  :-)


Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?


I am fine with your solution.  Diagnostics can be improved later
any time...


Regards,
Andre


Thanks for your patience!

Harald




Indeed your suggestion (or the shortened version above) improves
the diagnostics ("user experience") also for this variant:

subroutine foo
 character(:), allocatable :: x[:]
 character(:), dimension(:), allocatable :: c[:]
 type t
character(:), allocatable :: x[:]
character(:), dimension(:), allocatable :: c[:]
 end type t
 type(t) :: z
 associate (y => x(:)(2:))
 end associate
 associate (a => c(:)(:)(2:))
 end associate
 associate (y => z%x(:)(2:))
 end associate
 associate (a => z%c(:)(:)(2:))
 end associate
end

with several error messages of the kind

Error: Invalid association target at (1)

or

Error: Rank mismatch in array reference at (1) (1/0)

looking less technical than a parsing error.
I think this is as good as it can be.

So OK from my side with either your additional patch or my
shortened version.

Thanks for the patch!

Harald



Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Is this ok?

Regards and thanks for the review,
Andre

On Tue, 1 Oct 2024 23:31:11 +0200
Harald Anlauf  wrote:


Hi Andre,

Am 01.10.24 um 09:43 schrieb Andre Vehreschild:

Hi all,

this rather old PR reported a parsing bug, when a coarray'ed character
substring ref is to be parsed, aka CHARACTER(:) :: str[:] ... str(2:5). In
this case the parser confused the substring ref with an array-ref, because
an array_spec was present. This patch fixes this by requesting only
coarray parsing from gfc_match_array_ref when no regular dimension is
present. The patch is not involved when an array of coarray'ed strings is
parsed (that worked beforehand).


while the patch address

Re: [PATCH] c++: Avoid "infinite parsing" because of cp_parser_decltype [PR114858]

2024-10-07 Thread Simon Martin
Hi Jason,

On 30 Sep 2024, at 20:56, Jason Merrill wrote:

> On 9/17/24 8:14 AM, Simon Martin wrote:
>> The invalid test case in this PR highlights a bad interaction between
>> the tentative_firewall and error recovery in cp_parser_decltype: the
>> firewall makes cp_parser_skip_to_closing_parenthesis a no-op, and the
>> parser does not make any progress, running "forever".
>>
>> This patch calls cp_parser_commit_to_tentative_parse before 
>> initiating
>> error recovery.
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/114858
>>
>> gcc/cp/ChangeLog:
>>
>>  * parser.cc (cp_parser_decltype): Commit tentative parse before
>>  initiating error recovery.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/cpp0x/decltype10.C: Adjust test expectation.
>>  * g++.dg/cpp2a/pr114858.C: New test.
>> ---
>>   gcc/cp/parser.cc|  3 +++
>>   gcc/testsuite/g++.dg/cpp0x/decltype10.C |  2 ++
>>   gcc/testsuite/g++.dg/cpp2a/pr114858.C   | 25 
>> +
>>   3 files changed, 30 insertions(+)
>>   create mode 100644 gcc/testsuite/g++.dg/cpp2a/pr114858.C
>>
>> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
>> index 4dd9474cf60..3a7c5ffe4c8 100644
>> --- a/gcc/cp/parser.cc
>> +++ b/gcc/cp/parser.cc
>> @@ -17508,6 +17508,9 @@ cp_parser_decltype (cp_parser *parser)
>> /* Parse to the closing `)'.  */
>> if (expr == error_mark_node || !parens.require_close (parser))
>>   {
>> +  /* Commit to the tentative_firewall so we actually skip to the 
>> closing
>> + parenthesis.  */
>> +  cp_parser_commit_to_tentative_parse (parser);
>
> I don't think this is right.
>
> Earlier in cp_parser_decltype I see
>
>>   /* If in_declarator_p, a reparse as an expression might succeed (60361).
>>      Otherwise, commit now for better diagnostics.  */
>>   if (cp_parser_uncommitted_to_tentative_parse_p (parser)
>>   && !parser->in_declarator_p)
>> cp_parser_commit_to_topmost_tentative_parse (parser);
>
> Here we're in a declarator, so we didn't commit at that point.  And we 
> still don't want to commit if parsing fails; as the comment says, when 
> reparsing as an expression-statement it might work.  Though there 
> seems not to be a testcase for that...
Right, understood. I’ll see if I can come up with something in a 
follow-up patch.
>
> In trying to come up with a testcase, I wrote this one that already
> fails because the error doesn't happen until after the decltype, so we
> memorize the wrong result:
>
> struct Helper { Helper(int, ...); };
> template  struct C;
> template<> struct C {};
> char A = 1;
> Helper testFail(int(A), C{}); // { dg-bogus "C" }
>
> So in the long term we need to overhaul this code to handle reparsing 
> even without a syntax error.  But it's not a high priority.
Nice; I’ll open a PR for that, and try to take a stab at it.
>
> Getting back to your patch, I think the problem is in 
> cp_parser_simple_type_specifier:
>
>> case RID_DECLTYPE:
>>   /* Since DR 743, decltype can either be a simple-type-specifier by
>>      itself or begin a nested-name-specifier.  Parsing it will replace
>>      it with a CPP_DECLTYPE, so just rewind and let the CPP_DECLTYPE
>>      handling below decide what to do.  */
>>   cp_parser_decltype (parser);
>>   cp_lexer_set_token_position (parser->lexer, token);
>>   break;
>
> This assumes that cp_parser_decltype will always succeed, which is 
> wrong.  We need to check whether the token actually became 
> CPP_DECLTYPE and parser_error if not.
We should definitely do this. Note however that it’s not sufficient to
fix the test case: compilation time still explodes because we
constantly re-parse “brain dead” template ids (unterminated
parameter list). The attached patch fixes both issues, and has been
successfully tested on x86_64-pc-linux-gnu. Ok for trunk?

Thanks, Simon

From 61fadd961442b919281c865d0db90fb4ed6265aa Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Mon, 16 Sep 2024 10:14:46 +0200
Subject: [PATCH] c++: Avoid "infinite parsing" because of cp_parser_decltype 
[PR114858]

The invalid test case in this PR highlights two deficiencies:
 1. cp_parser_simple_type_specifier assumes that parsing a decltype
always succeeds
 2. cp_parser_template_id does not turn the sequence of tokens into a
CPP_TEMPLATE_ID even in case we're sure that no reparse will ever
succeed (e.g. the template parameter list is not terminated).

So for each "decltype level" in the test case, we try to parse the
decltype expression as an id-expression, then a postfix-expression, then
an expression, failing every time, and the compilation time explodes.

This patch addresses issue #1 by checking whether the decltype token
was

Re: [r15-4104 Regression] FAIL: gfortran.dg/gomp/allocate-static.f90 -Os (test for excess errors) on Linux/x86_64

2024-10-07 Thread Thomas Schwinge
Hi Tobias!

On 2024-10-07T17:07:05+0200, Tobias Burnus  wrote:
> haochen.jiang wrote:
>> On Linux/x86_64,
>> FAIL: gfortran.dg/gomp/allocate-static.f90   -O0  (test for excess errors)
>
> If anyone can reproduce this, I would be interested in the excess errors.

gfortran: fatal error: cannot read spec file 'libgomp.spec': No such file 
or directory

> On two machines – with and without offloading configured – I cannot 
> reproduce this neither with a bootstrap nor non-bootstrap build, 
> neither with the testsuite nor under valgrind and also not with -m32 vs. 
> -m64.

Try again with build-tree (non-installed) testing.  ;-)

On 2024-10-07T10:47:56+0200, Tobias Burnus  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-static.f90
> @@ -0,0 +1,62 @@
> +! { dg-do run }

Implicit linking here.

I already was about to 'git mv' the file into
'libgomp/testsuite/libgomp.fortran/' -- but then realized that we
probably also should get rid of this local 'module omp_lib_kinds':

> +module omp_lib_kinds
> +  use iso_c_binding, only: c_int, c_intptr_t
> +  implicit none
> +  private :: c_int, c_intptr_t
> +  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
> +
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_null_allocator = 0
> +  [...]
> +end module

..., right?

> +[...]


Grüße
 Thomas


[PATCH] libstdc++: Implement P0849R8 auto(x) library changes

2024-10-07 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk only?
This doesn't seem worth backporting since there should be no
behavior change.

-- >8 --

This implements the library changes in P0849R8 "auto(x): decay-copy
in the language" which consist of replacing most uses of the
exposition-only function decay-copy with auto(x) throughout the library
wording.

Note the main difference between the two is that decay-copy materializes
its argument whereas auto(x) doesn't, and so the latter is a no-op when
its argument is a prvalue.  Effectively the former could introduce an
unnecessary move constructor call in some contexts.  In C++20 and earlier
we could emulate auto(x) with decay_t<decltype((x))>(x).
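
To make the difference concrete, here is a small standalone sketch (not part of the patch; `Counted`, `decay_copy_sketch` and the other names are made up for illustration) that counts move constructor calls for a prvalue argument. It assumes C++17 guaranteed copy elision and uses a functional cast to the decayed type as a stand-in for auto(x).

```cpp
#include <type_traits>
#include <utility>

// Global counter of move constructor calls.
int g_moves = 0;

struct Counted {
  Counted() = default;
  Counted(const Counted &) = default;
  Counted(Counted &&) noexcept { ++g_moves; }
};

// Stand-in for the exposition-only decay-copy: it takes a forwarding
// reference, so a prvalue argument has to be materialized first and is
// then moved into the return value.
template <typename T>
std::decay_t<T> decay_copy_sketch(T &&t) {
  return std::forward<T>(t);
}

Counted make_counted() { return Counted{}; }

// One move: the materialized temporary is moved into the result.
int moves_with_decay_copy() {
  g_moves = 0;
  Counted c = decay_copy_sketch(make_counted());
  (void)c;
  return g_moves;
}

// No move: a functional cast of a prvalue to its own type is a no-op
// under C++17 guaranteed elision, which is the behavior auto(x) has.
int moves_with_cast() {
  g_moves = 0;
  Counted c = std::decay_t<Counted>(make_counted());
  (void)c;
  return g_moves;
}
```

For an lvalue argument both forms make a copy, so the two only differ when the operand is already a prvalue.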

After this paper the only remaining uses of decay-copy in the library
are in the specification of some range adaptors.  In our implementation
of those range adaptors I believe decay-copy is already implied which
is why we don't mirror the wording and use __decay_copy explicitly.  So
since it's apparently no longer needed this patch goes ahead and removes
__decay_copy.

libstdc++-v3/ChangeLog:

* c++config (_GLIBCXX_AUTO_CAST): Define.
* include/bits/iterator_concepts.h (_Decay_copy, __decay_copy):
Remove.
(__member_begin, __adl_begin): Use _GLIBCXX_AUTO_CAST instead of
__decay_copy as per P0849R8.
* include/bits/ranges_base.h (_Begin): Likewise.
(__member_end, __adl_end, _End): Likewise.
(__member_rbegin, __adl_rbegin, _RBegin): Likewise.
(__member_rend, __adl_rend, _Rend): Likewise.
(__member_size, __adl_size, _Size): Likewise.
(_Data): Likewise.
---
 libstdc++-v3/include/bits/c++config   |  6 +++
 libstdc++-v3/include/bits/iterator_concepts.h | 13 +-
 libstdc++-v3/include/bits/ranges_base.h   | 40 +--
 3 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 29d795f687c..fdbf90e28fc 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -265,6 +265,12 @@
 #define _GLIBCXX_NOEXCEPT_QUAL
 #endif
 
+#if __cpp_auto_cast
+# define _GLIBCXX_AUTO_CAST(X) auto(X)
+#else
+# define _GLIBCXX_AUTO_CAST(X) ::std::__decay_t<decltype((X))>(X)
+#endif
+
 // Macro for extern template, ie controlling template linkage via use
 // of extern keyword on template declaration. As documented in the g++
 // manual, it inhibits all implicit instantiations and is used
diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h
index 490a362cdf1..0fcfed56737 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -1003,19 +1003,10 @@ namespace ranges
   {
 using std::__detail::__class_or_enum;
 
-struct _Decay_copy final
-{
-  template
-   constexpr decay_t<_Tp>
-   operator()(_Tp&& __t) const
-   noexcept(is_nothrow_convertible_v<_Tp, decay_t<_Tp>>)
-   { return std::forward<_Tp>(__t); }
-} inline constexpr __decay_copy{};
-
 template
   concept __member_begin = requires(_Tp& __t)
{
- { __decay_copy(__t.begin()) } -> input_or_output_iterator;
+ { _GLIBCXX_AUTO_CAST(__t.begin()) } -> input_or_output_iterator;
};
 
 // Poison pill so that unqualified lookup doesn't find std::begin.
@@ -1025,7 +1016,7 @@ namespace ranges
   concept __adl_begin = __class_or_enum>
&& requires(_Tp& __t)
{
- { __decay_copy(begin(__t)) } -> input_or_output_iterator;
+ { _GLIBCXX_AUTO_CAST(begin(__t)) } -> input_or_output_iterator;
};
 
 // Simplified version of std::ranges::begin that only supports lvalues,
diff --git a/libstdc++-v3/include/bits/ranges_base.h b/libstdc++-v3/include/bits/ranges_base.h
index cb2eba1f841..80ff1e300ce 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -115,9 +115,9 @@ namespace ranges
  if constexpr (is_array_v>)
return true;
  else if constexpr (__member_begin<_Tp>)
-   return noexcept(__decay_copy(std::declval<_Tp&>().begin()));
+   return noexcept(_GLIBCXX_AUTO_CAST(std::declval<_Tp&>().begin()));
  else
-   return noexcept(__decay_copy(begin(std::declval<_Tp&>())));
+   return noexcept(_GLIBCXX_AUTO_CAST(begin(std::declval<_Tp&>())));
}
 
 public:
@@ -142,7 +142,7 @@ namespace ranges
 template
   concept __member_end = requires(_Tp& __t)
{
- { __decay_copy(__t.end()) } -> sentinel_for<__range_iter_t<_Tp>>;
+ { _GLIBCXX_AUTO_CAST(__t.end()) } -> 
sentinel_for<__range_iter_t<_Tp>>;
};
 
 // Poison pill so that unqualified lookup doesn't find std::end.
@@ -152,7 +152,7 @@ namespace ranges
   concept __adl_end = __class_or_enum>
&& requires(_Tp& __t)
{
- { __decay_copy(end(__t)) } -> s

Re: [PATCH v13 0/4] c: Add __lengthof__ operator

2024-10-07 Thread Alejandro Colomar
Hi Joseph,

On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> wait for them to be reviewed so that we only have a single-patch series, 
> before doing final review of the main patch.

I do not fully understand.  Who has to review patches 1,2,3?  Also, do
you want to merge them, then I resend patch 4 as a single patch, and
then you review that one?  If so, that looks like a good plan to me.

Thanks!

> Since the feature was accepted as _Lengthof, that's the form that should 
> be added to GCC; no __lengthof__ variant needed.

Okay, I'm indifferent between those two names, since they
are equally harmful.  ;)

On the other hand, I'm tempted to propose a different name, and force
ISO to reconsider and follow.  The discussion on this list was more
thorough than the short discussion at WG14, which didn't really take
into consideration the dangers of harmful and error-prone nomenclature.

> In general in GCC, 
> although not strictly required by the standard in this case, we use 
> pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a 
> new C2Y feature that's not in C23

Thanks; will do.

Have a lovely night!
Alex

> (if -pedantic with a pre-C2Y standard, 
> or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to 
> verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 
> -pedantic, no diagnostic with -std=c23 -pedantic-errors 
> -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning 
> with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23 handles 
> that logic, you just need the pedwarn_c23 call and the tests for those 
> various cases.)
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 


Re: [PATCH v13 0/4] c: Add __lengthof__ operator

2024-10-07 Thread Alejandro Colomar
On Tue, Oct 08, 2024 at 02:04:39AM GMT, Alejandro Colomar wrote:
> Hi Joseph,
> 
> On Mon, Oct 07, 2024 at 05:35:16PM GMT, Joseph Myers wrote:
> > Patches 1, 2 and 3 are logically nothing to do with this feature.  I'll 
> > wait for them to be reviewed so that we only have a single-patch series, 
> > before doing final review of the main patch.
> 
> I do not fully understand.  Who has to review patches 1,2,3?  Also, do
> you want to merge them, then I resend patch 4 as a single patch, and
> then you review that one?  If so, that looks like a good plan to me.
> 
> Thanks!
> 
> > Since the feature was accepted as _Lengthof, that's the form that should 
> > be added to GCC; no __lengthof__ variant needed.
> 
> Okay, I'm indifferent between those two names, since they
> are equally harmful.  ;)
> 
> On the other hand, I'm tempted to propose a different name, and force
> ISO to reconsider and follow.  The discussion on this list was more
> thorough than the short discussion at WG14, which didn't really take
> into consideration the dangers of harmful and error-prone nomenclature.
> 
> > In general in GCC, 
> > although not strictly required by the standard in this case, we use 
> > pedwarn_c23 (pass OPT_Wpedantic as the option) to diagnose the use of a 
> > new C2Y feature that's not in C23
> 
> Thanks; will do.

On the other hand, should we provide a version of the operator that is
free from pedantic warnings?  A GNU extension?

Cheers,
Alex

> 
> Have a lovely night!
> Alex
> 
> > (if -pedantic with a pre-C2Y standard, 
> > or -Wc23-c2y-compat even in C2Y mode), with appropriate testcases to 
> > verify this (error with -std=c23 -pedantic-errors, warning with -std=c23 
> > -pedantic, no diagnostic with -std=c23 -pedantic-errors 
> > -Wno-c23-c2y-compat, no diagnostic with -std=c2y -pedantic-errors, warning 
> > with -std=c2y -pedantic-errors -Wc23-c2y-compat).  (pedwarn_c23 handles 
> > that logic, you just need the pedwarn_c23 call and the tests for those 
> > various cases.)
> > 
> > -- 
> > Joseph S. Myers
> > josmy...@redhat.com
> > 
> 
> -- 
> 




nvptx: Disable effective-target 'freestanding' (was: [PATCH 3/9] nvptx: Re-enable test cases by removing effective target 'freestanding')

2024-10-07 Thread Thomas Schwinge
Hi!

On 2022-12-02T13:03:09+0100, I wrote:
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp

> -# Check to see if a target is "freestanding". This is as per the definition
> -# in Section 4 of C99 standard. Effectively, it is a target which supports no
> -# extra headers or libraries other than what is considered essential.
> -proc check_effective_target_freestanding { } {
> -if { [istarget nvptx-*-*] } {
> -   return 1
> -}
> -return 0
> -}

I have, for now, pushed a simpler variant of this to trunk branch in
commit 65c7616c251a6697134b2a3ac7fe6460d308d2ed
"nvptx: Disable effective-target 'freestanding'", see attached.


Grüße
 Thomas


From 65c7616c251a6697134b2a3ac7fe6460d308d2ed Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 28 Nov 2022 13:49:06 +0100
Subject: [PATCH] nvptx: Disable effective-target 'freestanding'

After 2014's commit 157e859ffe3b5d43db1e19475711c1a3d21ab57a "remove picochip",
the effective-target 'freestanding' (later) was only ever used for nvptx.
However, the relevant I/O library functions have long been implemented in nvptx
newlib.

These test cases generally PASS, just a few need to get XFAILed; see
,
and then supposedly
 for
description of the non-standard PTX 'vprintf' return value:

> Unlike the C-standard 'printf()', which returns the number of characters
> printed, CUDA's 'printf()' returns the number of arguments parsed. If no
> arguments follow the format string, 0 is returned. If the format string is
> NULL, -1 is returned. If an internal error occurs, -2 is returned.

(I've tried a few variants to confirm that PTX 'vprintf' -- which supposedly is
underlying the CUDA 'printf' -- is what's implementing this behavior.)
Probably, we ought to fix that up in nvptx newlib.
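
As a minimal sketch of the return-value contract in question (not taken from
the test cases themselves): the C standard library reports the number of
characters produced. The helper name `formatted_length` is made up for
illustration, and `std::snprintf` is used instead of `printf` only so that
nothing is written to stdout.

```cpp
#include <cstdio>

// Hypothetical helper, for illustration only: formats "<text> <value>"
// into a scratch buffer and returns what the C standard library reports,
// i.e. the number of characters produced (excluding the terminating NUL).
inline int formatted_length(const char *text, int value)
{
  char buf[64];
  return std::snprintf(buf, sizeof buf, "%s %d", text, value);
}
```

With a conforming C library, `formatted_length("hello", 42)` yields 8 (the
characters of `hello 42`), whereas an implementation following the CUDA
convention quoted above would report the number of arguments parsed instead.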

	gcc/testsuite/
	* gcc.c-torture/execute/printf-1.c: XFAIL for nvptx.
	* gcc.c-torture/execute/printf-chk-1.c: Likewise.
	* gcc.c-torture/execute/vprintf-1.c: Likewise.
	* gcc.c-torture/execute/vprintf-chk-1.c: Likewise.
	* lib/target-supports.exp (check_effective_target_freestanding):
	Disable for nvptx.
---
 gcc/testsuite/gcc.c-torture/execute/printf-1.c  | 1 +
 gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c  | 1 +
 gcc/testsuite/gcc.c-torture/execute/vprintf-1.c | 1 +
 gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c | 1 +
 gcc/testsuite/lib/target-supports.exp   | 3 ---
 5 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.c-torture/execute/printf-1.c b/gcc/testsuite/gcc.c-torture/execute/printf-1.c
index 654e62766a8..e1201365c1f 100644
--- a/gcc/testsuite/gcc.c-torture/execute/printf-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/printf-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c b/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c
index aab43062bae..6418957edae 100644
--- a/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c b/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c
index 259397ebda3..0fb1ade94e0 100644
--- a/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #ifndef test
 #include 
diff --git a/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c b/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c
index 04ecc4df4d9..7ea3617e184 100644
--- a/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #ifndef test
 #include 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 459af8e58c6..1c9bbf64817 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -859,9 +859,6 @@ proc check_profiling_available { test_what } {
 # in Section 4 of C99 standard. Effectively, it is a target which supports no
 # extra headers or libraries other than what is considered essential.
 proc check_effective_target_freestanding { } {
-if { [istarget nvptx-*-*] } {
-	return 1
-}
 return 0
 }
 
-- 
2.34.1



Re: [PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]

2024-10-07 Thread Richard Biener
On Mon, 7 Oct 2024, Jakub Jelinek wrote:

> On Mon, Oct 07, 2024 at 08:59:56AM +0200, Richard Biener wrote:
> > The forwprop added optimization looks like it would match PHI-opt better,
> > but I'm fine with leaving it in forwprop.  I do wonder whether instead
> > of adding a flag adding the actual values wanted as argument to
> > .SPACESHIP would allow further optimizations (maybe also indicating
> > cases "ignored")?  Are those -1, 0, 1, 2 standard mandated values
> > or implementation defined (or part of the platform ABI)?
> 
> They are implementation defined, -1, 0, 1, 2 is defined by libstdc++:
> using type = signed char;
> enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
> enum class _Ncmp : type { _Unordered = 2 };
> https://eel.is/c++draft/cmp#categories.pre-1 documents them as
> enum class ord { equal = 0, equivalent = equal, less = -1, greater = 1 }; // exposition only
> enum class ncmp { unordered = -127 }; // exposition only
> and now looking at it, LLVM's libc++ takes that literally and uses
> -1, 0, 1, -127.  One can't use the <=> operator without including <compare>,
> which provides the enums, so I think if all we care about is libstdc++,
> then just hardcoding -1, 0, 1, 2 is fine; if we want to also optimize
> libc++ when used with gcc, we could support -1, 0, 1, -127 as another
> option.
> Supporting arbitrary 4 values doesn't make sense; at least on x86 the
> only reason to do the conversion to int in an optab is a good sequence
> to turn the flag comparisons to -1, 0, 1.  So, either we do nothing
> more than the patch, or handle both 2 and -127 for unordered,
> or add support for an arbitrary value for the unordered case except
> -1, 0, 1 (then -1 could mean signed int, 1 unsigned int, 0 do the jumps,
> and any other value what should be returned for unordered).

I see.

Thanks for the clarification.  The patch OK still holds.

Richard.


Re: arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]

2024-10-07 Thread Christophe Lyon
Hi All,

FWIW, the previous patch (gcc-15-4066-g7766a2c1eb6) broke bootstrap on
arm-linux-gnueabihf. (reported via
https://linaro.atlassian.net/browse/GNU-1364)

Christophe

On Mon, 7 Oct 2024 at 10:10, Torbjorn SVENSSON
 wrote:
>
> Hello Andre,
>
> Compared to a run without any of the 2 patches for PR 116444, I get this
> diff:
>
> --- base/m55hard/analysis.gcc 2024-09-18 09:07:18.879493251 +
> +++ pr116444/m55hard/analysis.gcc 2024-10-05 11:44:05.261683071 +
> +FAIL: gcc.target/arm/attr_thumb.c scan-assembler ite
> -FAIL: gcc.target/arm/max-insns-skipped.c object-size text <= 40
> -FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler asreq
> -FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler lslne
> +FAIL: gcc.target/arm/vseleqdf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vseleqsf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselgedf.c scan-assembler-times vselge.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselgesf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselgtdf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselgtsf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselledf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vsellesf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselltdf.c scan-assembler-times vselge.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselltsf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselnedf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselnesf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselvcdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselvcsf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselvsdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselvssf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
> --- base/m55soft/analysis.gcc 2024-09-18 09:07:19.199493246 +
> +++ pr116444/m55soft/analysis.gcc 2024-10-05 11:44:07.533683037 +
> +FAIL: gcc.target/arm/attr_thumb.c scan-assembler ite
> -FAIL: gcc.target/arm/max-insns-skipped.c object-size text <= 40
> -FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler asreq
> -FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler lslne
> +FAIL: gcc.target/arm/vseleqdf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vseleqsf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselgedf.c scan-assembler-times vselge.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselgesf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselgtdf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselgtsf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselledf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vsellesf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselltdf.c scan-assembler-times vselge.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselltsf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselnedf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselnesf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselvcdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselvcsf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselvsdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselvssf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
> --- base/m85hard/analysis.gcc 2024-09-18 09:07:18.035493264 +
> +++ pr116444/m85hard/analysis.gcc 2024-10-05 11:44:04.289683085 +
> +FAIL: gcc.target/arm/attr_thumb.c scan-assembler ite
> -FAIL: gcc.target/arm/max-insns-skipped.c object-size text <= 40
> -FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler asreq
> -FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler lslne
> +FAIL: gcc.target/arm/vseleqdf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vseleqsf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselgedf.c scan-assembler-times vselge.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselgesf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselgtdf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselgtsf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselledf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vsellesf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselltdf.c scan-assembler-times vselge.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselltsf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselnedf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
> +FAIL: gcc.target/arm/vselnesf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
> +FAIL: gcc.target/arm/vselvcdf.c sc

Re: [PING] [PATCH] i386: Implement Thread Local Storage on Windows

2024-10-07 Thread Eric Botcazou
> I'm not quite sure what you mean by a testcase, but when compiling gcc
> itself, when libgomp/libgcc (Can't remember which) is being compiled, gcc
> will spit out invalid assembly that looks something like
> 
> movabsq $8+__gcov_indirect_call@secrel32, %rax

OK, I can reproduce this at -O0:

_gcov_indirect_call_profiler_v4.s: Assembler messages:
_gcov_indirect_call_profiler_v4.s:288: Error: 4-byte relocation cannot be applied to 8-byte field

but not at -O1 or -O2.
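
For reference, source of roughly the following shape is the kind of thing that
leads to a `symbol+offset` TLS reference like the one above. This is an
unverified sketch, not a confirmed reproducer: the names `call_tuple` and
`current_call` are made up, and the two-pointer layout is only assumed to
resemble that of `__gcov_indirect_call`.

```cpp
// Hypothetical stand-in for a thread-local two-pointer record; the second
// member sits at offset 8 on a 64-bit target, which is where an operand
// of the form $8+symbol@secrel32 would come from.
struct call_tuple
{
  void *callee;
  long long *counters;
};

thread_local call_tuple current_call;

// Stores into and reads back the member at the non-zero offset.
inline long long *set_and_get_counters(long long *p)
{
  current_call.counters = p;
  return current_call.counters;
}
```

Compiling something like this at -O0 and inspecting the generated assembly for
the access to `current_call.counters` would be one way to check whether the
offset is folded into the relocation.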

-- 
Eric Botcazou




Re: [PATCH v6] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-07 Thread Kito Cheng
Could you implement the latest API defined in the doc?

struct {
unsigned length;
unsigned long long features[];
} __riscv_feature_bits;

struct {
unsigned length;
unsigned long long features[];
} __riscv_vendor_feature_bits;

struct {
unsigned mvendorid;
unsigned marchid;
unsigned mimpid;
} __riscv_cpu_model;
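
A minimal sketch of how a caller might test one extension against such a
layout, written in C++ for illustration. The struct `feature_bits` is a local
stand-in and not the real libgcc symbol, the helper `has_feature` is
hypothetical, and the Zba/Zbb group IDs and bitmasks are the ones that appear
in the quoted patch below.

```cpp
// Local stand-in for the proposed layout, NOT the real libgcc symbol.
// A fixed capacity of 2 replaces the flexible array member, which is
// not valid standard C++.
constexpr unsigned kCapacity = 2;

struct feature_bits
{
  unsigned length;
  unsigned long long features[kCapacity];
};

// Group/bit assignments copied from the quoted patch.
constexpr unsigned ZBA_GROUPID = 0;
constexpr unsigned long long ZBA_BITMASK = 1ULL << 27;
constexpr unsigned ZBB_GROUPID = 0;
constexpr unsigned long long ZBB_BITMASK = 1ULL << 28;

// Hypothetical helper: true if every bit in `mask` is set in `group`.
// Groups at or beyond `length` (or the capacity) count as "feature absent".
inline bool has_feature(const feature_bits &bits, unsigned group,
                        unsigned long long mask)
{
  if (group >= bits.length || group >= kCapacity)
    return false;
  return (bits.features[group] & mask) == mask;
}
```

Checking `length` before indexing is what would let older callers keep working
if more feature groups are added later.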


On Fri, Oct 4, 2024 at 2:24 AM Yangyu Chen  wrote:
>
> From: Kito Cheng 
>
> This provides a common abstraction layer to probe the available extensions at
> run-time. These functions can be used to implement function multi-versioning
> or to detect available extensions.
>
> The advantages of providing this abstraction layer are:
> - Easy to port to other new platforms.
> - Easier to maintain in GCC for function multi-versioning.
>   - For example, maintaining platform-dependent code in C code/libgcc is much
> easier than maintaining it in GCC by creating GIMPLEs...
>
> This API is intended to provide the capability to query the minimal common
> set of extensions available on the system.
>
> Proposal in riscv-c-api-doc: 
> https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74
>
> Full function multi-versioning implementation will come later. We are posting
> this first because we intend to backport it to the GCC 14 branch to unblock
> LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.
>
> Changes since v5:
> - Minor fixes on indentation.
>
> Changes since v4:
> - Bump to newest riscv-c-api-doc with some new extensions like Zve*, Zc*
>   Zimop, Zcmop, Zawrs.
> - Rename the return variable name of hwprobe syscall.
> - Minor fixes on indentation.
>
> Changes since v3:
> - Fix non-Linux build.
> - Let __init_riscv_feature_bits become a constructor.
>
> Changes since v2:
> - Prevent it from being initialized more than once.
>
> Changes since v1:
> - Fix the format.
> - Prevented race conditions by introducing a local variable to avoid
>   load/store operations during the computation of the feature bit.
>
> libgcc/ChangeLog:
>
> * config/riscv/feature_bits.c: New.
> * config/riscv/t-elf (LIB2ADD): Add feature_bits.c.
>
> Co-Developed-by: Yangyu Chen 
> Signed-off-by: Yangyu Chen 
> ---
>  libgcc/config/riscv/feature_bits.c | 364 +
>  libgcc/config/riscv/t-elf  |   1 +
>  2 files changed, 365 insertions(+)
>  create mode 100644 libgcc/config/riscv/feature_bits.c
>
> diff --git a/libgcc/config/riscv/feature_bits.c b/libgcc/config/riscv/feature_bits.c
> new file mode 100644
> index 000..c5339f065c1
> --- /dev/null
> +++ b/libgcc/config/riscv/feature_bits.c
> @@ -0,0 +1,364 @@
> +/* Helper function for function multi-versioning for RISC-V.
> +
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +.  */
> +
> +#define RISCV_FEATURE_BITS_LENGTH 2
> +struct {
> +  unsigned length;
> +  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
> +} __riscv_feature_bits __attribute__ ((visibility ("hidden"), nocommon));
> +
> +#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
> +
> +struct {
> +  unsigned vendorID;
> +  unsigned length;
> +  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
> +} __riscv_vendor_feature_bits __attribute__ ((visibility ("hidden"), nocommon));
> +
> +#define A_GROUPID 0
> +#define A_BITMASK (1ULL << 0)
> +#define C_GROUPID 0
> +#define C_BITMASK (1ULL << 2)
> +#define D_GROUPID 0
> +#define D_BITMASK (1ULL << 3)
> +#define F_GROUPID 0
> +#define F_BITMASK (1ULL << 5)
> +#define I_GROUPID 0
> +#define I_BITMASK (1ULL << 8)
> +#define M_GROUPID 0
> +#define M_BITMASK (1ULL << 12)
> +#define V_GROUPID 0
> +#define V_BITMASK (1ULL << 21)
> +#define ZACAS_GROUPID 0
> +#define ZACAS_BITMASK (1ULL << 26)
> +#define ZBA_GROUPID 0
> +#define ZBA_BITMASK (1ULL << 27)
> +#define ZBB_GROUPID 0
> +#define ZBB_BITMASK (1ULL << 28)
> +#define ZBC_GROUPID 0
> +#define ZBC_BITMASK (1ULL << 29)
> +#define ZBKB_GROUPID 0
> +#define ZBKB_BITMASK (1ULL << 30)
> +#define ZBKC_GROUPID 0
> +#define ZBKC_BITMASK (1ULL << 31)
> +#define ZBKX_GROUPID 0
> +#define ZBKX_BITMASK (1ULL << 32)
> +#define Z

Re: [to-be-committed][V2][RISC-V] Add splitters to restore condops generation after recent phiopt changes

2024-10-07 Thread Maciej W. Rozycki
On Sun, 6 Oct 2024, Jeff Law wrote:

> V2:
>   Fix typo in ChangeLog.
>   Remove now extraneous comment in cset-sext.c.
>   Throttle back branch cost to 1 in various tests

 Thank you for addressing my concerns.

 There's a bunch of extraneous trailing new lines to remove in 
gcc/config/riscv/zicond.md, if you care to do so (I have only glanced over 
that file without diving into the details of the RTL pieces; they seem 
reasonable), but otherwise I have no further input on your change.

 Thank you for making these improvements.

  Maciej


[PATCH] testsuite: Define missing and use ET for arm_arch_* and arm_cpu_*

2024-10-07 Thread Torbjörn SVENSSON
Ok for trunk?

--

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog

* gcc.target/arm/pr65647.c: Use ET arm_arch_v6m.
* gcc.target/arm/mod_2.c: Use ET arm_cpu_cortex_a57.
* gcc.target/arm/mod_256.c: Likewise.
* gcc.target/arm/vseleqdf.c: Likewise.
* gcc.target/arm/vseleqsf.c: Likewise.
* gcc.target/arm/vselgedf.c: Likewise.
* gcc.target/arm/vselgesf.c: Likewise.
* gcc.target/arm/vselgtdf.c: Likewise.
* gcc.target/arm/vselgtsf.c: Likewise.
* gcc.target/arm/vselledf.c: Likewise.
* gcc.target/arm/vsellesf.c: Likewise.
* gcc.target/arm/vselltdf.c: Likewise.
* gcc.target/arm/vselltsf.c: Likewise.
* gcc.target/arm/vselnedf.c: Likewise.
* gcc.target/arm/vselnesf.c: Likewise.
* gcc.target/arm/vselvcdf.c: Likewise.
* gcc.target/arm/vselvcsf.c: Likewise.
* gcc.target/arm/vselvsdf.c: Likewise.
* gcc.target/arm/vselvssf.c: Likewise.
* lib/target-supports.exp: Define ET arm_cpu_cortex_a57.  Update ET
arm_v8_1_lob_ok to use -mcpu=unset.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/mod_2.c| 4 +++-
 gcc/testsuite/gcc.target/arm/mod_256.c  | 4 +++-
 gcc/testsuite/gcc.target/arm/pr65647.c  | 3 ++-
 gcc/testsuite/gcc.target/arm/vseleqdf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vseleqsf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselgedf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselgesf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselgtdf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselgtsf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselledf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vsellesf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselltdf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselltsf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselnedf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselnesf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselvcdf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselvcsf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselvsdf.c | 5 +++--
 gcc/testsuite/gcc.target/arm/vselvssf.c | 5 +++--
 gcc/testsuite/lib/target-supports.exp   | 3 ++-
 20 files changed, 58 insertions(+), 36 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c b/gcc/testsuite/gcc.target/arm/mod_2.c
index 1143725d59a..3a203b67d73 100644
--- a/gcc/testsuite/gcc.target/arm/mod_2.c
+++ b/gcc/testsuite/gcc.target/arm/mod_2.c
@@ -1,7 +1,9 @@
 /* { dg-do compile } */
 /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { "-mpure-code" } } */
 /* { dg-require-effective-target arm32 } */
-/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+/* { dg-require-effective-target arm_cpu_cortex_a57 } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-add-options arm_cpu_cortex_a57 } */
 
 #include "../aarch64/mod_2.x"
 
diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c b/gcc/testsuite/gcc.target/arm/mod_256.c
index d8dca0fe7d5..3521d7a05f3 100644
--- a/gcc/testsuite/gcc.target/arm/mod_256.c
+++ b/gcc/testsuite/gcc.target/arm/mod_256.c
@@ -1,7 +1,9 @@
 /* { dg-do compile } */
 /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { "-mpure-code" } } */
 /* { dg-require-effective-target arm32 } */
-/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+/* { dg-require-effective-target arm_cpu_cortex_a57 } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-add-options arm_cpu_cortex_a57 } */
 
 #include "../aarch64/mod_256.x"
 
diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c b/gcc/testsuite/gcc.target/arm/pr65647.c
index 26b4e399f6b..dc3a3ca1184 100644
--- a/gcc/testsuite/gcc.target/arm/pr65647.c
+++ b/gcc/testsuite/gcc.target/arm/pr65647.c
@@ -1,7 +1,8 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_arch_v6m_ok } */
 /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } {"-mfloat-abi=soft" } } */
-/* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=soft" } */
+/* { dg-options "-mthumb -O3 -w -mfloat-abi=soft" } */
+/* { dg-add-options arm_arch_v6m } */
 
 a, b, c, e, g = &e, h, i = 7, l = 1, m, n, o, q = &m, r, s = &r, u, w = 9, x,
   y = 6, z, t6 = 7, t8, t9 = 1, t11 = 5, t12 = &t8, t13 = 3, t15,
diff --git a/gcc/testsuite/gcc.target/arm/vseleqdf.c b/gcc/testsuite/gcc.target/arm/vseleqdf.c
index 8a433356492..5be3ed2b1f9 100644
--- a/gcc/testsuite/gcc.target/arm/vseleqdf.c
+++ b/gcc/testsuite/gcc.target/arm/vseleqdf.c
@@ -1,7 +1,8 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-require-effective-target arm_cpu_cortex_a57_ok } */
 /* { dg-require-effective-target arm_v8_vfp_ok } */
-/* { dg-options "-O2 -mcpu=cortex-a57" } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_cpu_cortex_a57 } */
 /* { dg-add-options arm_v8_vfp } */
 
 double
diff --git a/gcc/testsuite/gcc.target/arm/vseleqsf.c b/gcc/testsuite/gcc.target/arm/vseleqsf.c
index fc4631887d8..f870b

Re: arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]

2024-10-07 Thread Torbjorn SVENSSON

Hello Andre,

Compared to a run without any of the 2 patches for PR 116444, I get this 
diff:


--- base/m55hard/analysis.gcc 2024-09-18 09:07:18.879493251 +
+++ pr116444/m55hard/analysis.gcc 2024-10-05 11:44:05.261683071 +
+FAIL: gcc.target/arm/attr_thumb.c scan-assembler ite
-FAIL: gcc.target/arm/max-insns-skipped.c object-size text <= 40
-FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler asreq
-FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler lslne
+FAIL: gcc.target/arm/vseleqdf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vseleqsf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselgedf.c scan-assembler-times vselge.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselgesf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselgtdf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselgtsf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselledf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vsellesf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselltdf.c scan-assembler-times vselge.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselltsf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselnedf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselnesf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselvcdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselvcsf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselvsdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselvssf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
--- base/m55soft/analysis.gcc 2024-09-18 09:07:19.199493246 +
+++ pr116444/m55soft/analysis.gcc 2024-10-05 11:44:07.533683037 +
+FAIL: gcc.target/arm/attr_thumb.c scan-assembler ite
-FAIL: gcc.target/arm/max-insns-skipped.c object-size text <= 40
-FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler asreq
-FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler lslne
+FAIL: gcc.target/arm/vseleqdf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vseleqsf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselgedf.c scan-assembler-times vselge.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselgesf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselgtdf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselgtsf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselledf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vsellesf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselltdf.c scan-assembler-times vselge.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselltsf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselnedf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselnesf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselvcdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselvcsf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselvsdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselvssf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
--- base/m85hard/analysis.gcc 2024-09-18 09:07:18.035493264 +
+++ pr116444/m85hard/analysis.gcc 2024-10-05 11:44:04.289683085 +
+FAIL: gcc.target/arm/attr_thumb.c scan-assembler ite
-FAIL: gcc.target/arm/max-insns-skipped.c object-size text <= 40
-FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler asreq
-FAIL: gcc.target/arm/thumb-ifcvt-2.c scan-assembler lslne
+FAIL: gcc.target/arm/vseleqdf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vseleqsf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselgedf.c scan-assembler-times vselge.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselgesf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselgtdf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselgtsf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselledf.c scan-assembler-times vselgt.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vsellesf.c scan-assembler-times vselgt.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselltdf.c scan-assembler-times vselge.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselltsf.c scan-assembler-times vselge.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselnedf.c scan-assembler-times vseleq.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselnesf.c scan-assembler-times vseleq.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselvcdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselvcsf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
+FAIL: gcc.target/arm/vselvsdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
+FAIL: gcc.target/arm/vselvssf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
--- base/m85soft/analysis.gcc	2024-09-18 09:07:18.351493259 +
+++ pr116444/m85soft

Re: [PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]

2024-10-07 Thread Jakub Jelinek
On Mon, Oct 07, 2024 at 08:59:56AM +0200, Richard Biener wrote:
> The forwprop added optimization looks like it would match PHI-opt better,
> but I'm fine with leaving it in forwprop.  I do wonder whether instead
> of adding a flag adding the actual values wanted as argument to
> .SPACESHIP would allow further optimizations (maybe also indicating
> cases "ignored")?  Are those -1, 0, 1, 2 standard mandated values
> or implementation defined (or part of the platform ABI)?

They are implementation defined, -1, 0, 1, 2 is defined by libstdc++:
using type = signed char;
enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
enum class _Ncmp : type { _Unordered = 2 };
https://eel.is/c++draft/cmp#categories.pre-1 documents them as
enum class ord { equal = 0, equivalent = equal, less = -1, greater = 1 }; // exposition only
enum class ncmp { unordered = -127 }; // exposition only
and now looking at it, LLVM's libc++ takes that literally and uses
-1, 0, 1, -127.  One can't use <=> operator without including <compare>
which provides the enums, so I think if all we care about is libstdc++,
then just hardcoding -1, 0, 1, 2 is fine, if we want to also optimize
libc++ when used with gcc, we could support -1, 0, 1, -127 as another
option.
Supporting arbitrary 4 values doesn't make sense, at least on x86 the
only reason to do the conversion to int in an optab is a good sequence
to turn the flag comparisons to -1, 0, 1.  So, either we do nothing
more than the patch, or handle both 2 and -127 for unordered,
or add support for an arbitrary value for the unordered case other than
-1, 0, 1 (then -1 could mean signed int, 1 unsigned int, 0 do the jumps
and any other value what should be returned for unordered).

Jakub
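
To make the last option above concrete, here is a minimal scalar sketch
(not the patch itself) of a floating-point three-way comparison whose
unordered result is a parameter.  The name three_way and its unordered
argument are hypothetical, chosen only for illustration; passing 2 models
the libstdc++ encoding and -127 the libc++ one.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar model of a floating-point <=> lowered to an integer.
// `unordered` is the value returned when either operand is a NaN:
// 2 corresponds to libstdc++'s _Ncmp::_Unordered, -127 to libc++.
inline int three_way(double x, double y, int unordered)
{
  if (std::isnan(x) || std::isnan(y))
    return unordered;        // the unlikely, separately handled path
  return (x > y) - (x < y);  // -1, 0 or 1 for the ordered cases
}
```

Only the unordered path depends on the chosen constant, which is why a
single extra argument would be enough to cover both libraries.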



Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-07 Thread Richard Biener
On Fri, Oct 4, 2024 at 3:07 PM Jonathan Wakely  wrote:
>
> On Fri, 4 Oct 2024 at 13:53, Dmitry Ilvokhin  wrote:
> >
> > On Fri, Oct 04, 2024 at 10:20:27AM +0100, Jonathan Wakely wrote:
> > > On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely  wrote:
> > > >
> > > > On Fri, 4 Oct 2024 at 07:53, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Instead of looping over every byte of the tail, unroll loop 
> > > > > > > > manually
> > > > > > > > using switch statement, then compilers (at least GCC and Clang) 
> > > > > > > > will
> > > > > > > > generate a jump table [1], which is faster on a microbenchmark 
> > > > > > > > [2].
> > > > > > > >
> > > > > > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > > > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > > > > > >
> > > > > > > > libstdc++-v3/ChangeLog:
> > > > > > > >
> > > > > > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): 
> > > > > > > > unroll
> > > > > > > >   loop using switch statement.
> > > > > > > >
> > > > > > > > Signed-off-by: Dmitry Ilvokhin 
> > > > > > > > ---
> > > > > > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 
> > > > > > > > +++
> > > > > > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > > > > > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > > index 3665375096a..294a7323dd0 100644
> > > > > > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > > > > > @@ -50,10 +50,29 @@ namespace
> > > > > > > >load_bytes(const char* p, int n)
> > > > > > > >{
> > > > > > > >  std::size_t result = 0;
> > > > > > > > ---n;
> > > > > > > > -do
> > > > > > > > > -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > > > > > > -while (--n >= 0);
> > > > > > >
> > > > > > > Don't we still need to loop, for the case where n >= 8? Otherwise 
> > > > > > > we
> > > > > > > only hash the first 8 bytes.
> > > > > >
> > > > > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> > > > >
> > > > > The compiler should do such transforms - you probably want to tell
> > > > > it that n < 8 though, it likely doesn't (always) know.
> > > >
> > > > e.g. like this?
> > > >
> > > > if ((n & 7) != n)
> > > >   __builtin_unreachable();
> > > >
> > > > For the microbenchmark that seems to make things consistently worse:
> > > > https://quick-bench.com/q/2yCEqzFS8R8ueJ0-Gs-sZ6uWWEw
> > >
> > > Oh actually in the benchmark I used (!(1 <= n && n < 8)) because 1 <=
> > > n is always true too.
> > >
> >
> > GCC still wasn't able to unroll the loop, even with a
> > __builtin_unreachable, but benchmark link you mentioned above uses -O2
> > optimization level (not sure if it was intentional).
>
> That was intentional, because that's how libsupc++/hash_bytes.cc gets 
> compiled.

There's also the possibility to use #pragma GCC unroll, not sure if that can be
used to force peeling all iterations, you'd have to try.

> >
> > If we'll use -O3 [1], then GCC was able to unroll the loop for
> > load_bytes_loop_assume version, but at the same time I am not sure all
> > loop control instructions were elided, I still can see them on Godbolt
> > version of generated code [2]. Benchmark charts partially confirm that,
> > because performance of load_bytes_loop and load_bytes_loop_assume are
> > now quite close (same actually, except case n = 1). I guess it would
> > make sense, as we execute same amount of instructions.
> >
> > In addition, chart for load_bytes_switch look quite jumpy for [1] and
> > became better for cases n = 1 and n = 2. At this point I am not sure it
> > is not a code alignment issue and we are not measuring noise.
> >
> > [1]: https://quick-bench.com/q/LlcgMVhL61CasZVjCWbHd3uid8w
> > [2]: https://godbolt.org/z/qPf1n7xWs
> >
>
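
For readers following the thread, the two shapes being compared can be
sketched as below.  The loop form follows the removed lines quoted above;
the switch form is only an assumed reconstruction of the idea (the added
lines of the patch are not quoted in this thread), and the helper byte_at
and the use of std::uint64_t instead of std::size_t are choices made for
this sketch.  Both expect 1 <= n < 8.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte i of p, zero-extended to 64 bits.
inline std::uint64_t byte_at(const char* p, int i)
{
  return static_cast<unsigned char>(p[i]);
}

// Loop form, as in the removed lines of the quoted diff (needs n >= 1).
inline std::uint64_t load_bytes_loop(const char* p, int n)
{
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + byte_at(p, n);
  while (--n >= 0);
  return result;
}

// Switch form: each case handles one byte and falls through to the next
// lower one, so the compiler is free to emit a jump table.
inline std::uint64_t load_bytes_switch(const char* p, int n)
{
  std::uint64_t result = 0;
  switch (n & 7)
    {
    case 7: result |= byte_at(p, 6) << 48; // fall through
    case 6: result |= byte_at(p, 5) << 40; // fall through
    case 5: result |= byte_at(p, 4) << 32; // fall through
    case 4: result |= byte_at(p, 3) << 24; // fall through
    case 3: result |= byte_at(p, 2) << 16; // fall through
    case 2: result |= byte_at(p, 1) << 8;  // fall through
    case 1: result |= byte_at(p, 0);
    }
  return result;
}
```

In both forms p[0] ends up in the least significant byte, so the two
functions agree for every tail length from 1 to 7.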


Re: Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Andrew Pinski
On Mon, Oct 7, 2024 at 12:02 AM Дилян Палаузов
 wrote:
>
> Hello,
>
> https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wsuggest-attribute_003d
>  says for -Wsuggest-attribute=noreturn:
>
> > The compiler only warns for functions visible in other compilation units.
>
> Why?  clang -Wmissing-noreturn does warn even for static functions.

The reasoning is that static functions can only be called from the
local TU, so adding the attribute won't change anything there; for
non-local functions it can help other TUs, which is why it goes on the
declaration rather than the definition of the function.

Thanks,
Andrew Pinski

>
> Greetings
>   Дилян
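
A small sketch of the distinction described in the reply, using
hypothetical function names (fail, fail_local, checked_digit) and an
exception as the "never returns" mechanism so that it can be exercised:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// In a real project this declaration would sit in a header.  Callers in
// other translation units only see the declaration, so the attribute is
// the only way they can learn that a call never returns.
[[noreturn]] void fail(const char* msg);

void fail(const char* msg)
{
  throw std::runtime_error(msg);
}

// A static function is only callable from this translation unit, where
// its body is visible anyway; per the documentation quoted above, GCC
// does not suggest the attribute for it.
static void fail_local(const char* msg)
{
  throw std::runtime_error(std::string("local: ") + msg);
}

int checked_digit(char c)
{
  if (c < '0' || c > '9')
    fail_local("not a digit");
  return c - '0';
}
```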


Handle non-grouped stores as single-lane SLP: adjust 'gcc.dg/vect/slp-26.c', GCN (was: [PATCH 3/3] Handle non-grouped stores as single-lane SLP)

2024-10-07 Thread Thomas Schwinge
Hi!

On 2024-10-03T13:34:47+0200, Richard Biener  wrote:
> On Thu, 3 Oct 2024, Thomas Schwinge wrote:
>> On 2024-09-06T11:30:06+0200, Richard Biener  wrote:
>> > On Thu, 5 Sep 2024, Richard Biener wrote:
>> >> The following enables single-lane loop SLP discovery for non-grouped 
>> >> stores
>> >> and adjusts vectorizable_store to properly handle those.
>> 
>> > I have now pushed this as r15-3509-gd34cda72098867
>> 
>> >> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
>> >> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
>> >> @@ -50,4 +50,5 @@ int main (void)
>> >>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { 
>> >> target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } 
>> >> } } } } */
>> >>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { 
>> >> target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } 
>> >> } */
>> >>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 
>> >> "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || 
>> >> loongarch_sx } } } } } } } */
>> >> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 
>> >> "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } 
>> >> } } } } } */
>> >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 
>> >> "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
>> >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 
>> >> "vect" { target riscv_v } } } */
>> 
>> For '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'),
>> I see:
>> 
>> PASS: gcc.dg/vect/slp-26.c (test for excess errors)
>> PASS: gcc.dg/vect/slp-26.c execution test
>> PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 
>> loops" 1
>> [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect 
>> "vectorizing stmts using SLP" 1
>> 
>> gcc.dg/vect/slp-26.c: pattern found 2 times
>> 
>> ..., so I suppose I'll apply the same change to 'amdgcn-*-*' as you did
>> to 'riscv_v'?
>
> I guess yes

Pushed to trunk branch commit b137e4bbcc488b44a037baad62a8da90659d7468
"Handle non-grouped stores as single-lane SLP: adjust 'gcc.dg/vect/slp-26.c', 
GCN",
see attached.


Grüße
 Thomas


> I don't remember exactly the reason but IIRC it's about the
> unsigned division which gcn might also be able to do - the 32817
> value is explicitly excluded from pattern recognition.  We don't have
> an effective target for unsigned [short] integer division.
>
> Richard.


>From b137e4bbcc488b44a037baad62a8da90659d7468 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 3 Oct 2024 12:52:30 +0200
Subject: [PATCH] Handle non-grouped stores as single-lane SLP: adjust
 'gcc.dg/vect/slp-26.c', GCN

As of commit d34cda720988674bcf8a24267c9e1ec61335d6de
"Handle non-grouped stores as single-lane SLP", we see for
'--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'):

PASS: gcc.dg/vect/slp-26.c (test for excess errors)
PASS: gcc.dg/vect/slp-26.c execution test
PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1
[-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

gcc.dg/vect/slp-26.c: pattern found 2 times

Apply the same change to 'amdgcn-*-*' as done for 'riscv_v'.

	gcc/testsuite/
	* gcc.dg/vect/slp-26.c: Adjust GCN.
---
 gcc/testsuite/gcc.dg/vect/slp-26.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
index cdb5d9c694b..23917474ddc 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -50,5 +50,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target riscv_v } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || loongarch_sx } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { riscv_v || amdgcn-*-* } } } } */
-- 
2.34.1



Re: [PATCH 0/4] Support more VLA SLP permutations

2024-10-07 Thread Richard Biener
On Fri, 4 Oct 2024, Richard Sandiford wrote:

> This series should fix the target-independent parts of PR116583.
> (We also need some target-specific patches, to be posted separately.)
> 
> The explanations are in the individual commit messages, but I've
> attached a -b diff below in case my attempt to split the patch up
> has just obfuscated things instead.
> 
> Tested on aarch64-linux-gnu (with and without SVE enabled by default)
> and x86_64-linux-gnu.  Also tested by running the vect testsuite
> with vect-force-slp=1.

The series is OK.

Thanks for working on this.

Richard.

> Richard Sandiford (4):
>   vect: Variable lane indices in vectorizable_slp_permutation_1
>   vect: Restructure repeating_p case for SLP permutations
>   vect: Support more VLA SLP permutations
>   vect: Add more dump messages for VLA SLP permutation
> 
>  gcc/testsuite/gcc.dg/vect/slp-13-big-array.c |   2 +-
>  gcc/testsuite/gcc.dg/vect/slp-13.c   |   2 +-
>  gcc/tree-vect-slp.cc | 190 +--
>  3 files changed, 134 insertions(+), 60 deletions(-)
> 
> 


Re: [Patch] OpenMP: Allocate directive for static vars, clean up

2024-10-07 Thread Tobias Burnus

Hi Andre,

first, thanks a lot for all your proofreading of patches! That's indeed 
helpful, and reviewing (with official LGTM stamp or as bystander) is a 
problem; you help to reduce it! :-)


Andre Vehreschild wrote:

@@ -821,6 +821,23 @@ gfc_finish_var_decl (tree decl, gfc_symbol * sym)
+  if (sym->attr.omp_allocate && TREE_STATIC (decl))
+{
+  struct gfc_omp_namelist *n;
+  for (n = sym->ns->omp_allocate; n; n = n->next)
+   if (n->sym == sym)
+ break;

Theoretically n can be NULL here. This would then ICE. Or is there a guarantee
that n is never NULL?

+  tree alloc = gfc_conv_constant_to_tree (n->u2.allocator);


One should never say never, but I think it should never be NULL. In openmp.cc:

gfc_resolve_omp_allocate (gfc_namespace *ns, gfc_omp_namelist *list)
{
  for (gfc_omp_namelist *n = list; n; n = n->next)
...

  n->sym->attr.omp_allocate = 1;

And the caller is in resolve.cc:

if(ns->omp_allocate)

   gfc_resolve_omp_allocate (ns, ns->omp_allocate);

Cheers,

Tobias

PS: If you wonder about modules: It is not saved in .mod files. As it is 
about allocating a variable, this property is only required where the 
variable is actually defined/has storage, not where it is only accessed. 
Otherwise, that would be a loophole.
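
The invariant argued above can be modelled in a few lines.  The types
Symbol and NameList and the functions resolve_allocate and find_entry are
hypothetical stand-ins for the gfortran structures, kept just detailed
enough to show why the lookup cannot come back empty for a flagged symbol:

```cpp
#include <cassert>

// Hypothetical stand-ins for gfc_symbol and gfc_omp_namelist.
struct Symbol { bool omp_allocate = false; };
struct NameList { Symbol* sym; int allocator; NameList* next; };

// Models the resolve step: the flag is set only for symbols in the list.
inline void resolve_allocate(NameList* list)
{
  for (NameList* n = list; n; n = n->next)
    n->sym->omp_allocate = true;
}

// Models the lookup in the quoted hunk; yields nullptr if SYM is absent.
inline NameList* find_entry(NameList* list, const Symbol* sym)
{
  NameList* n;
  for (n = list; n; n = n->next)
    if (n->sym == sym)
      break;
  return n;
}
```

Since the flag is only ever set while walking the list, a symbol with the
flag set is by construction found again by the same walk.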


Re: pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-10-07 Thread Alex Coplan
On 23/09/2024 11:31, Alex Coplan wrote:
> Hi,
> 
> As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
> returning false when a common base register (in this case, x1) was
> modified between the mem and the store insn.  This led to wrong code as
> the accesses really did alias.
> 
> To avoid this sort of problem, this patch avoids invoking RTL alias
> analysis altogether (and assume an alias conflict) if the two insns to
> be compared share a common address register R, and the insns see different
> definitions of R (i.e. it was modified in between).
> 
> Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
> bootstrap and LTO+PGO bootstrap).  OK for trunk?

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663600.html

I realise it was bad timing on my part sending this just after
Richard (S) went away for a week, sorry about that!

Alex
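
As a reading aid, the check described in the quoted message can be
modelled as a merge walk over two use lists.  This is only a sketch under
assumptions: the struct Use, its fields and the function
addr_reg_conflict are hypothetical, both lists are assumed to be sorted
by register number, and defs are reduced to plain integer ids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a register use: which register, which definition
// it sees, and whether the use occurs inside a memory address.
struct Use { unsigned regno; int def_id; bool in_address; };

// Returns true if the walked insn uses, inside an address, a register
// that the candidate also uses in an address but with a different def.
inline bool addr_reg_conflict(const std::vector<Use>& cand_addr_uses,
                              const std::vector<Use>& insn_uses)
{
  std::size_t c = 0, i = 0;
  while (c < cand_addr_uses.size() && i < insn_uses.size())
    {
      const Use& cand_use = cand_addr_uses[c];
      const Use& insn_use = insn_uses[i];
      if (insn_use.regno > cand_use.regno)
        ++c;
      else if (insn_use.regno < cand_use.regno)
        ++i;
      else
        {
          if (insn_use.in_address && insn_use.def_id != cand_use.def_id)
            return true;  // conservatively assume an alias conflict
          ++c;
          ++i;
        }
    }
  return false;
}
```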

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR rtl-optimization/116783
>   * pair-fusion.cc (def_walker::cand_addr_uses): New.
>   (def_walker::def_walker): Add parameter for candidate address
>   uses.
>   (def_walker::alias_conflict_p): Declare.
>   (def_walker::addr_reg_conflict_p): New.
>   (def_walker::conflict_p): New.
>   (store_walker::store_walker): Add parameter for candidate
>   address uses and pass to base ctor.
>   (store_walker::conflict_p): Rename to ...
>   (store_walker::alias_conflict_p): ... this.
>   (load_walker::load_walker): Add parameter for candidate
>   address uses and pass to base ctor.
>   (load_walker::conflict_p): Rename to ...
>   (load_walker::alias_conflict_p): ... this.
>   (pair_fusion_bb_info::try_fuse_pair): Collect address register
>   uses for candidate insns and pass down to alias walkers.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR rtl-optimization/116783
>   * g++.dg/torture/pr116783.C: New test.

> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> index cb0374f426b..b1ea611bacd 100644
> --- a/gcc/pair-fusion.cc
> +++ b/gcc/pair-fusion.cc
> @@ -2089,11 +2089,80 @@ protected:
>  
>def_iter_t def_iter;
>insn_info *limit;
> -  def_walker (def_info *def, insn_info *limit) :
> -def_iter (def), limit (limit) {}
> +
> +  // Array of register uses from the candidate insn which occur in MEMs.
> +  use_array cand_addr_uses;
> +
> +  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
> +def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
>  
>virtual bool iter_valid () const { return *def_iter; }
>  
> +  // Implemented in {load,store}_walker.
> +  virtual bool alias_conflict_p (int &budget) const = 0;
> +
> +  // Return true if the current (walking) INSN () uses a register R inside a
> +  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
> +  // those uses see different definitions of that register.  In this case we
> +  // can't rely on RTL alias analysis, and for now we conservatively assume that
> +  // there is an alias conflict.  See PR116783.
> +  bool addr_reg_conflict_p () const
> +  {
> +use_array curr_insn_uses = insn ()->uses ();
> +auto cand_use_iter = cand_addr_uses.begin ();
> +auto insn_use_iter = curr_insn_uses.begin ();
> +while (cand_use_iter != cand_addr_uses.end ()
> +&& insn_use_iter != curr_insn_uses.end ())
> +  {
> + auto insn_use = *insn_use_iter;
> + auto cand_use = *cand_use_iter;
> + if (insn_use->regno () > cand_use->regno ())
> +   cand_use_iter++;
> + else if (insn_use->regno () < cand_use->regno ())
> +   insn_use_iter++;
> + else
> +   {
> + // As it stands I believe the alias code (memory_modified_in_insn_p)
> + // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
> + // be safe to skip over uses that only occur in notes.
> + if (insn_use->includes_address_uses ()
> + && !insn_use->only_occurs_in_notes ()
> + && insn_use->def () != cand_use->def ())
> +   {
> + if (dump_file)
> +   {
> + fprintf (dump_file,
> +  "assuming aliasing of cand i%d and i%d:\n"
> +  "-> insns see different defs of common addr reg r%u\n"
> +  "-> ",
> +  cand_use->insn ()->uid (), insn_use->insn ()->uid (),
> +  insn_use->regno ());
> +
> + // Note that while the following sequence could be made more
> + // concise by eliding pp_string calls into the pp_printf
> + // calls, doing so triggers -Wformat-diag.
> + pretty_printer pp;
> + pp_string (&pp, "[");
> + pp_access (&pp, cand_use, 0);
> + pp_string (&pp, "] in ");
> + pp_printf (&pp, "i%d", cand_use->insn ()->uid ());
> + pp_s

Re: [PATCH] middle-end: reorder masking priority of math functions

2024-10-07 Thread Richard Biener
On Wed, Oct 2, 2024 at 6:26 PM Victor Do Nascimento
 wrote:
>
> Given the categorization of math built-in functions as `ECF_CONST',
> when if-converting their uses, their calls are not masked and are thus
> called with an all-true predicate.
>
> This, however, is not appropriate where built-ins have library
> equivalents, wherein they may exhibit highly architecture-specific
> behaviors. For example, vectorized implementations may delegate the
> computation of values outside a certain acceptable numerical range to
> special (non-vectorized) routines which considerably slow down
> computation.
>
> As numerical simulation programs often do bounds check on input values
> prior to math calls, conditionally assigning default output values for
> out-of-bounds input and skipping the math call altogether, these
> fallback implementations should seldom be called in the execution of
> vectorized code.  If, however, we don't apply any masking to these
> math functions, we end up effectively executing both if and else
> branches for these values, leading to considerable performance
> degradation on scientific workloads.
>
> We therefore invert the order of handling of math function calls in
> `if_convertible_stmt_p' to prioritize the handling of their
> library-provided implementations over the equivalent internal function.
>
> Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
> new regressions.

I think the patch is good - note I think there's even a bugzilla about this
behavior.  So as incremental improvement the patch is OK.

I think we should further improve this, possibly with profile data, and
the situation could be improved by handling the calls like mask stores
where we put a if (mask != 0) before the store.

Richard.
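
The cost difference discussed in this thread can be illustrated with a
scalar model that simply counts calls.  All names here (CountingExp,
run_unmasked, run_masked) are hypothetical; the unmasked variant mimics
"call for every lane, select afterwards", the masked one only calls the
function for lanes that take the else branch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical wrapper that counts how often the math function runs.
struct CountingExp
{
  int calls = 0;
  float operator() (float x) { ++calls; return std::exp (x); }
};

// Unmasked: the call is evaluated for every element, then selected.
inline int run_unmasked (const std::vector<float>& a, std::vector<float>& b,
                         float lim, float cst)
{
  CountingExp f;
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      float e = f (a[i]);
      b[i] = a[i] > lim ? cst : e;
    }
  return f.calls;
}

// Masked: the call only happens for elements that pass the bounds check.
inline int run_masked (const std::vector<float>& a, std::vector<float>& b,
                       float lim, float cst)
{
  CountingExp f;
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      if (a[i] > lim)
        b[i] = cst;
      else
        b[i] = f (a[i]);
    }
  return f.calls;
}
```

With half of the inputs out of range, the unmasked form evaluates the
function twice as often, and it does so on exactly the inputs the bounds
check was meant to keep away from it.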

> gcc/ChangeLog:
>
> * tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
> function declaration before IFN fallback.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-fncall-mask-math.c: New.
> ---
>  .../gcc.dg/vect/vect-fncall-mask-math.c   | 33 +++
>  gcc/tree-if-conv.cc   | 18 +-
>  2 files changed, 42 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c 
> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> new file mode 100644
> index 000..15e22da2807
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> @@ -0,0 +1,33 @@
> +/* Test the correct application of masking to autovectorized math function 
> calls.
> +   Test is currently set to xfail pending the release of the relevant lmvec
> +   support. */
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw 
> -Ofast" { target { aarch64*-*-* } } } */
> +
> +#include <math.h>
> +
> +const int N = 20;
> +const float lim = 101.0;
> +const float cst =  -1.0;
> +float tot =   0.0;
> +
> +float b[20];
> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
> +   [10 ... 19] = 100.0 };/* Else branch.  */
> +
> +int main (void)
> +{
> +  #pragma omp simd
> +  for (int i = 0; i < N; i += 1)
> +{
> +  if (a[i] > lim)
> +   b[i] = cst;
> +  else
> +   b[i] = expf (a[i]);
> +  tot += b[i];
> +}
> +  return (0);
> +}
> +
> +/* { dg-final { scan-tree-dump-not { gimple_call } ifcvt { 
> xfail { aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, 
> _30>} ifcvt { xfail { aarch64*-*-* } } } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 3b04d1e8d34..90c754a4814 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
>
>  case GIMPLE_CALL:
>{
> -   /* There are some IFN_s that are used to replace builtins but have the
> -  same semantics.  Even if MASK_CALL cannot handle them 
> vectorable_call
> -  will insert the proper selection, so do not block conversion.  */
> -   int flags = gimple_call_flags (stmt);
> -   if ((flags & ECF_CONST)
> -   && !(flags & ECF_LOOPING_CONST_OR_PURE)
> -   && gimple_call_combined_fn (stmt) != CFN_LAST)
> - return true;
> -
> tree fndecl = gimple_call_fndecl (stmt);
> if (fndecl)
>   {
> @@ -1160,6 +1151,15 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
>   }
>   }
>
> +   /* There are some IFN_s that are used to replace builtins but have the
> +  same semantics.  Even if MASK_CALL cannot handle them 
> vectorable_call
> +  will insert the proper selection, so do not block conversion.  */
> +   int flags = gimple_call_flags (stmt);
> +   if ((flags & ECF_CONST)
> +   && !(flags & ECF_LOOPING_CONST_OR_PURE)
> +   && gimple_call_combined_fn (stmt) != CFN_LAS

Re: [PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]

2024-10-07 Thread Richard Biener
On Fri, 4 Oct 2024, Uros Bizjak wrote:

> On Fri, Oct 4, 2024 at 11:58 AM Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > The PR notes that we don't emit optimal code for C++ spaceship
> > operator if the result is returned as an integer rather than the
> > result just being compared against different values and different
> > code executed based on that.
> > So e.g. for
> > > template <typename T>
> > auto foo (T x, T y) { return x <=> y; }
> > for both floating point types, signed integer types and unsigned integer
> > types.  auto in that case is std::strong_ordering or std::partial_ordering,
> > which are fancy C++ abstractions around struct with signed char member
> > which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial
> > ordering (but for -ffast-math 2 is never the case).
> > I'm afraid functions like that are fairly common and unless they are
> > inlined, we really need to map the comparison to those -1, 0, 1 or
> > -1, 0, 1, 2 values.
> >
> > Now, for floating point spaceship I've in the past already added an
> > optimization (with tree-ssa-math-opts.cc discovery and named optab, the
> > optab only defined on x86 though right now), which ensures there is just
> > a single comparison instruction and then just tests based on flags.
> > Now, if we have code like:
> >   auto a = x <=> y;
> >   if (a == std::partial_ordering::less)
> > bar ();
> >   else if (a == std::partial_ordering::greater)
> > baz ();
> >   else if (a == std::partial_ordering::equivalent)
> > qux ();
> >   else if (a == std::partial_ordering::unordered)
> > corge ();
> > etc., that results in decent code generation, the spaceship named pattern
> > on x86 optimizes for the jumps, so emits comparisons on the flags, followed
> > by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes that
> > well.  But if the result needs to be stored into an integer and just
> > returned that way or there are no immediate jumps based on it (or turned
> > into some non-standard integer values like -42, 0, 36, 75 etc.), then CE
> > doesn't do a good job for that, we end up with say
> > comiss  %xmm1, %xmm0
> > jp  .L4
> > seta%al
> > movl$0, %edx
> > leal-1(%rax,%rax), %eax
> > cmove   %edx, %eax
> > ret
> > .L4:
> > movl$2, %eax
> > ret
> > The jp is good, that is the unlikely case and can't be easily handled in
> > straight line code due to the layout of the flags, but the rest uses cmov
> > which often isn't a win and a weird math.
> > With the patch below we can get instead
> > xorl%eax, %eax
> > comiss  %xmm1, %xmm0
> > jp  .L2
> > seta%al
> > sbbl$0, %eax
> > ret
> > .L2:
> > movl$2, %eax
> > ret
> >
> > The patch changes the discovery in the generic code, by detecting if
> > the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
> > -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in
> > a new argument to .SPACESHIP ifn, so that the named pattern is told whether
> > it should optimize for branches or for loading the result into a -1, 0, 1
> > (, 2) integer.  Additionally, it doesn't detect just floating point <=>
> > anymore, but also integer and unsigned integer, but in those cases only
> > if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons
> > result in good code).
> > The backend then can for those integer or unsigned integer <=>s return
> > effectively (x > y) - (x < y) in a way that is efficient on the target
> > (so for x86 with ensuring zero initialization first when needed before
> > setcc; one for floating point and unsigned, where there is just one setcc
> > and the second one optimized into sbb instruction, two for the signed int
> > case).  So e.g. for signed int we now emit
> > xorl%edx, %edx
> > xorl%eax, %eax
> > cmpl%esi, %edi
> > setl%dl
> > setg%al
> > subl%edx, %eax
> > ret
> > and for unsigned
> > xorl%eax, %eax
> > cmpl%esi, %edi
> > seta%al
> > sbbb$0, %al
> > ret
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The forwprop added optimization looks like it would match PHI-opt better,
but I'm fine with leaving it in forwprop.  I do wonder whether instead
of adding a flag adding the actual values wanted as argument to
.SPACESHIP would allow further optimizations (maybe also indicating
cases "ignored")?  Are those -1, 0, 1, 2 standard mandated values
or implementation defined (or part of the platform ABI)?

That said, the patch is OK.

Thanks,
Richard.
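
For the integer cases, the "(x > y) - (x < y)" form mentioned in the
quoted description can be tried out directly.  The template name
three_way_int is hypothetical; the sketch only shows that the same
expression gives the signed or the unsigned answer depending on the
operand type, which is what the separate signed and unsigned sequences
in the patch description correspond to.

```cpp
#include <cassert>

// Scalar model of an integer <=> lowered to -1, 0 or 1.
template <typename T>
inline int three_way_int (T x, T y)
{
  return (x > y) - (x < y);
}
```

The same bit pattern compares differently as int and as unsigned, so
-1 versus 1 is "less" for the signed instantiation and "greater" for the
unsigned one.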

> > Note, I wonder if other targets wouldn't benefit from defining the
> > named optab too...
> >
> > 2024-10-04  Jakub Jelinek  
> >
> > PR middle-end/116896
> > * optabs.def (spaceship_optab): Use spaceship$a4 rather than
> >  

Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Дилян Палаузов
Hello,

https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wsuggest-attribute_003d
 says for -Wsuggest-attribute=noreturn:

> The compiler only warns for functions visible in other compilation units.

Why?  clang -Wmissing-noreturn does warn even for static functions.

Greetings
  Дилян


Re: [Patch] OpenMP: Allocate directive for static vars, clean up

2024-10-07 Thread Tobias Burnus

Now committed as r15-4104-ga8caeaacf499d5.

With a wording improvement in the commit log and avoiding an XPASS for 
C++ by excluding c++98 from the xfail in dg-bogus... xfail.


Tobias

Tobias Burnus wrote:

'omp allocate' permits to use a different (specified) allocator and
alignment for both stack/automatic and static/saved variables; the latter
takes only predefined allocators. Currently, only C and Fortran are
supported for stack/automatic variables; static variables are rejected
before the attached patch. (For them, only predefined allocators are
permitted.)

* * *

I happened to look at the 'allocate' directive recently and, doing so,
I stumbled over a couple of issues, which the attached patch addresses
(missing diagnostics for corner cases, not updated checks, unhelpful
documentation ['allocate' *clause*], ...). Doing so, I wondered whether:

Shouldn't we just accept 'omp allocate' for static variables by just
honoring the alignment and ignoring the actually requested allocator? -
First, we do already the same for actual allocations as not all traits
are supported. And for the host this seems to be the most sensible to
do in any case.
[For some use cases, pointers + allocation in the constructor would be
better, but in general, not adding an indirection seems to be better and
has fewer corner-case usability issues.]

I guess we later want to honor the requested memory for nvptx and/or
gcn; at least Nvidia GPUs could make use of constant memory (having
advantages for reading the same memory by many threads/broadcasting it).
I guess OpenACC 2.7's 'readonly' modifier serves a similar purpose.
For now we don't, but the attribute is passed on to the backends, which
could make use of them, if desired. ('groupprivate' directive vs.
cgroup/thread allocators are similar device-only features.)

As mentioned, this patch also fixes a few other issues here and there,
see commit log and source code for details.

Code comments? Suggestions or remarks? - Before I apply this patch?

Tobias

PS: I am aware that C++ support is lacking. There is a pending patch
that needs to be updated for this patch, probably some bitrotting, and
in particular for the review comments, cf.
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html
and https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639929.html

commit a8caeaacf499d58ba7ceabc311b7b71ca806f740
Author: Tobias Burnus 
Date:   Mon Oct 7 10:45:14 2024 +0200

OpenMP: Allocate directive for static vars, clean up

For the 'allocate' directive, remove the sorry for static variables and
just keep using normal memory, but honor the requested alignment and set
a DECL_ATTRIBUTE in case a target may want to make use of this later on.
The documentation is updated accordingly.

The C diagnostic to check for predefined allocators (req. for static vars)
failed to accept GCC's ompx_gnu_... allocator, now fixed. (Fortran was
already okay; but both now use new common #defined value for checking.)
And while Fortran common block variables are still rejected, the check
has been improved as before the sorry diagnostic did not work for
common blocks in modules.

Finally, for 'allocate' clause on the target/task/taskloop directives,
there is now a warning for omp_thread_mem_alloc (i.e. predefined allocator
with access = thread), which is undefined behavior according to the
OpenMP specification.

And, last, testing showed that var decl + static_assert sets TREE_USED
but does not produce a statement list in C, which did run into an assert
in gimplify. This special case is now also handled.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_allocate): Set alignment for alignof;
accept static variables and fix predef allocator check.

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Use gomp-constants.h consts.
* trans-common.cc (translate_common): Reject OpenMP allocate directives.
* trans-decl.cc (gfc_finish_var_decl): Handle allocate directive
for static variables.
(gfc_trans_deferred_vars): Update for the latter.

gcc/ChangeLog:

* gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP
allocate directive.
(gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used
as allocator with the target/task/taskloop directive.

include/ChangeLog:

* gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX,
GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX,
GOMP_OMP_PREDEF_ALLOC_THREADS): New defines.

libgomp/ChangeLog:

* allocator.c: Add static asserts for new
GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values.
* libgomp.texi (OpenMP Impl. Status): Allocate directive for
static vars is now supported. Refer

Re: [PING] [PATCH] i386: Implement Thread Local Storage on Windows

2024-10-07 Thread Julian Waters
Resending again as I forgot to send it to the list

> Sorry, I somehow missed it. :-(  Then a configure check should be added in
> the compiler to tell whether the detected linker has the fix or not.

> There are already some specific checks for the PE linker at
> configure.ac:6500, although they do not invoke it.  A model could be the
> linker check "linker EH garbage collection of sections bug" at
> configure.ac:6295 and the check could use one of tests that Jan enabled in
> the linker testsuite (secrel-reloc.d).

Haha, no worries. I'll see what I can do there. No promises that I can
figure it out on my own though, since gcc's build system has confused me to
no end, I'll ask for help again if I need to

> Do you have a testcase for this particular issue?

I'm not quite sure what you mean by a testcase, but when compiling gcc
itself, when libgomp/libgcc (Can't remember which) is being compiled, gcc
will spit out invalid assembly that looks something like

movabsq $8+__gcov_indirect_call@secrel32, %rax

This is, of course, not correct, since movabsq takes a 64 bit value, but
@secrel32 is 32 bit. The RTL code in legitimize_tls_address that was
responsible for this assembly is

rtx base = gen_reg_rtx (Pmode);

emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, gen_rtx_SET (base,
gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx), UNSPEC_TLS_WIN32)),
gen_rtx_CLOBBER (VOIDmode, gen_rtx_SCRATCH (Pmode)))));

return gen_rtx_PLUS (Pmode, base, gen_rtx_CONST (Pmode, gen_rtx_UNSPEC
(Pmode, gen_rtvec (1, x), UNSPEC_SECREL32))); // This is the code that
results in the broken assembly!

This issue has been frustrating me to no end, since you can't add Pmode to
SImode normally, but then making the UNSPEC Pmode results in the broken
assembly shown above. The only solution I've found that works is zero
extending the UNSPEC const to Pmode, but that causes a useless register
load to appear:

leal local@secrel32, %edx

Right now I'm working on trying to find out how to make the zero_extend a
compile time no-op, but no luck there :(

best regards,
Julian

On Mon, Oct 7, 2024 at 3:26 PM Eric Botcazou  wrote:

> > The linker bug blocking this patch has actually already been fixed, see
> > https://github.com/bminor/binutils-gdb/commit/72cd2c70977943054ff784b7278cef5262288f32
> > for the patch that fixed it (Thanks for the help Jan!).
>
> Sorry, I somehow missed it. :-(  Then a configure check should be added in
> the compiler to tell whether the detected linker has the fix or not.
>
> There are already some specific checks for the PE linker at
> configure.ac:6500, although they do not invoke it.  A model could be the
> linker check "linker EH garbage collection of sections bug" at
> configure.ac:6295 and the check could use one of tests that Jan enabled in
> the linker testsuite (secrel-reloc.d).
>
> > I'll add your suggestions to the patch before pushing out a new version
> > for review, thanks (Well, there is one suggestion of yours I cannot add:
> > Making the secrel32 relocation Pmode, since the emitted assembly is broken
> > when I do that)
>
> Do you have a testcase for this particular issue?
>
> --
> Eric Botcazou
>
>
>


Re: arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]

2024-10-07 Thread Andre Vieira (lists)

Hi Torbjorn,

On 07/10/2024 09:08, Torbjorn SVENSSON wrote:

> There are 3 test cases that are fixed with these 2 commits, but there is
> also a bunch that is marked as new fails.
> Looking at the test cases that fail, there are 2 different kinds of
> failures.
>
> 1. gcc.target/arm/attr_thumb.c: This test case fails due to this
> difference:
>
> --- /dev/fd/63  2024-10-07 08:25:49.595309010 +
> +++ /dev/fd/62  2024-10-07 08:25:49.575309010 +
> @@ -33,9 +33,10 @@
>      @ args = 0, pretend = 0, frame = 0
>      @ frame_needed = 0, uses_anonymous_args = 0
>      @ link register save eliminated.
> -   cmp r0, #0
> -   ite eq
> -   moveq   r0, #5
> -   movne   r0, #1
> +   cbz r0, .L3
> +   movs    r0, #1
> +   bx  lr
> +.L3:
> +   movs    r0, #5
>      bx  lr
>      .size   foo, .-foo
>
> I'll leave the rest of the investigation of the reason for the failure,
> and the fix, to you Andre.

I think this test was meant to check __attribute__((thumb)) worked by
switching to thumb, forcing a specific type of codegen, which no longer
holds for armv8.1-m, so this is a testism that needs some creative
thinking, probably best to skip if armv8.1-m.

> 2. All other the test cases in the list above: These need to be adapted
> to the change introduced in r15-3606-g7d6c6a0d15c to have the proper arch.
> I've sent a patch that should fix these "regressions" in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664611.html.

I  presume you are using -march=armv8.1-m.main+mve.fp+fp.dp for these
rather than -mcpu? If I do:
RUNTESTFLAGS="--target_board=<...>/-mcpu=cortex-m55/-mfloat-abi=hard
arm.exp=vseleqdf.c" then it works just fine for me as the -mcpu=unset
does it work, but the -march=armv8.1-m.main+mve.fp+fp.dp does fail. I'll
talk to Richard E about this one.

Thanks for helping with the testing I'll send a patch with the testism
fixes up later.

I am however quite confident that these are both testisms. @Christophe:
Any chance you can run the second patch through the bootstrap CI for
arm-none-linux-gnueabihf ? Might end up committing the 2nd patch first
if it helps fix that?





Re: [Fortran, Patch, PR51815, v3] Fix parsing of substring refs in coarrays.

2024-10-07 Thread Andre Vehreschild
Hi Harald,

thank you for your input. I still have some small nits to discuss to make
everyone happy. Therefore:

> this seems to go into the right direction - except that I am not a
> great fan of gfc_error_now, as that tries to paper over deficiencies
> in error recovery.

Me neither, but when I remove the gfc_error_now() and only do

> if (gfc_peek_ascii_char () == '(')
>   return MATCH_ERROR;

as you proposed, then no error is given for:

   character(:), allocatable :: x[:]
   character(:), allocatable :: c
   c = x(:)(2:5)

I.e. nothing at all. Therefore at the moment I prefer to stick to the initial
solution with the gfc_error_now, which not only gives an error in the
associate, but also when one just does an array/substring-ref outside of
parentheses. And I like the new error message, because I consider it more
helpful than just a syntax error or the invalid association target message.
What do you think?

> Is there a reason that you do not check the return value of
> gfc_match_array_ref?

What am I to do with the result? We are in an error case independent of the
result of gfc_match_array_ref. The intention of using that routine here was to
digest the unexpected input and allow for (easier|better) error recovery.
Maybe I should just put a comment on it to make it clearer. Or is there
another way to help the parser recover from an error?

Sorry for the additional round. But this error has been around for so long
that it doesn't matter if we need another day to come up with a solution.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre


> Indeed your suggestion (or the shortened version above) improves
> the diagnostics ("user experience") also for this variant:
>
> subroutine foo
> character(:), allocatable :: x[:]
> character(:), dimension(:), allocatable :: c[:]
> type t
>    character(:), allocatable :: x[:]
>    character(:), dimension(:), allocatable :: c[:]
> end type t
> type(t) :: z
> associate (y => x(:)(2:))
> end associate
> associate (a => c(:)(:)(2:))
> end associate
> associate (y => z%x(:)(2:))
> end associate
> associate (a => z%c(:)(:)(2:))
> end associate
> end
>
> with several error messages of the kind
>
> Error: Invalid association target at (1)
>
> or
>
> Error: Rank mismatch in array reference at (1) (1/0)
>
> looking less technical than a parsing error.
> I think this is as good as it can be.
>
> So OK from my side with either your additional patch or my
> shortened version.
>
> Thanks for the patch!
>
> Harald
>
>
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Is this ok?
> >
> > Regards and thanks for the review,
> > Andre
> >
> > On Tue, 1 Oct 2024 23:31:11 +0200
> > Harald Anlauf  wrote:
> >
> >> Hi Andre,
> >>
> >> Am 01.10.24 um 09:43 schrieb Andre Vehreschild:
> >>> Hi all,
> >>>
> >>> this rather old PR reported a parsing bug, when a coarray'ed character
> >>> substring ref is to be parsed, aka CHARACTER(:) :: str[:] ... str(2:5). In
> >>> this case the parser confused the substring ref with an array-ref, because
> >>> an array_spec was present. This patch fixes this by requesting only
> >>> coarray parsing from gfc_match_array_ref when no regular dimension is
> >>> present. The patch is not involved when an array of coarray'ed strings is
> >>> parsed (that worked beforehand).
> >>
> >> while the patch addresses the issue mentioned in the PR,
> >>
> >>> I had to fix the dg-error clauses in the testcase pr102532 because now the
> >>> error of having too many refs is detected by the parsing stage and no
> >>> longer by the resolve stage. It has become a simple syntax error. I hope
> >>> this is ok.
> >>
> >> I find the error messages now less helpful to users: before the patch
> >> we got "Rank mismatch in array reference", which was more suitable
> >> than the newer version with more or less confusing syntax errors.
> >>
> >> I assume you tried to find a better solution - but Intel and NAG
> >> also give syntax errors - so basically I am fine with the patch.
> >>
> >> You may want to wait for a second opinion.  If nobody else responds
> >> within the next 2 days, you may proceed nevertheless.
> >>
> >> Thanks,
> >> Harald
> >>
> >>> Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
> >>>
> >>> Regards,
> >>>   Andre
> >>> --
> >>> Andre Vehreschild * Email: vehre ad gmx dot de
> >>
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
> >
>


--
Andre Vehreschild * Email: vehre ad gmx dot de
From bf33a961a501e7a31f510518830e420a3f1e3b78 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Tue, 1 Oct 2024 09:30:59 +0200
Subject: [PATCH] Fix parsing of substring refs in coarrays. [PR51815]

The parser was greedily taking the substring ref as an array ref because
an array_spec was present.  Fix this by only parsing the coarray (pseudo)
ref when no regular array is present.

gcc/fortran/ChangeLog:

	PR fortran/5181

Re: arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]

2024-10-07 Thread Torbjorn SVENSSON




On 2024-10-07 10:53, Andre Vieira (lists) wrote:
> Hi Torbjorn,
>
> On 07/10/2024 09:08, Torbjorn SVENSSON wrote:
>
>> There are 3 test cases that are fixed with these 2 commits, but there
>> is also a bunch that is marked as new fails.
>> Looking at the test cases that fail, there are 2 different kinds of
>> failures.
>>
>> 1. gcc.target/arm/attr_thumb.c: This test case fails due to this
>> difference:
>>
>> --- /dev/fd/63  2024-10-07 08:25:49.595309010 +
>> +++ /dev/fd/62  2024-10-07 08:25:49.575309010 +
>> @@ -33,9 +33,10 @@
>>  @ args = 0, pretend = 0, frame = 0
>>  @ frame_needed = 0, uses_anonymous_args = 0
>>  @ link register save eliminated.
>> -   cmp r0, #0
>> -   ite eq
>> -   moveq   r0, #5
>> -   movne   r0, #1
>> +   cbz r0, .L3
>> +   movs    r0, #1
>> +   bx  lr
>> +.L3:
>> +   movs    r0, #5
>>  bx  lr
>>  .size   foo, .-foo
>>
>> I'll leave the rest of the investigation of the reason for the
>> failure, and the fix, to you Andre.
>
> I think this test was meant to check __attribute__((thumb)) worked by
> switching to thumb, forcing a specific type of codegen, which no longer
> holds for armv8.1-m, so this is a testism that needs some creative
> thinking, probably best to skip if armv8.1-m.
>
>> 2. All other the test cases in the list above: These need to be
>> adapted to the change introduced in r15-3606-g7d6c6a0d15c to have the
>> proper arch.
>> I've sent a patch that should fix these "regressions" in
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664611.html.
>
> I  presume you are using -march=armv8.1-m.main+mve.fp+fp.dp for these
> rather than -mcpu? If I do:
> RUNTESTFLAGS="--target_board=<...>/-mcpu=cortex-m55/-mfloat-abi=hard
> arm.exp=vseleqdf.c" then it works just fine for me as the -mcpu=unset
> does it work, but the -march=armv8.1-m.main+mve.fp+fp.dp does fail. I'll
> talk to Richard E about this one.

For these results, I did as before. This is the command line that was
used for gcc.target/arm/vseleqdf.c:
.../bin/arm-none-eabi-gcc  .../gcc/testsuite/gcc.target/arm/vseleqdf.c
-mthumb -march=armv8.1-m.main+mve+pacbti -mcpu=cortex-m85
-mfloat-abi=hard -mfpu=fpv5-d16   -fdiagnostics-plain-output  -O2
-mcpu=cortex-a57 -mfpu=fp-armv8 -mfloat-abi=softfp -ffat-lto-objects
-fno-ident -S -o vseleqdf.s

> Thanks for helping with the testing I'll send a patch with the testism
> fixes up later.
>
> I am however quite confident that these are both testisms. @Christophe:
> Any chance you can run the second patch through the bootstrap CI for
> arm-none-linux-gnueabihf ? Might end up committing the 2nd patch first
> if it helps fix that?


I think the reason for the bootstrap failure is simply:
- the 2nd argument to arm_noce_conversion_profitable_p is never 
referenced in the function.
- the variable "set" in arm_is_v81m_cond_insn might not be defined 
before the if-statement (in the case the while loop is never executed).
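
Both points are ordinary warnings, and the later bootstrap stages are
normally built with -Werror, so either one is enough to stop the build.
A minimal sketch of the two patterns and their usual fixes follows. The
function names are hypothetical stand-ins, not the ones in the arm
backend, and the sketch is plain C rather than the C++ GCC itself uses
(where leaving the parameter unnamed would be the more common idiom):

```c
#include <stddef.h>

/* Hypothetical stand-in for a hook whose second argument is not needed.
   Casting it to void documents that and avoids -Wunused-parameter.  */
int
conversion_profitable_p (int cost, const void *if_info)
{
  (void) if_info;
  return cost <= 4;
}

/* Hypothetical stand-in for a loop that may run zero times.  `last` is
   initialised up front, so the test after the loop never reads an
   indeterminate value (the -Wmaybe-uninitialized case).  */
int
last_nonzero_p (const int *insns, size_t n)
{
  int last = 0;
  size_t i = 0;

  while (i < n)
    {
      if (insns[i] != 0)
        last = insns[i];
      i++;
    }

  if (last != 0)
    return 1;
  return 0;
}
```

With n == 0 the loop body never runs and the function still returns a
well-defined 0, which is the case the warning is about.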


Kind regards,
Torbjörn



Re: [PING] [PATCH] i386: Implement Thread Local Storage on Windows

2024-10-07 Thread Sam James
Julian Waters  writes:

> Resending again as I forgot to send it to the list
>
>> Sorry, I somehow missed it. :-(  Then a configure check should be added in
>> the compiler to tell whether the detected linker has the fix or not.
>
>> There are already some specific checks for the PE linker at
>> configure.ac:6500, although they do not invoke it.  A model could be the
>> linker check "linker EH garbage collection of sections bug" at
>> configure.ac:6295 and the check could use one of tests that Jan enabled in
>> the linker testsuite (secrel-reloc.d).
>
> Haha, no worries. I'll see what I can do there. No promises that I can
> figure it out on my own though, since gcc's build system has confused me to
> no end, I'll ask for help again if I need to
>
>> Do you have a testcase for this particular issue?
>
> I'm not quite sure what you mean by a testcase, but when compiling gcc
> itself, when libgomp/libgcc (Can't remember which) is being compiled, gcc
> will spit out invalid assembly that looks something like

A minimal (ideally C) source file with a small set of commands to
produce obviously bad assembly or an abort on e.g. a bad runtime
condition.

This makes it easy to analyse, for people to help, and ultimately for it
to be added to the testsuite.

Trying to debug a massive source file as-is without reducing it is not fun.

>
> movabsq $8+__gcov_indirect_call@secrel32, %rax
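
As an illustration of the shape such a reduced file could take, here is a
sketch loosely modelled on the movabsq line quoted above: a thread-local
struct whose second member sits at a non-zero offset, plus functions that
take its address and store to it. This is an assumption about what might
matter (TLS symbol plus constant offset), it is not confirmed to trigger
the bug, and all names are hypothetical:

```c
/* Hypothetical reduction; not confirmed to reproduce the problem.  */
struct call_tuple
{
  void *counters;   /* offset 0 */
  void *callee;     /* offset 8 on a 64-bit target */
};

/* Thread-local object, so every access goes through the TLS path.  */
__thread struct call_tuple tls_call;

/* Address of a member at a non-zero offset, i.e. symbol plus constant.  */
void **
callee_slot (void)
{
  return &tls_call.callee;
}

/* Store through the same thread-local member.  */
void
set_callee (void *fn)
{
  tls_call.callee = fn;
}
```

Compiling a file like this to assembly with -O2 -S using the patched
compiler, and checking the output for a movabsq with an @secrel32
operand, would show whether it reproduces.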


Re: [PING] [PATCH] i386: Implement Thread Local Storage on Windows

2024-10-07 Thread Julian Waters
Understood. I will try to reproduce the issue in the meantime as I rewrite
the patch

best regards,
Julian

On Mon, Oct 7, 2024 at 5:07 PM Sam James  wrote:

> Julian Waters  writes:
>
> > Resending again as I forgot to send it to the list
> >
> >> Sorry, I somehow missed it. :-(  Then a configure check should be added
> >> in the compiler to tell whether the detected linker has the fix or not.
> >
> >> There are already some specific checks for the PE linker at
> >> configure.ac:6500, although they do not invoke it.  A model could be the
> >> linker check "linker EH garbage collection of sections bug" at
> >> configure.ac:6295 and the check could use one of tests that Jan enabled
> >> in the linker testsuite (secrel-reloc.d).
> >
> > Haha, no worries. I'll see what I can do there. No promises that I can
> > figure it out on my own though, since gcc's build system has confused me
> > to no end, I'll ask for help again if I need to
> >
> >> Do you have a testcase for this particular issue?
> >
> > I'm not quite sure what you mean by a testcase, but when compiling gcc
> > itself, when libgomp/libgcc (Can't remember which) is being compiled, gcc
> > will spit out invalid assembly that looks something like
>
> A minimal (ideally C) source file with a small set of commands to
> produce obviously bad assembly or an abort on e.g. a bad runtime
> condition.
>
> This makes it easy to analyse, for people to help, and ultimately for it
> to be added to the testsuite.
>
> Trying to debug a massive source file as-is without reducing it is not fun.
>
> >
> > movabsq $8+__gcov_indirect_call@secrel32, %rax
>


Re: Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Дилян Палаузов
Hello Andrew,

do you mean that optimizations for [[noreturn]] functions can only be done when 
the functions are called from other TU?

How about a function with __attribute__ ((visibility ("hidden"))) during LTO
linking: aren't all functions then considered to be more or less in the same
TU?  Do the [[noreturn]] optimizations still apply?

What happens to a static [[noreturn]] function, if a pointer to it is passed to 
other TU and that other TU calls the function?

Greetings
  Дилян

-Original Message-
From: Andrew Pinski 
To: Дилян Палаузов 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Why -Wsuggest-attribute=noreturn does not apply to static?
Date: 07/10/24 10:07:16

On Mon, Oct 7, 2024 at 12:02 AM Дилян Палаузов
 wrote:
> 
> Hello,
> 
> https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wsuggest-attribute_003d
>  says for -Wsuggest-attribute=noreturn:
> 
> > The compiler only warns for functions visible in other compilation units.
> 
> Why?  clang -Wmissing-noreturn does warn even for static functions.

The reasoning is that static functions can only be called from the
local TU, so adding the attribute won't change anything; for non-local
functions it can help other TUs, which is why it belongs on the
declaration rather than the definition of the function.

Thanks,
Andrew Pinski

> 
> Greetings
>   Дилян



Re: [PING] [PATCH] i386: Implement Thread Local Storage on Windows

2024-10-07 Thread Eric Botcazou
> The linker bug blocking this patch has actually already been fixed, see
> https://github.com/bminor/binutils-gdb/commit/72cd2c70977943054ff784b7278cef5262288f32
> for the patch that fixed it (Thanks for the help Jan!).

Sorry, I somehow missed it. :-(  Then a configure check should be added in the 
compiler to tell whether the detected linker has the fix or not.

There are already some specific checks for the PE linker at configure.ac:6500, 
although they do not invoke it.  A model could be the linker check "linker EH 
garbage collection of sections bug" at configure.ac:6295 and the check could 
use one of tests that Jan enabled in the linker testsuite (secrel-reloc.d).

> I'll add your suggestions to the patch before pushing out a new version for
> review, thanks (Well, there is one suggestion of yours I cannot add: Making
> the secrel32 relocation Pmode, since the emitted assembly is broken when I
> do that)

Do you have a testcase for this particular issue?

-- 
Eric Botcazou




Re: arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]

2024-10-07 Thread Christophe Lyon
On Mon, 7 Oct 2024 at 11:04, Torbjorn SVENSSON
 wrote:
>
>
>
> On 2024-10-07 10:53, Andre Vieira (lists) wrote:
> > Hi Torbjorn,
> >
> > On 07/10/2024 09:08, Torbjorn SVENSSON wrote:
> >
> >
> >>
> >> There are 3 test cases that are fixed with these 2 commits, but there
> >> is also a bunch that is marked as new fails.
> >> Looking at the test cases that fail, there are 2 different kinds of
> >> failures.
> >>
> >> 1. gcc.target/arm/attr_thumb.c: This test case fails due to this
> >> difference:
> >> --- /dev/fd/63  2024-10-07 08:25:49.595309010 +
> >> +++ /dev/fd/62  2024-10-07 08:25:49.575309010 +
> >> @@ -33,9 +33,10 @@
> >>  @ args = 0, pretend = 0, frame = 0
> >>  @ frame_needed = 0, uses_anonymous_args = 0
> >>  @ link register save eliminated.
> >> -   cmp r0, #0
> >> -   ite eq
> >> -   moveq   r0, #5
> >> -   movne   r0, #1
> >> +   cbz r0, .L3
> >> +   movs    r0, #1
> >> +   bx  lr
> >> +.L3:
> >> +   movs    r0, #5
> >>  bx  lr
> >>  .size   foo, .-foo
> >> I'll leave the rest of the investigation of the reason for the
> >> failure, and the fix, to you Andre.
> >
> > I think this test was meant to check __attribute__((thumb)) worked by
> > switching to thumb, forcing a specific type of codegen, which no longer
> > holds for armv8.1-m, so this is a testism that needs some creative
> > thinking, probably best to skip if armv8.1-m.
> >
> >
> >>
> >> 2. All other the test cases in the list above: These need to be
> >> adapted to the change introduced in r15-3606-g7d6c6a0d15c to have the
> >> proper arch.
> >> I've sent a patch that should fix these "regressions" in
> >> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664611.html.
> >>
> > I  presume you are using -march=armv8.1-m.main+mve.fp+fp.dp for these
> > rather than -mcpu? If I do:
> > RUNTESTFLAGS="--target_board=<...>/-mcpu=cortex-m55/-mfloat-abi=hard
> > arm.exp=vseleqdf.c" then it works just fine for me as the -mcpu=unset
> > does it work, but the -march=armv8.1-m.main+mve.fp+fp.dp does fail. I'll
> > talk to Richard E about this one.
>
> For these results, I did as before. This is the command line that was
> used for gcc.target/arm/vseleqdf.c:
> .../bin/arm-none-eabi-gcc  .../gcc/testsuite/gcc.target/arm/vseleqdf.c
> -mthumb -march=armv8.1-m.main+mve+pacbti -mcpu=cortex-m85
> -mfloat-abi=hard -mfpu=fpv5-d16   -fdiagnostics-plain-output  -O2
> -mcpu=cortex-a57 -mfpu=fp-armv8 -mfloat-abi=softfp -ffat-lto-objects
> -fno-ident -S -o vseleqdf.s
>
> >
> > Thanks for helping with the testing I'll send a patch with the testism
> > fixes up later.
> >
> > I am however quite confident that these are both testisms. @Christophe:
> > Any chance you can run the second patch through the bootstrap CI for
> > arm-none-linux-gnueabihf ? Might end up committing the 2nd patch first
> > if it helps fix that?
>
> I think the reason for the bootstrap failure is simply:
> - the 2nd argument to arm_noce_conversion_profitable_p is never
> referenced in the function.
> - the variable "set" in arm_is_v81m_cond_insn might not be defined
> before the if-statement (in the case the while loop is never executed).
>

Right: I was mainly mentioning the bootstrap problem in case it takes
time to come to an agreement for this patch. (bootstrap is easy to
fix)

(@Andre I did manually start a precommit CI bootstrap build for this
patch, but it fails to apply because we keep the baseline as before
bootstrap is broken)

Thanks,

Christophe


> Kind regards,
> Torbjörn
>


Re: [Patch] OpenMP: Allocate directive for static vars, clean up

2024-10-07 Thread Andre Vehreschild
Hi Tobias,

just a question:

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 8231bd255d6..2586c6d7a79 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -821,6 +821,23 @@ gfc_finish_var_decl (tree decl, gfc_symbol * sym)
   && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
 set_decl_tls_model (decl, decl_default_tls_model (decl));

+  if (sym->attr.omp_allocate && TREE_STATIC (decl))
+{
+  struct gfc_omp_namelist *n;
+  for (n = sym->ns->omp_allocate; n; n = n->next)
+   if (n->sym == sym)
+ break;

Theoretically n can be NULL here. This would then ICE. Or is there a guarantee
that n is never NULL?

+  tree alloc = gfc_conv_constant_to_tree (n->u2.allocator);
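
To make the concern concrete outside of GCC, here is a minimal sketch in
plain C with a hand-rolled linked list. The types and function names are
hypothetical stand-ins for gfc_symbol and gfc_omp_namelist, not the real
front-end structures. It shows the same search loop, followed by a guard
before the entry is dereferenced:

```c
#include <stddef.h>

/* Hypothetical stand-ins for gfc_symbol and gfc_omp_namelist.  */
struct symbol { const char *name; };
struct namelist
{
  struct symbol *sym;
  int allocator;
  struct namelist *next;
};

/* Return the entry for SYM, or NULL if the list has none.  The loop has
   the same shape as the one in the hunk above.  */
struct namelist *
find_entry (struct namelist *head, const struct symbol *sym)
{
  struct namelist *n;
  for (n = head; n; n = n->next)
    if (n->sym == sym)
      break;
  return n;
}

/* Read the allocator only when an entry was found; FALLBACK otherwise.  */
int
allocator_for (struct namelist *head, const struct symbol *sym, int fallback)
{
  struct namelist *n = find_entry (head, sym);
  return n ? n->allocator : fallback;
}
```

If the front end does guarantee that a symbol with the attribute always
has a matching entry, an assertion after the loop would state that
invariant instead; whether that guarantee holds is exactly the question.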



I just looked at the fortran part.

Thanks for the patch and regards,
Andre

On Mon, 7 Oct 2024 10:47:56 +0200
Tobias Burnus  wrote:

> Now committed as r15-4104-ga8caeaacf499d5.
>
> With a wording improvement in the commit log and avoiding an XPASS for
> C++ by excluding c++98 from the xfail in dg-bogus... xfail.
>
> Tobias
>
> Tobias Burnus wrote:
> > 'omp allocate' permits to use a different (specified) allocator and
> > alignment for both stack/automatic and static/saved variables; the latter
> > takes only predefined allocators. Currently, only C and Fortran are
> > supported for stack/automatic variables; static variables are rejected
> > before the attached patch. (For them, only predefined allocators are
> > permitted.)
> >
> > * * *
> >
> > I happened to look at the 'allocate' directive recently and, doing so,
> > I stumbled over a couple of issues, which the attached patch addresses
> > (missing diagnostics for corner cases, not updated checks, unhelpful
> > documentation ['allocate' *clause*], ...). Doing so, I wondered whether:
> >
> > Shouldn't we just accept 'omp allocate' for static
> > variables by just honoring the alignment and ignoring the actually
> > requested allocator? - First, we do already the same for actual
> > allocations as not all traits are supported. And for the host this seems
> > to be the most sensible to do in any case.
> > [For some use cases, pointers + allocation in the constructor would be
> > better, but in general, not adding an indirection seems to be better and
> > has fewer corner-case usability issues.]
> >
> > I guess we later want to honor the requested memory for nvptx and/or
> > gcn; at least Nvidia GPUs could make use of constant memory (having
> > advantages for reading the same memory by many threads/broadcasting it).
> > I guess OpenACC 2.7's 'readonly' modifier serves a similar purpose.
> > For now we don't, but the attribute is passed on to the backends,
> > which could make use of it, if desired. ('groupprivate' directive vs.
> > cgroup/thread allocators are similar device-only features.)
> >
> > As mentioned, this patch also fixes a few other issues here and there,
> > see commit log and source code for details.
> >
> > Code comments? Suggestions or remarks? - Before I apply this patch?
> >
> > Tobias
> >
> > PS: I am aware that C++ support is lacking. There is a pending patch
> > that needs
> > to be updated for this patch, probably some bitrotting, and in
> > particular for the
> > review comments, cf.
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html
> > and https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639929.html


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Jakub Jelinek
On Mon, Oct 07, 2024 at 10:21:05AM +0300, Дилян Палаузов wrote:
> do you mean that optimizations for [[noreturn]] functions can only be done 
> when the functions are called from other TU?

No, but if they are static functions called from within the same TU, then the
[[noreturn]] attribute doesn't help; the compiler can discover that itself
(and if it can't, it wouldn't suggest it).

> How about a function with __attribute__ ((visibility ("hidden"))) during LTO 
> linking,
> aren’t there all functions considered more or less to be in the same TU?  Do 
> the [[noreturn]] optimizations still apply?

The [[noreturn]] optimizations apply always, if you call a function proven
to be noreturn (from attributes or callee analysis), then the compiler can
optimize away code after the call etc.
The attribute can be useful for non-static but hidden functions, callers
can be optimized earlier and don't rely on LTO propagation if user marks it
explicitly.

> What happens to a static [[noreturn]] function, if a pointer to it is passed 
> to other TU and that other TU calls the function?

Nothing.  [[noreturn]] attribute is function declaration attribute, it
doesn't apply to function pointers, so users can't mark a function pointer
to be function pointer to noreturn function; it will be only optimized
if the indirection is optimized out and turned into a direct call, then
the compiler knows whether it calls a noreturn function or not.
But, marking a static function [[noreturn]] even if you take a pointer
to it and pass to other TUs doesn't help in any way, the other TUs
will still see a function pointer which can't be marked and will not know
it will call only a noreturn function.
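
A small sketch of the distinction, in C and using the C11 _Noreturn
function specifier in place of the [[noreturn]] attribute spelling (like
the attribute, it attaches to the function declaration and not to a
pointer type). The function names are made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* _Noreturn is part of the function declaration, so every direct caller
   that sees it knows control does not come back.  */
static _Noreturn void
fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  exit (EXIT_FAILURE);
}

/* Direct call: the code after fatal () is only reached when b != 0.  */
int
checked_div (int a, int b)
{
  if (b == 0)
    fatal ("division by zero");
  return a / b;
}

/* The pointer type carries no noreturn information.  A caller in another
   TU that only receives this pointer cannot tell that the callee never
   returns.  */
typedef void (*handler_fn) (const char *);

handler_fn
get_handler (void)
{
  return fatal;
}
```

Here checked_div benefits from the marking through the direct call, while
anything that calls through the value returned by get_handler sees only a
plain function pointer.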

Jakub



Re: arm: Make arm_noce_conversion_profitable_p call default hook [PR 116444]

2024-10-07 Thread Andre Vieira (lists)




On 07/10/2024 10:15, Christophe Lyon wrote:
> On Mon, 7 Oct 2024 at 11:04, Torbjorn SVENSSON wrote:
>> On 2024-10-07 10:53, Andre Vieira (lists) wrote:
>>> Hi Torbjorn,
>>>
>>>> 2. All other the test cases in the list above: These need to be
>>>> adapted to the change introduced in r15-3606-g7d6c6a0d15c to have the
>>>> proper arch.
>>>> I've sent a patch that should fix these "regressions" in
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664611.html.
>>>
>>> I  presume you are using -march=armv8.1-m.main+mve.fp+fp.dp for these
>>> rather than -mcpu? If I do:
>>> RUNTESTFLAGS="--target_board=<...>/-mcpu=cortex-m55/-mfloat-abi=hard
>>> arm.exp=vseleqdf.c" then it works just fine for me as the -mcpu=unset
>>> does it work, but the -march=armv8.1-m.main+mve.fp+fp.dp does fail. I'll
>>> talk to Richard E about this one.
>>
>> For these results, I did as before. This is the command line that was
>> used for gcc.target/arm/vseleqdf.c:
>> .../bin/arm-none-eabi-gcc  .../gcc/testsuite/gcc.target/arm/vseleqdf.c
>> -mthumb -march=armv8.1-m.main+mve+pacbti -mcpu=cortex-m85
>> -mfloat-abi=hard -mfpu=fpv5-d16   -fdiagnostics-plain-output  -O2
>> -mcpu=cortex-a57 -mfpu=fp-armv8 -mfloat-abi=softfp -ffat-lto-objects
>> -fno-ident -S -o vseleqdf.s





@Torbjorn:
Where are all these options coming from though? There are some 
'conflicts' here. If you are passing all of these to RUNTESTFLAGS then 
... please don't.

For instance:
-march=armv8.1-m.main+mve+pacbti -mcpu=cortex-m85
These are contradictory:
Use either -march or -mcpu.
-mcpu=cortex-m85 is equivalent to:
-march=armv8.1-m.main+pacbti+mve.fp+fp.dp -mtune=cortex-m85

Your -march only enables integer MVE; it also doesn't enable any FP.
cortex-m85 only seems to allow for double precision FP; we don't seem to
have a single FP configuration in GCC.


I also see a -mfpu=fpv5-d16, if you are passing that, then please don't. 
You should be passing -mfpu=auto if anything when compiling for MVE 
cores as there is no -mfpu= option that enables MVE and we don't 
recommend using both architecture extensions and -mfpu options.


Having said all this, I do now realize vseleq.f64 is available for 
armv8.1-m.main, so we should also enable that. I'm having a look!



> Right: I was mainly mentioning the bootstrap problem in case it takes
> time to come to an agreement for this patch. (bootstrap is easy to
> fix)
>
> (@Andre I did manually start a precommit CI bootstrap build for this
> patch, but it fails to apply because we keep the baseline as before
> bootstrap is broken)

Yeah, hadn't had the chance to look at the fail log. But I'll commit the 
fix as obvious today, regardless of the issues above.





Re: [PATCH v2] Add -ftime-report-wall

2024-10-07 Thread Richard Biener
On Sat, Oct 5, 2024 at 10:17 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Time vars normally use times(2) to get the user/sys/wall time, which is 
> always a
> system call. I don't think the system time is very useful because most 
> overhead
> is in user time. If we only use the wall (or monotonic) time modern OS have an
> optimized path to get it directly from a CPU instruction like RDTSC
> without system call, which is much faster.
>
> Add a -ftime-report-wall option. It actually uses the POSIX monotonic time,
> so strictly it's not wall clock, but it's still a reasonable name.
>
> Comparing the overhead with tramp3d -O0:
>
>   ./gcc/cc1plus -quiet  ../tsrc/tramp3d-v4.i ran
> 1.03 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report-wall 
> ../tsrc/tramp3d-v4.i
> 1.18 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report 
> ../tsrc/tramp3d-v4.i
>
> -ftime-report costs 18% (excluding the output), while -ftime-report-wall
> only costs 3%, so is nearly free. So it would be feasible for some build
> system to always enable it and break down the build time into passes.
>
> With -O2 it is a bit less pronounced but still visible:
>
>   ./gcc/cc1plus -O2 -quiet  ../tsrc/tramp3d-v4.i ran
> 1.00 ± 0.00 times faster than ./gcc/cc1plus -O2 -quiet -ftime-report-wall 
> ../tsrc/tramp3d-v4.i
> 1.08 ± 0.01 times faster than ./gcc/cc1plus -O2 -quiet -ftime-report 
> ../tsrc/tramp3d-v4.i
>
> The drawback is that if there is context switching with other programs
> the time will be overestimated, however for the common case that the
> system is not oversubscribed it is more accurate because each
> measurement has less overhead.
>
> Bootstrapped on x86_64-linux with full test suite run.
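
For readers who want to see the two timing paths side by side, here is a
minimal C sketch. It is not the timevar.cc change itself; the struct and
function names are invented for illustration, and it assumes only the
POSIX times(2) and clock_gettime(2) interfaces mentioned in the
description above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* One timing sample, all values in seconds.  */
struct sample { double user, sys, wall; };

/* times(2) path: one system call yields user, system and elapsed time.  */
struct sample sample_times(void)
{
  struct tms t;
  clock_t elapsed = times(&t);
  double ticks = (double) sysconf(_SC_CLK_TCK);
  struct sample s = { t.tms_utime / ticks, t.tms_stime / ticks,
                      elapsed / ticks };
  return s;
}

/* Monotonic path: only elapsed time is available, but many systems can
   answer this without entering the kernel.  */
struct sample sample_monotonic(void)
{
  struct timespec ts;
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void) rc;
  struct sample s = { 0.0, 0.0, ts.tv_sec + ts.tv_nsec * 1e-9 };
  return s;
}
```

The monotonic variant leaves the user and sys fields at zero, which is
the trade-off being discussed: cheaper sampling in exchange for losing
the user/system breakdown.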

Thanks for doing this - I'd like to open up for discussion whether we
should simply switch the default and stop recording user/system time for
-ftime-report.  One reason some infrastructure isn't using fine-grained
timevars is because of overhead.

So, shouldn't we go without the new option and simply change
-ftime-report behavior?

Related - with -ftime-trace coming up again recently I wonder if we
should transition to -ftime-report={user,wall,details,trace,...} allowing
-ftime-report=user,details.  I'll note that while adding an option,
removing it later is always difficult.

Richard.

> gcc/ChangeLog:
>
> * common.opt (ftime-report-wall): Add.
> * common.opt.urls: Regenerate.
> * doc/invoke.texi: (ftime-report-wall): Document
> * gcc.cc (try_generate_repro): Check for -ftime-report-wall.
> * timevar.cc (get_time): Use clock_gettime if enabled.
> (timer::print): Print only wall time for time_report_wall.
> (make_json_for_timevar_time_def): Dito.
> * toplev.cc (toplev::start_timevars): Check for time_report_wall.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/timevar3.C: New test.
>
> ---
>
> v2: Adjust JSON/Sarif output too.
> ---
>  gcc/common.opt  |  4 +++
>  gcc/common.opt.urls |  3 +++
>  gcc/doc/invoke.texi |  7 ++
>  gcc/gcc.cc  |  3 ++-
>  gcc/testsuite/g++.dg/ext/timevar3.C | 14 +++
>  gcc/timevar.cc  | 38 +++--
>  gcc/toplev.cc   |  3 ++-
>  7 files changed, 62 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/timevar3.C
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 12b25ff486de..a200a8a0bc45 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3014,6 +3014,10 @@ ftime-report
>  Common Var(time_report)
>  Report the time taken by each compiler pass.
>
> +ftime-report-wall
> +Common Var(time_report_wall)
> +Report the wall time taken by each compiler.
> +
>  ftime-report-details
>  Common Var(time_report_details)
>  Record times taken by sub-phases separately.
> diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
> index e31736cd9945..6e79a8f9390b 100644
> --- a/gcc/common.opt.urls
> +++ b/gcc/common.opt.urls
> @@ -1378,6 +1378,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fthread-jumps)
>  ftime-report
>  UrlSuffix(gcc/Developer-Options.html#index-ftime-report)
>
> +ftime-report-wall
> +UrlSuffix(gcc/Developer-Options.html#index-ftime-report-wall)
> +
>  ftime-report-details
>  UrlSuffix(gcc/Developer-Options.html#index-ftime-report-details)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d38c1feb86f7..8c11d12e7521 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -784,6 +784,7 @@ Objective-C and Objective-C++ Dialects}.
>  -frandom-seed=@var{string}  -fsched-verbose=@var{n}
>  -fsel-sched-verbose  -fsel-sched-dump-cfg  -fsel-sched-pipelining-verbose
>  -fstats  -fstack-usage  -ftime-report  -ftime-report-details
> +-ftime-report-wall
>  -fvar-tracking-assignments-toggle  -gtoggle
>  -print-file-name=@var{library}  -print-libgcc-file-name
>  -print-multi-directory  -print-multi-lib  -print-multi

Re: [PATCH] testsuite: Define missing and use ET for arm_arch_* and arm_cpu_*

2024-10-07 Thread Richard Earnshaw (lists)
On 07/10/2024 09:03, Torbjörn SVENSSON wrote:
> Ok for trunk?
> 
> --
> 
> Update test cases to use -mcpu=unset/-march=unset feature introduced in
> r15-3606-g7d6c6a0d15c.

The acronym ET isn't one I recognize - I'm guessing you intend it to be 
Effective Target, rather than Extra Terrestrial, or Elf Target or some other 
expansion?  I think perhaps it would be better to avoid this in the commit log. 
 Your summary line is also a little imprecise as I suspect we will have more 
patches of a similar nature for some other patches soon.  Something like:

testsuite: arm: use effective-target for vsel* and mod* tests

would be closer

> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/arm/pr65647.c: Use ET arm_arch_v6m.
>   * gcc.target/arm/mod_2.c: Use ET arm_cpu_cortex_a57.
>   * gcc.target/arm/mod_256.c: Likewise.
>   * gcc.target/arm/vseleqdf.c: Likewise.
>   * gcc.target/arm/vseleqsf.c: Likewise.
>   * gcc.target/arm/vselgedf.c: Likewise.
>   * gcc.target/arm/vselgesf.c: Likewise.
>   * gcc.target/arm/vselgtdf.c: Likewise.
>   * gcc.target/arm/vselgtsf.c: Likewise.
>   * gcc.target/arm/vselledf.c: Likewise.
>   * gcc.target/arm/vsellesf.c: Likewise.
>   * gcc.target/arm/vselltdf.c: Likewise.
>   * gcc.target/arm/vselltsf.c: Likewise.
>   * gcc.target/arm/vselnedf.c: Likewise.
>   * gcc.target/arm/vselnesf.c: Likewise.
>   * gcc.target/arm/vselvcdf.c: Likewise.
>   * gcc.target/arm/vselvcsf.c: Likewise.
>   * gcc.target/arm/vselvsdf.c: Likewise.
>   * gcc.target/arm/vselvssf.c: Likewise.
>   * lib/target-supports.exp: Define EF arm_cpu_cortex_a57.  Update ET
  ^^  
Typo for ET?

The body of the patch is OK with an updated commit message.

Thanks.
R.


>   arm_v8_1_lob_ok to use -mcpu=unset.
> 
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/gcc.target/arm/mod_2.c| 4 +++-
>  gcc/testsuite/gcc.target/arm/mod_256.c  | 4 +++-
>  gcc/testsuite/gcc.target/arm/pr65647.c  | 3 ++-
>  gcc/testsuite/gcc.target/arm/vseleqdf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vseleqsf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselgedf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselgesf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselgtdf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselgtsf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselledf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vsellesf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselltdf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselltsf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselnedf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselnesf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselvcdf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselvcsf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselvsdf.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/vselvssf.c | 5 +++--
>  gcc/testsuite/lib/target-supports.exp   | 3 ++-
>  20 files changed, 58 insertions(+), 36 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c 
> b/gcc/testsuite/gcc.target/arm/mod_2.c
> index 1143725d59a..3a203b67d73 100644
> --- a/gcc/testsuite/gcc.target/arm/mod_2.c
> +++ b/gcc/testsuite/gcc.target/arm/mod_2.c
> @@ -1,7 +1,9 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { 
> "-mpure-code" } } */
>  /* { dg-require-effective-target arm32 } */
> -/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
> +/* { dg-require-effective-target arm_cpu_cortex_a57 } */
> +/* { dg-options "-O2 -save-temps" } */
> +/* { dg-add-options arm_cpu_cortex_a57 } */
>  
>  #include "../aarch64/mod_2.x"
>  
> diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c 
> b/gcc/testsuite/gcc.target/arm/mod_256.c
> index d8dca0fe7d5..3521d7a05f3 100644
> --- a/gcc/testsuite/gcc.target/arm/mod_256.c
> +++ b/gcc/testsuite/gcc.target/arm/mod_256.c
> @@ -1,7 +1,9 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { 
> "-mpure-code" } } */
>  /* { dg-require-effective-target arm32 } */
> -/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
> +/* { dg-require-effective-target arm_cpu_cortex_a57 } */
> +/* { dg-options "-O2 -save-temps" } */
> +/* { dg-add-options arm_cpu_cortex_a57 } */
>  
>  #include "../aarch64/mod_256.x"
>  
> diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c 
> b/gcc/testsuite/gcc.target/arm/pr65647.c
> index 26b4e399f6b..dc3a3ca1184 100644
> --- a/gcc/testsuite/gcc.target/arm/pr65647.c
> +++ b/gcc/testsuite/gcc.target/arm/pr65647.c
> @@ -1,7 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target arm_arch_v6m_ok } */
>  /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } 
> {"-mfloat-abi=soft" } } */
> -/* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=soft" } */
> +/* { dg-options "-mthumb -O3 -w -mfloat-abi=soft" } */
> +/* { dg-add-options arm_arch_v6m } */
>  
>  a, b, c, e, g = &e, h, i = 7, l = 1, m, n, o, q = &m,

Re: [PATCH] contrib, libcpp, libstdc++: Update to Unicode 16.0

2024-10-07 Thread Jonathan Wakely
On Mon, 7 Oct 2024 at 11:47, Jakub Jelinek wrote:
>
> Hi!
>
> It is autumn again and there is a new Unicode version 16.0.
>
> The following patch updates our Unicode stuff in contrib, libcpp and
> libstdc++ from that Unicode version.
>
> As the patch is really large, I've split the unicode/UnicodeData.txt
> and unicode/DerivedCoreProperties.txt updates patch into xz -9e
> compressed one attached after this patch and the largest
> uname2c.h changes will be posted in a follow-up mail.
>
> Ok for trunk?

The libstdc++ parts are OK, thanks for updating them.



Re: [PATCH] middle-end: reorder masking priority of math functions

2024-10-07 Thread Victor Do Nascimento

On 10/7/24 10:52, Richard Biener wrote:

On Wed, Oct 2, 2024 at 6:26 PM Victor Do Nascimento
 wrote:


Given the categorization of math built-in functions as `ECF_CONST',
when if-converting their uses, their calls are not masked and are thus
called with an all-true predicate.

This, however, is not appropriate where built-ins have library
equivalents, wherein they may exhibit highly architecture-specific
behaviors. For example, vectorized implementations may delegate the
computation of values outside a certain acceptable numerical range to
special (non-vectorized) routines which considerably slow down
computation.

As numerical simulation programs often do bounds check on input values
prior to math calls, conditionally assigning default output values for
out-of-bounds input and skipping the math call altogether, these
fallback implementations should seldom be called in the execution of
vectorized code.  If, however, we don't apply any masking to these
math functions, we end up effectively executing both if and else
branches for these values, leading to considerable performance
degradation on scientific workloads.

We therefore invert the order of handling of math function calls in
`if_convertible_stmt_p' to prioritize the handling of their
library-provided implementations over the equivalent internal function.

Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
new regressions.
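
To make the cost argument concrete, here is a scalar C sketch of the two
shapes the loop can take after if-conversion. It is only a model under
stated assumptions: slow_math is a hypothetical stand-in for a library
routine with an expensive fallback path (it is not expf), and the call
counter exists purely to show how often that routine runs.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the (hypothetical) expensive routine is entered.  */
unsigned long slow_calls;

/* Stand-in for a math routine whose out-of-range path is slow.  */
float slow_math(float x)
{
  slow_calls++;
  return x * 0.5f;
}

/* Unmasked if-conversion: the call runs for every element and the
   result is selected afterwards.  */
void loop_unmasked(const float *a, float *b, size_t n, float lim, float cst)
{
  assert(a && b);
  for (size_t i = 0; i < n; i++)
    {
      float t = slow_math(a[i]);
      b[i] = a[i] > lim ? cst : t;
    }
}

/* Masked form: the call runs only where the bounds check passes.  */
void loop_masked(const float *a, float *b, size_t n, float lim, float cst)
{
  assert(a && b);
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] > lim)
        b[i] = cst;
      else
        b[i] = slow_math(a[i]);
    }
}
```

Both loops produce the same output array; the difference is that the
unmasked form also feeds the out-of-bounds inputs to the routine, which
is where the fallback cost described above would be paid.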


I think the patch is good - note I think there's even a bugzilla about this
behavior.  So as incremental improvement the patch is OK.


Thanks, this will be particularly useful in allowing us to vectorize
more math code without the worry of the trapping performance penalty,
unblocking some patches that were effectively put on hold pending
some solution to the issue.


I think we should further improve this, possibly with profile data, and
the situation could be improved by handling the calls like mask stores
where we put an if (mask != 0) before the store.
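
A rough C model of that guard, assuming a fixed chunk of LANES elements
stands in for one vector and masked_vec_fn is a hypothetical masked
vector routine (neither name comes from GCC):

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 };

/* Counts how many times the (hypothetical) vector routine is entered.  */
unsigned long vector_calls;

/* Stand-in for a masked vector math call: only active lanes are written.  */
void masked_vec_fn(const float *in, float *out, const unsigned char *mask)
{
  vector_calls++;
  for (int l = 0; l < LANES; l++)
    if (mask[l])
      out[l] = in[l] * 0.5f;
}

/* Process the array one chunk at a time and skip the call entirely when
   no lane in the chunk is active, i.e. the "if (mask != 0)" guard.  */
void guarded_loop(const float *a, float *b, size_t n, float lim, float cst)
{
  assert(n % LANES == 0);
  for (size_t i = 0; i < n; i += LANES)
    {
      unsigned char mask[LANES];
      int any = 0;
      for (int l = 0; l < LANES; l++)
        {
          mask[l] = !(a[i + l] > lim);
          any |= mask[l];
          if (!mask[l])
            b[i + l] = cst;
        }
      if (any)
        masked_vec_fn(a + i, b + i, mask);
    }
}
```

When whole chunks fail the bounds check, as tends to happen with
clustered out-of-range input, the routine is never entered for them.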


Hmmm... Interesting that this behavior has already made it into
Bugzilla.  I'm guessing you're referring to `Bug 65425 - code
optimization leads to spurious FP exception'.

I will look into how we may refine the handling of the issue with the
aim of submitting a follow-up patch, as per your suggestion.

Many thanks,
Victor.



Richard.


gcc/ChangeLog:

 * tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
 function declaration before IFN fallback.

gcc/testsuite/ChangeLog:

 * gcc.dg/vect/vect-fncall-mask-math.c: New.
---
  .../gcc.dg/vect/vect-fncall-mask-math.c   | 33 +++
  gcc/tree-if-conv.cc   | 18 +-
  2 files changed, 42 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c 
b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
new file mode 100644
index 000..15e22da2807
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
@@ -0,0 +1,33 @@
+/* Test the correct application of masking to autovectorized math function 
calls.
+   Test is currently set to xfail pending the release of the relevant lmvec
+   support. */
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" 
{ target { aarch64*-*-* } } } */
+
+#include <math.h>
+
+const int N = 20;
+const float lim = 101.0;
+const float cst =  -1.0;
+float tot =   0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+   [10 ... 19] = 100.0 };/* Else branch.  */
+
+int main (void)
+{
+  #pragma omp simd
+  for (int i = 0; i < N; i += 1)
+{
+  if (a[i] > lim)
+   b[i] = cst;
+  else
+   b[i] = expf (a[i]);
+  tot += b[i];
+}
+  return (0);
+}
+
+/* { dg-final { scan-tree-dump-not { gimple_call } ifcvt { xfail 
{ aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} 
ifcvt { xfail { aarch64*-*-* } } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 3b04d1e8d34..90c754a4814 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt, 
vec refs)

  case GIMPLE_CALL:
{
-   /* There are some IFN_s that are used to replace builtins but have the
-  same semantics.  Even if MASK_CALL cannot handle them vectorable_call
-  will insert the proper selection, so do not block conversion.  */
-   int flags = gimple_call_flags (stmt);
-   if ((flags & ECF_CONST)
-   && !(flags & ECF_LOOPING_CONST_OR_PURE)
-   && gimple_call_combined_fn (stmt) != CFN_LAST)
- return true;
-
 tree fndecl = gimple_call_fndecl (stmt);
 if (fndecl)
   {
@@ -1160,6 +1151,15 @@ if_convertible_stmt_p (gimple *stmt, 
vec refs)
   }
   }

+   /* There are some IFN_s t

[PATCH] tree-optimization/116982 - analyze scalar loop exit early

2024-10-07 Thread Richard Biener
The following makes sure to discover the scalar loop IV exit during
analysis as failure to do so (if DCE and friends are disabled this
can happen due to if-conversion doing DCE and FRE on the if-converted
loop) would ICE later.

I refrained from larger refactoring to be able to eventually backport.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116982
* tree-vectorizer.h (vect_analyze_loop): Pass in .LOOP_VECTORIZED
call.
(vect_analyze_loop_form): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_form): Reject loops where we
cannot determine a IV exit for the scalar loop.
(vect_analyze_loop): Adjust.
* tree-vectorizer.cc (try_vectorize_loop_1): Likewise.
* tree-parloops.cc (gather_scalar_reductions): Likewise.
---
 gcc/tree-parloops.cc   |  4 ++--
 gcc/tree-vect-loop.cc  | 23 +++
 gcc/tree-vectorizer.cc |  3 ++-
 gcc/tree-vectorizer.h  |  6 --
 4 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index f4468658732..6a1249bebb6 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -3305,7 +3305,7 @@ gather_scalar_reductions (loop_p loop, 
reduction_info_table_type *reduction_list
 
   vec_info_shared shared;
   vect_loop_form_info info;
-  if (!vect_analyze_loop_form (loop, &info))
+  if (!vect_analyze_loop_form (loop, NULL, &info))
 goto gather_done;
 
   simple_loop_info = vect_create_loop_vinfo (loop, &shared, &info);
@@ -3347,7 +3347,7 @@ gather_scalar_reductions (loop_p loop, 
reduction_info_table_type *reduction_list
 {
   vec_info_shared shared;
   vect_loop_form_info info;
-  if (vect_analyze_loop_form (loop->inner, &info))
+  if (vect_analyze_loop_form (loop->inner, NULL, &info))
{
  simple_loop_info
= vect_create_loop_vinfo (loop->inner, &shared, &info);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f5ecf0bdb80..2335642d67c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1737,7 +1737,8 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info 
loop_vinfo)
  niter could be analyzed under some assumptions.  */
 
 opt_result
-vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
+vect_analyze_loop_form (class loop *loop, gimple *loop_vectorized_call,
+   vect_loop_form_info *info)
 {
   DUMP_VECT_SCOPE ("vect_analyze_loop_form");
 
@@ -1747,6 +1748,18 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
   "not vectorized:"
   " could not determine main exit from"
   " loop with multiple exits.\n");
+  if (loop_vectorized_call)
+{
+  tree arg = gimple_call_arg (loop_vectorized_call, 1);
+  class loop *scalar_loop = get_loop (cfun, tree_to_shwi (arg));
+  edge scalar_exit_e = vec_init_loop_exit_info (scalar_loop);
+  if (!scalar_exit_e)
+   return opt_result::failure_at (vect_location,
+  "not vectorized:"
+  " could not determine main exit from"
+  " loop with multiple exits.\n");
+}
+
   info->loop_exit = exit_e;
   if (dump_enabled_p ())
   dump_printf_loc (MSG_NOTE, vect_location,
@@ -1819,7 +1832,7 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
 
   /* Analyze the inner-loop.  */
   vect_loop_form_info inner;
-  opt_result res = vect_analyze_loop_form (loop->inner, &inner);
+  opt_result res = vect_analyze_loop_form (loop->inner, NULL, &inner);
   if (!res)
{
  if (dump_enabled_p ())
@@ -3520,7 +3533,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
for it.  The different analyses will record information in the
loop_vec_info struct.  */
 opt_loop_vec_info
-vect_analyze_loop (class loop *loop, vec_info_shared *shared)
+vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
+  vec_info_shared *shared)
 {
   DUMP_VECT_SCOPE ("analyze_loop_nest");
 
@@ -3538,7 +3552,8 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
*shared)
 
   /* Analyze the loop form.  */
   vect_loop_form_info loop_form_info;
-  opt_result res = vect_analyze_loop_form (loop, &loop_form_info);
+  opt_result res = vect_analyze_loop_form (loop, loop_vectorized_call,
+  &loop_form_info);
   if (!res)
 {
   if (dump_enabled_p ())
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index d4ab47349a3..fed12c41f9c 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1067,7 +1067,8 @@ try_vectorize_loop_1 (hash_table 
*&simduid_to_vf_htab,
 LOCATION_LINE (vect_location.get_location_t ()));
 
   /* Try to analyze the loop, retaining an opt_problem if dump

[PATCH] tree-optimization/116990 - missed control flow check in vect_analyze_loop_form

2024-10-07 Thread Richard Biener
The following fixes checking for unsupported control flow in
vectorization to also cover the outer loop body.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
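
As an illustration of what was missed, here is a toy C model of the
check. The struct and function names are invented and this is not the
vectorizer's data structure; it only mirrors the rule visible in the
hunk below, where a block is acceptable with one successor, or with two
when one of them exits the block's own loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy basic block, reduced to what the check looks at.  */
struct toy_block {
  int nsuccs;            /* number of successor edges */
  bool exits_own_loop;   /* one successor leaves the block's own loop */
  bool in_inner_loop;    /* block belongs to the inner loop of the nest */
};

/* One successor is fine; two are fine only for a loop exit.  */
static bool block_ok(const struct toy_block *b)
{
  return b->nsuccs == 1 || (b->nsuccs == 2 && b->exits_own_loop);
}

/* Check a loop nest.  With inner_only set, blocks outside the inner loop
   are skipped, which models the behaviour before the fix.  */
bool body_ok(const struct toy_block *bbs, size_t n, bool inner_only)
{
  assert(bbs != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (inner_only && !bbs[i].in_inner_loop)
        continue;
      if (!block_ok(&bbs[i]))
        return false;
    }
  return true;
}
```

A nest with an if/else in the outer body passes the inner-only walk but
is rejected once every block of the loop is visited.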

PR tree-optimization/116990
* tree-vect-loop.cc (vect_analyze_loop_form): Check the current
loop body for control flow.
---
 gcc/tree-vect-loop.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2335642d67c..dc6c7c2faa0 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1767,9 +1767,8 @@ vect_analyze_loop_form (class loop *loop, gimple 
*loop_vectorized_call,
   exit_e->src->index, exit_e->dest->index, exit_e->aux);
 
   /* Check if we have any control flow that doesn't leave the loop.  */
-  class loop *v_loop = loop->inner ? loop->inner : loop;
-  basic_block *bbs = get_loop_body (v_loop);
-  for (unsigned i = 0; i < v_loop->num_nodes; i++)
+  basic_block *bbs = get_loop_body (loop);
+  for (unsigned i = 0; i < loop->num_nodes; i++)
 if (EDGE_COUNT (bbs[i]->succs) != 1
&& (EDGE_COUNT (bbs[i]->succs) != 2
|| !loop_exits_from_bb_p (bbs[i]->loop_father, bbs[i])))
-- 
2.43.0


[pushed] c++: modules don't require preprocessor output

2024-10-07 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

init_modules has rejected -M -fmodules-ts on the premise that module
dependency analysis requires macro expansion, but this is no longer
accurate; P1857 prohibited module directives produced by macro expansion.
They can still be dependent on #if directives, but those are still handled
with -fdirectives-only.

What wasn't working was -M or -dM, because cpp_scan_nooutput never called
module_token_pre to implement the import.  The simplest fix is to use the
-fdirectives-only scan when modules are enabled and teach directives_only_cb
about flag_no_output.

gcc/cp/ChangeLog:

* module.cc (init_modules): Don't warn about -M.

gcc/c-family/ChangeLog:

* c-ppoutput.cc (preprocess_file): For modules,
use directives-only scan even with flag_no_output.
(directives_only_cb): Respect flag_no_output.

gcc/ChangeLog:

* doc/invoke.texi (C++ Module Preprocessing): Allow -M,
refer to -fdeps.

gcc/testsuite/ChangeLog:

* g++.dg/modules/macro-8_a.H: New test.
* g++.dg/modules/macro-8_b.C: New test.
* g++.dg/modules/macro-8_c.C: New test.
* g++.dg/modules/macro-8_d.C: New test.
---
 gcc/doc/invoke.texi  | 13 
 gcc/c-family/c-ppoutput.cc   | 39 +++-
 gcc/cp/module.cc | 17 ---
 gcc/testsuite/g++.dg/modules/macro-8_b.C | 13 
 gcc/testsuite/g++.dg/modules/macro-8_c.C | 13 
 gcc/testsuite/g++.dg/modules/macro-8_d.C | 13 
 gcc/testsuite/g++.dg/modules/macro-8_a.H |  4 +++
 7 files changed, 73 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/macro-8_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/macro-8_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/macro-8_d.C
 create mode 100644 gcc/testsuite/g++.dg/modules/macro-8_a.H

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d38c1feb86f..987b6360152 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -38206,13 +38206,12 @@ Whether a particular directive is translated is 
controlled by the
 module mapper.  Header unit names are canonicalized during
 preprocessing.
 
-Dependency information can be emitted for macro import, extending the
-functionality of @option{-MD} and @option{-MMD} options.  Detection of
-import declarations also requires phase 4 preprocessing, and thus
-requires full preprocessing (or compilation).
-
-The @option{-M}, @option{-MM} and @option{-E -fdirectives-only} options halt
-preprocessing before phase 4.
+Dependency information can be emitted for module import, extending the
+functionality of the various @option{-M} options.  Detection of import
+declarations requires phase 4 handling of preprocessor directives, but
+does not require macro expansion, so it is not necessary to use
+@option{-MD}.  See also @option{-fdeps-*} for an alternate format for
+module dependency information.
 
 The @option{-save-temps} option uses @option{-fdirectives-only} for
 preprocessing, and preserve the macro definitions in the preprocessed
diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index e3f5ca3ec97..374252bb4f3 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -92,10 +92,16 @@ preprocess_file (cpp_reader *pfile)
  cpp_scan_nooutput or cpp_get_token next.  */
   if (flag_no_output && pfile->buffer)
 {
-  /* Scan -included buffers, then the main file.  */
-  while (pfile->buffer->prev)
-   cpp_scan_nooutput (pfile);
-  cpp_scan_nooutput (pfile);
+  if (flag_modules)
+   /* For macros from imported headers we need directives_only_cb.  */
+   scan_translation_unit_directives_only (pfile);
+  else
+   {
+ /* Scan -included buffers, then the main file.  */
+ while (pfile->buffer->prev)
+   cpp_scan_nooutput (pfile);
+ cpp_scan_nooutput (pfile);
+   }
 }
   else if (cpp_get_options (pfile)->traditional)
 scan_translation_unit_trad (pfile);
@@ -389,28 +395,31 @@ directives_only_cb (cpp_reader *pfile, CPP_DO_task task, 
void *data_, ...)
   gcc_unreachable ();
 
 case CPP_DO_print:
-  {
-   print.src_line += va_arg (args, unsigned);
+  if (!flag_no_output)
+   {
+ print.src_line += va_arg (args, unsigned);
 
-   const void *buf = va_arg (args, const void *);
-   size_t size = va_arg (args, size_t);
-   fwrite (buf, 1, size, print.outf);
-  }
+ const void *buf = va_arg (args, const void *);
+ size_t size = va_arg (args, size_t);
+ fwrite (buf, 1, size, print.outf);
+   }
   break;
 
 case CPP_DO_location:
-  maybe_print_line (va_arg (args, location_t));
+  if (!flag_no_output)
+   maybe_print_line (va_arg (args, location_t));
   break;
 
 case CPP_DO_token:
   {
const cpp_token *token = va_arg (args, const cpp_token *);
-   lo

[pushed] c++: -Wmismatched-tags and modules

2024-10-07 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In Wmismatched-tags-6.C, we try to compare two declarations of the Cp alias
template, and ICE trying to check whether they're in module purview.  We
need to check DECL_LANG_SPECIFIC like elsewhere in the compiler.

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Only check PURVIEW_P if
DECL_LANG_SPECIFIC.
---
 gcc/cp/decl.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 07fb9855cd2..0c5b5c06a12 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -2530,7 +2530,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, 
bool was_hidden)
 
   /* Propagate purviewness and importingness as with
 set_instantiating_module.  */
-  if (modules_p ())
+  if (modules_p () && DECL_LANG_SPECIFIC (new_result))
{
  if (DECL_MODULE_PURVIEW_P (new_result))
DECL_MODULE_PURVIEW_P (old_result) = true;

base-commit: 53f20f992a7b0f18fec83ea696c466aa53a1293c
-- 
2.46.2



Re: [PATCH v2] c: ICE in build_counted_by_ref [PR116735]

2024-10-07 Thread Marek Polacek
On Wed, Oct 02, 2024 at 10:20:11PM +, Qing Zhao wrote:
> From: qing zhao 
> 
> Hi, this is the 2nd version of the patch. 
> compared to the 1st version, the major changes are to address Marek and
> Jakub's comments.
> 
> bootstrapped and regression tested on both x86 and aarch64.
> Okay for committing?

Ok, thanks.  (Sorry, was on vacation last week.)
 
Marek



[Patch] Update COPYING and COPYING.LIB for FSF' postal-address change

2024-10-07 Thread Tobias Burnus
As some have detected, the older FSF licenses changed as FSF moved out 
of their Franklin Street office.


That's sensible as writing to the old address is not the best way to get 
a printed version of the license and as the internet is now really 
ubiquitous.


→ https://www.fsf.org/blogs/community/fsf-office-closing-party

→ https://gcc.gnu.org/pipermail/gcc/2024-September/244885.html

OK to commit?

Can someone take care of the sync of libiberty/ directory, once committed?

Tobias

PS: As mentioned in the gcc@ email (see link above), we should also 
remove some Franklin Street references at least from the .texi files and 
possibly the code files. But that I defer to a follow-up patch. (By me 
or by others.)
Update COPYING and COPYING.LIB for FSF' postal-address change

The FSF moved out of their Franklin Street office in August 2024, cf.
https://www.fsf.org/blogs/community/fsf-office-closing-party

Therefore, the FSF has updated their old licenses to remove the
reference to that street address.  This commit updates the GPL 2 and
LGPL 2.1 license texts to the ones at
https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt and
https://www.gnu.org/licenses/old-licenses/lgpl-2.1.txt

The main change is the replacement of the street address by the URL
plus some reformatting and the change of the vice president name from
Ty Coon to Moe Ghoul.

ChangeLog:

	* COPYING: Sync with new FSF version that replaced the
	Franklin Street address by https://fsf.org/.
	* COPYING.LIB: Likewise.

gcc/ChangeLog:

	* COPYING: Sync with new FSF version that replaced the
	Franklin Street address by https://fsf.org/.
	* COPYING.LIB: Likewise.

include/ChangeLog:

	* COPYING: Sync with new FSF version that replaced the
	Franklin Street address by https://fsf.org/.

libiberty/ChangeLog:

	* COPYING.LIB: Sync with new FSF version that replaced the
	Franklin Street address by https://fsf.org/.

libquadmath/ChangeLog:

	* COPYING.LIB: Sync with new FSF version that replaced the
	Franklin Street address by https://fsf.org/.

diff --git a/COPYING b/COPYING
index 623b6258a13..9efa6fbc962 100644
--- a/COPYING
+++ b/COPYING
@@ -1,2 +1,2 @@
-		GNU GENERAL PUBLIC LICENSE
-		   Version 2, June 1991
+GNU GENERAL PUBLIC LICENSE
+   Version 2, June 1991
@@ -4,2 +4,2 @@
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.
- 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ <https://fsf.org/>
@@ -9 +9 @@
-			Preamble
+Preamble
@@ -18 +18 @@ using it.  (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.)  You can apply it to
+the GNU Lesser General Public License instead.)  You can apply it to
@@ -58,2 +58,2 @@ modification follow.
-
-		GNU GENERAL PUBLIC LICENSE
+
+GNU GENERAL PUBLIC LICENSE
@@ -113 +113 @@ above, provided that you also meet all of these conditions:
-
+
@@ -171 +171 @@ compelled to copy the source along with the object code.
-
+
@@ -228 +228 @@ be a consequence of the rest of this License.
-
+
@@ -258 +258 @@ of promoting the sharing and reuse of software generally.
-			NO WARRANTY
+NO WARRANTY
@@ -280,3 +280,3 @@ POSSIBILITY OF SUCH DAMAGES.
-		 END OF TERMS AND CONDITIONS
-
-	How to Apply These Terms to Your New Programs
+ END OF TERMS AND CONDITIONS
+
+How to Apply These Terms to Your New Programs
@@ -306,4 +306,2 @@ the "copyright" line and a pointer to where the full notice is found.
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
-
+You should have received a copy of the GNU General Public License along
+with this program; if not, see .
@@ -316 +314 @@ when it starts in an interactive mode:
-Gnomovision version 69, Copyright (C) year  name of author
+Gnomovision version 69, Copyright (C) year name of author
@@ -333,2 +331,2 @@ necessary.  Here is a sample; alter the names:
-  , 1 April 1989
-  Ty Coon, President of Vice
+  , 1 April 1989
+  Moe Ghoul, President of Vice
@@ -339 +337 @@ consider it more useful to permit linking proprietary applications with the
-library.  If this is what you want to do, use the GNU Library General
+library.  If this is what you want to do, use the GNU Lesser General
diff --git a/COPYING.LIB b/COPYING.LIB
index 2d2d780e601..f6683e74e0f 100644
--- a/COPYING.LIB
+++ b/COPYING.LIB
@@ -1 +0,0 @@
-
@@ -6 +5 @@
-	51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ 
@@ -26,2 +25 @@ this license or the ordinary General Public License is the better
-strategy to use in any particular case, based on the explanations
-below.
+strategy to use in any particular 

Re: Why -Wsuggest-attribute=noreturn does not apply to static?

2024-10-07 Thread Jakub Jelinek
On Mon, Oct 07, 2024 at 03:05:56PM +0300, Дилян Палаузов wrote:
> does [[noreturn]] optimize the generated [[noreturn]] function itself, or
> it optimizes the calls to the [[noreturn]] function?  Hence, in the latter
> case optimizations are done based on function declaration, irrespective of
> function body.

Of course the latter, that is the whole point of the attribute.
In the definition of [[noreturn]] function itself, all it can do is
warn if the function does return anyway.

Jakub
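
A minimal sketch of the point above, assuming a C++11 or later compiler.
The names fail and checked_div are hypothetical and only serve as an
illustration: the attribute lets the compiler reason about the caller
from the declaration alone, while inside the definition it can at most
warn if control might return.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: never returns normally, it always throws.
// Inside this body the attribute enables no optimization; the compiler
// can only warn if control could reach the end of the function.
[[noreturn]] void fail(const char *msg) {
  throw std::runtime_error(msg);
}

// Caller: because fail() is declared [[noreturn]], the compiler may
// assume control never continues past the call. That knowledge comes
// from the declaration, not from the body of fail().
int checked_div(int a, int b) {
  if (b == 0)
    fail("division by zero");
  return a / b;
}
```

Whether a given compiler actually uses this to simplify checked_div
depends on the optimization level; the sketch only shows where the
attribute is allowed to matter.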



arm: fix bootstrap issue with arm_noce_conversion_profitable_p patch [NFC]

2024-10-07 Thread Andre Vieira (lists)

Committed attached patch as obvious.

This obvious patch fixes two warnings introduced with the implementation 
of arm_noce_conversion_profitable_p hook.


gcc/ChangeLog:

* config/arm/arm.cc (arm_noce_conversion_profitable_p): Remove unused
argument name.
(arm_is_v81m_cond_insn): Initialize variable.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 077c80df4482d168d9694795be68c2eeb8f304d9..5c11621327e15b7212b2290769cc0a922347ce2d 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -36101,7 +36101,7 @@ static bool
 arm_is_v81m_cond_insn (rtx_insn *seq)
 {
   rtx_insn *curr_insn = seq;
-  rtx set;
+  rtx set = NULL_RTX;
   /* The pattern may start with a simple set with register operands.  Skip
  through any of those.  */
   while (curr_insn)
@@ -36164,7 +36164,7 @@ arm_is_v81m_cond_insn (rtx_insn *seq)
hook to only allow "noce" to generate the patterns that are profitable.  */
 
 bool
-arm_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+arm_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *)
 {
   if (!TARGET_COND_ARITH
   || reload_completed)
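
For readers unfamiliar with the two warning classes involved, here is a
small stand-alone sketch of the same two fixes. The functions
demo_profitable_p and demo_last_positive are hypothetical and unrelated
to the arm backend; they only mirror the shape of the change.

```cpp
#include <cassert>

// Hypothetical stand-in for a hook whose second argument is currently
// unused.  Leaving the parameter unnamed keeps the signature intact and
// avoids an unused-parameter warning.
bool demo_profitable_p(int cost, const void *) {
  return cost < 4;
}

// Initializing `last` up front avoids a maybe-uninitialized warning in
// cases where the compiler cannot prove that the loop body assigns it
// before it is read.
int demo_last_positive(const int *vals, int n) {
  int last = 0;
  for (int i = 0; i < n; ++i)
    if (vals[i] > 0)
      last = vals[i];
  return last;
}
```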


Re: [PATCH] expr, v2: Don't clear whole unions [PR116416]

2024-10-07 Thread Jason Merrill

On 10/4/24 1:02 PM, Jakub Jelinek wrote:

On Thu, Oct 03, 2024 at 12:14:35PM -0400, Jason Merrill wrote:

Agreed, the padding bits have indeterminate values (or erroneous in C++26),
so it's correct for infoleak-1.c to complain about 4b.


I've been afraid what the kernel people would say about this change (because
reading Linus' mails shows he doesn't care about what the standards say,
but what he expects to see, anything else is "broken").


Indeed.


Though, looking at godbolt, clang and icc 19 and older gcc all do zero
initialize the whole union before storing the single member in there (if
non-zero, otherwise just clear).

So whether we want to do this or do it by default is another question.


We will want to initialize the padding (for all types) to something for
C++26, but that's a separate issue...


But ideally in a way where uninit warnings know the bits aren't initialized
even if they are.


Yes.


Anyway, bootstrapped/regtested on x86_64-linux and i686-linux successfully.

2024-09-28  Jakub Jelinek  

PR c++/116416
* expr.cc (categorize_ctor_elements_1): Fix up union handling of
*p_complete.  Clear it only if num_fields is 0 and the union has
at least one FIELD_DECL, set to -1 if either union has no fields
and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
returned false.


Hmm, complete_ctor_at_level_p also seems to need a change for this
understanding of union semantics: "every meaningful byte" depends on the
active member, so it seems like it should return true for a union iff
num_elts == 1.


I thought complete_ctor_at_level_p has a single caller, but apparently
that isn't the case, cp/typeck2.cc uses it too.

Here is an updated version of the patch, which
a) moves some of the stuff into complete_ctor_at_level_p (but not
all the *p_complete = 0; case, for that it would need to change
so that it passes around the ctor rather than just its type)
b) introduces a new option, so that users can either get the new
behavior (only what is guaranteed by the standards, the default),
or previous behavior (union padding zero initialization, no such
guarantees in structures) or also a guarantee in structures
c) introduces a new CONSTRUCTOR flag which says that the padding bits
(if any) should be zero initialized (and sets it for now in the C++
FE for C23 {} initializers).

Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
for C23, if there is just empty initializer, I think we already mark
it as incomplete if there are any missing initializers.  Maybe with
some designated initializer games, say
void foo () {
   struct S { char a; long long b; };
   struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
...
}
Is this supposed to initialize padding bits in C23 and then the .c.a = 1
and .c.b = 2 stores preserve those padding bits, so is that supposed
to be different from struct T t2 = { .c = { 1, 2 } };
?  What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ?

And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
on where exactly is zero-initialization done (vs. other types of
initialization) and where is e.g. zero-initialization of a temporary then
(member-wise) copied.
Say
struct S { char a; long long b; };
struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
void bar (T *);

void
foo ()
{
   T t (1, 2);
   bar (&t);
}
Is the c () value-initialization of t.c followed by c.a and c.b updates
which preserve the zero initialized padding bits?


Yes.


Or is there some
copy construction involved which does member-wise copying and makes the
padding bits undefined?


No, c() directly value-initializes the c member.


Looking at (older) clang++ with -O2, it initializes also the padding bits
when c () is used and doesn't with c {}.


That seems correct; since S is an aggregate c{} is member-wise 
initialization rather than value-initialization.



For GCC, note that there is that optimization from Alex to zero padding bits
for optimization purposes for small aggregates, so either one needs to look
at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
optimized that way.

Only lightly tested so far, this is mostly for further discussions.
And also a question what exactly does cp/typeck2.cc want from
complete_ctor_at_level_p, e.g. if it wants false for all the cases where
categorize_ctor_elements_1 does *p_complete = 0; (in that case it would need
to know whether CONSTRUCTOR_ZERO_PADDING_BITS flag was set).


Hmm, split_nonconstant_init uses that result to decide whether to 
discard the CONSTRUCTOR; I guess if _ZERO_PADDING_BITS is set we would 
want to keep the CONSTRUCTOR for clearing the padding.


Jason
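
To make the padding discussion concrete, here is a small sketch built
around the S and T types from the message. It assumes C++14 or later and
a common ABI where long long is aligned to more than one byte; the
constant padding_after_a is a hypothetical name added for illustration.
It does not read the padding bytes, it only shows that they exist and
that c () value-initializes the member before the fields are assigned.

```cpp
#include <cassert>
#include <cstddef>

struct S { char a; long long b; };

// Mirrors the example in the message: c () value-initializes the member
// directly, then the constructor body assigns the two fields.
struct T {
  constexpr T (int a, int b) : c () { c.a = a; c.b = b; }
  S c;
};

// Number of padding bytes between S::a and S::b.  This is non-zero on
// common ABIs, which is an assumption about the target and not a
// guarantee of the language.
constexpr std::size_t padding_after_a = offsetof (S, b) - sizeof (char);
```

Checking what the padding bytes actually contain is deliberately left
out, since that is exactly the behavior under discussion.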



Re: [r15-4104 Regression] FAIL: gfortran.dg/gomp/allocate-static.f90 -Os (test for excess errors) on Linux/x86_64

2024-10-07 Thread Tobias Burnus

haochen.jiang wrote:

On Linux/x86_64,
FAIL: gfortran.dg/gomp/allocate-static.f90   -O0  (test for excess errors)


If anyone can reproduce this, I would be interested in the excess errors.

On two machines – with and without offloading configured – I cannot
reproduce this, either with a bootstrap or a non-bootstrap build,
with the testsuite or under valgrind, or with -m32 or -m64.


Tobias



Re: [PATCH 2/3] Release expanded template argument vector

2024-10-07 Thread Patrick Palka
On Sat, 5 Oct 2024, Jason Merrill wrote:

> On 10/4/24 11:00 AM, Patrick Palka wrote:
> > On Thu, 3 Oct 2024, Jason Merrill wrote:
> > 
> > > On 10/3/24 12:38 PM, Jason Merrill wrote:
> > > > On 10/2/24 7:50 AM, Richard Biener wrote:
> > > > > This reduces peak memory usage by 20% for a specific testcase.
> > > > > 
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > 
> > > > > It's very ugly so I'd appreciate suggestions on how to handle such
> > > > > situations better?
> > > > 
> > > > I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.
> > > 
> > > OK, apparently that was both too clever and not clever enough. Replacing
> > > it
> > > with this one that's much closer to yours.
> > > 
> > > Jason
> > 
> > > From: Jason Merrill 
> > > Date: Thu, 3 Oct 2024 16:31:00 -0400
> > > Subject: [PATCH] c++: free garbage vec in coerce_template_parms
> > > To: gcc-patches@gcc.gnu.org
> > > 
> > > coerce_template_parms can create two different vecs for the inner template
> > > arguments, new_inner_args and (potentially) the result of
> > > expand_template_argument_pack.  One or the other, or possibly both, end up
> > > being garbage: in the typical case, the expanded vec is garbage because
> > > it's
> > > only used as the source for convert_template_argument.  In some dependent
> > > cases, the new vec is garbage because we decide to return the original
> > > args
> > > instead.  In these cases, ggc_free the garbage vec to reduce the memory
> > > overhead of overload resolution.
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * pt.cc (coerce_template_parms): Free garbage vecs.
> > > 
> > > Co-authored-by: Richard Biener 
> > > ---
> > >   gcc/cp/pt.cc | 10 +-
> > >   1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > index 20affcd65a2..4ceae1d38de 100644
> > > --- a/gcc/cp/pt.cc
> > > +++ b/gcc/cp/pt.cc
> > > @@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms,
> > >   {
> > > /* We don't know how many args we have yet, just use the
> > >unconverted (and still packed) ones for now.  */
> > > +   ggc_free (new_inner_args);
> > > new_inner_args = orig_inner_args;
> > > arg_idx = nargs;
> > > break;
> > > @@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms,
> > > = make_pack_expansion (conv, complain);
> > >   /* We don't know how many args we have yet, just
> > > - use the unconverted ones for now.  */
> > > +  use the unconverted (but unpacked) ones for now.  */
> > > +   ggc_free (new_inner_args);
> > 
> > I'm a bit worried about these ggc_frees.  If an earlier template
> > parameter is a constrained auto NTTP then new_inner_args/new_args could
> > have been captured by the satisfaction cache during coercion for that
> > argument, and so we'd be freeing a vector that's still live?
> 
> It seems like for e.g.
> 
> template  concept NotInt = !__is_same (T, int);
> template  struct A { };
> template  using B = A<'x', Ts...>;
> 
> we don't check satisfaction until after we're done coercing, because of
> 
>   if (processing_template_decl && context == adc_unify)
> /* Constraints will be checked after deduction.  */;
> 
> in do_auto_deduction.

Ah, I wonder why we pass/use adc_unify from both unify and
convert_template_argument..  That early exit makes sense for unify
but not for convert_template_argument since it prevents us from
checking constrained auto NTTPs during ahead of time coercion:

  template
  concept C = T::value;

  template
  struct A { };

  template
  void f() {
A<0> a; // no constraint error
  }

  A<0> a; // constraint error

I guess we'd ideally want to fix/implement this?  At which point the
ggc_free's of new_inner_args would be unsafe I think..

> 
> Jason
> 
> 



[PATCH v6] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-10-07 Thread Simon Martin
Hi Jason,

On 17 Sep 2024, at 18:41, Jason Merrill wrote:

> On 9/17/24 10:38 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> Apologies for the back and forth and thanks for your patience!
>
> No worries.
>
>> On 5 Sep 2024, at 19:00, Jason Merrill wrote:
>>
>>> On 9/5/24 7:02 AM, Simon Martin wrote:
>>>> Hi Jason,
>>>>
>>>> On 4 Sep 2024, at 18:09, Jason Merrill wrote:
>>>>
>>>>> On 9/1/24 2:51 PM, Simon Martin wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 26 Aug 2024, at 19:23, Jason Merrill wrote:
>>>>>>
>>>>>>> On 8/25/24 12:37 PM, Simon Martin wrote:
>>>>>>>> On 24 Aug 2024, at 23:59, Simon Martin wrote:
>>>>>>>>> On 24 Aug 2024, at 15:13, Jason Merrill wrote:
>>>>>>>>>
>>>>>>>>>> On 8/23/24 12:44 PM, Simon Martin wrote:
>>>>>>>>>>> We currently emit an incorrect -Woverloaded-virtual warning
>>>>>>>>>>> upon the following test case
>>>>>>>>>>>
>>>>>>>>>>> === cut here ===
>>>>>>>>>>> struct A {
>>>>>>>>>>>    virtual operator int() { return 42; }
>>>>>>>>>>>    virtual operator char() = 0;
>>>>>>>>>>> };
>>>>>>>>>>> struct B : public A {
>>>>>>>>>>>    operator char() { return 'A'; }
>>>>>>>>>>> };
>>>>>>>>>>> === cut here ===
>>>>>>>>>>>
>>>>>>>>>>> The problem is that warn_hidden relies on get_basefndecls to
>>>>>>>>>>> find the methods in A possibly hidden B's operator char(),
>>>>>>>>>>> and gets both the conversion operator to int and to char. It
>>>>>>>>>>> eventually wrongly concludes that the conversion to int is
>>>>>>>>>>> hidden.
>>>>>>>>>>>
>>>>>>>>>>> This patch fixes this by filtering out conversion operators
>>>>>>>>>>> to different types from the list returned by get_basefndecls.
>>>>>>>>>>
>>>>>>>>>> Hmm, same_signature_p already tries to handle comparing
>>>>>>>>>> conversion operators, why isn't that working?
>>>>>>>>>>
>>>>>>>>> It does indeed.
>>>>>>>>>
>>>>>>>>> However, `ovl_range (fns)` does not only contain `char
>>>>>>>>> B::operator()` - for which `any_override` gets true - but also
>>>>>>>>> `conv_op_marker` - for which `any_override` gets false, causing
>>>>>>>>> `seen_non_override` to get to true. Because of that, we run the
>>>>>>>>> last loop, that will emit a warning for all `base_fndecls`
>>>>>>>>> (except `char B::operator()` that has been removed).
>>>>>>>>>
>>>>>>>>> We could test `fndecl` and `base_fndecls[k]` against
>>>>>>>>> `conv_op_marker` in the loop, but we’d still need to inspect
>>>>>>>>> the “converting to” type in the last loop (for when
>>>>>>>>> `warn_overloaded_virtual` is 2). This would make the code much
>>>>>>>>> more complex than the current patch.
>>>>>>>
>>>>>>> Makes sense.
>>>>>>>
>>>>>>>>> It would however probably be better if `get_basefndecls` only
>>>>>>>>> returned the right conversion operator, not all of them. I’ll
>>>>>>>>> draft another version of the patch that does that and submit it
>>>>>>>>> in this thread.
>>>>>>>>>
>>>>>>>> I have explored my suggestion further and it actually ends up
>>>>>>>> more complicated than the initial patch.
>>>>>>>
>>>>>>> Yeah, you'd need to do lookup again for each member of fns.
>>>>>>>
>>>>>>>> Please find attached a new revision to fix the reported issue,
>>>>>>>> as well as new ones I discovered while testing with
>>>>>>>> -Woverloaded-virtual=2.
>>>>>>>>
>>>>>>>> It’s pretty close to the initial patch, but (1) adds a missing
>>>>>>>> “continue;” (2) fixes a location problem when
>>>>>>>> -Woverloaded-virtual==2 (3) adds more test cases. The commit log
>>>>>>>> is also more comprehensive, and should describe well the various
>>>>>>>> problems and why the patch is correct.
>>>>>>>
>>>>>>>> +  if (IDENTIFIER_CONV_OP_P (name)
>>>>>>>> +  && !same_type_p (DECL_CONV_FN_TYPE (fndecl),
>>>>>>>> +   DECL_CONV_FN_TYPE (base_fndecls[k])))
>>>>>>>> +{
>>>>>>>> +  base_fndecls[k] = NULL_TREE;
>>>>>>>> +  continue;
>>>>>>>> +}
>>>>>>>
>>>>>>> So this removes base_fndecls[k] if it doesn't return the same
>>>>>>> type as fndecl.  But what if there's another conversion op in fns
>>>>>>> that does return the same type as base_fndecls[k]?
>>>>>>>
>>>>>>> If I add an operator int() to both base and derived in
>>>>>>> Woverloaded-virt7.C, the warning disappears.
>>>>>>>
>>>>>> That was an issue indeed. I’ve reworked the patch, and came up
>>>>>> with the attached latest version. It explicitly keeps track both
>>>>>> of overloaded and of hidden base methods (and the “hiding
>>>>>> method” fo

Re: [V2][PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-07 Thread Jeff Law



On 9/5/24 12:52 PM, Palmer Dabbelt wrote:

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

PR target/116615
* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
---
There's a bunch more discussion in the bug, but it's starting to smell
like this was just a holdover from MIPS (where maybe it also shouldn't
be set).  I haven't tested this, but I figured I'd send the patch to get
a little more visibility.

I guess we should also kick off something like a SPEC run to make sure
there's no regressions?
So as I noted earlier, this appears to be a nice win on the BPI. 
Testsuite fallout is minimal -- just the one SFB related test tripping 
at -Os that was also hit by Andrew P's work.


After looking at it more closely, the SFB codegen and the codegen after 
Andrew's work should be equivalent assuming two independent ops can 
dispatch together.


The test actually generates sensible code at -Os.  It's the -Os in 
combination with the -fno-ssa-phiopt that causes problems.   I think the 
best thing to do here is just skip at -Os.  That still keeps a degree of 
testing the SFB path.


Tested successfully in my tester.  But will wait for the pre-commit 
tester to render a verdict before moving forward.



Jeff

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 3aecb43f831..53b7b2a40ed 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -939,8 +939,6 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
 #define TARGET_VECTOR_MISALIGN_SUPPORTED \
riscv_vector_unaligned_access_p
 
-#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
-
 /* Control the assembler format that we output.  */
 
 /* Output to assembler file text saying following lines
diff --git a/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c b/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
index 6e9f8cc61de..1ee45b33e15 100644
--- a/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
+++ b/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" } } */
 /* { dg-options "-march=rv32gc -mtune=sifive-7-series -mbranch-cost=1 -fno-ssa-phiopt -fdump-rtl-ce1" { target { rv32 } } } */
 /* { dg-options "-march=rv64gc -mtune=sifive-7-series -mbranch-cost=1 -fno-ssa-phiopt -fdump-rtl-ce1" { target { rv64 } } } */
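
As background on what the macro influences, the sketch below contrasts
the two source-level shapes of a side-effect-free range test. The
function names are hypothetical and this is not code from the RISC-V
backend. With cheap logical ops, a compiler may prefer the second,
branch-free shape even when the source is written as the first.

```cpp
#include <cassert>

// Short-circuit form: the second comparison is only evaluated when the
// first one is true, which naturally maps to a conditional branch.
bool in_range_short_circuit(int x, int lo, int hi) {
  return x >= lo && x <= hi;
}

// Non-short-circuit form: both comparisons are always evaluated and the
// results are combined with a bitwise AND.  This is only equivalent
// because neither operand has side effects or can trap.
bool in_range_non_short_circuit(int x, int lo, int hi) {
  return (x >= lo) & (x <= hi);
}
```

Both functions return the same result for every input; which shape is
cheaper depends on branch cost on the target.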
 


Re: [PATCH] libcpp: Use constexpr for _cpp_trigraph_map initialization for C++14

2024-10-07 Thread Marek Polacek
On Wed, Sep 18, 2024 at 06:00:48PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> The _cpp_trigraph_map initialization used to be done for C99+ using
> designated initializers, but can't be done that way for C++ because
> the designated initializer support in C++ as array designators are just
> an extension there and don't allow skipping anything nor going backwards.
> 
> But, we can get the same effect using C++14 constexpr constructor.
> With the following patch we get rid of the runtime initialization
> and the array can be in .rodata.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-09-18  Jakub Jelinek  
> 
>   * internal.h (_cpp_trigraph_map_s): New type for C++14 or later.
>   (_cpp_trigraph_map_d): New variable for C++14 or later.
>   (_cpp_trigraph_map): Define to _cpp_trigraph_map_d.map for C++14 or
>   later.
>   * init.cc (init_trigraph_map): Define to nothing for C++14 or later.
>   (TRIGRAPH_MAP, END, s): Define differently for C++14 or later.
> 
> --- libcpp/internal.h.jj  2024-09-12 18:16:49.993409101 +0200
> +++ libcpp/internal.h 2024-09-18 09:45:36.832570227 +0200
> @@ -666,6 +666,12 @@ struct cpp_embed_params
> compiler that supports C99.  */
>  #if HAVE_DESIGNATED_INITIALIZERS
>  extern const unsigned char _cpp_trigraph_map[UCHAR_MAX + 1];
> +#elif __cpp_constexpr >= 201304L
> +extern const struct _cpp_trigraph_map_s {
> +  unsigned char map[UCHAR_MAX + 1];
> +  constexpr _cpp_trigraph_map_s ();
> +} _cpp_trigraph_map_d;
> +#define _cpp_trigraph_map _cpp_trigraph_map_d.map
>  #else
>  extern unsigned char _cpp_trigraph_map[UCHAR_MAX + 1];
>  #endif
> --- libcpp/init.cc.jj 2024-09-13 16:09:32.701455021 +0200
> +++ libcpp/init.cc	2024-09-18 09:49:43.671189585 +0200
> @@ -41,8 +41,8 @@ static void read_original_directory (cpp
>  static void post_options (cpp_reader *);
>  
>  /* If we have designated initializers (GCC >2.7) these tables can be
> -   initialized, constant data.  Otherwise, they have to be filled in at
> -   runtime.  */
> +   initialized, constant data.  Similarly for C++14 and later.
> +   Otherwise, they have to be filled in at runtime.  */
>  #if HAVE_DESIGNATED_INITIALIZERS
>  
>  #define init_trigraph_map()  /* Nothing.  */
> @@ -52,6 +52,15 @@ __extension__ const uchar _cpp_trigraph_
>  #define END };
>  #define s(p, v) [p] = v,
>  
> +#elif __cpp_constexpr >= 201304L
> +
> +#define init_trigraph_map()  /* Nothing.  */
> +#define TRIGRAPH_MAP \
> +constexpr _cpp_trigraph_map_s::_cpp_trigraph_map_s () : map {} {
> +#define END } \
> +constexpr _cpp_trigraph_map_s _cpp_trigraph_map_d;
> +#define s(p, v) map[p] = v;
> +

So with this we generate:

constexpr _cpp_trigraph_map_s::_cpp_trigraph_map_s () : map {} {
  map['='] = '#'; map[')'] = ']'; map['!'] = '|';
  map['('] = '['; map['\''] = '^'; map['>'] = '}';
  map['/'] = '\\'; map['<'] = '{'; map['-'] = '~';
} constexpr _cpp_trigraph_map_s _cpp_trigraph_map_d;

That makes sense to me.  The patch is OK.

Marek
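
For reference, the technique can be tried outside libcpp with a
stand-alone sketch like the one below. It assumes C++14 or later; the
names demo_trigraph_map and demo_map are hypothetical stand-ins for
_cpp_trigraph_map_s and _cpp_trigraph_map_d, and the macro layer from
the patch is left out.

```cpp
#include <cassert>
#include <climits>

// Hypothetical, simplified analogue of _cpp_trigraph_map_s.  The
// constexpr constructor zero-initializes the array and then fills in
// the interesting entries; C++14 allows the assignments in the body.
struct demo_trigraph_map {
  unsigned char map[UCHAR_MAX + 1];
  constexpr demo_trigraph_map () : map {} {
    map['='] = '#'; map[')'] = ']'; map['!'] = '|';
    map['('] = '['; map['\''] = '^'; map['>'] = '}';
    map['/'] = '\\'; map['<'] = '{'; map['-'] = '~';
  }
};

// Declared constexpr, so the table is built at compile time and needs
// no runtime initialization.
constexpr demo_trigraph_map demo_map;

static_assert (demo_map.map['='] == '#', "table is built at compile time");
static_assert (demo_map.map['a'] == 0, "unlisted entries stay zero");
```

The static_asserts only pass if the constructor is evaluated at compile
time, which is the property that lets the real array live in .rodata.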



Re: [PATCH] expr: Don't clear whole unions [PR116416]

2024-10-07 Thread Marek Polacek
On Wed, Oct 02, 2024 at 05:52:13PM -0400, Jason Merrill wrote:
> On 10/2/24 3:20 PM, Marek Polacek wrote:
> > On Sat, Sep 28, 2024 at 08:39:12AM +0200, Jakub Jelinek wrote:
> > > On Fri, Sep 27, 2024 at 04:01:33PM +0200, Jakub Jelinek wrote:
> > > > So, I think we should go with (but so far completely untested except
> > > > for pr78687.C which is optimized with Marek's patch and the above 
> > > > testcase
> > > > which doesn't have the clearing anymore) the following patch.
> > > 
> > > That patch had a bug in type_has_padding_at_level_p and so it didn't
> > > bootstrap.
> > > 
> > > Here is a full patch which does.
> > 
> > [...]
> > 
> > And here's my patch, bootstrapped/regtested on x86_64-pc-linux-gnu
> > on top of Jakub's patch, ok for trunk once the prerequisite is in?
> > 
> > -- >8 --
> > This PR reports a missed optimization.  When we have:
> > 
> >Str str{"Test"};
> >callback(str);
> > 
> > as in the test, we're able to evaluate the Str::Str() call at compile
> > time.  But when we have:
> > 
> >callback(Str{"Test"});
> > 
> > we are not.  With this patch (in fact, it's Patrick's patch with a little
> > tweak), we turn
> > 
> >callback (TARGET_EXPR  >  5
> >  __ct_comp
> >  D.2890
> >  (struct Str *) <<< Unknown tree: void_cst >>>
> >  (const char *) "Test" )
> > 
> > into
> > 
> >callback (TARGET_EXPR )
> > 
> > I explored the idea of calling maybe_constant_value for the whole
> > TARGET_EXPR in cp_fold.  That has three problems:
> > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> >result is also a TARGET_EXPR;
> > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> >Things happen;
> > - getting a new slot is also problematic.  I've seen a test where we
> >had "TARGET_EXPR, D.2680", and folding the whole TARGET_EXPR
> >would get us "TARGET_EXPR", but since we don't see the outer
> >D.2680, we can't replace it with D.2681, and things break.
> > 
> > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> > 
> > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant 
> > store: .*.a = {}"
> > is easy.  Previously, we would call C::C, so .gimple has:
> > 
> >D.2590 = {};
> >C::C (&D.2590);
> >D.2597 = D.2590;
> >return D.2597;
> > 
> > Then .einline inlines the C::C call:
> > 
> >D.2590 = {};
> >D.2590.a = {}; // #1
> >D.2590.b = 0;  // #2
> >D.2597 = D.2590;
> >D.2590 ={v} {CLOBBER(eos)};
> >return D.2597;
> > 
> > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> > passes.  But with the patch, .gimple won't have that C::C call, so the
> > IL is of course going to look different.  The .optimized dump looks the
> > same though so there's no problem.
> > 
> > pr78687.C was fixed by Jakub's categorize_ctor_elements_1 patch.
> > 
> > PR c++/116416
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-gimplify.cc (cp_fold_r) : Try to fold
> > TARGET_EXPR_INITIAL and replace it with the folded result if
> > it's TREE_CONSTANT.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > * g++.dg/tree-ssa/pr90883.C: Adjust dg-final.
> > * g++.dg/cpp0x/constexpr-prvalue1.C: New test.
> > * g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > 
> > Co-authored-by: Patrick Palka 
> > ---
> >   gcc/cp/cp-gimplify.cc | 10 +++
> >   gcc/testsuite/g++.dg/analyzer/pr97116.C   |  2 +-
> >   .../g++.dg/cpp0x/constexpr-prvalue1.C | 24 +++
> >   .../g++.dg/cpp1y/constexpr-prvalue1.C | 30 +++
> >   gcc/testsuite/g++.dg/tree-ssa/pr90883.C   |  4 +--
> >   5 files changed, 67 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue1.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > 
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index 003e68f1ea7..c63fdf3edd1 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > +++ b/gcc/cp/cp-gimplify.cc
> > @@ -1473,6 +1473,16 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void 
> > *data_)
> >  that case, strip it in favor of this one.  */
> > if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > {
> > + if ((data->flags & ff_genericize)
> 
> Why only with ff_genericize?

No reason AFAICT.  Dropped.
 
> > + && !flag_no_inline)
> > +   {
> > + tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
> > + if (folded != init && TREE_CONSTANT (folded))
> > +   {
> > + init = folded;
> > + break;
> 
> Are you sure we never need the TARGET_EXPR_CLEANUP walk in this case?

No.
 
> Maybe move the TARGET_EXPR_CLEANUP walk and the *walk_subtrees = 0 before
> this new code?  And the "folding might replace" comment down to the
> tree_code == target_expr block?

Like this?

Bootstrapped/regtested on x86

Re: [PATCH 2/3] Release expanded template argument vector

2024-10-07 Thread Patrick Palka
On Mon, 7 Oct 2024, Jason Merrill wrote:

> On 10/7/24 10:26 AM, Patrick Palka wrote:
> > On Mon, 7 Oct 2024, Jason Merrill wrote:
> > 
> > > On 10/7/24 9:58 AM, Patrick Palka wrote:
> > > > On Sat, 5 Oct 2024, Jason Merrill wrote:
> > > > 
> > > > > On 10/4/24 11:00 AM, Patrick Palka wrote:
> > > > > > On Thu, 3 Oct 2024, Jason Merrill wrote:
> > > > > > 
> > > > > > > On 10/3/24 12:38 PM, Jason Merrill wrote:
> > > > > > > > On 10/2/24 7:50 AM, Richard Biener wrote:
> > > > > > > > > This reduces peak memory usage by 20% for a specific testcase.
> > > > > > > > > 
> > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > > > > 
> > > > > > > > > It's very ugly so I'd appreciate suggestions on how to handle
> > > > > > > > > such
> > > > > > > > > situations better?
> > > > > > > > 
> > > > > > > > I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.
> > > > > > > 
> > > > > > > OK, apparently that was both too clever and not clever enough.
> > > > > > > Replacing
> > > > > > > it
> > > > > > > with this one that's much closer to yours.
> > > > > > > 
> > > > > > > Jason
> > > > > > 
> > > > > > > From: Jason Merrill 
> > > > > > > Date: Thu, 3 Oct 2024 16:31:00 -0400
> > > > > > > Subject: [PATCH] c++: free garbage vec in coerce_template_parms
> > > > > > > To: gcc-patches@gcc.gnu.org
> > > > > > > 
> > > > > > > coerce_template_parms can create two different vecs for the inner
> > > > > > > template
> > > > > > > arguments, new_inner_args and (potentially) the result of
> > > > > > > expand_template_argument_pack.  One or the other, or possibly
> > > > > > > both,
> > > > > > > end up
> > > > > > > being garbage: in the typical case, the expanded vec is garbage
> > > > > > > because
> > > > > > > it's
> > > > > > > only used as the source for convert_template_argument.  In some
> > > > > > > dependent
> > > > > > > cases, the new vec is garbage because we decide to return the
> > > > > > > original
> > > > > > > args
> > > > > > > instead.  In these cases, ggc_free the garbage vec to reduce the
> > > > > > > memory
> > > > > > > overhead of overload resolution.
> > > > > > > 
> > > > > > > gcc/cp/ChangeLog:
> > > > > > > 
> > > > > > >   * pt.cc (coerce_template_parms): Free garbage vecs.
> > > > > > > 
> > > > > > > Co-authored-by: Richard Biener 
> > > > > > > ---
> > > > > > > gcc/cp/pt.cc | 10 +-
> > > > > > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > > > > > index 20affcd65a2..4ceae1d38de 100644
> > > > > > > --- a/gcc/cp/pt.cc
> > > > > > > +++ b/gcc/cp/pt.cc
> > > > > > > @@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms,
> > > > > > >   {
> > > > > > > /* We don't know how many args we have yet, just
> > > > > > > use the
> > > > > > >unconverted (and still packed) ones for now.
> > > > > > > */
> > > > > > > +   ggc_free (new_inner_args);
> > > > > > > new_inner_args = orig_inner_args;
> > > > > > > arg_idx = nargs;
> > > > > > > break;
> > > > > > > @@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms,
> > > > > > > = make_pack_expansion (conv, complain);
> > > > > > > /* We don't know how many args we have yet,
> > > > > > > just
> > > > > > > - use the unconverted ones for now.  */
> > > > > > > +  use the unconverted (but unpacked) ones for now.  */
> > > > > > > +   ggc_free (new_inner_args);
> > > > > > 
> > > > > > I'm a bit worried about these ggc_frees.  If an earlier template
> > > > > > parameter is a constrained auto NTTP then new_inner_args/new_args
> > > > > > could
> > > > > > have been captured by the satisfaction cache during coercion for
> > > > > > that
> > > > > > argument, and so we'd be freeing a vector that's still live?
> > > > > 
> > > > > It seems like for e.g.
> > > > > 
> > > > > template  concept NotInt = !__is_same (T, int);
> > > > > template  struct A { };
> > > > > template  using B = A<'x', Ts...>;
> > > > > 
> > > > > we don't check satisfaction until after we're done coercing, because
> > > > > of
> > > > > 
> > > > > if (processing_template_decl && context == adc_unify)
> > > > >   /* Constraints will be checked after deduction.  */;
> > > > > 
> > > > > in do_auto_deduction.
> > > > 
> > > > Ah, I wonder why we pass/use adc_unify from both unify and
> > > > convert_template_argument..  That early exit makes sense for unify
> > > > but not for convert_template_argument since it prevents us from
> > > > checking constrained auto NTTPs during ahead of time coercion:
> > > > 
> > > > template
> > > > concept C = T::value;
> > > > 
> > > > template
> > > > struct A { };
> > > > 
> > > > template
> > > > void f() {
> > > >   A<0> a; // no constraint error
> > > > }
> > > > 
> > > > A<0> a; // constraint error
> > > 

Re: [PATCH] testsuite: Define missing and use ET for arm_arch_* and arm_cpu_*

2024-10-07 Thread Torbjorn SVENSSON

Hi Richard,

On 2024-10-07 12:45, Richard Earnshaw (lists) wrote:

On 07/10/2024 09:03, Torbjörn SVENSSON wrote:

Ok for trunk?

--

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.


The acronym ET isn't one I recognize - I'm guessing you intend it to be 
Effective Target, rather than Extra Terrestrial, or Elf Target or some other 
expansion?  I think perhaps it would be better to avoid this in the commit log. 
 Your summary line is also a little imprecise as I suspect we will have more 
patches of a similar nature for some other patches soon.  Something like:

testsuite: arm: use effective-target for vsel* and mod* tests

would be closer


I'm fairly certain that I've seen the abbr ET for effective-target 
somewhere, but I could be wrong. Anyway, I've used your suggestion and 
will push it as soon as I get a comment on my questions below.






gcc/testsuite/ChangeLog

* gcc.target/arm/pr65647.c: Use ET arm_arch_v6m.
* gcc.target/arm/mod_2.c: Use ET arm_cpu_cortex_a57.
* gcc.target/arm/mod_256.c: Likewise.
* gcc.target/arm/vseleqdf.c: Likewise.
* gcc.target/arm/vseleqsf.c: Likewise.
* gcc.target/arm/vselgedf.c: Likewise.
* gcc.target/arm/vselgesf.c: Likewise.
* gcc.target/arm/vselgtdf.c: Likewise.
* gcc.target/arm/vselgtsf.c: Likewise.
* gcc.target/arm/vselledf.c: Likewise.
* gcc.target/arm/vsellesf.c: Likewise.
* gcc.target/arm/vselltdf.c: Likewise.
* gcc.target/arm/vselltsf.c: Likewise.
* gcc.target/arm/vselnedf.c: Likewise.
* gcc.target/arm/vselnesf.c: Likewise.
* gcc.target/arm/vselvcdf.c: Likewise.
* gcc.target/arm/vselvcsf.c: Likewise.
* gcc.target/arm/vselvsdf.c: Likewise.
* gcc.target/arm/vselvssf.c: Likewise.
* lib/target-supports.exp: Define EF arm_cpu_cortex_a57.  Update ET

   ^^
Typo for ET?


Yes :S



The body of the patch is OK with an updated commit message.

Thanks.
R.



arm_v8_1_lob_ok to use -mcpu=unset.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.target/arm/mod_2.c| 4 +++-
  gcc/testsuite/gcc.target/arm/mod_256.c  | 4 +++-
  gcc/testsuite/gcc.target/arm/pr65647.c  | 3 ++-
  gcc/testsuite/gcc.target/arm/vseleqdf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vseleqsf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselgedf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselgesf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselgtdf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselgtsf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselledf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vsellesf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselltdf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselltsf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselnedf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselnesf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselvcdf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselvcsf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselvsdf.c | 5 +++--
  gcc/testsuite/gcc.target/arm/vselvssf.c | 5 +++--
  gcc/testsuite/lib/target-supports.exp   | 3 ++-
  20 files changed, 58 insertions(+), 36 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c b/gcc/testsuite/gcc.target/arm/mod_2.c
index 1143725d59a..3a203b67d73 100644
--- a/gcc/testsuite/gcc.target/arm/mod_2.c
+++ b/gcc/testsuite/gcc.target/arm/mod_2.c
@@ -1,7 +1,9 @@
  /* { dg-do compile } */
  /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { "-mpure-code" } } */
  /* { dg-require-effective-target arm32 } */
-/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+/* { dg-require-effective-target arm_cpu_cortex_a57 } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-add-options arm_cpu_cortex_a57 } */
  
  #include "../aarch64/mod_2.x"
  
diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c b/gcc/testsuite/gcc.target/arm/mod_256.c
index d8dca0fe7d5..3521d7a05f3 100644
--- a/gcc/testsuite/gcc.target/arm/mod_256.c
+++ b/gcc/testsuite/gcc.target/arm/mod_256.c
@@ -1,7 +1,9 @@
  /* { dg-do compile } */
  /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { "-mpure-code" } } */
  /* { dg-require-effective-target arm32 } */
-/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+/* { dg-require-effective-target arm_cpu_cortex_a57 } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-add-options arm_cpu_cortex_a57 } */
  
  #include "../aarch64/mod_256.x"
  
diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c b/gcc/testsuite/gcc.target/arm/pr65647.c
index 26b4e399f6b..dc3a3ca1184 100644
--- a/gcc/testsuite/gcc.target/arm/pr65647.c
+++ b/gcc/testsuite/gcc.target/arm/pr65647.c
@@ -1,7 +1,8 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_arch_v6m_ok } */
  /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } {"-mfloat-abi=soft" } } */
-/* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=s

Re: [PATCH v2] diagnostics: Fix compile error for MinGW <7.0

2024-10-07 Thread Jonathan Yong

On 9/28/24 12:49, Torbjörn SVENSSON wrote:

Ok for trunk?

Changes since v1:

- Updated the commit message to mention the actual build error.
- Switch to checking the required define rather than the version number of 
MinGW.



Patch looks OK to me.
Thanks for the reminder.



[PATCH 1/1] PowerPC vector pair support

2024-10-07 Thread Michael Meissner
See the previous post for a longer explanation of the motivations for this
patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664694.html

This patch adds a new include file (vector-pair.h) that implements a series of
functions that allow people implementing high-performance libraries to
optimize their code to use the vector pair load/store instructions on power10
computers to improve memory bandwidth.

I have tested this on both big endian and little endian servers.  Can I check
this into the GCC trunk?

2024-10-07  Michael Meissner  

gcc/

* config.gcc (powerpc*-*-*): Add vector-pair.h to extra headers.
* config/rs6000/vector-pair.h: New file.
* doc/extend.texi (PowerPC Vector Pair Support): Document the vector
pair support functions.

gcc/testsuite/

* gcc.target/powerpc/vpair-1.c: New test or include file.
* gcc.target/powerpc/vpair-2.c: Likewise.
* gcc.target/powerpc/vpair-3-not-p10.c: Likewise.
* gcc.target/powerpc/vpair-3-p10.c: Likewise.
* gcc.target/powerpc/vpair-3.h: Likewise.
* gcc.target/powerpc/vpair-4-not-p10.c: Likewise.
* gcc.target/powerpc/vpair-4-p10.c: Likewise.
* gcc.target/powerpc/vpair-4.h: Likewise.
---
 gcc/config.gcc|   2 +-
 gcc/config/rs6000/rs6000-c.cc |   8 +-
 gcc/config/rs6000/vector-pair.h   | 519 ++
 gcc/doc/extend.texi   |  98 
 gcc/testsuite/gcc.target/powerpc/vpair-1.c| 141 +
 gcc/testsuite/gcc.target/powerpc/vpair-2.c| 141 +
 .../gcc.target/powerpc/vpair-3-not-p10.c  |  15 +
 .../gcc.target/powerpc/vpair-3-p10.c  |  14 +
 gcc/testsuite/gcc.target/powerpc/vpair-3.h| 435 +++
 .../gcc.target/powerpc/vpair-4-not-p10.c  |  15 +
 .../gcc.target/powerpc/vpair-4-p10.c  |  14 +
 gcc/testsuite/gcc.target/powerpc/vpair-4.h| 435 +++
 12 files changed, 1834 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/rs6000/vector-pair.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-3-not-p10.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-3-p10.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-4-not-p10.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-4-p10.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vpair-4.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 0b794e977f6..3627bed8b86 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -537,7 +537,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
extra_headers="${extra_headers} nmmintrin.h immintrin.h x86gprintrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
-   extra_headers="${extra_headers} amo.h"
+   extra_headers="${extra_headers} amo.h vector-pair.h"
case x$with_cpu in
xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500|xfuture)
cpu_is_64bit=yes
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 82826f96a8e..77bee8fc878 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -590,9 +590,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
   if (rs6000_cpu == PROCESSOR_CELL)
 rs6000_define_or_undefine_macro (define_p, "__PPU__");
 
-  /* Tell the user if we support the MMA instructions.  */
+  /* Tell the user if we support the MMA instructions.  Also tell vector-pair.h
+ that we have the vector pair built-in function support.  */
   if ((flags & OPTION_MASK_MMA) != 0)
-rs6000_define_or_undefine_macro (define_p, "__MMA__");
+{
+  rs6000_define_or_undefine_macro (define_p, "__MMA__");
+  rs6000_define_or_undefine_macro (define_p, "__VPAIR__");
+}
   /* Whether pc-relative code is being generated.  */
   if ((flags & OPTION_MASK_PCREL) != 0)
 rs6000_define_or_undefine_macro (define_p, "__PCREL__");
diff --git a/gcc/config/rs6000/vector-pair.h b/gcc/config/rs6000/vector-pair.h
new file mode 100644
index 000..ceb28c4e974
--- /dev/null
+++ b/gcc/config/rs6000/vector-pair.h
@@ -0,0 +1,519 @@
+/* PowerPC vector pair include file.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez (al...@redhat.com).
+   Rewritten by Paolo Bonzini (bonz...@gnu.org).
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the h

Re: [PR middle-end/114635] Set OMP safelen handling to INT_MAX when the pragma didn’t provide one.

2024-10-07 Thread Kugan Vivekanandarajah
ping?

Thanks,
Kugan

From: Kugan Vivekanandarajah 
Sent: Tuesday, 20 August 2024 6:18 PM
To: Jakub Jelinek 
Cc: gcc-patches@gcc.gnu.org ; 
richard.guent...@gmail.com ; 
richard.sandif...@arm.com 
Subject: Re: [PR middle-end/114635] Set OMP safelen handling to INT_MAX when 
the pragma didn’t provide one.

ping? Any feedback?

Thanks,
Kugan


From: Kugan Vivekanandarajah 
Sent: Monday, 5 August 2024 3:05 PM
To: Jakub Jelinek 
Cc: gcc-patches@gcc.gnu.org ; 
richard.guent...@gmail.com ; 
richard.sandif...@arm.com 
Subject: Re: [PR middle-end/114635] Set OMP safelen handling to INT_MAX when 
the pragma didn’t provide one.



> On 15 Jul 2024, at 5:18 pm, Jakub Jelinek  wrote:
>
> On Mon, Jul 15, 2024 at 12:39:22AM +, Kugan Vivekanandarajah wrote:
>> OMP safelen handling assigns the backend-provided maximum as an int even
>> when the pragma didn’t provide one. As a result, the vectoriser rejects SVE
>> modes when comparing the poly_int with the safelen.
>>
>> That is, for the attached test case, omp_max_vf gets [16, 16] from the
>> backend. This then becomes 16 because the OMP safelen is an integer. When
>> the vectoriser checks the potential vector mode with
>> maybe_lt (max_vf, min_vf), the check fails, so no SVE vector mode is
>> selected.
>>
>> One suggestion there was to set safelen to INT_MAX when the OMP pragma does
>> not provide safelen explicitly.
>
> This is wrong.  The reason why safelen is set to that sctx.max_vf is that if
> there are any "omp simd array" arrays, sctx.max_vf is their size.
> The code you're touching has a comment that explains it even:
>  /* If max_vf is non-zero, then we can use only a vectorization factor
> up to the max_vf we chose.  So stick it into the safelen clause.  */
>  if (maybe_ne (sctx.max_vf, 0U))
>
> If sctx.max_vf is 0, there were no "omp simd array" arrays emitted and so
> OMP_CLAUSE_SAFELEN isn't set.
> The vectorizer can only shrink the arrays, not grow them and that is why
> there is this limit.
>
> Now, I think even SVE has a limit, which is not a scalar constant but
> poly_int, so I think in that case you need to arrange for the size of the
> arrays to be POLY_INT_CST as well and use that as a limit.
> Now, the clause argument itself at least in the OpenMP standard needs to be an
> integer constant (if provided), because the proposals to extend it for the
> SVE-like arches (aarch64, RISC-V) have not been voted in I believe.
> So, there is a question what to do if user specifies safelen (32) or
> something similar.
> But if the user didn't specify it (so it is effectively infinity), then
> yes, it might be ok to set it to some POLY_INT_CST representing the sizes of
> the arrays and tweak the loop safelen so that it can represent those.
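
The two cases distinguished above (an explicit safelen versus none) can be illustrated with a minimal C sketch. The function names and array size are hypothetical and are not taken from the PR's test case. The pragmas are guarded by _OPENMP so that the file also builds cleanly without -fopenmp, in which case both loops simply run sequentially.

```c
#include <assert.h>

enum { N = 64, DIST = 8 };

/* Each element depends on the element DIST positions earlier, so iterations
   that are DIST or more apart must not run concurrently.  safelen (8)
   states exactly that limit; it has to be an integer constant.  */
void
fill_with_safelen (int *a)
{
#ifdef _OPENMP
#pragma omp simd safelen (8)
#endif
  for (int i = DIST; i < N; i++)
    a[i] = a[i - DIST] + 1;
}

/* No safelen clause: the user places no upper bound on how many
   iterations may be executed together (the "effectively infinity" case).  */
void
add_one (int *a)
{
#ifdef _OPENMP
#pragma omp simd
#endif
  for (int i = 0; i < N; i++)
    a[i] += 1;
}
```

Starting from a zeroed array of 64 ints, fill_with_safelen leaves a[i] equal to i / 8, with or without vectorization; the clause only bounds how many iterations the compiler may run together.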

Thanks for the explanation. Does that mean:
1. We change loop->safelen to poly_int
2. Modify the apply_safelen  to account for the poly_int.

I am attaching an RFC patch for your reference.
Thanks,
Kugan



Signed-off-by: Kugan Vivekanandarajah 

>
>>PR middle-end/114635
>>PR 114635
>>
>> gcc/ChangeLog:
>>
>>* omp-low.cc (lower_rec_input_clauses): Set INT_MAX
>>when safelen is not provided instead of using backend
>>provided safelen.
>>
>> gcc/testsuite/ChangeLog:
>>
>>* c-c++-common/pr114635-1.cpp: New test.
>>* c-c++-common/pr114635-2.cpp: New test.
>>
>> Signed-off-by: Kugan Vivekanandarajah 
>
>Jakub




[PATCH v1 2/3] RISC-V: Add testcases for form 3 of scalar signed SAT_SUB

2024-10-07 Thread pan2 . li
From: Pan Li 

Form 3:
  #define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_sub_##T##_fmt_3 (T x, T y) \
  {\
T minus;   \
bool overflow = __builtin_sub_overflow (x, y, &minus); \
return overflow ? x < 0 ? MIN : MAX : minus;   \
  }

The tests below pass with this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
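
As a rough illustration of what form 3 computes, the macro can be instantiated for int8_t outside the testsuite. This is only a sketch: it assumes a compiler that provides __builtin_sub_overflow (GCC or Clang), and it does not reproduce sat_arith.h or the dg directives of the real tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 3 as quoted in the commit message above.  */
#define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)             \
T __attribute__((noinline))                              \
sat_s_sub_##T##_fmt_3 (T x, T y)                         \
{                                                        \
  T minus;                                               \
  bool overflow = __builtin_sub_overflow (x, y, &minus); \
  return overflow ? x < 0 ? MIN : MAX : minus;           \
}

/* Expands to: int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y).  */
DEF_SAT_S_SUB_FMT_3 (int8_t, uint8_t, INT8_MIN, INT8_MAX)
```

With this instantiation, sat_s_sub_int8_t_fmt_3 (-128, 1) saturates to INT8_MIN, sat_s_sub_int8_t_fmt_3 (127, -1) saturates to INT8_MAX, and a subtraction that does not overflow, such as (5, 3), returns the plain difference.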

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_sub-3-i16.c: New test.
* gcc.target/riscv/sat_s_sub-3-i32.c: New test.
* gcc.target/riscv/sat_s_sub-3-i64.c: New test.
* gcc.target/riscv/sat_s_sub-3-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 14 +
 .../gcc.target/riscv/sat_s_sub-3-i16.c| 30 +++
 .../gcc.target/riscv/sat_s_sub-3-i32.c| 28 +
 .../gcc.target/riscv/sat_s_sub-3-i64.c| 27 +
 .../gcc.target/riscv/sat_s_sub-3-i8.c | 28 +
 .../gcc.target/riscv/sat_s_sub-run-3-i16.c| 16 ++
 .../gcc.target/riscv/sat_s_sub-run-3-i32.c| 16 ++
 .../gcc.target/riscv/sat_s_sub-run-3-i64.c| 16 ++
 .../gcc.target/riscv/sat_s_sub-run-3-i8.c | 16 ++
 9 files changed, 191 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 66d393399a2..fd3879d31c5 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -379,12 +379,26 @@ sat_s_sub_##T##_fmt_2 (T x, T y) \
 #define DEF_SAT_S_SUB_FMT_2_WRAP(T, UT, MIN, MAX) \
   DEF_SAT_S_SUB_FMT_2(T, UT, MIN, MAX)
 
+#define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \
+T __attribute__((noinline))  \
+sat_s_sub_##T##_fmt_3 (T x, T y) \
+{\
+  T minus;   \
+  bool overflow = __builtin_sub_overflow (x, y, &minus); \
+  return overflow ? x < 0 ? MIN : MAX : minus;   \
+}
+#define DEF_SAT_S_SUB_FMT_3_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)
+
 #define RUN_SAT_S_SUB_FMT_1(T, x, y) sat_s_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_S_SUB_FMT_1_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_1(T, x, y)
 
 #define RUN_SAT_S_SUB_FMT_2(T, x, y) sat_s_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_S_SUB_FMT_2_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_2(T, x, y)
 
+#define RUN_SAT_S_SUB_FMT_3(T, x, y) sat_s_sub_##T##_fmt_3(x, y)
+#define RUN_SAT_S_SUB_FMT_3_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_3(T, x, y)
+
 
/**/
 /* Saturation Truncate (unsigned and signed)  
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c b/gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c
new file mode 100644
index 000..5a1368b11a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_s_sub_int16_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
+** li\s+[atx][0-9]+,\s*32768
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** xor\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** and\s+[atx][0-9]+,\

[PATCH v1 3/3] RISC-V: Add testcases for form 4 of scalar signed SAT_SUB

2024-10-07 Thread pan2 . li
From: Pan Li 

Form 4:
  #define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)   \
  T __attribute__((noinline))\
  sat_s_sub_##T##_fmt_4 (T x, T y)   \
  {  \
T minus;   \
bool overflow = __builtin_sub_overflow (x, y, &minus); \
return !overflow ? minus : x < 0 ? MIN : MAX;  \
  }

The tests below pass with this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_sub-4-i16.c: New test.
* gcc.target/riscv/sat_s_sub-4-i32.c: New test.
* gcc.target/riscv/sat_s_sub-4-i64.c: New test.
* gcc.target/riscv/sat_s_sub-4-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 14 +
 .../gcc.target/riscv/sat_s_sub-4-i16.c| 30 +++
 .../gcc.target/riscv/sat_s_sub-4-i32.c| 28 +
 .../gcc.target/riscv/sat_s_sub-4-i64.c| 27 +
 .../gcc.target/riscv/sat_s_sub-4-i8.c | 28 +
 .../gcc.target/riscv/sat_s_sub-run-4-i16.c| 16 ++
 .../gcc.target/riscv/sat_s_sub-run-4-i32.c| 16 ++
 .../gcc.target/riscv/sat_s_sub-run-4-i64.c| 16 ++
 .../gcc.target/riscv/sat_s_sub-run-4-i8.c | 16 ++
 9 files changed, 191 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index fd3879d31c5..7c3859cc183 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -390,6 +390,17 @@ sat_s_sub_##T##_fmt_3 (T x, T y) \
 #define DEF_SAT_S_SUB_FMT_3_WRAP(T, UT, MIN, MAX) \
   DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)
 
+#define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)   \
+T __attribute__((noinline))\
+sat_s_sub_##T##_fmt_4 (T x, T y)   \
+{  \
+  T minus;   \
+  bool overflow = __builtin_sub_overflow (x, y, &minus); \
+  return !overflow ? minus : x < 0 ? MIN : MAX;  \
+}
+#define DEF_SAT_S_SUB_FMT_4_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)
+
 #define RUN_SAT_S_SUB_FMT_1(T, x, y) sat_s_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_S_SUB_FMT_1_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_1(T, x, y)
 
@@ -399,6 +410,9 @@ sat_s_sub_##T##_fmt_3 (T x, T y) \
 #define RUN_SAT_S_SUB_FMT_3(T, x, y) sat_s_sub_##T##_fmt_3(x, y)
 #define RUN_SAT_S_SUB_FMT_3_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_3(T, x, y)
 
+#define RUN_SAT_S_SUB_FMT_4(T, x, y) sat_s_sub_##T##_fmt_4(x, y)
+#define RUN_SAT_S_SUB_FMT_4_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_4(T, x, y)
+
 
/**/
 /* Saturation Truncate (unsigned and signed)  
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c b/gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c
new file mode 100644
index 000..60c22e25eb8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_s_sub_int16_t_fmt_4:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
+** li\s+[atx][0-9]+,\s*32768
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** xor\s+[atx][0-9]+,\s*[atx][0-9]+,\

[PATCH v1 1/3] Match: Support form 3 and form 4 for scalar signed integer SAT_SUB

2024-10-07 Thread pan2 . li
From: Pan Li 

This patch adds support for form 3 and form 4 of the scalar signed
integer SAT_SUB, as in the examples below:

Form 3:
  #define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)             \
  T __attribute__((noinline))                              \
  sat_s_sub_##T##_fmt_3 (T x, T y)                         \
  {                                                        \
    T minus;                                               \
    bool overflow = __builtin_sub_overflow (x, y, &minus); \
    return overflow ? x < 0 ? MIN : MAX : minus;           \
  }

Form 4:
  #define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)             \
  T __attribute__((noinline))                              \
  sat_s_sub_##T##_fmt_4 (T x, T y)                         \
  {                                                        \
    T minus;                                               \
    bool overflow = __builtin_sub_overflow (x, y, &minus); \
    return !overflow ? minus : x < 0 ? MIN : MAX;          \
  }

DEF_SAT_S_SUB_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX);

Before this patch:
  __attribute__((noinline))
  int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
  {
    signed char _1;
    signed char _2;
    int8_t _3;
    __complex__ signed char _6;
    _Bool _8;
    signed char _9;
    signed char _10;
    signed char _11;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
    _2 = IMAGPART_EXPR <_6>;
    if (_2 != 0)
      goto <bb 4>; [50.00%]
    else
      goto <bb 3>; [50.00%]
  ;;    succ:       4
  ;;                3

  ;;   basic block 3, loop depth 0
  ;;    pred:       2
    _1 = REALPART_EXPR <_6>;
    goto <bb 5>; [100.00%]
  ;;    succ:       5

  ;;   basic block 4, loop depth 0
  ;;    pred:       2
    _8 = x_4(D) < 0;
    _9 = (signed char) _8;
    _10 = -_9;
    _11 = _10 ^ 127;
  ;;    succ:       5

  ;;   basic block 5, loop depth 0
  ;;    pred:       3
  ;;                4
    # _3 = PHI <_1(3), _11(4)>
    return _3;
  ;;    succ:       EXIT

  }

After this patch:
  __attribute__((noinline))
  int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
  {
    int8_t _3;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
    return _3;
  ;;    succ:       EXIT

  }

The test suites below pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
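
The GIMPLE before the patch selects the saturated value with a negate and an xor (`_10 = -_9; _11 = _10 ^ 127;`), which is the shape the new pattern looks for. As a minimal sketch, not part of the patch, the following C checks over the whole int8_t range that this xor form agrees with the form 4 source. The helper names are hypothetical, and __builtin_sub_overflow is assumed to be available (GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 4 from the commit message, written out for int8_t.  */
int8_t
sat_sub_form4 (int8_t x, int8_t y)
{
  int8_t minus;
  bool overflow = __builtin_sub_overflow (x, y, &minus);
  return !overflow ? minus : x < 0 ? INT8_MIN : INT8_MAX;
}

/* The same selection written as -(T)(X < 0) ^ MAX:
   X < 0  gives -1 ^ 127 = -128 (INT8_MIN),
   X >= 0 gives  0 ^ 127 =  127 (INT8_MAX).  */
int8_t
sat_sub_xor (int8_t x, int8_t y)
{
  int8_t minus;
  bool overflow = __builtin_sub_overflow (x, y, &minus);
  int8_t limit = (int8_t) (-(int8_t) (x < 0) ^ INT8_MAX);
  return overflow ? limit : minus;
}

/* Hypothetical helper: number of (x, y) pairs on which the two disagree.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    for (int y = INT8_MIN; y <= INT8_MAX; y++)
      if (sat_sub_form4 ((int8_t) x, (int8_t) y)
          != sat_sub_xor ((int8_t) x, (int8_t) y))
        bad++;
  return bad;
}
```

Because the two functions differ only in how the limit is chosen on overflow, count_mismatches is expected to report no disagreements for any pair of int8_t inputs.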

gcc/ChangeLog:

* match.pd: Add case 3 matching pattern for signed SAT_SUB.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index ba83f0f29e6..d50b732bc86 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3388,6 +3388,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value))
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))))
 
+/* Signed saturation sub, case 3:
+   Z = .SUB_OVERFLOW (X, Y)
+   SAT_S_SUB = IMAGPART_EXPR (Z) != 0 ? (-(T)(X < 0) ^ MAX) : REALPART_EXPR (Z);
+
+   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
+(match (signed_integer_sat_sub @0 @1)
+ (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop)))
+  max_value)
+   (realpart @2))
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1))))
+
 /* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
 (match (unsigned_integer_sat_trunc @0)
-- 
2.43.0



Re: [PATCH v1 3/3] RISC-V: Add testcases for form 4 of scalar signed SAT_SUB

2024-10-07 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-10-08 09:21
To: gcc-patches
CC: richard.guenther; Tamar.Christina; juzhe.zhong; kito.cheng; jeffreyalaw; 
rdapp.gcc; Pan Li
Subject: [PATCH v1 3/3] RISC-V: Add testcases for form 4 of scalar signed 
SAT_SUB
From: Pan Li 
 
Form 4:
  #define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)   \
  T __attribute__((noinline))\
  sat_s_sub_##T##_fmt_4 (T x, T y)   \
  {  \
T minus;   \
bool overflow = __builtin_sub_overflow (x, y, &minus); \
return !overflow ? minus : x < 0 ? MIN : MAX;  \
  }
 
The tests below pass with this patch.
* The rv64gcv full regression test.
 
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_sub-4-i16.c: New test.
* gcc.target/riscv/sat_s_sub-4-i32.c: New test.
* gcc.target/riscv/sat_s_sub-4-i64.c: New test.
* gcc.target/riscv/sat_s_sub-4-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-4-i8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 14 +
.../gcc.target/riscv/sat_s_sub-4-i16.c| 30 +++
.../gcc.target/riscv/sat_s_sub-4-i32.c| 28 +
.../gcc.target/riscv/sat_s_sub-4-i64.c| 27 +
.../gcc.target/riscv/sat_s_sub-4-i8.c | 28 +
.../gcc.target/riscv/sat_s_sub-run-4-i16.c| 16 ++
.../gcc.target/riscv/sat_s_sub-run-4-i32.c| 16 ++
.../gcc.target/riscv/sat_s_sub-run-4-i64.c| 16 ++
.../gcc.target/riscv/sat_s_sub-run-4-i8.c | 16 ++
9 files changed, 191 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i64.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i64.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-4-i8.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index fd3879d31c5..7c3859cc183 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -390,6 +390,17 @@ sat_s_sub_##T##_fmt_3 (T x, T y) \
#define DEF_SAT_S_SUB_FMT_3_WRAP(T, UT, MIN, MAX) \
   DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)
+#define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)   \
+T __attribute__((noinline))\
+sat_s_sub_##T##_fmt_4 (T x, T y)   \
+{  \
+  T minus;   \
+  bool overflow = __builtin_sub_overflow (x, y, &minus); \
+  return !overflow ? minus : x < 0 ? MIN : MAX;  \
+}
+#define DEF_SAT_S_SUB_FMT_4_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)
+
#define RUN_SAT_S_SUB_FMT_1(T, x, y) sat_s_sub_##T##_fmt_1(x, y)
#define RUN_SAT_S_SUB_FMT_1_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_1(T, x, y)
@@ -399,6 +410,9 @@ sat_s_sub_##T##_fmt_3 (T x, T y) \
#define RUN_SAT_S_SUB_FMT_3(T, x, y) sat_s_sub_##T##_fmt_3(x, y)
#define RUN_SAT_S_SUB_FMT_3_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_3(T, x, y)
+#define RUN_SAT_S_SUB_FMT_4(T, x, y) sat_s_sub_##T##_fmt_4(x, y)
+#define RUN_SAT_S_SUB_FMT_4_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_4(T, x, y)
+
/**/
/* Saturation Truncate (unsigned and signed)  */
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c 
b/gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c
new file mode 100644
index 000..60c22e25eb8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_sub-4-i16.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_s_sub_int16_t_fmt_4:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** andi\s+[atx][0-9]+,\s*[atx][0-9]

Re: [PATCH v1 2/3] RISC-V: Add testcases for form 3 of scalar signed SAT_SUB

2024-10-07 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-10-08 09:21
To: gcc-patches
CC: richard.guenther; Tamar.Christina; juzhe.zhong; kito.cheng; jeffreyalaw; 
rdapp.gcc; Pan Li
Subject: [PATCH v1 2/3] RISC-V: Add testcases for form 3 of scalar signed 
SAT_SUB
From: Pan Li 
 
Form 3:
  #define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_sub_##T##_fmt_3 (T x, T y) \
  {\
T minus;   \
bool overflow = __builtin_sub_overflow (x, y, &minus); \
return overflow ? x < 0 ? MIN : MAX : minus;   \
  }
 
The tests below pass with this patch.
* The rv64gcv full regression test.
 
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_sub-3-i16.c: New test.
* gcc.target/riscv/sat_s_sub-3-i32.c: New test.
* gcc.target/riscv/sat_s_sub-3-i64.c: New test.
* gcc.target/riscv/sat_s_sub-3-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-3-i8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 14 +
.../gcc.target/riscv/sat_s_sub-3-i16.c| 30 +++
.../gcc.target/riscv/sat_s_sub-3-i32.c| 28 +
.../gcc.target/riscv/sat_s_sub-3-i64.c| 27 +
.../gcc.target/riscv/sat_s_sub-3-i8.c | 28 +
.../gcc.target/riscv/sat_s_sub-run-3-i16.c| 16 ++
.../gcc.target/riscv/sat_s_sub-run-3-i32.c| 16 ++
.../gcc.target/riscv/sat_s_sub-run-3-i64.c| 16 ++
.../gcc.target/riscv/sat_s_sub-run-3-i8.c | 16 ++
9 files changed, 191 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i64.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i64.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-3-i8.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 66d393399a2..fd3879d31c5 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -379,12 +379,26 @@ sat_s_sub_##T##_fmt_2 (T x, T y) \
#define DEF_SAT_S_SUB_FMT_2_WRAP(T, UT, MIN, MAX) \
   DEF_SAT_S_SUB_FMT_2(T, UT, MIN, MAX)
+#define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \
+T __attribute__((noinline))  \
+sat_s_sub_##T##_fmt_3 (T x, T y) \
+{\
+  T minus;   \
+  bool overflow = __builtin_sub_overflow (x, y, &minus); \
+  return overflow ? x < 0 ? MIN : MAX : minus;   \
+}
+#define DEF_SAT_S_SUB_FMT_3_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)
+
#define RUN_SAT_S_SUB_FMT_1(T, x, y) sat_s_sub_##T##_fmt_1(x, y)
#define RUN_SAT_S_SUB_FMT_1_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_1(T, x, y)
#define RUN_SAT_S_SUB_FMT_2(T, x, y) sat_s_sub_##T##_fmt_2(x, y)
#define RUN_SAT_S_SUB_FMT_2_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_2(T, x, y)
+#define RUN_SAT_S_SUB_FMT_3(T, x, y) sat_s_sub_##T##_fmt_3(x, y)
+#define RUN_SAT_S_SUB_FMT_3_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_3(T, x, y)
+
/**/
/* Saturation Truncate (unsigned and signed)  */
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c 
b/gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c
new file mode 100644
index 000..5a1368b11a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_sub-3-i16.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_s_sub_int16_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
+** li\s+[atx][0-9]+,\s
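
For readers who want to try the form 3 pattern outside the testsuite, here is a minimal standalone sketch of what DEF_SAT_S_SUB_FMT_3 expands to for int8_t.  The function name sat_s_sub_i8 is made up for this illustration and is not part of sat_arith.h; MIN and MAX are assumed to be INT8_MIN and INT8_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed saturating subtraction for int8_t, mirroring the shape of the
   form 3 macro: subtract with overflow detection, then clamp.
   The name sat_s_sub_i8 is hypothetical.  */
static int8_t
sat_s_sub_i8 (int8_t x, int8_t y)
{
  int8_t minus;
  /* True when the mathematically exact x - y does not fit in int8_t.  */
  bool overflow = __builtin_sub_overflow (x, y, &minus);
  /* On overflow the sign of x tells which bound was crossed.  */
  return overflow ? (x < 0 ? INT8_MIN : INT8_MAX) : minus;
}
```

A signed subtraction can only overflow when the operands have different signs, so a negative x means the exact result fell below the minimum.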

Re:[pushed] [PATCH] LoongArch: Add support to annotate tablejump

2024-10-07 Thread Lulu Cheng

Pushed to r15-4130.

On 2024/7/11 at 7:43 PM, Xi Ruoyao wrote:

This is per the request from the kernel developers.  For generating the
ORC unwind info, the objtool program needs to analyze the control flow
of a .o file.  If a jump table is used, objtool has to correlate the
jump instruction with the table.

On x86 (where objtool was initially developed) it's simple: a relocation
entry naturally correlates them because one single instruction is used
for table-based jump.  But on a RISC machine objtool would have to
reconstruct the data flow if it must find out the correlation on its
own.

So, emit an additional section to store the correlation info as pairs of
addresses, each pair contains the address of a jump instruction (jr) and
the address of the jump table.  This is very trivial to implement in
GCC.
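
To make the pair layout concrete, here is a rough sketch of how a consumer might look up the table for a given jump once the section contents are in memory.  It assumes a 64-bit target (8-byte addresses) and already-resolved relocations; the struct and function names are invented for illustration and are not taken from objtool or from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one annotation entry, assuming a 64-bit
   target where each address occupies 8 bytes.  */
struct tablejump_annotation
{
  uint64_t jump_insn_addr;   /* address of the jr instruction */
  uint64_t jump_table_addr;  /* address of the jump table it uses */
};

/* Return the jump table address recorded for INSN_ADDR, or 0 if the
   instruction has no annotation.  */
static uint64_t
find_jump_table (const struct tablejump_annotation *entries, size_t count,
                 uint64_t insn_addr)
{
  for (size_t i = 0; i < count; i++)
    if (entries[i].jump_insn_addr == insn_addr)
      return entries[i].jump_table_addr;
  return 0;
}
```

A linear scan is enough for a sketch; a real tool would more likely sort or hash the entries.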

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in
(mannotate-tablejump): New option.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.md (tablejump): Emit
additional correlation info between the jump instruction and the
jump table, if -mannotate-tablejump.
* doc/invoke.texi: Document -mannotate-tablejump.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/jump-table-annotate.c: New test.

Suggested-by: Tiezhu Yang 
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/genopts/loongarch.opt.in |  4 
  gcc/config/loongarch/loongarch.md | 12 +++-
  gcc/config/loongarch/loongarch.opt|  4 
  gcc/doc/invoke.texi   | 13 -
  .../gcc.target/loongarch/jump-table-annotate.c| 15 +++
  5 files changed, 46 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/jump-table-annotate.c

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index d00950cb4f4..d5bbf01d85e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -301,3 +301,7 @@ default value is 4.
  ; CPUCFG independently, so we use bit flags to specify them.
  TargetVariable
  HOST_WIDE_INT la_isa_evolution = 0
+
+mannotate-tablejump
+Target Mask(ANNOTATE_TABLEJUMP) Save
+Annotate table jump instruction (jr {reg}) to correlate it with the jump table.
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index b3cae49832e..6d9fdc257f8 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3548,12 +3548,22 @@ (define_expand "tablejump"
DONE;
  })
  
+(define_mode_attr mode_size [(DI "8") (SI "4")])

+
  (define_insn "@tablejump<mode>"
[(set (pc)
(match_operand:P 0 "register_operand" "e"))
 (use (label_ref (match_operand 1 "" "")))]
""
-  "jr\t%0"
+  {
+return TARGET_ANNOTATE_TABLEJUMP
+  ? "1:jr\t%0\n\t"
+   ".pushsection\t.discard.tablejump_annotate\n\t"
+   "\t.<mode_size>byte\t1b\n\t"
+   "\t.<mode_size>byte\t%1\n\t"
+   ".popsection"
+  : "jr\t%0";
+  }
[(set_attr "type" "jump")
 (set_attr "mode" "none")])
  
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt

index 91cb5236ad8..6a396b539c4 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -310,6 +310,10 @@ default value is 4.
  TargetVariable
  HOST_WIDE_INT la_isa_evolution = 0
  
+mannotate-tablejump

+Target Mask(ANNOTATE_TABLEJUMP) Save
+Annotate table jump instruction (jr {reg}) to correlate it with the jump table
+
  mfrecipe
  Target Mask(ISA_FRECIPE) Var(la_isa_evolution)
  Support frecipe.{s/d} and frsqrte.{s/d} instructions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4d671c4f6d8..f27d2d6bb87 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1065,7 +1065,7 @@ Objective-C and Objective-C++ Dialects}.
  -mcmodel=@var{code-model} -mrelax -mpass-mrelax-to-as
  -mrecip  -mrecip=@var{opt} -mfrecipe -mno-frecipe -mdiv32 -mno-div32
  -mlam-bh -mno-lam-bh -mlamcas -mno-lamcas -mld-seq-sa -mno-ld-seq-sa
--mtls-dialect=@var{opt}}
+-mtls-dialect=@var{opt} -mannotate-tablejump -mno-annotate-tablejump}
  
  @emph{M32R/D Options}

  @gccoptlist{-m32r2  -m32rx  -m32r
@@ -27352,6 +27352,17 @@ Whether a load-load barrier (@code{dbar 0x700}) is 
needed.  When build with
  This option controls which tls dialect may be used for general dynamic and
  local dynamic TLS models.
  
+@opindex mannotate-tablejump

+@opindex mno-annotate-tablejump
+@item -mannotate-tablejump
+@itemx -mno-annotate-tablejump
+Create an annotation section @code{.discard.tablejump_annotate} to
+correlate the @code{jirl} instruction and the jump table when a jump
+table is used to optimize the @code{switch} statement.  Some external
+tools, for example @file{objtool} of the Linux kernel building system,
+need the annotation to analysis the control flow.  The defau

[to-be-committed][RISC-V][PR target/116615] RISC-V: Use default LOGICAL_OP_NON_SHORT_CIRCUIT

2024-10-07 Thread Jeff Law
Bah.  The pre-commit tester saw my previous message as just a comment on 
the thread rather than an update to Palmer's patch.  So this is a 
re-post so that the pre-commit tester picks up the new patch.  The patch 
itself is unchanged.


--

> We have cheap logical ops, so let's just move this back to the default
> to take advantage of the standard branch/op heuristics.
>
> gcc/ChangeLog:
>
> PR target/116615
> * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> ---
> There's a bunch more discussion in the bug, but it's starting to smell
> like this was just a holdover from MIPS (where maybe it also shouldn't
> be set).  I haven't tested this, but I figured I'd send the patch to get
> a little more visibility.
>
> I guess we should also kick off something like a SPEC run to make sure
> there's no regressions?
So as I noted earlier, this appears to be a nice win on the BPI. 
Testsuite fallout is minimal -- just the one SFB related test tripping 
at -Os that was also hit by Andrew P's work.


After looking at it more closely, the SFB codegen and the codegen after 
Andrew's work should be equivalent assuming two independent ops can 
dispatch together.


The test actually generates sensible code at -Os.  It's the -Os in 
combination with the -fno-ssa-phiopt that causes problems.   I think the 
best thing to do here is just skip at -Os.  That still keeps a degree of 
testing the SFB path.
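
As background for anyone following along: my understanding from the GCC internals documentation (not something spelled out in this thread) is that LOGICAL_OP_NON_SHORT_CIRCUIT decides whether the middle end may replace a short-circuit && or || on simple, side-effect-free operands with a non-short-circuit bitwise operation, and that the default depends on BRANCH_COST.  The two source-level shapes look roughly like this; the function names are made up for illustration.

```c
#include <stdbool.h>

/* Short-circuit form: the second comparison is only evaluated when the
   first one is true, which is naturally expressed with a branch.  */
static bool
in_range_branchy (int x, int lo, int hi)
{
  return x >= lo && x <= hi;
}

/* Non-short-circuit form: both comparisons are always evaluated and the
   results are combined with a bitwise AND, so no branch is needed.  */
static bool
in_range_bitwise (int x, int lo, int hi)
{
  return (x >= lo) & (x <= hi);
}
```

Both functions return the same value for every input because the operands have no side effects; which shape the compiler actually emits depends on the target's costs and on the optimization level.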


Tested successfully in my tester.  But will wait for the pre-commit 
tester to render a verdict before moving forward.



Jeff
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 3aecb43f831..53b7b2a40ed 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -939,8 +939,6 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
 #define TARGET_VECTOR_MISALIGN_SUPPORTED \
riscv_vector_unaligned_access_p
 
-#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
-
 /* Control the assembler format that we output.  */
 
 /* Output to assembler file text saying following lines
diff --git a/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c 
b/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
index 6e9f8cc61de..1ee45b33e15 100644
--- a/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
+++ b/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" } } */
 /* { dg-options "-march=rv32gc -mtune=sifive-7-series -mbranch-cost=1 
-fno-ssa-phiopt -fdump-rtl-ce1" { target { rv32 } } } */
 /* { dg-options "-march=rv64gc -mtune=sifive-7-series -mbranch-cost=1 
-fno-ssa-phiopt -fdump-rtl-ce1" { target { rv64 } } } */
 


[committed] libgomp.texi: Update and cleanup of Impl. Status of OpenMP TR13

2024-10-07 Thread Tobias Burnus
Another update of 
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Technical-Report-13.html


I made another pass through that list, comparing it with the current OpenMP 
6 draft, found a couple of issues, and improved the wording. I am sure more 
wording tweaks are possible, and I have quite likely missed some changes 
that should be added (although those listed in Appendix B should now all 
be there).


Committed as r15-4126-ge2039386b82901

Tobias
commit e2039386b82901d2b7d78b2a27d2982aacbf46a4
Author: Tobias Burnus 
Date:   Mon Oct 7 23:13:29 2024 +0200

libgomp.texi: Update and cleanup of Impl. Status of OpenMP TR13

libgomp/ChangeLog:

* libgomp.texi (OpenMP Technical Report 13): Wording cleanup;
sort as in Appendix B; add missing items; remove duplicates.
---
 libgomp/libgomp.texi | 70 ++--
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index bad06e143dc..cc44efdd937 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -462,28 +462,33 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 @item Full support for Fortran 2023 was added @tab P @tab
 @item @code{_ALL} suffix to the device-scope environment variables
   @tab P @tab Host device number wrongly accepted
-@item @code{num_threads} now accepts a list @tab N @tab
+@item @code{num_threads} clause now accepts a list @tab N @tab
 @item Abstract names added for @code{OMP_NUM_THREADS},
   @code{OMP_THREAD_LIMIT} and @code{OMP_TEAMS_THREAD_LIMIT}
   @tab N @tab
 @item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
 @item Extension of @code{OMP_DEFAULT_DEVICE} and new
   @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab
+@item New @code{uid} trait for target devices and for
+  @code{OMP_AVAILABLE_DEVICES} and @code{OMP_DEFAULT_DEVICE} @tab N @tab
 @item New @code{OMP_THREADS_RESERVE} environment variable @tab N @tab
 @item The @code{decl} attribute was added to the C++ attribute syntax
   @tab Y @tab
-@item The OpenMP directive syntax was extended to include C 23 attribute
+@item The OpenMP directive syntax was extended to include C23 attribute
   specifiers @tab Y @tab
 @item Support for pure directives in Fortran's @code{do concurrent} @tab N @tab
 @item All inarguable clauses take now an optional Boolean argument @tab N @tab
 @item The @code{adjust_args} clause was extended to specify the argument by position
+  and supports variadic arguments @tab N @tab
 @item For Fortran, @emph{locator list} can be also function reference with
   data pointer result @tab N @tab
 @item Concept of @emph{assumed-size arrays} in C and C++
   @tab N @tab
 @item @emph{directive-name-modifier} accepted in all clauses @tab N @tab
-@item Argument-free version of @code{depobj} including added @code{init} clause
-  @tab N @tab
+@item Extension of @code{interop} operation of @code{append_args}, allowing
+  all modifiers of the @code{init} clause @tab N @tab
+@item New argument-free version of @code{depobj} with repeatable clauses and
+  the @code{init} clause @tab N @tab
 @item Undeprecate omitting the argument to the @code{depend} clause of
   the argument version of the @code{depend} construct @tab Y @tab
 @item For Fortran, atomic with BLOCK construct and, for C/C++, with
@@ -492,19 +497,20 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 @item For Fortran, atomic with enum and enumeration types @tab N @tab
 @item For Fortran, atomic compare with storing the comparison result
   @tab N @tab
-@item New @code{looprange} clause @tab N @tab
+@item Canonical loop sequences and new @code{looprange} clause @tab N @tab
 @item For Fortran, handling polymorphic types in data-sharing-attribute
   clauses @tab P @tab @code{private} not supported
 @item For Fortran, rejecting polymorphic types in data-mapping clauses
   @tab N @tab not diagnosed (and mostly unsupported)
 @item New @code{taskgraph} construct including @code{saved} modifier and
   @code{replayable} clause @tab N @tab
-@item @code{default} clause on the @code{target} directive @tab N @tab
-@item Ref-count change for @code{use_device_ptr} and @code{use_device_addr}
-  @tab N @tab
+@item @code{default} clause on the @code{target} directive and accepting
+  variable categories @tab N @tab
+@item Semantic change regarding the reference count update with
+  @code{use_device_ptr} and @code{use_device_addr} @tab N @tab
 @item Support for inductions @tab N @tab
-@item Deprecation of the combiner expression in the @code{declare_reduction}
-  argument @tab N @tab
+@item Reduction over private variables with @code{reduction} clause
+  @tab N @tab
 @item Implicit reduction identifiers of C++ classes
   @tab N @tab
 @item New @code{init_complete} clause to the @code{scan} directive
@@ -512,8 +518,6 @@ Technical Report (TR) 13 is 

[PATCH 0/1] PowerPC vector pair support

2024-10-07 Thread Michael Meissner
I will post the actual patch in the next post.  This part gives the
justification for the patch adding vector-pair.h.

The patch that follows this post adds a new include file (vector-pair.h) so
that users writing high performance libraries can change their code to allow
the generation of the vector pair load and store instructions on power10.

The intention is that library authors who need to write special loops over
arrays can modify their code to use the functions provided, so that those
loops take advantage of the higher bandwidth of the vector pair load and
store instructions.

This particular patch just adds a new include file (vector-pair.h) that
provides a bunch of functions that on a power10 system would use the vector
pair load operation, 2 floating point operations, and a vector pair store.  It
does not add any new types, modes, or built-in functions.

I have additional patches that can add built-in functions that the functions in
vector-pair.h could utilize so that the compiler can optimize and combine
operations.  I may submit those patches in the future, but I would like to
provide this patch to allow the library writer to optimize their code.

I've measured the performance of these new functions on a power10.  For default
unrolling, the percentage of change for the 3 methods over the normal vector
loop method:

    116%    Vector-pair.h function, default unroll
     93%    Vector pair split built-in & 2 vector stores, default unroll
     86%    Vector pair split & combine built-ins, default unroll

Using explicit 2 way unrolling the numbers are:

    114%    Vector-pair.h function, unroll 2
    106%    Vector pair split built-in & 2 vector stores, unroll 2
     98%    Vector pair split & combine built-ins, unroll 2

These new functions provided in vector-pair.h use the vector pair load/store
instructions, and don't generate extra vector moves.  Using the existing
vector pair disassemble and assemble built-ins generate extra vector moves
which can hinder performance.

If I compile the loop code for power9, there is a minor speed up for default
unrolling and more of an improvement using the framework provided in the
vector-pair.h for explicit unrolling by 2:

    101%    Vector-pair.h function, default unroll for power9
    107%    Vector-pair.h function, unroll 2 for power9

Of course this is a synthetic benchmark run on a quiet power10 system.  Results
would vary for real code on real systems.  However, I feel adding these
functions can allow the writers of high performance libraries to better
optimize their code.

As an example, if the library wants to code a simple fused multiply-add loop,
they might write the code as follows:

#include <altivec.h>
#include <math.h>
#include <stddef.h>

void
fma_vector (double * __restrict__ r,
            const double * __restrict__ a,
            const double * __restrict__ b,
            size_t n)
{
  vector double * __restrict__ vr = (vector double * __restrict__)r;
  const vector double * __restrict__ va
    = (const vector double * __restrict__)a;
  const vector double * __restrict__ vb
    = (const vector double * __restrict__)b;
  size_t num_elements = sizeof (vector double) / sizeof (double);
  size_t nv = n / num_elements;
  size_t i;

  /* Main loop: one vector fused multiply-add per iteration.  */
  for (i = 0; i < nv; i++)
    vr[i] = __builtin_vsx_xvmadddp (va[i], vb[i], vr[i]);

  /* Tail loop: handle the elements that do not fill a whole vector.  */
  for (i = nv * num_elements; i < n; i++)
    r[i] = fma (a[i], b[i], r[i]);
}

The inner loop would look like:

.L3:
lxvx 0,3,9
lxvx 12,4,9
addi 10,9,16
addi 2,2,-2
lxvx 11,5,9
xvmaddadp 0,12,11
lxvx 12,4,10
lxvx 11,5,10
stxvx 0,3,9
lxvx 0,3,10
addi 9,9,32
xvmaddadp 0,12,11
stxvx 0,3,10
bdnz .L3

Now if you code the loop to use __builtin_vsx_disassemble_pair to do a vector
pair load, but then do 2 vector stores:


#include <altivec.h>
#include <math.h>
#include <stddef.h>

void
fma_mma_ld (double * __restrict__ r,
            const double * __restrict__ a,
            const double * __restrict__ b,
            size_t n)
{
  __vector_pair * __restrict__ vp_r = (__vector_pair * __restrict__)r;
  const __vector_pair * __restrict__ vp_a
    = (const __vector_pair * __restrict__)a;
  const __vector_pair * __restrict__ vp_b
    = (const __vector_pair * __restrict__)b;
  vector double * __restrict__ v_r = (vector double * __restrict__)r;
  size_t num_elements = (sizeof (__vector_pair) / sizeof (double));
  size_t n_vp = n / num_elements;
  size_t i, j;
  vector double a_hi_lo[2];
  vector double b_hi_lo[

Ping: [PATCH] d,ada/spec: only sub nostd{inc,lib} rather than nostd{inc,lib}*

2024-10-07 Thread Arsen Arsenović
Ping on this patch.

TIA, have a lovely day.
-- 
Arsen Arsenović




Re: [PATCH v6] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-10-07 Thread Marek Polacek
On Mon, Oct 07, 2024 at 07:35:27PM +, Simon Martin wrote:
> - /* Remove any overridden functions.  */
> - bool seen_non_override = false;
> + /* Find all the base_fndecls that are overridden, as well as those
> +that are hidden, in T.  */
>   for (tree fndecl : ovl_range (fns))
> {
> - bool any_override = false;
> + if (TREE_CODE (fndecl) == TEMPLATE_DECL)
> +   fndecl = DECL_TEMPLATE_RESULT (fndecl);

This can be
  fndecl = STRIP_TEMPLATE (fndecl);
You don't need to repost the patch just because of this.

Marek



[PING] [PATCH v2] diagnostics: Fix compile error for MinGW <7.0

2024-10-07 Thread Torbjorn SVENSSON

Gentle ping :)

Kind regards,
Torbjörn

On 2024-09-28 14:49, Torbjörn SVENSSON wrote:

Ok for trunk?

Changes since v1:

- Updated the commit message to mention the actual build error.
- Switch to checking the required define rather than the version number of 
MinGW.
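
The idea of testing for the define itself rather than for a MinGW version can be sketched in isolation as below.  This is only an illustration of the pattern, not the code from the patch; the helper names are invented, and on non-Windows hosts the macro is simply absent, so the fallback paths are taken.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Report whether the headers this file was compiled against provide
   ENABLE_VIRTUAL_TERMINAL_PROCESSING.  Hypothetical helper.  */
static bool
vt_flag_available (void)
{
#ifdef ENABLE_VIRTUAL_TERMINAL_PROCESSING
  return true;
#else
  return false;
#endif
}

/* Return MODE with VT100 escape processing requested when the flag is
   known at compile time, and MODE unchanged otherwise.  */
static unsigned long
desired_console_mode (unsigned long mode)
{
#ifdef ENABLE_VIRTUAL_TERMINAL_PROCESSING
  mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
#endif
  return mode;
}
```

Checking the macro keeps the code correct even if a distribution backports the define to an older header set, which a version-number check would miss.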

--

The define ENABLE_VIRTUAL_TERMINAL_PROCESSING was introduced in MinGW 7.0.

Build failure when building with MinGW 5.0.3:

.../gcc/diagnostic-color.cc:
In function 'bool should_colorize()':
.../gcc/diagnostic-color.cc:317:41:
error: 'ENABLE_VIRTUAL_TERMINAL_PROCESSING' was not declared in this
scope
mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
  ^~
.../gcc/diagnostic-color.cc:317:41:
note: suggested alternative: 'ENABLE_RTL_FLAG_CHECKING'
mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
  ^~
  ENABLE_RTL_FLAG_CHECKING
.../gcc/diagnostic-color.cc:
In function 'bool auto_enable_urls()':
.../gcc/diagnostic-color.cc:407:50:
error: 'ENABLE_VIRTUAL_TERMINAL_PROCESSING' was not declared in this
scope
if (GetConsoleMode (handle, &mode) && !(mode & 
ENABLE_VIRTUAL_TERMINAL_PROCESSING))
   
^~
.../gcc/diagnostic-color.cc:407:50:
note: suggested alternative: 'ENABLE_RTL_FLAG_CHECKING'
if (GetConsoleMode (handle, &mode) && !(mode & 
ENABLE_VIRTUAL_TERMINAL_PROCESSING))
   
^~
   ENABLE_RTL_FLAG_CHECKING
Makefile:1195: recipe for target 'diagnostic-color.o' failed
make[1]: *** [diagnostic-color.o] Error 1

gcc/ChangeLog:

* gcc/diagnostic-color.cc: Conditionally enable terminal
processing based on define availability.
* gcc/pretty-print.cc: Likewise.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/diagnostic-color.cc | 8 +++-
  gcc/pretty-print.cc | 6 +-
  2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/diagnostic-color.cc b/gcc/diagnostic-color.cc
index 8b195d023eb..2ad708c06e6 100644
--- a/gcc/diagnostic-color.cc
+++ b/gcc/diagnostic-color.cc
@@ -311,12 +311,14 @@ should_colorize (void)
if ((handle != INVALID_HANDLE_VALUE) && (handle != NULL))
  isconsole = GetConsoleMode (handle, &mode);
  
+#ifdef ENABLE_VIRTUAL_TERMINAL_PROCESSING

if (isconsole)
  {
/* Try to enable processing of VT100 escape sequences */
mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
SetConsoleMode (handle, mode);
  }
+#endif
  
return isconsole;

  #else
@@ -404,7 +406,11 @@ auto_enable_urls ()
/* If ansi escape sequences aren't supported by the console, then URLs will
   print mangled from mingw_ansi_fputs's console API translation. It 
wouldn't
   be useful even if this weren't the case.  */
-  if (GetConsoleMode (handle, &mode) && !(mode & 
ENABLE_VIRTUAL_TERMINAL_PROCESSING))
+  if (GetConsoleMode (handle, &mode)
+#ifdef ENABLE_VIRTUAL_TERMINAL_PROCESSING
+  && !(mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING)
+#endif
+  )
  return false;
  #endif
  
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc

index 68c145e2d53..ea75442c6a4 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -679,7 +679,11 @@ mingw_ansi_fputs (const char *str, FILE *fp)
/* Don't mess up stdio functions with Windows APIs.  */
fflush (fp);
  
-  if (GetConsoleMode (h, &mode) && !(mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))

+  if (GetConsoleMode (h, &mode)
+#ifdef ENABLE_VIRTUAL_TERMINAL_PROCESSING
+  && !(mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING)
+#endif
+  )
  /* If it is a console, and doesn't support ANSI escape codes, translate
 them as needed.  */
  for (;;)




Re: [PATCH v6] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-10-07 Thread Simon Martin
Hi Jason,

On 7 Oct 2024, at 18:58, Jason Merrill wrote:

> On 10/7/24 11:27 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 17 Sep 2024, at 18:41, Jason Merrill wrote:
>>
>>> On 9/17/24 10:38 AM, Simon Martin wrote:
 Hi Jason,

 Apologies for the back and forth and thanks for your patience!
>>>
>>> No worries.
>>>
 On 5 Sep 2024, at 19:00, Jason Merrill wrote:

> On 9/5/24 7:02 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 4 Sep 2024, at 18:09, Jason Merrill wrote:
>>
>>> On 9/1/24 2:51 PM, Simon Martin wrote:
 Hi Jason,

 On 26 Aug 2024, at 19:23, Jason Merrill wrote:

> On 8/25/24 12:37 PM, Simon Martin wrote:
>> On 24 Aug 2024, at 23:59, Simon Martin wrote:
>>> On 24 Aug 2024, at 15:13, Jason Merrill wrote:
>>>
 On 8/23/24 12:44 PM, Simon Martin wrote:
> We currently emit an incorrect -Woverloaded-virtual warning upon the
> following test case
>
> === cut here ===
> struct A {
>   virtual operator int() { return 42; }
>   virtual operator char() = 0;
> };
> struct B : public A {
>   operator char() { return 'A'; }
> };
> === cut here ===
>
> The problem is that warn_hidden relies on get_basefndecls to find the
> methods in A possibly hidden by B's operator char(), and gets both the
> conversion operator to int and to char. It eventually wrongly concludes
> that the conversion to int is hidden.
>
> This patch fixes this by filtering out conversion operators to different
> types from the list returned by get_basefndecls.

 Hmm, same_signature_p already tries to handle comparing
 conversion
 operators, why isn't that working?

>>> It does indeed.
>>>
>>> However, `ovl_range (fns)` does not only contain `char B::operator()` -
>>> for which `any_override` gets true - but also `conv_op_marker` - for
>>> which `any_override` gets false, causing `seen_non_override` to get to
>>> true. Because of that, we run the last loop, that will emit a warning
>>> for all `base_fndecls` (except `char B::operator()` that has been
>>> removed).
>>>
>>> We could test `fndecl` and `base_fndecls[k]` against `conv_op_marker` in
>>> the loop, but we’d still need to inspect the “converting to” type in the
>>> last loop (for when `warn_overloaded_virtual` is 2). This would make the
>>> code much more complex than the current patch.
>
> Makes sense.
>
>>> It would however probably be better if `get_basefndecls` only returned
>>> the right conversion operator, not all of them. I’ll draft another
>>> version of the patch that does that and submit it in this thread.
>>>
>> I have explored my suggestion further and it actually ends up more
>> complicated than the initial patch.
>
> Yeah, you'd need to do lookup again for each member of fns.
>
>> Please find attached a new revision to fix the reported issue, as well
>> as new ones I discovered while testing with -Woverloaded-virtual=2.
>>
>> It’s pretty close to the initial patch, but (1) adds a missing
>> “continue;” (2) fixes a location problem when -Woverloaded-virtual==2
>> (3) adds more test cases. The commit log is also more comprehensive, and
>> should describe well the various problems and why the patch is correct.
>
>> +if (IDENTIFIER_CONV_OP_P (name)
>> +&& !same_type_p (DECL_CONV_FN_TYPE (fndecl),
>> + DECL_CONV_FN_TYPE (base_fndecls[k])))
>> +  {
>> +base_fndecls[k] = NULL_TREE;
>> +continue;
>> +  }
>
> So this removes base_fndecls[k] if it doesn't return the same type

Re: [PATCH v6] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-10-07 Thread Simon Martin
Hi Marek,

On 7 Oct 2024, at 21:44, Marek Polacek wrote:

> On Mon, Oct 07, 2024 at 07:35:27PM +, Simon Martin wrote:
>> -/* Remove any overridden functions.  */
>> -bool seen_non_override = false;
>> +/* Find all the base_fndecls that are overridden, as well as those
>> +   that are hidden, in T.  */
>>  for (tree fndecl : ovl_range (fns))
>>{
>> -bool any_override = false;
>> +if (TREE_CODE (fndecl) == TEMPLATE_DECL)
>> +  fndecl = DECL_TEMPLATE_RESULT (fndecl);
>
> This can be
>   fndecl = STRIP_TEMPLATE (fndecl);
> You don't need to repost the patch just because of this.
Indeed, thanks. Change integrated in my local branch, so that it goes in 
once I have an approved version.

Simon


Re: [PATCH v5] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-10-07 Thread Ken Matsui
On Monday, October 7th, 2024 at 4:41 PM, Marek Polacek  
wrote:

>
>
> On Sat, Jun 15, 2024 at 10:30:35PM -0700, Ken Matsui wrote:
>
> > This patch adds a warning switch for "#pragma once in main file". The
> > warning option name is Wpragma-once-outside-header, which is the same
> > as Clang provides.
>
>
> I think the patch is OK now, thanks. Other diagnostics include the '#'
> character but I know you just did what David suggested.

Thank you for your review!  It might be better to stay consistent with 
other compilers, but shall we proceed with the current change?

Just to confirm, since you are a C front end reviewer, am I now ok to push this 
patch?

>
> > PR preprocessor/89808
> >
> > gcc/c-family/ChangeLog:
> >
> > * c.opt (Wpragma_once_outside_header): Define new option.
> > * c.opt.urls: Regenerate.
> >
> > gcc/ChangeLog:
> >
> > * doc/invoke.texi (Warning Options): Document
> > -Wno-pragma-once-outside-header.
> >
> > libcpp/ChangeLog:
> >
> > * include/cpplib.h (cpp_warning_reason): Define
> > CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
> > * directives.cc (do_pragma_once): Use
> > CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
> > * g++.dg/warn/Wpragma-once-outside-header.C: New test.
> >
> > Signed-off-by: Ken Matsui kmat...@gcc.gnu.org
> > ---
> > gcc/c-family/c.opt | 4 
> > gcc/c-family/c.opt.urls | 3 +++
> > gcc/doc/invoke.texi | 10 --
> > .../g++.dg/warn/Wno-pragma-once-outside-header.C | 5 +
> > .../g++.dg/warn/Wpragma-once-outside-header.C | 6 ++
> > libcpp/directives.cc | 3 ++-
> > libcpp/include/cpplib.h | 3 ++-
> > 7 files changed, 30 insertions(+), 4 deletions(-)
> > create mode 100644 
> > gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> > create mode 100644 gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> >
> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index 403abc1f26e..3439f36fe45 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -1188,6 +1188,10 @@ Wpragmas
> > C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning
> > Warn about misuses of pragmas.
> >
> > +Wpragma-once-outside-header
> > +C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) 
> > CppReason(CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER) Init(1) Warning
> > +Warn about #pragma once outside of a header.
> > +
> > Wprio-ctor-dtor
> > C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning
> > Warn if constructor or destructors with priorities from 0 to 100 are used.
> > diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
> > index dd455d7c0dc..778ca08be2e 100644
> > --- a/gcc/c-family/c.opt.urls
> > +++ b/gcc/c-family/c.opt.urls
> > @@ -672,6 +672,9 @@ 
> > UrlSuffix(gcc/Warning-Options.html#index-Wno-pointer-to-int-cast)
> > Wpragmas
> > UrlSuffix(gcc/Warning-Options.html#index-Wno-pragmas)
> >
> > +Wpragma-once-outside-header
> > +UrlSuffix(gcc/Warning-Options.html#index-Wno-pragma-once-outside-header)
> > +
> > Wprio-ctor-dtor
> > UrlSuffix(gcc/Warning-Options.html#index-Wno-prio-ctor-dtor)
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 9456ced468a..c7f17ca9eb7 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}.
> > -Wpacked -Wno-packed-bitfield-compat -Wpacked-not-aligned -Wpadded
> > -Wparentheses -Wno-pedantic-ms-format
> > -Wpointer-arith -Wno-pointer-compare -Wno-pointer-to-int-cast
> > --Wno-pragmas -Wno-prio-ctor-dtor -Wredundant-decls
> > --Wrestrict -Wno-return-local-addr -Wreturn-type
> > +-Wno-pragmas -Wno-pragma-once-outside-header -Wno-prio-ctor-dtor
> > +-Wredundant-decls -Wrestrict -Wno-return-local-addr -Wreturn-type
> > -Wno-scalar-storage-order -Wsequence-point
> > -Wshadow -Wshadow=global -Wshadow=local -Wshadow=compatible-local
> > -Wno-shadow-ivar
> > @@ -7983,6 +7983,12 @@ Do not warn about misuses of pragmas, such as 
> > incorrect parameters,
> > invalid syntax, or conflicts between pragmas. See also
> > @option{-Wunknown-pragmas}.
> >
> > +@opindex Wno-pragma-once-outside-header
> > +@opindex Wpragma-once-outside-header
> > +@item -Wno-pragma-once-outside-header
> > +Do not warn when @code{#pragma once} is used in a file that is not a header
> > +file, such as a main file.
> > +
> > @opindex Wno-prio-ctor-dtor
> > @opindex Wprio-ctor-dtor
> > @item -Wno-prio-ctor-dtor
> > diff --git a/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C 
> > b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> > new file mode 100644
> > index 000..b5be4d25a9d
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> > @@ -0,0 +1,5 @@
> > +// { dg-do assemble }
> > +// { dg-options "-Wno-pragma-once-outside-header" }
> > +
> > +#pragma once
> > +int main() {}
> > diff --git a/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C 
> > b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> 

Re: [PATCH v5] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-10-07 Thread Marek Polacek
On Sat, Jun 15, 2024 at 10:30:35PM -0700, Ken Matsui wrote:
> This patch adds a warning switch for "#pragma once in main file".  The
> warning option name is Wpragma-once-outside-header, which is the same
> as Clang provides.

I think the patch is OK now, thanks.  Other diagnostics include the '#'
character but I know you just did what David suggested.
 
>   PR preprocessor/89808
> 
> gcc/c-family/ChangeLog:
> 
>   * c.opt (Wpragma_once_outside_header): Define new option.
>   * c.opt.urls: Regenerate.
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi (Warning Options): Document
>   -Wno-pragma-once-outside-header.
> 
> libcpp/ChangeLog:
> 
>   * include/cpplib.h (cpp_warning_reason): Define
>   CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
>   * directives.cc (do_pragma_once): Use
>   CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
>   * g++.dg/warn/Wpragma-once-outside-header.C: New test.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/c-family/c.opt |  4 
>  gcc/c-family/c.opt.urls|  3 +++
>  gcc/doc/invoke.texi| 10 --
>  .../g++.dg/warn/Wno-pragma-once-outside-header.C   |  5 +
>  .../g++.dg/warn/Wpragma-once-outside-header.C  |  6 ++
>  libcpp/directives.cc   |  3 ++-
>  libcpp/include/cpplib.h|  3 ++-
>  7 files changed, 30 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> 
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 403abc1f26e..3439f36fe45 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1188,6 +1188,10 @@ Wpragmas
>  C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning
>  Warn about misuses of pragmas.
>  
> +Wpragma-once-outside-header
> +C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) 
> CppReason(CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER) Init(1) Warning
> +Warn about #pragma once outside of a header.
> +
>  Wprio-ctor-dtor
>  C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning
>  Warn if constructor or destructors with priorities from 0 to 100 are used.
> diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
> index dd455d7c0dc..778ca08be2e 100644
> --- a/gcc/c-family/c.opt.urls
> +++ b/gcc/c-family/c.opt.urls
> @@ -672,6 +672,9 @@ 
> UrlSuffix(gcc/Warning-Options.html#index-Wno-pointer-to-int-cast)
>  Wpragmas
>  UrlSuffix(gcc/Warning-Options.html#index-Wno-pragmas)
>  
> +Wpragma-once-outside-header
> +UrlSuffix(gcc/Warning-Options.html#index-Wno-pragma-once-outside-header)
> +
>  Wprio-ctor-dtor
>  UrlSuffix(gcc/Warning-Options.html#index-Wno-prio-ctor-dtor)
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 9456ced468a..c7f17ca9eb7 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}.
>  -Wpacked  -Wno-packed-bitfield-compat  -Wpacked-not-aligned  -Wpadded
>  -Wparentheses  -Wno-pedantic-ms-format
>  -Wpointer-arith  -Wno-pointer-compare  -Wno-pointer-to-int-cast
> --Wno-pragmas  -Wno-prio-ctor-dtor  -Wredundant-decls
> --Wrestrict  -Wno-return-local-addr  -Wreturn-type
> +-Wno-pragmas  -Wno-pragma-once-outside-header  -Wno-prio-ctor-dtor
> +-Wredundant-decls  -Wrestrict  -Wno-return-local-addr  -Wreturn-type
>  -Wno-scalar-storage-order  -Wsequence-point
>  -Wshadow  -Wshadow=global  -Wshadow=local  -Wshadow=compatible-local
>  -Wno-shadow-ivar
> @@ -7983,6 +7983,12 @@ Do not warn about misuses of pragmas, such as 
> incorrect parameters,
>  invalid syntax, or conflicts between pragmas.  See also
>  @option{-Wunknown-pragmas}.
>  
> +@opindex Wno-pragma-once-outside-header
> +@opindex Wpragma-once-outside-header
> +@item -Wno-pragma-once-outside-header
> +Do not warn when @code{#pragma once} is used in a file that is not a header
> +file, such as a main file.
> +
>  @opindex Wno-prio-ctor-dtor
>  @opindex Wprio-ctor-dtor
>  @item -Wno-prio-ctor-dtor
> diff --git a/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C 
> b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> new file mode 100644
> index 000..b5be4d25a9d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> @@ -0,0 +1,5 @@
> +// { dg-do assemble  }
> +// { dg-options "-Wno-pragma-once-outside-header" }
> +
> +#pragma once
> +int main() {}
> diff --git a/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C 
> b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> new file mode 100644
> index 000..29f09b69f71
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> @@ -0,0 +1,6 @@
> +// { dg-do assemble  }
> +// { dg-options "-Werror=pragma-once-outside-header" 

RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-10-07 Thread Prathamesh Kulkarni


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: Tuesday, October 1, 2024 8:26 PM
> To: Richard Sandiford 
> Cc: rguent...@suse.de; Thomas Schwinge ; gcc-
> patc...@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
> 
> External email: Use caution opening links or attachments
> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, September 24, 2024 12:29 PM
> > To: Prathamesh Kulkarni 
> > Cc: Richard Sandiford ; Thomas Schwinge
> > ; gcc-patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, 24 Sep 2024, Prathamesh Kulkarni wrote:
> >
> > >
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Monday, September 9, 2024 7:24 PM
> > > > To: Prathamesh Kulkarni 
> > > > Cc: Richard Sandiford ; Thomas
> Schwinge
> > > > ; gcc-patches@gcc.gnu.org
> > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming
> in
> > > > for accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> > > >
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Prathamesh Kulkarni 
> > > > > > Sent: Thursday, August 22, 2024 7:41 PM
> > > > > > To: Richard Biener 
> > > > > > Cc: Richard Sandiford ; Thomas
> > > > > > Schwinge ; gcc-patches@gcc.gnu.org
> > > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> > streaming
> > > > > > in for accelerator
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Richard Biener 
> > > > > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > > > > To: Prathamesh Kulkarni 
> > > > > > > Cc: Richard Sandiford ; Thomas
> > > > > > > Schwinge ; gcc-patches@gcc.gnu.org
> > > > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > streaming in
> > > > > > for
> > > > > > > accelerator
> > > > > > >
> > > > > > > External email: Use caution opening links or attachments
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -Original Message-
> > > > > > > > > From: Richard Biener 
> > > > > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > > > > To: Richard Sandiford 
> > > > > > > > > Cc: Prathamesh Kulkarni ;
> Thomas
> > > > > > Schwinge
> > > > > > > > > ; gcc-patches@gcc.gnu.org
> > > > > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > > > streaming
> > > > > > in
> > > > > > > > > for accelerator
> > > > > > > > >
> > > > > > > > > External email: Use caution opening links or
> attachments
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > > > > :
> > > > > > > > > >
> > > > > > > > > > Prathamesh Kulkarni  writes:
> > > > > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file
> > COPYING3.
> > > > > > > > > >> If
> > > > > > > not
> > > > > > > > > see
> > > > > > > > > >> #include "debug.h"
> > > > > > > > > >> #include "alloc-pool.h"
> > > > > > > > > >> #include "toplev.h"
> > > > > > > > > >> +#include "stor-layout.h"
> > > > > > > > > >>
> > > > > > > > > >> /* Allocator used to hold string slot entries for
> > line
> > > > > > > > > >> map
> > > > > > > > > streaming.
> > > > > > > > > >> */ static struct object_allocator string_slot>
> > > > > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > > > > lto_read_tree_1
> > > > > > > > > (class lto_input_block *ib, class data_in *data_in,
> tree
> > > > > > > > > expr)
> > > > > > > > > >> with -g1, see for example PR113488.  */
> > > > > > > > > >>   else if (DECL_P (expr) &&
> DECL_ABSTRACT_ORIGIN
> > > > > > > > > >> (expr)
> > > > > > ==
> > > > > > > > > expr)
> > > > > > > > > >>DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > > > > >> +
> > > > > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > > > > >> +  /* For decl with aggregate type, host
> streams
> > > > > > > > > >> +out
> > > > > > > VOIDmode.
> > > > > > > > > >> + Compute the correct DECL_MODE by calling
> > > > relayout_decl.
> > > > > > > */
> > > > > > > > > >> +  if ((VAR_P (expr)
> > > > > > > > > >> +   || TREE_CODE (expr) == PARM_DECL
> > > > > > > > > >> +   || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > > > >> +  && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > > > > >> +  && DECL_MODE (expr) == VOIDmod

C++ ping: [PATCH 0/2] Support for coroutine frames with new-extended alignment

2024-10-07 Thread Arsen Arsenović
Arsen Arsenović  writes:

> This patch series implements support for coroutines whose frames require
> alignment.
>
> The standard currently does not specify much about this case AFAICT, so
> we can do this for now (until P2014 progresses).
>
> The new dump was useful for testing, and might be useful to coroutine
> hackers.
>
> This patchset also depends on Iain's ramp rework patch, so will be pushed
> after.

Gentle ping on this patch series.

It seems I originally forgot to note this, but this patch series also
introduces a coroutine ABI break.  What is the proper mechanism for
dealing with that (given the factors of it being an experimental feature
that is being used in production)?  Specifically, the "public" bits of
the frame layout and procedure for resuming a suspended coroutine have
changed.

There's also the concern of compatibility with clang, which is yet to
implement this AFAIK.
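
For readers unfamiliar with the term, here is a minimal sketch of what "new-extended alignment" means at the language level. It is not taken from the patch series and does not show how the coroutine frame is laid out or allocated; the names OverAligned, has_new_extended_alignment and aligned_allocation_is_honoured are hypothetical, and C++17 or later is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical type whose alignment exceeds what plain operator new
// guarantees (__STDCPP_DEFAULT_NEW_ALIGNMENT__, available since C++17).
struct alignas(2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__) OverAligned {
    char bytes[2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__];
};

// An alignment is "new-extended" when it is larger than the default
// alignment guaranteed by the non-aligned allocation functions.
constexpr bool has_new_extended_alignment(std::size_t alignment) {
    return alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__;
}

// Storage for such a type has to come from the align_val_t overloads;
// this checks that the returned pointer really honours the request.
bool aligned_allocation_is_honoured() {
    const std::align_val_t al{alignof(OverAligned)};
    void *p = ::operator new(sizeof(OverAligned), al);
    const bool ok =
        reinterpret_cast<std::uintptr_t>(p) % alignof(OverAligned) == 0;
    ::operator delete(p, al);
    return ok;
}
```

A coroutine frame that contains an object like OverAligned would need storage with the same guarantee, which is the case the series is concerned with.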
-- 
Arsen Arsenović




[committed] Move gfortran.dg/gomp/allocate-static.f90 to libgomp.fortran/ (was: [r15-4104 Regression] FAIL: gfortran.dg/gomp/allocate-static.f90 -Os (test for excess errors) on Linux/x86_64)

2024-10-07 Thread Tobias Burnus

Committed as r15-4127-gb95ad25f9c9376

Hi Thomas,

Thomas Schwinge wrote:

On 2024-10-07T17:07:05+0200, Tobias Burnus  wrote:

If anyone can reproduce this, I would be interested in the excess errors.

 gfortran: fatal error: cannot read spec file 'libgomp.spec': No such file 
or directory


Aha. Thanks! — I am in principle aware of it, but tend to forget it from 
time to time, especially when only later turning a compile-time test 
into a runtime one …


In principle, a compile-time-only check would be enough (as with the C 
testcase) if the alignment were visible in the dump.


(It is, kind of, with C/C++, using alignof; but for Fortran it isn't. On 
the other hand, checking it at runtime ensures that it really works.)
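
The distinction between the two kinds of check can be sketched in C++ as follows. This is an illustration only, not the committed testcase: the type Padded, the object g_obj and the helper names are hypothetical, and a plain alignas stands in for whatever alignment the OpenMP allocate directive would request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical over-aligned type and static object.
struct alignas(64) Padded {
    int value;
};

static Padded g_obj{42};

// Compile-time check: needs only the compiler, nothing is executed.
static_assert(alignof(Padded) == 64, "alignment is known at compile time");

// Run-time check: tests the address the object actually ended up at.
bool is_aligned(const void *p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

void check_static_alignment() {
    assert(is_aligned(&g_obj, alignof(Padded)));
}
```

The static_assert corresponds to a compile-only test; check_static_alignment has to be linked and run, which is why a test of that kind needs the runtime library to be available.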



I already was about to 'git mv' the file into
'libgomp/testsuite/libgomp.fortran/' -- but then realized that we
probably also should get rid of this local 'module omp_lib_kinds':


Yes, if 'omp_lib_kinds.mod' is available, there is no point to define it 
locally.


Tobias
commit b95ad25f9c9376575dcde4bcb529d3ca31b27359
Author: Tobias Burnus 
Date:   Mon Oct 7 23:57:42 2024 +0200

Move gfortran.dg/gomp/allocate-static.f90 to libgomp.fortran/

The testcase was turned into a 'dg-do run' check to check for the alignment,
but this only works in testsuite/gfortran.dg, causing link errors for
out-of-tree testing. The test was added in r15-4104-ga8caeaacf499d5.

gcc/testsuite/:

* gfortran.dg/gomp/allocate-static.f90: Move to libgomp/testsuite/.

libgomp/:

* testsuite/libgomp.fortran/allocate-static.f90: Moved from
gcc/testsuite/ as it is a dg-do run test; use real omp_lib_kinds
instead of local definition
---
 .../testsuite/libgomp.fortran}/allocate-static.f90 | 28 --
 1 file changed, 28 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-static.f90 b/libgomp/testsuite/libgomp.fortran/allocate-static.f90
similarity index 50%
rename from gcc/testsuite/gfortran.dg/gomp/allocate-static.f90
rename to libgomp/testsuite/libgomp.fortran/allocate-static.f90
index e43dae5793f..2789e39e19b 100644
--- a/gcc/testsuite/gfortran.dg/gomp/allocate-static.f90
+++ b/libgomp/testsuite/libgomp.fortran/allocate-static.f90
@@ -1,31 +1,3 @@
-! { dg-do run }
-
-module omp_lib_kinds
-  use iso_c_binding, only: c_int, c_intptr_t
-  implicit none
-  private :: c_int, c_intptr_t
-  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
-
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_null_allocator = 0
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_default_mem_alloc = 1
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_large_cap_mem_alloc = 2
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_const_mem_alloc = 3
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_high_bw_mem_alloc = 4
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_low_lat_mem_alloc = 5
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_cgroup_mem_alloc = 6
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_pteam_mem_alloc = 7
-  integer (kind=omp_allocator_handle_kind), &
- parameter :: omp_thread_mem_alloc = 8
-end module
-
 module m
   use iso_c_binding, only: c_intptr_t
   use omp_lib_kinds, only: omp_default_mem_alloc

