date:20250127

Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-01-27 Thread Torbjorn SVENSSON


Hi Christophe,

On 2025-01-27 13:07, Christophe Lyon wrote:

Hi Torbjorn,

On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON
 wrote:


Another ping... :)

Kind regards,
Torbjörn

On 2024-12-18 18:35, Torbjorn SVENSSON wrote:

Gentle ping :)

Kind regards,
Torbjörn

On 2024-11-14 17:16, Torbjorn SVENSSON wrote:



On 2024-11-14 16:32, Christophe Lyon wrote:

On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON
 wrote:


Ok for trunk and releases/gcc-14?

--

A long time ago, this test forced -march=armv6.

With -marm, the generated assembler is:
foo:
  sub r0, r0, #48
  cmp r0, #9
  movhi   r0, #0
  movls   r0, #1
  bx  lr

With -mthumb, the generated assembler is:
foo:
  subsr0, r0, #48
  movsr2, #9
  uxtbr3, r0
  movsr0, #0
  cmp r2, r3
  adcsr0, r0, r0
  uxtbr0, r0
  bx  lr

Require effective-target arm_arm_ok to skip the test for thumb-only
targets (Cortex-M).

gcc/testsuite/ChangeLog:

  * gcc.target/arm/unsigned-extend-1.c: Use effective-target
  arm_arm_ok.

Signed-off-by: Torbjörn SVENSSON 
---
   gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/
testsuite/gcc.target/arm/unsigned-extend-1.c
index 3b4ab048fb0..73f2e1a556d 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
@@ -1,4 +1,5 @@
   /* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
   /* { dg-options "-O2" } */



So I'm guessing arm_arm_ok fails when you are testing for M-profile
targets?
What happens when you test for A-profile, providing -mthumb as part of
your runtest flags?





Looking at our CI results, without your patch the test passes on:
cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode,
and it fails only on cortex-m0 and m23.
Does this match your observations?


Without my patch, I see failure on M0, M0+ and M23.
With my patch, it goes to unsupported on all Cortex-M targets that I test.


So your patch will make the test unsupported on m33, m3, m55 and m7
while it currently passes?

Am I missing something?


I think you might have missed my comment down



Thanks,

Christophe



For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft):

Testing arm/unsigned-extend-1.c
check_compile tool: gcc for arm_arm_ok
doing compile
Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21114.c  -
mthumb -mcpu=cortex-m0 -mfloat-abi=soft   -fdiagnostics-plain-output
-marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s
(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c -
mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -
marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s
pid is 21254 -21254
cc1: error: target CPU does not support ARM mode
pid is -1
close result is 21254 exp6 0 1
output is cc1: error: target CPU does not support ARM mode
   status 1
compiler exited with status 1
UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c


For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft):

Testing arm/unsigned-extend-1.c
check_compile tool: gcc for arm_arm_ok
doing compile
Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21296.c  -
mthumb -mcpu=cortex-a7 -mfloat-abi=soft   -fdiagnostics-plain-output
-marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s
(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c -
mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output -
marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s
pid is 21434 -21434
pid is -1
output is  status 0
doing compile
Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/
gcc.target/arm/unsigned-extend-1.c  -mthumb -mcpu=cortex-a7 -mfloat-
abi=soft   -fdiagnostics-plain-output  -O2 -ffat-lto-objects -fno-
ident -S -o unsigned-extend-1.s(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc .../gcc/testsuite/
gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat-
abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno-ident -
S -o unsigned-extend-1.s
pid is 21438 -21438
pid is -1
output is  status 0
check_compile tool: gcc for exceptions_enabled
doing compile
Executing on host: .../bin/arm-none-eabi-gcc
exceptions_enabled21296.cc   -mthumb -mcpu=cortex-a7 -mfloat-
abi=soft   -fdiagnostics-plain-output -Wno-complain-wrong-lang -S
-o exceptions_enabled21296.s(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc
exceptions_enabled21296.cc -mthumb -mcpu=cortex-a7 -mfloat-abi=soft -
fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o
exceptions_enabled21296.s
pid is 21442 -21442
pid is -1
output is  status 0
PASS: gcc.target/arm/unsigned-extend-1.c (test for excess errors)
PASS: gcc.target/arm/unsigned-extend-1.c scan-assembler-not uxtb

For Cortex-A7 (cpu=cortex-a7/float

[PATCH][v2] rtl-optimization/118662 - wrong combination of vector sign-extends

2025-01-27 Thread Richard Biener

The following fixes an issue in the RTL combiner where we correctly
combine two vector sign-extends with a vector load

Trying 7, 9 -> 10:
7: r106:V4QI=[r119:DI]
  REG_DEAD r119:DI
9: r108:V4HI=sign_extend(vec_select(r106:V4QI#0,parallel))
   10: r109:V4SI=sign_extend(vec_select(r108:V4HI#0,parallel))
  REG_DEAD r108:V4HI

to

modifying insn i2 9: r109:V4SI=sign_extend([r119:DI])

but since r106 is used we wrongly materialize it using a subreg:

modifying insn i310: r106:V4QI=r109:V4SI#0

which of course does not work for modes with more than one component,
specifically vector and complex modes.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR rtl-optimization/118662
* combine.cc (try_combine): When re-materializing a load
from an extended reg by a lowpart subreg make sure we're
not dealing with vector or complex modes.

* gcc.dg/torture/pr118662.c: New testcase.
---
 gcc/combine.cc  |  5 +
 gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a2d4387cebe..b0159b23d86 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
  copy.  This saves at least one insn, more if register allocation can
  eliminate the copy.
 
+ We cannot do this if the involved modes have more than one elements,
+ like for vector or complex modes.
+
  We cannot do this if the destination of the first assignment is a
  condition code register.  We eliminate this case by making sure
  the SET_DEST and SET_SRC have the same mode.
@@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND
   && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))
   == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0
+  && ! VECTOR_MODE_P (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0
+  && ! COMPLEX_MODE_P (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0
   && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
   && rtx_equal_p (SET_SRC (XVECEXP (newpat, 0, 1)),
   XEXP (SET_SRC (XVECEXP (newpat, 0, 0)), 0))
diff --git a/gcc/testsuite/gcc.dg/torture/pr118662.c 
b/gcc/testsuite/gcc.dg/torture/pr118662.c
new file mode 100644
index 000..b9e8cca0aeb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118662.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */
+/* { dg-additional-options "-msse4" { target sse4_runtime} } */
+
+int __attribute__((noipa)) addup(signed char *num) {
+  int val = num[0] + num[1] + num[2] + num[3];
+  if (num[3] >= 0)
+val++;
+  return val;
+}
+
+int main(int, char *[])
+{
+  signed char num[4] = {1, 1, 1, -1};
+  if (addup(num) != 2)
+__builtin_abort();
+  return 0;
+}
-- 
2.43.0

[PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]

2025-01-27 Thread Simon Martin

Hi Jason,

On 24 Jan 2025, at 16:51, Jason Merrill wrote:

> On 1/24/25 6:19 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 23 Jan 2025, at 22:56, Jason Merrill wrote:
>>
>>> On 1/23/25 12:06 PM, Simon Martin wrote:
 Hi Marek,

 On 23 Jan 2025, at 16:45, Marek Polacek wrote:

> On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote:
>> Hi Jason,
>>
>> On 20 Jan 2025, at 22:49, Jason Merrill wrote:
>>
>>> On 1/20/25 2:11 PM, Simon Martin wrote:
 Hi Jason,

 On 15 Jan 2025, at 22:42, Jason Merrill wrote:

> On 12/12/24 2:07 PM, Simon Martin wrote:
>> We currently ICE upon the following valid (under -Wno-vla) 

>> code
>>
>>
>> === cut here ===
>> void f(int c) {
>>   constexpr int r = 4;
>>   [&](auto) { int t[r * c]; }(0);
>> }
>> === cut here ===
>>
>> The problem is that when parsing the lambda body, and more
>> specifically
>> the multiplication, we mark the lambda as
>> LAMBDA_EXPR_CAPTURE_OPTIMIZED
>> even though the replacement of r by 4 is "undone" by the call

>> to
>> build_min_non_dep in build_x_binary_op. This makes
>> prune_lambda_captures
>> remove the proxy declaration while it should not, and we trip

>> on
>> an
>>
>> assert at instantiation time.
>
> Why does prune_lambda_captures remove the declaration if it's
> still
> used in the function body?  Setting
> LAMBDA_EXPR_CAPTURE_OPTIMIZED
> just
> means "we might have optimized away some captures", the tree
> walk
> should have found the use put back by build_x_binary_op.
 I think that this is due to a bug in mark_const_cap_r, that’s
 been
 here since the beginning, i.e. r8-7213-g1577f10a637352: it
 decides
>>
 NOT
 to walk sub-trees when encountering a DECL_EXPR for a constant
 capture
 proxy (at lambda.cc:1769). I don’t understand why we 

 wouldn’t
 want
 to continue.
>>>
>>> Because that's the declaration of the capture proxy, not a use 
>>> of
>>> it.
>> Indeed, thanks for clarifying.
>>
>>> Why aren't we finding the use in the declaration of t?
>> After further investigation, the problem is rather that neither
>> walk_tree_1 nor cp_walk_subtrees walk the dimensions of array
>> types,
>> so
>> we miss the uses.
>>
 Removing that line fixes the PR, but breaks 3 existing tests
 (lambda-capture1.C, lambda-const11.C and lambda-const11a.C, 
 that
 all
 assert that we optimise out the capture); that’s why I did 
 not
 pursue
 it in the first place.
>>>
>>> Right.
>>>
 But taking another look, it might not be that big a deal that 
 we
 don’t
 optimise those out: as soon as we use -O1 or above, the
 assignment
 to
>>
 the closure field actually disappears.
>>>
>>> Completely breaking this optimization is a big deal, 
>>> particularly
>>> since it affects the layout of closure objects.  We can't always

>>> optimize everything away.
>> ACK.
>>
>> The attached updated patch makes cp_walk_subtrees walk array type

>> dimensions, which fixes the initial PR without those 3 
>> regressions.
>>
>>
>> Successfully tested on x86_64-pc-linux-gnu. Is it OK?
>>
>> Simon
>>
>> PS: In theory I think it’d make sense to do do this change in
>>
>> walk_tree_1 since C also supports VLAs, but doing so breaks some
>>
>> OMP
>> tests. OMP can do interesting things with array bounds (see
>> r9-5354-gda972c05f48637), and fixing those would require teaching

>> walk_tree_1 about the “omp dummy var” array bounds, which I
>> think
>> would be a bit ugly. And I’m not aware of any C case that would
>> be
>> improved/fixed by doing this change, so we’re probably fine not
>> doing
>> it.
>
>>   From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17 
>> 00:00:00
>> 2001
>> From: Simon Martin 
>> Date: Wed, 22 Jan 2025 16:19:47 +0100
>> Subject: [PATCH] c++: Don't prune constant capture proxies only
>> used
>> in array
>>dimensions [PR114292]
>>
>> We currently ICE upon the following valid (under -Wno-vla) code
>>
>> === cut here ===
>> void f(int c) {
>> constexpr int r = 4;
>> [&](auto) { int t[r * c]; }(0);
>> }
>> === cut here ===
>>
>> When parsing the lambda body, and more specifically the
>> multiplication,
>> we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which
>> indicates
>> to
>> prune_lambda_captures that it might be po

Re: [PATCH] asan: Fix missing FakeStack flag cleanup

2025-01-27 Thread Ilya Leoshkevich

On Fri, 2025-01-24 at 17:25 +0100, Jakub Jelinek wrote:
> On Thu, Jan 09, 2025 at 01:15:30AM +0100, Ilya Leoshkevich wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux.  Ok for master?
> > 
> > 
> > 
> > The FakeStack flag is not zeroed out when can_store_by_pieces()
> > returns false.  Over time, this causes FakeStack::Allocate() to
> > perform
> > the maximum number of loop iterations, significantly slowing down
> > the
> > instrumented program.
> 
> Took me a while to construct a testcase where it makes a difference,
> but e.g.
> void foo (int *, int *, int *, int *, int *, int *);
> 
> int
> bar (void)
> {
>   int a[3], b[26];
>   foo (a, b, 0, 0, 0, 0);
>   return 0;
> }
> 
> int
> baz (void)
> {
>   int a[3], b[26], c[371], d[12], e[257], f[5];
>   foo (a, b, c, d, e, f);
>   return 0;
> }
> shows it on s390x with -O2 -fsanitize=address on bar but not baz (on
> x86_64
> not on either).
> 
> > gcc/ChangeLog:
> > 
> >  * asan.cc (asan_emit_stack_protection): Always zero the flag
> >  unless it is cleared by the __asan_stack_free_N() libcall.
> > 
> > Signed-off-by: Ilya Leoshkevich 
> > ---
> >  gcc/asan.cc | 30 ++
> >  1 file changed, 18 insertions(+), 12 deletions(-)

[...]

> Ok for trunk with that nit fixed.
> 
>  Jakub

Thank you for the review!
I have fixed the style and committed this.

Would it be okay to backport this to gcc-13 and gcc-14?
Bootstrap and regtest pass on x86_64-redhat-linux.

Re: [PATCH] libstdc++: correct symbol version of typeinfo for bfloat16_t on RISC-V

2025-01-27 Thread Jonathan Wakely

On Mon, 27 Jan 2025 at 10:55, Andreas Schwab  wrote:
>
> RISC-V only gained support for bfloat16_t after gcc 14.  Passes
> libstdc++/check_abi on {x86_64,aarch64,ppc64le,riscv64,s390x}-suse-linux.

OK, thanks.

>
> PR libstdc++/118563
> * testsuite/util/testsuite_abi.cc (check_version): Add
> CXXABI_1.3.16.
> * config/abi/pre/gnu.ver (CXXABI_1.3.14) [__riscv]: Exclude
> typeinfo for bfloat16_t.
> (CXXABI_1.3.16) [__riscv]: Add it here.
> ---
>  libstdc++-v3/config/abi/pre/gnu.ver  | 23 
>  libstdc++-v3/testsuite/util/testsuite_abi.cc |  1 +
>  2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
> b/libstdc++-v3/config/abi/pre/gnu.ver
> index f7641974ec4..84ce874fe03 100644
> --- a/libstdc++-v3/config/abi/pre/gnu.ver
> +++ b/libstdc++-v3/config/abi/pre/gnu.ver
> @@ -2852,10 +2852,15 @@ CXXABI_1.3.13 {
>  CXXABI_1.3.14 {
>
>  # typeinfo for _Float{16,32,64,128,32x,64x,128x} and
> -# __bf16
> -_ZTIDF[0-9]*[_bx];
> -_ZTIPDF[0-9]*[_bx];
> -_ZTIPKDF[0-9]*[_bx];
> +# __bf16/bfloat16_t
> +_ZTIDF[0-9]*[_x];
> +_ZTIPDF[0-9]*[_x];
> +_ZTIPKDF[0-9]*[_x];
> +#ifndef __riscv
> +_ZTIDF16b;
> +_ZTIPDF16b;
> +_ZTIPKDF16b;
> +#endif
>  _ZTIu6__bf16;
>  _ZTIPu6__bf16;
>  _ZTIPKu6__bf16;
> @@ -2869,6 +2874,16 @@ CXXABI_1.3.15 {
>
>  } CXXABI_1.3.14;
>
> +CXXABI_1.3.16 {
> +
> +#ifdef __riscv
> +_ZTIDF16b;
> +_ZTIPDF16b;
> +_ZTIPKDF16b;
> +#endif
> +
> +} CXXABI_1.3.15;
> +
>  # Symbols in the support library (libsupc++) supporting transactional memory.
>  CXXABI_TM_1 {
>
> diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc 
> b/libstdc++-v3/testsuite/util/testsuite_abi.cc
> index dcf5b1a4e0b..0d6080fb92c 100644
> --- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
> +++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
> @@ -237,6 +237,7 @@ check_version(symbol& test, bool added)
>known_versions.push_back("CXXABI_1.3.13");
>known_versions.push_back("CXXABI_1.3.14");
>known_versions.push_back("CXXABI_1.3.15");
> +  known_versions.push_back("CXXABI_1.3.16");
>known_versions.push_back("CXXABI_IEEE128_1.3.13");
>known_versions.push_back("CXXABI_TM_1");
>known_versions.push_back("CXXABI_FLOAT128");
> --
> 2.48.1
>
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
>

Re: [PATCH] asan: Fix missing FakeStack flag cleanup

2025-01-27 Thread Jakub Jelinek

On Mon, Jan 27, 2025 at 02:02:14PM +0100, Ilya Leoshkevich wrote:
> Thank you for the review!
> I have fixed the style and committed this.
> 
> Would it be okay to backport this to gcc-13 and gcc-14?
> Bootstrap and regtest pass on x86_64-redhat-linux.

Ok.

Jakub

Re: [PATCH] c++, v3: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]

2025-01-27 Thread Jason Merrill


On 1/26/25 2:25 PM, Jakub Jelinek wrote:

On Sat, Jan 25, 2025 at 10:53:50AM -0500, Jason Merrill wrote:

On 1/25/25 4:12 AM, Jakub Jelinek wrote:

On Fri, Jan 24, 2025 at 07:07:15PM -0500, Jason Merrill wrote:

Hypothetically, but those cases are just either error or DECL_EXTERNAL. In
the error case we're failing anyway; in the external case all the
base/nonbase for a particular structured binding declaration should be
consistent.


So shall I just remove all the prune_vars_needing_no_initialization hunks
then or add gcc_checking_assert (!STATIC_INIT_DECOMP_BASE_P (t) &&
!STATIC_INIT_DECOMP_NONBASE_P (t)); for the DECL_EXTERNAL punt case?


Just the assert sounds good.


Note, unfortunately it is hard to come up with a testcase that actually
prunes something on purpose...


Indeed, it shouldn't be possible.


So like this?


OK.


Passed x86_64-linux and i686-linux bootstrap/regtest.

2025-01-26  Jakub Jelinek  

PR c++/115769
gcc/cp/
* cp-tree.h (STATIC_INIT_DECOMP_BASE_P): Define.
(STATIC_INIT_DECOMP_NONBASE_P): Define.
* decl.cc (cp_finish_decl): Mark nodes in {static,tls}_aggregates
with
* decl2.cc (decomp_handle_one_var, decomp_finalize_var_list): New
functions.
(emit_partial_init_fini_fn): Use them.
(prune_vars_needing_no_initialization): Assert
STATIC_INIT_DECOMP_*BASE_P is not set on DECL_EXTERNAL vars to be
pruned out.
(partition_vars_for_init_fini): Use same priority for
consecutive STATIC_INIT_DECOMP_*BASE_P vars and propagate
those flags to new TREE_LISTs when possible.  Formatting fix.
(handle_tls_init): Use decomp_handle_one_var and
decomp_finalize_var_list functions.
gcc/testsuite/
* g++.dg/DRs/dr2867-5.C: New test.
* g++.dg/DRs/dr2867-6.C: New test.
* g++.dg/DRs/dr2867-7.C: New test.
* g++.dg/DRs/dr2867-8.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-09-07 09:31:20.601484156 +0200
+++ gcc/cp/cp-tree.h2024-09-09 15:53:44.924112247 +0200
@@ -470,6 +470,7 @@ extern GTY(()) tree cp_global_trees[CPTI
BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P (in BASELINK)
BIND_EXPR_VEC_DTOR (in BIND_EXPR)
ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (in ATOMIC_CONSTR)
+  STATIC_INIT_DECOMP_BASE_P (in the TREE_LIST for {static,tls}_aggregates)
 2: IDENTIFIER_KIND_BIT_2 (in IDENTIFIER_NODE)
ICS_THIS_FLAG (in _CONV)
DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (in VAR_DECL)
@@ -489,6 +490,8 @@ extern GTY(()) tree cp_global_trees[CPTI
IMPLICIT_CONV_EXPR_BRACED_INIT (in IMPLICIT_CONV_EXPR)
PACK_EXPANSION_AUTO_P (in *_PACK_EXPANSION)
contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
+  STATIC_INIT_DECOMP_NONBASE_P (in the TREE_LIST
+   for {static,tls}_aggregates)
 3: IMPLICIT_RVALUE_P (in NON_LVALUE_EXPR or STATIC_CAST_EXPR)
ICS_BAD_FLAG (in _CONV)
FN_TRY_BLOCK_P (in TRY_BLOCK)
@@ -5947,6 +5950,21 @@ extern bool defer_mangling_aliases;
  
  extern bool flag_noexcept_type;
  
+/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic

+   initialization of namespace scope structured binding base or related
+   extended ref init temps.  Temporaries from the initialization of
+   STATIC_INIT_DECOMP_BASE_P dynamic initializers should be destroyed only
+   after the last STATIC_INIT_DECOMP_NONBASE_P dynamic initializer following
+   it.  */
+#define STATIC_INIT_DECOMP_BASE_P(NODE) \
+  TREE_LANG_FLAG_1 (TREE_LIST_CHECK (NODE))
+
+/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic
+   initialization of namespace scope structured binding non-base
+   variable using get.  */
+#define STATIC_INIT_DECOMP_NONBASE_P(NODE) \
+  TREE_LANG_FLAG_2 (TREE_LIST_CHECK (NODE))
+
  /* A list of namespace-scope objects which have constructors or
 destructors which reside in the global scope.  The decl is stored
 in the TREE_VALUE slot and the initializer is stored in the
--- gcc/cp/decl.cc.jj   2024-09-09 11:50:07.146394047 +0200
+++ gcc/cp/decl.cc  2024-09-09 17:16:26.459094150 +0200
@@ -8485,6 +8485,7 @@ cp_finish_decl (tree decl, tree init, bo
bool var_definition_p = false;
tree auto_node;
auto_vec extra_cleanups;
+  tree aggregates1 = NULL_TREE;
struct decomp_cleanup {
  tree decl;
  cp_decomp *&decomp;
@@ -8872,7 +8873,16 @@ cp_finish_decl (tree decl, tree init, bo
}
  
if (decomp)

-   cp_maybe_mangle_decomp (decl, decomp);
+   {
+ cp_maybe_mangle_decomp (decl, decomp);
+ if (TREE_STATIC (decl) && !DECL_FUNCTION_SCOPE_P (decl))
+   {
+ if (CP_DECL_THREAD_LOCAL_P (decl))
+   aggregates1 = tls_aggregates;
+ else
+   aggregates1 = static_aggregates;
+   }
+   }
  
/* If this is a local variable that will need a mangled name,

 register it now.  We must do this be

RE: [RFC PATCH] i386: Re-alias -mavx10.2 to 512 bit and make -mno-avx10.x-512 disable the whole AVX10.x

2025-01-27 Thread Jiang, Haochen

> From: Richard Biener 
> Sent: Monday, January 27, 2025 5:09 PM
> 
> On Mon, Jan 27, 2025 at 8:30 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > AVX10 has been published for one and half year and we have got many
> > feedbacks on that, one of the feedback is on whether the alias option
> > -mavx10.x should point to 256 or 512.
> >
> > If you also pay attention to LLVM community, you might see this thread
> > related to AVX10 options just sent out several hours ago:
> >
> > [X86][AVX10] Disable m[no-]avx10.1 and switch m[no-]avx10.2 to alias
> > of 512 bit options
> > https://github.com/llvm/llvm-project/pull/124511
> >
> > In GCC, we will also do so. This RFC patch is slightly different with
> > LLVM, just
> > including:
> >
> >   - Switch -m[no-]avx10.2 to alias of 512 bit options.
> >   - Change -mno-avx10.[1,2]-512 to disable both 256 and 512 instructions.
> This
> >   will also result in -mno-avx10.2 would still disable both 256 and 512 
> > insts
> >   according to new alias point to 512.
> >
> > But not including disabling -m[no-]avx10.1, since I still want more
> > input on how to handle that. We actually have three choices on that:
> >
> >  a. Directly re-alias -m[no-]avx10.1 to -m[no-]avx10.1-512 GCC 15 and
> > backport  to GCC 14.
> >  b. Disable -m[no]-avx10.1 in GCC 15, and add it back with
> > -m[no-]avx10.1-512  in the future. This is for in case if someone
> > cross compile with different versions  of GCC with -mavx10.1, it might get
> unexpected result sliently.
> >  c. Disable -m[no]-avx10.1 in GCC 15, and never add it back. Since the
> > option has  been 256 bit, changing them back and forth is messy.
> >
> > It might be the final chance we could change the alias option since
> > real
> > AVX10.1 hardware is coming soon. And it is only x86 specific, so it
> > might still squeeze into GCC 15 at this time.
> >
> > I call this patch RFC patch since we also need to change the doc and
> > testcases accordingly, which makes this patch incomplete. Discussion
> > and input is welcomed on this topic.
> 
> Can you re-hash on how users need to select 256bit vs 512bit support?
> I understand
> the above change basically makes -m[no-]avx10.[12] gate ISA features but not
> size?
> So that will now enable 512bit unless -mno-evex512 is given?
> 

-mno-evex512 will do nothing with AVX10 related options. It will only apply on
-mavx512xxx options.

In GCC currently, take AVX10.2 as example, we have the following options for one
AVX10 version:

-mavx10.2-256: Enable AVX10.2 ISA set with 256 bit vector size only
-mavx10.2-512: Enable AVX10.2 ISA set with both 256 and 512 bit vector size
-mavx10.2: An alias to -mavx10.2-256

Based on that, the current -mno- option would be:

-mno-avx10.2-256: Disable AVX10.2-256 ISA set, which actually disables the whole
AVX10.2.
-mno-avx10.2-512: Disable AVX10.2 ISA set 512 bit vector usage, but keep 256 bit
there.
-mno-avx10.2: An alias to -mno-avx10.2-256.

In this RFC, basically we have two main goal, both of them are introduced in 
GCC 15:
  - Change -mavx10.2 alias to 512 bit instead of 256 bit.
  - -mno-avx10.2-512 would disable the whole AVX10.2, instead of 512 bit only, 
to
  eliminate different understanding on that due to in AVX10 we turned the imply
  relationship from line-like to square-like. Also, it will lead to the new 
-mno-avx10.2
  still disable the whole AVX10.2 ISA set, which meets its first impression on 
option
  name.

Due to that, -mno-avx10.1-512 should also align with that, which is introduced 
in 
GCC 14.

> Given for the -mavx10.[12]-{256,512} the behavior changes compared to GCC
> 14 I'd rather drop the options that behave differently from GCC 14 on GCC 15
> than changing their meaning.  That unfortunately will make them a hard error
> (but I don't expect much use).  I'm not sure it's worth retaining -m[no-
> ]avx10.[12]-512.

-mavx10.[12]-256/512 are clear, -mno-avx10.[12]-256 is also clear. No need to
concern on that. As said, the changes are on -mavx10.x alias and
-mno-avx10.x-512, where AVX10.1 introduced in GCC 14 and AVX10.2 in GCC 15.

In my opinion, I would go either option a) or option c) for -mavx10.1. AVX10.1
options are not widely used for now due to its first appearance in GNR. So that
is why option a) could be doable although changing the meaning, but it is the
last chance for option a. Option c) is a safe choice and I pretty like it. I 
don't like
option b) but I list that out since it is also a choice. I listed them all for 
discussion.

For AVX10.2, since it is first introduced in GCC 15, we could change that w/o
considering compatibility issue. 

> 
> But maybe I'm misunderstanding the change (too many avx10.x related
> options are there).
> 

Yes, that is quite confusing and takes me some time to try to illustrate them
as clear as I can.

Thx,
Haochen

Re: [PATCH] options: Adjust cl_optimization_compare to avoid checking ICE [PR115913]

2025-01-27 Thread Richard Biener

On Mon, Jan 27, 2025 at 1:28 PM Lewis Hyatt  wrote:
>
> Hello-
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115913#c10
>
> This tweak to optc-save-gen.awk weakens the check performed by
> cl_optimization_compare() to avoid checking asserts that have been there
> since this function was first added in r11-1141. Is it OK for 15 please? I
> think it would be appropriate to backport back to 12,13,14 as well.
> Bootstrap + regtest all languages on x86-64 Linux with no issues. Thanks!

The check is supposed to verify that cl_target_option_restore does not
alter the set of global options restored by cl_optimization_restore (or others).

Since there are no warning flags restored in cl_optimization_restore the
patch is OK.

I suppose to follow the intend we'd have to save the current set of diagnostic
options after cl_optimization_restore and compare those as well after
the cl_target_option_restore.

Thanks,
Richard.


> -Lewis
>
> -- >8 --
>
> At the end of a sequence like:
>  #pragma GCC push_options
>  ...
>  #pragma GCC pop_options
>
> the handler for pop_options calls cl_optimization_compare() (as generated by
> optc-save-gen.awk) to make sure that all global state has been restored to
> the value it had prior to the push_options call. The verification is
> performed for almost all entries in the global_options struct. This leads to
> unexpected checking asserts, as discussed in the PR, in case the state of
> warnings-related options has been intentionally modified in between
> push_options and pop_options via a call to #pragma GCC diagnostic. Address
> that by skipping the verification for CL_WARNING-flagged options.
>
> gcc/ChangeLog:
>
> PR middle-end/115913
> * optc-save-gen.awk (cl_optimization_compare): Skip options with
> CL_WARNING flag.
>
> gcc/testsuite/ChangeLog:
>
> PR middle-end/115913
> * c-c++-common/cpp/pr115913.c: New test.
> ---
>  gcc/optc-save-gen.awk | 5 +
>  gcc/testsuite/c-c++-common/cpp/pr115913.c | 7 +++
>  2 files changed, 12 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/cpp/pr115913.c
>
> diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
> index fa9218472ed..a3d7e5a478e 100644
> --- a/gcc/optc-save-gen.awk
> +++ b/gcc/optc-save-gen.awk
> @@ -1484,6 +1484,11 @@ for (i = 0; i < n_opts; i++) {
> if (name == "")
> continue;
>
> +   # We do not want to compare warning-related options, since they
> +   # might have been modified by a #pragma GCC diagnostic.
> +   if (flag_set_p("Warning", flags[i]))
> +   continue;
> +
> if (name in checked_options)
> continue;
> checked_options[name]++
> diff --git a/gcc/testsuite/c-c++-common/cpp/pr115913.c 
> b/gcc/testsuite/c-c++-common/cpp/pr115913.c
> new file mode 100644
> index 000..b9d10cda8d2
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/cpp/pr115913.c
> @@ -0,0 +1,7 @@
> +/* { dg-do preprocess } */
> +/* PR middle-end/115913 */
> +#pragma GCC push_options
> +#pragma GCC diagnostic warning "-Wundef"
> +/* The call to cl_optimization_compare performed by pop_options should not
> +   lead to a checking failure.  */
> +#pragma GCC pop_options

Re: [PATCH v3] [ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]

2025-01-27 Thread Richard Biener

On Sat, Jan 25, 2025 at 4:37 AM Alexandre Oliva  wrote:
>
> On Jan 24, 2025, Richard Biener  wrote:
>
> > Hmm.  I think when an original ref could trap that should be the
> > insertion point (or the original ref should post-dominate the insertion 
> > point).
>
> I suppose we could do that, but...  intuitively, it doesn't feel safe to
> move a nontrapping load down to combine it with a trapping load either.
>
> > It's fine when an originally may-trap reference becomes not trapping
> > (and it really cannot trap).  Introducing new (may-)traps is never OK.
>
> I don't really see how a may-trap reference to a certain range of bits
> could possibly become non-trapping by merely reissuing it, so I can't
> respond intelligently to your assessment.

What I was saying that the conservative tree_could_trap_p could say 'yes'
to a certain encoding of a ref but 'no' to another if in reality the
ref can never
trap.  We of course cannot (apart from bugs in tree_could_trap_p) turn
a for-sure trap into a not-trap by simply rewriting the ref.

> > So, can we split the patch to the case with the testcase at hand and
> > consider the assert and the extra tree_could_trap_p checks separately?
>
> The approved patch, with the testcase and the simplest fix, is already
> in.  I've rebased the additional defensive checks and interface changes
> into the following, but I don't have any pressing need of getting them
> in: I don't know how to trigger any of the issues tested for.
>
>
> If decode_field_reference finds a load that accesses past the inner
> object's size, bail out.
>
> If loads we're attempting to combine don't have the same trapping
> status, bail out, to avoid replacing the wrong load and enabling loads
> to be moved that shouldn't.
>
> Regstrapped on x86_64-linux-gnu.  Should I put this in?
>
>
> for  gcc/ChangeLog
>
> PR tree-optimization/118514
> * gimple-fold.cc (decode_field_reference): Refuse to consider
> merging out-of-bounds BIT_FIELD_REFs.
> (fold_truth_andor_for_ifcombine): Check that combinable loads
> have the same trapping status.
> * tree-eh.cc (bit_field_ref_in_bounds_p): Rename to...
> (access_in_bounds_of_type_p): ... this.  Change interface,
> export.
> (tree_could_trap_p): Adjust.
> * tree-eh.h (access_in_bounds_of_type_p): Declare.
> ---
>  gcc/gimple-fold.cc |   16 ++--
>  gcc/tree-eh.cc |   25 +
>  gcc/tree-eh.h  |1 +
>  3 files changed, 24 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 45485782cdf91..24aada4bc8fee 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -7686,10 +7686,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
> *pbitsize,
>|| bs <= shiftrt
>|| offset != 0
>|| TREE_CODE (inner) == PLACEHOLDER_EXPR
> -  /* Reject out-of-bound accesses (PR79731).  */
> -  || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> - && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> -  bp + bs) < 0)
> +  /* Reject out-of-bound accesses (PR79731, PR118514).  */
> +  || !access_in_bounds_of_type_p (TREE_TYPE (inner), bs, bp)

So I think we want this bit in (and it's dependences), but

>|| (INTEGRAL_TYPE_P (TREE_TYPE (inner))
>   && !type_has_mode_precision_p (TREE_TYPE (inner
>  return NULL_TREE;
> @@ -8228,8 +8226,12 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>std::swap (rl_loc, rr_loc);
>  }
>
> +  /* Check that the loads that we're trying to combine have the same vuse and
> + same trapping status.  */
>if ((ll_load && rl_load)
> -  ? gimple_vuse (ll_load) != gimple_vuse (rl_load)
> +  ? (gimple_vuse (ll_load) != gimple_vuse (rl_load)
> +|| (tree_could_trap_p (gimple_assign_rhs1 (ll_load))
> +!= tree_could_trap_p (gimple_assign_rhs1 (rl_load
>: (!ll_load != !rl_load))

this and

>  return 0;
>
> @@ -8282,7 +8284,9 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>else if (lr_reversep != rr_reversep
>|| ! operand_equal_p (lr_inner, rr_inner, 0)
>|| ((lr_load && rr_load)
> -  ? gimple_vuse (lr_load) != gimple_vuse (rr_load)
> +  ? (gimple_vuse (lr_load) != gimple_vuse (rr_load)
> + || (tree_could_trap_p (gimple_assign_rhs1 (lr_load))
> + != tree_could_trap_p (gimple_assign_rhs1 (rr_load
>: (!lr_load != !rr_load)))

this not at this point (I see the assert is no longer in the patch).

So, apart from these two hunks this is OK as well.

Thanks,
Richard.

>  return 0;
>
> diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
> index 27a4b2b5b1665..68008cea588a7 100644
> --- a/gcc/tree-eh.cc
> +++ b/gcc/tree-eh.cc
> @@ -2646,24 +2646,22 @@ range_in_array_bounds_p (tree ref)
>return true;
>  }
>

Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]

2025-01-27 Thread Jason Merrill


On 1/27/25 8:14 AM, Simon Martin wrote:

Hi Jason,

On 24 Jan 2025, at 16:51, Jason Merrill wrote:


On 1/24/25 6:19 AM, Simon Martin wrote:

Hi Jason,

On 23 Jan 2025, at 22:56, Jason Merrill wrote:


On 1/23/25 12:06 PM, Simon Martin wrote:

Hi Marek,

On 23 Jan 2025, at 16:45, Marek Polacek wrote:


On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote:

Hi Jason,

On 20 Jan 2025, at 22:49, Jason Merrill wrote:


On 1/20/25 2:11 PM, Simon Martin wrote:

Hi Jason,

On 15 Jan 2025, at 22:42, Jason Merrill wrote:


On 12/12/24 2:07 PM, Simon Martin wrote:

We currently ICE upon the following valid (under -Wno-vla)



code




=== cut here ===
void f(int c) {
   constexpr int r = 4;
   [&](auto) { int t[r * c]; }(0);
}
=== cut here ===

The problem is that when parsing the lambda body, and more
specifically
the multiplication, we mark the lambda as
LAMBDA_EXPR_CAPTURE_OPTIMIZED
even though the replacement of r by 4 is "undone" by the call



to
build_min_non_dep in build_x_binary_op. This makes
prune_lambda_captures
remove the proxy declaration while it should not, and we trip



on
an



assert at instantiation time.


Why does prune_lambda_captures remove the declaration if it's
still
used in the function body?  Setting
LAMBDA_EXPR_CAPTURE_OPTIMIZED
just
means "we might have optimized away some captures", the tree
walk
should have found the use put back by build_x_binary_op.

I think that this is due to a bug in mark_const_cap_r, that’s
been
here since the beginning, i.e. r8-7213-g1577f10a637352: it
decides



NOT
to walk sub-trees when encountering a DECL_EXPR for a constant
capture
proxy (at lambda.cc:1769). I don’t understand why we



wouldn’t
want
to continue.


Because that's the declaration of the capture proxy, not a use
of
it.

Indeed, thanks for clarifying.


Why aren't we finding the use in the declaration of t?

After further investigation, the problem is rather that neither
walk_tree_1 nor cp_walk_subtrees walk the dimensions of array
types,
so
we miss the uses.


Removing that line fixes the PR, but breaks 3 existing tests
(lambda-capture1.C, lambda-const11.C and lambda-const11a.C,
that
all
assert that we optimise out the capture); that’s why I did
not
pursue
it in the first place.


Right.


But taking another look, it might not be that big a deal that
we
don’t
optimise those out: as soon as we use -O1 or above, the
assignment
to



the closure field actually disappears.


Completely breaking this optimization is a big deal,
particularly
since it affects the layout of closure objects.  We can't always



optimize everything away.

ACK.

The attached updated patch makes cp_walk_subtrees walk array type



dimensions, which fixes the initial PR without those 3
regressions.




Successfully tested on x86_64-pc-linux-gnu. Is it OK?

Simon

PS: In theory I think it’d make sense to do do this change in

walk_tree_1 since C also supports VLAs, but doing so breaks some



OMP
tests. OMP can do interesting things with array bounds (see
r9-5354-gda972c05f48637), and fixing those would require teaching



walk_tree_1 about the “omp dummy var” array bounds, which I
think
would be a bit ugly. And I’m not aware of any C case that would
be
improved/fixed by doing this change, so we’re probably fine not
doing
it.



   From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17
00:00:00
2001
From: Simon Martin 
Date: Wed, 22 Jan 2025 16:19:47 +0100
Subject: [PATCH] c++: Don't prune constant capture proxies only
used
in array
dimensions [PR114292]

We currently ICE upon the following valid (under -Wno-vla) code

=== cut here ===
void f(int c) {
 constexpr int r = 4;
 [&](auto) { int t[r * c]; }(0);
}
=== cut here ===

When parsing the lambda body, and more specifically the
multiplication,
we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which
indicates
to
prune_lambda_captures that it might be possible to optimize out
some
captures.

The problem is that prune_lambda_captures then misses the use of



the
r
capture (because neither walk_tree_1 nor cp_walk_subtrees walks
the



dimensions of array types - here "r * c"), hence believes the
capture
can be pruned... and we trip on an assert when instantiating the
lambda.

This patch changes cp_walk_subtrees so that when walking a
declaration
with an array type, it walks that array type's dimensions. Since
C
also
supports VLAs, I thought it'd make sense to do this in
walk_tree_1,



but
this breaks some omp-low related test cases (because OMP can do
funky
things with array bounds when adjust_array_error_bounds is set),



and
I
don't really want to add code to walk_tree_1 to skips arrays
whose
dimension is a temporary variable with the "omp dummy var"
attribute.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/114292

gcc/cp/ChangeLog:

* tree.cc (cp_walk_subtrees): Walk array type dimensions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-ice4.C: New test.

---
gcc/cp/tree.cc

Re: [PATCH/GCC16 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Richard Sandiford

Spencer Abson  writes:
> Enable a target with FEAT_FP16 to emit the half-precision variants
> of FCMP/FCMPE.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp
>   and fcmpe to use the GPF_F16 iterator for floating-point
>   modes.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/_Float16_cmp_1.c: New test.
>   * gcc.target/aarch64/_Float16_cmp_2.c: New (negative) test.
> ---
>  gcc/config/aarch64/aarch64.md | 29 +-
>  .../gcc.target/aarch64/_Float16_cmp_1.c   | 54 +++
>  .../gcc.target/aarch64/_Float16_cmp_2.c   |  7 +++
>  3 files changed, 77 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 071058dbeb3..8721bf5d4f3 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -707,11 +707,12 @@
>  )
>  
>  (define_expand "cbranch4"
> -  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
> - [(match_operand:GPF 1 "register_operand")
> -  (match_operand:GPF 2 
> "aarch64_fp_compare_operand")])
> -(label_ref (match_operand 3 "" ""))
> -(pc)))]
> +  [(set (pc) (if_then_else
> + (match_operator 0 "aarch64_comparison_operator"
> +  [(match_operand:GPF_F16 1 "register_operand")
> +   (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
> + (label_ref (match_operand 3 "" ""))
> + (pc)))]
>""
>"
>operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
> @@ -4338,26 +4339,28 @@
>  
>  (define_insn "fcmp"
>[(set (reg:CCFP CC_REGNUM)
> -(compare:CCFP (match_operand:GPF 0 "register_operand")
> -   (match_operand:GPF 1 "aarch64_fp_compare_operand")))]
> + (compare:CCFP
> + (match_operand:GPF_F16 0 "register_operand")
> + (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))]

Minor formatting nit, but it'd be more usual to indent by two extra
columns relative to the (compare ...), i.e.:

(compare:CCFP
  (match_operand:GPF_F16 0 "register_operand")
  (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))]

Similarly for the others.

OK with that change for GCC 16, thanks.

BTW:

> diff --git a/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c 
> b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c
> new file mode 100644
> index 000..e714304970b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.2-a+nofp16" } */
> +
> +#include "_Float16_cmp_1.c"
> +
> +/* { dg-final { scan-assembler-not "\tfcmp\th\[0-9\]\+" } } */
> +/* { dg-final { scan-assembler-not "\tfcmpe\th\[0-9\]\+" } } */
> \ No newline at end of file

It can be easier to write these tests using {...} quoting for the regexp
as well, so that backslashes are regexp backslashes rather than Tcl string
backslashes:

/* { dg-final { scan-assembler-not {\tfcmp\th[0-9]+} } } */
/* { dg-final { scan-assembler-not {\tfcmpe\th[0-9]+} } } */

The "..." version is ok (and often used) too though.

Richard

Re: [PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends

2025-01-27 Thread Richard Sandiford

Richard Biener  writes:
> On Mon, 27 Jan 2025, Jakub Jelinek wrote:
>
>> On Mon, Jan 27, 2025 at 11:09:38AM +0100, Richard Biener wrote:
>> >PR rtl-optimization/118662
>> >* combine.cc (try_combine): When re-materializing a load
>> >from an extended reg by a lowpart subreg make sure we're
>> >dealing with single-component modes.
>> > 
>> >* gcc.dg/torture/pr118662.c: New testcase.
>> > ---
>> >  gcc/combine.cc  |  5 +
>> >  gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++
>> >  2 files changed, 23 insertions(+)
>> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c
>> > 
>> > diff --git a/gcc/combine.cc b/gcc/combine.cc
>> > index a2d4387cebe..4849603ba5e 100644
>> > --- a/gcc/combine.cc
>> > +++ b/gcc/combine.cc
>> > @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
>> > *i1, rtx_insn *i0,
>> >   copy.  This saves at least one insn, more if register allocation can
>> >   eliminate the copy.
>> >  
>> > + We cannot do this if the involved modes have more than one elements,
>> > + like for vector or complex modes.
>> > +
>> >   We cannot do this if the destination of the first assignment is a
>> >   condition code register.  We eliminate this case by making sure
>> >   the SET_DEST and SET_SRC have the same mode.
>> > @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
>> > *i1, rtx_insn *i0,
>> >   && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND
>> >   && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))
>> >   == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0
>> > + && known_eq (GET_MODE_NUNITS
>> > +(GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1)
>> 
>> If you want to rule this out for vector/complex modes, then the above
>> doesn't do that, all the V1??mode will still be optimized, and I think
>> that is undesirable.
>
> If you think so (I think it should work fine for V1??mode).

Yeah, FWIW, I liked the original version, since it seemed to capture
the property that matters.  (No objection to the new variant though.)

Thanks,
Richard

[PATCH v3 3/4] RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]

2025-01-27 Thread pan2 . li

From: Pan Li 

This patch would like to fix the wroing code generation for the scalar
signed SAT_SUB.  The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like sub.  The gen_lowpart
will generate something like lbu which has all zero for highest bits.

For example, when 0xff(-1 for QImode) sub 0x1(1 for QImode), we actually
want to -1 - 1 = -2, but if there is no sign extend like lbu, we will get
0xff - 1 = 0xfe which is incorrect.  Thus, we have to sign extend 0xff(Qmode)
to 0x(assume XImode is DImode) before sub in Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688.h: Add test helper macro.
* gcc.target/riscv/pr117688-sub-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  4 ++--
 .../gcc.target/riscv/pr117688-sub-run-1-s16.c |  6 ++
 .../gcc.target/riscv/pr117688-sub-run-1-s32.c |  6 ++
 .../gcc.target/riscv/pr117688-sub-run-1-s64.c |  6 ++
 .../gcc.target/riscv/pr117688-sub-run-1-s8.c  |  6 ++
 gcc/testsuite/gcc.target/riscv/pr117688.h | 21 +++
 6 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b9fb3733b86..1e5de85e871 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12919,8 +12919,8 @@ riscv_expand_sssub (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant ();
   rtx shift_bits = GEN_INT (bitsize - 1);
-  rtx xmode_x = gen_lowpart (Xmode, x);
-  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND);
+  rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND);
   rtx xmode_minus = gen_reg_rtx (Xmode);
   rtx xmode_xor_0 = gen_reg_rtx (Xmode);
   rtx xmode_xor_1 = gen_reg_rtx (Xmode);
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c
new file mode 100644
index 000..7b375bb6c85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int16_t, INT16_MIN, INT16_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c
new file mode 100644
index 000..ba0e8fc8ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int32_t, INT32_MIN, INT32_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c
new file mode 100644
index 000..c24c549af30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int64_t, INT64_MIN, INT64_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c
new file mode 100644
index 000..67f9df179a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h 
b/gcc/testsuite/gcc.target/riscv/pr117688.h
index 1013a8a8012..3b734ce62ed 100644
--- a/gcc/testsuite/gcc.target/riscv/pr117688.h
+++ b/gcc/testsuite/gcc.target/riscv/pr117688.h
@@ -24,4 +24,25 @@
 return 0; \
   }
 
+#define DEFINE_SIGNED_SAT_SUB_RUN(T, MIN, MAX)  \
+  T x, y, result;

[PATCH v3 2/4] RISC-V: Fix incorrect code gen for scalar signed SAT_ADD [PR117688]

2025-01-27 Thread pan2 . li

From: Pan Li 

This patch would like to fix the wroing code generation for the scalar
signed SAT_ADD.  The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like add.  The gen_lowpart
will generate something like lbu which has all zero for highest bits.

For example, when 0xff(-1 for QImode) plus 0x2(1 for QImode), we actually
want to -1 + 2 = 1, but if there is no sign extend like lbu, we will get
0xff + 2 = 0x101 which is incorrect.  Thus, we have to sign extend 0xff(Qmode)
to 0x(assume XImode is DImode) before plus in Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688-add-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s8.c: New test.
* gcc.target/riscv/pr117688.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  4 +--
 .../gcc.target/riscv/pr117688-add-run-1-s16.c |  6 +
 .../gcc.target/riscv/pr117688-add-run-1-s32.c |  6 +
 .../gcc.target/riscv/pr117688-add-run-1-s64.c |  6 +
 .../gcc.target/riscv/pr117688-add-run-1-s8.c  |  6 +
 gcc/testsuite/gcc.target/riscv/pr117688.h | 27 +++
 6 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688.h

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 34f0a888c5c..b9fb3733b86 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12815,8 +12815,8 @@ riscv_expand_ssadd (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant ();
   rtx shift_bits = GEN_INT (bitsize - 1);
-  rtx xmode_x = gen_lowpart (Xmode, x);
-  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND);
+  rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND);
   rtx xmode_sum = gen_reg_rtx (Xmode);
   rtx xmode_dest = gen_reg_rtx (Xmode);
   rtx xmode_xor_0 = gen_reg_rtx (Xmode);
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c
new file mode 100644
index 000..21ec107cbf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int16_t, INT16_MIN, INT16_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c
new file mode 100644
index 000..1f197d1280b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int32_t, INT32_MIN, INT32_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c
new file mode 100644
index 000..4903bc854d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int64_t, INT64_MIN, INT64_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c
new file mode 100644
index 000..a9f2fe7f192
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h 
b/gcc/testsuite/gcc.target/riscv/pr117688.h
new file mode 100644
index 000..1013a8a8012
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_PR117688_H
+#define HAVE_DEFINED_PR117688_H
+
+#include 
+
+#define DEFINE_SIGNED_SAT_ADD_RUN(T, MIN, MAX)\
+  T x, y, re

[PATCH v3 4/4] RISC-V: Fix incorrect code gen for scalar signed SAT_TRUNC [PR117688]

2025-01-27 Thread pan2 . li

From: Pan Li 

This patch would like to fix the wroing code generation for the scalar
signed SAT_TRUNC.  The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like add.  The gen_lowpart
will generate something like lbu which has all zero for highest bits.

For example, when 0xff7f(-129 for HImode) trunc to QImode, we actually
want compare -129 to -128, but if there is no sign extend like lbu, we will
compare 0xff7f to 0xff80(assum Xmode is DImode).  Thus, we have
to sign extend 0xff(Qmode) to 0xff7f(assume Xmode is DImode)
before compare in Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688.h: Add test helper macros.
* gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  2 +-
 .../riscv/pr117688-trunc-run-1-s16-to-s8.c|  6 +
 .../riscv/pr117688-trunc-run-1-s32-to-s16.c   |  6 +
 .../riscv/pr117688-trunc-run-1-s32-to-s8.c|  6 +
 .../riscv/pr117688-trunc-run-1-s64-to-s16.c   |  6 +
 .../riscv/pr117688-trunc-run-1-s64-to-s32.c   |  6 +
 .../riscv/pr117688-trunc-run-1-s64-to-s8.c|  6 +
 gcc/testsuite/gcc.target/riscv/pr117688.h | 22 +++
 8 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1e5de85e871..7f8952735d8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -13030,7 +13030,7 @@ riscv_expand_sstrunc (rtx dest, rtx src)
   rtx xmode_narrow_min = gen_reg_rtx (Xmode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
   rtx xmode_gt = gen_reg_rtx (Xmode);
-  rtx xmode_src = gen_lowpart (Xmode, src);
+  rtx xmode_src = riscv_extend_to_xmode_reg (src, GET_MODE (src), SIGN_EXTEND);
   rtx xmode_dest = gen_reg_rtx (Xmode);
   rtx xmode_mask = gen_reg_rtx (Xmode);
   rtx xmode_sat = gen_reg_rtx (Xmode);
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c
new file mode 100644
index 000..df84615a25f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_TRUNC_RUN(int16_t, int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c
new file mode 100644
index 000..1b0f860eb55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int16_t, INT16_MIN, INT16_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c
new file mode 100644
index 000..e412a29df36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c
new file mode 100644
index 000..234d33b1895
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-a

Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-01-27 Thread Christophe Lyon

Hi Torbjorn,

On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON
 wrote:
>
> Another ping... :)
>
> Kind regards,
> Torbjörn
>
> On 2024-12-18 18:35, Torbjorn SVENSSON wrote:
> > Gentle ping :)
> >
> > Kind regards,
> > Torbjörn
> >
> > On 2024-11-14 17:16, Torbjorn SVENSSON wrote:
> >>
> >>
> >> On 2024-11-14 16:32, Christophe Lyon wrote:
> >>> On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON
> >>>  wrote:
> 
>  Ok for trunk and releases/gcc-14?
> 
>  --
> 
>  A long time ago, this test forced -march=armv6.
> 
>  With -marm, the generated assembler is:
>  foo:
>   sub r0, r0, #48
>   cmp r0, #9
>   movhi   r0, #0
>   movls   r0, #1
>   bx  lr
> 
>  With -mthumb, the generated assembler is:
>  foo:
>   subsr0, r0, #48
>   movsr2, #9
>   uxtbr3, r0
>   movsr0, #0
>   cmp r2, r3
>   adcsr0, r0, r0
>   uxtbr0, r0
>   bx  lr
> 
>  Require effective-target arm_arm_ok to skip the test for thumb-only
>  targets (Cortex-M).
> 
>  gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/unsigned-extend-1.c: Use effective-target
>   arm_arm_ok.
> 
>  Signed-off-by: Torbjörn SVENSSON 
>  ---
>    gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
>    1 file changed, 1 insertion(+)
> 
>  diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/
>  testsuite/gcc.target/arm/unsigned-extend-1.c
>  index 3b4ab048fb0..73f2e1a556d 100644
>  --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
>  +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
>  @@ -1,4 +1,5 @@
>    /* { dg-do compile } */
>  +/* { dg-require-effective-target arm_arm_ok } */
>    /* { dg-options "-O2" } */
> 
> >>>
> >>> So I'm guessing arm_arm_ok fails when you are testing for M-profile
> >>> targets?
> >>> What happens when you test for A-profile, providing -mthumb as part of
> >>> your runtest flags?
> >>
> >>

Looking at our CI results, without your patch the test passes on:
cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode,
and it fails only on cortex-m0 and m23.
Does this match your observations?

So your patch will make the test unsupported on m33, m3, m55 and m7
while it currently passes?

Am I missing something?

Thanks,

Christophe

> >>
> >> For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft):
> >>
> >> Testing arm/unsigned-extend-1.c
> >> check_compile tool: gcc for arm_arm_ok
> >> doing compile
> >> Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21114.c  -
> >> mthumb -mcpu=cortex-m0 -mfloat-abi=soft   -fdiagnostics-plain-output
> >> -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s
> >> (timeout = 800)
> >> spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c -
> >> mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -
> >> marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s
> >> pid is 21254 -21254
> >> cc1: error: target CPU does not support ARM mode
> >> pid is -1
> >> close result is 21254 exp6 0 1
> >> output is cc1: error: target CPU does not support ARM mode
> >>   status 1
> >> compiler exited with status 1
> >> UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c
> >>
> >>
> >> For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft):
> >>
> >> Testing arm/unsigned-extend-1.c
> >> check_compile tool: gcc for arm_arm_ok
> >> doing compile
> >> Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21296.c  -
> >> mthumb -mcpu=cortex-a7 -mfloat-abi=soft   -fdiagnostics-plain-output
> >> -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s
> >> (timeout = 800)
> >> spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c -
> >> mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output -
> >> marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s
> >> pid is 21434 -21434
> >> pid is -1
> >> output is  status 0
> >> doing compile
> >> Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/
> >> gcc.target/arm/unsigned-extend-1.c  -mthumb -mcpu=cortex-a7 -mfloat-
> >> abi=soft   -fdiagnostics-plain-output  -O2 -ffat-lto-objects -fno-
> >> ident -S -o unsigned-extend-1.s(timeout = 800)
> >> spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc .../gcc/testsuite/
> >> gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat-
> >> abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno-ident -
> >> S -o unsigned-extend-1.s
> >> pid is 21438 -21438
> >> pid is -1
> >> output is  status 0
> >> check_compile tool: gcc for exceptions_enabled
> >> doing compile
> >> Executing on host: .../bin/arm-none-eabi-gcc
> >> exceptions_enabled21296.cc   -mthumb -mcpu=cortex-a7 -mfloat-
> >> abi=soft   -fdiagnostics-plain-output -Wno-complain-wrong-lang -S
> >> -o exceptions_enabled21296.s(time

[PATCH] libstdc++: correct symbol version of typeinfo for bfloat16_t on RISC-V

2025-01-27 Thread Andreas Schwab

RISC-V only gained support for bfloat16_t after gcc 14.  Passes
libstdc++/check_abi on {x86_64,aarch64,ppc64le,riscv64,s390x}-suse-linux.

PR libstdc++/118563
* testsuite/util/testsuite_abi.cc (check_version): Add
CXXABI_1.3.16.
* config/abi/pre/gnu.ver (CXXABI_1.3.14) [__riscv]: Exclude
typeinfo for bfloat16_t.
(CXXABI_1.3.16) [__riscv]: Add it here.
---
 libstdc++-v3/config/abi/pre/gnu.ver  | 23 
 libstdc++-v3/testsuite/util/testsuite_abi.cc |  1 +
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index f7641974ec4..84ce874fe03 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2852,10 +2852,15 @@ CXXABI_1.3.13 {
 CXXABI_1.3.14 {
 
 # typeinfo for _Float{16,32,64,128,32x,64x,128x} and
-# __bf16
-_ZTIDF[0-9]*[_bx];
-_ZTIPDF[0-9]*[_bx];
-_ZTIPKDF[0-9]*[_bx];
+# __bf16/bfloat16_t
+_ZTIDF[0-9]*[_x];
+_ZTIPDF[0-9]*[_x];
+_ZTIPKDF[0-9]*[_x];
+#ifndef __riscv
+_ZTIDF16b;
+_ZTIPDF16b;
+_ZTIPKDF16b;
+#endif
 _ZTIu6__bf16;
 _ZTIPu6__bf16;
 _ZTIPKu6__bf16;
@@ -2869,6 +2874,16 @@ CXXABI_1.3.15 {
 
 } CXXABI_1.3.14;
 
+CXXABI_1.3.16 {
+
+#ifdef __riscv
+_ZTIDF16b;
+_ZTIPDF16b;
+_ZTIPKDF16b;
+#endif
+
+} CXXABI_1.3.15;
+
 # Symbols in the support library (libsupc++) supporting transactional memory.
 CXXABI_TM_1 {
 
diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc 
b/libstdc++-v3/testsuite/util/testsuite_abi.cc
index dcf5b1a4e0b..0d6080fb92c 100644
--- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
+++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
@@ -237,6 +237,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("CXXABI_1.3.13");
   known_versions.push_back("CXXABI_1.3.14");
   known_versions.push_back("CXXABI_1.3.15");
+  known_versions.push_back("CXXABI_1.3.16");
   known_versions.push_back("CXXABI_IEEE128_1.3.13");
   known_versions.push_back("CXXABI_TM_1");
   known_versions.push_back("CXXABI_FLOAT128");
-- 
2.48.1


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[PATCH v3 1/4] RISC-V: Refactor SAT_* operand rtx extend to reg help func [NFC]

2025-01-27 Thread pan2 . li

From: Pan Li 

This patch would like to refactor the helper function of the SAT_*
scalar.  The helper function will convert the define_pattern ops
to the xmode reg for the underlying code-gen.  This patch add
new parameter for ZERO_EXTEND or SIGN_EXTEND if the input is const_int
or the mode is non-Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Rename from ...
(riscv_extend_to_xmode_reg): Rename to and add rtx_code for
zero/sign extend if non-Xmode.
(riscv_expand_usadd): Leverage the renamed function with ZERO_EXTEND.
(riscv_expand_ussub): Ditto.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 77 ---
 1 file changed, 48 insertions(+), 29 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd50fe4eddf..34f0a888c5c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12644,14 +12644,26 @@ riscv_get_raw_result_mode (int regno)
 /* Generate a REG rtx of Xmode from the given rtx and mode.
The rtx x can be REG (QI/HI/SI/DI) or const_int.
The machine_mode mode is the original mode from define pattern.
+   The rtx_code can be ZERO_EXTEND or SIGN_EXTEND.
 
-   If rtx is REG and Xmode, the RTX x will be returned directly.
+   If rtx is REG:
 
-   If rtx is REG and non-Xmode, the zero extended to new REG of Xmode will be
-   returned.
+   1.  If rtx Xmode, the RTX x will be returned directly.
+   2.  If rtx non-Xmode, the extended to new REG of Xmode will be returned.
 
-   If rtx is const_int, a new REG rtx will be created to hold the value of
-   const_int and then returned.
+   The scalar ALU like add don't support non-Xmode like QI/HI.  Then the
+   gen_lowpart will have problem here.  For example, when we would like
+   to add -1 (0xff if QImode) and 2 (0x2 if QImode).  The 0xff and 0x2 will
+   be loaded to register for adding.  Aka:
+
+   0xff + 0x2 = 0x101 instead of -1 + 2 = 1.
+
+   Thus we need to sign extend 0xff to 0x if Xmode is DImode
+   for correctness.  Similar the unsigned also need zero extend.
+
+   If rtx is const_int:
+
+   1.  A new REG rtx will be created to hold the value of const_int.
 
According to the gccint doc, the constants generated for modes with fewer
bits than in HOST_WIDE_INT must be sign extended to full width.  Thus there
@@ -12669,34 +12681,41 @@ riscv_get_raw_result_mode (int regno)
the REG rtx of Xmode, instead of taking care of these in expand func.  */
 
 static rtx
-riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
+riscv_extend_to_xmode_reg (rtx x, machine_mode mode, enum rtx_code rcode)
 {
+  gcc_assert (rcode == ZERO_EXTEND || rcode == SIGN_EXTEND);
+
   rtx xmode_reg = gen_reg_rtx (Xmode);
 
-  if (!CONST_INT_P (x))
+  if (CONST_INT_P (x))
 {
   if (mode == Xmode)
-   return x;
-
-  riscv_emit_unary (ZERO_EXTEND, xmode_reg, x);
-  return xmode_reg;
-}
-
-  if (mode == Xmode)
-emit_move_insn (xmode_reg, x);
-  else
-{
-  /* Combine deliberately does not simplify extensions of constants
-(long story).  So try to generate the zero extended constant
-efficiently.
+   emit_move_insn (xmode_reg, x);
+  else if (rcode == ZERO_EXTEND)
+   {
+ /* Combine deliberately does not simplify extensions of constants
+(long story).  So try to generate the zero extended constant
+efficiently.
 
-First extract the constant and mask off all the bits not in MODE.  */
-  HOST_WIDE_INT val = INTVAL (x);
-  val &= GET_MODE_MASK (mode);
+First extract the constant and mask off all the bits not in
+MODE.  */
+ HOST_WIDE_INT val = INTVAL (x);
+ val &= GET_MODE_MASK (mode);
 
-  /* X may need synthesis, so do not blindly copy it.  */
-  xmode_reg = force_reg (Xmode, gen_int_mode (val, Xmode));
+ /* X may need synthesis, so do not blindly copy it.  */
+ xmode_reg = force_reg (Xmode, gen_int_mode (val, Xmode));
+   }
+  else /* SIGN_EXTEND.  */
+   {
+ rtx x_reg = gen_reg_rtx (mode);
+ emit_move_insn (x_reg, x);
+ riscv_emit_unary (rcode, xmode_reg, x_reg);
+   }
 }
+  else if (mode == Xmode)
+return x;
+  else
+riscv_emit_unary (rcode, xmode_reg, x);
 
   return xmode_reg;
 }
@@ -12717,8 +12736,8 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   rtx xmode_sum = gen_reg_rtx (Xmode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
-  rtx xmode_x = riscv_gen_zero_extend_rtx (x, mode);
-  rtx xmode_y = riscv_gen_zero_extend_rtx (y, mode);
+  rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, ZERO_EXTEND);
+  rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, ZERO_EXTEND);
   rtx xmode_dest = gen_reg_rtx (Xmode);
 
   /* Step-1: sum = x + y  */
@@ -12852,8 +12

[PATCH] arm: libbacktrace: Check if the compiler supports __sync atomics

2025-01-27 Thread Richard Earnshaw

Older versions of the Arm architecture lack support for __sync
operations directly in hardware and require calls into appropriate
operating-system hooks.  But such hooks obviously don't exist in a
freestanding environment.

Consquently, it is incorrect to assume during configure that such
functions will exist and we need a configure-time check to determine
whether or not these routines will work.

libbacktrace:

* configure.ac: Always check if the compiler supports __sync
operations.
* configure: Regenerated.
---

A consequence of this assumption that these functions 'just work' (TM)
is that we build libbacktrace with threading support, but that then
fails at link time when we find out that they do not.  I'm not sure I
really understand why we can assume that having with_target_subdir set
is enough to assume that the operations exist: it isn't for Arm and it
looks like it wasn't for hpux either.  Perhaps the whole logic here
needs reconsideration...

 libbacktrace/configure| 23 +++
 libbacktrace/configure.ac | 10 ++
 2 files changed, 33 insertions(+)

diff --git a/libbacktrace/configure b/libbacktrace/configure
index db491a78234..0ecdd3ec0a3 100755
--- a/libbacktrace/configure
+++ b/libbacktrace/configure
@@ -12760,6 +12760,29 @@ else
   if test -n "${with_target_subdir}"; then
case "${host}" in
hppa*-*-hpux*) libbacktrace_cv_sys_sync=no ;;
+   arm*-*-eabi*)
+ # Older versions of the Arm architecture lack the necessary instructions
+ # for these constructs, so check whether we can use them.
+ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int i;
+int
+main ()
+{
+__sync_bool_compare_and_swap (&i, i, i);
+ __sync_lock_test_and_set (&i, 1);
+ __sync_lock_release (&i);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  libbacktrace_cv_sys_sync=yes
+else
+  libbacktrace_cv_sys_sync=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext;;
*) libbacktrace_cv_sys_sync=yes ;;
esac
  else
diff --git a/libbacktrace/configure.ac b/libbacktrace/configure.ac
index b700bf9d4f9..75b3a7536f1 100644
--- a/libbacktrace/configure.ac
+++ b/libbacktrace/configure.ac
@@ -199,6 +199,16 @@ AC_CACHE_CHECK([__sync extensions],
 [if test -n "${with_target_subdir}"; then
case "${host}" in
hppa*-*-hpux*) libbacktrace_cv_sys_sync=no ;;
+   arm*-*-eabi*)
+ # Older versions of the Arm architecture lack the necessary instructions
+ # for these constructs, so check whether we can use them.
+ AC_LINK_IFELSE(
+   [AC_LANG_PROGRAM([int i;],
+[__sync_bool_compare_and_swap (&i, i, i);
+ __sync_lock_test_and_set (&i, 1);
+ __sync_lock_release (&i);])],
+   [libbacktrace_cv_sys_sync=yes],
+   [libbacktrace_cv_sys_sync=no]);;
*) libbacktrace_cv_sys_sync=yes ;;
esac
  else
-- 
2.34.1

[PATCH] options: Adjust cl_optimization_compare to avoid checking ICE [PR115913]

2025-01-27 Thread Lewis Hyatt

Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115913#c10

This tweak to optc-save-gen.awk weakens the check performed by
cl_optimization_compare() to avoid checking asserts that have been there
since this function was first added in r11-1141. Is it OK for 15 please? I
think it would be appropriate to backport back to 12,13,14 as well.
Bootstrap + regtest all languages on x86-64 Linux with no issues. Thanks!

-Lewis

-- >8 --

At the end of a sequence like:
 #pragma GCC push_options
 ...
 #pragma GCC pop_options

the handler for pop_options calls cl_optimization_compare() (as generated by
optc-save-gen.awk) to make sure that all global state has been restored to
the value it had prior to the push_options call. The verification is
performed for almost all entries in the global_options struct. This leads to
unexpected checking asserts, as discussed in the PR, in case the state of
warnings-related options has been intentionally modified in between
push_options and pop_options via a call to #pragma GCC diagnostic. Address
that by skipping the verification for CL_WARNING-flagged options.

gcc/ChangeLog:

PR middle-end/115913
* optc-save-gen.awk (cl_optimization_compare): Skip options with
CL_WARNING flag.

gcc/testsuite/ChangeLog:

PR middle-end/115913
* c-c++-common/cpp/pr115913.c: New test.
---
 gcc/optc-save-gen.awk | 5 +
 gcc/testsuite/c-c++-common/cpp/pr115913.c | 7 +++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr115913.c

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index fa9218472ed..a3d7e5a478e 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1484,6 +1484,11 @@ for (i = 0; i < n_opts; i++) {
if (name == "")
continue;
 
+   # We do not want to compare warning-related options, since they
+   # might have been modified by a #pragma GCC diagnostic.
+   if (flag_set_p("Warning", flags[i]))
+   continue;
+
if (name in checked_options)
continue;
checked_options[name]++
diff --git a/gcc/testsuite/c-c++-common/cpp/pr115913.c 
b/gcc/testsuite/c-c++-common/cpp/pr115913.c
new file mode 100644
index 000..b9d10cda8d2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr115913.c
@@ -0,0 +1,7 @@
+/* { dg-do preprocess } */
+/* PR middle-end/115913 */
+#pragma GCC push_options
+#pragma GCC diagnostic warning "-Wundef"
+/* The call to cl_optimization_compare performed by pop_options should not
+   lead to a checking failure.  */
+#pragma GCC pop_options

Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-01-27 Thread Christophe Lyon

On Mon, 27 Jan 2025 at 13:30, Torbjorn SVENSSON
 wrote:
>
> Hi Christophe,
>
> On 2025-01-27 13:07, Christophe Lyon wrote:
> > Hi Torbjorn,
> >
> > On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON
> >  wrote:
> >>
> >> Another ping... :)
> >>
> >> Kind regards,
> >> Torbjörn
> >>
> >> On 2024-12-18 18:35, Torbjorn SVENSSON wrote:
> >>> Gentle ping :)
> >>>
> >>> Kind regards,
> >>> Torbjörn
> >>>
> >>> On 2024-11-14 17:16, Torbjorn SVENSSON wrote:
> 
> 
>  On 2024-11-14 16:32, Christophe Lyon wrote:
> > On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON
> >  wrote:
> >>
> >> Ok for trunk and releases/gcc-14?
> >>
> >> --
> >>
> >> A long time ago, this test forced -march=armv6.
> >>
> >> With -marm, the generated assembler is:
> >> foo:
> >>   sub r0, r0, #48
> >>   cmp r0, #9
> >>   movhi   r0, #0
> >>   movls   r0, #1
> >>   bx  lr
> >>
> >> With -mthumb, the generated assembler is:
> >> foo:
> >>   subsr0, r0, #48
> >>   movsr2, #9
> >>   uxtbr3, r0
> >>   movsr0, #0
> >>   cmp r2, r3
> >>   adcsr0, r0, r0
> >>   uxtbr0, r0
> >>   bx  lr
> >>
> >> Require effective-target arm_arm_ok to skip the test for thumb-only
> >> targets (Cortex-M).
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>   * gcc.target/arm/unsigned-extend-1.c: Use effective-target
> >>   arm_arm_ok.
> >>
> >> Signed-off-by: Torbjörn SVENSSON 
> >> ---
> >>gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
> >>1 file changed, 1 insertion(+)
> >>
> >> diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/
> >> testsuite/gcc.target/arm/unsigned-extend-1.c
> >> index 3b4ab048fb0..73f2e1a556d 100644
> >> --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
> >> +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
> >> @@ -1,4 +1,5 @@
> >>/* { dg-do compile } */
> >> +/* { dg-require-effective-target arm_arm_ok } */
> >>/* { dg-options "-O2" } */
> >>
> >
> > So I'm guessing arm_arm_ok fails when you are testing for M-profile
> > targets?
> > What happens when you test for A-profile, providing -mthumb as part of
> > your runtest flags?
> 
> 
> >
> > Looking at our CI results, without your patch the test passes on:
> > cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode,
> > and it fails only on cortex-m0 and m23.
> > Does this match your observations?
>
> Without my patch, I see failure on M0, M0+ and M23.
> With my patch, it goes to unsupported on all Cortex-M targets that I test.
>
> > So your patch will make the test unsupported on m33, m3, m55 and m7
> > while it currently passes?
> >
> > Am I missing something?
>
> I think you might have missed my comment down
>
> >
> > Thanks,
> >
> > Christophe
> >
> 
>  For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft):
> 
>  Testing arm/unsigned-extend-1.c
>  check_compile tool: gcc for arm_arm_ok
>  doing compile
>  Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21114.c  -
>  mthumb -mcpu=cortex-m0 -mfloat-abi=soft   -fdiagnostics-plain-output
>  -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s
>  (timeout = 800)
>  spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c -
>  mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -
>  marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s
>  pid is 21254 -21254
>  cc1: error: target CPU does not support ARM mode
>  pid is -1
>  close result is 21254 exp6 0 1
>  output is cc1: error: target CPU does not support ARM mode
> status 1
>  compiler exited with status 1
>  UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c
> 
> 
>  For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft):
> 
>  Testing arm/unsigned-extend-1.c
>  check_compile tool: gcc for arm_arm_ok
>  doing compile
>  Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21296.c  -
>  mthumb -mcpu=cortex-a7 -mfloat-abi=soft   -fdiagnostics-plain-output
>  -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s
>  (timeout = 800)
>  spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c -
>  mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output -
>  marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s
>  pid is 21434 -21434
>  pid is -1
>  output is  status 0
>  doing compile
>  Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/
>  gcc.target/arm/unsigned-extend-1.c  -mthumb -mcpu=cortex-a7 -mfloat-
>  abi=soft   -fdiagnostics-plain-output  -O2 -ffat-lto-objects -fno-
>  ident -S -o unsigned-extend

Re: [PATCH v3 0/4] Hard Register Constraints

2025-01-27 Thread Jeff Law





On 1/26/25 12:24 PM, Jakub Jelinek wrote:

On Sun, Jan 26, 2025 at 08:35:29AM -0700, Jeff Law wrote:

On 1/23/25 8:49 AM, Stefan Schulze Frielinghaus wrote:

On Sat, Jan 18, 2025 at 09:36:14AM -0700, Jeff Law wrote:
[...]

Do we detect conflicts between a hard register constraint and another
constraint which requires a singleton class?  That's going to be an error I
suspect, but curious if it's handled.


That is a good point.  Currently I suspect no.  I will have a look.

Thanks.  It's not the most important thing on our plate, but given the way
x86 is structured we probably need to do something sensible here.

I also worry a bit about non-singleton classes that the target may have
added to CLASS_LIKELY_SPILLED_P, though unlike the singleton case, there's
at least a chance these will work, albeit potentially generating poor code
when an object needs spilling.  I also don't think it's terribly common to
add non-singleton classes to that set.


I was first worried that the single register class construct is somewhat
special.  To me, it turns out that they behave very similar to my
current draft.  Basically during LRA in process_alt_operands() I'm
installing

Yea, I would think they'd largely behave like your proposal. Given the
presence of a singleton class one could use that to write an ASM just as
effectively as your hard register proposal.  And satisfying the constraints
should boil down to the same basic process.

What your proposal does is give users fine grained control as-if the port
had a singleton class for every register -- without us having to add all
those pesky register classes.


Though, don't we have various hacks for small register classes?
I mean e.g. targetm.class_likely_spilled_p calls in
combine/cse/loop-invariant etc.
If all the registers could be made to behave similarly, don't we need to
create those similarly?
Unclear at this point, though leaning towards not strictly requiring a 
change to that code, though we may be able to clean it up as a result of 
Stefan's work.


Essentially he's code code to split ranges, which in theory we could use 
to split ranges of pseudos in likely spilled classes and perhaps 
simplify or improve that code.


Jeff

[PATCH] tree-optimization/114052 - wrong code with niter analysis

2025-01-27 Thread Richard Biener

The following fixes part of the issue that niter analysis uses UB
of stmts that might not execute due to inner infinite loops (the
testcase in the PR) or due to calls terminating the program or
the function.  This patch addresses inner infinite loops by copying
parts of the analysis loop invariant motion does.  This requires
enumerating the loop body in DOM order rather than DFS order.

On trunk the testcase no longer triggers the bug, but it is present
on the branches where this patch fixes the issue.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any objections?  The missing related bits are

  /* If LOOP exits from this BB stop processing.  */
  FOR_EACH_EDGE (e, ei, bb->succs)
if (!flow_bb_inside_loop_p (loop, e->dest))
  break;
  if (e)
break;

  /* A loop might be infinite (TODO use simple loop analysis
 to disprove this if possible).  */
  if (bb->flags & BB_IRREDUCIBLE_LOOP)
break;

and of course call handling.  But somehow I'd like to see relevant
testcases since this all likely will cause some (too optimistic)
missed optimizations.

Richard.

PR tree-optimization/114052
* tree-ssa-loop-niter.cc (estimate_numbers_of_iterations):
Enumerate the loop body in DOM order.
(infer_loop_bounds_from_undefined): Track which subloop
we walk, when exiting a subloop that may be not finite
terminate the walk.

* gcc.dg/pr114052-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr114052-1.c | 41 +++
 gcc/tree-ssa-loop-niter.cc| 19 +-
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr114052-1.c

diff --git a/gcc/testsuite/gcc.dg/pr114052-1.c 
b/gcc/testsuite/gcc.dg/pr114052-1.c
new file mode 100644
index 000..920d32b1a65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114052-1.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target alarm } */
+/* { dg-require-effective-target signal } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+#include 
+#include 
+
+volatile int y;
+void __attribute__((noipa)) put(int x)
+{
+  if (y)
+__builtin_printf ("%i\n", x);
+}
+
+void __attribute__((noipa)) f(void)
+{
+  int counter = 0;
+  while (1) {
+  if (counter >= 2) continue;
+  put (counter++);
+  }
+}
+
+void do_exit (int i)
+{
+  exit (0);
+}
+
+int main()
+{
+  struct sigaction s;
+  sigemptyset (&s.sa_mask);
+  s.sa_handler = do_exit;
+  s.sa_flags = 0;
+  sigaction (SIGALRM, &s, NULL);
+  alarm (1);
+  f();
+}
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index de8d5ae6233..2bd349e58bd 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4451,11 +4451,28 @@ infer_loop_bounds_from_undefined (class loop *loop, 
basic_block *bbs)
   gimple_stmt_iterator bsi;
   basic_block bb;
   bool reliable;
+  class loop *inn_loop = loop;
 
   for (i = 0; i < loop->num_nodes; i++)
 {
   bb = bbs[i];
 
+  if (!flow_bb_inside_loop_p (inn_loop, bb))
+   {
+ /* When we are leaving a possibly infinite inner loop
+we have to stop processing.  */
+ if (!finite_loop_p (inn_loop))
+   break;
+ /* If the loop was finite we can continue with processing
+the loop we exited to.  */
+ inn_loop = bb->loop_father;
+   }
+
+  if (bb->loop_father->header == bb)
+   /* Record that we enter into a subloop since it might not
+  be finite.  */
+   inn_loop = bb->loop_father;
+
   /* If BB is not executed in each iteration of the loop, we cannot
 use the operations in it to infer reliable upper bound on the
 # of iterations of the loop.  However, we can use it as a guess.
@@ -4885,7 +4902,7 @@ estimate_numbers_of_iterations (class loop *loop)
  diagnose those loops with -Waggressive-loop-optimizations.  */
   number_of_latch_executions (loop);
 
-  basic_block *body = get_loop_body (loop);
+  basic_block *body = get_loop_body_in_dom_order (loop);
   auto_vec exits = get_loop_exit_edges (loop, body);
   likely_exit = single_likely_exit (loop, exits);
   FOR_EACH_VEC_ELT (exits, i, ex)
-- 
2.43.0

[PATCH] tree-optimization/118653 - ICE in vectorizable_live_operation

2025-01-27 Thread Richard Biener

The checking code didn't take into account debug uses.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/118653
* tree-vect-loop.cc (vectorizable_live_operation): Also allow
out-of-loop debug uses.

* gcc.dg/vect/pr118653.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr118653.c | 15 +++
 gcc/tree-vect-loop.cc|  3 ++-
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr118653.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr118653.c 
b/gcc/testsuite/gcc.dg/vect/pr118653.c
new file mode 100644
index 000..9322b23a17d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr118653.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -g" } */
+
+static short func1 (short si1, short si2) {
+  return (si1 - si2);
+}
+unsigned short g_72;
+extern int g_100[];
+short g_173;
+void func_42(void)
+{
+  for (g_173 = 10; g_173 > 0; g_173 = func1 (g_173, 1))
+for (g_72 = 1; g_72 < 5; g_72++)
+  g_100[g_72] &= 1;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index edd7d4d8763..ce674a71e8a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11574,7 +11574,8 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
-   gcc_assert (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)));
+   gcc_assert (is_gimple_debug (use_stmt)
+   || flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)));
 }
   else
 {
-- 
2.43.0

Re: [PATCH] tree-optimization/112859 - bogus loop distribution

2025-01-27 Thread Michael Matz

Hello,

On Thu, 23 Jan 2025, Richard Biener wrote:

> When we get a zero distance vector we still have to check for the
> situation of a common inner loop with zero distance.  But we can
> still allow a zero distance for the loop we distribute
> (gcc.dg/tree-ssa/ldist-33.c is such a case).  This is because
> zero distances in non-outermost loops are a misrepresentation
> of dependence by dependence analysis.

I think as long as that is the case your proposed changes makes sense.  
But perhaps it's worth a comment to that effect, i.e. that because 
dependence analysis is wonky (sometime using zero as unknown) we do these 
"strange" tests.  At least for the occasional reader it will be 
unintuitive why the obvious tests aren't working.

Ciao,
Michael.

[PATCH] RISC-V: testsuite: Fix gather_load_64-12-zvbb.c

2025-01-27 Thread Robin Dapp

Hi,

the test fails with _zvfh because we vectorize more.  Just adjust the
test expectations.  I regtested this a while ago on rv64gcv_zvl512b.
Will commit as obvious if the CI is happy.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c:
Distinguish between zvfh and !zvfh.
---
 .../riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
index de5a5ed7d56..698f0091390 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
@@ -106,7 +106,8 @@ TEST_LOOP (_Float16, uint64_t)
 TEST_LOOP (float, uint64_t)
 TEST_LOOP (double, uint64_t)
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 80 
"vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 80 
"vect" { target { ! riscv_zvfh } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 
"vect" { target riscv_zvfh } } } */
 /* { dg-final { scan-tree-dump " \.MASK_LEN_GATHER_LOAD" "vect" } } */
 /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
-- 
2.47.1

[PATCH] RISC-V: testsuite: Fix reduc-8.c and reduc-9.c

2025-01-27 Thread Robin Dapp

Hi,

in both tests we expect a VEC_SHL_INSERT expression but we now add the
initial value at the end.  Just remove that scan check.

Will commit as obvious after the CI run.

Regards
 Robin


---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c | 1 -
 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c
index fe47aa3648d..518f0c33cc4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c
@@ -12,5 +12,4 @@ add_loop (int *x, int n, int res)
   return res;
 }
 
-/* { dg-final { scan-tree-dump-times "VEC_SHL_INSERT" 1 "optimized" } } */
 /* { dg-final { scan-assembler-times 
{vslide1up\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[a-x0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c
index 6630d302721..a5bb8dcccb8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c
@@ -12,5 +12,4 @@ add_loop (float *x, int n, float res)
   return res;
 }
 
-/* { dg-final { scan-tree-dump-times "VEC_SHL_INSERT" 1 "optimized" } } */
 /* { dg-final { scan-assembler-times 
{vfslide1up\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[a-x0-9]+} 1 } } */
-- 
2.47.1

Re: [PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends

2025-01-27 Thread Jakub Jelinek

On Mon, Jan 27, 2025 at 11:09:38AM +0100, Richard Biener wrote:
>   PR rtl-optimization/118662
>   * combine.cc (try_combine): When re-materializing a load
>   from an extended reg by a lowpart subreg make sure we're
>   dealing with single-component modes.
> 
>   * gcc.dg/torture/pr118662.c: New testcase.
> ---
>  gcc/combine.cc  |  5 +
>  gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++
>  2 files changed, 23 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index a2d4387cebe..4849603ba5e 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>   copy.  This saves at least one insn, more if register allocation can
>   eliminate the copy.
>  
> + We cannot do this if the involved modes have more than one elements,
> + like for vector or complex modes.
> +
>   We cannot do this if the destination of the first assignment is a
>   condition code register.  We eliminate this case by making sure
>   the SET_DEST and SET_SRC have the same mode.
> @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>  && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND
>  && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))
>  == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0
> +&& known_eq (GET_MODE_NUNITS
> +   (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1)

If you want to rule this out for vector/complex modes, then the above
doesn't do that, all the V1??mode will still be optimized, and I think
that is undesirable.
So, either && !VECTOR_MODE_P (GET_MODE (...))
   && !COMPLEX_MODE_P (GET_MODE (...))
or perhaps as less expensive but less readable check
   && GET_MODE_INNER (GET_MODE (...)) == GET_MODE (...)
with a comment.

Jakub

[PATCH] middle-end/118643 - ICE with out-of-bound decl access

2025-01-27 Thread Richard Biener

When RTL expansion of an out-of-bound access of a register falls
back to a BIT_FIELD_REF we have to ensure that's valid.  The
following avoids negative offsets by expanding through a stack
temporary.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR middle-end/118643
* expr.cc (expand_expr_real_1): Avoid falling back to BIT_FIELD_REF
expansion for negative offset.

* gcc.dg/pr118643.c: New testcase.
---
 gcc/expr.cc |  8 +---
 gcc/testsuite/gcc.dg/pr118643.c | 11 +++
 2 files changed, 16 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr118643.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index a310b2d9131..4bfcde54523 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -11796,13 +11796,15 @@ expand_expr_real_1 (tree exp, rtx target, 
machine_mode tmode,
&& known_eq (GET_MODE_BITSIZE (DECL_MODE (base)), type_size))
  return expand_expr (build1 (VIEW_CONVERT_EXPR, type, base),
  target, tmode, modifier);
-   if (TYPE_MODE (type) == BLKmode)
+   if (TYPE_MODE (type) == BLKmode
+   || maybe_lt (offset, 0))
  {
temp = assign_stack_temp (DECL_MODE (base),
  GET_MODE_SIZE (DECL_MODE (base)));
store_expr (base, temp, 0, false, false);
-   temp = adjust_address (temp, BLKmode, offset);
-   set_mem_size (temp, int_size_in_bytes (type));
+   temp = adjust_address (temp, TYPE_MODE (type), offset);
+   if (TYPE_MODE (type) == BLKmode)
+ set_mem_size (temp, int_size_in_bytes (type));
return temp;
  }
exp = build3 (BIT_FIELD_REF, type, base, TYPE_SIZE (type),
diff --git a/gcc/testsuite/gcc.dg/pr118643.c b/gcc/testsuite/gcc.dg/pr118643.c
new file mode 100644
index 000..ff2b081ae0b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr118643.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+typedef __attribute__((__vector_size__(1))) unsigned char V;
+
+V x;
+void foo()
+{
+  V v = x;
+  x = *(V *)(&v - 1);
+}
-- 
2.43.0

Re: [PATCH] middle-end/118643 - ICE with out-of-bound decl access

2025-01-27 Thread Jakub Jelinek

On Mon, Jan 27, 2025 at 11:32:17AM +0100, Richard Biener wrote:
> When RTL expansion of an out-of-bound access of a register falls
> back to a BIT_FIELD_REF we have to ensure that's valid.  The
> following avoids negative offsets by expanding through a stack
> temporary.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> OK if that succeeds?
> 
> Thanks,
> Richard.
> 
>   PR middle-end/118643
>   * expr.cc (expand_expr_real_1): Avoid falling back to BIT_FIELD_REF
>   expansion for negative offset.
> 
>   * gcc.dg/pr118643.c: New testcase.

> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -11796,13 +11796,15 @@ expand_expr_real_1 (tree exp, rtx target, 
> machine_mode tmode,
>   && known_eq (GET_MODE_BITSIZE (DECL_MODE (base)), type_size))
> return expand_expr (build1 (VIEW_CONVERT_EXPR, type, base),
> target, tmode, modifier);
> - if (TYPE_MODE (type) == BLKmode)
> + if (TYPE_MODE (type) == BLKmode
> + || maybe_lt (offset, 0))

The whole condition would fit on one line.

> {
>   temp = assign_stack_temp (DECL_MODE (base),
> GET_MODE_SIZE (DECL_MODE (base)));
>   store_expr (base, temp, 0, false, false);
> - temp = adjust_address (temp, BLKmode, offset);
> - set_mem_size (temp, int_size_in_bytes (type));
> + temp = adjust_address (temp, TYPE_MODE (type), offset);
> + if (TYPE_MODE (type) == BLKmode)
> +   set_mem_size (temp, int_size_in_bytes (type));
>   return temp;
> }
>   exp = build3 (BIT_FIELD_REF, type, base, TYPE_SIZE (type),

Otherwise LGTM.

Jakub

[Patch, fortran] PR118640 - [15 Regression] cp2k ICE in gfc_conv_expr_present since r15-5347

2025-01-27 Thread Paul Richard Thomas

Hi All,

Pushed as an 'obvious' one-liner(less additional comments) in r15-7224

Fortran: ICE in gfc_conv_expr_present w. defined assignment [PR118640]

2025-01-27  Paul Thomas  

gcc/fortran
PR fortran/118640
* resolve.cc (generate_component_assignments): Make sure that
the rhs temporary does not pick up the optional attribute from
the lhs.

gcc/testsuite/
PR fortran/118640
* gfortran.dg/pr118640.f90: New test.

Paul

[PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends

2025-01-27 Thread Richard Biener

The following fixes an issue in the RTL combiner where we correctly
combine two vector sign-exxtends with a vector load

Trying 7, 9 -> 10:
7: r106:V4QI=[r119:DI]
  REG_DEAD r119:DI
9: r108:V4HI=sign_extend(vec_select(r106:V4QI#0,parallel))
   10: r109:V4SI=sign_extend(vec_select(r108:V4HI#0,parallel))
  REG_DEAD r108:V4HI

to

modifying insn i2 9: r109:V4SI=sign_extend([r119:DI])

but since r106 is used we wrongly materialize it using a subreg:

modifying insn i310: r106:V4QI=r109:V4SI#0

which of course does not work for modes with more than one component.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Note the check allows subreg:V1QI reg:V1SI (which I think is OK).
There's no SCALAR_MODE_P, maybe the other checks guarantee it's
an integer mode so eventually SCALAR_INT_MODE_P covers everything
important (it wouldn't cover V1QI, not that that's important).

OK?  Or do you prefer a different check - which?

Thanks,
Richard.

PR rtl-optimization/118662
* combine.cc (try_combine): When re-materializing a load
from an extended reg by a lowpart subreg make sure we're
dealing with single-component modes.

* gcc.dg/torture/pr118662.c: New testcase.
---
 gcc/combine.cc  |  5 +
 gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a2d4387cebe..4849603ba5e 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
  copy.  This saves at least one insn, more if register allocation can
  eliminate the copy.
 
+ We cannot do this if the involved modes have more than one elements,
+ like for vector or complex modes.
+
  We cannot do this if the destination of the first assignment is a
  condition code register.  We eliminate this case by making sure
  the SET_DEST and SET_SRC have the same mode.
@@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND
   && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))
   == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0
+  && known_eq (GET_MODE_NUNITS
+ (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1)
   && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
   && rtx_equal_p (SET_SRC (XVECEXP (newpat, 0, 1)),
   XEXP (SET_SRC (XVECEXP (newpat, 0, 0)), 0))
diff --git a/gcc/testsuite/gcc.dg/torture/pr118662.c 
b/gcc/testsuite/gcc.dg/torture/pr118662.c
new file mode 100644
index 000..b9e8cca0aeb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118662.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */
+/* { dg-additional-options "-msse4" { target sse4_runtime} } */
+
+int __attribute__((noipa)) addup(signed char *num) {
+  int val = num[0] + num[1] + num[2] + num[3];
+  if (num[3] >= 0)
+val++;
+  return val;
+}
+
+int main(int, char *[])
+{
+  signed char num[4] = {1, 1, 1, -1};
+  if (addup(num) != 2)
+__builtin_abort();
+  return 0;
+}
-- 
2.43.0

[PATCH/GCC16 0/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson

This patch allows the AArch64 back end to emit the half-precision variants of
FCMP and FCMPE, given the target supports FEAT_FP16. Previously, such 
comparisons
would be unnecessarily promoted to single-precision.

The latest documentation of these instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2024-12

Successfully bootstrapped and regtested on aarch64-linux-gnu.

OK for stage 1?

Spencer Abson (1):
  AArch64: Emit half-precision FCMP/FCMPE

 gcc/config/aarch64/aarch64.md | 29 +-
 .../gcc.target/aarch64/_Float16_cmp_1.c   | 54 +++
 .../gcc.target/aarch64/_Float16_cmp_2.c   |  7 +++
 3 files changed, 77 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c

-- 
2.34.1

[PATCH/GCC16 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson

Enable a target with FEAT_FP16 to emit the half-precision variants
of FCMP/FCMPE.

gcc/ChangeLog:

* config/aarch64/aarch64.md: Update cbranch, cstore, fcmp
and fcmpe to use the GPF_F16 iterator for floating-point
modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/_Float16_cmp_1.c: New test.
* gcc.target/aarch64/_Float16_cmp_2.c: New (negative) test.
---
 gcc/config/aarch64/aarch64.md | 29 +-
 .../gcc.target/aarch64/_Float16_cmp_1.c   | 54 +++
 .../gcc.target/aarch64/_Float16_cmp_2.c   |  7 +++
 3 files changed, 77 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 071058dbeb3..8721bf5d4f3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -707,11 +707,12 @@
 )
 
 (define_expand "cbranch4"
-  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
-   [(match_operand:GPF 1 "register_operand")
-(match_operand:GPF 2 
"aarch64_fp_compare_operand")])
-  (label_ref (match_operand 3 "" ""))
-  (pc)))]
+  [(set (pc) (if_then_else
+   (match_operator 0 "aarch64_comparison_operator"
+[(match_operand:GPF_F16 1 "register_operand")
+ (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
+   (label_ref (match_operand 3 "" ""))
+   (pc)))]
   ""
   "
   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
@@ -4338,26 +4339,28 @@
 
 (define_insn "fcmp"
   [(set (reg:CCFP CC_REGNUM)
-(compare:CCFP (match_operand:GPF 0 "register_operand")
- (match_operand:GPF 1 "aarch64_fp_compare_operand")))]
+   (compare:CCFP
+   (match_operand:GPF_F16 0 "register_operand")
+   (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))]
"TARGET_FLOAT"
{@ [ cons: 0 , 1  ]
   [ w   , Y  ] fcmp\t%0, #0.0
   [ w   , w  ] fcmp\t%0, %1
   }
-  [(set_attr "type" "fcmp")]
+  [(set_attr "type" "fcmp")]
 )
 
 (define_insn "fcmpe"
   [(set (reg:CCFPE CC_REGNUM)
-(compare:CCFPE (match_operand:GPF 0 "register_operand")
-  (match_operand:GPF 1 "aarch64_fp_compare_operand")))]
+   (compare:CCFPE
+   (match_operand:GPF_F16 0 "register_operand")
+   (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))]
"TARGET_FLOAT"
{@ [ cons: 0 , 1  ]
   [ w   , Y  ] fcmpe\t%0, #0.0
   [ w   , w  ] fcmpe\t%0, %1
   }
-  [(set_attr "type" "fcmp")]
+  [(set_attr "type" "fcmp")]
 )
 
 (define_insn "*cmp_swp__reg"
@@ -4425,8 +4428,8 @@
 (define_expand "cstore4"
   [(set (match_operand:SI 0 "register_operand")
(match_operator:SI 1 "aarch64_comparison_operator_mode"
-[(match_operand:GPF 2 "register_operand")
- (match_operand:GPF 3 "aarch64_fp_compare_operand")]))]
+[(match_operand:GPF_F16 2 "register_operand")
+ (match_operand:GPF_F16 3 "aarch64_fp_compare_operand")]))]
   ""
   "
   operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
diff --git a/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c 
b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c
new file mode 100644
index 000..e49ace1d7dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+fp16" } */
+
+/*
+** test_fcmp_store:
+** fcmph0, h1
+** csetw0, eq
+** ret
+*/
+int
+test_fcmp_store(_Float16 a, _Float16 b)
+{
+return a == b;
+}
+
+/*
+** test_fcmpe_store:
+** fcmpe   h0, h1
+** csetw0, mi
+** ret
+*/
+int
+test_fcmpe_store(_Float16 a, _Float16 b)
+{
+return a < b;
+}
+
+/*
+** test_fcmp_branch:
+** fcmph0, h1
+** ...
+*/
+_Float16
+test_fcmp_branch(_Float16 a, _Float16 b)
+{
+if (a == b)
+return a * b;
+return a;
+}
+
+/*
+** test_fcmpe_branch:
+** fcmpe   h0, h1
+** ...
+*/
+_Float16
+test_fcmpe_branch(_Float16 a, _Float16 b)
+{
+if (a < b)
+return a * b;
+return a;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c 
b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c
new file mode 100644
index 000..e714304970b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+nofp16" } */
+
+#include "_Float16_cmp_1.c"
+
+/* { dg-final { scan-assembler-not "\tfcmp\th\[0-9\]\+" } } */
+/* { dg-final { scan-assembler-not "\tfcmpe\th\[0-9\]\+" } } */
\ No newline at end of file
--

Re: [RFC PATCH] i386: Re-alias -mavx10.2 to 512 bit and make -mno-avx10.x-512 disable the whole AVX10.x

2025-01-27 Thread Richard Biener

On Mon, Jan 27, 2025 at 8:30 AM Haochen Jiang  wrote:
>
> Hi all,
>
> AVX10 has been published for one and half year and we have got many feedbacks
> on that, one of the feedback is on whether the alias option -mavx10.x should
> point to 256 or 512.
>
> If you also pay attention to LLVM community, you might see this thread related
> to AVX10 options just sent out several hours ago:
>
> [X86][AVX10] Disable m[no-]avx10.1 and switch m[no-]avx10.2 to alias of 512 
> bit options
> https://github.com/llvm/llvm-project/pull/124511
>
> In GCC, we will also do so. This RFC patch is slightly different with LLVM, 
> just
> including:
>
>   - Switch -m[no-]avx10.2 to alias of 512 bit options.
>   - Change -mno-avx10.[1,2]-512 to disable both 256 and 512 instructions. This
>   will also result in -mno-avx10.2 would still disable both 256 and 512 insts
>   according to new alias point to 512.
>
> But not including disabling -m[no-]avx10.1, since I still want more input on
> how to handle that. We actually have three choices on that:
>
>  a. Directly re-alias -m[no-]avx10.1 to -m[no-]avx10.1-512 GCC 15 and backport
>  to GCC 14.
>  b. Disable -m[no]-avx10.1 in GCC 15, and add it back with -m[no-]avx10.1-512
>  in the future. This is for in case if someone cross compile with different 
> versions
>  of GCC with -mavx10.1, it might get unexpected result sliently.
>  c. Disable -m[no]-avx10.1 in GCC 15, and never add it back. Since the option 
> has
>  been 256 bit, changing them back and forth is messy.
>
> It might be the final chance we could change the alias option since real
> AVX10.1 hardware is coming soon. And it is only x86 specific, so it might 
> still
> squeeze into GCC 15 at this time.
>
> I call this patch RFC patch since we also need to change the doc and testcases
> accordingly, which makes this patch incomplete. Discussion and input is 
> welcomed
> on this topic.

Can you re-hash on how users need to select 256bit vs 512bit support?
I understand
the above change basically makes -m[no-]avx10.[12] gate ISA features
but not size?
So that will now enable 512bit unless -mno-evex512 is given?

Given for the -mavx10.[12]-{256,512} the behavior changes compared to GCC 14 I'd
rather drop the options that behave differently from GCC 14 on GCC 15
than changing
their meaning.  That unfortunately will make them a hard error (but I
don't expect much
use).  I'm not sure it's worth retaining -m[no-]avx10.[12]-512.

But maybe I'm misunderstanding the change (too many avx10.x related
options are there).

Richard.

>
> Thx,
> Haochen
>
> ---
>  gcc/common/config/i386/i386-common.cc | 30 +--
>  gcc/common/config/i386/i386-isas.h|  2 +-
>  gcc/config/i386/i386-options.cc   |  2 +-
>  gcc/config/i386/i386.opt  |  4 ++--
>  gcc/doc/extend.texi   |  8 ---
>  gcc/doc/sourcebuild.texi  |  4 ++--
>  6 files changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 52ad1c5acd1..3891fca8ecb 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -325,14 +325,12 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_APX_F_UNSET OPTION_MASK_ISA2_APX_F
>  #define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
>  #define OPTION_MASK_ISA2_USER_MSR_UNSET OPTION_MASK_ISA2_USER_MSR
> -#define OPTION_MASK_ISA2_AVX10_1_256_UNSET \
> -  (OPTION_MASK_ISA2_AVX10_1_256 | OPTION_MASK_ISA2_AVX10_1_512_UNSET \
> -   | OPTION_MASK_ISA2_AVX10_2_256_UNSET)
> -#define OPTION_MASK_ISA2_AVX10_1_512_UNSET \
> -  (OPTION_MASK_ISA2_AVX10_1_512 | OPTION_MASK_ISA2_AVX10_2_512_UNSET)
> -#define OPTION_MASK_ISA2_AVX10_2_256_UNSET OPTION_MASK_ISA2_AVX10_2_256
> -#define OPTION_MASK_ISA2_AVX10_2_512_UNSET \
> -  (OPTION_MASK_ISA2_AVX10_2_512 | OPTION_MASK_ISA2_AMX_AVX512_UNSET)
> +#define OPTION_MASK_ISA2_AVX10_1_UNSET \
> +  (OPTION_MASK_ISA2_AVX10_1_256 | OPTION_MASK_ISA2_AVX10_1_512 \
> +   | OPTION_MASK_ISA2_AVX10_2_UNSET)
> +#define OPTION_MASK_ISA2_AVX10_2_UNSET \
> +  (OPTION_MASK_ISA2_AVX10_2_256 | OPTION_MASK_ISA2_AVX10_2_512 \
> +   OPTION_MASK_ISA2_AMX_AVX512_UNSET)
>  #define OPTION_MASK_ISA2_AMX_AVX512_UNSET OPTION_MASK_ISA2_AMX_AVX512
>  #define OPTION_MASK_ISA2_AMX_TF32_UNSET OPTION_MASK_ISA2_AMX_TF32
>  #define OPTION_MASK_ISA2_AMX_TRANSPOSE_UNSET OPTION_MASK_ISA2_AMX_TRANSPOSE
> @@ -1378,8 +1376,8 @@ ix86_handle_option (struct gcc_options *opts,
> }
>else
> {
> - opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_256_UNSET;
> - opts->x_ix86_isa_flags2_explicit |= 
> OPTION_MASK_ISA2_AVX10_1_256_UNSET;
> + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_UNSET;
> + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_UNSET;
>   opts->x_ix86_no_avx10_1_explicit = 1;
> }
>return true;
> @@ -1394,8 +1392,8 @@ ix86

Re: [PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends

2025-01-27 Thread Richard Biener

On Mon, 27 Jan 2025, Jakub Jelinek wrote:

> On Mon, Jan 27, 2025 at 11:09:38AM +0100, Richard Biener wrote:
> > PR rtl-optimization/118662
> > * combine.cc (try_combine): When re-materializing a load
> > from an extended reg by a lowpart subreg make sure we're
> > dealing with single-component modes.
> > 
> > * gcc.dg/torture/pr118662.c: New testcase.
> > ---
> >  gcc/combine.cc  |  5 +
> >  gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++
> >  2 files changed, 23 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c
> > 
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > index a2d4387cebe..4849603ba5e 100644
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
> > *i1, rtx_insn *i0,
> >   copy.  This saves at least one insn, more if register allocation can
> >   eliminate the copy.
> >  
> > + We cannot do this if the involved modes have more than one elements,
> > + like for vector or complex modes.
> > +
> >   We cannot do this if the destination of the first assignment is a
> >   condition code register.  We eliminate this case by making sure
> >   the SET_DEST and SET_SRC have the same mode.
> > @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
> > *i1, rtx_insn *i0,
> >&& GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND
> >&& (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))
> >== GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0
> > +  && known_eq (GET_MODE_NUNITS
> > + (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1)
> 
> If you want to rule this out for vector/complex modes, then the above
> doesn't do that, all the V1??mode will still be optimized, and I think
> that is undesirable.

If you think so (I think it should work fine for V1??mode).

> So, either && !VECTOR_MODE_P (GET_MODE (...))
>  && !COMPLEX_MODE_P (GET_MODE (...))
> or perhaps as less expensive but less readable check
>  && GET_MODE_INNER (GET_MODE (...)) == GET_MODE (...)

Had the latter before figuring the GET_MODE_NUNITS variant.  I do
prefer a "positive" check, rtl.texi says sign_extend applies to
all fixed-point modes but ALL_SCALAR_FIXED_POINT_MODE_P
doesn't include regular MODE_INT.

I guess I'll go with !VECTOR_MODE_P && !COMPLEX_MODE_P, while
SCALAR_INT_MODE_P looks safe it might cause regressions.

Thanks,
Richard.

> with a comment.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] c++: Handle CWG2867 even in namespace scope structured bindings in header modules [PR115769]

2025-01-27 Thread Jason Merrill


On 1/21/25 10:52 AM, Jakub Jelinek wrote:

Hi!

On top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662750.html
patches (where the first one implements CWG2867 for block scope static
or thread_local structured bindings and the latter for namespace scope
structured bindings; CWG2867 for automatic structured bindings is
already committed in r15-3513) the following patch implements the module
streaming of the new STATIC_INIT_DECOMP_BASE_P and
STATIC_INIT_DECOMP_NONBASE_P flags.  As I think namespace scope structured
bindings in the header modules will be pretty rare, I've tried to stream
something extra only when they actually appear, in that case it streams
extra INTEGER_CSTs which mark end of STATIC_INIT_DECOMP_*BASE_P (0),
start of STATIC_INIT_DECOMP_BASE_P for static_aggregates (1), start of
STATIC_INIT_DECOMP_NONBASE_P for static_aggregates (2) and ditto for
tls_aggregates (3 and 4).
The patch also copies with just small tweaks the testcases from the
second patch above as header modules.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-21  Jakub Jelinek  

PR c++/115769
gcc/cp/
* module.cc (module_state::write_inits): Verify
STATIC_INIT_DECOMP_{,NON}BASE_P flags and stream changes in those
out.
(module_state::read_inits): Stream those flags in.
gcc/testsuite/
* g++.dg/modules/dr2867-1_a.H: New test.
* g++.dg/modules/dr2867-1_b.C: New test.
* g++.dg/modules/dr2867-2_a.H: New test.
* g++.dg/modules/dr2867-2_b.C: New test.
* g++.dg/modules/dr2867-3_a.H: New test.
* g++.dg/modules/dr2867-3_b.C: New test.
* g++.dg/modules/dr2867-4_a.H: New test.
* g++.dg/modules/dr2867-4_b.C: New test.

--- gcc/cp/module.cc.jj 2025-01-21 09:04:18.085457077 +0100
+++ gcc/cp/module.cc2025-01-21 13:31:40.670938455 +0100
@@ -18723,6 +18723,65 @@ module_state::write_inits (elf_out *to,
for (tree init = list; init; init = TREE_CHAIN (init))
if (TREE_LANG_FLAG_0 (init))
  {
+   if (STATIC_INIT_DECOMP_BASE_P (init))
+ {
+   /* Ensure that in the returned result chain if the
+  STATIC_INIT_DECOMP_*BASE_P flags are set, there is
+  always one or more STATIC_INIT_DECOMP_BASE_P TREE_LIST
+  followed by one or more STATIC_INIT_DECOMP_NONBASE_P.  */
+   int phase = 0;
+   tree last = NULL_TREE;
+   for (tree init2 = TREE_CHAIN (init);
+init2; init2 = TREE_CHAIN (init2))
+ {
+   if (phase == 0 && STATIC_INIT_DECOMP_BASE_P (init2))
+ ;
+   else if (phase == 0
+&& STATIC_INIT_DECOMP_NONBASE_P (init2))
+ {
+   phase = TREE_LANG_FLAG_0 (init2) ? 2 : 1;
+   last = init2;
+ }
+   else if (IN_RANGE (phase, 1, 2)
+&& STATIC_INIT_DECOMP_NONBASE_P (init2))
+ {
+   if (TREE_LANG_FLAG_0 (init2))
+ phase = 2;
+   last = init2;
+ }
+   else
+ break;
+ }
+   if (phase == 2)
+ {
+   /* In that case, add markers about it so that the
+  STATIC_INIT_DECOMP_BASE_P and
+  STATIC_INIT_DECOMP_NONBASE_P flags can be restored.  */
+   sec.tree_node (build_int_cst (integer_type_node,
+ 2 * passes + 1));
+   phase = 1;
+   for (tree init2 = init; init2 != TREE_CHAIN (last);
+init2 = TREE_CHAIN (init2))
+ if (TREE_LANG_FLAG_0 (init2))
+   {
+ tree decl = TREE_VALUE (init2);
+ if (phase == 1
+ && STATIC_INIT_DECOMP_NONBASE_P (init2))
+   {
+ sec.tree_node (build_int_cst (integer_type_node,
+   2 * passes + 2));
+ phase = 2;
+   }
+ dump ("Initializer:%u for %N", count, decl);
+ sec.tree_node (decl);
+ ++count;
+   }
+   sec.tree_node (integer_zero_node);
+   init = last;
+   continue;
+ }
+ }
+
tree decl = TREE_VALUE (init);
  
  	dump ("Initializer:%u for %N", count, decl);

@@ -18793,16 +18852,43 @@ module_state::read_inits (unsigned count
dump.indent ();
  
l

[PATCH v10] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2025-01-27 Thread Simon Martin

Hi Jason,

On 17 Jan 2025, at 23:33, Jason Merrill wrote:

> On 1/17/25 9:52 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 16 Jan 2025, at 22:49, Jason Merrill wrote:
>>
>>> On 10/16/24 11:43 AM, Simon Martin wrote:
 As you know the patch had to be reverted due to PR117114, that
 highlighted a bunch of issues with comparing DECL_VINDEXes: it 
 might
 give false positives in case of multiple inheritance (the case in
 PR117114), but also if there’s single inheritance by the 

 hierarchy
 has
 more than two levels (another issue I found while bootstrapping 
 with
 rust enabled).
>>>
>>> Yes, relying on DECL_VINDEX equality was wrong, sorry to mislead 
>>> you.
>>>
 The attached updated patch introduces an overrides_p function, 
 based
 on
 the existing check_final_overrider, and uses it when the signatures

 match.
>>>
>>> That seems unnecessary.  It seems like removing that only breaks
>>> Woverloaded-virt11.C, and making that work again only requires
>>> bringing back the check that DECL_VINDEX (fndecl) is set (to any
>>> value).  Or remembering that fndecl was a template, so it can't 
>>> really
>>> have the same signature as a non-template, whatever same_signature_p

>>> says.
>> That’s right, only Woverloaded-virt11.C fails without the
>> check_final_overrider call.
>>
>> Thanks for the suggestion to check whether fndecl is a template. This 
>> is
>> what the updated attached patch does, successfully tested on
>> x86_64-pc-linux-gnu.
>>
>> OK for GCC 15? And if so, thoughts on backporting to release branches
>> (technically it’s a regression but it’s “just” an incorrect
>> warning fix, so probably not worth the risk)?
>
> Right, I wouldn't backport.
>
>> +if (warn_overloaded_virtual == 1
>> +&& overrider_fndecls.elements () == num_fns)
>> +  /* All the fns override a base virtual.  */
>> +  continue;
>
> This looks like the only use of the overrider_fndecls hash_set.  A 
> hash_set seems a bit overkill for checking whether everything in fns 
> is an overrider; keeping track of how many times the old any_override 
> was set should work just as well?
Yeah you’re right :-/ I’ve changed my latest patch to simply count 
overriders.

>> +/* fndecls hides base_fndecls[k].  */
>> +auto_vec &hiders =
>> +  hidden_base_fndecls.get_or_insert (base_fndecls[k]);
>> +if (!hiders.contains (fndecl))
>> +  hiders.safe_push (fndecl);
>
> Hmm, do you think users want a full list of the overloads that don't 
> override?  I'd think the problem is more the overload that doesn't 
> exist rather than the ones that do.  The current code ends up in the 
> OVERLOAD handling of dump_decl that just prints scope::name.
Indeed, the full list is probably not super useful... One problem with 
the current code is that for conversion operators, it will give a note 
such as “note:   by 'operator’”, so I propose to keep track of at 
least one of the hiders, and use it to show the note (and get a proper 
“by 'virtual B::operator char()'” note for conversion operators).

Hence the updated patch, successfully tested on x86_64-pc-linux-gnu. Ok 
for trunk?

Simon
From 29ee5ada1c2fe132891d09528b6e26aece16e6d1 Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Mon, 27 Jan 2025 14:17:19 +0100
Subject: [PATCH] c++: Fix overeager Woverloaded-virtual with conversion
 operators [PR109918]

We currently emit an incorrect -Woverloaded-virtual warning upon the
following test case

=== cut here ===
struct A {
  virtual operator int() { return 42; }
  virtual operator char() = 0;
};
struct B : public A {
  operator char() { return 'A'; }
};
=== cut here ===

The problem is that when iterating over ovl_range (fns), warn_hidden
gets confused by the conversion operator marker, concludes that
seen_non_override is true and therefore emits a warning for all
conversion operators in A that do not convert to char, even if
-Woverloaded-virtual is 1 (e.g. with -Wall, the case reported).

A second set of problems is highlighted when -Woverloaded-virtual is 2.

First, with the same test case, since base_fndecls contains all
conversion operators in A (except the one to char, that's been removed
when iterating over ovl_range (fns)), we emit a spurious warning for
the conversion operator to int, even though it's unrelated.

Second, in case there are several conversion operators with different
cv-qualifiers to the same type in A, we rightfully emit a warning,
however the note uses the location of the conversion operator marker
instead of the right one; location_of should go over conv_op_marker.

This patch fixes all these by explicitly keeping track of (1) base
methods that are overriden, as well as (2) base methods that are hidden
but not overriden (and by what), and warning about methods that are in
(2) but not (1). It also ignores non virtual base methods, per
"definition" of -Woverloaded-virtual.

Successfully te

Re: [PATCH] c++: Use mapped reads and writes when munmap and msync are available

2025-01-27 Thread Jason Merrill


On 1/16/25 3:15 PM, John David Anglin wrote:

Tested on hppa64-hp-hpux11.11 and hppa-unknown-linux-gnu.

Okay for trunk?


OK.


Dave
---

c++: Use mapped reads and writes when munmap and msync are available

Module support is broken when MAPPED_READING and MAPPED_WRITING
are defined to 0.  This causes internal compiler errors in the
permissive-error-1.C and permissive-error-2.C tests.

HP-UX 11.11 doesn't define _POSIX_MAPPED_FILES but it does have
munmap and msync.  Testing indicates support is sufficient for
c++ modules, so use checks for these functions instead of
_POSIX_MAPPED_FILES check.

2025-01-16  John David Anglin  

gcc/ChangeLog:

PR c++/116524
* configure.ac: Check for munmap and msync.
* configure: Regenerate.
* config.in: Regenerate.

gcc/cp/ChangeLog:
* module.cc: Test HAVE_MUNMAP and HAVE_MSYNC instead of
_POSIX_MAPPED_FILES > 0.

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 6c38c4925fb..8fab93c9365 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1574,7 +1574,7 @@ AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \
popen sysconf strsignal getrusage nl_langinfo \
gettimeofday mbstowcs wcswidth mmap posix_fallocate setlocale \
gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2 fstatat getauxval \
-   clock_gettime)
+   clock_gettime munmap msync)
  
  # At least for glibc, clock_gettime is in librt.  But don't pull that

  # in if it still doesn't give us the function we want.
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 61116fe7669..9b2bbdb2988 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -241,11 +241,11 @@ Classes used:
  #define MAPPED_READING 0
  #define MAPPED_WRITING 0
  #else
-#if HAVE_MMAP_FILE && _POSIX_MAPPED_FILES > 0
-/* mmap, munmap.  */
+#if HAVE_MMAP_FILE && HAVE_MUNMAP && HAVE_MSYNC
+/* mmap, munmap, msync.  */
  #define MAPPED_READING 1
  #if HAVE_SYSCONF && defined (_SC_PAGE_SIZE)
-/* msync, sysconf (_SC_PAGE_SIZE), ftruncate  */
+/* sysconf (_SC_PAGE_SIZE), ftruncate  */
  /* posix_fallocate used if available.  */
  #define MAPPED_WRITING 1
  #else

Re: [PATCH] gcc: Add tree walk case to reach A pack from B in ...B>. [PR118265]

2025-01-27 Thread Patrick Palka

On Wed, 1 Jan 2025, A J Ryan Solutions Ltd wrote:

> Hi and happy new year, this patch is to fix a compiler seg-fault as 
> encountered in the following example:

Hi, thanks for the patch!  Your fix makes sense to me, and I believe it
also fixes the testcases from PR102626 and at least some of its related
PRs.

> 
> template struct Class1{};
> 
> template class Class2;
> template...Un> class Class2
> {
>     public: void apply(){}
> };
> 
> Class1 class1_bool;
> Class1 class1_char;
> 
> int main()
> {
>     Class2 class2;
>     class2.apply();
> }

> where it ends up populating the V argument in the instantiated template at 
> the class2 declaration with the unknown placeholder and the segfault is when 
> it later tries to get the name for an incompletely
> resolved type.

Here's a more reduced example that's fixed by your patch:

  template struct A { };

  template
  void f(A...);

  int main() {
f(A<0>{}, A<1u>{}, A<2l>{}); // OK, Ts deduced to {int,unsigned,long}
  }

A related PR that isn't and shouldn't be fixed by this patch is PR84796
where the type parameter pack is declared at a different level than
the non-type parameter pack, e.g.:

  template
  struct A { };

  template
  struct B {
template
static void f(A);
  };

  int main() {
A<1, true, 'c'> a;
B::f(a);
  }

Before your patch we would ICE on this valid testcase, after we now reject it
with a bogus error:

: In function ‘int main()’:
:12:24: error: no matching function for call to ‘B::f(A<1, true, 'c'>&)’
:12:24: note: there is 1 candidate
:7:15: note: candidate 1: ‘template static void B::f(A) [with Ts ...Vs = {Vs ...}; Ts = {int, bool, char}]’
:7:15: note: template argument deduction/substitution failed:
:12:24: note:   ‘Vs’ is not equivalent to ‘1’

Which is no worse than ICEing, I suppose :)

> 
> The incorrect processing happens in unify_pack_expansion where it is calling 
> unify_one_argument
> here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l24718)
>  for each argument of the argument pack and recursively unifying the inner 
> arguments. In
> this example the inner argument against V (bool/char) is set in targs however 
> it is not seen as a pack argument in unify_pack_expansion because in the 
> unfixed code it does not reach and link this associated
> pack and later on when processing the collected arguments to substitute it in 
> in instantiate_class_template the argument has not been set as an argument 
> pack in targs and it doesn't match the the template
> specialisation parameter which is a parameter pack and it falls back to the 
> unknown placeholder.
> The parameter packs are originally linked in make_pack_expansion 
> here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l4246)
>  and the change is in
> this tree walk so that in the example above V is chained to Un and after 
> calling unify_one_argument it loops over both packs and extracts their 
> associated value set in targs and puts it into it's pack
> argument collection.
> 
> Regards
> Adam Ryan

> 
> 
> Subject: [PATCH] Add tree walk case to obtain A pack in
>  ...B>.
> 
> For non-type parameter packs when unifying the arguments in 
> unify_pack_expansion
> it iterates over the associated packs of a param so that when it recursivly
> unifies the param with the arguments it knows which targs have been populated
> with parameter pack arguments that it can then collect up. This change adds a
> tree walk so that in the example above it reaches ...A and adds it to the
> associated packs for ...B and therefore knows it will have been set in targs
> in unify_pack_expansion and processes it as per other pack arguments.
> 
> PR gcc/118265

PR c++/118265 rather

A ChangeLog entry is missing from the commit message as per
https://gcc.gnu.org/contribute.html#patches

Here's an example ChangeLog we can use for this patch:

gcc/cp/ChangeLog:

* pt.cc (find_parameter_packs_r) :
Walk into the type of a parameter pack.

> 
> Signed-off-by: Adam J Ryan 
> ---
>  gcc/cp/pt.cc|  5 +
>  gcc/testsuite/g++.dg/pr118265.C | 17 +
>  2 files changed, 22 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/pr118265.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 0ffa0b53e26..22f5d4b1875 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -4012,6 +4012,11 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
> void* data)
>   &find_parameter_packs_r, ppd, ppd->visited);
>return NULL_TREE;
>  
> +case TEMPLATE_PARM_INDEX:
> +  if (parameter_pack_p)
> +WALK_SUBTREE ( TREE_TYPE(t) );

There should be a space before the ( starting a function call not
after (as per the GNU coding style), so this should be
"WALK_SUBTREE (TREE_TYPE (t));"

> +  return NULL_TREE;
> +
>  case DECL_EXPR:
>{
>   tree decl = DECL_EXPR_DECL (t);
> dif

Re: [PATCH] RISC-V: ensure needed FRM restore is not eliminable [PR118646]

2025-01-27 Thread Jeff Law





On 1/27/25 12:48 PM, Vineet Gupta wrote:


On 1/26/25 05:29, Jeff Law wrote:


On 1/24/25 3:12 PM, Vineet Gupta wrote:

RV-Vector FP-INT insns use the rounding mode in FRM register which if
explicitly set for V insn needs, is saved/restored (although from the
psABI CC Spec, it is not clear if it actually a caller-saved or
callee-saved).

Anyhow in the failure case the save/restore were generated by the
Mode Switch pass, but then eliminated by sched1:DCE and Late-Combine.
Fix this by using unspec_volatile variant which won't be eliminated.

This showed up as SPEC2017 527.cam4 runtime aborts in glibc:round_away()
which checks for standard rounding modes and the "leaking" rounding mode
due to the bug happened to be a non-standard RISC-V specific RMM
"Round to Nearest, ties to Max".

This is testsuite clean:

Not sure how it could be clean as I think the test itself is busted ;-)


Sorry for the snafu; I would blame "send the patch out on friday" that bit me.
I used the raw test for debugging and coming up with the fix, but before
integrating the test properly I ran the existing testsuite w/ my patch and made
sure it didn't regress anything, so those test results are for real minus the
new test.
It happens.  I had a goof of similar nature trying to get something out 
the door last week as well :-)









What I wanted to do was use your testcase with Pan's patch to see if
Pan's patch resolved both issues.


Yes it does.



Your compiler patch may still be desirable as well, I really haven't
really evaluated that yet.


FWIW with Pan's rework the need for volatile goes away anyways.

The fixed up test (actually tested for A/B pass/fail) which can be integrated in
Pan's commit can be found below
If you could go ahead and install the test into the testsuite, it'd be 
appreciated.


Thanks!

jeff

Re: [PATCH v2 3/7] Remove m_total_lines support from input cache

2025-01-27 Thread David Malcolm

On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote:
> From: Andi Kleen 
> 
> With the new cache maintenance algorithm we don't need the
> maximum number of lines anymore. Remove all the code for that.
> 
> gcc/ChangeLog:
> 
>   PR preprocessor/118168
>   * input.cc (total_lines_num): Remove.
>   (file_cache_slot::evict): Dito.
>   (file_cache_slot::create): Dito.
>   (file_cache_slot::set_content): Dito.
>   (file_cache_slot::file_cache_slot): Dito.
>   (file_cache_slot::dump): Dito.

Thanks, this is OK for trunk once patch 2 goes in - though it's
"Ditto", not "Dito" :)

Dave

Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]

2025-01-27 Thread Simon Martin

Hi Jason,

On 27 Jan 2025, at 20:53, Jason Merrill wrote:

> On 1/27/25 2:34 PM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 27 Jan 2025, at 15:23, Jason Merrill wrote:
>>
>>> On 1/27/25 8:14 AM, Simon Martin wrote:
 Hi Jason,

 On 24 Jan 2025, at 16:51, Jason Merrill wrote:

> On 1/24/25 6:19 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 23 Jan 2025, at 22:56, Jason Merrill wrote:
>>
>>> On 1/23/25 12:06 PM, Simon Martin wrote:
 Hi Marek,

 On 23 Jan 2025, at 16:45, Marek Polacek wrote:

> On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote:
>> Hi Jason,
>>
>> On 20 Jan 2025, at 22:49, Jason Merrill wrote:
>>
>>> On 1/20/25 2:11 PM, Simon Martin wrote:
 Hi Jason,

 On 15 Jan 2025, at 22:42, Jason Merrill wrote:

> On 12/12/24 2:07 PM, Simon Martin wrote:
>> We currently ICE upon the following valid (under 
>> -Wno-vla)

>> code
>>
>>
>> === cut here ===
>> void f(int c) {
>> constexpr int r = 4;
>> [&](auto) { int t[r * c]; }(0);
>> }
>> === cut here ===
>>
>> The problem is that when parsing the lambda body, and 
>> more
>> specifically
>> the multiplication, we mark the lambda as
>> LAMBDA_EXPR_CAPTURE_OPTIMIZED
>> even though the replacement of r by 4 is "undone" by the
>> call

>> to
>> build_min_non_dep in build_x_binary_op. This makes
>> prune_lambda_captures
>> remove the proxy declaration while it should not, and we
>> trip

>> on
>> an
>>
>> assert at instantiation time.
>
> Why does prune_lambda_captures remove the declaration if
> it's
> still
> used in the function body?  Setting
> LAMBDA_EXPR_CAPTURE_OPTIMIZED
> just
> means "we might have optimized away some captures", the 
> tree
> walk
> should have found the use put back by build_x_binary_op.
 I think that this is due to a bug in mark_const_cap_r,
 that’s
 been
 here since the beginning, i.e. r8-7213-g1577f10a637352: it
 decides
>>
 NOT
 to walk sub-trees when encountering a DECL_EXPR for a
 constant
 capture
 proxy (at lambda.cc:1769). I don’t understand why we

 wouldn’t
 want
 to continue.
>>>
>>> Because that's the declaration of the capture proxy, not a 
>>> use
>>> of
>>> it.
>> Indeed, thanks for clarifying.
>>
>>> Why aren't we finding the use in the declaration of t?
>> After further investigation, the problem is rather that 
>> neither
>> walk_tree_1 nor cp_walk_subtrees walk the dimensions of array
>> types,
>> so
>> we miss the uses.
>>
 Removing that line fixes the PR, but breaks 3 existing 
 tests
 (lambda-capture1.C, lambda-const11.C and lambda-const11a.C,
 that
 all
 assert that we optimise out the capture); that’s why I 
 did
 not
 pursue
 it in the first place.
>>>
>>> Right.
>>>
 But taking another look, it might not be that big a deal 
 that
 we
 don’t
 optimise those out: as soon as we use -O1 or above, the
 assignment
 to
>>
 the closure field actually disappears.
>>>
>>> Completely breaking this optimization is a big deal,
>>> particularly
>>> since it affects the layout of closure objects.  We can't
>>> always

>>> optimize everything away.
>> ACK.
>>
>> The attached updated patch makes cp_walk_subtrees walk array
>> type

>> dimensions, which fixes the initial PR without those 3
>> regressions.
>>
>>
>> Successfully tested on x86_64-pc-linux-gnu. Is it OK?
>>
>> Simon
>>
>> PS: In theory I think it’d make sense to do do this change 
>> in
>>
>> walk_tree_1 since C also supports VLAs, but doing so breaks
>> some
>>
>> OMP
>> tests. OMP can do interesting things with array bounds (see
>> r9-5354-gda972c05f48637), and fixing those would require
>> teaching

>> walk_tree_1 about the “omp dummy var” arra

Re: [PATCH v2 2/7] Rebalance file_cache input line cache dynamically

2025-01-27 Thread David Malcolm

On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote:
> From: Andi Kleen 
> 
> The input context file_cache maintains an array of anchors
> to speed up accessing lines before the previous line.
> The array has a fixed upper size and the algorithm relies
> on the linemap reporting the maximum number of lines in the file
> in advance to compute the position of each anchor in the cache.
> 
> This doesn't work for C which doesn't know the maximum number
> of lines before the files has finished parsing. The code
> has a fallback for this, but it is quite inefficient and
> effectively defeats the cache, so many accesses have to
> go through most of the input buffer to compute line
> boundaries. For large files this can be very costly
> as demonstrated in PR118168.
> 
> Use a different algorithm to maintain the cache without
> needing the maximum number of lines in advance. When the cache
> runs out of entries and the gap to the last line anchor gets
> too large, prune every second entry in the cache. This maintains
> even spacing of the line anchors without requiring the maximum
> index.
> 
> For the original PR this moves the overhead of enabling
> -Wmisleading-indentation to 32% with the default cache size.
> With a 10k entry cache it becomes noise.
> 
>   cc1 -O0 -fsyntax-only mypy.c   -quiet  ran
>     1.03 ± 0.05 times faster than cc1 -O0 -fsyntax-only  mypy.c   -
> quiet -Wmisleading-indentation --param=file-cache-lines=1
>     1.09 ± 0.08 times faster than cc1 -O0 -fsyntax-only  mypy.c   -
> quiet -Wmisleading-indentation --param=file-cache-lines=1000
>     1.32 ± 0.07 times faster than cc1 -O0 -fsyntax-only  mypy.c   -
> quiet -Wmisleading-indentation
> 
> The code could be further optimized, e.g. use the vectorized
> line search functions the preprocessor uses.
> 
> Also it seems the input cache always reads the whole file into
> memory, so perhaps it should just be using file mmap if possible.
> 
> gcc/ChangeLog:
> 
>   PR preprocessor/118168
>   * input.cc (file_cache_slot::get_next_line): Use new
> algorithm
>   to maintain
>   (file_cache_slot::read_line_num): Use binary search for
> lookup.

Thanks; this is OK for trunk.

I spent some time stepping through this to get clear in my mind how the
new algorithm works.  FWIW I found the patch below helpful, to clarify
in dumps about the slot index versus the line number of the slot; I'll
commit this as a followup at some point:

Dave

diff --git a/gcc/input.cc b/gcc/input.cc
index 11dd39412ea1..2abd2cd8eb0f 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -663,10 +663,11 @@ file_cache_slot::dump (FILE *out, int indent) const
   indent, "", (int)m_missing_trailing_newline);
   fprintf (out, "%*sline records (%i):\n",
   indent, "", m_line_record.length ());
+  int idx = 0;
   for (auto &line : m_line_record)
-fprintf (out, "%*sline %zi: byte offsets: %zi-%zi\n",
+fprintf (out, "%*s[%i]: line %zi: byte offsets: %zi-%zi\n",
 indent + 2, "",
-line.line_num, line.start_pos, line.end_pos);
+idx++, line.line_num, line.start_pos, line.end_pos);
 }
 
 /* Returns TRUE iff the cache would need to be filled with data coming

Re: [PATCH v2 4/7] Add a cache of recent lines

2025-01-27 Thread David Malcolm

On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote:
> From: Andi Kleen 
> 
> For larger files the file_cache line index will be spread out to make
> the index fit into the fixed buffer, so any access to the non latest
> line
> will need some skipping of lines.
> 
> Most accesses for line are near the latest line because
> a diagnostic is likely near where the scanner is currently lexing.
> 
> Add a second cache for recent lines. It is organized as a ring buffer
> and maintains the last 256 lines relative to the last input line.
> 
> With that, enabling -Wmisleading-indentation for the test case in
> PR preprocessor/118168, is within the run-to-run variation.
> 
> gcc/ChangeLog:
> 
>   PR preprocessor/118168
>   * input.cc (file_cache::m_line_recent,
>   m_line_recent_first, m_line_recent_last): Add.
>   (file_cache_slot::evict): Clear new fields.
>   (file_cache_slot::create): Clear new fields.
>   (file_cache_slot::file_cache_slot): Initialize new fields.
>   (file_cache_slot::~file_cache_slot): Release m_line_recent.
>   (file_cache_slot::get_next_line): Maintain ring buffer of
> lines
>   in m_line_recent.
>   (file_cache_slot::read_line_num): Use m_line_recent to look
> up
>   recent lines quickly.
> ---
>  gcc/input.cc | 49 -
>  1 file changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index 82a58fdef27..b3fe38eb77c 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -126,6 +126,7 @@ public:
>  
>    static const size_t buffer_size = 4 * 1024;
>    static size_t line_record_size;
> +  static size_t recent_cached_lines_shift;
>  
>    /* The number of time this file has been accessed.  This is used
>   to designate which file cache to evict from the cache
> @@ -174,6 +175,13 @@ public:
>   this is scaled down dynamically, with the line_info becoming
> anchors.  */
>    vec m_line_record;
>  
> +  /* A cache of the recently seen lines. This is maintained as a
> ring
> + buffer. */
> +  vec m_line_recent;
> +
> +  /* First and last valid entry in m_line_recent.  */
> +  size_t m_line_recent_last, m_line_recent_first;
> +
>    void offset_buffer (int offset)
>    {
>  gcc_assert (offset < 0 ? m_alloc_offset + offset >= 0
> @@ -187,6 +195,7 @@ public:
>  };
>  
>  size_t file_cache_slot::line_record_size = 100;
> +size_t file_cache_slot::recent_cached_lines_shift = 8;
>  
>  /* Tune file_cache.  */
>  void
> @@ -391,6 +400,8 @@ file_cache_slot::evict ()
>    m_line_start_idx = 0;
>    m_line_num = 0;
>    m_line_record.truncate (0);
> +  m_line_recent_first = 0;
> +  m_line_recent_last = 0;
>    m_use_count = 0;
>    m_missing_trailing_newline = true;
>  }
> @@ -486,6 +497,8 @@ file_cache_slot::create (const
> file_cache::input_context &in_context,
>    m_nb_read = 0;
>    m_line_start_idx = 0;
>    m_line_num = 0;
> +  m_line_recent_first = 0;
> +  m_line_recent_last = 0;
>    m_line_record.truncate (0);
>    /* Ensure that this cache entry doesn't get evicted next time
>   add_file_to_cache_tab is called.  */
> @@ -592,9 +605,13 @@ file_cache::lookup_or_add_file (const char
> *file_path)
>  file_cache_slot::file_cache_slot ()
>  : m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0),
>    m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx
> (0),
> -  m_line_num (0), m_missing_trailing_newline (true)
> +  m_line_num (0), m_missing_trailing_newline (true),
> +  m_line_recent_last (0), m_line_recent_first (0)
>  {
>    m_line_record.create (0);
> +  m_line_recent.create (1U << recent_cached_lines_shift);
> +  for (int i = 0; i < 1 << recent_cached_lines_shift; i++)
> +    m_line_recent.quick_push (file_cache_slot::line_info (0, 0, 0));
>  }
>  
>  /* Destructor for a cache of file used by caret diagnostic.  */
> @@ -613,6 +630,7 @@ file_cache_slot::~file_cache_slot ()
>    m_data = 0;
>  }
>    m_line_record.release ();
> +  m_line_recent.release ();
>  }
>  
>  void
> @@ -865,6 +883,20 @@ file_cache_slot::get_next_line (char **line,
> ssize_t *line_len)
>      line_end - m_data));
>  }
>  
> +  /* Cache recent tail lines separately for fast access. This
> assumes
> + most accesses do not skip backwards.  */
> +  if (m_line_recent_last == m_line_recent_first
> + || m_line_recent[m_line_recent_last].line_num == m_line_num
> - 1)
> +    {
> +  size_t mask = ((size_t)1 << recent_cached_lines_shift) - 1;
> +  m_line_recent_last = (m_line_recent_last + 1) & mask;
> +  if (m_line_recent_last == m_line_recent_first)
> + m_line_recent_first = (m_line_recent_first + 1) & mask;
> +  m_line_recent[m_line_recent_last] =
> + file_cache_slot::line_info (m_line_num, m_line_start_idx,
> +     line_end - m_data);
> +    }
> +

If I reading this right, calls to get_next_line lead to insertions into
the ring buffer whilst the buffer is empty or the last

Re: [PATCH] Fortran: fix bogus diagnostics on renamed interface import [PR110993]

2025-01-27 Thread Harald Anlauf


Am 27.01.25 um 01:41 schrieb Jerry D:

On 1/26/25 2:07 PM, Harald Anlauf wrote:

Dear all,

in the checking of imported interfaces we need to use the local names
of procedures that are renamed-on-use, as the original name becomes
inaccessible.  Similarly, we should not compare interfaces of
non-bind(C) procedures against bind(C) interfaces that are not
explicitly made accessible via a use statement, see testcase.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Could this one be backportable, e.g. to 14-branch?

Thanks,
Harald



This is OK. Backport up to you.


Pushed to mainline as r15-7234-g9104472b645f76 .
Will backport to 14-branch in a couple of days.


Jerry


Thanks,
Harald

Re: [PATCH v2 5/7] Size input line cache based on file size

2025-01-27 Thread David Malcolm

On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote:
> From: Andi Kleen 
> 
> While the input line cache size now tunable it's better if the
> compiler
> auto tunes it. Otherwise large files needing random file access will
> still have to search many lines to find the right lines.
> 
> Add support for allocating one line anchor per hundred input lines.
> This means an overhead of ~235k per 1M input lines on 64bit, which
> seems reasonable.
> 
> gcc/ChangeLog:
> 
>   PR preprocessor/118168
>   * input.cc (file_cache_slot::get_next_line): Implement
>   dynamic sizing of m_line_record based on input length.
>   * params.opt: (param_file_cache_lines): Set to 0 to size
>   dynamically.
> ---
>  gcc/input.cc   | 11 ---
>  gcc/params.opt |  2 +-
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index b3fe38eb77c..d5d7dbb043e 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -194,7 +194,7 @@ public:
>  
>  };
>  
> -size_t file_cache_slot::line_record_size = 100;
> +size_t file_cache_slot::line_record_size = 0;
>  size_t file_cache_slot::recent_cached_lines_shift = 8;
>  
>  /* Tune file_cache.  */
> @@ -865,8 +865,13 @@ file_cache_slot::get_next_line (char **line,
> ssize_t *line_len)
>    size_t delta = rlen >= 1 ?
>   m_line_num - m_line_record[rlen - 1].line_num : 1;
>  
> +  size_t max_size = line_record_size;
> +  /* One anchor per hundred input lines.  */
> +  if (max_size == 0)
> + max_size = m_line_num / 100;
> +
>    /* If we're too far beyond drop half of the lines to
> rebalance.  */
> -  if (rlen == line_record_size && delta >= spacing*2)
> +  if (rlen == max_size && delta >= spacing*2)
>   {
>     size_t j = 0;
>     for (size_t i = 1; i < rlen; i += 2)
> @@ -876,7 +881,7 @@ file_cache_slot::get_next_line (char **line,
> ssize_t *line_len)
>     spacing *= 2;
>   }
>  
> -  if (rlen < line_record_size && delta >= spacing)
> +  if (rlen < max_size && delta >= spacing)
>   m_line_record.safe_push
>     (file_cache_slot::line_info (m_line_num,
>      m_line_start_idx,
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 5d234a607c0..a56c20af2f8 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -139,7 +139,7 @@ Common Joined UInteger
> Var(param_file_cache_files) Init(16) Param
>  Max number of files in the file cache.
>  
>  -param=file-cache-lines=
> -Common Joined UInteger Var(param_file_cache_lines) Init(100) Param
> +Common Joined UInteger Var(param_file_cache_lines) Init(0) Param
>  Max number of lines to index into file cache.
>  
>  -param=fsm-scale-path-stmts=

Please update the description line to indicate that 0 is automatic.

OK for trunk with that change, once the prerequisites are in.

Thanks
Dave

Re: [PATCH v2 6/7] Enable vectorization for input.cc find_end_of_line function

2025-01-27 Thread David Malcolm

On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote:
> From: Andi Kleen 
> 
> This is the hot function in input.cc
> 
> The vectorizer can vectorize it now, but in a generic cpu O2 x86
> build it isn't.
> Add a automatic target clone to handle it for x86 and build
> that function with O3.
> 
> The ifdef here is ugly, perhaps gcc should have a more convenient
> "clone for vectorization if possible" attribute to handle this
> portably.

This patch is very cool (no pun intended); how much does it help?

The patch is OK by me, but given that we're in stage 4, does a release
manager approve? [CCed]

Thanks
Dave

> 
> gcc/ChangeLog:
> 
>   * input.cc: (VECTORIZE): Add definition for x86.
>   (find_end_of_line): Mark for vectorizer.
> ---
>  gcc/input.cc | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index d5d7dbb043e..f1a15de66f1 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -740,11 +740,18 @@ file_cache_slot::maybe_read_data ()
>    return read_data ();
>  }
>  
> +#if defined(__x86_64__) && __GNUC__ >= 15
> +#define VECTORIZE
> __attribute__((target_clones("default,avx2,avx10.2,avx512f"),
> optimize("O3")))
> +#else
> +#define VECTORIZE
> +#endif
> +
>  /* Helper function for file_cache_slot::get_next_line (), to find
> the end of
>     the next line.  Returns with the memchr convention, i.e. nullptr
> if a line
>     terminator was not found.  We need to determine line endings in
> the same
>     manner that libcpp does: any of \n, \r\n, or \r is a line
> ending.  */
>  
> +VECTORIZE
>  static const char *
>  find_end_of_line (const char *s, size_t len)
>  {

[PATCH] c++: friend vs inherited guide confusion [PR117855]

2025-01-27 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/14?

-- >8 --

We recently started using the lang_decl_fn::context field to track
inheritedness of a deduction guide (for C++23 inherited CTAD).  This
new overloading of the field accidentally made DECL_FRIEND_CONTEXT
return non-NULL for inherited guides, which breaks the below testcase
overload resolution containing an inherited guide.

This patch fixes this by refining DECL_FRIEND_CONTEXT to return NULL
for deduction guides.

PR c++/117855

gcc/cp/ChangeLog:

* cp-tree.h (DECL_FRIEND_CONTEXT): Refine to exclude deduction
guides.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/class-deduction-inherited7.C: New test.
---
 gcc/cp/cp-tree.h |  3 ++-
 .../g++.dg/cpp23/class-deduction-inherited7.C| 12 
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c9128bd3e7b..dc527380921 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -3631,7 +3631,8 @@ struct GTY(()) lang_decl {
the DECL_FRIEND_CONTEXT for `f' will be `S'.  */
 #define DECL_FRIEND_CONTEXT(NODE)  \
   ((DECL_DECLARES_FUNCTION_P (NODE) && !DECL_VIRTUAL_P (NODE)  \
-&& !DECL_CONSTRUCTOR_P (NODE)) \
+&& !DECL_CONSTRUCTOR_P (NODE)  \
+&& (cxx_dialect < cxx23 || !deduction_guide_p (NODE))) \
? LANG_DECL_FN_CHECK (NODE)->context\
: NULL_TREE)
 
diff --git a/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C 
b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C
new file mode 100644
index 000..b1d5e89ad02
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C
@@ -0,0 +1,12 @@
+// PR c++/117855
+// { dg-do compile { target c++20 } }
+
+template  struct span { span(T&&);};
+template  span(T &&) -> span;
+template 
+struct this_span : span {
+  using span::span;
+};
+template  this_span(T &&) -> this_span;
+int vec;
+this_span a = vec;
-- 
2.48.1.91.g5f8f7081f7

Re: [PATCH] c++: friend vs inherited guide confusion [PR117855]

2025-01-27 Thread Patrick Palka

On Mon, 27 Jan 2025, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk/14?
> 
> -- >8 --
> 
> We recently started using the lang_decl_fn::context field to track
> inheritedness of a deduction guide (for C++23 inherited CTAD).  This
> new overloading of the field accidentally made DECL_FRIEND_CONTEXT
> return non-NULL for inherited guides, which breaks the below testcase
> overload resolution containing an inherited guide.
> 
> This patch fixes this by refining DECL_FRIEND_CONTEXT to return NULL
> for deduction guides.
> 
>   PR c++/117855
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (DECL_FRIEND_CONTEXT): Refine to exclude deduction
>   guides.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp23/class-deduction-inherited7.C: New test.
> ---
>  gcc/cp/cp-tree.h |  3 ++-
>  .../g++.dg/cpp23/class-deduction-inherited7.C| 12 
>  2 files changed, 14 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index c9128bd3e7b..dc527380921 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -3631,7 +3631,8 @@ struct GTY(()) lang_decl {
> the DECL_FRIEND_CONTEXT for `f' will be `S'.  */
>  #define DECL_FRIEND_CONTEXT(NODE)\
>((DECL_DECLARES_FUNCTION_P (NODE) && !DECL_VIRTUAL_P (NODE)\
> -&& !DECL_CONSTRUCTOR_P (NODE))   \
> +&& !DECL_CONSTRUCTOR_P (NODE)\
> +&& (cxx_dialect < cxx23 || !deduction_guide_p (NODE)))   \

N.B. I decided to check cxx_dialect here since deduction_guide_p seems
relatively expensive to call?  In another version of this patch I made
us track dguide-ness via a dedicated bit flag instead of the DECL_NAME
comparison to make the predicate essentially free (and potentially also
more robust) but I eventually shelved that as more stage 1 material.

> ? LANG_DECL_FN_CHECK (NODE)->context  \
> : NULL_TREE)
>  
> diff --git a/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C 
> b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C
> new file mode 100644
> index 000..b1d5e89ad02
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C
> @@ -0,0 +1,12 @@
> +// PR c++/117855
> +// { dg-do compile { target c++20 } }
> +
> +template  struct span { span(T&&);};
> +template  span(T &&) -> span;
> +template 
> +struct this_span : span {
> +  using span::span;
> +};
> +template  this_span(T &&) -> this_span;
> +int vec;
> +this_span a = vec;
> -- 
> 2.48.1.91.g5f8f7081f7
> 
>

Re: [PATCH v2 7/7] Add a unit test for random access in the file cache

2025-01-27 Thread David Malcolm

On Mon, 2025-01-27 at 11:25 -0500, David Malcolm wrote:
> On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote:
> > From: Andi Kleen 
> 
> Thanks for writing the test case!
> 
> Some nitpicks...
> 
> 
> > 
> > gcc/ChangeLog:
> > 
> > * input.cc (check_line): New.
> > (test_replacement): New function to test line caching.
> > (input_cc_tests): Call test_replacement
> > ---
> >  gcc/input.cc | 46 ++
> >  1 file changed, 46 insertions(+)
> > 
> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index f1a15de66f1..524d5d2b6d1 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> > @@ -2346,6 +2346,51 @@ test_make_location_nonpure_range_endpoints
> > (const line_table_case &case_)
> >    ASSERT_FALSE (IS_ADHOC_LOC (get_finish (not_aaa_eq_bbb)));
> >  }
> >  
> > +/* Verify reading of a specific line LINENUM in TMP, FC.  */
> > +
> > +static void check_line (temp_source_file &tmp, file_cache &fc, int
> > linenum)
> > +{
> > +  char_span line = fc.get_source_line (tmp.get_filename (),
> > linenum);
> > +  int n;
> > +  ASSERT_TRUE (sscanf (line.get_buffer (), "%d", &n) == 1);
> 
> Please add a comment around here something like:
> 
> /* Although get_buffer's result isn't guaranteed to be
>    null-terminated, it will be for every line in test_replacement. 
> */
> 
> or somesuch.

...or rather, sscanf will always hit a "\n" character before running
off the end of the buffer for the uses within test_replacement.

Dave

Re: [PING^2,PATCH] Add target hook to assemble a VAR_DECL

2025-01-27 Thread Richard Sandiford

Georg-Johann Lay  writes:
> Am 24.01.25 um 08:18 schrieb Richard Biener:
>> On Thu, Jan 23, 2025 at 4:53 PM Georg-Johann Lay  wrote:
>>>
>>> Am 23.01.25 um 14:58 schrieb Richard Biener:
 On Thu, Jan 23, 2025 at 2:23 PM Georg-Johann Lay  wrote:
>
> Hi, this is Ping #2 for a patch from 2024.
>
> It adds a new target hook that allows to output
> assembly code for a VAR_DECL in a custom way.
>
> The default action is an obvious  no-op,
> i.e. assemble_variable() behaves like before.

 I tried to understand the AVR part of the series - there I fail to see
 what exactly special handling "io" and friends requires and how
 that was made work with the TLS noswitch section.

 I do not think the sentence the middle-end doesn't allow custom
 NOSWITCH sections is correct (it would be a matter of exporting
 get_noswitch_section), but I also don't exactly see how
 "io" and friends require a NOSWITCH.
>>>
>>> A noswitch section is one way to go.
>>>
>>> However, there is no way to attach a noswitch section to a VAR_DECL.
>> 
>> Hmm, I don't know varasm.cc by heart either but there's the select_section
>> hook invoked by get_variable_section () which looks like it could do the
>> trick.
>
> No.
>
> That hook only returns a section *name*, not a section object.
> So you cannot return an unnamed section, which by definition, don't
> have a section name.

Perhaps I misunderstand what you mean, but select_section does return
a "section *".

On the point about -fdata-sections, I think avr's unique_section
should be a no-op for these variables.  That should ensure that
the compiler doesn't set DECL_SECTION_NAME itself.  (I suppose
that source-code attributes that set DECL_SECTION_NAME should be
diagnosed as incompatible with the io attribute.)

FWIW, I agree with Richard that a hook to bypass most of
assemble_variable doesn't feel like the right abstraction.

Thanks,
Richard

Re: [PATCH v10] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2025-01-27 Thread Jason Merrill


On 1/27/25 10:41 AM, Simon Martin wrote:

Hi Jason,

On 17 Jan 2025, at 23:33, Jason Merrill wrote:


On 1/17/25 9:52 AM, Simon Martin wrote:

Hi Jason,

On 16 Jan 2025, at 22:49, Jason Merrill wrote:


On 10/16/24 11:43 AM, Simon Martin wrote:

As you know the patch had to be reverted due to PR117114, that
highlighted a bunch of issues with comparing DECL_VINDEXes: it
might
give false positives in case of multiple inheritance (the case in
PR117114), but also if there’s single inheritance by the



hierarchy
has
more than two levels (another issue I found while bootstrapping
with
rust enabled).


Yes, relying on DECL_VINDEX equality was wrong, sorry to mislead
you.


The attached updated patch introduces an overrides_p function,
based
on
the existing check_final_overrider, and uses it when the signatures



match.


That seems unnecessary.  It seems like removing that only breaks
Woverloaded-virt11.C, and making that work again only requires
bringing back the check that DECL_VINDEX (fndecl) is set (to any
value).  Or remembering that fndecl was a template, so it can't
really
have the same signature as a non-template, whatever same_signature_p



says.

That’s right, only Woverloaded-virt11.C fails without the
check_final_overrider call.

Thanks for the suggestion to check whether fndecl is a template. This
is
what the updated attached patch does, successfully tested on
x86_64-pc-linux-gnu.

OK for GCC 15? And if so, thoughts on backporting to release branches
(technically it’s a regression but it’s “just” an incorrect
warning fix, so probably not worth the risk)?


Right, I wouldn't backport.


+   if (warn_overloaded_virtual == 1
+   && overrider_fndecls.elements () == num_fns)
+ /* All the fns override a base virtual.  */
+ continue;


This looks like the only use of the overrider_fndecls hash_set.  A
hash_set seems a bit overkill for checking whether everything in fns
is an overrider; keeping track of how many times the old any_override
was set should work just as well?

Yeah you’re right :-/ I’ve changed my latest patch to simply count
overriders.


+   /* fndecls hides base_fndecls[k].  */
+   auto_vec &hiders =
+ hidden_base_fndecls.get_or_insert (base_fndecls[k]);
+   if (!hiders.contains (fndecl))
+ hiders.safe_push (fndecl);


Hmm, do you think users want a full list of the overloads that don't
override?  I'd think the problem is more the overload that doesn't
exist rather than the ones that do.  The current code ends up in the
OVERLOAD handling of dump_decl that just prints scope::name.

Indeed, the full list is probably not super useful... One problem with
the current code is that for conversion operators, it will give a note
such as “note:   by 'operator’”, so I propose to keep track of at
least one of the hiders, and use it to show the note (and get a proper
“by 'virtual B::operator char()'” note for conversion operators).

Hence the updated patch, successfully tested on x86_64-pc-linux-gnu. Ok
for trunk?



+   else if (!template_p /* Template methods don't override.  */
+&& same_signature_p (fndecl, base_fndecls[k]))
+ {
+   overriden_base_fndecls.add (base_fndecls[k]);
+   ++num_overriders;
+ }


I'm concerned that this will increment num_overriders multiple times for 
a single fndecl if it overrides functions in multiple bases.


Jason

Re: [PING^2,PATCH] Add target hook to assemble a VAR_DECL

2025-01-27 Thread Georg-Johann Lay


Am 27.01.25 um 16:19 schrieb Richard Sandiford:

Georg-Johann Lay  writes:

Am 24.01.25 um 08:18 schrieb Richard Biener:

On Thu, Jan 23, 2025 at 4:53 PM Georg-Johann Lay  wrote:


Am 23.01.25 um 14:58 schrieb Richard Biener:

On Thu, Jan 23, 2025 at 2:23 PM Georg-Johann Lay  wrote:


Hi, this is Ping #2 for a patch from 2024.

It adds a new target hook that allows to output
assembly code for a VAR_DECL in a custom way.

The default action is an obvious  no-op,
i.e. assemble_variable() behaves like before.


I tried to understand the AVR part of the series - there I fail to see
what exactly special handling "io" and friends requires and how
that was made work with the TLS noswitch section.

I do not think the sentence the middle-end doesn't allow custom
NOSWITCH sections is correct (it would be a matter of exporting
get_noswitch_section), but I also don't exactly see how
"io" and friends require a NOSWITCH.


A noswitch section is one way to go.

However, there is no way to attach a noswitch section to a VAR_DECL.


Hmm, I don't know varasm.cc by heart either but there's the select_section
hook invoked by get_variable_section () which looks like it could do the
trick.


No.

That hook only returns a section *name*, not a section object.
So you cannot return an unnamed section, which by definition, don't
have a section name.


Perhaps I misunderstand what you mean, but select_section does return
a "section *".


Hi Richard,

returning a custom noswitch section in targetm.asm_out.select_section
does not work.

Reason is that varasm.cc::get_variable_section has its own ideas how
noswitch / unnamed sections should work and bypasses the result
from the target hook.

I tried to set DECL_COMMON (decl) = 0; and also
DECL_COMMON (decl) = 1; but neither of which does work.



On the point about -fdata-sections, I think avr's unique_section
should be a no-op for these variables.  That should ensure that
the compiler doesn't set DECL_SECTION_NAME itself.  (I suppose
that source-code attributes that set DECL_SECTION_NAME should be
diagnosed as incompatible with the io attribute.)


The avr backend rejects initializers for respective decls since
initializers don't make sense for the attribute.  Outcome is that
varasm puts the decl into .bss or .comm and ignores anything
from targetm.asm_out.select_section.


FWIW, I agree with Richard that a hook to bypass most of
assemble_variable doesn't feel like the right abstraction.


So would you please propose something that would be approved
*and* will work.  assemble_variable() calls a zoo of hook functions
and macros that cannot be customized since they don't provide
enough context.

I attached a patch of the targetm.asm_out.select_section
proposal for reference.  Can you please point out what I
am doing wrong, or what other hooks are eligible to implement
such a feature.

The avr testsuite has already some test cases (all of which are
currently ICEing due to abusing TLS).

$ make -k check-gcc RUNTESTFLAGS="--target_board=atmega128-sim 
avr.exp=pr112952*.c "


Johanndiff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 656d3e7389b..f264ac22688 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -222,6 +222,8 @@ avr_arch_id avr_arch_index;
avr_asm_select_section, but it must not be local there because of GTY.  */
 static GTY(()) section *progmem_section[ADDR_SPACE_COUNT];
 
+static GTY(()) section *my_noswitch_section;
+
 /* Condition for insns/expanders from avr-dimode.md.  */
 bool avr_have_dimode = true;
 
@@ -11930,7 +11932,12 @@ avr_asm_init_sections (void)
 readonly_data_section->unnamed.callback = avr_output_data_section_asm_op;
   data_section->unnamed.callback = avr_output_data_section_asm_op;
   bss_section->unnamed.callback = avr_output_bss_section_asm_op;
-  tls_comm_section->noswitch.callback = avr_output_addr_attrib;
+
+  extern section* get_noswitch_section (unsigned int tree,
+	noswitch_section_callback callback);
+  unsigned flags = SECTION_NOSWITCH;
+  my_noswitch_section = get_noswitch_section (flags,
+	  avr_output_addr_attrib);
 }
 
 
@@ -12159,9 +12166,8 @@ avr_encode_section_info (tree decl, rtx rtl, int new_decl_p)
 		 other way than making use of tls_comm_section.  As we are
 		 using that section anyway, also use it in the public case.  */
 
-	  DECL_COMMON (decl) = 1;
+	  DECL_COMMON (decl) = 0;
 	  set_decl_section_name (decl, (const char *) nullptr);
-	  set_decl_tls_model (decl, (tls_model) 2);
 	}
 	}
 }
@@ -12209,6 +12215,11 @@ avr_encode_section_info (tree decl, rtx rtl, int new_decl_p)
 static section *
 avr_asm_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align)
 {
+  if (lookup_attribute ("io", DECL_ATTRIBUTES (decl))
+  || lookup_attribute ("io_low", DECL_ATTRIBUTES (decl))
+  || lookup_attribute ("address", DECL_ATTRIBUTES (decl)))
+return my_noswitch_section;
+
   section *sect = default_elf_select_section (decl, reloc, align);
 
   if (decl

[PATCH] tree-ssa-dce: Avoid creating invalid BBs with no outgoing edge (PR117892)

2025-01-27 Thread Martin Jambor

Hi,

Zhendong Su and Michal Jireš found out that our gimple DSE pass can,
under fairly specific conditions, remove a noreturn call which then
leaves behind a "normal" BB with no successor edges which following
passes do not expect.  This patch simply tells the pass to leave such
calls alone even when they otherwise appear to be dead.

Interestingly, our CFG verifier does not report this.  I'll put on my
todo list to add a test for it in the next stage 1.

Bootstrapped and tested on x86_64-linux, OK for master?

Thanks,

Martin


gcc/ChangeLog:

2025-01-27  Martin Jambor  

PR tree-optimization/117892
* tree-ssa-dse.cc (dse_optimize_call): Leave noreturn calls alone.

gcc/testsuite/ChangeLog:

2025-01-27  Martin Jambor  

PR tree-optimization/117892
* gcc.dg/tree-ssa/pr117892.c: New test.
* gcc.dg/tree-ssa/pr118517.c: Likewise.

co-authored-by: Michal Jireš 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr117892.c | 17 +
 gcc/testsuite/gcc.dg/tree-ssa/pr118517.c | 11 +++
 gcc/tree-ssa-dse.cc  |  5 +++--
 3 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr117892.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118517.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr117892.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr117892.c
new file mode 100644
index 000..d9b9c15095f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr117892.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+
+volatile int a;
+void b(int *c) {
+  int *d = 0;
+  *c = 0;
+  *d = 0;
+  __builtin_abort();
+}
+int main() {
+  int f;
+  if (a)
+b(&f);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118517.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr118517.c
new file mode 100644
index 000..3a34f6788a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118517.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-ipa-pure-const" } */
+
+void __attribute__((noreturn)) bar(void) {
+  __builtin_unreachable ();
+}
+
+int p;
+void foo() {
+  if (p) bar();
+}
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 753d7ef148b..492080a7eb0 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -1396,8 +1396,9 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap 
live_bytes)
   if (!node)
 return false;
 
-  if (stmt_could_throw_p (cfun, stmt)
-  && !cfun->can_delete_dead_exceptions)
+  if ((stmt_could_throw_p (cfun, stmt)
+   && !cfun->can_delete_dead_exceptions)
+  || gimple_call_noreturn_p (stmt))
 return false;
 
   /* If return value is used the call is not dead.  */
-- 
2.47.1

Re: [PATCH v3 2/5] c++/modules: Ignore TU-local entities where necessary

2025-01-27 Thread Patrick Palka

On Sat, 12 Oct 2024, Nathaniel Shead wrote:

> This version rewords all "ignored exposures" language.
> 
> I haven't fixed up the issue with DECL_TEMPLATE_INSTANTIATIONS for this
> patch.  I'll try to get to that as a separate patch if I find the time,
> but it's not 100% needed here I don't think.
> 
> It seems the discussion around PR115126 RE: the libgcc changes that
> people are leaning towards reverting the changes I made to the gthread
> static variables and instead having some sort of attribute to instruct
> the compiler to ignore the ODR issues; that should be relatively
> straight-forward to add (maybe something like [[gnu::ignore_tu_local]]?)
> but I'll probably handle that as a separate patch once I'm sure that's
> what we actually want to do, and would probably be worth corresponding
> with the Clang folks to see what thoughts they have.
> 
> Finally, I wasn't able to work out a good way to fold non-ODR usages of
> TU-local constants early.  I attempted modifying 'mark_use', which
> helped for 'constexpr' usages (though I ran into issues with e.g. 
> 
>   constexpr int n[1] = 0;
>   constexpr int x = n ? 1 : 2;
> 
> segfaulting as build_address was not expecting to see a constructor
> rather than a declaration here).  But I didn't look too hard into
> solving this as it appears that any modifications made by 'mark_use' are
> not actually applied to the primary template at all, and this is
> consistent across many places I investigated.
> 
> Given that erroring on these cases is still the status-quo, how easy it
> is to workaround most of the time, and that I'm still not sure how to
> solve this, I've also left this as a FIXME (with an XFAILed testcase) to
> revisit later.
> 
> OK for trunk?
> 
> -- >8 --
> 
> [basic.link] p14 lists a number of circumstances where a declaration
> naming a TU-local entity is not an exposure, notably the bodies of
> non-inline templates and friend declarations in classes.  This patch
> ensures that these references do not error when exporting the module.
> 
> We do need to still error on instantiation from a different module,
> however, in case this refers to a TU-local entity.  As such this patch
> adds a new tree TU_LOCAL_ENTITY which is used purely as a placeholder to
> poison any attempted template instantiations that refer to it.

I'm a bit late to the party, but this is a nice patch series!  I have a
couple of comments below.

> 
> This is also streamed for friend decls so that merging (based on the
> index of an entity into the friend decl list) doesn't break and to
> prevent complicating the logic; I imagine this shouldn't ever come up
> though.
> 
> We also add a new warning, '-Wtemplate-names-tu-local', to handle the
> case where someone accidentally refers to a TU-local value from within a
> non-inline function template.  This will compile without errors as-is,
> but any attempt to instantiate the decl will fail; this warning can be
> used to ensure that this doesn't happen.  Unfortunately the warning has
> quite some false positives; for instance, a user could deliberately only
> call explicit instantiations of the decl, or use 'if constexpr' to avoid
> instantiating the TU-local entity from other TUs, neither of which are
> currently detected.
> 
> The main piece that this patch doesn't yet attempt to solve is ADL: as
> specified, if ADL adds an overload set that includes a translation-unit
> local entity when instantiating a template, that overload set is now
> poisoned and counts as an exposure.  Unfortunately, we don't currently
> differentiate between decls that are hidden due to not being exported,
> or decls that are hidden due to being hidden friends, so this patch
> instead just keeps the current (wrong) behaviour of non-exported
> entities not being visible to ADL at all.
> 
> Additionally, this patch doesn't attempt to ignore non-ODR uses of
> constants in constexpr functions or templates.  The obvious approach of
> folding them early in 'mark_use' doesn't seem to work (for a variety of
> reasons), so this leaves this to a later patch to implement, as it's at
> least no worse than the current behaviour and easy enough to workaround.
> 
> For completeness this patch adds a new xtreme-header testcase to ensure
> that we have no regressions with regards to exposures of TU-local
> declarations in the standard library header files.  A more restrictive
> test would be to do 'export extern "C++"' here, but unfortunately the
> system headers on some targets declare TU-local entities, so we'll make
> do with checking that at least the C++ standard library headers don't
> refer to such entities.
> 
> gcc/c-family/ChangeLog:
> 
>   * c.opt: New warning '-Wtemplate-names-tu-local'.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-objcp-common.cc (cp_tree_size): Add TU_LOCAL_ENTITY.
>   * cp-tree.def (TU_LOCAL_ENTITY): New tree code.
>   * cp-tree.h (struct tree_tu_local_entity): New type.
>   (TU_LOCAL_ENTITY_NAME): New accessor.
>   (TU_LOCAL

Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-01-27 Thread Torbjorn SVENSSON





On 2025-01-27 14:37, Christophe Lyon wrote:

On Mon, 27 Jan 2025 at 13:30, Torbjorn SVENSSON
 wrote:


Hi Christophe,

On 2025-01-27 13:07, Christophe Lyon wrote:

Hi Torbjorn,

On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON
 wrote:


Another ping... :)

Kind regards,
Torbjörn

On 2024-12-18 18:35, Torbjorn SVENSSON wrote:

Gentle ping :)

Kind regards,
Torbjörn

On 2024-11-14 17:16, Torbjorn SVENSSON wrote:



On 2024-11-14 16:32, Christophe Lyon wrote:

On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON
 wrote:


Ok for trunk and releases/gcc-14?

--

A long time ago, this test forced -march=armv6.

With -marm, the generated assembler is:
foo:
   sub r0, r0, #48
   cmp r0, #9
   movhi   r0, #0
   movls   r0, #1
   bx  lr

With -mthumb, the generated assembler is:
foo:
   subsr0, r0, #48
   movsr2, #9
   uxtbr3, r0
   movsr0, #0
   cmp r2, r3
   adcsr0, r0, r0
   uxtbr0, r0
   bx  lr

Require effective-target arm_arm_ok to skip the test for thumb-only
targets (Cortex-M).

gcc/testsuite/ChangeLog:

   * gcc.target/arm/unsigned-extend-1.c: Use effective-target
   arm_arm_ok.

Signed-off-by: Torbjörn SVENSSON 
---
gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/
testsuite/gcc.target/arm/unsigned-extend-1.c
index 3b4ab048fb0..73f2e1a556d 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
@@ -1,4 +1,5 @@
/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
/* { dg-options "-O2" } */



So I'm guessing arm_arm_ok fails when you are testing for M-profile
targets?
What happens when you test for A-profile, providing -mthumb as part of
your runtest flags?





Looking at our CI results, without your patch the test passes on:
cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode,
and it fails only on cortex-m0 and m23.
Does this match your observations?


Without my patch, I see failure on M0, M0+ and M23.
With my patch, it goes to unsupported on all Cortex-M targets that I test.


So your patch will make the test unsupported on m33, m3, m55 and m7
while it currently passes?

Am I missing something?


I think you might have missed my comment down



Thanks,

Christophe



For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft):

Testing arm/unsigned-extend-1.c
check_compile tool: gcc for arm_arm_ok
doing compile
Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21114.c  -
mthumb -mcpu=cortex-m0 -mfloat-abi=soft   -fdiagnostics-plain-output
-marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s
(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c -
mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -
marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s
pid is 21254 -21254
cc1: error: target CPU does not support ARM mode
pid is -1
close result is 21254 exp6 0 1
output is cc1: error: target CPU does not support ARM mode
status 1
compiler exited with status 1
UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c


For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft):

Testing arm/unsigned-extend-1.c
check_compile tool: gcc for arm_arm_ok
doing compile
Executing on host: .../bin/arm-none-eabi-gcc  arm_arm_ok21296.c  -
mthumb -mcpu=cortex-a7 -mfloat-abi=soft   -fdiagnostics-plain-output
-marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s
(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c -
mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output -
marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s
pid is 21434 -21434
pid is -1
output is  status 0
doing compile
Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/
gcc.target/arm/unsigned-extend-1.c  -mthumb -mcpu=cortex-a7 -mfloat-
abi=soft   -fdiagnostics-plain-output  -O2 -ffat-lto-objects -fno-
ident -S -o unsigned-extend-1.s(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc .../gcc/testsuite/
gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat-
abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno-ident -
S -o unsigned-extend-1.s
pid is 21438 -21438
pid is -1
output is  status 0
check_compile tool: gcc for exceptions_enabled
doing compile
Executing on host: .../bin/arm-none-eabi-gcc
exceptions_enabled21296.cc   -mthumb -mcpu=cortex-a7 -mfloat-
abi=soft   -fdiagnostics-plain-output -Wno-complain-wrong-lang -S
-o exceptions_enabled21296.s(timeout = 800)
spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc
exceptions_enabled21296.cc -mthumb -mcpu=cortex-a7 -mfloat-abi=soft -
fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o
exceptions_enabled21296.s
pid is 21442 -21442
pid is -1
output is  status 0
PASS: gcc.target/arm/unsigned-extend-1.c

Re: [PATCH v2 7/7] Add a unit test for random access in the file cache

2025-01-27 Thread David Malcolm

On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote:
> From: Andi Kleen 

Thanks for writing the test case!

Some nitpicks...


> 
> gcc/ChangeLog:
> 
>   * input.cc (check_line): New.
>   (test_replacement): New function to test line caching.
>   (input_cc_tests): Call test_replacement
> ---
>  gcc/input.cc | 46 ++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index f1a15de66f1..524d5d2b6d1 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -2346,6 +2346,51 @@ test_make_location_nonpure_range_endpoints
> (const line_table_case &case_)
>    ASSERT_FALSE (IS_ADHOC_LOC (get_finish (not_aaa_eq_bbb)));
>  }
>  
> +/* Verify reading of a specific line LINENUM in TMP, FC.  */
> +
> +static void check_line (temp_source_file &tmp, file_cache &fc, int
> linenum)
> +{
> +  char_span line = fc.get_source_line (tmp.get_filename (),
> linenum);
> +  int n;
> +  ASSERT_TRUE (sscanf (line.get_buffer (), "%d", &n) == 1);

Please add a comment around here something like:

/* Although get_buffer's result isn't guaranteed to be
   null-terminated, it will be for every line in test_replacement.  */

or somesuch.

> +  ASSERT_EQ (n, linenum);
> +}
> +
> +/* Test file cache replacement.  */
> +
> +static void test_replacement ()
> +{
> +  const int maxline = 1000;
> +  char *vec = XNEWVEC (char, maxline * 15);
> +  char *p = vec;
> +  int i;
> +  for (i = 1; i <= maxline; i++)
> +    {
> +  p += sprintf (p, "%d\n", i);
> +    }
> +  temp_source_file tmp (SELFTEST_LOCATION, ".txt", vec);
> +  free (vec);
> +  file_cache fc;
> +
> +  for (i = 2; i <= maxline; i++)
> +    {
> +  check_line (tmp, fc, i);
> +  check_line (tmp, fc, i - 1);
> +  if (i >= 10)
> + check_line (tmp, fc, i - 9);
> +  if (i >= 350) /* Exceed the look behind cache.  */
> + check_line (tmp, fc, i - 300);
> +    }
> +  for (i = 5; i <= maxline; i += 100)
> +    {
> +  check_line (tmp, fc, i);
> +    }
> +  for (i = 1; i <= maxline; i++)
> +    {
> +  check_line (tmp, fc, i);
> +    }

Probably should lose the {} around the single statement being guarded
by the "if" in these two loops.

Patch 7 is OK otherwise, and I'm taking a look at the rest of the
patches now; thanks.

Dave



> +}
> +
>  /* Verify reading of input files (e.g. for caret-based
> diagnostics).  */
>  
>  static void
> @@ -4296,6 +4341,7 @@ input_cc_tests ()
>  
>    test_reading_source_line ();
>    test_reading_source_buffer ();
> +  test_replacement ();
>  
>    test_line_offset_overflow ();
>

Re: [PATCH] arm: libbacktrace: Check if the compiler supports __sync atomics

2025-01-27 Thread Ian Lance Taylor

Richard Earnshaw  writes:

> Older versions of the Arm architecture lack support for __sync
> operations directly in hardware and require calls into appropriate
> operating-system hooks.  But such hooks obviously don't exist in a
> freestanding environment.
>
> Consquently, it is incorrect to assume during configure that such
> functions will exist and we need a configure-time check to determine
> whether or not these routines will work.
>
> libbacktrace:
>
>   * configure.ac: Always check if the compiler supports __sync
>   operations.
>   * configure: Regenerated.
> ---
>
> A consequence of this assumption that these functions 'just work' (TM)
> is that we build libbacktrace with threading support, but that then
> fails at link time when we find out that they do not.  I'm not sure I
> really understand why we can assume that having with_target_subdir set
> is enough to assume that the operations exist: it isn't for Arm and it
> looks like it wasn't for hpux either.  Perhaps the whole logic here
> needs reconsideration...

My understanding was that libatomic would provide implementations of the
__sync operations where required. I guess that is not always the case,
though I don't understand why that should be.

In any case, this patch is fine.

Thanks.

Ian

Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]

2025-01-27 Thread Simon Martin

Hi Jason,

On 27 Jan 2025, at 15:23, Jason Merrill wrote:

> On 1/27/25 8:14 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 24 Jan 2025, at 16:51, Jason Merrill wrote:
>>
>>> On 1/24/25 6:19 AM, Simon Martin wrote:
 Hi Jason,

 On 23 Jan 2025, at 22:56, Jason Merrill wrote:

> On 1/23/25 12:06 PM, Simon Martin wrote:
>> Hi Marek,
>>
>> On 23 Jan 2025, at 16:45, Marek Polacek wrote:
>>
>>> On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote:
 Hi Jason,

 On 20 Jan 2025, at 22:49, Jason Merrill wrote:

> On 1/20/25 2:11 PM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 15 Jan 2025, at 22:42, Jason Merrill wrote:
>>
>>> On 12/12/24 2:07 PM, Simon Martin wrote:
 We currently ICE upon the following valid (under -Wno-vla)
>>
 code


 === cut here ===
 void f(int c) {
constexpr int r = 4;
[&](auto) { int t[r * c]; }(0);
 }
 === cut here ===

 The problem is that when parsing the lambda body, and more
 specifically
 the multiplication, we mark the lambda as
 LAMBDA_EXPR_CAPTURE_OPTIMIZED
 even though the replacement of r by 4 is "undone" by the 
 call
>>
 to
 build_min_non_dep in build_x_binary_op. This makes
 prune_lambda_captures
 remove the proxy declaration while it should not, and we 
 trip
>>
 on
 an

 assert at instantiation time.
>>>
>>> Why does prune_lambda_captures remove the declaration if 
>>> it's
>>> still
>>> used in the function body?  Setting
>>> LAMBDA_EXPR_CAPTURE_OPTIMIZED
>>> just
>>> means "we might have optimized away some captures", the tree
>>> walk
>>> should have found the use put back by build_x_binary_op.
>> I think that this is due to a bug in mark_const_cap_r, 
>> that’s
>> been
>> here since the beginning, i.e. r8-7213-g1577f10a637352: it
>> decides

>> NOT
>> to walk sub-trees when encountering a DECL_EXPR for a 
>> constant
>> capture
>> proxy (at lambda.cc:1769). I don’t understand why we
>>
>> wouldn’t
>> want
>> to continue.
>
> Because that's the declaration of the capture proxy, not a use
> of
> it.
 Indeed, thanks for clarifying.

> Why aren't we finding the use in the declaration of t?
 After further investigation, the problem is rather that neither
 walk_tree_1 nor cp_walk_subtrees walk the dimensions of array
 types,
 so
 we miss the uses.

>> Removing that line fixes the PR, but breaks 3 existing tests
>> (lambda-capture1.C, lambda-const11.C and lambda-const11a.C,
>> that
>> all
>> assert that we optimise out the capture); that’s why I did
>> not
>> pursue
>> it in the first place.
>
> Right.
>
>> But taking another look, it might not be that big a deal that
>> we
>> don’t
>> optimise those out: as soon as we use -O1 or above, the
>> assignment
>> to

>> the closure field actually disappears.
>
> Completely breaking this optimization is a big deal,
> particularly
> since it affects the layout of closure objects.  We can't 
> always
>>
> optimize everything away.
 ACK.

 The attached updated patch makes cp_walk_subtrees walk array 
 type
>>
 dimensions, which fixes the initial PR without those 3
 regressions.


 Successfully tested on x86_64-pc-linux-gnu. Is it OK?

 Simon

 PS: In theory I think it’d make sense to do do this change in

 walk_tree_1 since C also supports VLAs, but doing so breaks 
 some

 OMP
 tests. OMP can do interesting things with array bounds (see
 r9-5354-gda972c05f48637), and fixing those would require 
 teaching
>>
 walk_tree_1 about the “omp dummy var” array bounds, which I
 think
 would be a bit ugly. And I’m not aware of any C case that 
 would
 be
 improved/fixed by doing this change, so we’re probably fine 
 not
 doing
 it.
>>>
From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17
 00:00:00
 2001
 From: Simon Martin 
 Date: Wed, 22 Jan 2025 16:19:47 +0100
 Subject: [PATCH] c++: Don't prune constant capture proxies only
 used
>

Re: [PATCH v2 1/7] Add tunables for input buffer

2025-01-27 Thread David Malcolm

On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote:
> From: Andi Kleen 
> 
> The input machinery to read the source code independent of the lexer
> has a range of hard coded maximum array sizes that can impact
> performance.
> Make them tunable.
> 
> input.cc is part of libcommon so it cannot direct access params
> without a level of indirection.

Thanks; this patch is OK for trunk

Dave

> 
> gcc/ChangeLog:
> 
>   PR preprocessor/118168
>   * input.cc (file_cache::tune): New function.
>   * input.h (class file_cache): Make tunables non const.
>   * params.opt: Add new tunables.
>   * toplev.cc (toplev::main): Initialize input buffer context
>   tunables.
> ---
>  gcc/input.cc   | 18 +-
>  gcc/input.h    |  4 +++-
>  gcc/params.opt |  8 
>  gcc/toplev.cc  |  2 ++
>  4 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index 7ed80cad13f..f824c34e0cc 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -79,6 +79,10 @@ public:
>    void evict ();
>    void set_content (const char *buf, size_t sz);
>  
> +  static void tune(size_t line_record_size_) {
> +  line_record_size = line_record_size_;
> +  }
> +
>   private:
>    /* These are information used to store a line boundary.  */
>    class line_info
> @@ -116,7 +120,7 @@ public:
>    bool goto_next_line ();
>  
>    static const size_t buffer_size = 4 * 1024;
> -  static const size_t line_record_size = 100;
> +  static size_t line_record_size;
>  
>    /* The number of time this file has been accessed.  This is used
>   to designate which file cache to evict from the cache
> @@ -189,6 +193,18 @@ public:
>  
>  };
>  
> +size_t file_cache_slot::line_record_size = 100;
> +
> +/* Tune file_cache.  */
> +void
> +file_cache::tune (size_t num_file_slots_, size_t lines)
> +{
> +  num_file_slots = num_file_slots_;
> +  file_cache_slot::tune (lines);
> +}
> +
> +size_t file_cache::num_file_slots = 16;
> +
>  static const char *
>  find_end_of_line (const char *s, size_t len);
>  
> diff --git a/gcc/input.h b/gcc/input.h
> index 18ccf4429fc..a60afe80681 100644
> --- a/gcc/input.h
> +++ b/gcc/input.h
> @@ -161,13 +161,15 @@ class file_cache
>    const char *buffer,
>    size_t sz);
>  
> +  static void tune(size_t num_file_slots_, size_t lines);
> +
>   private:
>    file_cache_slot *evicted_cache_tab_entry (unsigned
> *highest_use_count);
>    file_cache_slot *add_file (const char *file_path);
>    file_cache_slot *lookup_file (const char *file_path);
>  
>   private:
> -  static const size_t num_file_slots = 16;
> +  static size_t num_file_slots;
>    file_cache_slot *m_file_slots;
>    input_context m_input_context;
>  };
> diff --git a/gcc/params.opt b/gcc/params.opt
> index b5e7800d7e4..5d234a607c0 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -134,6 +134,14 @@ Maximum size (in bytes) of objects tracked
> bytewise by dead store elimination.
>  Common Joined UInteger Var(param_early_inlining_insns) Init(6)
> Optimization Param
>  Maximal estimated growth of function body caused by early inlining
> of single call.
>  
> +-param=file-cache-files=
> +Common Joined UInteger Var(param_file_cache_files) Init(16) Param
> +Max number of files in the file cache.
> +
> +-param=file-cache-lines=
> +Common Joined UInteger Var(param_file_cache_lines) Init(100) Param
> +Max number of lines to index into file cache.
> +
>  -param=fsm-scale-path-stmts=
>  Common Joined UInteger Var(param_fsm_scale_path_stmts) Init(2)
> IntegerRange(1, 10) Param Optimization
>  Scale factor to apply to the number of statements in a threading
> path crossing a loop backedge when comparing to max-jump-thread-
> duplication-stmts.
> diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> index d45a12cab45..e03af8b1805 100644
> --- a/gcc/toplev.cc
> +++ b/gcc/toplev.cc
> @@ -2333,6 +2333,8 @@ toplev::main (int argc, char **argv)
>     UNKNOWN_LOCATION, global_dc,
>     targetm.target_option.override);
>  
> +  file_cache::tune (param_file_cache_files, param_file_cache_lines);
> +
>    handle_common_deferred_options ();
>  
>    init_local_tick ();

Re: [PATCH] gcc: Add tree walk case to reach A pack from B in ...B>. [PR118265]

2025-01-27 Thread Jason Merrill


On 1/27/25 1:26 PM, Patrick Palka wrote:

On Wed, 1 Jan 2025, A J Ryan Solutions Ltd wrote:


Hi and happy new year, this patch is to fix a compiler seg-fault as encountered 
in the following example:


Hi, thanks for the patch!  Your fix makes sense to me, and I believe it
also fixes the testcases from PR102626 and at least some of its related
PRs.


Yes, thanks!

In addition to Patrick's comments, all of which I agree with, the 
subject line should start with "c++" rather than "gcc".  We also leave 
out a closing period.


The subject line, and the rest of the commit message, must have lines no 
longer than 76 characters.


For the subject line specifically, under 70 characters is preferable, 
for the sake of git log --oneline.


Also, the testcase should not go in the main g++.dg/ directory, but a 
subdirectory, with a filename indicating what it tests.  Perhaps 
cpp1z/variadic-nontype1.C



template struct Class1{};

template class Class2;
template...Un> class Class2
{
     public: void apply(){}
};

Class1 class1_bool;
Class1 class1_char;

int main()
{
     Class2 class2;
     class2.apply();
}



where it ends up populating the V argument in the instantiated template at the 
class2 declaration with the unknown placeholder and the segfault is when it 
later tries to get the name for an incompletely
resolved type.


Here's a more reduced example that's fixed by your patch:

   template struct A { };

   template
   void f(A...);

   int main() {
 f(A<0>{}, A<1u>{}, A<2l>{}); // OK, Ts deduced to {int,unsigned,long}
   }

A related PR that isn't and shouldn't be fixed by this patch is PR84796
where the type parameter pack is declared at a different level than
the non-type parameter pack, e.g.:

   template
   struct A { };

   template
   struct B {
 template
 static void f(A);
   };

   int main() {
 A<1, true, 'c'> a;
 B::f(a);
   }

Before your patch we would ICE on this valid testcase, after we now reject it
with a bogus error:

: In function ‘int main()’:
:12:24: error: no matching function for call to ‘B::f(A<1, 
true, 'c'>&)’
:12:24: note: there is 1 candidate
:7:15: note: candidate 1: ‘template static void B::f(A) [with Ts ...Vs = {Vs ...}; Ts = {int, bool, char}]’
:7:15: note: template argument deduction/substitution failed:
:12:24: note:   ‘Vs’ is not equivalent to ‘1’

Which is no worse than ICEing, I suppose :)



The incorrect processing happens in unify_pack_expansion where it is calling 
unify_one_argument
here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l24718)
 for each argument of the argument pack and recursively unifying the inner 
arguments. In
this example the inner argument against V (bool/char) is set in targs however 
it is not seen as a pack argument in unify_pack_expansion because in the 
unfixed code it does not reach and link this associated
pack and later on when processing the collected arguments to substitute it in 
in instantiate_class_template the argument has not been set as an argument pack 
in targs and it doesn't match the the template
specialisation parameter which is a parameter pack and it falls back to the 
unknown placeholder.
The parameter packs are originally linked in make_pack_expansion 
here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l4246)
 and the change is in
this tree walk so that in the example above V is chained to Un and after 
calling unify_one_argument it loops over both packs and extracts their 
associated value set in targs and puts it into it's pack
argument collection.

Regards
Adam Ryan





Subject: [PATCH] Add tree walk case to obtain A pack in
  ...B>.

For non-type parameter packs when unifying the arguments in unify_pack_expansion
it iterates over the associated packs of a param so that when it recursivly
unifies the param with the arguments it knows which targs have been populated
with parameter pack arguments that it can then collect up. This change adds a
tree walk so that in the example above it reaches ...A and adds it to the
associated packs for ...B and therefore knows it will have been set in targs
in unify_pack_expansion and processes it as per other pack arguments.

PR gcc/118265


PR c++/118265 rather

A ChangeLog entry is missing from the commit message as per
https://gcc.gnu.org/contribute.html#patches

Here's an example ChangeLog we can use for this patch:

gcc/cp/ChangeLog:

* pt.cc (find_parameter_packs_r) :
Walk into the type of a parameter pack.



Signed-off-by: Adam J Ryan 
---
  gcc/cp/pt.cc|  5 +
  gcc/testsuite/g++.dg/pr118265.C | 17 +
  2 files changed, 22 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/pr118265.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0ffa0b53e26..22f5d4b1875 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4012,6 +4012,11 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees,

Re: [PATCH] RISC-V: ensure needed FRM restore is not eliminable [PR118646]

2025-01-27 Thread Vineet Gupta



On 1/26/25 05:29, Jeff Law wrote:
>
> On 1/24/25 3:12 PM, Vineet Gupta wrote:
>> RV-Vector FP-INT insns use the rounding mode in FRM register which if
>> explicitly set for V insn needs, is saved/restored (although from the
>> psABI CC Spec, it is not clear if it actually a caller-saved or
>> callee-saved).
>>
>> Anyhow in the failure case the save/restore were generated by the
>> Mode Switch pass, but then eliminated by sched1:DCE and Late-Combine.
>> Fix this by using unspec_volatile variant which won't be eliminated.
>>
>> This showed up as SPEC2017 527.cam4 runtime aborts in glibc:round_away()
>> which checks for standard rounding modes and the "leaking" rounding mode
>> due to the bug happened to be a non-standard RISC-V specific RMM
>> "Round to Nearest, ties to Max".
>>
>> This is testsuite clean:
> Not sure how it could be clean as I think the test itself is busted ;-)

Sorry for the snafu; I would blame "send the patch out on friday" that bit me.
I used the raw test for debugging and coming up with the fix, but before
integrating the test properly I ran the existing testsuite w/ my patch and made
sure it didn't regress anything, so those test results are for real minus the
new test.

> As-is it'll trigger compile time failures:
>> FAIL: gfortran.target/riscv/rvv/pr118646.f90   -O0  (test for excess errors)
>> Excess errors:
>> /home/jlaw/test/gcc/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90:18:12:
>>  Warning: Deleted feature: End expression in DO loop at (1) must be integer
>> /home/jlaw/test/gcc/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90:22:15:
>>  Warning: Deleted feature: End expression in DO loop at (1) must be integer
>> /home/jlaw/test/gcc/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90:36:18:
>>  Warning: Deleted feature: End expression in DO loop at (1) must be integer

Added -std-leagcy to address that


> A "-w" will work around that, but then there's no Fortran equivalent of 
> main and we get a link error:
>> FAIL: gfortran.target/riscv/rvv/pr118646.f90   -O0  (test for excess errors)
>> Excess errors:
>> /release/linux/.build/src/glibc-git-4d29ec7c/csu/../sysdeps/riscv/start.S:67:(.text+0x22):
>>  undefined reference to `main'
>>
>> UNRESOLVED: gfortran.target/riscv/rvv/pr118646.f90   -O0  compilation failed 
>> to produce executable

Its not a run test, we just have to scan number of f.rm instances, which was
also missing in the test :-(


> What I wanted to do was use your testcase with Pan's patch to see if 
> Pan's patch resolved both issues.

Yes it does.


> Your compiler patch may still be desirable as well, I really haven't 
> really evaluated that yet.

FWIW with Pan's rework the need for volatile goes away anyways.

The fixed up test (actually tested for A/B pass/fail) which can be integrated in
Pan's commit can be found below

->
diff --git a/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90
b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90
new file mode 100644
index ..e2a034316123
--- /dev/null
+++ b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90
@@ -0,0 +1,48 @@
+! Reduced from SPEC2017 527.cam4 zm_conv.F90
+
+! { dg-do compile }
+! { dg-options "-Ofast -std=legacy -march=rv64gcv_zvl256b_zba_zbb_zbs_zicond
-ftree-vectorize -mabi=lp64d" }
+
+module a
+  contains
+subroutine b(f)
+
+   real d(4)
+   integer e(4)
+   integer f(4)
+   real hmax(4)
+   real g(4)
+
+   integer h(4)
+   integer l(4,5)
+   do i = 1,c
+  h(i) = 0
+   end do
+   do k = j  ,1
+  do i = 1,c
+ q = g(i) + hmax(i)
+ if (k >= nint(d(i)) .and. k <= e(i) .and. q > 1.e4) then
+    f(i) = k
+ end if
+  if (k < o ) then
+    if (buoy<= 0.) then
+   l(i,h) = k
+    end if
+ end if
+  end do
+   end do
+   do n = 1,5
+  do k = 1,m
+ do i = 1,c
+    if (k > l(i,n)) then
+   p = r()
+    end if
+ end do
+  end do
+   end do
+end
+end
+
+! { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 1 } }
+! { dg-final { scan-assembler-times {fsrmi\s+[01234]} 1 } }
+! { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 1 } }

Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]

2025-01-27 Thread Jason Merrill


On 1/27/25 2:34 PM, Simon Martin wrote:

Hi Jason,

On 27 Jan 2025, at 15:23, Jason Merrill wrote:


On 1/27/25 8:14 AM, Simon Martin wrote:

Hi Jason,

On 24 Jan 2025, at 16:51, Jason Merrill wrote:


On 1/24/25 6:19 AM, Simon Martin wrote:

Hi Jason,

On 23 Jan 2025, at 22:56, Jason Merrill wrote:


On 1/23/25 12:06 PM, Simon Martin wrote:

Hi Marek,

On 23 Jan 2025, at 16:45, Marek Polacek wrote:


On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote:

Hi Jason,

On 20 Jan 2025, at 22:49, Jason Merrill wrote:


On 1/20/25 2:11 PM, Simon Martin wrote:

Hi Jason,

On 15 Jan 2025, at 22:42, Jason Merrill wrote:


On 12/12/24 2:07 PM, Simon Martin wrote:

We currently ICE upon the following valid (under -Wno-vla)



code




=== cut here ===
void f(int c) {
constexpr int r = 4;
[&](auto) { int t[r * c]; }(0);
}
=== cut here ===

The problem is that when parsing the lambda body, and more
specifically
the multiplication, we mark the lambda as
LAMBDA_EXPR_CAPTURE_OPTIMIZED
even though the replacement of r by 4 is "undone" by the
call



to
build_min_non_dep in build_x_binary_op. This makes
prune_lambda_captures
remove the proxy declaration while it should not, and we
trip



on
an



assert at instantiation time.


Why does prune_lambda_captures remove the declaration if
it's
still
used in the function body?  Setting
LAMBDA_EXPR_CAPTURE_OPTIMIZED
just
means "we might have optimized away some captures", the tree
walk
should have found the use put back by build_x_binary_op.

I think that this is due to a bug in mark_const_cap_r,
that’s
been
here since the beginning, i.e. r8-7213-g1577f10a637352: it
decides



NOT
to walk sub-trees when encountering a DECL_EXPR for a
constant
capture
proxy (at lambda.cc:1769). I don’t understand why we



wouldn’t
want
to continue.


Because that's the declaration of the capture proxy, not a use
of
it.

Indeed, thanks for clarifying.


Why aren't we finding the use in the declaration of t?

After further investigation, the problem is rather that neither
walk_tree_1 nor cp_walk_subtrees walk the dimensions of array
types,
so
we miss the uses.


Removing that line fixes the PR, but breaks 3 existing tests
(lambda-capture1.C, lambda-const11.C and lambda-const11a.C,
that
all
assert that we optimise out the capture); that’s why I did
not
pursue
it in the first place.


Right.


But taking another look, it might not be that big a deal that
we
don’t
optimise those out: as soon as we use -O1 or above, the
assignment
to



the closure field actually disappears.


Completely breaking this optimization is a big deal,
particularly
since it affects the layout of closure objects.  We can't
always



optimize everything away.

ACK.

The attached updated patch makes cp_walk_subtrees walk array
type



dimensions, which fixes the initial PR without those 3
regressions.




Successfully tested on x86_64-pc-linux-gnu. Is it OK?

Simon

PS: In theory I think it’d make sense to do do this change in

walk_tree_1 since C also supports VLAs, but doing so breaks
some



OMP
tests. OMP can do interesting things with array bounds (see
r9-5354-gda972c05f48637), and fixing those would require
teaching



walk_tree_1 about the “omp dummy var” array bounds, which I
think
would be a bit ugly. And I’m not aware of any C case that
would
be
improved/fixed by doing this change, so we’re probably fine
not
doing
it.



From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17
00:00:00
2001
From: Simon Martin 
Date: Wed, 22 Jan 2025 16:19:47 +0100
Subject: [PATCH] c++: Don't prune constant capture proxies only
used
in array
 dimensions [PR114292]

We currently ICE upon the following valid (under -Wno-vla) code

=== cut here ===
void f(int c) {
  constexpr int r = 4;
  [&](auto) { int t[r * c]; }(0);
}
=== cut here ===

When parsing the lambda body, and more specifically the
multiplication,
we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which
indicates
to
prune_lambda_captures that it might be possible to optimize out
some
captures.

The problem is that prune_lambda_captures then misses the use
of



the
r
capture (because neither walk_tree_1 nor cp_walk_subtrees walks
the



dimensions of array types - here "r * c"), hence believes the
capture
can be pruned... and we trip on an assert when instantiating
the
lambda.

This patch changes cp_walk_subtrees so that when walking a
declaration
with an array type, it walks that array type's dimensions.
Since
C
also
supports VLAs, I thought it'd make sense to do this in
walk_tree_1,



but
this breaks some omp-low related test cases (because OMP can do
funky
things with array bounds when adjust_array_error_bounds is
set),



and
I
don't really want to add code to walk_tree_1 to skips arrays
whose
dimension is a temporary variable with the "omp dummy var"
attribute.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/114292

gcc/cp/ChangeLog:

* tree.cc (cp_walk_subtrees): Walk array type dimensions.

gcc/testsuite/C

[COMMITTED] RISC-V: Add another test for FRM elimination bug [PR118646]

2025-01-27 Thread Vineet Gupta

The issue is same as PR118103 and fixed by commit 55d288d4ff53
("RISC-V: Make FRM as global register [PR118103]").

Essentially FRM save/restore were getting eliminated because FRM reg
semantics were not being modelled correctly.

In this case it showed up as SPEC2017 527.cam4 runtime aborts in
glibc:round_away() due to non-canonical rounding mode showing up,
"leaking" earlier in the call chain because such rounding mode
save/restore was getting eliminated.

PR target/118646

gcc/testsuite/ChangeLog:
* gfortran.target/riscv/rvv/pr118646.f90 (New Test).

Signed-off-by: Vineet Gupta 
---
 .../gfortran.target/riscv/rvv/pr118646.f90| 48 +++
 1 file changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90

diff --git a/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 
b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90
new file mode 100644
index ..e2a034316123
--- /dev/null
+++ b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90
@@ -0,0 +1,48 @@
+! Reduced from SPEC2017 527.cam4 zm_conv.F90
+
+! { dg-do compile }
+! { dg-options "-Ofast -std=legacy -march=rv64gcv_zvl256b_zba_zbb_zbs_zicond 
-ftree-vectorize -mabi=lp64d" }
+
+module a
+  contains
+subroutine b(f)
+
+   real d(4)
+   integer e(4)
+   integer f(4)
+   real hmax(4)
+   real g(4)
+
+   integer h(4)
+   integer l(4,5)
+   do i = 1,c
+  h(i) = 0
+   end do
+   do k = j  ,1
+  do i = 1,c
+ q = g(i) + hmax(i)
+ if (k >= nint(d(i)) .and. k <= e(i) .and. q > 1.e4) then
+f(i) = k
+ end if
+  if (k < o ) then
+if (buoy<= 0.) then
+   l(i,h) = k
+end if
+ end if
+  end do
+   end do
+   do n = 1,5
+  do k = 1,m
+ do i = 1,c
+if (k > l(i,n)) then
+   p = r()
+end if
+ end do
+  end do
+   end do
+end
+end
+
+! { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 1 } }
+! { dg-final { scan-assembler-times {fsrmi\s+[01234]} 1 } }
+! { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 1 } }
-- 
2.43.0

Re: [PATCH] wwwdocs: Clarify DCO name/identity and (anonymous) pseudonym policy

2025-01-27 Thread Mark Wielaard

Hi,

On Mon, Jan 20, 2025 at 11:59:16AM +0100, Mark Wielaard wrote:
> On Sun, Jan 19, 2025 at 05:14:20PM +0100, Mark Wielaard wrote:
> > On Tue, Dec 17, 2024 at 04:40:10PM +0900, Gerald Pfeifer wrote:
> > > On Mon, 2 Dec 2024, Mark Wielaard wrote:
> > > > Adjust the DCO text to match the broader community usage and
> > > > clarifications around the use of real names, known identities and
> > > > (anonymous) pseudonyms.
> > > > 
> > > > These changes clarify what was meant by "real name" and that it is not
> > > > required to be a "legal name" or any other stronger requirement than a
> > > > known identity that could be contacted to discuss the contribution as
> > > > adopted by other communities like the linux kernel, elfutils, cncf and
> > > > gentoo.
> > > > 
> > > > Also explain that the FSF assignment policy might be more appropriate
> > > > when wanting to contribute using an anonymous pseudonym.
> > > 
> > > This looks fine to me personally, though I don't feel I can simply 
> > > approve 
> > > this wearing my wwwdocs hat (which is why I haven't responded earlier).
> > > 
> > > Jason, you have originally contributed this if I remember correctly; how 
> > > do we go about such a change?
> > 
> > Please let me know. Updated patch below.
> > 
> > > (One minor bit: The sentence "This will be done for you automatically if 
> > > you use `git commit -s`" feels a bit off now there is more details on 
> > > something else preceeding it. i.e., `git commit -s` does not establish
> > > real names as such. Maybe leave it earlier in the paragraph?)
> > 
> > Yes, good point. I swapped the sentences.
> 
> And then I attached the old patch anyway. Sorry. Really updated patch
> attached now.

Ping.

> > P.S. There will be a panel discussion organized by the FSF licensing
> > and compliance manager on copyrights and developer certificates of
> > origin at Fosdem:
> > https://fosdem.org/2025/schedule/event/fosdem-2025-5376-managing-copyrights-in-free-software-projects-discussion-panel-with-gnu-maintainers/
> > I'll try to attend and report back.

Also the GDB hackers and the FSF have a discussion about
clarifications of copyright assignment and DCO practices:
https://inbox.sourceware.org/gdb/605f81a2-000e-42a3-b5fd-1854e92c5...@fsf.org/

> From b385e23076bdf843068f8242995461bef2e79748 Mon Sep 17 00:00:00 2001
> From: Mark Wielaard 
> Date: Mon, 2 Dec 2024 11:16:00 +0100
> Subject: [PATCH] wwwdocs: Clarify DCO name/identity and (anonymous) pseudonym
>  policy
> 
> Adjust the DCO text to match the broader community usage and
> clarifications around the use of real names, known identities and
> (anonymous) pseudonyms.
> 
> These changes clarify what was meant by "real name" and that it is not
> required to be a "legal name" or any other stronger requirement than a
> known identity and can be contacted to discuss the contribution as
> adopted by other communities like the linux kernel, elfutils, cncf and
> gentoo.
> 
> Also explain that the FSF assignment policy might be more appropriate
> when wanting to contribute using an anonymous pseudonym or when
> needing an explicit company disclaimer.
> ---
>  htdocs/dco.html | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/htdocs/dco.html b/htdocs/dco.html
> index 68fa183b9fc0..0bf0d3fedf34 100644
> --- a/htdocs/dco.html
> +++ b/htdocs/dco.html
> @@ -54,8 +54,21 @@ then you just add a line saying:
>  
>  Signed-off-by: Random J Developer 
> 
>  
> -using your real name (sorry, no pseudonyms or anonymous contributions.)  This
> -will be done for you automatically if you use `git commit -s`.
> +using a known identity (sorry, no anonymous contributions.)  This will
> +be done for you automatically if you use `git commit -s`.  The name
> +you use as your identity should not be an anonymous id or false name
> +that misrepresents who you are.
> +
> +A known identity can be the committer's real, birth or legal name,
> +but can also be an established (online) identity.  It is the name you
> +convey to people in the community for them to use to identify you as
> +you.  A key concern is that your identification is sufficient enough
> +to contact you if an issue were to arise in the future about your
> +contribution.  You should not deliberately use a name or email address
> +that hides your identity.  When you wish to only contribute under an
> +(anonymous) pseudonym, or when you require an explicit employer
> +disclaimer, then following the FSF
> +assignment process is more appropriate.
>  
>  Some people also put extra optional tags at the end.  The GCC project does
>  not require tags from anyone other than the original author of the patch, but
> -- 
> 2.47.1
>

Re: [PATCH] c: For array element type drop qualifiers but keep other properties of the element type [PR116357]

2025-01-27 Thread Joseph Myers

On Sat, 25 Jan 2025, Jakub Jelinek wrote:

> The following patch uses build_qualified_type with TYPE_UNQUALIFIED instead,
> which will be in the common case the same as TYPE_MAIN_VARIANT if the
> checks are satisfied for it, but if not, will look up different unqualified
> type or even create it if there is none.

Using build_qualified_type would I think also have the effect of not 
removing the qualifiers from the element type of an array type.  The 
design followed in grokdeclarator is given by the comment:

 At each stage we maintain an unqualified version of the type
 together with any qualifiers that should be applied to it with
 c_build_qualified_type; this way, array types including
 multidimensional array types are first built up in unqualified
 form and then the qualified form is created with
 TYPE_MAIN_VARIANT pointing to the unqualified form.  */

The patch is OK with c_build_qualified_type used instead of 
build_qualified_type, if that works.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] c++: friend vs inherited guide confusion [PR117855]

2025-01-27 Thread Jason Merrill


On 1/27/25 4:17 PM, Patrick Palka wrote:

On Mon, 27 Jan 2025, Patrick Palka wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/14?


OK.


+&& (cxx_dialect < cxx23 || !deduction_guide_p (NODE)))  \


N.B. I decided to check cxx_dialect here since deduction_guide_p seems
relatively expensive to call?  In another version of this patch I made
us track dguide-ness via a dedicated bit flag instead of the DECL_NAME
comparison to make the predicate essentially free (and potentially also
more robust) but I eventually shelved that as more stage 1 material.


deduction_guide_p doesn't look like a significant expense to me; I 
wouldn't bother trying to optimize it without profile data as evidence.


Jason

[pushed] c++: only strip conversions for deduction [PR118632]

2025-01-27 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In r15-2761 I changed unify to always strip IMPLICIT_CONV_EXPR from PARM.
In this testcase that leads to comparing nullptr to (int*)0, and failing
because they aren't the same.  Let's only strip conversions if we're
actually going to try to deduce template arguments.

While we're at it, let's move this after the early exits.

And with this adjustment we can remove the workaround for mangle57.C.

PR c++/118632

gcc/cp/ChangeLog:

* pt.cc (unify): Only strip conversion if deducible_expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nontype7.C: New test.
---
 gcc/cp/pt.cc  | 24 ++--
 gcc/testsuite/g++.dg/cpp0x/nontype7.C | 13 +
 2 files changed, 23 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nontype7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d4a6e2e0675..2811afed007 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -25099,15 +25099,6 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
 ? tf_warning_or_error
 : tf_none);
 
-  /* I don't think this will do the right thing with respect to types.
- But the only case I've seen it in so far has been array bounds, where
- signedness is the only information lost, and I think that will be
- okay.  VIEW_CONVERT_EXPR can appear with class NTTP, thanks to
- finish_id_expression_1, and are also OK.  */
-  while (CONVERT_EXPR_P (parm) || TREE_CODE (parm) == VIEW_CONVERT_EXPR
-|| TREE_CODE (parm) == IMPLICIT_CONV_EXPR)
-parm = TREE_OPERAND (parm, 0);
-
   if (arg == error_mark_node)
 return unify_invalid (explain_p);
   if (arg == unknown_type_node
@@ -25119,11 +25110,16 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
   if (parm == any_targ_node || arg == any_targ_node)
 return unify_success (explain_p);
 
-  /* Stripping IMPLICIT_CONV_EXPR above can produce this mismatch
- (g++.dg/abi/mangle57.C).  */
-  if (TREE_CODE (parm) == FUNCTION_DECL
-  && TREE_CODE (arg) == ADDR_EXPR)
-arg = TREE_OPERAND (arg, 0);
+  /* Strip conversions that will interfere with NTTP deduction.
+ I don't think this will do the right thing with respect to types.
+ But the only case I've seen it in so far has been array bounds, where
+ signedness is the only information lost, and I think that will be
+ okay.  VIEW_CONVERT_EXPR can appear with class NTTP, thanks to
+ finish_id_expression_1, and are also OK.  */
+  if (deducible_expression (parm))
+while (CONVERT_EXPR_P (parm) || TREE_CODE (parm) == VIEW_CONVERT_EXPR
+  || TREE_CODE (parm) == IMPLICIT_CONV_EXPR)
+  parm = TREE_OPERAND (parm, 0);
 
   /* If PARM uses template parameters, then we can't bail out here,
  even if ARG == PARM, since we won't record unifications for the
diff --git a/gcc/testsuite/g++.dg/cpp0x/nontype7.C 
b/gcc/testsuite/g++.dg/cpp0x/nontype7.C
new file mode 100644
index 000..30c1d3933ee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nontype7.C
@@ -0,0 +1,13 @@
+// PR c++/118632
+// { dg-do compile { target c++11 } }
+
+template
+class Matrix {};
+
+template
+void operator*(Matrix, int);
+
+int main() {
+  Matrix m;
+  m * 42;
+}

base-commit: ceabea405ffdc851736e240111be9b297ad86c53
-- 
2.48.0

[PATCH] combine: Fix up make_extraction [PR118638]

2025-01-27 Thread Jakub Jelinek

Hi!

The following testcase is miscompiled at -Os on x86_64-linux.
The problem is during make_compound_operation of
(ashiftrt:SI (ashift:SI (mult:SI (reg:SI 107 [ a_5 ])
(const_int 3 [0x3]))
(const_int 31 [0x1f]))
(const_int 31 [0x1f]))
where it incorrectly returns
(mult:SI (sign_extract:SI (reg:SI 107 [ a_5 ])
(const_int 2 [0x2])
(const_int 0 [0]))
(const_int 3 [0x3]))
which isn't obviously true, the former returns either 0 or -1 depending
on the least significant bit of the multiplication,
the latter returns either 0 or -3 depending on the second least significant
bit of the multiplication argument.

The bug has been introduced in PR96998 r11-4563, which added handling of x
* (2^N) similar to x << N.  In the above case, pos is 0 and len is 1,
sign extracting a single least significant bit of the multiplication.
As 3 is not a power of 2, shift_amt is -1.
But IN_RANGE (-1, 1, 1 - 1) is still true, because the basic requirement of
IN_RANGE that LOWER is not greater than UPPER is violated.
The intention of using 1 as LOWER is to avoid matching multiplication by 1,
that really shouldn't appear in the IL.  But to avoid violating IN_RANGE
requirement, we need to verify that len is at least 2.

I've added this len > 1 check to the inner if rather than outer because I
think for GCC 16 we should add a further optimization.
In the particular case of 1 least significant bit sign extraction from
multiplication by 3, we could actually say it is equivalent to
(sign_extract:SI (reg:SI 107 [ a_5 ])
(const_int 1 [0x2])
(const_int 0 [0]))
That is because 3 is an odd number and multiplication by 2 will yield the
least significant bit 0 (we are sign extracting just one) and so the
multiplication doesn't change anything on the outcome.
More generally, even for larger len, multiplication by C which is
(1 << X) + 1 where X is >= len should be optimizable just to extraction
of the multiplicand's least significant len bits.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-28  Jakub Jelinek  

PR rtl-optimization/118638
* combine.cc (make_extraction): Only optimize (mult x 2^n) if len is
larger than 1.

* gcc.c-torture/execute/pr118638.c: New test.

--- gcc/combine.cc.jj   2025-01-27 16:38:47.0 +0100
+++ gcc/combine.cc  2025-01-27 18:55:22.082925556 +0100
@@ -7629,7 +7629,7 @@ make_extraction (machine_mode mode, rtx
 least significant (LEN - C) bits of X, giving an rtx
 whose mode is MODE, then multiply it by 2^C.  */
   const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
-  if (IN_RANGE (shift_amt, 1, len - 1))
+  if (len > 1 && IN_RANGE (shift_amt, 1, len - 1))
{
  new_rtx = make_extraction (mode, XEXP (inner, 0),
 0, 0, len - shift_amt,
--- gcc/testsuite/gcc.c-torture/execute/pr118638.c.jj   2025-01-27 
18:56:30.246986900 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr118638.c  2025-01-27 
18:56:13.660215308 +0100
@@ -0,0 +1,20 @@
+/* PR rtl-optimization/118638 */
+
+__attribute__((noipa)) int
+foo (int x)
+{
+  int a = x != -3, b, c;
+  a *= 3;
+  b = 2 * x - 9;
+  a = a + b;
+  a = ~a;
+  c = a & 1;
+  return -c;
+}
+
+int
+main ()
+{
+  if (foo (0) != -1)
+__builtin_abort ();
+}

Jakub

[patch, libfortran] PR114618 Format produces incorrect output when contains 1x, ok when uses " "

2025-01-27 Thread Jerry D


Hello all,

The attached patch is part 1 of my effort to fix these X and T edit 
descriptor issues. This one cleans up some really ugly output.


Before the patch with the test case provided by the reporter:

PI.^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ 3.1415926535897931
REAL(PI)...^@^@^@^@^@^@^@^@^@^@^@ 3.14159274
DBLE(PI)...^@^@^@^@^@^@^@^@^@^@^@ 3.1415926535897931
RADIX..^@^@^@^@^@^@^@^@^@^@^@ 3. 2
RANGE..^@^@^@^@^@^@^@^@^@^@^@ 3. 307
PRECISION..^@^@^@^@^@^@^@^@^@^@ 15

Which is complete garbage.

After the patch:

PI  3.1415926535897931
REAL(PI) .. 3.14159274
DBLE(PI) .. 3.1415926535897931
RADIX . 2
RANGE . 307
PRECISION . 15

I greatly reduced the test case included in the patch. While working on 
this one I discovered other problems not addressed by this patch and I 
will address these through PR113897.


You will see some changes in the factoring of some of the code in the
case FMT_X:, case FMT_TR:, case FMT_TL:, case FMT_T:. I anticipate in 
part 2 that I will be doing more specific changes on these.


Regression tested on x86_64_linux_gnu.

OK for trunk?

Regards,

Jerry

Author: Jerry DeLisle 
Date:   Mon Jan 27 19:08:46 2025 -0800

Fortran: Fix handling of the X edit descriptor.

This patch is a partial fix of handling of X edit descriptors
when combined with certain T edit descriptors.

PR libfortran/114618

libgfortran/ChangeLog:

  * io/transfer.c (formatted_transfer_scalar_write): Change name
of vriable 'pos' to 'tab_pos' to improve clarity. Add new
variable next_pos when calculating the maximum position.
Update the calculation of pending spaces.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr114618.f90: New test.

diff --git a/gcc/testsuite/gfortran.dg/pr114618.f90 b/gcc/testsuite/gfortran.dg/pr114618.f90
new file mode 100644
index 000..c06c4debe31
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr114618.f90
@@ -0,0 +1,15 @@
+! { dg-do run }
+! PR114618 Format produces incorrect output when contains 1x, ok when uses " " 
+! aside: Before patch output1 is garbage.
+! Output should be 'RADIX . 2'
+program pr114618
+   implicit none
+   integer, parameter :: wp = kind(0d0)
+   real(kind=wp) :: pi  = 3.14159265358979323846264338_wp
+   character(len=*), parameter:: fmt1 = '(19("."),t1,g0,1x,t21,g0)'
+   character(len=*), parameter:: fmt2 = '(19("."),t1,g0," ",t21,g0)'
+   character(30) :: output1, output2
+   write (output1, fmt1) 'RADIX', radix(pi)
+   write (output2, fmt2) 'RADIX', radix(pi)
+   if (output1 /= output2) stop 1
+end program pr114618
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index b3b72f39c5b..8a24a099afe 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -2068,12 +2068,14 @@ static void
 formatted_transfer_scalar_write (st_parameter_dt *dtp, bt type, void *p, int kind,
  size_t size)
 {
-  gfc_offset pos, bytes_used;
+  gfc_offset tab_pos, bytes_used;
   const fnode *f;
   format_token t;
   int n;
   int consume_data_flag;
 
+  tab_pos = 0; bytes_used = 0;
+
   /* Change a complex data item into a pair of reals.  */
 
   n = (p == NULL) ? 0 : ((type != BT_COMPLEX) ? 1 : 2);
@@ -2398,10 +2400,17 @@ formatted_transfer_scalar_write (st_parameter_dt *dtp, bt type, void *p, int kin
 	case FMT_X:
 	case FMT_TR:
 	  consume_data_flag = 0;
-
 	  dtp->u.p.skips += f->u.n;
-	  pos = bytes_used + dtp->u.p.skips - 1;
-	  dtp->u.p.pending_spaces = pos - dtp->u.p.max_pos + 1;
+	  tab_pos = bytes_used + dtp->u.p.skips - 1;
+	  dtp->u.p.pending_spaces = tab_pos - dtp->u.p.max_pos + 1;
+	  dtp->u.p.pending_spaces = dtp->u.p.pending_spaces < 0
+? f->u.n : dtp->u.p.pending_spaces;
+
+	  if (t == FMT_X && tab_pos < dtp->u.p.max_pos)
+	  {
+	write_x (dtp, dtp->u.p.skips, dtp->u.p.pending_spaces);
+	dtp->u.p.skips = dtp->u.p.pending_spaces = 0;
+	  }
 	  /* Writes occur just before the switch on f->format, above, so
 	 that trailing blanks are suppressed, unless we are doing a
 	 non-advancing write in which case we want to output the blanks
@@ -2414,35 +2423,50 @@ formatted_transfer_scalar_write (st_parameter_dt *dtp, bt type, void *p, int kin
 	  break;
 
 	case FMT_TL:
-	case FMT_T:
 	  consume_data_flag = 0;
-
-	  if (f->format == FMT_TL)
+	  /* Handle the special case when no bytes have been used yet.
+	 Cannot go below zero. */
+	  if (bytes_used == 0)
 	{
-
-	  /* Handle the special case when no bytes have been used yet.
-	 Cannot go below zero. */
-	  if (bytes_used == 0)
-		{
-		  dtp->u.p.pending_spaces -= f->u.n;
-		  dtp->u.p.skips -= f->u.n;
-		  dtp->u.p.skips = dtp->u.p.skips < 0 ? 0 : dtp->u.p.skips;
-		}
-
-	  pos = bytes_used - f->u.n;
+	  dtp->u.p.pending_spaces -= f->u.n;
+	  dtp->u.p.skips -= f->u.n;
+	  dtp->u.p.skips = dtp->u.p.skips < 0 ? 0

[PATCH] libstdc++: Fix views::transform(move_only_fn{}) forwarding [PR118413]

2025-01-27 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14?

-- >8 --

This case was incorrectly failing in C++23 mode even after P2492R2
because the perfect forwarding simplification mechanism assumed bound
arguments are copy-constructible which is no longer necessarily true
after that paper.  It'd be easy enough to fix the mechanism, but in
C++23 mode the mechanism is no longer useful since we can just rely on
deducing 'this' to implement perfect forwarding with a single overload
(and done in r14-7150-gd2cb4693a0b383).  So this patch just disables
the mechanism in C++23 mode so that the generic implementation is always
used.

PR libstdc++/118413

libstdc++-v3/ChangeLog:

* include/std/ranges
(_GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING): New.
(__closure_has_simple_call_op): Only define in C++20 mode.
(__closure_has_simple_extra_args): Likewise.
(_Partial, _Pipe): Likewise, for the "simple" partial
specializations.
(*::_S_has_simple_call_op): Likewise.
(*::_S_has_simple_extra_args): Likewise.
* testsuite/std/ranges/adaptors/100577.cc: Disable some
implementation detail checks in C++23 mode.
* testsuite/std/ranges/adaptors/transform.cc (test09): Also test
partially applying the move-only function.
---
 libstdc++-v3/include/std/ranges   | 53 +++
 .../testsuite/std/ranges/adaptors/100577.cc   |  4 +-
 .../std/ranges/adaptors/transform.cc  |  2 +
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index ad69a94b21f..a2b8748bea6 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1028,6 +1028,16 @@ namespace views::__adaptor
}
 };
 
+#if __cplusplus <= 202003L
+  // In C++20 mode we simplify perfect forwarding of a range adaptor closure's
+  // bound arguments when possible (according to their types), for sake of 
compile
+  // times and diagnostic quality.  In C++23 mode we instead rely on deducing 
'this'
+  // to idiomatically implement perfect forwarding.  Note that this means the
+  // simplification logic doesn't consider C++23  changes such as 
P2492R2.
+# define _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING 1
+#endif
+
+#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
   // True if the range adaptor closure _Adaptor has a simple operator(), i.e.
   // one that's not overloaded according to constness or value category of the
   // _Adaptor object.
@@ -1039,6 +1049,7 @@ namespace views::__adaptor
   template
 concept __adaptor_has_simple_extra_args = 
_Adaptor::_S_has_simple_extra_args
   || _Adaptor::template _S_has_simple_extra_args<_Args...>;
+#endif
 
   // A range adaptor closure that represents partial application of
   // the range adaptor _Adaptor with arguments _Args.
@@ -1139,6 +1150,7 @@ namespace views::__adaptor
 #endif
 };
 
+#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
   // Partial specialization of the primary template for the case where the 
extra
   // arguments of the adaptor can always be safely and efficiently forwarded by
   // const reference.  This lets us get away with a single operator() overload,
@@ -1195,6 +1207,7 @@ namespace views::__adaptor
 
   static constexpr bool _S_has_simple_call_op = true;
 };
+#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
 
   template
 concept __pipe_invocable
@@ -1245,6 +1258,7 @@ namespace views::__adaptor
 #endif
 };
 
+#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
   // A partial specialization of the above primary template for the case where
   // both adaptor operands have a simple operator().  This in turn lets us
   // implement composition using a single simple operator(), which makes
@@ -1271,6 +1285,7 @@ namespace views::__adaptor
 
   static constexpr bool _S_has_simple_call_op = true;
 };
+#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
 } // namespace views::__adaptor
 
 #if __cpp_lib_ranges >= 202202L
@@ -1454,7 +1469,9 @@ namespace views::__adaptor
return owning_view{std::forward<_Range>(__r)};
}
 
+#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
   static constexpr bool _S_has_simple_call_op = true;
+#endif
 };
 
 inline constexpr _All all;
@@ -1872,7 +1889,9 @@ namespace views::__adaptor
 
   using _RangeAdaptor<_Filter>::operator();
   static constexpr int _S_arity = 2;
+#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
   static constexpr bool _S_has_simple_extra_args = true;
+#endif
 };
 
 inline constexpr _Filter filter;
@@ -2260,7 +2279,9 @@ namespace views::__adaptor
 
   using _RangeAdaptor<_Transform>::operator();
   static constexpr int _S_arity = 2;
+#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
   static constexpr bool _S_has_simple_extra_args = true;
+#endif
 };
 
 inline constexpr _Transform transform;
@@ -2494,12

Re: [PATCH] libstdc++: Fix views::transform(move_only_fn{}) forwarding [PR118413]

2025-01-27 Thread Patrick Palka

On Mon, 27 Jan 2025, Patrick Palka wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14?
> 
> -- >8 --
> 
> This case was incorrectly failing in C++23 mode even after P2492R2
> because the perfect forwarding simplification mechanism assumed bound
> arguments are copy-constructible which is no longer necessarily true
> after that paper.  It'd be easy enough to fix the mechanism, but in
> C++23 mode the mechanism is no longer useful since we can just rely on
> deducing 'this' to implement perfect forwarding with a single overload
> (and done in r14-7150-gd2cb4693a0b383).  So this patch just disables
> the mechanism in C++23 mode so that the generic implementation is always
> used.

I should add that this doesn't constitute an ABI change because the
disabled "simple" partial specializations of _Partial and _Pipe have the
same layout as the corresponding non-simple versions.

> 
>   PR libstdc++/118413
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/ranges
>   (_GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING): New.
>   (__closure_has_simple_call_op): Only define in C++20 mode.
>   (__closure_has_simple_extra_args): Likewise.
>   (_Partial, _Pipe): Likewise, for the "simple" partial
>   specializations.
>   (*::_S_has_simple_call_op): Likewise.
>   (*::_S_has_simple_extra_args): Likewise.
>   * testsuite/std/ranges/adaptors/100577.cc: Disable some
>   implementation detail checks in C++23 mode.
>   * testsuite/std/ranges/adaptors/transform.cc (test09): Also test
>   partially applying the move-only function.
> ---
>  libstdc++-v3/include/std/ranges   | 53 +++
>  .../testsuite/std/ranges/adaptors/100577.cc   |  4 +-
>  .../std/ranges/adaptors/transform.cc  |  2 +
>  3 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index ad69a94b21f..a2b8748bea6 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -1028,6 +1028,16 @@ namespace views::__adaptor
>   }
>  };
>  
> +#if __cplusplus <= 202003L
> +  // In C++20 mode we simplify perfect forwarding of a range adaptor 
> closure's
> +  // bound arguments when possible (according to their types), for sake of 
> compile
> +  // times and diagnostic quality.  In C++23 mode we instead rely on 
> deducing 'this'
> +  // to idiomatically implement perfect forwarding.  Note that this means the
> +  // simplification logic doesn't consider C++23  changes such as 
> P2492R2.
> +# define _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING 1
> +#endif
> +
> +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
>// True if the range adaptor closure _Adaptor has a simple operator(), i.e.
>// one that's not overloaded according to constness or value category of 
> the
>// _Adaptor object.
> @@ -1039,6 +1049,7 @@ namespace views::__adaptor
>template
>  concept __adaptor_has_simple_extra_args = 
> _Adaptor::_S_has_simple_extra_args
>|| _Adaptor::template _S_has_simple_extra_args<_Args...>;
> +#endif
>  
>// A range adaptor closure that represents partial application of
>// the range adaptor _Adaptor with arguments _Args.
> @@ -1139,6 +1150,7 @@ namespace views::__adaptor
>  #endif
>  };
>  
> +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
>// Partial specialization of the primary template for the case where the 
> extra
>// arguments of the adaptor can always be safely and efficiently forwarded 
> by
>// const reference.  This lets us get away with a single operator() 
> overload,
> @@ -1195,6 +1207,7 @@ namespace views::__adaptor
>  
>static constexpr bool _S_has_simple_call_op = true;
>  };
> +#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
>  
>template
>  concept __pipe_invocable
> @@ -1245,6 +1258,7 @@ namespace views::__adaptor
>  #endif
>  };
>  
> +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
>// A partial specialization of the above primary template for the case 
> where
>// both adaptor operands have a simple operator().  This in turn lets us
>// implement composition using a single simple operator(), which makes
> @@ -1271,6 +1285,7 @@ namespace views::__adaptor
>  
>static constexpr bool _S_has_simple_call_op = true;
>  };
> +#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
>  } // namespace views::__adaptor
>  
>  #if __cpp_lib_ranges >= 202202L
> @@ -1454,7 +1469,9 @@ namespace views::__adaptor
>   return owning_view{std::forward<_Range>(__r)};
>   }
>  
> +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING
>static constexpr bool _S_has_simple_call_op = true;
> +#endif
>  };
>  
>  inline constexpr _All all;
> @@ -1872,7 +1889,9 @@ namespace views::__adaptor
>  
>using _RangeAdaptor<_Filter>::operator();
>static constexpr int _S_arity = 2;
> +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWA

[committed][PR target/114085] Fix H8 constraint issue which led to ICE

2025-01-27 Thread Jeff Law



Nowhere near the top of my list, but a quick looksie Sunday led to an 
easy to fix backend bug.  It's not a regression, but given its the H8 
backend I think we've safely got a degree of freedom here.


The H8 has a constraint "U" which allowed both a subset of MEMs and 
REGs, so it wasn't marked as a memory constraint.  LRA doesn't really 
handle this well -- a pseudo which didn't get a hard reg was replaced by 
its MEM.  The stack slot doesn't fit the limited addressing forms 
available and LRA didn't know it just needed to reload the address into 
a reg.


Fixed by removing REG from the "U" constraint, turning "U" into a memory 
constraint and adjusting a few patterns to allow "rU" instead of "U".


We don't really support C++ on the H8 and as a result libstdc++ won't 
build. Interestingly enough that also keeps the C++ tests from working, 
even for a compile-only test.  So no testcase.  Though I did check the 
reduced and original test manually and ran it through my tester without 
any regressions.


Pushing to the trunk.

Jeff

commit 235215323c67d8ce021a00df0f42e2c1713c7959
Author: Jeff Law 
Date:   Mon Jan 27 21:25:39 2025 -0700

[PR target/114085] Fix H8 constraint issue which led to ICE

Nowhere near the top of my list, but a quick looksie Sunday led to an easy 
to
fix backend bug.  It's not a regression, but given its the H8 backend I 
think
we've safely got a degree of freedom here.

The H8 has a constraint "U" which allowed both a subset of MEMs and REGs, 
so it
wasn't marked as a memory constraint.  LRA doesn't really handle this well 
-- a
pseudo which didn't get a hard reg was replaced by its MEM.  The stack slot
doesn't fit the limited addressing forms available and LRA didn't know it 
just
needed to reload the address into a reg.

Fixed by removing REG from the "U" constraint, turning "U" into a memory
constraint and adjusting a few patterns to allow "rU" instead of "U".

We don't really support C++ on the H8 and as a result libstdc++ won't build.
Interestingly enough that also keeps the C++ tests from working, even for a
compile-only test.  So no testcase.  Though I did check the reduced and
original test manually and ran it through my tester without any regressions.

PR target/114085
gcc/
* config/h8300/constraints.md (U): No longer accept REGs.
* config/h8300/logical.md (andqi3_2): Use "rU" rather than "U".
(andqi3_2_clobber_flags, andqi3_1, qi3_1): Likewise.
* config/h8300/testcompare.md (tst_extzv_1_n): Likewise.

diff --git a/gcc/config/h8300/constraints.md b/gcc/config/h8300/constraints.md
index 65579267359..ba76df834a6 100644
--- a/gcc/config/h8300/constraints.md
+++ b/gcc/config/h8300/constraints.md
@@ -168,13 +168,9 @@ (define_constraint "T"
   (and (match_code "const_int")
(match_test "!h8300_shift_needs_scratch_p (ival, SImode, CLOBBER)")))
 
-(define_constraint "U"
+(define_memory_constraint "U"
   "An operand valid for a bset destination."
-  (ior (and (match_code "reg")
-   (match_test "(reload_in_progress || reload_completed)
-? REG_OK_FOR_BASE_STRICT_P (op)
-: REG_OK_FOR_BASE_P (op)"))
-   (and (match_code "mem")
+  (ior (and (match_code "mem")
(match_code "reg" "0")
(match_test "(reload_in_progress || reload_completed)
 ? REG_OK_FOR_BASE_STRICT_P (XEXP (op, 0))
diff --git a/gcc/config/h8300/logical.md b/gcc/config/h8300/logical.md
index 5df0922ef4e..f848242ac87 100644
--- a/gcc/config/h8300/logical.md
+++ b/gcc/config/h8300/logical.md
@@ -32,7 +32,7 @@ (define_expand "3"
 ;; --
 
 (define_insn_and_split "*andqi3_2"
-  [(set (match_operand:QI 0 "bit_operand" "=U,rQ,r")
+  [(set (match_operand:QI 0 "bit_operand" "=rU,rQ,r")
(and:QI (match_operand:QI 1 "bit_operand" "%0,0,WU")
(match_operand:QI 2 "h8300_src_operand" "Y0,rQi,IP1>X")))]
   "TARGET_H8300SX"
@@ -42,7 +42,7 @@ (define_insn_and_split "*andqi3_2"
  (clobber (reg:CC CC_REG))])])
 
 (define_insn "*andqi3_2_clobber_flags"
-  [(set (match_operand:QI 0 "bit_operand" "=U,rQ,r")
+  [(set (match_operand:QI 0 "bit_operand" "=rU,rQ,r")
(and:QI (match_operand:QI 1 "bit_operand" "%0,0,WU")
(match_operand:QI 2 "h8300_src_operand" "Y0,rQi,IP1>X")))
(clobber (reg:CC CC_REG))]
@@ -55,7 +55,7 @@ (define_insn "*andqi3_2_clobber_flags"
(set_attr "length_table" "*,logicb,*")])
 
 (define_insn_and_split "andqi3_1"
-  [(set (match_operand:QI 0 "bit_operand" "=U,r")
+  [(set (match_operand:QI 0 "bit_operand" "=rU,r")
(and:QI (match_operand:QI 1 "bit_operand" "%0,0")
(match_operand:QI 2 "h8300_src_operand" "Y0,rn")))]
   "register_operand (operands[0], QImode)
@@ -156,7 +156,7 @@ (define_insn "*andorsi3_shift_8_clobber_flags

[PATCH] c++: wrong-code with consteval constructor [PR117501]

2025-01-27 Thread Marek Polacek

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?

-- >8 --
We've had a wrong-code problem since r14-4140, due to which we
forget to initialize a variable.

In consteval39.C, we evaluate

struct QQQ q;
  <>>
  (const char *) "" ) >;

into

struct QQQ q;
  <;

and then the useless expr_stmt is dropped on the floor, so q isn't
initialized.  As pre-r14-4140, we need to handle constructors specially.

With this patch, we generate:

struct QQQ q;
  <;

initializing q properly.

PR c++/117501

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): If DECL is a constructor,
build up an INIT_EXPR to initialize its object.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval39.C: New test.
* g++.dg/cpp2a/consteval40.C: New test.
---
 gcc/cp/cp-gimplify.cc| 13 ++--
 gcc/testsuite/g++.dg/cpp2a/consteval39.C | 27 
 gcc/testsuite/g++.dg/cpp2a/consteval40.C | 25 ++
 3 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval39.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval40.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 4ec3de13008..65b486ab943 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1259,7 +1259,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
  not need to produce a constant expression.  */
   if (DECL_IMMEDIATE_FUNCTION_P (decl))
 {
-  tree e = cxx_constant_value (stmt, tf_none);
+  tree obj_arg = NULL_TREE;
+  if (call_p && DECL_CONSTRUCTOR_P (decl))
+   if (tree o = get_nth_callarg (stmt, 0))
+ if (!is_dummy_object (o))
+   {
+ obj_arg = o;
+ if (TREE_CODE (obj_arg) == ADDR_EXPR)
+   obj_arg = TREE_OPERAND (obj_arg, 0);
+   }
+  tree e = cxx_constant_value (stmt, obj_arg, tf_none);
   if (e == error_mark_node)
{
  /* This takes care of, e.g.,
@@ -1297,7 +1306,7 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
}
   /* We've evaluated the consteval function call.  */
   if (call_p)
-   *stmt_p = e;
+   *stmt_p = obj_arg ? cp_build_init_expr (obj_arg, e) : e;
 }
   /* We've encountered a function call that may turn out to be consteval
  later.  Store its caller so that we can ensure that the call is
diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval39.C 
b/gcc/testsuite/g++.dg/cpp2a/consteval39.C
new file mode 100644
index 000..523e8260eab
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval39.C
@@ -0,0 +1,27 @@
+// PR c++/117501
+// { dg-do run { target c++20 } }
+
+constexpr unsigned
+length ()
+{
+  bool __trans_tmp_1 = __builtin_is_constant_evaluated();
+  if (__trans_tmp_1)
+return 42;
+  return 1;
+}
+struct basic_string_view {
+  constexpr basic_string_view(const char *) : _M_len{length()}, _M_str{} {}
+  long _M_len;
+  char _M_str;
+};
+struct QQQ {
+  consteval QQQ(basic_string_view d) : data(d) {}
+  basic_string_view data;
+};
+int
+main ()
+{
+  QQQ q("");
+  if (q.data._M_len != 42)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval40.C 
b/gcc/testsuite/g++.dg/cpp2a/consteval40.C
new file mode 100644
index 000..4d3ba20092b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval40.C
@@ -0,0 +1,25 @@
+// PR c++/117501
+// { dg-do run { target c++20 } }
+
+constexpr int
+twiddle (int i)
+{
+ if (__builtin_is_constant_evaluated ())
+return 3;
+  return i;
+}
+struct S {
+  constexpr S(int i) : i{twiddle (i)} {}
+  int i;
+};
+struct Q {
+  consteval Q(S s_) : s{s_, s_} {}
+  S s[2];
+};
+int
+main ()
+{
+  Q q(twiddle (42));
+  if (q.s[0].i != 3 || q.s[1].i != 3)
+__builtin_abort ();
+}

base-commit: ceabea405ffdc851736e240111be9b297ad86c53
-- 
2.48.1

[PATCH] c++: Fix ICE with #embed/RAW_DATA_CST after list conversion [PR118671]

2025-01-27 Thread Jakub Jelinek

Hi!

The following testcases ICE with RAW_DATA_CSTs (so the first one since
introduction of #embed C++ optimizations and the latter since optimization
of large sequences of comma separated literals).
I've missed the fact that implicit_conversion can embed the exact expression
passed to it into stuff pointed out by conversion * (e.g. for user
conversions in sub->cand->args).
So, it isn't enough in convert_like_internal to pass the right INTEGER_CST
for each element of the RAW_DATA_CST because the whole RAW_DATA_CST might be
in sub->cand->args etc.
Either I'd need to chase for wherever the RAW_DATA_CST is found and update
those for each element processed, or, as implemented in the following patch,
build_list_conv detects the easy optimizable case where
convert_like_internal can be kept as the whole RAW_DATA_CST with changed
type and possibly narrowing diagnostics, and otherwise instead of having
a single subconversion it has RAW_DATA_CST separate subconversions.
Instead of trying to reallocate the subconvs array when we detect that case,
the patch instead uses an artificial ck_list inside of the u.list array
to hold the individual subconversions.
Seems the only places where u.list is used are build_list_conv and
convert_like_internal.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-28  Jakub Jelinek  

PR c++/118671
* call.cc (build_list_conv): For RAW_DATA_CST, call
implicit_conversion with INTEGER_CST representing first byte instead
of the whole RAW_DATA_CST.  If it is an optimizable trivial
conversion, just save that to subconvs, otherwise allocate an
artificial ck_list for all the RAW_DATA_CST bytes and create
subsubconv for each of them.
(convert_like_internal): For ck_list with RAW_DATA_CST, instead of
doing all the checks for optimizable conversion just check kind and
assert everything else, otherwise use subsubconversions instead of
the subconversion for each element.

* g++.dg/cpp/embed-25.C: New test.
* g++.dg/cpp0x/pr118671.C: New test.

--- gcc/cp/call.cc.jj   2025-01-22 09:22:53.0 +0100
+++ gcc/cp/call.cc  2025-01-27 21:36:31.159889633 +0100
@@ -868,6 +868,67 @@ build_list_conv (tree type, tree ctor, i
 
   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), i, val)
 {
+  if (TREE_CODE (val) == RAW_DATA_CST)
+   {
+ tree elt
+   = build_int_cst (TREE_TYPE (val), RAW_DATA_UCHAR_ELT (val, 0));
+ conversion *sub
+   = implicit_conversion (elttype, TREE_TYPE (val), elt,
+  false, flags, complain);
+ conversion *next;
+ if (sub == NULL)
+   return NULL;
+ /* For conversion to initializer_list or
+initializer_list or initializer_list
+we can optimize and keep RAW_DATA_CST with adjusted
+type if we report narrowing errors if needed.
+Use just one subconversion for that case.  */
+ if (sub->kind == ck_std
+ && sub->type
+ && (TREE_CODE (sub->type) == INTEGER_TYPE
+ || is_byte_access_type (sub->type))
+ && TYPE_PRECISION (sub->type) == CHAR_BIT
+ && (next = next_conversion (sub))
+ && next->kind == ck_identity)
+   {
+ subconvs[i] = sub;
+ continue;
+   }
+ /* Otherwise. build separate subconv for each RAW_DATA_CST
+byte.  Wrap those into an artificial ck_list which convert_like
+will then handle.  */
+ conversion **subsubconvs = alloc_conversions (RAW_DATA_LENGTH (val));
+ unsigned int j;
+ subsubconvs[0] = sub;
+ for (j = 1; j < (unsigned) RAW_DATA_LENGTH (val); ++j)
+   {
+ elt = build_int_cst (TREE_TYPE (val),
+  RAW_DATA_UCHAR_ELT (val, j));
+ sub = implicit_conversion (elttype, TREE_TYPE (val), elt,
+false, flags, complain);
+ if (sub == NULL)
+   return NULL;
+ subsubconvs[j] = sub;
+   }
+
+ t = alloc_conversion (ck_list);
+ t->type = type;
+ t->u.list = subsubconvs;
+ t->rank = cr_exact;
+ for (j = 0; j < (unsigned) RAW_DATA_LENGTH (val); ++j)
+   {
+ sub = subsubconvs[i];
+ if (sub->rank > t->rank)
+   t->rank = sub->rank;
+ if (sub->user_conv_p)
+   t->user_conv_p = true;
+ if (sub->bad_p)
+   t->bad_p = true;
+   }
+ subconvs[i] = t;
+ continue;
+   }
+
   conversion *sub
= implicit_conversion (elttype, TREE_TYPE (val), val,
   false, flags, complain);
@@ -8841,22 +8902,22 @@ convert_like_internal (conversion *convs
  {
if (TREE_CODE (val) =

[PATCH] c++: Return false from __is_bounded_array for zero-sized arrays [PR118655]

2025-01-27 Thread Jakub Jelinek

Hi!

This is basically Marek's PR114479 r14-9759 __is_array fix applied to
__is_bounded_array as well.  Similarly to that trait, when not using
the builtin it returned false for zero sized arrays but when using
the builtin it returns true.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?
Do we want to backport this to 14.3 or not?

2025-01-28  Jakub Jelinek  

PR c++/118655
* semantics.cc (trait_expr_value) : Return
false for zero-sized arrays.

* g++.dg/ext/is_bounded_array.C: Extend.

--- gcc/cp/semantics.cc.jj  2025-01-17 11:29:33.616689665 +0100
+++ gcc/cp/semantics.cc 2025-01-27 12:21:11.439367867 +0100
@@ -13165,7 +13165,14 @@ trait_expr_value (cp_trait_kind kind, tr
  || DERIVED_FROM_P (type1, type2)));
 
 case CPTK_IS_BOUNDED_ARRAY:
-  return type_code1 == ARRAY_TYPE && TYPE_DOMAIN (type1);
+  return (type_code1 == ARRAY_TYPE
+ && TYPE_DOMAIN (type1)
+ /* We don't want to report T[0] as being a bounded array type.
+This is for compatibility with an implementation of
+std::is_bounded_array by template argument deduction, because
+compute_array_index_type_loc rejects a zero-size array
+in SFINAE context.  */
+ && !(TYPE_SIZE (type1) && integer_zerop (TYPE_SIZE (type1;
 
 case CPTK_IS_CLASS:
   return NON_UNION_CLASS_TYPE_P (type1);
--- gcc/testsuite/g++.dg/ext/is_bounded_array.C.jj  2023-12-22 
12:26:14.917893636 +0100
+++ gcc/testsuite/g++.dg/ext/is_bounded_array.C 2025-01-27 12:34:18.400549388 
+0100
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++11 } }
+// { dg-options "" }
 
 #define SA(X) static_assert((X),#X)
 
@@ -14,21 +15,34 @@
 
 class ClassType { };
 
+constexpr int sz0 = 0;
+constexpr int sz2 = 2;
+
 SA_TEST_CATEGORY(__is_bounded_array, int[2], true);
 SA_TEST_CATEGORY(__is_bounded_array, int[], false);
+SA_TEST_CATEGORY(__is_bounded_array, int[0], false);
 SA_TEST_CATEGORY(__is_bounded_array, int[2][3], true);
 SA_TEST_CATEGORY(__is_bounded_array, int[][3], false);
+SA_TEST_CATEGORY(__is_bounded_array, int[0][3], false);
+SA_TEST_CATEGORY(__is_bounded_array, int[3][0], false);
 SA_TEST_CATEGORY(__is_bounded_array, float*[2], true);
 SA_TEST_CATEGORY(__is_bounded_array, float*[], false);
 SA_TEST_CATEGORY(__is_bounded_array, float*[2][3], true);
 SA_TEST_CATEGORY(__is_bounded_array, float*[][3], false);
 SA_TEST_CATEGORY(__is_bounded_array, ClassType[2], true);
 SA_TEST_CATEGORY(__is_bounded_array, ClassType[], false);
+SA_TEST_CATEGORY(__is_bounded_array, ClassType[0], false);
 SA_TEST_CATEGORY(__is_bounded_array, ClassType[2][3], true);
 SA_TEST_CATEGORY(__is_bounded_array, ClassType[][3], false);
+SA_TEST_CATEGORY(__is_bounded_array, ClassType[0][3], false);
+SA_TEST_CATEGORY(__is_bounded_array, ClassType[2][0], false);
+SA_TEST_CATEGORY(__is_bounded_array, int[sz2], true);
+SA_TEST_CATEGORY(__is_bounded_array, int[sz0], false);
 SA_TEST_CATEGORY(__is_bounded_array, int(*)[2], false);
 SA_TEST_CATEGORY(__is_bounded_array, int(*)[], false);
+SA_TEST_CATEGORY(__is_bounded_array, int(*)[0], false);
 SA_TEST_CATEGORY(__is_bounded_array, int(&)[2], false);
+SA_TEST_CATEGORY(__is_bounded_array, int(&)[0], false);
 SA_TEST_FN(__is_bounded_array, int(&)[], false);
 
 // Sanity check.

Jakub

Re: [PATCH v1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-01-27 Thread Vineet Gupta

On 1/26/25 05:33, pan2...@intel.com wrote:
> From: Pan Li 
>
> After we add the frm register to the global_regs, we may not need to
> define_insn that volatile to emit the frm restore insns.  The
> cooperatively-managed global register will help to handle this, instead
> of emit the volatile define_insn explicitly.
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv.cc (riscv_emit_frm_mode_set): Refactor
>   the frm mode set by removing fsrmsi_restore_volatile.
>   * config/riscv/vector-iterators.md (unspecv): Remove as unnecessary.
>   * config/riscv/vector.md (fsrmsi_restore_volatile): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: Adjust
>   the asm dump check times.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-74.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-75.c: Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv.cc | 43 ++-
>  gcc/config/riscv/vector-iterators.md  |  4 --
>  gcc/config/riscv/vector.md| 13 --
>  .../rvv/base/float-point-dynamic-frm-49.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-50.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-52.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-74.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-75.c |  2 +-
>  8 files changed, 28 insertions(+), 42 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index dd50fe4eddf..8e3bf0077cd 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -12031,27 +12031,30 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
>if (prev_mode == riscv_vector::FRM_DYN_CALL)
>  emit_insn (gen_frrmsi (backup_reg)); /* Backup frm when DYN_CALL.  */
>  
> -  if (mode != prev_mode)
> -{
> -  rtx frm = gen_int_mode (mode, SImode);
> -
> -  if (mode == riscv_vector::FRM_DYN_CALL
> - && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun))
> - /* No need to emit when prev mode is DYN already.  */
> - emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
> -  else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun)
> - && prev_mode != riscv_vector::FRM_DYN
> - && prev_mode != riscv_vector::FRM_DYN_CALL)
> - /* No need to emit when prev mode is DYN or DYN_CALL already.  */
> - emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
> -  else if (mode == riscv_vector::FRM_DYN
> - && prev_mode != riscv_vector::FRM_DYN_CALL)
> - /* Restore frm value from backup when switch to DYN mode.  */
> - emit_insn (gen_fsrmsi_restore (backup_reg));
> -  else if (riscv_static_frm_mode_p (mode))
> - /* Set frm value when switch to static mode.  */
> - emit_insn (gen_fsrmsi_restore (frm));
> +  if (mode == prev_mode)
> +return;

Nit, can you move this check in the caller riscv_emit_mode_set () which already
checks similarly for VXRM (unless there's a corner case where the gen_frrmsi is
needed despite both prev and curr being same.

riscv_emit_mode_set (int entity, int mode, int prev_mode,
 HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
{
  switch (entity)
    {
    case RISCV_VXRM:
  if (mode != VXRM_MODE_NONE && mode != prev_mode)
    emit_insn (gen_vxrmsi (gen_int_mode (mode, SImode)));
  break;
    case RISCV_FRM:
+ if (mode != prev_mode)
  riscv_emit_frm_mode_set (mode, prev_mode);
  break;


> +
> +  if (riscv_static_frm_mode_p (mode))
> +{
> +  /* Set frm value when switch to static mode.  */
> +  emit_insn (gen_fsrmsi_restore (gen_int_mode (mode, SImode)));
> +  return;
>  }
> +
> +  bool restore_p
> += /* No need to emit when prev mode is DYN.  */
> +  (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_CALL
> +   && prev_mode != riscv_vector::FRM_DYN)
> +  /* No need to emit if prev mode is DYN or DYN_CALL.  */
> +  || (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_EXIT
> +   && prev_mode != riscv_vector::FRM_DYN
> +   && prev_mode != riscv_vector::FRM_DYN_CALL)
> +  /* Restore frm value when switch to DYN mode.  */
> +  || (mode == riscv_vector::FRM_DYN
> +   && prev_mode != riscv_vector::FRM_DYN_CALL);
> +
> +  if (restore_p)
> +emit_insn (gen_fsrmsi_restore (backup_reg));
>  }
>  
>  /* Implement Mode switching.  */
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index c1bd7397441..f64e7ad70dd 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -122,10 +122,6 @@ (define_c_enum "unspec" [
>UNSPEC_SF_VFNRCLIPU
>  ])
>  
> -(define_c_enum "unspecv" [
> -  UNSPECV_FRM_RESTORE_EXIT
>

79 matches

Mail list logo