date:20250620

[gcc] Created branch 'mikael/heads/non_lvalue_v02' in namespace 'refs/users'

2025-06-20 Thread Mikael Morin via Gcc-cvs

The branch 'mikael/heads/non_lvalue_v02' was created in namespace 'refs/users' 
pointing to:

 c6dab88a696d... match: Simplify double not and double negate to a non_lvalu

[gcc(refs/users/mikael/heads/non_lvalue_v02)] match: Simplify double not and double negate to a non_lvalue

2025-06-20 Thread Mikael Morin via Gcc-cvs

https://gcc.gnu.org/g:c6dab88a696d57401ace4c1b57f33ab785c1431c

commit c6dab88a696d57401ace4c1b57f33ab785c1431c
Author: Mikael Morin 
Date:   Thu Jul 4 12:59:34 2024 +0200

match: Simplify double not and double negate to a non_lvalue

I noticed while testing the second patch that none of the NON_LVALUE_EXPR
trees I expected were generated when simplifying unary operators, whereas
they were generated with binary operators.
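
The effect of the wrapper can be illustrated outside GCC with a toy expression tree.  This is only a sketch under stated assumptions: Node, Kind, simplify_double_unary and is_lvalue are hypothetical names invented for the illustration, not GCC interfaces, and the real change is the match.pd pattern in the diff below.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree; every name here is hypothetical, not GCC's.
enum class Kind { Var, Negate, BitNot, NonLvalue };

struct Node {
  Kind kind;
  std::string name;          // only meaningful for Kind::Var
  std::shared_ptr<Node> op;  // operand of a unary node
};

using NodePtr = std::shared_ptr<Node>;

NodePtr var(const std::string &n) {
  return std::make_shared<Node>(Node{Kind::Var, n, nullptr});
}

NodePtr unary(Kind k, NodePtr x) {
  return std::make_shared<Node>(Node{k, "", std::move(x)});
}

// Fold -(-x) and ~(~x) to x, but wrap the result so that it can no
// longer be mistaken for a bare data reference.
NodePtr simplify_double_unary(const NodePtr &e) {
  bool is_unary = e->kind == Kind::Negate || e->kind == Kind::BitNot;
  if (is_unary && e->op && e->op->kind == e->kind)
    return unary(Kind::NonLvalue, e->op->op);
  return e;
}

// In this toy model only a bare variable is assignable.
bool is_lvalue(const NodePtr &e) { return e->kind == Kind::Var; }
```

Without the wrapper, the folded result of -(-x) would be the Var node itself and is_lvalue would report it as assignable.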

Regression tested on x86_64-linux.  OK for master?

-- 8< --

gcc/ChangeLog:

* match.pd (`-(-X)`, `~(~X)`): Add a NON_LVALUE_EXPR wrapper to the
simplification of doubled unary operators NEGATE_EXPR and
BIT_NOT_EXPR.

gcc/testsuite/ChangeLog:

* gfortran.dg/non_lvalue_1.f90: New test.

Diff:
---
 gcc/match.pd   |  4 ++--
 gcc/testsuite/gfortran.dg/non_lvalue_1.f90 | 21 +
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f53c162fce3..ad0fa8f10044 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2357,7 +2357,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* ~~x -> x */
 (simplify
   (bit_not (bit_not @0))
-  @0)
+  (non_lvalue @0))
 
 /* zero_one_valued_p will match when a value is known to be either
0 or 1 including constants 0 or 1.
@@ -4037,7 +4037,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (negate (nop_convert? (negate @1)))
   (if (!TYPE_OVERFLOW_SANITIZED (type)
&& !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@1)))
-   (view_convert @1)))
+   (non_lvalue (view_convert @1))))
 
  /* We can't reassociate floating-point unless -fassociative-math
 or fixed-point plus or minus because of saturation to +-Inf.  */
diff --git a/gcc/testsuite/gfortran.dg/non_lvalue_1.f90 b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
new file mode 100644
index ..ac52b2720945
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
@@ -0,0 +1,21 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! Check the generation of NON_LVALUE_EXPR trees in cases where a unary operator expression
+! simplifies to a data reference.
+
+! A NON_LVALUE_EXPR is generated for a double negation that simplifies to a data reference.  */
+function f1 (f1_arg1)
+  integer, value :: f1_arg1
+  integer :: f1
+  f1 = -(-f1_arg1)
+end function
+! { dg-final { scan-tree-dump "__result_f1 = NON_LVALUE_EXPR ;" "original" } }
+
+! A NON_LVALUE_EXPR is generated for a double complement that simplifies to a data reference.  */
+function f2 (f2_arg1)
+  integer, value :: f2_arg1
+  integer :: f2
+  f2 = not(not(f2_arg1))
+end function
+! { dg-final { scan-tree-dump "__result_f2 = NON_LVALUE_EXPR ;" "original" } }

[gcc r16-1589] tree-optimization/120654 - ICE with range query from IVOPTs

2025-06-20 Thread Richard Biener via Gcc-cvs

https://gcc.gnu.org/g:6bd1223bd55ed60fa5dbfd4a8444e133e5e933f5

commit r16-1589-g6bd1223bd55ed60fa5dbfd4a8444e133e5e933f5
Author: Richard Biener 
Date:   Fri Jun 20 11:14:38 2025 +0200

tree-optimization/120654 - ICE with range query from IVOPTs

The following ICEs as we hand down an UNDEFINED range to where it
isn't expected.  Put the guard that's there earlier.
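
Why the order of the checks matters can be sketched with a stand-in range class.  This is a simplified illustration and not the vr-values.cc code: ToyRange and toy_range_fits_p are hypothetical, only signed destinations are modelled, and dest_precision is assumed to lie between 1 and 63.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a value range; not GCC's irange.
struct ToyRange {
  bool undefined;
  bool varying;
  int precision;        // meaningful only when the range is defined
  std::int64_t lo, hi;  // constant bounds

  bool undefined_p() const { return undefined; }
  bool varying_p() const { return varying; }
  int type_precision() const {
    // Mirrors the failure mode: an undefined range has no type to query.
    assert(!undefined && "undefined range has no type");
    return precision;
  }
};

// Does every value in R fit a signed type of DEST_PRECISION bits?
bool toy_range_fits_p(const ToyRange &r, int dest_precision) {
  // Guard first, so the type is never queried on an undefined range.
  if (r.undefined_p() || r.varying_p())
    return false;
  // A source type that is no wider than the destination always fits.
  if (r.type_precision() <= dest_precision)
    return true;
  std::int64_t max = (std::int64_t{1} << (dest_precision - 1)) - 1;
  std::int64_t min = -max - 1;
  return r.lo >= min && r.hi <= max;
}
```

With the guard placed after the type_precision call, the first test case below would trip the assertion instead of returning false.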

PR tree-optimization/120654
* vr-values.cc (range_fits_type_p): Check for undefined_p ()
before accessing type ().

* gcc.dg/torture/pr120654.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr120654.c | 24 
 gcc/vr-values.cc| 10 +-
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr120654.c b/gcc/testsuite/gcc.dg/torture/pr120654.c
new file mode 100644
index ..3819b78281d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr120654.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a, c, e, f, h, j;
+long g, k;
+void *malloc(long);
+void free(void *);
+int b(int m) {
+  if (m || a)
+return 1;
+  return 0.0f;
+}
+int d(int m, int p2) { return b(m) + m + (1 + p2 + p2); }
+int i() {
+  long l[] = {2, 9, 7, 8, g, g, 9, 0, 2, g};
+  e = l[c] << 6;
+}
+void n() {
+  long o;
+  int *p = malloc(sizeof(int));
+  k = 1 % j;
+  for (; i() + f + h; o++)
+if (p[d(j + 6, (int)k + 1992695866) + h + f + j + (int)k - 1 + o])
+  free(p);
+}
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 799f1bfd91d8..ff11656559bf 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -944,6 +944,10 @@ range_fits_type_p (const irange *vr,
   widest_int tem;
   signop src_sgn;
 
+  /* Now we can only handle ranges with constant bounds.  */
+  if (vr->undefined_p () || vr->varying_p ())
+return false;
+
   /* We can only handle integral and pointer types.  */
   src_type = vr->type ();
   if (!INTEGRAL_TYPE_P (src_type)
@@ -952,17 +956,13 @@ range_fits_type_p (const irange *vr,
 
   /* An extension is fine unless VR is SIGNED and dest_sgn is UNSIGNED,
  and so is an identity transform.  */
-  src_precision = TYPE_PRECISION (vr->type ());
+  src_precision = TYPE_PRECISION (src_type);
   src_sgn = TYPE_SIGN (src_type);
   if ((src_precision < dest_precision
&& !(dest_sgn == UNSIGNED && src_sgn == SIGNED))
   || (src_precision == dest_precision && src_sgn == dest_sgn))
 return true;
 
-  /* Now we can only handle ranges with constant bounds.  */
-  if (vr->undefined_p () || vr->varying_p ())
-return false;
-
   wide_int vrmin = vr->lower_bound ();
   wide_int vrmax = vr->upper_bound ();

[gcc r16-1592] libgcobol: Add license.

2025-06-20 Thread James K. Lowden via Gcc-cvs

https://gcc.gnu.org/g:632a50abc3a99cace8abc6ed3817f7eb1312c9d2

commit r16-1592-g632a50abc3a99cace8abc6ed3817f7eb1312c9d2
Author: James K. Lowden 
Date:   Fri Jun 20 10:16:26 2025 -0400

libgcobol: Add license.

libgcobol/ChangeLog:

* LICENSE: New file.

Diff:
---
 libgcobol/LICENSE | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/libgcobol/LICENSE b/libgcobol/LICENSE
new file mode 100644
index ..3937993c56a2
--- /dev/null
+++ b/libgcobol/LICENSE
@@ -0,0 +1,27 @@
+Copyright (c) 2021-2025 Symas Corporation
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright
+  notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above
+  copyright notice, this list of conditions and the following disclaimer
+  in the documentation and/or other materials provided with the
+  distribution.
+* Neither the name of the Symas Corporation nor the names of its
+  contributors may be used to endorse or promote products derived from
+  this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

[gcc(refs/users/mikael/heads/non_lvalue_v02)] match: Simplify double not and double negate to a non_lvalue

2025-06-20 Thread Mikael Morin via Gcc-cvs

https://gcc.gnu.org/g:a63b01d68dd59e6e0fc9048b937b26c87b5fa676

commit a63b01d68dd59e6e0fc9048b937b26c87b5fa676
Author: Mikael Morin 
Date:   Thu Jul 4 12:59:34 2024 +0200

match: Simplify double not and double negate to a non_lvalue

Regression tested on x86_64-linux.  OK for master?

-- 8< --

gcc/ChangeLog:

* match.pd (`-(-X)`, `~(~X)`): Add a NON_LVALUE_EXPR wrapper to the
simplification of doubled unary operators NEGATE_EXPR and
BIT_NOT_EXPR.

gcc/testsuite/ChangeLog:

* gfortran.dg/non_lvalue_1.f90: New test.

Diff:
---
 gcc/match.pd   |  4 ++--
 gcc/testsuite/gfortran.dg/non_lvalue_1.f90 | 23 +++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f53c162fce3..ad0fa8f10044 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2357,7 +2357,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* ~~x -> x */
 (simplify
   (bit_not (bit_not @0))
-  @0)
+  (non_lvalue @0))
 
 /* zero_one_valued_p will match when a value is known to be either
0 or 1 including constants 0 or 1.
@@ -4037,7 +4037,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (negate (nop_convert? (negate @1)))
   (if (!TYPE_OVERFLOW_SANITIZED (type)
&& !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@1)))
-   (view_convert @1)))
+   (non_lvalue (view_convert @1))))
 
  /* We can't reassociate floating-point unless -fassociative-math
 or fixed-point plus or minus because of saturation to +-Inf.  */
diff --git a/gcc/testsuite/gfortran.dg/non_lvalue_1.f90 b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
new file mode 100644
index ..536c86b1eb6c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
@@ -0,0 +1,23 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! Check the generation of NON_LVALUE_EXPR expressions in cases where a unary
+! operator expression would simplify to a bare data reference.
+
+! A NON_LVALUE_EXPR is generated for a double negation that would simplify to
+! a bare data reference.
+function f1 (f1_arg1)
+  integer, value :: f1_arg1
+  integer :: f1
+  f1 = -(-f1_arg1)
+end function
+! { dg-final { scan-tree-dump "__result_f1 = NON_LVALUE_EXPR ;" "original" } }
+
+! A NON_LVALUE_EXPR is generated for a double complement that would simplify to
+! a bare data reference.
+function f2 (f2_arg1)
+  integer, value :: f2_arg1
+  integer :: f2
+  f2 = not(not(f2_arg1))
+end function
+! { dg-final { scan-tree-dump "__result_f2 = NON_LVALUE_EXPR ;" "original" } }

[gcc] Created branch 'mikael/heads/non_lvalue_v02' in namespace 'refs/users'

2025-06-20 Thread Mikael Morin via Gcc-cvs

The branch 'mikael/heads/non_lvalue_v02' was created in namespace 'refs/users' 
pointing to:

 a63b01d68dd5... match: Simplify double not and double negate to a non_lvalu

[gcc r15-9850] [RISC-V][PR target/119971] Avoid losing shift count masking

2025-06-20 Thread Jeff Law via Gcc-cvs

https://gcc.gnu.org/g:bf284e8f5edf0d039b1dd9af5d62e355ce85a7ba

commit r15-9850-gbf284e8f5edf0d039b1dd9af5d62e355ce85a7ba
Author: Jeff Law 
Date:   Mon May 5 17:14:29 2025 -0600

[RISC-V][PR target/119971] Avoid losing shift count masking

As is outlined in the PR, we have a few define_insn_and_split patterns which
optimize away explicit masking of shift/bit positions when the masking matches
the hardware's behavior.

A small number of those define_insn_and_split patterns generate a single
instruction.  It's fairly elegant in that we were essentially just rewriting
the RTL to match an existing pattern.

In one case we'd do the rewriting and later turn a 32bit shift into a bset.
That's not safe because the masking of a 32bit shift uses 0x1f while masking on
bset uses 0x3f on rv64.  The net result was incorrect code as seen in the BZ
entry.

The fix is pretty simple.  There's no real reason we need to use a
define_insn_and_split.  It was just convenient.  Instead we can use a simple
define_insn.  That avoids a change in the masking behavior for the shift
count/bit position and the masking stays in the RTL.
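
The mismatch between the two masks can be reproduced with plain integer arithmetic.  The sketch below only models the masking described above; shift_left_32 and bset_64 are hypothetical helpers for the illustration, not compiler or intrinsic functions.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 32-bit shift on rv64: only the low 5 bits of the count
// are used (mask 0x1f).
std::uint32_t shift_left_32(std::uint32_t value, std::uint64_t count) {
  return value << (count & 0x1f);
}

// Model of bset on rv64: the bit position uses the low 6 bits
// (mask 0x3f).
std::uint64_t bset_64(std::uint64_t value, std::uint64_t pos) {
  return value | (std::uint64_t{1} << (pos & 0x3f));
}
```

For a count of 3 both helpers produce 8, so the rewrite looks harmless.  For a count of 33 the 32-bit shift of 1 yields 2, because 33 & 0x1f is 1, while bset sets bit 33.  Once the explicit mask has been dropped from the RTL, the two forms are no longer interchangeable.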

I quickly scanned the entire port and didn't see any additional
define_insn_and_splits that obviously generated a single instruction outside
the shift/rotate space, though in the vector space that's nontrivial to
ascertain.

This has been run through my tester for the cross configurations, but not the
native bootstrap/regression test (yet).

PR target/119971
gcc/
* config/riscv/bitmanip.md (rotation with masked count): Rewrite
as define_insn patterns.  Fix formatting.
* config/riscv/riscv.md (shift with masked count): Similarly.

gcc/testsuite
* gcc.target/riscv/pr119971.c: New test.
* gcc.target/riscv/zbb-rol-ror-03.c: Adjust test slightly.

(cherry picked from commit 05d75c5bfcf923bc0258b79a08c5861590c5a2b9)

Diff:
---
 gcc/config/riscv/bitmanip.md| 59 +
 gcc/config/riscv/riscv.md   | 19 ++--
 gcc/testsuite/gcc.target/riscv/pr119971.c   | 24 ++
 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-03.c |  2 +-
 4 files changed, 59 insertions(+), 45 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 5ed5e18cb36a..320531cd1ed0 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -423,39 +423,40 @@
   "rolw\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
-(define_insn_and_split "*3_mask"
-  [(set (match_operand:GPR 0 "register_operand" "= r")
-(bitmanip_rotate:GPR
-(match_operand:GPR 1 "register_operand" "  r")
-(match_operator 4 "subreg_lowpart_operator"
- [(and:GPR2
-   (match_operand:GPR2 2 "register_operand"  "r")
-   (match_operand 3 "" ""))])))]
+(define_insn "*3_mask"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (bitmanip_rotate:X
+ (match_operand:X 1 "register_operand" "r")
+ (match_operator 4 "subreg_lowpart_operator"
+   [(and:X (match_operand:X 2 "register_operand"  "r")
+   (match_operand 3 "" ""))])))]
   "TARGET_ZBB || TARGET_ZBKB"
-  "#"
-  "&& 1"
-  [(set (match_dup 0)
-(bitmanip_rotate:GPR (match_dup 1)
- (match_dup 2)))]
-  "operands[2] = gen_lowpart (QImode, operands[2]);"
+  "\t%0,%1,%2"
   [(set_attr "type" "bitmanip")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")])
 
-(define_insn_and_split "*si3_sext_mask"
-  [(set (match_operand:DI 0 "register_operand" "= r")
-  (sign_extend:DI (bitmanip_rotate:SI
-(match_operand:SI 1 "register_operand" "  r")
-(match_operator 4 "subreg_lowpart_operator"
- [(and:GPR
-   (match_operand:GPR 2 "register_operand"  "r")
-   (match_operand 3 "const_si_mask_operand"))]]
+(define_insn "*3_mask_si"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (bitmanip_rotate:SI
+ (match_operand:SI 1 "register_operand" "r")
+ (match_operator 3 "subreg_lowpart_operator"
+   [(and:X (match_operand:SI 2 "register_operand"  "r")
+   (const_int 31))])))]
   "TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)"
-  "#"
-  "&& 1"
-  [(set (match_dup 0)
-  (sign_extend:DI (bitmanip_rotate:SI (match_dup 1)
-   (match_dup 2]
-  "operands[2] = gen_lowpart (QImode, operands[2]);"
+  "w\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "SI")])
+
+(define_insn "*si3_sext_mask"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (bitmanip_rotate:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operator 3 "subreg_lowpart_operator"
+ [(and:X (match_operand:G

[gcc r16-1593] amdgcn: allow SImode in VCC_HI [PR120722]

2025-06-20 Thread Andrew Stubbs via Gcc-cvs

https://gcc.gnu.org/g:95752870fb5c51868a084b94705a83d728be52c8

commit r16-1593-g95752870fb5c51868a084b94705a83d728be52c8
Author: Andrew Stubbs 
Date:   Fri Jun 20 16:43:37 2025 +

amdgcn: allow SImode in VCC_HI [PR120722]

This patch isn't fully tested yet, but it fixes the build failure, so that
will do for now.  SImode was not allowed in VCC_HI because there were issues,
way back before the port went upstream, so it's possible we'll find out what
those issues were again soon.

gcc/ChangeLog:

PR target/120722
* config/gcn/gcn.cc (gcn_hard_regno_mode_ok): Allow SImode in VCC_HI.

Diff:
---
 gcc/config/gcn/gcn.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 31a59dd6f22f..2d8dfa3232e2 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -585,9 +585,8 @@ gcn_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 case XNACK_MASK_HI_REG:
 case TBA_HI_REG:
 case TMA_HI_REG:
-  return mode == SImode;
 case VCC_HI_REG:
-  return false;
+  return mode == SImode;
 case EXEC_HI_REG:
   return mode == SImode /*|| mode == V32BImode */ ;
 case SCC_REG:

[gcc r16-1588] x86: Get the widest vector mode from MOVE_MAX

2025-06-20 Thread H.J. Lu via Gcc-cvs

https://gcc.gnu.org/g:050b1708ea532ea4840e97d85fad4ca63d4cd631

commit r16-1588-g050b1708ea532ea4840e97d85fad4ca63d4cd631
Author: H.J. Lu 
Date:   Thu Jun 19 05:03:48 2025 +0800

x86: Get the widest vector mode from MOVE_MAX

Since MOVE_MAX defines the maximum number of bytes that an instruction
can move quickly between memory and registers, use it to get the widest
vector mode in vector loop when inlining memcpy and memset.
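
The arithmetic behind the new selection is small enough to sketch.  The helpers below are hypothetical and the byte counts used with them are illustrative assumptions, not values taken from the i386 backend; the only part taken from the patch is the division of MOVE_MAX by the word size and the check for more than one unit.

```cpp
#include <cassert>

// Number of word-sized elements in a vector that moves MOVE_MAX_BYTES
// at once (hypothetical helper mirroring the nunits computation).
constexpr int vector_units(int move_max_bytes, int word_bytes) {
  return move_max_bytes / word_bytes;
}

// A vector mode is only requested when more than one word moves at
// once; otherwise the word mode is kept.
constexpr bool wants_vector_mode(int move_max_bytes, int word_bytes) {
  return vector_units(move_max_bytes, word_bytes) > 1;
}
```

Assuming an 8-byte word, a 32-byte MOVE_MAX gives 4 units and a 64-byte MOVE_MAX gives 8, while an 8-byte MOVE_MAX gives a single unit and so keeps the word mode.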

gcc/

PR target/120708
* config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use
MOVE_MAX to get the widest vector mode in vector loop.

gcc/testsuite/

PR target/120708
* gcc.target/i386/memcpy-pr120708-1.c: New test.
* gcc.target/i386/memcpy-pr120708-2.c: Likewise.
* gcc.target/i386/memcpy-pr120708-3.c: Likewise.
* gcc.target/i386/memcpy-pr120708-4.c: Likewise.
* gcc.target/i386/memcpy-pr120708-5.c: Likewise.
* gcc.target/i386/memcpy-pr120708-6.c: Likewise.
* gcc.target/i386/memset-pr120708-1.c: Likewise.
* gcc.target/i386/memset-pr120708-2.c: Likewise.
* gcc.target/i386/memcpy-strategy-1.c: Drop dg-skip-if.  Replace
-march=atom with -mno-avx -msse2 -mtune=generic
-mtune-ctrl=^sse_typeless_stores.
* gcc.target/i386/memcpy-strategy-2.c: Likewise.
* gcc.target/i386/memcpy-vector_loop-1.c: Likewise.
* gcc.target/i386/memcpy-vector_loop-2.c: Likewise.
* gcc.target/i386/memset-vector_loop-1.c: Likewise.
* gcc.target/i386/memset-vector_loop-2.c: Likewise.

Signed-off-by: H.J. Lu 

Diff:
---
 gcc/config/i386/i386-expand.cc | 31 +++---
 gcc/testsuite/gcc.target/i386/memcpy-pr120708-1.c  | 11 
 gcc/testsuite/gcc.target/i386/memcpy-pr120708-2.c  | 11 
 gcc/testsuite/gcc.target/i386/memcpy-pr120708-3.c  | 11 
 gcc/testsuite/gcc.target/i386/memcpy-pr120708-4.c  | 11 
 gcc/testsuite/gcc.target/i386/memcpy-pr120708-5.c  | 15 +++
 gcc/testsuite/gcc.target/i386/memcpy-pr120708-6.c  | 15 +++
 gcc/testsuite/gcc.target/i386/memcpy-strategy-1.c  |  3 +--
 gcc/testsuite/gcc.target/i386/memcpy-strategy-2.c  |  3 +--
 .../gcc.target/i386/memcpy-vector_loop-1.c |  3 +--
 .../gcc.target/i386/memcpy-vector_loop-2.c |  5 ++--
 gcc/testsuite/gcc.target/i386/memset-pr120708-1.c  | 10 +++
 gcc/testsuite/gcc.target/i386/memset-pr120708-2.c  | 10 +++
 .../gcc.target/i386/memset-vector_loop-1.c |  3 +--
 .../gcc.target/i386/memset-vector_loop-2.c |  3 +--
 15 files changed, 110 insertions(+), 35 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4946f87a1317..423fc632003d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -9351,7 +9351,6 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx count_exp, rtx val_exp,
   bool need_zero_guard = false;
   bool noalign;
   machine_mode move_mode = VOIDmode;
-  machine_mode wider_mode;
   int unroll_factor = 1;
   /* TODO: Once value ranges are available, fill in proper data.  */
   unsigned HOST_WIDE_INT min_size = 0;
@@ -9427,6 +9426,7 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx count_exp, rtx val_exp,
 
   unroll_factor = 1;
   move_mode = word_mode;
+  int nunits;
   switch (alg)
 {
 case libcall:
@@ -9447,27 +9447,14 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx count_exp, rtx val_exp,
 case vector_loop:
   need_zero_guard = true;
   unroll_factor = 4;
-  /* Find the widest supported mode.  */
-  move_mode = word_mode;
-  while (GET_MODE_WIDER_MODE (move_mode).exists (&wider_mode)
-&& optab_handler (mov_optab, wider_mode) != CODE_FOR_nothing)
-   move_mode = wider_mode;
-
-  if (TARGET_AVX256_SPLIT_REGS && GET_MODE_BITSIZE (move_mode) > 128)
-   move_mode = TImode;
-  if (TARGET_AVX512_SPLIT_REGS && GET_MODE_BITSIZE (move_mode) > 256)
-   move_mode = OImode;
-
-  /* Find the corresponding vector mode with the same size as MOVE_MODE.
-MOVE_MODE is an integer mode at the moment (SI, DI, TI, etc.).  */
-  if (GET_MODE_SIZE (move_mode) > GET_MODE_SIZE (word_mode))
-   {
- int nunits = GET_MODE_SIZE (move_mode) / GET_MODE_SIZE (word_mode);
- if (!mode_for_vector (word_mode, nunits).exists (&move_mode)
- || optab_handler (mov_optab, move_mode) == CODE_FOR_nothing)
-   move_mode = word_mode;
-   }
-  gcc_assert (optab_handler (mov_optab, move_mode) != CODE_FOR_nothing);
+  /* Get the vector mode to move MOVE_MAX bytes.  */
+  nunits = MOVE_MAX / GET_MODE_SIZE (word_mode);
+  if (nunits > 1)
+   {
+ move_mode = mode_for_vector (word_mode, nunits).require ();
+ gcc_assert (optab_handler (mov_optab, move_mode)
+

[gcc] Deleted branch 'mikael/heads/non_lvalue_v02' in namespace 'refs/users'

2025-06-20 Thread Mikael Morin via Gcc-cvs

The branch 'mikael/heads/non_lvalue_v02' in namespace 'refs/users' was deleted.
It previously pointed to:

 c6dab88a696d... match: Simplify double not and double negate to a non_lvalu

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  c6dab88... match: Simplify double not and double negate to a non_lvalu

[gcc r14-11855] tree-optimization/116674 - vectorizable_simd_clone_call and re-analysis

2025-06-20 Thread Richard Biener via Gcc-cvs

https://gcc.gnu.org/g:638e90e5e8000b6b6b320b02229310c63c441b9f

commit r14-11855-g638e90e5e8000b6b6b320b02229310c63c441b9f
Author: Richard Biener 
Date:   Wed Sep 11 13:54:33 2024 +0200

tree-optimization/116674 - vectorizable_simd_clone_call and re-analysis

When SLP analysis scraps an instance because it fails to analyze we
can end up calling vectorizable_* in analysis mode on a node that
was analyzed during the analysis of that instance again.
vectorizable_simd_clone_call wasn't expecting that and instead
guarded analysis/transform code on populated data structures.
The following changes it so it survives re-analysis.
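
The shape of the fix can be sketched as follows: analysis discards whatever an earlier analysis left behind, and only the transform phase trusts the stored data.  ToyNode and toy_analyze_or_transform are hypothetical names for this illustration and greatly simplify the real function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-node cache, filled by analysis and read by transform.
struct ToyNode {
  std::vector<int> clone_info;
};

// TRANSFORM == false models analysis mode, which may run more than once
// on the same node.  Returns the number of cached entries.
std::size_t toy_analyze_or_transform(ToyNode &node, bool transform,
                                     int linear_step) {
  if (!transform) {
    // Re-analysis: drop stale entries before recording new ones.
    node.clone_info.clear();
    node.clone_info.push_back(linear_step);
  }
  // Transform mode only reads what the last analysis stored.
  return node.clone_info.size();
}
```

Running the analysis twice and then the transform leaves exactly one entry, the one recorded by the second analysis.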

PR tree-optimization/116674
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Support
re-analysis.

* g++.dg/vect/pr116674.cc: New testcase.

(cherry picked from commit 09a514fbb67caf7e33a6ceddf524ee21024c33c5)

Diff:
---
 gcc/testsuite/g++.dg/vect/pr116674.cc | 85 +++
 gcc/tree-vect-stmts.cc|  8 ++--
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr116674.cc b/gcc/testsuite/g++.dg/vect/pr116674.cc
new file mode 100644
index ..1c13f12290bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr116674.cc
@@ -0,0 +1,85 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-additional-options "-Ofast" }
+// { dg-additional-options "-march=x86-64-v3" { target { x86_64-*-* i?86-*-* } } }
+
+namespace std {
+typedef int a;
+template  struct b;
+template  class aa {};
+template  c d(c e, c) { return e; }
+template  struct b> {
+   using f = c;
+   using g = c *;
+   template  using j = aa;
+};
+} // namespace std
+namespace l {
+template  struct m : std::b {
+   typedef std::b n;
+   typedef typename n::f &q;
+   template  struct ac { typedef typename n::j ad; };
+};
+} // namespace l
+namespace std {
+template  struct o {
+   typedef typename l::m::ac::ad ae;
+   typedef typename l::m::g g;
+   struct p {
+   g af;
+   };
+   struct ag : p {
+   ag(ae) {}
+   };
+   typedef ab u;
+   o(a, u e) : ah(e) {}
+   ag ah;
+};
+template > class r : o {
+   typedef o s;
+   typedef typename s::ae ae;
+   typedef l::m w;
+
+public:
+   c f;
+   typedef typename w::q q;
+   typedef a t;
+   typedef ab u;
+   r(t x, u e = u()) : s(ai(x, e), e) {}
+   q operator[](t x) { return *(this->ah.af + x); }
+   t ai(t x, u) { return x; }
+};
+extern "C" __attribute__((__simd__)) double exp(double);
+} // namespace std
+using namespace std;
+int ak;
+double v, y;
+void am(double, int an, double, double, double, double, double, double, double,
+   double, double, double, int, double, double, double, double,
+   r ap, double, double, double, double, double, double, double,
+   double, r ar, r as, double, double, r at,
+   r au, r av, double, double) {
+double ba;
+for (int k;;)
+  for (int i; i < an; ++i) {
+ y = i;
+ v = d(y, 25.0);
+ ba = exp(v);
+ ar[i * (ak + 1)] = ba;
+ as[i * (ak + 1)] = ar[i * (ak + 1)];
+ if (k && ap[k]) {
+ at[i * (ak + 1)] = av[i * (ak + 1)] = as[i * (ak + 1)];
+ au[i * (ak + 1)] = ar[i * (ak + 1)];
+ } else {
+ au[i * (ak + 1)] = ba;
+ at[i * (ak + 1)] = av[i * (ak + 1)] = k;
+ }
+  }
+}
+void b(int bc) {
+double bd, be, bf, bg, bh, ao, ap, bn, bo, bp, bq, br, bs, bt, bu, bv, bw, bx,
+by, aq, ar, as, bz, ca, at, au, av, cb, aw;
+int bi;
+am(bh, bc, bi, bi, bi, bi, bv, bw, bx, by, bu, bt, bi, ao, bn, bo, bp, ap, bq,
+   br, bs, bd, be, bf, bg, aq, ar, as, bz, ca, at, au, av, cb, aw);
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 307d989fbe3c..ecac53bbbebb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3945,6 +3945,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
   vec& simd_clone_info = (slp_node ? SLP_TREE_SIMD_CLONE_INFO (slp_node)
: STMT_VINFO_SIMD_CLONE_INFO (stmt_info));
+  if (!vec_stmt)
+simd_clone_info.truncate (0);
   arginfo.reserve (nargs, true);
   auto_vec slp_op;
   slp_op.safe_grow_cleared (nargs);
@@ -3993,10 +3995,10 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
   /* For linear arguments, the analyze phase should have saved
 the base and step in {STMT_VINFO,SLP_TREE}_SIMD_CLONE_INFO.  */
-  if (i * 3 + 4 <= simd_clone_info.length ()
+  if (vec_stmt
+ && i * 3 + 4 <= simd_clone_info.length ()
  && simd_clone_info[i * 3 + 2])
{
- gcc_assert (vec_stmt);
  thisarginfo.linear_step = tree_to_shwi (simd_clone_info[i * 3 + 2]);

[gcc r16-1590] Free buffer on function exit [PR120634]

2025-06-20 Thread Jorgen Kvalsvik via Gcc-cvs

https://gcc.gnu.org/g:246c33ac8e8e1967c74ae20c07454a24ef02822a

commit r16-1590-g246c33ac8e8e1967c74ae20c07454a24ef02822a
Author: Jørgen Kvalsvik 
Date:   Thu Jun 19 20:56:30 2025 +0200

Free buffer on function exit [PR120634]

Using auto_vec ensures that the buffer is always free'd when the
function returns.
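
The difference between a manually released buffer and one released by a destructor can be sketched as below.  RawBuf and AutoBuf are hypothetical stand-ins, not GCC's vec and auto_vec, and the early return is an assumed scenario chosen to make the leak visible.

```cpp
#include <cassert>

int live_buffers = 0;  // buffers reserved but not yet released

// Hypothetical manually managed buffer: the caller must call release().
struct RawBuf {
  bool held = false;
  void reserve() { if (!held) { held = true; ++live_buffers; } }
  void release() { if (held) { held = false; --live_buffers; } }
};

// RAII variant: the destructor releases on every exit path.
struct AutoBuf : RawBuf {
  AutoBuf() = default;
  AutoBuf(const AutoBuf &) = delete;
  AutoBuf &operator=(const AutoBuf &) = delete;
  ~AutoBuf() { release(); }
};

bool collect_raw(bool bail_out) {
  RawBuf buf;
  buf.reserve();
  if (bail_out)
    return false;  // leak: release() is skipped on this path
  buf.release();
  return true;
}

bool collect_auto(bool bail_out) {
  AutoBuf buf;
  buf.reserve();
  if (bail_out)
    return false;  // the destructor still releases the buffer
  return true;
}
```

After collect_auto returns, live_buffers is back to its previous value on either path; collect_raw leaves one buffer outstanding when it bails out.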

PR gcov-profile/120634

gcc/ChangeLog:

* prime-paths.cc (trie::paths): Use auto_vec.

Diff:
---
 gcc/prime-paths.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
index 838343c8427d..38feeea15226 100644
--- a/gcc/prime-paths.cc
+++ b/gcc/prime-paths.cc
@@ -635,7 +635,7 @@ trie::insert_with_suffix (array_slice path)
 vec>
 trie::paths () const
 {
-  vec path {};
+  auto_vec path {};
   vec> all {};
   auto iter = paths (path);
   while (iter.next ())

[gcc r16-1591] Use auto_vec in prime paths selftests [PR120634]

2025-06-20 Thread Jorgen Kvalsvik via Gcc-cvs

https://gcc.gnu.org/g:69725b13e9dc8bdb17ec8a7d554071b6b517ad47

commit r16-1591-g69725b13e9dc8bdb17ec8a7d554071b6b517ad47
Author: Jørgen Kvalsvik 
Date:   Thu Jun 19 21:00:07 2025 +0200

Use auto_vec in prime paths selftests [PR120634]

The selftests had a bunch of memory leaks that showed up in make
selftest-valgrind as a result of not using auto_vec or otherwise
explicitly calling release.  Replacing vec with auto_vec makes the
problem go away.  The auto_vec_vec helper is made constructable from a
vec so that objects returned from functions can be automatically
managed too.
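
The point of the converting constructor is that a value returned by a function can be handed to a temporary wrapper, which then releases it at the end of the statement.  The sketch below uses hypothetical types (Paths, AutoPaths) built on std::vector rather than GCC's vec, so it illustrates the pattern and not the selftest code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

int outstanding = 0;  // inner buffers allocated but not yet released

// Hypothetical non-owning container of manually managed rows.
struct Paths {
  std::vector<int *> rows;
};

Paths make_paths(int n) {
  Paths p;
  for (int i = 0; i < n; ++i) {
    p.rows.push_back(new int(i));
    ++outstanding;
  }
  return p;
}

void release_paths(Paths &p) {
  for (int *row : p.rows) {
    delete row;
    --outstanding;
  }
  p.rows.clear();
}

// Owning wrapper: constructible from a plain Paths, releases on scope exit.
struct AutoPaths : Paths {
  AutoPaths() = default;
  AutoPaths(Paths p) : Paths(std::move(p)) {}
  AutoPaths(const AutoPaths &) = delete;
  AutoPaths &operator=(const AutoPaths &) = delete;
  ~AutoPaths() { release_paths(*this); }
};

// The result is not needed: a temporary wrapper releases it immediately.
void discard_result() { AutoPaths{make_paths(3)}; }

// The result is used: it stays alive until the end of the function.
int count_while_held() {
  AutoPaths held(make_paths(2));
  return outstanding;
}
```

Deleting the copy operations keeps two wrappers from releasing the same rows twice.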

PR gcov-profile/120634

gcc/ChangeLog:

* prime-paths.cc (struct auto_vec_vec): Add constructor from
vec.
(test_split_components): Use auto_vec_vec.
(test_scc_internal_prime_paths): Ditto.
(test_scc_entry_exit_paths): Ditto.
(test_complete_prime_paths): Ditto.
(test_entry_prime_paths): Ditto.
(test_singleton_path): Ditto.

Diff:
---
 gcc/prime-paths.cc | 48 +++-
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
index 38feeea15226..5b626ce200a0 100644
--- a/gcc/prime-paths.cc
+++ b/gcc/prime-paths.cc
@@ -91,6 +91,8 @@ struct auto_sbitmap_vector
 /* A silly RAII wrpaper for automatically releasing a vec>.  */
 struct auto_vec_vec : vec>
 {
+  auto_vec_vec () = default;
+  auto_vec_vec (vec> v) : vec>(v) {}
   ~auto_vec_vec () { release_vec_vec (*this); }
 };
 
@@ -1658,8 +1660,8 @@ test_split_components ()
   int nscc = graphds_scc (cfg, NULL);
   auto_graph ccfg (disconnect_sccs (cfg));
 
-  vec> entries {};
-  vec> exits {};
+  auto_vec_vec entries {};
+  auto_vec_vec exits {};
   entries.safe_grow_cleared (nscc);
   exits.safe_grow_cleared (nscc);
 
@@ -1707,7 +1709,7 @@ test_split_components ()
  because other graph inconsistencies are easier to detect.  */
 
   /* Count and check singleton components.  */
-  vec scc_size {};
+  auto_vec scc_size {};
   scc_size.safe_grow_cleared (nscc);
   for (int i = 0; i != cfg->n_vertices; ++i)
 scc_size[cfg->vertices[i].component]++;
@@ -1722,14 +1724,14 @@ test_split_components ()
   /* Manually unroll the loop finding the simple paths starting at the
  vertices in the SCCs.  In this case there is only the one SCC.  */
   trie ccfg_paths;
-  simple_paths (ccfg, ccfg_paths, 2);
-  simple_paths (ccfg, ccfg_paths, 4);
-  simple_paths (ccfg, ccfg_paths, 5);
-  simple_paths (ccfg, ccfg_paths, 6);
-  simple_paths (ccfg, ccfg_paths, 7);
-  simple_paths (ccfg, ccfg_paths, 9);
+  auto_vec_vec (simple_paths (ccfg, ccfg_paths, 2));
+  auto_vec_vec (simple_paths (ccfg, ccfg_paths, 4));
+  auto_vec_vec (simple_paths (ccfg, ccfg_paths, 5));
+  auto_vec_vec (simple_paths (ccfg, ccfg_paths, 6));
+  auto_vec_vec (simple_paths (ccfg, ccfg_paths, 7));
+  auto_vec_vec (simple_paths (ccfg, ccfg_paths, 9));
   /* Then in+out of trie.  */
-  vec> xscc_internal_pp = ccfg_paths.paths ();
+  auto_vec_vec xscc_internal_pp = ccfg_paths.paths ();
   trie scc_internal_pp;
   for (auto &p : xscc_internal_pp)
 scc_internal_pp.insert_with_suffix (p);
@@ -1782,7 +1784,7 @@ test_scc_internal_prime_paths ()
   add_edge (scc, 9, 7);
   add_edge (scc, 7, 2);
 
-  vec> paths = prime_paths (scc, 100);
+  auto_vec_vec paths = prime_paths (scc, 100);
   const int p01[] = { 5, 7, 2, 4, 6, 9 };
   const int p02[] = { 4, 6, 9, 7, 2, 4 };
   const int p03[] = { 2, 4, 6, 9, 7, 2 };
@@ -1806,7 +1808,6 @@ test_scc_internal_prime_paths ()
   ASSERT_TRUE (any_equal_p (p09, paths));
   ASSERT_TRUE (any_equal_p (p10, paths));
   ASSERT_TRUE (any_equal_p (p11, paths));
-  release_vec_vec (paths);
 }
 
 /* Test the entry/exit path helpers for the strongly connected component in
@@ -1825,13 +1826,13 @@ test_scc_entry_exit_paths ()
   add_edge (scc, 7, 2);
 
   trie scc_internal_trie;
-  simple_paths (scc, scc_internal_trie, 2);
-  simple_paths (scc, scc_internal_trie, 4);
-  simple_paths (scc, scc_internal_trie, 5);
-  simple_paths (scc, scc_internal_trie, 6);
-  simple_paths (scc, scc_internal_trie, 7);
-  simple_paths (scc, scc_internal_trie, 9);
-  vec> scc_prime_paths = scc_internal_trie.paths ();
+  auto_vec_vec (simple_paths (scc, scc_internal_trie, 2));
+  auto_vec_vec (simple_paths (scc, scc_internal_trie, 4));
+  auto_vec_vec (simple_paths (scc, scc_internal_trie, 5));
+  auto_vec_vec (simple_paths (scc, scc_internal_trie, 6));
+  auto_vec_vec (simple_paths (scc, scc_internal_trie, 7));
+  auto_vec_vec (simple_paths (scc, scc_internal_trie, 9));
+  auto_vec_vec scc_prime_paths = scc_internal_trie.paths ();
 
   trie entry_exits {};
   scc_entry_exit_paths (scc_prime_paths, 2, 2, entry_exits);
@@ -1867,8 +1868,6 @@ test_scc_entry_exit_paths ()
   ASSERT_EQ (count (entries), 2);
   ASSERT_TRUE (contains (entries, p07));
   ASSERT_TRUE (contains (entries, p08));
-
-  release_vec_

[gcc r16-1594] Fix range wrap check and enhance verify_range.

2025-06-20 Thread Andrew Macleod via Gcc-cvs

https://gcc.gnu.org/g:b03e0d69b37f6ea7aef220652635031a89f56a11

commit r16-1594-gb03e0d69b37f6ea7aef220652635031a89f56a11
Author: Andrew MacLeod 
Date:   Fri Jun 20 08:50:39 2025 -0400

Fix range wrap check and enhance verify_range.

When snapping range bounds to satisfy bitmask constraints, end bound
overflow and underflow checks were not working properly.
Also adjust some comments, and enhance verify_range to make sure range pairs
are sorted properly.
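
As a rough illustration of the two checks described above, here is a minimal sketch on 8-bit unsigned values. It is not the value-range.cc code: `snap_bounds`, `pairs_sorted_p` and the `Range` alias are hypothetical names, and the real implementation works on wide_int with the sign of the range's type.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Range = std::pair<uint8_t, uint8_t>;  // inclusive [lb, ub]

// Hypothetical 8-bit analogue of snapping: the low LOW_BITS bits of every
// member must equal VALUE.  Returns the tightened range, or nothing when a
// bound wraps around or the range becomes empty.
std::optional<Range>
snap_bounds (uint8_t lb, uint8_t ub, unsigned low_bits, uint8_t value)
{
  if (low_bits == 0 || low_bits >= 8)
    return Range (lb, ub);  // nothing to snap in this toy model

  const unsigned match_mask = (1u << low_bits) - 1;
  const unsigned want = value & match_mask;

  // Round LB up to the next value whose low bits match.
  const uint8_t new_lb
    = static_cast<uint8_t> (lb + ((want - (lb & match_mask)) & match_mask));
  // Round UB down to the previous value whose low bits match.
  const uint8_t new_ub
    = static_cast<uint8_t> (ub - (((ub & match_mask) - want) & match_mask));

  // A bound that moved in the wrong direction wrapped around.
  const bool wrapped = new_lb < lb || new_ub > ub;
  if (wrapped || new_ub < new_lb)
    return std::nullopt;
  return Range (new_lb, new_ub);
}

// Each pair must be well formed, and each pair must start strictly above
// the end of the previous one.
bool
pairs_sorted_p (const std::vector<Range> &pairs)
{
  for (std::size_t i = 0; i < pairs.size (); ++i)
    {
      if (pairs[i].first > pairs[i].second)
        return false;
      if (i > 0 && !(pairs[i - 1].second < pairs[i].first))
        return false;
    }
  return true;
}
```

In this sketch, [4, 14] with the low bit required to be 1 snaps to [5, 13], while [255, 255] with the low bit required to be 0 is rejected because the lower bound wraps.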

PR tree-optimization/120701
gcc/
* value-range.cc (irange::verify_range): Verify range pairs are
sorted properly.
(irange::snap): Check for over/underflow properly.

gcc/testsuite/
* gcc.dg/pr120701.c: New.

Diff:
---
 gcc/testsuite/gcc.dg/pr120701.c | 40 
 gcc/value-range.cc  | 38 +-
 2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr120701.c b/gcc/testsuite/gcc.dg/pr120701.c
new file mode 100644
index 000000000000..09f7b6192eda
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr120701.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int a, b, c, e, f;
+int main() {
+  int d, g, i;
+j:
+  if (d >= 0)
+goto k;
+  if (g >= 0)
+goto l;
+k:
+  i = a + 3;
+m:
+  f = 652685095 + 818172564 * g;
+  if (-1101344938 * f - 1654872807 * d >= 0)
+goto n;
+  goto l;
+o:
+  if (i) {
+c = -b;
+if (-c >= 0)
+  goto l;
+g = b;
+b = i + 5;
+if (b * c)
+  goto n;
+goto o;
+  }
+  if (e)
+goto m;
+  goto j;
+n:
+  d = 978208086 * g - 1963072513;
+  if (d + i)
+return 0;
+  goto k;
+l:
+  goto o;
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 0f0770ad7051..ce13acc312d2 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1552,6 +1552,11 @@ irange::verify_range ()
   gcc_checking_assert (ub.get_precision () == prec);
   int c = wi::cmp (lb, ub, TYPE_SIGN (m_type));
   gcc_checking_assert (c == 0 || c == -1);
+  // Previous UB should be lower than LB
+  if (i > 0)
+   gcc_checking_assert (wi::lt_p (upper_bound (i - 1),
+  lb,
+  TYPE_SIGN (m_type)));
 }
   m_bitmask.verify_mask ();
 }
@@ -1628,7 +1633,7 @@ irange::contains_p (const wide_int &cst) const
   if (undefined_p ())
 return false;
 
-  // Check is the known bits in bitmask exclude CST.
+  // Check if the known bits in bitmask exclude CST.
   if (!m_bitmask.member_p (cst))
 return false;
 
@@ -2269,7 +2274,7 @@ irange::invert ()
 
 // This routine will take the bounds [LB, UB], and apply the bitmask to those
 // values such that both bounds satisfy the bitmask.  TRUE is returned
-// if either bound changes, and they are retuirned as [NEW_LB, NEW_UB].
+// if either bound changes, and they are returned as [NEW_LB, NEW_UB].
 // if NEW_UB < NEW_LB, then the entire bound is to be removed as none of
 // the values are valid.
 //   ie,   [4, 14] MASK 0xFFFE  VALUE 0x1
@@ -2285,30 +2290,29 @@ irange::snap (const wide_int &lb, const wide_int &ub,
   uint z = wi::ctz (m_bitmask.mask ());
   if (z == 0)
 return false;
-  const wide_int &wild_mask = m_bitmask.mask ();
 
   const wide_int step = (wi::one (TYPE_PRECISION (type ())) << z);
   const wide_int match_mask = step - 1;
   const wide_int value = m_bitmask.value () & match_mask;
 
-  wide_int rem_lb = lb & match_mask;
-
-  wi::overflow_type ov_sub;
-  wide_int diff = wi::sub(value, rem_lb, UNSIGNED, &ov_sub);
-  wide_int offset = diff & match_mask;
+  bool ovf = false;
 
-  wi::overflow_type ov1;
-  new_lb = wi::add (lb, offset, UNSIGNED, &ov1);
+  wide_int rem_lb = lb & match_mask;
+  wide_int offset = (value - rem_lb) & match_mask;
+  new_lb = lb + offset;
+  // Check for overflows at +INF
+  if (wi::lt_p (new_lb, lb, TYPE_SIGN (type ())))
+ovf = true;
 
   wide_int rem_ub = ub & match_mask;
   wide_int offset_ub = (rem_ub - value) & match_mask;
-
-  wi::overflow_type ov2;
-  new_ub = wi::sub (ub, offset_ub, UNSIGNED, &ov2);
+  new_ub = ub - offset_ub;
+  // Check for underflows at -INF
+  if (wi::gt_p (new_ub, ub, TYPE_SIGN (type ())))
+ovf = true;
 
   // Overflow or inverted range = invalid
-  if (ov1 != wi::OVF_NONE || ov2 != wi::OVF_NONE
-  || wi::lt_p (new_ub, new_lb, TYPE_SIGN (type ())))
+  if (ovf || wi::lt_p (new_ub, new_lb, TYPE_SIGN (type ())))
 {
   new_lb = wi::one (lb.get_precision ());
   new_ub = wi::zero (ub.get_precision ());
@@ -2454,7 +2458,7 @@ irange::set_range_from_bitmask ()
 
   // Make sure we call intersect, so do it first.
   changed = intersect (mask_range) | changed;
-  // Npw make sure each subrange endpoint matches the bitmask.
+  // Now make sure each subrange endpoint matches the bitmask.
   changed |= snap_subranges ();
 
   return changed;
@@ -2548,7 +2552,7 @@ irange::inter

[gcc(refs/users/aoliva/heads/testme)] [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:95913b192448b8f17208186a87e58f5282720eb2

commit 95913b192448b8f17208186a87e58f5282720eb2
Author: Alexandre Oliva 
Date:   Thu Jun 19 11:05:36 2025 -0300

[lra] simplify disabling of fp2sp elimination [PR120424]

Whether with or without the lra fp2sp elimination accumulated
improvements, building a native arm-linux-gnueabihf toolchain with
{BOOT_CFLAGS,TFLAGS}='-O2 -g -fnon-call-exceptions
-fstack-clash-protection' doesn't get very far: crtbegin.o gets
miscompiled in __do_global_dtors_aux, as spilled pseudos get assigned
stack slots that get incorrectly adjusted after the fp2sp elimination
is disabled.

AFAICT eliminations are reversible before the final round, and ISTM
that deferring spilling of registers in disabled eliminations to
update_reg_eliminate would avoid the incorrect adjustments I saw in
spilling within or after lra_update_fp2sp_elimination: the logic to
deal with them was already there, we just have to set the stage for it
to do its job.

Diff:
---
 gcc/lra-eliminations.cc | 38 +++---
 gcc/lra-int.h   |  2 +-
 gcc/lra-spills.cc   | 13 +++--
 3 files changed, 23 insertions(+), 30 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 2719a11d9dd6..2278661097a3 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1405,30 +1405,18 @@ process_insn_for_elimination (rtx_insn *insn, bool final_p, bool first_p)
permitted frame pointer elimination and now target reports that we can not
do this elimination anymore.  Record spilled pseudos in SPILLED_PSEUDOS
unless it is null, and return the recorded pseudos number.  */
-int
-lra_update_fp2sp_elimination (int *spilled_pseudos)
+void
+lra_update_fp2sp_elimination (void)
 {
-  int n;
-  HARD_REG_SET set;
   class lra_elim_table *ep;
 
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
-return 0;
+return;
   gcc_assert (!elimination_fp2sp_occured_p);
-  ep = elimination_map[FRAME_POINTER_REGNUM];
-  if (ep->to == STACK_POINTER_REGNUM)
-{
-  elimination_map[FRAME_POINTER_REGNUM] = NULL;
-  setup_can_eliminate (ep, false);
-}
-  else
-ep = NULL;
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
 " Frame pointer can not be eliminated anymore\n");
   frame_pointer_needed = true;
-  CLEAR_HARD_REG_SET (set);
-  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   /* If !lra_reg_spill_p, we likely have incomplete range information
  for pseudos assigned to the frame pointer that will have to be
  spilled, and so we may end up incorrectly sharing them unless we
@@ -1437,12 +1425,24 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
 /* If lives ranges changed, update the aggregate live ranges in
slots as well before spilling any further pseudos.  */
 lra_recompute_slots_live_ranges ();
-  n = spill_pseudos (set, spilled_pseudos);
-  if (!ep)
+  /* Conceivably, we wouldn't need to disable the fp2sp elimination
+ unless the target says so, but we have to vacate the frame
+ pointer register to make room for the frame pointer proper, and
+ disabling the elimination will spill everything assigned to it
+ during update_reg_eliminate.  */
+  ep = elimination_map[FRAME_POINTER_REGNUM];
+  if (ep->to == STACK_POINTER_REGNUM)
+{
+  /* Avoid using this known-unused elimination when removing
+spilled pseudos.  */
+  elimination_map[FRAME_POINTER_REGNUM] = NULL;
+  setup_can_eliminate (ep, false);
+}
+  else
 for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
   if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
-   setup_can_eliminate (ep, false);
-  return n;
+   setup_can_eliminate (ep, false);
+  return;
 }
 
 /* Return true if we have a pseudo assigned to hard frame pointer.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 0cf7266ce646..05d28af7c9e0 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -433,7 +433,7 @@ extern int lra_get_elimination_hard_regno (int);
 extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode,
 bool, bool, poly_int64, bool);
 extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64);
-extern int lra_update_fp2sp_elimination (int *spilled_pseudos);
+extern void lra_update_fp2sp_elimination (void);
 extern bool lra_fp_pseudo_p (void);
 extern void lra_eliminate (bool, bool);
 
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 7603d0dcf163..494ef3c96c8e 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -658,7 +658,7 @@ lra_need_for_spills_p (void)
 void
 lra_spill (void)
 {
-  int i, n, n2, curr_regno;
+  int i, n, curr_regno;
   int *pseudo_regnos;
 
   regs_num = max_reg_num ();
@@ -685,15 +685,8 @@ lra_spill (void)
   for (i = 0; i < n; i++)
 if (pseudo_slots[pseudo_r

[gcc(refs/users/aoliva/heads/testme)] [lra] recompute ranges upon disabling fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:ed2c07a2eb0f94f760da263fbb0c3ef410e13a51

commit ed2c07a2eb0f94f760da263fbb0c3ef410e13a51
Author: Alexandre Oliva 
Date:   Wed Jun 18 04:11:18 2025 -0300

[lra] recompute ranges upon disabling fp2sp elimination [PR120424]

If the frame size grows to nonzero, arm_frame_pointer_required may
flip to true under -fstack-clash-protection -fnon-call-exceptions, and
that may disable the fp2sp elimination part-way through lra.

If pseudos had got assigned to the frame pointer register before that,
they have to be spilled, and that requires complete live range
information.  If !lra_reg_spill_p, lra_spill won't have live ranges
for such pseudos, and they could end up sharing spill slots with other
pseudos whose live ranges actually overlap.

This affects at least Ada.Strings.Wide_Superbounded.Super_Insert and
.Super_Replace_Slice in libgnat/a-stwisu.adb, when compiled with -O2
-fstack-clash-protection -march=armv7 (implied Thumb2), causing
acats-4's cdd2a01 to fail.

Recomputing live ranges including registers may renumber and compress
points, so we have to recompute the aggregated live ranges for
already-assigned spill slots as well.

As a safety net, reject empty live ranges when computing slot sharing.
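
The slot-sharing hazard can be sketched with a toy model. The names below (`LiveRange`, `overlap_p`, `can_share_slot_p`) are hypothetical and only illustrate the idea; the lra-spills.cc code uses its own live range lists.

```cpp
#include <vector>

// Inclusive range of program points during which a pseudo is live.
struct LiveRange { int start; int finish; };

// True if any range in A intersects any range in B.
static bool
overlap_p (const std::vector<LiveRange> &a, const std::vector<LiveRange> &b)
{
  for (const LiveRange &x : a)
    for (const LiveRange &y : b)
      if (x.start <= y.finish && y.start <= x.finish)
	return true;
  return false;
}

// A pseudo may share a spill slot only if its ranges are known and do not
// overlap those already assigned to the slot.  An empty list is treated as
// "unknown", not as "never live", so it is rejected.
bool
can_share_slot_p (const std::vector<LiveRange> &slot,
		  const std::vector<LiveRange> &pseudo)
{
  if (pseudo.empty ())
    return false;
  return !overlap_p (slot, pseudo);
}
```

Without the empty-list check, a pseudo whose ranges were never computed would appear compatible with every slot, which is the incorrect sharing described above.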


for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (lra_update_fp2sp_elimination):
Compute complete live ranges and recompute slots' live ranges
if needed.
* lra-lives.cc (lra_reset_live_range_list): New.
(lra_complete_live_ranges): New.
* lra-spills.cc (assign_spill_hard_regs): Reject empty live
ranges.
(add_pseudo_to_slot): Likewise.
(lra_recompute_slots_live_ranges): New.
* lra-int.h (lra_reset_live_range_list): Declare.
(lra_complete_live_ranges): Declare.
(lra_recompute_slots_live_ranges): Declare.

Diff:
---
 gcc/lra-eliminations.cc |  8 
 gcc/lra-int.h   |  3 +++
 gcc/lra-lives.cc| 22 ++
 gcc/lra-spills.cc   | 37 +
 4 files changed, 70 insertions(+)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index ff05501dbbbd..6663d1c37e8b 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1429,6 +1429,14 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
+  /* If !lra_reg_spill_p, we likely have incomplete range information
+ for pseudos assigned to the frame pointer that will have to be
+ spilled, and so we may end up incorrectly sharing them unless we
+ get live range information for them.  */
+  if (lra_complete_live_ranges ())
+/* If lives ranges changed, update the aggregate live ranges in
+   slots as well before spilling any further pseudos.  */
+lra_recompute_slots_live_ranges ();
   n = spill_pseudos (set, spilled_pseudos);
   if (!ep)
 for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index ad42f48cc822..0cf7266ce646 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -381,7 +381,9 @@ extern int *lra_point_freq;
 extern int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
 
 extern int lra_live_range_iter;
+extern void lra_reset_live_range_list (lra_live_range_t &);
 extern void lra_create_live_ranges (bool, bool);
+extern bool lra_complete_live_ranges (void);
 extern lra_live_range_t lra_copy_live_range_list (lra_live_range_t);
 extern lra_live_range_t lra_merge_live_ranges (lra_live_range_t,
   lra_live_range_t);
@@ -417,6 +419,7 @@ extern bool lra_need_for_scratch_reg_p (void);
 extern bool lra_need_for_spills_p (void);
 extern void lra_spill (void);
 extern void lra_final_code_change (void);
+extern void lra_recompute_slots_live_ranges (void);
 
 /* lra-remat.cc:  */
 
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index ffce162907e9..3b1d7a97caf9 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -113,6 +113,15 @@ free_live_range_list (lra_live_range_t lr)
 }
 }
 
+/* Reset and release live range list LR.  */
+void
+lra_reset_live_range_list (lra_live_range_t &lr)
+{
+  lra_live_range_t first = lr;
+  lr = NULL;
+  free_live_range_list (first);
+}
+
 /* Create and return pseudo live range with given attributes.  */
 static lra_live_range_t
 create_live_range (int regno, int start, int finish, lra_live_range_t next)
@@ -1524,6 +1533,19 @@ lra_create_live_ranges (bool all_p, bool dead_insn_p)
   lra_assert (! res);
 }
 
+/* Run lra_create_live_ranges if !complete_info_p.  Return FALSE iff
+   live ranges are known to have remained unchanged.  */
+
+bool
+lra_complete_live_ranges (void)
+{
+  if (complete_info_p)
+return false

[gcc(refs/users/aoliva/heads/testme)] [lra] recompute ranges upon disabling fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:6eb0e6bb15dd02c7f0c396c71059731924496ae6

commit 6eb0e6bb15dd02c7f0c396c71059731924496ae6
Author: Alexandre Oliva 
Date:   Wed Jun 18 04:11:18 2025 -0300

[lra] recompute ranges upon disabling fp2sp elimination [PR120424]

If the frame size grows to nonzero, arm_frame_pointer_required may
flip to true under -fstack-clash-protection -fnon-call-exceptions, and
that may disable the fp2sp elimination part-way through lra.

If pseudos had got assigned to the frame pointer register before that,
they have to be spilled, and that requires complete live range
information.  If !lra_reg_spill_p, lra_spill won't have live ranges
for such pseudos, and they could end up sharing spill slots with other
pseudos whose live ranges actually overlap.

This affects at least Ada.Strings.Wide_Superbounded.Super_Insert and
.Super_Replace_Slice in libgnat/a-stwisu.adb, when compiled with -O2
-fstack-clash-protection -march=armv7 (implied Thumb2), causing
acats-4's cdd2a01 to fail.

Recomputing live ranges including registers may renumber and compress
points, so we have to recompute the aggregated live ranges for
already-assigned spill slots as well.

As a safety net, reject empty live ranges when computing slot sharing.


for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (lra_update_fp2sp_elimination):
Compute complete live ranges and recompute slots' live ranges
if needed.
* lra-lives.cc (lra_reset_live_range_list): New.
(lra_complete_live_ranges): New.
* lra-spills.cc (assign_spill_hard_regs): Reject empty live
ranges.
(add_pseudo_to_slot): Likewise.
(lra_recompute_slots_live_ranges): New.
* lra-int.h (lra_reset_live_range_list): Declare.
(lra_complete_live_ranges): Declare.
(lra_recompute_slots_live_ranges): Declare.

Diff:
---
 gcc/lra-eliminations.cc |  8 
 gcc/lra-int.h   |  3 +++
 gcc/lra-lives.cc| 22 ++
 gcc/lra-spills.cc   | 37 +
 4 files changed, 70 insertions(+)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 6c8c91086f32..2719a11d9dd6 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1429,6 +1429,14 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
+  /* If !lra_reg_spill_p, we likely have incomplete range information
+ for pseudos assigned to the frame pointer that will have to be
+ spilled, and so we may end up incorrectly sharing them unless we
+ get live range information for them.  */
+  if (lra_complete_live_ranges ())
+/* If lives ranges changed, update the aggregate live ranges in
+   slots as well before spilling any further pseudos.  */
+lra_recompute_slots_live_ranges ();
   n = spill_pseudos (set, spilled_pseudos);
   if (!ep)
 for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index ad42f48cc822..0cf7266ce646 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -381,7 +381,9 @@ extern int *lra_point_freq;
 extern int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
 
 extern int lra_live_range_iter;
+extern void lra_reset_live_range_list (lra_live_range_t &);
 extern void lra_create_live_ranges (bool, bool);
+extern bool lra_complete_live_ranges (void);
 extern lra_live_range_t lra_copy_live_range_list (lra_live_range_t);
 extern lra_live_range_t lra_merge_live_ranges (lra_live_range_t,
   lra_live_range_t);
@@ -417,6 +419,7 @@ extern bool lra_need_for_scratch_reg_p (void);
 extern bool lra_need_for_spills_p (void);
 extern void lra_spill (void);
 extern void lra_final_code_change (void);
+extern void lra_recompute_slots_live_ranges (void);
 
 /* lra-remat.cc:  */
 
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index ffce162907e9..3b1d7a97caf9 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -113,6 +113,15 @@ free_live_range_list (lra_live_range_t lr)
 }
 }
 
+/* Reset and release live range list LR.  */
+void
+lra_reset_live_range_list (lra_live_range_t &lr)
+{
+  lra_live_range_t first = lr;
+  lr = NULL;
+  free_live_range_list (first);
+}
+
 /* Create and return pseudo live range with given attributes.  */
 static lra_live_range_t
 create_live_range (int regno, int start, int finish, lra_live_range_t next)
@@ -1524,6 +1533,19 @@ lra_create_live_ranges (bool all_p, bool dead_insn_p)
   lra_assert (! res);
 }
 
+/* Run lra_create_live_ranges if !complete_info_p.  Return FALSE iff
+   live ranges are known to have remained unchanged.  */
+
+bool
+lra_complete_live_ranges (void)
+{
+  if (complete_info_p)
+return false

[gcc/aoliva/heads/testme] (3 commits) [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

The branch 'aoliva/heads/testme' was updated to point to:

 23c8aa5860bd... [lra] simplify disabling of fp2sp elimination [PR120424]

It previously pointed to:

 c6ce3a5fe59c... [lra] simplify disabling of fp2sp elimination [PR120424]

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  c6ce3a5... [lra] simplify disabling of fp2sp elimination [PR120424]
  b4101b1... [genoutput] mark scratch outputs as eliminable [PR120424]
  79e2fb7... [lra] recompute ranges upon disabling fp2sp elimination [PR


Summary of changes (added commits):
---

  23c8aa5... [lra] simplify disabling of fp2sp elimination [PR120424]
  6eb0e6b... [lra] recompute ranges upon disabling fp2sp elimination [PR
  329376c... [genoutput] mark scratch outputs as eliminable [PR120424]

[gcc(refs/users/aoliva/heads/testme)] [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:23c8aa5860bd6eacf759ad9ad6431d28a6724458

commit 23c8aa5860bd6eacf759ad9ad6431d28a6724458
Author: Alexandre Oliva 
Date:   Thu Jun 19 11:05:36 2025 -0300

[lra] simplify disabling of fp2sp elimination [PR120424]

Whether with or without the lra fp2sp elimination accumulated
improvements, building a native arm-linux-gnueabihf toolchain with
{BOOT_CFLAGS,TFLAGS}='-O2 -g -fnon-call-exceptions
-fstack-clash-protection' doesn't get very far: crtbegin.o gets
miscompiled in __do_global_dtors_aux, as spilled pseudos get assigned
stack slots that get incorrectly adjusted after the fp2sp elimination
is disabled.

AFAICT eliminations are reversible before the final round, and ISTM
that deferring spilling of registers in disabled eliminations to
update_reg_eliminate would avoid the incorrect adjustments I saw in
spilling within or after lra_update_fp2sp_elimination: the logic to
deal with them was already there, we just have to set the stage for it
to do its job, and only be concerned with disabled fp2sp eliminations
during the final round, during which no further disabling of
eliminations is expected.

Diff:
---
 gcc/lra-eliminations.cc | 68 -
 gcc/lra-int.h   |  2 +-
 gcc/lra-spills.cc   | 13 +++---
 3 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 2719a11d9dd6..227064e2d84a 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -309,8 +309,12 @@ move_plus_up (rtx x)
   return x;
 }
 
-/* Flag that we already did frame pointer to stack pointer elimination.  */
-static bool elimination_fp2sp_occured_p = false;
+/* elimination_fp2sp_final is set as we start the final round of
+   elimination.  elimination_fp2sp_used is copied from
+   elimination_fp2sp_final whenever any frame pointer to stack pointer
+   is encountered.  */
+static bool elimination_fp2sp_final = false;
+static bool elimination_fp2sp_used = false;
 
 /* Scan X and replace any eliminable registers (such as fp) with a
replacement (such as sp) if SUBST_P, plus an offset.  The offset is
@@ -318,7 +322,7 @@ static bool elimination_fp2sp_occured_p = false;
substitution if UPDATE_P, or the full offset if FULL_P, or
otherwise zero.  If FULL_P, we also use the SP offsets for
elimination to SP.  If UPDATE_P, use UPDATE_SP_OFFSET for updating
-   offsets of register elimnable to SP.  If UPDATE_SP_OFFSET is
+   offsets of register eliminable to SP.  If UPDATE_SP_OFFSET is
non-zero, don't use difference of the offset and the previous
offset.
 
@@ -370,7 +374,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
-   elimination_fp2sp_occured_p = true;
+   elimination_fp2sp_used = elimination_fp2sp_final;
 
  if (maybe_ne (update_sp_offset, 0))
{
@@ -402,8 +406,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
  poly_int64 offset, curr_offset;
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
- if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
-   elimination_fp2sp_occured_p = true;
+ if (ep->to_rtx == stack_pointer_rtx
+ && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_used = elimination_fp2sp_final;
 
  if (! update_p && ! full_p)
return simplify_gen_binary (PLUS, Pmode, to, XEXP (x, 1));
@@ -465,8 +470,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
- if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
-   elimination_fp2sp_occured_p = true;
+ if (ep->to_rtx == stack_pointer_rtx
+ && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_used = elimination_fp2sp_final;
 
  if (maybe_ne (update_sp_offset, 0))
{
@@ -1199,7 +1205,8 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
 that actual elimination has not been done yet.   */
  gcc_assert (ep->to_rtx != stack_pointer_rtx
  || (ep->from == FRAME_POINTER_REGNUM
- && !elimination_fp2sp_occured_p)
+ && !(elimination_fp2sp_final
+  && elimination_fp2sp_used))
  || (ep->from < FIRST_PSEUDO_REGISTER
  && fixed_regs [ep->from]));
 
@@ -1405,30 +1412,19 @@ process_insn_for_elimination (rtx_insn *insn, bool final_p, bool first_p)
permitted frame pointer elimination and now target reports that we can not
do this elimination anymore.  Reco

[gcc(refs/users/aoliva/heads/testme)] [genoutput] mark scratch outputs as eliminable [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:329376ceeb6fd78c3da2392dca11e0e66737d5d8

commit 329376ceeb6fd78c3da2392dca11e0e66737d5d8
Author: Alexandre Oliva 
Date:   Wed Jun 18 04:13:19 2025 -0300

[genoutput] mark scratch outputs as eliminable [PR120424]

acats' fdd2a00.read is miscompiled on arm-linux-gnu with -O2
-fstack-clash-protection -march=armv7-a -marm: a clobbered scratch
register in a *iorsi3_compare0_scratch pattern gets initially assigned
to the frame pointer register, but at some point during lra the frame
size grows to nonzero, arm_frame_pointer_required flips to true, and
the fp2sp elimination has to be disabled, so the scratch register gets
spilled to a stack slot.

It needs to get the sfp elimination at that point, because later
rounds of elimination will assume the previous round's offset has
already been applied.  But since scratch matches are not regarded as
eliminable by genoutput, we don't attempt elimination in the clobbered
stack slot MEM rtx.

Later on, lra issues a reload for that slot, using a new pseudo
allocated to a hardware register, that gets stored in the stack slot
after the original insn.  Elimination in that reload store insn
eventually updates the elimination offset, but it's an incremental
update, assuming that the offset so far has already been applied.

Without applying the initial offset, the store ends up overlapping
with the function's register save area, corrupting a caller's
call-saved register.
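
The arithmetic behind the incremental-update problem can be shown with a toy model. `SlotAddress`, `apply_round` and `final_disp` are hypothetical names and the offsets are made-up numbers; this is not LRA code.

```cpp
// Displacement of a stack slot from its base register.
struct SlotAddress { long disp; };

// Apply one elimination round incrementally: only the change since the
// previous round's offset is added.
void
apply_round (SlotAddress &addr, long previous_offset, long current_offset)
{
  addr.disp += current_offset - previous_offset;
}

// Displacement after two rounds.  If the first round is skipped for this
// address, the second round still assumes FIRST_OFFSET was applied.
long
final_disp (long initial_disp, bool first_round_applied,
	    long first_offset, long second_offset)
{
  SlotAddress addr { initial_disp };
  if (first_round_applied)
    apply_round (addr, 0, first_offset);
  apply_round (addr, first_offset, second_offset);
  return addr.disp;
}
```

With an initial displacement of -8 and offsets 16 then 24, applying both rounds gives 16, while skipping the first round gives 0: the address is off by exactly the first round's offset, which is how a store can land in the wrong part of the frame.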

AFAICT the old reload's elimination wouldn't be harmed by allowing
elimination in scratch operands, so I'm enabling eliminable for them
regardless.  Should it be found to make a difference, we could
presumably set a different bit in eliminable to enable reload and lra
to tell them apart and behave accordingly.


for  gcc/ChangeLog

PR rtl-optimization/120424
* genoutput.cc (scan_operands): Make MATCH_SCRATCHes eliminable.

Diff:
---
 gcc/genoutput.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index dd4e7b80c2a9..25d0b8b86467 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -478,7 +478,7 @@ scan_operands (class data *d, rtx part, int this_address_p,
   d->operand[opno].n_alternatives
= n_occurrences (',', d->operand[opno].constraint) + 1;
   d->operand[opno].address_p = 0;
-  d->operand[opno].eliminable = 0;
+  d->operand[opno].eliminable = 1;
   return;
 
 case MATCH_OPERATOR:

[gcc r16-1597] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-20 Thread Pan Li via Gcc-cvs

https://gcc.gnu.org/g:52582b40a9bf839ae3771de1557ce6691eb8eedd

commit r16-1597-g52582b40a9bf839ae3771de1557ce6691eb8eedd
Author: Pan Li 
Date:   Thu Jun 19 18:58:17 2025 +0800

RISC-V: Fix ICE for expand_select_vldi [PR120652]

There will be an ICE in the expand pass; the backtrace is similar to the one below.

during RTL pass: expand
red.c: In function 'main':
red.c:20:5: internal compiler error: in require, at machmode.h:323
   20 | int main() {
  | ^~~~
0x2e0b1d6 internal_error(char const*, ...)
../../../gcc/gcc/diagnostic-global-context.cc:517
0xd0d3ed fancy_abort(char const*, int, char const*)
../../../gcc/gcc/diagnostic.cc:1803
0xc3da74 opt_mode::require() const
../../../gcc/gcc/machmode.h:323
0xc3de2f opt_mode::require() const
../../../gcc/gcc/poly-int.h:1383
0xc3de2f riscv_vector::expand_select_vl(rtx_def**)
../../../gcc/gcc/config/riscv/riscv-v.cc:4218
0x21c7d22 gen_select_vldi(rtx_def*, rtx_def*, rtx_def*)
../../../gcc/gcc/config/riscv/autovec.md:1344
0x134db6c maybe_expand_insn(insn_code, unsigned int, expand_operand*)
../../../gcc/gcc/optabs.cc:8257
0x134db6c expand_insn(insn_code, unsigned int, expand_operand*)
../../../gcc/gcc/optabs.cc:8288
0x11b21d3 expand_fn_using_insn
../../../gcc/gcc/internal-fn.cc:318
0xef32cf expand_call_stmt
../../../gcc/gcc/cfgexpand.cc:3097
0xef32cf expand_gimple_stmt_1
../../../gcc/gcc/cfgexpand.cc:4264
0xef32cf expand_gimple_stmt
../../../gcc/gcc/cfgexpand.cc:4411
0xef95b6 expand_gimple_basic_block
../../../gcc/gcc/cfgexpand.cc:6472
0xefb66f execute
../../../gcc/gcc/cfgexpand.cc:7223

The select_vl op_1 and op_2 may be the same const_int, like (const_int 32).
maybe_legitimize_operands will then:

1. First move the const op_1 to a reg.
2. Reuse the reg of op_1 for op_2, as op_1 and op_2 are equal.

That breaks the assumption that op_2 of select_vl is an immediate,
or something like CONST_POLY_INT.
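
The effect of the operand predicate can be sketched with a toy legitimizer. It only models the behaviour described above; `Operand`, `Predicate`, `legitimize` and the three predicate functions are hypothetical and are not the optabs.cc interfaces.

```cpp
#include <cstddef>
#include <vector>

enum class Kind { Immediate, Register };
struct Operand { Kind kind; long value; };
using Predicate = bool (*) (const Operand &);

// Stand-ins for an empty predicate, "immediate_operand" and a
// register-only predicate.
bool any_operand (const Operand &) { return true; }
bool immediate_only (const Operand &op) { return op.kind == Kind::Immediate; }
bool register_only (const Operand &op) { return op.kind == Kind::Register; }

// Toy legitimizer.  If an earlier operand had the same original value and
// its legitimized form satisfies this operand's predicate, reuse it.
// Otherwise force the operand into a register when its predicate rejects it.
// Assumes PREDS has at least as many entries as ORIG.
std::vector<Operand>
legitimize (const std::vector<Operand> &orig,
	    const std::vector<Predicate> &preds)
{
  std::vector<Operand> ops = orig;
  for (std::size_t i = 0; i < ops.size (); ++i)
    {
      bool reused = false;
      for (std::size_t j = 0; j < i && !reused; ++j)
	if (orig[j].kind == orig[i].kind && orig[j].value == orig[i].value
	    && preds[i] (ops[j]))
	  {
	    ops[i] = ops[j];
	    reused = true;
	  }
      if (!reused && !preds[i] (ops[i]))
	ops[i].kind = Kind::Register;
    }
  return ops;
}
```

With two equal immediates and predicates {register_only, any_operand}, the second operand ends up as a register because the empty predicate accepts the reused one. With {register_only, immediate_only} it stays an immediate, which is the effect the patch relies on.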

The following test suites passed for this patch series.
* The rv64gcv full regression test.

PR target/120652

gcc/ChangeLog:

* config/riscv/autovec.md: Add immediate_operand for
select_vl operand 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr120652-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-3.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652.h: New test.

Signed-off-by: Pan Li 

Diff:
---
 gcc/config/riscv/autovec.md|  2 +-
 .../gcc.target/riscv/rvv/autovec/pr120652-1.c  |  5 
 .../gcc.target/riscv/rvv/autovec/pr120652-2.c  |  5 
 .../gcc.target/riscv/rvv/autovec/pr120652-3.c  |  5 
 .../gcc.target/riscv/rvv/autovec/pr120652.h| 31 ++
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c678eefc7003..94a61bdc5cf5 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1338,7 +1338,7 @@
 (define_expand "select_vl"
   [(match_operand:P 0 "register_operand")
(match_operand:P 1 "vector_length_operand")
-   (match_operand:P 2 "")]
+   (match_operand:P 2 "immediate_operand")]
   "TARGET_VECTOR"
 {
   riscv_vector::expand_select_vl (operands);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-1.c
new file mode 100644
index 000000000000..260e4c08f16f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-1.c
@@ -0,0 +1,5 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvl256b -mabi=lp64d -O3" } */
+
+#include "pr120652.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-2.c
new file mode 100644
index 000000000000..6f8594267662
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-2.c
@@ -0,0 +1,5 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvl512b -mabi=lp64d -O3" } */
+
+#include "pr120652.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-3.c
new file mode 100644
index 000000000000..9852b5de86a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-3.c
@@ -0,0 +1,5 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvl1024b -mabi=lp64d -O3" } */
+
+#include "pr120652.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652.h 
b/gcc/te

[gcc/aoliva/heads/testme] [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

The branch 'aoliva/heads/testme' was updated to point to:

 95913b192448... [lra] simplify disabling of fp2sp elimination [PR120424]

It previously pointed to:

 23c8aa5860bd... [lra] simplify disabling of fp2sp elimination [PR120424]

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  23c8aa5... [lra] simplify disabling of fp2sp elimination [PR120424]


Summary of changes (added commits):
---

  95913b1... [lra] simplify disabling of fp2sp elimination [PR120424]

[gcc r16-1598] Implement afdo inliner

2025-06-20 Thread Jan Hubicka via Gcc-cvs

https://gcc.gnu.org/g:8f40a8e8f8d1ebe931d52f914533036c2f950814

commit r16-1598-g8f40a8e8f8d1ebe931d52f914533036c2f950814
Author: Jan Hubicka 
Date:   Wed Jun 18 12:10:25 2025 +0200

Implement afdo inliner

This patch moves afdo inlining from the early inliner into a specialized one.
The reason is that the early inliner is by design non-recursive while the afdo
inliner needs to recurse.  In the past Google handled it by increasing
early inliner iterations, but it can be done easily and cheaply without
it by simply recursing into inlined functions.
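
The difference between a single non-recursive sweep and recursing into inlined
bodies can be sketched on a toy call graph.  This is only an illustration: the
Node and Edge types and inline_by_profile are hypothetical and do not
correspond to GCC's cgraph_node, cgraph_edge or inline_functions_by_afdo, and
recursing into freshly inlined callees is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy call graph, assumed acyclic.  These types are hypothetical and are not
// GCC's cgraph data structures.
struct Node;

struct Edge {
  Node *callee;
  bool inlined;         // call site is already inlined into its caller
  bool hot_in_profile;  // profile says it was inlined and hot in the train run
};

struct Node {
  std::string name;
  std::vector<Edge> callees;
};

// Inline every profile-hot call site reachable from NODE.  Unlike a single
// non-recursive sweep, this descends into bodies that are already inlined
// or that have just been inlined.  Returns the number of edges inlined.
int inline_by_profile(Node &node) {
  int count = 0;
  for (Edge &e : node.callees) {
    assert(e.callee != nullptr);
    if (!e.inlined) {
      if (!e.hot_in_profile)
        continue;  // cold call site: leave it alone, do not descend
      e.inlined = true;
      ++count;
    }
    count += inline_by_profile(*e.callee);
  }
  return count;
}
```

With a chain main -> a -> b -> c where only the first two call sites are hot,
one call on main inlines both of them and stops at the cold edge.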

I will also look into moving VPT to early inliner now.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* auto-profile.cc (get_inline_stack): Add fn parameter.
* ipa-inline.cc (want_early_inline_function_p): Do not care
about AFDO.
(inline_functions_by_afdo): New function.
(early_inliner): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/afdo-vpt-earlyinline.c: Update template.
* gcc.dg/tree-prof/indir-call-prof-2.c: Likewise.
* gcc.dg/tree-prof/afdo-inline.c: New test.

Diff:
---
 gcc/auto-profile.cc| 21 +++--
 gcc/ipa-inline.cc  | 90 +++---
 gcc/testsuite/gcc.dg/tree-prof/afdo-inline.c   | 27 +++
 .../gcc.dg/tree-prof/afdo-vpt-earlyinline.c|  6 +-
 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c |  4 +-
 5 files changed, 128 insertions(+), 20 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 3272cbec9b07..07580f8cc998 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -386,7 +386,8 @@ get_function_decl_from_block (tree block)
 /* Store inline stack for STMT in STACK.  */
 
 static void
-get_inline_stack (location_t locus, inline_stack *stack)
+get_inline_stack (location_t locus, inline_stack *stack,
+ tree fn = current_function_decl)
 {
   if (LOCATION_LOCUS (locus) == UNKNOWN_LOCATION)
 return;
@@ -408,9 +409,7 @@ get_inline_stack (location_t locus, inline_stack *stack)
   locus = tmp_locus;
 }
 }
-  stack->safe_push (
-  std::make_pair (current_function_decl,
-  get_combined_location (locus, current_function_decl)));
+  stack->safe_push (std::make_pair (fn, get_combined_location (locus, fn)));
 }
 
 /* Return STMT's combined location, which is a 32bit integer in which
@@ -822,7 +821,19 @@ autofdo_source_profile::get_callsite_total_count (
 {
   inline_stack stack;
   stack.safe_push (std::make_pair (edge->callee->decl, 0));
-  get_inline_stack (gimple_location (edge->call_stmt), &stack);
+
+  cgraph_edge *e = edge;
+  do
+{
+  get_inline_stack (gimple_location (e->call_stmt), &stack,
+   e->caller->decl);
+  /* If caller is inlined, continue building stack.  */
+  if (!e->caller->inlined_to)
+   e = NULL;
+  else
+   e = e->caller->callers;
+}
+  while (e);
 
   function_instance *s = get_function_instance_by_inline_stack (stack);
   if (s == NULL
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index 35e5496d8463..c4ea37820913 100644
--- a/gcc/ipa-inline.cc
+++ b/gcc/ipa-inline.cc
@@ -782,14 +782,6 @@ want_early_inline_function_p (struct cgraph_edge *e)
 
   if (DECL_DISREGARD_INLINE_LIMITS (callee->decl))
 ;
-  /* For AutoFDO, we need to make sure that before profile summary, all
- hot paths' IR look exactly the same as profiled binary. As a result,
- in einliner, we will disregard size limit and inline those callsites
- that are:
-   * inlined in the profiled binary, and
-   * the cloned callee has enough samples to be considered "hot".  */
-  else if (flag_auto_profile && afdo_callsite_hot_enough_for_early_inline (e))
-;
   else if (!DECL_DECLARED_INLINE_P (callee->decl)
   && !opt_for_fn (e->caller->decl, flag_inline_small_functions))
 {
@@ -3117,6 +3109,81 @@ early_inline_small_functions (struct cgraph_node *node)
   return inlined;
 }
 
+/* With auto-fdo inline all functions that was inlined in the train run
+   and inlining seems useful.  That is there are enough samples in the callee
+   function.
+
+   Unlike early inlining, we inline recursively.
+   TODO: We should also integrate VPT.  */
+
+static bool
+inline_functions_by_afdo (struct cgraph_node *node)
+{
+  if (!flag_auto_profile)
+return false;
+  struct cgraph_edge *e;
+  bool inlined = false;
+
+  for (e = node->callees; e; e = e->next_callee)
+{
+  struct cgraph_node *callee = e->callee->ultimate_alias_target ();
+
+  if (!e->inline_failed)
+   {
+ inlined |= inline_functions_by_afdo (e->callee);
+ continue;
+   }
+  if (!afdo_callsite_hot_enough_for_early_inline (e))
+   continue;
+
+  if (callee->definition
+ && !ipa_fn_summaries->get (callee))
+   compute_fn_summary (c

[gcc(refs/users/aoliva/heads/testme)] [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:c6ce3a5fe59c2668354cc4cf9a3e946a8215f57a

commit c6ce3a5fe59c2668354cc4cf9a3e946a8215f57a
Author: Alexandre Oliva 
Date:   Thu Jun 19 11:05:36 2025 -0300

[lra] simplify disabling of fp2sp elimination [PR120424]

Whether with or without the lra fp2sp elimination accumulated
improvements, building a native arm-linux-gnueabihf toolchain with
{BOOT_CFLAGS,TFLAGS}='-O2 -g -fnon-call-exceptions
-fstack-clash-protection' doesn't get very far: crtbegin.o gets
miscompiled in __do_global_dtors_aux, as spilled pseudos get assigned
stack slots that get incorrectly adjusted after the fp2sp elimination
is disabled.

AFAICT eliminations are reversible before the final round, and ISTM
that deferring spilling of registers in disabled eliminations to
update_reg_eliminate would avoid the incorrect adjustments I saw in
spilling within or after lra_update_fp2sp_elimination: the logic to
deal with them was already there, we just have to set the stage for it
to do its job, and only be concerned with disabled fp2sp eliminations
during the final round, during which no further disabling of
eliminations is expected.
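
The two-flag bookkeeping that the diff below introduces can be modelled in
isolation.  This is a minimal sketch, not the LRA code: the Fp2spTracker type
and its method names are hypothetical, and only the assignment
"used = final" and the check "!(final && used)" are taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the two static flags in lra-eliminations.cc.
struct Fp2spTracker {
  bool final_round = false;  // models elimination_fp2sp_final
  bool used = false;         // models elimination_fp2sp_used

  // Called as the final round of elimination starts.
  void start_final_round() { final_round = true; }

  // Called whenever a frame pointer to stack pointer elimination is seen.
  // Uses before the final round leave USED false, because they are still
  // considered reversible.
  void note_fp2sp_seen() { used = final_round; }

  // Mirrors the condition asserted in update_reg_eliminate.
  bool may_still_disable() const { return !(final_round && used); }
};
```

In this model a use seen in an earlier round does not block disabling the
elimination; only a use recorded during the final round does.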

Diff:
---
 gcc/lra-eliminations.cc | 68 -
 gcc/lra-int.h   |  2 +-
 gcc/lra-spills.cc   | 13 +++---
 3 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 09959bbe3ed9..56f07520d12f 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -309,8 +309,12 @@ move_plus_up (rtx x)
   return x;
 }
 
-/* Flag that we already did frame pointer to stack pointer elimination.  */
-static bool elimination_fp2sp_occured_p = false;
+/* elimination_fp2sp_final is set as we start the final round of
+   elimination.  elimination_fp2sp_used is copied from
+   elimination_fp2sp_final whenever any frame pointer to stack pointer
+   is encountered.  */
+static bool elimination_fp2sp_final = false;
+static bool elimination_fp2sp_used = false;
 
 /* Scan X and replace any eliminable registers (such as fp) with a
replacement (such as sp) if SUBST_P, plus an offset.  The offset is
@@ -318,7 +322,7 @@ static bool elimination_fp2sp_occured_p = false;
substitution if UPDATE_P, or the full offset if FULL_P, or
otherwise zero.  If FULL_P, we also use the SP offsets for
elimination to SP.  If UPDATE_P, use UPDATE_SP_OFFSET for updating
-   offsets of register elimnable to SP.  If UPDATE_SP_OFFSET is
+   offsets of register eliminable to SP.  If UPDATE_SP_OFFSET is
non-zero, don't use difference of the offset and the previous
offset.
 
@@ -370,7 +374,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
-   elimination_fp2sp_occured_p = true;
+   elimination_fp2sp_used = elimination_fp2sp_final;
 
  if (maybe_ne (update_sp_offset, 0))
{
@@ -402,8 +406,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
  poly_int64 offset, curr_offset;
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
- if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
-   elimination_fp2sp_occured_p = true;
+ if (ep->to_rtx == stack_pointer_rtx
+ && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_used = elimination_fp2sp_final;
 
  if (! update_p && ! full_p)
return simplify_gen_binary (PLUS, Pmode, to, XEXP (x, 1));
@@ -465,8 +470,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
- if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
-   elimination_fp2sp_occured_p = true;
+ if (ep->to_rtx == stack_pointer_rtx
+ && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_used = elimination_fp2sp_final;
 
  if (maybe_ne (update_sp_offset, 0))
{
@@ -1199,7 +1205,8 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
 that actual elimination has not been done yet.   */
  gcc_assert (ep->to_rtx != stack_pointer_rtx
  || (ep->from == FRAME_POINTER_REGNUM
- && !elimination_fp2sp_occured_p)
+ && !(elimination_fp2sp_final
+  && elimination_fp2sp_used))
  || (ep->from < FIRST_PSEUDO_REGISTER
  && fixed_regs [ep->from]));
 
@@ -1405,30 +1412,19 @@ process_insn_for_elimination (rtx_insn *insn, bool final_p, bool first_p)
permitted frame pointer elimination and now target reports that we can not
do this elimination anymore.  Reco

[gcc(refs/users/aoliva/heads/testme)] [genoutput] mark scratch outputs as eliminable [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:b4101b15255fe09fc3dccd36f9cb0b4e1978e94c

commit b4101b15255fe09fc3dccd36f9cb0b4e1978e94c
Author: Alexandre Oliva 
Date:   Wed Jun 18 04:13:19 2025 -0300

[genoutput] mark scratch outputs as eliminable [PR120424]

acats' fdd2a00.read is miscompiled on arm-linux-gnu with -O2
-fstack-clash-protection -march=armv7-a -marm: a clobbered scratch
register in a *iorsi3_compare0_scratch pattern gets initially assigned
to the frame pointer register, but at some point during lra the frame
size grows to nonzero, arm_frame_pointer_required flips to true, and
the fp2sp elimination has to be disabled, so the scratch register gets
spilled to a stack slot.

It needs to get the sfp elimination at that point, because later
rounds of elimination will assume the previous round's offset has
already been applied.  But since scratch matches are not regarded as
eliminable by genoutput, we don't attempt elimination in the clobbered
stack slot MEM rtx.

Later on, lra issues a reload for that slot, using a new pseudo
allocated to a hardware register, that gets stored in the stack slot
after the original insn.  Elimination in that reload store insn
eventually updates the elimination offset, but it's an incremental
update, assuming that the offset so far has already been applied.

Without applying the initial offset, the store ends up overlapping
with the function's register save area, corrupting a caller's
call-saved register.

AFAICT the old reload's elimination wouldn't be harmed by allowing
elimination in scratch operands, so I'm enabling eliminable for them
regardless.  Should it be found to make a difference, we could
presumably set a different bit in eliminable to enable reload and lra
to tell them apart and behave accordingly.
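
The failure mode described above, an incremental update applied to an address
that never received the initial offset, can be shown with plain arithmetic.
This is a toy model only: the function names and the offsets 16 and 24 are
made up for illustration and are not taken from the PR.

```cpp
#include <cassert>  // assert is used by the usage checks for this sketch

// First round: the whole elimination offset is added to the address.
long apply_full(long addr_offset, long total) {
  return addr_offset + total;
}

// Later rounds: only the change since the previous round is added, on the
// assumption that PREVIOUS_TOTAL is already part of ADDR_OFFSET.
long apply_incremental(long addr_offset, long total, long previous_total) {
  return addr_offset + (total - previous_total);
}
```

With a first-round offset of 16 and a second-round offset of 24, an address
that went through both rounds ends at 24, while one that skipped the first
round ends at 8 and so points 16 bytes away from the intended slot.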


for  gcc/ChangeLog

PR rtl-optimization/120424
* genoutput.cc (scan_operands): Make MATCH_SCRATCHes eliminable.

Diff:
---
 gcc/genoutput.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index dd4e7b80c2a9..25d0b8b86467 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -478,7 +478,7 @@ scan_operands (class data *d, rtx part, int this_address_p,
   d->operand[opno].n_alternatives
= n_occurrences (',', d->operand[opno].constraint) + 1;
   d->operand[opno].address_p = 0;
-  d->operand[opno].eliminable = 0;
+  d->operand[opno].eliminable = 1;
   return;
 
 case MATCH_OPERATOR:

[gcc(refs/users/aoliva/heads/testme)] [lra] inactivate disabled fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:1498558396ef74391911407e13b76ff3a09e60b0

commit 1498558396ef74391911407e13b76ff3a09e60b0
Author: Alexandre Oliva 
Date:   Fri Jun 6 02:03:31 2025 -0300

[lra] inactivate disabled fp2sp elimination [PR120424]

Even after we disable the fp2sp elimination when it is the active
elimination for the fp, spilling might use it before
update_reg_eliminate runs and inactivates it for good.  If it is used,
update_reg_eliminate will fail the check that fp2sp was not used.

Since we keep track of uses of this specific elimination, and
lra_update_fp2sp_elimination checks it before disabling it, we know it
hasn't been used, so we can inactivate it without any ill effects.

This fixes the pr118591-1.c avr-none regression exposed by the
PR120424 fix.


for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (lra_update_fp2sp_elimination):
Inactivate the unused fp2sp elimination right away.

Diff:
---
 gcc/lra-eliminations.cc | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index bb708b007a4e..6c8c91086f32 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1415,6 +1415,14 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
 return 0;
   gcc_assert (!elimination_fp2sp_occured_p);
+  ep = elimination_map[FRAME_POINTER_REGNUM];
+  if (ep->to == STACK_POINTER_REGNUM)
+{
+  elimination_map[FRAME_POINTER_REGNUM] = NULL;
+  setup_can_eliminate (ep, false);
+}
+  else
+ep = NULL;
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
 " Frame pointer can not be eliminated anymore\n");
@@ -1422,9 +1430,10 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   n = spill_pseudos (set, spilled_pseudos);
-  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
-if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
-  setup_can_eliminate (ep, false);
+  if (!ep)
+for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+  if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
+   setup_can_eliminate (ep, false);
   return n;
 }

[gcc(refs/users/aoliva/heads/testme)] [lra] recompute ranges upon disabling fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:79e2fb764a79b04290cea5d19207f00e2e7637bf

commit 79e2fb764a79b04290cea5d19207f00e2e7637bf
Author: Alexandre Oliva 
Date:   Wed Jun 18 04:11:18 2025 -0300

[lra] recompute ranges upon disabling fp2sp elimination [PR120424]

If the frame size grows to nonzero, arm_frame_pointer_required may
flip to true under -fstack-clash-protection -fnon-call-exceptions, and
that may disable the fp2sp elimination part-way through lra.

If pseudos had got assigned to the frame pointer register before that,
they have to be spilled, and that requires complete live range
information.  If !lra_reg_spill_p, lra_spill won't have live ranges
for such pseudos, and they could end up sharing spill slots with other
pseudos whose live ranges actually overlap.

This affects at least Ada.Strings.Wide_Superbounded.Super_Insert and
.Super_Replace_Slice in libgnat/a-stwisu.adb, when compiled with -O2
-fstack-clash-protection -march=armv7 (implied Thumb2), causing
acats-4's cdd2a01 to fail.

Recomputing live ranges including registers may renumber and compress
points, so we have to recompute the aggregated live ranges for
already-assigned spill slots as well.

As a safety net, reject empty live ranges when computing slot sharing.
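
The slot-sharing rule, including the safety net for empty live ranges, can be
sketched as follows.  This is an assumption-laden illustration: LiveRange,
RangeList and can_share_slot are hypothetical names, and LRA's real
lra_live_range_t is a linked list rather than a vector.

```cpp
#include <cassert>
#include <vector>

// A live range as an inclusive interval of program points (hypothetical type).
struct LiveRange {
  int start;
  int finish;
};
using RangeList = std::vector<LiveRange>;

// True if any interval in A intersects any interval in B.
bool ranges_intersect(const RangeList &a, const RangeList &b) {
  for (const LiveRange &x : a) {
    assert(x.start <= x.finish);
    for (const LiveRange &y : b)
      if (x.start <= y.finish && y.start <= x.finish)
        return true;
  }
  return false;
}

// Two pseudos may share a spill slot only if both have known, non-empty
// live ranges and those ranges never overlap.  An empty list is treated as
// "unknown", not as "never live", so sharing is refused.
bool can_share_slot(const RangeList &a, const RangeList &b) {
  if (a.empty() || b.empty())
    return false;
  return !ranges_intersect(a, b);
}
```

Treating an empty list as unknown is the conservative choice: a pseudo with
missing range information would otherwise look compatible with every slot.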


for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (lra_update_fp2sp_elimination):
Compute complete live ranges and recompute slots' live ranges
if needed.
* lra-lives.cc (lra_reset_live_range_list): New.
(lra_complete_live_ranges): New.
* lra-spills.cc (assign_spill_hard_regs): Reject empty live
ranges.
(add_pseudo_to_slot): Likewise.
(lra_recompute_slots_live_ranges): New.
* lra-int.h (lra_reset_live_range_list): Declare.
(lra_complete_live_ranges): Declare.
(lra_recompute_slots_live_ranges): Declare.

Diff:
---
 gcc/lra-eliminations.cc |  8 
 gcc/lra-int.h   |  3 +++
 gcc/lra-lives.cc| 22 ++
 gcc/lra-spills.cc   | 37 +
 4 files changed, 70 insertions(+)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 6c8c91086f32..09959bbe3ed9 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1429,6 +1429,14 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
+  /* If !lra_reg_spill_p, we likely have incomplete range information
+ for pseudos assigned to the frame pointer that will have to be
+ spilled, and so we may end up incorrectly sharing them unless we
+ get live range information for them.  */
+  if (lra_complete_live_ranges (true))
+/* If lives ranges changed, update the aggregate live ranges in
+   slots as well before spilling any further pseudos.  */
+lra_recompute_slots_live_ranges ();
   n = spill_pseudos (set, spilled_pseudos);
   if (!ep)
 for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index ad42f48cc822..99e4699648cc 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -381,7 +381,9 @@ extern int *lra_point_freq;
 extern int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
 
 extern int lra_live_range_iter;
+extern void lra_reset_live_range_list (lra_live_range_t &);
 extern void lra_create_live_ranges (bool, bool);
+extern bool lra_complete_live_ranges (bool);
 extern lra_live_range_t lra_copy_live_range_list (lra_live_range_t);
 extern lra_live_range_t lra_merge_live_ranges (lra_live_range_t,
   lra_live_range_t);
@@ -417,6 +419,7 @@ extern bool lra_need_for_scratch_reg_p (void);
 extern bool lra_need_for_spills_p (void);
 extern void lra_spill (void);
 extern void lra_final_code_change (void);
+extern void lra_recompute_slots_live_ranges (void);
 
 /* lra-remat.cc:  */
 
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index ffce162907e9..722efc83bbc4 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -113,6 +113,15 @@ free_live_range_list (lra_live_range_t lr)
 }
 }
 
+/* Reset and release live range list LR.  */
+void
+lra_reset_live_range_list (lra_live_range_t &lr)
+{
+  lra_live_range_t first = lr;
+  lr = NULL;
+  free_live_range_list (first);
+}
+
 /* Create and return pseudo live range with given attributes.  */
 static lra_live_range_t
 create_live_range (int regno, int start, int finish, lra_live_range_t next)
@@ -1524,6 +1533,19 @@ lra_create_live_ranges (bool all_p, bool dead_insn_p)
   lra_assert (! res);
 }
 
+/* Run lra_create_live_ranges if !complete_info_p.  Return FALSE iff
+   live ranges are known to have remained unchanged.  */
+
+bool
+lra_complete_live_ranges (bool dead_insn_p)
+{
+  if (complete_info_p)
+

[gcc(refs/users/aoliva/heads/testme)] [arm] require armv7 support for [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

https://gcc.gnu.org/g:026509bfd57223583035a372d2a13d8f73447ed1

commit 026509bfd57223583035a372d2a13d8f73447ed1
Author: Alexandre Oliva 
Date:   Thu Jun 19 05:57:38 2025 -0300

[arm] require armv7 support for [PR120424]

Without stating the architecture version required by the test, test
runs with options that are incompatible with the required
architecture version fail, e.g. -mfloat-abi=hard.

armv7 was not covered by the long list of arm variants in
target-supports.exp, so add it, and use it for the effective target
requirement and for the option.


for  gcc/testsuite/ChangeLog

PR rtl-optimization/120424
* lib/target-supports.exp (arm arches): Add arm_arch_v7.
* g++.target/arm/pr120424.C: Require armv7 support.  Use
dg-add-options arm_arch_v7 instead of explicit -march=armv7.

Diff:
---
 gcc/testsuite/g++.target/arm/pr120424.C | 4 +++-
 gcc/testsuite/lib/target-supports.exp   | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/arm/pr120424.C b/gcc/testsuite/g++.target/arm/pr120424.C
index 4d0e49013c04..40295ac80da9 100644
--- a/gcc/testsuite/g++.target/arm/pr120424.C
+++ b/gcc/testsuite/g++.target/arm/pr120424.C
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv7 -O2 -fstack-clash-protection -fnon-call-exceptions" } */
+/* { dg-require-effective-target arm_arch_v7_ok } */
+/* { dg-options "-O2 -fstack-clash-protection -fnon-call-exceptions" } */
+/* { dg-add-options arm_arch_v7 } */
 /* { dg-final { scan-assembler-not {#-8} } } */
 /* LRA register elimination gets confused when register spilling
causes arm_frame_pointer_required to switch from false to true, and
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index dfffe3adfbdd..858fa1787f19 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6073,6 +6073,7 @@ foreach { armfunc armflag armdefs } {
v6z_arm "-march=armv6z+fp -marm" "__ARM_ARCH_6Z__ && !__thumb__"
v6z_thumb "-march=armv6z+fp -mthumb -mfloat-abi=softfp" "__ARM_ARCH_6Z__ && __thumb__"
v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
+   v7 "-march=armv7" __ARM_ARCH_7__
v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
v7a_fp_hard "-march=armv7-a+fp -mfpu=auto -mfloat-abi=hard" __ARM_ARCH_7A__

[gcc/aoliva/heads/testme] (5 commits) [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-20 Thread Alexandre Oliva via Gcc-cvs

The branch 'aoliva/heads/testme' was updated to point to:

 c6ce3a5fe59c... [lra] simplify disabling of fp2sp elimination [PR120424]

It previously pointed to:

 930fc5e2f0a4... [genoutput] mark scratch outputs as eliminable

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  930fc5e... [genoutput] mark scratch outputs as eliminable
  6987d76... [lra] recompute ranges upon disabling fp2sp elimination
  af1a44d... [arm] [vxworks] require thumb2 for pr120424.C
  577cc6d... [lra] inactivate disabled fp2sp elimination


Summary of changes (added commits):
---

  c6ce3a5... [lra] simplify disabling of fp2sp elimination [PR120424]
  b4101b1... [genoutput] mark scratch outputs as eliminable [PR120424]
  79e2fb7... [lra] recompute ranges upon disabling fp2sp elimination [PR
  1498558... [lra] inactivate disabled fp2sp elimination [PR120424]
  026509b... [arm] require armv7 support for [PR120424]

[gcc r16-1599] Extend afdo inliner to introduce speculative calls

2025-06-20 Thread Jan Hubicka via Gcc-cvs

https://gcc.gnu.org/g:d29cf57f9e4e9e16285a627a1717269ef7cf131b

commit r16-1599-gd29cf57f9e4e9e16285a627a1717269ef7cf131b
Author: Jan Hubicka 
Date:   Sat Jun 21 05:37:24 2025 +0200

Extend afdo inliner to introduce speculative calls

This patch makes AFDO's VPT happen during early inlining.  This should
make the einline pass inside the afdo pass unnecessary, but some inlining still
happens there - I will need to debug why that happens and will try to drop
afdo's inliner incrementally.

get_inline_stack_in_node can now be used to produce an inline stack out of
callgraph nodes which are marked as inline clones, so we do not need to iterate
the tree-inline and IPA decision phases like the old code did.  I also added some
debug facilities - dumping of decisions and inline stacks, so one can match
them with data in the gcov profile.

The former VPT pass identified all cases where, in the train run, an indirect
call was inlined and the inlined callee collected some samples.  In this case it
forced inlining without doing any checks, such as whether inlining is possible.

The new code simply introduces speculative edges into the callgraph and lets
afdo inlining decide.  The old code also marked statements that were introduced
during promotion to prevent doing double speculation i.e.

   if (ptr == foo)
  .. inlined foo ...
   else
  ptr ();

to

   if (ptr == foo)
  .. inlined foo ...
   else if (ptr == foo)
  foo (); // for IPA inlining
   else
  ptr ();

Since inlining now happens much earlier, tracking the statements would be quite
hard.  Instead I simply remove the targets from profile data, which should have
the same effect.
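
At source level, the speculation shown in the pseudocode above amounts to
guarding a direct call with a pointer comparison.  The following is a
hand-written sketch of that shape, not compiler output; foo, bar and the two
wrapper functions are hypothetical.

```cpp
#include <cassert>

int foo(int x) { return x + 1; }
int bar(int x) { return x * 2; }

using fn_t = int (*)(int);

// Plain indirect call: the callee is unknown, so it cannot be inlined.
int call_indirect(fn_t ptr, int x) {
  assert(ptr != nullptr);
  return ptr(x);
}

// Speculative form: if the profile says PTR is usually foo, test for it and
// call foo directly (a call the inliner can see through), keeping the
// indirect call as the fallback for every other target.
int call_speculative(fn_t ptr, int x) {
  assert(ptr != nullptr);
  if (ptr == foo)
    return foo(x);
  return ptr(x);
}
```

Both forms return the same value for every target; the speculative one only
changes which calls are visible as direct calls.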

I also noticed that there is nothing setting max_count, so all non-0 profile is
considered hot, which I fixed too.

Training with ref run I now get
                   --------- base ---------    --------- peak ---------
Benchmark          Copies  Seconds   Rate      Copies  Seconds   Rate
500.perlbench_r         1      160   9.93  *        1      162   9.84  *
502.gcc_r                                 NR                          NR
505.mcf_r               1      186   8.68  *        1      194   8.34  *
520.omnetpp_r           1      183   7.15  *        1      208   6.32  *
523.xalancbmk_r                           NR                          NR
525.x264_r              1     85.2   20.5  *        1     85.8   20.4  *
531.deepsjeng_r         1      165   6.93  *        1      176   6.51  *
541.leela_r             1      268   6.18  *        1      282   5.87  *
548.exchange2_r         1     86.3   30.4  *        1     88.9   29.5  *
557.xz_r                1      224   4.81  *        1      224   4.82  *
 Est. SPECrate2017_int_base          9.72
 Est. SPECrate2017_int_peak                                      9.33

503.bwaves_r                              NR                          NR
507.cactuBSSN_r         1      107   11.9  *        1      105   12.0  *
508.namd_r              1      108   8.79  *        1      116   8.18  *
510.parest_r            1      143   18.3  *        1      156   16.8  *
511.povray_r            1      188   12.4  *        1      163   14.4  *
519.lbm_r               1     72.0   14.6  *        1     75.0   14.1  *
521.wrf_r               1      106   21.1  *        1      106   21.1  *
526.blender_r           1      147   10.3  *        1      147   10.4  *
527.cam4_r              1      110   15.9  *        1      118   14.8  *
538.imagick_r           1      104   23.8  *        1      105   23.7  *
544.nab_r               1      146   11.6  *        1      143   11.8  *
549.fotonik3d_r         1      134   29.0  *        1      169   23.1  *
554.roms_r              1     86.6   18.4  *        1     89.3   17.8  *
 Est. SPECrate2017_fp_base           15.4
 Est. SPECrate2017_fp_peak                                       14.9

Base is without profile feedback and peak is AFDO.

gcc/ChangeLog:

* auto-profile.cc (dump_inline_stack): New function.
(get_inline_stack_in_node): New function.
(get_relative_location_for_stmt): Add FN parameter.
(has_indirect_call): Remove.
(function_instance::find_icall_target_map): Add FN parameter.
(function_instance::remove_icall_target): New function.
(function_instance::read_function_instance): Set sum_max.
(autofdo_source_profile::get_count_info): Add NODE parameter.
(autofdo_source_profile::update_inlined_ind_target): Add NODE parameter.
(autofdo_source_profile::remove_icall_target): New function.

[gcc r16-1595] cobol: Correct diagnostic strings for 32-bit builds.

2025-06-20 Thread James K. Lowden via Gcc-cvs

https://gcc.gnu.org/g:007392c0f93cf46b9e87aebdd04e123e3381fc07

commit r16-1595-g007392c0f93cf46b9e87aebdd04e123e3381fc07
Author: James K. Lowden 
Date:   Fri Jun 20 12:43:51 2025 -0400

cobol: Correct diagnostic strings for 32-bit builds.

Avoid %z for printf-family.  Cast pid_t to long.  Avoid use of YYUNDEF
for old Bison versions.
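
The casting approach named above can be sketched outside the COBOL front end.
This is a minimal illustration under stated assumptions: describe_size and
describe_pid are hypothetical helpers, and a plain static_cast stands in for
the gb4 helper that appears in the diff, whose definition is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <sys/types.h>  // pid_t (POSIX)

// Format a size_t without %zu: cast to unsigned long and use %lu, so the
// format and the argument agree on both 32-bit and 64-bit hosts.
std::string describe_size(std::size_t n) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "Size is %lu", static_cast<unsigned long>(n));
  return std::string(buf);
}

// pid_t has no dedicated conversion specifier, so widen it to long and
// print it with %ld.
std::string describe_pid(pid_t pid) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "process %ld", static_cast<long>(pid));
  return std::string(buf);
}
```

The cast keeps format and argument consistent, at the cost of possible
truncation on hosts where unsigned long is narrower than size_t.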

PR cobol/120621

gcc/cobol/ChangeLog:

* genapi.cc (parser_compile_ecs): Cast argument to unsigned long.
(parser_compile_dcls): Same.
(parser_division): RAII.
(inspect_tally): Cast argument to unsigned long.
* lexio.cc (cdftext::lex_open): Cast pid_t to long.
* parse.y: hard-code values for old versions of Bison, and message format.
* scan_ante.h (wait_for_the_child): Cast pid_t to long.

Diff:
---
 gcc/cobol/genapi.cc   | 25 +
 gcc/cobol/lexio.cc|  6 +++---
 gcc/cobol/parse.y |  8 
 gcc/cobol/scan_ante.h |  9 ++---
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
index 0ea41f167afa..42f1599a87f6 100644
--- a/gcc/cobol/genapi.cc
+++ b/gcc/cobol/genapi.cc
@@ -957,8 +957,8 @@ parser_compile_ecs( const std::vector& ecs )
 {
 SHOW_PARSE_HEADER
 char ach[64];
-snprintf(ach, sizeof(ach), " Size is %ld; retval is %p",
- ecs.size(), as_voidp(retval));
+snprintf(ach, sizeof(ach), " Size is %lu; retval is %p",
+ gb4(ecs.size()), as_voidp(retval));
 SHOW_PARSE_TEXT(ach)
 SHOW_PARSE_END
 }
@@ -966,8 +966,8 @@ parser_compile_ecs( const std::vector& ecs )
 {
 TRACE1_HEADER
 char ach[64];
-snprintf(ach, sizeof(ach), " Size is %ld; retval is %p",
- ecs.size(), as_voidp(retval));
+snprintf(ach, sizeof(ach), " Size is %lu; retval is %p",
+ gb4(ecs.size()), as_voidp(retval));
 TRACE1_TEXT_ABC("", ach, "");
 TRACE1_END
 }
@@ -1006,8 +1006,8 @@ parser_compile_dcls( const std::vector& dcls )
 {
 SHOW_PARSE_HEADER
 char ach[64];
-snprintf(ach, sizeof(ach), " Size is %ld; retval is %p",
- dcls.size(), as_voidp(retval));
+snprintf(ach, sizeof(ach), " Size is %lu; retval is %p",
+ gb4(dcls.size()), as_voidp(retval));
 SHOW_PARSE_TEXT(ach);
 SHOW_PARSE_END
 }
@@ -1015,8 +1015,8 @@ parser_compile_dcls( const std::vector& dcls )
 {
 TRACE1_HEADER
 char ach[64];
-snprintf(ach, sizeof(ach), " Size is %ld; retval is %p",
- dcls.size(), as_voidp(retval));
+snprintf(ach, sizeof(ach), " Size is %lu; retval is %p",
+ gb4(dcls.size()), as_voidp(retval));
 TRACE1_TEXT_ABC("", ach, "");
 TRACE1_END
 }
@@ -6898,7 +6898,7 @@ parser_division(cbl_division_t division,
 
   // There are 'nusing' elements in the PROCEDURE DIVISION USING list.
 
-  tree parameter;
+  tree parameter = NULL_TREE;
   tree rt_i = gg_define_int();
   for(size_t i=0; i(pid));
 
 if( WIFSIGNALED(status) ) {
-  cbl_errx( "%s pid %d terminated by %s",
-   filter, kid, strsignal(WTERMSIG(status)) );
+  cbl_errx( "%s pid %ld terminated by %s",
+filter, static_cast(kid), strsignal(WTERMSIG(status)) );
 }
 if( WIFEXITED(status) ) {
   if( (status = WEXITSTATUS(status)) != 0 ) {
diff --git a/gcc/cobol/parse.y b/gcc/cobol/parse.y
index 99295e8db3e3..f0faaa415776 100644
--- a/gcc/cobol/parse.y
+++ b/gcc/cobol/parse.y
@@ -11409,8 +11409,8 @@ keyword_str( int token ) {
   switch( token ) {
   case YYEOF:   return "YYEOF";
   case YYEMPTY: return "YYEMPTY";
-  case YYerror: return "YYerror";
-  case YYUNDEF: return "invalid token";
+  case 256: return "YYerror";
+  case 257: return "invalid token"; // YYUNDEF
   }
   
   if( token < 256 ) {
@@ -12359,7 +12359,7 @@ numstr2i( const char input[], radix_t radix ) {
 return output;
   }
   if( erc == -1 ) {
-yywarn("'%s' was accepted as %wd", input, integer);
+yywarn("'%s' was accepted as %zu", input, integer);
   }
   return output;
 }
@@ -13141,7 +13141,7 @@ literal_subscripts_valid( YYLTYPE loc, const cbl_refer_t& name ) {
 
 // X(0): subscript 1 of for out of range for 02 X OCCURS 4 to 6
 error_msg(loc, "%s(%s): subscript %zu out of range "
-   "for %s %s OCCURS %lu%s",
+   "for %s %s OCCURS %zu%s",
  oob->name, subscript_names.c_str(), 1 + isub,
  oob->level_str(), oob->name,
  oob->occurs.bounds.lower, upper_phrase );
diff --git a/gcc/cobol/scan_ante.h b/gcc/cobol/scan_ante.h
index 037c929aff33..96b688e75128 100644
--- a/gcc/cobol/scan_ante.h
+++ b/gcc/cobol/scan_ante.h
@@ -824,17 +824,20 @@ wait_for_the_child(void) {
   }
 
   if( WIFSIGNALED(status) ) {
-yywarn( "process %d terminated by %s", pid, strsignal(WTERMSIG(status)) );
+yywarn( "process %ld terminated by %s", 
+   static
