[gcc r14-10160] cfgrtl: Fix MEM_EXPR update in duplicate_insn_chain [PR114924]

2024-05-03 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:242fbc0df6c23115c47d256e66fba6a770265c5d

commit r14-10160-g242fbc0df6c23115c47d256e66fba6a770265c5d
Author: Alex Coplan 
Date:   Fri May 3 09:23:59 2024 +0100

cfgrtl: Fix MEM_EXPR update in duplicate_insn_chain [PR114924]

The PR shows that when cfgrtl.cc:duplicate_insn_chain attempts to
update the MR_DEPENDENCE_CLIQUE information for a MEM_EXPR we can end up
accidentally dropping (e.g.) an ARRAY_REF from the MEM_EXPR and end up
replacing it with the underlying MEM_REF.  This leads to an
inconsistency in the MEM_EXPR information, and could lead to wrong code.

While the walk down to the MEM_REF is necessary to update
MR_DEPENDENCE_CLIQUE, we should use the outer tree expression for the
MEM_EXPR.  This patch does that.

gcc/ChangeLog:

PR rtl-optimization/114924
* cfgrtl.cc (duplicate_insn_chain): When updating MEM_EXPRs,
don't strip (e.g.) ARRAY_REFs from the final MEM_EXPR.

(cherry picked from commit fe40d525619eee9c2821126390df75068df4773a)

Diff:
---
 gcc/cfgrtl.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index 304c429c99b..a5dc3512159 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -4432,12 +4432,13 @@ duplicate_insn_chain (rtx_insn *from, rtx_insn *to,
   since MEM_EXPR is shared so make a copy and
   walk to the subtree again.  */
tree new_expr = unshare_expr (MEM_EXPR (*iter));
+   tree orig_new_expr = new_expr;
if (TREE_CODE (new_expr) == WITH_SIZE_EXPR)
  new_expr = TREE_OPERAND (new_expr, 0);
while (handled_component_p (new_expr))
  new_expr = TREE_OPERAND (new_expr, 0);
MR_DEPENDENCE_CLIQUE (new_expr) = newc;
-   set_mem_expr (const_cast  (*iter), new_expr);
+   set_mem_expr (const_cast  (*iter), orig_new_expr);
  }
  }
}


[gcc r15-282] aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

2024-05-07 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:74690ff96b263b8609639b7fbc5d6afd3f19cb98

commit r15-282-g74690ff96b263b8609639b7fbc5d6afd3f19cb98
Author: Alex Coplan 
Date:   Wed Apr 10 16:30:36 2024 +0100

aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

The ldp/stp fusion pass can change the base of an access so that the two
accesses end up using a common base register.  So far we have been using
adjust_address_nv to do this, but this means that we don't preserve
other properties of the mem we're replacing.  It seems better to use
replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
mem whose address we're changing.

The PR shows that by adjusting the other mem we lose alignment
information about the original access and therefore end up rejecting an
otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
This patch fixes that by using replace_equiv_address_nv instead.

Notably this is the same approach as taken by
aarch64_check_consecutive_mems when a change of base is required, so
this at least makes things more consistent between the ldp fusion pass
and the peepholes.

gcc/ChangeLog:

PR target/114674
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Use replace_equiv_address_nv on a change of base instead of
adjust_address_nv on the other access.

gcc/testsuite/ChangeLog:

PR target/114674
* gcc.target/aarch64/pr114674.c: New test.

Diff:
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc|  8 
 gcc/testsuite/gcc.target/aarch64/pr114674.c | 17 +
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 0bc225dae7b..437f0aeb787 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1722,11 +1722,11 @@ ldp_bb_info::fuse_pair (bool load_p,
adjust_amt *= -1;
 
   rtx change_reg = XEXP (change_pat, !load_p);
-  machine_mode mode_for_mem = GET_MODE (change_mem);
   rtx effective_base = drop_writeback (base_mem);
-  rtx new_mem = adjust_address_nv (effective_base,
-  mode_for_mem,
-  adjust_amt);
+  rtx adjusted_addr = plus_constant (Pmode,
+XEXP (effective_base, 0),
+adjust_amt);
+  rtx new_mem = replace_equiv_address_nv (change_mem, adjusted_addr);
   rtx new_set = load_p
? gen_rtx_SET (change_reg, new_mem)
: gen_rtx_SET (new_mem, change_reg);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr114674.c 
b/gcc/testsuite/gcc.target/aarch64/pr114674.c
new file mode 100644
index 000..944784fd008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr114674.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-stp-policy=aligned" } */
+typedef struct {
+   unsigned int f1;
+   unsigned int f2;
+} test_struct;
+
+static test_struct ts = {
+   123, 456
+};
+
+void foo(void)
+{
+   ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
+   ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
+}
+/* { dg-final { scan-assembler-times "stp" 1 } } */


[gcc r15-320] aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

2024-05-08 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:73c8e24b692e691c665d0f1f5424432837bd8c06

commit r15-320-g73c8e24b692e691c665d0f1f5424432837bd8c06
Author: Alex Coplan 
Date:   Fri May 3 14:12:32 2024 +

aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

This fixes a typo in combine_reg_notes in the load/store pair fusion
pass.  As it stands, the calls to filter_notes store any
REG_FRAME_RELATED_EXPR to fr_expr with the following association:

 - i2 -> fr_expr[0]
 - i1 -> fr_expr[1]

but then the checks inside the following if statement expect the
opposite (more natural) association, i.e.:

 - i2 -> fr_expr[1]
 - i1 -> fr_expr[0]

this patch fixes the oversight by swapping the fr_expr indices in the
calls to filter_notes.

In hindsight it would probably have been less confusing / error-prone to
have combine_reg_notes take an array of two insns, then we wouldn't have
to mix 1-based and 0-based indexing as well as remembering to call
filter_notes in reverse program order.  This however is a minimal fix
for backporting purposes.

gcc/ChangeLog:

PR target/114936
* config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes):
Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in
FR_EXPR[N-1], thus matching the correspondence expected by the
copy_rtx calls.

Diff:
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 437f0aeb7877..1d9caeab05d4 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1085,9 +1085,9 @@ combine_reg_notes (insn_info *i1, insn_info *i2, bool 
load_p)
   bool found_eh_region = false;
   rtx result = NULL_RTX;
   result = filter_notes (REG_NOTES (i2->rtl ()), result,
-&found_eh_region, fr_expr);
-  result = filter_notes (REG_NOTES (i1->rtl ()), result,
 &found_eh_region, fr_expr + 1);
+  result = filter_notes (REG_NOTES (i1->rtl ()), result,
+&found_eh_region, fr_expr);
 
   if (!load_p)
 {


[gcc r15-3254] testsuite: Add scan-ltrans-rtl* for use in dg-final [PR116140]

2024-08-28 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:3f51f0dc88ec21c1ec79df694200f10ef85915f4

commit r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
Author: Alex Coplan 
Date:   Tue Aug 27 16:51:12 2024 +

testsuite: Add scan-ltrans-rtl* for use in dg-final [PR116140]

This extends the scan-ltrans-tree* helpers to create RTL variants.  This
is needed to check the behaviour of an RTL pass under LTO.

gcc/ChangeLog:

PR libstdc++/116140
* doc/sourcebuild.texi: Document ltrans-rtl value of kind for
scan--dump*.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
(scan-ltrans-rtl-dump-not): New.
(scan-ltrans-rtl-dump-dem): New.
(scan-ltrans-rtl-dump-dem-not): New.
(scan-ltrans-rtl-dump-times): New.

Diff:
---
 gcc/doc/sourcebuild.texi |  4 +-
 gcc/testsuite/lib/scanltranstree.exp | 80 +---
 2 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 3c55f1037953..0636fc0567c5 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3627,8 +3627,8 @@ stands for zero or more unmatched lines; the whitespace 
after
 @subsubsection Scan optimization dump files
 
 These commands are available for @var{kind} of @code{tree}, @code{ltrans-tree},
-@code{offload-tree}, @code{rtl}, @code{offload-rtl}, @code{ipa},
-@code{offload-ipa}, and @code{wpa-ipa}.
+@code{offload-tree}, @code{rtl}, @code{ltrans-rtl}, @code{offload-rtl},
+@code{ipa}, @code{offload-ipa}, and @code{wpa-ipa}.
 
 @table @code
 @item scan-@var{kind}-dump @var{regex} @var{suffix} [@{ target/xfail 
@var{selector} @}]
diff --git a/gcc/testsuite/lib/scanltranstree.exp 
b/gcc/testsuite/lib/scanltranstree.exp
index bc6e02dc3696..a7d4de3765fc 100644
--- a/gcc/testsuite/lib/scanltranstree.exp
+++ b/gcc/testsuite/lib/scanltranstree.exp
@@ -19,50 +19,44 @@
 
 load_lib scandump.exp
 
-# The first item in the list is an LTO equivalent of the second item
-# in the list; see the documentation of the second item for details.
-foreach { name scan type suffix } {
-scan-ltrans-tree-dump scan-dump ltrans-tree t
-scan-ltrans-tree-dump-not scan-dump-not ltrans-tree t
-scan-ltrans-tree-dump-dem scan-dump-dem ltrans-tree t
-scan-ltrans-tree-dump-dem-not scan-dump-dem-not ltrans-tree t
-} {
-eval [string map [list @NAME@ $name \
-  @SCAN@ $scan \
-  @TYPE@ $type \
-  @SUFFIX@ $suffix] {
-proc @NAME@ { args } {
-   if { [llength $args] < 2 } {
-   error "@NAME@: too few arguments"
-   return
-   }
-   if { [llength $args] > 3 } {
-   error "@NAME@: too many arguments"
-   return
+# Define scan-ltrans-{tree,rtl}-dump{,-not,-dem,-dem-not}.  These are LTO
+# variants of the corresponding functions without -ltrans in the name.
+foreach ir { tree rtl } {
+foreach modifier { {} -not -dem -dem-not } {
+   eval [string map [list @NAME@ scan-ltrans-$ir-dump$modifier \
+  @SCAN@ scan$modifier \
+  @TYPE@ ltrans-$ir \
+  @SUFFIX@ [string index $ir 0]] {
+   proc @NAME@ { args } {
+   if { [llength $args] < 2 } {
+   error "@NAME@: too few arguments"
+   return
+   }
+   if { [llength $args] > 3 } {
+   error "@NAME@: too many arguments"
+   return
+   }
+   if { [llength $args] >= 3 } {
+   @SCAN@ @TYPE@ [lindex $args 0] \
+   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
+   ".ltrans0.ltrans" \
+   [lindex $args 2]
+   } else {
+   @SCAN@ @TYPE@ [lindex $args 0] \
+   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
+   ".ltrans0.ltrans"
+   }
}
-   if { [llength $args] >= 3 } {
-   @SCAN@ @TYPE@ [lindex $args 0] \
-   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
-   ".ltrans0.ltrans" \
-   [lindex $args 2]
-   } else {
-   @SCAN@ @TYPE@ [lindex $args 0] \
-   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
-   ".ltrans0.ltrans"
-   }
-}
-}]
+   }]
+}
 }
 
-# The first item in the list is an LTO equivalent of the second item
-# in the list; see the documentation of the second item for details.
-foreach { name scan type suffix } {
-scan-ltrans-tree-dump-times scan-dump-times ltrans-tree t
-} {
-eval [string map [list @NAME@ $name \
-  @SCAN@ $scan \
-

[gcc r15-3283] testsuite: Fix up refactored scanltranstree.exp functions [PR116522]

2024-08-29 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:4b729d2ff3259e5b1d40f93d4f9e7edf5f0064f4

commit r15-3283-g4b729d2ff3259e5b1d40f93d4f9e7edf5f0064f4
Author: Alex Coplan 
Date:   Thu Aug 29 11:31:40 2024 +0100

testsuite: Fix up refactored scanltranstree.exp functions [PR116522]

When adding RTL variants of the scan-ltrans-tree* functions in:
r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
I messed up the name of the underlying scan function to invoke.  The
code currently attempts to invoke functions named
scan{,-not,-dem,-dem-not} but should instead be invoking
scan-dump{,-not,-dem,-dem-not}.  This patch fixes that.

gcc/testsuite/ChangeLog:

PR testsuite/116522
* lib/scanltranstree.exp: Fix name of underlying scan function
used for scan-ltrans-{tree,rtl}-dump{,-not,-dem,-dem-not}.

Diff:
---
 gcc/testsuite/lib/scanltranstree.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/scanltranstree.exp 
b/gcc/testsuite/lib/scanltranstree.exp
index a7d4de3765fc..3d85813ea2f9 100644
--- a/gcc/testsuite/lib/scanltranstree.exp
+++ b/gcc/testsuite/lib/scanltranstree.exp
@@ -24,7 +24,7 @@ load_lib scandump.exp
 foreach ir { tree rtl } {
 foreach modifier { {} -not -dem -dem-not } {
eval [string map [list @NAME@ scan-ltrans-$ir-dump$modifier \
-  @SCAN@ scan$modifier \
+  @SCAN@ scan-dump$modifier \
   @TYPE@ ltrans-$ir \
   @SUFFIX@ [string index $ir 0]] {
proc @NAME@ { args } {


[gcc r15-3322] gdbhooks: Fix printing of vec with vl_ptr layout

2024-08-30 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:5020f8ea80af90d8a08eff9fdef3276056df98f5

commit r15-3322-g5020f8ea80af90d8a08eff9fdef3276056df98f5
Author: Alex Coplan 
Date:   Fri Aug 30 15:29:34 2024 +0100

gdbhooks: Fix printing of vec with vl_ptr layout

As it stands, the pretty printing of GCC's vecs by gdbhooks.py only
handles vectors with vl_embed layout.  As such, when encountering a vec
with vl_ptr layout, GDB would print a diagnostic like:

  gdb.error: There is no member or method named m_vecpfx.

when (e.g.) any such vec occurred in a backtrace.  This patch extends
VecPrinter.children to also handle vl_ptr vectors.

gcc/ChangeLog:

* gdbhooks.py (VEC_KIND_EMBED): New.
(VEC_KIND_PTR): New.
(get_vec_kind): New.
(VecPrinter.children): Also handle vectors with vl_ptr layout.

Diff:
---
 gcc/gdbhooks.py | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 7a64c03b8acb..904ee28423a9 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -453,6 +453,25 @@ class PassPrinter:
 
 ##
 
+VEC_KIND_EMBED = 0
+VEC_KIND_PTR = 1
+
+"""
+Given a vec or pointer to vec, return its layout (embedded or space
+efficient).
+"""
+def get_vec_kind(val):
+typ = val.type
+if typ.code == gdb.TYPE_CODE_PTR:
+typ = typ.target()
+kind = typ.template_argument(2).name
+if kind == "vl_embed":
+return VEC_KIND_EMBED
+elif kind == "vl_ptr":
+return VEC_KIND_PTR
+else:
+assert False, f"unexpected vec kind {kind}"
+
 class VecPrinter:
 #-ex "up" -ex "p bb->preds"
 def __init__(self, gdbval):
@@ -467,11 +486,16 @@ class VecPrinter:
 return '0x%x' % intptr(self.gdbval)
 
 def children (self):
-if intptr(self.gdbval) == 0:
+val = self.gdbval
+if intptr(val) != 0 and get_vec_kind(val) == VEC_KIND_PTR:
+val = val['m_vec']
+
+if intptr(val) == 0:
 return
-m_vecpfx = self.gdbval['m_vecpfx']
+
+assert get_vec_kind(val) == VEC_KIND_EMBED
+m_vecpfx = val['m_vecpfx']
 m_num = m_vecpfx['m_num']
-val = self.gdbval
 typ = val.type
 if typ.code == gdb.TYPE_CODE_PTR:
 typ = typ.target()


[gcc r15-3378] testsuite: Rename scanltranstree.exp -> scanltrans.exp

2024-09-02 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:e4d3e7f9add34216f4baffd3124bcb22a82c39bf

commit r15-3378-ge4d3e7f9add34216f4baffd3124bcb22a82c39bf
Author: Alex Coplan 
Date:   Wed Aug 28 14:53:11 2024 +0100

testsuite: Rename scanltranstree.exp -> scanltrans.exp

Since r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
added scan-ltrans-rtl* variants to scanltranstree.exp, it no longer
makes sense to have "tree" in the name.  This renames the file
accordingly and updates users.

libatomic/ChangeLog:

* testsuite/lib/libatomic.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libgomp/ChangeLog:

* testsuite/lib/libgomp.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libitm/ChangeLog:

* testsuite/lib/libitm.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libphobos/ChangeLog:

* testsuite/lib/libphobos-dg.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libvtv/ChangeLog:

* testsuite/lib/libvtv.exp: Load scanltrans.exp instead of
scanltranstree.exp.

gcc/testsuite/ChangeLog:

* gcc.dg-selftests/dg-final.exp: Load scanltrans.exp instead of
scanltranstree.exp.
* lib/gcc-dg.exp: Likewise.
* lib/scanltranstree.exp: Rename to ...
* lib/scanltrans.exp: ... this.

Diff:
---
 gcc/testsuite/gcc.dg-selftests/dg-final.exp  | 2 +-
 gcc/testsuite/lib/gcc-dg.exp | 2 +-
 gcc/testsuite/lib/{scanltranstree.exp => scanltrans.exp} | 0
 libatomic/testsuite/lib/libatomic.exp| 2 +-
 libgomp/testsuite/lib/libgomp.exp| 2 +-
 libitm/testsuite/lib/libitm.exp  | 2 +-
 libphobos/testsuite/lib/libphobos-dg.exp | 2 +-
 libvtv/testsuite/lib/libvtv.exp  | 2 +-
 8 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.dg-selftests/dg-final.exp 
b/gcc/testsuite/gcc.dg-selftests/dg-final.exp
index 6b6f32e0510c..5503b0c09114 100644
--- a/gcc/testsuite/gcc.dg-selftests/dg-final.exp
+++ b/gcc/testsuite/gcc.dg-selftests/dg-final.exp
@@ -23,7 +23,7 @@ load_lib "scanlang.exp"
 load_lib "lto.exp"
 load_lib "scanasm.exp"
 load_lib "scanwpaipa.exp"
-load_lib "scanltranstree.exp"
+load_lib "scanltrans.exp"
 load_lib "scanoffloadtree.exp"
 load_lib "scanoffloadrtl.exp"
 load_lib "gcc-dg.exp"
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 992062103c12..d9513e2859ce 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -21,7 +21,7 @@ load_lib target-supports-dg.exp
 load_lib scanasm.exp
 load_lib scanrtl.exp
 load_lib scantree.exp
-load_lib scanltranstree.exp
+load_lib scanltrans.exp
 load_lib scanipa.exp
 load_lib scanwpaipa.exp
 load_lib scanlang.exp
diff --git a/gcc/testsuite/lib/scanltranstree.exp 
b/gcc/testsuite/lib/scanltrans.exp
similarity index 100%
rename from gcc/testsuite/lib/scanltranstree.exp
rename to gcc/testsuite/lib/scanltrans.exp
diff --git a/libatomic/testsuite/lib/libatomic.exp 
b/libatomic/testsuite/lib/libatomic.exp
index ed6ba806732f..642530557f78 100644
--- a/libatomic/testsuite/lib/libatomic.exp
+++ b/libatomic/testsuite/lib/libatomic.exp
@@ -38,7 +38,7 @@ load_gcc_lib scanlang.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scansarif.exp
 load_gcc_lib scantree.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanipa.exp
 load_gcc_lib scanwpaipa.exp
 load_gcc_lib multiline.exp
diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index 7c1092629168..2d0339b5e565 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -31,7 +31,7 @@ load_gcc_lib scanlang.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scansarif.exp
 load_gcc_lib scantree.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanoffload.exp
 load_gcc_lib scanoffloadipa.exp
 load_gcc_lib scanoffloadtree.exp
diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
index 3e60797c3e31..0182234a24ab 100644
--- a/libitm/testsuite/lib/libitm.exp
+++ b/libitm/testsuite/lib/libitm.exp
@@ -44,7 +44,7 @@ load_gcc_lib scanlang.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scansarif.exp
 load_gcc_lib scantree.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanipa.exp
 load_gcc_lib scanwpaipa.exp
 load_gcc_lib timeout-dg.exp
diff --git a/libphobos/testsuite/lib/libphobos-dg.exp 
b/libphobos/testsuite/lib/libphobos-dg.exp
index 965ff025a04d..90bc02ef5e5e 100644
--- a/libphobos/testsuite/lib/libphobos-dg.exp
+++ b/libphobos/testsuite/lib/libphobos-dg.exp
@@ -17,7 +17,7 @@
 load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 load_gcc_lib scandump.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 lo

[gcc r15-3583] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-09-11 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:f97d86242b86e4ad2bef3623c97e91481840a210

commit r15-3583-gf97d86242b86e4ad2bef3623c97e91481840a210
Author: Alex Coplan 
Date:   Fri Aug 2 09:52:50 2024 +0100

c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions 
[PR116140]

For the testcase added with this patch, we would end up losing the:

  #pragma GCC unroll 4

and emitting "warning: ignoring loop annotation".  That warning comes
from tree-cfg.cc:replace_loop_annotate, and means that we failed to
process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
That function walks backwards over the GIMPLE in an exiting BB for a
loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
immediately preceding the gcond.

The function documents the following pre-condition:

   /* [...] We assume that the annotations come immediately before the
  condition in BB, if any.  */

now looking at the exiting BB of the loop, we have:

   :
  D.4524 = .ANNOTATE (iftmp.1, 1, 4);
  retval.0 = D.4524;
  if (retval.0 != 0)
goto ; [INV]
  else
goto ; [INV]

and crucially there is an intervening assignment between the gcond and
the preceding .ANNOTATE ifn call.  To see where this comes from, we can
look to the IR given by -fdump-tree-original:

  if (<::operator() (&pred, *first), unroll 4>>>)
goto ;
  else
goto ;

here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
expression in the condition.

The CLEANUP_POINT_EXPR gets added by the following call chain:

finish_while_stmt_cond
 -> maybe_convert_cond
 -> condition_conversion
 -> fold_build_cleanup_point_expr

this patch chooses to fix the issue by first introducing a new helper
class (annotate_saver) to save and restore outer chains of
ANNOTATE_EXPRs and then using it in maybe_convert_cond.

With this patch, we don't get any such warning and the loop gets unrolled as
expected at -O2.

gcc/cp/ChangeLog:

PR libstdc++/116140
* semantics.cc (anotate_saver): New. Use it ...
(maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
remain the outermost expression(s) of the condition.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda.C: New test.

Diff:
---
 gcc/cp/semantics.cc | 88 -
 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda.C | 17 +
 2 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 3e117c216da5..63212afafb3b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -951,6 +951,86 @@ maybe_warn_unparenthesized_assignment (tree t, bool 
nested_p,
 }
 }
 
+/* Helper class for saving/restoring ANNOTATE_EXPRs.  For a tree node t, users
+   can construct one of these like so:
+
+ annotate_saver s (&t);
+
+   and t will be updated to have any annotations removed.  The user can then
+   transform t, and later restore the ANNOTATE_EXPRs with:
+
+ t = s.restore (t).
+
+   The intent is to ensure that any ANNOTATE_EXPRs remain the outermost
+   expressions following any operations on t.  */
+
+class annotate_saver {
+  /* The chain of saved annotations, if there were any.  Otherwise null.  */
+  tree m_annotations;
+
+  /* If M_ANNOTATIONS is non-null, then M_INNER points to TREE_OPERAND (A, 0)
+ for the innermost annotation A.  */
+  tree *m_inner;
+
+public:
+  annotate_saver (tree *);
+  tree restore (tree);
+};
+
+/* If *COND is an ANNOTATE_EXPR, walk through the chain of annotations, and set
+   *COND equal to the first non-ANNOTATE_EXPR (saving a pointer to the
+   original chain of annotations for later use in restore).  */
+
+annotate_saver::annotate_saver (tree *cond) : m_annotations (nullptr)
+{
+  tree *t = cond;
+  while (TREE_CODE (*t) == ANNOTATE_EXPR)
+t = &TREE_OPERAND (*t, 0);
+
+  if (t != cond)
+{
+  m_annotations = *cond;
+  *cond = *t;
+  m_inner = t;
+}
+}
+
+/* If we didn't strip any annotations on construction, return NEW_INNER
+   unmodified.  Otherwise, wrap the saved annotations around NEW_INNER 
(updating
+   the types and flags of the annotations if needed) and return the resulting
+   expression.  */
+
+tree
+annotate_saver::restore (tree new_inner)
+{
+  if (!m_annotations)
+return new_inner;
+
+  /* If the type of the inner expression changed, we need to update the types
+ of all the ANNOTATE_EXPRs.  We may need to update the flags too, but we
+ assume they only change if the type of the inner expression changes.
+ The flag update logic assumes that the other operands to the
+ ANNOTATE_EXPRs are always INTEGER_CS

[gcc r15-3584] testsuite: Ensure ltrans dump files get cleaned up properly [PR116140]

2024-09-11 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:31ff173c70847bba94613eac5b1ef2c0bec842e6

commit r15-3584-g31ff173c70847bba94613eac5b1ef2c0bec842e6
Author: Alex Coplan 
Date:   Thu Aug 8 13:15:39 2024 +

testsuite: Ensure ltrans dump files get cleaned up properly [PR116140]

I noticed while working on a test that uses LTO and requests a dump
file, that we are failing to cleanup ltrans dump files in the testsuite.

E.g. the test I was working on compiles with -flto
-fdump-rtl-loop2_unroll, and we end up with the following file:

./gcc/testsuite/g++/pr116140.ltrans0.ltrans.287r.loop2_unroll

being left behind by the testsuite.  This is problematic not just from a
"missing cleanup" POV, but also because it can cause the test to pass
spuriously when the test is re-run wtih an unpatched compiler (without
the bug fix).  In the broken case, loop2_unroll isn't run at all, so we
end up scanning the old dumpfile (from the previous test run) and making
the dumpfile scan pass.

Running with `-v -v` in RUNTESTFLAGS we can see the following cleanup
attempt is made:

remove-build-file 
`pr116140.{C,exe}.{ltrans[0-9]*.,}[0-9][0-9][0-9]{l,i,r,t}.*'

looking again at the ltrans dump file above we can see this will fail for 
two
reasons:

 - The actual dump file has no {C,exe} extension between the basename and
   ltrans0.
 - The actual dump file has an additional `.ltrans` component after 
`.ltrans0`.

This patch therefore relaxes the pattern constructed for cleaning up such
dumpfiles to also match dumpfiles with the above form.

Running the testsuite before/after this patch shows the number of files in
gcc/testsuite (in the build dir) with "ltrans" in the name goes from 1416 
to 62
on aarch64.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* lib/gcc-dg.exp (schedule-cleanups): Relax ltrans dumpfile
cleanup pattern to handle missing cases.

Diff:
---
 gcc/testsuite/lib/gcc-dg.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index d9513e2859ce..cb401a704359 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -190,7 +190,7 @@ proc schedule-cleanups { opts } {
 # Handle ltrans files around -flto
 if [regexp -- {(^|\s+)-flto(\s+|$)} $opts] {
verbose "Cleanup -flto seen" 4
-   set ltrans "{ltrans\[0-9\]*.,}"
+   set ltrans "{ltrans\[0-9\]*{.ltrans,}.,}"
 } else {
set ltrans ""
 }
@@ -206,7 +206,7 @@ proc schedule-cleanups { opts } {
if {$basename_ext != ""} {
regsub -- {^.*\.} $basename_ext {} basename_ext
}
-   lappend tfiles "$stem.{$basename_ext,exe}"
+   lappend tfiles "$stem{.$basename_ext,.exe,}"
unset basename_ext
} else {
lappend tfiles $basename


[gcc r15-3585] lto: Stream has_unroll flag during LTO [PR116140]

2024-09-11 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:9759f6299d9633cabac540e5c893341c708093ac

commit r15-3585-g9759f6299d9633cabac540e5c893341c708093ac
Author: Alex Coplan 
Date:   Sat Aug 3 17:02:36 2024 +

lto: Stream has_unroll flag during LTO [PR116140]

When #pragma GCC unroll is processed in
tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
field (which is currently streamed out and back in during LTO) but also
the cfun->has_unroll flag.

cfun->has_unroll, however, is not currently streamed during LTO.  This
patch fixes that.

Prior to this patch, loops marked with #pragma GCC unroll that would be
unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
unrolled under LTO.

gcc/ChangeLog:

PR libstdc++/116140
* lto-streamer-in.cc (input_struct_function_base): Stream in
fn->has_unroll.
* lto-streamer-out.cc (output_struct_function_base): Stream out
fn->has_unroll.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda-lto.C: New test.

Diff:
---
 gcc/lto-streamer-in.cc |  1 +
 gcc/lto-streamer-out.cc|  1 +
 .../g++.dg/ext/pragma-unroll-lambda-lto.C  | 32 ++
 3 files changed, 34 insertions(+)

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 64f758073280..9d0ec5d589c4 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1326,6 +1326,7 @@ input_struct_function_base (struct function *fn, class 
data_in *data_in,
   fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1);
   fn->has_simduid_loops = bp_unpack_value (&bp, 1);
   fn->has_musttail = bp_unpack_value (&bp, 1);
+  fn->has_unroll = bp_unpack_value (&bp, 1);
   fn->assume_function = bp_unpack_value (&bp, 1);
   fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
   fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index a4b171358d41..807b935537be 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2283,6 +2283,7 @@ output_struct_function_base (struct output_block *ob, 
struct function *fn)
   bp_pack_value (&bp, fn->has_force_vectorize_loops, 1);
   bp_pack_value (&bp, fn->has_simduid_loops, 1);
   bp_pack_value (&bp, fn->has_musttail, 1);
+  bp_pack_value (&bp, fn->has_unroll, 1);
   bp_pack_value (&bp, fn->assume_function, 1);
   bp_pack_value (&bp, fn->va_list_fpr_size, 8);
   bp_pack_value (&bp, fn->va_list_gpr_size, 8);
diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C 
b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
new file mode 100644
index ..144c4c326924
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
@@ -0,0 +1,32 @@
+// { dg-do link { target c++11 } }
+// { dg-options "-O2 -flto -fdump-rtl-loop2_unroll" }
+
+#include 
+
+template
+inline Iter
+my_find(Iter first, Iter last, Pred pred)
+{
+#pragma GCC unroll 4
+while (first != last && !pred(*first))
+++first;
+return first;
+}
+
+__attribute__((noipa))
+short *use_find(short *p)
+{
+auto pred = [](short x) { return x == 42; };
+return my_find(p, p + 1024, pred);
+}
+
+int main(void)
+{
+  short a[1024];
+  for (int i = 0; i < 1024; i++)
+a[i] = rand ();
+
+  return use_find (a) - a;
+}
+
+// { dg-final { scan-ltrans-rtl-dump-times "Unrolled loop 3 times" 1 
"loop2_unroll" } }


[gcc r14-10369] aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

2024-07-02 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:8eb469546f7f32dcec5d3376dfb419c1d4f21e4a

commit r14-10369-g8eb469546f7f32dcec5d3376dfb419c1d4f21e4a
Author: Alex Coplan 
Date:   Fri May 3 14:12:32 2024 +

aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

This fixes a typo in combine_reg_notes in the load/store pair fusion
pass.  As it stands, the calls to filter_notes store any
REG_FRAME_RELATED_EXPR to fr_expr with the following association:

 - i2 -> fr_expr[0]
 - i1 -> fr_expr[1]

but then the checks inside the following if statement expect the
opposite (more natural) association, i.e.:

 - i2 -> fr_expr[1]
 - i1 -> fr_expr[0]

this patch fixes the oversight by swapping the fr_expr indices in the
calls to filter_notes.

In hindsight it would probably have been less confusing / error-prone to
have combine_reg_notes take an array of two insns, then we wouldn't have
to mix 1-based and 0-based indexing as well as remembering to call
filter_notes in reverse program order.  This however is a minimal fix
for backporting purposes.

gcc/ChangeLog:

PR target/114936
* config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes):
Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in
FR_EXPR[N-1], thus matching the correspondence expected by the
copy_rtx calls.

(cherry picked from commit 73c8e24b692e691c665d0f1f5424432837bd8c06)

Diff:
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 365dcf48b22..1fc25e389cf 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1093,9 +1093,9 @@ combine_reg_notes (insn_info *i1, insn_info *i2, bool 
load_p)
   bool found_eh_region = false;
   rtx result = NULL_RTX;
   result = filter_notes (REG_NOTES (i2->rtl ()), result,
-&found_eh_region, fr_expr);
-  result = filter_notes (REG_NOTES (i1->rtl ()), result,
 &found_eh_region, fr_expr + 1);
+  result = filter_notes (REG_NOTES (i1->rtl ()), result,
+&found_eh_region, fr_expr);
 
   if (!load_p)
 {


[gcc r15-1854] middle-end: Add debug functions to dump dominator tree in dot format

2024-07-05 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:ae07f62a70ee2d0fdd7d8786122ae6360cfd4ca9

commit r15-1854-gae07f62a70ee2d0fdd7d8786122ae6360cfd4ca9
Author: Alex Coplan 
Date:   Fri Jul 5 11:57:56 2024 +0100

middle-end: Add debug functions to dump dominator tree in dot format

This adds debug functions to dump the dominator tree in dot format.
There are two overloads: one which takes a FILE * and another which
takes a const char *fname and wraps the first with fopen/fclose for
convenience.

gcc/ChangeLog:

* dominance.cc (dot_dominance_tree): New.

Diff:
---
 gcc/dominance.cc | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/dominance.cc b/gcc/dominance.cc
index 0357210ed27..c14d997ded7 100644
--- a/gcc/dominance.cc
+++ b/gcc/dominance.cc
@@ -1658,6 +1658,36 @@ debug_dominance_info (enum cdi_direction dir)
   fprintf (stderr, "%i %i\n", bb->index, bb2->index);
 }
 
+/* Dump the dominance tree in direction DIR to the file F in dot form.
+   This allows easily visualizing the tree using graphviz.  */
+
+DEBUG_FUNCTION void
+dot_dominance_tree (FILE *f, enum cdi_direction dir)
+{
+  fprintf (f, "digraph {\n");
+  basic_block bb, idom;
+  FOR_EACH_BB_FN (bb, cfun)
+if ((idom = get_immediate_dominator (dir, bb)))
+  fprintf (f, "%i -> %i;\n", idom->index, bb->index);
+  fprintf (f, "}\n");
+}
+
+/* Convenience wrapper around the above that dumps the dominance tree in
+   direction DIR to the file at path FNAME in dot form.  */
+
+DEBUG_FUNCTION void
+dot_dominance_tree (const char *fname, enum cdi_direction dir)
+{
+  FILE *f = fopen (fname, "w");
+  if (f)
+{
+  dot_dominance_tree (f, dir);
+  fclose (f);
+}
+  else
+fprintf (stderr, "failed to open %s: %s\n", fname, xstrerror (errno));
+}
+
 /* Prints to stderr representation of the dominance tree (for direction DIR)
rooted in ROOT, indented by INDENT tabulators.  If INDENT_FIRST is false,
the first line of the output is not indented.  */


[gcc r14-9814] aarch64: Fix whitespace in aarch64-ldp-fusion.cc:alias_walker

2024-04-05 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:e7d015b2506a1d9e84d9f7182e42e097147527e1

commit r14-9814-ge7d015b2506a1d9e84d9f7182e42e097147527e1
Author: Alex Coplan 
Date:   Fri Apr 5 17:53:47 2024 +0100

aarch64: Fix whitespace in aarch64-ldp-fusion.cc:alias_walker

I spotted this whitespace error during the review of
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648846.html

gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc (struct alias_walker):
Fix double space after const qualifier on valid ().

Diff:
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 22ed95eb743..365dcf48b22 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -2138,7 +2138,7 @@ struct alias_walker
 {
   virtual bool conflict_p (int &budget) const = 0;
   virtual insn_info *insn () const = 0;
-  virtual bool valid () const  = 0;
+  virtual bool valid () const = 0;
   virtual void advance () = 0;
 };


[gcc r15-126] cfgrtl: Fix MEM_EXPR update in duplicate_insn_chain [PR114924]

2024-05-03 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:fe40d525619eee9c2821126390df75068df4773a

commit r15-126-gfe40d525619eee9c2821126390df75068df4773a
Author: Alex Coplan 
Date:   Fri May 3 09:23:59 2024 +0100

cfgrtl: Fix MEM_EXPR update in duplicate_insn_chain [PR114924]

The PR shows that when cfgrtl.cc:duplicate_insn_chain attempts to
update the MR_DEPENDENCE_CLIQUE information for a MEM_EXPR we can end up
accidentally dropping (e.g.) an ARRAY_REF from the MEM_EXPR and end up
replacing it with the underlying MEM_REF.  This leads to an
inconsistency in the MEM_EXPR information, and could lead to wrong code.

While the walk down to the MEM_REF is necessary to update
MR_DEPENDENCE_CLIQUE, we should use the outer tree expression for the
MEM_EXPR.  This patch does that.

gcc/ChangeLog:

PR rtl-optimization/114924
* cfgrtl.cc (duplicate_insn_chain): When updating MEM_EXPRs,
don't strip (e.g.) ARRAY_REFs from the final MEM_EXPR.

Diff:
---
 gcc/cfgrtl.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index 304c429c99b..a5dc3512159 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -4432,12 +4432,13 @@ duplicate_insn_chain (rtx_insn *from, rtx_insn *to,
   since MEM_EXPR is shared so make a copy and
   walk to the subtree again.  */
tree new_expr = unshare_expr (MEM_EXPR (*iter));
+   tree orig_new_expr = new_expr;
if (TREE_CODE (new_expr) == WITH_SIZE_EXPR)
  new_expr = TREE_OPERAND (new_expr, 0);
while (handled_component_p (new_expr))
  new_expr = TREE_OPERAND (new_expr, 0);
MR_DEPENDENCE_CLIQUE (new_expr) = newc;
-   set_mem_expr (const_cast  (*iter), new_expr);
+   set_mem_expr (const_cast  (*iter), orig_new_expr);
  }
  }
}


[gcc r15-2714] gdbhooks: Make dot viewer configurable

2024-08-05 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:08cc516a8cfe553064f84a86be4c30f05a614342

commit r15-2714-g08cc516a8cfe553064f84a86be4c30f05a614342
Author: Alex Coplan 
Date:   Mon Aug 5 08:45:29 2024 +0100

gdbhooks: Make dot viewer configurable

This adds a new GDB parameter 'gcc-dot-cmd' which allows the user to
configure the command used to render the CFG within dot-fn.

E.g. with this patch the user can change their dot viewer like so:

(gdb) show gcc-dot-cmd
The current value of 'gcc-dot-cmd' is "dot -Tx11".
(gdb) set gcc-dot-cmd xdot
(gdb) dot-fn # opens in xdot

The second patch in this series adds a hook which users can define in
their .gdbinit in order to be called when the GCC extensions have
finished loading, thus allowing users to automatically configure
gcc-dot-cmd as desired in their .gdbinit.

gcc/ChangeLog:

* gdbhooks.py (GCCDotCmd): New.
(gcc_dot_cmd): New. Use it ...
(DotFn.invoke): ... here.

Diff:
---
 gcc/gdbhooks.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 92e38880a70a..db8ce0d071bb 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -783,6 +783,18 @@ class DumpFn(gdb.Command):
 
 DumpFn()
 
+class GCCDotCmd(gdb.Parameter):
+"""
+This parameter controls the command used to render dot files within
+GCC's dot-fn command.  It will be invoked as gcc-dot-cmd .
+"""
+def __init__(self):
+super(GCCDotCmd, self).__init__('gcc-dot-cmd',
+gdb.COMMAND_NONE, gdb.PARAM_STRING)
+self.value = "dot -Tx11"
+
+gcc_dot_cmd = GCCDotCmd()
+
 class DotFn(gdb.Command):
 """
 A custom command to show a gimple/rtl function control flow graph.
@@ -848,7 +860,8 @@ class DotFn(gdb.Command):
 return
 
 # Show graph in temp file
-os.system("( dot -Tx11 \"%s\"; rm \"%s\" ) &" % (filename, filename))
+dot_cmd = gcc_dot_cmd.value
+os.system("( %s \"%s\"; rm \"%s\" ) &" % (dot_cmd, filename, filename))
 
 DotFn()


[gcc r15-2715] gdbhooks: Add attempt to invoke on-gcc-hooks-load

2024-08-05 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:f01df5e47b2551e0f435a9efa8e0a30142f3d46b

commit r15-2715-gf01df5e47b2551e0f435a9efa8e0a30142f3d46b
Author: Alex Coplan 
Date:   Mon Aug 5 08:45:58 2024 +0100

gdbhooks: Add attempt to invoke on-gcc-hooks-load

This extends GCC's GDB hooks to attempt invoking the user-defined
command "on-gcc-hooks-load".  The idea is that users can define the
command in their .gdbinit to override the default values of parameters
defined by GCC's GDB extensions.

For example, together with the previous patch, I can add the following
fragment to my .gdbinit:

define on-gcc-hooks-load
  set gcc-dot-cmd xdot
end

which means, once the GCC extensions get loaded, whenever I invoke
dot-fn then the graph will be rendered using xdot.

The try/except should make this patch a no-op for users that don't
currently define this command.  I looked for a way to test explicitly
for whether a GDB command exists but didn't find one.

This is needed because the user's .gdbinit is sourced before GCC's GDB
extensions are loaded, and GCC-specific parameters can't be configured
before they are defined.

gcc/ChangeLog:

* gdbhooks.py: Add attempted call to "on-gcc-hooks-load" once
we've finished loading the hooks.

Diff:
---
 gcc/gdbhooks.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index db8ce0d071bb..7a64c03b8acb 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -865,4 +865,12 @@ class DotFn(gdb.Command):
 
 DotFn()
 
+# Try and invoke the user-defined command "on-gcc-hooks-load".  Doing
+# this allows users to customize the GCC extensions once they've been
+# loaded by defining the hook in their .gdbinit.
+try:
+gdb.execute('on-gcc-hooks-load')
+except gdb.error:
+pass
+
 print('Successfully loaded GDB hooks for GCC')


[gcc r15-4106] testsuite: Prevent unrolling of main in LTO test [PR116683]

2024-10-07 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:7faadb1f261c6b8ef988c400c39ec7df09839dbe

commit r15-4106-g7faadb1f261c6b8ef988c400c39ec7df09839dbe
Author: Alex Coplan 
Date:   Thu Sep 26 16:36:48 2024 +0100

testsuite: Prevent unrolling of main in LTO test [PR116683]

In r15-3585-g9759f6299d9633cabac540e5c893341c708093ac I added a test which
started failing on PowerPC.  The test checks that we unroll exactly one loop
three times with the following:

// { dg-final { scan-ltrans-rtl-dump-times "Unrolled loop 3 times" 1 
"loop2_unroll" } }

which passes on most targets.  However, on PowerPC, the loop in main
gets unrolled too, causing the scan-ltrans-rtl-dump-times check to fail
as the statement now appears twice in the dump.  I think the extra
unrolling is due to different unrolling heuristics in the rs6000 port.

This patch therefore explicitly tries to block the unrolling in main with an
appropriate #pragma.

gcc/testsuite/ChangeLog:

PR testsuite/116683
* g++.dg/ext/pragma-unroll-lambda-lto.C (main): Add #pragma to
prevent unrolling of the setup loop.

Diff:
---
 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C 
b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
index ddf11730e338..0db57c8d3a01 100644
--- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
@@ -25,6 +25,7 @@ short *use_find(short *p)
 int main(void)
 {
   short a[1024];
+#pragma GCC unroll 0
   for (int i = 0; i < 1024; i++)
 a[i] = rand ();


[gcc r15-4518] pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-10-21 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:c0e54ce1999ccf2241f74c5188b11b92e5aedc1f

commit r15-4518-gc0e54ce1999ccf2241f74c5188b11b92e5aedc1f
Author: Alex Coplan 
Date:   Fri Sep 20 17:39:39 2024 +0100

pair-fusion: Assume alias conflict if common address reg changes [PR116783]

As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn.  This lead to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assume an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).

gcc/ChangeLog:

PR rtl-optimization/116783
* pair-fusion.cc (def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(pair_fusion_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.

Diff:
---
 gcc/pair-fusion.cc  | 127 +---
 gcc/testsuite/g++.dg/torture/pr116783.C |  98 
 2 files changed, 213 insertions(+), 12 deletions(-)

diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 653055fdcf67..ccbb5511e9d1 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -2089,11 +2089,80 @@ protected:
 
   def_iter_t def_iter;
   insn_info *limit;
-  def_walker (def_info *def, insn_info *limit) :
-def_iter (def), limit (limit) {}
+
+  // Array of register uses from the candidate insn which occur in MEMs.
+  use_array cand_addr_uses;
+
+  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
+def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
 
   virtual bool iter_valid () const { return *def_iter; }
 
+  // Implemented in {load,store}_walker.
+  virtual bool alias_conflict_p (int &budget) const = 0;
+
+  // Return true if the current (walking) INSN () uses a register R inside a
+  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
+  // those uses see different definitions of that register.  In this case we
+  // can't rely on RTL alias analysis, and for now we conservatively assume 
that
+  // there is an alias conflict.  See PR116783.
+  bool addr_reg_conflict_p () const
+  {
+use_array curr_insn_uses = insn ()->uses ();
+auto cand_use_iter = cand_addr_uses.begin ();
+auto insn_use_iter = curr_insn_uses.begin ();
+while (cand_use_iter != cand_addr_uses.end ()
+  && insn_use_iter != curr_insn_uses.end ())
+  {
+   auto insn_use = *insn_use_iter;
+   auto cand_use = *cand_use_iter;
+   if (insn_use->regno () > cand_use->regno ())
+ cand_use_iter++;
+   else if (insn_use->regno () < cand_use->regno ())
+ insn_use_iter++;
+   else
+ {
+   // As it stands I believe the alias code (memory_modified_in_insn_p)
+   // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
+   // be safe to skip over uses that only occur in notes.
+   if (insn_use->includes_address_uses ()
+   && !insn_use->only_occurs_in_notes ()
+   && insn_use->def () != cand_use->def ())
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"assuming aliasing of cand i%d and i%d:\n"
+"-> insns see different defs of common addr reg 
r%u\n"
+"-> ",
+cand_use->insn ()->uid (), insn_use->insn ()->uid 
(),
+insn_use->regno ());
+
+   // Note that while the following sequence could be made more
+   // concise by eliding pp_string calls into the pp_printf
+   // calls, doing so triggers -Wformat-diag.
+   pretty_printer pp;
+   pp_string (&pp, "[");
+   pp_access (&pp, cand_use, 0);
+   pp_st

[gcc r15-4465] MAINTAINERS: Add myself as pair fusion and aarch64 ldp/stp maintainer

2024-10-18 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:261d803c40c9fd28c59d8d1771051663f738a871

commit r15-4465-g261d803c40c9fd28c59d8d1771051663f738a871
Author: Alex Coplan 
Date:   Fri Oct 18 11:02:15 2024 +0100

MAINTAINERS: Add myself as pair fusion and aarch64 ldp/stp maintainer

ChangeLog:

* MAINTAINERS (CPU Port Maintainers): Add myself as aarch64 ldp/stp
maintainer.
(Various Maintainers): Add myself as pair fusion maintainer.

Diff:
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 269ac2ea6b49..1074886f4419 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -54,6 +54,7 @@ docs, and the testsuite related to that.
 
 CPU Port Maintainers(CPU alphabetical order)
 
+aarch64 ldp/stp Alex Coplan 
 aarch64 portRichard Earnshaw
 aarch64 portRichard Sandiford   
 aarch64 portMarcus Shawcroft
@@ -251,6 +252,7 @@ AutoFDO Eugene Rozenfeld

 reload  Ulrich Weigand  
 RTL optimizers  Eric Botcazou   
 instruction combinerSegher Boessenkool  
+pair fusion Alex Coplan 
 auto-vectorizer Richard Biener  
 auto-vectorizer Zdenek Dvorak   
 loop infrastructure Zdenek Dvorak   


[gcc r14-10853] aarch64: Assume alias conflict if common address reg changes [PR116783]

2024-10-30 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:434483ac32a08d1f3608c26fe2da302f0e09d6a2

commit r14-10853-g434483ac32a08d1f3608c26fe2da302f0e09d6a2
Author: Alex Coplan 
Date:   Wed Oct 30 13:46:12 2024 +

aarch64: Assume alias conflict if common address reg changes [PR116783]

As the PR shows, pair fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn.  This lead to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assume an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).

This is a backport (but not a straight cherry pick) of
r15-4518-gc0e54ce1999ccf2241f74c5188b11b92e5aedc1f.

gcc/ChangeLog:

PR rtl-optimization/116783
* config/aarch64/aarch64-ldp-fusion.cc
(def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(ldp_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.

Diff:
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 127 ---
 gcc/testsuite/g++.dg/torture/pr116783.C  |  98 
 2 files changed, 213 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 1fc25e389cfe..f32d30d54c5c 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -2173,11 +2173,80 @@ protected:
 
   def_iter_t def_iter;
   insn_info *limit;
-  def_walker (def_info *def, insn_info *limit) :
-def_iter (def), limit (limit) {}
+
+  // Array of register uses from the candidate insn which occur in MEMs.
+  use_array cand_addr_uses;
+
+  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
+def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
 
   virtual bool iter_valid () const { return *def_iter; }
 
+  // Implemented in {load,store}_walker.
+  virtual bool alias_conflict_p (int &budget) const = 0;
+
+  // Return true if the current (walking) INSN () uses a register R inside a
+  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
+  // those uses see different definitions of that register.  In this case we
+  // can't rely on RTL alias analysis, and for now we conservatively assume 
that
+  // there is an alias conflict.  See PR116783.
+  bool addr_reg_conflict_p () const
+  {
+use_array curr_insn_uses = insn ()->uses ();
+auto cand_use_iter = cand_addr_uses.begin ();
+auto insn_use_iter = curr_insn_uses.begin ();
+while (cand_use_iter != cand_addr_uses.end ()
+  && insn_use_iter != curr_insn_uses.end ())
+  {
+   auto insn_use = *insn_use_iter;
+   auto cand_use = *cand_use_iter;
+   if (insn_use->regno () > cand_use->regno ())
+ cand_use_iter++;
+   else if (insn_use->regno () < cand_use->regno ())
+ insn_use_iter++;
+   else
+ {
+   // As it stands I believe the alias code (memory_modified_in_insn_p)
+   // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
+   // be safe to skip over uses that only occur in notes.
+   if (insn_use->includes_address_uses ()
+   && !insn_use->only_occurs_in_notes ()
+   && insn_use->def () != cand_use->def ())
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"assuming aliasing of cand i%d and i%d:\n"
+"-> insns see different defs of common addr reg 
r%u\n"
+"-> ",
+cand_use->insn ()->uid (), insn_use->insn ()->uid 
(),
+insn_use->regno ());
+
+   // Note that while the following sequence could be made more
+   // concise by eliding pp_string calls into 

[gcc r15-5697] gdbhooks: Handle references to vec* in VecPrinter

2024-11-26 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:098a41cb972d3711ebcb518a72a1addfdb6c70cf

commit r15-5697-g098a41cb972d3711ebcb518a72a1addfdb6c70cf
Author: Alex Coplan 
Date:   Tue Nov 26 15:10:29 2024 +

gdbhooks: Handle references to vec* in VecPrinter

vec.h has this method:

  template
  inline T *
  vec_safe_push (vec *&v, const T &obj CXX_MEM_STAT_INFO)

where v is a reference to a pointer to vec.  This matches the regex for
VecPrinter, so gdbhooks.py attempts to print it but chokes on the reference.
I see the following:

  #1  0x02b84b7b in vec_safe_push (v=Traceback 
(most
  recent call last):
File "$SRC/gcc/gcc/gdbhooks.py", line 486, in to_string
  return '0x%x' % intptr(self.gdbval)
File "$SRC/gcc/gcc/gdbhooks.py", line 168, in intptr
  return long(gdbval) if sys.version_info.major == 2 else int(gdbval)
  gdb.error: Cannot convert value to long.

This patch makes VecPrinter handle such references by stripping them
(dereferencing) at the top of the relevant functions.

gcc/ChangeLog:

* gdbhooks.py (strip_ref): New. Use it ...
(VecPrinter.to_string): ... here,
(VecPrinter.children): ... and here.

Diff:
---
 gcc/gdbhooks.py | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 904ee28423a9..a91e5fd2a834 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -472,6 +472,11 @@ def get_vec_kind(val):
 else:
 assert False, f"unexpected vec kind {kind}"
 
+def strip_ref(gdbval):
+if gdbval.type.code == gdb.TYPE_CODE_REF:
+return gdbval.referenced_value ()
+return gdbval
+
 class VecPrinter:
 #-ex "up" -ex "p bb->preds"
 def __init__(self, gdbval):
@@ -483,10 +488,10 @@ class VecPrinter:
 def to_string (self):
 # A trivial implementation; prettyprinting the contents is done
 # by gdb calling the "children" method below.
-return '0x%x' % intptr(self.gdbval)
+return '0x%x' % intptr(strip_ref(self.gdbval))
 
 def children (self):
-val = self.gdbval
+val = strip_ref(self.gdbval)
 if intptr(val) != 0 and get_vec_kind(val) == VEC_KIND_PTR:
 val = val['m_vec']


[gcc r15-7491] vect: Set counts of early break exit blocks correctly [PR117790]

2025-02-12 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:cfdb961588ba318a78e995d2e2cde43130acd993

commit r15-7491-gcfdb961588ba318a78e995d2e2cde43130acd993
Author: Alex Coplan 
Date:   Tue Nov 26 17:48:14 2024 +

vect: Set counts of early break exit blocks correctly [PR117790]

This adds missing code to correctly set the counts of the exit blocks we
create when building the CFG for a vectorized early break loop.

gcc/ChangeLog:

PR tree-optimization/117790
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Set profile counts for {main,alt}_loop_exit_block.

Diff:
---
 gcc/tree-vect-loop-manip.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 06ca99eaab95..56a4e9a8b63f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1686,6 +1686,16 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
 
  set_immediate_dominator (CDI_DOMINATORS, new_preheader,
   loop->header);
+
+ /* Fix up the profile counts of the new exit blocks.
+main_loop_exit_block was created by duplicating the
+preheader, so needs its count scaling according to the main
+exit edge's probability.  The remaining count from the
+preheader goes to the alt_loop_exit_block, since all
+alternative exits have been redirected there.  */
+ main_loop_exit_block->count = loop_exit->count ();
+ alt_loop_exit_block->count
+   = preheader->count - main_loop_exit_block->count;
}
 
   /* Adjust the epilog loop PHI entry values to continue iteration.


[gcc r14-11454] df: Treat partial defs as uses in df_simulate_defs [PR116564]

2025-03-26 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:23108ab2bdea84beeddf0b8d3b10f4251d3e1686

commit r14-11454-g23108ab2bdea84beeddf0b8d3b10f4251d3e1686
Author: Alex Coplan 
Date:   Mon Mar 10 16:44:15 2025 +

df: Treat partial defs as uses in df_simulate_defs [PR116564]

The PR shows us spinning in dce.cc:fast_dce at the start of combine.
This spinning appears to be because of a disagreement between the fast_dce 
code
and the code in df-problems.cc:df_lr_bb_local_compute.  Specifically, they
disagree on the treatment of partial defs.  For the testcase in the PR, we 
have
the following insn in bb 3:

(insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1
 (nil))

which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e.
DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG.

Eliding the large block comment for readability, the code in
df_lr_bb_local_compute does the following (for each insn):

  FOR_EACH_INSN_INFO_DEF (def, insn_info)
{
  unsigned int dregno = DF_REF_REGNO (def);
  bitmap_set_bit (&bb_info->def, dregno);
  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
bitmap_set_bit (&bb_info->use, dregno);
  else
bitmap_clear_bit (&bb_info->use, dregno);
}

i.e. it models partial defs as a RMW operation; thus for the def arising
from i10 above, it records a use of r104; hence it ends up in the
live-in set for bb 3.

However, as it stands, the code in dce.cc:fast_dce (and its callee
dce_process_block) has no such provision for DF_REF_PARTIAL defs.  It
does not treat these as a RMW and does not compute r104 above as being
live-in to bb 3.  At the end of dce_process_block we compute the
following "did something happen" condition used to decide termination of
the analysis:

  block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb));
  if (block_changed)
bitmap_copy (DF_LR_IN (bb), local_live);

  BITMAP_FREE (local_live);
  return block_changed;

because of the disagreement between df_lr_local_compute and the local
analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not
in local_live.  Hence we always return true here, call
df_analyze_problem (which re-computes DF_LR_IN according to
df_lr_bb_local_compute, re-adding r104), and so the analysis never
terminates.

This patch therefore adjusts df_simulate_defs (called from
dce_process_block) to match the behaviour of df_lr_bb_local_compute in
this respect, namely we make it model partial defs as RMW operations by
setting the relevant register live.  This fixes the spinning in fast_dce
for this testcase.

gcc/ChangeLog:

PR rtl-optimization/116564
* df-problems.cc (df_simulate_defs): For partial defs, mark the
register live (treat it as a RMW operation).

gcc/testsuite/ChangeLog:

PR rtl-optimization/116564
* gcc.target/aarch64/torture/pr116564.c: New test.

(cherry picked from commit 758e617bcf224dc9d4a7e26dd858d43c1e63b916)

Diff:
---
 gcc/df-problems.cc  |  8 +---
 gcc/testsuite/gcc.target/aarch64/torture/pr116564.c | 11 +++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index 88ee0dd67fc3..6b956fa26d16 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -3855,9 +3855,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live)
 {
   unsigned int dregno = DF_REF_REGNO (def);
 
-  /* If the def is to only part of the reg, it does
-not kill the other defs that reach here.  */
-  if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+  /* If the def is to only part of the reg, model it as a RMW operation
+by marking it live.  It only kills the reg if it is a complete def.  */
+  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+   bitmap_set_bit (live, dregno);
+  else
bitmap_clear_bit (live, dregno);
 }
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c 
b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
new file mode 100644
index ..d471e097294c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+#include 
+void test()
+{
+  for (int L = 0; L < 4; ++L) {
+float64_t ResData[1 * 2];
+float64x1x2_t Src1;
+vst2_f64(ResData, Src1);
+  }
+}


[gcc r13-9517] df: Treat partial defs as uses in df_simulate_defs [PR116564]

2025-04-11 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:7c6bd95b1811d797f43983dbbeb4ad478bdb5c9e

commit r13-9517-g7c6bd95b1811d797f43983dbbeb4ad478bdb5c9e
Author: Alex Coplan 
Date:   Mon Mar 10 16:44:15 2025 +

df: Treat partial defs as uses in df_simulate_defs [PR116564]

The PR shows us spinning in dce.cc:fast_dce at the start of combine.
This spinning appears to be because of a disagreement between the fast_dce 
code
and the code in df-problems.cc:df_lr_bb_local_compute.  Specifically, they
disagree on the treatment of partial defs.  For the testcase in the PR, we 
have
the following insn in bb 3:

(insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1
 (nil))

which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e.
DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG.

Eliding the large block comment for readability, the code in
df_lr_bb_local_compute does the following (for each insn):

  FOR_EACH_INSN_INFO_DEF (def, insn_info)
{
  unsigned int dregno = DF_REF_REGNO (def);
  bitmap_set_bit (&bb_info->def, dregno);
  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
bitmap_set_bit (&bb_info->use, dregno);
  else
bitmap_clear_bit (&bb_info->use, dregno);
}

i.e. it models partial defs as a RMW operation; thus for the def arising
from i10 above, it records a use of r104; hence it ends up in the
live-in set for bb 3.

However, as it stands, the code in dce.cc:fast_dce (and its callee
dce_process_block) has no such provision for DF_REF_PARTIAL defs.  It
does not treat these as a RMW and does not compute r104 above as being
live-in to bb 3.  At the end of dce_process_block we compute the
following "did something happen" condition used to decide termination of
the analysis:

  block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb));
  if (block_changed)
bitmap_copy (DF_LR_IN (bb), local_live);

  BITMAP_FREE (local_live);
  return block_changed;

because of the disagreement between df_lr_local_compute and the local
analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not
in local_live.  Hence we always return true here, call
df_analyze_problem (which re-computes DF_LR_IN according to
df_lr_bb_local_compute, re-adding r104), and so the analysis never
terminates.

This patch therefore adjusts df_simulate_defs (called from
dce_process_block) to match the behaviour of df_lr_bb_local_compute in
this respect, namely we make it model partial defs as RMW operations by
setting the relevant register live.  This fixes the spinning in fast_dce
for this testcase.

gcc/ChangeLog:

PR rtl-optimization/116564
* df-problems.cc (df_simulate_defs): For partial defs, mark the
register live (treat it as a RMW operation).

gcc/testsuite/ChangeLog:

PR rtl-optimization/116564
* gcc.target/aarch64/torture/pr116564.c: New test.

(cherry picked from commit 758e617bcf224dc9d4a7e26dd858d43c1e63b916)

Diff:
---
 gcc/df-problems.cc  |  8 +---
 gcc/testsuite/gcc.target/aarch64/torture/pr116564.c | 11 +++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index d2cfaf7f50f6..334ecc585646 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -3855,9 +3855,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live)
 {
   unsigned int dregno = DF_REF_REGNO (def);
 
-  /* If the def is to only part of the reg, it does
-not kill the other defs that reach here.  */
-  if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+  /* If the def is to only part of the reg, model it as a RMW operation
+by marking it live.  It only kills the reg if it is a complete def.  */
+  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+   bitmap_set_bit (live, dregno);
+  else
bitmap_clear_bit (live, dregno);
 }
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c 
b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
new file mode 100644
index ..d471e097294c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+#include 
+void test()
+{
+  for (int L = 0; L < 4; ++L) {
+float64_t ResData[1 * 2];
+float64x1x2_t Src1;
+vst2_f64(ResData, Src1);
+  }
+}


[gcc r12-11082] df: Treat partial defs as uses in df_simulate_defs [PR116564]

2025-05-01 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:91be90b5ae9ee345bd011afdf02df7003d7598d3

commit r12-11082-g91be90b5ae9ee345bd011afdf02df7003d7598d3
Author: Alex Coplan 
Date:   Mon Mar 10 16:44:15 2025 +

df: Treat partial defs as uses in df_simulate_defs [PR116564]

The PR shows us spinning in dce.cc:fast_dce at the start of combine.
This spinning appears to be because of a disagreement between the fast_dce 
code
and the code in df-problems.cc:df_lr_bb_local_compute.  Specifically, they
disagree on the treatment of partial defs.  For the testcase in the PR, we 
have
the following insn in bb 3:

(insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1
 (nil))

which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e.
DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG.

Eliding the large block comment for readability, the code in
df_lr_bb_local_compute does the following (for each insn):

  FOR_EACH_INSN_INFO_DEF (def, insn_info)
{
  unsigned int dregno = DF_REF_REGNO (def);
  bitmap_set_bit (&bb_info->def, dregno);
  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
bitmap_set_bit (&bb_info->use, dregno);
  else
bitmap_clear_bit (&bb_info->use, dregno);
}

i.e. it models partial defs as a RMW operation; thus for the def arising
from i10 above, it records a use of r104; hence it ends up in the
live-in set for bb 3.

However, as it stands, the code in dce.cc:fast_dce (and its callee
dce_process_block) has no such provision for DF_REF_PARTIAL defs.  It
does not treat these as a RMW and does not compute r104 above as being
live-in to bb 3.  At the end of dce_process_block we compute the
following "did something happen" condition used to decide termination of
the analysis:

  block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb));
  if (block_changed)
bitmap_copy (DF_LR_IN (bb), local_live);

  BITMAP_FREE (local_live);
  return block_changed;

because of the disagreement between df_lr_local_compute and the local
analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not
in local_live.  Hence we always return true here, call
df_analyze_problem (which re-computes DF_LR_IN according to
df_lr_bb_local_compute, re-adding r104), and so the analysis never
terminates.

This patch therefore adjusts df_simulate_defs (called from
dce_process_block) to match the behaviour of df_lr_bb_local_compute in
this respect, namely we make it model partial defs as RMW operations by
setting the relevant register live.  This fixes the spinning in fast_dce
for this testcase.

gcc/ChangeLog:

PR rtl-optimization/116564
* df-problems.cc (df_simulate_defs): For partial defs, mark the
register live (treat it as a RMW operation).

gcc/testsuite/ChangeLog:

PR rtl-optimization/116564
* gcc.target/aarch64/torture/pr116564.c: New test.

(cherry picked from commit 758e617bcf224dc9d4a7e26dd858d43c1e63b916)

Diff:
---
 gcc/df-problems.cc  |  8 +---
 gcc/testsuite/gcc.target/aarch64/torture/pr116564.c | 11 +++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index 238424ceac8a..fe027593df98 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -3855,9 +3855,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live)
 {
   unsigned int dregno = DF_REF_REGNO (def);
 
-  /* If the def is to only part of the reg, it does
-not kill the other defs that reach here.  */
-  if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+  /* If the def is to only part of the reg, model it as a RMW operation
+by marking it live.  It only kills the reg if it is a complete def.  */
+  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+   bitmap_set_bit (live, dregno);
+  else
bitmap_clear_bit (live, dregno);
 }
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c 
b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
new file mode 100644
index ..d471e097294c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+#include 
+void test()
+{
+  for (int L = 0; L < 4; ++L) {
+float64_t ResData[1 * 2];
+float64x1x2_t Src1;
+vst2_f64(ResData, Src1);
+  }
+}


[gcc r15-7604] pair-fusion: Tweak wording in dump message [PR118320]

2025-02-18 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:facdce9028060c8dc4340a38ae02fb071e16af08

commit r15-7604-gfacdce9028060c8dc4340a38ae02fb071e16af08
Author: Alex Coplan 
Date:   Tue Feb 18 10:48:50 2025 +

pair-fusion: Tweak wording in dump message [PR118320]

As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675978.html
this tweaks the dump messasge added with the fix for PR118320 since it 
doesn't
just apply to load pairs.

gcc/ChangeLog:

PR rtl-optimization/118320
* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Tweak wording in 
dump
message when punting on invalid use arrays.

Diff:
---
 gcc/pair-fusion.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 5708d0f3b671..72e642465340 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -1742,7 +1742,7 @@ pair_fusion_bb_info::fuse_pair (bool load_p,
 {
   if (dump_file)
fprintf (dump_file,
-"  load pair: i%d and i%d use different definiitions of"
+"  rejecting pair: i%d and i%d use different definiitions of"
 " the same register\n",
 insns[0]->uid (), insns[1]->uid ());
   return false;


[gcc r15-7854] pair-fusion: Add singleton move_range asserts [PR114492]

2025-03-06 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:d6d7da92fb598c50d25332011bffe6b6515c9669

commit r15-7854-gd6d7da92fb598c50d25332011bffe6b6515c9669
Author: Alex Coplan 
Date:   Wed Mar 5 15:45:09 2025 +

pair-fusion: Add singleton move_range asserts [PR114492]

The PR claims that pair-fusion has invalid uses of gcc_assert (such that
the pass will misbehave with --disable-checking).  As noted in the
comments, in the case of the calls to restrict_movement, the only way we
can possibly depend on the side effects is if we call it with a
non-singleton move range.  However, the intent is that we always have a
singleton move range here, and thus we do not rely on the side effects.

This patch therefore adds asserts to check for a singleton move range
before calling restrict_movement, thus clarifying the intent and
hopefully dispelling any concerns that having the calls wrapped in
asserts is problematic here.

gcc/ChangeLog:

PR rtl-optimization/114492
* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Check for 
singleton
move range before calling restrict_movement.
(pair_fusion::try_promote_writeback): Likewise.

Diff:
---
 gcc/pair-fusion.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 72e642465340..a4ef72879675 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -1994,7 +1994,8 @@ pair_fusion_bb_info::fuse_pair (bool load_p,
 
   auto ignore = ignore_changing_insns (changes);
   for (unsigned i = 0; i < changes.length (); i++)
-gcc_assert (rtl_ssa::restrict_movement (*changes[i], ignore));
+gcc_assert (changes[i]->move_range.singleton ()
+   && rtl_ssa::restrict_movement (*changes[i], ignore));
 
   // Check the pair pattern is recog'd.
   if (!rtl_ssa::recog (attempt, *pair_change, ignore))
@@ -3084,7 +3085,8 @@ pair_fusion::try_promote_writeback (insn_info *insn, bool 
load_p)
 
   auto ignore = ignore_changing_insns (changes);
   for (unsigned i = 0; i < ARRAY_SIZE (changes); i++)
-gcc_assert (rtl_ssa::restrict_movement (*changes[i], ignore));
+gcc_assert (changes[i]->move_range.singleton ()
+   && rtl_ssa::restrict_movement (*changes[i], ignore));
 
   if (!rtl_ssa::recog (attempt, pair_change, ignore))
 {


[gcc r15-8000] df: Treat partial defs as uses in df_simulate_defs [PR116564]

2025-03-12 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:758e617bcf224dc9d4a7e26dd858d43c1e63b916

commit r15-8000-g758e617bcf224dc9d4a7e26dd858d43c1e63b916
Author: Alex Coplan 
Date:   Mon Mar 10 16:44:15 2025 +

df: Treat partial defs as uses in df_simulate_defs [PR116564]

The PR shows us spinning in dce.cc:fast_dce at the start of combine.
This spinning appears to be because of a disagreement between the fast_dce 
code
and the code in df-problems.cc:df_lr_bb_local_compute.  Specifically, they
disagree on the treatment of partial defs.  For the testcase in the PR, we 
have
the following insn in bb 3:

(insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1
 (nil))

which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e.
DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG.

Eliding the large block comment for readability, the code in
df_lr_bb_local_compute does the following (for each insn):

  FOR_EACH_INSN_INFO_DEF (def, insn_info)
{
  unsigned int dregno = DF_REF_REGNO (def);
  bitmap_set_bit (&bb_info->def, dregno);
  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
bitmap_set_bit (&bb_info->use, dregno);
  else
bitmap_clear_bit (&bb_info->use, dregno);
}

i.e. it models partial defs as a RMW operation; thus for the def arising
from i10 above, it records a use of r104; hence it ends up in the
live-in set for bb 3.

However, as it stands, the code in dce.cc:fast_dce (and its callee
dce_process_block) has no such provision for DF_REF_PARTIAL defs.  It
does not treat these as a RMW and does not compute r104 above as being
live-in to bb 3.  At the end of dce_process_block we compute the
following "did something happen" condition used to decide termination of
the analysis:

  block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb));
  if (block_changed)
bitmap_copy (DF_LR_IN (bb), local_live);

  BITMAP_FREE (local_live);
  return block_changed;

because of the disagreement between df_lr_local_compute and the local
analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not
in local_live.  Hence we always return true here, call
df_analyze_problem (which re-computes DF_LR_IN according to
df_lr_bb_local_compute, re-adding r104), and so the analysis never
terminates.

This patch therefore adjusts df_simulate_defs (called from
dce_process_block) to match the behaviour of df_lr_bb_local_compute in
this respect, namely we make it model partial defs as RMW operations by
setting the relevant register live.  This fixes the spinning in fast_dce
for this testcase.

gcc/ChangeLog:

PR rtl-optimization/116564
* df-problems.cc (df_simulate_defs): For partial defs, mark the
register live (treat it as a RMW operation).

gcc/testsuite/ChangeLog:

PR rtl-optimization/116564
* gcc.target/aarch64/torture/pr116564.c: New test.

Diff:
---
 gcc/df-problems.cc  |  8 +---
 gcc/testsuite/gcc.target/aarch64/torture/pr116564.c | 11 +++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index f32185b3eac3..907537930982 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -3893,9 +3893,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live)
 {
   unsigned int dregno = DF_REF_REGNO (def);
 
-  /* If the def is to only part of the reg, it does
-not kill the other defs that reach here.  */
-  if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+  /* If the def is to only part of the reg, model it as a RMW operation
+by marking it live.  It only kills the reg if it is a complete def.  */
+  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+   bitmap_set_bit (live, dregno);
+  else
bitmap_clear_bit (live, dregno);
 }
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c 
b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
new file mode 100644
index ..d471e097294c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+#include 
+void test()
+{
+  for (int L = 0; L < 4; ++L) {
+float64_t ResData[1 * 2];
+float64x1x2_t Src1;
+vst2_f64(ResData, Src1);
+  }
+}


[gcc r14-11386] aarch64: Check for invalid use arrays in ldp_fusion [PR118320]

2025-03-05 Thread Alex Coplan via Gcc-cvs
https://gcc.gnu.org/g:1a4f1e5f91a69434f2ec0d95c91bad2c928dcff3

commit r14-11386-g1a4f1e5f91a69434f2ec0d95c91bad2c928dcff3
Author: Richard Sandiford 
Date:   Wed Mar 5 12:18:20 2025 +

aarch64: Check for invalid use arrays in ldp_fusion [PR118320]

As Andrew says in the bugzilla comments, this PR is about a case where
we tried to fuse two stores of x0, one in which x0 was defined and one
in which it was undefined.  merge_access_arrays failed on the conflict,
but the failure wasn't caught.

Normally the hazard detection code would fail if the instructions
had incompatible uses.  However, an undefined use doesn't impose
many restrictions on movements.  I think this is likely to be the
only case where hazard detection isn't enough.

As Andrew notes in bugzilla, it might be possible to allow uses
of defined and undefined values to be merged to the defined value.
But that sounds dangerous in the general case, as an rtl-ssa-level
decision.  We might run the risk of turning conditional UB into
unconditional UB.  And LLVM proves that the definition of "undef"
isn't simple.

This is a backport of
r15-7282-gf559ac896942ffe0e2315d0a4d8b8b517a16d607.

gcc/
PR rtl-optimization/118320
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Commonize the merge of input_uses and return early if it
fails.

gcc/testsuite/
PR rtl-optimization/118320
* g++.dg/torture/pr118320.C: New test.

Co-authored-by: Alex Coplan 

Diff:
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 31 +--
 gcc/testsuite/g++.dg/torture/pr118320.C  | 15 +++
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index f32d30d54c5c..1791ca5563c5 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1784,6 +1784,24 @@ ldp_bb_info::fuse_pair (bool load_p,
   input_uses[i] = remove_note_accesses (attempt, input_uses[i]);
 }
 
+  // Get the uses that the pair instruction would have, and punt if
+  // the unpaired instructions use different definitions of the same
+  // register.  That would normally be caught as a side-effect of
+  // hazard detection below, but this check also deals with cases
+  // in which one use is undefined and the other isn't.
+  auto new_uses = merge_access_arrays (attempt,
+  drop_memory_access (input_uses[0]),
+  drop_memory_access (input_uses[1]));
+  if (!new_uses.is_valid ())
+{
+  if (dump_file)
+   fprintf (dump_file,
+"  rejecting pair: i%d and i%d use different definiitions of"
+" the same register\n",
+insns[0]->uid (), insns[1]->uid ());
+  return false;
+}
+
   // Edge case: if the first insn is a writeback load and the
   // second insn is a non-writeback load which transfers into the base
   // register, then we should drop the writeback altogether as the
@@ -1911,10 +1929,7 @@ ldp_bb_info::fuse_pair (bool load_p,
   input_defs[1]);
   gcc_assert (pair_change->new_defs.is_valid ());
 
-  pair_change->new_uses
-   = merge_access_arrays (attempt,
-  drop_memory_access (input_uses[0]),
-  drop_memory_access (input_uses[1]));
+  pair_change->new_uses = new_uses;
   gcc_assert (pair_change->new_uses.is_valid ());
   set_pair_pat (pair_change);
 }
@@ -1936,9 +1951,7 @@ ldp_bb_info::fuse_pair (bool load_p,
case Action::CHANGE:
{
  set_pair_pat (change);
- change->new_uses = merge_access_arrays (attempt,
- input_uses[0],
- input_uses[1]);
+ change->new_uses = new_uses;
  auto d1 = drop_memory_access (input_defs[0]);
  auto d2 = drop_memory_access (input_defs[1]);
  change->new_defs = merge_access_arrays (attempt, d1, d2);
@@ -1970,9 +1983,7 @@ ldp_bb_info::fuse_pair (bool load_p,
  auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
  change = make_change (new_insn);
  change->move_range = move_range;
- change->new_uses = merge_access_arrays (attempt,
- input_uses[0],
- input_uses[1]);
+ change->new_uses = new_uses;
  gcc_assert (change->new_uses.is_valid ());
 
  auto d1 = drop_memory_access (input_defs[0]);
diff --git a/gcc/testsuite/g++.dg/torture/pr118320.C 
b/gcc/testsuite/g++.dg/tor