[PATCH] A new copy propagation and PHI elimination pass

2023-10-20 Thread Filip Kastl
Hi,

this is a patch that I submitted two months ago as an RFC ([RFC] gimple ssa:
SCCP - A new PHI optimization pass). I added some polish since.

It is a new lightweight pass that removes redundant PHI functions and as a
bonus does basic copy propagation. With Jan Hubička we measured that it is able
to remove usually more than 5% of all PHI functions when run among early passes
(sometimes even 13% or more). Those are mostly PHI functions that would be
later optimized away but with this pass it is possible to remove them early
enough so that they don't get streamed when runing LTO (and also potentially
inlined at multiple places). It is also able to remove some redundant PHIs
that otherwise would still be present during RTL expansion.

Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
with and without this patch. These are the sizes of .debug_info and
.debug_loclists

.debug_info without patch 181694311
.debug_infowith patch 181692320
+0.0011% change

.debug_loclists without patch 47934753
.debug_loclistswith patch 47934966
-0.0004% change

I wanted to use dwlocstat to compare debug coverages but didn't manage to get
the program working on my machine sadly. Hope this suffices. Seems to me that
my patch doesn't have a significant impact on debug info.

Bootstraped and tested* on x86_64-pc-linux-gnu.

* One testcase (pr79691.c) did regress. However that is because the test is
dependent on a certain variable not being copy propagated. I will go into more
detail about this in a reply to this mail.

Ok to commit?

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurences of _8, _9 and _10 by _1

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
  RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* tree-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

    * gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/Makefile.in |   1 +
 gcc/passes.def  |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 +++
 gcc/tree-pass.h |   1 +
 gcc/tree-ssa-sccopy.cc  | 867 
 5 files changed, 950 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
 create mode 100644 gcc/tree-ssa-sccopy.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..d381d8129dc 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1734,6 +1734,7 @@ OBJS = \
tree-ssa-pre.o \
tree-ssa-propagate.o \
tree-ssa-reassoc.o \
+   tree-ssa-sccopy.o \
tree-ssa-sccvn.o \
tree-ssa-scopedtables.o \
tree-ssa-sink.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 2bafd60bbfb..f81c3546b44 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_if_to_switch);
  NEXT_PASS (pass_convert_switch);
  NEXT_PASS (pass_cleanup_eh);
+ NEXT_PASS (pass_sccopy);
  NEXT_PASS (pass_profile);
  NEXT_PASS (pass_local_pure_const);
  NEXT_PASS (pass_modref);
@@ -367,6 +368,7 @@ along with GCC; see the file COPYING3.  If not see
 However, this also causes us to misdiagnose cases that should be
 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+  NEXT_PASS (pass_sccopy);
   NEXT_PASS (pass_tail_calls);
   /* Split critical edges before late uninit warning to reduce the
  number of false positives from it.  */
@@ -408,6 +410,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
+  NEXT_PASS (pass_sccopy);
   /* ???  We do want some kind of loop invariant motion, but we possibly
  need to adjust LIM to be more friendly towards preserving accurate
 debug information here.  */
diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
new file mode 100644
index 000..1e61a6b320e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sccopy-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
+/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
+
+int __GIMPLE (ssa, startwith ("sccopy"))
+main ()
+{
+  int a;
+  int y;

Re: [PATCH] A new copy propagation and PHI elimination pass

2023-10-20 Thread Filip Kastl
On Fri 2023-10-20 15:50:25, Filip Kastl wrote:
> Bootstraped and tested* on x86_64-pc-linux-gnu.
> 
> * One testcase (pr79691.c) did regress. However that is because the test is
> dependent on a certain variable not being copy propagated. I will go into more
> detail about this in a reply to this mail.

This testcase checks for the string '= 9' being present in the tree-optimized
gimple dump ({ dg-final { scan-tree-dump " = 9;" "optimized" } }). This is how
the relevant place in the dump looks like without my patch:

int f4 (int i)
{
  int _6; 

   [local count: 1073741824]:
  _6 = 9;
  return _6; 

}

Note that '= 9' is indeed present but there is an opportunity for copy
propagation. With my patch, the copy propagation happens:

int f4 (int i)
{
  int _6;

   [local count: 1073741824]:
  return 9;

}

Which means no '= 9' is present and therefore the test fails.

What should I do? I don't suppose that changing the testcase to search for just
'9' would be wise since the dump may contain other '9's. I could change it to
search for 'return 9'. That would make it dependent on some copy propagation
being run late enough. However it is currently dependent on *no* copy
propagation being run late in the compilation. Also, if the test would search
for 'return 9', it would search for the most optimized version of the function
f4.

Or maybe searching for '9;' would work.

Filip Kastl


Re: [PATCH] A new copy propagation and PHI elimination pass

2023-11-02 Thread Filip Kastl
Hi,

thanks for the guidance.  I'm going to post a new version of the patch with the
testcase modified so that it searches for 'return 9;' instead of '= 9;'.

Filip Kastl


On Fri 2023-10-27 13:55:37, Jeff Law wrote:
> 
> 
> On 10/20/23 07:52, Filip Kastl wrote:
> > On Fri 2023-10-20 15:50:25, Filip Kastl wrote:
> > > Bootstraped and tested* on x86_64-pc-linux-gnu.
> > > 
> > > * One testcase (pr79691.c) did regress. However that is because the test 
> > > is
> > > dependent on a certain variable not being copy propagated. I will go into 
> > > more
> > > detail about this in a reply to this mail.
> > 
> > This testcase checks for the string '= 9' being present in the 
> > tree-optimized
> > gimple dump ({ dg-final { scan-tree-dump " = 9;" "optimized" } }). This is 
> > how
> > the relevant place in the dump looks like without my patch:
> > 
> > int f4 (int i)
> > {
> >int _6;
> > 
> > [local count: 1073741824]:
> >_6 = 9;
> >return _6;
> > 
> > }
> > 
> > Note that '= 9' is indeed present but there is an opportunity for copy
> > propagation. With my patch, the copy propagation happens:
> > 
> > int f4 (int i)
> > {
> >int _6;
> > 
> > [local count: 1073741824]:
> >return 9;
> > 
> > }
> > 
> > Which means no '= 9' is present and therefore the test fails.
> > 
> > What should I do? I don't suppose that changing the testcase to search for 
> > just
> > '9' would be wise since the dump may contain other '9's. I could change it 
> > to
> > search for 'return 9'. That would make it dependent on some copy propagation
> > being run late enough. However it is currently dependent on *no* copy
> > propagation being run late in the compilation. Also, if the test would 
> > search
> > for 'return 9', it would search for the most optimized version of the 
> > function
> > f4.
> > 
> > Or maybe searching for '9;' would work.
> So in general you have to go back and try to assess the original intent of
> the test.  Once you have the original intent, the path forward is often
> clear.
> 
> In this specific case the source is:
> +/* Verify -fprintf-return-value results used for constant propagation.  */
> +int f4 (int i)
> +{
> +  int n1 = __builtin_snprintf (0, 0, "%i", 1234);
> +  int n2 = __builtin_snprintf (0, 0, "%i", 12345);
> +  return n1 + n2;
> +}
> 
> And the intent of the test is to verify that we get constants from the
> snprintf calls and that they in turn simplify to a constant.
> 
> That is certainly still the case after your patch, just the form of the
> output is different (the constant is propagated further).  So I think
> testing for "return 9" would be the right approach here.
> 
> jeff


[PATCH v2] A new copy propagation and PHI elimination pass

2023-11-02 Thread Filip Kastl
> Hi,
> 
> this is a patch that I submitted two months ago as an RFC. I added some polish
> since.
> 
> It is a new lightweight pass that removes redundant PHI functions and as a
> bonus does basic copy propagation. With Jan Hubička we measured that it is 
> able
> to remove usually more than 5% of all PHI functions when run among early 
> passes
> (sometimes even 13% or more). Those are mostly PHI functions that would be
> later optimized away but with this pass it is possible to remove them early
> enough so that they don't get streamed when runing LTO (and also potentially
> inlined at multiple places). It is also able to remove some redundant PHIs
> that otherwise would still be present during RTL expansion.
> 
> Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> with and without this patch. These are the sizes of .debug_info and
> .debug_loclists
> 
> .debug_info without patch 181694311
> .debug_infowith patch 181692320
> +0.0011% change
> 
> .debug_loclists without patch 47934753
> .debug_loclistswith patch 47934966
> -0.0004% change
> 
> I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> the program working on my machine sadly. Hope this suffices. Seems to me that
> my patch doesn't have a significant impact on debug info.
> 
> Bootstraped and tested* on x86_64-pc-linux-gnu.
> 
> * One testcase (pr79691.c) did regress. However that is because the test is
> dependent on a certain variable not being copy propagated. I will go into more
> detail about this in a reply to this mail.
> 
> Ok to commit?

This is a second version of the patch.  In this version, I modified the
pr79691.c testcase so that it works as intended with other changes from the
patch.

The pr79691.c testcase checks that we get constants from snprintf calls and
that they simplify into a single constant.  The testcase doesn't account for
the fact that this constant may be further copy propagated which is exactly
what happens with this patch applied.

Bootstrapped and tested on x86_64-pc-linux-gnu.

Ok to commit?

Filip Kastl

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurences of _8, _9 and _10 by _1

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
  RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* tree-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
  for additional copy propagation this patch adds.
* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/Makefile.in |   1 +
 gcc/passes.def  |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
 gcc/tree-pass.h |   1 +
 gcc/tree-ssa-sccopy.cc  | 867 
 6 files changed, 951 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
 create mode 100644 gcc/tree-ssa-sccopy.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a25a1e32fbc..2bd5a015676 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1736,6 +1736,7 @@ OBJS = \
tree-ssa-pre.o \
tree-ssa-propagate.o \
tree-ssa-reassoc.o \
+   tree-ssa-sccopy.o \
tree-ssa-sccvn.o \
tree-ssa-scopedtables.o \
tree-ssa-sink.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..fa6c5a2c9fa 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_if_to_switch);
  NEXT_PASS (pass_convert_switch);
  NEXT_PASS (pass_cleanup_eh);
+ NEXT_PASS (pass_sccopy);
  NEXT_PASS (pass_profile);
  NEXT_PASS (pass_local_pure_const);
  NEXT_PASS (pass_modref);
@@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
 However, this also causes us to misdiagnose cases that should be
 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+  NEXT_PASS (pass_sccopy);
   NEXT_PASS (pass_tail_calls);
   /* Split critical edges before late uninit warning to reduce the
  number of false positiv

[COMMITED] contrib: Update test_mklog to correspond to mklog

2024-03-07 Thread Filip Kastl
Hi,

the recent change to contrib/mklog.py broke contrib/test_mklog.py.  It modified
mklog.py to produce "Move to..." instead of "Moved to..." note in changelog for
files that were moved.  I've commited the fix as obvious.

Filip Kastl

-- 8< --

contrib/ChangeLog:

* test_mklog.py: "Moved to..." -> "Move to..."

Signed-off-by: Filip Kastl 
---
 contrib/test_mklog.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/test_mklog.py b/contrib/test_mklog.py
index b6210738e55..80e159fcca4 100755
--- a/contrib/test_mklog.py
+++ b/contrib/test_mklog.py
@@ -400,7 +400,7 @@ rename to gcc/ipa-icf2.c
 EXPECTED8 = '''\
 gcc/ChangeLog:
 
-   * ipa-icf.c: Moved to...
+   * ipa-icf.c: Move to...
* ipa-icf2.c: ...here.
 
 '''
-- 
2.43.0



[COMMITED] MAINTAINERS: Fix order in Write After Aproval

2024-03-12 Thread Filip Kastl
Hi,

I pushed this patch on Fri 8th and sent this mail to notify that I did so.
I had some trouble with sending the mail though and it didn't arrive to the
mailing list.  I'm sending it now instead.

Filip Kastl

-- 8< --

ChangeLog:

* MAINTAINERS: Fix order of names in Write After Aproval

Signed-off-by: Filip Kastl 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a681518d704..8f64ee630b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -448,9 +448,9 @@ Wei Guozhi  

 Vineet Gupta   
 Naveen H.S 
 Mostafa Hagog  
-Demin Han  
 Jivan Hakobyan 
 Andrew Haley   
+Demin Han  
 Frederik Harwath   
 Stuart Hastings
 Michael Haubenwallner  

-- 
2.43.0


[PATCH v4] A new copy propagation and PHI elimination pass

2023-12-13 Thread Filip Kastl
urrent commit. Once that is successfully done, is the pass
OK to be pushed to main?

Filip

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurences of _8, _9 and _10 by _1

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
  RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/Makefile.in |   1 +
 gcc/gimple-ssa-sccopy.cc| 680 
 gcc/passes.def  |   2 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 
 gcc/tree-pass.h |   1 +
 5 files changed, 762 insertions(+)
 create mode 100644 gcc/gimple-ssa-sccopy.cc
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 753f2f36618..2e6f2fef1ba 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1497,6 +1497,7 @@ OBJS = \
gimple-ssa-backprop.o \
gimple-ssa-isolate-paths.o \
gimple-ssa-nonnull-compare.o \
+   gimple-ssa-sccopy.o \
gimple-ssa-split-paths.o \
gimple-ssa-store-merging.o \
gimple-ssa-strength-reduction.o \
diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
new file mode 100644
index 000..ac5ec32eb32
--- /dev/null
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -0,0 +1,680 @@
+/* Strongly-connected copy propagation pass for the GNU compiler.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Filip Kastl 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "vec.h"
+#include "hash-set.h"
+#include 
+#include "ssa-iterators.h"
+#include "gimple-fold.h"
+#include "gimplify.h"
+#include "tree-cfg.h"
+#include "tree-eh.h"
+#include "builtins.h"
+#include "tree-ssa-dce.h"
+#include "fold-const.h"
+
+/* Strongly connected copy propagation pass.
+
+   This is a lightweight copy propagation pass that is also able to eliminate
+   redundant PHI statements.  The pass considers the following types of copy
+   statements:
+
+   1 An assignment statement with a single argument.
+
+   _3 = _2;
+   _4 = 5;
+
+   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
+ itself or one other value.
+
+   _5 = PHI <_1>;
+   _6 = PHI <_6, _6, _1, _1>;
+   _7 = PHI <16, _7>;
+
+   3 A set of PHI statements that only refer to each other or to one other
+ value.
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>;
+
+   All of these statements produce copies and can be eliminated from the
+   program.  For a copy statement we identify the value it creates a copy of
+   and replace references to the statement with the value -- we propagate the
+   copy.
+
+   _3 = _2; // Replace all occurences of _3 by _2
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>; // Replace all occurences of _8, _9 and _10 by _1
+
+   To find all three types of copy statements we use an algorithm based on
+   strongly-connected components (SCCs) in dataflow graph.  The algorithm was
+   introduced in an article from 2013[1]. We describe the algorithm bellow.
+
+   To identify SCCs we implement the Robert Tarjan's SCC algorithm.  For the
+   SCC computation we wrap potential copy statements in the 'vertex' struct.
+   To each of these statements we also assign a v

Re: [PATCH v4] A new copy propagation and PHI elimination pass

2023-12-14 Thread Filip Kastl
Successfully bootstrapped and regtested on x86_64-linux. Will push to master.

Filip


[PATCH v3] A new copy propagation and PHI elimination pass

2023-12-08 Thread Filip Kastl
#x27;t assume that all
variables are copies of each other at any point. It instead identifies copy
statements (or PHI SCCs that act as copy statements) and then for each of these
it propagates: By that I mean if a copy statement says that _3 is a copy of _2,
then it replaces all uses of _3 by _2.

But its possible that I still misinterpret what 'optimistic' means. If
mentioning that it works in an optimistic way would help readability of the
pass, I'm happy to add that comment.


Richard, is the patch ok as it is now? Or do you think that making changes 1, 2
or 3 is necessary? Or are there any other issues which should be adressed?

Filip


-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurences of _8, _9 and _10 by _1

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
  RTL expanstion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
  for additional copy propagation this patch adds.
* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/Makefile.in |   1 +
 gcc/gimple-ssa-sccopy.cc| 687 
 gcc/passes.def  |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
 gcc/tree-pass.h |   1 +
 6 files changed, 771 insertions(+), 1 deletion(-)
 create mode 100644 gcc/gimple-ssa-sccopy.cc
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 753f2f36618..2e6f2fef1ba 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1497,6 +1497,7 @@ OBJS = \
gimple-ssa-backprop.o \
gimple-ssa-isolate-paths.o \
gimple-ssa-nonnull-compare.o \
+   gimple-ssa-sccopy.o \
gimple-ssa-split-paths.o \
gimple-ssa-store-merging.o \
gimple-ssa-strength-reduction.o \
diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
new file mode 100644
index 000..b74f7a62569
--- /dev/null
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -0,0 +1,687 @@
+/* Strongly-connected copy propagation pass for the GNU compiler.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Filip Kastl 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "vec.h"
+#include "hash-set.h"
+#include 
+#include "ssa-iterators.h"
+#include "gimple-fold.h"
+#include "gimplify.h"
+#include "tree-cfg.h"
+#include "tree-eh.h"
+#include "builtins.h"
+#include "tree-ssa-dce.h"
+#include "fold-const.h"
+
+/* Strongly connected copy propagation pass.
+
+   This is a lightweight copy propagation pass that is also able to eliminate
+   redundant PHI statements.  The pass considers the following types of copy
+   statements:
+
+   1 An assignment statement with a single argument.
+
+   _3 = _2;
+   _4 = 5;
+
+   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
+ itself or one other value.
+
+   _5 = PHI <_1>;
+   _6 = PHI <_6, _6, _1, _1>;
+   _7 = PHI <16, _7>;
+
+   3 A set of PHI statements that only refer to each other or to one other
+ value.
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>;
+
+   All of these statements produce copies and can be eliminated from the
+   program.  For a copy statement 

[RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-24 Thread Filip Kastl
Hi,

As a part of my bachelor thesis under the supervision of Honza (Jan Hubicka), I
implemented a new PHI elimination algorithm into GCC. The algorithm is
described in the following article:

Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
(2013). Simple and Efficient Construction of Static Single Assignment
Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6

In the article the PHI elimination algorithm is only considered a part of
another algorithm. However, with Honza we tried running the algorithm as a
standalone pass and found that there is a surprisingly big number of PHI
functions it is able to remove -- sometimes even ~13% of PHI functions or more.
This email contains a patch with the pass and with the placement in passes.def
we used to measure this.

Do you think that the pass is worthy of inclusion into upstream GCC? What are
some things that I should change? Should I try to put the pass in different
places in passes.def?

Things I already know I'd like to change:
- Split the patch into two (one for sccp, one for the utility functions)
- Change the name SCCP to something else since there already is a pass with
  that name (any suggestions?)
- Add a comment into sccp.cc explaining the algorithm

I successfully bootstrapped and tested GCC with the patch applied (with the
commit 3b691e0190c6e7291f8a52e1e14d8293a28ff4ce checked out). 

Here are my measurements. I measured the number of PHIs before the PHI
elimination algorithm was run and after it was run. I measured on the standard
2017 benchmarks with -O3. Since the pass is present in passes.def twice,
results of the first run are marked (1) and results of the second are marked
(2). Honza also did measurements with profile feedback and got even bigger
percentages.

500.perlbench_r
Started with (1) 30287
Ended with (1) 26188
Removed PHI % (1) 13.53385941162875161000
Started with (2) 38005
Ended with (2) 37897
Removed PHI % (2) .28417313511380081600

502.gcc_r
Started with (1) 148187
Ended with (1) 140292
Removed PHI % (1) 5.32772780338356266100
Started with (2) 211479
Ended with (2) 210635
Removed PHI % (2) .39909399987705635100

505.mcf_r
Started with (1) 341
Ended with (1) 303
Removed PHI % (1) 11.14369501466275659900
Started with (2) 430
Ended with (2) 426
Removed PHI % (2) .93023255813953488400

523.xalancbmk_r
Started with (1) 62514
Ended with (1) 57785
Removed PHI % (1) 7.5647055059346800
Started with (2) 132561
Ended with (2) 131726
Removed PHI % (2) .62989868815111533600

531.deepsjeng_r
Started with (1) 1388
Ended with (1) 1250
Removed PHI % (1) 9.94236311239193083600
Started with (2) 1887
Ended with (2) 1879
Removed PHI % (2) .42395336512983571900

541.leela_r
Started with (1) 3332
Ended with (1) 2994
Removed PHI % (1) 10.14405762304921968800
Started with (2) 4372
Ended with (2) 4352
Removed PHI % (2) .45745654162854528900

Here is the patch:

-- >8 --

This patch introduces two things:
- A new PHI elimination pass (major)
- New utility functions for passes that replace one ssa name with a
  different one (minor)

Strongly-connected copy propagation (SCCP) pass

The PHI elimination pass is a lightweight optimization pass based on
strongly-connected components. Some set of PHIs may be redundant because
the PHIs only refer to each other or to a single value from outside the
set. This pass finds and eliminates these sets. As a bonus the pass also
does some basic copy propagation because it considers a copy statement
to be a PHI with a single argument.

SCCP uses an algorithm from this article:
Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
(2013). Simple and Efficient Construction of Static Single Assignment
Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6

cleanup_after_replace and cleanup_after_all_replaces_done

Whenever you replace all uses of an ssa name by a different ssa name,
some GCC internal structures have to be updated. To streamline this
process, the patch adds the cleanup_after_replace function that should
be called after an ssa name is replaced by a different one and the
cleanup_after_all_replaces_done that should be called before a pass that
replaced one or more ssa names exits. The SCCP pass uses these
functions.

Signed-off-by: Filip Kastl 

gcc/ChangeLog:

* Makefile.in: Added sccp pass.
* passes.def: Added sccp pass to early and late optimizations.
* tree-pass.h (make_pass_sccp): Added sccp pass.
* tree-ssa-propagate.cc (cleanup_after_replace): New function.
(cleanup_after_all_replaces_done): New function.
* tree-ssa-propagate.h (cleanup_after_replace): New function.
(cleanup_after_all_replaces_done): New function.
*

Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-31 Thread Filip Kastl
> The most obvious places would be right after SSA construction and before RTL 
> expansion.
> Can you provide measurements for those positions?

The algorithm should only remove PHIs that break SSA form minimality. Since
GCC's SSA construction already produces minimal SSA form, the algorithm isn't
expected to remove any PHIs if run right after the construction. I even
measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
algorithm managed to remove exactly 1 PHI, which is weird). 

I tried putting the pass before pass_expand. There isn't a lot of PHIs to
remove at that point, but there still are some.

500.perlbench_r
Started with 43111
Ended with 42942
Removed PHI % .39201131961680313700

502.gcc_r
Started with 141392
Ended with 140455
Removed PHI % .66269661649881181400

505.mcf_r
Started with 482
Ended with 478
Removed PHI % .82987551867219917100

523.xalancbmk_r
Started with 136040
Ended with 135629
Removed PHI % .30211702440458688700

531.deepsjeng_r
Started with 2150
Ended with 2148
Removed PHI % .09302325581395348900

541.leela_r
Started with 4664
Ended with 4650
Removed PHI % .30017152658662092700

557.xz_r
Started with 43
Ended with 43
Removed PHI % 0

> Can the pass somehow be used as part of propagations like during value 
> numbering?

I don't think that the pass could be used as a part of different optimizations
since it works on the whole CFG (except for copy propagation as I noted in the
RFC). I'm adding Honza into Cc. He'll have more insight into this.

> Could the new file be called gimple-ssa-sccp.cc or something similar?

Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more
appropriate?

I'm thinking about naming the pass 'scc-copy' and the file
'tree-ssa-scc-copy.cc'.

> Removing some PHIs is nice, but it would be also interesting to know what
> are the effects on generated code size and/or performance.
> And also if it has any effects on debug information coverage.

Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
Here are the full results. What do you think? Should I run more benchmarks? Or
benchmark multiple times? Or run the benchmarks on different machines?*

500.perlbench_r
Without SCCP: 244.151807s
With SCCP: 242.448438s
-0.7025695913124297%

502.gcc_r
Without SCCP: 211.029606s
With SCCP: 211.614523s
+0.27640683243653763%

505.mcf_r
Without SCCP: 298.782621s
With SCCP: 291.671468s
-2.438069465197046%

523.xalancbmk_r
Without SCCP: 189.940639s
With SCCP: 189.876261s
-0.03390523894928332%

531.deepsjeng_r
Without SCCP: 250.63648s
With SCCP: 250.988624s
+0.1403027732444051%

541.leela_r
Without SCCP: 346.066278s
With SCCP: 339.692987s
-1.8761915152519792%

Regarding size: The pass doesn't seem to significantly reduce or increase the
size of the result binary. The differences were at most ~0.1%.

Regarding debug info coverage: I didn't notice any additional guality testcases
failing after I applied the patch. *Is there any other way how I should check
debug info coverage?*


Filip K


Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-09-01 Thread Filip Kastl
> That's interesting.  Your placement at
> 
>   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>   NEXT_PASS (pass_phiopt, true /* early_p */);
> + NEXT_PASS (pass_sccp);
> 
> and
> 
>NEXT_PASS (pass_tsan);
>NEXT_PASS (pass_dse, true /* use DR analysis */);
>NEXT_PASS (pass_dce);
> +  NEXT_PASS (pass_sccp);
> 
> isn't immediately after the "best" existing pass we have to
> remove dead PHIs which is pass_cd_dce.  phiopt might leave
> dead PHIs around and the second instance runs long after the
> last CD-DCE.
> 
> So I wonder if your pass just detects unnecessary PHIs we'd have
> removed by other means and what survives until RTL expansion is
> what we should count?
> 
> Can you adjust your original early placement to right after
> the cd-dce pass and for the late placement turn the dce pass
> before it into cd-dce and re-do your measurements?

So I did this

  NEXT_PASS (pass_dse);
  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
  NEXT_PASS (pass_sccp);
  NEXT_PASS (pass_phiopt, true /* early_p */);
  NEXT_PASS (pass_tail_recursion); 

and this

  NEXT_PASS (pass_dse, true /* use DR analysis */);
  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
  NEXT_PASS (pass_sccp);
  /* Pass group that runs when 1) enabled, 2) there are loops

and got these results:

500.perlbench_r
Started with (1) 30318
Ended with (1) 26219
Removed PHI % (1) 13.52002110957187149600
Started with (2) 39043
Ended with (2) 38941
Removed PHI % (2) .26125041620777092000

502.gcc_r
Started with (1) 148361
Ended with (1) 140464
Removed PHI % (1) 5.32282742769326170700
Started with (2) 216209
Ended with (2) 215367
Removed PHI % (2) .38943799749316633500

505.mcf_r
Started with (1) 342
Ended with (1) 304
Removed PHI % (1) 11.1200
Started with (2) 437
Ended with (2) 433
Removed PHI % (2) .91533180778032036700
 
523.xalancbmk_r
Started with (1) 62995
Ended with (1) 58289 
Removed PHI % (1) 7.47043416144138423700
Started with (2) 134026
Ended with (2) 133193
Removed PHI % (2) .62152119737961291100
  
531.deepsjeng_r
Started with (1) 1402
Ended with (1) 1264
Removed PHI % (1) 9.84308131241084165500
Started with (2) 1928
Ended with (2) 1920
Removed PHI % (2) .41493775933609958600

541.leela_r
Started with (1) 3398
Ended with (1) 3060
Removed PHI % (1) 9.94702766333137139500
Started with (2) 4473
Ended with (2) 4453
Removed PHI % (2) .44712720769058797300

557.xz_r
Started with (1) 47
Ended with (1) 44
Removed PHI % (1) 6.38297872340425532000
Started with (2) 43
Ended with (2) 43
Removed PHI % (2) 0

These measurements don't differ very much from the previous. It seems to me
that phiopt does output some redundant PHIs but the vast majority of the
eliminated PHIs are generated in earlier passes and cd_dce isn't able to get
rid of them.

A noteworthy information might be that most of the eliminated PHIs are actually
trivial PHIs. I consider a PHI to be trivial if it only references itself or
one other SSA name.

Here is a comparison of the newest measurements (sccp after cd_dce) with the
previous ones (sccp after phiopt and dce):

500.perlbench_r
 
Started with (1-PREV) 30287
Started with (1-NEW) 30318
 
Ended with (1-PREV) 26188
Ended with (1-NEW) 26219
 
Removed PHI % (1-PREV) 13.53385941162875161000
Removed PHI % (1-NEW) 13.52002110957187149600
 
Started with (2-PREV) 38005
Started with (2-NEW) 39043
 
Ended with (2-PREV) 37897
Ended with (2-NEW) 38941
 
Removed PHI % (2-PREV) .28417313511380081600
Removed PHI % (2-NEW) .26125041620777092000
 
502.gcc_r
 
Started with (1-PREV) 148187
Started with (1-NEW) 148361
 
Ended with (1-PREV) 140292
Ended with (1-NEW) 140464
 
Removed PHI % (1-PREV) 5.32772780338356266100
Removed PHI % (1-NEW) 5.32282742769326170700
  
Started with (2-PREV) 211479
Started with (2-NEW) 216209
 
Ended with (2-PREV) 210635
Ended with (2-NEW) 215367
 
Removed PHI % (2-PREV) .39909399987705635100
Removed PHI % (2-NEW) .38943799749316633500


Filip K


P.S. I made a small mistake and didn't compute the benchmark speedup
percentages right in the previous email. Here are the corrected results. The
correct percentages are a little bit smaller but very similar. There is still a
~2% speedup with 505.mcf_r and 541.leela_r.

500.perlbench_r
Without SCCP: 244.151807s
With SCCP: 242.448438s
-0.6976679881791663%

502.gcc_r
Without SCCP: 211.029606s
With SCCP: 211.614523s
+0.27717295742853737%

505.mcf_r
Without SCCP: 298.782621s
With SCCP: 291.671468s
-2.380042378703145%

523.xalancbmk_r
Without SCCP: 189.940639s
With SCCP: 189.876261s
-0.03389374719330334%

531.deepsjeng_r
Without SCCP: 250.63648s
With SCCP: 250.988624s
+0.14049989849840747%

541.leela_r
Without SCCP: 346.066278s
With SCCP: 339.692987s
-1.84163884352811

[PING] [PATCH v2] A new copy propagation and PHI elimination pass

2023-11-15 Thread Filip Kastl
- Forwarded message from Filip Kastl  -

From: Filip Kastl 
To: gcc-patches@gcc.gnu.org
Cc: rguent...@suse.de, hubi...@ucw.cz
Subject: [PATCH v2] A new copy propagation and PHI elimination pass
Date: Thu, 2 Nov 2023 14:00:02 +0100
Message-ID: 

> Hi,
> 
> this is a patch that I submitted two months ago as an RFC. I added some polish
> since.
> 
> It is a new lightweight pass that removes redundant PHI functions and as a
> bonus does basic copy propagation. With Jan Hubička we measured that it is 
> able
> to remove usually more than 5% of all PHI functions when run among early 
> passes
> (sometimes even 13% or more). Those are mostly PHI functions that would be
> later optimized away but with this pass it is possible to remove them early
> enough so that they don't get streamed when runing LTO (and also potentially
> inlined at multiple places). It is also able to remove some redundant PHIs
> that otherwise would still be present during RTL expansion.
> 
> Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> with and without this patch. These are the sizes of .debug_info and
> .debug_loclists
> 
> .debug_info without patch 181694311
> .debug_infowith patch 181692320
> +0.0011% change
> 
> .debug_loclists without patch 47934753
> .debug_loclistswith patch 47934966
> -0.0004% change
> 
> I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> the program working on my machine sadly. Hope this suffices. Seems to me that
> my patch doesn't have a significant impact on debug info.
> 
> Bootstraped and tested* on x86_64-pc-linux-gnu.
> 
> * One testcase (pr79691.c) did regress. However that is because the test is
> dependent on a certain variable not being copy propagated. I will go into more
> detail about this in a reply to this mail.
> 
> Ok to commit?

This is a second version of the patch.  In this version, I modified the
pr79691.c testcase so that it works as intended with other changes from the
patch.

The pr79691.c testcase checks that we get constants from snprintf calls and
that they simplify into a single constant.  The testcase doesn't account for
the fact that this constant may be further copy propagated which is exactly
what happens with this patch applied.

Bootstrapped and tested on x86_64-pc-linux-gnu.

Ok to commit?

Filip Kastl

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurences of _8, _9 and _10 by _1

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
  RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* tree-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
  for additional copy propagation this patch adds.
* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/Makefile.in |   1 +
 gcc/passes.def  |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
 gcc/tree-pass.h |   1 +
 gcc/tree-ssa-sccopy.cc  | 867 
 6 files changed, 951 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
 create mode 100644 gcc/tree-ssa-sccopy.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a25a1e32fbc..2bd5a015676 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1736,6 +1736,7 @@ OBJS = \
tree-ssa-pre.o \
tree-ssa-propagate.o \
tree-ssa-reassoc.o \
+   tree-ssa-sccopy.o \
tree-ssa-sccvn.o \
tree-ssa-scopedtables.o \
tree-ssa-sink.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..fa6c5a2c9fa 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_if_to_switch);
  NEXT_PASS (pass_convert_switch);
  NEXT_PASS (pass_cleanup_eh);
+ NEXT_PASS (pass_sccopy);
  NEXT_PASS (pass_profile);
  NEXT_PASS (pass_local_pure_const);
  NEXT_PASS (pass_modref);
@@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
 However, this also causes us to misdiagnose cases that should be
 real warnings (e.g., testsuite/gcc.dg/

Re: [PATCH v2] A new copy propagation and PHI elimination pass

2023-11-22 Thread Filip Kastl
Hi Richard,

> Can you name the new file gimple-ssa-sccopy.cc please?

Yes, no problem.

Btw, I thought that it is standard that gimple ssa passes have the tree-ssa-
prefix. Do I understand it correctly that this is not true and many
tree-ssa-*.cc passes should actually be named gimple-ssa-*.cc but remain
tree-ssa-*.cc for historical reasons?

>> +   3 A set of PHI statements that only refer to each other or to one other
>> + value.
>> +
>> +   _8 = PHI <_9, _10>;
>> +   _9 = PHI <_8, _10>;
>> +   _10 = PHI <_8, _9, _1>;
> 
> this case necessarily involves a cyclic CFG, so maybe say
> 
> "This is a lightweight SSA copy propagation pass that is able to handle
> cycles optimistically, eliminating PHIs within those."
> 
> ?  Or is this a mis-characterization?

I'm not sure what you mean here. Yes, this case always involves a cyclic CFG.
Is it weird that a lightweight pass is able to handle cyclic CFG and therefore
you suggest to comment this fact and say that the pass handles cycles
optimistically?

I'm not sure if optimistic is a good word to characterize the pass. I'd expect
an "optimistic" pass to make assumptions which may not be true and therefore
not always all redundancies it can. This pass however should achieve all that
it sets out to do.

> It might be nice to optimize SCCs of size 1 somehow, not sure how
> many times these appear - possibly prevent them from even entering
> the SCC discovery?

Maybe that could be done. I would have to think about it and make sure it
doesn't break anything. I'd prefer to get this version into upstream and then
possibly post this upgrade later.

Btw, SCCs of size of size 1 appear all the time. Those are the cases 1 and 2
described in the comment at the beginning of the file.

> I'll note that while you are working with stmts everywhere that
> you are really bound to using SSA defs and those would already
> nicely have numbers (the SSA_NAME_VERSION).  In principle the
> SCC lattice could be pre-allocated once, indexed by
> SSA_NAME_VERSION and you could keep a "generation" number
> indicating what SCC discovery round it belongs to (aka the
> set_using).

I see. I could allocate a vertex struct for each statement only once when the
pass is invoked instead of allocating the structs each time tarjan_compute_sccs
is called. Will do that.

I'm not sure if I want to use SSA_NAME_VERSION for indexing an vec/array with
all those vertex structs. Many SSA names will be defined neither by PHI nor by
a copy assignment statement. If I created a vertex struct for every SSA name I
would allocate a lot of extra memory.

> There's a old SCC finding algorithm working on the SSA graph
> in the old SCC based value-numbering, for example on the
> gcc 7 branch in tree-ssa-sccvn.c:DFS

> For reading it would be nice to put the SCC finding in its
> own class.

Okay, I'll do that.

> > +   }
> > +}
> > +
> > +  if (!stack.is_empty ())
> > +gcc_unreachable ();
> > +
> > +  /* Clear copy stmts' 'using' flags.  */
> > +  for (vertex v : vs)
> > +{
> > +  gimple *s = v.stmt;
> > +  tarjan_clear_using (s);
> > +}
> > +
> > +  return sccs;
> > +}
> > +
> > +/* Could this statement potentially be a copy statement?
> > +
> > +   This pass only considers statements for which this function returns 
> > 'true'.
> > +   Those are basically PHI functions and assignment statements similar to
> > +
> > +   _2 = _1;
> > +   or
> > +   _2 = 5;  */
> > +
> > +static bool
> > +stmt_may_generate_copy (gimple *stmt)
> > +{
> > +  if (gimple_code (stmt) == GIMPLE_PHI)
> > +{
> > +  gphi *phi = as_a  (stmt);
> > +
> > +  /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> > +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> > +   return false;
> > +
> > +  unsigned i;
> > +  for (i = 0; i < gimple_phi_num_args (phi); i++)
> > +   {
> > + tree op = gimple_phi_arg_def (phi, i);
> > + if (TREE_CODE (op) == SSA_NAME
> > + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> > +   return false;
> > +   }
> 
> When there's more than one non-SSA PHI argument and they are not
> the same then the stmt also cannot be a copy, right?
> 
> > +  return true;
> > +}

Do I understand you correctly that you propose to put another check here?
Something like

unsigned nonssa_args_num = 0;
unsigned i;
for (i = 0; i < gimple_phi_num_args (phi); i++)
  {
tree op = gimple_phi_arg_def (phi, i);
if (TREE_CODE (op) == SSA_NAME)
  {
nonssa_args_num++;
if (nonssa_args_num >= 2)
  return false;

if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
  return false;
  }
  }

> > +
> > +  if (gimple_code (stmt) != GIMPLE_ASSIGN)
> > +return false;
> > +
> > +  /* If the statement has volatile operands, it won't generate a
> > + useful copy.  */
> > +  if (gimple_has_volatile_ops (stmt))
> > +return false;
> > +
> > +  /* Statements with loads and/or stores will never generate a useful 
> > copy.  

[COMMITED] MAINTAINERS: Fix order in DCO

2024-07-05 Thread Filip Kastl
Hi,

I've noticed wrong order in the Contributing under the DCO section of the
MAINTAINERS file.  I'm commiting the fix as obvious.

Filip Kastl

-- 8< --

ChangeLog:

* MAINTAINERS: Fix order in Contributing under the DCO.

Signed-off-by: Filip Kastl 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b4739f29107..762b91256c4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -766,6 +766,7 @@ Robin Dapp  

 Robin Dapp 
 Michal Jires   
 Matthias Kretz 
+Prathamesh Kulkarni
 Tim Lange  
 Jeff Law   
 Jeff Law   
@@ -791,4 +792,3 @@ Jonathan Wakely 

 Alexander Westbrooks   
 Chung-Ju Wu
 Pengxuan Zheng 
-Prathamesh Kulkarni
-- 
2.45.2



Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-08 Thread Filip Kastl
Hi,

I'm replying to Richard and keeping Andrew in cc since your suggestions
overlap.


On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> On Thu, 30 May 2024, Filip Kastl wrote:
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> 
> I think it's better to enable -mpopcnt and -mbmi (or what remains
> as minimal requirement).

Will do.  Currently the testcases are in the i386 directory.  After I exchange
the -march for -mpopcnt -mbmi can I put these testcases into gcc.dg/tree-ssa?
Will the -mpopcnt -mbmi options work with all target architectures?

> > +/* Check that the "exponential index transform" can be applied to this 
> > switch.
> > +
> > +   See comment of the exp_index_transform function for details about this
> > +   transformation.
> > +
> > +   We want:
> > +   - This form of the switch is more efficient
> > +   - Cases are powers of 2
> > +
> > +   Expects that SWTCH has at least one case.  */
> > +
> > +bool
> > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > +{
> > +  tree index = gimple_switch_index (swtch);
> > +  tree index_type = TREE_TYPE (index);
> > +  basic_block swtch_bb = gimple_bb (swtch);
> > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > +
> > +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
> > + test that a given number is 2^k for some k (using POPCOUNT).  */
> > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > +|| !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, 
> > opt_type))
> > +return false;
> > +
> 
> See above, I think this can be improved.  Would be nice to split out
> a can_pow2p (type) and can_log2 (type) and a corresponding
> gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> and alternate variants added when required.
> 

Just to check that I understand this correctly:  You'd like me to create
functions can_pow2p, can_log2.  Those functions will check that there are
optabs for the target machine which allow us to efficiently test
power-of-2-ness of a number and which allow us to efficiently compute the
base-2 log of a power-of-2 number.  You'd also like me to create functions
gen_pow2p and gen_log2 which generate this code.  For now these functions will
just use POPCOUNT and FFS but they can be later extended to also consider
different instructions.  Is that right?

Into which file should I put these functions?

Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
(x & -x) == x so testing pow2p can always be done efficiently.

> > +  /* Insert a statement that takes the logarithm of the index variable.  */
> > +  tree tmp2 = make_ssa_name (index_type);
> > +  gsi = gsi_start_bb (swtch_bb);
> 
> Please use gsi_after_labels (swtch_bb) even though you know there's no
> labels there.
> 
> > +  gcall *stmt_ffs = gimple_build_call_internal (IFN_FFS, 1, index);
> > +  gimple_call_set_lhs (stmt_ffs, tmp2);
> > +  gsi_insert_before (&gsi, stmt_ffs, GSI_SAME_STMT);
> > +
> > +  tree tmp3 = make_ssa_name (index_type);
> > +  gassign *stmt_minus_one = gimple_build_assign (tmp3, MINUS_EXPR, tmp2, 
> > one);
> > +  gsi_insert_before (&gsi, stmt_minus_one, GSI_SAME_STMT);
> 
> You could also use
> 
>  tree tmp2 = gimple_build (gsi, true, GSI_SAME_STMT,
>  UNKNOWN_LOCATION, IFN_FFS, index);
>  tree tmp3 = gimple_build (gsi, true, GSI_SAME_STMT,
>UNKNOWN_LOCATION, MINUS_EXPR, tmp2, one);
> 
> which does the stmt building, temporary SSA name creation and insertion
> plus eventually folding things with a definition.  There isn't a
> gimple_build_cond with this API, but it would still work for the
> popcount build above.

I've tried using the gimple_build API and it is indeed nicer.  However, I
wasn't able to get it working with the FFS internal function.  IFN_FFS is of a
different type than what gimple_build accepts.  I've tried this

  tree tmp2 = gimple_build (&gsi, true, GSI_SAME_STMT, UNKNOWN_LOCATION,
as_combined_fn (IFN_FFS), index);

but that only caused an ICE.  I tried looking for an example of using
gimple_build to build an internal function call in the sources but I haven't
found any.

If you'd be willing to help me with this, I will happily use the gimple_build
API.  If I don't figure this out I will just stick with the less clean original
version.

> > +  /* Use the result of the logarithm as t

Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-11 Thread Filip Kastl
> > > > +/* Check that the "exponential index transform" can be applied to this 
> > > > switch.
> > > > +
> > > > +   See comment of the exp_index_transform function for details about 
> > > > this
> > > > +   transformation.
> > > > +
> > > > +   We want:
> > > > +   - This form of the switch is more efficient
> > > > +   - Cases are powers of 2
> > > > +
> > > > +   Expects that SWTCH has at least one case.  */
> > > > +
> > > > +bool
> > > > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > > > +{
> > > > +  tree index = gimple_switch_index (swtch);
> > > > +  tree index_type = TREE_TYPE (index);
> > > > +  basic_block swtch_bb = gimple_bb (swtch);
> > > > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > > > +
> > > > +  /* Check that we can efficiently compute logarithm of 2^k (using 
> > > > FFS) and
> > > > + test that a given number is 2^k for some k (using POPCOUNT).  */
> > > > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > > > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > > > +|| !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, 
> > > > opt_type))
> > > > +return false;
> > > > +
> > > 
> > > See above, I think this can be improved.  Would be nice to split out
> > > a can_pow2p (type) and can_log2 (type) and a corresponding
> > > gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> > > and alternate variants added when required.
> > > 
> > 
> > Just to check that I understand this correctly:  You'd like me to create
> > functions can_pow2p, can_log2.  Those functions will check that there are
> > optabs for the target machine which allow us to efficiently test
> > power-of-2-ness of a number and which allow us to efficiently compute the
> > base-2 log of a power-of-2 number.  You'd also like me to create functions
> > gen_pow2p and gen_log2 which generate this code.  For now these functions 
> > will
> > just use POPCOUNT and FFS but they can be later extended to also consider
> > different instructions.  Is that right?
> 
> Right.
> 
> > Into which file should I put these functions?
> 
> Just in this file for now.
>  
> > Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
> > (x & -x) == x so testing pow2p can always be done efficiently.
> 
> If you add this fallback then can_pow2p / gen_pow2p wouldn't be
> necessary indeed.

Hi Richard,

I put some work into splitting out the can_ and gen_ functions as you
suggested.  I'm still a bit unsure what your vision of these is so before I
submit all the changes I made to the patch as version 2 I would like to share
how I implemented the functions (see bellow).  Is this how you imagined the
functions?  Would you change something or do the they look ok?

I wasn't sure how generic to make the functions.  The more generic the more
convoluted the implementation becomes.  For example: I could make them more
generic by also including a gsi_iterator_update parameter or I could make them
less generic but more straightforward by removing the BEFORE parameter.

Cheers,
Filip Kastl


/* Does the target have optabs needed to efficiently compute exact base two
   logarithm of a value with type TYPE?

   See gen_log2.  */

static bool
can_log2 (tree type, optimization_type opt_type)
{
  /* Check if target supports FFS.  */
  return direct_internal_fn_supported_p (IFN_FFS, type, opt_type);
}

/* Assume that OP is a power of two.  Build a sequence of gimple statements
   efficiently computing the base two logarithm of OP using special optabs.
   Insert statements before GSI if BEFORE is true or after GSI otherwise.
   Return the result value as an ssa name tree.

   Leave GSI at the same statement (GSI_SAME_STMT behavior).

   Should only be used if target supports the needed optabs.  See can_log2.  */

static tree
gen_log2 (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
{
  /* Use .FFS (op) - 1.  */
  gimple *s = gsi->ptr;
  tree type = TREE_TYPE (op);
  gsi_iterator_update update = before ? GSI_SAME_STMT : GSI_NEW_STMT;
  tree tmp1 = gimple_build (gsi, before, update, loc, IFN_FFS, type, op);
  tree tmp2 = gimple_build (gsi, before, update, loc, MINUS_EXPR, type, tmp1,
build_one_cst (type));
  gsi->ptr = s;
  return tmp2;
}

/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
   sp

Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-16 Thread Filip Kastl
On Wed 2024-07-10 11:34:44, Richard Biener wrote:
> On Mon, 8 Jul 2024, Filip Kastl wrote:
> 
> > Hi,
> > 
> > I'm replying to Richard and keeping Andrew in cc since your suggestions
> > overlap.
> > 
> > 
> > On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> > > On Thu, 30 May 2024, Filip Kastl wrote:
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> > > 
> > > I think it's better to enable -mpopcnt and -mbmi (or what remains
> > > as minimal requirement).
> > 
> > Will do.  Currently the testcases are in the i386 directory.  After I 
> > exchange
> > the -march for -mpopcnt -mbmi can I put these testcases into 
> > gcc.dg/tree-ssa?
> > Will the -mpopcnt -mbmi options work with all target architectures?
> 
> No, those are i386 specific flags.  At least for popcount there's
> dejagnu effective targets popcount, popcountl and popcountll so you
> could do
> 
> /* { dg-additional-options "-mpopcnt" { target { x86_64-*-* i?86-*-* } } } 
> */
> 
> and guard the tree dump scan with { target popcount } to cover other
> archs that have popcount (without adding extra flags).
> 

How does this take into account the FFS instruction?  If -mbmi is i386 specific
then I can't just put it into dg-options, right?  And if I wanted to handle it
similarly to how you suggest handling POPCOUNT, there would have to be
something like { target bmi }.  Is there something like that?

Note that I commited to adding x & -x == x as a fallback to POPCOUNT so now I
do not require -mpopcount.  I now just have to ensure that the testcase only
runs when the target supports FFS (or runs always but scans output only when
target supports FFS).

Cheers,
Filip Kastl


[PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-17 Thread Filip Kastl
Hi,

This is the second version of my patch introducing "exponential index
transformation" to the switch conversion pass.  See the version 1 mail here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653120.html


Changes made


In this version I addressed the following comments:

- Linaro CI: switch-3.c failing regtest
Exp transform interfered with this test so I added -fdisable-tree-switchconv.

- richi: gsi_start_bb -> gsi_after_labels
- richi: Use int_const_binop
- richi: Merge two cycles into one
- apinki, richi: Use wide_int instead of HOST_WIDE_INT
- richi: You can use the gimple_build API for nicer code
- richi: Use -mbmi -mpopcount instead -march=znver4
Made these modifications.

- richi: Split out code generating GIMPLE for log2 and pow2p operations
Made this modification.  The final implementation differs from the one I sent
in a reply to the version 1 patch.  I chose to return gimple_seq as Richard
suggested because it seems more elegant to me.

- richi: "It is good to not leave IL in a broken state"
Removed my FIXME remark suggesting to not fix dominators because the GIMPLE
code gets removed later.


Changes not made


These are the comments that I think were meant as suggestions, not as "this
must be changed" and I'm leaving them for possible future patches:

- richi: Also handle *minus* powers of 2 and powers of 2 +- a constant
- richi: Also allow using CTZ to compute log2
- apinski, richi: Smarter handling of type of index variable (current
  implementation cannot handle shorts and chars for example)

These are the comments that I'd like to reply to here but they didn't prompt
any change to the patch:

- richi: Signed index variable with 0x8000 as its value may be a problem.
Tested the patch version 2 for this situation.  In this situation, switch
conversion evaluates exponential transform as not viable so its fine.

- richi:
> > +  redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);
> > +
> > +  /* FIXME: Since exponential transform is run only if we know that the
> > +switch will be converted, we know these blocks will be removed and we
> > +maybe don't have to bother updating their dominators.  */
> 
> It's good to not leave the IL in an intermediate broken state.
> 
> > +  edge e;
> > +  edge_iterator ei;
> > +  FOR_EACH_EDGE (e, ei, swtch_bb->succs)
> > +   {
> > + basic_block bb = e->dest;
> > + if (bb == m_final_bb || bb == default_bb)
> > +   continue;
> > + set_immediate_dominator (CDI_DOMINATORS, bb, swtch_bb);
> 
> If there's an alternate edge into the cases, thus the original
> dominator wasn't the swtch_bb the dominator shouldn't change.
> I wonder why it would change at all - you are only creating
> a new edge into the default case so only that block dominance
> relationship changes?
If I read the switch conversion sources right, there cannot be an alternate
edge into the non-default cases.  switch_convesion::collect would reject that
kind of switch.  So we know that case basic blocks will always have the switch
basic block as their immediate dominator.  This code here actually just
partially reverts (only for the case basic blocks) the
redirect_immediate_dominators call that happens a few lines earlier.  That call
is needed because all basic blocks *outside* the switch that had the switch
basic block as their immediate dominator should now have cond_bb as their
immediate dominator.

- richi:
> > +   }
> > +
> > +  vec v;
> > +  v.create (1);
> > +  v.quick_push (m_final_bb);
> > +  iterate_fix_dominators (CDI_DOMINATORS, v, true);
> 
> The final BB should have the condition block as immediate dominator
> if it's old immediate dominator was the switch block, otherwise
> it's immediate dominator shouldn't change.
I think that this is not true.  Consider this CFG where no path through default
BB intersects final BB:

switch BB ---+
/  |  \   \
case BBsdefault BB
\  |  /   /
final BB /
   |/

Here idom(final BB) == switch BB.
After the index exponential transform the CFG looks like this

cond BB -+
   | |
switch BB ---+   |
/  |  \   \  |
case BBsdefault BB
\  |  /   /
final BB /
   |/

It still holds that idom(final BB) == switch BB.


I bootstrapped and regtested the patch on x86_64 linux.

Can I commit the patch like this?  Or are there some things that still need
addressing?

Cheers
Filip Kastl


--- 8< ---


gimple ssa: Teach switch conversion to optimize powers of 2 switches

Sometimes a switch has case numbers that are powers of 2.  Switch
conversion usually isn't able to optimize switches.  This patch adds
"exponential index transformatio

Re: [PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-18 Thread Filip Kastl
On Thu 2024-07-18 12:07:42, Richard Biener wrote:
> On Wed, 17 Jul 2024, Filip Kastl wrote:
> > > > +   }
> > > > +
> > > > +  vec v;
> > > > +  v.create (1);
> > > > +  v.quick_push (m_final_bb);
> > > > +  iterate_fix_dominators (CDI_DOMINATORS, v, true);
> > > 
> > > The final BB should have the condition block as immediate dominator
> > > if it's old immediate dominator was the switch block, otherwise
> > > it's immediate dominator shouldn't change.
> > I think that this is not true.  Consider this CFG where no path through 
> > default
> > BB intersects final BB:
> > 
> > switch BB ---+
> > /  |  \   \
> > case BBsdefault BB
> > \  |  /   /
> > final BB /
> >|/
> > 
> > Here idom(final BB) == switch BB.
> > After the index exponential transform the CFG looks like this
> > 
> > cond BB -+
> >| |
> > switch BB ---+   |
> > /  |  \   \  |
> > case BBsdefault BB
> > \  |  /   /
> > final BB /
> >|/
> > 
> > It still holds that idom(final BB) == switch BB.
> 
> Sure but you still know the dominator of the default BB as the cond BB
> then?  That said, I think that using iterate_fix_dominators is overkill
> and you should know the new immediate dominator and can set it directly?

Sorry, I'm not sure if I understand.  Are you suggesting something like this?

if (idom(default bb) == cond bb)
{
  if (exists a path from default bb to final bb)
  {
idom(final bb) = cond bb;
  }
  else
  {
idom(final bb) = switch bb;
  }
}
else
{
  // idom(final bb) doesn't change
}

If so, how do I implement testing existence of a path from default bb to final
bb?  I guess I could do BFS but that seems like a pretty complicated solution.

> 
> That said, even above if there's a merge of the default BB and final BB
> downstream in the CFG, inserting cond BB requires adjustment of the
> immediate dominator of that merge block and you are missing that?

I think this isn't a problem because I do

redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);

If the old immediate dominator of the merge bb was the switch bb then I set its
immediate dominator to cond bb.  Otherwise its immediate dominator stays the
same.

I try to prove that this is correct here:

> > +3 Other blocks that had the switch block as their immediate dominator
> > +should now have the cond block as their immediate dominator.
> > +
> > +4 Immediate dominators of the rest of the blocks shouldn't change.
> > +
> > +Reasoning for 3 and 4:
> > +
> > +We'll only consider blocks that do not fall into 1 or 2.
> > +
> > +Consider a block X whose original imm dom was the switch block.  All
> > +paths to X must also intersect the cond block since it's the only
> > +pred of the switch block.  The final block doesn't dominate X so at
> > +least one path P must lead through the default block.  Let P' be P but
> > +instead of going through the switch block, take E.  The switch block
> > +doesn't dominate X so its imm dom must now be the cond block.
> > +
> > +Consider a block X whose original imm dom was Y != the switch block.
> > +We only added an edge so all original paths to X are still present.
> > +So X gained no new dominators.  Observe that Y still dominates X.
> > +There would have to be a path that avoids Y otherwise.  But any block
> > +we can avoid now except for the switch block we were able to avoid
> > +before adding E.  */

Cheers,
Filip Kastl


[COMMITED] MAINTAINERS: Fix an entry using spaces instead of tabs

2024-05-14 Thread Filip Kastl
In the MAINTAINERS file, names and emails are separated by tabs.  One of
the entries recently added used spaces.  This patch corrects this.

The check-MAINTAINERS.py script breaks a bit when this happens.  This
patch also adds warning about this situation into the script.

ChangeLog:

* MAINTAINERS: Use tabs between name and email.

contrib/ChangeLog:

* check-MAINTAINERS.py: Add warning about not using tabs.

Signed-off-by: Filip Kastl 
---
 MAINTAINERS  | 2 +-
 contrib/check-MAINTAINERS.py | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 361059fd55c..8bb435dd54e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -738,7 +738,7 @@ Kwok Cheung Yeung   

 Greta Yorsh
 David Yuste
 Adhemerval Zanella 
-Xiao Zeng   
+Xiao Zeng  
 Dennis Zhang   
 Yufeng Zhang   
 Qing Zhao  
diff --git a/contrib/check-MAINTAINERS.py b/contrib/check-MAINTAINERS.py
index 9f31a10bcff..2bac67f0821 100755
--- a/contrib/check-MAINTAINERS.py
+++ b/contrib/check-MAINTAINERS.py
@@ -71,6 +71,14 @@ def check_group(name, lines):
 print(f'Line should not start with space: "{line}"')
 exit_code = 2
 
+# Special-case some names
+if line == 'James Norris':
+continue
+
+if '\t' not in line:
+print(f'Name and email should be separated by tabs: "{line}"')
+exit_code = 2
+
 lines = [line + '\n' for line in lines]
 sorted_lines = sorted(lines, key=sort_by_surname)
 if lines != sorted_lines:
-- 
2.45.0



[PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-05-30 Thread Filip Kastl
Hi,

This patch adds a transformation into the switch conversion pass --
the "exponential index transform".  This transformation can help switch
conversion convert switches it otherwise could not.  The transformation is
intended for switches whose cases are all powers of 2.  Here is a more detailed
description of what and how this patch tries to address:


The problem
---

Consider this piece of C code

switch (i)
  {
case (1 << 0): return 0;
case (1 << 1): return 1;
case (1 << 2): return 2;
...
case (1 << 30): return 30;
default: return 31;
  }

If i is a power of 2 (2^k), the switch just returns the exponent (k).  This can
be seen as taking the base 2 logarithm of i or as returning the position of the
singular 1 bit in i.

Currently, GCC fails to see this and doesn't optimize the switch in any way.

Switch conversion is able to optimize similar switches to an array lookup.
This is not possible here because the range of cases is too wide.  Another
optimization that switch conversion is able to do is the "linear
transformation" -- switch conversion is able to notice a linear relationship
between the index variable (variable i in the case) and the result value and
rewrite switch to just an assignment (or multiple assignments in case of
multiple result values). Sadly, linear transformation isn't applicable here
because the linear relationship is based on the exponent of i, not on i itself.


The solution


The exponential index transformation does the following.  When it recognises
that a switch only has case numbers that are powers of 2 it replaces them with
their exponents.  It also replaces the index variable by its exponent.  This is
done by inserting a statement that takes the logarithm of i and using the
result as the new index variable.  Actually we use the FFS operation for this
-- since we expect a power of two, we may just ask for the position of the
first 1 bit.

We also need to insert a conditional that checks at runtime that the index
variable is a power of two.  If it isn't, the resulting value should just be
the default case value (31 in the example above).

With exponential index transform, switch conversion is able to simplify the
above example into something like this

if (i is power of 2)
  return log2(i); // actually implemented as ffs(i) - 1
else
  return 31;

Switch conversion bails if the range of case numbers is too big.  Exponential
index transform shrinks this range (exponentially).  So even if there is no
linear relationship in the switch, exponential index transform can still help
convert the switch at least to an array lookup.


Limitations
---

Currently we only run the exponential index transform if the target has the
POPCOUNT (for checking a number is a power of 2) and FFS (for taking the
logarithm) instructions -- we check direct_internal_fn_supported_p () for
POPCOUNT and FFS internal functions.  Otherwise maybe computing FFS could be
less efficient than just using a jump table.  We try to avoid transforming a
switch into a less efficient form.  Maybe this is too conservative and could be
tweaked in the future.


Bootstrapped and regtested on x86_64 linux.  I have additionally run bootstrap
and regtest on a version where I removed the check that the target has the
POPCOUNT and FFS instructions so that the transformation would be triggered
more often.  That testing also went well.

Are there any things I should tweak?  Or is the patch ready to be applied?

Cheers,
Filip Kastl


-- 8< --


Sometimes a switch has case numbers that are powers of 2.  Switch
conversion usually isn't able to optimize switches.  This patch adds
"exponential index transformation" to switch conversion.  After switch
conversion applies this transformation on the switch the index variable
of the switch becomes the exponent instead of the whole value.  For
example:

switch (i)
  {
case (1 << 0): return 0;
case (1 << 1): return 1;
case (1 << 2): return 2;
...
case (1 << 30): return 30;
default: return 31;
  }

gets transformed roughly into

switch (log2(i))
  {
case 0: return 0;
case 1: return 1;
case 2: return 2;
...
case 30: return 30;
default: return 31;
  }

This enables switch conversion to further optimize the switch.

This patch only enables this transformation if there are optabs for
POPCOUNT and FFS so that the base 2 logarithm can be computed
efficiently at runtime.

gcc/ChangeLog:

* tree-switch-conversion.cc (switch_conversion::switch_conversion):
Track if the transformation happened.
(switch_conversion::is_exp_index_transform_viable): New function
to decide whether the transformation should be applied.
(switch_conversion::exp_index_transform): New function to
execute the transformation.
(switch_conversion::gen_inbound_check): Don't 

[wwwdocs v2] gcc-15: Mention c++ header dependency changes () in porting_to.html

2024-08-21 Thread Filip Kastl
Hi,

this is the second version of my patch.  See version 1 here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659584.html

Changes made:
- Removed plural when referring to the single changed header.  From the two
  versions of the text I considered I chose the one with less changes as
  Jonathan suggested.
- Changed "in libstdc++" to "within libstdc++".

Validated with the W3 Validator.  Is this ok to be pushed?

Cheers,
Filip Kastl


---
 htdocs/gcc-15/changes.html|  3 +-
 htdocs/gcc-15/porting_to.html | 54 +++
 2 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 htdocs/gcc-15/porting_to.html

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index fe7cf3c1..d0d6d147 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -17,9 +17,8 @@
 
 This page is a "brief" summary of some of the huge number of improvements
 in GCC 15.
-
 
diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html
new file mode 100644
index ..702cf507
--- /dev/null
+++ b/htdocs/gcc-15/porting_to.html
@@ -0,0 +1,54 @@
+
+
+
+
+
+Porting to GCC 15
+https://gcc.gnu.org/gcc.css";>
+
+
+
+Porting to GCC 15
+
+
+The GCC 15 release series differs from previous GCC releases in
+a number of ways. Some of these are a result
+of bug fixing, and some old behaviors have been intentionally changed
+to support new standards, or relaxed in standards-conforming ways to
+facilitate compilation or run-time performance.
+
+
+
+Some of these changes are user visible and can cause grief when
+porting to GCC 15. This document is an effort to identify common issues
+and provide solutions. Let us know if you have suggestions for improvements!
+
+
+Note: GCC 15 has not been released yet, so this document is
+a work-in-progress.
+
+
+
+C++ language issues
+
+Header dependency changes
+Some C++ Standard Library headers have been changed to no longer include
+other headers that were being used internally by the library.
+As such, C++ programs that used standard library components without
+including the right headers will no longer compile.
+
+
+In particular, the following header is used less widely within libstdc++ and
+may need to be included explicitly when compiling with GCC 15:
+
+
+ <cstdint>
+  (for std::int8_t, std::int32_t etc.)
+
+
+
+
+
+
+
+
-- 
2.45.2



Re: [wwwdocs v2] gcc-15: Mention c++ header dependency changes () in porting_to.html

2024-08-21 Thread Filip Kastl
On Wed 2024-08-21 09:50:39, Jonathan Wakely wrote:
> On Wed, 21 Aug 2024 at 09:48, Filip Kastl  wrote:
> >
> > Hi,
> >
> > this is the second version of my patch.  See version 1 here:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659584.html
> >
> > Changes made:
> > - Removed plural when referring to the single changed header.  From the two
> >   versions of the text I considered I chose the one with less changes as
> >   Jonathan suggested.
> > - Changed "in libstdc++" to "within libstdc++".
> >
> > Validated with the W3 Validator.  Is this ok to be pushed?
> 
> LGTM. I think I can approve this, since it's documenting libstdc++
> changes and I can approve libstdc++ patches, so unless Gerald has any
> further suggestions, please do push - thanks!
> 

I've waited a few hours in case Gerald chimes in.  I'm going to push it now.
Hope that's ok.

Thanks for the feedback!
Filip Kastl

> >
> > Cheers,
> > Filip Kastl
> >
> >
> > ---
> >  htdocs/gcc-15/changes.html|  3 +-
> >  htdocs/gcc-15/porting_to.html | 54 +++
> >  2 files changed, 55 insertions(+), 2 deletions(-)
> >  create mode 100644 htdocs/gcc-15/porting_to.html
> >
> > diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
> > index fe7cf3c1..d0d6d147 100644
> > --- a/htdocs/gcc-15/changes.html
> > +++ b/htdocs/gcc-15/changes.html
> > @@ -17,9 +17,8 @@
> >  
> >  This page is a "brief" summary of some of the huge number of improvements
> >  in GCC 15.
> > -
> >  
> > diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html
> > new file mode 100644
> > index ..702cf507
> > --- /dev/null
> > +++ b/htdocs/gcc-15/porting_to.html
> > @@ -0,0 +1,54 @@
> > +
> > +
> > +
> > +
> > +
> > +Porting to GCC 15
> > +https://gcc.gnu.org/gcc.css";>
> > +
> > +
> > +
> > +Porting to GCC 15
> > +
> > +
> > +The GCC 15 release series differs from previous GCC releases in
> > +a number of ways. Some of these are a result
> > +of bug fixing, and some old behaviors have been intentionally changed
> > +to support new standards, or relaxed in standards-conforming ways to
> > +facilitate compilation or run-time performance.
> > +
> > +
> > +
> > +Some of these changes are user visible and can cause grief when
> > +porting to GCC 15. This document is an effort to identify common issues
> > +and provide solutions. Let us know if you have suggestions for 
> > improvements!
> > +
> > +
> > +Note: GCC 15 has not been released yet, so this document is
> > +a work-in-progress.
> > +
> > +
> > +
> > +C++ language issues
> > +
> > +Header dependency changes
> > +Some C++ Standard Library headers have been changed to no longer include
> > +other headers that were being used internally by the library.
> > +As such, C++ programs that used standard library components without
> > +including the right headers will no longer compile.
> > +
> > +
> > +In particular, the following header is used less widely within libstdc++ 
> > and
> > +may need to be included explicitly when compiling with GCC 15:
> > +
> > +
> > + <cstdint>
> > +  (for std::int8_t, std::int32_t etc.)
> > +
> > +
> > +
> > +
> > +
> > +
> > +
> > +
> > --
> > 2.45.2
> >
> 


[PATCH] gimple ssa: switchconv: Use __builtin_popcount and support more types in exp transform [PR116355]

2024-08-24 Thread Filip Kastl
Hi,

bootstrapped and regtested on x86_64-linux.  Ok to push?

Cheers,
Filip Kastl


-- 8< --


The gen_pow2p function generates (a & -a) == a as a fallback for
POPCOUNT (a) == 1.  Not only is the bitmagic not equivalent to
POPCOUNT (a) == 1 but it also introduces UB (consider signed
a = INT_MIN).

This patch rewrites gen_pow2p to always use __builtin_popcount instead.
This means that what the end result GIMPLE code is gets decided by an
already existing machinery in a later pass.  That is a cleaner solution
I think.  This existing machinery also uses a ^ (a - 1) > a - 1 which is
the correct bitmagic.

While rewriting gen_pow2p I had to add logic for converting the
operand's type to a type that __builtin_popcount accepts.  I naturally
also added this logic to gen_log2.  Thanks to this, exponential index
transform gains the capability to handle all operand types with
precision at most that of long long int.

PR tree-optimization/116355

gcc/ChangeLog:

* tree-switch-conversion.cc (can_log2): Take into account the
conversion added to gen_log2.
(gen_log2): Add a conversion to a type compatible with FFS.
(can_pow2p): New function.
(gen_pow2p): Rewrite to use __builtin_popcount instead of
manually inserting an internal fn call or bitmagic.
(switch_conversion::is_exp_index_transform_viable): Call
can_pow2p.
(switch_conversion::exp_index_transform): Params of gen_pow2p
changed so update its call.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of POPCOUNT internal fn after switch conversion.  Test
for it after __builtin_popcount has had a chance to get
expanded.
* gcc.target/i386/switch-exp-transform-3.c: Also test char and
short.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-1.c  |   7 +-
 .../gcc.target/i386/switch-exp-transform-3.c  |  98 ++-
 gcc/tree-switch-conversion.cc | 117 ++
 3 files changed, 192 insertions(+), 30 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
index 53d31460ba3..a8c9e03e515 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -1,9 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-switchconv -mpopcnt -mbmi" } */
+/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt 
-mbmi" } */
 
 /* Checks that exponential index transform enables switch conversion to convert
this switch into an array lookup.  Also checks that the "index variable is a
-   power of two" check has been generated.  */
+   power of two" check has been generated and that it has been later expanded
+   into an internal function.  */
 
 int foo(unsigned bar)
 {
@@ -29,4 +30,4 @@ int foo(unsigned bar)
 }
 
 /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
-/* { dg-final { scan-tree-dump "POPCOUNT" "switchconv" } } */
+/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
index 64a7b146172..5011d1ebb0e 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
@@ -3,10 +3,104 @@
 
 /* Checks that the exponential index transformation is done for all these types
of the index variable:
+   - (unsigned) char
+   - (unsigned) short
- (unsigned) int
- (unsigned) long
- (unsigned) long long  */
 
+int unopt_char(char bit_position)
+{
+switch (bit_position)
+{
+case (1 << 0):
+return 0;
+case (1 << 1):
+return 1;
+case (1 << 2):
+return 2;
+case (1 << 3):
+return 3;
+case (1 << 4):
+return 4;
+case (1 << 5):
+return 5;
+case (1 << 6):
+return 6;
+default:
+return 0;
+}
+}
+
+int unopt_unsigned_char(unsigned char bit_position)
+{
+switch (bit_position)
+{
+case (1 << 0):
+return 0;
+case (1 << 1):
+return 1;
+case (1 << 2):
+return 2;
+case (1 << 3):
+return 3;
+case (1 << 4):
+return 4;
+case (1 << 5):
+return 5;
+case (1 << 6):
+return 6;
+default:
+return 0;
+}
+}
+
+int unopt_short(short bit_position)
+{
+switch (bit_position)
+{
+case (1 << 0):
+return 0;
+   

[PATCH] contrib: Set check-params-in-docs.py to skip tables of values of a param

2024-08-24 Thread Filip Kastl
Hi,

Here is the part of invoke.texi that currently confuses
check-params-in-docs.py:

@item aarch64-autovec-preference
Force an ISA selection strategy for auto-vectorization.
@table @samp
@item default
Use the default heuristics.
@item asimd-only
Use only Advanced SIMD for auto-vectorization.
@item sve-only
Use only SVE for auto-vectorization.
@item prefer-asimd
Use both Advanced SIMD and SVE.  Prefer Advanced SIMD when the costs are
deemed equal.
@item prefer-sve 
Use both Advanced SIMD and SVE.  Prefer SVE when the costs are deemed equal.
@end table

The script reports

Extra:
{'default', 'asimd-only', 'sve-only', 'prefer-asimd', 'prefer-sve'}

Is the patch ok to be pushed?

Cheers,
Filip Kastl


-- 8< --


Currently check-params-in-docs.py reports extra params being listed in
invoke.texi.  However, those aren't actual params but items in a table of
possible values of the aarch64-autove-preference param.

This patch changes check-params-in-docs.py to ignore similar tables.

contrib/ChangeLog:

* check-params-in-docs.py: Skip tables of values of a param.

Signed-off-by: Filip Kastl 
---
 contrib/check-params-in-docs.py | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index ccdb8d72169..8574842a4e7 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -66,7 +66,18 @@ texi = takewhile(lambda x: '@node Instrumentation Options' 
not in x, texi)
 texi = list(texi)[1:]
 
 texi_params = []
+skip = False
 for line in texi:
+# Skip @table @samp sections of manual where values of a param are usually
+# listed
+if skip:
+if line.startswith('@end table'):
+skip = False
+continue
+elif line.startswith('@table @samp'):
+skip = True
+continue
+
 for token in ('@item ', '@itemx '):
 if line.startswith(token):
 texi_params.append(line[len(token):])
-- 
2.46.0



[PATCH v2] gimple ssa: switchconv: Use __builtin_popcount and support more types in exp transform [PR116355]

2024-08-27 Thread Filip Kastl
Hi,

this is the second version of this patch.  See the mail with the first version
here:

https://inbox.sourceware.org/gcc-patches/ZsnRLdYErnIWQlCe@localhost.localdomain/

In this version I've made these adjustments:
- Added calls direct_internal_fn_supported_p to can_pow2p.  Before I just
  assumed that if the target supports FFS at all it will support it for
  unsigned, long unsigned and long long unsigned and didn't check this.
  - Also added a direct_intenal_fn_supported_p check for the type passed to
can_pow2p as a small compile time optimization.  This is the first check
that runs and if it is positive, the function exits early and doesn't
bother with any conversions.
- can_pow2p and can_log2 now return the type that gen_pow2p and gen_log2 should
  convert to before generating the respective operation.  gen_pow2p and
  gen_log2 now have this type as one of their parameters.
- Using gimple_convert instead of manually building CONVERT_EXPR/NOP_EXPR
  assignments.
- Using gimple_build for building __builtin_popcount.
- Adjusted ChangeLog entries.

Bootstrapped and regtested on x86_64 linux.  Ok to push?

Cheers,
Filip Kastl


-- 8< --


The gen_pow2p function generates (a & -a) == a as a fallback for
POPCOUNT (a) == 1.  Not only is the bitmagic not equivalent to
POPCOUNT (a) == 1 but it also introduces UB (consider signed
a = INT_MIN).

This patch rewrites gen_pow2p to always use __builtin_popcount instead.
This means that what the end result GIMPLE code is gets decided by an
already existing machinery in a later pass.  That is a cleaner solution
I think.  This existing machinery also uses a ^ (a - 1) > a - 1 which is
the correct bitmagic.

While rewriting gen_pow2p I had to add logic for converting the
operand's type to a type that __builtin_popcount accepts.  I naturally
also added this logic to gen_log2.  Thanks to this, exponential index
transform gains the capability to handle all operand types with
precision at most that of long long int.

PR tree-optimization/116355

gcc/ChangeLog:

* tree-switch-conversion.cc (can_log2): Add capability to
suggest converting the operand to a different type.
(gen_log2): Add capability to generate a conversion in case the
operand is of a type incompatible with the logarithm operation.
(can_pow2p): New function.
(gen_pow2p): Rewrite to use __builtin_popcount instead of
manually inserting an internal fn call or bitmagic.  Also add
capability to generate a conversion.
(switch_conversion::is_exp_index_transform_viable): Call
can_pow2p.  Store types suggested by can_log2 and gen_log2.
(switch_conversion::exp_index_transform): Params of gen_pow2p
and gen_log2 changed so update their calls.
* tree-switch-conversion.h: Add m_exp_index_transform_log2_type
and m_exp_index_transform_pow2p_type to switch_conversion class
to track type conversions needed to generate the "is power of 2"
and logarithm operations.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of POPCOUNT internal fn after switch conversion.  Test
for it after __builtin_popcount has had a chance to get
expanded.
* gcc.target/i386/switch-exp-transform-3.c: Also test char and
short.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-1.c  |   7 +-
 .../gcc.target/i386/switch-exp-transform-3.c  |  98 ++-
 gcc/tree-switch-conversion.cc | 152 ++
 gcc/tree-switch-conversion.h  |   7 +
 4 files changed, 227 insertions(+), 37 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
index 53d31460ba3..a8c9e03e515 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -1,9 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-switchconv -mpopcnt -mbmi" } */
+/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt 
-mbmi" } */
 
 /* Checks that exponential index transform enables switch conversion to convert
this switch into an array lookup.  Also checks that the "index variable is a
-   power of two" check has been generated.  */
+   power of two" check has been generated and that it has been later expanded
+   into an internal function.  */
 
 int foo(unsigned bar)
 {
@@ -29,4 +30,4 @@ int foo(unsigned bar)
 }
 
 /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
-/* { dg-final { scan-tree-dump "POPCOUNT" "switchconv" } } */
+/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c 
b

Re: [PATCH][contrib]: support json output from check_GNU_style_lib.py

2024-07-22 Thread Filip Kastl
Hi Tamar,

I wanted to try reviewing a patch and this seemed simple enough so I gave it a
shot.  Hopefully this saves some time of the maintainer that eventually
approves this :).

I don't see any bug in the code.  I also tried running it on my own input and
the output was correct.  So functionally the patch is good.  I have some
remarks about the style though:

- You sometimes put a space between function name and parentheses.  This
  doesn't look python-ish and isn't consistent in the file.
- There's one very long line (check_GNU_style_lib.py:335).  I would shorten it
  so it is at most 79 characters long.
- On the same line, there is a space after { but not before }.  For
  consistency, I would erase the space after {
- On the same line there are spaces after :.  I think a more python-ish way
  would be not to have those spaces there.  Here I'm maybe being too pedantic
  so feel free to ignore this.  I think it will look nice either way.

To summarize the last 3 points, I would replace this

errlines.append({ "file" : locs[0], "row" : locs[1], "column" : 
locs[2], "err" : e.console_error})

with this

errlines.append({"file": locs[0], "row": locs[1],
"column": locs[2], "err": e.console_error})

Cheers,
Filip Kastl


Re: [PATCH][contrib]: support json output from check_GNU_style_lib.py

2024-07-24 Thread Filip Kastl
> How about this formatting, I tend to find it a bit easier to read even.
> I also updated the location numbering to be numerical so, removed the quotes.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> +elif format == 'json':
> +fn = lambda x: x.error_message
> +i = 1
> +result = []
> +for (k, errors) in groupby(sorted(errors, key = fn), fn):
> +errors = list(errors)
> +entry = {}
> +entry['type'] = i
> +entry['msg'] = k
> +entry['count'] = len(errors)
> +i += 1
> +errlines = []
> +for e in errors:
> +locs = e.error_location ().split(':')
> +errlines.append({ "file": locs[0]
> +, "row": int(locs[1])
> +, "column": int(locs[2])
> +, "err": e.console_error })
> +entry['errors'] = errlines
> +result.append(entry)
> +
> +if len(errors) == 0:
> +exit(0)
> +else:
> +json_string = json.dumps(result)
> +print(json_string)
> +exit(1)
>  else:
>  assert False

Sure, this looks nice.  I'm not sure if I have the right to approve the patch
though.

Cheers,
Filip Kastl


Re: [PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-29 Thread Filip Kastl
Hi Richard,

> > Sorry, I'm not sure if I understand.  Are you suggesting something like 
> > this?
> > 
> > if (idom(default bb) == cond bb)
> > {
> >   if (exists a path from default bb to final bb)
> >   {
> > idom(final bb) = cond bb;
> >   }
> >   else
> >   {
> > idom(final bb) = switch bb;
> >   }
> > }
> > else
> > {
> >   // idom(final bb) doesn't change
> > }

Sidenote: I've just noticed that this code wouldn't work since the original
idom of final_bb may be some block outside of the switch.  In that case, idom
of final_bb should remain the same after the transformation regardless of if
idom(default bb) == cond bb.  So the code would look like this

if (original idom(final bb) == switch bb and idom(default bb) == cond bb)
{
  if (exists a path from default bb to final bb)
  {
idom(final bb) = cond bb;
  }
  else
  {
idom(final bb) = switch bb;
  }
}
else
{
  // idom(final bb) doesn't change
}

> > 
> > If so, how do I implement testing existence of a path from default bb to 
> > final
> > bb?  I guess I could do BFS but that seems like a pretty complicated 
> > solution.
> > > 
> > > That said, even above if there's a merge of the default BB and final BB
> > > downstream in the CFG, inserting cond BB requires adjustment of the
> > > immediate dominator of that merge block and you are missing that?
> > 
> > I think this isn't a problem because I do
> > 
> > redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);
> 
> Hmm, I'm probably just confused.  So the problem is that
> redirect_immediate_dominators makes the dominator of final_bb incorrect
> (but also all case_bb immediate dominator?!)?

Yes, the problem is what the idom of final_bb should be after the
transformation.  However, redirect_immediate_dominators doesn't *make* the idom
of final_bb incorrect.  It may have been already incorrect before the call (the
call may also possibly make the idom correct btw).

This has probably already been clear to you.  I'm just making sure we're on the
same page.

> 
> Ah, I see you fix those up.  Then 2.) is left - the final block.  Iff
> the final block needs adjustment you know there was a path from
> the default case to it which means one of its predecessors is dominated
> by the default case?  In that case, adjust the dominator to cond_bb,
> otherwise leave it at switch_bb?

Yes, what I'm saying is that if I want to know idom of final_bb after the
transformation, I have to know if there is a path between default_bb and
final_bb.  It is because of these two cases:

1.

cond BB -+
   | |
switch BB ---+   |
/  |  \   \  |
case BBsdefault BB
\  |  /   /
final BB <---+  <- this may be an edge or a path
   |

2.

cond BB -+
   | |
switch BB ---+   |
/  |  \   \  |
case BBsdefault BB
\  |  /   /
final BB / <- this may be an edge or a path
   |/

In the first case, there is a path between default_bb and final_bb and in the
second there isn't.  Otherwise the cases are the same.  In the first case idom
of final_bb should be cond_bb.  In the second case idom of final_bb should be
switch_bb. Algorithm deciding what should be idom of final_bb therefore has to
know if there is a path between default_bb and final_bb.

You said that if there is a path between default_bb and final_bb, one of the
predecessors of final_bb is dominated by default_bb.  That would indeed give a
nice way to check existence of a path between default_bb and final_bb.  But
does it hold?  Consider this situation:

   | |
cond BB --+
   | ||
switch BB +   |
/  |  \  | \  |
case BBs |default BB
\  |  /  |/
final BB <- pred BB -+
   |

Here no predecessors of final_bb are dominated by default_bb but at the same
time there does exist a path from default_bb to final_bb.  Or is this CFG
impossible for some reason?

Btw to further check that we're on the same page:  Right now we're only trying
to figure out if there is a way to update idom of final_bb after the
transformation without using iterate_fix_dominators, right?  The rest of my
dominator fixing code makes sense / is ok?

Cheers,
Filip Kastl


Re: [PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-30 Thread Filip Kastl
> > > Ah, I see you fix those up.  Then 2.) is left - the final block.  Iff
> > > the final block needs adjustment you know there was a path from
> > > the default case to it which means one of its predecessors is dominated
> > > by the default case?  In that case, adjust the dominator to cond_bb,
> > > otherwise leave it at switch_bb?
> > 
> > Yes, what I'm saying is that if I want to know idom of final_bb after the
> > transformation, I have to know if there is a path between default_bb and
> > final_bb.  It is because of these two cases:
> > 
> > 1.
> > 
> > cond BB -+
> >| |
> > switch BB ---+   |
> > /  |  \   \  |
> > case BBsdefault BB
> > \  |  /   /
> > final BB <---+  <- this may be an edge or a path
> >|
> > 
> > 2.
> > 
> > cond BB -+
> >| |
> > switch BB ---+   |
> > /  |  \   \  |
> > case BBsdefault BB
> > \  |  /   /
> > final BB / <- this may be an edge or a path
> >|/
> > 
> > In the first case, there is a path between default_bb and final_bb and in 
> > the
> > second there isn't.  Otherwise the cases are the same.  In the first case 
> > idom
> > of final_bb should be cond_bb.  In the second case idom of final_bb should 
> > be
> > switch_bb. Algorithm deciding what should be idom of final_bb therefore has 
> > to
> > know if there is a path between default_bb and final_bb.
> > 
> > You said that if there is a path between default_bb and final_bb, one of the
> > predecessors of final_bb is dominated by default_bb.  That would indeed 
> > give a
> > nice way to check existence of a path between default_bb and final_bb.  But
> > does it hold?  Consider this situation:
> > 
> >| |
> > cond BB --+
> >| ||
> > switch BB +   |
> > /  |  \  | \  |
> > case BBs |default BB
> > \  |  /  |/
> > final BB <- pred BB -+
> >|
> > 
> > Here no predecessors of final_bb are dominated by default_bb but at the same
> > time there does exist a path from default_bb to final_bb.  Or is this CFG
> > impossible for some reason?
> 
> I think in this case the dominator simply need not change - the only case
> you need to adjust it is when the immediate dominator of final BB was
> switch BB before the transform, and then we know we have to update it
> too cond BB, no?

Ah, my bad.  Yes, my counterexample actually isn't a problem.  I was glad when
I realized that and started thinking that this...

if (original idom(final bb) == switch bb)
{
  if (exists a pred of final bb dominated by default bb)
  {
idom(final bb) = cond bb;
  }
  else
  {
idom(final bb) = switch bb;
  }
}
else
{
  // idom(final bb) doesn't change
}

...might be the final solution.  But after thinking about it for a while I
(saddly) came up with another counterexample.

   |  
cond BB --+
   |  |
switch BB +   |
/  |  \\  |
case BBs  default BB
\  |  /   /
final BB <- pred BB -+
   |   ^
   |   |
   +---+  <- this may be a path or an edge I guess

Here there *is* a path between default_bb and final_bb but since no predecessor
of final_bb is dominated by default_bb we incorrectly decide that there is no
such path.  Therefore we incorrectly assign idom(final_bb) = switch_bb instead
of idom(final_bb) = cond_bb.

So unless I'm missing something, "final has a pred dominated by default" isn't
equivalent with "there is a path between default and final" even when we assume
that the original idom of final_bb was switch_bb.  Therefore I think we're back
to searching for a nice way to test "there is a path between default and
final".

Maybe you can spot a flaw in my logic or maybe you see a solution I don't.
Meanwhile I'll look into source code of the rest of the switch conversion pass.
Switch conversion pass inserts conditions similar to what I'm doing so someone
before me may have already solved how to properly fix dominators in this
situation.

Cheers,
Filip Kastl


Re: [PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-30 Thread Filip Kastl
> Meanwhile I'll look into source code of the rest of the switch conversion 
> pass.
> Switch conversion pass inserts conditions similar to what I'm doing so someone
> before me may have already solved how to properly fix dominators in this
> situation.

Oh nevermind.  Switch conversion (gen_inbound_check ()) actually uses
iterate_fix_dominators.


Re: [PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-30 Thread Filip Kastl
On Tue 2024-07-30 14:34:54, Richard Biener wrote:
> On Tue, 30 Jul 2024, Filip Kastl wrote:
> 
> > > > > Ah, I see you fix those up.  Then 2.) is left - the final block.  Iff
> > > > > the final block needs adjustment you know there was a path from
> > > > > the default case to it which means one of its predecessors is 
> > > > > dominated
> > > > > by the default case?  In that case, adjust the dominator to cond_bb,
> > > > > otherwise leave it at switch_bb?
> > > > 
> > > > Yes, what I'm saying is that if I want to know idom of final_bb after 
> > > > the
> > > > transformation, I have to know if there is a path between default_bb and
> > > > final_bb.  It is because of these two cases:
> > > > 
> > > > 1.
> > > > 
> > > > cond BB -+
> > > >| |
> > > > switch BB ---+   |
> > > > /  |  \   \  |
> > > > case BBsdefault BB
> > > > \  |  /   /
> > > > final BB <---+  <- this may be an edge or a path
> > > >|
> > > > 
> > > > 2.
> > > > 
> > > > cond BB -+
> > > >| |
> > > > switch BB ---+   |
> > > > /  |  \   \  |
> > > > case BBsdefault BB
> > > > \  |  /   /
> > > > final BB / <- this may be an edge or a path
> > > >|/
> > > > 
> > > > In the first case, there is a path between default_bb and final_bb and 
> > > > in the
> > > > second there isn't.  Otherwise the cases are the same.  In the first 
> > > > case idom
> > > > of final_bb should be cond_bb.  In the second case idom of final_bb 
> > > > should be
> > > > switch_bb. Algorithm deciding what should be idom of final_bb therefore 
> > > > has to
> > > > know if there is a path between default_bb and final_bb.
> > > > 
> > > > You said that if there is a path between default_bb and final_bb, one 
> > > > of the
> > > > predecessors of final_bb is dominated by default_bb.  That would indeed 
> > > > give a
> > > > nice way to check existence of a path between default_bb and final_bb.  
> > > > But
> > > > does it hold?  Consider this situation:
> > > > 
> > > >| |
> > > > cond BB --+
> > > >| ||
> > > > switch BB +   |
> > > > /  |  \  | \  |
> > > > case BBs |default BB
> > > > \  |  /  |/
> > > > final BB <- pred BB -+
> > > >|
> > > > 
> > > > Here no predecessors of final_bb are dominated by default_bb but at the 
> > > > same
> > > > time there does exist a path from default_bb to final_bb.  Or is this 
> > > > CFG
> > > > impossible for some reason?
> > > 
> > > I think in this case the dominator simply need not change - the only case
> > > you need to adjust it is when the immediate dominator of final BB was
> > > switch BB before the transform, and then we know we have to update it
> > > too cond BB, no?
> > 
> > Ah, my bad.  Yes, my counterexample actually isn't a problem.  I was glad 
> > when
> > I realized that and started thinking that this...
> > 
> > if (original idom(final bb) == switch bb)
> > {
> >   if (exists a pred of final bb dominated by default bb)
> >   {
> > idom(final bb) = cond bb;
> >   }
> >   else
> >   {
> > idom(final bb) = switch bb;
> >   }
> > }
> > else
> > {
> >   // idom(final bb) doesn't change
> > }
> > 
> > ...might be the final solution.  But after thinking about it for a while I
> > (saddly) came up with another counterexample.
> > 
> >|  
> > cond BB --+
> >|  |
> > switch BB +   |
> > /  |  \\  |
> > case BBs  default BB
> > \  |  /   /
> > final BB <- pred BB -+
> >|   ^
> >|   |
> >+---+  <- this may be a path or an edge I guess
> > 
> > Here there *is* a path between default_bb and final_bb but since no 
> > predecessor
> > of final_bb is dominated by default_bb we incorrectly d

[PATCH] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl
32bit x86 CPUs won't natively support the FFS operation on a 64 bit
type.  Therefore, the switch-exp-transform-3.c test will always fail
with a 32bit target.  I'm fixing my mistake.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-3.c: Remove code testing
  that the exponential index transform is able to handle long
  long int.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-3.c  | 51 +--
 1 file changed, 2 insertions(+), 49 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
index c8fae70692e..cd00071d0bc 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
@@ -4,8 +4,7 @@
 /* Checks that the exponential index transformation is done for all these types
of the index variable:
- (unsigned) int
-   - (unsigned) long
-   - (unsigned) long long  */
+   - (unsigned) long  */
 
 int unopt_int(int bit_position)
 {
@@ -99,50 +98,4 @@ int unopt_unsigned_long(unsigned long bit_position)
 }
 }
 
-int unopt_long_long(long long bit_position)
-{
-switch (bit_position)
-{
-case (1 << 0):
-return 0;
-case (1 << 1):
-return 1;
-case (1 << 2):
-return 2;
-case (1 << 3):
-return 3;
-case (1 << 4):
-return 4;
-case (1 << 5):
-return 5;
-case (1 << 6):
-return 6;
-default:
-return 0;
-}
-}
-
-int unopt_unsigned_long_long(unsigned long long bit_position)
-{
-switch (bit_position)
-{
-case (1 << 0):
-return 0;
-case (1 << 1):
-return 1;
-case (1 << 2):
-return 2;
-case (1 << 3):
-return 3;
-case (1 << 4):
-return 4;
-case (1 << 5):
-return 5;
-case (1 << 6):
-return 6;
-default:
-return 0;
-}
-}
-
-/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 
"switchconv" } } */
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 
"switchconv" } } */
-- 
2.45.2



[PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl
On Wed 2024-07-31 12:18:34, Jakub Jelinek wrote:
> On Wed, Jul 31, 2024 at 12:02:08PM +0200, Filip Kastl wrote:
> > 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> > type.  Therefore, the switch-exp-transform-3.c test will always fail
> > with a 32bit target.  I'm fixing my mistake.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/switch-exp-transform-3.c: Remove code testing
> >   that the exponential index transform is able to handle long
> >   long int.
> 
> But for -m64 it does and it is good to test even that.
> Can't you wrap the long long stuff with
> #ifdef __x86_64__
> and
> do
> /* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 
> "switchconv" { target ia32 } } } */
> /* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 
> "switchconv" { target { ! ia32 } } } } */
> or so?
> 
>   Jakub
> 

Thanks for the feedback!  Here is a second version of the patch.  I've tested
this version with

make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c 
--target_board='unix{-m32}'"

and

make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"

on a x86_64 machine and in both cases the test didn't produce any errors and
scan-tree-dump-times was successful.

Is this version ok?

Thanks,
Filip Kastl


-- 8< --


testsuite: Adjust switch-exp-transform-3.c for 32bit

32bit x86 CPUs won't natively support the FFS operation on a 64 bit
type.  Therefore, I'm setting the long long int part of the
switch-exp-transform-3.c test to only execute with 64bit targets.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-3.c: Set the long long
  int test to only execute with 64bit targets.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
index c8fae70692e..64a7b146172 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
@@ -99,6 +99,8 @@ int unopt_unsigned_long(unsigned long bit_position)
 }
 }
 
+#ifdef __x86_64__
+
 int unopt_long_long(long long bit_position)
 {
 switch (bit_position)
@@ -145,4 +147,7 @@ int unopt_unsigned_long_long(unsigned long long 
bit_position)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 
"switchconv" } } */
+#endif
+
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 
"switchconv" { target ia32 } } } */
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 
"switchconv" { target { ! ia32 } } } } */
-- 
2.45.2



Re: [PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl
On Wed 2024-07-31 13:34:28, Jakub Jelinek wrote:
> On Wed, Jul 31, 2024 at 01:32:06PM +0200, Filip Kastl wrote:
> > Thanks for the feedback!  Here is a second version of the patch.  I've 
> > tested
> > this version with
> > 
> > make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c 
> > --target_board='unix{-m32}'"
> > 
> > and
> > 
> > make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"
> 
> You can just use
> make check RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} 
> i386.exp=switch-exp-transform-3.c"
> 
> > testsuite: Adjust switch-exp-transform-3.c for 32bit
> > 
> > 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> > type.  Therefore, I'm setting the long long int part of the
> > switch-exp-transform-3.c test to only execute with 64bit targets.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/switch-exp-transform-3.c: Set the long long
> >   int test to only execute with 64bit targets.
> 
> There should be just tab, not tab + 2 spaces before int test.
> Ok with that nit changed.
> 
>   Jakub
> 

Ok, removed the 2 spaces and pushed the patch.

Thanks,
Filip Kastl


[COMMITED] gimple ssa: Fix a typo in gimple-ssa-sccopy.cc

2024-08-05 Thread Filip Kastl
Hello,

just commited this as obvious.

Filip Kastl

-- 8< --

Fixes a misplaced comment in gimple-ssa-sccopy.cc.  The comment belongs
to a bitmap definition but was instead placed before the beginning of a
namespace block.

gcc/ChangeLog:

* gimple-ssa-sccopy.cc: Move a misplaced comment.

Signed-off-by: Filip Kastl 
---
 gcc/gimple-ssa-sccopy.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index 138ee9a0ac4..191a4c0b451 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -92,10 +92,11 @@ along with GCC; see the file COPYING3.  If not see
  Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
  Section 3.2.  */
 
+namespace {
+
 /* Bitmap tracking statements which were propagated to be removed at the end of
the pass.  */
 
-namespace {
 static bitmap dead_stmts;
 
 /* State of vertex during SCC discovery.
-- 
2.45.2



[PATCH] gimple ssa: Put SCCOPY logic into a class

2024-08-06 Thread Filip Kastl
Hello everybody,

In pr113054[1] Andrew said that he doesn't like the 'dead_stmts' static
variable I used when implementing the sccopy pass.  We agreed that wrapping
the relevant code from the pass in a class would be most likely the best
solution.  Here is a patch that does exactly that.  I waited until stage 1 to
submit it.

Bootstrapped and regtested on x86_64.  Is the patch ok to be pushed to trunk?

Cheers,
Filip Kastl


[1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113054


-- 8< --


Currently the main logic of the sccopy pass is implemented as static
functions.  This patch instead puts the code into a class.  This also
gets rid of a global variable (dead_stmts).

gcc/ChangeLog:

* gimple-ssa-sccopy.cc (class scc_copy_prop): New class.
(replace_scc_by_value): Put into...
(scc_copy_prop::replace_scc_by_value): ...scc_copy_prop.
(sccopy_visit_op): Put into...
(scc_copy_prop::visit_op): ...scc_copy_prop.
(sccopy_propagate): Put into...
(scc_copy_prop::propagate): ...scc_copy_prop.
(init_sccopy): Replace by...
(scc_copy_prop::scc_copy_prop): ...the construtor.
(finalize_sccopy): Replace by...
(scc_copy_prop::~scc_copy_prop): ...the destructor.
(pass_sccopy::execute): Use scc_copy_prop.

Signed-off-by: Filip Kastl 
---
 gcc/gimple-ssa-sccopy.cc | 66 ++--
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index 191a4c0b451..d9eaeab4abb 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -94,11 +94,6 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace {
 
-/* Bitmap tracking statements which were propagated to be removed at the end of
-   the pass.  */
-
-static bitmap dead_stmts;
-
 /* State of vertex during SCC discovery.
 
unvisited  Vertex hasn't yet been popped from worklist.
@@ -459,11 +454,33 @@ get_all_stmt_may_generate_copy (void)
   return result;
 }
 
+/* SCC copy propagation
+
+   'scc_copy_prop::propagate ()' is the main function of this pass.  */
+
+class scc_copy_prop
+{
+public:
+  scc_copy_prop ();
+  ~scc_copy_prop ();
+  void propagate ();
+
+private:
+  /* Bitmap tracking statements which were propagated so that they can be
+ removed at the end of the pass.  */
+  bitmap dead_stmts;
+
+  void visit_op (tree op, hash_set &outer_ops,
+   hash_set &scc_set, bool &is_inner,
+   tree &last_outer_op);
+  void replace_scc_by_value (vec scc, tree val);
+};
+
 /* For each statement from given SCC, replace its usages by value
VAL.  */
 
-static void
-replace_scc_by_value (vec scc, tree val)
+void
+scc_copy_prop::replace_scc_by_value (vec scc, tree val)
 {
   for (gimple *stmt : scc)
 {
@@ -476,12 +493,12 @@ replace_scc_by_value (vec scc, tree val)
 fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
 }
 
-/* Part of 'sccopy_propagate ()'.  */
+/* Part of 'scc_copy_prop::propagate ()'.  */
 
-static void
-sccopy_visit_op (tree op, hash_set &outer_ops,
-hash_set &scc_set, bool &is_inner,
-tree &last_outer_op)
+void
+scc_copy_prop::visit_op (tree op, hash_set &outer_ops,
+hash_set &scc_set, bool &is_inner,
+tree &last_outer_op)
 {
   bool op_in_scc = false;
 
@@ -539,8 +556,8 @@ sccopy_visit_op (tree op, hash_set &outer_ops,
  Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
  Section 3.2.  */
 
-static void
-sccopy_propagate ()
+void
+scc_copy_prop::propagate ()
 {
   auto_vec useful_stmts = get_all_stmt_may_generate_copy ();
   scc_discovery discovery;
@@ -575,14 +592,12 @@ sccopy_propagate ()
for (j = 0; j < gimple_phi_num_args (phi); j++)
  {
op = gimple_phi_arg_def (phi, j);
-   sccopy_visit_op (op, outer_ops, scc_set, is_inner,
-  last_outer_op);
+   visit_op (op, outer_ops, scc_set, is_inner, last_outer_op);
  }
break;
  case GIMPLE_ASSIGN:
op = gimple_assign_rhs1 (stmt);
-   sccopy_visit_op (op, outer_ops, scc_set, is_inner,
-  last_outer_op);
+   visit_op (op, outer_ops, scc_set, is_inner, last_outer_op);
break;
  default:
gcc_unreachable ();
@@ -613,19 +628,13 @@ sccopy_propagate ()
 }
 }
 
-/* Called when pass execution starts.  */
-
-static void
-init_sccopy (void)
+scc_copy_prop::scc_copy_prop ()
 {
   /* For propagated statements.  */
   dead_stmts = BITMAP_ALLOC (NULL);
 }
 
-/* Called before pass execution ends.  */
-
-static void
-finalize_sccopy (void)
+scc_copy_prop

Re: [PATCH] gimple ssa: Put SCCOPY logic into a class

2024-08-06 Thread Filip Kastl
On Tue 2024-08-06 15:14:32, Richard Biener wrote:
> On Tue, 6 Aug 2024, Filip Kastl wrote:
> 
> > Hello everybody,
> > 
> > In pr113054[1] Andrew said that he doesn't like the 'dead_stmts' static
> > variable I used when implementing the sccopy pass.  We agreed that wrapping
> > the relevant code from the pass in a class would be most likely the best
> > solution.  Here is a patch that does exactly that.  I waited until stage 1 
> > to
> > submit it.
> > 
> > Bootstrapped and regtested on x86_64.  Is the patch ok to be pushed to 
> > trunk?
> 
> OK.
> 
> Richard.
> 

Thanks, pushed.

Filip


[wwwdocs] gcc-15: Mention c++ header dependency changes () in porting_to.html

2024-08-06 Thread Filip Kastl
Hi,

I recently had to add '#include ' to a program to compile it with the
latest trunk GCC due to

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659176.html

So I thought I might create the GCC 15 porting_to.html page and add an entry
about this (I just copied the entry from GCC 14 porting_to.html).

Ha!  As I'm writing this I noticed that actually Jonathan predicted this and
suggested a corresponding porting_to.html entry.  Well, here it is :).

Validated with the W3 Validator.  Is this ok to be pushed?

Cheers,
Filip Kastl


-- 8< --


---
 htdocs/gcc-15/changes.html|  3 +-
 htdocs/gcc-15/porting_to.html | 53 +++
 2 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 htdocs/gcc-15/porting_to.html

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index bfd98496..0d2c0aaf 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -17,9 +17,8 @@
 
 This page is a "brief" summary of some of the huge number of improvements
 in GCC 15.
-
 
diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html
new file mode 100644
index ..43c3b302
--- /dev/null
+++ b/htdocs/gcc-15/porting_to.html
@@ -0,0 +1,53 @@
+
+
+
+
+
+Porting to GCC 15
+https://gcc.gnu.org/gcc.css";>
+
+
+
+Porting to GCC 15
+
+
+The GCC 15 release series differs from previous GCC releases in
+a number of ways. Some of these are a result
+of bug fixing, and some old behaviors have been intentionally changed
+to support new standards, or relaxed in standards-conforming ways to
+facilitate compilation or run-time performance.
+
+
+
+Some of these changes are user visible and can cause grief when
+porting to GCC 15. This document is an effort to identify common issues
+and provide solutions. Let us know if you have suggestions for improvements!
+
+
+Note: GCC 15 has not been released yet, so this document is
+a work-in-progress.
+
+
+
+C++ language issues
+
+Header dependency changes
+Some C++ Standard Library headers have been changed to no longer include
+other headers that were being used internally by the library.
+As such, C++ programs that used standard library components without
+including the right headers will no longer compile.
+
+
+The following headers are used less widely in libstdc++ and may need to
+be included explicitly when compiling with GCC 15:
+
+
+ <cstdint>
+  (for std::int8_t, std::int32_t etc.)
+
+
+
+
+
+
+
-- 
2.45.2



Re: [wwwdocs] gcc-15: Mention c++ header dependency changes () in porting_to.html

2024-08-07 Thread Filip Kastl
On Tue 2024-08-06 17:00:24, Gerald Pfeifer wrote:
> > +
> > +The following headers are used less widely in libstdc++ and may need to
> > +be included explicitly when compiling with GCC 15:
> > +
> > + <cstdint>
> > +  (for std::int8_t, std::int32_t etc.)
> > +
> 
> The text reads "headers", alas there only appears to be one right now?
> So "header is" (singular)?

Nice catch.  I didn't notice this plural/singular mismatch.  Hmm, I could do
this:

Some C++ Standard Library headers have been changed to no longer include
other headers that were being used internally by the library.
As such, C++ programs that used standard library components without
including the right headers will no longer compile.


In particular, the following header is used less widely in libstdc++ and may
need to be included explicitly when compiling with GCC 15:


 <cstdint>
  (for std::int8_t, std::int32_t etc.)



Here I only change the second paragraph.  The logic behind this is that the
first paragraph introduces the problem, therefore speaks more generally and
uses plural.  Then the second paragraph speaks about the particular header in
question and uses singular.

Or I could do this:

Some C++ Standard Library headers have been changed to no longer include the
<cstdint>/<stdint.h> header that was
being used internally by the library. As such, C++ programs that used standard
library components without including <cstdint> where needed
will no longer compile.




Any opinion on these two options?

Cheers,
Filip Kastl


[PATCH] contrib/check-params-in-docs.py: Ignore target-specific params

2024-04-12 Thread Filip Kastl
On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
> Hi!
> 
> On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
> > contrib/check-params-in-docs.py is a script that checks that all
> > options reported with ./gcc/xgcc -Bgcc --help=param are in
> > gcc/doc/invoke.texi and vice versa.
> 
> Eh, first time I'm hearing about this one!
> 
> (a) Shouldn't this be running as part of the GCC build process?
> 
> > gcn-preferred-vectorization-factor is in the manual but normally not
> > reported by --help, probably because I do not have gcn offload
> > configured.
> 
> No, because you've not been building GCC for GCN target.  ;-P
> 
> > This patch makes the script silently about this particular
> > fact.
> 
> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
> to how that's done for "skip aarch64 params"?
> 
> (c) ..., and shouldn't we likewise skip any "x86" ones?
> 
> (d) ..., or in fact any target specific ones, following after the generic
> section?  (Easily achieved with a special marker in
> 'gcc/doc/invoke.texi', just before:
> 
> The following choices of @var{name} are available on AArch64 targets:
> 
> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
> accordingly?

Hi,

I've made a patch to address (b), (c), (d).  I didn't adjust takewhile.  I
chose to do it differently since target-specific params in both invoke.texi and
--help=params have to be ignored.

The downside of this patch is that the script won't complain if someone adds a
target-specific param and doesn't document it.

What do you think?

Cheers,
Filip

-- 8< --

contrib/check-params-in-docs.py is a script that checks that all options
reported with gcc --help=params are in gcc/doc/invoke.texi and vice
versa.
gcc/doc/invoke.texi lists target-specific params but gcc --help=params
doesn't.  This meant that the script would mistakenly complain about
parms missing from --help=params.  Previously, the script was just set
to ignore aarch64 and gcn params which solved this issue only for x86.
This patch sets the script to ignore all target-specific params.

contrib/ChangeLog:

* check-params-in-docs.py: Ignore target specific params.

Signed-off-by: Filip Kastl 
---
 contrib/check-params-in-docs.py | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index f7879dd8e08..ccdb8d72169 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -38,6 +38,9 @@ def get_param_tuple(line):
 description = line[i:].strip()
 return (name, description)
 
+def target_specific(param):
+return param.split('-')[0] in ('aarch64', 'gcn', 'x86')
+
 
 parser = argparse.ArgumentParser()
 parser.add_argument('texi_file')
@@ -45,13 +48,16 @@ parser.add_argument('params_output')
 
 args = parser.parse_args()
 
-ignored = {'logical-op-non-short-circuit', 
'gcn-preferred-vectorization-factor'}
-params = {}
+ignored = {'logical-op-non-short-circuit'}
+help_params = {}
 
 for line in open(args.params_output).readlines():
 if line.startswith(' ' * 2) and not line.startswith(' ' * 8):
 r = get_param_tuple(line)
-params[r[0]] = r[1]
+help_params[r[0]] = r[1]
+
+# Skip target-specific params
+help_params = [x for x in help_params.keys() if not target_specific(x)]
 
 # Find section in .texi manual with parameters
 texi = ([x.strip() for x in open(args.texi_file).readlines()])
@@ -66,14 +72,13 @@ for line in texi:
 texi_params.append(line[len(token):])
 break
 
-# skip digits
+# Skip digits
 texi_params = [x for x in texi_params if not x[0].isdigit()]
-# skip aarch64 params
-texi_params = [x for x in texi_params if not x.startswith('aarch64')]
-sorted_params = sorted(texi_params)
+# Skip target-specific params
+texi_params = [x for x in texi_params if not target_specific(x)]
 
 texi_set = set(texi_params) - ignored
-params_set = set(params.keys()) - ignored
+params_set = set(help_params) - ignored
 
 success = True
 extra = texi_set - params_set
-- 
2.43.1



Re: [RFC] gimple ssa: VRP to mark switch default as unreachable instead of removing it [PR112687]

2024-09-01 Thread Filip Kastl
On Sun 2024-09-01 09:25:07, Jeff Law wrote:
> 
> 
> On 9/1/24 9:19 AM, Filip Kastl wrote:
> > (I'm Cc-ing Diego since he originally contributed the VRP pass and Jeff 
> > because
> > I've seen him in git blame on many lines of vr-values.cc around the place I
> > would like to make modifications)
> > 
> > Hello,
> > 
> > In this RFC I'd like to propose the following: When the value range 
> > statement
> > simplification machinery encounters a switch with a default case that never
> > executes, it should mark the default case with __builtin_unreachable 
> > instead of
> > removing the default case.  I think that should be the "canonical"
> > representation of switches with this property.
> > 
> > In terms of changes made to GCC code, this would be a small patch.
> > 
> > I'd like to hear other people's opinions on this.  Perhaps there are
> > implications of this change or some other issues with the idea which I don't
> > see.
> The right thing to do is to eliminate the unreachable code.   We don't
> replace the body of an if (0)  with a __builtin_unreachable.  We remove it.
> Switch statements should be no different here.
> 
> If that inhibits later analysis or optimization then that's an indicator
> that we need to improve our frameworks, not that we should keep dead code
> lying around in the IL.
> 
> Jeff
> 
> 

I see.  This is not the way forward.  While reading your reply it occured to me
that it could be possible to use value range info in switch conversion to get
the switches in question optimized without relying on __bultin_unreachable.
I'll try that.

Thanks,
Filip


[RFC] gimple ssa: VRP to mark switch default as unreachable instead of removing it [PR112687]

2024-09-01 Thread Filip Kastl
(I'm Cc-ing Diego since he originally contributed the VRP pass and Jeff because
I've seen him in git blame on many lines of vr-values.cc around the place I
would like to make modifications)

Hello,

In this RFC I'd like to propose the following: When the value range statement
simplification machinery encounters a switch with a default case that never
executes, it should mark the default case with __builtin_unreachable instead of
removing the default case.  I think that should be the "canonical"
representation of switches with this property.

In terms of changes made to GCC code, this would be a small patch.

I'd like to hear other people's opinions on this.  Perhaps there are
implications of this change or some other issues with the idea which I don't
see.


Initial example
===

Consider this C code

switch (v & 3) {
case 0: return 0;
case 1: return 1;
case 2: return 2;
case 3: return 3;
default: __builtin_unreachable();
}

Currently, when early vrp is processing this code
(vr-values.cc:simplify_switch_using_ranges), it sees that the index expression
can only have values [0,3].  Since the numbered cases cover this range
entirely, vrp decides to transform the switch so that it only has these 4
numbered cases and removes the default case.  Because GIMPLE switches have to
have a default, vrp converts case 0 to the default case.  The switch then looks
like this

switch (v & 3) {
default: return 0;
case 1: return 1;
case 2: return 2;
case 3: return 3;
}

However, I think it would be better if the default case with
__builtin_unreachable remained.  I think it conveys more information to the
passes running after vrp.


What would this solve
=

Notice that the initial example switch doesn't do anything.  It is just an
identity mapping.  The example is actually taken from PR112687.  The PR shows
that currently GCC (at least in some cases) fails to optimize similar identity
switches.  Currently the initial example looks like this at the end of the
GIMPLE pipeline:

int unopt (int v)
{
  int _1;
  int _2;
  unsigned int _6;
  unsigned int _7;
 
  
  _1 = v_3(D) & 3;
  _6 = (unsigned int) _1;
  _7 = _6 + 4294967295;
  if (_7 <= 2)
goto ;
  else
goto ;
 
  
 
  
  # _2 = PHI <_1(2), 0(3)>
  return _2;
 
}

While it could look like this:

int unopt (int v)
{
  int _1;
 
  
  _1 = v_2(D) & 3;
  return _1;
 
}

Here is another example of similar code that currently doesn't get optimized...

int unopt(int v)
{
switch (v & 3) {
case 0: return 0;
case 1: return 2;
case 2: return 4;
case 3: return 6;
default: return -1;
}
}

...and could get optimized into something like this if VRP marked the default
case as unreachable.

int opt(int v)
{
return (v & 3) * 2;
}

I have originally thought about solving this problem in the switch conversion
pass.  However, I think that would require basically detecting that the "take
away the default case" transformation happened and then reversing it.  That
seems pretty messy to me.  Also, as I already mentioned, I think that the
canonical representation of this kind of switches should be some representation
that marks the default case as unreachable (with __builtin_unreachable or
somehow else).


How I'd do this
===

I'm attaching a work-in-progress patch to this RFC.

I'm aware that in the final version it would be good to also modify the code
surrounding the blob that I insert in the patch.  For example,
cleanup_edges_and_switches() contains code dealing with the situation when the
default case gets removed, which is a situation that never happens with my
patch applied.

I'll appreciate comments on if I should place the blob of code creating the
__builtin_unreachable somewhere else in the file.  I'll also appreciate any
other comments on the patch.


Cheers,
Filip Kastl


P.S. While writing this I realized that in this case...

int unopt(int v)
{
switch (v & 3) {
default: return 0;
case 1: return 1;
case 2: return 2;
case 3: return 3;
}
}

...GCC fails to optimize the switch even with my patch applied.  This is
because here VRP sees a default case that actually gets executed sometimes and
leaves the default case be.  We could make the following change to remedy this:
Whenever VRP encounters a default that only corresponds to one possible value
of the index expression of the switch, it could create a numbered case
corresponding to this value and mark the default case as unreachable.


-- 8< --


diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index cf273a3fc62..b297e782b09 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -1368,6 +1368,32 @@ simplify_using_ranges::simplify

[PATCH] gimple ssa: Don't use __builtin_popcount in switch exp transform [PR116616]

2024-09-06 Thread Filip Kastl
Hi,

bootstrapped and regtested on x86_64-linux.  Ok to push?

Thanks,
Filip Kastl


 8< 


Switch exponential transformation in the switch conversion pass
currently generates

tmp1 = __builtin_popcount (var);
tmp2 = tmp1 == 1;

when inserting code to determine if var is power of two.  If the target
doesn't support expanding the builtin as special instructions switch
conversion relies on this whole pattern being expanded as bitmagic.
However, it is possible that other GIMPLE optimizations move the two
statements of the pattern apart.  In that case the builtin becomes a
libgcc call in the final binary.  The call is slow and in case of
freestanding programs can result in linking error (this bug was
originally found while compiling Linux kernel).

This patch modifies switch conversion to insert the bitmagic
(var ^ (var - 1)) > (var - 1) instead of the builtin.

gcc/ChangeLog:

PR tree-optimization/116616
* tree-switch-conversion.cc (can_pow2p): Remove this function.
(gen_pow2p): Generate bitmagic instead of a builtin.  Remove the
TYPE parameter.
(switch_conversion::is_exp_index_transform_viable): Don't call
can_pow2p.
(switch_conversion::exp_index_transform): Call gen_pow2p without
the TYPE parameter.
* tree-switch-conversion.h: Remove
m_exp_index_transform_pow2p_type.

gcc/testsuite/ChangeLog:

PR tree-optimization/116616
* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of the POPCOUNT internal fn call.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-1.c  |  7 +-
 gcc/tree-switch-conversion.cc | 82 ---
 gcc/tree-switch-conversion.h  |  6 +-
 3 files changed, 19 insertions(+), 76 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
index a8c9e03e515..4832f5b52c3 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -1,10 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt 
-mbmi" } */
+/* { dg-options "-O2 -fdump-tree-switchconv -mbmi" } */
 
 /* Checks that exponential index transform enables switch conversion to convert
-   this switch into an array lookup.  Also checks that the "index variable is a
-   power of two" check has been generated and that it has been later expanded
-   into an internal function.  */
+   this switch into an array lookup.  */
 
 int foo(unsigned bar)
 {
@@ -30,4 +28,3 @@ int foo(unsigned bar)
 }
 
 /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
-/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index c1332a26094..2e42611f9fd 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -133,75 +133,27 @@ gen_log2 (tree op, location_t loc, tree *result, tree 
type)
   return stmts;
 }
 
-/* Is it possible to efficiently check that a value of TYPE is a power of 2?
-
-   If yes, returns TYPE.  If no, returns NULL_TREE.  May also return another
-   type.  This indicates that logarithm of the variable can be computed but
-   only after it is converted to this type.
-
-   Also see gen_pow2p.  */
-
-static tree
-can_pow2p (tree type)
-{
-  /* __builtin_popcount supports the unsigned type or its long and long long
- variants.  Choose the smallest out of those that can still fit TYPE.  */
-  int prec = TYPE_PRECISION (type);
-  int i_prec = TYPE_PRECISION (unsigned_type_node);
-  int li_prec = TYPE_PRECISION (long_unsigned_type_node);
-  int lli_prec = TYPE_PRECISION (long_long_unsigned_type_node);
-
-  if (prec <= i_prec)
-return unsigned_type_node;
-  else if (prec <= li_prec)
-return long_unsigned_type_node;
-  else if (prec <= lli_prec)
-return long_long_unsigned_type_node;
-  else
-return NULL_TREE;
-}
-
-/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
-   special optabs if target supports them.  Return the result as a
-   boolean_type_node ssa name through RESULT.  Assumes that OP's value will
-   be non-negative.  The generated check may give arbitrary answer for negative
-   values.
-
-   Before computing the check, OP may have to be converted to another type.
-   This should be specified in TYPE.  Use can_pow2p to decide what this type
-   should be.
-
-   Should only be used if can_pow2p returns true for type of OP.  */
+/* Build a sequence of gimple statements checking that OP is a power of 2.
+   Return the result as a boolean_type_node ssa name through RESULT.  Assumes
+   that OP's value will be non-negative.  The generated check may give
+   arbitrary answer for negative v

[PATCH v2] gimple ssa: Don't use __builtin_popcount in switch exp transform

2024-09-10 Thread Filip Kastl
Hi,

This is the second version of this patch.  See the previous version here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662462.html

In this verison I added a conversion to unsigned type to the bitmagic that I
generate as Andrew suggested.

Bootstrapped and regtested on x86_64-linux.  Ok to push?

Thanks,
Filip Kastl


--- 8< ---


Switch exponential transformation in the switch conversion pass
currently generates

tmp1 = __builtin_popcount (var);
tmp2 = tmp1 == 1;

when inserting code to determine if var is power of two.  If the target
doesn't support expanding the builtin as special instructions switch
conversion relies on this whole pattern being expanded as bitmagic.
However, it is possible that other GIMPLE optimizations move the two
statements of the pattern apart.  In that case the builtin becomes a
libgcc call in the final binary.  The call is slow and in case of
freestanding programs can result in linking error (this bug was
originally found while compiling Linux kernel).

This patch modifies switch conversion to insert the bitmagic
(var ^ (var - 1)) > (var - 1) instead of the builtin.

gcc/ChangeLog:

PR tree-optimization/116616
* tree-switch-conversion.cc (can_pow2p): Remove this function.
(gen_pow2p): Generate bitmagic instead of a builtin.  Remove the
TYPE parameter.
(switch_conversion::is_exp_index_transform_viable): Don't call
can_pow2p.
(switch_conversion::exp_index_transform): Call gen_pow2p without
the TYPE parameter.
* tree-switch-conversion.h: Remove
m_exp_index_transform_pow2p_type.

gcc/testsuite/ChangeLog:

PR tree-optimization/116616
* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of the POPCOUNT internal fn call.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-1.c  |  7 +-
 gcc/tree-switch-conversion.cc | 84 +--
 gcc/tree-switch-conversion.h  |  6 +-
 3 files changed, 23 insertions(+), 74 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
index a8c9e03e515..4832f5b52c3 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -1,10 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt 
-mbmi" } */
+/* { dg-options "-O2 -fdump-tree-switchconv -mbmi" } */
 
 /* Checks that exponential index transform enables switch conversion to convert
-   this switch into an array lookup.  Also checks that the "index variable is a
-   power of two" check has been generated and that it has been later expanded
-   into an internal function.  */
+   this switch into an array lookup.  */
 
 int foo(unsigned bar)
 {
@@ -30,4 +28,3 @@ int foo(unsigned bar)
 }
 
 /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
-/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index c1332a26094..00426d4 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -133,75 +133,33 @@ gen_log2 (tree op, location_t loc, tree *result, tree 
type)
   return stmts;
 }
 
-/* Is it possible to efficiently check that a value of TYPE is a power of 2?
-
-   If yes, returns TYPE.  If no, returns NULL_TREE.  May also return another
-   type.  This indicates that logarithm of the variable can be computed but
-   only after it is converted to this type.
-
-   Also see gen_pow2p.  */
-
-static tree
-can_pow2p (tree type)
-{
-  /* __builtin_popcount supports the unsigned type or its long and long long
- variants.  Choose the smallest out of those that can still fit TYPE.  */
-  int prec = TYPE_PRECISION (type);
-  int i_prec = TYPE_PRECISION (unsigned_type_node);
-  int li_prec = TYPE_PRECISION (long_unsigned_type_node);
-  int lli_prec = TYPE_PRECISION (long_long_unsigned_type_node);
-
-  if (prec <= i_prec)
-return unsigned_type_node;
-  else if (prec <= li_prec)
-return long_unsigned_type_node;
-  else if (prec <= lli_prec)
-return long_long_unsigned_type_node;
-  else
-return NULL_TREE;
-}
-
-/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
-   special optabs if target supports them.  Return the result as a
-   boolean_type_node ssa name through RESULT.  Assumes that OP's value will
-   be non-negative.  The generated check may give arbitrary answer for negative
-   values.
-
-   Before computing the check, OP may have to be converted to another type.
-   This should be specified in TYPE.  Use can_pow2p to decide what this type
-   should be.
-
-   Should only be used if can_pow2p returns true for type of OP.  */
+/* Build a seq

[COMMITED] MAINTAINERS: Fix name order

2024-10-15 Thread Filip Kastl
ChangeLog:

* MAINTAINERS: Fix Write After Approval name order.

Signed-off-by: Filip Kastl 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cf1cf78e16c..269ac2ea6b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -760,7 +760,6 @@ Ankur Saini arsenic 

 Hariharan Sandanagobalane   hariharans  
 Richard Sandiford   rsandifo
 Iain Sandoe iains   
-Feng Xuefxue
 Duncan Sandsbaldrick
 Sujoy Saraswati ssaraswati  
 Trevor Saunders tbsaunde
@@ -880,6 +879,7 @@ Ruoyao Xi   xry111  

 Mingjie Xingxmj 
 Chenghua Xu paulhua 
 Li Xu   -   
+Feng Xuefxue
 Canqun Yang canqun  
 Fei Yangfyang   
 Jeffrey Yasskin jyasskin
-- 
2.46.0



Re: [PATCH 2/2] Only do switch bit test clustering when multiple labels point to same bb

2024-10-17 Thread Filip Kastl
Hi Andi,

This seems like a reasonable way to avoid the specific issue in PR117091 and
generally speed up switch lowering of switches with all cases unique.  I cannot
approve this but want to share some comments.

On Wed 2024-10-16 17:50:59, Andi Kleen wrote:
> diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
> index 96ce1c380a59..4151d1bb520e 100644
> --- a/gcc/gimple-if-to-switch.cc
> +++ b/gcc/gimple-if-to-switch.cc
> @@ -254,7 +254,7 @@ if_chain::is_beneficial ()
>else
>  output.release ();
>  
> -  output = bit_test_cluster::find_bit_tests (filtered_clusters);
> +  output = bit_test_cluster::find_bit_tests (filtered_clusters, 2);

Maybe it would be nicer to not have max_c as a parameter of find_bit_tests()
but just guard the call to find_bit_tests() in analyze_switch_statement() with
max_c > 2.  Since we don't have max_c in if_chain::is_beneficial(), having
max_c as parameter leads to this constant 2 in this call that will possibly
confuse people reading if_chain::is_beneficial().  But this is a minor thing
and I guess your way (max_c as a parameter) is also fine by me.

>r = output.length () < filtered_clusters.length ();
>if (r)
>  dump_clusters (&output, "BT can be built");
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index 00426d46..bb7b8cf215a3 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -1772,12 +1772,13 @@ jump_table_cluster::is_beneficial (const vec *> &,
>  }
>  
>  /* Find bit tests of given CLUSTERS, where all members of the vector
> -   are of type simple_cluster.  New clusters are returned.  */
> +   are of type simple_cluster. max_c is the max number of cases per label.

There should be two spaces before "max_c".  Also it would be nice if MAX_C was
written in uppercase for consistency with other function comments in
tree-switch-conversion.cc.

> @@ -577,8 +577,9 @@ public:
>bool try_switch_expansion (vec &clusters);
>/* Compute the number of case labels that correspond to each outgoing edge 
> of
>   switch statement.  Record this information in the aux field of the edge.
> + Returns max number of cases per edge.
>   */

I would specify "approx max number" instead of "max number" here.

Otherwise looks good to me.

Cheers,
Filip Kastl


[PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-11 Thread Filip Kastl
Hi,

Bootstrapped and regtested on x86_64 linux.  I also tested that all the new
calls can be linked with the AOCL LibM library.  Ok to push?

Thanks,
Filip Kastl


-- 8< --


We currently support generating vectorized math calls to the AMD core
math library (ACML) (-mveclibabi=acml).  That library is end-of-life and
its successor is the math library from AMD Optimizing CPU Libraries
(AOCL).

This patch adds support for AOCL (-mveclibabi=aocl).  That significantly
broadens the range of vectorized math functions optimized for AMD CPUs
that GCC can generate calls to.

See the edit to invoke.texi for a complete list of added functions.
Compared to the list of functions in AOCL LibM docs I left out the
sincos, linearfrac, powx, sqrt and fabs operations.  I also left out all
the functions working with arrays and amd_vrd2_expm1() (the AMD docs
list the function but I wasn't able to link calls to it with the current
version of the library).

gcc/ChangeLog:

PR target/56504
* config/i386/i386-options.cc (ix86_option_override_internal):
Add ix86_veclibabi_type_aocl case.
* config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
ix86_veclibabi_aocl().
* config/i386/i386-opts.h (enum ix86_veclibabi): Add
ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
* config/i386/i386.cc (ix86_veclibabi_aocl): New function.
* config/i386/i386.opt: Add the 'aocl' type.
* doc/invoke.texi: Document -mveclibabi=aocl.

gcc/testsuite/ChangeLog:

PR target/56504
* gcc.target/i386/vectorize-aocl1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/config/i386/i386-options.cc   |   4 +
 gcc/config/i386/i386-options.h|   1 +
 gcc/config/i386/i386-opts.h   |   3 +-
 gcc/config/i386/i386.cc   | 143 +++
 gcc/config/i386/i386.opt  |   3 +
 gcc/doc/invoke.texi   |  57 +++--
 .../gcc.target/i386/vectorize-aocl1.c | 224 ++
 7 files changed, 419 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vectorize-aocl1.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 603166d249c..76a20179a36 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2877,6 +2877,10 @@ ix86_option_override_internal (bool main_args_p,
ix86_veclib_handler = &ix86_veclibabi_acml;
break;
 
+  case ix86_veclibabi_type_aocl:
+   ix86_veclib_handler = &ix86_veclibabi_aocl;
+   break;
+
   default:
gcc_unreachable ();
   }
diff --git a/gcc/config/i386/i386-options.h b/gcc/config/i386/i386-options.h
index 0d448ef9f15..591a6152c01 100644
--- a/gcc/config/i386/i386-options.h
+++ b/gcc/config/i386/i386-options.h
@@ -60,6 +60,7 @@ void ix86_simd_clone_adjust (struct cgraph_node *node);
 extern tree (*ix86_veclib_handler) (combined_fn, tree, tree);
 extern tree ix86_veclibabi_svml (combined_fn, tree, tree);
 extern tree ix86_veclibabi_acml (combined_fn, tree, tree);
+extern tree ix86_veclibabi_aocl (combined_fn, tree, tree);
 
 enum ix86_function_specific_strings
 {
diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 35542b28936..69fcd82bf47 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -87,7 +87,8 @@ enum asm_dialect {
 enum ix86_veclibabi {
   ix86_veclibabi_type_none,
   ix86_veclibabi_type_svml,
-  ix86_veclibabi_type_acml
+  ix86_veclibabi_type_acml,
+  ix86_veclibabi_type_aocl
 };
 
 enum stack_protector_guard {
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 6ac3a5d55f2..8ccbc8bbc07 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -19882,6 +19882,149 @@ ix86_veclibabi_acml (combined_fn fn, tree type_out, 
tree type_in)
   return new_fndecl;
 }
 
+/* Handler for an AOCL-LibM-style interface to
+   a library with vectorized intrinsics.  */
+
+tree
+ix86_veclibabi_aocl (combined_fn fn, tree type_out, tree type_in)
+{
+  char name[20] = "amd_vr";
+  int name_len = 6;
+  tree fntype, new_fndecl, args;
+  unsigned arity;
+  const char *bname;
+  machine_mode el_mode, in_mode;
+  int n, in_n;
+
+  /* AOCL-LibM is 64bits only.  It is also only suitable for unsafe math only
+ as it trades off some accuracy for increased performance.  */
+  if (!TARGET_64BIT
+  || !flag_unsafe_math_optimizations)
+return NULL_TREE;
+
+  el_mode = TYPE_MODE (TREE_TYPE (type_out));
+  n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (el_mode != in_mode
+  || n != in_n)
+return NULL_TREE;
+
+  gcc_checking_assert (n > 0);
+
+  /* Decide whether there exists a function for the combination of FN, the mode
+ and the vector width.  Return early if it doesn't.  */
+
+  if (el_mode != DFmode

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Filip Kastl
Hi Honza,

Here is the second version of the patch.

On Mon 2024-11-11 18:31:47, Jan Hubicka wrote:
> > We currently support generating vectorized math calls to the AMD core
> > math library (ACML) (-mveclibabi=acml).  That library is end-of-life and
> > its successor is the math library from AMD Optimizing CPU Libraries
> > (AOCL).
> > 
> > This patch adds support for AOCL (-mveclibabi=aocl).  That significantly
> > broadens the range of vectorized math functions optimized for AMD CPUs
> > that GCC can generate calls to.
> > 
> > See the edit to invoke.texi for a complete list of added functions.
> > Compared to the list of functions in AOCL LibM docs I left out the
> > sincos, linearfrac, powx, sqrt and fabs operations.  I also left out all
> Why those are out?

I've modified the commit message so that it lists all the reasons.

> > the functions working with arrays and amd_vrd2_expm1() (the AMD docs
> > list the function but I wasn't able to link calls to it with the current
> > version of the library).
> > 
> > gcc/ChangeLog:
> > 
> > PR target/56504
> > * config/i386/i386-options.cc (ix86_option_override_internal):
> > Add ix86_veclibabi_type_aocl case.
> > * config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
> > ix86_veclibabi_aocl().
> > * config/i386/i386-opts.h (enum ix86_veclibabi): Add
> > ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
> > * config/i386/i386.cc (ix86_veclibabi_aocl): New function.
> > * config/i386/i386.opt: Add the 'aocl' type.
> > * doc/invoke.texi: Document -mveclibabi=aocl.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/56504
> > * gcc.target/i386/vectorize-aocl1.c: New test.
> > 
> > +  new_fndecl = build_decl (BUILTINS_LOCATION,
> > +  FUNCTION_DECL, get_identifier (name), fntype);
> > +  TREE_PUBLIC (new_fndecl) = 1;
> > +  DECL_EXTERNAL (new_fndecl) = 1;
> > +  DECL_IS_NOVOPS (new_fndecl) = 1;
> > +  TREE_READONLY (new_fndecl) = 1;
> 
> I see that NOVOPS is copied from the older implementation.  I think
> const (which is specified by TREE_READONLY = 1) should be sufficient.
> 

I've removed this line.

Bootstrapped and regtested on x86_64 linux.  Ok to commit?

Thanks,
Filip Kastl


-- 8< --


We currently support generating vectorized math calls to the AMD core
math library (ACML) (-mveclibabi=acml).  That library is end-of-life and
its successor is the math library from AMD Optimizing CPU Libraries
(AOCL).

This patch adds support for AOCL (-mveclibabi=aocl).  That significantly
broadens the range of vectorized math functions optimized for AMD CPUs
that GCC can generate calls to.

See the edit to invoke.texi for a complete list of added functions.
Compared to the list of functions in AOCL LibM docs I left out these
vectorized function families:

- sincos and all functions working with arrays ... Because these
  functions have pointer arguments and that would require a bigger
  rework of ix86_veclibabi_aocl().  Also, I'm not sure if GCC even ever
  generates calls to these functions.
- linearfrac ... Because these functions are specific to the AMD
  library.  There's no equivalent glibc function nor GCC internal
  function nor GCC built-in.
- powx, sqrt, fabs ... Because GCC doesn't vectorize these functions.

I also left amd_vrd2_expm1() (the AMD docs list the function but I
wasn't able to link calls to it with the current version of the
library).

gcc/ChangeLog:

PR target/56504
* config/i386/i386-options.cc (ix86_option_override_internal):
Add ix86_veclibabi_type_aocl case.
* config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
ix86_veclibabi_aocl().
* config/i386/i386-opts.h (enum ix86_veclibabi): Add
ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
* config/i386/i386.cc (ix86_veclibabi_aocl): New function.
* config/i386/i386.opt: Add the 'aocl' type.
* doc/invoke.texi: Document -mveclibabi=aocl.

gcc/testsuite/ChangeLog:

PR target/56504
* gcc.target/i386/vectorize-aocl1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/config/i386/i386-options.cc   |   4 +
 gcc/config/i386/i386-options.h|   1 +
 gcc/config/i386/i386-opts.h   |   3 +-
 gcc/config/i386/i386.cc   | 142 +++
 gcc/config/i386/i386.opt  |   3 +
 gcc/doc/invoke.texi   |  57 +++--
 .../gcc.target/i386/vectorize-aocl1.c | 224 ++
 7 files changed, 418 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vectorize-aocl1.c

di

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Filip Kastl
On Wed 2024-11-13 15:18:32, Jan Hubicka wrote:
> > - sincos and all functions working with arrays ... Because these
> >   functions have pointer arguments and that would require a bigger
> >   rework of ix86_veclibabi_aocl().  Also, I'm not sure if GCC even ever
> >   generates calls to these functions.
> GCC is able to recognize sin and cos calls and turn them into single sincos.
> 
> double a[2];
> void test (double b)
> {
> a[0]=__builtin_sin (b);
> a[1]=__builtin_cos (b);
> }
> 
> There is sincos pass for this.
> 
> For functions working on arrays, I think we would need to pattern match
> them like we do memset/memcpy in -ftree-loop-distribute-patterns...
> > - linearfrac ... Because these functions are specific to the AMD
> >   library.  There's no equivalent glibc function nor GCC internal
> >   function nor GCC built-in.
> > - powx, sqrt, fabs ... Because GCC doesn't vectorize these functions.
> sqrt/fabs are vectorized, but produced inline since these are easily
> doable with SSE/AVX.

Ah, ok.

Since you said that the patch is ok in response to the first version already,
I've pushed the second version of the patch and changed "Because GCC doesn't
vectorize these functions." to "Because GCC doesn't vectorize these functions
into calls and uses instruction instead."

I hope that's ok.

Cheers,
Filip


[PING] [PATCH v2] gimple ssa: Don't use __builtin_popcount in switch exp transform

2024-09-23 Thread Filip Kastl
Hi,

I'd like to ping my patch.  You can find it here

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662744.html

Btw I forgot to include [PR116616] in the subject.  Hope I didn't confuse
people.  I will take care to include the tag in the git commit message.

Thanks,
Filip Kastl


Re: [PATCH] Update email in MAINTAINERS file.

2024-09-24 Thread Filip Kastl
On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote:
> From: Aldy Hernandez 
> 
> ChangeLog:
> 
>   * MAINTAINERS: Update email and add myself to DCO.
> ---
>  MAINTAINERS | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cfd96c9f33e..e9fafaf45a7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -116,7 +116,7 @@ riscv port  Jim Wilson  
> 
>  rs6000/powerpc port David Edelsohn  
>  rs6000/powerpc port Segher Boessenkool  
>  rs6000/powerpc port Kewen Lin   
> -rs6000 vector extns Aldy Hernandez  
> +rs6000 vector extns Aldy Hernandez  
>  rx port Nick Clifton
>  s390 port   Ulrich Weigand  
>  s390 port   Andreas Krebbel 
> @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely 
> 
>  c++ runtime libs special modes  François Dumont 
>  fixincludes Bruce Korb  
>  *gimpl* Jakub Jelinek   
> -*gimpl* Aldy Hernandez  
> +*gimpl* Aldy Hernandez  
>  *gimpl* Jason Merrill   
>  gcse.cc Jeff Law
>  global opt frameworkJeff Law
> @@ -240,7 +240,7 @@ option handling Joseph Myers
> 
>  middle-end  Jeff Law
>  middle-end  Ian Lance Taylor
>  middle-end  Richard Biener  
> -*vrp, rangerAldy Hernandez  
> +*vrp, rangerAldy Hernandez  
>  *vrp, rangerAndrew MacLeod  
>  tree-ssaAndrew MacLeod  
>  tree browser/unparser   Sebastian Pop   
> @@ -518,7 +518,7 @@ Daniel Hellstromdanielh 
> 
>  Fergus Henderson-   
>  Richard Henderson   rth 
>  Stuart Hendersonshenders
> -Aldy Hernandez  aldyh   
> +Aldy Hernandez  aldyh   
>  Philip Herron   redbrain
> 
>  Marius Hillenbrand  -   
>  Matthew Hiller  -   
> @@ -948,3 +948,4 @@ Jonathan Wakely 
> 
>  Alexander Westbrooks
>  Chung-Ju Wu 
>  Pengxuan Zheng  
> +Aldy Hernandez  
> -- 
> 2.43.0
> 

Hi Aldy,

Could you move your entry in the DCO list so that it respects surname
alphabetical order, please?  Your name should be between Robin Dapp and Michal
Jires.

Thanks,
Filip Kastl


Re: [PATCH] Update email in MAINTAINERS file.

2024-09-24 Thread Filip Kastl
On Tue 2024-09-24 11:43:47, Aldy Hernandez wrote:
> Pushed attached patch.
> 
> Thanks.
> Aldy
> 

Nice.

Thanks!
Filip

> On Tue, Sep 24, 2024 at 10:09 AM Filip Kastl  wrote:
> 
> > On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote:
> > > From: Aldy Hernandez 
> > >
> > > ChangeLog:
> > >
> > >   * MAINTAINERS: Update email and add myself to DCO.
> > > ---
> > >  MAINTAINERS | 9 +
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index cfd96c9f33e..e9fafaf45a7 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -116,7 +116,7 @@ riscv port  Jim Wilson  <
> > jim.wilson@gmail.com>
> > >  rs6000/powerpc port David Edelsohn  
> > >  rs6000/powerpc port Segher Boessenkool  <
> > seg...@kernel.crashing.org>
> > >  rs6000/powerpc port Kewen Lin   
> > > -rs6000 vector extns Aldy Hernandez  
> > > +rs6000 vector extns Aldy Hernandez  
> > >  rx port Nick Clifton
> > >  s390 port   Ulrich Weigand  
> > >  s390 port   Andreas Krebbel 
> > > @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely <
> > jwak...@redhat.com>
> > >  c++ runtime libs special modes  François Dumont 
> > >  fixincludes Bruce Korb  
> > >  *gimpl* Jakub Jelinek   
> > > -*gimpl* Aldy Hernandez  
> > > +*gimpl* Aldy Hernandez  
> > >  *gimpl* Jason Merrill   
> > >  gcse.cc Jeff Law
> > >  global opt frameworkJeff Law
> > > @@ -240,7 +240,7 @@ option handling Joseph Myers<
> > josmy...@redhat.com>
> > >  middle-end  Jeff Law
> > >  middle-end  Ian Lance Taylor
> > >  middle-end  Richard Biener  
> > > -*vrp, rangerAldy Hernandez  
> > > +*vrp, rangerAldy Hernandez  
> > >  *vrp, rangerAndrew MacLeod  
> > >  tree-ssaAndrew MacLeod  
> > >  tree browser/unparser   Sebastian Pop   
> > > @@ -518,7 +518,7 @@ Daniel Hellstromdanielh <
> > dan...@gaisler.com>
> > >  Fergus Henderson-   
> > >  Richard Henderson   rth 
> > >  Stuart Hendersonshenders
> > > -Aldy Hernandez  aldyh   
> > > +Aldy Hernandez  aldyh   
> > >  Philip Herron   redbrain<
> > herron.phi...@googlemail.com>
> > >  Marius Hillenbrand  -   
> > >  Matthew Hiller  -   
> > > @@ -948,3 +948,4 @@ Jonathan Wakely     <
> > jwak...@redhat.com>
> > >  Alexander Westbrooks > >
> > >  Chung-Ju Wu 
> > >  Pengxuan Zheng  <
> > quic_pzh...@quicinc.com>
> > > +Aldy Hernandez  
> > > --
> > > 2.43.0
> > >
> >
> > Hi Aldy,
> >
> > Could you move your entry in the DCO list so that it respects surname
> > alphabetical order, please?  Your name should be between Robin Dapp and
> > Michal
> > Jires.
> >
> > Thanks,
> > Filip Kastl
> >
> >

> From 34366176046351250e1beb578664d926fbdd50c9 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Tue, 24 Sep 2024 11:40:52 +0200
> Subject: [PATCH] Alphabetize my entry in MAINTAINER's DCO list.
> 
> ChangeLog:
> 
>   * MAINTAINERS: Move my entry in DCO list into alphabetical order.
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3b4cf9d20d8..47b5915e9f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -917,6 +917,7 @@ information.
>  Juergen Christ  
>  Robin Dapp  
>  Robin Dapp  
> +Aldy Hernandez  
>  Michal Jires
>  Matthias Kretz  
>  Prathamesh Kulkarni 
> @@ -949,4 +950,3 @@ Jonathan Wakely 
> 
>  Alexander Westbrooks
>  Chung-Ju Wu 
>  Pengxuan Zheng  
> -Aldy Hernandez  
> -- 
> 2.43.0
> 



[PING] [PATCH] contrib: Set check-params-in-docs.py to skip tables of values of a param

2024-09-18 Thread Filip Kastl
Hi,

I'd like to ping my patch.  You can find it here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661362.html

Btw not sure if I could Cc anyone on this.  AFAIK no one is a maintainer of the
./contrib scripts.

Thanks,
Filip Kastl


[COMMITED] [PATCH v2] contrib: Set check-params-in-docs.py to skip tables of values of a param

2024-09-18 Thread Filip Kastl
Thanks for the approval Richard!

I've incorporated your suggestion to remove the "digits skip" code and I've
pushed the patch.

Cheers,
Filip Kastl


-- 8< --


Currently check-params-in-docs.py reports extra params being listed in
invoke.texi.  However, those aren't actual params but items in a table of
possible values of the aarch64-autove-preference param.

This patch changes check-params-in-docs.py to ignore similar tables.

contrib/ChangeLog:

* check-params-in-docs.py: Skip tables of values of a param.
Remove code that skips items beginning with a number.

Signed-off-by: Filip Kastl 
---
 contrib/check-params-in-docs.py | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index ccdb8d72169..102f0e64e98 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -66,14 +66,23 @@ texi = takewhile(lambda x: '@node Instrumentation Options' 
not in x, texi)
 texi = list(texi)[1:]
 
 texi_params = []
+skip = False
 for line in texi:
+# Skip @table @samp sections of manual where values of a param are usually
+# listed
+if skip:
+if line.startswith('@end table'):
+skip = False
+continue
+elif line.startswith('@table @samp'):
+skip = True
+continue
+
 for token in ('@item ', '@itemx '):
 if line.startswith(token):
 texi_params.append(line[len(token):])
 break
 
-# Skip digits
-texi_params = [x for x in texi_params if not x[0].isdigit()]
 # Skip target-specific params
 texi_params = [x for x in texi_params if not target_specific(x)]
 
-- 
2.46.0



Re: [PATCH 1/2] doc: install: document bootstrap-ubsan

2024-11-06 Thread Filip Kastl
Hi,

I'm not a maintainer but I think we certainly want to have bootstrap-ubsan
documented and the patch looks good to me.

Cheers,
Filip

On Thu 2024-10-31 21:11:13, Sam James wrote:
> gcc/ChangeLog:
>   PR other/116948
> 
>   * doc/install.texi (Building a native compiler): Mention 
> bootstrap-ubsan.
> ---
>  gcc/doc/install.texi | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index f19d55f76c3d..85896721a0ac 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -3151,6 +3151,10 @@ Compiles GCC itself using HWAddress Sanitization in 
> order to catch invalid
>  memory accesses within the GCC code.  This option is only available on 
> AArch64
>  systems that are running Linux kernel version 5.4 or later.
>  
> +@item @samp{bootstrap-ubsan}
> +Compiles GCC itself using Undefined Behavior Sanitization in order to catch
> +undefined behavior within the GCC code.
> +
>  @end table
>  
>  @section Building a cross compiler
> -- 
> 2.47.0
> 


Re: [PATCH 2/2] doc: install: document UBSAN_OPTIONS

2024-11-06 Thread Filip Kastl
On Wed 2024-11-06 10:19:12, Sam James wrote:
> Sam James  writes:
> 
> > Explain that 'bootstrap-ubsan' won't abort on errors by default and how
> > to override that by setting UBSAN_OPTIONS.
> >
> > gcc/ChangeLog:
> > PR other/116948
> >
> > * doc/install.texi (Building a native compiler): Document UBSAN_OPTIONS.
> > ---
> >  gcc/doc/install.texi | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> > index 85896721a0ac..188ce4ba964a 100644
> > --- a/gcc/doc/install.texi
> > +++ b/gcc/doc/install.texi
> > @@ -3153,7 +3153,9 @@ systems that are running Linux kernel version 5.4 or 
> > later.
> >  
> >  @item @samp{bootstrap-ubsan}
> >  Compiles GCC itself using Undefined Behavior Sanitization in order to catch
> > -undefined behavior within the GCC code.
> > +undefined behavior within the GCC code.  Note that it does not abort on 
> > errors
> > +by default.  @code{UBSAN_OPTIONS} can be set to change this, like
> > +@samp{UBSAN_OPTIONS='abort_on_error=1:print_summary=1:print_stacktrace=1'}.
> 
> After discussion with Filip, he found that halt_on_error=1 is needed as
> well, and abort_on_error tweaks the behaviour (exit vs abort) when that
> is set, but it doesn't work by itself. I've added halt_on_error=1 on top
> locally.
> 
> >  
> >  @end table

With this change the patch looks good to me.

Cheers,
Filip


[PATCH] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-11-15 Thread Filip Kastl
Hi,

Andi's greedy bit test finding algorithm was reverted.  I found a fix for the
problem that caused the revert.  I made this patch to reintroduce the greedy
alg into GCC.  However I think we should keep the old slow but more powerful
algorithm so I added a limit on the number of cases of the switch statement
that decides which of the two algorithms to use.

Bootstrapped and regtested on x86_64-linux.

I've also bootstrapped and regtested version of this patch where I force switch
lowering to only use the Andi's greedy algorithm (with my fix).  This also went
smoothly without any testsuite regression nor problems during bootstrap (which
surprised me -- I expected some testcases to rely on bit test finding to be
powerful but I double checked and really didn't find any problems).  OFC I've
also checked that the greedy algorithm doesn't cause the problems from PR117352
anymore.

Ok to push?

Thanks,
Filip Kastl


-- 8< --


This patch adds a limit on the number of cases of a switch.  When this
limit is exceeded, switch lowering decides to use faster but less
powerful algorithms.

In particular this means that for finding bit tests switch lowering
decides between the old dynamic programming O(n^2) algorithm and the
new greedy algorithm that Andi Kleen recently added but then reverted
due to PR117352.  It also means that switch lowering may bail out on
finding jump tables if the switch is too large  (Btw it also may not
bail!  It can happen that the greedy algorithms finds some bit tests
which then basically split the switch into multiple smaller switches and
those may be small enough to fit under the limit.)

The limit is implemented as --param switch-lower-slow-alg-max-cases.
Exceeding the limit is reported through -Wdisabled-optimization.

This patch fixes the issue with the greedy algorithm described in
PR117352.  The problem was incorrect usage of the is_beneficial()
heuristic.

gcc/ChangeLog:

PR middle-end/117091
PR middle-end/117352
* params.opt: Add switch-lower-slow-alg-max-cases.
* tree-switch-conversion.cc (jump_table_cluster::find_jump_tables):
Note in a comment that we are looking for jump tables in
case sequences delimited by the already found bit tests.
(bit_test_cluster::find_bit_tests): Decide between
find_bit_tests_fast() and find_bit_tests_slow().
(bit_test_cluster::find_bit_tests_fast): New function.
(bit_test_cluster::find_bit_tests_slow): New function.
(switch_decision_tree::analyze_switch_statement): Report
exceeding the limit.
* tree-switch-conversion.h: Add find_bit_tests_fast() and
find_bit_tests_slow().

Co-Authored-By: Andi Kleen 
Signed-off-by: Filip Kastl 
---
 gcc/params.opt|   4 ++
 gcc/tree-switch-conversion.cc | 112 +++---
 gcc/tree-switch-conversion.h  |  18 ++
 3 files changed, 127 insertions(+), 7 deletions(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index 7c572774df2..edd638565b5 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1044,6 +1044,10 @@ Maximum size of a single store merging region in bytes.
 Common Joined UInteger Var(param_switch_conversion_branch_ratio) Init(8) 
IntegerRange(1, 65536) Param Optimization
 The maximum ratio between array size and switch branches for a switch 
conversion to take place.
 
+-param=switch-lower-slow-alg-max-cases=
+Common Joined UInteger Var(param_switch_lower_slow_alg_max_cases) Init(1) 
IntegerRange(1, 10) Param Optimization
+Maximum number of cases for slow switch lowering algorithms to be used.
+
 -param=modref-max-bases=
 Common Joined UInteger Var(param_modref_max_bases) Init(32) Param Optimization
 Maximum number of bases stored in each modref tree.
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 852419b2f4b..3c8cbce5d37 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -55,6 +55,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, 
MA
 #include "tree-cfgcleanup.h"
 #include "hwint.h"
 #include "internal-fn.h"
+#include "diagnostic-core.h"
 
 /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
type in the GIMPLE type system that is language-independent?  */
@@ -1642,6 +1643,11 @@ jump_table_cluster::find_jump_tables (vec 
&clusters)
 return clusters.copy ();
 
   unsigned l = clusters.length ();
+
+  /* Note: l + 1 is the number of cases of the switch.  */
+  if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
+return clusters.copy ();
+
   auto_vec min;
   min.reserve (l + 1);
 
@@ -1772,16 +1778,80 @@ jump_table_cluster::is_beneficial (const vec 
&,
   return end - start + 1 >= case_values_threshold ();
 }
 
-/* Find bit tests of given CLUSTERS, where all members of the vector
-   are of type simple_cluster.   MAX_C is the approx max num

[COMMITTED] contrib: Fix 2 bugs in check-params-in-docs.py

2024-12-04 Thread Filip Kastl
Hi All,

Commiting this as obvious.

Cheers,
Filip Kastl


-- 8< --


In my last patch for check-params-in-docs.py I accidentally
1. left one occurence of the 'help_params' variable not renamed
2. converted 'help_params' from a dict to a list

These issues cause the script to error when encountering a parameter
missing in docs.  This patch should fix these issues.

contrib/ChangeLog:

* check-params-in-docs.py: 'params' -> 'help_params'.  Don't
    convert 'help_params' to a list.

Signed-off-by: Filip Kastl 
---
 contrib/check-params-in-docs.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index 102f0e64e98..5d5c64c14f7 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -57,7 +57,7 @@ for line in open(args.params_output).readlines():
 help_params[r[0]] = r[1]
 
 # Skip target-specific params
-help_params = [x for x in help_params.keys() if not target_specific(x)]
+help_params = {x:y for x,y in help_params.items() if not target_specific(x)}
 
 # Find section in .texi manual with parameters
 texi = ([x.strip() for x in open(args.texi_file).readlines()])
@@ -87,7 +87,7 @@ for line in texi:
 texi_params = [x for x in texi_params if not target_specific(x)]
 
 texi_set = set(texi_params) - ignored
-params_set = set(help_params) - ignored
+params_set = set(help_params.keys()) - ignored
 
 success = True
 extra = texi_set - params_set
@@ -101,7 +101,7 @@ if len(missing):
 print('Missing:')
 for m in missing:
 print('@item ' + m)
-print(params[m])
+print(help_params[m])
 print()
 success = False
 
-- 
2.46.0



[COMMITTED] doc: Add store-forwarding-max-distance to invoke.texi

2024-12-05 Thread Filip Kastl
gcc/ChangeLog:

* doc/invoke.texi: Add store-forwarding-max-distance.

Signed-off-by: Filip Kastl 
---
 gcc/doc/invoke.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d2409a41d50..4b1acf9b79c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17122,6 +17122,11 @@ diagnostics.
 @item store-merging-max-size
 Maximum size of a single store merging region in bytes.
 
+@item store-forwarding-max-distance
+Maximum number of instruction distance that a small store forwarded to a larger
+load may stall. Value '0' disables the cost checks for the
+avoid-store-forwarding pass.
+
 @item hash-table-verification-limit
 The number of elements for which hash table verification is done
 for each searched element.
-- 
2.46.1



[COMMITTED] params.opt: Fix typo

2024-12-05 Thread Filip Kastl
Add missing '=' after -param=cycle-accurate-model.

gcc/ChangeLog:

* params.opt: Add missing '=' after -param=cycle-accurate-model.

Signed-off-by: Filip Kastl 
---
 gcc/params.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index f5cc71d0f49..5853bf02f9e 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -66,7 +66,7 @@ Enable asan stack protection.
 Common Joined UInteger Var(param_asan_use_after_return) Init(1) 
IntegerRange(0, 1) Param Optimization
 Enable asan detection of use-after-return bugs.
 
--param=cycle-accurate-model
+-param=cycle-accurate-model=
 Common Joined UInteger Var(param_cycle_accurate_model) Init(1) IntegerRange(0, 
1) Param Optimization
 Whether the scheduling description is mostly a cycle-accurate model of the 
target processor and is likely to be spill aggressively to fill any pipeline 
bubbles.
 
-- 
2.46.1



[PATCH v2] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-12-09 Thread Filip Kastl
Hi Richi,

This is the second version of the patch.  I've lowered the default value of the
switch-lower-slow-alg-max-cases from 1 to 1000.  I've also noticed that I
forgot to add an entry into the param section of doc/invoke.texi so I fixed
that.

I'm bootstraping and regtesting this again just to be extra sure.  Ok to push
if bootstrap/regtest succeeds?

Cheers,
Filip Kastl


-- 8< --


This patch adds a limit on the number of cases of a switch.  When this
limit is exceeded, switch lowering decides to use faster but less
powerful algorithms.

In particular this means that for finding bit tests switch lowering
decides between the old dynamic programming O(n^2) algorithm and the
new greedy algorithm that Andi Kleen recently added but then reverted
due to PR117352.  It also means that switch lowering may bail out on
finding jump tables if the switch is too large  (Btw it also may not
bail!  It can happen that the greedy algorithms finds some bit tests
which then basically split the switch into multiple smaller switches and
those may be small enough to fit under the limit.)

The limit is implemented as --param switch-lower-slow-alg-max-cases.
Exceeding the limit is reported through -Wdisabled-optimization.

This patch fixes the issue with the greedy algorithm described in
PR117352.  The problem was incorrect usage of the is_beneficial()
heuristic.

gcc/ChangeLog:

PR middle-end/117091
PR middle-end/117352
* doc/invoke.texi: Add switch-lower-slow-alg-max-cases.
* params.opt: Add switch-lower-slow-alg-max-cases.
* tree-switch-conversion.cc (jump_table_cluster::find_jump_tables):
Note in a comment that we are looking for jump tables in
case sequences delimited by the already found bit tests.
(bit_test_cluster::find_bit_tests): Decide between
find_bit_tests_fast() and find_bit_tests_slow().
(bit_test_cluster::find_bit_tests_fast): New function.
(bit_test_cluster::find_bit_tests_slow): New function.
(switch_decision_tree::analyze_switch_statement): Report
exceeding the limit.
* tree-switch-conversion.h: Add find_bit_tests_fast() and
find_bit_tests_slow().

Co-Authored-By: Andi Kleen 
Signed-off-by: Filip Kastl 
---
 gcc/doc/invoke.texi   |   3 +
 gcc/params.opt|   4 ++
 gcc/tree-switch-conversion.cc | 112 +++---
 gcc/tree-switch-conversion.h  |  18 ++
 4 files changed, 130 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 14afd1934bd..3cb9a50b690 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16500,6 +16500,9 @@ Switch initialization conversion refuses to create 
arrays that are
 bigger than @option{switch-conversion-max-branch-ratio} times the number of
 branches in the switch.
 
+@item switch-lower-slow-alg-max-cases
+Maximum number of cases for slow switch lowering algorithms to be used.
+
 @item max-partial-antic-length
 Maximum length of the partial antic set computed during the tree
 partial redundancy elimination optimization (@option{-ftree-pre}) when
diff --git a/gcc/params.opt b/gcc/params.opt
index 5853bf02f9e..1c88d5212c4 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1052,6 +1052,10 @@ Maximum number of instruction distance that a small 
store forwarded to a larger
 Common Joined UInteger Var(param_switch_conversion_branch_ratio) Init(8) 
IntegerRange(1, 65536) Param Optimization
 The maximum ratio between array size and switch branches for a switch 
conversion to take place.
 
+-param=switch-lower-slow-alg-max-cases=
+Common Joined UInteger Var(param_switch_lower_slow_alg_max_cases) Init(1000) 
IntegerRange(1, 10) Param Optimization
+Maximum number of cases for slow switch lowering algorithms to be used.
+
 -param=modref-max-bases=
 Common Joined UInteger Var(param_modref_max_bases) Init(32) Param Optimization
 Maximum number of bases stored in each modref tree.
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 3436c2a8b98..b98e70cf7d1 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -54,6 +54,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, 
MA
 #include "tree-cfgcleanup.h"
 #include "hwint.h"
 #include "internal-fn.h"
+#include "diagnostic-core.h"
 
 /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
type in the GIMPLE type system that is language-independent?  */
@@ -1641,6 +1642,11 @@ jump_table_cluster::find_jump_tables (vec 
&clusters)
 return clusters.copy ();
 
   unsigned l = clusters.length ();
+
+  /* Note: l + 1 is the number of cases of the switch.  */
+  if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
+return clusters.copy ();
+
   auto_vec min;
   min.reserve (l + 1);
 
@@ -1771,16 +1777,80 @@ jump_table_cluster::is_beneficial (const vec 
&,
   

Re: [PATCH] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-12-05 Thread Filip Kastl
On Fri 2024-11-15 12:08:24, Richard Biener wrote:
> On Fri, 15 Nov 2024, Filip Kastl wrote:
> 
> > Hi,
> > 
> > Andi's greedy bit test finding algorithm was reverted.  I found a fix for 
> > the
> > problem that caused the revert.  I made this patch to reintroduce the greedy
> > alg into GCC.  However I think we should keep the old slow but more powerful
> > algorithm so I added a limit on the number of cases of the switch statement
> > that decides which of the two algorithms to use.
> > 
> > Bootstrapped and regtested on x86_64-linux.
> > 
> > I've also bootstrapped and regtested version of this patch where I force 
> > switch
> > lowering to only use the Andi's greedy algorithm (with my fix).  This also 
> > went
> > smoothly without any testsuite regression nor problems during bootstrap 
> > (which
> > surprised me -- I expected some testcases to rely on bit test finding to be
> > powerful but I double checked and really didn't find any problems).  OFC 
> > I've
> > also checked that the greedy algorithm doesn't cause the problems from 
> > PR117352
> > anymore.
> > 
> > Ok to push?
> > 
> > Thanks,
> > Filip Kastl
> > 
> > 
> > -- 8< --
> > 
> > 
> > This patch adds a limit on the number of cases of a switch.  When this
> > limit is exceeded, switch lowering decides to use faster but less
> > powerful algorithms.
> > 
> > In particular this means that for finding bit tests switch lowering
> > decides between the old dynamic programming O(n^2) algorithm and the
> > new greedy algorithm that Andi Kleen recently added but then reverted
> > due to PR117352.  It also means that switch lowering may bail out on
> > finding jump tables if the switch is too large  (Btw it also may not
> > bail!  It can happen that the greedy algorithms finds some bit tests
> > which then basically split the switch into multiple smaller switches and
> > those may be small enough to fit under the limit.)
> > 
> > The limit is implemented as --param switch-lower-slow-alg-max-cases.
> > Exceeding the limit is reported through -Wdisabled-optimization.
> > 
> > This patch fixes the issue with the greedy algorithm described in
> > PR117352.  The problem was incorrect usage of the is_beneficial()
> > heuristic.
> 
> Did you adjust the testcase to use the 1 default of the param
> and figure it's still fast enough with it?  If the alg is quadratic
> 1^2 is quite large ;)

Ah, sorry.  Forgot to mention how I came up with that number.  I didn't use any
scientific process for that.  I wasn't sure what number to use.  I picked 1
because it seemed like no human-written switch will have that many cases.  Then
I ran the testcase set to 1 and it finished under a second so I stuck with
1.

But yeah, thinking about it some more, 1 seems like a lot.  Maybe the limit
could be 1000.  That's also big enough.  I could try to run the testcase set to
1000 on my not-so-powerful laptop this time and check that even on that machine
it finishes "fast" (under a second I guess).

I'm also open to any other ideas for how to choose this number.

Cheers,
Filip


Re: [PATCH] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-12-05 Thread Filip Kastl
Hi Andi and Richi,

sorry for the late reply.  While looking for a testcase where the DP algorithm
performs better than Andi's greedy one I found out some new things about bit
test switch lowering and I had to think them through.  I write about what I
found bellow.

On Thu 2024-11-21 10:01:38, Andi Kleen wrote:
> On Fri, Nov 15, 2024 at 10:43:57AM +0100, Filip Kastl wrote:
> > Hi,
> > 
> > Andi's greedy bit test finding algorithm was reverted.  I found a fix for 
> > the
> > problem that caused the revert.  I made this patch to reintroduce the greedy
> > alg into GCC.  However I think we should keep the old slow but more powerful
> > algorithm so I added a limit on the number of cases of the switch statement
> > that decides which of the two algorithms to use.
> 
> Do we actually have a case where the DP algorithm is better?
> 
> In the bootstrap comparison greedy does produce less or the same clusters
> 

It seems reasonable to me to want a bit test cluster finding algorithm that
produces the minimal number of clusters (I count both the bit test clusters and
simple clusters which correspond to single cases of the original switch).  I
actually found out that, with respect to this metric, neither the DP algorithm
nor the greedy algorithm are optimal.

I came up with this testcase

int f0();
int f1();
int f2();
int f3();
int f4();
 
int main(int a)
{
switch (a)
{
case 0:
case 2:
case 4:
case 6:
return f0();
case 8:
return f1();
case 10:
case 14:
case 16:
case 18:
return f2();
case 12:
return f3();
case 20:
return f4();
}
return -1;
}

where the DP algorithm manages to cover all cases with two bit test clusters
but the greedy algorithm only finds one bit test cluster (and 4 simple clusters
for a total of 5 clusters).

But if you reverse the cases (meaning you map case i to case 20 - i), you get
this testcase

int f0();
int f1();
int f2();
int f3();
int f4();

int main(int a)
{
switch (a)
{
case 20:
case 18:
case 16:
case 14:
return f0();
case 12:
return f1();
case 10:
case 6:
case 4:
case 2:
return f2();
case 8:
return f3();
case 0:
return f4();
}
return -1;
}

where the situation is reversed:  Greedy manages to cover all cases with two
bit test clusters but DP only manages to find one bit test cluster.

This surprised me.  I thought that the DP algorithm is optimal and that the
greedy algorithm being better on some cases could be explained by the
differences in how the two algorithms used the 'is_beneficial' heuristic.  But
that's not the case.

I looked more thoroughly into the code of the DP algorithm.  I'm now pretty
convinced the DP algorithm could give optimal solution but simply isn't
configured to do so.  It first looks for minimal number of clusters and only
after it finds the clusters it goes through them and checks that they are
'is_beneficial'.  It breaks down the non-'is_beneficial' ones into simple
clusters.  Instead of this the algorithm could just look for the minimal number
of clusters such that each of them is 'is_beneficial' in the first place.  So
we could have an optimal bit test clustering algorithm.  We just don't at the
moment.

Since we are currently in stage 3, I don't expect we want to go make the DP
algorithm optimal.  We can do that in the next stage 1.  However, I think we
still want to fix PR117091.  I think we need to keep the limit I introduced at
least for the jump table clustering because that is still at least O(n^2).  For
the bit test clustering we can either use both the DP algorithm and the greedy
algorithm as I do in my patch or I guess we could just use the greedy
algorithm.  Since none of the two is optimal, it isn't really clear which one
is better.  On one hand the greedy algorithm is faster and Andi claims that he
saw it performing better.  On the other hand, I would like to make the DP
algorithm optimal and use it in GCC 16 and it seems to me simpler to keep the
DP algorithm than to remove it and add it again later.

What do you think, Andi and Richi?  I myself slightly prefer keeping the DP but
I would be fine with either option.

> 
> > +  k = 0;
> > +  while (i + k < l)
> > +   {
> > + cluster *end_cluster = clusters[i + k];
> > +
> > + /* Does value range fit into the BITS_PER_WORD window?  */
> > + HOST_W

Re: [wwwdocs] gcc-15/changes: Mention the new -mveclibabi=aocl option in the IA-32/x86-64 section

2025-03-15 Thread Filip Kastl
I would like to gently ping this.

Thanks,
Filip Kastl

On Mon 2025-02-17 10:52:22, Filip Kastl wrote:
> Hi,
> 
> I'm mentioning a change I made in the gcc-15/changes.html file.  Validated 
> with
> the W3 Validator.  Is this ok to be pushed?
> 
> Cheers,
> Filip Kastl
> 
> 
> -- 8< --
> 
> 
> ---
>  htdocs/gcc-15/changes.html | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
> index 853fad03..e8ff31da 100644
> --- a/htdocs/gcc-15/changes.html
> +++ b/htdocs/gcc-15/changes.html
> @@ -584,6 +584,12 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
>-mavx512pf, -mprefetchwt1,
>-mtune=knl, and -mtune=knm compiler switches.
>
> +  GCC now supports generating vectorized math calls to the math library
> +from AMD Optimizing CPU Libraries (AOCL LibM). This option is available
> +through the -mveclibabi=aocl compiler switch. GCC continues 
> to
> +support generating calls to AMD Core Math Library (ACML). However, that
> +library is end-of-life and AOCL offers many more vectorized functions.
> +  
>  
>  
>  
> -- 
> 2.47.1
> 


[PATCH] gimple: sccopy: Don't increment i after vec::unordered_remove()

2025-03-20 Thread Filip Kastl
Hi,

Ok to push if bootstrap and regtest (on x86_64 linux) succeeds?

Thanks,
Filip Kastl


-- 8< --


I increment the index variable in a loop even when I do
vec::unordered_remove() which causes the vector traversal to miss some
elements.  Mikael notified me of this mistake I made in my last patch.

gcc/ChangeLog:

* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Don't
increment after vec::unordered_remove().

Reported-by: Mikael Morin 
Signed-off-by: Filip Kastl 
---
 gcc/gimple-ssa-sccopy.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index 298feb05571..ee2a7fa8a72 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -582,9 +582,11 @@ scc_copy_prop::propagate ()
 get removed.  That means parts of CFG get removed.  Those may
 contain copy statements.  For that reason we prune SCCs here.  */
   unsigned i;
-  for (i = 0; i < scc.length (); i++)
+  for (i = 0; i < scc.length ();)
if (gimple_bb (scc[i]) == NULL)
  scc.unordered_remove (i);
+   else
+ i++;
   if (scc.is_empty ())
{
  scc.release ();
-- 
2.47.1



Re: [PATCH] SCC-Copy: Add More Debug dumps

2025-03-20 Thread Filip Kastl
On Mon 2025-03-17 15:52:53, Andrew Pinski wrote:
> While debugging a failure, I noticed that SCC copy didn't print
> out what it was doing, e.g. replacing name1 with name 2.
> This adds that dump.

That's useful indeed.  Thanks for that addition!

Filip Kastl


Re: [PATCH] gimple: sccopy: Prune removed statements from SCCs [PR117919]

2025-04-05 Thread Filip Kastl
On Mon 2025-03-17 21:31:13, Mikael Morin wrote:
> Le 28/02/2025 à 17:01, Filip Kastl a écrit :
> > diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> > index 9f25fbaff36..7ffb5718ab6 100644
> > --- a/gcc/gimple-ssa-sccopy.cc
> > +++ b/gcc/gimple-ssa-sccopy.cc
> > @@ -568,6 +568,19 @@ scc_copy_prop::propagate ()
> >   {
> > vec scc = worklist.pop ();
> > +  /* When we do 'replace_scc_by_value' it may happen that some EH edges
> > +get removed.  That means parts of CFG get removed.  Those may
> > +contain copy statements.  For that reason we prune SCCs here.  */
> > +  unsigned i;
> > +  for (i = 0; i < scc.length (); i++)
> > +   if (gimple_bb (scc[i]) == NULL)
> > + scc.unordered_remove (i);
> Hello,
> this may not be important, but don't you need to skip the increment of i if
> the item is removed?
>  for (i = 0; i < scc.length ();)
>   if (gimple_bb (scc[i]) == NULL)
> scc.unordered_remove (i);
> else
>   i++;
> 
> Mikael

Hi Mikael,

Yes!  You're right and this is something I overlooked.  Thanks for the
correction.  I'll create a patch and mention you as Reported-By:

Filip


[PATCH 0/4] gimple: Rework the bit-test switch lowering algorithm

2025-05-01 Thread Filip Kastl
Hi,

while developing GCC 15, we found out that the bit-test switch lowering
algorithm is quadratic and that it is possible to construct a switch that
causes huge compile time (PR117091).

Andi Kleen came up with a fast bit-test lowering algorithm which was
1) linear
2) in some cases better than the original algorithm
3) in some cases worse than the original algorithm

For GCC 15 we decided to use the original algorithm when the given switch is
small and the fast algorithm when the switch is too big.

In this patch set I combine both algorithms.  The new algorithm gives strictly
better solutions than Andi's and the original one and runs in linear time.

More details below and in commit messages.  Bootstrapped and regtested on
x86_64 linux.

Ok for trunk?
Filip Kastl


Better solutions


When I say "better solutions", I mean the number of clusters.  Cluster is a
concept from switch lowering pass.  It is basically a group of cases that can
be handled by constantly many operations.

On every switch the new algorithm should output clustering of the cases such
that the number of clusters is the smallest possible.  This should be
guaranteed by how the algorithm works.

To be extra sure, I tested this by compiling GCC itself.  I bootstrapped GCC
once for each of the three algorithms (the original one, Andi's and the new
one) and I counted the number of clusters for each function (I didn't find a
way to easily compare individual switches).  Indeed the new algorithm either
gave a function with fewer clusters or a function with more switches
(if-to-switch conversion converts if-else chains into switches based on if
switch lowering can lower the resulting switch -- I think that this is also a
win).


A note about --param switch-lower-slow-alg-max-cases


The param switch-lower-slow-alg-max-cases specified how big a switch had to be
so that GCC would switch to the fast algorithm.  After this patch set it will
serve no purpose.  I kept it though.  There is the jump-table switch lowering
algorithm.  That algorithm is still quadratic.  Nobody ran into any real issues
with that yet, but it would be nice to come up with a fast algorithm for
jump-table lowering and set GCC to fall back to it when a switch is too big
(pr118353).  Then the param would be relevant again.  I plan to look into that
in this Stage 1.


Testing
---

- I compiled the testcase from PR117091 to confirm that the new algorithm
  spends reasonable time on it.

> gcc.sh -O2 -ftime-report big-switch.c 
Time variable  wall   GGC
 ipa SRA:  12.58 (  5%)   536  (  0%)
 dominator optimization :  55.13 ( 22%)  4687k (  1%)
 tree FRE   :  71.05 ( 29%)   744  (  0%)
 tree switch lowering   :  27.41 ( 11%)13M (  3%)
 tree modref:  13.24 (  5%)   480  (  0%)
 rest of compilation:  23.09 (  9%)  2816  (  0%)
 TOTAL  : 248.27  413M
(some entries omitted)

The original bug report claimed that this testcase ran over 30 minutes.  That's
not the case here.  Switch lowering doesn't even dominate the compilation time.

- I also ran a few SPEC CPU 2017 benchmarks (x86_64 linux).

with this patch set:

O2 generic
500.perlbench_r   NR   1237   6.72  
*
502.gcc_r NR   1216   6.57  
*

Ofast native
500.perlbench_r   NR   1251   6.33  
*
502.gcc_r NR   1210   6.73  
*

without this patch set:

O2 generic
500.perlbench_r   NR   1242   6.57  
*
502.gcc_r NR   1216   6.56  
*

Ofast native
500.perlbench_r   NR   1253   6.29  
*
502.gcc_r NR   1210   6.73  
*

The fourth row is the exec time in seconds.  There are only small differences
(2% max) and they favor the new algorithm.  They could just be noise, btw.

I've selected the perl and gcc benchmarks because I expect that those contain
many switches.

- I bootstrapped and regtested this on x86_64 linux.


Filip Kastl (4):
  gimple: Merge slow and fast bit-test switch lowering [PR117091]
  gimple: Make bit-test switch lowering more powerful
  gimple: Don't warn about using different algs for big switch lowering
[PR117091]
  gimple: Switch bit-test lowering testcases for the more powerful alg

 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c |  60 ++
 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c |  51 +
 gcc/tree-switch-conversion.cc| 245 +++
 gcc/tree-switch-conversion.h |  10 -
 4 files changed, 

[PATCH 1/4] gimple: Merge slow and fast bit-test switch lowering [PR117091]

2025-05-01 Thread Filip Kastl
PR117091 showed that bit-test switch lowering can take a lot of time.
The algorithm was O(n^2).  We therefore came up with a faster algorithm
(O(n * BITS_IN_WORD)) and made GCC choose between the slow and the fast
algorithm based on how big the switch is.

Here I combine the algorithms so that we get the results of the slower
algorithm in the faster asymptotic time.

PR middle-end/117091

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests_fast):
Remove function.
(bit_test_cluster::find_bit_tests_slow): Remove function.
(bit_test_cluster::find_bit_tests): We don't need to decide
between slow and fast so just put the modified (no longer) slow
algorithm here.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 107 ++
 1 file changed, 17 insertions(+), 90 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 39a8a893edd..a70274b0337 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1773,92 +1773,38 @@ jump_table_cluster::is_beneficial (const vec 
&,
   return end - start + 1 >= case_values_threshold ();
 }
 
-/* Find bit tests of given CLUSTERS, where all members of the vector are of
-   type simple_cluster.  Use a fast algorithm that might not find the optimal
-   solution (minimal number of clusters on the output).  New clusters are
-   returned.
-
-   You should call find_bit_tests () instead of calling this function
-   directly.  */
-
-vec
-bit_test_cluster::find_bit_tests_fast (vec &clusters)
-{
-  unsigned l = clusters.length ();
-  vec output;
-
-  output.create (l);
-
-  /* Look at sliding BITS_PER_WORD sized windows in the switch value space
- and determine if they are suitable for a bit test cluster.  Worst case
- this can examine every value BITS_PER_WORD-1 times.  */
-  unsigned k;
-  for (unsigned i = 0; i < l; i += k)
-{
-  hash_set targets;
-  cluster *start_cluster = clusters[i];
-
-  /* Find the biggest k such that clusters i to i+k-1 can be turned into a
-one big bit test cluster.  */
-  k = 0;
-  while (i + k < l)
-   {
- cluster *end_cluster = clusters[i + k];
-
- /* Does value range fit into the BITS_PER_WORD window?  */
- HOST_WIDE_INT w = cluster::get_range (start_cluster->get_low (),
-   end_cluster->get_high ());
- if (w == 0 || w > BITS_PER_WORD)
-   break;
-
- /* Check for max # of targets.  */
- if (targets.elements () == m_max_case_bit_tests
- && !targets.contains (end_cluster->m_case_bb))
-   break;
-
- targets.add (end_cluster->m_case_bb);
- k++;
-   }
-
-  if (is_beneficial (k, targets.elements ()))
-   {
- output.safe_push (new bit_test_cluster (clusters, i, i + k - 1,
- i == 0 && k == l));
-   }
-  else
-   {
- output.safe_push (clusters[i]);
- /* ??? Might be able to skip more.  */
- k = 1;
-   }
-}
-
-  return output;
-}
-
 /* Find bit tests of given CLUSTERS, where all members of the vector
-   are of type simple_cluster.  Use a slow (quadratic) algorithm that always
-   finds the optimal solution (minimal number of clusters on the output).  New
-   clusters are returned.
-
-   You should call find_bit_tests () instead of calling this function
-   directly.  */
+   are of type simple_cluster.  MAX_C is the approx max number of cases per
+   label.  New clusters are returned.  */
 
 vec
-bit_test_cluster::find_bit_tests_slow (vec &clusters)
+bit_test_cluster::find_bit_tests (vec &clusters, int max_c)
 {
+  if (!is_enabled () || max_c == 1)
+return clusters.copy ();
+
   unsigned l = clusters.length ();
   auto_vec min;
   min.reserve (l + 1);
 
   min.quick_push (min_cluster_item (0, 0, 0));
 
+  unsigned bits_in_word = GET_MODE_BITSIZE (word_mode);
+
   for (unsigned i = 1; i <= l; i++)
 {
   /* Set minimal # of clusters with i-th item to infinite.  */
   min.quick_push (min_cluster_item (INT_MAX, INT_MAX, INT_MAX));
 
-  for (unsigned j = 0; j < i; j++)
+  /* Since each cluster contains at least one case number and one bit test
+cluster can cover at most bits_in_word case numbers, we don't need to
+look farther than bits_in_word clusters back.  */
+  unsigned j;
+  if (i - 1 >= bits_in_word)
+   j = i - 1 - bits_in_word;
+  else
+   j = 0;
+  for (; j < i; j++)
{
  if (min[j].m_count + 1 < min[i].m_count
  && can_be_handled (clusters, j, i - 1))
@@ -1900,25 +1846,6 @@ bit_test_cluster::find_bit_tests_slow (vec 
&clusters)
   return output;
 }
 
-/* Find bit tests of given CLUSTERS, where all membe

[PATCH 2/4] gimple: Make bit-test switch lowering more powerful

2025-05-01 Thread Filip Kastl
A reasonable goal for bit-test lowering is to produce the least amount
of clusters for a given switch (a cluster is basically a group of cases
that can be handled by constantly many operations).

The current algorithm doesn't always give optimal solutions in that
sense.  This patch should fix this.  The important thing is basically
just to ask if a cluster is_beneficial() more proactively.

The patch also has a fix for a mistake which made bit-test lowering only
create BITS_IN_WORD - 1 big clusters.  There are also some new comments
that go into more detail on the dynamic programming algorithm.

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Modify the dynamic programming algorithm to take is_beneficial()
into account earlier.  To do this efficiently, copy some logic
from is_beneficial() here.  Add detailed comments about how the
DP algorithm works.
(bit_test_cluster::can_be_handled): Check that the cluster range
is >, not >= BITS_IN_WORD.  Remove the
"vec &, unsigned, unsigned" overloaded variant since
we no longer need it.
(bit_test_cluster::is_beneficial): Add a comment that this
function is closely tied to m_max_case_bit_tests.  Remove the
"vec &, unsigned, unsigned" overloaded variant since
we no longer need it.
* tree-switch-conversion.h: Remove the vec overloaded variants
of bit_test_cluster::is_beneficial and
bit_test_cluster::can_be_handled.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 153 +++---
 gcc/tree-switch-conversion.h  |  10 ---
 2 files changed, 67 insertions(+), 96 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index a70274b0337..4f0be8c43f0 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1783,58 +1783,98 @@ bit_test_cluster::find_bit_tests (vec 
&clusters, int max_c)
   if (!is_enabled () || max_c == 1)
 return clusters.copy ();
 
+  /* Dynamic programming algorithm.
+
+ In: List of simple clusters
+ Out: List of simple clusters and bit test clusters such that each bit test
+ cluster can_be_handled() and is_beneficial()
+
+ Tries to merge consecutive clusters into bigger (bit test) ones.  Tries to
+ end up with as few clusters as possible.  */
+
   unsigned l = clusters.length ();
   auto_vec min;
   min.reserve (l + 1);
 
-  min.quick_push (min_cluster_item (0, 0, 0));
+  gcc_checking_assert (l > 0);
+  gcc_checking_assert (l <= INT_MAX);
 
-  unsigned bits_in_word = GET_MODE_BITSIZE (word_mode);
+  int bits_in_word = GET_MODE_BITSIZE (word_mode);
 
-  for (unsigned i = 1; i <= l; i++)
+  /* First phase: Compute the minimum number of clusters for each prefix of the
+ input list incrementally
+
+ min[i] = (count, j, _) means that the prefix ending with the (i-1)-th
+ element can be made to contain as few as count clusters and that in such
+ clustering the last cluster is made up of input clusters [j, i-1]
+ (inclusive).  */
+  min.quick_push (min_cluster_item (0, 0, INT_MAX));
+  min.quick_push (min_cluster_item (1, 0, INT_MAX));
+  for (int i = 2; i <= (int) l; i++)
 {
-  /* Set minimal # of clusters with i-th item to infinite.  */
-  min.quick_push (min_cluster_item (INT_MAX, INT_MAX, INT_MAX));
+  auto_vec unique_labels;
 
   /* Since each cluster contains at least one case number and one bit test
 cluster can cover at most bits_in_word case numbers, we don't need to
 look farther than bits_in_word clusters back.  */
-  unsigned j;
-  if (i - 1 >= bits_in_word)
-   j = i - 1 - bits_in_word;
-  else
-   j = 0;
-  for (; j < i; j++)
+  for (int j = i - 1; j >= 0 && j >= i - bits_in_word; j--)
{
- if (min[j].m_count + 1 < min[i].m_count
- && can_be_handled (clusters, j, i - 1))
-   min[i] = min_cluster_item (min[j].m_count + 1, j, INT_MAX);
-   }
+ /* Consider creating a bit test cluster from input clusters [j, i-1]
+(inclusive)  */
 
-  gcc_checking_assert (min[i].m_count != INT_MAX);
+ simple_cluster *sc = static_cast (clusters[j]);
+ unsigned label = sc->m_case_bb->index;
+ if (!unique_labels.contains (label))
+   {
+ if (unique_labels.length () >= m_max_case_bit_tests)
+   /* is_beneficial() will be false for this and the following
+  iterations.  */
+   break;
+ unique_labels.quick_push (label);
+   }
+
+ unsigned new_count = min[j].m_count + 1;
+
+ if (j == i - 1)
+   {
+ min.quick_push (min_cluster_item (new_count, j, INT_MAX));
+ continue;
+   }
+
+ un

[PATCH 3/4] gimple: Don't warn about using different algs for big switch lowering [PR117091]

2025-05-01 Thread Filip Kastl
We currently don't switch to a faster switch lowering algorithm when a
switch is too big.  This patch removes a warning about this.

PR middle-end/117091

gcc/ChangeLog:

* tree-switch-conversion.cc 
(switch_decision_tree::analyze_switch_statement):
Remove warning about using different algorithms.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 4f0be8c43f0..dea217a01ef 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -2257,13 +2257,6 @@ switch_decision_tree::analyze_switch_statement ()
 
   reset_out_edges_aux (m_switch);
 
-  if (l > (unsigned) param_switch_lower_slow_alg_max_cases)
-warning_at (gimple_location (m_switch), OPT_Wdisabled_optimization,
-  "Using faster switch lowering algorithms. "
-  "Number of switch cases (%d) exceeds "
-  "%<--param=switch-lower-slow-alg-max-cases=%d%> limit.",
-  l, param_switch_lower_slow_alg_max_cases);
-
   /* Find bit-test clusters.  */
   vec output = bit_test_cluster::find_bit_tests (clusters, max_c);
 
-- 
2.48.1



[PATCH 4/4] gimple: Switch bit-test lowering testcases for the more powerful alg

2025-05-01 Thread Filip Kastl
This patch adds 2 testcases.  One tests that GCC is able to create
bit-test clusters of size 64.  The other one contains two switches which
GCC wouldn't completely cover with bit-test clusters before the changes
from this patch set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/switch-5.c: New test.
* gcc.dg/tree-ssa/switch-6.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c | 60 
 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c | 51 
 2 files changed, 111 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/switch-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
new file mode 100644
index 000..b05742cf153
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
@@ -0,0 +1,60 @@
+/* { dg-do compile { target { { x86_64-*-* aarch64-*-* ia64-*-* powerpc64-*-* 
} && lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-switchlower1" } */
+
+int f0();
+int f1();
+int f2();
+int f3();
+int f4();
+ 
+int foo(int a)
+{
+switch (a)
+{
+case 0:
+case 2:
+case 4:
+case 6:
+return f0();
+case 8:
+return f1();
+case 10:
+case 14:
+case 16:
+case 18:
+return f2();
+case 12:
+return f3();
+case 20:
+return f4();
+}
+return -1;
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:0-8 
BT:10-20" "switchlower1" } } */
+
+int bar(int a)
+{
+switch (a)
+{
+case 20:
+case 18:
+case 16:
+case 14:
+return f0();
+case 12:
+return f1();
+case 10:
+case 6:
+case 4:
+case 2:
+return f2();
+case 8:
+return f3();
+case 0:
+return f4();
+}
+return -1;
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:0-10 
BT:12-20" "switchlower1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/switch-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/switch-6.c
new file mode 100644
index 000..bbbc87462c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/switch-6.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { { x86_64-*-* aarch64-*-* ia64-*-* powerpc64-*-* 
} && lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-switchlower1 -fno-jump-tables" } */
+
+/* Test that bit-test switch lowering can create cluster of size 64 (there was
+   an of-by-one error causing it to only do 63 before).  */
+
+int f();
+
+int foo(int a)
+{
+switch (a)
+{
+case 0:
+case 3:
+case 5:
+case 7:
+case 9:
+case 11:
+case 13:
+case 15:
+case 17:
+case 19:
+case 21:
+case 23:
+case 25:
+case 27:
+case 29:
+case 31:
+case 33:
+case 35:
+case 37:
+case 39:
+case 41:
+case 43:
+case 45:
+case 47:
+case 49:
+case 51:
+case 53:
+case 55:
+case 57:
+case 59:
+case 61:
+case 63:
+return f();
+default:
+return -1;
+}
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:0-63" 
"switchlower1" } } */
-- 
2.48.1



Re: [PATCH] sreal.h: fix typo in the comment for sreal::max

2025-05-01 Thread Filip Kastl
On Wed 2025-04-30 08:44:11, Vojtěch Káně wrote:
> ---
>  gcc/sreal.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/sreal.h b/gcc/sreal.h
> index 8700807a131..c5aef1f3a82 100644
> --- a/gcc/sreal.h
> +++ b/gcc/sreal.h
> @@ -118,7 +118,7 @@ public:
>  return min;
>}
>  
> -  /* Global minimum sreal can hold.  */
> +  /* Global maximum sreal can hold.  */
>inline static sreal max ()
>{
>  sreal max;
> -- 
> 2.30.2

Hi Vojta!

I believe that even for small patches you have to include a ChangeLog entry in
the commit message.  See this https://gcc.gnu.org/contribute.html#patches.
Basically you run `contrib/git-gcc-customization.sh` on your local repo and
then use `git gcc-commit-mklog` instead of `git commit` which gives you a
template that you then fill out.

Cheers,
Filip


Re: [PATCH] Add diffsummary.py to contrib

2025-05-02 Thread Filip Kastl
Hi Andi,

I cannot approve patches but LGTM.  Maybe 'from urllib.error import URLError'
should be removed since it isn't needed.

Seems useful.  The code looks good (although I'm not proficient in using
html.parser and urllib).  I've tried to use the script and everything worked.
I've run flake8 on this and didn't see any pep8 violation that other contrib
scripts wouldn't violate too :) (lines longer than 79 chars, not enough blank
lines and similar stuff).

Thanks,
Filip Kastl

On Tue 2025-04-29 11:38:26, Andi Kleen wrote:
> This adds an automatic downloader for the latest test results from
> the mailing list archive and supports diffing test_summary to it.
> Useful if you don't want to run your own baseline.
> 
> contrib/ChangeLog:
> 
>   * diffsummary.py: New file.
> ---
>  contrib/diffsummary.py | 104 +
>  1 file changed, 104 insertions(+)
>  create mode 100755 contrib/diffsummary.py
> 
> diff --git a/contrib/diffsummary.py b/contrib/diffsummary.py
> new file mode 100755
> index 000..0a666707272
> --- /dev/null
> +++ b/contrib/diffsummary.py
> @@ -0,0 +1,104 @@
> +#!/usr/bin/env python3
> +# Show or diff test_summary output to latest posted test result for a 
> platform
> +# test_summary -t | diffsummary.py (compare test result from current build 
> to last posted)
> +# diffsummary.py --show (only show last posted)
> +from urllib.request import urlopen
> +from urllib.error import URLError
> +from html.parser import HTMLParser
> +import time
> +import re
> +import argparse
> +import platform
> +import tempfile
> +import os
> +import sys
> +
> +baseurl = "https://gcc.gnu.org/pipermail/gcc-testresults/";
> +
> +ap = argparse.ArgumentParser(description="Diff stdin to latest posted test 
> result and set exit code to result")
> +ap.add_argument("--branch", default="experimental",
> +help="Branch to match (regex substring match)")
> +ap.add_argument("--arch", default=platform.machine() + ".*" + 
> platform.system().lower(),
> +help="architecture to match (regex substring match)")
> +ap.add_argument("--retrytime", default=1, type=int,
> +help="time to wait before fetching next page (fractional 
> seconds)")
> +ap.add_argument("--show", help="Show last test result, but do not diff", 
> action="store_true", default=False)
> +ap.add_argument("--url", help="Show URLs downloaded", action="store_true")
> +ap.add_argument("--diff", help="Diff program to use (default diff)", 
> default="diff")
> +ap.add_argument("--diff-args", help="Diff arguments to use (default -u)", 
> default="-u")
> +ap.add_argument("--skip", type=int, default=0, help="Skip first N posted 
> test results")
> +args = ap.parse_args()
> +
> +class ParseMonths(HTMLParser):
> +def __init__(self):
> +super().__init__()
> +self.months = []
> +def handle_starttag(self, tag, attrs):
> +if tag == "a" and "thread.html" in attrs[0][1]:
> +self.months.append(attrs[0][1])
> +
> +class ParseThread(HTMLParser):
> +def __init__(self):
> +super().__init__()
> +self.link = None
> +self.target = []
> +def handle_starttag(self, tag, attrs):
> +if (tag == "a"
> +and attrs[0][1][0].isdigit()
> +and attrs[0][1].endswith(".html")):
> +self.link = attrs[0][1]
> +else:
> +self.link = None
> +def handle_data(self, data):
> +if (self.link
> +and re.search(args.branch, data)
> +and re.search(args.arch, data)):
> +self.target.append(thread.link)
> +
> +class ParseArticle(HTMLParser):
> +def __init__(self):
> +super().__init__()
> +self.contents = None
> +self.pre = False
> +self.data = None
> +def handle_starttag(self, tag, attrs):
> +self.pre = tag == "pre"
> +def handle_data(self, data):
> +if self.pre and not self.data:
> +self.data = data
> +
> +def my_urlopen(url):
> +if args.url:
> +print(url)
> +return urlopen(url)
> +
> +with my_urlopen(baseurl) as index:
> +months = ParseMonths()
> +months.feed(index.read().decode('utf-8'))
> +if len(months.months) == 0:
> +sys.exit("no months found")
> +for mon

Re: [PATCH] gimple: sccopy: Prune removed statements from SCCs [PR117919]

2025-03-03 Thread Filip Kastl
Hi Richard,

I almost forgot that the issue is also present on GCC 14.  Can I backport to
releases/gcc-14 branch?

Thanks,
Filip Kastl

On Fri 2025-02-28 17:46:42, Richard Biener wrote:
> 
> 
> > Am 28.02.2025 um 17:02 schrieb Filip Kastl :
> > 
> > Hi,
> > 
> > bootstrapped and regtested on x86_64 linux.  Ok to be pushed?
> 
> Ok
> 
> Richard 
> 
> > Thanks,
> > Filip Kastl
> > 
> > 
> > -- 8< --
> > 
> > 
> > While writing the sccopy pass I didn't realize that 'replace_uses_by ()' can
> > remove portions of the CFG.  This happens when replacing arguments of some
> > statement results in the removal of an EH edge.  Because of this sccopy can
> > then work with GIMPLE statements that aren't part of the IR anymore.  In
> > PR117919 this triggered an assertion within the pass which assumes that
> > statements the pass works with are reachable.
> > 
> > This patch tells the pass to notice when a statement isn't in the IR anymore
> > and remove it from it's worklist.
> > 
> >PR tree-optimization/117919
> > 
> > gcc/ChangeLog:
> > 
> >* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Prune
> >statements that 'replace_uses_by ()' removed.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >* g++.dg/pr117919.C: New test.
> > 
> > Signed-off-by: Filip Kastl 
> > ---
> > gcc/gimple-ssa-sccopy.cc| 13 +
> > gcc/testsuite/g++.dg/pr117919.C | 52 +
> > 2 files changed, 65 insertions(+)
> > create mode 100644 gcc/testsuite/g++.dg/pr117919.C
> > 
> > diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> > index 9f25fbaff36..7ffb5718ab6 100644
> > --- a/gcc/gimple-ssa-sccopy.cc
> > +++ b/gcc/gimple-ssa-sccopy.cc
> > @@ -568,6 +568,19 @@ scc_copy_prop::propagate ()
> > {
> >   vec scc = worklist.pop ();
> > 
> > +  /* When we do 'replace_scc_by_value' it may happen that some EH edges
> > + get removed.  That means parts of CFG get removed.  Those may
> > + contain copy statements.  For that reason we prune SCCs here.  */
> > +  unsigned i;
> > +  for (i = 0; i < scc.length (); i++)
> > +if (gimple_bb (scc[i]) == NULL)
> > +  scc.unordered_remove (i);
> > +  if (scc.is_empty ())
> > +{
> > +  scc.release ();
> > +  continue;
> > +}
> > +
> >   auto_vec inner;
> >   hash_set outer_ops;
> >   tree last_outer_op = NULL_TREE;
> > diff --git a/gcc/testsuite/g++.dg/pr117919.C 
> > b/gcc/testsuite/g++.dg/pr117919.C
> > new file mode 100644
> > index 000..fa2d9c9cd1e
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/pr117919.C
> > @@ -0,0 +1,52 @@
> > +/* PR tree-optimization/117919 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fno-tree-forwprop -fnon-call-exceptions 
> > --param=early-inlining-insns=192 -std=c++20" } */
> > +
> > +char _M_p, _M_construct___beg;
> > +struct _Alloc_hider {
> > +  _Alloc_hider(char);
> > +};
> > +long _M_string_length;
> > +void _M_destroy();
> > +void _S_copy_chars(char *, char *, char *) noexcept;
> > +char _M_local_data();
> > +struct Trans_NS___cxx11_basic_string {
> > +  _Alloc_hider _M_dataplus;
> > +  bool _M_is_local() {
> > +if (_M_local_data())
> > +  if (_M_string_length)
> > +return true;
> > +return false;
> > +  }
> > +  void _M_dispose() {
> > +if (!_M_is_local())
> > +  _M_destroy();
> > +  }
> > +  char *_M_construct___end;
> > +  Trans_NS___cxx11_basic_string(Trans_NS___cxx11_basic_string &)
> > +  : _M_dataplus(0) {
> > +struct _Guard {
> > +  ~_Guard() { _M_guarded->_M_dispose(); }
> > +  Trans_NS___cxx11_basic_string *_M_guarded;
> > +} __guard0;
> > +_S_copy_chars(&_M_p, &_M_construct___beg, _M_construct___end);
> > +  }
> > +};
> > +namespace filesystem {
> > +struct path {
> > +  path();
> > +  Trans_NS___cxx11_basic_string _M_pathname;
> > +};
> > +} // namespace filesystem
> > +struct FileWriter {
> > +  filesystem::path path;
> > +  FileWriter() : path(path) {}
> > +};
> > +struct LanguageFileWriter : FileWriter {
> > +  LanguageFileWriter(filesystem::path) {}
> > +};
> > +int
> > +main() {
> > +  filesystem::path output_file;
> > +  LanguageFileWriter writer(output_file);
> > +}
> > --
> > 2.47.1
> > 


[wwwdocs v2] gcc-15/changes: Mention the new -mveclibabi=aocl option in the IA-32/x86-64 section

2025-03-07 Thread Filip Kastl
Hi Gerald,

On Thu 2025-03-06 18:38:12, Gerald Pfeifer wrote:
> On Thu, 6 Mar 2025, Filip Kastl wrote:
> > I would like to gently ping this.
> 
> Ooops, sorry.
> 
> > > +  GCC now supports generating vectorized math calls to the math 
> > > library
> > > +from AMD Optimizing CPU Libraries (AOCL LibM). This option is 
> > > available
> > > +through the -mveclibabi=aocl compiler switch. GCC 
> > > continues to
> > > +support generating calls to AMD Core Math Library (ACML). However, 
> > > that
> > > +library is end-of-life and AOCL offers many more vectorized 
> > > functions.
> > > +  
> 
> This is fine, though I suggest to work out a bit more clearly what the 
> default is. I guess it's neither of these two, unless... ?
> 
> And one specific suggestion (which you may opt to implement or not): How 
> about "GCC still supports generating" to avoid having three verb forms in
> sequence (which made my head hurt a bit ;-)?
> 
> Gerald

Here is a second version of the patch.  Thanks for the suggestions.  I
addressed them both.  Do you think this is better?

Thanks,
Filip Kastl


-- 8< --


---
 htdocs/gcc-15/changes.html | 8 
 1 file changed, 8 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index c1116899..bc071501 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -592,6 +592,14 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
   -mavx512pf, -mprefetchwt1,
   -mtune=knl, and -mtune=knm compiler switches.
   
+  With the -mveclibabi compiler switch GCC is able to generate
+vectorized calls to external libraries. GCC 15 newly supports generating
+vectorized math calls to the math library from AMD Optimizing CPU Libraries
+(AOCL LibM). This option is available through
+-mveclibabi=aocl. GCC still supports generating calls to AMD
+Core Math Library (ACML). However, that library is end-of-life and AOCL
+offers many more vectorized functions.
+  
 
 
 
-- 
2.47.1



Re: [wwwdocs v2] gcc-15/changes: Mention the new -mveclibabi=aocl option in the IA-32/x86-64 section

2025-03-08 Thread Filip Kastl
On Fri 2025-03-07 16:36:17, Gerald Pfeifer wrote:
> On Fri, 7 Mar 2025, Filip Kastl wrote:
> > Here is a second version of the patch.  Thanks for the suggestions.  I
> > addressed them both.  Do you think this is better?
> 
> Loving it. :)
> 
> Thank you,
> Gerald

Ok, pushed.  Thanks!

Filip


[PATCH] gimple: sccopy: Prune removed statements from SCCs [PR117919]

2025-02-28 Thread Filip Kastl
Hi,

bootstrapped and regtested on x86_64 linux.  Ok to be pushed?

Thanks,
Filip Kastl


-- 8< --


While writing the sccopy pass I didn't realize that 'replace_uses_by ()' can
remove portions of the CFG.  This happens when replacing arguments of some
statement results in the removal of an EH edge.  Because of this sccopy can
then work with GIMPLE statements that aren't part of the IR anymore.  In
PR117919 this triggered an assertion within the pass which assumes that
statements the pass works with are reachable.

This patch tells the pass to notice when a statement isn't in the IR anymore
and remove it from it's worklist.

PR tree-optimization/117919

gcc/ChangeLog:

* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Prune
statements that 'replace_uses_by ()' removed.

gcc/testsuite/ChangeLog:

* g++.dg/pr117919.C: New test.

Signed-off-by: Filip Kastl 
---
 gcc/gimple-ssa-sccopy.cc| 13 +
 gcc/testsuite/g++.dg/pr117919.C | 52 +
 2 files changed, 65 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr117919.C

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index 9f25fbaff36..7ffb5718ab6 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -568,6 +568,19 @@ scc_copy_prop::propagate ()
 {
   vec scc = worklist.pop ();
 
+  /* When we do 'replace_scc_by_value' it may happen that some EH edges
+get removed.  That means parts of CFG get removed.  Those may
+contain copy statements.  For that reason we prune SCCs here.  */
+  unsigned i;
+  for (i = 0; i < scc.length (); i++)
+   if (gimple_bb (scc[i]) == NULL)
+ scc.unordered_remove (i);
+  if (scc.is_empty ())
+   {
+ scc.release ();
+ continue;
+   }
+
   auto_vec inner;
   hash_set outer_ops;
   tree last_outer_op = NULL_TREE;
diff --git a/gcc/testsuite/g++.dg/pr117919.C b/gcc/testsuite/g++.dg/pr117919.C
new file mode 100644
index 000..fa2d9c9cd1e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr117919.C
@@ -0,0 +1,52 @@
+/* PR tree-optimization/117919 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-forwprop -fnon-call-exceptions 
--param=early-inlining-insns=192 -std=c++20" } */
+
+char _M_p, _M_construct___beg;
+struct _Alloc_hider {
+  _Alloc_hider(char);
+};
+long _M_string_length;
+void _M_destroy();
+void _S_copy_chars(char *, char *, char *) noexcept;
+char _M_local_data();
+struct Trans_NS___cxx11_basic_string {
+  _Alloc_hider _M_dataplus;
+  bool _M_is_local() {
+if (_M_local_data())
+  if (_M_string_length)
+return true;
+return false;
+  }
+  void _M_dispose() {
+if (!_M_is_local())
+  _M_destroy();
+  }
+  char *_M_construct___end;
+  Trans_NS___cxx11_basic_string(Trans_NS___cxx11_basic_string &)
+  : _M_dataplus(0) {
+struct _Guard {
+  ~_Guard() { _M_guarded->_M_dispose(); }
+  Trans_NS___cxx11_basic_string *_M_guarded;
+} __guard0;
+_S_copy_chars(&_M_p, &_M_construct___beg, _M_construct___end);
+  }
+};
+namespace filesystem {
+struct path {
+  path();
+  Trans_NS___cxx11_basic_string _M_pathname;
+};
+} // namespace filesystem
+struct FileWriter {
+  filesystem::path path;
+  FileWriter() : path(path) {}
+};
+struct LanguageFileWriter : FileWriter {
+  LanguageFileWriter(filesystem::path) {}
+};
+int
+main() {
+  filesystem::path output_file;
+  LanguageFileWriter writer(output_file);
+}
-- 
2.47.1



[wwwdocs] gcc-15/changes: Mention the new -mveclibabi=aocl option in the IA-32/x86-64 section

2025-02-17 Thread Filip Kastl
Hi,

I'm mentioning a change I made in the gcc-15/changes.html file.  Validated with
the W3 Validator.  Is this ok to be pushed?

Cheers,
Filip Kastl


-- 8< --


---
 htdocs/gcc-15/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 853fad03..e8ff31da 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -584,6 +584,12 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
   -mavx512pf, -mprefetchwt1,
   -mtune=knl, and -mtune=knm compiler switches.
   
+  GCC now supports generating vectorized math calls to the math library
+from AMD Optimizing CPU Libraries (AOCL LibM). This option is available
+through the -mveclibabi=aocl compiler switch. GCC continues to
+support generating calls to AMD Core Math Library (ACML). However, that
+library is end-of-life and AOCL offers many more vectorized functions.
+  
 
 
 
-- 
2.47.1



[COMMITED] invoke.texi: Fix typo in the file-cache-lines param

2025-02-20 Thread Filip Kastl
Pushed as obvious.

Filip Kastl


-- 8< --


file-cache-lines param was documented as file-cache-files.  This fixes
the typo.

gcc/ChangeLog:

* doc/invoke.texi: Fix typo file-cache-files ->
file-cache-lines.

Signed-off-by: Filip Kastl 
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0c7adc039b5..bad49a017cc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15787,7 +15787,7 @@ Max number of files in the file cache.
 The file cache is used to print source lines in diagnostics and do some
 source checks like @option{-Wmisleading-indentation}.
 
-@item file-cache-files
+@item file-cache-lines
 Max number of lines to index into file cache. When 0 this is automatically 
sized.
 The file cache is used to print source lines in diagnostics and do some
 source checks like @option{-Wmisleading-indentation}.
-- 
2.47.1



Re: [PATCH] gimple: sccopy: Don't increment i after vec::unordered_remove()

2025-04-04 Thread Filip Kastl
On Thu 2025-03-20 13:31:38, Richard Biener wrote:
> On Thu, 20 Mar 2025, Filip Kastl wrote:
> 
> > Hi,
> > 
> > Ok to push if bootstrap and regtest (on x86_64 linux) succeeds?
> 
> OK.
> 

And I again almost forgot:  Can I backport this to releases/gcc-14 branch?

Filip

> > Thanks,
> > Filip Kastl
> > 
> > 
> > -- 8< --
> > 
> > 
> > I increment the index variable in a loop even when I do
> > vec::unordered_remove() which causes the vector traversal to miss some
> > elements.  Mikael notified me of this mistake I made in my last patch.
> > 
> > gcc/ChangeLog:
> > 
> > * gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Don't
> > increment after vec::unordered_remove().
> > 
> > Reported-by: Mikael Morin 
> > Signed-off-by: Filip Kastl 
> > ---
> >  gcc/gimple-ssa-sccopy.cc | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> > index 298feb05571..ee2a7fa8a72 100644
> > --- a/gcc/gimple-ssa-sccopy.cc
> > +++ b/gcc/gimple-ssa-sccopy.cc
> > @@ -582,9 +582,11 @@ scc_copy_prop::propagate ()
> >  get removed.  That means parts of CFG get removed.  Those may
> >  contain copy statements.  For that reason we prune SCCs here.  */
> >unsigned i;
> > -  for (i = 0; i < scc.length (); i++)
> > +  for (i = 0; i < scc.length ();)
> > if (gimple_bb (scc[i]) == NULL)
> >   scc.unordered_remove (i);
> > +   else
> > + i++;
> >if (scc.is_empty ())
> > {
> >   scc.release ();
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] gimple: Don't assert that switch has nondefault cases during lowering [PR120080]

2025-05-09 Thread Filip Kastl
Hi,

bootstrapped and regtested on x86_64 linux.  Ok to push?

Filip Kastl


-- 8< ---


I have mistakenly assumed that switch lowering cannot encounter a switch
with zero clusters.  This patch removes the relevant assert and instead
gives up bit-test lowering when this happens.

PR tree-optimization/120080

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Replace assert with return.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index dea217a01ef..bd4de966892 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1793,12 +1793,14 @@ bit_test_cluster::find_bit_tests (vec 
&clusters, int max_c)
  end up with as few clusters as possible.  */
 
   unsigned l = clusters.length ();
-  auto_vec min;
-  min.reserve (l + 1);
 
-  gcc_checking_assert (l > 0);
+  if (l == 0)
+return clusters.copy ();
   gcc_checking_assert (l <= INT_MAX);
 
+  auto_vec min;
+  min.reserve (l + 1);
+
   int bits_in_word = GET_MODE_BITSIZE (word_mode);
 
   /* First phase: Compute the minimum number of clusters for each prefix of the
-- 
2.49.0



[COMMITED] testsuite: Disable bit tests in aarch64/pr99988.c

2025-05-10 Thread Filip Kastl
My recent changes to bit-test switch lowering broke pr99988.c testcase.
The testcase assumes a switch will be lowered using jump tables.  Make
the testcase run with -fno-bit-tests.

Pushed as obvious.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr99988.c: Add -fno-bit-tests.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.target/aarch64/pr99988.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c 
b/gcc/testsuite/gcc.target/aarch64/pr99988.c
index 7cca4962944..c09ce67c0fa 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr99988.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-O2 -mbranch-protection=standard" } */
+/* { dg-options "-O2 -mbranch-protection=standard -fno-bit-tests" } */
 /* { dg-final { scan-assembler-times {bti j} 13 } } */
 int a;
 int c();
-- 
2.49.0



[COMMITED] [PATCH v2] gimple: Don't assert that switch has nondefault cases during lowering [PR120080]

2025-05-10 Thread Filip Kastl
I have mistakenly assumed that switch lowering cannot encounter a switch
with zero clusters.  This patch removes the relevant assert and instead
gives up bit-test lowering when this happens.

PR tree-optimization/120080

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Replace assert with return.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr120080.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr120080.c | 26 
 gcc/tree-switch-conversion.cc|  8 +---
 2 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr120080.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c
new file mode 100644
index 000..d71ef5e9dd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -O2" } */
+
+void __GIMPLE (ssa,startwith("switchlower1"))
+foo (int b)
+{
+  __BB(2):
+  switch (b) {default: L9; case 0: L5; case 5: L5; case 101: L5; }
+
+  __BB(3):
+L9:
+  switch (b) {default: L7; case 5: L6; case 101: L6; }
+
+  __BB(4):
+L6:
+  __builtin_unreachable ();
+
+  __BB(5):
+L7:
+  __builtin_trap ();
+
+  __BB(6):
+L5:
+  return;
+
+}
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index dea217a01ef..bd4de966892 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1793,12 +1793,14 @@ bit_test_cluster::find_bit_tests (vec 
&clusters, int max_c)
  end up with as few clusters as possible.  */
 
   unsigned l = clusters.length ();
-  auto_vec min;
-  min.reserve (l + 1);
 
-  gcc_checking_assert (l > 0);
+  if (l == 0)
+return clusters.copy ();
   gcc_checking_assert (l <= INT_MAX);
 
+  auto_vec min;
+  min.reserve (l + 1);
+
   int bits_in_word = GET_MODE_BITSIZE (word_mode);
 
   /* First phase: Compute the minimum number of clusters for each prefix of the
-- 
2.49.0



[COMMITED] MAINTAINERS: Add myself to write after approval

2023-06-14 Thread Filip Kastl via Gcc-patches
ChangeLog:

* MAINTAINERS: Add myself to write after approval
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8b787b6e1e..4a9a656647e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -484,6 +484,7 @@ Kean Johnston 
 Phillip Jordan 
 Tim Josling 
 Victor Kaplansky 
+Filip Kastl 
 Geoffrey Keating 
 Brendan Kehoe 
 Andi Kleen 
-- 
2.40.1


[PATCH] value-prof.cc: Correct edge prob calculation.

2023-06-15 Thread Filip Kastl via Gcc-patches
The mod-subtract optimization with ncounts==1 produced incorrect edge
probabilities due to incorrect conditional probability calculation. This
patch fixes the calculation.

gcc/ChangeLog:

* value-prof.cc (gimple_mod_subtract_transform): Correct edge
  prob calculation.

Signed-off-by: Filip Kastl 
---
 gcc/value-prof.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
index f40e58ac4f2..580d6dd648d 100644
--- a/gcc/value-prof.cc
+++ b/gcc/value-prof.cc
@@ -1186,7 +1186,11 @@ gimple_mod_subtract_transform (gimple_stmt_iterator *si)
   if (all > 0)
 {
   prob1 = profile_probability::probability_in_gcov_type (count1, all);
-  prob2 = profile_probability::probability_in_gcov_type (count2, all);
+  if (all == count1)
+   prob2 = profile_probability::even ();
+  else
+   prob2 = profile_probability::probability_in_gcov_type (count2, all -
+  count1);
 }
   else
 {
-- 
2.40.1