Hi,

Andi's greedy bit test finding algorithm was reverted.  I found a fix for the
problem that caused the revert.  I made this patch to reintroduce the greedy
alg into GCC.  However I think we should keep the old slow but more powerful
algorithm so I added a limit on the number of cases of the switch statement
that decides which of the two algorithms to use.

Bootstrapped and regtested on x86_64-linux.

I've also bootstrapped and regtested version of this patch where I force switch
lowering to only use the Andi's greedy algorithm (with my fix).  This also went
smoothly without any testsuite regression nor problems during bootstrap (which
surprised me -- I expected some testcases to rely on bit test finding to be
powerful but I double checked and really didn't find any problems).  OFC I've
also checked that the greedy algorithm doesn't cause the problems from PR117352
anymore.

Ok to push?

Thanks,
Filip Kastl


-- 8< --


This patch adds a limit on the number of cases of a switch.  When this
limit is exceeded, switch lowering decides to use faster but less
powerful algorithms.

In particular this means that for finding bit tests switch lowering
decides between the old dynamic programming O(n^2) algorithm and the
new greedy algorithm that Andi Kleen recently added but then reverted
due to PR117352.  It also means that switch lowering may bail out on
finding jump tables if the switch is too large  (Btw it also may not
bail!  It can happen that the greedy algorithms finds some bit tests
which then basically split the switch into multiple smaller switches and
those may be small enough to fit under the limit.)

The limit is implemented as --param switch-lower-slow-alg-max-cases.
Exceeding the limit is reported through -Wdisabled-optimization.

This patch fixes the issue with the greedy algorithm described in
PR117352.  The problem was incorrect usage of the is_beneficial()
heuristic.

gcc/ChangeLog:

        PR middle-end/117091
        PR middle-end/117352
        * params.opt: Add switch-lower-slow-alg-max-cases.
        * tree-switch-conversion.cc (jump_table_cluster::find_jump_tables):
        Note in a comment that we are looking for jump tables in
        case sequences delimited by the already found bit tests.
        (bit_test_cluster::find_bit_tests): Decide between
        find_bit_tests_fast() and find_bit_tests_slow().
        (bit_test_cluster::find_bit_tests_fast): New function.
        (bit_test_cluster::find_bit_tests_slow): New function.
        (switch_decision_tree::analyze_switch_statement): Report
        exceeding the limit.
        * tree-switch-conversion.h: Add find_bit_tests_fast() and
        find_bit_tests_slow().

Co-Authored-By: Andi Kleen <a...@gcc.gnu.org>
Signed-off-by: Filip Kastl <fka...@suse.cz>
---
 gcc/params.opt                |   4 ++
 gcc/tree-switch-conversion.cc | 112 +++++++++++++++++++++++++++++++---
 gcc/tree-switch-conversion.h  |  18 ++++++
 3 files changed, 127 insertions(+), 7 deletions(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index 7c572774df2..edd638565b5 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1044,6 +1044,10 @@ Maximum size of a single store merging region in bytes.
 Common Joined UInteger Var(param_switch_conversion_branch_ratio) Init(8) 
IntegerRange(1, 65536) Param Optimization
 The maximum ratio between array size and switch branches for a switch 
conversion to take place.
 
+-param=switch-lower-slow-alg-max-cases=
+Common Joined UInteger Var(param_switch_lower_slow_alg_max_cases) Init(10000) 
IntegerRange(1, 1000000000) Param Optimization
+Maximum number of cases for slow switch lowering algorithms to be used.
+
 -param=modref-max-bases=
 Common Joined UInteger Var(param_modref_max_bases) Init(32) Param Optimization
 Maximum number of bases stored in each modref tree.
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 852419b2f4b..3c8cbce5d37 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -55,6 +55,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, 
MA
 #include "tree-cfgcleanup.h"
 #include "hwint.h"
 #include "internal-fn.h"
+#include "diagnostic-core.h"
 
 /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
    type in the GIMPLE type system that is language-independent?  */
@@ -1642,6 +1643,11 @@ jump_table_cluster::find_jump_tables (vec<cluster *> 
&clusters)
     return clusters.copy ();
 
   unsigned l = clusters.length ();
+
+  /* Note: l + 1 is the number of cases of the switch.  */
+  if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
+    return clusters.copy ();
+
   auto_vec<min_cluster_item> min;
   min.reserve (l + 1);
 
@@ -1772,16 +1778,80 @@ jump_table_cluster::is_beneficial (const vec<cluster *> 
&,
   return end - start + 1 >= case_values_threshold ();
 }
 
-/* Find bit tests of given CLUSTERS, where all members of the vector
-   are of type simple_cluster.   MAX_C is the approx max number of cases per
-   label.  New clusters are returned.  */
+/* Find bit tests of given CLUSTERS, where all members of the vector are of
+   type simple_cluster.  Use a fast algorithm that might not find the optimal
+   solution (minimal number of clusters on the output).  New clusters are
+   returned.
+
+   You should call find_bit_tests () instead of calling this function
+   directly.  */
 
 vec<cluster *>
-bit_test_cluster::find_bit_tests (vec<cluster *> &clusters, int max_c)
+bit_test_cluster::find_bit_tests_fast (vec<cluster *> &clusters)
 {
-  if (!is_enabled () || max_c == 1)
-    return clusters.copy ();
+  unsigned l = clusters.length ();
+  vec<cluster *> output;
+
+  output.create (l);
+
+  /* Look at sliding BITS_PER_WORD sized windows in the switch value space
+     and determine if they are suitable for a bit test cluster.  Worst case
+     this can examine every value BITS_PER_WORD-1 times.  */
+  unsigned k;
+  for (unsigned i = 0; i < l; i += k)
+    {
+      hash_set<basic_block> targets;
+      cluster *start_cluster = clusters[i];
+
+      /* Find the biggest k such that clusters i to i+k-1 can be turned into a
+        one big bit test cluster.  */
+      k = 0;
+      while (i + k < l)
+       {
+         cluster *end_cluster = clusters[i + k];
+
+         /* Does value range fit into the BITS_PER_WORD window?  */
+         HOST_WIDE_INT w = cluster::get_range (start_cluster->get_low (),
+                                               end_cluster->get_high ());
+         if (w == 0 || w > BITS_PER_WORD)
+           break;
+
+         /* Check for max # of targets.  */
+         if (targets.elements () == m_max_case_bit_tests
+             && !targets.contains (end_cluster->m_case_bb))
+           break;
+
+         targets.add (end_cluster->m_case_bb);
+         k++;
+       }
 
+      if (is_beneficial (k, targets.elements ()))
+       {
+         output.safe_push (new bit_test_cluster (clusters, i, i + k - 1,
+                                                 i == 0 && k == l));
+       }
+      else
+       {
+         output.safe_push (clusters[i]);
+         /* ??? Might be able to skip more.  */
+         k = 1;
+       }
+    }
+
+  return output;
+}
+
+/* Find bit tests of given CLUSTERS, where all members of the vector
+   are of type simple_cluster.  Use a slow (quadratic) algorithm that always
+   finds the optimal solution (minimal number of clusters on the output).  New
+   clusters are returned.
+
+   You should call find_bit_tests () instead of calling this function
+   directly.  */
+
+vec<cluster *>
+bit_test_cluster::find_bit_tests_slow (vec<cluster *> &clusters)
+{
   unsigned l = clusters.length ();
   auto_vec<min_cluster_item> min;
   min.reserve (l + 1);
@@ -1835,6 +1905,25 @@ bit_test_cluster::find_bit_tests (vec<cluster *> 
&clusters, int max_c)
   return output;
 }
 
+/* Find bit tests of given CLUSTERS, where all members of the vector
+   are of type simple_cluster.  MAX_C is the approx max number of cases per
+   label.  New clusters are returned.  */
+
+vec<cluster *>
+bit_test_cluster::find_bit_tests (vec<cluster *> &clusters, int max_c)
+{
+  if (!is_enabled () || max_c == 1)
+    return clusters.copy ();
+
+  unsigned l = clusters.length ();
+
+  /* Note: l + 1 is the number of cases of the switch.  */
+  if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
+    return find_bit_tests_fast (clusters);
+  else
+    return find_bit_tests_slow (clusters);
+}
+
 /* Return true when RANGE of case values with UNIQ labels
    can build a bit test.  */
 
@@ -2265,10 +2354,19 @@ switch_decision_tree::analyze_switch_statement ()
 
   reset_out_edges_aux (m_switch);
 
+  if (l > (unsigned) param_switch_lower_slow_alg_max_cases)
+    warning_at (gimple_location (m_switch), OPT_Wdisabled_optimization,
+              "Using faster switch lowering algorithms. "
+              "Number of switch cases (%d) exceeds "
+              "%<--param=switch-lower-slow-alg-max-cases=%d%> limit.",
+              l, param_switch_lower_slow_alg_max_cases);
+
   /* Find bit-test clusters.  */
   vec<cluster *> output = bit_test_cluster::find_bit_tests (clusters, max_c);
 
-  /* Find jump table clusters.  */
+  /* Find jump table clusters.  We are looking for these in the sequences of
+     simple clusters which we didn't manage to convert into bit-test
+     clusters.  */
   vec<cluster *> output2;
   auto_vec<cluster *> tmp;
   output2.create (1);
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index e6a85fa6025..560620903a8 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -397,6 +397,24 @@ public:
             tree default_label_expr, basic_block default_bb, location_t loc)
      final override;
 
+  /* Find bit tests of given CLUSTERS, where all members of the vector are of
+     type simple_cluster.  Use a fast algorithm that might not find the optimal
+     solution (minimal number of clusters on the output).  New clusters are
+     returned.
+
+     You should call find_bit_tests () instead of calling this function
+     directly.  */
+  static vec<cluster *> find_bit_tests_fast (vec<cluster *> &clusters);
+
+  /* Find bit tests of given CLUSTERS, where all members of the vector
+     are of type simple_cluster.  Use a slow (quadratic) algorithm that always
+     finds the optimal solution (minimal number of clusters on the output).  
New
+     clusters are returned.
+
+     You should call find_bit_tests () instead of calling this function
+     directly.  */
+  static vec<cluster *> find_bit_tests_slow (vec<cluster *> &clusters);
+
   /* Find bit tests of given CLUSTERS, where all members of the vector
      are of type simple_cluster.  New clusters are returned.  */
   static vec<cluster *> find_bit_tests (vec<cluster *> &clusters, int max_c);
-- 
2.46.0

Reply via email to