On Thu, 1 May 2025, Filip Kastl wrote:

> Hi,
> 
> while developing GCC 15, we found out that the bit-test switch lowering
> algorithm is quadratic and that it is possible to construct a switch that
> causes huge compile time (PR117091).
> 
> Andi Kleen came up with a fast bit-test lowering algorithm which was
> 1) linear
> 2) in some cases better than the original algorithm
> 3) in some cases worse than the original algorithm
> 
> For GCC 15 we decided to use the original algorithm when the given switch is
> small and the fast algorithm when the switch is too big.
> 
> In this patch set I combine both algorithms.  The new algorithm gives strictly
> better solutions than Andi's and the original one and runs in linear time.
> 
> More details below and in commit messages.  Bootstrapped and regtested on
> x86_64 linux.
> 
> Ok for trunk?
> Filip Kastl
> 
> 
> Better solutions
> ----------------
> 
> When I say "better solutions", I mean the number of clusters.  Cluster is a
> concept from switch lowering pass.  It is basically a group of cases that can
> be handled by constantly many operations.
> 
> On every switch the new algorithm should output clustering of the cases such
> that the number of clusters is the smallest possible.  This should be
> guaranteed by how the algorithm works.
> 
> To be extra sure, I tested this by compiling GCC itself.  I bootstrapped GCC
> once for each of the three algorithms (the original one, Andi's and the new
> one) and I counted the number of clusters for each function (I didn't find a
> way to easily compare individual switches).  Indeed the new algorithm either
> gave a function with fewer clusters or a function with more switches
> (if-to-switch conversion converts if-else chains into switches based on if
> switch lowering can lower the resulting switch -- I think that this is also a
> win).
> 
> 
> A note about --param switch-lower-slow-alg-max-cases
> ----------------------------------------------------
> 
> The param switch-lower-slow-alg-max-cases specified how big a switch had to be
> so that GCC would switch to the fast algorithm.  After this patch set it will
> serve no purpose.  I kept it though.  There is the jump-table switch lowering
> algorithm.  That algorithm is still quadratic.  Nobody ran into any real 
> issues
> with that yet, but it would be nice to come up with a fast algorithm for
> jump-table lowering and set GCC to fall back to it when a switch is too big
> (pr118353).  Then the param would be relevant again.  I plan to look into that
> in this Stage 1.
> 
> 
> Testing
> -------
> 
> - I compiled the testcase from PR117091 to confirm that the new algorithm
>   spends reasonable time on it.
> 
> > gcc.sh -O2 -ftime-report big-switch.c 
> Time variable                                  wall           GGC
>  ipa SRA                            :  12.58 (  5%)   536  (  0%)
>  dominator optimization             :  55.13 ( 22%)  4687k (  1%)
>  tree FRE                           :  71.05 ( 29%)   744  (  0%)
>  tree switch lowering               :  27.41 ( 11%)    13M (  3%)
>  tree modref                        :  13.24 (  5%)   480  (  0%)
>  rest of compilation                :  23.09 (  9%)  2816  (  0%)
>  TOTAL                              : 248.27          413M
> (some entries omitted)
> 
> The original bug report claimed that this testcase ran over 30 minutes.  
> That's
> not the case here.  Switch lowering doesn't even dominate the compilation 
> time.
> 
> - I also ran a few SPEC CPU 2017 benchmarks (x86_64 linux).
> 
> with this patch set:
> 
> O2 generic
> 500.perlbench_r                               NR       1        237       
> 6.72  *
> 502.gcc_r                                     NR       1        216       
> 6.57  *
> 
> Ofast native
> 500.perlbench_r                               NR       1        251       
> 6.33  *
> 502.gcc_r                                     NR       1        210       
> 6.73  *
> 
> without this patch set:
> 
> O2 generic
> 500.perlbench_r                               NR       1        242       
> 6.57  *
> 502.gcc_r                                     NR       1        216       
> 6.56  *
> 
> Ofast native
> 500.perlbench_r                               NR       1        253       
> 6.29  *
> 502.gcc_r                                     NR       1        210       
> 6.73  *
> 
> The fourth row is the exec time in seconds.  There are only small differences
> (2% max) and they favor the new algorithm.  They could just be noise, btw.
> 
> I've selected the perl and gcc benchmarks because I expect that those contain
> many switches.
> 
> - I bootstrapped and regtested this on x86_64 linux.

This looks OK.

Thanks for the work,
Richard.

> 
> Filip Kastl (4):
>   gimple: Merge slow and fast bit-test switch lowering [PR117091]
>   gimple: Make bit-test switch lowering more powerful
>   gimple: Don't warn about using different algs for big switch lowering
>     [PR117091]
>   gimple: Switch bit-test lowering testcases for the more powerful alg
> 
>  gcc/testsuite/gcc.dg/tree-ssa/switch-5.c |  60 ++++++
>  gcc/testsuite/gcc.dg/tree-ssa/switch-6.c |  51 +++++
>  gcc/tree-switch-conversion.cc            | 245 +++++++----------------
>  gcc/tree-switch-conversion.h             |  10 -
>  4 files changed, 184 insertions(+), 182 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to