On 10/06/2017 07:24 PM, David Malcolm wrote: > On Fri, 2017-10-06 at 14:25 +0200, Martin Liška wrote: >> Hello. >> >> As presented at this year's Cauldron, I rewrote current switch >> expansion to support >> multiple algorithms (jump table and bit-test) that can be used just >> for a fraction >> of cases. Balanced decision tree is built on top of that. I decided >> to do a bigger >> refactoring and put all there 3 mentioned algorithm to its own class. >> >> There's a bigger change in jump_table_cluster::can_be_handled where >> the constant 10 (3 respectively) >> is compared to number of handled values, and not number of cases. >> Later one is wrong in my opinion. >> >> There are some numbers for cc1plus: >> >> $ bloaty ./objdir2/gcc/cc1plus -- ./objdir/gcc/cc1plus >> VM SIZE FILE SIZE >> ++++++++++++++ GROWING ++++++++++++++ >> +19% +1.32Mi .rodata +1.32Mi +19% >> [ = ] 0 .symtab +7.38Ki +0.5% >> [ = ] 0 .strtab +5.18Ki +0.2% >> [ = ] 0 .debug_info +743 +0.0% >> +0.0% +712 .eh_frame +712 +0.0% >> +0.1% +440 .eh_frame_hdr +440 +0.1% >> [ = ] 0 .debug_aranges +80 +0.1% >> +0.0% +67 .dynstr +67 +0.0% >> [ = ] 0 .debug_str +6 +0.0% >> >> -------------- SHRINKING -------------- >> -1.2% -214Ki .text -214Ki -1.2% >> [ = ] 0 .debug_loc -66.7Ki -0.1% >> [ = ] 0 .debug_line -14.0Ki -0.2% >> [ = ] 0 .debug_ranges -9.56Ki -0.1% >> -6.8% -3 [Unmapped] -375 -14.4% >> [ = ] 0 .debug_abbrev -46 -0.0% >> >> +3.8% +1.11Mi TOTAL +1.03Mi +0.5% >> >> So it growth and can be easily explained: >> >> insn-attrtab.o: >> VM SIZE FILE SIZE >> ++++++++++++++ GROWING ++++++++++++++ >> [ = ] 0 .rela.rodata +2.00Mi +215% >> +214% +682Ki .rodata +682Ki +214% >> [ = ] 0 .rela.debug_loc +29.3Ki +36% >> +32% +1.91Ki .eh_frame +1.91Ki +32% >> [ = ] 0 .debug_loc +37 +5.6% >> [ = ] 0 .debug_str +2 +0.0% >> >> -------------- SHRINKING -------------- >> -50.1% -63.3Ki .text -63.3Ki -50.1% >> [ = ] 0 .debug_line -1.71Ki -10.4% >> [ = ] 0 .rela.debug_info -768 -0.3% >> [ = ] 0 .rela.text -624 -0.8% >> [ = ] 0 .rela.debug_ranges -384 -2.7% >> [ = ] 0 .debug_info -87 -0.2% >> [ = ] 0 [Unmapped] -2 -8.7% >> >> +137% +621Ki TOTAL +2.63Mi +139% >> >> It's caused by: >> ... >> ;; GIMPLE switch case clusters: JT(2710):-1-5090 >> ;; GIMPLE switch case clusters: JT(2710):-1-5090 >> ;; GIMPLE switch case clusters: JT(2967):-1-5090 >> ;; GIMPLE switch case clusters: JT(1033):-1-5017 >> ... >> >> so there are many switch statements with very reasonable density. >> >> and >> insn-dfatab.o: +13% +99.4Ki TOTAL +455Ki +14% >> insn-latencytab.o: +19% +136Ki TOTAL +609Ki +20% >> >> There shouldn't be any fallout other from what I mentioned in >> previous email that is >> a prerequisite for this patch. >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression >> tests. >> >> I'm still planning to come with some numbers and stats. Will do that >> next week. >> >> Martin > > >> gcc/ChangeLog: >> >> 2017-09-29 Martin Liska <mli...@suse.cz> >> >> * tree-switch-conversion.c (MAX_CASE_BIT_TESTS): Remove and move >> to header file. >> (hoist_edge_and_branch_if_true): Move to ... >> (bit_test_cluster::hoist_edge_and_branch_if_true): ... this. > > I'm not a reviewer, but there's a lot of "churn" in the patch due to > things moving around; there seems to be a mixture of things simply > moving around, functions becoming methods of classes, and some > things changing. > > Would it be helpful to split the patch into two: a patch that moves > the functions to where they'll be needed, and then a patch to do the > "actual" (or functional) changes?
Hello. Sorry for late answer. I fully agree what you wrote David. It would make the patch more easily readable and reviewable. Let me postpone it to next stage1, patch separation into 2 pieces will need some effort. > > [...snip...] > >> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h >> new file mode 100644 >> index 00000000000..45ae11f408d >> --- /dev/null >> +++ b/gcc/tree-switch-conversion.h >> @@ -0,0 +1,806 @@ >> +/* Tree switch conversion for GNU compiler. >> + Copyright (C) 2017 Free Software Foundation, Inc. > > Should this be "gimple switch conversion" and > gimple-ssa-switch-conversion.h? > (but presumably this is to mirror tree-switch-conversion.c, and > I don't want to advocate for unnecessary churn) > >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify it under >> +the terms of the GNU General Public License as published by the Free >> +Software Foundation; either version 3, or (at your option) any later >> +version. >> + >> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >> +for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3. If not see >> +<http://www.gnu.org/licenses/>. */ >> + >> +#ifndef TREE_SWITCH_CONVERSION_H >> +#define TREE_SWITCH_CONVERSION_H >> + >> +namespace tree_switch_conversion { >> + >> +/* Type of cluster. */ >> + >> +enum cluster_type >> +{ >> + SIMPLE_CASE, >> + JUMP_TABLE, >> + BIT_TEST >> +}; >> + >> +#define PRINT_CASE(f,c) print_dec (c, f, TYPE_SIGN (TREE_TYPE (c))) >> + >> +/* Base class for switch clustering. */ > > Maybe "Abstract base class for representing a cluster of cases", or > somesuch? I like it! > Perhaps an ASCII art inheritance diagram; if I'm reading the code > right, something like: > > /* Abstract base class for representing a cluster of cases. > > Here is the inheritance hierarachy, and the enum_cluster_type > values for the concrete subclasses: > > cluster > |-simple_cluster (SIMPLE_CASE) > `-group_cluster > |-jump_table_cluster (JUMP_TABLE) > `-bit_test_cluster (BIT_TEST). */ > > Am I right in thinking that the switch initialization conversion > (struct switch_conversion) is currently distinct from the clustering > in this patch? (I'm wondering if there's possible benefit from > combining the two approaches, maybe if one cluster of cases > predominantly assigns to some variables, and another cluster of > cases predominantly assigns to some other variables). Exactly, you understand the algorithm well. Yep switch conversion should be also incorporated as one of strategies for a cluster. Let me work on that in next stage1. Thanks also for another nit comment, I will include them. Martin > >> + >> +struct cluster >> +{ >> + /* Cosntructor. */ > > ^^^ Typo > >> + cluster () >> + {} > > When is this default ctor used? > Doesn't it leave the various fields uninitialized? > > I notice that of the descendant classes, at least jump_table_cluster > doesn't call its base class ctor (if I'm reading the patch right), so I > believe these fields are uninitialized for some of the subclasses. > >> + >> + /* Constructor. */ >> + cluster (tree case_label_expr, basic_block case_bb, profile_probability >> prob, >> + profile_probability subtree_prob); >> + > > [...snip...] > >> +struct simple_cluster: public cluster >> +{ > > Could this struct have a descriptive comment? e.g. > > /* Subclass of cluster representing a simple contiguous range > from [low..high]. */ > > or somesuch (assuming I've read the intent correctly) > >> + /* Constructor. */ >> + simple_cluster (tree low, tree high, tree case_label_expr, >> + basic_block case_bb, profile_probability prob); >> + >> + /* Destructor. */ >> + virtual ~simple_cluster () >> + {} >> + >> + virtual cluster_type >> + get_type () >> + { >> + return SIMPLE_CASE; >> + } > > I believe that the "virtual" here is redundant, but that it ought to say > get_type () OVERRIDE FINAL > so tell human readers that it's a vfunc override, and so that C++11 > onwards can warn if it's not actually a vfunc override. > >> + virtual tree >> + get_low () >> + { >> + return m_low; >> + } >> + >> + virtual tree >> + get_high () >> + { >> + return m_high; >> + } >> + >> + virtual void >> + debug () >> + { >> + dump (stderr); >> + } >> + >> + virtual void >> + dump (FILE *f) >> + { >> + PRINT_CASE (f, get_low ()); >> + if (get_low () != get_high ()) >> + { >> + fprintf (f, "-"); >> + PRINT_CASE (f, get_high ()); >> + } >> + fprintf (f, " "); >> + } > > Likewise for all of these, I believe. > >> + /* Low value of the case. */ >> + tree m_low; >> + >> + /* High value of the case. */ >> + tree m_high; >> +}; >> + >> +simple_cluster::simple_cluster (tree low, tree high, tree case_label_expr, >> + basic_block case_bb, profile_probability prob): >> + cluster (case_label_expr, case_bb, prob, prob), >> + m_low (low), m_high (high) >> +{ >> +} > >> +struct group_cluster: public cluster >> +{ > > Again, could this subclass have a descriptive comment, something like. > > /* Abstract subclass of cluster, handling a collection of simple_cluster > instances. */ > > or somesuch (I believe it's abstract because get_type isn't implemented). > >> + /* Destructor. */ >> + virtual ~group_cluster (); >> + >> + virtual tree >> + get_low () >> + { >> + return m_cases[0]->get_low (); >> + } >> + >> + virtual tree >> + get_high () >> + { >> + return m_cases[m_cases.length () - 1]->get_high (); >> + } >> + >> + virtual void >> + debug () >> + { >> + dump (stderr); >> + } >> + >> + virtual void >> + dump (FILE *f) >> + { >> + unsigned total_values = 0; >> + for (unsigned i = 0; i < m_cases.length (); i++) >> + total_values += m_cases[i]->get_range (m_cases[i]->get_low (), >> + m_cases[i]->get_high ()); >> + >> + fprintf (f, "%s(%d):", get_type () == JUMP_TABLE ? "JT" : "BT", >> + total_values); >> + PRINT_CASE (f, get_low ()); >> + fprintf (f, "-"); >> + PRINT_CASE (f, get_high ()); >> + fprintf (f, " "); >> + } > > As before, lose the "virtual" and add OVERRIDE FINAL for all of these, > I believe. > >> + >> + /* List of simple clusters handled by the group. */ >> + vec<simple_cluster *> m_cases; >> +}; >> + >> +struct jump_table_cluster: public group_cluster > > Suggested descriptive comment: > > /* Concrete subclass of group_cluster representing a collection > of cases to be implemented as a jump table. > The "emit" vfunc gernerates a nested switch statement which > is later lowered to a jump table. */ > > or somesuch. > >> +{ >> + /* Constructor. */ >> + jump_table_cluster (vec<cluster *> &clusters, unsigned start, unsigned >> end); >> + >> + virtual cluster_type >> + get_type () >> + { >> + return JUMP_TABLE; >> + } >> + >> + virtual void emit (tree index_expr, tree index_type, >> + tree default_label_expr, basic_block default_bb); > > As before, lose the "virtual" and add OVERRIDE FINAL for both of these > (I think it matters much for "emit" given the non-trivial signature and the > empty default implementation). > >> + /* Find jump tables of given CLUSTERS, where all members of the vector >> + are of type simple_cluster. New clusters are returned. */ >> + static vec<cluster *> find_jump_tables (vec<cluster *> &clusters); >> + >> + /* Return true when cluster starting at START and ending at END >> (inclusive) >> + can build a jump-table. */ >> + static bool can_be_handled (const vec<cluster *> &clusters, unsigned >> start, >> + unsigned end); >> + >> + /* Return true if cluster starting at START and ending at END (inclusive) >> + is profitable transformation. */ >> + static bool is_beneficial (const vec<cluster *> &clusters, unsigned start, >> + unsigned end); >> + >> + /* Return the smallest number of different values for which it is best >> + to use a jump-table instead of a tree of conditional branches. */ >> + static inline unsigned int case_values_threshold (void); >> +}; >> + >> +/* A GIMPLE switch statement can be expanded to a short sequence of bit-wise >> +comparisons. "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)" >> +where CST and MINVAL are integer constants. This is better than a series >> +of compare-and-banch insns in some cases, e.g. we can implement: >> + >> + if ((x==4) || (x==6) || (x==9) || (x==11)) >> + >> +as a single bit test: >> + >> + if ((1<<x) & ((1<<4)|(1<<6)|(1<<9)|(1<<11))) >> + >> +This transformation is only applied if the number of case targets is small, >> +if CST constains at least 3 bits, and "1 << x" is cheap. The bit tests are >> +performed in "word_mode". >> + >> +The following example shows the code the transformation generates: >> + >> + int bar(int x) >> + { >> + switch (x) >> + { >> + case '0': case '1': case '2': case '3': case '4': >> + case '5': case '6': case '7': case '8': case '9': >> + case 'A': case 'B': case 'C': case 'D': case 'E': >> + case 'F': >> + return 1; >> + } >> + return 0; >> + } >> + >> +==> >> + >> + bar (int x) >> + { >> + tmp1 = x - 48; >> + if (tmp1 > (70 - 48)) goto L2; >> + tmp2 = 1 << tmp1; >> + tmp3 = 0b11111100000001111111111; >> + if ((tmp2 & tmp3) != 0) goto L1 ; else goto L2; >> + L1: >> + return 1; >> + L2: >> + return 0; >> + } >> + >> +TODO: There are still some improvements to this transformation that could >> +be implemented: >> + >> +* A narrower mode than word_mode could be used if that is cheaper, e.g. >> + for x86_64 where a narrower-mode shift may result in smaller code. >> + >> +* The compounded constant could be shifted rather than the one. The >> + test would be either on the sign bit or on the least significant bit, >> + depending on the direction of the shift. On some machines, the test >> + for the branch would be free if the bit to test is already set by the >> + shift operation. >> + >> +This transformation was contributed by Roger Sayle, see this e-mail: >> + http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html >> +*/ >> + >> +struct bit_test_cluster: public group_cluster >> +{ >> + /* Constructor. */ >> + bit_test_cluster (vec<cluster *> &clusters, unsigned start, unsigned end); >> + >> + virtual cluster_type >> + get_type () >> + { >> + return BIT_TEST; >> + } >> + >> +/* Expand a switch statement by a short sequence of bit-wise >> + comparisons. "switch(x)" is effectively converted into >> + "if ((1 << (x-MINVAL)) & CST)" where CST and MINVAL are >> + integer constants. >> + >> + INDEX_EXPR is the value being switched on. >> + >> + MINVAL is the lowest case value of in the case nodes, >> + and RANGE is highest value minus MINVAL. MINVAL and RANGE >> + are not guaranteed to be of the same type as INDEX_EXPR >> + (the gimplifier doesn't change the type of case label values, >> + and MINVAL and RANGE are derived from those values). >> + MAXVAL is MINVAL + RANGE. >> + >> + There *MUST* be max_case_bit_tests or less unique case >> + node targets. */ >> + virtual void emit (tree index_expr, tree index_type, >> + tree default_label_expr, basic_block default_bb); > > As before, lose the "virtual" and add OVERRIDE FINAL for both of these. > >> + /* Find bit tests of given CLUSTERS, where all members of the vector >> + are of type simple_cluster. New clusters are returned. */ >> + static vec<cluster *> find_bit_tests (vec<cluster *> &clusters); >> + >> + /* Return true when cluster starting at START and ending at END >> (inclusive) >> + can build a bit test. */ >> + static bool can_be_handled (const vec<cluster *> &clusters, unsigned >> start, >> + unsigned end); >> + >> + /* Return true if cluster starting at START and ending at END (inclusive) >> + is profitable transformation. */ >> + static bool is_beneficial (const vec<cluster *> &clusters, unsigned start, >> + unsigned end); >> + >> +/* Split the basic block at the statement pointed to by GSIP, and insert >> + a branch to the target basic block of E_TRUE conditional on tree >> + expression COND. >> + >> + It is assumed that there is already an edge from the to-be-split >> + basic block to E_TRUE->dest block. This edge is removed, and the >> + profile information on the edge is re-used for the new conditional >> + jump. >> + >> + The CFG is updated. The dominator tree will not be valid after >> + this transformation, but the immediate dominators are updated if >> + UPDATE_DOMINATORS is true. >> + >> + Returns the newly created basic block. */ >> + static basic_block hoist_edge_and_branch_if_true (gimple_stmt_iterator >> *gsip, >> + tree cond, >> + basic_block case_bb); >> + >> + /* Maximum number of different basic blocks that can be handlel by > > Typo: handlel -> handled > >> + a bit test. */ >> + static const int m_max_case_bit_tests = 3; >> +}; >> + > > [...snip...] > > Hope this is constructive > Dave >