Hello,
here is the updated patch.  Daniel, did you write something regarding the
lambda framework, or should I do it?

Zdenek

Index: doc/loop.texi
===================================================================
*** doc/loop.texi	(revision 0)
--- doc/loop.texi	(revision 0)
***************
*** 0 ****
--- 1,434 ----
+ @c Copyright (c) 2006 Free Software Foundation, Inc.
+ @c This is part of the GCC manual.
+ @c For copying conditions, see the file gcc.texi.
+ 
+ @c ---------------------------------------------------------------------
+ @c Loop Representation
+ @c ---------------------------------------------------------------------
+ 
+ @node Loop Representation
+ @chapter Analysis and Representation of Loops
+ 
+ GCC provides extensive infrastructure for working with natural loops,
+ i.e., strongly connected components of the CFG with only one entry
+ block.  This chapter describes the representation of loops in GCC, both
+ on GIMPLE and on RTL, as well as the interfaces to loop-related analyses
+ (induction variable analysis and number of iterations analysis).
+ 
+ @menu
+ * Loop representation::         Representation and analysis of loops.
+ * Loop querying::               Getting information about loops.
+ * Loop manipulation::           Loop manipulation functions.
+ * LCSSA::                       Loop-closed SSA form.
+ * Scalar evolutions::           Induction variables on GIMPLE.
+ * loop-iv::                     Induction variables on RTL.
+ * Number of iterations::        Number of iterations analysis.
+ * Dependency analysis::         Data dependency analysis.
+ * Lambda::                      Linear loop transformations framework.
+ @end menu
+ 
+ @node Loop representation
+ @section Loop representation
+ @cindex Loop representation
+ @cindex Loop analysis
+ 
+ This chapter describes the representation of loops in GCC, and the
+ functions that can be used to build, modify and analyze this
+ representation.  Most of the interfaces and data structures are declared
+ in @file{cfgloop.h}.  At the moment, loop structures are analyzed and
+ this information is updated only by the optimization passes that deal
+ with loops, but some efforts are being made to make it available
+ throughout most of the optimization passes.
+ 
+ In general, a natural loop has one entry block (header) and possibly
+ several back edges (latches) leading to the header from the inside of
+ the loop.  Loops with several latches may appear if several loops share
+ a single header, or if there is a branching in the middle of the loop.
+ The representation of loops in GCC however allows only loops with a
+ single latch.  During loop analysis, headers of such loops are split and
+ forwarder blocks are created in order to disambiguate their structures.
+ A heuristic based on profile information is used to determine whether
+ the latches correspond to sub-loops or to control flow in a single loop.
+ This means that the analysis sometimes changes the CFG, and if you run
+ it in the middle of an optimization pass, you must be able to deal with
+ the new blocks.
+ 
+ The body of the loop is the set of blocks that are dominated by its
+ header, and reachable from its latch against the direction of edges in
+ the CFG.  The loops are organized in a containment hierarchy (tree) such
+ that all the loops immediately contained inside loop L are the children
+ of L in the tree.  This tree is represented by the @code{struct loops}
+ structure.  The root of this tree is a fake loop that contains all
+ blocks in the function.  Each of the loops is represented in a
+ @code{struct loop} structure.
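+ 
+ For illustration, here is a minimal sketch of walking this tree through
+ the @code{inner} and @code{next} pointers (the function
+ @code{dump_loop_tree} is a hypothetical helper, not part of the
+ interface):
+ 
+ @smallexample
+ /* Recursively print LOOP and all its sub-loops, indented by
+    INDENT spaces, to illustrate the tree layout.  */
+ static void
+ dump_loop_tree (struct loop *loop, int indent)
+ @{
+   struct loop *sub;
+ 
+   fprintf (stderr, "%*sloop %d (%u blocks)\n",
+            indent, "", loop->num, loop->num_nodes);
+   for (sub = loop->inner; sub; sub = sub->next)
+     dump_loop_tree (sub, indent + 2);
+ @}
+ @end smallexample
+ 
+ Called on the root of the tree, this prints every loop in the function.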
+ Each loop is assigned an index (the @code{num} field of the
+ @code{struct loop} structure), and the pointer to the loop is stored in
+ the corresponding entry of the @code{parray} field of the loops
+ structure.  The index of a sub-loop is always greater than the index of
+ its super-loop.  The indices do not have to be contiguous; there may be
+ empty (@code{NULL}) entries in @code{parray}, created by deleting loops.
+ The index of a loop never changes.  The first unused index is stored in
+ the @code{num} field of the loops structure.
+ 
+ Each basic block contains a reference to the innermost loop it belongs
+ to (@code{loop_father}).  For this reason, it is only possible to have
+ one @code{struct loops} structure initialized at the same time for each
+ CFG.  It is recommended to use the global variable @code{current_loops}
+ to contain the @code{struct loops} structure, especially if the loop
+ structures are updated throughout several passes.  Many of the loop
+ manipulation functions assume that dominance information is up-to-date.
+ 
+ The loops are analyzed through the @code{loop_optimizer_init} function.
+ The argument of this function is a set of flags represented in an
+ integer bitmask.  These flags specify what other properties of the loop
+ structures should be calculated/enforced and preserved later:
+ 
+ @itemize
+ @item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
+ a way that each loop has only one entry edge, and additionally, the
+ source block of this entry edge has only one successor.  This creates a
+ natural place where the code can be moved out of the loop, and ensures
+ that the entry edge of the loop leads from its immediate super-loop.
+ @item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
+ force the latch block of each loop to have only one successor.  This
+ ensures that the latch of the loop does not belong to any of its
+ sub-loops, and makes manipulation of the loops significantly easier.
+ Most of the loop manipulation functions assume that the loops are in
+ this shape.  Note that with this flag, the ``normal'' loop without any
+ control flow inside and with one exit consists of two basic blocks.
+ @item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
+ edges in the strongly connected components that are not natural loops
+ (have more than one entry block) are marked with the
+ @code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
+ flag is not set for blocks and edges that belong to natural loops that
+ are in such an irreducible region (but it is set for the entry and exit
+ edges of such a loop, if they lead to/from this region).
+ @item @code{LOOPS_HAVE_MARKED_SINGLE_EXITS}: If a loop has exactly one
+ exit edge, this edge is stored in the @code{single_exit} field of the
+ loop structure.  @code{NULL} is stored there otherwise.
+ @end itemize
+ 
+ These properties may also be computed/enforced later, using the
+ functions @code{create_preheaders}, @code{force_single_succ_latches},
+ @code{mark_irreducible_loops} and @code{mark_single_exit_loops}.
+ 
+ The memory occupied by the loops structures should be freed with the
+ @code{loop_optimizer_finalize} function.
+ 
+ The CFG manipulation functions in general do not update loop structures.
+ Specialized versions that additionally do so are provided for the most
+ common tasks.  On GIMPLE, the @code{cleanup_tree_cfg_loop} function can
+ be used to clean up the CFG while updating the loop structures, if
+ @code{current_loops} is set.
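+ 
+ For example, a typical loop pass could be structured as follows.  This
+ is a sketch only; it assumes that @code{loop_optimizer_init} returns the
+ analyzed @code{struct loops} structure, which is then kept in
+ @code{current_loops}, and that @code{loop_optimizer_finalize} takes it
+ as its argument:
+ 
+ @smallexample
+ current_loops = loop_optimizer_init (LOOPS_HAVE_PREHEADERS
+                                      | LOOPS_HAVE_SIMPLE_LATCHES);
+ if (current_loops)
+   @{
+     /* Analyze and transform the loops here, keeping the loop
+        structures and dominance information up to date.  */
+ 
+     loop_optimizer_finalize (current_loops);
+     current_loops = NULL;
+   @}
+ @end smallexample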
+ 
+ @node Loop querying
+ @section Loop querying
+ @cindex Loop querying
+ 
+ The functions to query the information about loops are declared in
+ @file{cfgloop.h}.  Some of the information can be taken directly from
+ the structures.  The @code{loop_father} field of each basic block
+ contains the innermost loop to which the block belongs.  The most useful
+ fields of the loop structure (that are kept up-to-date at all times)
+ are:
+ 
+ @itemize
+ @item @code{header}, @code{latch}: Header and latch basic blocks of the
+ loop.
+ @item @code{num_nodes}: Number of basic blocks in the loop (including
+ the basic blocks of the sub-loops).
+ @item @code{depth}: The depth of the loop in the loops tree, i.e., the
+ number of super-loops of the loop.
+ @item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
+ sub-loop, and the sibling of the loop in the loops tree.
+ @item @code{single_exit}: The exit edge of the loop, if the loop has
+ exactly one exit and the loops were analyzed with
+ @code{LOOPS_HAVE_MARKED_SINGLE_EXITS}.
+ @end itemize
+ 
+ There are other fields in the loop structures, many of them used only by
+ some of the passes, or not updated during CFG changes; in general, they
+ should not be accessed directly.
+ 
+ The most important functions to query loop structures are:
+ 
+ @itemize
+ @item @code{flow_loops_dump}: Dumps the information about loops to a
+ file.
+ @item @code{verify_loop_structure}: Checks consistency of the loop
+ structures.
+ @item @code{loop_latch_edge}: Returns the latch edge of a loop.
+ @item @code{loop_preheader_edge}: If loops have preheaders, returns the
+ preheader edge of a loop.
+ @item @code{flow_loop_nested_p}: Tests whether a loop is a sub-loop of
+ another loop.
+ @item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
+ to a loop (including its sub-loops).
+ @item @code{find_common_loop}: Finds the common super-loop of two loops.
+ @item @code{superloop_at_depth}: Returns the super-loop of a loop with
+ the given depth.
+ @item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
+ number of insns in the loop, on GIMPLE and on RTL.
+ @item @code{loop_exit_edge_p}: Tests whether an edge is an exit from a
+ loop.
+ @item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
+ with the @code{EDGE_LOOP_EXIT} flag.
+ @item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
+ @code{get_loop_body_in_bfs_order}: Enumerate the basic blocks in the
+ loop in depth-first search order in reversed CFG, ordered by dominance
+ relation, and breadth-first search order, respectively.
+ @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
+ @item @code{just_once_each_iteration_p}: Returns true if the basic block
+ is executed exactly once during each iteration of a loop (that is, it
+ does not belong to a sub-loop, and it dominates the latch of the loop).
+ @end itemize
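+ 
+ As an example, the following sketch visits all loops except the root,
+ skipping the @code{NULL} entries left by deleted loops, and inspects the
+ blocks that are executed once per iteration:
+ 
+ @smallexample
+ unsigned i, j;
+ struct loop *loop;
+ basic_block *body;
+ 
+ for (i = 1; i < current_loops->num; i++)
+   @{
+     loop = current_loops->parray[i];
+     if (!loop)
+       continue;
+ 
+     body = get_loop_body (loop);
+     for (j = 0; j < loop->num_nodes; j++)
+       if (just_once_each_iteration_p (loop, body[j]))
+         @{
+           /* BODY[J] dominates the latch of LOOP and does not
+              belong to any of its sub-loops.  */
+         @}
+     free (body);
+   @}
+ @end smallexample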
+ 
+ @node Loop manipulation
+ @section Loop manipulation
+ @cindex Loop manipulation
+ 
+ The loops tree can be manipulated using the following functions:
+ 
+ @itemize
+ @item @code{flow_loop_tree_node_add}: Adds a node to the tree.
+ @item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
+ @item @code{add_bb_to_loop}: Adds a basic block to a loop.
+ @item @code{remove_bb_from_loops}: Removes a basic block from loops.
+ @end itemize
+ 
+ Specialized versions of several low-level CFG functions that also update
+ loop structures are provided:
+ 
+ @itemize
+ @item @code{loop_split_edge_with}: Splits an edge, and places a
+ specified RTL code on it.  On GIMPLE, the function can still be used,
+ but the code must be @code{NULL}.
+ @item @code{bsi_insert_on_edge_immediate_loop}: Inserts code on an edge,
+ splitting it if necessary.  Only works on GIMPLE.
+ @item @code{remove_path}: Removes an edge and all blocks it dominates.
+ @item @code{loop_commit_inserts}: Commits insertions scheduled on edges,
+ and sets loops for the new blocks.  This function can only be used on
+ GIMPLE.
+ @item @code{split_loop_exit_edge}: Splits the exit edge of a loop,
+ ensuring that PHI node arguments remain in the loop (this ensures that
+ loop-closed SSA form is preserved).  Only useful on GIMPLE.
+ @end itemize
+ 
+ Finally, there are some higher-level loop transformations implemented.
+ While some of them are written so that they should work on non-innermost
+ loops, they are mostly untested in that case, and at the moment, they
+ are only reliable for the innermost loops:
+ 
+ @itemize
+ @item @code{create_iv}: Creates a new induction variable.  Only works on
+ GIMPLE.  @code{standard_iv_increment_position} can be used to find a
+ suitable place for the iv increment; see the sketch after this list.
+ @item @code{duplicate_loop_to_header_edge},
+ @code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
+ on GIMPLE) duplicate the body of the loop a prescribed number of times
+ on one of the edges entering the loop header, thus performing either
+ loop unrolling or loop peeling.  @code{can_duplicate_loop_p}
+ (@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
+ loop.
+ @item @code{loop_version}, @code{tree_ssa_loop_version}: These functions
+ create a copy of a loop, and a branch before them that selects one of
+ them depending on the prescribed condition.  This is useful for
+ optimizations that need to verify some assumptions at runtime (one of
+ the copies of the loop is usually left unchanged, while the other one is
+ transformed in some way).
+ @item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
+ extra iterations to make the number of iterations divisible by the
+ unroll factor, updating the exit condition, and removing the exits that
+ now cannot be taken.  Works only on GIMPLE.
+ @end itemize
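+ 
+ To illustrate @code{create_iv}, here is a hedged sketch that creates an
+ induction variable running from 0 with step 1 in @code{loop} (the exact
+ parameter list should be checked against the declarations in
+ @file{tree-ssa-loop-manip.c}):
+ 
+ @smallexample
+ block_stmt_iterator incr_pos;
+ bool insert_after;
+ tree var_before, var_after;
+ 
+ /* Find the canonical place for the increment of the new iv ...  */
+ standard_iv_increment_position (loop, &incr_pos, &insert_after);
+ 
+ /* ... and create the iv there; var_before is its value at the
+    loop header, var_after the value after the increment.  */
+ create_iv (build_int_cst (integer_type_node, 0),
+            build_int_cst (integer_type_node, 1),
+            NULL_TREE, loop, &incr_pos, insert_after,
+            &var_before, &var_after);
+ @end smallexample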
+ 
+ @node LCSSA
+ @section Loop-closed SSA form
+ @cindex LCSSA
+ @cindex Loop-closed SSA form
+ 
+ Throughout the loop optimizations on tree level, one extra condition is
+ enforced on the SSA form: no SSA name is used outside of the loop in
+ which it is defined.  The SSA form satisfying this condition is called
+ ``loop-closed SSA form'' -- LCSSA.  To enforce LCSSA, PHI nodes must be
+ created at the exits of the loops for the SSA names that are used
+ outside of them.  Only the real operands (not virtual SSA names) are
+ held in LCSSA, in order to save memory.
+ 
+ There are various benefits of LCSSA:
+ 
+ @itemize
+ @item Various optimizations (value range analysis, final value
+ replacement) are interested in the values that are defined in the loop
+ and used outside of it, i.e., exactly those for which we create new PHI
+ nodes.
+ @item In induction variable analysis, it is not necessary to specify the
+ loop in which the analysis should be performed -- the scalar evolution
+ analysis always returns the results with respect to the loop in which
+ the SSA name is defined.
+ @item It makes updating of SSA form during loop transformations simpler.
+ Without LCSSA, operations like loop unrolling may force creation of PHI
+ nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
+ updated locally.  However, since we only keep real operands in LCSSA, we
+ cannot use this advantage (we could have local updating of real
+ operands, but it is not much more efficient than using generic SSA form
+ updating for it as well; the amount of changes to SSA is the same).
+ @end itemize
+ 
+ However, it also means LCSSA must be updated.  This is usually
+ straightforward, unless you create a new value in a loop and use it
+ outside, or unless you manipulate loop exit edges (functions are
+ provided to make these manipulations simple).
+ @code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
+ LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
+ LCSSA is preserved.
+ 
+ @node Scalar evolutions
+ @section Scalar evolutions
+ @cindex Scalar evolutions
+ @cindex IV analysis on GIMPLE
+ 
+ Scalar evolutions (SCEV) are used to represent results of induction
+ variable analysis on GIMPLE.  They enable us to represent variables with
+ complicated behavior in a simple and consistent way (we only use it to
+ express values of polynomial induction variables, but it is possible to
+ extend it).  The interfaces to SCEV analysis are declared in
+ @file{tree-scalar-evolution.h}.  To use the scalar evolution analysis,
+ @code{scev_initialize} must be called first; to stop using SCEV,
+ @code{scev_finalize} should be called.  The SCEV analysis caches its
+ results in order to save time and memory.  This cache is however
+ invalidated by most of the loop transformations, including removal of
+ code.  If such a transformation is performed, @code{scev_reset} must be
+ called to clean the caches.
+ 
+ Given an SSA name, its behavior in loops can be analyzed using the
+ @code{analyze_scalar_evolution} function.  The returned SCEV however
+ does not have to be fully analyzed and it may contain references to
+ other SSA names defined in the loop.  To resolve these (potentially
+ recursive) references, the @code{instantiate_parameters} or
+ @code{resolve_mixers} functions must be used.
+ @code{instantiate_parameters} is useful when you use the results of SCEV
+ only for some analysis, and when you work with a whole nest of loops at
+ once.  It will try replacing all SSA names by their SCEV in all loops,
+ including the super-loops of the current loop, thus providing complete
+ information about the behavior of the variable in the loop nest.
+ @code{resolve_mixers} is useful if you work with only one loop at a
+ time, and if you possibly need to create code based on the value of the
+ induction variable.  It will only resolve the SSA names defined in the
+ current loop, leaving the SSA names defined outside unchanged, even if
+ their evolution in the outer loops is known.
+ 
+ The SCEV is a normal tree expression, except for the fact that it may
+ contain several special tree nodes.  One of them is
+ @code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
+ expressed.  The other one is @code{POLYNOMIAL_CHREC}.  A polynomial
+ chrec has three arguments -- base, step and loop (both base and step may
+ contain further polynomial chrecs).  The type of the expression and of
+ the base and step must be the same.  A variable has evolution
+ @code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
+ loop) equivalent to @code{x_1} in the following example:
+ 
+ @smallexample
+ while (...)
+   @{
+     x_1 = phi (base, x_2);
+     x_2 = x_1 + step;
+   @}
+ @end smallexample
+ 
+ Note that this includes the language restrictions on the operations.
+ For example, if we compile C code and @code{x} has signed type, then the
+ overflow in addition would cause undefined behavior, and we may assume
+ that this does not happen.  Hence, the value with this SCEV cannot
+ overflow (which restricts the number of iterations of such a loop).
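+ 
+ Putting this together, here is a sketch of analyzing the evolution of an
+ SSA name @code{name} with respect to a loop @code{loop} (hedged; see
+ @file{tree-scalar-evolution.h} and @file{tree-chrec.h} for the exact
+ declarations and accessors):
+ 
+ @smallexample
+ tree chrec;
+ 
+ chrec = analyze_scalar_evolution (loop, name);
+ chrec = instantiate_parameters (loop, chrec);
+ if (TREE_CODE (chrec) == POLYNOMIAL_CHREC)
+   @{
+     tree base = CHREC_LEFT (chrec);
+     tree step = CHREC_RIGHT (chrec);
+ 
+     /* NAME is a polynomial induction variable with the given
+        base and step in the loop CHREC_VARIABLE (chrec).  */
+   @}
+ @end smallexample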
+ 
+ In many cases, one wants to restrict attention just to affine induction
+ variables.  In this case, the extra expressive power of SCEV is not
+ useful, and may complicate the optimizations.  The @code{simple_iv}
+ function may then be used to analyze a value -- the result is a
+ loop-invariant base and step.
+ 
+ @node loop-iv
+ @section IV analysis on RTL
+ @cindex IV analysis on RTL
+ 
+ The induction variable analysis on RTL is simpler; it only allows
+ analysis of affine induction variables, and only in one loop at once.
+ The interface is declared in @file{cfgloop.h}.  Before analyzing
+ induction variables in a loop L, the @code{iv_analysis_loop_init}
+ function must be called on L.  After the analysis (possibly calling
+ @code{iv_analysis_loop_init} for several loops) is finished,
+ @code{iv_analysis_done} should be called.  The following functions can
+ be used to access the results of the analysis:
+ 
+ @itemize
+ @item @code{iv_analyze}: Analyzes a single register used in the given
+ insn.  If no use of the register in this insn is found, the following
+ insns are scanned, so that this function can be called on the insn
+ returned by @code{get_condition}.
+ @item @code{iv_analyze_result}: Analyzes the result of the assignment in
+ the given insn.
+ @item @code{iv_analyze_expr}: Analyzes a more complicated expression.
+ All its operands are analyzed by @code{iv_analyze}, and hence they must
+ be used in the specified insn or one of the following insns.
+ @end itemize
+ 
+ The description of the induction variable is provided in @code{struct
+ rtx_iv}.  In order to handle subregs, the representation is a bit
+ complicated; if the value of the @code{extend} field is not
+ @code{UNKNOWN}, the value of the induction variable in the i-th
+ iteration is
+ 
+ @smallexample
+ delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
+ @end smallexample
+ 
+ with the following exception: if @code{first_special} is true, then the
+ value in the first iteration (when @code{i} is zero) is @code{delta +
+ mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
+ then @code{first_special} must be false, @code{delta} must be 0,
+ @code{mult} must be 1, and the value in the i-th iteration is
+ 
+ @smallexample
+ subreg_@{mode@} (base + i * step)
+ @end smallexample
+ 
+ The function @code{get_iv_value} can be used to perform these
+ calculations.
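+ 
+ A short sketch of using this interface follows (hedged; @code{insn} is
+ assumed to be an insn of @code{loop} that uses the register
+ @code{reg}):
+ 
+ @smallexample
+ struct rtx_iv iv;
+ 
+ iv_analysis_loop_init (loop);
+ if (iv_analyze (insn, reg, &iv))
+   @{
+     /* The value of REG in the i-th iteration can now be obtained
+        with get_iv_value (&iv, i).  */
+   @}
+ iv_analysis_done ();
+ @end smallexample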
+ 
+ @node Number of iterations
+ @section Number of iterations analysis
+ @cindex Number of iterations analysis
+ 
+ Both on GIMPLE and on RTL, there are functions available to determine
+ the number of iterations of a loop, with a similar interface.  In many
+ cases, it is not possible to determine the number of iterations
+ unconditionally -- the determined number is correct only if some
+ assumptions are satisfied.  The analysis tries to verify these
+ conditions using the information contained in the program; if it fails,
+ the conditions are returned together with the result.  The following
+ information and conditions are provided by the analysis:
+ 
+ @itemize
+ @item @code{assumptions}: If this condition is false, the rest of the
+ information is invalid.
+ @item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE:
+ If this condition is true, the loop exits in the first iteration.
+ @item @code{infinite}: If this condition is true, the loop is infinite.
+ This condition is only available on RTL.  On GIMPLE, this condition is
+ OR-ed with the @code{assumptions}.
+ @item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
+ that gives the number of iterations.  The number of iterations is
+ defined as the number of executions of the loop latch.
+ @end itemize
+ 
+ Both on GIMPLE and on RTL, it is necessary for the induction variable
+ analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
+ On GIMPLE, the results are stored in the @code{struct tree_niter_desc}
+ structure.  The number of iterations before the loop is exited through a
+ given exit can be determined using the @code{number_of_iterations_exit}
+ function.  On RTL, the results are returned in the @code{struct
+ niter_desc} structure.  The corresponding function is named
+ @code{check_simple_exit}.  There are also functions that pass through
+ all the exits of a loop and try to find one with an easy-to-determine
+ number of iterations -- @code{find_loop_niter} on GIMPLE and
+ @code{find_simple_exit} on RTL.  Finally, there are functions that
+ provide the same information, but additionally cache it, so that
+ repeated requests for the number of iterations are not so costly --
+ @code{number_of_iterations_in_loop} on GIMPLE and
+ @code{get_simple_loop_desc} on RTL.
+ 
+ Note that some of these functions may behave slightly differently than
+ others -- some of them return only the expression for the number of
+ iterations, and fail if there are some assumptions.  The function
+ @code{number_of_iterations_in_loop} works only for single-exit loops,
+ and the value it returns for the number of iterations is higher by one
+ than that of all the other functions (i.e., it returns the number of
+ executions of the exit statement, not of the loop latch).
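+ 
+ For instance, the number of iterations of a loop with a known exit edge
+ can be queried on GIMPLE as follows (a sketch; the last parameter of
+ @code{number_of_iterations_exit} is assumed to control whether warnings
+ are issued):
+ 
+ @smallexample
+ struct tree_niter_desc niter;
+ 
+ if (number_of_iterations_exit (loop, exit, &niter, true))
+   @{
+     /* niter.niter gives the number of latch executions, valid
+        whenever niter.assumptions holds and niter.may_be_zero
+        is false.  */
+   @}
+ @end smallexample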
+ 
+ @node Lambda
+ @section Linear loop transformations framework
+ @cindex Linear loop transformations framework
+ 
+ TODO
+ 
+ @node Dependency analysis
+ @section Data Dependency Analysis
+ @cindex Data Dependency Analysis
+ 
+ TODO
Index: doc/gccint.texi
===================================================================
*** doc/gccint.texi	(revision 116188)
--- doc/gccint.texi	(working copy)
*************** Additional tutorial information is linke
*** 111,116 ****
--- 111,117 ----
  * RTL:: The intermediate representation that most passes work on.
  * Control Flow:: Maintaining and manipulating the control flow graph.
  * Tree SSA:: Analysis and optimization of the tree representation.
+ * Loop Representation:: Analysis and representation of loops.
  * Machine Desc:: How to write machine description instruction patterns.
  * Target Macros:: How to write the machine description C macros and functions.
  * Host Config:: Writing the @file{xm-@var{machine}.h} file.
*************** Additional tutorial information is linke
*** 141,146 ****
--- 142,148 ----
  @include passes.texi
  @include c-tree.texi
  @include tree-ssa.texi
+ @include loop.texi
  @include rtl.texi
  @include cfg.texi
  @include md.texi
Index: Makefile.in
===================================================================
*** Makefile.in	(revision 116188)
--- Makefile.in	(working copy)
*************** TEXI_GCCINT_FILES = gccint.texi gcc-comm
*** 3379,3385 ****
  	 rtl.texi md.texi tm.texi hostconfig.texi fragments.texi \
  	 configfiles.texi collect2.texi headerdirs.texi funding.texi \
  	 gnu.texi gpl.texi fdl.texi contrib.texi languages.texi \
! 	 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi
  
  TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi
--- 3379,3386 ----
  	 rtl.texi md.texi tm.texi hostconfig.texi fragments.texi \
  	 configfiles.texi collect2.texi headerdirs.texi funding.texi \
  	 gnu.texi gpl.texi fdl.texi contrib.texi languages.texi \
! 	 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi \
! 	 loop.texi
  
  TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi