Re: Re: Maintaining, was: Re: Reduce Dwarf Debug Size
Hi Ian,

> I don't see why:
> http://gcc.gnu.org/ml/gcc-patches/2006-02/msg02031.html
> was a bad thing.  I think gcc would have been better if it had just
> been committed.

Not fair. Zack himself says the patch is not recommended for commit, and it is just a baseline for further work. For this kind of target-specific change, I think it is best to let the target port maintainers make the call. I've CC'ed the maintainers listed for h8. Perhaps they will take a look at it.

I am aware of the define_constraints patch. I am thinking about working on the H8 and M68K ports at least. How urgent is this?

FWIW, I've reproduced the ICE that Zack mentioned, but I haven't investigated it.

Kazu Hirata
XFAILing gcc.c-torture/execute/mayalias-2.c -O3 -g (PR 28834)
Hi Janis,

While PR 28834 stays open, I'm thinking about XFAILing gcc.c-torture/execute/mayalias-2.c when it is run with -O3 -g. However, I am not having any luck with writing mayalias-2.x. I am wondering if you could help me with XFAIL.

When I try mayalias-2.x like so:

  set torture_eval_before_execute {
    global compiler_conditional_xfail_data
    set compiler_conditional_xfail_data {
      "PR 28834" \
      { "*-*-*" } \
      { "-O3" } \
      { "" }
    }
  }
  return 0

I get

  XPASS: gcc.c-torture/execute/mayalias-2.c execution,  -O3 -fomit-frame-pointer
  FAIL: gcc.c-torture/execute/mayalias-2.c compilation,  -O3 -g  (internal compiler error)

That is, I am getting an unintended XPASS for -O3 -fomit-frame-pointer. Also, the "-O3 -g" one doesn't show XFAIL even though the options do contain -O3.

How do I make gcc.c-torture/execute/mayalias-2.c XFAIL on -O3 -g?

Thanks,

Kazu Hirata
A question about DOM
Hi,

Consider the following code from tree-ssa-dom.c:tree_ssa_dominator_optimize.

  /* Thread jumps, creating duplicate blocks as needed.  */
  cfg_altered = thread_through_all_blocks ();

  /* Removal of statements may make some EH edges dead.  Purge
     such edges from the CFG as needed.  */
  if (!bitmap_empty_p (need_eh_cleanup))
    {
      cfg_altered |= tree_purge_all_dead_eh_edges (need_eh_cleanup);
      bitmap_zero (need_eh_cleanup);
    }

  free_dominance_info (CDI_DOMINATORS);
  cfg_altered = cleanup_tree_cfg ();
  calculate_dominance_info (CDI_DOMINATORS);

Notice that we have three assignments to cfg_altered, but the last one kills the two previous assignments. Should the last one be |=?

Kazu Hirata
Re: RFC: Plan for cleaning up the "Addressing Modes" macros
Hi Zack,

> So, the plan: Step 1 introduces - one at a time - target hooks
> corresponding to each of the above macros. All callers are changed to
> use the hooks. The default definitions of the hooks invoke the
> existing macros. The hardest part of this is working out exactly what
> their sole callers expect of LEGITIMIZE_ADDRESS and
> LEGITIMIZE_RELOAD_ADDRESS. The manual, unfortunately, is not very
> clear. I think that in both cases it amounts to "either replace X
> with a new value and jump to WIN, or leave X unmodified and drop
> through", which can be translated to "return a new value for X, or
> NULL", but I'm not sure, particularly about LEGITIMIZE_RELOAD_ADDRESS
> and its interaction with push_reload.
>
> Step 2 is to convert each architecture, one at a time, to make
> target-hook definitions. I think it makes most sense to do each
> architecture in one go, because the definitions of these macros tend
> to be intertwined. In particular, REG_OK_FOR_BASE_P etc are often
> used in the definition of GO_IF_LEGITIMATE_ADDRESS. It will require
> some care to be sure that the hooks retain the same semantics wrt
> REG_OK_STRICT. Some of the macros have sensible defaults; for
> instance, it is allowed to define a LEGITIMIZE_ADDRESS that does
> nothing. In this stage we should identify what those defaults are,
> and add appropriate functions to hooks.c; but we cannot change the
> default settings in target-def.h until all architectures are
> converted.

I would swap these two steps. Last time I looked at this stuff, I was surprised by how many things depended on REG_OK_STRICT. Even if macro A does not mention REG_OK_STRICT, macro B used in A might use REG_OK_STRICT, so it's difficult to tell in advance which macro depends on REG_OK_STRICT, either directly or indirectly. So I would first change each macro to an architecture-specific function in each architecture.
For example, GO_IF_LEGITIMATE_ADDRESS should look like this (taken from i386.h):

  #ifdef REG_OK_STRICT
  #define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)	\
  do {							\
    if (legitimate_address_p ((MODE), (X), 1))		\
      goto ADDR;					\
  } while (0)
  #else
  #define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)	\
  do {							\
    if (legitimate_address_p ((MODE), (X), 0))		\
      goto ADDR;					\
  } while (0)
  #endif

Note that REG_OK_STRICT is essentially used as an argument to legitimate_address_p, and that the use of a function argument decouples legitimate_address_p from REG_OK_STRICT. Once REG_OK_STRICT appears only in this way, we are ready to introduce new target hooks. The new target hooks would take REG_OK_STRICT as an argument. So in the above example, we would simply pass that argument as the last argument of legitimate_address_p.

Kazu Hirata
Re: RFC: Plan for cleaning up the "Addressing Modes" macros
Hi Dave,

> I'm basically in agreement with you here, and just want to suggest you can
> avoid an awful lot of code duplication by doing something like
>
> #ifdef REG_OK_STRICT
> #define ${CPU}_IS_STRICT 1
> #else
> #define ${CPU}_IS_STRICT 0
> #endif

Sure. In fact, the FRV port and possibly others have already started doing this, like so:

  #ifdef REG_OK_STRICT
  #define REG_OK_STRICT_P 1
  #else
  #define REG_OK_STRICT_P 0
  #endif

Kazu Hirata
A headache with fold_ternary and CALL_EXPR.
Hi,

These days, I am reorganizing fold. One of my goals is to provide a function like

  fold_ternary (code, type, op0, op1, op2)

without taking a tree that would be obtained by

  build3 (code, type, op0, op1, op2)

So we need to eliminate references to the original tree, that is, the result of the build3 above.

It turns out that the CALL_EXPR case of fold_ternary needs the original tree like so. (Notice the reference to t.)

      /* Check for a built-in function.  */
      if (TREE_CODE (op0) == ADDR_EXPR
	  && TREE_CODE (TREE_OPERAND (op0, 0)) == FUNCTION_DECL
	  && DECL_BUILT_IN (TREE_OPERAND (op0, 0)))
	{
	  tree tmp = fold_builtin (t, false);
	  if (tmp)
	    return tmp;
	}

If we want to change fold_builtin to take operands like op0, op1, and op2, I would have to change so many things that depend on fold_builtin. (Note that the subroutines of fold_builtin also depend on fold_builtin in the sense that they need the original tree, the one whose TREE_CODE is CALL_EXPR, to be available.)

So the question is: Is it worth changing all these? Or should I stop folding builtins in fold_ternary? IMHO, the latter isn't a totally crazy idea because CCP folds builtins by directly calling fold_builtin, and Diego is planning to run CCP at several places, even at an early stage of the tree SSA optimizations. (Currently, we depend on DOM to propagate constants, but it doesn't fold builtins IIRC.)

Thoughts?

Kazu Hirata
Things blocking fold_ARITY and fold_buildN
Hi,

The uses of the whole expression in the current fold_ARITY are blocking the transition to fold_ARITY (code, type, op0[, op1[, op2]]). Below I am showing the places where we need to make changes, in the form of a patch. Wherever 't' is used, we need to make changes. The first hunk comes from fold_unary, and the last one comes from fold_ternary. The rest come from fold_binary.

I've already got suggestions from Zack and Roger for fold_ternary. I think I'll replace the call to fold_builtin with fold_builtin_1 and change fold_builtin_1 to take fn, arglist, and possibly a static chain if needed (as suggested by Roger on IRC). COMPARISON_CLASS_P (t) is easy because it's equivalent to TREE_CODE_CLASS (code) == tcc_comparison. I might need help on the others.

I'll try to go through these "KAZU" markers one by one, but if you could volunteer to fix one, you are more than welcome. Suggestions like "Here is what I would do" would be fine, too.

Kazu Hirata

diff -u fold-const.c fold-const.c
--- fold-const.c	6 Mar 2005 00:44:01 -
+++ fold-const.c	6 Mar 2005 00:45:38 -
@@ -6802,6 +6802,7 @@
     {
       /* Don't leave an assignment inside a conversion
	 unless assigning a bitfield.  */
+      /* KAZU.  */
       tem = copy_node (t);
       TREE_OPERAND (tem, 0) = TREE_OPERAND (op0, 1);
       /* First do the assignment, then return converted constant.  */
@@ -7152,6 +7153,7 @@
       if (TREE_CODE (arg0) == COND_EXPR
	  || COMPARISON_CLASS_P (arg0))
	{
+	  /* KAZU.  */
	  tem = fold_binary_op_with_conditional_arg (t, code, arg0, arg1,
						     /*cond_first_p=*/1);
	  if (tem != NULL_TREE)
@@ -7160,6 +7162,7 @@
       if (TREE_CODE (arg1) == COND_EXPR
	  || COMPARISON_CLASS_P (arg1))
	{
+	  /* KAZU.  */
	  tem = fold_binary_op_with_conditional_arg (t, code, arg1, arg0,
						     /*cond_first_p=*/0);
	  if (tem != NULL_TREE)
@@ -7170,6 +7173,7 @@
   switch (code)
     {
     case RANGE_EXPR:
+      /* KAZU.  */
       if (TREE_CONSTANT (t) != wins)
	{
	  tem = copy_node (t);
@@ -8611,6 +8615,7 @@
	}

       /* See if we can build a range comparison.  */
+      /* KAZU.  */
       if (0 != (tem = fold_range_test (t)))
	return tem;
@@ -8727,6 +8732,7 @@
       /* If this is a comparison of two exprs that look like an
	 ARRAY_REF of the same object, then we can fold this to a
	 comparison of the two offsets.  */
+      /* KAZU.  */
       if (COMPARISON_CLASS_P (t))
	{
	  tree base0, offset0, base1, offset1;
@@ -9142,6 +9148,7 @@
	  && (TREE_CODE (arg0) == MIN_EXPR
	      || TREE_CODE (arg0) == MAX_EXPR)
	  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
+	/* KAZU.  */
	return optimize_minmax_comparison (t);

       /* If we are comparing an ABS_EXPR with a constant, we can
@@ -9845,6 +9852,7 @@
	  && TREE_CODE (TREE_OPERAND (op0, 0)) == FUNCTION_DECL
	  && DECL_BUILT_IN (TREE_OPERAND (op0, 0)))
	{
+	  /* KAZU.  */
	  tree tmp = fold_builtin (t, false);
	  if (tmp)
	    return tmp;
Re: Things blocking fold_ARITY and fold_buildN
Hi, > I might need help on others. I'll try to go through these "KAZU" one > by one, but if you could volunteer to fix one, you are more than > welcome. Suggestions like "Here is what I would do" would be fine, > too. Roger kindly suggested what to do for each of these on IRC. Kazu Hirata
A preliminary result of fold_buildN (memory usage included)
Hi,

Here is a preliminary result for fold_buildN from my personal tree. I compiled cc1-i files with ./cc1 -quiet -O2 -fomit-frame-pointer -fmem-report -o /dev/null. I used --enable-gather-detailed-mem-stats to see how many nodes and bytes we are allocating for trees. (Thanks Honza!) I then summed up the memory usage of trees.

               original        patched       diff%
  #nodes      24,234,589     23,577,245    -2.712%
  #bytes   1,228,797,315  1,205,528,427    -1.893%

As far as compile time goes, I consistently see improvements of 0.1% or a bit more.

In case you are wondering what fold_buildN is: fold_build1 (code, type, op0) is equivalent to fold (build (code, type, op0)), except that fold_build1 does not build a scratch node when folding is successful. fold_build2 operates in a similar way. Note that ternary expressions did not take advantage of this fold_build3 mechanism because of the handling of CALL_EXPR in fold_ternary. Since ternary expressions are not as common as binary expressions, I don't expect much difference even after we start using fold_build3.

Kazu Hirata
Shall we take a short break from my fold changes?
Hi, Roger Sayle and I are wondering if we should take a short break (maybe a few days!?) from my fold changes. The reason is that I've already broken fold once. (Thanks goes to David Edelsohn for fixing the problem!) If we wait for a few days, it's likely that people bootstrap and test GCC on their favorite platforms. Some people might even try to build their favorite applications with mainline GCC. Plus, as Roger Sayle puts it, applying parts 16 and 17 http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00629.html http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00630.html would make it harder to revert any of parts 1 through 15. On the other hand, after applying parts 16 and 17, it's trivial to make fold_build[12] available although fold_build3 will take some time, and it would be a bit awkward to provide fold_build[12] but not fold_build3. Thoughts? Kazu Hirata
For those who want to automatically generate predicates.md...
Hi,

I created a set of scripts that generates predicates.md based on PREDICATE_CODES in tm.h. The generated file looks like this:

  ;; Predicate definitions for FIXME FIXME.
  ;; Copyright (C) 2005 Free Software Foundation, Inc.
  ;;
  ;; This file is part of GCC.
  ;;
  ;; :
  ;; : Usual copyright notice
  ;; :

  ;; Return true if OP is a valid source operand for an integer move
  ;; instruction.

  (define_predicate "general_operand_src"
    (match_code "const_int,const_double,const,symbol_ref,label_ref,subreg,reg,mem")
  {
    if (GET_MODE (op) == mode
        && GET_CODE (op) == MEM
        && GET_CODE (XEXP (op, 0)) == POST_INC)
      return 1;
    return general_operand (op, mode);
  })

  :
  : More predicates follow.
  :

1. A copyright notice is automatically inserted, except for the port name.
2. A comment for each function is taken from tm.c.
3. The name of a predicate, along with the codes it accepts, is automatically taken from PREDICATE_CODES.
4. The C code for a predicate is automatically taken from tm.c.

My scripts will only generate predicates.md. They do not remove PREDICATE_CODES from tm.h, predicates from tm.c, or prototypes from tm-protos.h. All these are left for your code cleanup pleasure. :-)

Another thing that my scripts won't do is convert a C-style predicate to a Lisp-style predicate. My scripts are only meant to alleviate the mechanical part of the conversion.

Anyway, untar the attachment and run predicatecodes.sh h8300 under config/h8300 to generate predicates.md. Of course, you can replace h8300 with any port with PREDICATE_CODES. My scripts are not robust, so don't blame me if they eat your files.

I might actually start posting patches to convert ports to predicates.md.

Kazu Hirata

Attachment: conv_predicate_codes.tar.gz (binary data)
Obsoleting more ports for 4.0.
Hi,

First off, Mark, if you think this stuff is too late for 4.0, I'll postpone it to 4.1. Please note that, as far as printing the "obsolete" message is concerned, all we have to do is add a few lines to config.gcc.

Below, I propose to obsolete the following five architectures for GCC 4.0 and remove them for 4.1 unless somebody steps up and does *real work*. If you are working on these ports, please send us real patches. If you would like to work on these ports privately, please refrain from telling us that port xxx should be kept. The more ports we have, the more work we have to do to clean up the infrastructure connecting the middle end and the back ends.

arc
---
No maintainer. PR 8972 hasn't been fixed. GCC miscompiles a function as simple as

  int f (int x, int i) { return x << i; }

There was some recent interest, like
http://gcc.gnu.org/ml/gcc/2004-10/msg00408.html
Obsoleting a port on the grounds of a single bug may seem a bit strange. However, PR 8972 implies that nobody is working on the FSF's mainline at least. PR 8973 hasn't been fixed either.

fr30
----
The same justification as
http://gcc.gnu.org/ml/gcc/2004-06/msg01113.html
Nobody showed an interest in keeping this port.

i860
----
A hardware implementation is not currently being manufactured. Jason Eckhardt, the maintainer of i860, has told us that it would be OK for it to go.
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02033.html

ip2k
----
PR 20582: the libgcc build fails despite some interest, such as
http://gcc.gnu.org/ml/gcc/2004-06/msg01128.html

ns32k
-----
The same justification as
http://gcc.gnu.org/ml/gcc/2004-06/msg01113.html
Nobody showed an interest in keeping this port.

Kazu Hirata
A question about java/lang.c:java_get_callee_fndecl.
Hi,

I see that the implementation of LANG_HOOKS_GET_CALLEE_FNDECL in Java always returns NULL (at least for the time being):

  static tree
  java_get_callee_fndecl (tree call_expr)
  {
    tree method, table, element, atable_methods;
    HOST_WIDE_INT index;

    /* FIXME: This is disabled because we end up passing calls through
       the PLT, and we do NOT want to do that.  */
    return NULL;
    :
    :

Is anybody planning to fix this? If not, I'm thinking about removing this language hook. The reason is not just cleanup. Rather, it is because I need to change the prototype of get_callee_fndecl and LANG_HOOKS_GET_CALLEE_FNDECL. Currently, fold_ternary has the following call tree:

  fold_ternary
    get_callee_fndecl
      java_get_callee_fndecl

If I change fold_ternary to take the components of a CALL_EXPR, like the address expression of the CALL_EXPR and the argument list, instead of the CALL_EXPR itself, I would have to change java_get_callee_fndecl to take the first operand of a CALL_EXPR instead of a CALL_EXPR. It's not that the change is so involved, but it doesn't make much sense to keep something dead up to date.

In other words, when I posted the following patch
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02038.html
Roger Sayle requested that I keep the call to get_callee_fndecl so that we can "fold" the first operand of a CALL_EXPR to a FUNCTION_DECL.

FYI, the above FIXME comes from
http://gcc.gnu.org/ml/java-patches/2004-q2/msg00083.html

Kazu Hirata
Re: Obsoleting more ports for 4.0.
Hi Ramana,

> PR8972 has been fixed in our tree and we should be pushing
> the patch upstream soon. We had got the FSF tree building
> with patches to fix PR17317 / PR11476 / PR17240. We have
> also been making fixes with binutils which can be found in
> the corresponding archives.

If that's the case, you might want to assign those bugs to yourself and leave some comments on each of them, saying that you've got a patch and will be posting it soon. Doing so will prevent several people from independently working on the same problem (although it may be unlikely in this particular case). Also, note that our bugmasters go through open bugs from time to time. If you merge your fixes into mainline in a timely manner, they won't have to spend extra time checking whether the bugs still exist.

> We would like to merge our changes with mainline, the 4.0 branch
> and the 3.4 branch and continue work in the FSF domain. What
> would be the best way to go about this?

Send patches to [EMAIL PROTECTED]. If you are familiar with ARC and willing to continue to work on the port, you may want to become a maintainer. That will speed up the process of merging your work into the FSF tree.

Kazu Hirata
Do we still need get_callee_fndecl?
Hi,

I am wondering if we still need get_callee_fndecl in the presence of the tree optimizers. I think this function performs a form of constant propagation in a rather strange place.

Currently, given the address expression of a CALL_EXPR, get_callee_fndecl tries to get to its DECL_INITIAL like so:

  STRIP_NOPS (addr);

  /* If this is a readonly function pointer, extract its initial value.  */
  if (DECL_P (addr) && TREE_CODE (addr) != FUNCTION_DECL
      && TREE_READONLY (addr) && ! TREE_THIS_VOLATILE (addr)
      && DECL_INITIAL (addr))
    addr = DECL_INITIAL (addr);

This is constant propagation in that we are propagating the value of DECL_INITIAL (addr) into a call site. In fact, even if I remove the block of code above from get_callee_fndecl, the tree optimizers can still turn an indirect call into a direct one, "bar ()", in the following test case:

  extern int bar (void);
  typedef int (*bar_type) (void);
  const bar_type g = &bar;

  int foo_1 (void)
  {
    /* Call through a function pointer stored in a const global
       variable.  */
    return (*g)();
  }

A call through a function pointer stored in a local variable would be fine, too:

  int foo_2 (void)
  {
    /* Call through a function pointer stored in a local variable.  */
    bar_type h = &bar;
    return (*h)();
  }

But a call through a function pointer stored in a constant global array is not optimized. Currently, we don't dig that deep:

  const bar_type array[3] = { bar, bar, bar };

  int foo_3 (void)
  {
    /* Call through a function pointer stored in a constant global
       array.  */
    bar_type h = array[2];
    return (*h)();
  }

All of the above should illustrate my point. If we improve our constant/copy propagator to dig into constant arrays and structures, we would get foo_3 optimized for free. After all, a function pointer is just a variable, and if we have an assignment of a constant, we can propagate the constant into its uses. (The Java front end used to look into an array, but it doesn't nowadays.)
I wonder if there is any data structure that a front end does not expose to the middle end for the purpose of constant propagation. After all, all we need in get_callee_fndecl seems to be

  addr = TREE_OPERAND (call_expr, 0);
  return ((TREE_CODE (addr) == ADDR_EXPR
	   && TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL)
	  ? TREE_OPERAND (addr, 0) : NULL_TREE);

Thoughts?

Kazu Hirata
Re: Do we still need get_callee_fndecl?
Hi Andrew, > I am wondering if we still need get_callee_fndecl in the presence of > tree optimizers. I think this function performs a form of constant > propagation at a rather strange place. Sorry for omitting you in the CC. Once you fix java_get_callee_fndecl, perhaps we only need to call it during inlining? Kazu Hirata
Conversion from fold (buildN (...)) to fold_buildN.
Hi,

I would like to announce that fold_buildN is now ready.

What's this? Put shortly, this is a tree equivalent of simplify_build_{unary,binary,ternary}. For example, we could replace

  fold (build2 (MULT_EXPR, type, op0, op1));

with

  fold_build2 (MULT_EXPR, type, op0, op1);

while avoiding the scratch node that build2 would create when folding is possible.

What's so good about this? -- Reduced memory usage. Here is some quantitative justification for this change:
http://gcc.gnu.org/ml/gcc/2005-03/msg00277.html

It's my turn to ask you a question - When and how do we want to do these conversions? Currently, I am thinking about converting all of "fold (buildN (...))" and "fold (build (...))" to fold_buildN as soon as stage 2 starts, which is about one month away, so that I won't annoy people as much. I don't mind people starting to use fold_buildN in new code or doing conversions on what they maintain, but I don't think now is the right time to do the wholesale conversion.

Would you like fold_{unary,binary,ternary} exported? These are equivalent to simplify_{unary,binary,ternary}. They return a folded tree if folding is possible. Otherwise, they return NULL_TREE. If the need arises, I am happy to export them for you.

Kazu Hirata
Re: Do we still need get_callee_fndecl?
Hi Dale, > > After all, all we need in get_callee_fndecl seems to be > > > > addr = TREE_OPERAND (call_expr, 0); > > return ((TREE_CODE (addr) == ADDR_EXPR > >&& TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL) > > ? TREE_OPERAND (addr, 0) : NULL_TREE; > > > > Thoughts? > > In Objective C (and ObjC++) it's also a good idea to look under > OBJ_TYPE_REF. > See this patch, which was deferred to 4.1 and I'm going to resubmit RSN: > http://gcc.gnu.org/ml/gcc-patches/2004-12/txt00122.txt Thanks for the information. Does OBJ_TYPE_REF_EXPR only apply to a CALL_EXPR? In other words, are there other forms of constants that are exposed by looking into OBJ_TYPE_REF_EXPR? Kazu Hirata
Merging CCP and VRP?
Hi Diego,

Have you considered merging CCP and VRP (as suggested by Kenny last year at the summit)? Originally I was thinking that ASSERT_EXPRs, or rather the ranges gathered by VRP, were very useful for jump threading, but they seem to be useful for constant propagation, too. Consider

  void bar (int);

  void foo (int a)
  {
    if (a == 0)
      bar (a);
  }

At the end of VRP we still have

  :
  if (a_1 == 0) goto ; else goto ;

  :;
  a_2 = 0;
  bar (a_2);  <-- a_2 isn't replaced with 0 yet!

Note that we don't have bar (0) at the end. This is because VRP currently does not have the equivalent of substitute_and_fold. We just use the range information to fold COND_EXPRs; we don't fold each statement using the constants and ranges gathered by VRP. We could have VRP call substitute_and_fold, but once we do that, the distinction between CCP and VRP becomes less clear.

Plus, propagating conditional equivalences may have a chain effect on simplification, especially with inlining. That is, an equivalence like "a = 0" above may end up massively simplifying the code. For the obvious reason, we don't want to do

  do
    {
      Run CCP;
      Insert ASSERT_EXPRs;
      Run VRP;
      Remove ASSERT_EXPRs;
    }
  while (something changes);

So I am thinking about inserting ASSERT_EXPRs up front, *before* going into SSA, without much pruning, and then running an enhanced version of CCP, which includes ranges and anti-ranges in the lattice, all of which was suggested by Kenny last year. I'm thinking about keeping ASSERT_EXPRs around until it becomes difficult to keep them. I don't know much about the loop optimizers, so if I were to write code to keep ASSERT_EXPRs, I might give up by turning ASSERT_EXPRs into copy statements just before hitting the loop optimizers. :-) I have not figured out how to deal with ASSERT_EXPRs in FRE, but Daniel Berlin says I just have to teach tree-vn.c how to deal with them.

One note about the order of optimizations.
I think it's a good idea to run an enhanced version of CCP before copy-prop because the propagation engine can deal with the presence of copies very well, whether they are copy statements or PHI nodes. If we run copy-prop after an enhanced version of CCP, we would still have useful information in SSA_NAME_VALUE_RANGE at the end. Copy-prop only kills newer copies; it doesn't even touch the SSA_NAME_VALUE_RANGE stored in older copies.

Last but not least, I'm willing to do the work, but I'd like to be more or less on the same page with the other people hacking on these scalar optimizations before I do any significant work.

Thoughts?

p.s. By the way, looking at Kenny's slides from last year, one thing we have not implemented in our propagation engine is processing the CFG worklist in the DFS order of the dominator tree. I haven't looked closely at this, but if the speed of propagation is a concern, we should come back to it.

Kazu Hirata
Re: Merging CCP and VRP?
Hi Diego,

> By merging, do you mean *replacing* CCP with VRP? Yes, it's
> doable. No, it's not a good idea.

Understood.

Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to run copy-prop before VRP. Otherwise, we would end up with lots of

  D.18001_101 = D.18001_198;
  D.18011_102 = D.18001_101->head_;
  D.18001_12 = ASSERT_EXPR ;

Note that the ASSERT_EXPR is on D.18001_101, not on D.18001_198, which is older. As a result, VRP does not notice that D.18001_198 != 0B. Currently, we still have these even after copy-prop because we don't allow copy propagation between const and non-const pointers, which I think is a bit too restrictive.

> At one point, all the passes had to deal with ASSERT_EXPRs.
> Mostly by ignoring them. Which is additional, unwanted work
> because some of them had to actively know about them being
> nothing but fancy copy operations. That gets in the way of their
> work. I think that ASSERT_EXPRs should only survive as long as
> they're useful.

Yes, playing with VRP, I've realized that I am not interested in ASSERT_EXPRs per se; I am more interested in the SSA_NAME_VALUE_RANGE that VRP leaves us with.

> Sure. Go ahead. My short term plan is to start merging the various
> components into mainline. I'll start with the incremental SSA
> patches, followed by VRP, the CCP and copy-prop improvements.
> Perhaps we could leave the changes to the threader in TCB for a
> little while longer, but first I need to read your proposal in
> detail.

Thank you. It turns out that if I blindly insert ASSERT_EXPRs for everything that allows us to infer a range, my jump threading pass finds more opportunities than what the current DOM can find. There are certain classes of opportunities that my pass misses but DOM catches, but I'll mention them in a separate message.

> ISTR either stevenb or dberlin implementing a dom-order
> propagation. I think they got a minor speedup that could be
> worth having.
Huh, when I talked to them on IRC they didn't seem to have implemented this. I'll try to get to this issue one of these days.

Kazu Hirata
Re: Merging CCP and VRP?
Hi Diego,

> There is a copy-propagation pass before VRP. Or do you mean
> right before? Sure, the ordering of these passes is in eternal
> flux anyway.

"Before", but it doesn't have to be "right before". The current ordering is reasonable.

> > Currently, we still have these even after copy prop because we don't
> > allow copy propagation between const and non-const pointers, which I
> > think is a bit too restrictive.
>
> I assume these are blocked by may_propagate_copy. What's
> blocking these? Do they have different alias set numbers or do
> we need a conversion from the const to the non-const version?

Yes, they are blocked by may_propagate_copy. It seems to be a const vs. non-const pointer issue. I haven't come up with a breakdown of the reasons why copies are blocked.

Kazu Hirata
Re: Merging CCP and VRP?
Hi Jeff,

> We'd have to go back and find the PR. I don't remember all the
> details, but the problem was big enough to make ASSERT_EXPRs a
> far inferior solution to the others I'd looked at for recording
> context sensitive equivalences.

Yes, inserting a bunch of ASSERT_EXPRs, updating SSA, running VRP, etc., all take a lot of time, although using the information gathered by VRP for jump threading purposes is pretty straightforward and fast. So I am sandwiched between two things.

1) DOM is fairly fast when its iteration is limited, which shouldn't be
   too restrictive if we are running CCP/copy-prop before DOM. But DOM
   rebuilds information built by other passes, like keeping track of
   nonzero-ness, the const/copy table, the available expression table,
   etc.

2) VRP is slow, and even slower with PHI node insertion. But if we use
   its result for jump threading, we are not duplicating much code.
   Plus, the context-sensitive equivalences are globally visible and
   fine grained, so we don't have to mix the identification of jump
   threading opportunities into a dom walk. (ASSERT_EXPRs and PHI nodes
   essentially split a single variable into multiple variables, making
   a bunch of smaller live ranges so that fine-grained value ranges can
   be stored.)

Having said all this, I can't think of uses of these fine-grained context-sensitive equivalences other than jump threading, so I can't justify constructing this much information just for jump threading.

> > > And, yes, it is imperative that we const/copy propagate into
> > > ASSERT_EXPRs to the fullest extent possible. Otherwise we lose a lot
> > > of threading opportunities.
> >
> > ASSERT_EXPRs are regular expressions, so this is not a problem.
> It is if you don't run const/copy propagation immediately after
> insertion of the ASSERT_EXPRs :-)

It may also be a problem not to run const/copy prop before inserting ASSERT_EXPRs, as noted in the problem of

  D.18001_101 = D.18001_198;
  D.18011_102 = D.18001_101->head_;
  D.18001_12 = ASSERT_EXPR ;

where we are asserting a copy of D.18001_198, and D.18001_198 itself may also be used later.

Kazu Hirata
Obsoleting c4x last minute for 4.0
Hi,

I would like to propose that the c4x port be obsoleted for 4.0:

  c4x-*
  tic4x-*

The primary reason is that the port doesn't build. Richard Sandiford's recent patch allows us to go further during the build process, but the port still does not build.
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01840.html
I've filed the current build problem as PR 20765.

There are two other outstanding PRs against c4x, namely PR 14436 and PR 17824. Note that the maintainer hasn't commented on either one of these PRs, which worries me a bit.

If the port doesn't build, we cannot clean up the interface between the middle end and the back ends because we cannot even do the minimal testing of "the port builds". The PREDICATE_CODES clean-up is almost over, but c4x is blocking the last one. Given that stage 2 hasn't even started, we may be able to clean up a few other things for 4.1 (if every port builds at least).

Mark, it's still April 5, so we could wait for a week and add a couple of lines to config.gcc in time for 4.0, but if you think this is too pushy or could distract from the 4.0 effort, I am happy to postpone this until 4.0.1 or 4.1.

Kazu Hirata
Status of conversions to predicates.md
Hi,

c4x
  The port doesn't build.

cris
  A patch has been posted.
  http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01705.html
  Hans-Peter Nilsson says he wants to modify the port, so the
  predicate conversion patch may need to be adjusted.

i860
  Obsoleted.

ip2k
  Obsoleted.

mmix
  A patch has been posted.
  http://gcc.gnu.org/ml/gcc-patches/2005-04/msg00272.html
  Hans-Peter Nilsson says he will pick up my work and finish up the
  conversion.

sh
  A patch has been posted.
  http://gcc.gnu.org/ml/gcc-patches/2005-04/msg00346.html
  Joern Rennecke requested me to hold off.

sparc
  Eric Botcazou says he is working on predicates.md himself.
  http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01694.html

Kazu Hirata
Bootstrap failure on i686-pc-linux-gnu since 2005-04-09 20:41UTC
Hi,

I have been getting bootstrap failures on i686-pc-linux-gnu since Caroline's patch:

http://gcc.gnu.org/ml/gcc-cvs/2005-04/msg00541.html

Errors go like so:

org/xml/sax/helpers/ParserAdapter.java: In class 'org.xml.sax.helpers.ParserAdapter':
org/xml/sax/helpers/ParserAdapter.java: In constructor '()':
org/xml/sax/helpers/ParserAdapter.java:84: internal compiler error: Segmentation fault

I have confirmed that the first patch attached to PR 20934 fixes this problem.  Looking at the audit trail in PR 20934, I see that Caroline has been working on this problem.  What is the status of the patch?  Is there more work or refinement needed?

Kazu Hirata
About the number of DOM's iterations.
Hi,

Note that with -O1, we limit the number of iterations of DOM to 1 so that we can get reasonable compile-time performance.  I was wondering what happens if we do the same with -O2, now that we have passes like copy-prop, VRP, and improved FRE as well as good old CCP.

Numbers go first.  All of the following numbers are obtained from compiling cc1-i files with -O2.

version  #iters  #blocks  %cov   #threaded
v0       9097    514501   94.5%  9438
v1       8602    553685   95.5%  20868
v2       8609    510618   96.3%  23236
v3       8490    510734   97.2%  23262
v4       7193    188793   N/A    18450

Legend
------
v0 : Vanilla mainline with the second and third runs of DOM disabled.
v1 : v0 with Jeff's patch in PR 19794
v2 : v1 with the following TCB-like tweak to the pass ordering:

       NEXT_PASS (pass_ccp);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_fre);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_merge_phi);
       NEXT_PASS (pass_dominator);

v3 : v2 with my queued VRP improvements
v4 : v3 with the number of DOM's iterations limited to 1

#iters : the number of times we iterate in the do-while loop in
  tree-ssa-dom.c:tree_ssa_dominator_optimize.
#blocks : the number of blocks scanned by DOM.  This is counted by
  adding one line at the entrance of the aforementioned do-while loop:
    num_cfg_blocks += n_basic_blocks;
%cov : the percentage of invocations of DOM that finished their work
  within 2 iterations.
#threaded : the number of edges threaded.

Disclaimers
-----------
o I have not measured compile-time performance.
o I have not benchmarked generated code.
o I have not tried any other pass ordering.
o I have not considered other testcases or languages.

Justifications for v0, ..., v4 and columns
------------------------------------------
v0 is because I wanted to stay simple and focus on the initial basic propagation.  v1 is because Jeff will soon remove various restrictions on the jump threader that have been limiting jump threading.  For v2, note that ccp, copy-prop, FRE, and VRP are all something that DOM does to some extent.
I put my merge_phi right before DOM so that DOM can find more jump threading opportunities in its first iteration.

#iters is mostly out of curiosity.  Functions that we compile come in different sizes, so the number isn't that important.  For the same reason, #blocks is more interesting.  It's a rough measure of how much time DOM takes.

Now %cov.  Why 2 iterations?  Well, DOM iterates to a fixed point, meaning that the last iteration does not do anything.  More specifically, the last iteration does not perform any jump threading.  Whenever jump threading opportunities are taken, cleanup_tree_cfg called from tree_ssa_dominator_optimize always returns 1, triggering the next iteration.

Observations
------------
Even with v0, the vast majority of functions require at most two iterations, so we are in diminishing-returns territory to begin with.  With v3, we get an impressive 97.2%, meaning that 97.2% of the time DOM iterates at most two times, which in turn means that 97.2% of the time we don't take any jump threading opportunity in DOM's second iteration.

Note that with v4, the number of blocks that DOM looks at goes down significantly.  At the same time, the number of edges threaded also goes down but is still close to the level of v1.  It's interesting that even if we let DOM iterate as many times as it likes, there is a sizable difference between v1 and v2 as far as the number of edges threaded goes.  Certain things cannot be propagated using DOM.

Future work
-----------
o Address the things listed in Disclaimers.
o Submit my queued VRP patches as soon as mainline gets a bit more stable.
o Speed up things like the incremental SSA update and the propagator.
o Try out some ideas to get more out of the first iteration of DOM.
o Take advantage of value ranges in thread_across_edge.

Kazu Hirata
What's the fate of VARRAY_*?
Hi Nathan,

Now that you've checked in your new VEC API, which I think is strictly more powerful than VARRAY, should we start converting uses of VARRAY to VEC?  I'd be happy to help out here.

Kazu Hirata
Re: Regression on mainline in tree-vrp.c
Hi Rainer and Steve,

> | 4.1.0 20050419 (experimental) (i386-pc-solaris2.10) GCC error: |
> | in set_value_range, at tree-vrp.c:124                          |
> | Error detected at sem_intr.adb:437:1                           |

There seem to be several ways to get to this ICE.  set_value_range is used in many places in tree-vrp.c.  I'll try to take a look at these in the next few days.

Kazu Hirata
Merging stmt_ann_d into tree_statement_list_node
Hi,

I am thinking about merging stmt_ann_d into tree_statement_list_node.

Background
----------
tree_statement_list_node is a linked list node defined like so:

struct tree_statement_list_node
{
  struct tree_statement_list_node *prev;
  struct tree_statement_list_node *next;
  tree stmt;
};

stmt_ann_d is an annotation for a statement ("stmt" above) that includes the operand cache, the basic block that the statement belongs to, and various flags, among other things.

Justifications
--------------
o We have a 1:1 correspondence between tree_statement_list_node and
  stmt_ann_d.  Why have separate data structures?

o Cache win.  Calling bsi_stmt loads tree_statement_list_node into
  memory, which might in turn bring members of stmt_ann_d into cache.
  Note that it's very common to call bsi_stmt and then do something
  with the operand cache while walking statements in a basic block.

o bsi_for_stmt becomes an O(1) operation.  Steven Bosscher says that
  the CFG inliner will hit bsi_for_stmt hard.

o Less memory management overhead.

o One step closer to a flat statement structure (or a tuple form if
  you like).  It's a bit silly that we have GIMPLE, but we still do
  not have a flat structure.

o We can probably remove get_stmt_ann and replace all uses of
  stmt_ann.  No more lazy stmt_ann allocation.

Thoughts?

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Zdenek,

> guessing from (*) below, you probably just forgot to mention that it also
> involves changing SSA_NAME_DEF_STMT to point to the
> tree_statement_list_node rather than the statement itself,
> otherwise there is no way how to get the
> annotation for SSA_NAME_DEF_STMT (name)?

I am not planning to change SSA_NAME_DEF_STMT yet.  If stmt_ann is embedded into tree_statement_list_node, we can compute the address of tree_statement_list_node from that of stmt_ann.  I am planning to keep a pointer from a statement proper to a stmt_ann for now.

> It is definitely a good idea, IMHO.

Thanks!  I might need your help later as a loop optimizer expert.  I just found yesterday that tree-ssa-loop-im.c:schedule_sm is using stmt_ann before a statement proper is linked to a basic block.  Specifically, it creates a statement, attaches a stmt_ann, and puts it on an edge.  We probably have to do one of the following:

1) delay the creation of stmt_ann until after a statement is linked,

2) create a stmt_ann just like we do now, and then copy its contents
   to the stmt_ann embedded in tree_statement_list_node, or

3) create a tree_statement_list_node in schedule_sm, which contains
   stmt_ann, and link it to a basic block.  (This option would require
   bigger changes, as we are currently supposed to put a statement in
   PENDING_STMT, not a tree_statement_list_node.  Plus, the
   bsi_insert_* and tsi_link_* stuff all expect a statement.)

Right now I am just instrumenting stmt_ann to see where people are using stmt_ann while a statement is not linked to a basic block.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> It will also involve figuring out what to do with DOM, which has some
> hacks via a call to create_ssa_artficial_load_stmt() which causes a
> non-stmt to have a stmt annotation and operands created for it.  I've
> been bugging Diego to replace the parts of DOM which use this so we can
> remove the hideous aberration, but it still exists at this point.
>
> I believe DOM uses the routine to create a dummy expression for a store
> which looks like a load.  This load is then entered into the equivalency
> tables such that the store of the expression also shows up as a load,
> allowing subsequent loads to be subsumed.  Or something like that
> anyway.  Bottom line is it creates a stmt annotation and operands for an
> expression which is not in any stmt list.

Does this standalone stmt_ann end up in the instruction stream?  That is, are we going to have a statement as a part of a basic block whose stmt_ann is the one created in create_ssa_artficial_load_stmt?  If not, I guess we can let DOM do what it wants to do, because I am currently thinking about simply embedding stmt_ann into tree_statement_list_node like so:

struct tree_statement_list_node
  GTY ((chain_next ("%h.next"), chain_prev ("%h.prev")))
{
  struct tree_statement_list_node *prev;
  struct tree_statement_list_node *next;
  tree stmt;
  struct stmt_ann_d ann;
};

Though leaving this standalone stmt_ann as is probably isn't the cleanest thing to do.  Eventually, I want to completely hide creation/destruction of stmt_ann from applications (or optimizers, if you like) of the infrastructure.  If this standalone stmt_ann does end up in the instruction stream, we could structure-copy stmt_ann for the time being (yuck!).

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> > I might need your help later as a loop optimizer expert.  I just found
> > yesterday that tree-ssa-loop-im.c:schedule_sm is using stmt_ann before
> > a statement proper is linked to a basic block.  Specifically, it
> > creates a statement, attaches a stmt_ann, and puts it on an edge.  We
> > probably have to do one of the following.
> >
> > 1) delay the creation of stmt_ann until after a statement is linked,
> >
> > 2) create a stmt_ann just like we do now, and then copy its contents
> >    to the stmt_ann embedded in tree_statement_list_node, or
> >
> > 3) create a tree_statement_list_node in schedule_sm, which contains
> >    stmt_ann, and link it to a basic block.  (This option would require
> >    bigger changes as we are currently supposed to put a statement in
> >    PENDING_STMT, not a tree_statement_list_node.  Plus, bsi_insert_*
> >    and tsi_link_* stuff all expect a statement.)
> >
> > Right now I am just instrumenting stmt_ann to see where people are
> > using stmt_ann while a statement is not linked to a basic block.
>
> These are in fact issues with immediate_uses too.  I fixed the ones I
> found that were causing problems, but the stmt annotation and operands
> should *NOT* be created until a stmt is actually linked into the program
> via a bsi_ routine or some such thing.

Ah, so we should go with 1) above wherever possible.  Once these uses of stmt_ann are fixed, we can arrange things so that we *CANNOT POSSIBLY* create stmt_ann and operands (except maybe in create_ssa_artficial_load_stmt).  Then the actual work of embedding stmt_ann into tree_statement_list_node shouldn't be difficult.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> well, there isn't a tree_statement_list_node right...  DOM simply has an
> expression, and we end up calling create_stmt_ann.  I guess you would
> have to create a completely detached stmt_list node for this case.  what
> are you planning to have create_stmt_ann do?

I am thinking about leaving create_stmt_ann as is just for create_ssa_artficial_load_stmt.  Other than that, I don't want to call create_stmt_ann.  I am planning to allocate stmt_ann as a part of tree_statement_list_node.  That is, if I have this:

struct tree_statement_list_node
  GTY ((chain_next ("%h.next"), chain_prev ("%h.prev")))
{
  struct tree_statement_list_node *prev;
  struct tree_statement_list_node *next;
  tree stmt;
  struct stmt_ann_d ann;
};

then tsi_link_after and tsi_link_before would allocate the right amount of memory, because they have

  head = ggc_alloc (sizeof (*head));

where head is a pointer to tree_statement_list_node.  I still have to initialize the members of stmt_ann_d.  I believe all I have to do is

  memset (&ann, 0, sizeof (ann));
  ann.common.type = STMT_ANN;
  ann.modified = true;

> I eventually want to see this entire routine go away.  The only reason
> DOM wants it is so that it can scan the operands like it does for a real
> stmt.  no one else needs to do such a thing..  Presumably the overhead of
> actually inserting these loads immediately after the store and deleting
> them later is too high...  I've never looked at the details of what DOM
> does with the info.  if it only needs the operands for a short time, a
> different interface could be provided which returns the operands in a
> VEC or something.  I just never felt like looking at DOM too heavily.

I guess we can come back to this once stmt_ann is merged into tree_statement_list_node.  I don't feel like mixing infrastructure work and optimizer work.

By the way, I just realized you were planning to do "New SSA Operand Cache Implementation" in stage 2, which is starting tomorrow.  Would I be stepping on your toes?
I am not planning to change the members of stmt_ann, so hopefully my work won't interfere with yours, but if it does, please let me know. Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Zdenek, > Once this is done, I would like to start working on moving gimple out of > trees to the "flat" structure. Very cool! I am glad you'll be working on this! Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> How is the get_stmt_ann (stmt) going to work from the caller of
> create_ssa_artificial_load_stmt() when it goes to look at the operands
> for its new "stmt"?

I am going to replace that particular call to get_stmt_ann with a call to create_stmt_ann.  The beginning of create_ssa_artficial_load_stmt would look like:

  /* Make sure we don't have a stmt_ann already.  */
  gcc_assert (stmt_ann (stmt) == NULL);

  /* Create a standalone stmt_ann that is not a part of
     tree_statement_list_node.  */
  create_stmt_ann (stmt);

> The annotation was simply attached to the stmt before, so it was
> easy to locate.  There is no bsi_for_stmt() since it isn't a real
> stmt, so you can't get to a stmt_list node that way either.  I'm not
> sure how you would locate the annotation for a fake stmt like this.

I am going to keep stmt_ann() so that you can get to the stmt_ann from a stmt even for a fake statement.  Eventually, we might want to replace stmt_ann() with stmt_ann_from_bsi() so that you can get to the stmt_ann from a bsi directly without going through a statement proper.  I think this is a good follow-up project and should be a good preparation for Zdenek's flat statement tree project.

> And DOM treats it like a real stmt by calling the same routines, so I
> don't know how you could change this without first making DOM do
> something different.

Again, I'll let DOM create its own instance of stmt_ann_d for stores.  Since it's only used for look-ups and never entered into the instruction stream, I don't think that would cause a problem.

> I have the patch done, I'm evaluating whether it's worth using or not.
>
> I doubt you would be stepping on my toes any more than the annoyance of
> diffs not applying all over the place like they did with
> get_stmt_operands() being removed.

OK.  Meanwhile I'll sit here quietly, collecting information like which places do weird things with stmt_ann and thinking about how to go about the transition.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> OK, let me word it a different way.  What is create_stmt_ann (stmt)
> going to do with the annotation it creates?  You are moving the
> stmt_ann_t pointer from the stmt node to the tree_statement_list_node...
> so there will be nothing in the stmt node to attach the annotation to
> like there is now.

Ah, OK.  I am going to keep

  union tree_ann_d *ann;

in tree_common.  So create_stmt_ann will store a pointer to stmt_ann_d in ann in tree_common just like before.  In other words, for normal cases, we would have

  tree_statement_list_node+stmt_ann_d
    |                ^
    | via stmt       | via ann in tree_common
    v                |
  MODIFY_EXPR -------+

For DOM's case, we would have

  stmt_ann_d
    ^
    | via ann in tree_common
    |
  MODIFY_EXPR -+

So stmt_ann_d won't be hanging in space.

> I thought the whole point of this exercise was to remove the
> stmt_ann_t pointer from the stmt node, and put it directly in the
> tree_statement_list_node.  I'm saying that will not work until you
> either change DOM to not use these fake stmts, or figure out some
> way of associating a stmt_ann to these fake stmts.

No, I would like to remove the stmt_ann_t pointer from the stmt node, but not yet.  All I want to do in this project is to put tree_statement_list_node and stmt_ann_d next to each other in memory while doing the necessary adjustments in the compiler.  Nothing more than that.

Yes, it is nice to get rid of the stmt_ann_t pointer from the stmt node, and I want to eventually get there, but not in this project.  I think there are so many places that use "stmt" instead of "bsi".  They may either pass it to subroutines or put it in some internal data structures, so I would like to do that separately.

So if you like, the roadmap might be something like:

1. Put tree_statement_list_node and stmt_ann_d next to each other in
   memory while doing the necessary adjustments.

2. Provide stmt_ann_from_bsi(), a function that gives you a stmt_ann
   given a bsi.

3. Let everybody use stmt_ann_from_bsi(), replace stmt_ann(), and
   remove the stmt_ann_t pointer from the stmt node (if these are
   possible at all).

4. Hand off to Zdenek for his flat statement project.

I hope this message clarifies things.

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Andrew,

> > No, I would like to remove the stmt_ann_t pointer from the stmt node,
> > but not yet.  All I want to do in this project is to put
> > tree_statement_list_node and stmt_ann_d next to each other in memory
> > while doing necessary adjustments in the compiler.  Nothing more than
> > that.
>
> does that buy us much?

I have to try, but it seems to be the lowest hanging fruit as far as the instruction chain infrastructure goes.

> Then why not simply shorten this to:
>
> 1) put the stmt annotation directly in the tree_statement_list_node
>
> 2) replace the stmt_ann_t pointer in the stmt with a pointer to its
>    BSI, or the tree_statement_list_node.  This makes bsi_from_stmt
>    O(1) immediately.
>
> 3) all stmts now have access to the stmt_ann directly in the
>    stmt_list_node, nothing changes except get_stmt_ann() returns the
>    address of the stmt ann within the tree_stmt_list_node.
>
> 4) For fake stmts you will have to allocate a fake tree_stmt_list_node
>    which isn't linked with anything, and set the bsi pointer to
>    that...  then you have the annotation and access to it from the stmt.
>
> that seems like a quicker, more direct approach than phasing it in
> slowly, with less impact to anyone, and gives you immediate benefits.  I
> would think the patch to do this would be fairly small and
> non-intrusive.

This looks like a better approach.  How would we do your step 1?  We have var_ann and tree_ann in addition to stmt_ann.  Shall we put a type field at the beginning of tree_statement_list_node+stmt_ann_d so that an annotation node can identify itself?  (Since all these tree annotations already have a field for the annotation type, it's more like appending tree_statement_list_node to stmt_ann_d.)

Kazu Hirata
Re: Merging stmt_ann_d into tree_statement_list_node
Hi Zdenek,

> > This looks like a better approach.  How would we do your step 1?  We
> > have var_ann and tree_ann in addition to stmt_ann.  Shall we put a
> > type field at the beginning of tree_statement_list_node+stmt_ann_d so
> > that an annotation node can identify itself?  (Since all these tree
> > annotations already have a field for annotation type, it's more like
> > appending tree_statement_list_node to stmt_ann_d.)
>
> I would go just for having
>
> union
> {
>   struct stmt_list_node *container;  /* For gimple statements.  */
>   tree_ann_t ann;                    /* For everything else.  */
> }

Err, I don't see how to tell the garbage collector about this without a type field.  We cannot rely on TREE_CODE (stmt), because a CALL_EXPR may appear by itself or as the rhs of a MODIFY_EXPR.

Kazu Hirata
Re: Struggle with FOR_EACH_EDGE
t;vec[0];
D.15284_47 = e_57->dest;
if (D.15284_47 == 0B) goto ; else goto ;
:;
ivtmp.198_31 = vec__44;
goto ();

For normal mode of operation, we have

  bb0 -> L17 -> L18 -> L19 -> L30 -> L3 -> L9
                                     ^     |
                                     +-----+

This is similar to the previous case, but notice that we have two induction variables, namely ivtmp.195_53 and ivtmp.198_7.  My guess is that a bunch of * and & are confusing the optimizers, but that's just my guess.

Can we somehow have the best of both worlds?  I want to use the container field (as in the last illustration), but I also want to use VEC_iterate.  Note that the second case above speeds up GCC by 0.5% or so.

Kazu Hirata
Re: Struggle with FOR_EACH_EDGE
Hi Nathan,

> Kazu, I hope I have time to look in detail at your investigation, however
> one thing that might be causing a problem is that FOR_EACH_EDGE is an
> iterator that works for a non-constant VEC.  This means there's an extra
> level of indirection that the optimizer cannot remove, unless it
> completely inlines the body of the loop (there might be some cases it
> can do it without inlining, but I suspect not).  I wonder if it is
> worthwhile implementing FOR_EACH_EDGE and FOR_EACH_EDGE_VOLATILE (I want
> the default, shorter name, to be the constant iterator, so that you have
> to choose to use the non-constant one).

Yes, the iterator that works for a non-constant VEC adds some overhead.  I am wondering if it's OK to tweak FOR_EACH_EDGE like so:

#define FOR_EACH_EDGE (...)					\
  if (vec == NULL)						\
    /* The vector is empty.  We don't have anything to do.  */	\
    e = NULL;							\
  else								\
    for (/* usual iterator stuff */)

In other words, I am wondering if it's safe to assume that nobody calls VEC_free during edge vector iteration.  (I know calling VEC_free sounds crazy, but I want to hear second opinions.)  If I am allowed to put an "if" like this, the optimizers (or slightly reordered ones) can remove the null check in the loop.

Anyway, there seem to be several things going on.

1. FOR_EACH_EDGE itself could be improved (like above).

2. VRP is not really effective in this case because it is run before
   SRA, meaning it cannot get at as many scalar variables.  (Andrew
   Pinski pointed this out on IRC.)

3. VRP does not know that &x->a is nonnull if x is nonnull.  (PR 21294)

I'll start with a low hanging fruit.

Kazu Hirata
Questions about a constant array reference in the C++ front end
Hi,

I have two questions about the C++ front end.  Consider a C++ program:

static const int array[] = { 0, 0, 0, 1, 1, 1, 2, 2, 2 };

int foo (int a)
{
  return array[7];
}

I am trying to fold array[7] into 2.  It turns out that if T is an ARRAY_REF, TREE_READONLY (TREE_OPERAND (T, 0)) is 0.  Why?  This would be 1 if the program were fed into the C front end, and it is needed to safely fold a constant array reference.

Another question.  How is a RANGE_EXPR used in a C++ array constructor?  The CONSTRUCTOR section of tree.def says:

  The TREE_PURPOSE of each node is the corresponding index.  If the
  TREE_PURPOSE is a RANGE_EXPR, it is a short-hand for many nodes,
  one for each index in the range.  (If the corresponding TREE_VALUE
  has side-effects, they are evaluated once for each element.  Wrap
  the value in a SAVE_EXPR if you want to evaluate side effects only
  once.)

I created an array with more than one thousand elements, but I still did not see a RANGE_EXPR in the array's CONSTRUCTOR.  How do I get a RANGE_EXPR in a CONSTRUCTOR?

Kazu Hirata
Re: Questions about a constant array reference in the C++ front end
Hi Nathan,

> this is untrue.  the elements hold the qualification.

Then how do I know that an array is declared with const (or static const)?  When I looked at the CONSTRUCTOR stored in the DECL_INITIAL of an array, I saw it with the TREE_STATIC flag set regardless of whether the array is declared with const, so that's not useful for determining whether the array is declared with const.  All I need is a sufficient condition, one that works with C, C++, and any other language that GCC supports, for determining whether it's safe to fold array[CST] into a constant.

> Incorrect.  RANGE_EXPRs get generated during processing of an array's
> initializer -- very early on in the C++ FE.

Do you know how to trigger a RANGE_EXPR?  I see one in build_zero_init, so I tried a big array with only zeros in it, but that didn't help.

Kazu Hirata
Re: Questions about a constant array reference in the C++ front end
Hi Mark,

> That's a bug, or, rather, a place where the C++ front end is failing
> to give full information to the optimizer.  It should be
> TREE_READONLY.  There are some tricky bits, in that if we're doing a
> dynamic initialization, we presently cannot mark it TREE_READONLY,
> but this is a static initialization, so it should be fine.  Isn't
> TREE_OPERAND (T, 0) the VAR_DECL for the array itself?

Yes, if T is an ARRAY_REF, TREE_OPERAND (T, 0) is the VAR_DECL for the array itself.

> If so, what's probably going wrong is that it's not being marked
> TREE_READONLY because we're afraid of the dynamic initialization
> case.  We're missing a call to c_apply_quals_to_decl (sp?)
> somewhere.

Thank you for the function name.  I might dig into the C++ front end.  Anyway, I filed this "bug" as PR 21454, saying that the contents of a "completely static" array end up in the .data section, not .rodata.  It turned out to be a regression from 3.x.

Kazu Hirata
A question about tree-if-conv.c:tree_if_convert_stmt.
Hi,

According to the CVS annotate of tree-if-conv.c, Devang Patel checked in the following code in tree_if_convert_stmt:

    case GOTO_EXPR:
      /* Unconditional goto.  */
      add_to_predicate_list (bb_for_stmt (TREE_OPERAND (t, 1)), cond);

Note that TREE_OPERAND (t, 1) seems wrong.  A GOTO_EXPR takes only one operand.  I am pretty sure this part of the code has never been run, because if it had been, the compiler would have ICEd by now.

Now, what would be the right fix here?  Something like this?

    case GOTO_EXPR:
      break;

As far as I know, all jumps except computed gotos are implicitly represented by the CFG.  I don't think if-conv is trying to handle computed gotos here.

By the way, this problem has been filed as PR 18472.

Kazu Hirata
Re: A question about tree-if-conv.c:tree_if_convert_stmt.
Hi Devang, > Just remove handling of GOTO_EXPR here and in if_convertible_stmt_p(). OK. Will submit a patch. Kazu Hirata
No tail call optimization in Thumb mode?
Hi,

Why is tail call optimization for Thumb disabled in GCC?  I am wondering if this is a TODO item or something that we intrinsically cannot do.  "The ARM-THUMB Procedure Call Standard" says "No tail continuation in Thumb-state" several times in its figures and measurements, but the document doesn't explicitly forbid tail call optimization for Thumb.

Thanks in advance,

Kazu Hirata
A trouble with libssp in one-tree builds
Hi Jakub,

I am having trouble with libssp in one-tree builds.  That is, if I try to build binutils and gcc at the same time, libssp/configure complains while compiling (and linking) the following program, and the build process stops.

/* confdefs.h.  */
#define PACKAGE_NAME "libssp"
#define PACKAGE_TARNAME "libssp"
#define PACKAGE_VERSION "1.0"
#define PACKAGE_STRING "libssp 1.0"
#define PACKAGE_BUGREPORT ""
#define PACKAGE "libssp"
#define VERSION "1.0"
/* end confdefs.h.  */

int
main ()
{
  ;
  return 0;
}

The problem is that libssp/configure contains a link test, but we may not have crt0.o built yet (assuming targets like h8300-elf, arm-none-eabi, etc.).  Is there any way we could stop configuring libssp in this kind of case?  Or somehow work around the link test?

FWIW, I am sitting on the attached patch to disable libssp for now.

Kazu Hirata

Index: configure.in
===================================================================
RCS file: /home/gcc/repos/gcc/gcc/configure.in,v
retrieving revision 1.355
diff -u -d -p -r1.355 configure.in
--- configure.in	2 Jul 2005 08:50:56 -0000	1.355
+++ configure.in	4 Jul 2005 04:42:35 -0000
@@ -307,6 +307,14 @@ if test "${ENABLE_LIBADA}" != "yes" ; th
   noconfigdirs="$noconfigdirs gnattools"
 fi
 
+AC_ARG_ENABLE(libssp,
+[  --enable-libssp        Builds libssp directory],
+ENABLE_LIBSSP=$enableval,
+ENABLE_LIBSSP=yes)
+if test "${ENABLE_LIBSSP}" != "yes" ; then
+  noconfigdirs="$noconfigdirs configure-target-libssp target-libssp"
+fi
+
 # Save it here so that, even in case of --enable-libgcj, if the Java
 # front-end isn't enabled, we still get libgcj disabled.
 libgcj_saved=$libgcj
A question about having multiple insns for one operation
Hi,

I have a question about having multiple insns for one operation.  Take the m68k port for example.  It has many SImode move insns, like so:

(define_insn "pushexthisi_const"
  [(set (match_operand:SI 0 "push_operand" "=m")
        (match_operand:SI 1 "const_int_operand" "J"))]

(define_insn ""
  [(set (match_operand:SI 0 "nonimmediate_operand" "=a")
        (match_operand:QI 1 "address_operand" "p"))]

(define_insn "movsi_const0"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
        (const_int 0))]

;; Notes: make sure no alternative allows g vs g.
;; We don't allow f-regs since fixed point cannot go in them.
(define_insn ""
  [(set (match_operand:SI 0 "nonimmediate_operand" "=g,d,a<")
        (match_operand:SI 1 "general_src_operand" "daymSKT,n,i"))]
  "!TARGET_COLDFIRE"

Now, is it OK to have this many?  IIRC, reload makes certain changes to insns without re-recognizing the modified ones.  (Well, it used to at least.)  A few years ago, I ran into this kind of trouble in the H8 port, which had multiple add insns; they were cleaned up soon afterwards.

Aside from reload issues, are there other potential problems with having multiple insns for one operation?  Maybe maintenance reasons?  Having multiple move insns may make it difficult for people to figure out which pattern a given insn matches, because we would have to go through the machine description to find the first matching insn.  (By the way, I do understand that it's OK to have multiple patterns for different targets, because the values of TARGET_xxx don't change within one run of cc1.)

FWIW, the second define_insn above could have "plus" in operands[1], in which case it overlaps with other insns with plus.

Thanks in advance,

Kazu Hirata
A questionable predicate in sh/predicates.md
Hi,

In sh/predicates.md, I see

(define_predicate "sh_rep_vec"
  (match_code "const_vector")
{
  int i;
  rtx x, y;

  if ((GET_CODE (op) != CONST_VECTOR && GET_CODE (op) != PARALLEL)
      || (GET_MODE (op) != mode && mode != VOIDmode))
    return 0;

Notice that the match_code at the beginning does not mention PARALLEL, but we have GET_CODE (op) != PARALLEL later.  Is this predicate intended to accept PARALLEL as well?  If so, should we change the match_code at the beginning?

When I did the conversion from PREDICATE_CODES to predicates.md, I did it in a mechanical way, so loose ends like this might have gone unnoticed.

Kazu Hirata
Re: bootstrap broken
Hi Andreas,

/cvs/gcc-svn/trunk/gcc/df-core.c: In function ‘df_compact_blocks’:
/cvs/gcc-svn/trunk/gcc/df-core.c:795: error: invalid lvalue in assignment
/cvs/gcc-svn/trunk/gcc/df-core.c:803: error: invalid lvalue in assignment
/cvs/gcc-svn/trunk/gcc/df-core.c: In function ‘df_bb_replace’:
/cvs/gcc-svn/trunk/gcc/df-core.c:833: error: invalid lvalue in assignment

I just fixed these.

Kazu Hirata
What shall we do with gcc.dg/vect/vect-11.c?
Hi Dorit,

What shall we do with gcc.dg/vect/vect-11.c?  After I fixed PR
tree-optimization/25125, gcc.dg/vect/vect-11.c started failing on some
targets but not on x86_64-pc-linux-gnu.  If we XFAIL it, we would get
an XPASS on x86_64-pc-linux-gnu.  I am pretty sure that adding -fwrapv
will fix the problem, but that means we are changing the test case.
Shall we add something like gcc.dg/vect/fwrapv-vect-11.c, remove
vect-11.c, and file a missed-optimization PR for vect-11.c?  Any
thoughts?

Kazu Hirata
Re: What shall we do with gcc.dg/vect/vect-11.c?
Hi Eric,

> > What shall we do with gcc.dg/vect/vect-11.c?  After I fixed PR
> > tree-optimization/25125, gcc.dg/vect/vect-11.c started failing on
> > some targets but not on x86_64-pc-linux-gnu.
>
> It's gcc.dg/tree-ssa/gen-vect-11.c and it presumably fails everywhere.
> But it is not run on x86-64 and ia64 so you don't see the failure
> there.

Oops, I now see what's going on.  In this case, I think we can just
XFAIL gcc.dg/tree-ssa/gen-vect-11.c.

Kazu Hirata
Use of VARRAY in alias analysis
Hi Daniel and Diego, Several months ago, you mentioned that the alias analysis in gcc would be overhauled. Has this happened yet? If not, I am thinking about working on alias.c and tree-ssa-alias.c as there are only a handful of VARRAY uses left. Thanks, Kazu Hirata
Re: Use of VARRAY in alias analysis
Hi Diego,

> > Several months ago, you mentioned that the alias analysis in gcc
> > would be overhauled.  Has this happened yet?  If not, I am thinking
> > about working on alias.c and tree-ssa-alias.c as there are only a
> > handful of VARRAY uses left.
>
> It's happening in the mem-ssa branch.  But switching from VARRAY to
> VEC will probably not affect my work, so go ahead.  I guess that's
> what you have in mind?  Switching VARRAY to VEC?

Yes, thanks!

Kazu Hirata
A question about tree-ssa-structalias.h.
Hi Daniel and Diego,

I see that tree-ssa-structalias.h contains

  varray_type num_references;

around line 58 as a member of alias_info, but it doesn't seem to be
used anywhere.  Is this going to be used in the near future?  VARRAY
is going away pretty soon, so we have to do something -- either remove
the line or convert it to VEC somehow.

Thanks,

Kazu Hirata
A question about the VEC API
Hi Nathan,

Does the VEC API support the following?

typedef unsigned char vec_uchar;
DEF_VEC_I(vec_uchar);
DEF_VEC_ALLOC_I(vec_uchar,gc);

Note that this is GC'ed.  When I use this construct in except.c, I get

libbackend.a(except.o): In function `gt_ggc_mx_eh_status':
./gt-except.h:104: undefined reference to `gt_ggc_mx_VEC_vec_uchar_gc'
./gt-except.h:106: undefined reference to `gt_ggc_mx_VEC_vec_uchar_gc'
libbackend.a(except.o): In function `gt_pch_nx_eh_status':
./gt-except.h:221: undefined reference to `gt_pch_nx_VEC_vec_uchar_gc'
./gt-except.h:223: undefined reference to `gt_pch_nx_VEC_vec_uchar_gc'

The reason I'd like to use GC memory for these integer arrays is that
the structure, namely eh_status, that contains them is GC'ed.  I have
a workaround like so:

typedef struct vec_uchar_struct GTY(())
{
  unsigned char uc;
} vec_uchar;
DEF_VEC_O(vec_uchar);
DEF_VEC_ALLOC_O(vec_uchar,gc);

But this is more complicated than it should be.  If the VEC API does
not support an integer array in GC, what should I do to add that?
Some sketch would be greatly appreciated.

Thanks,

Kazu Hirata
CC0 questions
Hi,

Can a jump_insn use cc0 but not actually jump?  The following
instruction is found in the H8 port.

(jump_insn 18 17 20 (set (reg:HI 3 r3)
        (eq:HI (cc0)
            (const_int 0

Is there a requirement that every cc0 user must be a jump_insn?

Can there be two consecutive insns that use cc0 after cc0 is set?  (I
don't think so.  I think there should be a one-to-one correspondence
between cc0 setters and cc0 users, but I've never seen that rule
written down anywhere.)

Thanks,

Kazu Hirata
A question about your patch for PR c++/26534
Hi Mark,

I have a question about your patch for PR c++/26534.  In particular,
you added to standard_conversion in call.c the following code:

  if (expr)
    {
      tree bitfield_type;

      bitfield_type = is_bitfield_expr_with_lowered_type (expr);
      if (bitfield_type)
        from = bitfield_type;
    }

Now consider the following testcase derived from PR c++/27505 and my
analysis of why it fails.

struct s
{
  bool field:8;
};

void
foo (struct s *p)
{
  if (!p->field)
    ;
}

When build_unary_op builds TRUTH_NOT_EXPR, it calls
perform_implicit_conversion to convert p->field to the boolean type.
The call to perform_implicit_conversion eventually gets to
standard_conversion.  The code fragment of standard_conversion shown
above changes FROM to the boolean type.  Since TO is also the boolean
type, no conversion happens, causing perform_implicit_conversion to
return the original expression, which is of INTEGER_TYPE.
invert_truthvalue, called from cp/typeck.c:3969, expects nothing but
expressions of BOOLEAN_TYPE, so it ICEs at fold-const.c:3165.

Now, is the code in standard_conversion meant to be an optimization,
or is it something that's required for language conformance?  If the
latter is the case, can we somehow force a conversion?

Thanks,

Kazu Hirata
A question about TYPE_ARG_TYPES
Hi,

I keep finding places in the GCC sources that check whether a member
of TYPE_ARG_TYPES is 0.  For example,

  for (link = TYPE_ARG_TYPES (function_or_method_type);
       link && TREE_VALUE (link);
       link = TREE_CHAIN (link))
    gen_type_die (TREE_VALUE (link), context_die);

Notice that TREE_VALUE (link) is part of the loop condition.  Now, do
we ever allow a NULL in TYPE_ARG_TYPES?  If so, what does that mean?
My guess is that someone was trying to be cautious about encountering
a NULL in TYPE_ARG_TYPES.  (If that's the case, we should be using
gcc_assert instead.)

Thanks,

Kazu Hirata
Re: A question about TYPE_ARG_TYPES
Hi Ian,

> > I keep finding places in the GCC sources that check whether a
> > member of TYPE_ARG_TYPES is 0.  For example,
> >
> >   for (link = TYPE_ARG_TYPES (function_or_method_type);
> >        link && TREE_VALUE (link);
> >        link = TREE_CHAIN (link))
> >     gen_type_die (TREE_VALUE (link), context_die);
> >
> > Notice that TREE_VALUE (link) is part of the loop condition.  Now,
> > do we ever allow a NULL in TYPE_ARG_TYPES?  If so, what does that
> > mean?  My guess is that someone was trying to be cautious about
> > encountering a NULL in TYPE_ARG_TYPES.  (If that's the case, we
> > should be using gcc_assert instead.)
>
> Just guessing here, but what happens with an old-style function
> definition in C?
>
> void f();

AFAIK, that gets TYPE_ARG_TYPES (...) == NULL, so we don't even get to
evaluate TREE_VALUE (TYPE_ARG_TYPES (...)).  On IRC, Daniel Berlin
claims that there are some weird cases where
TREE_VALUE (TYPE_ARG_TYPES (...)) is NULL.  I'll keep putting in
gcc_assert to see what happens.

Kazu Hirata
A question about ARG_FINAL_P in the Java frontend.
Hi, I'm planning to change TYPE_ARG_TYPES to use TREE_VEC instead of TREE_LIST for compact representation. I just noticed that the Java frontend has ARG_FINAL_P, which uses a bit in the TREE_LIST node that is pointed to from TYPE_ARG_TYPES. I am wondering if there is any way we could move this bit elsewhere. The TREE_VEC node doesn't really have space for per-parameter data other than the vector proper. I could use TREE_TYPE of TREE_VEC to keep an array of boolean values to represent ARG_FINAL_P for each parameter, but I am not sure if that is a clean solution. Thoughts? Kazu Hirata
Re: A question about ARG_FINAL_P in the Java frontend.
Hi Tom, Kazu> I just noticed that the Java frontend has ARG_FINAL_P, which uses a Kazu> bit in the TREE_LIST node that is pointed to from TYPE_ARG_TYPES. Kazu> I am wondering if there is any way we could move this bit elsewhere. On the gcj-eclipse branch the code that uses ARG_FINAL_P is actually no longer used. It hasn't been deleted yet but it won't ever run. I'm hoping to merge this to trunk after 4.2 branches ... will that help? Yes, that definitely helps. Thanks, Kazu Hirata
Re: A question about ARG_FINAL_P in the Java frontend.
Hi Mark, Yes. Kazu, I'd suggest you just ignore Java; you can still get proof-of-concept for tree-trimming without Java. The ECJ changes are going to be massive, and they're going to go in before we get our stuff ready to go in, so dealing with Java now is probably a waste of time; we'll have to regroup after ECJ goes in. OK. Kazu Hirata
A question about cp/pt.c:type_unification_real
Hi,

I have a question about cp/pt.c:type_unification_real.  What does
"args" represent?  The comment above type_unification_real says "most
parms like fn_type_unification."  According to fn_type_unification,
"args" is supposed to point to the function arguments.  In fact, when
I backtraced a call to type_unification_real, I traced "args" to the
second argument of build_new_function_call, which indicates that
"args" holds function arguments.

However, "args" appears to be a parameter list, too.  For example,
type_unification_real contains code like:

  parms = xparms;
  args = xargs;

  while (parms && parms != void_list_node
         && args && args != void_list_node)

Note that "args" is compared against void_list_node.  As another
example, also in type_unification_real, we have

  if (args && args != void_list_node && parms == void_list_node)
    return 1;

Note again that "args" is compared to void_list_node.  Even more
confusing is:

  arg = TREE_VALUE (args);
  ...
  if (!TYPE_P (arg))

Can a function argument be a type?  Clarification would be greatly
appreciated.

Thanks,

Kazu Hirata
Re: GCC 4.3 project to merge representation changes
Hi Mark,

> I don't believe there is a GCC 4.3 project page to merge the work
> that you folks did on CALL_EXPRs and TYPE_ARG_TYPES.  Would one of
> you please create a Wiki page for that?

What would you suggest I do about the missing bits?  Specifically,
most backends, with the exception of x86, are broken because I haven't
converted the uses of TYPE_ARG_TYPES in those backends.  The ARM port
was broken at the time of branch creation.  The Java frontend uses a
flag within the TREE_LIST object that makes up TYPE_ARG_TYPES, so it
is blocking the proposed merge.  (The Java maintainers are planning to
fix this in the future.)  So, Sandra's CALL_EXPR stuff may be ready
for merge, but my TYPE_ARG_TYPES stuff has a somewhat long way to go.

Thanks,

Kazu Hirata
[RFC] fold Reorganization Plan
Hi,

I am planning to reorganize fold as suggested by Roger Sayle on IRC.
The shortest way to describe this mini project would be to develop the
tree equivalent of simplify_gen_ARITY and simplify_ARITY in the RTL
world.  Doing so should reduce the number of scratch tree nodes
created when idioms like fold (buildN (...)) are used.  Hopefully, we
should be able to pick up some compile-time improvement along the way.

Step 1 -- Split fold into fold_unary, fold_binary, etc.

Make fold a simple dispatch function into fold_unary, fold_binary,
etc.  The interfaces are kept exactly the same; we would pass one tree
as the only argument.  Each function returns the folded tree or the
original tree.  No change to the external interface (outside of
fold-const.c) is made.

Step 2 -- Eliminate the direct use of the original tree.

For example, fold currently has things like

  fold_binary_op_with_conditional_arg (t, ...)

and

  fold_range_test (t)

These functions, as they stand, would not work without the original
tree "t".  We need to change their interfaces so that they will work
with decomposed arguments like code, type, op0, and op1.  Again, no
change to the external interface (outside of fold-const.c) is made.

Step 3 -- Change fold_unary, fold_binary, etc., so that they will
return NULL_TREE instead of the original tree when no folding is
performed.

Change fold accordingly so that it will still return either the
original tree or the folded tree (but not NULL_TREE).  Again, no
change to the external interface (outside of fold-const.c) is made.

Step 4 -- Change fold_unary, fold_binary, etc., so that they will take
decomposed arguments like code, type, op0, op1.

At this point, the fold_ARITY functions should be just like their RTL
equivalents, simplify_ARITY.  Change fold accordingly.  Again, no
change to the external interface (outside of fold-const.c) is made.

Step 5 -- Provide fold_buildN as extern functions.

Step 6 -- Convert fold (buildN (...)) to fold_buildN.

This is very mechanical but very disturbing at the same time.  We need
to coordinate things first with various people, especially those
maintaining branches.  One thing I can say is that converting little
by little would be even more disturbing than a one-shot conversion, as
people with patches spanning several files may have to adjust their
patches several times.

Step 7 -- Export fold_ARITY.

Step 8 -- Look for places where we can use fold_ARITY and convert
them.

Step 9 -- Enjoy the result and continue to hack GCC. :-)

Summary

The point is to do as much cleanup and reorganization as possible
without changing the external interface before making the big
conversion.

By the way, the past proposals from Roger Sayle are found at:

http://gcc.gnu.org/ml/gcc-patches/2003-10/msg01514.html
http://gcc.gnu.org/ml/gcc/2004-01/msg00560.html

Both of these are along the same lines as above.

Any comments?

Kazu Hirata
Re: [RFC] fold Reorganization Plan
Hi Nathan,

> I question if it is better to fold early.  As I've said before, I
> think the optimizations that fold performs should be turned into a
> proper SSA optimization phase -- that can be repeatedly applied as
> necessary.  In the front end, folding should not generally be done.
> I see two needs for early folding,

I may not be quite answering your question, but I have a comment.
Maybe we can have an early fold and a general fold.  The former would
handle constant expressions for front ends.  The latter is a
full-fledged version, but optimized to handle GIMPLE statements.  The
reasons to optimize the full-fledged version to handle GIMPLE
statements include:

1) We can remove the parts of fold handling those cases that never
   occur in the GIMPLE world.

2) Currently, fold has many transformations that look into a heavily
   nested tree, but all of those are useless in the GIMPLE world
   unless one builds a scratch node and passes it to fold.  An example
   of such a transformation would be:

  (A * C) + (B * C) -> (A + B) * C

We would express this in GIMPLE as

  D = A * C;
  E = B * C;
  F = D + E;

Given D + E, we can instead have fold chase the SSA_NAME_DEF_STMT of D
and E so that the above transformation is performed.

Whether we want to always chase SSA_NAME_DEF_STMT is another question.
Richard Henderson once suggested putting in a hook for chasing.  In a
tree combiner, we may want to limit SSA_NAME_DEF_STMT chasing to the
case where the SSA_NAME is used only once.  In other situations, we
might want a more relaxed hook, although I cannot come up with a
specific example.

Kazu Hirata
Re: [RFC] fold Reorganization Plan
Hi Nathan,

> hm, we may be in violent agreement :)  It depends what you mean
> by 'early fold'.  You say it would handle constant expressions for FEs
> -- isn't that the same as what I described as a constant expression
> evaluator?

Yes.

> After all, if it is just for constant exprs, it is required
> to 'fold' to a _CST node -- or give an error.  If I've misunderstood,
> could you clarify?

Can a compile-time constant appearing in an initializer be as wild as
the following?

  0 ? (foo () + 9) : (3 + 5)

Here (foo () + 9) does not fold to a constant, but the whole
expression does fold to 8.

> > The reasons to optimize the full fledged version to handle GIMPLE
> > statements include
> >
> > 1) We can remove parts of fold handling those cases that never occur
> >    in the GIMPLE world.
>
> Do you have examples of what this would be?

Certainly, a GIMPLE-specific fold wouldn't need to handle
TRUTH_ANDIF_EXPR.  I cannot come up with a better example right now.
TRUTH_ANDIF_EXPR wouldn't be a good example because we are just
removing one case of the big switch statement in fold, which is
unlikely to give any measurable speed-up.

> I have no feeling for what might be foldable in GIMPLE.  I'm curious
> (I don't think it'll affect the discussion though).

x & x -> x ? :-)

> hm, there's really two kinds of thing that need to happen,
> 1) the kind of reassociation & strength reduction that you describe
> 2) folding of constant expressions, for when we discover that some of
> A, B and C are constants.
>
> I don't know whether these operations should be part of the same SSA
> optimization or not.  #2 is more of a constant propagation kind of
> thing I guess.  #1 is the kind of thing that has made const-fold so
> complicated.  #1 is the important thing to add to the SSA optimizers,
> isn't it?

Yes.  Some transformations that happen in fold would involve CFG
manipulation in the GIMPLE world.  Those transformations include
TRUTH_{AND,OR}IF_EXPR, a lot of COND_EXPR manipulations, etc.  If
these are good transformations, we need to move them to the SSA
optimizers, as you mentioned.

Kazu Hirata
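[Editorial sketch.] The SSA_NAME_DEF_STMT-chasing idea discussed in this thread can be illustrated with a toy model: each SSA-style temporary records its defining operation, so a simplifier looking at D + E can peek at how D and E were computed and recognize the (A + B) * C form. Everything here (struct val, match_factored_sum) is invented for illustration; it is not GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of def-chasing: a value knows the statement that defined
   it, much as an SSA_NAME points at its SSA_NAME_DEF_STMT.  */
enum op { OP_VAR, OP_MULT, OP_PLUS };

struct val {
  enum op def;            /* how this value was defined */
  struct val *lhs, *rhs;  /* operands of the defining statement */
};

/* Given X + Y, chase the definitions of X and Y; if both are
   multiplications by the same value C, report the (A + B) * C form by
   filling in *a, *b, *c and returning 1.  Otherwise return 0.  */
static int match_factored_sum (struct val *x, struct val *y,
                               struct val **a, struct val **b,
                               struct val **c)
{
  if (x->def != OP_MULT || y->def != OP_MULT)
    return 0;
  if (x->rhs == y->rhs)  /* common factor on the right */
    {
      *a = x->lhs;
      *b = y->lhs;
      *c = x->rhs;
      return 1;
    }
  return 0;
}
```

In GCC proper, this peek-through would go through a hook, so a tree combiner could restrict chasing to single-use names, as suggested above.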
Re: RFC -- CSE compile-time stupidity
Hi Jeff,

> Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> and SUBREG_TICKED saves about 1% compilation time for the components
> of cc1.  Yes, that's a net 1% improvement by dropping the abstraction
> layer.

Yes, I've noticed the problem.  In my defense, the code in question
was even worse before I touched it. :-)  With the old code, every time
we accessed a cse_reg_info entry that was different from the last
access, we were generating a function call.  Nowadays, we avoid calls
to get_cse_reg_info_1 95% of the time.

Of course, it's tough to beat the performance of your explicit
initialization approach, but here are a couple of things that I have
thought about while keeping some abstraction layer.

The first thought is to expose the timestamp update to the users of
those macros that you mentioned.

/* Find a cse_reg_info entry for REGNO.  */
static inline struct cse_reg_info *
get_cse_reg_info (unsigned int regno)
{
  struct cse_reg_info *p = &cse_reg_info_table[regno];

  /* If this entry has not been initialized, go ahead and
     initialize it.  */
  if (p->timestamp != cse_reg_info_timestamp)
    {
      get_cse_reg_info_1 (regno);
      p->timestamp = cse_reg_info_timestamp; /* <- Look! */
    }

  return p;
}

This way, DOM may be able to do jump threading to some extent and
remove a lot of the timestamp checks.  Of course, jump threading
opportunities are blocked when we have a non-pure/const function call
like so:

  for (i = regno; i < endregno; i++)
    {
      if (REG_IN_TABLE (i) >= 0
          && REG_IN_TABLE (i) != REG_TICK (i))
        remove_invalid_refs (i); /* <- Look! */
      REG_IN_TABLE (i) = REG_TICK (i);
      SUBREG_TICKED (i) = -1;
    }

The second thought is to initialize all of the cse_reg_info entries at
the beginning of cse_main.  Set aside a bitmap with as many bits as
max_regs.  Whenever we use one of these accessor macros for register
k, set bit k, saying "cse_reg_info_table[k] is in use."  This way,
when we are done with a basic block, we can walk the bitmap and
reinitialize those entries that were used.  Again, a good optimizer
should be able to eliminate most of these bit sets, but a
non-pure/const function call will block the cleanup opportunities.  Of
course, this bitmap walk is far more expensive than
cse_reg_info_timestamp++.

Kazu Hirata
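[Editorial sketch.] The timestamp trick discussed in this exchange can be shown in a self-contained form: each entry carries the timestamp of its last initialization, and bumping one global counter invalidates every entry at once, so "resetting" the whole table costs O(1) instead of a walk. The names here (struct entry, get_entry, reset_all_entries) are illustrative, not cse.c's.

```c
#include <assert.h>

#define N_ENTRIES 16

/* Each entry remembers the timestamp at which it was last
   (re)initialized; an entry is valid only if that matches the
   current global timestamp.  */
struct entry {
  unsigned timestamp;
  int value;
};

static struct entry table[N_ENTRIES];
static unsigned current_timestamp = 1;  /* 0 marks never-initialized */

/* Validate-on-access: lazily (re)initialize a stale entry.  */
static struct entry *get_entry (unsigned i)
{
  struct entry *p = &table[i];
  if (p->timestamp != current_timestamp)
    {
      p->value = -1;                   /* the "fresh" state */
      p->timestamp = current_timestamp;
    }
  return p;
}

/* The analogue of starting a new basic block in the CSE code:
   one increment stales every entry at once.  */
static void reset_all_entries (void)
{
  current_timestamp++;
}
```

This is why the bitmap-walk alternative measured slightly slower: reset_all_entries does constant work no matter how many entries were touched, while a bitmap walk pays per touched entry.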
Re: RFC -- CSE compile-time stupidity
Hi Jeff,

> The second thought is to initialize all of cse_reg_info entries at the
> beginning of cse_main.  Set aside a bitmap with as many bits as
> max_regs.  Whenever we use one of these accessor macros for register
> k, set a bit k saying "cse_reg_info_table[k] is in use."  This way,
> when we are done with a basic block, we can walk the bitmap and
> reinitialize those that are used.  Again, a good optimizer should be
> able to eliminate most of these bit sets, but a non-pure/const
> function call will block the cleanup opportunities.  Of course, this
> bitmap walk is far more expensive than cse_reg_info_timestamp++.

I just tried this.  It comes very close to the current timestamp
approach but loses by 0.3% or so in compile time.  I'm attaching a
patch for completeness.

Kazu Hirata

Index: cse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.343
diff -c -d -p -r1.343 cse.c
*** cse.c	21 Feb 2005 02:02:19 -0000	1.343
--- cse.c	21 Feb 2005 19:41:04 -0000
*************** struct cse_reg_info
*** 328,346 ****
  /* A table of cse_reg_info indexed by register numbers.  */
  struct cse_reg_info *cse_reg_info_table;

! /* The size of the above table.  */
! static unsigned int cse_reg_info_table_size;
!
! /* The index of the first entry that has not been initialized.  */
! static unsigned int cse_reg_info_table_first_uninitialized;
!
! /* The timestamp at the beginning of the current run of
!    cse_basic_block.  We increment this variable at the beginning of
!    the current run of cse_basic_block.  The timestamp field of a
!    cse_reg_info entry matches the value of this variable if and only
!    if the entry has been initialized during the current run of
!    cse_basic_block.  */
! static unsigned int cse_reg_info_timestamp;

  /* A HARD_REG_SET containing all the hard registers for which there is
     currently a REG expression in the hash table.  Note the difference
--- 328,334 ----
  /* A table of cse_reg_info indexed by register numbers.  */
  struct cse_reg_info *cse_reg_info_table;

! static sbitmap cse_reg_info_init_bitmap;

  /* A HARD_REG_SET containing all the hard registers for which there is
     currently a REG expression in the hash table.  Note the difference
*************** notreg_cost (rtx x, enum rtx_code outer)
*** 841,908 ****
  static void
  init_cse_reg_info (unsigned int nregs)
  {
!   /* Do we need to grow the table?  */
!   if (nregs > cse_reg_info_table_size)
!     {
!       unsigned int new_size;
!
!       if (cse_reg_info_table_size < 2048)
!	{
!	  /* Compute a new size that is a power of 2 and no smaller
!	     than the larger of NREGS and 64.  */
!	  new_size = (cse_reg_info_table_size
!		      ? cse_reg_info_table_size : 64);
!
!	  while (new_size < nregs)
!	    new_size *= 2;
!	}
!       else
!	{
!	  /* If we need a big table, allocate just enough to hold
!	     NREGS registers.  */
!	  new_size = nregs;
!	}
!       /* Reallocate the table with NEW_SIZE entries.  */
!       if (cse_reg_info_table)
!	free (cse_reg_info_table);
!       cse_reg_info_table = xmalloc (sizeof (struct cse_reg_info)
!				    * new_size);
!       cse_reg_info_table_size = new_size;
!       cse_reg_info_table_first_uninitialized = 0;
!     }
!   /* Do we have all of the first NREGS entries initialized?  */
!   if (cse_reg_info_table_first_uninitialized < nregs)
      {
!       unsigned int old_timestamp = cse_reg_info_timestamp - 1;
!       unsigned int i;
!
!       /* Put the old timestamp on newly allocated entries so that they
!	  will all be considered out of date.  We do not touch those
!	  entries beyond the first NREGS entries to be nice to the
!	  virtual memory.  */
!       for (i = cse_reg_info_table_first_uninitialized; i < nregs; i++)
!	 cse_reg_info_table[i].timestamp = old_timestamp;
!
!       cse_reg_info_table_first_uninitialized = nregs;
      }
  }

! /* Given REGNO, initialize the cse_reg_info entry for REGNO.  */
  static void
! get_cse_reg_info_1 (unsigned int regno)
  {
!   /* Set TIMESTAMP field to CSE_REG_INFO_TIMESTAMP so that this
!      entry will be considered to have been initialized.  */
!   cse_reg_info_table[regno].timestamp = cse_reg_info_timestamp;
!
!   /* Initialize the rest of the entry.  */
!   cse_reg_info_table[regno].reg_tick = 1;
!   cse_reg_info_table[regno].reg_in_table = -1;
!   cse_reg_info_table[regno].subreg_ticked = -1;
!   cse_reg_info_table[regno].reg_qty = -regno - 1;
  }

  /* Find a cse_reg_info entry for REGNO.  */
--- 829,857 ----
  static void
  init_cse_reg_info (unsigned int nregs)
  {
!   unsigned int i;
!
!   cse_reg_info_table = xmalloc (sizeof (struct cse_reg_info) * nregs);
!   for (i = 0; i < nr