LOOP_HEADER tree code?
Hello,

for project http://gcc.gnu.org/wiki/PreservingLoops, I am considering introducing a tree code LOOP_HEADER with a single argument N (the number of iterations of the loop) that would be present in the IL at the beginning of the header of each loop. I have the following motivations:

1) One of the goals of the project is to avoid recomputing the number of
   iterations of loops, and to keep this information (analysing the number of
   iterations is fairly expensive, and we recompute this information quite a
   lot at the moment). To keep the information valid, we need to prevent
   optimizations from destroying it (e.g., if the number is n_1 = n_2 - 1,
   and this is the last use of n_1, we do not want DCE to remove it); this is
   easy to achieve if n_1 is an argument of LOOP_HEADER. Without this tree
   node in the IL, we would need to handle this specially in DCE, and
   possibly in other optimizers as well.

2) This statement might make it simpler to ensure that various optimizers
   update the loops correctly.

I am not quite sure whether this is a good idea, though. Are there reasons not to do this, or any other problems with the idea?

Zdenek
Re: LOOP_HEADER tree code?
Hello,

> On 10/23/06, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > for project http://gcc.gnu.org/wiki/PreservingLoops, I am considering
> > introducing a tree LOOP_HEADER with single argument N (number of
> > iterations of the loop), that would be present in IL at the beginning of
> > header of each loop. I have following motivations:
> >
> > 1) One of the goals of the project is to avoid recomputing number of
> >    iterations of loops, and to keep this information (analysing # of
> >    iterations is fairly expensive, and we recompute this information
> >    quite a lot at the moment). To keep the information valid, we need
> >    to prevent optimizations from destroying it (e.g., if the number
> >    is n_1 = n_2 - 1, and this is the last use of n_1, we do not want
> >    DCE to remove it); this is easy to achieve if n_1 would be the
> >    argument of LOOP_HEADER. Without this tree node in IL, we would need
> >    to handle this specially in DCE, and possibly also other optimizers.
> >
> > 2) This statement might make it simpler to ensure that various
> >    optimizers update the loops correctly.
> >
> > I am not quite sure whether this is a good idea, though. Are there some
> > reasons why not to do this, or any other problems with the idea?
>
> Why not keeping this information outside the IL

I am not sure what you mean by that? IMHO this is what I propose the LOOP_HEADER tree node for.

> and making sure that code transformers are updating that information if
> they modify the symbols involved in that representation?
>
> The same problems will also occur when transmitting to the backend
> other informations attached to loop structures. I'm not sure that
> we want to create a new node for data dependences as well...

For data dependences, you need to transfer just dependence vectors/distances, i.e., you do not need to worry about symbols. So I do not see a reason for using some node in the IL for data dependences.

Zdenek
Re: LOOP_HEADER tree code?
Hello,

> > > > quite a lot at the moment). To keep the information valid, we need
> > > > to prevent optimizations from destroying it (e.g., if the number
> > > > is n_1 = n_2 - 1, and this is the last use of n_1, we do not want
> > > > DCE to remove it); this is easy to achieve if n_1 would be the
> > > > argument of LOOP_HEADER.
>
> You are proposing to complete the ssa representation such that
> foreach_ssa_uses also iterates over the niter information (a bit like vrp
> modifies the ssa chains with its extra assert information). Wouldn't it
> be possible to not insert this niter information in the representation of
> the program (ie. not inserting a new tree node) but just modify the ssa
> iterators to also return the expressions that we store on the side?

Yes, it would be possible; however, it would likely be quite error-prone.

> > > > Without this tree node in IL, we would need
> > > > to handle this specially in DCE, and possibly also other optimizers.
>
> What are the other transformations that remove definitions that are not
> used?

ivopts removes bivs that it replaces, and I suspect that the vectorizer does something similar as well. I do not know about other optimizers.

Zdenek
Re: LOOP_HEADER tree code?
Hello,

> > > You are proposing to complete the ssa representation such that
> > > foreach_ssa_uses also iterates over the niter information (a bit like vrp
> > > modifies the ssa chains with its extra assert information). Wouldn't it
> > > be possible to not insert this niter information in the representation of
> > > the program (ie. not inserting a new tree node) but just modify the ssa
> > > iterators to also return the expressions that we store on the side?
> >
> > yes, it would be possible, however, possibly quite error-prone.
> >
> > > What are the other transformations that remove definitions that are not
> > > used?
> >
> > ivopts remove bivs that they replace, and I suspect that vectorizer
> > does something similar as well. I do not know about other optimizers.
>
> The scev info has the same problem as the niter info, as we store it on
> the side, making it vulnerable to any code transform. Now if we make
> the var renaming, removal, ... aware of the information that we store
> on the side, we could avoid all these rather costly re-computations.

I did that once. Updating the scev info turned out to be more expensive than recomputing it.

Zdenek
Maintainer(s) for loop optimizer(s)
Hello,

at the moment, the RTL-level loop optimizers and most of the tree-level loop optimizers do not have specific maintainers assigned. I think this is clearly starting to become a significant problem -- many of the loop-optimizer related patches in the 4.2 timeframe remained unreviewed (the vectorizer improvement changes, some of my patches for ivopts cleanup, etc.). In the 4.3 and 4.4 timeframe, there are several projects related to loop optimizations (http://gcc.gnu.org/wiki/AutovectBranchOptimizations, http://gcc.gnu.org/wiki/PredictiveCommoning, http://gcc.gnu.org/wiki/AutomaticParallelization, http://gcc.gnu.org/wiki/PrefetchingImprovements, http://gcc.gnu.org/wiki/PreservingLoops, Graphite) that I fear might face delays because of the same problem (see e.g. the (lack of) results of my attempt to find reviewers for some of them, http://gcc.gnu.org/ml/gcc/2006-09/msg00477.html). And, of course, there are other situations where the lack of an official maintainer is a problem (when looking for someone able to fix a bug in one of the optimizers, or when someone has questions regarding them).

At the GCC summit, I discussed this with several active loop optimization developers (Daniel Berlin, Sebastian Pop), and we agreed that one possible solution would be for myself and Daniel Berlin to co-maintain the loop optimizers. This was proposed to the steering committee several months ago; however, since there has been no progress in the meantime, I assume this proposal was rejected. Therefore, I would like to ask the steering committee to propose an alternative solution.

There does not seem to be any easy way to address mail just to the steering committee, so I am sending this to the gcc mailing list in the hope that it will reach the right recipients.

Zdenek
Re: LOOP_HEADER tree code?
Hello,

> > for project http://gcc.gnu.org/wiki/PreservingLoops, I am considering
> > introducing a tree LOOP_HEADER with single argument N (number of
> > iterations of the loop), that would be present in IL at the beginning of
> > header of each loop. I have following motivations:
> >
> > 1) One of the goals of the project is to avoid recomputing number of
> >    iterations of loops, and to keep this information (analysing # of
> >    iterations is fairly expensive, and we recompute this information
> >    quite a lot at the moment). To keep the information valid, we need
> >    to prevent optimizations from destroying it (e.g., if the number
> >    is n_1 = n_2 - 1, and this is the last use of n_1, we do not want
> >    DCE to remove it); this is easy to achieve if n_1 would be the
> >    argument of LOOP_HEADER. Without this tree node in IL, we would need
> >    to handle this specially in DCE, and possibly also other optimizers.
> >
> > 2) This statement might make it simpler to ensure that various
> >    optimizers update the loops correctly.
>
> However, various optimizers need to know about this special tree node.

Not really (not any more than they know about other tree codes that are not interesting for them).

Zdenek

> Or am I missing something?
>
> > I am not quite sure whether this is a good idea, though. Are there some
> > reasons why not to do this, or any other problems with the idea?
> >
> > Zdenek
>
> -
> Devang
Re: LOOP_HEADER tree code?
Hello,

> > > for project http://gcc.gnu.org/wiki/PreservingLoops, I am considering
> > > introducing a tree LOOP_HEADER with single argument N (number of
> > > iterations of the loop), that would be present in IL at the beginning of
> > > header of each loop. I have following motivations:
> > >
> > > 1) One of the goals of the project is to avoid recomputing number of
> > >    iterations of loops, and to keep this information (analysing # of
> > >    iterations is fairly expensive, and we recompute this information
> > >    quite a lot at the moment). To keep the information valid, we need
> > >    to prevent optimizations from destroying it (e.g., if the number
> > >    is n_1 = n_2 - 1, and this is the last use of n_1, we do not want
> > >    DCE to remove it); this is easy to achieve if n_1 would be the
> > >    argument of LOOP_HEADER. Without this tree node in IL, we would need
> > >    to handle this specially in DCE, and possibly also other optimizers.
> > >
> > > 2) This statement might make it simpler to ensure that various
> > >    optimizers update the loops correctly.
> >
> > However, various optimizers need to know about this special tree node.
> > Or am I missing something?
>
> Correct. Take for example jump threading -- it can peel off a loop
> iteration which would invalidate the LOOP_HEADER node.

Of course, if it does that, it must update the information at the loop (regardless of whether we use the LOOP_HEADER node for it or not).

> That seems like a recipe for disaster.

No, that seems like a good way to detect that jump threading is doing something it should not, or forgetting to update something in the process.

> Code motion passes would have to be aware of this special node as well.
> And we've seen how well that works in the RTL backend with its loop
> notes -- it was a major PITA.

The problem was that loop notes were something special, different from other RTL statements. Of course, we can live without the LOOP_HEADER tree node; however, that means that more optimizers will have to be aware of information stored in loop structures, which looks more like the case of the loop notes to me.

Zdenek

> I'd rather not repeat the mistakes we've already made.
>
> Jeff
Re: LOOP_HEADER tree code?
Hello,

> > So, the passes that manipulate loop structure need to know about
> > LOOP_HEADER and others do not need to worry about LOOP_HEADER.
>
> More accurately, the passes that manipulate the cfg. Right now most of
> these passes don't even know they modify the loop structure.
>
> > Now, focusing on the passes that manipulate loop structure. Are these
> > passes responsible for fixing loop info, or is it the responsibility of
> > a cleanup pass?
>
> It seems to me that a cleanup pass would defeat the purpose of keeping
> loop info up to date. Your cleanup pass would probably end up just
> recomputing everything.
>
> That said, I don't really see what a LOOP_HEADER node would give you
> that you can't get by making the cfg-modifying passes actually
> loop-aware, or perhaps by using cfghooks to update the loop
> information on the fly when a pass changes the CFG. It would be
> helpful if Zdenek could give an example where a LOOP_HEADER node is
> really the only way to help keep loop info accurate.

It definitely is not the only way, and seeing people's reactions, I probably won't use it. The main reason I considered the tree node was the possibility of making the number of iterations of the loop one of its operands, so that I would not need to worry about keeping it alive through dce, copy/constant propagation, etc. (without a statement carrying it in the IL, I do not see a solution that would not just be asking for introducing bugs and getting broken accidentally).

Zdenek
Re: LOOP_HEADER tree code?
Hello,

> On 10/25/06, Steven Bosscher <[EMAIL PROTECTED]> wrote:
> > So you would mark n_1 with TREE_USED, and never let it be removed?
> > What would happen if e.g. the entire loop turns out to be dead code?
> > Or if the loop is rewritten (e.g. vectorized) in a way that changes
> > the number of iterations of the loop? Then the assignment to n_1 would
> > be _really_ dead, but there wouldn't be any way to tell.
>
> When it is really dead, you'll have to remove the LOOP_HEADER node
> anyway.

If the loop becomes unreachable, the LOOP_HEADER statement gets removed like any other statement. If there are changes to the structure of the loop (the loop is cancelled by some optimizer, or the number of iterations changes), the optimizer is required to update the loop structures, and the LOOP_HEADER node is removed/updated in the process. These updates need to be done in very few places (basically only in the routines provided for manipulating the loop structures). What is quite important, it is possible to verify the correctness of the code (from e.g. verify_flow_info), and thus quickly catch any problems.

> Right? Why not instead reset the TREE_USED bit and let DCE do
> its job?

Many optimizers would need to be taught to know about TREE_USED (or any other bit you would use for that). We do not have this type of restriction for any other ssa names, so this would be brand new functionality (on the other hand, most optimizers would not need to know anything special about LOOP_HEADER statements). This definitely is not a way I would take. If I decide not to use LOOP_HEADER, I would probably add some sort of "implicit" uses placed on some basic blocks (or in loops). It would be a bit more difficult to teach the relevant optimizers about these uses, but at least I would not be introducing a new brand of ssa names that behave differently from all the other ssa names.

Zdenek
Re: LOOP_HEADER tree code?
Hello,

> On Wed, 2006-10-25 at 13:01 -0700, Devang Patel wrote:
> > > > However, various optimizers need to know about this special tree node.
> > >
> > > not really (not any more than they know about other tree codes that are
> > > not interesting for them).
> >
> > If we take the example of the jump threading pass, then it needs to know
> > about this tree node and update it properly.
>
> That could get awful ugly.

Actually, that will be trivial once jump threading updates loops properly (I have a patch for that).

> > So, the passes that manipulate loop structure need to know about
> > LOOP_HEADER and others do not need to worry about LOOP_HEADER.
>
> Passes which do code motion may need to know about it as well -- they
> don't need to update its contents, but they may need to be careful about
> how statements are moved around in the presence of a LOOP_HEADER note.

Which is IMHO good, as it will at least ensure that they preserve loops, in case that is really relevant for them -- just now, I do not see how code motion could affect loop structures (unless it changes the CFG). Do you have a concrete example of a transformation that might be made more problematic by the existence of the LOOP_HEADER node?

Zdenek
Re: LOOP_HEADER tree code?
Hello,

> On Thu, 2006-10-26 at 00:45 +0200, Zdenek Dvorak wrote:
> > actually, that will be trivial once jump threading updates loop properly
> > (I have a patch for that).
>
> I'd like to see that.
>
> I recall poking at having threading update things like loop exit
> points and gave up. The problem is you could thread to a loop
> exit, which in turn might move the loop exit edge to a new
> point in the CFG. Furthermore, all the statements dominated
> by the new loop exit edge are no longer in the loop (when they
> previously were part of the loop).
>
> It's possible these problems could be solved by querying the
> dominator structures, but IIRC they aren't accurate at this
> point.

What my patch does is keep the loops up to date. To do that, I had to disable some of the threadings (mostly those that made us create new subloops of the existing loops); on the other hand, some other threadings that are now forbidden turned out to be valid, and overall I think none of the important cases is affected. I will update and submit the patch sometime later this week.

> Your updates may be simpler/easier since you're updating
> something different.
>
> [ This was a couple years ago, so if I've got details wrong,
>   please forgive me. ]
>
> > Which is IMHO good, as it will at least ensure that they preserve loops,
> > in case it is really relevant for them -- just now, I do not see how
> > code motion could affect loop structures (unless it changes CFG) -- do
> > you have some concrete example of transformation that might be made
> > more problematic by the existence of LOOP_HEADER node?
>
> It wasn't a matter of changing the loop structure -- the RTL loop
> optimizer had 3 or 4 notes IIRC and the location of each note
> in relation to code labels and other insns was important. Code
> motion would sometimes place an insn between the note and a
> label. That in turn caused the loop optimizer to barf. There
> were also problems with edge splitting doing similar kinds of
> things.

The main source of these problems with the loop notes was that their validity was never verified (only in the loop optimizer itself), so you did not know you had done something wrong with them until some problem in the loop optimizer surfaced. This should not be the case with LOOP_HEADER statements, as their validity would be verified (in verify_flow_info or verify_loop_structures).

What made the problems with the loop notes worse was that loop discovery itself was based on them. This meant that it was quite easy to get the loop optimizer confused if you changed the positions of the notes. Since nowadays loop discovery is based purely on the CFG, this would not be a problem.

For these reasons, I do not think that making the optimizers keep LOOP_HEADER statements and loops in sync would be too difficult (at least not more difficult than making the optimizers keep the CFG and loop structures in sync).

Zdenek
Re: Induction variable optimization
Hello,

> I am an M.E. Computer Science student doing a project on induction
> variable optimization.
>
> Therefore I am reading the file tree-ssa-loop-ivopts.c of gcc-4.0.2 to
> learn what has been implemented there.
>
> Is there any other way to find out what has been implemented in
> gcc-4.0.2? Can I learn the algorithm?

An overview of the algorithm is in the comment at the start of tree-ssa-loop-ivopts.c.

Zdenek
Re: Zdenek Dvorak and Daniel Berlin appointed loop optimizer maintainers
Hello,

> I am pleased to announce that the GCC Steering Committee has
> appointed Zdenek Dvorak and Daniel Berlin as non-algorithmic maintainers
> of the RTL and Tree loop optimizer infrastructure in GCC.

Thank you. What exactly does "non-algorithmic" mean in this context?

Zdenek

> Please join me in congratulating Zdenek and Daniel on their new
> role. Zdenek and Daniel, please update your listings in the MAINTAINERS
> file.
>
> Happy hacking!
>
> David
Re: Zdenek Dvorak and Daniel Berlin appointed loop optimizer maintainers
Hello,

> I am pleased to announce that the GCC Steering Committee has
> appointed Zdenek Dvorak and Daniel Berlin as non-algorithmic maintainers
> of the RTL and Tree loop optimizer infrastructure in GCC.
>
> Please join me in congratulating Zdenek and Daniel on their new
> role. Zdenek and Daniel, please update your listings in the MAINTAINERS
> file.

Done. I am not sure whether it would be useful to indicate non-algorithmicness somehow in the MAINTAINERS file?

Zdenek
Re: Why does flow_loops_find modify the CFG, again?
Hello,

> I'm running into some trouble with an if-conversion pass that runs
> after reload, where we have to avoid lifting insns across a loop
> exit edge into a loop. ifcvt.c uses flow_loops_find to find loops
> and mark all loop exit edges:
>
>   if ((! targetm.cannot_modify_jumps_p ())
>       && (!flag_reorder_blocks_and_partition || !no_new_pseudos
>	    || !targetm.have_named_sections))
>     {
>       struct loops loops;
>
>       flow_loops_find (&loops);
>       mark_loop_exit_edges (&loops);
>       flow_loops_free (&loops);
>       free_dominance_info (CDI_DOMINATORS);
>     }
>
> I was wondering why we would sometimes *not* mark exit edges, but then
> I remembered that for some reason flow_loops_find modifies the CFG,
> which may lead to problems that we have to work around here.
>
> But if we do not mark loop exit edges, we can sometimes end up doing
> unprofitable if-conversions!
>
> It seems to me that a function called "flow_loops_find" is supposed to
> do *just* analysis, and not transformations. Apparently it now first
> transforms all loops into some canonical form, but that is completely
> inappropriate and unnecessary for some users of this loop analysis.
>
> Is this something that could be easily fixed? E.g. can we make it so
> that flow_loops_find only performs transformations if asked to (by
> adding a function argument for that)?

Yes. However, you won't be able to use most of the functions for working with loops then. mark_loop_exit_edges should work, though.

Zdenek
Re: GCC optimizes integer overflow: bug or feature?
Hello,

> > Now, if your argument is that following the LIA-1 standard will prevent
> > optimizations that could otherwise be made if one followed only the C
> > standard, that's a reasonable argument, but it should not be couched as
> > if it implies that preventing the optimizations would not be following
> > the C standard.
>
> I continue to suspect that the gains from this optimization are minimal
> to non-existent in practice.

The majority of loop optimizations work only under the assumption that induction variables do not overflow. This assumption can sometimes be verified using quite expensive analyses of the loop code, or in the majority of cases derived easily if the induction variables are either signed or pointers (where the standard allows us to assume that the arithmetic does not overflow).

Note that there was even a serious discussion about enabling -funsafe-loop-optimizations by default at some optimization level, thus making us assert that *no* induction variables (regardless of signedness) overflow; which, IIRC, is what some other compilers do.

IMHO, writing loops that rely on the overflow behavior of an induction variable (*) is an ugly hack, and whoever writes such code does not deserve to have his program work.

Zdenek

(*) especially without bothering to verify what the language standard says about it
Re: GCC optimizes integer overflow: bug or feature?
Hello,

> > Paul Brook wrote:
> > > > Compiler can optimize it any way it wants,
> > > > as long as result is the same as unoptimized one.
> > >
> > > We have an option for that. It's called -O0.
> > >
> > > Pretty much all optimization will change the behavior of your program.
> >
> > Now that's a bit TOO strong a statement; critical optimizations like
> > register allocation and instruction scheduling will generally not change
> > the behavior of the program (though the basic decision to put something
> > in a register will, and *surely* no one suggests avoiding this critical
> > optimization).
>
> Actually they will with a multi-threaded program, since you can have a
> case where it works and now it is broken because one thread has sped up
> so much that it writes to a variable which had a copy on another
> thread's stack. So the argument about it being too strong is wrong,
> because timing matters nowadays. Instruction scheduling can cause the
> same issue, as it forces a write too early for another thread to act on.

Actually, you do not even need (invalid) multithreaded programs to see that register allocation may change the behavior of a program. If the size of the stack is bounded, register allocation may cause or prevent the program from running out of stack, thus turning a crashing program into a non-crashing one or vice versa.

Zdenek
Re: SSA_NAMES: should there be an unused, un-free limbo?
Hello,

> > > > > The situation is that some SSA_NAMEs are disused (removed from the
> > > > > code) without being released onto the free list by
> > > > > release_ssa_name().
> > > >
> > > > Yes, it happens if a name is put into the set of names to be updated
> > > > by update_ssa.
> > >
> > > After update_ssa, it should be true that every SSA name with no
> > > SSA_NAME_DEF_STMT is in the free list.
> >
> > In this case this isn't true, because we have code that orphans ssa
> > names without releasing them.
> > I'm sure Robert will explain further details in a few moments :)
>
> True. But remember, the stated purpose of the SSA_NAME recycling
> code was _not_ to track every SSA_NAME that went "dead" and recycle
> it, but instead to get the majority of them (and to ultimately save
> memory by recycling them). Orphan SSA_NAMEs were always expected.
>
> If we wanted to add a new policy that all dead SSA_NAMEs must be
> released to the recycling code, then that's fine; I'm not going
> to object.

I think this might be a good idea. I think that it requires a lot of changes (basically going through all uses of bsi_remove and remove_phi_node and checking them), but it would be cleaner than the current situation. It should however be easy to add verification to verify_ssa, so this project should be fairly straightforward (I hope).

Zdenek
Re: SSA_NAMES: should there be an unused, un-free limbo?
Hello,

> On Thu, 2006-12-21 at 20:18 +0100, Zdenek Dvorak wrote:
> > I think this might be a good idea. I think that this requires
> > a lot of changes (basically going through all uses of bsi_remove
> > and remove_phi_node and checking them), but it would be cleaner
> > than the current situation.
>
> Agreed. Tedious work, but it shouldn't be terribly difficult (famous
> last words).

I will give it a try (i.e., if it can be done in one afternoon, I will send a patch tomorrow :-).

Zdenek
Re: SSA_NAMES: should there be an unused, un-free limbo?
Hello,

> > > I think this might be a good idea. I think that this requires
> > > a lot of changes (basically going through all uses of bsi_remove
> > > and remove_phi_node and checking them), but it would be cleaner
> > > than the current situation.
> >
> > Agreed. Tedious work, but it shouldn't be terribly difficult (famous
> > last words).
>
> I will give it a try (i.e., if it can be done in one afternoon, I will
> send a patch tomorrow :-).

Here is a patch. It passes bootstrap & regtesting except for one testcase (g++.dg/opt/reg-stack2.C); I do not have time to work further on it now. As expected, more complications appeared than I believed there would be. The changes to bsi_remove and release_defs would basically be sufficient for ssa names for real operands; however, we are losing ssa names for virtual operands everywhere, and in several places we rely on the fact that they are not released.

One problem with the patch seems to be that it appears to increase compile times significantly (by some 4% on compiling preprocessed gcc sources); I did not investigate where this slowdown comes from.

Zdenek

Index: tree-ssa-loop-im.c
===================================================================
*** tree-ssa-loop-im.c	(revision 120177)
--- tree-ssa-loop-im.c	(working copy)
*************** free_mem_ref_locs (struct mem_ref_loc *m
*** 897,912 ****
  static void
  rewrite_mem_refs (tree tmp_var, struct mem_ref_loc *mem_refs)
  {
-   tree var;
-   ssa_op_iter iter;
-
    for (; mem_refs; mem_refs = mem_refs->next)
      {
!       FOR_EACH_SSA_TREE_OPERAND (var, mem_refs->stmt, iter,
!				  SSA_OP_ALL_VIRTUALS)
!	 mark_sym_for_renaming (SSA_NAME_VAR (var));
!
        *mem_refs->ref = tmp_var;
!       update_stmt (mem_refs->stmt);
      }
  }
--- 897,907 ----
  static void
  rewrite_mem_refs (tree tmp_var, struct mem_ref_loc *mem_refs)
  {
    for (; mem_refs; mem_refs = mem_refs->next)
      {
!       push_stmt_changes (&mem_refs->stmt);
        *mem_refs->ref = tmp_var;
!       pop_stmt_changes (&mem_refs->stmt);
      }
  }
Index: tree-complex.c
===================================================================
*** tree-complex.c	(revision 120177)
--- tree-complex.c	(working copy)
*************** expand_complex_move (block_stmt_iterator
*** 776,781 ****
--- 776,783 ----
        x = build2_gimple (GIMPLE_MODIFY_STMT, x, r);
        bsi_insert_before (bsi, x, BSI_SAME_STMT);

+       push_stmt_changes (&stmt);
+
        if (stmt == bsi_stmt (*bsi))
	 {
	   x = build1 (IMAGPART_EXPR, inner_type, unshare_expr (lhs));
*************** expand_complex_move (block_stmt_iterator
*** 794,800 ****
	 }

        update_all_vops (stmt);
!       update_stmt (stmt);
      }
  }
--- 796,802 ----
	 }

        update_all_vops (stmt);
!       pop_stmt_changes (&stmt);
      }
  }
Index: tree-ssa-loop-manip.c
===================================================================
*** tree-ssa-loop-manip.c	(revision 120177)
--- tree-ssa-loop-manip.c	(working copy)
*************** verify_loop_closed_ssa (void)
*** 423,429 ****
    if (current_loops == NULL)
      return;

!   verify_ssa (false);

    FOR_EACH_BB (bb)
      {
--- 423,429 ----
    if (current_loops == NULL)
      return;

!   verify_ssa (false, false);

    FOR_EACH_BB (bb)
      {
Index: tree-tailcall.c
===================================================================
*** tree-tailcall.c	(revision 120177)
--- tree-tailcall.c	(working copy)
*************** eliminate_tail_call (struct tailcall *t)
*** 752,758 ****
	 break;

        bsi_remove (&bsi, true);
-       release_defs (t);
      }

    /* Number of executions of function has reduced by the tailcall.  */
--- 752,757 ----
*************** eliminate_tail_call (struct tailcall *t)
*** 798,804 ****
      }

    bsi_remove (&t->call_bsi, true);
-   release_defs (call);
  }

  /* Add phi nodes for the virtual operands defined in the function to the
--- 797,802 ----
Index: tree-ssa-dse.c
===================================================================
*** tree-ssa-dse.c	(revision 120177)
--- tree-ssa-dse.c	(working copy)
*************** dse_optimize_stmt (struct dom_walk_data
*** 608,617 ****
        /* Remove the dead store.  */
        bsi_remove (&bsi, true);
-
-       /* And release any SSA_NAMEs set in this statement back to the
-	  SSA_NAME manager.  */
-       release_defs (stmt);
      }

    record_voperand_set (dse_gd->stores, &bd->stores, ann->uid);
--- 608,613 ----
Index: tree-ssa-alias.c
===================================================================
*** tree-ssa-alias.c	(revision 120177)
--- tree-ssa-alias.c	(working copy)
*************** compute_may_aliases (void)
*** 943,953 ****
  {
    block_stmt_iterator bsi;
    basic_block bb;

    FOR_EACH_BB (bb)
      {
        for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
	 {
! upda
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
Hello,

> > I have been looking into infer_loop_bounds_from_signedness() called
> > from infer_loop_bounds_from_undefined().
> > At some places, nowrap_type_p() is used, but this function operates
> > only on types, so there will be too many false positives there; yet we
> > will miss warnings through that case.
>
> I don't know that area too well, but I think we are already issuing a
> warning if we use -funsafe-loop-optimizations, so it might be possible
> to do the same if we use signed wrapping undefinedness.

Not quite; -funsafe-loop-optimizations is not used in the scev analysis itself, only in the analysis of the number of iterations. At some extra compile-time cost, you can check whether we use the undefinedness of signed overflow throughout the scev analysis, but there is no way to avoid duplicate warnings, and even making the positioning and contents of the warnings sensible would be nontrivial.

Zdenek
Re: GCC trunk revision 120704 failed to build spec cpu2k/gcc
Hello,

> > GCC trunk revision 120704 failed to build SPEC cpu2000/gcc at -O1 and
> > higher optimization levels on x86_64-redhat-linux:
> >
> >   reload1.c: In function 'reload':
> >   reload1.c:449: error: address taken, but ADDRESSABLE bit not set
> >   bad_spill_regs
> >   reload1.c:449: error: address taken, but ADDRESSABLE bit not set
> >   bad_spill_regs
> >   reload1.c:449: internal compiler error: verify_stmts failed
> >   Please submit a full bug report,
> >   with preprocessed source if appropriate.
> >   See <http://gcc.gnu.org/bugs.html> for instructions.
> >
> > Does anybody see the same?

I can reproduce this on i686. I will check what the problem is.

Zdenek
Re: GCC trunk revision 120704 failed to build spec cpu2k/gcc
Hello, > > > GCC trunk revision 120704 failed to build SPEC cpu2000/gcc on -O1 and > > > higher optimization level at x86_64-redhat-linux. > > > > > > reload1.c: In function 'reload': > > > reload1.c:449: error: address taken, but ADDRESSABLE bit not set > > > bad_spill_regs > > > > > > reload1.c:449: error: address taken, but ADDRESSABLE bit not set > > > bad_spill_regs > > > > > > reload1.c:449: internal compiler error: verify_stmts failed > > > Please submit a full bug report, > > > with preprocessed source if appropriate. > > > See http://gcc.gnu.org/bugs.html> for instructions. > > > > > > Does anybody see the same? > > I can reproduce this on i686. I will check what the problem is, this is probably due to some of the recent IPA changes. ipa-reference.c:analyze_function does not traverse phi nodes, and thus misses that the address of bad_spill_regs is taken inside the phi node (and clears the addressable flag from it, causing an ICE later). I am working on the patch. Zdenek
Cheatsheet for building gcc
Hello, with the changes to the build system in the last few months, I am having problems doing some things I am used to doing during development. I found more or less satisfactory procedures for most of them; however, I would like to know the "official" way to achieve those effects (and to put the resulting "cheatsheet" on the wiki, so that people do not have to reinvent the wheel). All the solutions should be for gcc configured without any special options (except for --enable-languages and --target):

Example:
Q1) How to bootstrap the compiler, including all libraries?
A1) "make" from the top directory

Could some of the build system experts please provide the answers to the following questions?

Q2) How to bootstrap the compiler, but not build any time-consuming libraries (libjava, libstdc++)?
Q3) How to compile just the stage1 compiler + the basic runtime (libgcc, libmudflap, libgomp)?
Q4) How to compile all the libraries (libjava, ...) using the stage1 compiler?
Q5) How to clean up everything (including the stage1 compiler, but excluding makefiles, so that I can build everything from scratch, but configure does not have to be run)?
Q6) How to clean up everything except for the stage1 compiler and its runtime (libgcc, libmudflap, libgomp)?

And of course, how to do any other useful things that I have forgotten about...

Zdenek
Re: Cheatsheet for building gcc
Hello, thanks a lot for the answers! > >Q4) How to compile all the libraries (libjava, ...) using the stage1 > >compiler? > > A4) Configure with --enable-stage1-languages=all Is there a way to do it without reconfiguring gcc? This question aims at the following situation: I have already compiled the stage1 compiler with a patch, and I have learned from some external source that the compiler ICEs during the compilation of libjava. Since it is highly unlikely that bootstrapping the compiler would matter, I would like to be able to just start building libjava using the stage1 compiler. Reconfiguring gcc and building it from scratch takes some time, so I would like to avoid that. > >Q6) How to clean up everything except for the stage1 compiler and > >its runtime (libgcc, libmudflap, libgomp)? > > make restrap (which will also start a bootstrap) Would it be possible to add a way to just clean things up, without restarting the bootstrap (i.e., to move everything to the state I would have after "make stage1-bubble" if I started from scratch)? Zdenek > >And of course, how to do any other useful things that I have forgotten > >about... > > I would like to ask if, in your opinion, libssp, libmudflap and libgomp > should be bootstrapped like libgcc is. I would think that, for > automatic parallelization, you'd need at least libgomp. > > In this case, the answer to Q3 would be simply "make stage1-bubble". > > Paolo
Re: Cheatsheet for building gcc
Hello, > >Is there a way how to do it without reconfiguring gcc? > > No. Do you think it would be best to have --enable-stage1-languages=all > in non-release branches? The time taken to compile the stage1 > (nonoptimized) non-C front-ends is small, and the advantage seems > significant. if you configure with --enable-stage1-languages=all, 1) will all the libraries be built three times during bootstrap, or just once? 2) is it still possible to just build the stage1 compiler without the libraries? > >>>Q6) How to clean up everything except for the stage1 compiler and > >>> its runtime (libgcc, libmudflap, libgomp)? > >>make restrap (which will also start a bootstrap) > > > >Would it be possible to add a possibility to just clean up the things, > >without restarting the bootstrap (i.e., to move everything to the > >state I would have after "make stage1-bubble" if I started from the > >scratch)? > > Actually, I was wrong as "make restrap" will also delete the runtime. > The closest things to what you want are: > - "make distclean-stage2"; however it will leave there all the target > libraries if they have already been compiled, not only the runtime. > - "make distclean-stage2 clean-target"; which will remove also the runtime. thanks, the "make distclean-stage2 clean-target" actually is exactly what I need (although I specified something different :-) Zdenek
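Pulling the answers from this thread together, a first draft of the requested cheatsheet might read as follows (commands run from the top-level build directory of an already-configured tree; Q2 is still open above):

```
# Q1: bootstrap the compiler, including all libraries
make

# Q3: build just the stage1 compiler + the basic runtime
make stage1-bubble

# Q4: build all target libraries with the stage1 compiler
#     (requires configuring with --enable-stage1-languages=all first)

# Q6: remove everything except the stage1 compiler and its runtime
make distclean-stage2 clean-target
```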
Re: 2007 GCC Developers Summit
Hello, > >> One idea I've always pondered is to have brief (perhaps 1-2 hr) > >> tutorials, given by people in their area of expertise, as a means for > >> other developers to come up to speed on a topic that interests them. Is > >> this something that appeals to others? > >> > >Sounds good to me. For instance, the new java front end, a description > >of the new build system, etc. > > Although it's not something new, > I'd be interested in a tutorial on loop optimizations (IV opt and all > related). > Based on my understanding, the topic might be of interest > to some other people as well. if sufficiently many people are interested, I (hopefully with the help of other loop optimizer developers) can prepare something. Zdenek
Re: Disabling bootstrap
Hello, > I've changed gcc by adding a new pass, however, when I compile gcc, > during compilation it calls itself, so I disabled bootstrap but that > is still happening even during bootstrap. Is there any way to compile > gcc without the final gcc compiling something? make stage1-bubble. See also http://gcc.gnu.org/wiki/Top-Level_Bootstrap > Moreover, how can I add a flag to gcc so that it enables/disables a given > pass? Add the flag to common.opt, and test it in the gate function of your pass. Zdenek
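For concreteness, a sketch of the two pieces involved. The option, variable, and pass names here are hypothetical placeholders, not an existing GCC option; the record format follows the common.opt conventions:

```
; common.opt fragment: defines -ftree-my-pass and the variable flag_tree_my_pass
ftree-my-pass
Common Report Var(flag_tree_my_pass)
Enable the my-pass transformation

/* In the file implementing the pass, the gate simply tests the flag: */
static bool
gate_my_pass (void)
{
  return flag_tree_my_pass != 0;
}
```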
Re: Scheduling an early complete loop unrolling pass?
Hello, > >As we also only vectorize innermost loops I believe doing a > >complete unrolling pass early will help in general (I pushed > >for this some time ago). > > > >Thoughts? > > It might also hurt, though, since we don't have a basic block > vectorizer. IIUC the vectorizer is able to turn > > for (i = 0; i < 4; i++) > v[i] = 0.0; > > into > > *(vector double *)v = (vector double){0.0, 0.0, 0.0, 0.0}; I intentionally put the cunroll pass after the vectorizer to enable this. I guess we might choose to rather unroll the loop, and leave creation of the vector operations to some kind of later combine/straight-line code vectorization pass. Zdenek
Re: Scheduling an early complete loop unrolling pass?
Hello, > Ira Rosen/Haifa/IBM wrote on 06/02/2007 11:49:17: > > > Dorit Nuzman/Haifa/IBM wrote on 05/02/2007 21:13:40: > > > > > Richard Guenther <[EMAIL PROTECTED]> wrote on 05/02/2007 17:59:00: > > > > ... > > > > > > That's going to change once this project goes in: "(3.2) Straight- > > > line code vectorization" from http://gcc.gnu. > > > org/wiki/AutovectBranchOptimizations. In fact, I think in autovect- > > > branch, if you unroll the above loop it should get vectorized > > > already. Ira - is that really the case? > > > > The completely unrolled loop will not get vectorized because the > > code will not be inside any loop (and our SLP implementation will > > focus, at least as a first step, on loops). > > Ah, right... I wonder if we can keep the loop structure in place, even > after completely unrolling the loop - I mean the 'struct loop' in > 'current_loops' (not the actual CFG), so that the "SLP in loops" would have > a chance to at least consider vectorizing this "loop". Zdenek - what do you > say? I do not think this is a good idea -- making the structures inconsistent just to "fix" a pass that can be easily fixed in another way. Zdenek > thanks, > dorit > > > The following will get vectorized (without permutation on autovect > > branch, and with redundant permutation on mainline): > > > > for (i = 0; i < n; i++) > > { > > v[4*i] = 0.0; > > v[4*i + 1] = 0.0; > > v[4*i + 2] = 0.0; > > v[4*i + 3] = 0.0; > > } > > > > The original completely unrolled loop will get vectorized if it is > > encapsulated in an outer-loop, like so: > > > > for (j = 0; j < n; j++) > > { > > for (i = 0; i < 4; i++) > > v[i] = 0.0; > > v += 4; > > } > > > > Ira
Re: RFC: vectorizer cost model
Hello, > "Linthicum, Tony" <[EMAIL PROTECTED]> writes: > > > * Would using a tree-level API like estimate_num_insns be superior > > to using a simple cost of 1 per statment? > > For this sort of calculation, I would guess not. I would imagine that > on most processors the cost of a single vector instruction is > comparable to the cost of the same operation on scalar operands. So > simply counting instructions should give you a good indication of > whether vectorization is profitable, even though each instruction will > of course have a different cost. > > If this turns out not to be the case, then I think the next step would > be to have a target defined multiplier for vector instructions, > indicating that for that target vector instructions were more or less > expensive than scalar instructions according to some ratio. > > I doubt that using the full fledged cost infrastructure would give you > better results than that in practice. on the other hand, using it does not cost you anything, and would avoid risk of creating yet another code duplication. > > * What is the best way to access target level cost information? > > I'm sure you know that the loop code does this by generating RTL and > asking for the cost of it (computation_cost in tree-ssa-loop-ivopts.c). Which should be removed (I have some patches for that, but I somehow forgot on them because of other issues). Zdenek
Re: Preserving alias analysis information
Hello, > For instance, let's consider the following struct definition (taken from > gcc.dg/struct-alias-1.c): > > struct S { >int a[3]; >int x; > }; > > This is the original code in GIMPLE pseudo-C dump representation: > > s.x = 0; > i.0 = i; > s.a[i.0] = 1; > D.1416 = s.x; > if (D.1416 != 0) goto ; else goto ; > :; > link_error (); > > This is the code after the CLI-specific array simplification: > > s.x = 0; > i.0 = i; > cilsimp.1 = &s.a[0]; > D.1423 = i.0 * 4; > D.1424 = D.1423 + cilsimp.1; > *D.1424 = 1; > D.1416 = s.x; > if (D.1416 != 0) goto ; else goto ; > :; > link_error (); > > In the original code, GCC alias analysis understands that accesses to > s.x and s.a do not alias; therefore, it understands that the "then" > condition of the if statement is never taken. > In the modified code, the alias analysis concludes that access to s.x and > pointer D.1424 may alias. > My question is: is this unavoidable because of the memory access pattern > in the modified code, or was there additional information the > transformation pass could have attached to D.1424 (or to something else) > that would have excluded such a memory alias? you might try turning the references into TARGET_MEM_REFs, and copying the alias information to them using copy_ref_info. I am not sure how that would interact with the transformations you want to do, but we do a lot of magic to keep the virtual operands for TARGET_MEM_REFs the same as before the transformation (unless that got broken in the last few months, which unfortunately is pretty likely). Zdenek
Re: Preserving alias analysis information
Hello, > >you might try turning the references to TARGET_MEM_REFs, and copy the > >alias information using copy_ref_info to it. I am not sure how that > >would interact with the transformations you want to do, but we do lot > >of magic to keep the virtual operands for TARGET_MEM_REFs the same > >as before the transformation (unless that got broken in last few months, > >which unfortunately is pretty likely). > > It would be better to annotate things with better alias information > than transform into target specific trees, which none of the other > transformations actually know how to deal with. well, I had the impression that what he does is some target-specific transformation, so this idea did not seem all that out of place to me. Zdenek
Re: Preserving alias analysis information
Hello, > >>>you might try turning the references to TARGET_MEM_REFs, and copy the > >>>alias information using copy_ref_info to it. I am not sure how that > >>>would interact with the transformations you want to do, but we do lot > >>>of magic to keep the virtual operands for TARGET_MEM_REFs the same > >>>as before the transformation (unless that got broken in last few months, > >>>which unfortunately is pretty likely). > >>It would be better to annotate things with better alias information > >>than transform into target specific trees, which none of the other > >>transformations actually know how to deal with. > > > >well, I had impression that what he does is some target-specific > >transformation, so this idea did not seem all that out of place to me. > > Thanks Daniel and Zdenek for your suggestions! > > In principle, I don't see anything forbidding Zdenek's idea. > Unfortunately, what I avoided to mention is that TARGET_MEM_REF nodes > are also transformed into pointer arithmetics > and the equivalent > INDIRECT_REF memory access... therefore, this is not an option even only > because of that. hmm... why do you do that? Could you please describe more precisely what you are trying to achieve? > The first time this CLI-specific transformation is performed is before > GIMPLE code enters SSA form This looks like the wrong place; I guess later optimizations will in general try to revert the trees to the original form (at the moment, we do not have a tree-combine pass, but if we had, it definitely would). IMHO, it would make more sense to do this kind of target-specific transformation as late as possible. Anyway, if that is the case, using TMRs is not a good idea. Zdenek
Re: Preserving alias analysis information
Hello, > >>In principle, I don't see anything forbidding Zdenek's idea. > >>Unfortunately, what I avoided to mention is that TARGET_MEM_REF nodes > >>are also transformed into pointer arithmetics > >>and the equivalent > >>INDIRECT_REF memory access... therefore, this is not an option even only > >>because of that. > > > >hmm... why you do that? Could you please describe more precisely what > >are you trying to achieve? > > Sure! > The short answer is that, though most GIMPLE tree codes closely match > what is representable in the CIL bytecode, some do not; hence, such > codes are "lowered" into equivalent expressions that directly match what > is representable in the bytecode. this seems quite close to what TARGET_MEM_REFs are designed for. IMHO, the best way would be to lower the memory references to TARGET_MEM_REFs (*) just once, sometime quite late in the optimization pipeline (just after loop optimizations, for example), so that high-level optimizers/alias analysis see the easy to understand code, while at least the essential cleanups are performed on the lower level code. Zdenek (*) or just the pointer arithmetics like you do now, if you have some reasons for avoiding TMRs, although then you have to rerun the pass once later to get rid of invalid forms that may be created by the optimizers. > The long answer is in "CIL simplification pass" section at > http://gcc.gnu.org/projects/cli.html :-) > > >>The first time this CLI-specific transformation is performed is before > >>GIMPLE code enters SSA form > > > >This looks like a wrong place; I guess later optimizations will in > >general try to revert the trees to the original form (at the moment, > >we do not have tree-combine pass, but if we had, it definitely would). > >IMHO, it would make more sense to do this kind of target specific > >transformations as late as possible. > > Yes, this is precisely the reason behind running this transformation > pass twice. 
> The first pass (before SSA) simplifies GIMPLE expressions not atomically > representable in the bytecode, in the hope that GCC middle-end passes > optimize them further. > The second pass makes sure to re-simplify expressions that may have been > reverted to the original form. The practice shows this happens quite > rarily; still, it's a possibility that must be taken into account. > The first pass is optional, the second is strictly required. > > My experiments also show that the bytecode generated by running the CIL > simplification twice as opposed to running the final pass only is > significantly better. For multimedia codecs with heavy array accesses > (which, by the way, is the kind of code ST is particularly interested > in), the recorded difference in performance is up to 40%. > > >Anyway, if that is the case, using TMRs is not a good idea. > > > >Zdenek > > Cheers, > Roberto
Re: Preserving alias analysis information
Hello, > >this seems quite close to what TARGET_MEM_REFs are designed for. IMHO, > >the best way would be to lower the memory references to TARGET_MEM_REFs > >(*) just once, sometime quite late in the optimization pipeline (just after > >loop optimizations, for example), so that high-level optimizers/alias > >analysis see the easy to understand code, while at least the essential > >cleanups are performed on the lower level code. > > but the only kind of TARGET_MEM_REFs we would really allow would be > those with a base and no symbol, no index, no step, no offset... which > look to me like INDIRECT_REFs! well, if that accurately describes the capabilities of the target, yes. This way, you will prevent the optimizers from recombining the addresses into more complex ones again. Zdenek
Re: GCC 4.3 Stage 1 project survey
Hello, > I've been trying to keep the GCC_4.3_Release_Planning wiki page up to > date, and I'd like to be sure I haven't missed anything going in. I've > moved several projects to the Completed section, and if I've done > anything in error there, please correct it. > > So here's a survey of what's left: > > * PredictiveCommoning > > I can't tell for sure, as Zdenek commits so many patches ;-) It seems > that this one is not merged yet. the final version of the patch is at http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01385.html waiting for review. Zdenek
Re: Valid gimple for MEM_REF
Hello, > I noticed that gcc.dg/tree-ssa/loop-19.c was failing on both > powerpc-linux-gnu and powerpc64-linux-gnu: > FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times MEM.(base: &|symbol: > )a, 2 > FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times MEM.(base: &|symbol: > )c, 2 > > The reason why they are failing is because we produce: > MEM[base: (double *) &c, index: ivtmp.34] = MEM[base: (double *) &a, > index: ivtmp.34]; > Which does not match the regex as there is a cast there. > Now the real question comes down to, is the following valid gimple > that IV-OPTS produces: > MEM[base: (double *) &a, index: ivtmp.34_12]; > > base is now a non gimple invariant but instead is a full expression. > If we decide to do any other optimizations with MEM_REF, we might run > into more of these issues? > > So what are the constraints on MEM_REF's base argument, is it a simple > expression (SSA_name or invariant) or can it be a complex expression? only gimple_vals (name or invariant). However, the expressions are matched in the final_cleanup dump (after out-of-ssa and ter), so this is no longer the case. I think just the regular expressions need to be updated. Zdenek
Re: Valid gimple for MEM_REF
Hello, > > only gimple_vals (name or invariant). However, the expressions are > >matched in final_cleanup dump (after out-of-ssa and ter), so this no > >longer is the case. I think just the regular expressions need to be > >updated. > > Then IV-OPTs has an issue too but IV-OPTs dump gives: > D.1604_5 = MEM[base: (double *) &a, index: ivtmp.34_12]; > MEM[base: (double *) &c, index: ivtmp.34_12] = D.1604_5; > > the expression matching in final_cleanup was just a symptom of the issue. OK, I will have a look at that. Zdenek
Proposal: changing representation of memory references
Hello,

at the moment, any pass that needs to process memory references is complicated (or restricted to handling just a limited set of cases) by the need to interpret the quite complex representation of memory references that we have in gimple. For example, there are about 1000 lines of quite convoluted code in tree-data-ref.c and about 500 lines in tree-ssa-loop-ivopts.c dealing with parsing and analysing memory references.

I would like to propose (and possibly also work on implementing) using a more uniform representation, described in more detail below. The proposal is based on the previous discussions (http://gcc.gnu.org/ml/gcc/2006-06/msg00295.html) and on what I learned about the way memory references are represented in ICC. It also subsumes the patches of Daniel to make p[i] (where p is a pointer) use ARRAY_REFs/MEM_REFs. I am not sure whether someone is not already working on something similar, e.g. with respect to LTO (where the mentioned discussion started), or the gimple-tuples branch?

Proposal:

For each memory reference, we remember the following information:

-- base of the reference
-- constant offset
-- vector of indices
-- type of the accessed location
-- original tree of the memory reference (or another summary of the structure of the access, for aliasing purposes)
-- flags

for each index, we remember

-- lower and upper bound
-- step
-- value of the index

The address of the reference can be computed as

  base + offset + sum_{idx} offsetof(idx)

where offsetof(idx) = (value - lower) * step.

For example, a.x[i][j].y would be represented as

  base = &a
  offset = offsetof (x) + offsetof (y)
  indices:
    lower = 0, upper = ?, step = sizeof (a.x[i]), value = i
    lower = 0, upper = ?, step = sizeof (a.x[i][j]), value = j

p->x would be represented as

  base = p, offset = offsetof (x), indices: empty

p[i] as

  base = p, offset = 0
  indices:
    lower = 0, upper = ?, step = sizeof (p[i]), value = i

Remarks:

-- it would be guaranteed that the indices of each memory reference are independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only if idx1 == idx1' and idx2 == idx2'; this is important for dependency analysis (and for this reason we also need to remember the list of indices, rather than trying to reconstruct them from the address).
-- it would be guaranteed that the computations of the address do not overflow.
-- possibly there could be a canonicalization pass that, given

     for (p = q; p < q + 100; p++)
       p->x = ...    {represented the way described above}

   would transform it to

     for (p = q, i = 0; p < q + 100; p++, i++)
       {base = q, offset = offsetof(x), indices:
          lower = 0, upper = ?, step = sizeof (*p), value = i}

   so that dependence analysis and friends do not have to distinguish between accesses through pointers and arrays

Zdenek
Re: Proposal: changing representation of memory references
Hello, > This looks like a very complicated (though very generic) way of > specifying a memory > reference. Last time we discussed this I proposed to just have BASE, OFFSET > and accessed TYPE (and an alias tag of the memory reference). I realize > this > doesn't cover accesses to multi-dimensional arrays, but using the > full-blown scheme > in place of a simple INDIRECT_REF makes memory usage for the trees go up > noticable. as was pointed out before, you cannot avoid remembering the indices of the memory reference in some way if you want to be able to do dependency analysis. Also, many high-level optimizations work with multidimensional arrays, so the representation should make this possible. Regarding the memory consumption, let us forget the part of the information for alias analysis (for which the proposal, for simplicity, preserves the current representation, but that can be changed). Then, the proposed representation is still cheaper than the current way (component_refs are collapsed into a single offset field; only the necessary information is recorded for indices, whereas now we have redundant information stored in ARRAY_REFs). For simple indirect_refs, there are no indices, so the overhead of the proposal over the base+offset one is basically one pointer. Zdenek
Re: Proposal: changing representation of memory references
Hello, > This looks like a very complicated (though very generic) way of > specifying a memory > reference. Last time we discussed this I proposed to just have BASE, OFFSET > and accessed TYPE (and an alias tag of the memory reference). I realize > this > doesn't cover accesses to multi-dimensional arrays, but using the > full-blown scheme > in place of a simple INDIRECT_REF makes memory usage for the trees go up > noticable. by the way, does your claim "using the full-blown scheme... makes memory usage go up noticeably" mean that you have experimented with some similar representation? If so, could you please post your patches/results? Or do you have at least some data regarding what fraction of the tree memory consumption comes from memory references? Zdenek
Re: Proposal: changing representation of memory references
Hello, > >> This looks like a very complicated (though very generic) way of > >> specifying a memory > >> reference. Last time we discussed this I proposed to just have BASE, > >OFFSET > >> and accessed TYPE (and an alias tag of the memory reference). I realize > >> this > >> doesn't cover accesses to multi-dimensional arrays, but using the > >> full-blown scheme > >> in place of a simple INDIRECT_REF makes memory usage for the trees go up > >> noticable. > > > >by the way, does your claim "using the full-blown scheme... makes memory > >usage go up noticeable" mean that you have experimented with some > >similar representation? If so, could you please post your > >patches/results? Or do you have at least some data regarding how great > >part of the tree memory consumption comes from memory references? > > No, I didn't do patches or measurements. I just extrapolated that if you > want > to handle things like a.x[i].y.z[j] you need to store not only two indices > but > either two strides or two types (to extract the stride). So you'd > have (assuming > a flat tree) for example two VECs in the memory reference tree. Together > with the (possibly non-constant) offset and the base these are four pointers > compared to one in a simple INDIRECT_REF. to make this comparison fair, you also need to take into account that the computations need to be done somewhere. So for the "simple INDIRECT_REF", you in fact have the following statements in the program (assuming that the lower bound of each array is 0, otherwise you would also have the subtractions):

off1 = step1 * i;
off2 = step2 * j;
off = off1 + off2;
addr = base + off;
*addr

Given all the temporary variables, expression and assignment nodes needed for this, this more than makes up for any extra space that is used in the proposed representation. Of course, in some situations some of these computations may be shared. 
For simple INDIRECT_REF (without any structure), we store base, plus we have fields for offset (constant 0) and the vector of indices (NULL). So there are two extra pointers over the current state. Without experimentation, I cannot tell whether overall, my proposal would require more or less memory than we use now. In fact, if this turned out to be a problem, I can live with allowing also INDIRECT_REF in its current form, to represent references that do not have any structure. > Maybe we can incrementally create out of all the existing memory reference > two reference trees, one simple that does not handle multi-dimensional array > accesses well and one full-blown one? Would not that just mean that you would consume even more memory? If you mean incremental lowering, yes, I can imagine we might want to do that (possibly lowering the references later to the existing TARGET_MEM_REFs), but it does not really help with the memory consumption. Zdenek
Re: Proposal: changing representation of memory references
Hello, > >Proposal: > > > >For each memory reference, we remember the following information: > > > >-- base of the reference > >-- constant offset > >-- vector of indices > >-- type of the accessed location > >-- original tree of the memory reference (or another summary of the > > structure of the access, for aliasing purposes) > This i don't think we should keep, because i don't believe it's > necessary for aliasing. At worst, aliasing could come up with it's > own summary if it really wanted to. long term, I totally agree with you. Short term, I think even implementing the proposal in the current form will be a lot of work; I somewhat fear having to include modifying alias analysis significantly in it (of course, if someone familiar with tree alias analysis -- i.e., you or Diego -- volunteered to help me with this part of the change, it would help a lot). > >-- flags > > > >for each index, we remember > >-- lower and upper bound > >-- step > >-- value of the index > > This seems a lot, however, since most of it can be derived from the > types, why are we also keeping it in the references. The lower bound and step may be SSA_NAMEs, see the current ARRAY_REF, and as such, they need to be stored somewhere in the IR (not just in the type). > That is, unless we could share most of the index struct (upper, > lower, step) among expressions that access them (IE make index be > immutable, and require unsharing and resharing if you want to modify > the expression). That appears a bit dangerous to me; I would rather avoid such tricks. Zdenek
Re: Proposal: changing representation of memory references
Hello, > >> >-- flags > >> > > >> >for each index, we remeber > >> >-- lower and upper bound > >> >-- step > >> >-- value of the index > >> > >> This seems a lot, however, since most of it can be derived from the > >> types, why are we also keeping it in the references. > > > >The lower bound and step may be SSA_NAMEs, see the current ARRAY_REF, > >and as such, they need to be stored somewhere in IR (not just in type). > > > >> That is, unless we could share most of the index struct (upper, > >> lower, step) among expressions that access them (IE make index be > >> immutable, and require unsharing and resharing if you want to modify > >> the expression). > > > >That appears a bit dangerous to me, I would rather avoid such tricks. > > Like Richard, I have doubts you are not going to increase memory if > you store *this* much in every single memory access. for structured accesses (ARRAY_REFs etc.) we now need much more memory -- 4 pointers, plus the overhead of the common tree fields. My proposal will save some memory here. On unstructured accesses (INDIRECT_REFs), the proposal would indeed require three extra pointers. I now think keeping INDIRECT_REFs might turn out to be necessary, to represent unstructured memory accesses. Having to deal with INDIRECT_REFs should not complicate optimizers significantly, so it would not defeat the purpose of the patch. All in all, I believe it will need some experimentation, but we should not consume more memory than we do now, and possibly we could even save somewhat. Zdenek
Re: Proposal: changing representation of memory references
Hello, > > -- base of the reference > > -- constant offset > > -- vector of indices > > -- type of the accessed location > > -- original tree of the memory reference (or another summary of the > > structure of the access, for aliasing purposes) > > -- flags > > What do you do with Ada COMPONENT_REFs, at a variable offset? for concreteness, let us consider a.x{offset: n}. I have considered three possibilities:

1) making the offset part of the expression for the base, i.e., having tmp = &a + n and a reference with base: tmp
2) allowing the offset in the reference to be non-constant, i.e., having a reference with base: &a, offset: n
3) making this offset into an index, i.e., having base: &a, indices: (step: 1, value: n)

Out of these, I like 3) the most, although it might be fairly expensive memory-wise (any idea how common the variable offsets are?)

Zdenek
Re: Proposal: changing representation of memory references
Hello,

> >at the moment, any pass that needs to process memory references is
> >complicated (or restricted to handling just a limited set of cases) by
> >the need to interpret the quite complex representation of memory
> >references that we have in gimple.  For example, there are about 1000
> >lines of quite convoluted code in tree-data-ref.c and about 500 lines
> >in tree-ssa-loop-ivopts.c dealing with parsing and analysing memory
> >references.
>
> I have my doubts changing this is the correct step forward.  Having a
> high level representation is a good thing and having it represented as
> *(base+offset) (plus other info) might just make the other passes that
> like the high level representation get worse.  I also don't see why
> tree-data-ref.c and tree-ssa-loop-ivopts.c could not use
> get_inner_reference which parses the memory references for you.

for many reasons.  Probably the most important is that you could not
recover the indices of multidimensional arrays this way, thus making
dependence analysis impossible.  Also, with pointer arithmetic, it is
not sufficient just to call get_inner_reference, you also need to
traverse SSA chains.  Also, time- and memory-wise, calling
get_inner_reference is fairly expensive.

Note that the representation I propose is not really low-level.  It
carefully preserves all the information you could get from the
"high-level" representation (indices and their ranges, alias
information).  We already have a low-level representation of memory
references (TARGET_MEM_REFs), and I do not feel any need to make an
alternative proposal for that at the moment.

> Maybe
> I don't see the benefit in always changing our IR without really
> thinking about the problem and seeing if there are already tools
> (functions) which do the same thing in a common place.

Sorry, but you are completely off here.  I have spent a lot of time
working with the code for dealing with memory references, trying many
different approaches.  Yes, I know about get_inner_reference, I use it
where appropriate, and for many reasons (e.g., those mentioned above)
it does not do the job in general.

Also, I did not just pull the proposal out of thin air.  I have read
the previous discussions regarding the topic, and I took the
opportunity to ask how memory references are represented in ICC (which
turns out to be quite similar to my proposal).

I assume that you are trying to help get us towards some solution;
however, please note that some people could find remarks like this
offensive.

Zdenek
Re: Proposal: changing representation of memory references
Hello,

> >> >> That is, unless we could share most of the index struct (upper,
> >> >> lower, step) among expressions that access them (IE make index be
> >> >> immutable, and require unsharing and resharing if you want to
> >> >> modify the expression).
> >> >
> >> >That appears a bit dangerous to me, I would rather avoid such tricks.
> >>
> >> Like Richard, I have doubts you are not going to increase memory if
> >> you store *this* much in every single memory access.
> >
> >for structured accesses (ARRAY_REFs etc.) we now need much more memory --
> >4 pointers, plus the overhead of the common tree fields.  My proposal
> >will save some memory here.
>
> I'm not sure how, since you have more than 4 pointers worth of info in:
>
> -- base of the reference -> pointer
> -- constant offset -> HOST_WIDE_INT
> -- vector of indices -> embedded vec
> -- type of the accessed location -> pointer
> -- original tree of the memory reference (or another summary of the
>    structure of the access, for aliasing purposes) -> pointer
> -- flags -> no idea
>
> for each index, we remember
> -- lower and upper bound -> pointers
> -- step -> pointer
> -- value of the index -> pointer
>
> What have I misunderstood?

consider a[i][j], on a 32-bit machine:

Currently, this is two ARRAY_REFs, 40 bytes each, for a total of 80
bytes.

With my proposal, this would be 24 bytes (the items contained in the
base of tree_exp: tree_common, including type + flags; locus, and
block), base, offset, summary, vector (16 bytes), in the vector size
and bound (8 bytes), plus 2 indices (16 bytes each), for a total of 80
bytes.

I.e., we get the same memory consumption for references with at least
two levels.  For each further index or component ref, the proposed
representation saves 40 - 16 = 24 bytes.

There are some possible improvements:
-- if we used a tail array for the indices instead of a vector, we
   could save some 8 bytes
-- assuming that the case where the bounds or step of the array are
   varying is infrequent, we can change the representation to require
   -- 8 bytes/index if both bounds and the step are constants,
   -- ~32 bytes otherwise
   by referring to the appropriate array type in the former case and a
   TREE_VEC containing the three values in the latter case, or some
   similar representation

Zdenek
Re: Proposal: changing representation of memory references
Hello,

> >Remarks:
> >-- it would be guaranteed that the indices of each memory reference are
> >   independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only
> >   if idx1 == idx1' and idx2 == idx2'; this is important for dependence
> >   analysis (and for this reason we also need to remember the list of
> >   indices, rather than trying to reconstruct them from the address).
>
> I didn't completely think through this,
> but how would this property help in the presence of array flattening?
> e.g. suppose two adjacent loops, both two-nest-deep,
> and only one of them is flattened, then
> one loop will have a one-dimensional access (hence will have only one
> index) vs. the other loop will have a two-dimensional one.
> In other words, &ref[idx1] == &ref[idx1'][idx2'] can happen.

this cannot happen.  Suppose you have a declaration

int ref[100][100];

Then the first (flattened) reference would be represented as

tmp = (int *) &ref;
tmp[idx1]

It cannot have ref as its base.  Similarly, if one wanted to implement
the reverse transformation, recovering multidimensional arrays from the
flat representation, he would have to transform

int a[10000];
for (i = 0; i < 100; i++)
  for (j = 0; j < 100; j++)
    a[100 * i + j] = ...

to

tmp = (int (*)[100][100]) &a;
for (i = 0; i < 100; i++)
  for (j = 0; j < 100; j++)
    ref with base: tmp, indices: (i, j) = ...

> Another question is, how would the representation look
> for more complicated address calculations,
> e.g. a closed hashtable access like:
>
> table[hash_value % hash_size]
>
> and would it help in such cases?

I am not quite sure what you are asking for here; this would just be
represented as

idx = hash_value % hash_size;

and a reference with base: table, indices: (idx)

Zdenek
Re: adding dependence from prefetch to load
Hello,

> 2. Right now I am inserting a __builtin_prefetch(...) call immediately
> before the actual read, getting something like:
>
>   D.1117_12 = &A[D.1101_14];
>   __builtin_prefetch (D.1117_12, 0, 1);
>   D.1102_16 = A[D.1101_14];
>
> However, if I enable the instruction scheduler pass, it doesn't realize
> there's a dependency between the prefetch and the load, and it actually
> moves the prefetch after the load, rendering it useless.  How can I
> inform the scheduler of this dependence?
>
> My thinking is to also specify a latency for prefetch, so that the
> scheduler will hopefully place the prefetch somewhere earlier in the
> code to partially hide this latency.  Do you see anything wrong with
> this approach?

well, it assumes that the scheduler works with a long enough lookahead
to actually be able to move the prefetch far enough; i.e., if the
architecture you work with is relatively slow in comparison with the
memory access times, this might be a feasible approach.  However, on
modern machines, a miss in the L2 cache may take hundreds of cycles,
and it is not clear to me that the scheduler will be able to move the
prefetch that far, or indeed, that it would even be possible (I think
you often do not know the address far enough in advance).

Also, prefetching outside of loops generally appears not to be all that
profitable, since usually most of the time is spent within loops.  So I
would recommend first doing some analysis and measurements (say, by
introducing the prefetches by hand) to check whether this project
really has the potential to lead to significant speedups.

Zdenek
Re: adding dependence from prefetch to load
Hello,

> Well, the target architecture is actually quite peculiar, it's a
> parallel SPMD machine.  The only similarity with MIPS is the ISA.  The
> latency I'm trying to hide is somewhere around 24 cycles, but because
> it is a parallel machine, up to 1024 threads have to stall for 24
> cycles in the absence of prefetching, which affects overall
> performance.
> My initial studies show that this latency can be hidden with a properly
> inserted prefetch instruction, and I think that the scheduler can help
> with that, if properly guided.
>
> So my initial question remains: is there any way to tell the scheduler
> not to place the prefetch instruction after the actual read?
>
> The prefetch instruction takes an address_operand, and it seems all I
> need to do is tell the scheduler prefetch will "write" to that address,
> so it will see a true dependence between the prefetch and the read.

this might also restrict the possibility of moving the prefetch, since
it would prevent it from being moved over any other memory operations
that alias with the prefetched one.  Unfortunately, I do not really
understand the internals of the gcc scheduler, so I cannot give you
more concrete help; but hopefully someone else will.

Zdenek

> But
> I don't know how to do that, and changing the md file to say "+p" or
> "+d" for the first operand of the prefetch didn't help.
Re: GCC mini-summit - compiling for a particular architecture
Hello,

> Steve Ellcey wrote:
>
> >This seems unfortunate.  I was hoping I might be able to turn on loop
> >unrolling for IA64 at -O2 to improve performance.  I have only started
> >looking into this idea but it seems to help performance quite a bit,
> >though it is also increasing size quite a bit too, so it may need some
> >modification of the unrolling parameters to make it practical.
>
> To me it is obvious that optimizations are target dependent.  For
> instance loop unrolling is really a totally different optimization
> on the ia64 as a result of the rotating registers.

that we do not use.  Nevertheless, there are still compelling reasons
why unrolling is more useful on ia64 than on other architectures
(importance of scheduling, insensitivity to code size growth).

Another option would be to consider enabling (e.g.) -funroll-loops
-fprefetch-loop-arrays by default at -O3.  I think it is fairly rare
for these flags to cause performance regressions (although of course
more measurements to support this claim would be necessary).

Zdenek
Re: GCC mini-summit - compiling for a particular architecture
> Look at what we're starting from:
>
> <<
> @item -funroll-loops
> @opindex funroll-loops
> Unroll loops whose number of iterations can be determined at compile
> time or upon entry to the loop.  @option{-funroll-loops} implies
> @option{-frerun-cse-after-loop}.  This option makes code larger,
> and may or may not make it run faster.
>
> @item -funroll-all-loops
> @opindex funroll-all-loops
> Unroll all loops, even if their number of iterations is uncertain when
> the loop is entered.  This usually makes programs run more slowly.
> @option{-funroll-all-loops} implies the same options as
> @option{-funroll-loops},
> >>
>
> It could gain a few more paragraphs written by knowledgeable people.
> And expanding documentation doesn't introduce regressions :).

but it also does not make anyone actually use the options.  Nobody
reads the documentation.  Of course, this is a bit of an overstatement,
but with a few exceptions, people in general do not enable non-default
flags.

Zdenek
Re: GCC mini-summit - compiling for a particular architecture
Hello,

> On Sun, 2007-04-22 at 14:44 +0200, Richard Guenther wrote:
> > On 4/22/07, Laurent GUERBY <[EMAIL PROTECTED]> wrote:
> > > > > but also does not make anyone actually use the options. Nobody reads
> > > > > the documentation. Of course, this is a bit of an overstatement, but
> > > > > with a few exceptions, people in general do not enable non-default flags.
> > > >
> > > > I don't think this is fair.
> > > > Most people don't read the docs because they don't care about
> > > > performance, but most people who develop code that spends a lot of CPU
> > > > cycles actually read the docs at least up to loop unrolling.
> > >
> > > Exactly my experience.
> > >
> > > Unfortunately there's no useful information on this topic in the GCC
> > > manual...
> >
> > Well, we have too many switches really.  So the default is to use -O2.
> > If you want extra speed, try -O3, or even better, use profile feedback.
> > (Not many people realize that with profile feedback you get faster code
> > than with -O3 and smaller code than with -Os - at least for C++ programs)
>
> At work we use -O3 since it gives a 5% performance gain over -O2.
> profile-feedback has many flags

actually, only two are really important -- -fprofile-generate and
-fprofile-use.

Zdenek
Re: GCC 4.2.0 Status Report (2007-04-24)
Hello,

> 4. PR 31360: Missed optimization
>
> I don't generally mark missed-optimization bugs as P1, but not hoisting
> loads of zero out of a 4-instruction loop is bad.  Zdenek has fixed
> this on mainline.  Andrew says that patch has a bug.  So, what's the
> story here?

I found the problem; I will send a patch once it passes regtesting.

Zdenek
Re: A problem with the loop structure
Hello,

> (based on gcc 4.1.1).

now that is a problem; things have changed a lot since then, so I am
not sure how much I will be able to help.

> 1. The problem was unveiled by compiling a testcase with dumping
> turned on.  The compilation failed while calling the function
> get_loop_body from flow_loop_dump on the following assert:
>
>   else if (loop->latch != loop->header)
>     {
>       tv = dfs_enumerate_from (loop->latch, 1, glb_enum_p,
>                                tovisit + 1, loop->num_nodes - 1,
>                                loop->header) + 1;
>       gcc_assert (tv == loop->num_nodes);
>
> The compilation exits successfully if compiled without enabling the
> dump.

this means that there is a problem in some loop transformation: it
forgets to record the membership of some blocks in their loops, or
something like that.

> 2. The SMS pass contained a single call to loop_version on the loop to
> be SMSed.  This happened before any SMS-related work was done.  Trying
> to call verify_loop_structure (loops) just after the call to
> loop_version failed on the same assert in get_loop_body as in (1).
> The loop on which we fail is neither the versioned loop nor the new
> loop.

Probably it is their superloop?

> Below
> there are dumps from verify_loop_structure called from different
> places in loop_version:

These dumps are not very useful; loop structures do not have to be
consistent in the middle of the transformation.

> 3. At the very beginning of the SMS pass we build the loop structure
> using build_loops_structure defined in modulo-sched.c.  Just after the
> call I tried to print in gdb the loop on which we failed in
> get_loop_body.  This failed as well:
>
> (gdb) p print_loop(dumpfile, 0xbabe20, 0)

Do not use print_loop, that only works on gimple.  Use flow_loops_dump
to find which blocks belong to which loops, and possibly debug_bb_n if
you need to see the contents of the blocks.

Zdenek
Re: GCC 4.1 Projects
Hello,

> >Although you have listed it as "stage 2", I wish to commit the
> >finished portion as soon as possible during stage 1.  I have
> >maintainership authority to do so.  This will not interfere in any way
> >with *any* of the projects approved for stage 1, since it is in a
> >disjoint section of code.
>
> If it breaks bootstrap, it will definitely interfere.  If it causes
> patch conflicts with other changes it will also interfere.  And if it
> doesn't cause any patch conflicts, then it probably won't be very hard
> to maintain on a branch.
>
> > Accordingly, I plan to do so unless I am told not to.
>
> I would certainly prefer that you hold off until Stage 2, as indicated
> by the document I posted.

I must admit I have a very bad feeling about the whole "4.1 Projects"
stuff.  IMHO this over-organizes things.  If people in general disagree
with Nathan's changes, or if there are any reasons to think that they
are not tested enough or whatever when he submits them, that of course
is something else.  But I don't think having just a single person
decide which patches may go in and which must wait, or even just
judging their importance, is a good idea.

As for the argument "If it breaks bootstrap, ..." -- well, if it is a
large or intrusive patch, it is natural to expect it to be tested even
more than the rules request, including bootstrap on several platforms.
Some breakage during Stage 1 is probably unavoidable; however, I have
not noticed any significant problems due to this since I started
working on gcc (2001).

Zdenek
Re: GCC 4.1 Projects
Hello,

> >>>>> Zdenek Dvorak writes:
>
> Zdenek> I must admit I have a very bad feeling about the whole "4.1 Projects"
> Zdenek> stuff.  IMHO this over-organizes things.  If people in general disagree
> Zdenek> with Nathan's changes, or if there are any reasons to think that
> Zdenek> they are not tested enough or whatever when he submits them, that of
> Zdenek> course is something else.  But I don't think having just a single person
> Zdenek> decide which patches may go in and which must wait, or even just judging
> Zdenek> their importance, is a good idea.
>
> Given the number of branches and major changes queued up for GCC
> 4.1, we need to coordinate the way that they are merged to try to avoid
> too much breakage and disruption.  Mark is the GCC Release Manager and
> he has the authority to try to herd the cats and manage this process.

at the beginning of stage 1, there are always a lot of major changes
queued up.  It has never led to an unmanageable amount of "breakage and
disruption".

> I would appreciate it if you and everyone else who has disagreements
> with individual decisions would stop turning these into complaints
> about the entire process.

This was not my intention.  It may well be that Nathan's case is the
only one where the plan causes significant problems to anyone.  I
simply do not like the idea of someone centrally planning the future of
GCC, and I would complain even if things worked perfectly.

> There is no perfect solution.  I'm sure that we all would be happy to
> consider constructive suggestions, but complaining about the abstract
> concept of someone else having control or making decisions is not
> helpful.

As for constructive suggestions -- why not just forget the whole plan
and let things work out the way they always have?  As you say, there is
no perfect solution, so why complicate people's lives trying to get
one?

Zdenek
Re: GCC 4.1 Projects
Hello,

> (This time around, there seemed to be consensus that all proposals
> be made public upon submission.  Since that seems to be the consensus,
> I'll implement that in the next release cycle.)

yes, this would definitely make sense.

Zdenek
Re: GCC 4.1 Projects
Hello,

> >at the beginning of stage 1, there are always a lot of major changes
> >queued up.  It has never led to an unmanageable amount of "breakage
> >and disruption".
>
> Other people have a different perception than you do.
>
> >As for constructive suggestions -- why not just forget the whole plan
> >and let things work out the way they always have?  As you say, there
> >is no perfect solution, so why complicate people's lives trying to
> >get one?
>
> I can't say I found it incredibly thrilling to try to work out the
> project ordering, and I certainly have plenty of other work to do.  If
> there's consensus that this was a bad idea, I'm happy not to do it
> again.  However, one person has already said that they found it
> valuable.
>
> I think it's a little early to start the post-mortem.  Perhaps in
> Stage 3, we could have a better discussion about how well this worked
> out.  Or, perhaps it will become obvious before then, one way or the
> other.

agreed.  Anyway, thank you for all the work you have spent trying to
make the whole thing work; while I have my objections against the idea,
it may turn out that most people work in a different way and will
actually prefer it (I admit to enjoying a controlled amount of chaos :-)

Zdenek
Re: GIV optimizations
Hello,

> Giv optimizations are just features which are not
> implemented yet in the new loop unroller, so I think
> putting it in bugzilla is not appropriate.

it most definitely is appropriate.  This is a performance regression.
Even if it were not, feature requests can be put in Bugzilla.  The best
thing, of course, would be if you could create a small testcase
demonstrating what you would like the compiler to achieve.

Zdenek
Re: Strange missing tree constant propagation
Hello,

> Why is D.1476 not being propagated?  IVOPTS introduces it,
> but I don't see any reason why...
>
> Also, why all the leading zeros?  Is there something special
> about that constant?  The initial RTL gcc produces for the
> assignment to D.1476 is also suboptimal:
>
> ;; D.1476 = -1
> (insn 21 19 22 (set (reg:SI 64)
>         (const_int -1 [0x00000000ffffffff])) -1 (nil)
>     (nil))
>
> (insn 22 21 0 (set (reg:SI 62 [ D.1476 ])
>         (reg:SI 64)) -1 (nil)
>     (nil))
>
> Strange...  Does anyone know a reason why this happens?

I think the constant has TREE_OVERFLOW set; and for mostly historical
reasons, the optimizers are very conservative when dealing with such
constants.  There are two things that need to be done.  The first is to
check why ivopts produces the overflowed constant and fix that (ivopts
should not produce unnecessary overflowed constants, since they are
handled in a very conservative way in fold).  The second one is to
remove the restrictions (remove the check for TREE_OVERFLOW from
is_gimple_min_invariant) and deal with the problems it exposes in
tree -> rtl expansion.

I think I will try the latter fix today (mostly because I no longer
remember what exactly the problems were that led me to introduce the
TREE_OVERFLOW check in is_gimple_min_invariant).

Zdenek
Re: Strange missing tree constant propagation
Hello,

> >I think I will try the latter fix today (mostly because I no longer
> >remember what exactly the problems were that led me to introduce
> >the TREE_OVERFLOW check in is_gimple_min_invariant).
>
> http://gcc.gnu.org/ml/gcc-patches/2003-12/msg00786.html
> And yes, the testcase still fails when you remove TREE_OVERFLOW from
> is_gimple_min_invariant; at least it did three weeks ago when I tried
> this.

actually, it does not for me (after fixing a trivial problem in
find_taken_edge_cond_expr).  I am now bootstrapping and regtesting the
patch on every architecture I have access to.

Zdenek
Re: Weird behavior in ivopts code
Hello,

> Which appears to walk down the array and try to choose better IV sets.
> Since it walks down the IV array, which is in SSA_NAME_VERSION order,
> two loops which are equivalent in all ways except that they use
> different SSA_NAME_VERSIONs can get different IV sets.
>
> Anyway, the instability of the IV opts code when presented with loops
> that differ only in the version #s in the SSA_NAMEs they use is really
> getting in the way of understanding the performance impact of the
> new jump threading code.  I would expect this instability to also make
> it difficult to analyze IVopts in general.

there's not much to do about the matter.  The choice of ivs in ivopts
is just a heuristic (and this cannot be changed, due to compiler
performance reasons), and as such it is prone to such instabilities.
In fact, both choices of ivs make sense, and they have the same cost
with respect to the cost function used by ivopts.

Anyway, in this particular case, the patch below should make ivopts
prefer the choice of preserving the variable 'i' (which is better from
the point of view of preserving debugging information).

Zdenek

Index: tree-ssa-loop-ivopts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.21
diff -c -3 -p -r2.21 tree-ssa-loop-ivopts.c
*** tree-ssa-loop-ivopts.c	27 Oct 2004 20:27:20 -0000	2.21
--- tree-ssa-loop-ivopts.c	27 Oct 2004 20:30:19 -0000
*************** determine_iv_cost (struct ivopts_data *d
*** 3326,3333 ****
  
    cand->cost = cost_step + cost_base / AVG_LOOP_NITER (current_loop);
  
!   /* Prefer the original iv unless we may gain something by replacing it.  */
!   if (cand->pos == IP_ORIGINAL)
      cand->cost--;
  
    /* Prefer not to insert statements into latch unless there are some
--- 3327,3337 ----
  
    cand->cost = cost_step + cost_base / AVG_LOOP_NITER (current_loop);
  
!   /* Prefer the original iv unless we may gain something by replacing it;
!      this is not really relevant for artificial ivs created by other
!      passes.  */
!   if (cand->pos == IP_ORIGINAL
!       && !DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
      cand->cost--;
  
    /* Prefer not to insert statements into latch unless there are some
Re: Question on tree-ssa-loop-im.c:for_each_index
Hello,

> VIEW_CONVERT_EXPR is tcc_reference, but we can have a statement like:
>
> x = VIEW_CONVERT_EXPR<some_type>(22);

what is the semantics of this expression?  Should not this rather be

x = 22

(or just INTEGER_CST:some_type 22)?

Zdenek

> What ends up happening here is that find_interesting_uses_stmt calls
> find_interesting_uses_address, which goes down the references and
> runs into the constant, which it doesn't know how to handle.  I think
> the simplest fix is below.  It's certainly safe in that it converts
> an ICE into not aborting and seems to do the right thing.  The test
> case is in Ada and proprietary, and it didn't seem worthwhile to make
> a small one for something this simple.
>
> Does the following look correct?
>
> *** tree-ssa-loop-im.c	11 Mar 2005 09:05:10 -0000	2.31
> --- tree-ssa-loop-im.c	19 Mar 2005 00:12:02 -0000
> *** for_each_index (tree *addr_p, bool (*cbc
> *** 195,198
> --- 195,199
>       case STRING_CST:
>       case RESULT_DECL:
> +     case INTEGER_CST:
>         return true;
Re: Question on tree-ssa-loop-im.c:for_each_index
Hello,

> > x = VIEW_CONVERT_EXPR<some_type>(22);
>
> > what is the semantics of this expression?  Should not this rather be
> >
> > x = 22
> >
> > (or just INTEGER_CST:some_type 22)?
>
> Depends what the type is.  If it's an array type, then there's no
> simple equivalent expression.

using a CONSTRUCTOR node?

Zdenek
Re: Question on tree-ssa-loop-im.c:for_each_index
Hello,

> > Depends what the type is.  If it's an array type, then there's no
> > simple equivalent expression.
>
> > using a CONSTRUCTOR node?
>
> What I mean by "simple" is something that's easy to derive.  Suppose I
> have a record with numerous fields of various sizes and I unchecked
> convert a constant to it.  Yes, I could write code to convert that
> into a CONSTRUCTOR by figuring out what value should be assigned to
> each field, but that's a pretty large block of code for a pretty
> marginal case.

however, it would avoid possibly a large number of bugs in code that
does not expect this corner case.  I would certainly consider it a much
cleaner solution than adding hacks to for_each_index and possibly other
places that do not expect something this weird.

That being said, I think your patch is safe and will not break any of
the uses of for_each_index.

Zdenek
Re: [rtl-optimization] Improve Data Prefetch for IA-64
Hello,

> On Sunday 27 March 2005 03:53, Canqun Yang wrote:
> > The last ChangeLog of the rtlopt-branch was written in
> > 2003.  After more than one year, many improvements in
> > this branch haven't been put into the GCC HEAD.  Why?
>
> Almost all of the rtlopt branch was merged.  Prefetching is one
> of the few things that was not; probably Zdenek knows why.

because I never made it work as well as the current version, basically.
At least, no matter how much I tried, I never produced any benchmark
numbers that would justify the change.  Then I went to tree-ssa (and
tried to write prefetching there, and got stuck on the same issue,
after which I simply forgot about it due to loads of other work :-( ).

Zdenek
Re: [rtl-optimization] Improve Data Prefetch for IA-64
Hello,

> On Sunday 27 March 2005 04:45, Canqun Yang wrote:
> > Another question is why the new RTL loop unroller does
> > not support giv splitting.
>
> Apparently because for most people it is not a problem that it does
> not do it, and while you have indicated earlier that it may be useful
> for you, you have neither tried to implement it yourself, nor provided
> a test case to PR20376.
>
> FWIW you could try -fweb and see if it does what you want.  And if it
> does, you could write a limited webizer patch that works on just loop
> bodies after unrolling.

from what I understood, a change to analyze_iv_to_split_insn that would
detect memory references should suffice.  Also, this patch might be
relevant: http://gcc.gnu.org/ml/gcc-patches/2004-10/msg01176.html

Zdenek
Re: [rtl-optimization] Improve Data Prefetch for IA-64
Hello,

> > Steven Bosscher <[EMAIL PROTECTED]>:
> > >
> > > What happens if you use the memory address unrolling
> > > patch, turn on -fweb, and set the unrolling parameters properly?
> >
> > The memory address unrolling patch can't work on IA-64,
>
> Ah, no base+offset.  It could be made to work with a little
> work though.  The IV splitting needs to be taught about GIVs,
> and when that is in place, it should work, right?
> (IMHO the current RTL IV analysis is not very nice to work
> with, we could look into improving that first...)

what problems do you have concretely in mind?

> On the other hand, we may not even need it if the RTL unroller
> will be replaced by the tree one.  I don't know what Zdenek's
> plans are about that, or if this is a wise thing to do at all.

I don't think that is the case; as far as I know, it is recommended to
do non-specific unrolling just before or during scheduling.  (By
"non-specific" I mean unrolling the loops without any reason other than
just getting them unrolled.)

Zdenek
Re: Merging stmt_ann_d into tree_statement_list_node
Hello,

> I am thinking about merging stmt_ann_d into tree_statement_list_node.
>
> Background
> ----------
>
> tree_statement_list_node is a linked list node defined like so
>
> struct tree_statement_list_node {
>   struct tree_statement_list_node *prev;
>   struct tree_statement_list_node *next;
>   tree stmt;
> };
>
> stmt_ann_d is an annotation for a statement ("stmt" above) that
> includes the operand cache, the basic block that the statement belongs
> to, and various flags, among other things.

guessing from (*) below, you probably just forgot to mention that it
also involves changing SSA_NAME_DEF_STMT to point to the
tree_statement_list_node rather than the statement itself; otherwise
there is no way to get the annotation for SSA_NAME_DEF_STMT (name)?

> Justifications
> --------------
>
> o We have a 1:1 correspondence between tree_statement_list_node and
>   stmt_ann_d.  Why have separate data structures?
>
> o Cache win.  Calling bsi_stmt loads tree_statement_list_node into
>   memory, which might in turn bring members of stmt_ann_d into cache.
>   Note that it's very common to call bsi_stmt and then do something
>   with the operand cache while walking statements in a basic block.
>
> o bsi_for_stmt becomes an O(1) operation. (*)
>   Steven Bosscher says that the CFG inliner will hit bsi_for_stmt hard.

This is just because the CFG inliner is badly written; Honza has
already fixed that in his private tree.

> o Less memory management overhead.
>
> o One step closer to a flat statement structure (or a tuple form if
>   you like).  It's a bit silly that we have GIMPLE, but we still do
>   not have a flat structure.
>
> o We can probably remove get_stmt_ann and replace all uses of
>   stmt_ann.  No more lazy stmt_ann allocation.
>
> Thoughts?

It is definitely a good idea, IMHO.

Zdenek
Re: Merging stmt_ann_d into tree_statement_list_node
Hello,

> > o One step closer to a flat statement structure (or a tuple form if
> >   you like).  It's a bit silly that we have GIMPLE, but we still do
> >   not have a flat structure.

on a side note, I am currently working on cleaning up gimplification
(mostly to enable embedding of simple and fast cleanup passes like
local value numbering, copy/constant propagation, and unreachable code
removal).  As a side effect, this will cause the gimplifier to directly
produce gimple statements instead of gradually rewriting the function
tree.  I will tell more about this project at the GCC summit.

Once this is done, I would like to start working on moving gimple out
of trees to the "flat" structure.

Zdenek
Re: Merging stmt_ann_d into tree_statement_list_node
Hello, > > Then why not simply shorten this to: > > > > 1) put stmt annoation directly in the tree_statement_list_node > > > > 2) replace stmt_ann_t pointer in stmt with a pointer to its BSI, or the > > tree_statement_list_node. This makes bsi_from_stmt O(1) immediately. > > > > 3) all stmts now have access to the stmt_ann directly in the > > stmt_list_node, nothing changes except get_stmt_ann() returns the > > address of the stmt ann within the tree_stmt_list_node. > > > > 4) For fake stmts you will have to allocate a fake tree_stmt_list_node > > which isnt linked with anything, and set the bsi pointer to that... then > > you have the annotation and access to it from the stmt. > > > > that seems like a quicker more direct approach then phasing it in > > slowly, with less impact to anyone, and gives you immediate benefits. I > > would think the patch to do this would be fairly small and > > non-intrusive. > > This looks like a better approach. How would we do your step 1? We > have var_ann and tree_ann in addition to stmt_ann. Shall we put a > type field at the beginning of tree_statement_list_node+stmt_ann_d so > that an annotation node can identify itself? (Since all these tree > annotations already have a field for annotation type, it's more like > appending tree_statement_list_node to stmt_ann_d.) I would go just for having union { struct stmt_list_node *container; /* For gimple statements. */ tree_ann_t ann; /* For everything else. */ } (Plus some GTY annotations around that enable the garbage collector to recognize this case; as far as my memory serves, I needed something similar once and it was doable). This way you won't waste space for the useless tree_statement_list_node "type field". Zdenek
Re: Merging stmt_ann_d into tree_statement_list_node
Hello, > > > This looks like a better approach. How would we do your step 1? We > > > have var_ann and tree_ann in addition to stmt_ann. Shall we put a > > > type field at the beginning of tree_statement_list_node+stmt_ann_d so > > > that an annotation node can identify itself? (Since all these tree > > > annotations already have a field for annotation type, it's more like > > > appending tree_statement_list_node to stmt_ann_d.) > > > > I would go just for having > > > > union > > { > > struct stmt_list_node *container; /* For gimple statements. */ > > tree_ann_t ann; /* For everything else. */ > > } > > Err, I don't see how to tell the garbage collector about this without > a type field. We cannot rely on TREE_CODE (stmt) because CALL_EXPR > may appear by itself or as an rhs of MODIFY_EXPR. In the latter case, they don't have annotations? But OK, there may be some other case I am forgetting now, so perhaps making it safe and having a type field is a better idea. Zdenek
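The type-field variant the thread settles on can be sketched as a plain discriminated union. This is a hypothetical illustration (names invented, no GTY machinery): the kind field comes first, so the garbage collector — or any other consumer — can tell which union member is live without consulting TREE_CODE.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the statement container node. */
struct stmt_list_node {
  struct stmt_list_node *prev, *next;
};

enum ann_kind { ANN_STMT_CONTAINER, ANN_GENERIC };

typedef struct annotation {
  enum ann_kind kind;                   /* discriminator, checked first */
  union {
    struct stmt_list_node *container;   /* kind == ANN_STMT_CONTAINER */
    void *generic;                      /* kind == ANN_GENERIC */
  } u;
} annotation;

/* Safe accessor: returns the container only when the tag says so. */
static struct stmt_list_node *get_container (const annotation *a)
{
  return a->kind == ANN_STMT_CONTAINER ? a->u.container : NULL;
}
```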
Re: Testcase for loop in try_move_mult_to_index?
Hello, > I cannot see how fold-const.c:try_move_mult_to_index ever > successfully will do a transformation if the loop > > for (;; ref = TREE_OPERAND (ref, 0)) > { > if (TREE_CODE (ref) == ARRAY_REF) > { > ... > break; > } > > if (!handled_component_p (ref)) > return NULL_TREE; > } > > will not break in the first iteration. What kind of complex > trees are we trying to match here? for example struct x { int x; int y; } a[1000]; &a[i].x + 8 * j is transformed to &a[i+j].x Zdenek
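The transformation can be checked with a self-contained C snippet. Note an assumption in translating the example to source level: the "8 * j" in the message is a byte offset at the tree level; in C pointer arithmetic on int *, the equivalent step is sizeof (struct x) / sizeof (int) == 2 elements (asserted below). Strictly speaking the arithmetic walks outside the member x, which is exactly the kind of access this fold reasons about under GCC's memory model.

```c
#include <assert.h>

struct x { int x; int y; } a[1000];

/* Unfolded form: byte offset 8 * j expressed as 2 * j int elements. */
static int *moved (long i, long j) { return &a[i].x + 2 * j; }

/* What try_move_mult_to_index produces: the multiple folded into the index. */
static int *folded (long i, long j) { return &a[i + j].x; }
```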
Re: Struggle with FOR_EACH_EDGE
Hello, > >To see what kind of code GCC generates for FOR_EACH_EDGE, I wrote a > >simple dummy function. > > Kazu, I hope I have time to look in detail at your investigation, however > one thing that might be causing a problem is that FOR_EACH_EDGE is an > iterator > that works for a non-constant VEC. This means there's an extra level of > indirection that the optimizer cannot remove, unless it completely inlines > the body of the loop (there might be some cases it can do it without > inlining, > but I suspect not). I wonder if it is worthwhile implementing FOR_EACH_EDGE > and FOR_EACH_EDGE_VOLATILE (I want the default, shorter name, to be the > constant iterator, so that you have to chose to use the non-constant one). > > Even with a constant iterator, it might be necessary to help the optimizer > by copying the vector length to somewhere local that the optimizer can see > cannot be changed by the body of the loop. Hmm, I guess that does mean you > need a proper iterator object, rather than the integer index that > VEC_iterate > employs. you can always start the index from number of edges - 1 and iterate to zero. But a proper iterator might be useful anyway. Zdenek
Problem with -ftree-ter and loop unrolling
Hello, I am just playing with loop unrolling on trees (for the purposes of prefetching), and I encountered the following problem. Consider the loop sum1 = 0; sum2 = 0; for (i = 0; i < n; i++) { x_1 = a[i]; y_1 = b[i]; sum1 += x_1 * y_1; sum2 += x_1 / y_1; } Now after unrolling we get (with some handling for the remaining iterations of the loop) sum1 = 0; sum2 = 0; for (i = 0; i < n; i+=4) { x_1 = a[i]; y_1 = b[i]; sum1 += x_1 * y_1; sum2 += x_1 / y_1; x_2 = a[i+1]; y_2 = b[i+1]; sum1 += x_2 * y_2; sum2 += x_2 / y_2; x_3 = a[i+2]; y_3 = b[i+2]; sum1 += x_3 * y_3; sum2 += x_3 / y_3; x_4 = a[i+3]; y_4 = b[i+3]; sum1 += x_4 * y_4; sum2 += x_4 / y_4; } So far OK, but with TER, this becomes sum1 = 0; sum2 = 0; for (i = 0; i < n; i+=4) { x_1 = a[i]; y_1 = b[i]; x_2 = a[i+1]; y_2 = b[i+1]; x_3 = a[i+2]; y_3 = b[i+2]; x_4 = a[i+3]; y_4 = b[i+3]; sum1 += x_1 * y_1 + x_2 * y_2 + x_3 * y_3 + x_4 * y_4; sum2 += x_1 / y_1 + x_2 / y_2 + x_3 / y_3 + x_4 / y_4; } Now we need some 11 registers for the loop, instead of the original 5 (and the number of registers grows with the unroll factor). This causes huge regressions (more than 60% on sixtrack from spec2000). It might perhaps be a good idea to limit the size of expressions TER produces, or in some other way restrict the way it extends the live ranges of registers? Zdenek
Re: Problem with -ftree-ter and loop unrolling
Hello, > > > So far OK, but with ter, this becomes > > > > > > sum1 = 0; > > > sum2 = 0; > > > for (i = 0; i < n; i+=4) > > > { > > > x_1 = a[i]; > > > y_1 = b[i]; > > > x_2 = a[i+1]; > > > y_2 = b[i+1]; > > > x_3 = a[i+2]; > > > y_3 = b[i+2]; > > > x_4 = a[i+3]; > > > y_4 = b[i+3]; > > > sum1 += x_1 * y_1 + x_2 * y_2 + x_3 * y_3 + x_4 * y_4; > > > sum2 += x_1 / y_1 + x_2 / y_2 + x_3 / y_3 + x_4 / y_4; > > > } > > > > > > Now we need some 11 registers for the loop, instead of the original 5 > > > (and the number of registers grows with the unroll factor). > > > > The TER hack we settled on for PR17549 was supposed to prevent this kind > > of thing, but it was already obvious at the time that a better fix is > > needed in the general case. You've find a pretty nasty one here. > > Why didn't it trigger? I can't reproduce it by a bit of simple hacking > around, have you got a little testcase and options to turn on to produce > this? -O1 suffices. The (sum? + 1) is needed to workaround the hack introduced to fix PR17549 (and it is very close to what happens in sixtrack, except that there the operation with the accumulated variable is a bit more complicated). Zdenek int a[200], b[200]; void xxx(void) { int i, sum1 = 0, sum2 = 0, x, y; for (i = 0; i < 200; i+=8) { x = a[i]; y = b[i]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; x = a[i+1]; y = b[i+1]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; x = a[i+2]; y = b[i+2]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; x = a[i+3]; y = b[i+3]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; x = a[i+4]; y = b[i+4]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; x = a[i+5]; y = b[i+5]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; x = a[i+6]; y = b[i+6]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; x = a[i+7]; y = b[i+7]; sum1 = (sum1 + 1) + x * y; sum2 = (sum2 + 1) + x / y; } bla (sum1, sum2); }
Re: Type mismatch in tree-ssa-loop-ivopts.c:5436 (rewrite_address_base)
Hello, > Don't know how to fix this - nothing obvious. But we create at > > *op = build1 (INDIRECT_REF, TREE_TYPE (*op), with); > > an INDRECT_REF of the form > > type size > unit size > align 8 symtab 1075593216 alias set -1 precision 8 min > max > pointer_to_this > > > arg 0 type frame_state> > public unsigned SI > size > unit size > align 32 symtab 0 alias set -1> > var def_stmt 0x4066c1f8> > version 38 > ptr-info 0x4065b8ac>> > > where we confuse type and > type . Testcase is compiling > with -O2 -g > > /net/alwazn/home/rguenth/src/gcc/cvs/gcc-4.1/gcc/unwind-dw2.c: In function > '__frame_state_for': > /net/alwazn/home/rguenth/src/gcc/cvs/gcc-4.1/gcc/unwind-dw2.c:1036 > > > Please fix / advice. rewrite_address_base is a piece of hack needed to cheat the alias representation system. I would not worry about it much now, once the TARGET_MEM_REF patch is approved, this whole function will be removed. Zdenek
Re: tree-ssa-address ICE
Hello, > On Wednesday 08 June 2005 12:01, Nick Burrett wrote: > > $ ./cc1 -quiet test.c -mthumb -O2 > > ../../bug.c: In function ?foo?: > > ../../bug.c:3: internal compiler error: in create_mem_ref, at > > tree-ssa-address.c:585 > > Please submit a full bug report, > ^^^ > ;-) > > > And some more information in the mean time: > > Starting program: /home/steven/devel/build-test/gcc/cc1 -mthumb t.c -O > foo > > Breakpoint 1, fancy_abort (file=0xa19258 > "../../mainline/gcc/tree-ssa-address.c", line=585, > function=0xa192d3 "create_mem_ref") at diagnostic.c:588 > 588 internal_error ("in %s, at %s:%d", function, trim_filename (file), > line); > (gdb) up > #1 0x005930da in create_mem_ref (bsi=0x7fbfffd370, > type=0x2a95896750, addr=0x7fbfffd390) > at tree-ssa-address.c:585 > 585 gcc_unreachable (); > (gdb) p debug_generic_stmt(bsi.tsi.ptr.stmt) > # VUSE ; > c1D.1197_10 = *s1D.1192_27; > > $12 = void > (gdb) p parts > $13 = {symbol = 0x0, base = 0x0, index = 0x2a95981380, step = 0x0, offset = > 0x0} a patch, I will submit it once passes regtesting (only the tree-ssa-address.c:addr_for_mem_ref part is important, but I have rather changed also the tree-ssa-loop-ivopts.c parts that could cause similar problems). Zdenek Index: tree-ssa-address.c === RCS file: /cvs/gcc/gcc/gcc/tree-ssa-address.c,v retrieving revision 2.1 diff -c -3 -p -r2.1 tree-ssa-address.c *** tree-ssa-address.c 7 Jun 2005 12:01:28 - 2.1 --- tree-ssa-address.c 8 Jun 2005 14:34:35 - *** addr_for_mem_ref (struct mem_address *ad *** 198,205 templates_initialized = true; sym = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup ("test_symbol")); ! bse = gen_raw_REG (Pmode, FIRST_PSEUDO_REGISTER); ! idx = gen_raw_REG (Pmode, FIRST_PSEUDO_REGISTER + 1); for (i = 0; i < 32; i++) gen_addr_rtx ((i & 16 ? sym : NULL_RTX), --- 198,205 templates_initialized = true; sym = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup ("test_symbol")); ! bse = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 1); ! 
idx = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 2); for (i = 0; i < 32; i++) gen_addr_rtx ((i & 16 ? sym : NULL_RTX), Index: tree-ssa-loop-ivopts.c === RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v retrieving revision 2.78 diff -c -3 -p -r2.78 tree-ssa-loop-ivopts.c *** tree-ssa-loop-ivopts.c 7 Jun 2005 22:44:56 - 2.78 --- tree-ssa-loop-ivopts.c 8 Jun 2005 14:34:35 - *** add_cost (enum machine_mode mode) *** 3149,3156 start_sequence (); force_operand (gen_rtx_fmt_ee (PLUS, mode, !gen_raw_REG (mode, FIRST_PSEUDO_REGISTER), !gen_raw_REG (mode, FIRST_PSEUDO_REGISTER + 1)), NULL_RTX); seq = get_insns (); end_sequence (); --- 3149,3156 start_sequence (); force_operand (gen_rtx_fmt_ee (PLUS, mode, !gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), !gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)), NULL_RTX); seq = get_insns (); end_sequence (); *** multiply_by_cost (HOST_WIDE_INT cst, enu *** 3221,3228 (*cached)->cst = cst; start_sequence (); ! expand_mult (mode, gen_raw_REG (mode, FIRST_PSEUDO_REGISTER), GEN_INT (cst), ! NULL_RTX, 0); seq = get_insns (); end_sequence (); --- 3221,3228 (*cached)->cst = cst; start_sequence (); ! expand_mult (mode, gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), ! gen_int_mode (cst, mode), NULL_RTX, 0); seq = get_insns (); end_sequence (); *** multiplier_allowed_in_address_p (HOST_WI *** 3247,3253 if (!valid_mult) { ! rtx reg1 = gen_raw_REG (Pmode, FIRST_PSEUDO_REGISTER); rtx addr; HOST_WIDE_INT i; --- 3247,3253 if (!valid_mult) { ! rtx reg1 = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 1); rtx addr; HOST_WIDE_INT i; *** get_address_cost (bool symbol_present, b *** 3305,3316 HOST_WIDE_INT i; initialized = true; ! reg1 = gen_raw_REG (Pmode, FIRST_PSEUDO_REGISTER); addr = gen_rtx_fmt_ee (PLUS, Pmode, reg1, NULL_RTX); for (i = 1; i <= 1 << 20; i <<= 1) { ! XEXP (addr, 1) = GEN_INT (i); if (!memory_address_p (Pmode, addr)) break; } --- 3305,3316 HOST_WIDE_INT i; initialized = true; ! 
reg1 = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 1); addr = gen_rtx_fmt_ee (PLUS, Pmode, reg1, NULL_RTX); for (i = 1; i <= 1 << 20; i
Problem with recompute_tree_invarant_for_addr_expr
Hello, I have run into the following problem: recompute_tree_invarant_for_addr_expr says that the address of a decl is invariant if it is a decl in the current function: if (decl_function_context (node) == current_function_decl) ... /* set TREE_INVARIANT */ This has a small flaw -- in case NODE has variable size, it gets allocated on the stack dynamically, it may be allocated and deallocated several times, and its address is no longer an invariant. So I tried to fix the code as follows: if (decl_function_context (node) == current_function_decl && TREE_CONSTANT (DECL_SIZE (node)) ... /* set TREE_INVARIANT */ The problem is that tree-nested.c builds frame_decl, adds fields to it one by one, sometimes takes its address, and only in the end lays it out. This means that at the point the address is taken, DECL_SIZE is not yet known and it is NULL. In fact, at the moment the address is taken, there seems to be no way to tell whether the size of frame_decl will be variable or not. So I tried a conservative variant: if (decl_function_context (node) == current_function_decl && DECL_SIZE (node) && TREE_CONSTANT (DECL_SIZE (node)) ... /* set TREE_INVARIANT */ This works up to the moment verify_stmts is called, which ends with "invariant not recomputed when ADDR_EXPR changed" (because now frame_decl is laid out, and we know its size is constant, and that the address is invariant). Another possibility would be to be "optimistic": if (decl_function_context (node) == current_function_decl && (!DECL_SIZE (node) || TREE_CONSTANT (DECL_SIZE (node))) ... /* set TREE_INVARIANT */ which would probably work for frame_decl, as it should be allocated just once (I hope), but is not very safe and still would ICE in "invariant not recomputed when ADDR_EXPR changed", this time in case frame_decl were variable-sized. It might be possible to special-case frame_decl in recompute_tree_invarant_for_addr_expr, but it would be quite an ugly hack, and it is not terribly simple to do anyway.
The other possibility is to somehow remember the list of places where we take the address of something based on frame_decl, and recompute TREE_INVARIANT for them once frame_decl is laid out. Any simpler idea how to solve this problem? Zdenek
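The source-level situation behind the flaw can be illustrated with a small self-contained C function (purely illustrative; the thread itself is about the corresponding GIMPLE-level flag): a variable-sized local is re-allocated on each entry to its scope, so neither its address nor even its size is a compile-time invariant.

```c
#include <assert.h>
#include <string.h>

/* A variable-length array is stack-allocated anew on every entry to its
   scope, so taking its address inside the loop would not yield a loop
   invariant; sizeof is likewise a run-time value here. */
static int total_bytes (int n)
{
  int total = 0;
  int i;
  for (i = 1; i <= n; i++)
    {
      char buf[i];               /* VLA: allocated per iteration */
      memset (buf, 0, sizeof buf);
      total += (int) sizeof buf; /* run-time sizeof */
    }
  return total;
}
```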
Re: Problem with recompute_tree_invarant_for_addr_expr
Hello, > On Mon, Jun 13, 2005 at 07:39:33PM +0200, Zdenek Dvorak wrote: > > This has a small flaw -- in case NODE has variable size, it gets > > allocated on stack dynamically, it may be allocated and deallocated > > several times, and its address is no longer an invariant. > > > > So I tried to fix the code as follows: > > > > if (decl_function_context (node) == current_function_decl > > && TREE_CONSTANT (DECL_SIZE (node)) > > ... /* set TREE_INVARIANT */ > > Such nodes should never be seen having their address taken. They > should be completely removed during gimplification. So I think we > should stop here and figure out why you care about such nodes. good point, thanks (I don't precisely remember why I thought this change was necessary; I did it while working on some changes to gimplification, so perhaps I broke something there). Zdenek
Re: Problem with recompute_tree_invarant_for_addr_expr
Hello, > On Mon, Jun 13, 2005 at 07:39:33PM +0200, Zdenek Dvorak wrote: > > This has a small flaw -- in case NODE has variable size, it gets > > allocated on stack dynamically, it may be allocated and deallocated > > several times, and its address is no longer an invariant. > > > > So I tried to fix the code as follows: > > > > if (decl_function_context (node) == current_function_decl > > && TREE_CONSTANT (DECL_SIZE (node)) > > ... /* set TREE_INVARIANT */ > > Such nodes should never be seen having their address taken. They > should be completely removed during gimplification. So I think we > should stop here and figure out why you care about such nodes. OK, I remembered. I put if (is_gimple_min_invariant (t)) or if (is_gimple_val (t)) { shortcut; } type constructs in some places in gimplification. This causes problems, since is_gimple_min_invariant only checks the TREE_INVARIANT flag for ADDR_EXPRs, and ADDR_EXPRs of variable-sized decls then leak through. The change to recompute_tree_invarant_for_addr_expr was intended to prevent this. Zdenek
Re: Problem with recompute_tree_invarant_for_addr_expr
Hello, > On Mon, Jun 13, 2005 at 10:54:23PM +0200, Zdenek Dvorak wrote: > > OK, I remembered. I put > > > > if (is_gimple_min_invariant (t)) > > > > or > > > > if (is_gimple_val (t)) > > { > > shortcut; > > } > > > > type constructs on some places in gimplification. > > With an aim toward speeding up gimplification, I guess. Any idea how > much benefit you get from that? not really. The problem anyway is that it does not quite work even when the decls with variable size are taken care of, as there are other constructs for which an ADDR_EXPR gets marked TREE_INVARIANT but that are not gimple. Zdenek > As for the original question, you could try modifying tree-nested.c to > use a local routine for creating the ADDR_EXPRs of the frame object, > and force TREE_INVARIANT on there.
Re: -floop-optimize2
Hello, > I've noticed that -floop-optimize2 tends to be a pessimism on many > algorithms. > > I'm hesitant to file this as a "bug", given that -floop-optimize2 is a > "new" replacement for the older loop optimizer. > > Is -floop-optimize2 still in development, and not ready yet -- or are > the problems I'm seeing something that should be analyzed and reported > as a bug? if you have testcases, I would definitely be interested. Zdenek
Re: -fprofile-arcs changes the structure of basic blocks
Hello, > I want to use profiling information. I know there're two relevent > fields in each basic block, count and frequency. I want to use > frequency because the compiled program is for another architecture so > it cannot run on the host. > > I use -fprofile-arcs. this does not make much sense to me. If you cannot run the compiled program, adding instrumentation with -fprofile-arcs is useless. In this case, you may just use the guessed frequencies (that you get by default without any flags). Zdenek > And I can see the frequency value when I debug > cc1. But I happen to realize that when I add -fprofile-arcs, it change > the the whole structure of basic block. I compared the vcg output > files with and without the -fprofile-arcs. I found they're totally > different. > > My question is why it is so? I want to know the profiling info, but if > profiling info I get is for another different structure of basic > block, it's useless to me. > > Where do I go wrong here? Which option is the suitable in this case? > The gcc version is 3.3.3. Thanks. > > > Regards, > Timothy
Re: Question on tree-ssa-loop-ivopts.c:constant_multiple_of
Hello, > Isn't it the case that *any* conversion can be stripped for the purpose > of this routine? I get an ICE compiling the Ada RTS a-strfix.adb because > of that. The following seems to fix it, but is it right? no, it is not. The uses of constant_multiple_of expect that it determines whether TOP = CST * BOT, except for no-op casts. It is true that there are some sequences of casts that in the end satisfy this, but are not handled by constant_multiple_of, but you would need to be more careful when changing it. The bug you describe most probably is PR 21963 (and there is a patch for this PR submitted). Zdenek > *** tree-ssa-loop-ivopts.c26 Jun 2005 21:21:32 - 2.82 > --- tree-ssa-loop-ivopts.c29 Jun 2005 21:38:29 - > *** constant_multiple_of (tree type, tree to > *** 2594,2599 > bool negate; > > ! STRIP_NOPS (top); > ! STRIP_NOPS (bot); > > if (operand_equal_p (top, bot, 0)) > --- 2594,2603 > bool negate; > > ! /* For determining the condition above, we can ignore all conversions, not > ! just those that don't change the mode, so can't use STRIP_NOPS here. */ > ! while (TREE_CODE (top) == NOP_EXPR || TREE_CODE (top) == CONVERT_EXPR) > ! top = TREE_OPERAND (top, 0); > ! while (TREE_CODE (bot) == NOP_EXPR || TREE_CODE (bot) == CONVERT_EXPR) > ! bot = TREE_OPERAND (bot, 0); > > if (operand_equal_p (top, bot, 0))
Re: PR22336 (was: Problem with tree-ssa-loop-ivopts.c:get_computation-cost)
Hello, > If no one is suggesting an alternative, tested on x86 and x86_64-linux > where it restores bootstrap (at last :), ok to commit? I think the patch is fine (although of course I cannot approve it). The more proper fix would be to never expand such expressions from ivopts; I have a patch for this, but it needs more testing and benchmarking. Zdenek > We're down to 6 ACATS FAIL on x86_64 and 8 on x86: > > http://gcc.gnu.org/ml/gcc-testresults/2005-07/msg00654.html > http://gcc.gnu.org/ml/gcc-testresults/2005-07/msg00653.html > > common: > 18818: cd10002 (run) stream attributes > 18819: cdd2a02 (run, works at -O0 everywhere) works with -O2 -fno-tree-sra > 22333: c34007p c34007r c45282b (run) spurious discriminant CONSTRAINT_ERROR > > x86 only: > 18659: c32001e c64105b c95086b (ICE x86, ppc, ia64, works on x86_64, pass > everywhere at -O0) works with -O2 -fno-tree-sra > > x86_64 only: > 20548: c52103x (run) segfault at runtime on x86_64 and hppa > > Laurent > > 2005-07-12 Richard Kenner <[EMAIL PROTECTED]> > Laurent GUERBY <[EMAIL PROTECTED]> > > PR tree-optimization/22336 > * function.c (record_block_change): Check for > cfun->ib_boundaries_block. > > > Index: function.c > === > RCS file: /cvs/gcc/gcc/gcc/function.c,v > retrieving revision 1.635 > diff -u -r1.635 function.c > --- function.c 7 Jul 2005 21:04:31 - 1.635 > +++ function.c 10 Jul 2005 19:06:10 - > @@ -5502,6 +5502,9 @@ >if (!block) > return; > > + if(!cfun->ib_boundaries_block) > +return; > + >last_block = VARRAY_TOP_TREE (cfun->ib_boundaries_block); >VARRAY_POP (cfun->ib_boundaries_block); >n = get_max_uid (); > > > On Tue, 2005-07-05 at 10:31 +0200, Laurent GUERBY wrote: > > This is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22212 > > and is the problem blocking Ada bootstrap on x86_64-linux, > > it would be great to move forward on this. 
> > > > Laurent > > > > On Thu, 2005-06-30 at 18:18 -0400, Richard Kenner wrote: > > > This function generates RTL from an expression to see how many RTL insns > > > it is. But this causes a problem compiling the Ada ACATS test cxa4006. > > > > > > The problem is when part of the expression has a location. In that > > > case, record_block_change is called and that relies on > > > cfun->ib_boundaries_block being set. But it isn't because we aren't > > > expanding stuff "for real". A kludge would be to test that field, but > > > what's the proper way? > > > > > > Also, seq_cost really should be looking at next_real_insn, not NEXT_INSN, > > > since any notes shouldn't be counted. > > > >
Re: PR22336 (was: Problem with tree-ssa-loop-ivopts.c:get_computation-cost)
Hello, > I think the patch is fine (although of course I cannot approve it). > > I, as it's author, do not. But, as I keep saying, I don't know what > the proper fix is. preventing record_block_change from working in this situation seems quite a clean solution to me; of course, not expanding expressions with TREE_BLOCK set (when determining their cost) at all would be better, but somewhat more complicated. > The more proper fix would be to never expand such expressions from ivopts; > > What are "such expressions"? Expressions with TREE_BLOCK set; i.e., those coming from the compiled program. Zdenek
Re: Question on updating ssa for virtual operands (PR tree-optimization/22543)
Hello, > > The other thing we could try to do is put virtual variables in loop-closed- > > form, at least just before the vectorizer, and at least just for some > > loops. Does this sound reasonabale? (By the way, why don't we keep virtual > > variables in loop-closed-form?) > > We used to, nobody could come up with a good reason to keep doing it, so > we stopped. there were a couple of good reasons (consistency, faster updating after loop unrolling and loop header copying). However, LCSSA for virtual operands is memory expensive and this outweighed the advantages (in particular, slower updating during loop manipulations is compensated by everything else being faster). Zdenek
Re: -fprofile-generate and -fprofile-use
Hello, > >A more likely source of performance degradation is that loop unrolling > >is enabled when profiling, and loop unrolling is almost always a bad > >pessimization on 32 bits x86 targets. > > To clarify, I was compiling with -funroll-loops and -fpeel-loops > enabled in both cases. > > The FDO slowdown in my case was caused by the presence of some loop > invariant code that was getting removed from the loop by the loop > optimizer pass in the non-FDO case. you may try adding the -fmove-loop-invariants flag, which enables the new invariant motion pass. Zdenek
Re: doloop-opt deficiency
Hello, > Dale Johannesen wrote: > > I think this is a simple pasto; this code was evidently copied from > > the previous block: > > > > I don't think that this was a simple pasto. The code looks correct. > We have the same code in tree-ssa-loop-niter.c around line 436, since > we inherited this code from the rtl-level. note the extra negation in loop-iv.c. I think the fix is OK. Zdenek > > Index: loop-iv.c > > === > > RCS file: /cvs/gcc/gcc/gcc/loop-iv.c,v > > retrieving revision 2.35 > > diff -u -b -c -p -r2.35 loop-iv.c > > cvs diff: conflicting specifications of output style > > *** loop-iv.c 21 Jul 2005 07:24:07 - 2.35 > > --- loop-iv.c 29 Aug 2005 23:34:12 - > > *** iv_number_of_iterations (struct loop *lo > > *** 2417,2423 > > tmp0 = lowpart_subreg (mode, iv0.base, comp_mode); > > tmp1 = lowpart_subreg (mode, iv1.base, comp_mode); > > > > ! bound = simplify_gen_binary (MINUS, mode, mode_mmin, > >lowpart_subreg (mode, step, comp_mode)); > > if (step_is_pow2) > > { > > --- 2417,2423 > > tmp0 = lowpart_subreg (mode, iv0.base, comp_mode); > > tmp1 = lowpart_subreg (mode, iv1.base, comp_mode); > > > > ! bound = simplify_gen_binary (PLUS, mode, mode_mmin, > >lowpart_subreg (mode, step, comp_mode)); > > if (step_is_pow2) > > { > > > I don't think this fix is correct: 'bound' is used to test whether an > overflow has occured, and in the following code that uses 'bound', it > is expected to overflow for the example that you have provided: > > /* If s is power of 2, we know that the loop is infinite if >a % s <= b % s and a - s overflows. */ > assumption = simplify_gen_relational (reverse_condition (cond), > SImode, mode, > bound, tmp0); > > > > > > > The code as it was computed -2147483648-256 which overflows. > > this is the right behavior. 
> > > We noticed that the simple loop here > > > extern int a[]; > > int foo(int w) { > > int n = w; > > while (n >= 512) > > { > > a[n] = 42; > > n -= 256; > > } > > } > > > > > > > was being treated as ineligible for the doloop modification. > > The problem in this example is that the initial value of the induction > variable is not known: it is a parameter 'w'. Consequently we have > not enough information to determine how many times it wraps, nor for > estimating the number of iterations, if you're using the -fwrapv > semantics. > > I'm not sure whether at rtl level we have this distinction of > wrapping/non wrapping operations, but from the code that I've read, > everything wraps (probably Roger Sayle knows more about the wrapping > behavior that is expected at rtl level). Having said this, we have to > adjust the code at the tree level such that it catches this case > (conditionally on signed iv and !fwrapv). > > sebastian
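The arithmetic behind the proposed fix can be sketched in plain C (using int for concreteness; the RTL code works on arbitrary modes). For a positive step s, "a - s" underflows exactly when a < mode_min + s, so the bound to compare against is the mode minimum PLUS the step; the MINUS form computes mode_min - s, which is itself the out-of-range value the thread's example -2147483648 - 256 shows.

```c
#include <assert.h>
#include <limits.h>

/* Underflow predicate for a signed subtraction a - s with positive step s:
   a - s < INT_MIN  <=>  a < INT_MIN + s.  The right-hand side never
   overflows for 0 < s, unlike INT_MIN - s. */
static int sub_underflows (int a, int s)
{
  return a < INT_MIN + s;
}
```

This is only a model of the overflow test, not the actual loop-iv.c code, which additionally reverses the comparison because of the extra negation mentioned above.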
Re: -fprofile-generate and -fprofile-use
Hello, > > >you may try adding -fmove-loop-invariants flag, which enables new > > >invariant motion pass. > > > > That cleaned up both my simplified test case, and the code it > > originated from. It also cleaned up a few other cases where I > > was noticing worse performance with FDO enabled. Thanks!! > > > > Perhaps this option should be enabled by default when doing FDO > > to replace the loop invariant motions done by the recently > > disabled loop optimize pass. > > This sounds like sane idea. Zdenek, is -fmove-loop-invariants dangerous > in some way or just disabled because old loop does the same? -fmove-loop-invariants is enabled by default on killloop-branch, and passes bootstrap & regtesting on i686 and x86_64; the branch does not bootstrap for me on ia64 and ppc (but neither does mainline). I am not aware of any correctness/ICE problems with -fmove-loop-invariants, but given that it is disabled by default, this does not say much. Zdenek
Problem with dom
Hello, I have run into the following problem with dom. One of the places where cfg_altered is set is wrong: Index: tree-ssa-dom.c === RCS file: /cvs/gcc/gcc/gcc/tree-ssa-dom.c,v retrieving revision 2.124.2.1 diff -c -3 -p -r2.124.2.1 tree-ssa-dom.c *** tree-ssa-dom.c 2 Sep 2005 16:43:11 - 2.124.2.1 --- tree-ssa-dom.c 10 Sep 2005 16:47:06 - *** tree_ssa_dominator_optimize (void) *** 479,485 if (cfg_altered) free_dominance_info (CDI_DOMINATORS); ! cfg_altered = cleanup_tree_cfg (); if (rediscover_loops_after_threading) { --- 479,485 if (cfg_altered) free_dominance_info (CDI_DOMINATORS); ! cfg_altered |= cleanup_tree_cfg (); if (rediscover_loops_after_threading) { i.e., dom is iterating only if cfg cleanup changed something. In particular, if jump threading is performed, but cfg cleanup does not change anything (which happens sometimes), dom does not iterate. For testcases like if (n > 0) loop; if (n > 0) loop; ... if (n > 0) loop; it causes only some of the jumps to be threaded, and we get code that looks like if (n > 0) { loop; loop; loop; goto next; } if (n > 0) { next:; loop; (*) loop; loop; } This is very bad for analysis of # of iterations: Neither of the "if (n > 0)" conditions dominates loop (*), thus we are unable to determine that n must be positive there, and we cannot determine the # of iterations of the loop precisely. So I tried the obvious fix (the patch above). This fixes the problem, but also causes dom to completely unroll loops sometimes, for example void bar(int); void foo(void) { int i; for (i = 0; i < 1; i++) { if (i & 65536) abort (); bar(i); } } (two exit conditions are necessary; for some reason, this won't happen if "if (i & 65536) abort ();" is removed). This of course is not desirable (and it is very, very slow). Any idea how to fix the first problem without introducing the second one? Zdenek
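The crux of the patch is the `=` versus `|=`, which a minimal self-contained model (hypothetical names, stub cleanup) makes concrete: plain assignment discards the fact that jump threading already altered the CFG, so when the cleanup itself changes nothing, dom wrongly stops iterating.

```c
#include <assert.h>
#include <stdbool.h>

/* Stub modeling a cleanup pass that finds nothing to clean up. */
static bool cleanup_tree_cfg_stub (void) { return false; }

/* Returns whether dom decides to iterate again.  threaded_jumps models
   jump threading having changed the CFG earlier in the pass. */
static bool should_iterate (bool threaded_jumps, bool accumulate)
{
  bool cfg_altered = threaded_jumps;
  if (accumulate)
    cfg_altered |= cleanup_tree_cfg_stub ();  /* patched form: accumulates */
  else
    cfg_altered = cleanup_tree_cfg_stub ();   /* original form: drops earlier state */
  return cfg_altered;
}
```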
Re: Problem with dom
Hello, > > I have run into following problem with dom. One of the places > > where cfg_altered is set is wrong: > It's not really wrong, it's set this way on purpose, specifically > to avoid the compile-time explosion you're seeing. This works by pure luck only, I think. In fact, the only thing that prevents dom from unrolling loops just now is the if (need_ssa_update_p ()) return false; condition in tree_can_merge_blocks_p, which prevents block merging and keeps the cfg unchanged. While I do not have a testcase for this, I would not be surprised if dom could be persuaded to completely unroll a loop as it is. Nevertheless, could you please add at least a short comment to tree-ssa-dom.c explaining that this obvious mistake is an intended one? > > So I tried the obvious fix (the patch above). This fixes the problem, > > but also causes dom to completely unroll loops sometimes, for > > example for > > > > void bar(int); > > > > void foo(void) > > { > > int i; > > > > for (i = 0; i < 1; i++) > > { > > if (i & 65536) > > abort (); > > bar(i); > > } > > } > > > > (two exit conditions are necessary; for some reason, this won't > > happen if "if (i & 65536) abort ();" is removed). This of course > > is not desirable (and it is very, very slow). Any idea how to fix > > the first problem without introducing the second one? > I think the way to do this is to somehow limit the iterations when > one or more backedges there threaded. > > This may also fall out of the work Steven has started. His changes > limit DOM's ability to thread jumps when blocks get large, which is > precisely what happens when DOM unrolls this loop completely. We > get a single very very large basic block. You might poke around with > his change a little. No, it unfortunately cannot help. The new blocks are placed before the loop, and jump threading only occurs inside the loop, so the sizes of copied blocks remain small.
I guess the only real fix is to remove jump threading from dom and make
it a separate, non-iterative pass, which of course is quite a bit of
work.

Zdenek
Must mode of set and REG_EQUIV note match?
Hello,

I have the following insn:

(insn 522 521 523 87 (set (reg:SI 308)
        (reg:SI 0 ax)) 40 {*movsi_1} (nil)
    (insn_list:REG_RETVAL 520 (expr_list:REG_EQUAL (parity:DI (reg:DI 248 [ D.1874 ]))
        (nil))))

Is this correct?  I have a piece of code that breaks if the mode of the
assigned register is not equal to the mode of the expression in the
REG_EQUAL note, and I wonder whether I should be fixing that piece of
code, or the code that sets the REG_EQUAL note on this insn.

Zdenek
Re: Loop information
Hello,

> Can someone please help me getting the following information?
>
> 1) I would like to obtain the loop bounds (constant case) of all
> nested loops of an RTL insn.  Is there a data structure over which I
> can iterate to get bounds for each nested loop of an RTL insn?
>
> 3) Can I determine if a pseudo register (RTX) is an induction
> variable (linear) or not?  Which data structure gives me this
> information?

See loop-iv.c (or a bit improved version of it on killloop-branch).
You can analyze induction variables and determine the # of iterations
of a loop using it, which should give you the information you need.
The rtl-level iv analysis is not too powerful, though -- just
sufficient for its current uses.

> 2) Is there a way of determining sequences as mentioned in the paper
> "Beyond Induction Variables: Detecting and Classifying Sequences
> Using a Demand-Driven SSA Form" by Gerlek, Stoltz and Wolfe?

Not on the RTL level.  On the tree level, see tree-scalar-evolution.c.

> 4) At RTL level, array accesses convert to MEM expressions.  I was
> wondering if I can obtain the source level array name from the MEM
> expression.

You might be able to extract this information from MEM_EXPR.

Zdenek
Re: [RFC] propagating loop dependences from trees to RTL (for SMS)
Hello,

> As a follow up to http://gcc.gnu.org/ml/gcc/2005-04/msg00461.html
>
> I would like to improve SMS by passing data dependence information
> computed at tree level to rtl-level SMS.  Currently the data-dependency
> graph built for use by SMS has an edge for every two data references
> (i.e. it's too conservative).  I want to check for every loop, using
> functions defined in tree-data-ref.c, if there are data dependencies
> in the loop.  The problem is how to pass this information to SMS
> (note - we're only trying to convey whether there are no dependencies
> at all in the loop - i.e. one bit of information).  The alternatives
> being considered are:
>
> 1. Introduce a new BB bit flag and set it for the header BB of a loop
> that has no data dependencies.  This approach already works, but only
> if the old loop optimizer (pass_loop_optimize) is disabled (otherwise
> the bit doesn't survive).  One potential problem is that the loop
> header BB may change between the tree level and SMS as a result of
> some optimization pass (can that really happen?)

Yes -- jump threading may change loop structures, although this is
fairly easy to prevent.  Loop optimizations can of course alter the
structure of loops, but they should be able to preserve this type of
information.

> 2. Use a bitmap (as suggested in
> http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01353.html) that is
> indexed using the BB index.  In my case I need to define and use the
> property within different functions.  I can define a static function
> "set_and_check_nodeps(bb_loop_header)" and define a bitmap there.
> Like the previous solution, the problem that can arise is that some
> intermediate optimizations can change the header of the loop.  By the
> way, is it guaranteed that a BB keeps the same index throughout the
> entire compilation?

No -- basic blocks are "compacted" from time to time, to fill in holes
created by the removal of dead blocks.
Also, basic block numbers might change due to basic block
splitting/merging in some optimizations.

> 3. Use insn notes - introduce a new note "NOTE_INSN_NO_DEPS_IN_LOOP"
> to be inserted after the "NOTE_INSN_LOOP_BEG" for relevant loops.

NOTE_INSN_LOOP_BEG should most definitely disappear in 4.2, and
introducing new insn notes seems like a bad idea to me.

> 4. Other ideas?

Preserving the information about loops throughout the optimizations,
and just keeping this information attached to the loop description,
would by far be the cleanest approach -- admittedly also the one that
requires the greatest amount of work.

Zdenek
Re: Possible bug in tree-ssa-loop-ivopts.c:prepare_decl_rtl?
Hello,

> There don't seem to be many comments explaining why we're doing
> what we're doing here, so I'm not sure whether this was the intended
> behaviour or not.  Do we really want to kick out existing DECL_RTLs,
> or is there simply a missing !DECL_RTL_SET_P in the DECL_P condition:

I think that adding the !DECL_RTL_SET_P should work.  Nevertheless,
prepare_decl_rtl and friends should disappear when killloop-branch is
merged (I have a patch for this in testing, not yet committed to the
branch).

Zdenek
[RFC] Not using VAR_DECLs for temporary variables
Hello,

during expansion of expressions, gimplification creates a lot of
temporary variables.  VAR_DECLs are fairly large (88 bytes on i686),
and additionally an annotation (44 bytes on i686) is allocated for each
of them (some of them even get names, for an additional ~50 bytes of
memory each).  This is quite wasteful.

The basic observation is that these temporaries have exactly one
definition.  It is therefore possible to generate SSA_NAMEs for them
directly, setting SSA_NAME_VAR to NULL (and being careful to never
rewrite them out of ssa).  Theoretically, this should need only a few
changes to gimplification, into- and out-of-ssa, and expand, and
everything should work.  Unfortunately, a lot of SSA passes look at
SSA_NAME_VAR.  This seems wrong to me, but changing all the places
would be very hard.  However, if we instead put a really minimal DECL
node (we allocate only one per type, consisting of just struct
tree_common -- 16 bytes) as the SSA_NAME_VAR, things are much easier
to get working.

The patch below is the first attempt at the idea (completely untested,
I just made it work somehow so that I could measure its usefulness).
We get the following results (for combine.i):

-O0: TOTAL :  3.91   22668 kB   ->   TOTAL :  3.87   22096 kB (-2.5%)
-O1: TOTAL : 20.77   34640 kB   ->   TOTAL : 20.26   33688 kB (-2.7%)
-O2: TOTAL : 25.93   41116 kB   ->   TOTAL : 25.80   40582 kB (-1.2%)

That is, we need about 2% less memory, and the compilation also seems
to be a bit faster (maybe because of the memory savings, or possibly
because less work needs to be done in the into- and out-of-ssa passes).
However, this idea is a bit hackish, so I would like to hear opinions
about it before I spend more time on testing it.
Zdenek

Index: tree-into-ssa.c
===================================================================
*** tree-into-ssa.c	(revision 108807)
--- tree-into-ssa.c	(working copy)
*** mark_def_sites (struct dom_walk_data *wa
*** 650,655 ****
--- 650,668 ----
  			      SSA_OP_USE | SSA_OP_VUSE | SSA_OP_VMUSTKILL)
      {
        tree sym = USE_FROM_PTR (use_p);
+
+       if (TREE_CODE (sym) == SSA_NAME)
+ 	{
+ 	  /* Only gimplifier temporaries are now in ssa form.  We had already
+ 	     seen the definition, so we have assigned a version to the ssa
+ 	     name, and the definition must dominate it.  */
+ 	  gcc_assert (TMP_NAME_P (sym));
+ 	  gcc_assert (SSA_NAME_VERSION (sym) != 0);
+ 	  gcc_assert (dominated_by_p (CDI_DOMINATORS, bb,
+ 				      bb_for_stmt (SSA_NAME_DEF_STMT (sym))));
+ 	  continue;
+ 	}
+
        gcc_assert (DECL_P (sym));
        if (!bitmap_bit_p (kills, DECL_UID (sym)))
  	set_livein_block (sym, bb);
*** mark_def_sites (struct dom_walk_data *wa
*** 674,679 ****
--- 687,700 ----
    /* Now process the defs and must-defs made by this statement.  */
    FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_DEF | SSA_OP_VMUSTDEF)
      {
+       if (TREE_CODE (def) == SSA_NAME)
+ 	{
+ 	  gcc_assert (TMP_NAME_P (def));
+ 	  gcc_assert (SSA_NAME_VERSION (def) == 0);
+ 	  gcc_assert (SSA_NAME_DEF_STMT (def) == stmt);
+ 	  register_tmp_ssa_name (def);
+ 	  continue;
+ 	}
        gcc_assert (DECL_P (def));
        set_def_block (def, bb, false);
        bitmap_set_bit (kills, DECL_UID (def));
*** rewrite_stmt (struct dom_walk_data *walk
*** 1039,1044 ****
--- 1060,1067 ----
  				SSA_OP_ALL_USES|SSA_OP_ALL_KILLS)
  	{
  	  tree var = USE_FROM_PTR (use_p);
+ 	  if (TMP_NAME_P (var))
+ 	    continue;
  	  gcc_assert (DECL_P (var));
  	  SET_USE (use_p, get_reaching_def (var));
  	}
*** rewrite_stmt (struct dom_walk_data *walk
*** 1048,1053 ****
--- 1071,1078 ----
        FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_ALL_DEFS)
  	{
  	  tree var = DEF_FROM_PTR (def_p);
+ 	  if (TMP_NAME_P (var))
+ 	    continue;
  	  gcc_assert (DECL_P (var));
  	  SET_DEF (def_p, make_ssa_name (var, stmt));
  	  register_new_def (DEF_FROM_PTR (def_p), &block_defs_stack);
Index: tree-pretty-print.c
===================================================================
*** tree-pretty-print.c	(revision 108807)
--- tree-pretty-print.c	(working copy)
*** dump_generic_node (pretty_printer *buffe
*** 708,713 ****
--- 708,717 ----
  	  dump_decl_name (buffer, node, flags);
  	  break;

+ 	case GVAR_DECL:
+ 	  pp_printf (buffer, "", DECL_UID (node));
+ 	  break;
+
  	case RESULT_DECL:
  	  pp_string (buffer, "");
  	  break;
Index: tree.c
===================================================================
*** tree.c	(revision 108807)
--- tree.c	(working copy)
*** static const