Re: expand_omp_parallel typo?
Marcin Dalecki wrote on 10/18/06 00:27: bsi_insert_after (&si, t, TSI_SAME_STMT); Shouldn't this be bsi_insert_after (&si, t, BSI_SAME_STMT); instead? Yes. We lucked out because both symbols have the same numeric value. Patch pre-approved as obvious. PS: Please do not use existing threads to start an unrelated one.
Re: stability of libgomp and libssp
Eric Christopher wrote on 10/19/06 17:33: I was wondering if anyone planned on changing abi or if we can depend on all changes not breaking the abi of these libraries? There is nothing planned in that area, but I wouldn't want to guarantee ABI stability. Mostly as a result of bug fixing. Since this will be the first official release, I expect several bugs that may introduce ABI problems. Unless you can use some kind of versioning, I don't see a good way to address this.
Re: Question about LTO dwarf reader vs. artificial variables and formal arguments
Ian Lance Taylor wrote on 10/21/06 14:59: That is, we are not going to write out DWARF. We can't, because DWARF is not designed to represent all the details which the compiler needs to represent. What we are going to write out is a superset of DWARF. And in fact, if it helps, I think that we shouldn't hesitate to write out something which is similar to but incompatible with DWARF. In general reading and writing trees is far from the hardest part of the LTO effort. I think it is a mistake for us to get too tied up in the details of how to represent things in DWARF. (I also think that we could probably do better by defining our own bytecode language, one optimized for our purposes, but it's not an issue worth fighting over.) Agreed. I don't think we'll get far if we focus too much on DWARF, as it clearly cannot be used as a bytecode language for our purposes. We will need to evolve our own bytecode language, either as an extension to DWARF (much like we did with SIMPLE) or do something from scratch. Implementing type support starting from DWARF is a start, but we should not constrain ourselves to it.
Re: fdump-tree explanation
Dino Puller wrote on 10/26/06 10:11: How many times does GCC simplify expressions like x/x, 0*x, 1*y, a+0, x*x/x and so on? You are probably looking at folding, then. An initial idea might be to put some code in fold-const.c:fold that compares the input tree expression with the output; if they are different, increment your counter.
Re: memory benchmark of tuples branch
Aldy Hernandez wrote on 10/26/06 10:40: As we have hoped, every single function exhibits memory savings. Yay. Nice! I don't know if this merits merging into mainline, or if it's preferable to keep plodding along and convert the rest of the tuples. What do you guys think? Either way, I have my work cut out for me, though I believe the hardest part is over (FLW). My vote is to merge into mainline sooner rather than later. However, it is a big patch and affects just about every module in the compiler, so I wouldn't want to barge in without getting some consensus first. As for the rest of the conversion, I don't think there are many changes left that are as invasive as this one. Mostly global search and replace, but it's been a few weeks since I looked at the code. I'm hoping we can extend the rest of the changes into late Stage1 and Stage2, but we can see about that as we go.
Re: memory benchmark of tuples branch
Aldy Hernandez wrote on 10/27/06 09:35: How does this sound to y'all? Sounds good to me. I would add an additional memory savings check between 4 and 5.
Re: fdump-tree explanation
Dino Puller wrote on 10/27/06 11:25: The idea is a bit complex. Anyway, the fold function does not have a single return point, so I can't compare the input tree with the output one, and it's called from many other functions in other files.

Of course you can. fold() does not modify the input tree anymore. Just write a wrapper for it which calls the real fold(). Rename fold() to something else and that should be all you need to do. Now, if you want to do this on a per-pass basis, then it would probably require a bit more code. In that case, you'd have to write entry/exit code in passes.c:execute_one_pass.

Even if I find the right places to put my code, how can I output the collected information?

Collect it in a buffer and dump it at the end of compilation. Check in final.c:rest_of_handle_final.
Re: memory benchmark of tuples branch
Mark Mitchell wrote on 10/27/06 12:25: Aldy Hernandez wrote: Does the tuples branch include the CALL_EXPR reworking from the LTO branch? No. Though, that is a similar global-touch-everything project, so hopefully whatever consensus develops from tuples will carry over. I feel the same about LTO. We seem to have lots of destabilizing stuff in various branches. It may be better to move chunks of long-lived branches as we go along. Particularly things that we feel won't change much over the lifetime of the branch. So, if the CALL_EXPR rework in LTO is "done", we should think about moving it in. But other folks may want to play it more conservatively, so I would rather have consensus here.
Re: compiling very large functions.
Kenneth Zadeck wrote on 11/04/06 15:17: 1) defining the set of optimizations that need to be skipped. 2) defining the set of functions that trigger the special processing.

This seems too simplistic. The number of variables/blocks/statements is a factor, but they may interact in ways that are difficult or impossible to compute until after the optimization has started (it may depend on how many blocks have this or that property, in/out degree, number of variables referenced in statements, grouping of something or other, etc). So, in my view, each pass should be responsible for throttling itself. The pass gate functions already give us the mechanism for on/off. I agree that we need more graceful throttles. And then we have components of the pipeline that cannot really be turned on/off (like alias analysis) but could throttle themselves based on size (working on that).

The compilation manager could then look at the options, in particular the -O level and perhaps some new options to indicate that this is a small machine or in the other extreme "optimize all functions come hell or high water!!" and skip those passes which will cause performance problems.

All this information is already available to the gate functions. There isn't a lot here that the pass manager needs to do. We already know compilation options, target machine features, and overall optimization level. What we do need is for each pass to learn to throttle itself and/or turn itself off. Turning the pass off statically and quickly could be done in the gating function. A quick analysis of the CFG made by the pass itself may be enough to decide. We could provide a standard group of heuristics with standard metrics that lazy passes could use. Say, a 'cfg_too_big_p' or 'cfg_too_jumpy_p' that passes could call and decide not to run, or set internal flags that would partially disable parts of the pass (much like DCE can work with or without control-dependence information).
Re: compiling very large functions.
Kenneth Zadeck wrote on 11/06/06 12:54: I am not saying that my original proposal was the best of all possible worlds, but hacking things on a pass-by-pass or PR-by-PR basis is not really solving the problem.

I don't think it's a hackish approach. We have policy setting at the high level (-O[123]), and local implementation of that policy via the gating functions. Providing common predicates that every pass can use to decide whether to switch itself off is fine (size estimators, high connectivity, etc), but ultimately the analysis required to determine whether a function is too expensive for a pass may not be the same from one pass to another. OTOH, just using the gating function is not enough. Sometimes you want the pass to work in a partially disabled mode (like the CD-DCE example I mentioned earlier). In terms of machinery, I don't think we are missing a lot. All the information is already there. What we are missing is the implementation of more throttling/disabling mechanisms.
Re: compiling very large functions.
Brooks Moses wrote on 11/06/06 17:41: Is there a need for any fine-grained control on this knob, though, or would it be sufficient to add an -O4 option that's equivalent to -O3 but with no optimization throttling?

We need to distinguish two orthogonal issues here: effort and enabled transformations. Currently, -O3 means enabling transformations that (a) may not result in an optimization improvement, and (b) may change the semantics of the program. -O3 will also enable "maximum effort" out of every transformation. In terms of effort, we currently have individual knobs in the form of -f and/or --params settings. It should not be hard to introduce a global -Oeffort=xxx parameter, but it will take some tweaking to coordinate which -f/--params/-m switches that should enable.
Re: compiling very large functions.
Jan Hubicka wrote on 11/07/06 05:07: -O3 enables inlining, unswitching and GCSE after reload. How those change semantics of the program? Bah, I was convinced we were switching on -ffast-math at -O3. Never mind.
Re: Control Flow Graph
[EMAIL PROTECTED] wrote on 11/15/06 06:06: Hi all, I must use the cfg library to build and manipulate a control flow graph. I have read a lot but I have not found an answer to my question: is it possible to build a cfg structure directly from a .cfg file? How can I build a cfg from a file? Thanks to all.

Ask for a dump using the -blocks switch and post-process the dump file with the attached script:

$ gcc -fdump-tree-all-blocks file.c
$ dump2dot file.c.XXXt.yyy

It generates a graphviz file with the flow graph of the function. The script is fairly simplistic and will not handle more than one function too gracefully, but that should be easy to change.

#!/bin/sh
#
# (C) 2005 Free Software Foundation
# Contributed by Diego Novillo <[EMAIL PROTECTED]>.
#
# This script is Free Software, and it can be copied, distributed and
# modified as defined in the GNU General Public License.  A copy of
# its license can be downloaded from http://www.gnu.org/copyleft/gpl.html

if [ "$1" = "" ] ; then
  echo "usage: $0 file"
  echo
  echo "Generates a GraphViz .dot graph file from 'file'."
  echo "It assumes that 'file' has been generated with -fdump-tree-...-blocks"
  echo
  exit 1
fi

file=$1
out=$file.dot

echo "digraph cfg {" > $out
echo "  node [shape=box]" >> $out
echo '  size="11,8.5"' >> $out
echo >> $out

(grep -E '# BLOCK|# PRED:|# SUCC:' $file | \
 sed -e 's:\[\([0-9\.%]*\)*\]::g;s:([a-z_,]*)::g' | \
 awk '{ #print $0; \
        if ($2 == "BLOCK") \
          { \
            bb = $3; \
            print "\t", bb, "[label=\"", bb, "\", style=filled, color=gray]"; \
          } \
        else if ($2 == "PRED:") \
          { \
            for (i = 3; i <= NF; i++) \
              print "\t", $i, "->", bb, ";"; \
          } \
      }') >> $out

echo "}" >> $out
Re: Control Flow Graph
albino aiello wrote on 11/15/06 10:14: Thanks, but i want to use the .cfg file to construct directly a tree_cfg in C language using the TREE SSA libraries of gcc. There is no such thing as a tree ssa library. If you are adding a pass to GCC, then you already have the CFG at your disposal. In fact, you are pretty much forced to work over the CFG. If you want to use this functionality outside of GCC, I'm afraid you cannot do that (without a lot of work).
Re: how to load a cfg from a file by tree-ssa
Rob Quill wrote on 11/23/06 12:41: I haven't looked into this yet, but as I think I may need to be able to do something similar, is it possible to parse the cfg file that is given out, and build a C structure like that? Parsing a CFG dump is trivial. See the script I posted in http://gcc.gnu.org/ml/gcc/2006-11/msg00576.html. You can then convert it to whichever C data structure you want.
Re: machine-dependent Passes on GIMPLE/SSA Tree's?
Markus Franke wrote on 11/27/06 12:50: Are there also some other optimisation passes working on the GIMPLE/SSA representation which make use of any machine-dependent features? Yes. Passes like vectorization and loop optimizations will use so called 'target hooks' which allow the high-level passes to query the target for various capabilities and attributes. See the tree-vect*.c files for several examples.
Re: writing a new pass: association with an option string
Andrea Callia D'Iddio wrote on 12/04/06 03:48: Dear all, I wrote a new pass for gcc. Actually the pass is always executed, but I'd like to execute it only if I specify an option from the shell (e.g. gcc --mypass pippo.c). How can I do this? Create a new flag in common.opt and read its value in the gate function of your pass. I *believe* this is documented somewhere in the internals manual, but I'm not sure. You can check how other passes do it. See, for instance, flag_tree_vectorize in common.opt and in the vectorizer's gating predicate.
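As a sketch of what that looks like (the option name and variable below are made up; check common.opt in your tree for the exact record format, following the flag_tree_vectorize entry mentioned above):

```
; Hypothetical record in common.opt: option name, classification,
; and the variable that will hold the flag's value.
ftree-mypass
Common Report Var(flag_tree_mypass)
Run my experimental GIMPLE pass

/* Gate predicate (C) consulted by the pass manager before running
   the pass; return false and the pass is skipped.  */
static bool
gate_mypass (void)
{
  return flag_tree_mypass != 0;
}
```

The option machinery generates the flag_tree_mypass variable from the common.opt record, so the gate function only has to test it.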
Re: [PATCH]: Require MPFR 2.2.1
Richard Guenther wrote on 12/04/06 11:23: On 12/3/06, Kaveh R. GHAZI <[EMAIL PROTECTED]> wrote: I'd like to give everyone enough time to update their personal installations and regression testers before installing this. Does one week sound okay? If there are no objections, that's what I'd like to do. Please don't. It'll be a hassle for us again and will cause automatic testers to again miss some days or weeks during stage1 (given christmas holiday season is near). Rather defer to the start of stage3 please. Agreed, please don't. The whole MPFR thing is already fairly annoying. I have just updated all my machines with a special RPM I got from Jakub. I don't want to go through that again so soon.
Re: Announce: MPFR 2.2.1 is released
Kaveh R. GHAZI wrote on 12/04/06 21:32: That idea got nixed, but I think it's time to revisit it. Paolo has worked out the kinks in the configury and we should apply his patch and import the gmp/mpfr sources, IMHO. Yes, I vote to include gmp/mpfr in the tree. If gmp/mpfr is still a fluid target, we could add svn glue code to avoid commits to the sub-tree and rely exclusively on wholesale import.
Re: How to save a va_list object into a buffer and restore it from there?
Hoehenleitner, Thomas wrote on 12/06/06 07:08: after unsuccessful search in the doc, the web and this mailing list I decided to launch this question here: Offtopic in this forum. Please use [EMAIL PROTECTED] or comp.lang.c. This list is for GCC *development*.
Re: Gimple Type System
Richard Warburton wrote on 12/06/06 07:59: I would be most grateful of an answer to these questions, since I find the implementation of the gimple type system to be a little puzzling. That's because there is *no* GIMPLE type system. GIMPLE latches on to the type system of the input language, via the so called 'language hooks' (see lang_hooks.types_compatible_p). This is a limitation that has not been addressed yet.
Re: Gimple Type System
Richard Warburton wrote on 12/06/06 09:44: Thanks for this information. I presume from your response that there is a plan to address these issues; is this something that will be happening in the 'near-term', by which I mean within the next 6-9 months? Well, we will need something like this for the LTO project (http://gcc.gnu.org/wiki/LinkTimeOptimization). I am not sure whether there is anyone actively working on it ATM, though. As far as timelines, that is fairly hard to predict. It may be a few months or a couple of years. The only change in this area that I'm aware of is the streaming of type information using DWARF. But that is not the same as having a proper GIMPLE type system.
Re: Help with traversing block statements in pragma handler
Ferad Zyulkyarov wrote on 12/15/06 05:02:

FOR_EACH_BB_FN (bb, this_cfun)
  for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
    {
      tree stmt = bsi_stmt (bsi);
      debug_tree (stmt);
      /* Do something */
    }
} /* End of void handle_pragma_test */

This is way too early in the compilation. At this point we are not even in GIMPLE mode, so there will not be a flowgraph. I recommend that you follow what happens when GCC handles structurally similar pragmas. In your case, try following through #pragma omp parallel. Its behaviour is very similar to what you show in your #pragma test.
Re: Help with traversing block statements in pragma handler
Ferad Zyulkyarov wrote on 12/15/06 08:46: And something more: what is the difference between the c_register_pragma and cpp_register_deferred_pragma functions? Unfortunately, I couldn't find descriptive information about these two functions. You need to look in ../libcpp/directives.c. Deferred pragmas are registered to avoid calling pragma handling while we are pre-processing.
Re: SSA_NAMES: should there be an unused, un-free limbo?
Robert Kennedy wrote on 12/21/06 11:37: The situation is that some SSA_NAMEs are disused (removed from the code) without being released onto the free list by release_ssa_name(). Yes, it happens if a name is put into the set of names to be updated by update_ssa. After update_ssa, it should be true that every SSA name with no SSA_NAME_DEF_STMT is in the free list. However, if we have SSA names with no defining statement that are still considered active, I would hardly consider it a serious bug. It's a waste of memory, which you are more than welcome to fix, but it should not cause correctness issues. Please discuss. Test case?
Re: SSA_NAMES: should there be an unused, un-free limbo?
Daniel Berlin wrote on 12/21/06 12:21:

for (i = 0; i < num_ssa_names; i++)
  {
    tree name = ssa_name (i);
    if (name && !SSA_NAME_IN_FREELIST (name))
      DFS (name);
  }

I see that you are not checking for IS_EMPTY_STMT. Does DFS need to access things like bb_for_stmt? In any case, that is not important. I agree that every SSA name in the SSA table needs to have a DEF_STMT that is either (a) an empty statement, or (b) a valid statement still present in the IL. Note that this is orthogonal to the problem of whether we free up unused names from this list. Every time a statement S disappears, we should make sure that the names defined by S get their SSA_NAME_DEF_STMT set to NOP. Frankly, I'm a bit surprised that we are running into this. I'd like to see a test case, if you have one.
Re: SSA_NAMES: should there be an unused, un-free limbo?
Robert Kennedy wrote on 12/21/06 13:58: Right now we can have SSA_NAMEs in the list which are no longer used, and we have no way to tell whether they are used or not. Thus the only way to see all valid SSA_NAMEs is to walk the code. To wit: are there iteration macros somewhere that will help me walk the code while abstracting away all the ugly details like stmt/bb boundaries, etc.? No. The code is segmented into basic blocks. To walk the code, you must walk each basic block. Something I forgot to add in my previous message. Notice that it is not altogether rare to find cases where we have more SSA names than statements. Are you walking the SSA names because you assume it's always shorter than walking the statements?
Re: SSA_NAMES: should there be an unused, un-free limbo?
Jeffrey Law wrote on 12/21/06 12:48: True. But remember, the stated purpose of the SSA_NAME recycling code was _not_ to track every SSA_NAME that went "dead" and recycle it, but instead to get the majority of them (and to ultimately save memory by recycling them). Orphan SSA_NAMEs were always expected. But this is orthogonal to the recycling issue. They are traversing the SSA name table and finding SSA names that have invalid DEF_STMT entries. I believe that we should support this kind of usage of the SSA table. Alternately, we can revisit the entire recycling question as well -- things have changed significantly since that code was written and I've speculated that the utility of the recycling code has diminished, possibly to the point of being a useless waste of time and code. That'd be interesting to try, yes. Though we *do* want to invalidate SSA_NAME_DEF_STMT for the SSA names whose defining statement gets deleted.
Re: SSA_NAMES: should there be an unused, un-free limbo?
Ian Lance Taylor wrote on 12/21/06 13:08: If that is acceptable, then there is no issue here. If that is not acceptable, then we need to fix the code to correctly mark SSA_NAMEs which are no longer used. Whether we recycle the memory in the unused SSA_NAMEs is a separate (and less interesting) discussion. Agreed. We have various passes that walk through the SSA table, so I want to keep supporting that. We do have cases where an SSA name may get its defining statement zapped and yet we need to keep it around. The renamer uses names_to_release in those cases, and makes sure not to visit the defining statement. If every statement removal were to set SSA_NAME_DEF_STMT to NOP for every name generated by the removed statement, then the renamer would probably not need to do that. However, the renamer *needs* the SSA name itself not to be recycled (for name->name mappings).
Re: SSA_NAMES: should there be an unused, un-free limbo?
Robert Kennedy wrote on 12/21/06 15:01: Something I forgot to add in my previous message. Notice that it is not altogether rare to find cases where we have more SSA names than statements. Are you walking the SSA names because you assume it's always shorter than walking the statements? No. I'm walking the SSA names because logically that's what the algorithm is interested in. At that level, the algorithm doesn't care about statements.

OK. Good enough. To fix this bug, I also suggest what Jeff and Ian have been discussing:

1- A verifier in verify_ssa.
2- Fix bugs found in #1 by making sure that every time we remove a statement, the SSA_NAME_DEF_STMT of all the affected names is changed to point to an empty statement.
Re: SSA_NAMES: should there be an unused, un-free limbo?
Jeffrey Law wrote on 12/22/06 01:09: On Thu, 2006-12-21 at 14:05 -0500, Diego Novillo wrote: In any case, that is not important. I agree that every SSA name in the SSA table needs to have a DEF_STMT that is either (a) an empty statement, or, (b) a valid statement still present in the IL. Just to be 100% clear. This is not true at the current time; see the discussion about the sharing of a single field for TREE_CHAIN and SSA_NAME_DEF_STMT. If you want to make that statement true, then you need to fix both the orphan problem and the sharing of a field for SSA_NAME_DEF_STMT and TREE_CHAIN. I think we are agreeing violently.
[mem-ssa] Updated documentation
I've updated the document describing Memory SSA. The section on mixing static and dynamic partitioning is still being implemented, so it's a bit sparse on details and things will probably shift somewhat before I'm done. http://gcc.gnu.org/wiki/mem-ssa Feedback welcome. Thanks.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
Mark Mitchell wrote on 01/01/07 14:46: What a thread this has turned out to be.

Indeed. In general, I'm not too thrilled with the idea of disabling transformations for the sake of non-conforming code. However, I would not mind a -fconforming flag similar to -fstrict-aliasing.

I haven't yet seen that anyone has actually tried the obvious: run SPEC with and without -fwrapv. Would someone please do that? Or, pick your favorite high-performance application and do the same. But, let's get some concrete data as to how much this optimization helps.

On x86_64: SPEC2000int is almost identical. SPEC2000fp shows a ~6% drop when using -fwrapv.

HARDWARE
  CPU:              Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
  CPU MHz:          2128.001
  FPU:              Integrated
  CPU(s) enabled:   2
  Secondary Cache:  2048 KB
  Memory:           2053792 kB

SOFTWARE
  Operating System: Linux 2.6.18-1.2868.fc6
  Compiler:         GNU C version 4.3.0 20070101 (experimental)
                    (x86_64-unknown-linux-gnu)

NOTES
  Base is -O2 -march=nocona -mtune=generic -fwrapv
  Peak is -O2 -march=nocona -mtune=generic

SPEC CINT2000 Summary (estimated)

                        Base   Base    Peak   Peak
  Benchmark   Ref Time  Run    Ratio   Run    Ratio
  164.gzip        1400  127     1099   127     1106
  175.vpr         1400  114     1227   116     1209
  176.gcc            X                   X
  181.mcf         1800  195      921   194      927
  186.crafty      1000   48.7   2054    48.2   2076
  197.parser      1800  197      915   195      923
  252.eon         1300   64.7   2011    64.8   2005
  253.perlbmk        X                   X
  254.gap         1100   64.9   1696    65.3   1685
  255.vortex         X                   X
  256.bzip2       1500  108     1384   108     1395
  300.twolf       3000  169     1771   169     1775

  Est. SPECint_base2000  1391
  Est. SPECint2000       1394

SPEC CFP2000 Summary (estimated)

                        Base   Base    Peak   Peak
  Benchmark   Ref Time  Run    Ratio   Run    Ratio
  168.wupwise     1600  103     1547    88.3   1812
  171.swim        3100  152     2033   145     2131
  172.mgrid       1800  193      935   131     1376
  173.applu       2100  195     1076   188     1116
  177.mesa        1400   71.1   1968    71.3   1964
  178.galgel      2900  107     2699    99.1   2927
  179.art         2600   96.7   2689    94.6   2749
  183.equake      1300   69.0   1884    67.1   1939
  187.facerec     1900  149     1273   146     1302
  188.ammp        2200  170     1292   168     1312
  189.lucas       2000  102     1965    98.7   2025
  191.fma3d       2100  195     1079   192     1092
  200.sixtrack    1100  199      553   198      556
  301.apsi        2600  208     1248   196     1329

  Est. SPECfp_base2000   1463
  Est. SPECfp2000        1560
Re: Do we want non-bootstrapping "make" back?
Daniel Jacobowitz wrote on 12/30/06 02:08: Once upon a time, the --disable-bootstrap configure option wasn't necessary. "make" built gcc, and "make bootstrap" bootstrapped it. Is this behavior useful? Should we have it back again? That'd be great. I miss the old behaviour.
[RFC] Our release cycles are getting longer
So, I was doing some archeology on past releases and we seem to be getting into longer release cycles. With 4.2 we have already crossed the 1 year barrier. For 4.3 we have already added quite a bit of infrastructure that is all good on paper but still needs some amount of TLC. There was some discussion on IRC that I would like to move to the mailing list so that we get a wider discussion. There have been thoughts about skipping 4.2 completely, going to an extended Stage 3, etc. Thoughts?

[Attachment: release-cycle.pdf (Adobe PDF document)]
Re: [mem-ssa] Updated documentation
Ira Rosen wrote on 01/02/07 03:44: In the example of dynamic partitioning below (Figure 6), I don't understand why MEM7 is not killed in line 13 and is killed in line 20 later. As far as I understand, in line 13 'c' is in the alias set, and it's currdef is MEM7, so it must be killed by the store in line 14. What am I missing? You are absolutely correct. MEM7 should indeed be killed in line 13 (serves me right for manually changing the code). Thanks for pointing it out. I will correct the document.
Re: Which optimization levels affect gimple?
Paulo J. Matos wrote on 01/24/07 12:44: Check what kind of gimple code you get with -fdump-tree-gimple; -O0 and -O3 have different results. -fdump-tree-gimple is the first dump, *before* any optimizations occur. To see the effect of all the GIMPLE optimizations you should use -fdump-tree-optimized. However, -O3 and -O9 have exactly the same output. Will -Ox for x > 3 generate the same gimple trees? (i.e., are they done in the backend of gcc?) -On for n >= 3 is identical to -O3. This may change in the future.
Re: [RFC] Our release cycles are getting longer
Mark Mitchell wrote on 01/25/07 00:09: First, I haven't had as much time to put in as RM lately as in the past, so I haven't been nagging people as much.

Sure, but this is a trend that started with 3.1 and it's gotten progressively worse. Granted, we are now dealing with a much bigger project and perhaps the amount of engineering cycles has not kept up:

  Release   Year   Size (KLOC)
  1.21      1988        58
  1.38      1990        87
  2.0       1992       229
  2.8.1     1998       416
  EGCS      1998       603
  2.95      1999       715
  3.0       2001     1,007
  3.1       2002     1,336
  4.0       2005     1,813
  4.1       2006     2,109
  4.2       2007     2,379

Some people want/suggest more frequent releases. But, I've also had a number of people tell me that the 4.2 release cycle was too quick in its early stages, and that we didn't allow enough time to get features in -- even though doing so would likely have left us even more bugs to fix.

That's also true. The duration of our Stage 1 cycles has gone down quite a bit since 3.3. The data I have for the 3.x releases is a bit incomplete, and we had a strange 3.2 release which I didn't include because we suddenly jumped from branching 3.1 to releasing 3.2 (that was the C++ ABI thing, IIRC). Anyway, here's the data I got from our release schedule. These are the durations of each stage since 3.1:

  Release   Year   Stage 1   Stage 2   Stage 3   Release
  3.1       2002         0        65        69       212
  3.3       2003       169         1        61       271
  3.4       2004       262       103        93       289
  4.0       2005       172        64       170       288
  4.1       2006        59        74       133       309
  4.2       2007        61        59       216       393

There is some correlation between the length of Stage 1 and Stage 3. It's as if longer Stage 1s lead to shorter Stage 3s. Perhaps we could consider lengthening the early stages, which by all accounts are the more "fun", and shorten the pain during Stage 3. Long-lived branches are painful to maintain. If we allow them more time to get into mainline, it may help spread the stabilization work during Stage 1 (a lot more exposure). Another thing we could try again is going into mini-freeze cycles spanning 2-3 weeks. We've done that in the past when mainline was in a pathetic state and I think it was helpful.

Some folks have suggested that we ought to try to line up FSF releases to help the Linux distributors.

I don't think that's in our best interest. We can't really help what distros do. The fact is, however, that when distros pick up a specific release, that release tends to be pretty solid (e.g. 4.1).

I don't think that some of the ideas (like saying that you have to fix N bugs for every patch you contribute) are very practical. What we're seeing is telling us something about "the market" for GCC; there's more pressure for features, optimization, and ports than bug fixes. If there were enough people unhappy about bugs, there would be more people contributing bug fixes.

Agreed. We are now in a featuritis phase. We still have many marketing bullet points that folks want filled in. I believe this will continue for at least a couple more releases. We are also being pulled in many directions at once; our user base is very diverse. Making the infrastructure more palatable for external folks to get involved in development and attract more engineering cycles is probably one of our best long-term bets.
Re: gcc compile time support for assumptions
Ian Lance Taylor wrote on 01/18/07 10:51: Well, internally, we do have ASSERT_EXPR. It would probably take a little work to permit the frontends to generate it, but the optimizers should understand it. By default, they do not. When I initially implemented VRP, I was adding ASSERT_EXPRs right after gimplification. The rationale was to have the ASSERT_EXPRs rewritten into SSA form by the initial SSA pass. This was convenient, but it destroyed the quality of the generated code. Suddenly, few or no copies and constants were being propagated, jump threading wasn't working, PRE wasn't doing its job, etc. The problem was that all these ASSERT_EXPRs were not being grokked by the optimizers; every optimizer would see the assertion, think the worst and block transformations. It also meant quite a bit of bulk added to the IL, which increased compilation times. So, if we decide to add ASSERT_EXPRs early in the pipeline, we have to mind these issues. In the end, I went for adding assertions inside VRP and fixing up the SSA form incrementally. Perhaps we can do something similar for other passes that may want to deal with assertions. Now, if these are assertions inserted by the user, that's another problem. The IL wouldn't bulk up so much, but we would still need to handle them everywhere. Assertions shouldn't block scalar cleanups.
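For readers unfamiliar with them, the assertions VRP inserts look roughly like this in the GIMPLE dumps (a hand-written sketch; the block numbers and SSA versions are invented):

```
  if (x_1 > 10)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  # Uses of x_1 dominated by the true edge are rewritten to x_2,
  # which VRP knows carries the range [11, +INF].
  x_2 = ASSERT_EXPR <x_1, x_1 > 10>;
  ...
```

Each ASSERT_EXPR defines a new SSA name, which is why inserting them early multiplies names and statements, and why an optimizer that does not understand the node treats x_2 as an opaque computation and stops propagating through it.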
Re: Which optimization levels affect gimple?
Paulo J. Matos wrote on 01/26/07 06:52: Is the output of -fdump-tree-optimized a subset of GIMPLE? Yes. The output is an incomplete textual representation of the GIMPLE form of the program.
Re: Which optimization levels affect gimple?
Richard Guenther wrote on 01/26/07 07:28: It's after doing TER, so the statements are no longer valid GIMPLE statements. Silly me. Richard's right. You want the output of -fdump-tree-uncprop. That's the last GIMPLE dump (if my memory doesn't fail me again).
Re: Can C and C++ object files be linked into an executable?
Ray Hurst wrote on 01/27/07 16:48: I think this was the answer I was looking for. By the way, was this the correct place to post it? No. That was a language question. gcc-help or comp.std.c++ would have been a better forum. This deals with the development of GCC, not its use.
Re: Which optimization levels affect gimple?
Paulo J. Matos wrote on 01/28/07 18:03: On 1/24/07, Diego Novillo <[EMAIL PROTECTED]> wrote: Paulo J. Matos wrote on 01/24/07 12:44: check what kind of gimple code you get with -fdump-tree-gimple and -O0 and -O3 have different results, -fdump-tree-gimple is the first dump *before* any optimizations occur. To see the effect of all the GIMPLE optimizations you should use -fdump-tree-optimized. So the dump-tree-optimized will also return GIMPLE? a subset of... GIMPLE? No, that's back to GENERIC. During out-of-ssa we recombine GIMPLE expressions into GENERIC because of limitations in the way RTL expansion works. In the future, we may go from GIMPLE SSA directly into RTL, so this may change.
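A minimal sketch (my own illustration, not from the thread) of the recombination done during out-of-ssa: single-use temporaries are substituted back into their use sites, yielding GENERIC-style compound expressions for the RTL expander.

```
/* GIMPLE before out-of-ssa:        after recombination (GENERIC-like):
   t1 = a + b;
   t2 = t1 * c;                     t2 = (a + b) * c;                    */
```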
Re: Which optimization levels affect gimple?
Paulo J. Matos wrote on 01/28/07 18:08: On 1/26/07, Diego Novillo <[EMAIL PROTECTED]> wrote: Richard Guenther wrote on 01/26/07 07:28: It's after doing TER, so the statements are no longer valid GIMPLE statements. Silly me. Richard's right. You want the output of -fdump-tree-uncprop. That's the last GIMPLE dump (if my memory doesn't fail me again). Ok, so, I guess that being GIMPLE a language, there's somewhere a specification of possible constructs, right? Where can I find it? I tried to search in the inside gcc manual but there seems to be nothing like a formal specification of the nodes in the GIMPLE AST. The GIMPLE grammar is documented in the internals manual. In the tree ssa sections. Search for "GIMPLE grammar" or something along those lines. Moreover, is there anything referencing which optimizations are performed between fdump-tree-gimple and fdump-tree-uncprop (being those, afaik, the first and last gimple dumps)? -fdump-tree-all gives you all the dumps by the high-level optimizers. -fdump-all-all gives you all the dumps by both GIMPLE and RTL optimizers. If reading source code could help me understand GIMPLE, could please someone tell me where to start looking at it. I've svn'ed gcc and it's huge so I'm quite lost. Where is the output of fdump-tree-gimple done? Start with passes.c:init_optimization_passes. That's the pass sequencer.
Re: 2007 GCC Developers Summit
Ben Elliston wrote on 01/28/07 17:45: One idea I've always pondered is to have brief (perhaps 1-2 hr) tutorials, given by people in their area of expertise, as a means for other developers to come up to speed on a topic that interests them. Is this something that appeals to others? Sounds good to me. For instance, the new java front end, a description of the new build system, etc.
Re: Which optimization levels affect gimple?
Paulo J. Matos wrote on 01/29/07 06:35: On 1/29/07, Diego Novillo <[EMAIL PROTECTED]> wrote: -fdump-tree-all gives you all the dumps by the high-level optimizers. -fdump-all-all gives you all the dumps by both GIMPLE and RTL optimizers. Is this -fdump-all-all version specific? Doesn't work on 4.1.1: $ g++ -fdump-all-all allocation.cpp cc1plus: error: unrecognized command line option "-fdump-all-all" No, I goofed. I must've dreamed the -all-all switch. You have to use -fdump-tree- for GIMPLE dumps and -fdump-rtl- for RTL dumps. It's also possible that -fdump-rtl doesn't work on the 4.1 series (I don't recall when -fdump-rtl was introduced, sorry). Check the invocation sections in the GCC 4.1 manual. Grep for fdump-.
Re: After GIMPLE...
Paulo J. Matos wrote on 01/30/07 10:11: Well, I spent the morning looking at the code and since what I need is only the flow of gcc up until I have the GIMPLE tree, I could add a pass after the pass which generates the gimple tree, in that pass I do what I need with the gimple tree and then call exit(). Would this be a good idea? It would probably not be a good idea. Passes are called for each function in the callgraph. If you stop immediately after your pass, you will leave all the other functions unprocessed. What is it that you want to do? If you need dataflow information, you probably also need to have the GIMPLE code in SSA form. If yes, then the idea would be to create a pass and add it in passes.c after the line NEXT_PASS (pass_lower_cf); since from what I heard in #gcc, this is where the gimple tree is created, right? Well, it depends on what you need. If your pass can work in high GIMPLE then you can insert it before that. pass_lower_cf lowers control flow and lexical scopes, but not EH. Perhaps if you describe a little bit what you are trying to do, we can give you a better idea.
Re: After GIMPLE...
Paulo J. Matos wrote on 01/31/07 11:26: So, ideally, I would like just the gcc part until the first part of the middleend where you have a 'no optimizations', language independent AST of the source file. OK, so you probably want to inject your pass right before pass_build_ssa (in init_optimization_passes). All the facilities to traverse the IL and flowgraph described in the Tree SSA section of the internals manual should apply.
Re: After GIMPLE...
Paulo J. Matos wrote on 02/01/07 04:37: What can I do then to stop gcc from processing things further? After informing the user, there's no more reason on my side to continue. Stop gracefully or just stop? The latter is easy. The former involves writing code to skip all passes after a certain point, or simply not scheduling the passes you don't want to run. See init_optimization_passes.
Re: After GIMPLE...
Paulo J. Matos wrote on 02/06/07 14:19: Why before pass_build_ssa? (version 4.1.1) It depends on the properties your pass requires. If you ask for PROP_cfg and PROP_gimple_any then you should schedule it after the CFG has been built, but if you need PROP_ssa, then you must be after pass_build_ssa which implies that your pass only gets enabled at -O1+.
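A hedged sketch of what registering such a pass looked like with the 4.1-era pass manager (field layout abbreviated and not compilable standalone; the names pass_my_analysis and execute_my_analysis are hypothetical):

```
/* In your pass file: */
static struct tree_opt_pass pass_my_analysis =
{
  "myanalysis",            /* name (also the dump file suffix) */
  NULL,                    /* gate: NULL means always run */
  execute_my_analysis,     /* execute */
  /* ... remaining fields elided ... */
  PROP_cfg | PROP_ssa,     /* properties_required */
  /* ... */
};

/* In passes.c:init_optimization_passes, after the SSA builder: */
NEXT_PASS (pass_build_ssa);
NEXT_PASS (pass_my_analysis);
```

Requiring PROP_ssa is what ties the pass to -O1+, since the SSA builder only runs when optimizing.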
Re: which opt. flags go where? - references
Kenneth Hoste wrote on 02/07/07 08:56: [1] Almagor et al., Finding effective compilation sequences (LCES'04) [2] Cooper et al., Optimizing for Reduced Code Space using Genetic Algorithms (LCTES'99) [3] Almagor et al., Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms (Tech. Report) [4] Acovea: Using Natural Selection to Investigate Software Complexities (http://www.coyotegulch.com/products/acovea/) You should also contact Ben Elliston (CC'd) and Grigori Fursin (sorry, no email). Ben worked on dynamic reordering of passes; his thesis will have more information about it. Grigori is working on an API for iterative and adaptive optimization, implemented in GCC. He presented at the last HiPEAC 2007 GCC workshop. The presentation should be available at http://www.hipeac.net/node/746 Some other questions: * I'm planning to do this work on an x86 platform (i.e. Pentium4), but richi told me that's probably not a good idea, because of the low number of registers available on x86. Comments? When deriving ideal flag combinations for -Ox, we will probably want common sets for the more popular architectures, so I would definitely include x86. * Since we have done quite some analysis on the SPEC2k benchmarks, we'll also be using them for this work. Other suggestions are highly appreciated. We have a collection of tests from several user communities that we use as performance benchmarks (DLV, TRAMP3D, MICO). There should be links to the testers somewhere in http://gcc.gnu.org/ * Since there has been some previous work on this, I wonder why none of it has made it into GCC development. Were the methods proposed unfeasible for some reason? What would be needed to make an approach to automatically find suitable flags for -Ox interesting enough to incorporate it into GCC? Any references to this previous work? It's one of the things I would like to see implemented in GCC in the near future.
I've been chatting with Ben and Grigori about their work and it would be a great idea if we could discuss this at the next GCC Summit. I'm hoping someone will propose a BoF about it.
Re: Performance regression on the 4.3 branch?
H. J. Lu wrote on 02/14/07 09:22: Is this the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30735 No, it isn't.
Re: Finalizer after pass?
Paulo J. Matos wrote on 02/28/07 11:07: Is there a way to install a finalizing function? (to be called after all functions in the pass have been processed) Or to know if the current function being processed is the last one? (maybe if I know the number of times my pass will be called!) Perhaps it's easier to implement your feature as an IPA pass. For IPA passes, you are not called with a specific function. Instead, you get to traverse the callgraph yourself. See passes like ipa-cp.c for details.
Re: Finalizer after pass?
Paulo J. Matos wrote on 03/01/07 10:41: My IPA pass seems to be run only for -On, n>=1, is there a way to make it ran even for -O0? No, we only run IPA passes if flag_unit_at_a_time is set. That only is set when optimizing. At -O0, we simply emit functions individually.
Re: Accessing function code from CFG
Paulo J. Matos wrote on 03/02/07 10:12: In an IPA pass, for each CFG node, I have a tree decl member from which I can access the return type, name of the function, argument names and its types, but I can't seem to find a way to get the function code. I would guess it would be a basic block list but I don't know where I can get it. You need to get at the function structure from the cgraph node with DECL_STRUCT_FUNCTION (cgraph_node->decl). Then you can use one of the CFG accessors like basic_block_info_for_function().
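A sketch of that traversal under the 4.1-era internal API (not compilable standalone, since it relies on GCC-internal headers; the dump to stderr is just a placeholder for whatever the pass does with each statement):

```
struct cgraph_node *node;

for (node = cgraph_nodes; node; node = node->next)
  {
    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
    basic_block bb;

    if (fn == NULL || fn->cfg == NULL)
      continue;  /* no body, or CFG not built yet */

    FOR_EACH_BB_FN (bb, fn)
      {
        block_stmt_iterator bsi;
        for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
          print_generic_stmt (stderr, bsi_stmt (bsi), 0);
      }
  }
```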
Re: Massive SPEC failures on trunk
Grigory Zagorodnev wrote on 03/03/07 02:27: > There are three checkins, candidates for the root of regression: > http://gcc.gnu.org/viewcvs?view=rev&revision=122487 > http://gcc.gnu.org/viewcvs?view=rev&revision=122484 > http://gcc.gnu.org/viewcvs?view=rev&revision=122479 > SPEC2k works as usual[1] for me on x86_64 as of revision 122484. The only new compile failure I see is building 300.twolf with: mt.c: In function 'MTEnd': mt.c:46: warning: incompatible implicit declaration of built-in function 'free' mt.c:46: error: too many arguments to function 'free' specmake: *** [mt.o] Error 1 Ian, looks like your VRP patch may be involved. [1] 176.gcc and 253.perlbmk usually miscompare for me. Not sure why.
Re: Improvements of the haifa scheduler
Maxim Kuvyrkov wrote on 03/05/07 02:14: >o Fix passes that invalidate tree-ssa alias export. Yes, this should be good and shouldn't need a lot of work. >o { Fast but unsafe Gupta's aliasing patch, Unsafe tree-ssa alias > export } in scheduler's data speculation. "unsafe" alias export? I would definitely like to see the tree->rtl alias information transfer fixed once and for all. Finishing RAS's tree->rtl work would probably make a good SoC project.
Re: Signed overflow patches OK for 4.2?
Eric Botcazou wrote on 03/05/07 15:59: >> Then it should also be disabled by default also in 4.1.3 and should >> have been disabled in 4.1.2 which was only released last month so >> there is no reason why it has to be disabled in 4.2.0 if everyone is >> using 4.1 anyways. > > VRP has become more aggressive in 4.2.x than in 4.1.x though. Agreed. I don't see the need to backport this functionality to 4.1. It has been out for quite some time now, used in various distros and we have not been flooded with requests from users. While this represents a new feature in 4.2, I don't think it's too risky. Whatever failures are triggered should be easy to identify and fix. I personally don't like this feature very much as it may represent a slippery slope into forcing us to warn in every optimization that exploits undefined aspects of the standard. But user pressure obviously exists, so *shrug*.
Re: Signed overflow patches OK for 4.2?
Ian Lance Taylor wrote on 03/05/07 18:23: > I gather you are saying here that it is OK with you to backport > -fstrict-overflow/-Wstrict-overflow to 4.2. Yes.
Re: Massive SPEC failures on trunk
Ian Lance Taylor wrote on 03/06/07 09:49: > "Vladimir Sysoev" <[EMAIL PROTECTED]> writes: > >> Bug has been already reported >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31037 > > I don't think this one could have anything to do with my VRP changes, > but I'll try to take a look later today. > Actually, this looks more related to my aliasing patch. I'll be dealing with this one soon.
Re: Accessing function code from CFG
Paulo J. Matos wrote on 03/07/07 10:36: > Is this normal? It seems there are no basic blocks set for the > functions. Probably my pass is being run before the bbs are created? Looks like it. Set a breakpoint in build_tree_cfg and your function. If gdb stops in your function first, you found the problem.
Re: Accessing function code from CFG
Paulo J. Matos wrote on 03/07/07 11:43: > What am I missing? You are debugging the wrong binary. I'd suggest you browse through http://gcc.gnu.org/wiki/DebuggingGCC You need to debug one of cc1/cc1plus/jc1
Re: Libiberty functions
Dave Korn wrote on 03/08/07 07:30: > (Also, bear in mind that if you want your new pass to work correctly with > pre-compiled headers, you really ought to be using Gcc's garbage-collected > memory management facilities. See > http://gcc.gnu.org/onlinedocs/gccint/Type-Information.html#Type-Information > for the full gory details) That's not right. GCC does not use GC memory everywhere, passes can also heap-allocate and/or use obstacks. You do have to be careful when mixing heap-allocated with GC memory, but for pass-local memory allocation schemes, heap and obstacks are perfectly fine. Paulo, you may also want to use the XNEW/XCNEW wrapper macros. They are handy shorthand wrappers around malloc. Another convenient way of allocating a pool of memory is to use obstacks (See libiberty/obstack.c).
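A hedged sketch of what those wrapper macros amount to (simplified stand-ins defined over plain malloc/calloc; libiberty's real XNEW/XCNEW/XNEWVEC go through xmalloc, which aborts on allocation failure; struct edge_info and new_edge_info are hypothetical names for illustration):

```c
#include <stdlib.h>

/* Simplified stand-ins for libiberty's typed allocation macros.  */
#define XNEW(T)        ((T *) malloc (sizeof (T)))
#define XCNEW(T)       ((T *) calloc (1, sizeof (T)))
#define XNEWVEC(T, N)  ((T *) malloc (sizeof (T) * (N)))

/* Hypothetical pass-local record type.  */
struct edge_info
{
  int src;
  int dst;
};

/* Allocate and fill an edge_info record; XCNEW zero-initializes it.  */
static struct edge_info *
new_edge_info (int src, int dst)
{
  struct edge_info *e = XCNEW (struct edge_info);
  e->src = src;
  e->dst = dst;
  return e;
}
```

For pass-local data like this, heap allocation (or an obstack that is freed wholesale when the pass finishes) is fine; GC memory only matters for data that must survive precompiled-header serialization.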
Re: Import GCC 4.2.0 PRs
Richard Guenther wrote on 03/13/07 05:57: > Yes, this is a similar issue as PR30840 on the mainline, the CCP propagator > goes > up the lattice in some cases. This is something Diego promised me to look at > ;) > But we might be able to paper over this issue in 4.2 ... I'll take a look later this week.
Re: Query regarding struct variables in GIMPLE
Karthikeyan M wrote on 03/13/07 21:32: > appears as x.j = 10 inside the GIMPLE dump of the function body . Is > there some place from where I can get it in the following( or any > other simpler ) form No, we don't unnecessarily take addresses of variables. Structure references are left intact. For some aggregates that cannot be scalarized we try to create artificial tags to represent the fields (to get field sensitive results in points-to resolution).
Re: can't find VCG viewer
Sunzir Deepur wrote on 03/14/07 05:36: > any idea where I can find a (free) graphical VCG viewer suitable > for gcc's vcg outputs ? I'd recommend the attached script. Feed the output to GraphViz. The script may need changes if you are using RTL dumps.

#!/bin/sh
#
# (C) 2005 Free Software Foundation
# Contributed by Diego Novillo <[EMAIL PROTECTED]>.
#
# This script is Free Software, and it can be copied, distributed and
# modified as defined in the GNU General Public License.  A copy of
# its license can be downloaded from http://www.gnu.org/copyleft/gpl.html

if [ "$1" = "" ] ; then
    echo "usage: $0 file"
    echo
    echo "Generates a GraphViz .dot graph file from 'file'."
    echo "It assumes that 'file' has been generated with -fdump-tree-...-blocks"
    echo
    exit 1
fi

file=$1
out=$file.dot

echo "digraph cfg {" > $out
echo "  node [shape=box]" >> $out
echo '  size="11,8.5"' >> $out
echo >> $out

(grep -E '# BLOCK|# PRED:|# SUCC:' $file | \
 sed -e 's:\[\([0-9\.%]*\)*\]::g;s:([a-z_,]*)::g' | \
 awk '{ #print $0; \
        if ($2 == "BLOCK") \
          { \
            bb = $3; \
            print "\t", bb, "[label=\"", bb, "\", style=filled, color=gray]"; \
          } \
        else if ($2 == "PRED:") \
          { \
            for (i = 3; i <= NF; i++) \
              print "\t", $i, "->", bb, ";"; \
          } \
      }') >> $out

echo "}" >> $out
ANNOUNCE: Gelato ICE GCC track, San Jose, CA, April 16-18, 2007
The GCC track will be on Mon 16/Apr and Tue 17/Apr. The program should be complete by now:
- Program at-a-glance: http://ice.gelato.org/pdf/gelatoICE_ataglance.pdf
- Speaker list and abstracts: http://ice.gelato.org/program/program.php

The GCC track is part of the Gelato ICE (Itanium Conference & Expo) technical program, April 16-18, 2007, San Jose, CA. All interested GCC developers are invited to attend. This year there is a strong focus on Linux. Andrew Morton and Wim Coekaerts, Senior Director for Linux Engineering at Oracle, are keynote speakers. In addition to the GCC track, there are tracks covering the Linux IA-64 kernel, virtualization, tools and tuning, multi-core programming, and research. A working list of speakers and topics follows.

GCC Track at Gelato ICE:
- Update on Scheduler Work & Discussion of New Software Pipelining Work, Arutyun Avetisyan, Russian Academy of Science
- GPL2 and GPL3, Dan Berlin, Google
- Update on the Gelato GCC Build Farm, Matthieu Delahaye, Gelato Central Operations
- Update on Prefetching Work, Zdenek Dvorak, SuSE
- Interprocedural Optimization Framework, Jan Hubicka, SuSE
- Update on Superblock Work, Bob Kidd, University of Illinois
- GCC and Osprey Update, Shin-Ming Liu, HP
- Compiling Debian Using GCC 4.2 and Osprey, Martin Michlmayr, Debian
- Update on Alias Analysis Work, Diego Novillo, Red Hat
- Update on LTO, Kenneth Zadeck, NaturalBridge
Re: Query regarding struct variables in GIMPLE
Karthikeyan M wrote on 03/15/07 15:06: > Thanks. > Can you point me to documentation / code where I can get more > information about these artificial tags ? gcc/tree-ssa-alias.c:create_structure_vars() The section on Structural alias analysis in the internals documentation should also help.
Re: Google SoC Project Proposal: Better Uninitialized Warnings
Manuel López-Ibáñez wrote on 03/17/07 14:28: > This is the project proposal that I am planning to submit to Google > Summer of Code 2007. It is based on previous work of Jeffrey Law, > Diego Novillo and others. I hope someone will find it interesting and Yes, I can act as a mentor. I'm particularly interested in what we are going to do at -O0. Ideally, I would try to build the SSA form and/or a predicated SSA form and try to phrase the problem in terms of propagation of the uninitialized attribute. I agree with your goal of consistency. The erratic behaviour of the current -Wuninitialized implementation is, to me, one of the most annoying traits of GCC. We can't even reorder the pass pipeline without running into this problem.
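To make the consistency problem concrete, here is a hypothetical example (mine, not from the proposal) of the conditionally-uninitialized pattern involved; whether -Wuninitialized warns about code like this has historically depended on optimization level and pass ordering, which is the erratic behaviour described above.

```c
/* 'x' is initialized only when flag is nonzero, but it is also only
   used when flag is nonzero, so it is never read uninitialized.
   Proving that requires predicate-aware analysis; naive analyses
   may warn at the 'return x'.  */
int
maybe_uninit (int flag)
{
  int x;

  if (flag)
    x = 42;

  if (flag)
    return x;

  return 0;
}
```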
Re: We're out of tree codes; now what?
Steven Bosscher wrote on 03/19/07 10:14: > IMHO this is still the better solution than the subcodes idea. Agreed. If the performance hit is not too large, getting a wider tree code is much simpler to maintain.
Re: Google SoC Project Proposal: Better Uninitialized Warnings
Manuel López-Ibáñez wrote on 03/19/07 14:45: > Is building this early SSA form something that can be tackled by a > newbie developer with almost zero middle-end knowledge within the time > frame of the Summer of Code? Yes, it should not be too hard. See tree_lowering_passes. You may also want to read on the compilation flow to get some idea on how things are processed after parsing. There are some diagrams at http://people.redhat.com/dnovillo/Papers/#cgo2007
Re: Creating parameters for functions calls
Antoine Eiche wrote on 03/27/07 13:28: > Thanks for any help in finishing this pass See how omp-low.c builds calls to the child parallel functions (create_omp_child_function).
Re: GCC 4.2.0 Status Report (2007-03-22)
Mark Mitchell wrote on 03/22/07 22:10: > Diego, Roger, Jason, would you please let me know if you can work on the > issues above? I'm going to try to test Jim's patch for PR 31273 tonight. I'm looking at 29585 today.
Re: GCC 4.2.0 Status Report (2007-03-22)
Mark Mitchell wrote on 03/22/07 22:10: > PR 29585 (Novillo): ICE-on-valid This one seems to be a bug in the C++ FE, compounded by alias analysis papering over the issue. We are failing to mark DECLs in vtbl initializers as addressable. This causes the failure during aliasing because it is added to a points-to set but not marked for renaming. Since the variable does not have its address taken, we initially do not consider it interesting in the setup routines of alias analysis. However, the variable ends up inside points-to sets and later on we put it inside may-alias sets. This causes it to appear in virtual operands, but since it had not been marked for renaming, we fail. I traced the problem back to the building of vtables. I'm simply calling cxx_mark_addressable after building the ADDR_EXPR (I'm wondering if building ADDR_EXPR shouldn't just call langhooks.mark_addressable). Another way of addressing this would be to mark symbols addressable during referenced var discovery. But that is a bit hacky. Mark, does this look OK? (not tested yet)

Index: cp/class.c
===
--- cp/class.c	(revision 123332)
+++ cp/class.c	(working copy)
@@ -7102,6 +7102,7 @@
       /* Figure out the position to which the VPTR should point.  */
       vtbl = TREE_PURPOSE (l);
       vtbl = build1 (ADDR_EXPR, vtbl_ptr_type_node, vtbl);
+      cxx_mark_addressable (vtbl);
       index = size_binop (PLUS_EXPR,
			   size_int (non_fn_entries),
			   size_int (list_length (TREE_VALUE (l))));
Re: GCC 4.2.0 Status Report (2007-03-22)
Jason Merrill wrote on 03/30/07 11:45: > Looks fine to me. Many places in the front end use build_address rather > than build1 (ADDR_EXPR) to avoid this issue. Yeah, I found other cases in Java and in c-*.c. In one case, we are building the address of a LABEL_DECL for a computed goto (finish_label_address_expr). Interestingly enough, mark_addressable refuses to mark the label as addressable, but we need the label addressable so that it's processed properly by the compute_may_aliases machinery. Given that we need to be very consistent about addressability marking in the FEs, wouldn't we be better off doing this in build1_stat()?

Index: tree.c
===
--- tree.c	(revision 123332)
+++ tree.c	(working copy)
@@ -2922,7 +2922,11 @@ build1_stat (enum tree_code code, tree t
     case ADDR_EXPR:
       if (node)
-	recompute_tree_invariant_for_addr_expr (t);
+	{
+	  recompute_tree_invariant_for_addr_expr (t);
+	  if (DECL_P (node))
+	    TREE_ADDRESSABLE (node) = 1;
+	}
       break;

     default:

Thanks.
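For reference, the computed-goto case mentioned above looks like this in user code (a hypothetical example using GCC's '&&' label-address extension, which is what produces an ADDR_EXPR of a LABEL_DECL in the front end):

```c
/* GNU C extension: '&&label' takes the address of a label, and
   'goto *expr' jumps to it.  The labels must therefore be treated
   as addressable by the optimizers.  */
int
dispatch (int op)
{
  static void *table[] = { &&do_zero, &&do_one };

  goto *table[op];

 do_zero:
  return 0;
 do_one:
  return 1;
}
```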
Re: GCC 4.2.0 Status Report (2007-03-22)
Mark Mitchell wrote on 03/30/07 12:22: > So, I think the right fix is (a) the change above, (b) remove the > TREE_ADDRESSABLE setting from mark_vtable_entries (possibly replacing it > with an assert.) After removing the papering over TREE_ADDRESSABLE we were doing in the aliaser, I found that other users of ADDR_EXPR were not consistently setting the addressability bit. This led me to this patch, which I'm now testing. It removes the workaround we had in the aliaser and consistently marks every DECL that's put in an ADDR_EXPR as addressable. One thing I'm wondering about this patch is why this hasn't been done before. We seem to purposely separate TREE_ADDRESSABLE from ADDR_EXPR. Perhaps to prevent pessimistic assumptions? The current aliasing code removes addressability when it can prove otherwise. This patch bootstraps all default languages. I'll test Ada later on, but I need input from all the FE folks. Thanks.

2007-03-30  Diego Novillo  <[EMAIL PROTECTED]>

	* tree.c (build1_stat): When building ADDR_EXPR of a DECL,
	mark it addressable.
	* tree-ssa-alias.c (add_may_alias): Assert that ALIAS may be
	aliased.
	* c-typeck.c (c_mark_addressable): Handle LABEL_DECL.

Index: tree.c
===
--- tree.c	(revision 123332)
+++ tree.c	(working copy)
@@ -2922,7 +2922,11 @@ build1_stat (enum tree_code code, tree t
     case ADDR_EXPR:
       if (node)
-	recompute_tree_invariant_for_addr_expr (t);
+	{
+	  recompute_tree_invariant_for_addr_expr (t);
+	  if (DECL_P (node))
+	    lang_hooks.mark_addressable (node);
+	}
       break;

     default:

Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c	(revision 123332)
+++ tree-ssa-alias.c	(working copy)
@@ -2045,11 +2045,7 @@ add_may_alias (tree var, tree alias)
   gcc_assert (var != alias);

   /* ALIAS must be addressable if it's being added to an alias set.  */
-#if 1
-  TREE_ADDRESSABLE (alias) = 1;
-#else
   gcc_assert (may_be_aliased (alias));
-#endif

   if (v_ann->may_aliases == NULL)
     v_ann->may_aliases = VEC_alloc (tree, gc, 2);

Index: c-typeck.c
===
--- c-typeck.c	(revision 123332)
+++ c-typeck.c	(working copy)
@@ -3247,6 +3247,7 @@ c_mark_addressable (tree exp)
       /* drops in */
     case FUNCTION_DECL:
+    case LABEL_DECL:
       TREE_ADDRESSABLE (x) = 1;
       /* drops out */
     default:
Re: GCC 4.2.0 Status Report (2007-03-22)
Diego Novillo wrote on 03/30/07 13:21: > This patch bootstraps all default languages. I'll test Ada later on, > but I need input from all the FE folks. Sigh. I forgot to include Mark's suggestion in the patch. With this patch, calling build_address in dfs_accumulate_vtbl_inits is not strictly required (because we mark the DECL addressable in build1 now), but I will include it in the final version.
Re: GCC 4.2.0 Status Report (2007-03-22)
Richard Kenner wrote on 03/30/07 13:45: > One concern I have in marking a DECL addressable that early on is that > it may stay "stuck" even if the ADDR_EXPR is later eliminated. This can > be common in inlined situations, I thought. The aliaser is fairly aggressive at removing TREE_ADDRESSABLE from variables that do not need it anymore, so that should not be a problem. > We *do* have to make up our mind, of course, on a precise time when it's > set and be very clear about whether we can reset it (and how) if we > discover later that the address actually isn't being taken. Agreed.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Richard Guenther wrote on 04/10/07 08:01: > It looks decent, but I also would go one step further with location > information and > PHI node "canonicalization". What would be the "step further" for PHI nodes? We haven't really left anything to spare inside GS_PHI. For insn locators, the idea is to simply move the RTL insn locator code into GIMPLE. This can be even done early in the implementation process, but for simplicity we left it for later. If someone wants to work on that, then great. > Further for memory usage we may want to use > available padding on gimple_statement_base as flags or somehow trick gcc to > use > tail-padding for inheritance... There is only going to be padding on 64 bit hosts. Instructions with no subcode will use those bits as bitflags. > For traversal speed I'd also put the chain first in the structure: Sure. That's easy enough to experiment with while doing the implementation.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Andrew Pinski wrote on 04/10/07 01:43: > Yes clobbers for asm are strings and don't really need to be full > trees. This should help out more. though it does make it harder to > implement. Hmm, could be. The problem is that __asm__ are so infrequent that I'm not sure it's worth the additional headache. > For "GS COND", you forgot about all the unorder conditionals which > don't happen that often but can. Good point. Thanks. > Most of the labels can go away with better handling of the CFG. Gotos > really should pointing to the basic block rather than labels. Remember that when we emit GIMPLE, we are not working with a CFG. One could argue that we would want to make the GOTO target a union of a label and block, but a label-to-block map is just as good, and easier to work with. > I also think we can improve our current gimplification which produces > sometimes ineffient gimplification which will change your numbers of > how many copies exist, see PRs 27798, 27800, 23401, 27810 (this one > especially as combine.i numbers show that). True. But remember that the stats were gathered inside the SSA verification routines, which happens during optimization. All those gimplification inefficiencies are quickly eliminated by the first few scalar cleanups. I very much doubt that you would see significantly different instruction distribution profiles if you made improvements in the gimplifier. > Also I noticed in your pdf, you have "PHI NODE" as 12%, we can improve > the memory usage for this statement by removing the usage of > TREE_CHAIN/TREE_TYPE, so we can save 4/8 bytes for those 12% without > doing much work. I can send a patch in the next week or so (I am busy > at a conference the next two days but I can start writting a patch > tomorrow). Well, if you want to do this patch for 4.3 and it gives us sufficient benefit, go for it. But it will have no effect on the tuples branch.
Re: RFC: GIMPLE tuples. Design and implementation proposal
J.C. Pizarro wrote on 04/10/07 01:24: > 1. Are there fields for flags, annotations, .. for special situations? Yes, some instructions have no sub-codes and will use the subcodes field for flags. Annotations and such will be discouraged as much as possible. If an attribute is very frequently used and causes significant degraded behaviour in pointer-maps or hash tables, then we can see about adding it somewhere. > 2. These structures are poorly specified. > Have they advanced structures like lists, e.g., list of predecessors >instructions of loops, predecessors instructions of forwarded >jumps, etc. instead of poor "prev"? I think you are missing the point. This structure defines the bits needed for representing GIMPLE. All we need is a double-chain of instructions. These chains are embedded inside basic blocks. > 3. Are there fields for more debug information? More debug information? What debug information are you looking for?
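A heavily hedged sketch of what such a tuple base might look like (field names inferred from this thread, not copied from the actual proposal document):

```
/* Hypothetical sketch of the proposed statement base.  */
struct gimple_statement_base
{
  enum gimple_code code;   /* GS_ASSIGN, GS_COND, GS_PHI, ...  */
  unsigned int subcode;    /* e.g. EQ_EXPR for a GS_ASSIGN; reused
                              as flag bits when a code has no subcodes */
  gimple prev, next;       /* the double chain, embedded in basic blocks */
  location_t locus;        /* candidate for replacement by insn locators */
};
```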
Re: RFC: GIMPLE tuples. Design and implementation proposal
J.C. Pizarro wrote on 04/10/07 02:08: > However, they've appeared the "conditional moves" to don't jump > and consecuently to reduce the penalization of the conditional jump. We already have conditional moves. Notice that subcodes for GS_ASSIGN have the same meaning as they do today. GS_ASSIGN:{EQ_EXPR,NE_EXPR...} will have three operands.
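As a toy illustration (mine, not from the proposal): a source-level comparison assignment is exactly the three-operand form such a tuple would encode.

```c
/* 't = a == b' would become something like
   GS_ASSIGN <EQ_EXPR, t, a, b> in the proposed tuples.  */
int
is_equal (int a, int b)
{
  int t;
  t = a == b;
  return t;
}
```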
Re: RFC: GIMPLE tuples. Design and implementation proposal
Steven Bosscher wrote on 04/10/07 02:43: > On 4/10/07, Diego Novillo <[EMAIL PROTECTED]> wrote: >> Thoughts/comments on the proposal? > > This looks a lot like the RTL insn! > > For locus, you can use just an "int" instead of a word if you use the > same representation for locations as we do for RTL (INSN_LOCATOR). You > mention this step as "straightforward to implement" after the > conversion to tuples is complete. Why did you decide not to just use > the scheme right from the start? One less thing to think about. That's the only reason, if anyone wants to work on this from day 1, then great. But since this is easily fixable afterwards, and we'll already have enough headaches with the basic conversion, it seemed simpler just to let this be for now. > this: You can use the same locator information in GIMPLE as in RTL, > which saves a conversion step; and you free up 32 bits on 64-bits > hosts, which is nice when you add the inevitable so-many-bits for > flags to the GIMPLE tuples ;-) Absolutely. The advantages are very clear. Location information at every insn is extremely redundant. > I don't really like the idea for promoting subcodes to first-level > codes, like you do for GS_COND NE and EQ. Looks complicated and > confusing to me. What is the benefit of this? Mostly, speed of recognition. I'm not totally against dropping this. As Andrew M. mentioned during our internal discussions, we will now have to implement predicates that recognize all the insns in the "GS_COND" family. This is something that we can do some experimentation. > Looks like a nice document overall. I hope we can keep it up to date, > it's a good start for new-GIMPLE documentation ;-) That's certainly a challenge. I could mumble something about doxygen and how much easier it would be if we could embed this in the source code. But the synchronization problem still remains. Many times we have comments totally out of sync with the code.
Re: RFC: GIMPLE tuples. Design and implementation proposal
J.C. Pizarro wrote on 04/10/07 08:17: > There is a need to build several HTML tables of the codes (with subcodes). > Each table would have an explanation. It's like a roadmap. Hmm, what?
Re: RFC: GIMPLE tuples. Design and implementation proposal
J.C. Pizarro wrote on 04/10/07 10:24: > For example, weights, use frequencies, statistical data, ... of GIMPLE. > And to debug the GIMPLE too. That's kept separately. Pointer maps, hash tables... > How do you debug failed GIMPLE? Lots of debug_*() functions are available. You also use -fdump-tree-... a lot. In the future, I would like us to be able to inject GIMPLE directly at any point in the pipeline to give us the illusion of unit testing.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Richard Guenther wrote on 04/10/07 10:45: > On 4/10/07, Diego Novillo <[EMAIL PROTECTED]> wrote: >> Steven Bosscher wrote on 04/10/07 02:43: >>> I don't really like the idea of promoting subcodes to first-level >>> codes, like you do for GS_COND NE and EQ. Looks complicated and >>> confusing to me. What is the benefit of this? >> Mostly, speed of recognition. I'm not totally against dropping this. >> As Andrew M. mentioned during our internal discussions, we will now have >> to implement predicates that recognize all the insns in the "GS_COND" >> family. >> >> This is something we can experiment with. > > Will this replace the tree code class, or is it merely sub-classing, where > the sub-code is the real code we have now? Sorry, I don't understand what you are trying to say.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Richard Guenther wrote on 04/10/07 11:02: > Well, we now have a tcc_comparison for example - is the gimple statement code > something like that? Or will the grouping of gimple statement codes > into code classes > persist? If we were able to encode the class directly in the gimple > statement we > could avoid a memory reference to look up the operand's class. For most situations, I would like to avoid the class lookup and be able to go off the statement code directly. I have to admit that I am not totally convinced that this promotion of subcodes to first-level codes is a good idea. Richard suggested it to avoid having to look up the subcode when recognizing frequent codes like copy assignments. But it also means that we now have to add more cases to switch() statements and/or chain predicates with || to determine what kind of GS_ASSIGN we are dealing with. I'm ready to be convinced either way. OTOH, neither approach makes the design drastically different. We could explore both options and get numbers.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Ian Lance Taylor wrote on 04/10/07 13:53: > Don't you need four operands for a conditional move? Is that what you > meant? Ah, yes. The two comparison operands and the two assignment values.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Ian Lance Taylor wrote on 04/10/07 13:53: > I seem to recall that at one point somebody worked on a gensimplify > program or something like that. Would it make sense to revive that > approach, and use it to generate simplifiers for trees, GIMPLE, and > RTL, to avoid triplification of these basic optimizations? Perhaps. This would allow us to define folding/simplification using a pattern matching system. I think I like this better than the other two choices. Replicating fold-const.c for GIMPLE would involve a bit of code duplication, but since GIMPLE is a strict subset of ASTs, I think it would be a fraction of what we have today. Still, it's annoying and we should probably avoid it. > Or should we instead rewrite fold-const.c to work on GIMPLE rather > than trees, thus essentially removing constant folding from the > front-ends? If we follow that path somebody would need to think about > the effect on warnings issued by the front-end, and on > __builtin_constant_p. I don't think we want to do that. Folding and simplification need to be done at just about every level.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Ian Lance Taylor wrote on 04/10/07 14:13: > Diego Novillo <[EMAIL PROTECTED]> writes: > >> Following up on the recent discussion about GIMPLE tuples >> (http://gcc.gnu.org/ml/gcc/2007-03/msg01126.html), we have summarized >> our main ideas and implementation proposal in the attached document. >> >> This should be enough to get the implementation going, but there will be >> many details that still need to be addressed. >> >> Thoughts/comments on the proposal? > > For purposes of LTO, it is essential that we be able to efficiently > load the IR during the whole program optimization phase. Certainly. Now, is GIMPLE going to be our bytecode? Or do we want to invest the time in some other bytecode representation with an eye towards a future JIT component? Regardless of that, streaming GIMPLE is certainly something worth pursuing. Even if it is just to give us the ability to load IL snippets and inject them into the optimizers without having to go through all the intermediate steps. > Part of that is almost certainly going to mean having some sort of > index which will tell us whether to load the IR at all--if the > functions represented in some .o file are rarely called, then we > should use the .o file as-is, and not try to further optimize it. > This is not part of the current LTO plan, but I think it is inevitable > if we are to avoid an hours-long compilation process. > > But there is another part: we need to have an IR which can be very > quickly loaded into memory for further processing. When it comes to > loading IR, there is nothing faster than mmap. That requires that the > IR be stored in the .o file in a form which can be used directly when > it is read in from memory. And ideally that means no pointer > swizzling: the IR should be usable when loaded without modification. > And because the link phase can see arbitrary .o files, we can not use > the PCH hack of requiring a specific memory address. So that requires > an IR which is position independent. 
> > The obvious way to make the proposed tuples position independent would > be to use array offsets rather than pointers. This has the obvious > disadvantage that every access through a pointer requires an > additional memory reference. On the other hand, it has some other > advantages: it may no longer be necessary to keep a previous pointer I doubt this. We had started with singly-linked chains, but reverse traversals do occur, and they were very painful and slow. > in each tuple; we can delete tuples by marking them with a deleted > code, and we can periodically garbage collect deleted tuples and fix > up the next pointers. On a 64-bit system, we do not need to burn 64 > bits for each pointer; 32 bits will be sufficient for an array offset. > > I would like us to seriously think about this approach. Most of the > details would be hidden by accessor macros when it comes to actual > coding. The question is whether we can tolerate some slow down for > normal processing in return for a benefit to LTO. > > If anybody can see how to get the best of both worlds, that would of > course be even better. I've thought about this a little bit and it may not be all that onerous. So, if you take the components of a tuple:

  next    Could be a UID for the next tuple
  prev    Likewise
  bb      Use bb->index here
  locus   Not needed. INSN_LOCATOR.
  block   Likewise.

The operands may get tricky, but perhaps not so much. We have:

  a- _DECLs. These are easily replaced with their UID and a symbol table.
  b- SSA_NAMEs. Just the SSA name version is enough.
  c- *_CONSTs. They're just a bit pattern, no swizzling required. But we may need to define byte ordering.
  d- *_REFs. These may be tricky, but we could emit them using a REF table and just put the index here.

We then reserve the first few bits to distinguish the class of operand and the remaining bits as the index into the respective table.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Daniel Berlin wrote on 04/10/07 15:18: > There is no need for the addresses_taken bitmap, it's a waste of space. Awesome. I was going to check, but I forgot. I did check the stmt_makes_clobbering_call and that one is also write-only. > Neither of these really needs it, and i have a patch to remove it entirely. Excellent. Thanks.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Dave Korn wrote on 04/10/07 15:39: > Reverse-traversing an array really isn't all that painful or slow! Instructions are not laid out in an array. Insertions and deletions happen constantly, so representing basic blocks as arrays would be very expensive. > How about delta-linked lists? Makes your iterators bigger, but makes every > single node smaller. Worth a shot, I guess. I don't recall what other properties these things had.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Andrew Pinski wrote on 04/10/07 16:04: > On 4/10/07, Andrew Pinski <[EMAIL PROTECTED]> wrote: >> Here is the quick patch (thanks to the work done for gimple tuple) >> which does this, removes the unneeded type from phi nodes. I have not >> tested it except for a quick test on some small testcases so there >> might be more places which use TREE_CHAIN instead of PHI_CHAIN. > > This patch has one problem, the GC issue with chain_next and > lang_tree_node, this problem is not hard to fix. I'm not sure what you want me to do with this patch. It's not related to this thread and would not be applicable to the tuples branch. I would suggest that you thoroughly test the patch, measure the benefit and if it's good enough propose it for 4.3.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Richard Henderson wrote on 04/10/07 20:30: > Perhaps I misunderstood what Diego was proposing, but I > would have thought the subcode would continue to be the > tree PLUS_EXPR, and not a GS_PLUS something. Yes. > With that, build_foldN does essentially what we want, > without having to regenerate tree nodes on the input side. Sure, but things will be different if/when the operands stop being 'tree'.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Ian Lance Taylor wrote on 04/10/07 20:49: > I'm having a hard time seeing it. fold_build2 calls fold_binary; I > agree that if we can handle fold_binary, we can handle fold_build2. > But fold_binary takes trees as parameters. How are you thinking of > calling it? The GIMPLE version of z = x + y is stmt -> GS_ASSIGN. We have a wrapper that calls:

  op0 = GIMPLE_OPERAND (stmt, 0);
  op1 = GIMPLE_OPERAND (stmt, 1);
  op2 = GIMPLE_OPERAND (stmt, 2);
  t = fold_build2 (GIMPLE_SUBCODE (stmt), TREE_TYPE (op0), op1, op2);

and then stuffs t's operands back into 'stmt'.
Re: RFC: GIMPLE tuples. Design and implementation proposal
Richard Henderson wrote on 04/10/07 21:19: > On Tue, Apr 10, 2007 at 08:48:27PM -0400, Diego Novillo wrote: >> Sure, but things will be different if/when the operands stop being 'tree'. > > We'll burn that bridge when we come to it. Works for me.
New wiki page on testing compile times and memory usage
I've added a collection of scripts that I have gathered over time to test compile time and memory usage when making changes to the compiler. http://gcc.gnu.org/wiki/PerformanceTesting If you have other scripts or tests that could be used for this, please add them to this page. Thanks.
GIMPLE tuples document uploaded to wiki
I have added the design document and links to most of the discussions we've had so far. Aldy updated the document to reflect the latest thread. http://gcc.gnu.org/wiki/tuples
Re: GIMPLE tuples document uploaded to wiki
Jan Hubicka wrote on 04/14/07 16:14: > Looks great, still I think "locus" and "block" could be both merged into > a single integer, like RTL land has INSN_LOCATOR. That's the idea, but it's simpler to do it this way for now. The insn locator is easily done at any time during the implementation. > Also ssa_operands structures should be somewhere in the header and uid > would be handy for on-side datastructures. No. SSA operands need to go only in the instructions that actually need them. Also, UIDs are tempting but not really needed. I would only consider them if using pointer-maps or hash tables gets outrageously expensive. > In CFG getting rid of labels in GS_COND would actually save us a > noticeable amount of memory by avoiding the need for labels. Perhaps we > can simply define "true branch target"/"false branch target" to point to a > label or BB depending on CFG presence, or to be NULL after CFG conversion > and rely on CFG edges. GS_SWITCH would be harder, since the association with > CFG edges is not so direct. Sure, that would be something to consider.
Re: GIMPLE tuples document uploaded to wiki
Jan Hubicka wrote on 04/14/07 21:14: > I just wondered if your document is documenting the final shape or what > should be done during the first transition. If the second, probably 2 > words should be accounted for location, as source_locus is currently a > structure. The document is describing what the initial implementation will look like. It will/should evolve as the implementation progresses. > So you expect the ssa_operands to be associated via a hashtable Hmm? No, they are there. Notice gimple_statement_with_ops and gimple_statement_with_memory_ops. > Concerning uids, it is always difficult to get some good data on this > sort of thing. It seems to me that the UID would be handy and easy to > bundle with some other integer, but it is not too important, especially if > we get some handy abstraction to map data to statements that we can > easily switch between hashtables and arrays to see the difference, > instead of writing hashtables by hand in every pass doing this. I grepped for uid and IIRC there are only two passes using UIDs: DSE and PRE. We should see how they react to having uid taken away from them. > I have some data for this, let's discuss it at ICE. Sounds good.