-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
The topic of our internal data structures comes up every so often and it will become particularly important now that we are planning to add link-time and dynamic optimizations to GCC. I would like to get started on some initial cleanups that should help when time comes to either stream GIMPLE (or some other variant) out to disk or when combining multiple compilation units for interprocedural optimizations. The common thread to these cleanups is going to be memory savings, and to a certain extent compile time improvements. I'm not looking for any new optimization opportunities nor improved analyses. Well, there is one additional feature that I would like to shoot for: the ability to save GIMPLE out and read it back in at an arbitrary point in the compilation phase. 1- Memory SSA The virtual SSA representation is very bloated, particularly in the presence of imprecise alias information. This work attempts to compact that information without risking precision. This is currently underway and expected to be integrated by the next stage 1. More details at http://gcc.gnu.org/ml/gcc/2006-02/msg00620.html 2- GIMPLE tuples New data structures for representing GIMPLE statements. The current IL uses a central data structure called 'tree'. Every IL node has this type, which results in nodes with unused fields. The basic idea is to analyze the tree codes used in GIMPLE and create smaller data structures to represent them. This will require some exploratory work and details are a bit vague, but some early discussion and implementation strategies were discussed in http://gcc.gnu.org/ml/gcc/2006-02/msg00622.html. * Create a basic 1 or 2 word structure that's basically a code and some bitfields. GIMPLE statements and expressions inherit from this structure. * Introduce the notion of GIMPLE statements and GIMPLE expressions. Each has attributes that the other does not need. A statement will have location information and no type, while an expression will have type and no location information. * GIMPLE statements have an array of operands, location info (including basic block) and an ID. Each operand will be a GIMPLE expression. * GIMPLE expressions have 0 or more operands and a type. SSA names are GIMPLE expressions with 0 operands. * Convert compiler temporaries into smaller DECLs and/or emit them directly as SSA names or just indices into a temporary table. (http://gcc.gnu.org/ml/gcc/2005-12/msg00632.html, http://gcc.gnu.org/ml/gcc/2006-01/msg00510.html) * Move annotations to hash tables or make them part of the IL structures. I presume that this activity may take a good 8-12 weeks (assuming 40-45 hour weeks). I will dedicate some of my development time to this, though my main focus for now is going to be mem-ssa. I will create a branch in the next few days to host this work. Or perhaps mem-ssa could be used for this, I'm still not sure. I'd appreciate thoughts/ideas in this area, and code, of course. 3- Remove on-the-side data structures A key property of a streamable IL is to be able to recover full compilation state from an isolated file. This means that the IL must have all the information required to create final code out of it. We currently keep some on-the-side data structures and/or global variables that are created at various points starting at the parser. A litmus test for this would be a pass that can read a program in GIMPLE form from a file and compile it to completion. This will force us to commit to a completely explicit representation: variables, types, exception handling, etc. It's not clear yet whether we will want to make GIMPLE itself streamable for the purposes of optimization. Perhaps we will end up using some other variant like the GVM bytecodes proposed in LTO or convert into the bytecodes used by LLVM. But I think it should be useful if we could read/write GIMPLE for developing test cases and general debugging of the compiler itself. Anything else I may have missed? There are other related activities, like keeping the whole callgraph in SSA form that may also benefit from these cleanups. All these cleanups could be either hosted on the same branch, or done in separate branches. However, I think concentrating the work on a single branch may allow better integration and memory savings testing. Thanks. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (GNU/Linux) iD8DBQFEWeulUTa2oAUaiwQRAidEAJ9Zwrku7cHBy/hqiz96rNFiBEH2agCfRajf L2YfHlJBHjCzoix4gWQhV0Q= =9Swx -----END PGP SIGNATURE-----