-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The topic of our internal data structures comes up every so often and it
will become particularly important now that we are planning to add
link-time and dynamic optimizations to GCC.

I would like to get started on some initial cleanups that should help
when time comes to either stream GIMPLE (or some other variant) out to
disk or when combining multiple compilation units for interprocedural
optimizations.

The common thread to these cleanups is going to be memory savings, and
to a certain extent compile time improvements.  I'm not looking for any
new optimization opportunities nor improved analyses.

Well, there is one additional feature that I would like to shoot for:
the ability to save GIMPLE out and read it back in at an arbitrary point
in the compilation phase.



1- Memory SSA

The virtual SSA representation is very bloated, particularly in the
presence of imprecise alias information.  This work attempts to compact
that information without risking precision.  This is currently underway
and expected to be integrated by the next stage 1.  More details at
http://gcc.gnu.org/ml/gcc/2006-02/msg00620.html


2- GIMPLE tuples

New data structures for representing GIMPLE statements.  The current IL
uses a central data structure called 'tree'.  Every IL node has this
type, which results in nodes with unused fields.  The basic idea is to
analyze the tree codes used in GIMPLE and create smaller data structures
to represent them.

This will require some exploratory work and details are a bit vague, but
some early discussion and implementation strategies were discussed in
http://gcc.gnu.org/ml/gcc/2006-02/msg00622.html.

* Create a basic 1 or 2 word structure that's basically a code and some
bitfields.  GIMPLE statements and expressions inherit from this structure.

* Introduce the notion of GIMPLE statements and GIMPLE expressions.
Each has attributes that the other does not need.  A statement will have
location information and no type, while an expression will have type and
no location information.

* GIMPLE statements have an array of operands, location info (including
basic block) and an ID.  Each operand will be a GIMPLE expression.

* GIMPLE expressions have 0 or more operands and a type.  SSA names are
GIMPLE expressions with 0 operands.

* Convert compiler temporaries into smaller DECLs and/or emit them
directly as SSA names or just indices into a temporary table.
(http://gcc.gnu.org/ml/gcc/2005-12/msg00632.html,
http://gcc.gnu.org/ml/gcc/2006-01/msg00510.html)

* Move annotations to hash tables or make them part of the IL structures.

I presume that this activity may take a good 8-12 weeks (assuming 40-45
hour weeks).  I will dedicate some of my development time to this,
though my main focus for now is going to be mem-ssa.  I will create a
branch in the next few days to host this work.  Or perhaps mem-ssa could
be used for this, I'm still not sure.  I'd appreciate thoughts/ideas in
this area, and code, of course.


3- Remove on-the-side data structures

A key property of a streamable IL is to be able to recover full
compilation state from an isolated file.  This means that the IL must
have all the information required to create final code out of it.  We
currently keep some on-the-side data structures and/or global variables
that are created at various points starting at the parser.

A litmus test for this would be a pass that can read a program in GIMPLE
form from a file and compile it to completion.  This will force us to
commit to a completely explicit representation: variables, types,
exception handling, etc.

It's not clear yet whether we will want to make GIMPLE itself streamable
for the purposes of optimization.  Perhaps we will end up using some
other variant like the GVM bytecodes proposed in LTO or convert into the
bytecodes used by LLVM.  But I think it should be useful if we could
read/write GIMPLE for developing test cases and general debugging of the
compiler itself.


Anything else I may have missed?  There are other related activities,
like keeping the whole callgraph in SSA form that may also benefit from
these cleanups.  All these cleanups could be either hosted on the same
branch, or done in separate branches.  However, I think concentrating
the work on a single branch may allow better integration and memory
savings testing.


Thanks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)

iD8DBQFEWeulUTa2oAUaiwQRAidEAJ9Zwrku7cHBy/hqiz96rNFiBEH2agCfRajf
L2YfHlJBHjCzoix4gWQhV0Q=
=9Swx
-----END PGP SIGNATURE-----

Reply via email to