Reposting this as well. Ollie
On Wed, Jun 4, 2008 at 9:14 AM, Chris Lattner <[EMAIL PROTECTED]> wrote:
>
> 1) start with all code in memory and see how far you can get. It seems
> that on reasonable developer machines (e.g. 2GB memory) we can handle C
> programs on the order of a million lines of code, or C++ code on the
> order of 400K lines of code, without a problem with LLVM.

This is essentially what the lto branch does today, and I don't see any
reason to disable this feature. In the language of the WHOPR design, the
lto branch supports LGEN + LTRANS, with WPA bypassed completely.

For implementing WPA, my intention is to add a new flag (-fpartition, or
whatever else people think is suitable) to instruct the lto1 front end to
partition (aka repackage) the .o files, execute summary IPA analyses, and
kick off a separate LTRANS phase.

This gives us two modes of operation: one in which all object files are
loaded into memory and optimized using the full array of passes available
to GCC, and one which does some high-level analysis on the whole program,
partitions it into smaller pieces, and does the more detailed analysis and
grunt work on those smaller pieces.

> 2) start leaving function bodies on disk, use lazy accesses, and a cache
> manager to keep things in memory when needed. I think this will let us
> scale to code bases of tens or hundreds of millions of lines. I see no
> reason to take a WHOPR approach just to be able to handle large programs.

In addition to memory consumption, there is also the question of
compilation time. Alternative LTO implementations by HP, Intel, and others
follow this model and spend multiple hours optimizing even moderately
large programs.

Ollie
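P.S. To make the two modes concrete, here is a rough sketch of what the
driver invocations might look like. Treat the spellings as illustrative
only: -flto is the flag the lto branch already uses, while -fpartition is
nothing more than the placeholder proposed above, not a committed
interface.

  # Mode 1 (lto branch today): load every object file into memory at
  # link time and run the full GCC optimization pipeline over the
  # whole program.
  gcc -O2 -flto foo.c bar.c baz.c -o prog

  # Mode 2 (proposed): run summary IPA analyses over the whole program
  # (WPA), partition it into smaller pieces, and hand each piece to a
  # separate LTRANS invocation for detailed optimization.
  gcc -O2 -flto -fpartition foo.c bar.c baz.c -o prog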