Hello,
perhaps I could write a bit more on my longer-term plans. At the moment 30% of Firefox WPA time is taken by streaming trees and another roughly 30% is taken by the inliner. It is a bit annoying but relatively easy to optimize the inliner, but the trees represent a bigger problem.

According to the stats, the average tree is streamed in 20 times, and according to perf we spend about 1/4 of the time unpacking the sections; beyond that, the actual reading of fields and SCC unification dominate. At the low level, tree streaming is already pretty well optimized. I started to look into the following:

1) Putting types & decls on a diet

I started to move individual fields into more fitting locations, getting rid of one field used for many different reasons. I am trying to do this incrementally, keeping a flow of about one field per week. Currently I am stuck at:
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01969.html (moving DECL_ARGUMENTS).

The plan is to
  - get rid of decl_with_vis. I removed all fields except for the symbol table pointer (that will stay) and some flags I plan to handle soon: comdat, weak and visibility. (The last one is harder because the C++ FE uses it on type declarations, but it is almost done.) The rest of the flags (a few variable/function-specific items that have nothing to do with visibility) can go into decl_common, where there is enough space.
  - get rid of decl_non_common. Here I need to move arguments and results. I have patches for both.
  - do the same on the type side: decompose TYPE_NON_COMMON in favour of an explicit type hierarchy.
  - experiment with getting rid of the RTL pointer. I plan to test moving DECL_RTL into on-side tables (one global for global RTLs and one local for per-function RTLs). This should get us closer to moving RTL into per-function storage again and make RTL easier to reclaim.
  - once done with these, recast the inheritance to have DATA_TYPE and DATA_DECL as the common bases of types/decls that do have data associated with them. Those can carry mode, sizes and alias info that is not needed for functions, labels, type declarations etc. I also wonder if we need canonical types for FUNCTION_TYPE/METHOD_TYPE and other things that are not associated with readable data.

This has a bit of a multiple inheritance issue (which I do not want to introduce), since we have decls with symbol table and decls with data. I think a simple union for that single symtab pointer will do.

In fact I already tested restricting DECL_SIZE & friends to decls with data, but there is a lot of frontend updating to do, as these fields are overridden for many of the FE declarations. (This is the reason why I added FE machinery to allow custom memory storage for newly added decls in the patch above.)

Naturally this is good from a maintenance point of view. It has the potential to reduce memory footprint and streaming size, improve mergeability of trees (if the definition and external declarations look the same in tree decls, we will merge more type variants; currently we keep class types in two copies, one for the unit defining them and another for units using them) and also avoid streaming of stale pointers. But it is slow progress and the direct benefits are limited.

2) Put BINFOs on a diet

BINFOs are currently added to every class type. We can drop them when they hold no useful information for either devirtualization or debug info. This is now quite well defined. The main offender is ipa-prop, which still uses get_binfo_at_offset and walks binfos it should not. I am working on it.

3) ODR type merging

I have patches for this, but want to go a bit carefully: I need to discuss the anonymous types with Jason and get the code for checking ODR violations working well. Basically, for ODR types I can merge variant lists, which results in leaner debug info and a bit less streaming WPA->ltrans. It is also important for type propagation, and I have a prototype that handles canonical types of ODR and anonymous types specially. This actually increases LTO stream sizes (uncompressed) by about 6% to stream explicit mangled names.
My 4.10 with the patch is still faster than 4.9, but I would definitely be happier if there was an easier way around.

4) Reduce size of LTO streams

This is what I was shooting for with the variant streaming (in addition to having a sanity checker for 3, as bugs in these may turn types into a crazy soup quite easily). Types and decls are the most common things to stream and 50% of types are variants, so not streaming duplicated data in variants has a chance to save about 30-40% of type storage. Decls inherit some stuff from types (99% of the time), like DECL_SIZE and friends. In my tests I went from a compression ratio of over 3 to 2.1 while keeping about the same gzipped data, so this speeds up unpacking & rebuilding trees, since direct copies are faster than LTO streamer table lookups.

5) Avoid merging of unmergeable things

This is the patch that drops the hashtable to 1 for things where we know we do not want to merge. This is needed for correctness of ODR types, and it also improves the compression ratio of the streams, as SCC hashes are hard to gzip.

6) Put variable initializers into named sections (as function bodies)

This is supposed to help vtables, but I am always too lazy to dive into the details of our ugly low-level section API.

7) Improve streaming of locations, as discussed several times

Again I am a bit discouraged by the need to make an extra section etc. Location lookup still shows high in the profile.

So these are some of my immediate plans.

Honza
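
P.S. To make the DECL_RTL experiment from 1) a bit more concrete, here is a rough sketch of what the on-side tables could look like: one global table for global RTLs and one per-function table that goes away with the function. This is not GCC code. Decl, Rtl, RtlTable, FunctionContext, set_decl_rtl and decl_rtl are all hypothetical stand-ins invented for illustration, and a real implementation would presumably use GCC's own hash tables and garbage-collected storage rather than std::unordered_map.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; these are NOT GCC's tree or rtx types.
struct Decl { std::string name; bool is_global; };
struct Rtl  { std::string text; };

// Side table: associates a declaration with its RTL, so the declaration
// itself does not need to carry an RTL pointer field.
class RtlTable {
public:
  void set(const Decl *d, Rtl r) { map_[d] = std::move(r); }

  // Returns nullptr when no RTL has been recorded for the declaration.
  const Rtl *get(const Decl *d) const {
    auto it = map_.find(d);
    return it == map_.end() ? nullptr : &it->second;
  }

  std::size_t size() const { return map_.size(); }

private:
  std::unordered_map<const Decl *, Rtl> map_;
};

// The per-function table lives inside the function context, so all local
// RTL is reclaimed in one go when the context is destroyed.
struct FunctionContext { RtlTable local_rtl; };

// The global table lives for the whole compilation.
RtlTable global_rtl;

// Record RTL in the table matching the declaration's scope.
void set_decl_rtl(FunctionContext &fn, const Decl *d, Rtl r) {
  if (d->is_global)
    global_rtl.set(d, std::move(r));
  else
    fn.local_rtl.set(d, std::move(r));
}

// Look up RTL in the table matching the declaration's scope.
const Rtl *decl_rtl(const FunctionContext &fn, const Decl *d) {
  return d->is_global ? global_rtl.get(d) : fn.local_rtl.get(d);
}
```

The point of the split is that dropping a FunctionContext drops every local RTL with it, while lookups for globals keep working from any function. The cost is a hash lookup where there used to be a direct field read, which is why I want to test it first.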