Re: Optimize type streaming

Jan Hubicka Wed, 09 Jul 2014 01:59:29 -0700

Hello,
perhaps I could write a bit more on my longer-term plans.  At the moment 30% of
firefox WPA is taken by streaming trees and another roughly 30% is taken by the
inliner.  It is a bit annoying but relatively easy to optimize the inliner, but
trees represent a bigger problem.


According to the stats, the average tree is streamed in 20 times, and according
to perf we spend about 1/4 of the time unpacking the sections; after that, the
actual reading of fields & SCC unification dominates.  At a low level, tree
streaming is already pretty well optimized.

I started to look into the following:

1) Putting types & decls on a diet

   I started to move individual fields into more fitting locations, getting rid
   of single fields that are used for many different purposes.  I am trying to
   do this incrementally, keeping a flow of about one field per week.  Currently
   I am stuck at:

   https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01969.html

   (moving DECL_ARGUMENTS).  The plan is to
   - get rid of decl_with_vis.  I removed all fields except for the symbol table
     pointer (that will stay) and some flags I plan to handle soon: comdat, weak
     and visibility.  (The last one is harder because the C++ FE uses it on type
     declarations, but it is almost done.)
     The rest of the flags (a few variable/function-specific items that have
     nothing to do with visibility) can go into decl_common, where there is
     enough space.
   - get rid of decl_non_common
     Here I need to move arguments and results. Have patches for both.
   - I plan to do the same on the type side: decompose TYPE_NON_COMMON in favour
     of an explicit type hierarchy.
   - experiment with getting rid of the RTL pointer

     I plan to test moving DECL_RTL into on-the-side tables (one global for
     global RTLs and one local for per-function RTLs).  This should get us
     closer to moving RTL into per-function storage again and make RTL easier
     to reclaim.
   - Once done with these I can recast the inheritance to have DATA_TYPE and
     DATA_DECL as the common bases of types/decls that do have data associated
     with them.  Those can carry mode, sizes and alias info that are not needed
     for functions, labels, type declarations etc.

     I also wonder if we need canonical types for FUNCTION_TYPE/METHOD_TYPE and
     other things that are not associated with readable data.

     This has a bit of a multiple inheritance issue (that I do not want to
     introduce), since we have decls with symbol table and decls with data.  I
     think a simple union for that single symtab pointer will do.  In fact I
     already tested restricting DECL_SIZE & friends to decls with data, but
     there is a lot of frontend updating to do, as these fields are overridden
     for many of the FE declarations.  (It is the reason why I added the FE
     machinery to allow custom memory storage for newly added decls in the patch
     above.)

   Naturally this is good from a maintenance point of view; it has the potential
   to reduce memory footprint and streaming size, improve mergeability of trees
   (if the definition and external declarations look the same in tree decls, we
   will merge more type variants, because currently we keep class types in two
   copies, one for the unit defining them and the other for units using them)
   and also avoid streaming of stale pointers.  But it is slow progress and the
   direct benefits are limited.
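
As a rough illustration of the side-table idea for DECL_RTL mentioned above,
here is a minimal sketch.  It is not GCC code: decl, rtl, rtl_side_table,
function_ctx and lookup_rtl are all hypothetical stand-ins, and a standard
hash map is assumed where GCC would use its own containers.  The point is only
that per-function RTL can be dropped in one step once it is no longer reachable
through a pointer in every decl.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-ins for GCC's tree decl and rtx; not the real types.
struct decl { int uid; };
struct rtl  { int regno; };

// A side table mapping a decl to its RTL, replacing a pointer field that
// would otherwise sit in every decl.
class rtl_side_table {
public:
  void set(const decl *d, rtl *r) { map_[d] = r; }

  rtl *get(const decl *d) const {
    auto it = map_.find(d);
    return it == map_.end() ? nullptr : it->second;
  }

  // Dropping the table forgets all RTL recorded in it at once.
  void clear() { map_.clear(); }
  std::size_t size() const { return map_.size(); }

private:
  std::unordered_map<const decl *, rtl *> map_;
};

// One global table for global RTL, plus one table owned by each function.
rtl_side_table global_rtl;
struct function_ctx { rtl_side_table local_rtl; };

// Look in the current function's table first, then fall back to the global one.
rtl *lookup_rtl(const function_ctx &fn, const decl *d) {
  if (rtl *r = fn.local_rtl.get(d))
    return r;
  return global_rtl.get(d);
}
```

In this sketch, clearing fn.local_rtl when a function is finished makes its
RTL unreachable without touching the decls themselves.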

2) Put BINFOs on a diet

   BINFOs are currently added to every class type.  We can drop them in case
   they do not hold useful information for either devirtualization or debug
   info.  This is now quite well defined.  The main offender is ipa-prop, which
   still uses get_binfo_at_offset and walks binfos it should not.  I am working
   on it.

3) ODR type merging

   I have patches for this, but want to go a bit carefully - I need to discuss
   the anonymous types with Jason and get the code for checking ODR violations
   working well.

   Basically for ODR types I can merge variant lists, which results in leaner
   debug info and a bit less streaming WPA->ltrans.
   It is also important for type propagation, and I have a prototype to handle
   canonical types of ODR and anonymous types specially.

   This actually increases LTO stream sizes (uncompressed) by about 6% to
   stream explicit mangled names.  My 4.10 with the patch is still faster than
   4.9, but I definitely would be happier if there was an easier way around.
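
To make the idea of merging ODR types by mangled name a bit more concrete,
here is a small sketch.  It is an assumption-laden toy, not the actual patch:
odr_type, odr_table and the use of a single size field as a stand-in for the
real layout comparison are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified view of an ODR type.
struct odr_type {
  std::string mangled_name;   // explicitly streamed mangled name
  std::size_t size;           // stand-in for a real layout comparison
  std::vector<int> variants;  // ids of this type's variants
  bool odr_violated;          // set when two definitions disagree
};

class odr_table {
public:
  // Return the prevailing type for T's mangled name.  If one already exists,
  // merge T's variant list into it and flag a mismatch as an ODR violation.
  odr_type *merge(odr_type *t) {
    auto res = types_.emplace(t->mangled_name, t);
    odr_type *prevailing = res.first->second;
    if (res.second || prevailing == t)
      return prevailing;
    if (prevailing->size != t->size)
      prevailing->odr_violated = true;
    prevailing->variants.insert(prevailing->variants.end(),
                                t->variants.begin(), t->variants.end());
    return prevailing;
  }

private:
  std::unordered_map<std::string, odr_type *> types_;
};
```

Two units that define the same class then end up sharing one variant list,
and a disagreement between the definitions is recorded rather than silently
merged.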

4) Reduce size of LTO streams

   This is what I was shooting for with the variant streaming (in addition to
   having a sanity checker for 3, as bugs in these may turn types into a crazy
   soup quite easily).
   Types and decls are the most common things to stream, and 50% of types are
   variants, so not streaming duplicated data in variants has a chance to save
   about 30-40% of type storage.
   Decls inherit some stuff from types (99% of the time), like DECL_SIZE and
   friends.

   In my tests I went from a compression ratio of over 3 to 2.1 while keeping
   about the same gzipped data size - so this speeds up unpacking & rebuilding
   trees, since direct copies are faster than LTO streamer table lookups.
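
A minimal sketch of what "not streaming duplicated data in variants" could
look like follows.  The record layout, the field mask and the function names
are assumptions made for the example, not the real LTO stream format: a
variant is written as a mask of differing fields plus only those fields, and
the reader starts from a direct copy of the main variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, flattened type record; real trees have many more fields.
struct type_rec {
  std::uint32_t size, align, mode, quals;
};

// One bit per field that differs from the main variant.
const std::uint8_t F_SIZE = 1, F_ALIGN = 2, F_MODE = 4, F_QUALS = 8;

static void put32(std::vector<std::uint8_t> &out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

static std::uint32_t get32(const std::vector<std::uint8_t> &in,
                           std::size_t &pos) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(in[pos++]) << (8 * i);
  return v;
}

// Write only the fields of V that differ from MAIN_VARIANT.
void write_variant(std::vector<std::uint8_t> &out,
                   const type_rec &main_variant, const type_rec &v) {
  std::uint8_t mask = 0;
  if (v.size != main_variant.size)   mask |= F_SIZE;
  if (v.align != main_variant.align) mask |= F_ALIGN;
  if (v.mode != main_variant.mode)   mask |= F_MODE;
  if (v.quals != main_variant.quals) mask |= F_QUALS;
  out.push_back(mask);
  if (mask & F_SIZE)  put32(out, v.size);
  if (mask & F_ALIGN) put32(out, v.align);
  if (mask & F_MODE)  put32(out, v.mode);
  if (mask & F_QUALS) put32(out, v.quals);
}

// Rebuild a variant: direct copy of the main variant, then patch differences.
type_rec read_variant(const std::vector<std::uint8_t> &in, std::size_t &pos,
                      const type_rec &main_variant) {
  type_rec v = main_variant;
  std::uint8_t mask = in[pos++];
  if (mask & F_SIZE)  v.size = get32(in, pos);
  if (mask & F_ALIGN) v.align = get32(in, pos);
  if (mask & F_MODE)  v.mode = get32(in, pos);
  if (mask & F_QUALS) v.quals = get32(in, pos);
  return v;
}
```

In this toy format a variant that differs only in its qualifiers takes 5 bytes
instead of 16, and reading it back is mostly a struct copy.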

5) Avoid merging of unmergeable things

   This is the patch that drops hashtable to 1 for things where we know we do
   not want to merge.
   This is needed for correctness of ODR types and it also improves the
   compression ratio of the streams, as SCC hashes are hard to gzip.
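
One possible reading of this, sketched under the assumption that "1" is a
reserved hash value streamed for SCCs that should never be unified (the names
scc, UNMERGEABLE_HASH, scc_hash and wants_unification are hypothetical, and
the mixing function is just FNV-1a style, not what GCC uses):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of an SCC of trees: per-member hashes plus a flag
// saying whether merging with SCCs from other units is wanted at all.
struct scc {
  std::vector<std::uint32_t> member_hashes;
  bool mergeable;
};

// Reserved value streamed instead of a real hash for unmergeable SCCs.
const std::uint32_t UNMERGEABLE_HASH = 1;

std::uint32_t scc_hash(const scc &s) {
  if (!s.mergeable)
    return UNMERGEABLE_HASH;        // constant, so it compresses well
  std::uint32_t h = 2166136261u;    // FNV-1a style mixing (an assumption)
  for (std::uint32_t m : s.member_hashes) {
    h ^= m;
    h *= 16777619u;
  }
  // Keep the reserved value free for the unmergeable case.
  return h == UNMERGEABLE_HASH ? h + 1 : h;
}

// The reader can skip the unification lookup entirely for the reserved value.
bool wants_unification(std::uint32_t hash) {
  return hash != UNMERGEABLE_HASH;
}
```

A run of identical constants in the stream is much friendlier to gzip than a
run of well-mixed hash values, which would fit the compression remark above.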

6) Put variable initializers into named sections (as function bodies)

   This is supposed to help vtables, but I am always too lazy to dive into the
   details of our ugly low-level section API.

7) Improve streaming of locations, as discussed several times.  Again I am a bit
   discouraged by the need to make an extra section etc.  Location lookup still
   shows high in the profile.
   
So these are some of my immediate plans.
Honza
