Attached is the memory consumption report for a very large source file. Looks like this patch actually reduced the memory consumption by 2%.

Dehao

On Thu, Sep 13, 2012 at 1:18 AM, Xinliang David Li <davi...@google.com> wrote:
> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <de...@google.com> wrote:
>> There are two parts that need memory management:
>>
>> 1. The BLOCK structure. This is managed by GC. I originally thought
>> that removing blocks from tree.gsbase would paralyze GC. This turned
>> out not to be a concern because DECL_INITIAL will still mark those
>> used tree nodes. This patch may decrease the memory consumption by
>> removing blocks from tree/gimple. However, as it makes more blocks
>> become used, they also increase the memory consumption.
>
> You mean when you also make the location table GC root.
>
> Can you share the mem-stats information for the large program with and
> without your patch?
>
> thanks,
>
> David
>
>> 2. The data structure in libcpp that maintains the hashtable for the
>> location->block mapping. This is relatively minor because for the
>> largest source I've seen, it only maintains less than 100K entries in
>> the array (less than 1M total memory consumption). However, as it is a
>> global data structure, it may make LTO unhappy. Honza is helping
>> test the memory consumption on LTO (but we first need to make this
>> patch work for LTO). If the LTO result turns out ok, we probably don't
>> want to put these under GC because: 1. it'll make things much more
>> complicated. 2. using self managed memory is more efficient (as this
>> is frequently used in many passes). 3. not using GC actually saves
>> memory because even though the block is in the map, it can still be
>> GCed as soon as it's not reachable from DECL_INITIAL.
>>
>> I've tested this on some very large C++ files (each one takes more
>> than 10s to build), the memory consumption does not see noticeable
>> increase/decrease.
>>
>> Thanks,
>> Dehao
>>
>> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davi...@google.com>
>> wrote:
>>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>>> <richard.guent...@gmail.com> wrote:
>>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <de...@google.com> wrote:
>>>>> Now I think we are facing a more complex problem. The data structures
>>>>> we use to store the location_adhoc_data are file-static in linemap.c
>>>>> in libcpp. These data structures are not guarded by GTY(()).
>>>>> Meanwhile, we have removed the block data structure from
>>>>> gimple.gsbase as well as tree.exp (encoding them into a location_t).
>>>>> This could cause blocks to be GCed and the LOCATION_BLOCK to become
>>>>> a dangling pointer.
>>>>
>>>> Uh. Note that it is quite important that we are able to garbage-collect
>>>> unused BLOCKs, this is the whole point of removing unused BLOCK scopes in
>>>> remove_unused_locals. So this indeed becomes much more complicated ...
>>>> What would be desired is that the garbage collector can NULL an entry in
>>>> the mapping table when it is not referenced in any other way (that other
>>>> reference would be the BLOCK tree as stored in a FUNCTION_DECL's
>>>> DECL_INITIAL).
>>>
>>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS
>>> are created for a large C++ program. This patch saves memory by
>>> shrinking tree size, is it a net win or loss without GCing those BLOCKS?
>>>
>>> thanks,
>>>
>>> David
>>>
>>>>
>>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>>>> gimple.gsbase.location. However, neither nested_ptr nor mark_hook can
>>>>> help me.
>>>>>
>>>>> Another approach would be to guard the location_adhoc_data and related
>>>>> data structures in GTY(()). However, this is non-trivial because tree
>>>>> is not visible in libcpp. At the same time, my implementation heavily
>>>>> relies on hashtable to make the code efficient, thus it's quite tricky
>>>>> to make "param_is" and "use_params" work.
>>>>>
>>>>> The final approach, which I'll try tomorrow, would be to move all my
>>>>> implementation from libcpp to gcc, and guard them with GTY(()). I
>>>>> still haven't thought of any potential problem of this approach. Any
>>>>> comments?
>>>>
>>>> I think moving the mapping to GC in a lazy manner as I described above
>>>> would be the way to go. For hashtables GC already supports if_marked,
>>>> not sure if similar support is available for arrays/vecs.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <de...@google.com> wrote:
>>>>>> I saw comments in tree-streamer-out.c:
>>>>>>
>>>>>>   /* Do not stream BLOCK_SOURCE_LOCATION. We cannot handle debug information
>>>>>>      for early inlining so drop it on the floor instead of ICEing in
>>>>>>      dwarf2out.c. */
>>>>>>   streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);
>>>>>>
>>>>>> However, what the code is doing seemed contradictory with the comment.
>>>>>> Or am I missing something?
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <m...@suse.de> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote:
>>>>>>>
>>>>>>>> Looks like we have two choices:
>>>>>>>>
>>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t)
>>>>>>>
>>>>>>> This will actually not work correctly in some cases. The problem is, if
>>>>>>> the prevailing decl is already part of another chain (say in another
>>>>>>> block_var list) you would break the current chain. Hence block vars need
>>>>>>> special handling in the lto streamer (another reason why tree_chain is not
>>>>>>> the most clever thing to use for this chain). This problem area needs to
>>>>>>> be solved somehow if block info is to be preserved correctly.
>>>>>>>
>>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL
>>>>>>>> (TREE_CHAIN (t)).
>>>>>>>
>>>>>>> That's also a large hammer as it basically will mean no debug info after
>>>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve
>>>>>>> quite some work, perhaps your hack is good enough for the time being,
>>>>>>> though I hate it :)
>>>>>>
>>>>>> I got it. Then I'll keep the patch as it is (remove the
>>>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and
>>>>>> then we should be good to check in?
>>>>>>
>>>>>> Thanks,
>>>>>> Dehao
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ciao,
>>>>>>> Michael.
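
For readers following the thread, the behaviour Richard asks for (the collector NULLs a mapping-table entry once the BLOCK is no longer referenced from a FUNCTION_DECL's DECL_INITIAL) can be pictured with a small standalone sketch. Everything in it is hypothetical: `toy_block`, `toy_adhoc_table`, `toy_mark_from_root` and `toy_sweep_weak_table` are illustration-only names, not GCC or libcpp APIs, and the real collector is far more involved. The sketch only shows the weak-reference idea, where marking starts from a DECL_INITIAL-like root and the table itself keeps nothing alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BLOCK node: a tree of lexical scopes. */
struct toy_block {
  struct toy_block *subblocks;  /* first nested scope */
  struct toy_block *chain;      /* next sibling scope */
  bool marked;                  /* set by the mark phase */
};

#define TOY_TABLE_SIZE 8

/* Hypothetical location->block table.  It is deliberately NOT a GC root:
   being listed here does not keep a block alive.  */
struct toy_adhoc_table {
  struct toy_block *block[TOY_TABLE_SIZE];
};

/* Mark ROOT, its siblings and all nested scopes.  ROOT plays the role of
   the BLOCK tree hanging off DECL_INITIAL.  */
static void toy_mark_from_root(struct toy_block *root)
{
  for (struct toy_block *b = root; b != NULL; b = b->chain) {
    if (b->marked)
      continue;
    b->marked = true;
    toy_mark_from_root(b->subblocks);
  }
}

/* Weak sweep: clear every entry whose block was not marked, so no
   dangling pointer survives once the block is freed.  Returns the number
   of entries cleared.  */
static size_t toy_sweep_weak_table(struct toy_adhoc_table *table)
{
  size_t cleared = 0;

  assert(table != NULL);
  for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
    if (table->block[i] != NULL && !table->block[i]->marked) {
      table->block[i] = NULL;
      cleared++;
    }
  }
  return cleared;
}
```

If the table holds an outer block, a block nested in it, and a third block that is not reachable from the outer one, then calling `toy_mark_from_root` on the outer block followed by `toy_sweep_weak_table` leaves the first two entries intact and NULLs the third.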

Number of expanded macros:                     23481
Average number of tokens per macro expansion:     12

Line Table allocations during the compilation process
Number of ordinary maps used:         1451
Ordinary map used size:                56k
Number of ordinary maps allocated:    1638
Ordinary maps allocated size:          63k
Number of macro maps used:               0
Macro maps used size:                    0
Macro maps locations size:               0
Macro maps size:                         0
Duplicated maps locations size:          0
Total allocated maps size:             63k
Total used maps size:                  56k

Memory still allocated at the end of the compilation process
Size      Allocated       Used   Overhead
8               88k        64k       2640
16              16M      4726k       356k
32              31M        11M       570k
64              60M        37M       960k
128           7296k      6833k        99k
256             29M        24M       417k
512           1224k       753k        16k
1024            21M        21M       301k
2048           600k       484k       8400
4096          2660k      2660k        36k
8192          1352k      1352k       9464
16384         2560k      2560k       8960
32768          288k       288k        504
65536          512k       512k        448
131072         256k       256k        112
262144        2048k      2048k        448
524288        2560k      2560k        280
1048576       4096k      4096k        224
2097152       2048k      2048k         56
24              48M      7165k       875k
40              66M        24M      1068k
48              21M      5121k       342k
56              14M      4309k       230k
72              30M        16M       431k
80              82M        49M      1149k
88              15M        10M       218k
104             10M      8041k       152k
112          10056k      2527k       137k
120             10M      6083k       146k
184             20M        17M       292k
152           8996k      6841k       122k
160             38M        11M       532k
168             53M        40M       754k
96              17M      6035k       251k
304             54M        49M       761k
136             35M        31M       496k
Total          726M       421M        10M

String pool
entries         187117
identifiers     111112 (59.38%)
slots           262144
deleted         67636
bytes           5237k (17592186044410M overhead)
table size      2048k
coll/search     1.3684
ins/search      0.0853
avg. entry      28.66 bytes (+/- 34.34)
longest entry   236

??? tree nodes created

(No per-node statistics)
Type hash: size 131071, 68058 elements, 1.008263 collisions
DECL_DEBUG_EXPR hash: size 8191, 131 elements, 0.527951 collisions
DECL_VALUE_EXPR hash: size 1021, 16 elements, 0.000000 collisions
no search statistics
decl_specializations: size 65521, 33767 elements, 1.169079 collisions
type_specializations: size 32749, 22750 elements, 1.193797 collisions
No gimple statistics

Alias oracle query stats:
  refs_may_alias_p: 8495 disambiguations, 24866 queries
  ref_maybe_used_by_call_p: 152203 disambiguations, 27333 queries
  call_may_clobber_ref_p: 151635 disambiguations, 151635 queries

PTA query stats:
  pt_solution_includes: 430 disambiguations, 17991 queries
  pt_solutions_intersect: 1291 disambiguations, 655295 queries

Number of expanded macros:                     23481
Average number of tokens per macro expansion:     12

Line Table allocations during the compilation process
Number of ordinary maps used:         1451
Ordinary map used size:                56k
Number of ordinary maps allocated:    1638
Ordinary maps allocated size:          63k
Number of macro maps used:               0
Macro maps used size:                    0
Macro maps locations size:               0
Macro maps size:                         0
Duplicated maps locations size:          0
Total allocated maps size:             63k
Total used maps size:                  56k

Memory still allocated at the end of the compilation process
Size      Allocated       Used   Overhead
8               88k        64k       2640
16              17M      4640k       376k
32              48M        13M       865k
64              51M        34M       820k
128           8596k      6686k       117k
256             27M        23M       385k
512           1184k       753k        16k
1024            21M        21M       299k
2048           592k       484k       8288
4096          2660k      2660k        36k
8192          1352k      1352k       9464
16384         2560k      2560k       8960
32768          288k       288k        504
65536          512k       512k        448
131072         256k       256k        112
262144        1792k      1792k        392
524288        3072k      3072k        336
1048576       4096k      4096k        224
2097152       2048k      2048k         56
24              55M      7228k       995k
40              61M        22M       990k
48              19M      6224k       318k
56              14M      4400k       238k
72              45M        17M       643k
80              65M        42M       920k
88              12M        10M       172k
104             10M      8087k       145k
112          10124k      2468k       138k
120             10M      5958k       141k
184             21M        17M       295k
152           8348k      6826k       114k
160             19M      6792k       275k
168             54M        40M       757k
96              18M      6030k       259k
304             54M        49M       762k
136             33M        31M       473k
Total          710M       407M        10M

String pool
entries         187117
identifiers     108460 (57.96%)
slots           262144
deleted         70040
bytes           5133k (17592186044410M overhead)
table size      2048k
coll/search     1.3759
ins/search      0.0847
avg. entry      28.09 bytes (+/- 34.18)
longest entry   236

??? tree nodes created

(No per-node statistics)
Type hash: size 131071, 68058 elements, 1.009220 collisions
DECL_DEBUG_EXPR hash: size 8191, 131 elements, 0.457233 collisions
DECL_VALUE_EXPR hash: size 1021, 15 elements, 0.033784 collisions
no search statistics
decl_specializations: size 65521, 33767 elements, 1.169079 collisions
type_specializations: size 32749, 22750 elements, 1.193797 collisions
No gimple statistics

Alias oracle query stats:
  refs_may_alias_p: 8495 disambiguations, 24866 queries
  ref_maybe_used_by_call_p: 152240 disambiguations, 27333 queries
  call_may_clobber_ref_p: 151672 disambiguations, 151672 queries

PTA query stats:
  pt_solution_includes: 430 disambiguations, 17991 queries
  pt_solutions_intersect: 1291 disambiguations, 655295 queries
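
As a quick sanity check on the 2% figure, the Total rows of the two reports (726M vs. 710M allocated, 421M vs. 407M used) can be compared directly. This assumes the first report is the baseline and the second is the patched build, which the message does not state explicitly. `percent_reduction` is a helper name made up for this note.

```c
#include <assert.h>

/* Percentage by which AFTER is smaller than BEFORE.
   Hypothetical helper, used only to check the figures quoted above.  */
static double percent_reduction(double before, double after)
{
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}
```

Under that assumption, `percent_reduction(726.0, 710.0)` gives roughly 2.2 for allocated memory and `percent_reduction(421.0, 407.0)` gives roughly 3.3 for used memory, which is consistent with the "2%" quoted for the overall consumption.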