Attached is the memory consumption report for a very large source
file. Looks like this patch actually reduced the memory consumption by
2%.

Dehao

On Thu, Sep 13, 2012 at 1:18 AM, Xinliang David Li <davi...@google.com> wrote:
> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <de...@google.com> wrote:
>> There are two parts that need memory management:
>>
>> 1. The BLOCK structure. This is managed by GC. I originally thought
>> that removing blocks from tree.gsbase would paralyze GC. This turned
>> out not to be a concern because DECL_INITIAL will still mark those
>> used tree nodes. This patch may decrease memory consumption by
>> removing blocks from tree/gimple. However, because it keeps more blocks
>> in use, it also increases memory consumption.
>
> You mean when you also make the location table a GC root?
>
> Can you share the mem-stats information for the large program with and
> without your patch?
>
> thanks,
>
> David
>
>> 2. The data structure in libcpp that maintains the hashtable for the
>> location->block mapping. This is relatively minor because for the
>> largest source I've seen, it maintains fewer than 100K entries in
>> the array (less than 1M total memory consumption). However, as it is a
>> global data structure, it may make LTO unhappy. Honza is helping to
>> test the memory consumption on LTO (but we first need to make this
>> patch work for LTO). If the LTO result turns out ok, we probably don't
>> want to put these under GC because: 1. it'll make things much more
>> complicated. 2. using self managed memory is more efficient (as this
>> is frequently used in many passes). 3. not using GC actually saves
>> memory because even though the block is in the map, it can still be
>> GCed as soon as it's not reachable from DECL_INITIAL.
>>
>> I've tested this on some very large C++ files (each one takes more
>> than 10s to build); memory consumption shows no noticeable
>> increase or decrease.
>>
>> Thanks,
>> Dehao
>>
>> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davi...@google.com> wrote:
>>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>>> <richard.guent...@gmail.com> wrote:
>>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <de...@google.com> wrote:
>>>>> Now I think we are facing a more complex problem. The data structure
>>>>> we use to store the location_adhoc_data are file-static in linemap.c
>>>>> in libcpp. These data structures are not guarded by GTY(()).
>>>>> Meanwhile, we have removed the block data structure from
>>>>> gimple.gsbase as well as tree.exp (encoding them into a location_t).
>>>>> This could cause a block to be GCed and its LOCATION_BLOCK to become
>>>>> a dangling pointer.
>>>>
>>>> Uh.  Note that it is quite important that we are able to garbage-collect
>>>> unused BLOCKs; this is the whole point of removing unused BLOCK scopes in
>>>> remove_unused_locals.  So this indeed becomes much more complicated ...
>>>> What would be desired is that the garbage collector can NULL an entry in
>>>> the mapping table when it is not referenced in any other way (that other
>>>> reference would be the BLOCK tree as stored in a FUNCTION_DECL's
>>>> DECL_INITIAL).
>>>
>>> It would be nice to GC those unused BLOCKs. I wonder how many BLOCKs
>>> are created for a large C++ program. This patch saves memory by
>>> shrinking tree size; is it a net win or loss without GCing those BLOCKs?
>>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>>
>>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>>>> gimple.gsbase.location. However, neither nested_ptr nor mark_hook can
>>>>> help me.
>>>>>
>>>>> Another approach would be to guard the location_adhoc_data and related
>>>>> data structures in GTY(()). However, this is non-trivial because tree
>>>>> is not visible in libcpp. At the same time, my implementation heavily
>>>>> relies on hashtable to make the code efficient, thus it's quite tricky
>>>>> to make "param_is" and "use_params" work.
>>>>>
>>>>> The final approach, which I'll try tomorrow, would be to move all my
>>>>> implementation from libcpp to gcc, and guard it with GTY(()). I
>>>>> haven't yet thought of any potential problems with this approach. Any
>>>>> comments?
>>>>
>>>> I think moving the mapping to GC in a lazy manner as I described above
>>>> would be the way to go.  For hashtables GC already supports if_marked,
>>>> not sure if similar support is available for arrays/vecs.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <de...@google.com> wrote:
>>>>>> I saw comments in tree-streamer-out.c:
>>>>>>
>>>>>>   /* Do not stream BLOCK_SOURCE_LOCATION.  We cannot handle debug information
>>>>>>      for early inlining so drop it on the floor instead of ICEing in
>>>>>>      dwarf2out.c.  */
>>>>>>   streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);
>>>>>>
>>>>>> However, what the code does seems to contradict the comment.
>>>>>> Or am I missing something?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <m...@suse.de> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote:
>>>>>>>
>>>>>>>> Looks like we have two choices:
>>>>>>>>
>>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t)
>>>>>>>
>>>>>>> This will actually not work correctly in some cases.  The problem is, if
>>>>>>> the prevailing decl is already part of another chain (say in another
>>>>>>> block_var list) you would break the current chain.  Hence block vars need
>>>>>>> special handling in the lto streamer (another reason why tree_chain is not
>>>>>>> the most clever thing to use for this chain).  This problem area needs to
>>>>>>> be solved somehow if block info is to be preserved correctly.
>>>>>>>
>>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL
>>>>>>>> (TREE_CHAIN (t)).
>>>>>>>
>>>>>>> That's also a large hammer as it basically will mean no debug info after
>>>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve
>>>>>>> quite some work, perhaps your hack is good enough for the time being,
>>>>>>> though I hate it :)
>>>>>>
>>>>>> I got it. Then I'll keep the patch as it is (remove the
>>>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and
>>>>>> then we should be good to check in?
>>>>>>
>>>>>> Thanks,
>>>>>> Dehao
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ciao,
>>>>>>> Michael.
Number of expanded macros:                     23481
Average number of tokens per macro expansion:     12

Line Table allocations during the compilation process
Number of ordinary maps used:         1451 
Ordinary map used size:                 56k
Number of ordinary maps allocated:    1638 
Ordinary maps allocated size:           63k
Number of macro maps used:               0 
Macro maps used size:                    0 
Macro maps locations size:               0 
Macro maps size:                         0 
Duplicated maps locations size:          0 
Total allocated maps size:              63k
Total used maps size:                   56k

Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8             88k         64k       2640 
16            16M       4726k        356k
32            31M         11M        570k
64            60M         37M        960k
128         7296k       6833k         99k
256           29M         24M        417k
512         1224k        753k         16k
1024          21M         21M        301k
2048         600k        484k       8400 
4096        2660k       2660k         36k
8192        1352k       1352k       9464 
16384       2560k       2560k       8960 
32768        288k        288k        504 
65536        512k        512k        448 
131072        256k        256k        112 
262144       2048k       2048k        448 
524288       2560k       2560k        280 
1048576       4096k       4096k        224 
2097152       2048k       2048k         56 
24            48M       7165k        875k
40            66M         24M       1068k
48            21M       5121k        342k
56            14M       4309k        230k
72            30M         16M        431k
80            82M         49M       1149k
88            15M         10M        218k
104           10M       8041k        152k
112        10056k       2527k        137k
120           10M       6083k        146k
184           20M         17M        292k
152         8996k       6841k        122k
160           38M         11M        532k
168           53M         40M        754k
96            17M       6035k        251k
304           54M         49M        761k
136           35M         31M        496k
Total        726M        421M         10M

String pool
entries         187117
identifiers     111112 (59.38%)
slots           262144
deleted         67636
bytes           5237k (17592186044410M overhead)
table size      2048k
coll/search     1.3684
ins/search      0.0853
avg. entry      28.66 bytes (+/- 34.34)
longest entry   236

??? tree nodes created

(No per-node statistics)
Type hash: size 131071, 68058 elements, 1.008263 collisions
DECL_DEBUG_EXPR  hash: size 8191, 131 elements, 0.527951 collisions
DECL_VALUE_EXPR  hash: size 1021, 16 elements, 0.000000 collisions
no search statistics
decl_specializations: size 65521, 33767 elements, 1.169079 collisions
type_specializations: size 32749, 22750 elements, 1.193797 collisions
No gimple statistics

Alias oracle query stats:
  refs_may_alias_p: 8495 disambiguations, 24866 queries
  ref_maybe_used_by_call_p: 152203 disambiguations, 27333 queries
  call_may_clobber_ref_p: 151635 disambiguations, 151635 queries

PTA query stats:
  pt_solution_includes: 430 disambiguations, 17991 queries
  pt_solutions_intersect: 1291 disambiguations, 655295 queries
Number of expanded macros:                     23481
Average number of tokens per macro expansion:     12

Line Table allocations during the compilation process
Number of ordinary maps used:         1451 
Ordinary map used size:                 56k
Number of ordinary maps allocated:    1638 
Ordinary maps allocated size:           63k
Number of macro maps used:               0 
Macro maps used size:                    0 
Macro maps locations size:               0 
Macro maps size:                         0 
Duplicated maps locations size:          0 
Total allocated maps size:              63k
Total used maps size:                   56k

Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8             88k         64k       2640 
16            17M       4640k        376k
32            48M         13M        865k
64            51M         34M        820k
128         8596k       6686k        117k
256           27M         23M        385k
512         1184k        753k         16k
1024          21M         21M        299k
2048         592k        484k       8288 
4096        2660k       2660k         36k
8192        1352k       1352k       9464 
16384       2560k       2560k       8960 
32768        288k        288k        504 
65536        512k        512k        448 
131072        256k        256k        112 
262144       1792k       1792k        392 
524288       3072k       3072k        336 
1048576       4096k       4096k        224 
2097152       2048k       2048k         56 
24            55M       7228k        995k
40            61M         22M        990k
48            19M       6224k        318k
56            14M       4400k        238k
72            45M         17M        643k
80            65M         42M        920k
88            12M         10M        172k
104           10M       8087k        145k
112        10124k       2468k        138k
120           10M       5958k        141k
184           21M         17M        295k
152         8348k       6826k        114k
160           19M       6792k        275k
168           54M         40M        757k
96            18M       6030k        259k
304           54M         49M        762k
136           33M         31M        473k
Total        710M        407M         10M

String pool
entries         187117
identifiers     108460 (57.96%)
slots           262144
deleted         70040
bytes           5133k (17592186044410M overhead)
table size      2048k
coll/search     1.3759
ins/search      0.0847
avg. entry      28.09 bytes (+/- 34.18)
longest entry   236

??? tree nodes created

(No per-node statistics)
Type hash: size 131071, 68058 elements, 1.009220 collisions
DECL_DEBUG_EXPR  hash: size 8191, 131 elements, 0.457233 collisions
DECL_VALUE_EXPR  hash: size 1021, 15 elements, 0.033784 collisions
no search statistics
decl_specializations: size 65521, 33767 elements, 1.169079 collisions
type_specializations: size 32749, 22750 elements, 1.193797 collisions
No gimple statistics

Alias oracle query stats:
  refs_may_alias_p: 8495 disambiguations, 24866 queries
  ref_maybe_used_by_call_p: 152240 disambiguations, 27333 queries
  call_may_clobber_ref_p: 151672 disambiguations, 151672 queries

PTA query stats:
  pt_solution_includes: 430 disambiguations, 17991 queries
  pt_solutions_intersect: 1291 disambiguations, 655295 queries
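
As a rough cross-check of the 2% figure, the "Total" rows of the two reports above can be compared directly. This is only a sketch: it assumes the first report is the unpatched baseline and the second is the patched build (the thread does not say which is which), and `percent_saved` is an illustrative helper name, not anything from GCC.

```python
def percent_saved(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100.0


# "Total" rows copied from the two reports, in MB.
# Assumption: first report = without the patch, second = with the patch.
first_report = {"allocated": 726, "used": 421}
second_report = {"allocated": 710, "used": 407}

for column in ("allocated", "used"):
    saved = percent_saved(first_report[column], second_report[column])
    print(f"{column}: {saved:.1f}% lower in the second report")
```

Under that assumption the allocated total drops by about 2.2% and the used total by about 3.3%, which is in line with the figure quoted at the top of this message.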
