On 11/5/19 12:01 PM, Jan Hubicka wrote:
On 11/5/19 11:36 AM, Jan Hubicka wrote:
Hi,
this patch adds object allocators to manage IPA summaries. This reduces
malloc overhead and fragmentation. I now get a peak memory use of 7.5GB
instead of 10GB for the Firefox WPA, because reduced fragmentation leads to
fewer COWs (copy-on-write page copies) after forks.
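As a rough illustration of why pooling helps, here is a minimal C++ sketch of a fixed-size object pool. It is written in the spirit of GCC's object_allocator but is not its code; the class name simple_object_pool and its allocate/release/live/blocks members are hypothetical. Objects are carved out of large blocks and freed slots are threaded onto an intrusive free list, so many small summaries cost only a few large allocations and freed slots are reused in place instead of fragmenting the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-size object pool (a sketch, not GCC's object_allocator).
// Objects live in large blocks; freed slots are chained into an intrusive
// free list and handed out again before any new block is requested.
template <typename T, std::size_t SlotsPerBlock = 256>
class simple_object_pool
{
  // A slot holds either a live T or, while free, the next free slot.
  union slot
  {
    slot *next_free;
    alignas (T) unsigned char storage[sizeof (T)];
  };

public:
  simple_object_pool () = default;
  simple_object_pool (const simple_object_pool &) = delete;
  simple_object_pool &operator= (const simple_object_pool &) = delete;

  // Construct a T in a free slot, growing the pool by one block if needed.
  template <typename... Args>
  T *allocate (Args &&...args)
  {
    if (m_free_list == nullptr)
      grow ();
    slot *s = m_free_list;
    m_free_list = s->next_free;
    ++m_live;
    return new (s->storage) T (std::forward<Args> (args)...);
  }

  // Destroy OBJ and push its slot onto the free list for reuse.
  void release (T *obj)
  {
    assert (m_live > 0);
    obj->~T ();
    slot *s = reinterpret_cast<slot *> (obj);
    s->next_free = m_free_list;
    m_free_list = s;
    --m_live;
  }

  std::size_t live () const { return m_live; }
  std::size_t blocks () const { return m_blocks.size (); }

  // Note: blocks are freed when the pool dies, but objects still live at
  // that point are not destroyed, so release them first.

private:
  // One large allocation provides SlotsPerBlock slots at once.
  void grow ()
  {
    m_blocks.push_back (std::make_unique<slot[]> (SlotsPerBlock));
    slot *block = m_blocks.back ().get ();
    for (std::size_t i = 0; i < SlotsPerBlock; ++i)
      {
        block[i].next_free = m_free_list;
        m_free_list = &block[i];
      }
  }

  std::vector<std::unique_ptr<slot[]>> m_blocks;
  slot *m_free_list = nullptr;
  std::size_t m_live = 0;
};
```

With simple_object_pool<int, 4>, for example, a released slot is handed out again by the next allocate call, and a fifth simultaneously live object makes the pool request a second block.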
That sounds promising.
An additional bonus is that we now get statistics for these allocations in
the memory reports (-fmem-report), which makes my life easier, too.
What is currently wrong with the detailed memory statistics? I updated the
code so that one should see the allocations of the underlying hash_map and
vec, or is that not the case?
I currently get:
--------------------------------------------------------------------------------------------------------------------------------------------
Pool name                 Allocation pool                                   Pools          Leak    Peak          Times  Elt size
--------------------------------------------------------------------------------------------------------------------------------------------
tree_scc                  lto/lto-common.c:2709 (read_cgraph_and_symbols)       1      0 : 0.0%     99M   3169k: 43.7%        32
IPA histogram             ipa-profile.c:77 (__static_initialization_and_de      1     16 : 0.0%      16       1 : 0.0%        16
IPA-PROP ref descriptions ipa-prop.c:170 (__static_initialization_and_dest      1    226k: 0.3%    226k    9670 : 0.1%        24
function summary          ipa-fnsummary.c:557 (ipa_fn_summary_alloc)            1   6145k: 7.0%   6257k     391k: 5.4%        16
function summary          ipa-pure-const.c:136 (__base_ctor )                   1   6863k: 7.9%   9449k     590k: 8.1%        16
edge predicates           ipa-fnsummary.c:93 (__static_initialization_and_      1   8327k: 9.5%   8385k     209k: 2.9%        40
call summary              ipa-sra.c:436 (__base_ctor )                          1    18M: 21.3%     21M   1393k: 19.2%        16
call summary              ipa-fnsummary.h:276 (__base_ctor )                    1    46M: 54.0%     46M   1483k: 20.5%        32
--------------------------------------------------------------------------------------------------------------------------------------------
Pool name                 Allocation pool                                   Pools          Leak    Peak          Times  Elt size
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                           9           85M
--------------------------------------------------------------------------------------------------------------------------------------------
This is quite readable, though we may want to give the pools more distinctive
names (two are called "function summary" and two "call summary") and update
the constructors. Not a big deal IMO.
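The columns of the pool report can be read as a handful of counters per pool. The sketch below assumes that Leak and Peak are byte counts and Times is the number of allocations, which is consistent with the table: 1483k allocations of 32-byte elements come to about 46M, the peak shown for the ipa-fnsummary.h:276 call summary pool. The pool_usage struct and its methods are hypothetical, not GCC's mem-stats classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical per-pool counters mirroring the report columns above
// (not GCC's mem-stats classes).  Sizes are in bytes.
struct pool_usage
{
  std::string name;         // pool name, e.g. "call summary"
  std::size_t elt_size;     // bytes per element ("Elt size")
  std::size_t current = 0;  // bytes currently allocated
  std::size_t peak = 0;     // high-water mark of current ("Peak")
  std::size_t times = 0;    // number of allocations ("Times")

  pool_usage (std::string n, std::size_t size)
    : name (std::move (n)), elt_size (size) {}

  void on_allocate ()
  {
    ++times;
    current += elt_size;
    if (current > peak)
      peak = current;
  }

  void on_release ()
  {
    assert (current >= elt_size);
    current -= elt_size;
  }

  // Bytes still allocated when the report is printed ("Leak").
  std::size_t leak () const { return current; }

  void dump (std::FILE *out) const
  {
    std::fprintf (out, "%-16s leak %zu  peak %zu  times %zu  elt size %zu\n",
                  name.c_str (), leak (), peak, times, elt_size);
  }
};
```

Three allocations of a 32-byte element followed by one release leave times == 3, peak == 96 and leak () == 64.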
For GGC statistics I see:
varpool.c:137 (create_empty)                        7924k: 0.4%     0 : 0.0%  3214k: 0.2%     0 : 0.0%     87k
cgraph.c:939 (cgraph_allocate_init_indirect_info    8566k: 0.4%     0 : 0.0%  1395k: 0.1%     0 : 0.0%    113k
alias.c:1170 (record_alias_subset)                    12M: 0.6%     0 : 0.0%    12k: 0.0%    99k: 0.1%     12k
ipa-sra.c:2717 (isra_read_node_info)                  12M: 0.6%     0 : 0.0%  4179k: 0.2%    21k: 0.0%    376k
toplev.c:904 (realloc_for_line_map)                   16M: 0.8%     0 : 0.0%    15M: 0.9%   144 : 0.0%      12
ipa-prop.c:278 (ipa_alloc_node_params)                16M: 0.8%   266k: 0.4%     0 : 0.0%    22k: 0.0%    366k
symbol-summary.h:555 (allocate_new)                   18M: 0.9%     0 : 0.0%   119k: 0.0%     0 : 0.0%   1171k
^^^ here we should point to the caller of get_create
ipa-fnsummary.c:3877 (inline_read_section)            28M: 1.4%     0 : 0.0%   552k: 0.0%   392k: 0.3%    261k
lto-section-in.c:388 (lto_new_in_decl_state)          29M: 1.4%     0 : 0.0%    11M: 0.7%     0 : 0.0%    587k
symtab.c:582 (create_reference)                       35M: 1.7%     0 : 0.0%    50M: 2.9%  1199k: 0.9%    541k
symbol-summary.h:64 (allocate_new)                    46M: 2.2%     0 : 0.0%  2445k: 0.1%     0 : 0.0%   1168k
^^^ same here.
stringpool.c:63 (alloc_node)                          47M: 2.3%     0 : 0.0%     0 : 0.0%     0 : 0.0%   1217k
ipa-prop.c:4480 (ipa_read_edge_info)                  51M: 2.4%     0 : 0.0%   260k: 0.0%   404k: 0.3%    531k
hash-table.h:801 (expand)                             81M: 3.9%     0 : 0.0%    80M: 4.7%    88k: 0.1%    3349
^^^ some of the memory accounted here ought to be attributed to the caller
of expand.
Yes, these all come from ggc_internal_alloc. Ideally we should register a
mem_alloc_description for each created symbol or call summary and manually
register every allocation with that descriptor.
stringpool.c:41 (stringpool_ggc_alloc)                92M: 4.4%     0 : 0.0%     0 : 0.0%  6600k: 5.2%   1217k
cgraph.h:2712 (allocate_cgraph_symbol)               148M: 7.1%     0 : 0.0%   115M: 6.7%     0 : 0.0%    767k
cgraph.c:851 (create_edge)                           149M: 7.1%     0 : 0.0%    27M: 1.6%     0 : 0.0%   1743k
ipa-fnsummary.c:3936 (inline_read_section)           174M: 8.3%     0 : 0.0%  4190k: 0.2%   12M: 10.2%    391k
lto/lto-common.c:204 (lto_read_in_decl_state)        200M: 9.6%     0 : 0.0%    65M: 3.8%   19M: 15.5%   1731k
ipa-prop.c:4478 (ipa_read_edge_info)                210M: 10.0%     0 : 0.0%  1361k: 0.1%   17M: 14.4%   1171k
tree-streamer-in.c:631 (streamer_alloc_tree)        647M: 30.8%   55M: 84.5% 1267M: 73.4%   64M: 52.1%     13M
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                                 Leak      Garbage        Freed     Overhead   Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                              2100M:100.0%   65M:100.0% 1726M:100.0%  124M:100.0%     29M
--------------------------------------------------------------------------------------------------------------------------------------------
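Attributing allocations to the caller of get_create or expand, rather than to symbol-summary.h or hash-table.h, amounts to passing the call site down through the helper. Here is a minimal sketch, assuming GCC or Clang, where __builtin_FILE, __builtin_LINE and __builtin_FUNCTION used as default arguments evaluate to the caller's location. The names tracked_alloc, summary_get_create and g_descriptors are hypothetical and this is not GCC's mem-stats code; each distinct call site simply gets its own descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical sketch (not GCC's mem-stats code): account each allocation
// to a descriptor keyed by the caller's "file:line (function)".  With GCC
// and Clang, __builtin_FILE/__builtin_LINE/__builtin_FUNCTION used as
// default arguments are evaluated at the call site.
struct site_stats
{
  std::size_t bytes = 0;  // total bytes charged to this site
  std::size_t times = 0;  // number of allocations from this site
};

static std::map<std::string, site_stats> g_descriptors;

static std::string
site_key (const char *file, int line, const char *func)
{
  return std::string (file) + ":" + std::to_string (line) + " (" + func + ")";
}

// Low-level allocator: charges SIZE bytes to whatever location it is given.
void *
tracked_alloc (std::size_t size, const char *file = __builtin_FILE (),
               int line = __builtin_LINE (),
               const char *func = __builtin_FUNCTION ())
{
  assert (file != nullptr && func != nullptr);
  site_stats &s = g_descriptors[site_key (file, line, func)];
  s.bytes += size;
  ++s.times;
  return std::malloc (size);
}

// A get_create-style helper: it forwards its *own* caller's location, so
// the report names the user of the summary rather than this helper.
void *
summary_get_create (std::size_t size, const char *file = __builtin_FILE (),
                    int line = __builtin_LINE (),
                    const char *func = __builtin_FUNCTION ())
{
  return tracked_alloc (size, file, line, func);
}
```

A direct tracked_alloc call is charged to the line that makes it; a call through summary_get_create is charged to the line that called the helper, so no descriptor ever names summary_get_create itself.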
One very odd thing is that at the end of the Firefox WPA I see:
hash-table.h:801 (expand)                            100M: 2.9%  2088 : 0.0%   193M: 6.4%    90k: 0.0%    3379
tree-ssa-operands.c:265 (ssa_operand_alloc)          104M: 3.0%     0 : 0.0%    39M: 1.3%     0 : 0.0%    105k
stringpool.c:41 (stringpool_ggc_alloc)               106M: 3.1%     0 : 0.0%     0 : 0.0%  7652k: 2.4%   1362k
ipa-fnsummary.c:3936 (inline_read_section)           174M: 5.1%     0 : 0.0%  4190k: 0.1%    12M: 4.0%    391k
^^^ those are size_time_table vectors that ought to be freed.
lto/lto-common.c:204 (lto_read_in_decl_state)        200M: 5.8%     0 : 0.0%    65M: 2.2%    19M: 6.1%   1731k
ipa-prop.c:4478 (ipa_read_edge_info)                 210M: 6.1%     0 : 0.0%  1361k: 0.0%    17M: 5.7%   1171k
^^^ those are jump function vectors that ought to be freed too.
I verified this, and I can confirm that the destructor of

class GTY((for_user)) ipa_edge_args
{
 public:
  /* Default constructor.  */
  ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL)
    {}

  /* Destructor.  */
  ~ipa_edge_args ()
    {
      vec_free (jump_functions);
      vec_free (polymorphic_call_contexts);
    }
  ...
};

is called, and it in turn calls vec_free. I traced it down:
m_reverse_object_map does not contain the pointer, so this is a minor issue
in the allocation tracing. I am pretty sure the memory is released, though.
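The symptom described here (the destructor runs and vec_free releases the vector, yet the report still lists the bytes) is what one would expect if the pointer is missing from the reverse map. A minimal sketch with a hypothetical alloc_tracer class, not GCC's implementation: allocations are charged to a named descriptor and recorded in a pointer-to-owner map, and a release can only be subtracted when the pointer is found there. How the pointer goes missing in GCC is not established here; a buffer reallocated without re-registering its new address would be one possible way.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical allocation tracer with a reverse map from pointer to owning
// descriptor (the idea behind m_reverse_object_map; not GCC's code).
class alloc_tracer
{
public:
  struct usage
  {
    std::size_t allocated = 0;  // bytes registered for this descriptor
    std::size_t freed = 0;      // bytes matched back on release
    std::size_t leak () const { return allocated - freed; }
  };

  // Charge SIZE bytes at PTR to descriptor NAME and remember the owner.
  void register_alloc (void *ptr, std::size_t size, const std::string &name)
  {
    assert (ptr != nullptr);
    m_usage[name].allocated += size;
    m_reverse_map[ptr] = owner{ name, size };
  }

  // Charge a release back to its descriptor.  Returns false if PTR is not
  // in the reverse map: the memory may well be freed, but the report cannot
  // subtract it, so it keeps being listed as a leak.
  bool register_release (void *ptr)
  {
    auto it = m_reverse_map.find (ptr);
    if (it == m_reverse_map.end ())
      return false;
    m_usage[it->second.name].freed += it->second.size;
    m_reverse_map.erase (it);
    return true;
  }

  const usage &get (const std::string &name) { return m_usage[name]; }

private:
  struct owner
  {
    std::string name;
    std::size_t size;
  };

  std::map<std::string, usage> m_usage;
  std::unordered_map<void *, owner> m_reverse_map;
};
```

Registering 24 and 48 bytes under one descriptor and releasing the first leaves 48 bytes reported; releasing an address that was never registered returns false and changes nothing, even though the memory itself may have been freed.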
Martin
cgraph.c:851 (create_edge)                           285M: 8.3%     0 : 0.0%    33M: 1.1%     0 : 0.0%   3141k
cgraph.h:2712 (allocate_cgraph_symbol)              417M: 12.1%     0 : 0.0%   121M: 4.0%     0 : 0.0%   1567k
tree-streamer-in.c:631 (streamer_alloc_tree)        758M: 22.0%   96M: 23.0% 1267M: 41.7%   64M: 20.6%     15M
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                                 Leak      Garbage        Freed     Overhead   Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                              3453M:100.0%  418M:100.0% 3039M:100.0%  313M:100.0%     49M
--------------------------------------------------------------------------------------------------------------------------------------------
I am not sure where the problem is: it is GGC memory and we release
those summaries after inlining, so there should not be any pointers to
them. At worst it should be accounted as garbage, so it may also be an
accounting bug.
I suppose the first thing to try is to set a breakpoint in the GGC walker
for these and see whether they show up in the final GGC collection.
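The suggested check can be pictured with a toy mark phase: walk everything reachable from the roots and flag the moment one watched object is reached, which is where a debugger breakpoint would go. This is a hypothetical sketch (node, mark_result and mark_from_roots are illustrative names), not GCC's GGC walker. A released summary that nothing points to is never marked and would be accounted as garbage; a stale pointer from a live object keeps it reachable.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical toy heap object: it only knows what it points to.
struct node
{
  std::vector<node *> children;
};

struct mark_result
{
  std::unordered_set<const node *> marked;  // everything reached from roots
  bool watched_reached = false;             // did the walk hit WATCHED?
};

// Iterative mark phase from ROOTS; anything left unmarked is garbage.
mark_result
mark_from_roots (const std::vector<node *> &roots, const node *watched)
{
  assert (watched != nullptr);  // the point of this walk is to watch one object
  mark_result r;
  std::vector<node *> worklist (roots.begin (), roots.end ());
  while (!worklist.empty ())
    {
      node *n = worklist.back ();
      worklist.pop_back ();
      if (n == nullptr || !r.marked.insert (n).second)
        continue;  // null or already marked
      if (n == watched)
        r.watched_reached = true;  // a debugger breakpoint would go here
      for (node *child : n->children)
        worklist.push_back (child);
    }
  return r;
}
```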
Honza