On 11/5/19 12:01 PM, Jan Hubicka wrote:
On 11/5/19 11:36 AM, Jan Hubicka wrote:
Hi,
this patch adds object allocators to manage IPA summaries. This reduces
malloc overhead and fragmentation.  I now get peak memory use of 7.5GB
instead of 10GB for firefox WPA, because reduced fragmentation leads to
fewer COWs after forks.
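
For readers unfamiliar with the technique, here is a minimal sketch of an
object pool, assuming fixed-size slots carved out of larger blocks and an
intrusive free list. It is not the allocator from the patch: toy_summary,
toy_object_pool and SlotsPerBlock are hypothetical names used only for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-function summary; not a GCC type.
struct toy_summary
{
  int size;
  int time;
};

// Toy object pool: one heap request per block instead of one per object,
// and released slots are recycled through a free list.
template <typename T, std::size_t SlotsPerBlock = 64>
class toy_object_pool
{
  union slot
  {
    slot *next;                                     // used while the slot is free
    alignas (T) unsigned char storage[sizeof (T)];  // used while it holds a T
  };

  std::vector<std::unique_ptr<slot[]>> m_blocks;
  slot *m_free = nullptr;
  std::size_t m_live = 0;

  // Allocate one more block and thread all of its slots onto the free list.
  void grow ()
  {
    m_blocks.push_back (std::make_unique<slot[]> (SlotsPerBlock));
    slot *block = m_blocks.back ().get ();
    for (std::size_t i = SlotsPerBlock; i-- > 0;)
      {
        block[i].next = m_free;
        m_free = &block[i];
      }
  }

public:
  toy_object_pool () = default;
  toy_object_pool (const toy_object_pool &) = delete;
  toy_object_pool &operator= (const toy_object_pool &) = delete;

  // Construct a T in the next free slot, growing the pool if needed.
  template <typename... Args>
  T *allocate (Args &&...args)
  {
    if (!m_free)
      grow ();
    slot *s = m_free;
    m_free = s->next;
    ++m_live;
    return new (s->storage) T{std::forward<Args> (args)...};
  }

  // Destroy OBJ and put its slot back on the free list.
  // Objects still live when the pool dies are not destroyed (fine for
  // trivially destructible types such as toy_summary).
  void remove (T *obj)
  {
    assert (m_live > 0);
    obj->~T ();
    slot *s = reinterpret_cast<slot *> (obj);
    s->next = m_free;
    m_free = s;
    --m_live;
  }

  std::size_t live_objects () const { return m_live; }
  std::size_t block_count () const { return m_blocks.size (); }
};
```

With this layout many small objects share a few large blocks, which is
the general reason a pool can cut per-object malloc overhead and keep
related data on fewer pages.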

That sounds promising.

Additional bonus is that we now have statistics gathered by mem-reports
which makes my life easier, too.

What's currently bad with the detailed memory statistics? I updated the
code so that one should see the allocations for the underlying hash_map
and vec?

I currently get:

--------------------------------------------------------------------------------------------------------------------------------------------
Pool name                       Allocation pool                                   Pools       Leak            Peak            Times    Elt size
--------------------------------------------------------------------------------------------------------------------------------------------
tree_scc                        lto/lto-common.c:2709 (read_cgraph_and_symbols)      1         0 :  0.0%       99M     3169k: 43.7%          32
IPA histogram                   ipa-profile.c:77 (__static_initialization_and_de     1        16 :  0.0%       16         1 :  0.0%          16
IPA-PROP ref descriptions       ipa-prop.c:170 (__static_initialization_and_dest     1       226k:  0.3%      226k     9670 :  0.1%          24
function summary                ipa-fnsummary.c:557 (ipa_fn_summary_alloc)           1      6145k:  7.0%     6257k      391k:  5.4%          16
function summary                ipa-pure-const.c:136 (__base_ctor )                  1      6863k:  7.9%     9449k      590k:  8.1%          16
edge predicates                 ipa-fnsummary.c:93 (__static_initialization_and_     1      8327k:  9.5%     8385k      209k:  2.9%          40
call summary                    ipa-sra.c:436 (__base_ctor )                         1        18M: 21.3%       21M     1393k: 19.2%          16
call summary                    ipa-fnsummary.h:276 (__base_ctor )                   1        46M: 54.0%       46M     1483k: 20.5%          32
--------------------------------------------------------------------------------------------------------------------------------------------
Pool name                       Allocation pool                                   Pools       Leak            Peak            Times    Elt size
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                 9         85M
--------------------------------------------------------------------------------------------------------------------------------------------

This is quite readable, though we may give them different names and
update constructors. Not a big deal IMO.

For GGC statistics I see:

varpool.c:137 (create_empty)                          7924k:  0.4%        0 :  0.0%     3214k:  0.2%        0 :  0.0%       87k
cgraph.c:939 (cgraph_allocate_init_indirect_info      8566k:  0.4%        0 :  0.0%     1395k:  0.1%        0 :  0.0%      113k
alias.c:1170 (record_alias_subset)                      12M:  0.6%        0 :  0.0%       12k:  0.0%       99k:  0.1%       12k
ipa-sra.c:2717 (isra_read_node_info)                    12M:  0.6%        0 :  0.0%     4179k:  0.2%       21k:  0.0%      376k
toplev.c:904 (realloc_for_line_map)                     16M:  0.8%        0 :  0.0%       15M:  0.9%      144 :  0.0%       12
ipa-prop.c:278 (ipa_alloc_node_params)                  16M:  0.8%      266k:  0.4%        0 :  0.0%       22k:  0.0%      366k
symbol-summary.h:555 (allocate_new)                     18M:  0.9%        0 :  0.0%      119k:  0.0%        0 :  0.0%     1171k
  ^^^ here we should point to the caller of get_create

ipa-fnsummary.c:3877 (inline_read_section)              28M:  1.4%        0 :  0.0%      552k:  0.0%      392k:  0.3%      261k
lto-section-in.c:388 (lto_new_in_decl_state)            29M:  1.4%        0 :  0.0%       11M:  0.7%        0 :  0.0%      587k
symtab.c:582 (create_reference)                         35M:  1.7%        0 :  0.0%       50M:  2.9%     1199k:  0.9%      541k
symbol-summary.h:64 (allocate_new)                      46M:  2.2%        0 :  0.0%     2445k:  0.1%        0 :  0.0%     1168k
  ^^^ same here.

stringpool.c:63 (alloc_node)                            47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
ipa-prop.c:4480 (ipa_read_edge_info)                    51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
hash-table.h:801 (expand)                               81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%     3349
  ^^^ some of the memory shows up here that ought to be accounted to the
  caller of expand.

Yes, these all come from ggc_internal_alloc. Ideally we should register a
mem_alloc_description for each created symbol/call_summary and manually
register every allocation to such a descriptor.
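
A rough sketch of that idea, assuming each summary obtains its own
descriptor keyed by its creation site and charges its allocations there
rather than to the shared allocate_new line. None of this is the
mem-stats API: toy_usage, toy_alloc_registry and toy_call_summary are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical accounting record, loosely modeled on one report row.
struct toy_usage
{
  std::size_t allocated = 0;  // bytes charged to this origin
  std::size_t times = 0;      // number of allocations charged
};

// Hypothetical registry: one descriptor per origin label.
class toy_alloc_registry
{
  std::map<std::string, toy_usage> m_usage;

public:
  // Create the descriptor for ORIGIN on first use and return it.
  // std::map never moves its elements, so the pointer stays valid.
  toy_usage *register_descriptor (const std::string &origin)
  {
    return &m_usage[origin];
  }

  // Return the descriptor for ORIGIN, or nullptr if none was registered.
  const toy_usage *lookup (const std::string &origin) const
  {
    auto it = m_usage.find (origin);
    return it == m_usage.end () ? nullptr : &it->second;
  }
};

// Toy summary that remembers where it was created and charges every
// allocation it makes to that site.
class toy_call_summary
{
  toy_usage *m_desc;

public:
  toy_call_summary (toy_alloc_registry &reg, const std::string &creation_site)
    : m_desc (reg.register_descriptor (creation_site))
  {}

  // Record one allocation of BYTES against this summary's descriptor.
  void account_allocation (std::size_t bytes)
  {
    assert (m_desc != nullptr);
    m_desc->allocated += bytes;
    ++m_desc->times;
  }
};
```

With per-summary descriptors the report would show one row per summary
instance instead of a single row for the shared helper.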

stringpool.c:41 (stringpool_ggc_alloc)                  92M:  4.4%        0 :  0.0%        0 :  0.0%     6600k:  5.2%     1217k
cgraph.h:2712 (allocate_cgraph_symbol)                 148M:  7.1%        0 :  0.0%      115M:  6.7%        0 :  0.0%      767k
cgraph.c:851 (create_edge)                             149M:  7.1%        0 :  0.0%       27M:  1.6%        0 :  0.0%     1743k
ipa-fnsummary.c:3936 (inline_read_section)             174M:  8.3%        0 :  0.0%     4190k:  0.2%       12M: 10.2%      391k
lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  9.6%        0 :  0.0%       65M:  3.8%       19M: 15.5%     1731k
ipa-prop.c:4478 (ipa_read_edge_info)                   210M: 10.0%        0 :  0.0%     1361k:  0.1%       17M: 14.4%     1171k
tree-streamer-in.c:631 (streamer_alloc_tree)           647M: 30.8%       55M: 84.5%     1267M: 73.4%       64M: 52.1%       13M
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 2100M:100.0%       65M:100.0%     1726M:100.0%      124M:100.0%       29M
--------------------------------------------------------------------------------------------------------------------------------------------

One very odd thing is that at the end of WPA of firefox I see:

hash-table.h:801 (expand)                              100M:  2.9%     2088 :  0.0%      193M:  6.4%       90k:  0.0%     3379
tree-ssa-operands.c:265 (ssa_operand_alloc)            104M:  3.0%        0 :  0.0%       39M:  1.3%        0 :  0.0%      105k
stringpool.c:41 (stringpool_ggc_alloc)                 106M:  3.1%        0 :  0.0%        0 :  0.0%     7652k:  2.4%     1362k
ipa-fnsummary.c:3936 (inline_read_section)             174M:  5.1%        0 :  0.0%     4190k:  0.1%       12M:  4.0%      391k
   ^^^ those are size_time vectors that ought to be freed.

lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  5.8%        0 :  0.0%       65M:  2.2%       19M:  6.1%     1731k
ipa-prop.c:4478 (ipa_read_edge_info)                   210M:  6.1%        0 :  0.0%     1361k:  0.0%       17M:  5.7%     1171k
   ^^^ those are jumptables that ought to be freed too.

I verified this and I can confirm that

class GTY((for_user)) ipa_edge_args
{
 public:

  /* Default constructor.  */
  ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL)
    {}

  /* Destructor.  */
  ~ipa_edge_args ()
    {
      vec_free (jump_functions);
      vec_free (polymorphic_call_contexts);
    }

is called, which then calls vec_free. I traced that down and
m_reverse_object_map does not contain the pointer. So this is some minor
issue in allocation tracing, but I'm pretty sure the memory is released.
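
To illustrate why a pointer missing from a reverse map can make released
memory still show up as a leak in the report, here is a toy model. It
assumes frees are credited only when the pointer is found in a
pointer-to-site map. toy_tracker and toy_site_stats are hypothetical and
not the actual mem-stats code.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical per-site counters, loosely modeled on one report row.
struct toy_site_stats
{
  std::size_t allocated = 0;
  std::size_t freed = 0;
  std::size_t leak () const { return allocated - freed; }
};

// Toy tracker: allocations are charged to a site, and frees are
// credited back only if the pointer is found in the reverse map.
class toy_tracker
{
  struct record
  {
    toy_site_stats *site;
    std::size_t bytes;
  };

  std::unordered_map<const void *, record> m_reverse_map;

public:
  // Charge BYTES to SITE.  RECORD_POINTER == false models an allocation
  // path that updates the counters but never registers the pointer.
  void note_alloc (toy_site_stats &site, const void *ptr, std::size_t bytes,
                   bool record_pointer = true)
  {
    site.allocated += bytes;
    if (record_pointer)
      m_reverse_map[ptr] = record{&site, bytes};
  }

  // Credit the free to the owning site.  Returns false when PTR is
  // unknown, in which case no counter changes and the site keeps
  // reporting the bytes as leaked.
  bool note_free (const void *ptr)
  {
    auto it = m_reverse_map.find (ptr);
    if (it == m_reverse_map.end ())
      return false;
    assert (it->second.site != nullptr);
    it->second.site->freed += it->second.bytes;
    m_reverse_map.erase (it);
    return true;
  }
};
```

In this model the unregistered allocation keeps its full size in the leak
column even after the free, which matches the kind of accounting-only
discrepancy described above.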

Martin


cgraph.c:851 (create_edge)                             285M:  8.3%        0 :  0.0%       33M:  1.1%        0 :  0.0%     3141k
cgraph.h:2712 (allocate_cgraph_symbol)                 417M: 12.1%        0 :  0.0%      121M:  4.0%        0 :  0.0%     1567k
tree-streamer-in.c:631 (streamer_alloc_tree)           758M: 22.0%       96M: 23.0%     1267M: 41.7%       64M: 20.6%       15M
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 3453M:100.0%      418M:100.0%     3039M:100.0%      313M:100.0%       49M
--------------------------------------------------------------------------------------------------------------------------------------------

I am not sure where the problem is - it is GGC memory and we release
those summaries after inlining so there should not be any pointers to
them. At worst it should account to garbage, so it may be also some
accounting bug.

I suppose first thing to try is to breakpoint in the ggc walker of these
and see if it shows up in the final ggc.

Honza

