http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
--- Comment #119 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-10-19 09:22:01 UTC ---
Some up-to-date performance data.

WPA now peaks at 3.1GB in top (3261 virt). Overall compile time is 4m32s real, 21m14s user.
GGC memory is GC 2248537k -> 1727826k

WPA time report:
 callgraph optimization :   1.68 ( 1%) usr  0.00 ( 0%) sys   1.70 ( 1%) wall   16008 kB (11%) ggc
 varpool construction   :   0.66 ( 0%) usr  0.02 ( 0%) sys   0.68 ( 0%) wall   55300 kB (39%) ggc
 ipa cp                 :   1.70 ( 1%) usr  0.09 ( 1%) sys   1.79 ( 1%) wall   75845 kB (53%) ggc
 ipa lto gimple out     :   9.40 ( 6%) usr  0.91 (10%) sys  10.36 ( 6%) wall       0 kB ( 0%) ggc
 ipa lto decl in        :  45.99 (29%) usr  1.66 (19%) sys  47.95 (28%) wall 3285797 kB (2315%) ggc
 ipa lto decl out       :  35.61 (22%) usr  1.65 (19%) sys  37.23 (22%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O     :   3.73 ( 2%) usr  0.22 ( 2%) sys   3.95 ( 2%) wall  621046 kB (438%) ggc
 ipa lto decl merge     :   5.75 ( 4%) usr  0.00 ( 0%) sys   5.75 ( 3%) wall     803 kB ( 1%) ggc
 ipa lto cgraph merge   :   2.79 ( 2%) usr  0.02 ( 0%) sys   2.81 ( 2%) wall   27731 kB (20%) ggc
 inline heuristics      :  31.32 (19%) usr  0.13 ( 1%) sys  31.48 (18%) wall  252282 kB (178%) ggc
 TOTAL                  : 161.21            8.82           170.40             141952 kB

(i.e. WPA takes about 60% of the overall compilation time; roughly 1/3 of it is streaming in, 1/3 streaming out, and 1/5 the inliner).
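The fractions quoted above can be rechecked from the wall-clock column of the WPA time report. This is a minimal Python sketch. It assumes that "streaming in" means `ipa lto decl in` and that "streaming out" means `ipa lto decl out` plus `ipa lto gimple out`; that grouping is an assumption, not something the report states, and the variable names are made up for illustration.

```python
# Wall-clock seconds copied from the WPA time report.
wpa_wall = {
    "ipa lto gimple out": 10.36,
    "ipa lto decl in": 47.95,
    "ipa lto decl out": 37.23,
    "inline heuristics": 31.48,
}
wpa_total_wall = 170.40       # TOTAL wall time of the WPA stage
overall_real = 4 * 60 + 32    # 4m32s real for the whole build


def share(seconds, total):
    """Return seconds as a percentage of total, rounded to one decimal."""
    return round(100.0 * seconds / total, 1)


# Assumed grouping (not stated in the report):
#   streaming in  = decl in
#   streaming out = decl out + gimple out
stream_in = wpa_wall["ipa lto decl in"]
stream_out = wpa_wall["ipa lto decl out"] + wpa_wall["ipa lto gimple out"]
inliner = wpa_wall["inline heuristics"]

wpa_share = share(wpa_total_wall, overall_real)
in_share = share(stream_in, wpa_total_wall)
out_share = share(stream_out, wpa_total_wall)
inline_share = share(inliner, wpa_total_wall)

print(f"WPA / overall real time : {wpa_share}%")
print(f"streaming in / WPA      : {in_share}%")
print(f"streaming out / WPA     : {out_share}%")
print(f"inliner / WPA           : {inline_share}%")
```

Under that grouping the shares come out a little below the rounded 1/3, 1/3 and 1/5; counting `ipa lto cgraph I/O` as streaming would move them slightly closer.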
oprofile of streaming in:
  9467   6.8109  lto1             htab_find_slot_with_hash
  9036   6.5008  lto1             inflate_fast
  6608   4.7540  libc-2.11.1.so   memset
  6256   4.5008  libc-2.11.1.so   _int_malloc
  6243   4.4914  lto1             pointer_map_insert
  5694   4.0965  lto1             lto_input_tree
  5014   3.6072  lto1             gt_ggc_mx_lang_tree_node
  4522   3.2533  lto1             streamer_read_tree_bitfields
  4463   3.2108  lto1             ggc_set_mark
  4087   2.9403  opreport         /usr/bin/opreport
  3661   2.6339  lto1             ggc_internal_alloc_stat
  3475   2.5000  lto1             streamer_read_uhwi
  2508   1.8043  lto1             gimple_type_eq
  2418   1.7396  lto1             streamer_read_tree_body
  2310   1.6619  libc-2.11.1.so   memcpy
  2292   1.6489  lto1             streamer_tree_cache_insert_1
  2255   1.6223  libc-2.11.1.so   memcmp
  2119   1.5245  lto1             ht_lookup_with_hash
  1902   1.3684  lto1             iterative_hash_hashval_t
  1885   1.3561  lto1             lto_fixup_types
  1884   1.3554  libc-2.11.1.so   _int_free
  1872   1.3468  lto1             uniquify_nodes
  1842   1.3252  lto1             htab_expand
  1825   1.3130  oprofiled        /usr/bin/oprofiled
  1813   1.3043  lto1             adler32
  1734   1.2475  lto1             htab_hash_string
  1509   1.0856  libc-2.11.1.so   _IO_vfscanf
  1470   1.0576  libc-2.11.1.so   malloc_consolidate

The pointer map and htab entries are still mostly type merging, I believe.

oprofile of inliner:
  8772  37.9215  lto1             edge_badness
  5532  23.9149  lto1             do_estimate_growth_1
  1647   7.1200  lto1             update_caller_keys
  1484   6.4154  lto1             can_inline_edge_p
   744   3.2163  lto1             estimate_calls_size_and_time.isra.32
   509   2.2004  lto1             estimate_edge_size_and_time.constprop.65
   495   2.1399  lto1             fibheap_consolidate
   267   1.1542  lto1             fibheap_extr_min_node
   210   0.9078  lto1             cgraph_maybe_hot_edge_p

I.e. easy to handle by taming down the amount of heap updating.
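One generic way to reduce heap updating is to refresh keys lazily: leave stale entries in the priority queue and recompute a key only when its entry reaches the top. The Python sketch below illustrates that idea with `heapq`. It is not the inliner's code and not necessarily the fix intended here; `pop_best`, `badness`, and the edge names are hypothetical.

```python
import heapq


def pop_best(heap, badness):
    """Pop the entry with the lowest current badness.

    Entries are (cached_key, name) tuples. Keys are refreshed lazily: an
    entry is re-evaluated only when it reaches the top of the heap, so no
    heap update is needed each time something affecting its badness changes.

    Caveat: the result is exact only if badness never decreases while an
    entry sits in the heap. An entry whose key dropped would have to be
    pushed again eagerly.
    """
    while heap:
        cached, name = heapq.heappop(heap)
        current = badness(name)
        if current == cached:
            return name, current
        # The cached key is stale: reinsert with the fresh key and retry.
        heapq.heappush(heap, (current, name))
    return None


# Hypothetical edges with made-up badness values.
costs = {"a->b": 1, "a->c": 2, "b->c": 3}
heap = [(value, name) for name, value in costs.items()]
heapq.heapify(heap)

costs["a->b"] = 5  # badness grew, but the heap is left untouched

first = pop_best(heap, costs.__getitem__)
second = pop_best(heap, costs.__getitem__)
third = pop_best(heap, costs.__getitem__)
print(first, second, third)
```

The trade-off is that each stale entry costs one extra pop and push when it surfaces, in exchange for skipping every intermediate key update.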
Stream out:
 33711  19.7166  lto1             lto1             varpool_node_for_asm
 13947   8.1572  lto1             lto1             decl_assembler_name_equal
  8873   5.1896  lto1             lto1             pointer_map_insert
  8765   5.1264  lto1             lto1             linemap_lookup
  6809   3.9824  lto1             lto1             lto_output_tree
  4931   2.8840  lto1             lto1             inflate_fast
  4718   2.7594  lto1             lto1             streamer_write_uhwi_stream
  3521   2.0593  lto1             lto1             streamer_tree_cache_insert_1
  3340   1.9535  lto1             lto1             splay_tree_splay
  3293   1.9260  lto1             lto1             streamer_pack_tree_bitfields
  3210   1.8774  libc-2.11.1.so   libc-2.11.1.so   memcpy
  3175   1.8570  libc-2.11.1.so   libc-2.11.1.so   _int_malloc

The assembler name lookups will go away once the alias rewrite is finished.

Oprofile of ltrans stage:
 52827   3.3333  lto1             lto1             value_member
 45691   2.8830  libc-2.11.1.so   libc-2.11.1.so   _int_malloc
 42528   2.6835  lto1             lto1             bitmap_set_bit
 41934   2.6460  oprofiled        oprofiled        /usr/bin/oprofiled
 22353   1.4104  libc-2.11.1.so   libc-2.11.1.so   memset
 21573   1.3612  lto1             lto1             htab_find_slot_with_hash
 20936   1.3210  lto1             lto1             ggc_internal_alloc_stat
 19608   1.2372  lto1             lto1             record_reg_classes.constprop.10
 17423   1.0994  lto1             lto1             bitmap_bit_p
 17195   1.0850  lto1             lto1             for_each_rtx_1
 13504   0.8521  libc-2.11.1.so   libc-2.11.1.so   _int_free
 12343   0.7788  lto1             lto1             bitmap_clear_bit
 11826   0.7462  lto1             lto1             constrain_operands

The slowest of the ltrans runs is:
 garbage collection     :   1.69 ( 2%) usr  0.01 ( 0%) sys   1.72 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto gimple in      :   1.52 ( 2%) usr  0.45 ( 9%) sys   1.94 ( 2%) wall  212002 kB (11%) ggc
 ipa lto decl in        :   1.61 ( 2%) usr  0.19 ( 4%) sys   1.81 ( 2%) wall  147115 kB ( 7%) ggc
 cfg cleanup            :   1.46 ( 2%) usr  0.03 ( 1%) sys   1.60 ( 2%) wall    5376 kB ( 0%) ggc
 df live regs           :   2.26 ( 3%) usr  0.03 ( 1%) sys   2.62 ( 3%) wall       0 kB ( 0%) ggc
 tree VRP               :   2.04 ( 2%) usr  0.05 ( 1%) sys   2.34 ( 2%) wall  126142 kB ( 6%) ggc
 tree PTA               :   1.97 ( 2%) usr  0.00 ( 0%) sys   2.43 ( 3%) wall    8733 kB ( 0%) ggc
 tree PRE               :   2.98 ( 3%) usr  0.07 ( 1%) sys   3.83 ( 4%) wall   64875 kB ( 3%) ggc
 tree FRE               :   1.50 ( 2%) usr  0.01 ( 0%) sys   1.98 ( 2%) wall   33609 kB ( 2%) ggc
 expand                 :   4.11 ( 5%) usr  0.11 ( 2%) sys   4.85 ( 5%) wall  138280 kB ( 7%) ggc
 CSE                    :   1.88 ( 2%) usr  0.04 ( 1%) sys   2.16 ( 2%) wall    2764 kB ( 0%) ggc
 CPROP                  :   1.83 ( 2%) usr  0.04 ( 1%) sys   1.87 ( 2%) wall   21657 kB ( 1%) ggc
 integrated RA          :   6.84 ( 8%) usr  0.08 ( 2%) sys   7.30 ( 8%) wall  367479 kB (19%) ggc
 reload                 :   2.47 ( 3%) usr  0.04 ( 1%) sys   2.82 ( 3%) wall    8783 kB ( 0%) ggc
 reload CSE regs        :   2.03 ( 2%) usr  0.01 ( 0%) sys   2.02 ( 2%) wall   19115 kB ( 1%) ggc
 scheduling 2           :   3.08 ( 3%) usr  0.03 ( 1%) sys   3.14 ( 3%) wall    3942 kB ( 0%) ggc
 final                  :  11.46 (13%) usr  1.06 (21%) sys   3.62 ( 4%) wall   40822 kB ( 2%) ggc
 rest of compilation    :   2.97 ( 3%) usr  0.87 (17%) sys   5.22 ( 5%) wall   60101 kB ( 3%) ggc
 unaccounted todo       :   1.35 ( 2%) usr  0.67 (13%) sys   2.37 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL                  :  89.65            5.08            95.59            1962376 kB

Final is surprisingly slow.