http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60828
Bug ID: 60828
Summary: Compile time speedups when using tcmalloc
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: trippels at gcc dot gnu.org

There are noticeable compile time speedups when one links gcc with
tcmalloc. This happens mostly for C++ programs; plain C projects show
little difference.

Here are the compile times for Firefox on my 4-core machine:

Firefox -O3:
glibc malloc: 2806.82s user 126.92s system 349% cpu 13:58.37 total   0%   speedup
tcmalloc:     2707.31s user 129.93s system 358% cpu 13:10.61 total   5.7% speedup
jemalloc:     2708.30s user 175.53s system 354% cpu 13:34.29 total   2.9% speedup

Firefox -flto=4 -O3:
glibc malloc: 3241.66s user 155.71s system 316% cpu 17:54.13 total   0%   speedup
tcmalloc:     3140.43s user 164.22s system 323% cpu 17:01.13 total   4.9% speedup
jemalloc:     3155.74s user 226.63s system 320% cpu 17:35.51 total   1.7% speedup

A simpler example is tramp3d-v4:

glibc malloc:
 % time g++ -w -O3 -march=native tramp3d-v4.cpp
22.30s user 0.34s system 97% cpu 23.301 total

tcmalloc:
 ~ % time g++ -w -O3 -march=native tramp3d-v4.cpp
21.36s user 0.30s system 99% cpu 21.659 total
(~7% speedup)

tcmalloc's built-in heap profiler shows the following (number of
allocated megabytes, including space that has since been deallocated):

markus@x4 ~ % pprof --alloc_space --text /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.0/cc1 /tmp/mybin.hprof_4474.0010.heap
Using local file /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.0/cc1.
Using local file /tmp/mybin.hprof_4474.0010.heap.
Total: 34.3 MB
     7.7  22.6%  22.6%      7.8  22.6% c_common_nodes_and_builtins [clone .cold.171]
     5.7  16.7%  39.3%      5.7  16.7% tree_ssa_lim
     4.3  12.5%  51.8%     10.8  31.5% cpp_classify_number
     3.8  11.1%  62.9%      5.2  15.1% do_endif [clone .lto_priv.2364]
     2.6   7.5%  70.4%      2.6   7.5% _cpp_pop_context
     2.6   7.5%  77.8%      2.6   7.5% cgraph_add_node_removal_hook
     2.2   6.5%  84.3%      2.2   6.5% __gmp_default_allocate
     1.7   5.1%  89.4%      1.7   5.1% rtx_moveable_p [clone .isra.7] [clone .lto_priv.5842]
     1.5   4.2%  93.6%      1.7   5.1% add_exit_phis [clone .lto_priv.5880]
     0.7   2.1%  95.7%      0.7   2.1% ix86_target_macros_internal [clone .lto_priv.7319]
     0.3   0.9%  96.6%      0.3   0.9% init_alias_vars [clone .lto_priv.9038]
     0.3   0.8%  97.4%      0.3   0.8% gimple_fold_builtin
...

And total objects (including deallocated):

Total: 619253 objects
  290259  46.9%  46.9%   290259  46.9% __gmp_default_allocate
   89866  14.5%  61.4%    89866  14.5% rtx_moveable_p [clone .isra.7] [clone .lto_priv.5842]
   74190  12.0%  73.4%   107769  17.4% cpp_classify_number
   66198  10.7%  84.1%    66243  10.7% do_endif [clone .lto_priv.2364]
   44778   7.2%  91.3%    44778   7.2% _cpp_pop_context
   20931   3.4%  94.7%    20939   3.4% simplify_plus_minus [clone .lto_priv.5851]
    8642   1.4%  96.1%    11749   1.9% expand_asm_operands [clone .lto_priv.6838]
    5665   0.9%  97.0%     5801   0.9% c_common_nodes_and_builtins [clone .cold.171]
    4659   0.8%  97.7%     4659   0.8% merge_classes [clone .part.41] [clone .lto_priv.3432]
    3659   0.6%  98.3%     3773   0.6% init_alias_vars [clone .lto_priv.9038]
    2541   0.4%  98.7%     2541   0.4% tree_ssa_lim
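
For reference, the speedup percentages quoted with the Firefox and tramp3d-v4 timings appear to be derived from the wall-clock "total" column rather than from user time. The sketch below reproduces them under that assumption; the helper names (to_seconds, speedup_percent) are made up for illustration and are not part of any tool mentioned in this report.

```python
def to_seconds(wall: str) -> float:
    """Convert a wall-clock value such as '13:58.37' or '23.301' to seconds."""
    seconds = 0.0
    for part in wall.split(":"):
        # Each ':'-separated field is worth 60x the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


def speedup_percent(baseline: str, candidate: str) -> float:
    """Relative wall-clock reduction of candidate versus baseline, in percent."""
    base = to_seconds(baseline)
    return (base - to_seconds(candidate)) / base * 100.0


# Wall-clock totals copied from the report; glibc malloc is the baseline.
runs = {
    "Firefox -O3": ("13:58.37", {"tcmalloc": "13:10.61", "jemalloc": "13:34.29"}),
    "Firefox -flto=4 -O3": ("17:54.13", {"tcmalloc": "17:01.13", "jemalloc": "17:35.51"}),
    "tramp3d-v4": ("23.301", {"tcmalloc": "21.659"}),
}

if __name__ == "__main__":
    for build, (glibc, others) in runs.items():
        for allocator, wall in others.items():
            print(f"{build}: {allocator} {speedup_percent(glibc, wall):.1f}% speedup")
```

Computing the same ratio from user time instead gives smaller figures (roughly 3.5% for tcmalloc on the Firefox -O3 build), so the choice of column matters when comparing allocators.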