https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99105
            Bug ID: 99105
           Summary: profile streaming scales poorly to projects with many
                    source files
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Compared to clang, we need significantly longer to train Firefox (25 minutes
compared to 7) and to run clang's make check-clang, which takes 12 hours
compared to 27 minutes. Most of the time is spent in the kernel doing I/O.

I suppose we really should consider optionally producing per-binary rather
than per-source-file profile data dumps, and omitting untrained parts of the
program.

This is perf top while running the LLVM testsuite (first with and second
without kernel symbols). Merging of topn counters also seems to be high in the
profile now.

Overhead  Shared Object       Symbol
   8.58%  libc-2.32.so        [.] read
   7.24%  [kernel]            [k] __x86_indirect_thunk_rax
   7.15%  [kernel]            [k] entry_SYSCALL_64
   6.43%  [kernel]            [k] __x64_sys_read
   5.70%  [kernel]            [k] apparmor_file_permission
   5.49%  [kernel]            [k] generic_file_buffered_read
   4.45%  [kernel]            [k] btrfs_file_read_iter
   4.10%  [kernel]            [k] syscall_return_via_sysret
   3.31%  [kernel]            [k] new_sync_read
   3.07%  libc-2.32.so        [.] _IO_file_xsgetn
   2.77%  [kernel]            [k] find_get_entry
   2.76%  libc-2.32.so        [.] _IO_fread
   2.60%  [kernel]            [k] current_time
   2.33%  [kernel]            [k] atime_needs_update
   2.18%  [kernel]            [k] vfs_read
   2.11%  clang-11            [.] __gcov_merge_topn
   2.02%  [kernel]            [k] pagecache_get_page
   1.97%  [kernel]            [k] entry_SYSCALL_64_after_hwframe
   1.89%  clang-11            [.] gcov_read_words
   1.76%  [kernel]            [k] __fsnotify_parent
   1.67%  [kernel]            [k] syscall_exit_to_user_mode
   1.60%  [kernel]            [k] ksys_read
   1.40%  [kernel]            [k] security_file_permission
   1.30%  [kernel]            [k] aa_file_perm
   1.23%  [kernel]            [k] syscall_enter_from_user_mode
   1.11%  [kernel]            [k] touch_atime
   1.02%  [kernel]            [k] exit_to_user_mode_prepare
   0.99%  [kernel]            [k] xas_load
   0.95%  [kernel]            [k] xas_start
   0.74%  [kernel]            [k] __fget_light
   0.71%  [kernel]            [k] __fdget_pos
   0.69%  clang-11            [.] __gcov_read_counter
   0.64%  [kernel]            [k] do_syscall_64
   0.58%  [kernel]            [k] ktime_get_coarse_real_ts64
   0.55%  [kernel]            [k] rw_verify_area
   0.50%  libc-2.32.so        [.] _IO_sgetn
   0.50%  [kernel]            [k] PageHuge
   0.45%  perf                [.] rb_next
   0.38%  [kernel]            [k] iov_iter_init
For a higher level overview, try: perf top --sort comm,dso

Overhead  Shared Object       Symbol
  43.43%  libc-2.32.so        [.] read
  12.00%  libc-2.32.so        [.] _IO_file_xsgetn
  11.80%  libc-2.32.so        [.] _IO_fread
   7.89%  clang-11            [.] __gcov_merge_topn
   7.28%  clang-11            [.] gcov_read_words
   2.32%  clang-11            [.] __gcov_read_counter
   2.28%  libc-2.32.so        [.] _IO_sgetn
   2.08%  FileCheck           [.] __gcov_merge_topn
   1.46%  FileCheck           [.] gcov_read_words
   1.23%  perf                [.] rb_next
   1.08%  perf                [.] __symbols__insert
   0.87%  libc-2.32.so        [.] _IO_file_read
   0.72%  clang-11            [.] gcov_do_dump
   0.38%  FileCheck           [.] __gcov_read_counter
   0.28%  perf                [.] rust_demangle_callback
   0.25%  libc-2.32.so        [.] _int_malloc
   0.19%  clang-11            [.] gcov_write_words
   0.18%  libc-2.32.so        [.] __strchr_avx2
   0.18%  clang-11            [.] fread@plt
   0.18%  libc-2.32.so        [.] __libc_calloc
   0.17%  perf                [.] dso__load_sym
   0.15%  perf                [.] symbol__new
   0.14%  perf                [.] rb_insert_color
   0.11%  libc-2.32.so        [.] __strlen_avx2
   0.10%  perf                [.] 0x000000000087755b
   0.08%  libc-2.32.so        [.] __memmove_avx_unaligned_erms
   0.08%  perf                [.] evsel__parse_sample
   0.07%  libc-2.32.so        [.] sysmalloc
   0.07%  perf                [.] symbols__fixup_end
   0.07%  perf                [.] eprintf
   0.07%  libc-2.32.so        [.] __memset_avx2_unaligned_erms
   0.07%  libc-2.32.so        [.] cfree@GLIBC_2.2.5
   0.06%  perf                [.] bfd_demangle
   0.06%  perf                [.] rust_demangle
   0.05%  perf                [.] cplus_demangle
   0.05%  libc-2.32.so        [.] _int_free
   0.05%  libpthread-2.32.so  [.] __pthread_mutex_init
   0.04%  perf                [.] cplus_demangle_v3
   0.04%  FileCheck           [.] fread@plt