Hello,
I've tried to measure cache misses of the 4.0.1 and 4.1.0 C++ compilers using oprofile on an amd64 box while compiling the MICO sources, and found that:
0) compiler options used were: -I../include -Wall -D_REENTRANT -D_GNU_SOURCE -DPIC -fPIC -c
1) the most expensive function seems to be comptypes, at least from the point of view of L1 and L2 DTLB misses, i.e. misses in both TLB levels (~13% of all such misses)
2) comptypes is also the most CPU-intensive function: it accounts for the largest share of unhalted cycles (~4.5% with 4.0.1, ~4.2% with 4.1.0)
3) other functions with a large share of L1 and L2 DTLB misses are push_to_top_level (~5%), htab_find_slot_with_hash (~5%), ht_lookup_with_hash (~4%) and lookup_fnfields_1 (~4%)
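4) for 4.0.1 there is one L1 and L2 DTLB miss per ~2275 unhalted clock cycles ((4498408 × 100000 cycles) / (197695 × 1000 misses))
5) for 4.1.0 there is one L1 and L2 DTLB miss per ~2332 unhalted clock cycles ((4505282 × 100000 cycles) / (193179 × 1000 misses))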
6) 4.1.0 is a _bit_ faster than 4.0.1
7) tables were produced after three cycles of "make; find . -name '*.o' -exec rm \{} \;"
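Points 4 and 5 follow from the cc1plus summary tables: oprofile takes one sample every `count` events (100000 for CPU_CLK_UNHALTED, 1000 for L1_AND_L2_DTLB_MISSES), so samples × count estimates the raw event total. A minimal Python sketch of that arithmetic, with the sample counts copied from the two summary rows (the function name is illustrative only, and the results are sampling estimates, not exact counts):

```python
# Estimate unhalted cycles per L1-and-L2 DTLB miss from oprofile summary rows.
# oprofile records one sample per `count` events, so events ~= samples * count.
CLK_COUNT = 100_000   # CPU_CLK_UNHALTED sampling interval
DTLB_COUNT = 1_000    # L1_AND_L2_DTLB_MISSES sampling interval

# compiler version -> (CPU_CLK_UNHALTED samples, L1_AND_L2_DTLB_MISSES samples)
summary = {
    "4.0.1": (4498408, 197695),
    "4.1.0": (4505282, 193179),
}

def cycles_per_dtlb_miss(clk_samples: int, dtlb_samples: int) -> float:
    """Estimated unhalted cycles elapsed per L1-and-L2 DTLB miss."""
    cycles = clk_samples * CLK_COUNT
    misses = dtlb_samples * DTLB_COUNT
    return cycles / misses

for version, (clk, dtlb) in summary.items():
    print(f"{version}: one DTLB miss per ~{cycles_per_dtlb_miss(clk, dtlb):.0f} cycles")
```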
I assumed that L1 and L2 DTLB misses are the most important factor in overall performance (or performance degradation); if that is wrong, please correct me, since this is my first attempt to measure and interpret such data.
The first few lines of the resulting tables are below: for each compiler, one table covers the whole cc1plus run and the other is a per-symbol listing.
Please let me know whether you find this kind of data useful, in which case I will keep providing it from time to time, or whether it is useless, in which case I will try to help somewhere else.
Thanks!
Karel
GCC 4.0.1 20050514 (prerelease):

silence:~$ ~/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu/bin/c++ -v
Using built-in specs.
Target: amd64-linux-gnu
Configured with: ../gcc-4_0-branch/configure --prefix=/home/karel/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib --enable-libstdcxx-allocator=mt amd64-linux-gnu
Thread model: posix
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits) with a unit mask of 0x00 (No unit mask) count 1000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|L1_AND_L2_DTLB...|L1_DTLB_MISSES...|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
  4498408 100.000   2728674 100.000    197695 100.000   3734282 100.000 cc1plus
CPU_CLK_UNHALT...  DATA_CACHE_MIS...  L1_AND_L2_DTLB...  L1_DTLB_MISSES...
samples  %         samples  %         samples  %         samples  %         symbol name
191205   4.5167    346985   13.4574   25558    13.8668   100870   2.8451    comptypes
134792   3.1841    84111    3.2621    5996     3.2532    287969   8.1223    ggc_alloc_stat
130635   3.0859    161496   6.2634    7606     4.1267    474363   13.3796   lookup_fnfields_1
100161   2.3660    5841     0.2265    153      0.0830    12492    0.3523    record_reg_classes
85299    2.0150    16765    0.6502    350      0.1899    36418    1.0272    dfs_walk_all
81984    1.9367    13907    0.5394    135      0.0732    39432    1.1122    find_reloads
78803    1.8615    18008    0.6984    586      0.3179    16583    0.4677    walk_tree
63327    1.4959    1979     0.0768    130      0.0705    24860    0.7012    _cpp_lex_direct
54152    1.2792    38433    1.4906    7770     4.2157    88230    2.4886    ht_lookup_with_hash
52226    1.2337    6949     0.2695    78       0.0423    2365     0.0667    _cpp_clean_line
47768    1.1284    40274    1.5620    8978     4.8711    65595    1.8501    htab_find_slot_with_hash
46236    1.0922    5905     0.2290    710      0.3852    32132    0.9063    splay_tree_splay_helper
45524    1.0754    55568    2.1551    1725     0.9359    73780    2.0810    lookup_field_1
44070    1.0410    33720    1.3078    1965     1.0661    47199    1.3313    tsubst
42073    0.9939    9121     0.3537    494      0.2680    20246    0.5710    grokdeclarator
41105    0.9710    19844    0.7696    581      0.3152    12929    0.3647    cp_walk_subtrees
37812    0.8932    61645    2.3908    10128    5.4951    6142     0.1732    push_to_top_level
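A rough way to see which functions in the 4.0.1 listing are DTLB-heavy relative to the time spent in them is to apply the same samples × count arithmetic per symbol. This Python sketch does that for comptypes and the functions named in point 3; the helper name and the choice of rows are illustrative, and per-symbol attribution from sampling is only approximate.

```python
# Rank functions by DTLB-miss density (estimated cycles per L1-and-L2 DTLB miss)
# using rows copied from the GCC 4.0.1 symbol listing. A lower value means
# more misses in both DTLB levels per cycle spent in that function.
CLK_COUNT = 100_000   # CPU_CLK_UNHALTED: one sample per 100000 cycles
DTLB_COUNT = 1_000    # L1_AND_L2_DTLB_MISSES: one sample per 1000 misses

# symbol -> (CPU_CLK_UNHALTED samples, L1_AND_L2_DTLB_MISSES samples)
rows_401 = {
    "comptypes": (191205, 25558),
    "lookup_fnfields_1": (130635, 7606),
    "ht_lookup_with_hash": (54152, 7770),
    "htab_find_slot_with_hash": (47768, 8978),
    "push_to_top_level": (37812, 10128),
}

def cycles_per_miss(clk_samples: int, dtlb_samples: int) -> float:
    """Estimated cycles spent in a function per DTLB miss attributed to it."""
    return (clk_samples * CLK_COUNT) / (dtlb_samples * DTLB_COUNT)

# Densest (fewest cycles per miss) first.
ranked = sorted(rows_401.items(), key=lambda kv: cycles_per_miss(*kv[1]))
for name, (clk, dtlb) in ranked:
    print(f"{name:26s} ~{cycles_per_miss(clk, dtlb):6.0f} cycles per DTLB miss")
```

With these rows every listed function comes out well below the whole-run figure of ~2275 cycles per miss, with push_to_top_level the densest.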
GCC 4.1.0 20050514 (experimental):

silence:~$ ~/usr/local/gcc-main-20050514/bin/c++ -v
Using built-in specs.
Target: amd64-unknown-linux-gnu
Configured with: ../gcc-main/configure --prefix=/home/karel/usr/local/gcc-main-20050514 --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib amd64-unknown-linux-gnu
Thread model: posix
gcc version 4.1.0 20050514 (experimental)
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits) with a unit mask of 0x00 (No unit mask) count 1000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|L1_AND_L2_DTLB...|L1_DTLB_MISSES...|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
  4505282 100.000   2641789 100.000    193179 100.000   3666902 100.000 cc1plus
CPU_CLK_UNHALT...  DATA_CACHE_MIS...  L1_AND_L2_DTLB...  L1_DTLB_MISSES...
samples  %         samples  %         samples  %         samples  %         symbol name
188907   4.2302    346968   13.1545   25652    13.3726   104639   2.8740    comptypes
155510   3.4823    86426    3.2766    6713     3.4995    263278   7.2311    ggc_alloc_stat
129618   2.9025    149269   5.6592    6987     3.6424    487011   13.3761   lookup_fnfields_1
104383   2.3374    6488     0.2460    169      0.0881    9317     0.2559    record_reg_classes
90854    2.0345    14472    0.5487    264      0.1376    33677    0.9250    dfs_walk_all
90136    2.0184    24639    0.9341    663      0.3456    23587    0.6478    walk_tree
81124    1.8166    6738     0.2555    63       0.0328    28316    0.7777    find_reloads
78124    1.7494    3998     0.1516    154      0.0803    30305    0.8324    _cpp_lex_direct
57288    1.2828    40331    1.5291    8237     4.2940    98403    2.7027    ht_lookup_with_hash
55880    1.2513    7466     0.2831    100      0.0521    1187     0.0326    _cpp_clean_line
49160    1.1008    59362    2.2506    1748     0.9112    79866    2.1936    lookup_field_1
48784    1.0924    70640    2.6781    2231     1.1630    26856    0.7376    compparms
48030    1.0755    42436    1.6089    9417     4.9092    61766    1.6965    htab_find_slot_with_hash
47940    1.0735    38711    1.4676    2053     1.0702    53454    1.4682    tsubst
47034    1.0532    6084     0.2307    671      0.3498    32065    0.8807    splay_tree_splay_helper
45679    1.0229    7168     0.2718    448      0.2335    21898    0.6014    grokdeclarator
44777    1.0027    18205    0.6902    529      0.2758    13609    0.3738    cp_walk_subtrees
39890    0.8933    65131    2.4693    10764    5.6114    6737     0.1850    push_to_top_level
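Because both profiles were taken with the same event counts, raw sample numbers from the two symbol listings can be compared directly. A minimal Python sketch, assuming the same workload in both runs, that reports the relative change in CPU_CLK_UNHALTED samples for a few functions present in both listings (the selection of functions and the helper name are illustrative):

```python
# Per-symbol change in CPU_CLK_UNHALTED samples between the two compilers,
# using rows copied from the 4.0.1 and 4.1.0 symbol listings. Both runs used
# the same sampling interval, so raw sample counts are directly comparable.
clk_401 = {
    "comptypes": 191205,
    "ggc_alloc_stat": 134792,
    "lookup_fnfields_1": 130635,
    "walk_tree": 78803,
    "push_to_top_level": 37812,
}
clk_410 = {
    "comptypes": 188907,
    "ggc_alloc_stat": 155510,
    "lookup_fnfields_1": 129618,
    "walk_tree": 90136,
    "push_to_top_level": 39890,
    "compparms": 48784,   # appears only in the 4.1.0 top list
}

def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# Only symbols present in both listings can be compared.
common = sorted(clk_401.keys() & clk_410.keys())
for name in common:
    print(f"{name:20s} {pct_change(clk_401[name], clk_410[name]):+6.1f}%")
```

Symbols that appear in only one of the two top-N lists (here compparms) are skipped rather than treated as zero, since their absence only means they fell below the cutoff.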
--
Karel Gardas                  [EMAIL PROTECTED]
ObjectSecurity Ltd.           http://www.objectsecurity.com