Hello,
I've tried to measure some cache misses of the 4.0.1 and 4.1.0 C++
compilers by using oprofile on an amd64 box while compiling the MICO
sources, and found that:
0) compiler options used were:
-I../include -Wall -D_REENTRANT -D_GNU_SOURCE -DPIC -fPIC -c
1) the most expensive function seems to be comptypes, at least from the
L1 and L2 DTLB misses point of view (~13%)
2) comptypes is also the most CPU-intensive function, since more CPU time
(~4% of CPU_CLK_UNHALTED samples) is spent there than in any other function
3) some other functions that are expensive in L1 and L2 DTLB misses seem
to be: push_to_top_level (~5%), htab_find_slot_with_hash (~5%),
ht_lookup_with_hash (~4%), lookup_fnfields_1 (~4%)
4) for 4.0.1, an L1 and L2 DTLB miss happens on average once every 2275
CLK events
5) for 4.1.0, an L1 and L2 DTLB miss happens on average once every 2332
CLK events
6) 4.1.0 is a _bit_ faster than 4.0.1
7) tables were produced after three cycles of "make; find . -name '*.o'
-exec rm \{} \;"
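The figures in 4) and 5) can be re-derived from the cc1plus summary rows of the reports further down. The following is only a small sketch of that arithmetic: the function and constant names are made up for illustration, and it assumes that each sample stands for exactly "count" events (100000 for CPU_CLK_UNHALTED, 1000 for L1_AND_L2_DTLB_MISSES).

```python
# Assumption: one sample represents "count" events of its counter.
CLK_EVENTS_PER_SAMPLE = 100_000   # CPU_CLK_UNHALTED count
DTLB_EVENTS_PER_SAMPLE = 1_000    # L1_AND_L2_DTLB_MISSES count


def clk_per_dtlb_miss(clk_samples, dtlb_samples):
    """Average number of CLK events per L1 and L2 DTLB miss."""
    clk_events = clk_samples * CLK_EVENTS_PER_SAMPLE
    miss_events = dtlb_samples * DTLB_EVENTS_PER_SAMPLE
    return clk_events / miss_events


# Sample counts copied from the cc1plus summary rows of the two reports.
gcc_401 = clk_per_dtlb_miss(4498408, 197695)
gcc_410 = clk_per_dtlb_miss(4505282, 193179)

print(f"4.0.1: one DTLB miss per {gcc_401:.0f} CLK events")
print(f"4.1.0: one DTLB miss per {gcc_410:.0f} CLK events")
```

A larger value means fewer DTLB misses per unit of work, so by this measure 4.1.0 misses slightly less often than 4.0.1.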
I assumed that L1 and L2 DTLB misses are the most important factor for
overall performance (or performance degradation). If that is wrong, please
correct me, since this is my first attempt to measure and interpret such
data.
The first few lines of the produced tables are below. For each compiler,
one table is for the overall cc1plus run and one is the per-symbol listing.
Please let me know whether you find something like this useful, in which
case I will continue to provide such data from time to time, or whether it
is completely useless, in which case I will try to help somewhere else.
Thanks!
Karel
GCC 4.0.1 20050514 (prerelease):
silence:~$ ~/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu/bin/c++ -v
Using built-in specs.
Target: amd64-linux-gnu
Configured with: ../gcc-4_0-branch/configure
--prefix=/home/karel/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu
--enable-shared --enable-threads --enable-languages=c++ --disable-checking
--enable-__cxa_atexit --disable-multilib --enable-libstdcxx-allocator=mt
amd64-linux-gnu
Thread model: posix
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00
(No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask
of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits)
with a unit mask of 0x00 (No unit mask) count 1000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|L1_AND_L2_DTLB...|L1_DTLB_MISSES...|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
  4498408 100.000   2728674 100.000    197695 100.000   3734282 100.000 cc1plus
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00
(No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask
of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits)
with a unit mask of 0x00 (No unit mask) count 1000
samples        %  samples        %  samples        %  samples        %  symbol name
 191205   4.5167   346985  13.4574    25558  13.8668   100870   2.8451  comptypes
 134792   3.1841    84111   3.2621     5996   3.2532   287969   8.1223  ggc_alloc_stat
 130635   3.0859   161496   6.2634     7606   4.1267   474363  13.3796  lookup_fnfields_1
 100161   2.3660     5841   0.2265      153   0.0830    12492   0.3523  record_reg_classes
  85299   2.0150    16765   0.6502      350   0.1899    36418   1.0272  dfs_walk_all
  81984   1.9367    13907   0.5394      135   0.0732    39432   1.1122  find_reloads
  78803   1.8615    18008   0.6984      586   0.3179    16583   0.4677  walk_tree
  63327   1.4959     1979   0.0768      130   0.0705    24860   0.7012  _cpp_lex_direct
  54152   1.2792    38433   1.4906     7770   4.2157    88230   2.4886  ht_lookup_with_hash
  52226   1.2337     6949   0.2695       78   0.0423     2365   0.0667  _cpp_clean_line
  47768   1.1284    40274   1.5620     8978   4.8711    65595   1.8501  htab_find_slot_with_hash
  46236   1.0922     5905   0.2290      710   0.3852    32132   0.9063  splay_tree_splay_helper
  45524   1.0754    55568   2.1551     1725   0.9359    73780   2.0810  lookup_field_1
  44070   1.0410    33720   1.3078     1965   1.0661    47199   1.3313  tsubst
  42073   0.9939     9121   0.3537      494   0.2680    20246   0.5710  grokdeclarator
  41105   0.9710    19844   0.7696      581   0.3152    12929   0.3647  cp_walk_subtrees
  37812   0.8932    61645   2.3908    10128   5.4951     6142   0.1732  push_to_top_level
GCC 4.1.0 20050514 (experimental):
silence:~$ ~/usr/local/gcc-main-20050514/bin/c++ -v
Using built-in specs.
Target: amd64-unknown-linux-gnu
Configured with: ../gcc-main/configure
--prefix=/home/karel/usr/local/gcc-main-20050514 --enable-shared
--enable-threads --enable-languages=c++ --disable-checking
--enable-__cxa_atexit --disable-multilib amd64-unknown-linux-gnu
Thread model: posix
gcc version 4.1.0 20050514 (experimental)
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00
(No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask
of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits)
with a unit mask of 0x00 (No unit mask) count 1000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|L1_AND_L2_DTLB...|L1_DTLB_MISSES...|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
  4505282 100.000   2641789 100.000    193179 100.000   3666902 100.000 cc1plus
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00
(No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask
of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits)
with a unit mask of 0x00 (No unit mask) count 1000
samples        %  samples        %  samples        %  samples        %  symbol name
 188907   4.2302   346968  13.1545    25652  13.3726   104639   2.8740  comptypes
 155510   3.4823    86426   3.2766     6713   3.4995   263278   7.2311  ggc_alloc_stat
 129618   2.9025   149269   5.6592     6987   3.6424   487011  13.3761  lookup_fnfields_1
 104383   2.3374     6488   0.2460      169   0.0881     9317   0.2559  record_reg_classes
  90854   2.0345    14472   0.5487      264   0.1376    33677   0.9250  dfs_walk_all
  90136   2.0184    24639   0.9341      663   0.3456    23587   0.6478  walk_tree
  81124   1.8166     6738   0.2555       63   0.0328    28316   0.7777  find_reloads
  78124   1.7494     3998   0.1516      154   0.0803    30305   0.8324  _cpp_lex_direct
  57288   1.2828    40331   1.5291     8237   4.2940    98403   2.7027  ht_lookup_with_hash
  55880   1.2513     7466   0.2831      100   0.0521     1187   0.0326  _cpp_clean_line
  49160   1.1008    59362   2.2506     1748   0.9112    79866   2.1936  lookup_field_1
  48784   1.0924    70640   2.6781     2231   1.1630    26856   0.7376  compparms
  48030   1.0755    42436   1.6089     9417   4.9092    61766   1.6965  htab_find_slot_with_hash
  47940   1.0735    38711   1.4676     2053   1.0702    53454   1.4682  tsubst
  47034   1.0532     6084   0.2307      671   0.3498    32065   0.8807  splay_tree_splay_helper
  45679   1.0229     7168   0.2718      448   0.2335    21898   0.6014  grokdeclarator
  44777   1.0027    18205   0.6902      529   0.2758    13609   0.3738  cp_walk_subtrees
  39890   0.8933    65131   2.4693    10764   5.6114     6737   0.1850  push_to_top_level
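
To compare the two compilers per function, the L1_AND_L2_DTLB_MISSES percentages of the heaviest symbols can be put side by side. This is just an illustrative sketch: the dictionary and the helper name are invented here, and the values are copied by hand from the two symbol tables.

```python
# Share (in %) of L1_AND_L2_DTLB_MISSES samples per symbol,
# copied from the two symbol tables: (4.0.1, 4.1.0).
dtlb_share = {
    "comptypes": (13.8668, 13.3726),
    "push_to_top_level": (5.4951, 5.6114),
    "htab_find_slot_with_hash": (4.8711, 4.9092),
    "ht_lookup_with_hash": (4.2157, 4.2940),
    "lookup_fnfields_1": (4.1267, 3.6424),
}


def share_deltas(shares):
    """Return (symbol, old, new, new - old) rows, largest 4.0.1 share first."""
    rows = [
        (name, old, new, round(new - old, 4))
        for name, (old, new) in shares.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, old, new, delta in share_deltas(dtlb_share):
    print(f"{name:26s} {old:8.4f} {new:8.4f} {delta:+8.4f}")
```

Note that these are shares of each compiler's own total, so a negative delta only means the function takes a smaller part of the misses in 4.1.0, not necessarily fewer misses in absolute terms.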
--
Karel Gardas [EMAIL PROTECTED]
ObjectSecurity Ltd. http://www.objectsecurity.com