Karel Gardas <[EMAIL PROTECTED]> writes:

> I've thought that L1 and L2 DTLB misses are the most important for the
> overall performance or performance degradation, if not please correct
> me since this is my first attempt to measure and interpret such data.

The TLB just caches the translations from virtual to physical addresses. Normally the data and instruction cache misses are more important. There are a few TLB-intensive workloads too, but they tend to use much more memory than gcc normally does.

So I think you should rather use ICACHE_MISSES and DATA_CACHE_REFILLS_FROM_SYSTEM, which measure the "real" L2 caches. And perhaps run a normal instruction profile (CPU_CLK_UNHALTED) in parallel and double-check that the hot spots displayed by the others match the real time hogs.

Note that you can use up to three performance counters at the same time.

-Andi
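
A minimal sketch of the cross-check described above, assuming the per-symbol sample counts of each profile have already been exported into plain dictionaries. The function names, symbol names, and counts are all hypothetical and made up for illustration; none of them come from a real profile or from the profiler's own tooling.

```python
def top_symbols(samples, n):
    """Return the n symbols with the most samples, highest count first."""
    ranked = sorted(samples.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def hotspot_overlap(event_samples, clock_samples, n=3):
    """Split the top-n symbols of an event profile into those that are
    also among the top-n by clock samples and those that are not."""
    hot_by_time = set(top_symbols(clock_samples, n))
    hot_by_event = top_symbols(event_samples, n)
    confirmed = [s for s in hot_by_event if s in hot_by_time]
    unconfirmed = [s for s in hot_by_event if s not in hot_by_time]
    return confirmed, unconfirmed


# Made-up sample counts per symbol (hypothetical names and numbers).
clock = {"parse": 900, "emit": 500, "gc_mark": 300, "hash_lookup": 50}
dcache = {"gc_mark": 400, "hash_lookup": 350, "parse": 100, "emit": 20}

confirmed, unconfirmed = hotspot_overlap(dcache, clock, n=3)
print("also hot by time:", confirmed)
print("cache-miss heavy but cheap in time:", unconfirmed)
```

Symbols that end up in the second list miss the cache a lot but account for little run time, so they are probably not worth optimizing first.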