Mahmoud Mandour <ma.mando...@gmail.com> writes:
> This adds an implementation of a simple L2 configuration, in which a
> unified L2 cache (stores both blocks of instructions and data) is
> maintained for each core separately, with no inter-core interaction
> taken into account. The L2 cache is used as a backup for L1 and is only
> accessed if the wanted block does not exist in L1.
>
> In terms of multi-threaded user-space emulation, the same approximation
> as for L1 is done: a static number of caches is maintained, and every
> memory access initiated by a thread has to go through one of the
> available caches.
>
> An atomic increment is used to maintain the number of L2 misses per
> instruction.
>
> The default cache parameters of L2 caches are:
>
>     2MB cache size
>     16-way associativity
>     64-byte blocks
>
> Signed-off-by: Mahmoud Mandour <ma.mando...@gmail.com>
> ---
>  contrib/plugins/cache.c | 256 +++++++++++++++++++++++++++-------------
>  1 file changed, 175 insertions(+), 81 deletions(-)
>
> diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
> index a255e26e25..908c967a09 100644
> --- a/contrib/plugins/cache.c
> +++ b/contrib/plugins/cache.c
> @@ -82,8 +82,9 @@ typedef struct {
>      char *disas_str;
>      const char *symbol;
>      uint64_t addr;
> -    uint64_t dmisses;
> -    uint64_t imisses;
> +    uint64_t l1_dmisses;
> +    uint64_t l1_imisses;
> +    uint64_t l2_misses;
>  } InsnData;
>
>  void (*update_hit)(Cache *cache, int set, int blk);
> @@ -93,15 +94,20 @@ void (*metadata_init)(Cache *cache);
>  void (*metadata_destroy)(Cache *cache);
>
>  static int cores;
> -static Cache **dcaches, **icaches;
> +static Cache **l1_dcaches, **l1_icaches;
> +static Cache **l2_ucaches;
>
> -static GMutex *dcache_locks;
> -static GMutex *icache_locks;
> +static GMutex *l1_dcache_locks;
> +static GMutex *l1_icache_locks;
> +static GMutex *l2_ucache_locks;

Did you experiment with keeping a single locking hierarchy? I measured
quite a high contention with perf while running on system emulation.
While splitting locks can reduce contention, I suspect the pattern of
access might just lead to two threads serialising twice in a row and
therefore adding to latency. It might be overly complicated by the
current split between the i and d caches for layer 1, which probably
makes sense.

Otherwise looks reasonable to me:

Reviewed-by: Alex Bennée <alex.ben...@linaro.org>

-- 
Alex Bennée