Mahmoud Mandour <ma.mando...@gmail.com> writes:
> This adds an implementation of a simple L2 configuration, in which a
> unified L2 cache (stores both blocks of instructions and data) is
> maintained for each core separately, with no inter-core interaction
> taken into account. The L2 cache is used as a backup for L1 and is only
> accessed if the wanted block does not exist in L1.
>
> In terms of multi-threaded user-space emulation, the same approximation
> as for L1 is done: a static number of caches is maintained, and every
> memory access initiated by a thread has to go through one of the
> available caches.
>
> An atomic increment is used to maintain the number of L2 misses per
> instruction.
>
> The default cache parameters of L2 caches are:
>
>     2MB cache size
>     16-way associativity
>     64-byte blocks
>
> Signed-off-by: Mahmoud Mandour <ma.mando...@gmail.com>
> ---
>  contrib/plugins/cache.c | 256 +++++++++++++++++++++++++++-------------
>  1 file changed, 175 insertions(+), 81 deletions(-)
>
> diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
> index a255e26e25..908c967a09 100644
> --- a/contrib/plugins/cache.c
> +++ b/contrib/plugins/cache.c
> @@ -82,8 +82,9 @@ typedef struct {
>      char *disas_str;
>      const char *symbol;
>      uint64_t addr;
> -    uint64_t dmisses;
> -    uint64_t imisses;
> +    uint64_t l1_dmisses;
> +    uint64_t l1_imisses;
> +    uint64_t l2_misses;
>  } InsnData;
>
>  void (*update_hit)(Cache *cache, int set, int blk);
> @@ -93,15 +94,20 @@ void (*metadata_init)(Cache *cache);
>  void (*metadata_destroy)(Cache *cache);
>
>  static int cores;
> -static Cache **dcaches, **icaches;
> +static Cache **l1_dcaches, **l1_icaches;
> +static Cache **l2_ucaches;
>
> -static GMutex *dcache_locks;
> -static GMutex *icache_locks;
> +static GMutex *l1_dcache_locks;
> +static GMutex *l1_icache_locks;
> +static GMutex *l2_ucache_locks;

Did you experiment with keeping a single locking hierarchy? I measured
quite a high contention with perf while running on system emulation.
While splitting locks can reduce contention, I suspect the pattern of
access might just lead to two threads serialising twice in a row and
therefore adding to latency. It might be overly complicated by the
current split between the i and d caches for layer 1, which probably
makes sense.

Otherwise looks reasonable to me:

Reviewed-by: Alex Bennée <alex.ben...@linaro.org>

-- 
Alex Bennée