[gem5-users] Performance tanking with memory intensive benchmark compared to real machine

Jared Nye via gem5-users Fri, 30 Jul 2021 13:16:40 -0700

Hello,

I am running a simple single-threaded memory benchmark that measures the
time it takes to copy an array (https://github.com/BTone/cagbench). I run
the benchmark in SE mode with 1 thread on 1 CPU, configured to match the
gem5-Skylake setup (https://github.com/darchr/gem5-skylake-config): 32 kB
L1I and L1D caches, a 256 kB L2, and an 8 MB LLC.


On a real Intel Skylake (i7-6700K), DDR4-2400:
With an array size of 8 MB (total working set of 16 MB), the throughput is
~11,000 MB/s, and with an array size of 16 MB (total working set of 32 MB)
the throughput is ~9,500 MB/s.

In gem5 (darchr/gem5-skylake-config):
With an array size of 8 MB (total working set of 16 MB), the throughput is
~6,000 MB/s. However, with an array size of 16 MB (total working set of
32 MB), the throughput drops to ~700 MB/s.

The performance when the workload mostly fits in the cache hierarchy is
reasonable, but ~700 MB/s for the larger array is far too slow and is not
commensurate with the real system.
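
To put the gap in numbers, here is a rough sketch of the arithmetic on the
figures above. It assumes that MB/s means 10**6 bytes per second and that
the cache line size is 64 bytes; both are assumptions, not values taken
from the benchmark or the config. The helper names are made up for
illustration.

```python
# Throughput figures quoted above, in MB/s (assumed to mean 10**6 bytes/s).
REAL_MBPS = {"8 MB array": 11_000, "16 MB array": 9_500}
GEM5_MBPS = {"8 MB array": 6_000, "16 MB array": 700}

CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines


def slowdown(real_mbps, sim_mbps):
    """How many times slower the simulated run is than the real machine."""
    return real_mbps / sim_mbps


def ns_per_line(mbps, line_bytes=CACHE_LINE_BYTES):
    """Average time to move one cache line at the reported throughput."""
    bytes_per_ns = mbps * 1e6 / 1e9
    return line_bytes / bytes_per_ns


for case in REAL_MBPS:
    ratio = slowdown(REAL_MBPS[case], GEM5_MBPS[case])
    line_ns = ns_per_line(GEM5_MBPS[case])
    print(f"{case}: gem5 is {ratio:.1f}x slower, "
          f"{line_ns:.1f} ns per line in gem5")
```

Under these assumptions the 16 MB case is about 13.6x slower than the real
machine, at roughly 91 ns per cache line, versus about 1.8x slower for the
8 MB case. A per-line time of that size could be consistent with misses
being serviced close to one at a time, but that is only a guess.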

I think this has something to do with the memory system past the last-level
cache, but I am having trouble determining what exactly the issue is.

For reference, this is how I have the cache hierarchy configured (I
reduced the tag/data/response latencies to rule the caches out as the
bottleneck):

Both L1I and L1D caches:
    size = '32kB'
    assoc = 8
    tag_latency = 1
    data_latency = 1
    response_latency = 1
    mshrs = 128
    tgts_per_mshr = 16
    write_buffers = 56
    demand_mshr_reserve = 96

L2 cache:
    size = '256kB'
    assoc = 4
    tag_latency = 1
    data_latency = 1
    response_latency = 1
    mshrs = 256
    tgts_per_mshr = 16
    write_buffers = 256

L3 cache:
    size = '8MB'
    assoc = 16
    tag_latency = 1
    data_latency = 1
    response_latency = 1
    mshrs = 256
    tgts_per_mshr = 20
    write_buffers = 256
    clusivity = 'mostly_excl'

Any suggestions would be greatly appreciated.
