Hello,
I have been debugging the assertion failure that occurs when the L3$ (residing in the HNF) clusivity is set to Mostly Exclusive.
All the failures occur when the modelled configuration has a private L2$ for every CPU
and [L2$, L3$] clusivity == [Mostly Inclusive, Mostly Exclusive].
With some debug flags enabled, a snippet of the trace at the failure point is:
12180159000: RubyGenerated: system.cpu2.l2: executing Pop_TriggerQueue
12180159000: RubyGenerated: system.cpu2.l2: executing Send_Data
12180159000: RubyGenerated: system.cpu2.l2: executing ProcessNextState_ClearPending
12180159000: RubyGenerated: system.cpu2.l2: next_state: BUSY_BLKD
ProtocolTrace: 12180159000 18 Cache TX_Data BUSY_BLKD>BUSY_BLKD [0xab040, line 0xab040]
ProtocolTrace: 12180159000 7 Seq Begin > [0x24db90, line 0x24db80] LD
12180160000: RubyGenerated: system.ruby.hnf1.cntrl: [Cache_Controller 25], Time: 24360320, state: BUSY_BLKD, event: CompAck, addr: 0xab040
12180160000: RubyGenerated: system.ruby.hnf1.cntrl: executing Receive_ReqResp
12180160000: RubyGenerated: system.ruby.hnf1.cntrl: executing UpdateDirState_FromReqResp
build/ARM/mem/ruby/protocol/Cache_Controller.cc:5477: panic: Runtime Error at CHI-cache-actions.sm:1947: assert failure.
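When following a single cache line through a long log, it can help to filter the trace by address. The sketch below is only an illustration: the helper name lines_for_address is hypothetical, and the sample lines are copied from the snippet above with wrapped lines joined. It keeps every line that mentions the address as a whole hex token, so 0xab040 does not match a longer or shorter address.

```python
import re

# Sample lines copied from the trace above (wrapped lines joined).
TRACE = """\
12180159000: RubyGenerated: system.cpu2.l2: executing Send_Data
ProtocolTrace: 12180159000 18 Cache TX_Data BUSY_BLKD>BUSY_BLKD [0xab040, line 0xab040]
ProtocolTrace: 12180159000 7 Seq Begin > [0x24db90, line 0x24db80] LD
12180160000: RubyGenerated: system.ruby.hnf1.cntrl: [Cache_Controller 25], Time: 24360320, state: BUSY_BLKD, event: CompAck, addr: 0xab040
"""


def lines_for_address(trace_text, addr):
    """Return the trace lines that mention addr as a whole hex token."""
    # The lookarounds reject matches embedded in a longer hex number.
    pattern = re.compile(
        r"(?<![0-9a-fA-Fx])" + re.escape(addr) + r"(?![0-9a-fA-F])"
    )
    return [line for line in trace_text.splitlines() if pattern.search(line)]


matches = lines_for_address(TRACE, "0xab040")
for line in matches:
    print(line)
```

For the full gzipped log, the same function could be fed the text read through Python's gzip module instead of the inline sample.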
Looking around line 1947 in CHI-cache-actions.sm:
action(UpdateDirState_FromReqResp, desc="") {   <== HNF1 cache controller executing this cache action
  peek(rspInPort, CHIResponseMsg) {
    if ((in_msg.type == CHIResponseType:CompAck) && tbe.updateDirOnCompAck) {
      assert(tbe.requestor == in_msg.responder);
      tbe.dir_sharers.add(in_msg.responder);
      if (tbe.requestorToBeOwner) {
        assert(tbe.dataMaybeDirtyUpstream);
        assert(tbe.dir_ownerExists == false);
        assert(tbe.requestorToBeExclusiveOwner == false);
        tbe.dir_owner := in_msg.responder;
        tbe.dir_ownerExists := true;
        tbe.dir_ownerIsExcl := false;
      } else if (tbe.requestorToBeExclusiveOwner) {
        assert(tbe.dataMaybeDirtyUpstream);
        assert(tbe.dir_ownerExists == false);
        assert(tbe.dir_sharers.count() == 1);   <== Line 1947
        tbe.dir_owner := in_msg.responder;
        tbe.dir_ownerExists := true;
        tbe.dir_ownerIsExcl := true;
      }
    }
  }
  printTBEState(tbe);
}
So the problem _seems_ to be related to updating the directory state within
HNF1.
The L2$ wants to hold the requested cache line exclusively. Thus dir_sharers
should be empty before the requester is added, giving a count of exactly 1 at
the assertion (as the cache line now resides in a single L2$ only).
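To make the failing check concrete, here is a minimal Python sketch of the exclusive-owner branch of the action quoted above. It is not gem5 code: the DirTBE class, the snake_case field names and the node names rnf_a and rnf_b are hypothetical stand-ins, and the "stale sharer" case is only one assumed way the count could differ from 1, not a diagnosis of the actual failure.

```python
from dataclasses import dataclass, field


@dataclass
class DirTBE:
    """Hypothetical, simplified stand-in for the directory fields of a TBE."""
    requestor: str
    dir_sharers: set = field(default_factory=set)
    dir_owner: str = ""
    dir_owner_exists: bool = False
    dir_owner_is_excl: bool = False
    data_maybe_dirty_upstream: bool = True
    requestor_to_be_exclusive_owner: bool = True


def update_dir_on_comp_ack(tbe, responder):
    """Mirror the exclusive-owner branch of UpdateDirState_FromReqResp."""
    assert tbe.requestor == responder
    tbe.dir_sharers.add(responder)
    if tbe.requestor_to_be_exclusive_owner:
        assert tbe.data_maybe_dirty_upstream
        assert not tbe.dir_owner_exists
        # Equivalent of line 1947: the requester must be the only sharer.
        assert len(tbe.dir_sharers) == 1, (
            f"unexpected sharers: {sorted(tbe.dir_sharers)}"
        )
        tbe.dir_owner = responder
        tbe.dir_owner_exists = True
        tbe.dir_owner_is_excl = True


# Clean case: no sharers are recorded before the CompAck arrives.
clean = DirTBE(requestor="rnf_a")
update_dir_on_comp_ack(clean, "rnf_a")

# Assumed failing case: another node is still recorded as a sharer.
stale = DirTBE(requestor="rnf_a", dir_sharers={"rnf_b"})
try:
    update_dir_on_comp_ack(stale, "rnf_a")
    failed = False
except AssertionError as err:
    failed = True
    print("assertion fired:", err)
```

If the real failure follows this pattern, the printTBEState output just before the panic should show more than one entry in dir_sharers for the failing address.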
QS: Is this possibly a CHI bug?
P.S.: I have also attached the gzipped version of the log file.
Thanks
JO
From: Javed Osmany
Sent: 22 April 2022 12:05
To: gem5 users mailing list <[email protected]>
Cc: Javed Osmany <[email protected]>
Subject: RE: CHi - assertion error when modelling "mostly inclusive" for private L2$
Hello
An update on my previous email...
I have been simulating the multicore system with the Parsec/Splash2 benchmarks
for different permutations of L2$ and L3$ clusivity. The results are in the
following table. Note that by L3$ I mean the L3$ within the HNF.
L2$ clusivity                       L3$ clusivity                       Comments
----------------------------------  ----------------------------------  -----------------------------------------
Strict inclusive (sincl) (default)  Mostly inclusive (mincl) (default)  All tests complete okay
mincl                               mincl                               All tests complete okay
mincl                               Mostly exclusive (mexcl)            10 tests abort with the assertion failure
sincl                               mexcl                               10 tests abort with the assertion failure
From the above, the deduction is that setting the L3$ clusivity to mostly
exclusive is the cause of the problem.
The definitions of mostly_inclusive (the default for the HNF cache controller
in CHI_config.py) and mostly_exclusive that I have used are listed in the
following table. The mostly_exclusive settings are based on the write-up at
https://www.gem5.org/documentation/general_docs/ruby/CHI/ and on my
understanding that the L3$ now becomes a victim cache of the L2$.
Parameter                mostly inclusive  mostly exclusive  Comments
-----------------------  ----------------  ----------------  --------
alloc_on_seq_acc         False             False
alloc_on_seq_line_write  False             False
alloc_on_readshared      True              False
alloc_on_readunique      False             False
alloc_on_readonce        True              False
alloc_on_writeback       True              True              For the L3$, writebacks and evictions are the mechanism for allocating a cache line.
dealloc_on_unique        True              True              Upstream $line becomes unique, then deallocate from L3$.
dealloc_on_shared        False             True              Upstream $line becomes shared, then deallocate from L3$.
dealloc_backinv_unique   False             False             If the L3$ line is deallocated due to replacement, then don't back invalidate the upstream cache line.
dealloc_backinv_shared   False             False             If the L3$ line is deallocated due to replacement, then don't back invalidate the upstream cache line.
Any insight as to why the above encoding for mostly exclusive might be wrong,
and thus cause the assertions to fire, would be greatly appreciated.
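To see at a glance which knobs separate the two encodings, the parameter values from the table can be written as plain dictionaries and compared. This is only a sketch: the dictionary names and the diff_presets helper are hypothetical and not part of CHI_config.py; the values are copied from the table above.

```python
# Values copied from the table above; the dictionary names are hypothetical.
MOSTLY_INCLUSIVE = {
    "alloc_on_seq_acc": False,
    "alloc_on_seq_line_write": False,
    "alloc_on_readshared": True,
    "alloc_on_readunique": False,
    "alloc_on_readonce": True,
    "alloc_on_writeback": True,
    "dealloc_on_unique": True,
    "dealloc_on_shared": False,
    "dealloc_backinv_unique": False,
    "dealloc_backinv_shared": False,
}

MOSTLY_EXCLUSIVE = {
    "alloc_on_seq_acc": False,
    "alloc_on_seq_line_write": False,
    "alloc_on_readshared": False,
    "alloc_on_readunique": False,
    "alloc_on_readonce": False,
    "alloc_on_writeback": True,
    "dealloc_on_unique": True,
    "dealloc_on_shared": True,
    "dealloc_backinv_unique": False,
    "dealloc_backinv_shared": False,
}


def diff_presets(a, b):
    """Return {parameter: (value_in_a, value_in_b)} for differing entries."""
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


changed = diff_presets(MOSTLY_INCLUSIVE, MOSTLY_EXCLUSIVE)
for name, (mincl, mexcl) in changed.items():
    print(f"{name}: mincl={mincl} mexcl={mexcl}")
```

Only three parameters differ, so one way to narrow the search would be to flip them one at a time, starting from the working mostly inclusive settings, and note which change first triggers the assertion.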
Thanks in advance
JO
From: Javed Osmany
Sent: 21 April 2022 16:03
To: gem5 users mailing list <[email protected]>
Cc: Javed Osmany <[email protected]>
Subject: CHi - assertion error when modelling "mostly inclusive" for private L2$
Hello
I am simulating a multicore Ruby system with the CHI protocol, using the
Parsec/Splash2 benchmarks and gem5-21.2.1.0.
It consists of three clusters:
1) Little cluster of 4 CPUs, each CPU has private L1$ and L2$
2) Middle cluster of 3 CPUs, each CPU has private L1$ and L2$
3) Big cluster of 1 CPU with private L1$ and L2$
By default, the L2$ and L3$ (residing in the HNF) have their clusivity set to
strict_inclusive and mostly_inclusive respectively (CHI_config.py):
class CHI_L2Controller(CHI_Cache_Controller):
    '''
    Default parameters for a L2 Cache controller
    '''

    def __init__(self, ruby_system, cache, l2_clusivity, prefetcher):
        super(CHI_L2Controller, self).__init__(ruby_system)
        self.sequencer = NULL
        self.cache = cache
        self.use_prefetcher = False
        self.allow_SD = True
        self.is_HN = False
        self.enable_DMT = False
        self.enable_DCT = False
        self.send_evictions = False
        # Strict inclusive MOESI
        self.alloc_on_seq_acc = False
        self.alloc_on_seq_line_write = False
        self.alloc_on_readshared = True
        self.alloc_on_readunique = True
        self.alloc_on_readonce = True
        self.alloc_on_writeback = True
        self.dealloc_on_unique = False
        self.dealloc_on_shared = False
        self.dealloc_backinv_unique = True
        self.dealloc_backinv_shared = True
class CHI_HNFController(CHI_Cache_Controller):
    '''
    Default parameters for a coherent home node (HNF) cache controller
    '''

    #def __init__(self, ruby_system, cache, prefetcher, addr_ranges):
    def __init__(self, ruby_system, cache, prefetcher, addr_ranges,
                 hnf_enable_dmt, hnf_enable_dct,
                 num_tbe, num_repl_tbe, num_snp_tbe, unified_repl_tbe,
                 l3_clusivity):
        super(CHI_HNFController, self).__init__(ruby_system)
        self.sequencer = NULL
        self.cache = cache
        self.use_prefetcher = False
        self.addr_ranges = addr_ranges
        self.allow_SD = True
        self.is_HN = True
        #self.enable_DMT = True
        #self.enable_DCT = True
        self.enable_DMT = hnf_enable_dmt
        self.enable_DCT = hnf_enable_dct
        self.send_evictions = False
        # MOESI / Mostly inclusive for shared / Exclusive for unique
        self.alloc_on_seq_acc = False
        self.alloc_on_seq_line_write = False
        self.alloc_on_readshared = True
        self.alloc_on_readunique = False
        self.alloc_on_readonce = True
        self.alloc_on_writeback = True
        self.dealloc_on_unique = True
        self.dealloc_on_shared = False
        self.dealloc_backinv_unique = False
        self.dealloc_backinv_shared = False
The simulations complete okay for the default clusivity of L2$ and L3$.
However, if I change the L2$ clusivity to "mostly_inclusive", some of the
benchmarks fail with an assertion error.
I used the default L3$ mostly_inclusive settings to define the
mostly_inclusive clusivity for the L2$:
class CHI_L2Controller(CHI_Cache_Controller):
    '''
    Default parameters for a L2 Cache controller
    '''

    def __init__(self, ruby_system, cache, l2_clusivity, prefetcher):
        super(CHI_L2Controller, self).__init__(ruby_system)
        self.sequencer = NULL
        self.cache = cache
        self.use_prefetcher = False
        self.allow_SD = True
        self.is_HN = False
        self.enable_DMT = False
        self.enable_DCT = False
        self.send_evictions = False
        if (l2_clusivity == "sincl"):
            # Strict inclusive MOESI
            self.alloc_on_seq_acc = False
            self.alloc_on_seq_line_write = False
            self.alloc_on_readshared = True
            self.alloc_on_readunique = True
            self.alloc_on_readonce = True
            self.alloc_on_writeback = True
            self.dealloc_on_unique = False
            self.dealloc_on_shared = False
            self.dealloc_backinv_unique = True
            self.dealloc_backinv_shared = True
        elif (l2_clusivity == "mincl"):
            # Mostly inclusive MOESI
            self.alloc_on_seq_acc = False
            self.alloc_on_seq_line_write = False
            self.alloc_on_readshared = True
            self.alloc_on_readunique = False
            self.alloc_on_readonce = True
            self.alloc_on_writeback = True
            self.dealloc_on_unique = True
            self.dealloc_on_shared = False
            self.dealloc_backinv_unique = False
            self.dealloc_backinv_shared = False
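One side effect of the if/elif form above is that an unrecognised l2_clusivity string leaves all ten parameters unset without any complaint. A possible alternative, sketched below under stated assumptions, keeps the settings in a table and rejects unknown names. The names L2_CLUSIVITY_PRESETS and apply_clusivity are hypothetical and not part of CHI_config.py, and SimpleNamespace merely stands in for a controller object so that the example runs outside gem5.

```python
from types import SimpleNamespace

# Values copied from the two branches above; the table name is hypothetical.
L2_CLUSIVITY_PRESETS = {
    "sincl": {  # Strict inclusive MOESI
        "alloc_on_seq_acc": False,
        "alloc_on_seq_line_write": False,
        "alloc_on_readshared": True,
        "alloc_on_readunique": True,
        "alloc_on_readonce": True,
        "alloc_on_writeback": True,
        "dealloc_on_unique": False,
        "dealloc_on_shared": False,
        "dealloc_backinv_unique": True,
        "dealloc_backinv_shared": True,
    },
    "mincl": {  # Mostly inclusive MOESI
        "alloc_on_seq_acc": False,
        "alloc_on_seq_line_write": False,
        "alloc_on_readshared": True,
        "alloc_on_readunique": False,
        "alloc_on_readonce": True,
        "alloc_on_writeback": True,
        "dealloc_on_unique": True,
        "dealloc_on_shared": False,
        "dealloc_backinv_unique": False,
        "dealloc_backinv_shared": False,
    },
}


def apply_clusivity(controller, clusivity):
    """Copy the named preset onto controller; reject unknown names."""
    try:
        preset = L2_CLUSIVITY_PRESETS[clusivity]
    except KeyError:
        raise ValueError(
            f"unknown clusivity {clusivity!r}; "
            f"expected one of {sorted(L2_CLUSIVITY_PRESETS)}"
        ) from None
    for name, value in preset.items():
        setattr(controller, name, value)


# SimpleNamespace stands in for a controller object in this sketch.
ctrl = SimpleNamespace()
apply_clusivity(ctrl, "mincl")
print(ctrl.alloc_on_readunique, ctrl.dealloc_on_unique)
```

Inside the controller's __init__, the equivalent call would presumably be apply_clusivity(self, l2_clusivity), placed after the common fields are set.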
The assertion error is:
log_parsec_volrend_134_8rnf_1snf_4hnf_3_clust_all_priv_l2.txt:build/ARM/mem/ruby/protocol/Cache_Controller.cc:5477: panic: Runtime Error at CHI-cache-actions.sm:1947: assert failure.
QS 1: Even though the L2$ is private, I am assuming that the L2$ clusivity can
be set to mostly_inclusive. Is that assumption correct?
QS 2: If the answer to QS 1 is yes, then it would seem that the
"mostly_inclusive" settings for the L2$ (copied from the mostly_inclusive
settings for the L3$ residing in the HNF) could be the root cause of the problem.
Any thoughts on this?
Thanks in advance
JO
log_parsec_lu_cb_134_8rnf_1snf_4hnf_3_clust_all_priv_l2_mincl_mexcl_debug1.txt.gz
