I don't know how much testing has been done with checkpointing on ARM_SE.
Checkpointing on ARM_FS does work, but you might be running into a bug
specific to ARM_SE. I don't have time to debug it for you at the moment.
The locked flag appears to be serialized and unserialized in the atomic
CPU, so I don't know why it's not being found. You should be able to look
at m5.cpt and verify that it's actually there.
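One way to do that check is a few lines of Python. This is only a sketch: it assumes m5.cpt is an INI-style text file in which the 'system.cpu:locked' from the error corresponds to a `locked` entry under a `[system.cpu]` section. The function name cpt_has_param and the sample text are made up for illustration.

```python
import configparser


def cpt_has_param(cpt_text, section, name):
    """Return True if `name` appears under `[section]` in checkpoint text."""
    # strict=False tolerates repeated entries; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",))
    # Keep key case (e.g. curTick) instead of configparser's default lowercasing.
    parser.optionxform = str
    parser.read_string(cpt_text)
    # has_option() is False both for a missing section and for a missing key.
    return parser.has_option(section, name)


# Made-up sample (assumption); a real checkpoint has many more sections.
SAMPLE = """\
[Globals]
curTick=1945913882000

[system.cpu]
locked=false
"""

if __name__ == "__main__":
    print(cpt_has_param(SAMPLE, "system.cpu", "locked"))
    print(cpt_has_param(SAMPLE, "system.cpu.workload", "M5_pid"))
```

For a real run, pass in the contents of the m5.cpt file from the checkpoint directory instead of SAMPLE.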
Ali
On Tue, 29 Mar 2011 13:51:13 -0400, Griffin Wright <[email protected]> wrote:
Trying with O3 was just a random attempt. The intent is to be able to use
simple timing for what I'm doing. Taking and restoring checkpoints does not
work even in pure simple atomic mode, so that's what I'm trying to figure
out.
Working with a simple helloworld program (but also several others), I take
a checkpoint in simple atomic at instruction N. The checkpoint is written
and then the program exits because the 'thread reached max instruction
count' [which is not what I'm concerned with]. When I then restore from
that same checkpoint, still in simple atomic (and not switching to anything
else), I get "warn: optional parameter system.cpu.workload:M5_pid not
present", followed by the program going into "**** REAL SIMULATION ****",
followed by a seg fault. I'm looking into the segfault bit now, but am
unsure what the M5_pid warning relates to; I'm only using one CPU and one
thread.
I am using m5.fast, though the same bugs happen with m5.opt. I'll be trying
m5.debug next, in the hope of at least getting more useful info with gdb.
In conclusion, are the bugs I've mentioned among these common
'long-standing' bugs that people have had to deal with?
-Griffin
On Tue, 29 Mar 2011 09:57:53 -0700, Steve Reinhardt <[email protected]> wrote:
In theory, these should all work, though as Ali said, things will break if
you take a checkpoint in a system with caches, because the caches will
likely have dirty memory blocks that don't get saved. Since O3 doesn't work
without caches, in practice you can't create a checkpoint from it. But
strictly speaking, that's a shortcoming of the caches and not the CPU
model.
In practice, people generally create checkpoints with atomic mode (since
it's fast), then restore to atomic mode and switch to timing/detailed. So
if you're having problems with a checkpoint/restore in atomic mode, then
that's definitely a bug of some kind.
Problems in other modes may well be bugs too, but they may be longstanding
ones that people have just learned to work around.
Steve
On Tue, Mar 29, 2011 at 9:13 AM, Griffin Wright <[email protected]> wrote:
In what situations is checkpoint taking/restoring actually supported in
m5?
I have tried creating and restoring checkpoints with different programs in
simple atomic, simple timing, and detailed (O3CPUTim) modes, and they all
fail due to unserialization errors somewhere, either with system.cpu:locked
or with Globals.curTick (in the case of detailed mode). I'm not sure what
I'm missing, and would at least like some clarification on how m5 supports
checkpointing in any of these modes. I've looked at various unserialize
methods, and can't tell what functionality they might be lacking that
causes these troubles.
In all cases, once I create the checkpoint, the program exits due to a
"thread reached the max instruction count", but that doesn't concern me
because at that point the checkpoint has become available for use.
Thanks,
Griffin Wright
On Sun, 27 Mar 2011 12:04:53 -0500, Ali Saidi <[email protected]> wrote:
Why are you taking checkpoints with a timing CPU and not an atomic one?
Atomic is faster, and the caches don't save their state, so if you're using
caches with the timing CPU you'll get an incomplete checkpoint.
Ali
On Mar 27, 2011, at 11:15 AM, Griffin Wright wrote:
Hello,
I'm working with checkpoints on simulations with an ARM_SE setup on a
simple timing CPU, and while I can take a checkpoint in simple timing mode
just fine, when I attempt to restore from a checkpoint, I get the
following:
fatal: Can't unserialize 'system.cpu:locked'
@ cycle 1945913882000
[paramIn:build/ARM_SE/sim/serialize.cc, line 211]
Memory Usage: 559288 KBytes
For more information see: http://www.m5sim.org/fatal/60de9f5a
That link points to nothing, but that's no biggie. I skimmed through the
users' archive and found some related queries, with the quote at the end of
this message being a solution. I'm wondering if this is in fact the only
way to use the provided restore-checkpoint feature in m5, or if progress
has been made with regard to the simple timing CPU since the post quoted
below, or if I'm looking at my error in the wrong light altogether.
The code that is failing is as follows, showing that system.cpu:locked
cannot be unserialized:
if (!cp->find(section, name, str) || !parseParam(str, param)) {
    fatal("Can't unserialize '%s:%s'\n", section, name);
}
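To make the two failure paths in that check explicit, here is a rough Python model of it. It is not the m5 implementation: the dict-of-dicts checkpoint, the Python function param_in (named after the paramIn in the error output), the UnserializeError class, and the parse_bool helper are all stand-ins invented for illustration.

```python
class UnserializeError(Exception):
    """Stands in for m5's fatal(); hypothetical, for illustration only."""


def parse_bool(text):
    # Hypothetical parser: accepts only the literals "true" and "false".
    if text not in ("true", "false"):
        raise ValueError(text)
    return text == "true"


def param_in(cp, section, name, parse):
    """Look up section/name in cp and parse it, or fail like the C++ check."""
    try:
        raw = cp[section][name]   # models cp->find(section, name, str)
        return parse(raw)         # models parseParam(str, param)
    except (KeyError, ValueError):
        # A missing entry and an unparsable value both end up here, so the
        # message alone does not say which of the two happened.
        raise UnserializeError("Can't unserialize '%s:%s'" % (section, name))


if __name__ == "__main__":
    cp = {"system.cpu": {"locked": "false"}}
    print(param_in(cp, "system.cpu", "locked", parse_bool))
    try:
        param_in({"system.cpu": {}}, "system.cpu", "locked", parse_bool)
    except UnserializeError as err:
        print("fatal:", err)
```

Since both conditions produce the same message, the error by itself cannot tell a missing entry apart from a malformed one.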
"Resume() is the opposite of drain() which means the
system can continue issuing requests and acting as normal. serialize()
needs to save all of the state the CPU needs to put itself in the same
state as it was executing and unserialize() restores that saved state.
Looking at other implementations of save()/restore() is the easiest way to
do this. Finally, if you want to be able to switch to/from the inorder cpu
switchOut() and takeOverFrom() need to be implemented."
Thank you,
Griffin Wright
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users