I don't know how much testing has been done with ARM_SE and checkpointing. Checkpointing on ARM_FS does work, but there might be a bug you're running into in ARM_SE. I don't have time to try and debug it for you at the moment. The locked flag seems to be serialized and unserialized in the atomic cpu, so I don't know why it's not being found. You should be able to look at m5.cpt and verify that it's actually there.
Ali
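
One way to act on the suggestion above and check m5.cpt for the entry is a small script. This is only a minimal sketch: it assumes the checkpoint is INI-style text in which the parameter reported as 'system.cpu:locked' is stored as a `locked=` line under a `[system.cpu]` section. The helper name `checkpoint_has` and the sample contents are made up for illustration and are not taken from a real checkpoint.

```python
import configparser


def checkpoint_has(cpt_text, section, name):
    """Return True if `name` appears under `[section]` in INI-style checkpoint text."""
    # interpolation=None: treat '%' in values literally.
    # strict=False: tolerate repeated sections or keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case, so 'curTick' is not lowercased
    parser.read_string(cpt_text)
    return parser.has_option(section, name)


# Hypothetical excerpt standing in for the contents of m5.cpt.
sample = """
[Globals]
curTick=1945913882000

[system.cpu]
locked=false
"""

# The three parameters named in the error and warning messages in this thread.
for section, name in [
    ("system.cpu", "locked"),
    ("Globals", "curTick"),
    ("system.cpu.workload", "M5_pid"),
]:
    status = "present" if checkpoint_has(sample, section, name) else "MISSING"
    print(f"{section}:{name} {status}")
```

To check a real checkpoint, read the file's text (for example `open("m5.cpt").read()`, with the path adjusted to wherever the checkpoint was written) and pass it in place of `sample`.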



On Tue, 29 Mar 2011 13:51:13 -0400, Griffin Wright <[email protected]> wrote:
Trying with O3 was just a random attempt. The intent is to be able to use simple timing for what I'm doing. Taking and restoring a checkpoint does not work even in pure simple atomic mode, so that's what I'm trying to figure out.

Working with a simple helloworld program (but also several others), I take a checkpoint in simple atomic at instruction N. The checkpoint is written, and then the program exits because the 'thread reached max instruction count' [which is not what I'm concerned with]. When I then restore from that same checkpoint, still in simple atomic (and not switching to anything else), I get "warn: optional parameter system.cpu.workload:M5_pid not present", followed by the program going into "**** REAL SIMULATION ****", followed by a segfault. I'm looking into the segfault now, but am unsure what the M5_pid warning relates to; I'm only using one CPU and one thread.

I am using m5.fast, though the same bugs happen with m5.opt.  I'll be
trying m5.debug next, at least in the hopes of getting more useful info
with gdb.

In conclusion, are the bugs I've mentioned among these common
'long-standing' bugs that people have had to deal with?

-Griffin

On Tue, 29 Mar 2011 09:57:53 -0700, Steve Reinhardt <[email protected]>
wrote:
In theory, these should all work, though as Ali said things will break
if you take a checkpoint in a system with caches because the caches
will likely have dirty memory blocks that don't get saved.  So since
O3 doesn't work without caches, in practice you can't create a
checkpoint from it. But strictly speaking that's a shortcoming of the
caches and not the CPU model.

In practice, people generally create checkpoints with atomic mode
(since it's fast), then restore to atomic mode and switch to
timing/detailed.  So if you're having problems with a
checkpoint/restore in atomic mode then that's definitely a bug of some
kind.

Problems in other modes may well be bugs too but they may be
longstanding ones that people have just learned to work around.

Steve

On Tue, Mar 29, 2011 at 9:13 AM, Griffin Wright <[email protected]>
wrote:
In what situations is checkpoint taking/restoring actually supported in m5?

I have tried creating and restoring checkpoints with different programs in
simple atomic, simple timing, and detailed (O3CPUTim) modes, and they all
fail due to unserialization errors somewhere, either with system.cpu:locked
or Globals.curTick (in the case of detailed mode).  I'm not sure what I'm
missing, and would at least like some clarification on how m5 supports
checkpointing in any of these modes.  I've looked at various unserialize
methods, and can't tell what functionality they might be lacking which
causes these troubles.

In all cases, once I create the checkpoint, the program exits due to a "thread reached the max instruction count", but that doesn't concern me
because at that point, the checkpoint has become available for use.

Thanks,

Griffin Wright



On Sun, 27 Mar 2011 12:04:53 -0500, Ali Saidi <[email protected]> wrote:

Why are you taking checkpoints with a timing cpu and not an atomic one?
It's faster and the caches don't save their state, so if you're using caches
with the timing CPU you'll get an incomplete checkpoint.
Ali
On Mar 27, 2011, at 11:15 AM, Griffin Wright wrote:

Hello,

I'm working with checkpoints on simulations with an ARM_SE setup on a simple
timing CPU, and while I can take a checkpoint in simple timing mode just
fine, when I attempt to restore from a checkpoint, I get the following:

fatal: Can't unserialize 'system.cpu:locked'
 @ cycle 1945913882000
[paramIn:build/ARM_SE/sim/serialize.cc, line 211]
Memory Usage: 559288 KBytes
For more information see: http://www.m5sim.org/fatal/60de9f5a

That link points to nothing, but that's no biggie.  I skimmed through the
user's archive and found some related queries, with the quote at the end of
this message being a solution.  I'm wondering if this is in fact the only
way to use the provided restore-checkpoint feature in m5, or if progress has
been made with regards to the simple timing CPU since this below post, or if
I'm looking at my error in the wrong light altogether.

The code that is failing is as follows, showing that system.cpu:locked
cannot be unserialized:

    if (!cp->find(section, name, str) || !parseParam(str, param)) {
        fatal("Can't unserialize '%s:%s'\n", section, name);
    }

"Resume() is the opposite of drain(), which means the system can continue
issuing requests and acting as normal. serialize() needs to save all of the
state the CPU needs to put itself in the same state as it was executing, and
unserialize() restores that saved state. Looking at other implementations of
save()/restore() is the easiest way to do this. Finally, if you want to be
able to switch to/from the inorder cpu, switchOut() and takeOverFrom() need
to be implemented."

Thank you,
Griffin Wright


_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users


