Re: [m5-users] Checkpoint Restore with a Simple Timing CPU

Ali Saidi Tue, 29 Mar 2011 11:17:02 -0700

I don't know how much testing has been done with ARM_SE and checkpointing. Checkpointing on ARM_FS does work, but there might be a bug you're running into in ARM_SE. I don't have time to try and debug it for you at the moment. The locked flag seems to be serialized and unserialized in the atomic cpu, so I don't know why it's not being found. You should be able to look at m5.cpt and verify that it's actually there.
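A rough way to do that check from a script: the failing lookup takes a section and a name, so the sketch below assumes m5.cpt is an INI-style text file with [section] headers and name=value lines, and looks for `locked` under `system.cpu`. The helper name checkpoint_has_param is made up for illustration and is not part of m5.

```python
import configparser


def checkpoint_has_param(cpt_path, section, name):
    """Return True if `name` is present under [section] in the checkpoint file.

    Assumes an INI-style layout ([section] headers, name=value lines).
    """
    parser = configparser.ConfigParser(
        interpolation=None,    # values are raw text, no %-expansion
        strict=False,          # tolerate repeated sections or keys
        allow_no_value=True,   # tolerate bare keys without '='
        delimiters=("=",),     # only '=' separates name from value
    )
    parser.optionxform = str   # keep names case-sensitive (e.g. curTick)
    with open(cpt_path) as cpt_file:
        parser.read_file(cpt_file)
    return parser.has_section(section) and parser.has_option(section, name)
```

Calling checkpoint_has_param(path_to_cpt, "system.cpu", "locked") and getting False would suggest the checkpoint was written without the flag, rather than the restore side failing to read it.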

Ali

On Tue, 29 Mar 2011 13:51:13 -0400, Griffin Wright <[email protected]> wrote:

Trying with O3 was just a random attempt. The intent is to be able to use
simple timing for what I'm doing. The checkpoint taking and restoring even
in pure simple atomic mode is not functioning, so that's what I'm trying to
figure out.

Working with a simple helloworld program (but also several others), I take a
checkpoint in simple atomic at instruction N, which writes and then the
program exits because the 'thread reached max instruction count' [which is
not what I'm concerned with]. When I then restore, still in simple atomic
(and not switching to anything else), from that same checkpoint, I get
"warn: optional parameter system.cpu.workload:M5_pid not present", followed
by the program going into "**** REAL SIMULATION ****" followed by a seg
fault.  I'm looking into the segfault bit now, but am unsure what the
M5_pid warning relates to; I'm only using one CPU and one thread.

I am using m5.fast, though the same bugs happen with m5.opt.  I'll be
trying m5.debug next, at least in the hopes of getting more useful info
with gdb.

In conclusion, are the bugs I've mentioned among these common
'long-standing' bugs that people have had to deal with?

-Griffin
On Tue, 29 Mar 2011 09:57:53 -0700, Steve Reinhardt <[email protected]> wrote:
In theory, these should all work, though as Ali said things will break
if you take a checkpoint in a system with caches because the caches
will likely have dirty memory blocks that don't get saved.  So since
O3 doesn't work without caches, in practice you can't create a
checkpoint from it. But strictly speaking that's a shortcoming of the
caches and not the CPU model.

In practice, people generally create checkpoints with atomic mode
(since it's fast), then restore to atomic mode and switch to
timing/detailed.  So if you're having problems with a
checkpoint/restore in atomic mode then that's definitely a bug of some
kind.

Problems in other modes may well be bugs too but they may be
longstanding ones that people have just learned to work around.

Steve

On Tue, Mar 29, 2011 at 9:13 AM, Griffin Wright <[email protected]>
wrote:
In what situations is checkpoint taking/restoring actually supported in m5?
I have tried creating and restoring checkpoints with different programs in
simple atomic, simple timing, and detailed (O3CPUTim) modes, and they all
fail due to unserialization errors somewhere, either with system.cpu:locked,
or Globals.curTick (in the case of detailed mode). I'm not sure what I'm
missing, and would at least like some clarification on how m5 supports
checkpointing in any of these modes.  I've looked at various unserialize
methods, and can't tell what functionality they might be lacking which
causes these troubles.
In all cases, once I create the checkpoint, the program exits due to a
"thread reached the max instruction count", but that doesn't concern me
because at that point, the checkpoint has become available for use.

Thanks,

Griffin Wright
On Sun, 27 Mar 2011 12:04:53 -0500, Ali Saidi <[email protected]> wrote:
Why are you taking checkpoints with a timing cpu and not an atomic one? It's
faster and the caches don't save their state, so if you're using caches with
the timing CPU you'll get an incomplete checkpoint.
Ali
On Mar 27, 2011, at 11:15 AM, Griffin Wright wrote:

Hello,
I'm working with checkpoints on simulations with an ARM_SE setup on a simple
timing CPU, and while I can take a checkpoint in simple timing mode just
fine, when I attempt to restore from a checkpoint, I get the following:
fatal: Can't unserialize 'system.cpu:locked'
 @ cycle 1945913882000
[paramIn:build/ARM_SE/sim/serialize.cc, line 211]
Memory Usage: 559288 KBytes
For more information see: http://www.m5sim.org/fatal/60de9f5a
That link points to nothing, but that's no biggie. I skimmed through the
users archive and found some related queries, with the quote at the end of
this message being a solution. I'm wondering if this is in fact the only
way to use the provided restore-checkpoint feature in m5, or if progress has
been made with regards to the simple timing CPU since the post below, or if
I'm looking at my error in the wrong light altogether.
The code that is failing is as follows, showing that system.cpu:locked
cannot be unserialized:

    if (!cp->find(section, name, str) || !parseParam(str, param)) {
        fatal("Can't unserialize '%s:%s'\n", section, name);
    }
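
Reading that condition closely, the fatal fires in two distinct cases: the entry is not found in the checkpoint, or it is found but its text does not parse as the expected type. A minimal Python sketch of the same control flow is below. It is not m5 code: param_in, parse_bool, CheckpointError and the dict-of-dicts checkpoint are hypothetical stand-ins, and unlike the original it reports the two cases with different messages.

```python
class CheckpointError(Exception):
    """Raised where the C++ code would call fatal()."""


def parse_bool(text):
    """Return (ok, value); ok is False when the text is not a boolean."""
    lowered = text.strip().lower()
    if lowered in ("true", "1"):
        return True, True
    if lowered in ("false", "0"):
        return True, False
    return False, None


def param_in(cp, section, name, parse):
    """Look up section:name in cp (a dict of dicts) and parse its value."""
    text = cp.get(section, {}).get(name)
    if text is None:
        # Corresponds to cp->find(section, name, str) returning false.
        raise CheckpointError(
            "Can't unserialize '%s:%s' (entry not found)" % (section, name))
    ok, value = parse(text)
    if not ok:
        # Corresponds to parseParam(str, param) returning false.
        raise CheckpointError(
            "Can't unserialize '%s:%s' (bad value %r)" % (section, name, text))
    return value
```

With cp = {"system.cpu": {"locked": "false"}}, param_in(cp, "system.cpu", "locked", parse_bool) returns False, while a checkpoint lacking the entry raises CheckpointError.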

"Resume() is the opposite of drain() which means the
system can continue issuing requests and acting as normal. serialize()
needs to save all of the state the CPU needs to put itself in the same
state as it was executing and unserialize() restores that saved state.
Looking at other implementations of save()/restore() is the easiest
way to do this. Finally, if you want to be able to switch to/from the
inorder cpu switchOut() and takeOverFrom() need to be implemented."

Thank you,
Griffin Wright


_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users


