On Tue, 14 Oct 2008, Mizanur Khondoker wrote:

Dear list,

Most high-performance computing clusters/grid engines have some
restrictions on how long a job can run in batch mode.
The cluster I am using has a 48-hour limit, but my job would take
far longer than that.

I know that it is possible to checkpoint jobs without modifying the code if
some specialized software (e.g., BLCR) is installed on the grid engine.

However, I am looking for a solution for when this kind of facility is not
available on the cluster, for example by modifying the code so that the
job can checkpoint and restart by itself.

Does anyone have any experience with, or ideas for, doing this? Any help
would be greatly appreciated.

Yes, we've done this for many years, generally by saving the workspace every few hours (in our case, say, every 100 simulation runs) and making sure that the workspace contains enough information to restart from the save points. This approach does depend on the run coming back to a simply reproducible point fairly often: if it is a simulation running entirely in C++ code inside a package, you have little hope.
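
The save-and-restart pattern described above can be sketched as follows. This is only an illustration under stated assumptions, not the code we actually use: it is written in Python rather than R (in R the saving and loading would typically be done with save.image()/load() or saveRDS()/readRDS()), and every name in it (simulate, one_run, CHECKPOINT_EVERY, the checkpoint file path, the seed) is hypothetical. The saved state holds the results so far, the index of the next run, and the random number generator state, which is what makes the save point "simply reproducible".

```python
import os
import pickle
import random
import tempfile

CHECKPOINT_EVERY = 100  # hypothetical: save after every 100 simulation runs


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as fh:
        pickle.dump(state, fh)
    # Renaming within one directory avoids leaving a half-written checkpoint
    # if the job is killed in the middle of a save.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh starting state if none exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    rng = random.Random(12345)  # arbitrary fixed seed (an assumption)
    return {"next_run": 0, "results": [], "rng_state": rng.getstate()}


def one_run(rng):
    """Placeholder for a single simulation run."""
    return rng.gauss(0.0, 1.0)


def simulate(path, total_runs, stop_after=None):
    """Run up to total_runs, resuming from the checkpoint at path.

    stop_after mimics the scheduler killing the job after that many runs
    in the current session; nothing is saved at that moment.
    """
    state = load_checkpoint(path)
    rng = random.Random()
    rng.setstate(state["rng_state"])
    done_this_session = 0
    while state["next_run"] < total_runs:
        if stop_after is not None and done_this_session >= stop_after:
            return state  # "killed": work since the last save is lost
        state["results"].append(one_run(rng))
        state["next_run"] += 1
        done_this_session += 1
        if state["next_run"] % CHECKPOINT_EVERY == 0:
            state["rng_state"] = rng.getstate()
            save_checkpoint(path, state)
    # Finished: save the final state so the results survive the job ending.
    state["rng_state"] = rng.getstate()
    save_checkpoint(path, state)
    return state
```

The batch script then simply calls simulate("checkpoint.pkl", total_runs) each time the job is resubmitted; a job killed between saves loses only the runs since the last checkpoint and repeats them with the same random numbers.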


--
Mizanur Khondoker
Division of Pathway Medicine (DPM)
The University of Edinburgh Medical School
The Chancellor's Building
49 Little France Crescent
Edinburgh EH16 4SB
United Kingdom

Tel:  +44 (0) 131 242 6287
Fax: +44 (0) 131 242 6244
http://www.pathwaymedicine.ed.ac.uk/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
