I've never looked at the internals of the native Slurm archive
script. What I can tell you is that we have never had a problem
reimporting data that was dumped from older versions into a
current-version database. So the import via sacctmgr must convert the
older formats to the newer ones and handle the schema changes.
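For what it's worth, the import side is just the stock sacctmgr
command; a minimal sketch (the file path is hypothetical):

    # Load an archive file produced by the slurmdbd archive step into
    # the currently configured accounting database; sacctmgr handles
    # any format conversion on the way in.
    sacctmgr archive load file=/path/to/cluster_job_archive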
I will note that if you are storing job_scripts and envs, those can
eat up a ton of space in 21.08. It looks like they've solved that
problem in 22.05, but the archive steps on 21.08 took forever due to
those scripts and envs.
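For reference, storing those is opt-in via slurm.conf; a minimal
sketch of the relevant setting, assuming 21.08 or later:

    # slurm.conf: opt in to storing batch scripts and job environments
    # in the accounting database (this is what bloats archives on 21.08).
    AccountingStoreFlags=job_script,job_env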
-Paul Edmon-
On 7/14/2022 12:55 PM, Timony, Mick wrote:
Hi Paul
If you have 6 years' worth of data and you want to prune down to
2 years, I recommend going month by month rather than doing it in
one go. When we initially started archiving several years back,
the database held 2 years of data and our first pass took forever,
actually causing issues with the archive process. We worked with
SchedMD to improve the archive script built into Slurm, but also
decided to archive only one month at a time, which allowed it to
finish in a reasonable amount of time (a sketch of the config
knobs involved follows below).
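One way to realize that month-at-a-time approach is to walk the purge
window down gradually in slurmdbd.conf, restarting slurmdbd after
each step; a sketch with illustrative retention values:

    # slurmdbd.conf: archive before purging, then step the window down
    # one month at a time (72months, then 71months, and so on down to
    # 24months), restarting slurmdbd after each change.
    ArchiveJobs=yes
    ArchiveSteps=yes
    ArchiveDir=/var/spool/slurm/archive
    PurgeJobAfter=72months
    PurgeStepAfter=72months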
Thanks, that is good advice. We'd had issues with accounting in the
past and had to run slurmdbd rollups, which can take up to 2 weeks. It's
good to get feedback like yours. Do you know what exactly the Slurm
archive script does, how it archives data, or what formats it supports?
The docs are a little vague:
https://slurm.schedmd.com/slurmdbd.conf.html#OPT_ArchiveScript
"This script is used to transfer accounting records out of the
database into an archive. It is used in place of the internal process
used to archive objects. The script is executed with no arguments, and
the following environment variables are set."
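For what a custom script could look like, here is a heavily hedged
sketch; the SLURM_ARCHIVE_* variable names follow the slurmdbd.conf
man page, but verify them against your version, and the dump target
and paths are hypothetical:

    #!/bin/bash
    # Hypothetical ArchiveScript sketch: slurmdbd invokes this with no
    # arguments and indicates what to archive via SLURM_ARCHIVE_*
    # environment variables (verify the names for your Slurm version).
    # A real script would be more selective than a full database dump.
    if [ "$SLURM_ARCHIVE_JOBS" = "1" ]; then
        mysqldump slurm_acct_db > "/srv/slurm-archive/acct_$(date +%F).sql"
    fi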
The archived data can be pulled into a different Slurm database,
which is what we do for importing historical data into our XDMod
instance.
How do you keep track of and implement schema changes to this database?
Thanks
--Mick