Hi Chris, you’ll find the patch for our version attached. Integrate it as you see fit; personally I’d recommend a branch, since the two-log-files approach isn’t really reconcilable with the idea of having separate job files accessible to their respective owners. All filenames and directories are defined with "#define" macros, as it was more convenient to have them all in one place.
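For reference, with the "<timestamp> <user> <jobid>" line prefix described in the quoted message below, pulling one job's script back out of the combined log is a simple grep-and-cut. A minimal sketch (the sample log content and exact field layout here are illustrative assumptions):

```shell
# Build a tiny sample job_script.log in the assumed
# "<timestamp> <user> <jobid> <script line>" format.
cat > job_script.log <<'EOF'
1560500000 alice 12345 #!/bin/bash
1560500000 alice 12345 srun hostname
1560500100 bob 12346 #!/bin/bash
1560500100 bob 12346 srun sleep 60
EOF

# Select job 12345's lines, then strip the three prefix fields.
# Matching " 12345 " with surrounding spaces avoids accidentally
# hitting timestamps or other jobids that merely contain "12345".
grep ' 12345 ' job_script.log | cut -d ' ' -f 4-
# -> prints:
# #!/bin/bash
# srun hostname
```

The same pattern works against job_env.log; note that a script line whose body itself contains " 12345 " would also match, so for a production tool an exact match on the third field is more robust.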
Kind regards, Lech
job_archive.patch.gz
> On 15.06.2019 at 00:47, Christopher Benjamin Coffey
> <chris.cof...@nau.edu> wrote:
>
> Hi Lech,
>
> I'm glad that it is working out well with the modifications you've put in
> place! Yes, there can be a huge volume of jobscripts out there. That’s a
> pretty good way of dealing with it! We've backed up 1.1M jobscripts since
> its inception 1.5 months ago and aren't too worried yet about the
> inode/space usage. We haven't settled on what we will do to keep the
> archive clean yet. My thought was:
>
> - keep two months (directories) of jobscripts for each user, leaving the
>   jobscripts intact for easy user access
> - tar up the month directories that are older than two months
> - keep four tarred months
>
> That way there would be 6 months of jobscript archive to match our 6-month
> job accounting retention in the slurm db.
>
> I'd be interested in your version, however; please do send it along! And
> please keep in touch with how everything goes!
>
> Best,
> Chris
> —
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
>
>
> On 6/14/19, 2:22 AM, "slurm-users on behalf of Lech Nieroda"
> <slurm-users-boun...@lists.schedmd.com on behalf of
> lech.nier...@uni-koeln.de> wrote:
>
> Hello Chris,
>
> we’ve tried out your archiver and adapted it to our needs; it works quite
> well.
> The changes:
> - we get lots of jobs per day, ca. 3k-5k, so storing them as individual
>   files would waste too many inodes and 4k blocks. Instead, everything is
>   written into two log files (job_script.log and job_env.log) with the
>   prefix "<timestamp> <user> <jobid>" in each line. In this way one can
>   easily grep and cut the corresponding job script or environment.
>   Long-term storage and compression is handled by logrotate, with
>   standard compression settings
> - the parsing part can fail to produce a username, thus we have introduced
>   a customized environment variable that stores the username and can be
>   read directly by the archiver
> - most of the program’s output, including debug output, is handled by the
>   logger and stored in a jobarchive.log file with an appropriate timestamp
> - the logger uses a va_list to make multi-argument log one-liners possible
> - signal handling is reduced to the debug-level increase/decrease
> - file handling is mostly relegated to HelperFn; directory trees are now
>   created automatically
> - the binary header of the env file and the binary footer of the
>   script file are filtered, thus the resulting files are recognized as
>   ASCII files
>
> If you are interested in our modified version, let me know.
>
> Kind regards,
> Lech
>
>
>> On 09.05.2019 at 17:37, Christopher Benjamin Coffey
>> <chris.cof...@nau.edu> wrote:
>>
>> Hi All,
>>
>> We created a slurm job script archiver which you may find handy. We
>> initially attempted to do this through slurm with a slurmctld prolog,
>> but it really bogged the scheduler down. This new solution is a custom
>> C++ program that uses inotify to watch for job scripts and environment
>> files to show up in /var/spool/slurm/hash.* on the head node. When they
>> do, the program copies the jobscript and environment out to a local
>> archive directory. The program is multithreaded and has a dedicated
>> thread watching each hash directory. The program is super fast and
>> lightweight and has no side effects on the scheduler. The program by
>> default applies ACLs to the archived job scripts so that only the owner
>> of the jobscript can read the files. Feel free to try it out and let us
>> know how it works for you!
>>
>> https://github.com/nauhpc/job_archive
>>
>> Best,
>> Chris
>>
>> —
>> Christopher Coffey
>> High-Performance Computing
>> Northern Arizona University
>> 928-523-1167
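P.S. The logrotate handoff mentioned above could be configured with a stanza roughly like the following; the log directory and rotation count are illustrative assumptions, not our actual settings (six rotations would match the 6-month accounting retention discussed in the thread):

```
# e.g. /etc/logrotate.d/jobarchive (hypothetical path)
/var/spool/jobarchive/job_script.log /var/spool/jobarchive/job_env.log {
    monthly
    rotate 6
    compress
    missingok
    notifempty
}
```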