Hello Chris,

we have tried out your archiver and adapted it to our needs; it works quite well. The changes:

- We get a lot of jobs per day, about 3k-5k, so storing them as individual files would waste too many inodes and 4k blocks. Instead, everything is written into two log files (job_script.log and job_env.log), with each line prefixed by "<timestamp> <user> <jobid>". That way one can easily grep and cut out the script or environment of a given job (see the first sketch below). Long-term storage and compression are handled by logrotate with standard compression settings.
- The parsing part can fail to produce a username, so we introduced a custom environment variable that stores the username and can be read directly by the archiver.
- Most of the program's output, including debug output, goes through the logger and is stored in a jobarchive.log file with an appropriate timestamp.
- The logger uses a va_list to make multi-argument one-line log calls possible (second sketch below).
- Signal handling is reduced to increasing/decreasing the debug level (third sketch below).
- File handling is mostly relegated to HelperFn; directory trees are now created automatically.
- The binary header of the env file and the binary footer of the script file are filtered out, so the resulting files are recognized as ASCII files (fourth sketch below).
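To make the log format concrete, here is a minimal sketch of the prefixed append; names and signatures are illustrative, not our actual code:

    #include <fstream>
    #include <istream>
    #include <string>

    // Append the content of one job script (or environment) to the shared
    // log file, prefixing every line with "<timestamp> <user> <jobid>" so
    // that a single job can be grepped back out later.
    void appendPrefixed(const std::string& logPath,
                        const std::string& timestamp,
                        const std::string& user,
                        const std::string& jobid,
                        std::istream& content)
    {
        std::ofstream log(logPath, std::ios::app);
        std::string line;
        while (std::getline(content, line))
            log << timestamp << ' ' << user << ' ' << jobid << ' ' << line << '\n';
    }

Recovering one job is then something like grep ' <jobid> ' job_script.log | cut -d' ' -f4- (assuming a single-token timestamp).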
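The va_list logger is the usual printf-style wrapper; a minimal sketch, simplified from what the modified sources actually do:

    #include <cstdarg>
    #include <cstdio>
    #include <ctime>

    // Write one timestamped line to jobarchive.log from a printf-style
    // format string plus variable arguments.
    void logMsg(const char* fmt, ...)
    {
        std::FILE* f = std::fopen("jobarchive.log", "a");
        if (!f)
            return;
        std::time_t now = std::time(nullptr);
        char stamp[32];
        std::strftime(stamp, sizeof stamp, "%F %T", std::localtime(&now));
        std::fprintf(f, "%s ", stamp);
        va_list args;
        va_start(args, fmt);
        std::vfprintf(f, fmt, args);
        va_end(args);
        std::fputc('\n', f);
        std::fclose(f);
    }

This allows one-liners such as logMsg("archived script of job %s for user %s", jobid, user);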
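The remaining signal handling looks roughly like this; take SIGUSR1/SIGUSR2 as an illustrative choice of signals rather than a quote from our code:

    #include <atomic>
    #include <csignal>

    // The only signals handled adjust the verbosity of the logger.
    std::atomic<int> debugLevel{0};

    extern "C" void adjustDebug(int sig)
    {
        if (sig == SIGUSR1)
            ++debugLevel;                          // more verbose
        else if (sig == SIGUSR2 && debugLevel > 0)
            --debugLevel;                          // less verbose
    }

    void installSignalHandlers()
    {
        std::signal(SIGUSR1, adjustDebug);
        std::signal(SIGUSR2, adjustDebug);
    }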
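The ASCII filtering can be pictured as below; this sketch simply drops all non-printable bytes, whereas only the env-file header and the script-file footer actually need it:

    #include <fstream>

    // Copy src to dst, keeping only printable ASCII plus newline and tab,
    // so that file(1) classifies the result as ASCII text.
    void filterToAscii(const char* src, const char* dst)
    {
        std::ifstream in(src, std::ios::binary);
        std::ofstream out(dst, std::ios::binary);
        char c;
        while (in.get(c)) {
            unsigned char u = static_cast<unsigned char>(c);
            if (u == '\n' || u == '\t' || (u >= 0x20 && u < 0x7f))
                out.put(c);
        }
    }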
If you are interested in our modified version, let me know.

Kind regards,
Lech

> On 09.05.2019 at 17:37, Christopher Benjamin Coffey <chris.cof...@nau.edu> wrote:
>
> Hi All,
>
> We created a slurm job script archiver which you may find handy. We initially
> attempted to do this through slurm with a slurmctld prolog but it really
> bogged the scheduler down. This new solution is a custom c++ program that
> uses inotify to watch for job scripts and environment files to show up out in
> /var/spool/slurm/hash.* on the head node. When they do, the program copies
> the jobscript and environment out to a local archive directory. The program
> is multithreaded and has a dedicated thread watching each hash directory. The
> program is super-fast and lightweight and has no side effects on the
> scheduler. The program by default will apply ACLs to the archived job scripts
> so that only the owner of the jobscript can read the files. Feel free to try
> it out and let us know how it works for you!
>
> https://github.com/nauhpc/job_archive
>
> Best,
> Chris
>
> --
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
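PS, for list readers who have not looked at the repository: the watch mechanism Chris describes boils down to an inotify loop per hash directory, roughly as below. Names, event mask, and the elided event parsing are my illustration, not code from the repo.

    #include <sys/inotify.h>
    #include <unistd.h>
    #include <climits>

    // One such loop runs in its own thread per /var/spool/slurm/hash.*
    // directory, reacting as soon as slurmctld drops new job files there.
    void watchHashDir(const char* dir)
    {
        int fd = inotify_init();
        if (fd < 0)
            return;
        inotify_add_watch(fd, dir, IN_CREATE | IN_MOVED_TO);

        char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
        for (;;) {
            ssize_t n = read(fd, buf, sizeof buf);  // blocks until events arrive
            if (n <= 0)
                break;
            // ... walk the inotify_event records in buf and copy the new
            // job script / environment files to the archive directory
        }
        close(fd);
    }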