Hello Chris,

we've tried out your archiver and adapted it to our needs; it works quite well.
The changes:
- we get lots of jobs per day, roughly 3k-5k, so storing them as individual files 
would waste too many inodes and 4 KiB blocks. Instead, everything is written into 
two log files (job_script.log and job_env.log), with each line carrying the prefix 
"<timestamp> <user> <jobid>". This way one can easily grep and cut out the 
corresponding job script or environment. Long-term storage and compression are 
handled by logrotate with its standard compression settings (see the extraction 
and logrotate sketches after this list)
- the parsing step can fail to produce a username, so we introduced a custom 
environment variable that stores the username and can be read directly by the 
archiver (see the username sketch below)
- most of the program's output, including debug output, goes through the logger 
and is stored in jobarchive.log with an appropriate timestamp
- the logger uses a va_list, so multi-argument log calls fit on one line (see 
the logging sketch below)
- signal handling is reduced to increasing/decreasing the debug level (see the 
signal sketch below)
- file handling is mostly delegated to HelperFn; directory trees are now 
created automatically (see the directory sketch below)
- the binary header of the env file and the binary footer of the script file 
are filtered out, so the resulting files are recognized as ASCII files (see the 
filtering sketch below)
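
The extraction sketch: to pull a single job back out of the combined logs, 
something like the following works, assuming a single-token timestamp so that 
the script content starts at field 4 (the job id 1234567 is just an example):

    grep ' 1234567 ' job_script.log | cut -d' ' -f4-

And a minimal logrotate stanza along the lines of what we use; the paths and 
rotation frequency are placeholders, the compression is logrotate's standard:

    /var/log/jobarchive/job_script.log /var/log/jobarchive/job_env.log {
        weekly
        rotate 52
        compress
        delaycompress
        missingok
        notifempty
    }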
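
The username sketch: JOB_ARCHIVE_USER is only a placeholder name for our 
customized variable (it is set at submission time), and the env-file format 
is simplified here to one VAR=value per line:

    #include <fstream>
    #include <string>

    // Scan the captured job environment for the custom variable instead
    // of relying on the generic parsing step.
    std::string userFromEnvFile(const std::string &envPath) {
        std::ifstream env(envPath);
        const std::string key = "JOB_ARCHIVE_USER=";  // placeholder name
        std::string line;
        while (std::getline(env, line))
            if (line.compare(0, key.size(), key) == 0)
                return line.substr(key.size());
        return "";  // empty: the caller falls back to the old parsing path
    }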
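
The logging sketch, roughly how the va_list keeps multi-argument messages on 
one line; the timestamp format and the level check are illustrative, not our 
exact code:

    #include <csignal>
    #include <cstdarg>
    #include <cstdio>
    #include <ctime>

    static volatile std::sig_atomic_t debugLevel = 1;  // see the signal sketch
    static FILE *logFile = nullptr;  // opened on jobarchive.log at startup

    // printf-style one-liner; messages above the current debug level are dropped
    void logMsg(int level, const char *fmt, ...) {
        if (level > debugLevel || logFile == nullptr)
            return;
        char stamp[32];
        std::time_t now = std::time(nullptr);
        std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S",
                      std::localtime(&now));
        std::fprintf(logFile, "%s ", stamp);
        va_list args;
        va_start(args, fmt);
        std::vfprintf(logFile, fmt, args);
        va_end(args);
        std::fputc('\n', logFile);
        std::fflush(logFile);
    }

A call then looks like logMsg(2, "archived job %s for user %s", jobid, user).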
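
The signal sketch; the SIGUSR1/SIGUSR2 pair is an assumption here, any pair of 
signals works the same way:

    #include <csignal>

    // the handlers bump the same debugLevel the logger checks
    // (redeclared here so the sketch stands alone)
    static volatile std::sig_atomic_t debugLevel = 1;

    extern "C" void raiseDebug(int) { debugLevel = debugLevel + 1; }
    extern "C" void lowerDebug(int) {
        if (debugLevel > 0)
            debugLevel = debugLevel - 1;
    }

    void installSignalHandlers() {
        std::signal(SIGUSR1, raiseDebug);  // e.g. kill -USR1 <pid>
        std::signal(SIGUSR2, lowerDebug);  // e.g. kill -USR2 <pid>
    }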
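
The directory sketch: with C++17 the automatic tree creation is essentially a 
one-liner; our HelperFn just wraps it with error logging:

    #include <filesystem>

    // create any missing parent directories before writing an archive file
    void ensureDir(const std::filesystem::path &dir) {
        std::filesystem::create_directories(dir);  // no-op if it already exists
    }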
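
The filtering sketch; how exactly the binary header/footer are detected is 
simplified here to dropping every non-printable byte except tabs and newlines:

    #include <cctype>
    #include <string>

    // strip the env file's binary header / the script file's binary footer
    // so the archived copy is recognized as ASCII
    std::string filterBinary(const std::string &raw) {
        std::string out;
        out.reserve(raw.size());
        for (unsigned char c : raw)
            if (std::isprint(c) || c == '\n' || c == '\t')
                out.push_back(static_cast<char>(c));
        return out;
    }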

If you are interested in our modified version, let me know.

Kind regards,
Lech


> On 09.05.2019 at 17:37, Christopher Benjamin Coffey 
> <chris.cof...@nau.edu> wrote:
> 
> Hi All,
> 
> We created a Slurm job script archiver which you may find handy. We initially 
> attempted to do this through Slurm with a slurmctld prolog, but it really 
> bogged the scheduler down. This new solution is a custom C++ program that 
> uses inotify to watch for job scripts and environment files to show up in 
> /var/spool/slurm/hash.* on the head node. When they do, the program copies 
> the job script and environment out to a local archive directory. The program 
> is multithreaded, with a dedicated thread watching each hash directory. It 
> is super fast and lightweight and has no side effects on the 
> scheduler. By default the program applies ACLs to the archived job scripts 
> so that only the owner of the job script can read the files. Feel free to try 
> it out and let us know how it works for you!
> 
> https://github.com/nauhpc/job_archive
> 
> Best,
> Chris
> 
> --
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
> 
> 

