Hi Chris,

You’ll find the patch for our version attached. Integrate it as you see fit; 
personally, I’d recommend a branch, since the two-log-file approach isn’t really 
reconcilable with the idea of having separate job files accessible to their 
respective owners.
All filenames and directories are defined with "#define" directives, as it was 
more convenient to have them all in one place.

Kind regards,
Lech

Attachment: job_archive.patch.gz
Description: GNU Zip compressed data


> On 15.06.2019 at 00:47, Christopher Benjamin Coffey 
> <chris.cof...@nau.edu> wrote:
> 
> Hi Lech,
> 
> I'm glad that it is working out well with the modifications you've put in 
> place! Yes, there can be a huge volume of jobscripts out there, and that's a 
> pretty good way of dealing with it. We've backed up 1.1M jobscripts since 
> its inception 1.5 months ago and aren't too worried yet about the inode/space 
> usage. We haven't settled on what we will do to keep the archive clean 
> yet. My thought was:
> 
> - keep two months (directories) of jobscripts for each user, leaving the 
> jobscripts intact for easy user access
> - tar up the month directories that are older than two months
> - keep four tarred months
> 
> That way there would be 6 months of jobscript archive to match our 6 month 
> job accounting retention in the slurm db.
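The retention plan above (two live months, four tarred, six total) can be sketched as a small shell function. The archive root, the `YYYY-MM` directory naming, and the function name are assumptions for illustration, not part of the actual archiver:

```shell
# Hypothetical sketch of the retention scheme: tar month directories
# (assumed to be named YYYY-MM) older than the two most recent, and
# keep only the four newest tarballs: six months of history in total.
# Uses GNU head's negative-count form (head -n -K).
rotate_archive() {
    archive=$1                         # archive root, e.g. a per-user directory
    cd "$archive" || return 1
    # All but the two newest month directories get tarred and removed.
    for dir in $(ls -d [0-9][0-9][0-9][0-9]-[0-9][0-9] 2>/dev/null | sort | head -n -2); do
        tar czf "$dir.tar.gz" "$dir" && rm -rf "$dir"
    done
    # Drop tarballs beyond the four newest.
    ls [0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz 2>/dev/null | sort | head -n -4 | xargs -r rm -f
}
```

Run from cron once a month, this keeps the archive bounded while leaving the two newest month directories intact for direct user access.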
> 
> I'd be interested in your version however, please do send it along! And 
> please keep in touch with how everything goes!
> 
> Best,
> Chris
> —
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
> 
> 
> On 6/14/19, 2:22 AM, "slurm-users on behalf of Lech Nieroda" 
> <slurm-users-boun...@lists.schedmd.com on behalf of 
> lech.nier...@uni-koeln.de> wrote:
> 
>    Hello Chris,
> 
>    We’ve tried out your archiver and adapted it to our needs; it works quite 
> well.
>    The changes:
>    - we get lots of jobs per day, ca. 3k-5k, so storing them as individual 
> files would waste too many inodes and 4k blocks. Instead, everything is 
> written into two log files (job_script.log and job_env.log) with the prefix 
> "<timestamp> <user> <jobid>" in each line. This way, one can easily grep 
> and cut out the corresponding job script or environment. Long-term storage and 
> compression are handled by logrotate, with standard compression settings
>    - the parsing part can fail to produce a username, so we have introduced 
> a custom environment variable that stores the username and can be read 
> directly by the archiver
>    - most of the program’s output, including debug output, is handled by the 
> logger and stored in a jobarchive.log file with an appropriate timestamp
>    - the logger uses a va_list to make multi-argument log-oneliners possible
>    - signal handling is reduced to the debug-level increase/decrease
>    - file handling is mostly relegated to HelperFn, directory trees are now 
> created automatically
>    - the binary header of the env-file and the binary footer of the 
> script-file are filtered, so the resulting files are recognized as ASCII 
> files
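Given the per-line "<timestamp> <user> <jobid>" prefix described above, a single job's script can be recovered with grep and cut. A rough sketch, assuming exactly three space-separated prefix fields (the precise layout is an assumption), with a hypothetical helper name:

```shell
# Hypothetical helper: print the lines belonging to one job id,
# stripping the "<timestamp> <user> <jobid> " prefix. cut rejoins
# fields 4 onward with the original delimiter, so spacing inside
# the script lines is preserved.
extract_job() {
    log=$1; jobid=$2
    grep "^[^ ]* [^ ]* $jobid " "$log" | cut -d' ' -f4-
}
```

The same call works against job_env.log to pull out a job's environment.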
> 
>    If you are interested in our modified version, let me know.
> 
>    Kind regards,
>    Lech
> 
> 
>> On 09.05.2019 at 17:37, Christopher Benjamin Coffey 
>> <chris.cof...@nau.edu> wrote:
>> 
>> Hi All,
>> 
>> We created a slurm job script archiver which you may find handy. We 
>> initially attempted to do this through slurm with a slurmctld prolog but it 
>> really bogged the scheduler down. This new solution is a custom C++ program 
>> that uses inotify to watch for job scripts and environment files showing up 
>> in /var/spool/slurm/hash.* on the head node. When they do, the program 
>> copies the jobscript and environment out to a local archive directory. The 
>> program is multithreaded and has a dedicated thread watching each hash 
>> directory. The program is super-fast and lightweight and has no side effects 
>> on the scheduler. The program by default will apply ACLs to the archived job 
>> scripts so that only the owner of the jobscript can read the files. Feel 
>> free to try it out and let us know how it works for you!
>> 
>> https://github.com/nauhpc/job_archive
>> 
>> Best,
>> Chris
>> 
>> —
>> Christopher Coffey
>> High-Performance Computing
>> Northern Arizona University
>> 928-523-1167
