On Tue, Jun 12, 2018 at 02:28:25PM +1000, Chris Samuel wrote:
> On Sunday, 10 June 2018 1:48:18 AM AEST Skylar Thompson wrote:
>
> > Unfortunately we don't have a mechanism to limit
> > network usage or local scratch usage
>
> Our trick in Slurm is to use the slurmd prolog script to set an XFS project
> quota for that job ID on the per-job directory (created by a plugin which
> also makes subdirectories there that it maps to /tmp and /var/tmp for the
> job) on the XFS partition used for local scratch on the node.
>
> If they don't request an amount via the --tmp= option then they get a default
> of 100MB. Snipping the relevant segments out of our prolog...
>
> JOBSCRATCH=/jobfs/local/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}
>
> if [ -d ${JOBSCRATCH} ]; then
>     QUOTA=$(/apps/slurm/latest/bin/scontrol show JobId=${SLURM_JOB_ID} | egrep MinTmpDiskNode=[0-9] | awk -F= '{print $NF}')
>     if [ "${QUOTA}" == "0" ]; then
>         QUOTA=100M
>     fi
>     /usr/sbin/xfs_quota -x -c "project -s -p ${JOBSCRATCH} ${SLURM_JOB_ID}" /jobfs/local
>     /usr/sbin/xfs_quota -x -c "limit -p bhard=${QUOTA} ${SLURM_JOB_ID}" /jobfs/local
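For completeness, here is a minimal sketch of what the epilog-side counterpart to the quoted prolog could look like (this is not Chris's actual epilog; it assumes the same /jobfs/local layout and the convention of using the job ID as the XFS project ID):

    JOBSCRATCH=/jobfs/local/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}

    if [ -d "${JOBSCRATCH}" ]; then
        # bhard=0 means "no limit" to XFS, which effectively drops the quota
        /usr/sbin/xfs_quota -x -c "limit -p bhard=0 ${SLURM_JOB_ID}" /jobfs/local
        # Clear the project flag from the tree, then remove the scratch directory
        /usr/sbin/xfs_quota -x -c "project -C -p ${JOBSCRATCH} ${SLURM_JOB_ID}" /jobfs/local
        rm -rf "${JOBSCRATCH}"
    fi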
Thanks, Chris! We've been considering doing this with GE prolog/epilog scripts
(and boot-time logic to clean up if a node dies with scratch space still
allocated) but haven't gotten around to it. I think we might also need to get
buy-in from some groups that are happy with the unenforced state right now.

-- 
Skylar
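A rough sketch of the boot-time cleanup piece mentioned above, again assuming the /jobfs/local layout and job-ID-as-project-ID convention from Chris's prolog (a GE site would substitute its own paths), run once at node startup before the scheduler daemon comes up:

    # Anything left under the per-job scratch tree at boot is stale, since no
    # jobs can be running yet; drop its quota and remove it.
    for dir in /jobfs/local/slurm/*; do
        [ -d "${dir}" ] || continue
        # Directory names are <jobid>.<restart_count>, and the job ID is the
        # XFS project ID the prolog used.
        jobid=$(basename "${dir}" | cut -d. -f1)
        /usr/sbin/xfs_quota -x -c "limit -p bhard=0 ${jobid}" /jobfs/local
        rm -rf "${dir}"
    done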