Hi Sean,

Thanks for the code - looks like you have put a lot more thought into it
than I have into mine.  I'll certainly have to look at handling the
'tres_per_*' options.
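Off the top of my head, I suppose something like the following might do it
for us, although it is completely untested - I have just lifted the
tres_per_* field names and the ESLURM_INVALID_GRES value from your filter,
and tres_per_job is only my guess for where '--gpus' ends up:

  -- Untested sketch: count a GPU request in any of the GRES/TRES fields
  -- as sufficient.  Field names and the error code are taken from your
  -- filter; tres_per_job is my assumption for the plain '--gpus' option.
  local ESLURM_INVALID_GRES = 2072

  -- True if a GRES/TRES specification string mentions a GPU.
  local function has_gpu(spec)
     return spec ~= nil and string.find(spec, "gpu") ~= nil
  end

  -- Inside slurm_job_submit():
  if job_desc.partition == "gpu"
     and not (has_gpu(job_desc.gres)
              or has_gpu(job_desc.tres_per_job)
              or has_gpu(job_desc.tres_per_node)
              or has_gpu(job_desc.tres_per_socket)
              or has_gpu(job_desc.tres_per_task)) then
     slurm.log_user("Please request GPU resources in partition 'gpu', "
                    .. "e.g. '#SBATCH --gres=gpu:1' or '--gpus=1'")
     return ESLURM_INVALID_GRES
  end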
By the way, how do you do your testing?  As I don't have a test cluster,
currently I'm doing "open heart" testing, but I really need a minimal test
cluster, maybe using VMs.

Cheers,

Loris

Sean Crosby <scro...@unimelb.edu.au> writes:

> Hi Loris,
>
> This is our submit filter for what you're asking. It checks for both
> --gres and --gpus
>
> ESLURM_INVALID_GRES=2072
> ESLURM_BAD_TASK_COUNT=2025
> if ( job_desc.partition ~= slurm.NO_VAL ) then
>    if (job_desc.partition ~= nil) then
>       if (string.match(job_desc.partition,"gpgpu") or
>           string.match(job_desc.partition,"gpgputest")) then
>          --slurm.log_info("slurm_job_submit (lua): detect job for gpgpu partition")
>          --Alert on invalid gpu count - eg: gpu:0 , gpu:p100:0
>          if (job_desc.gres and string.find(job_desc.gres, "gpu")) then
>             local numgpu = string.match(job_desc.gres, ":%d+$")
>             if (numgpu ~= nil) then
>                numgpu = numgpu:gsub(':', '')
>                if (tonumber(numgpu) < 1) then
>                   slurm.log_user("Invalid GPGPU count specified in GRES, must be greater than 0")
>                   return ESLURM_INVALID_GRES
>                end
>             end
>          else
>             --Alternative use gpus in new version of slurm
>             if (job_desc.tres_per_node == nil) then
>                if (job_desc.tres_per_socket == nil) then
>                   if (job_desc.tres_per_task == nil) then
>                      slurm.log_user("You tried submitting to a GPGPU partition, but you didn't request one with GRES or GPUS")
>                      return ESLURM_INVALID_GRES
>                   else
>                      if (job_desc.num_tasks == slurm.NO_VAL) then
>                         slurm.user_msg("--gpus-per-task option requires --tasks specification")
>                         return ESLURM_BAD_TASK_COUNT
>                      end
>                   end
>                end
>             end
>          end
>       end
>    end
> end
>
> Let me know if you improve it please? We're always on the hunt to fix up
> some of the logic in the submit filter.
>
> Cheers,
> Sean
>
> --
> Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
> Research Computing Services | Business Services
> The University of Melbourne, Victoria 3010 Australia
>
> On Fri, 4 Dec 2020 at 23:58, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>
> UoM notice: External email. Be cautious of links, attachments, or
> impersonation attempts
>
> Hi,
>
> I want to reject jobs that don't specify any GPUs when accessing our GPU
> partition and have the following in job_submit.lua:
>
> if (job_desc.partition == "gpu" and job_desc.gres == nil ) then
>    slurm.log_user(string.format("Please request GPU resources in the partition 'gpu', " ..
>                                 "e.g. '#SBATCH --gres=gpu:1' " ..
>                                 "Please see 'man sbatch' for more details)"))
>    slurm.log_info(string.format("check_parameters: user '%s' did not request GPUs in partition 'gpu'",
>                                 username))
>    return slurm.ERROR
> end
>
> If GRES is not given for the GPU partition, this produces
>
> sbatch: error: Please request GPU resources in the partition 'gpu', e.g. '#SBATCH --gres=gpu:1' Please see 'man sbatch' for more details)
> sbatch: error: Batch job submission failed: Unspecified error
>
> My questions are:
>
> 1. Is there a better error to return?  The 'slurm.ERROR' produces the
>    generic second error line above (slurm_errno.h just seems to have
>    ESLURM_MISSING_TIME_LIMIT and ESLURM_INVALID_KNL as errors a plugin
>    might raise).  This is misleading, since the error is in fact known
>    and specific.
>
> 2. Am I right in thinking that 'job_desc' does not, as of 20.02.06, have
>    a 'gpus' field corresponding to the sbatch/srun option '--gpus'?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin        Email loris.benn...@fu-berlin.de

--
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin        Email loris.benn...@fu-berlin.de