Hi Loris,

This is our submit filter for what you're asking. It checks for GPU requests made with either --gres or the --gpus family of options.
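One weakness of the GRES count check in the filter is that the pattern ":%d+$" only looks at the number at the very end of the string. A multi-resource request such as gpu:0,mps:100 is therefore not caught when the GPU entry is not the last one. A possible fix is a small helper, defined before slurm_job_submit, that splits the request on commas and sums the count of every gpu entry. This is a sketch only: gpu_count is a hypothetical name, not part of Slurm's Lua API. It assumes each entry has the form name[:type][:count], optionally prefixed with "gres:" or "gres/" as TRES strings are in some Slurm versions, so check what your version actually puts in job_desc before relying on it.

```lua
-- Hypothetical helper (not part of Slurm's Lua API): total number of GPUs in a
-- GRES/TRES request string such as "gpu:2", "gpu:p100:1" or "gpu:1,mps:100".
-- Returns nil when no gpu entry is present. Assumes comma-separated entries of
-- the form name[:type][:count], optionally prefixed with "gres:" or "gres/".
local function gpu_count(spec)
  if spec == nil then
    return nil
  end
  local total, found = 0, false
  for raw in string.gmatch(spec, "[^,]+") do
    -- Drop an optional "gres:" or "gres/" prefix (assumed TRES-style naming)
    local entry = raw:gsub("^gres[:/]", "")
    -- Split the entry into its colon-separated fields
    local fields = {}
    for field in string.gmatch(entry, "[^:]+") do
      fields[#fields + 1] = field
    end
    if fields[1] == "gpu" then
      found = true
      local last = fields[#fields]
      if #fields > 1 and string.match(last, "^%d+$") then
        total = total + tonumber(last)
      else
        -- "gpu" or "gpu:<type>" with no explicit count requests one GPU
        total = total + 1
      end
    end
  end
  if found then
    return total
  end
  return nil
end
```

Inside slurm_job_submit the count check would then become local n = gpu_count(job_desc.gres), rejecting the job with ESLURM_INVALID_GRES when n is not nil and n < 1. Passing the tres_per_* fields through the same helper should cover the --gpus options in the same way, assuming they use the same string format.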
ESLURM_INVALID_GRES = 2072
ESLURM_BAD_TASK_COUNT = 2025

function slurm_job_submit(job_desc, part_list, submit_uid)
  if (job_desc.partition ~= slurm.NO_VAL and job_desc.partition ~= nil) then
    -- Plain substring search; matches both "gpgpu" and "gpgputest"
    if (string.find(job_desc.partition, "gpgpu", 1, true)) then
      --slurm.log_info("slurm_job_submit (lua): detect job for gpgpu partition")
      -- Alert on invalid gpu count - eg: gpu:0 , gpu:p100:0
      if (job_desc.gres and string.find(job_desc.gres, "gpu")) then
        local numgpu = string.match(job_desc.gres, ":(%d+)$")
        if (numgpu ~= nil and tonumber(numgpu) < 1) then
          slurm.log_user("Invalid GPGPU count specified in GRES, must be greater than 0")
          return ESLURM_INVALID_GRES
        end
      else
        -- Alternatively, newer Slurm versions accept the --gpus options:
        -- --gpus -> tres_per_job, --gpus-per-node -> tres_per_node,
        -- --gpus-per-socket -> tres_per_socket, --gpus-per-task -> tres_per_task
        if (job_desc.tres_per_job == nil and job_desc.tres_per_node == nil and
            job_desc.tres_per_socket == nil and job_desc.tres_per_task == nil) then
          slurm.log_user("You tried submitting to a GPGPU partition, but you didn't request one with GRES or GPUS")
          return ESLURM_INVALID_GRES
        end
        if (job_desc.tres_per_task ~= nil and job_desc.num_tasks == slurm.NO_VAL) then
          slurm.log_user("--gpus-per-task option requires --ntasks specification")
          return ESLURM_BAD_TASK_COUNT
        end
      end
    end
  end
  return slurm.SUCCESS
end

-- job_submit/lua expects slurm_job_modify to be defined as well
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
  return slurm.SUCCESS
end

Please let me know if you improve it. We're always on the hunt for ways to fix up the logic in our submit filter.

Cheers,

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

On Fri, 4 Dec 2020 at 23:58, Loris Bennett <loris.benn...@fu-berlin.de> wrote:

> Hi,
>
> I want to reject jobs that don't specify any GPUs when accessing our GPU
> partition and have the following in job_submit.lua:
>
>   if (job_desc.partition == "gpu" and job_desc.gres == nil ) then
>     slurm.log_user(string.format("Please request GPU resources in the partition 'gpu', " ..
>                                  "e.g. '#SBATCH --gres=gpu:1' " ..
>                                  "Please see 'man sbatch' for more details)"))
>     slurm.log_info(string.format("check_parameters: user '%s' did not request GPUs in partition 'gpu'",
>                                  username))
>     return slurm.ERROR
>   end
>
> If GRES is not given for the GPU partition, this produces
>
>   sbatch: error: Please request GPU resources in the partition 'gpu', e.g. '#SBATCH --gres=gpu:1' Please see 'man sbatch' for more details)
>   sbatch: error: Batch job submission failed: Unspecified error
>
> My questions are:
>
> 1. Is there a better error to return? The 'slurm.ERROR' produces the
>    generic second error line above (slurm_errno.h just seems to have
>    ESLURM_MISSING_TIME_LIMIT and ESLURM_INVALID_KNL as errors a plugin
>    might raise). This is misleading, since the error is in fact known
>    and specific.
> 2. Am I right in thinking that 'job_desc' does not, as of 20.02.06, have
>    a 'gpus' field corresponding to the sbatch/srun option '--gpus'?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin     Email loris.benn...@fu-berlin.de