Re: [slurm-users] Job requesting two different GPUs on two different nodes

2021-06-08 Thread Loris Bennett
Hi Gestió, Gestió Servidors writes: > Hi, > Today, while doing some tests, I have not managed to write a submit script > that requests 2 different GPUs on 2 different nodes. With this simple script: > #!/bin/bash > # > #SBATCH --job-name=N2n4 > #SBATCH --output=N2n4-CUDA.txt >

Re: [slurm-users] Kill job when child process gets OOM-killed

2021-06-08 Thread Arthur Gilly
I could say that the limit on max array sizes is lower on our cluster, and we start to see I/O problems very fast as parallelism scales (which we can limit with % as you mention). But the actual reason is simpler: as I mentioned, we have an entire collection of scripts which were written for a pr

Re: [slurm-users] Maui equivalent Nodeallocationpolicy

2021-06-08 Thread Lyn Gerner
David, take a look at the various instances of the string "LLN" throughout slurm.conf, as well as pack_serial_at_end. (I suspect you may want LLN=no on your partition definition.) Best, Lyn On Tue, Jun 8, 2021 at 11:51 AM David Chaffin wrote: > replying to myself as I can't quite figure out how
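For context, a minimal slurm.conf sketch showing where these options live (the partition name, node list, and parameter values below are placeholders, not taken from the thread):

    # Consumable-resource scheduling is assumed here
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory

    # Pack single-CPU (serial) jobs onto nodes at the end of the node list
    SchedulerParameters=pack_serial_at_end

    # LLN=yes sends each job to the least-loaded node; LLN=no is the default
    PartitionName=htc Nodes=node[01-10] LLN=no Default=YES State=UP

Changes to the partition and scheduler parameters take effect after an 'scontrol reconfigure' or a restart of slurmctld.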

Re: [slurm-users] Maui equivalent Nodeallocationpolicy

2021-06-08 Thread David Chaffin
replying to myself as I can't quite figure out how to reply to Jurgen in digest mode. Jurgen's pointers are good for some of our other issues, but I misstated the question: it should have been, how do I send the small HTC jobs to the node that currently has the fewest free cores? RTFM and I think this

[slurm-users] Job requesting two different GPUs on two different nodes

2021-06-08 Thread Gestió Servidors
Hi, Today, while doing some tests, I have not managed to write a submit script that requests 2 different GPUs on 2 different nodes. With this simple script: #!/bin/bash # #SBATCH --job-name=N2n4 #SBATCH --output=N2n4-CUDA.txt #SBATCH --gres=gpu:GeForceRTX3080:1 #SBATCH -N 2 # number of nodes #
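For comparison, a minimal sketch of a heterogeneous job with two components, each on one node with a different GPU type. This assumes Slurm 20.02 or later (the '#SBATCH hetjob' separator); the second GPU type string and the application name are placeholders:

    #!/bin/bash
    #SBATCH --job-name=two-gpu-types
    #SBATCH --output=two-gpu-types-%j.txt
    # Component 0: one node with one GeForceRTX3080
    #SBATCH -N 1
    #SBATCH --gres=gpu:GeForceRTX3080:1
    #SBATCH hetjob
    # Component 1: one node with a different GPU type (placeholder name)
    #SBATCH -N 1
    #SBATCH --gres=gpu:GeForceGTX1080Ti:1

    # Launch one step per component; --het-group selects the component
    srun --het-group=0 ./my_cuda_app &
    srun --het-group=1 ./my_cuda_app &
    wait

Both GPU type strings have to match the Type= values defined in gres.conf on the respective nodes.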

Re: [slurm-users] Kill job when child process gets OOM-killed

2021-06-08 Thread Renfro, Michael
Any reason *not* to create an array of 100k jobs and let the scheduler just handle things? Current versions of Slurm support arrays of up to 4M jobs, and you can limit the number of jobs running simultaneously with the '%' specifier in your --array= sbatch parameter.
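As a sketch, a throttled array submission along those lines might look like this (the script name, resource values, and the %500 throttle are placeholders; the site's MaxArraySize in slurm.conf must also be at least as large as the highest index):

    #!/bin/bash
    #SBATCH --job-name=chunked-analysis
    #SBATCH --output=logs/task_%A_%a.out
    # 100,000 tasks, at most 500 running at the same time
    #SBATCH --array=1-100000%500
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G
    #SBATCH --time=01:00:00

    # analysis.sh is a placeholder for the per-chunk work
    ./analysis.sh "${SLURM_ARRAY_TASK_ID}"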

Re: [slurm-users] Kill job when child process gets OOM-killed

2021-06-08 Thread Arthur Gilly
Thank you Loris! Like many of our jobs, this is an embarrassingly parallel analysis, where we have to strike a compromise between what would be a completely granular array of >100,000 small jobs and some kind of serialisation through loops. So the individual jobs where I noticed this behaviou

Re: [slurm-users] Kill job when child process gets OOM-killed

2021-06-08 Thread Loris Bennett
Dear Arthur, Arthur Gilly writes: > Dear Slurm users, > I am looking for a SLURM setting that will kill a job immediately when any > subprocess of that job hits an OOM limit. Several posts have touched upon > that, e.g.: > https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg0

[slurm-users] Kill job when child process gets OOM-killed

2021-06-08 Thread Arthur Gilly
Dear Slurm users, I am looking for a SLURM setting that will kill a job immediately when any subprocess of that job hits an OOM limit. Several posts have touched upon that, e.g.: https://www.mail-archive.com/slurm-users@l
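A minimal sketch of the cgroup-based memory enforcement that usually comes up in this context (fragments only; the values are illustrative, and whether the whole job is terminated when a single child hits the limit depends on the Slurm version and configuration):

    # slurm.conf (fragment)
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf (fragment)
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    AllowedSwapSpace=0

With cgroup limits the kernel OOM killer may kill only the offending child process, so a complementary option is 'srun --kill-on-bad-exit' (or KillOnBadExit=1 in slurm.conf), so that a step whose task dies with a non-zero exit code takes the rest of that step down with it.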

Re: [slurm-users] Slurm stats in JSON format

2021-06-08 Thread Ward Poelmans
On 8/06/2021 00:27, Sid Young wrote: > Is there a tool that will extract the job counts in JSON format? Such as > #running, #pending, #onhold, etc. > I am trying to build some custom dashboards for our new cluster and this > would be a really useful set of metrics to gather and display.
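In the absence of a built-in tool on that Slurm version, a small shell sketch along these lines can produce the counts as JSON (the state names are Slurm's standard ones; newer releases also offer a --json flag on squeue, depending on version):

    #!/bin/bash
    # Count jobs per state and print a one-line JSON object,
    # e.g. {"pending": 340, "running": 12}
    squeue -h -o '%T' | sort | uniq -c | \
      awk 'BEGIN { printf "{"; sep="" }
           { printf "%s\"%s\": %s", sep, tolower($2), $1; sep=", " }
           END { print "}" }'

Held jobs show up as PENDING here; they can be separated out by also printing the reason field (%r) and counting the JobHeldUser/JobHeldAdmin reasons.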