Thanks Michael. I noticed a couple of questions on the mailing list
mentioning GRES lately. I will share that information with our Slurm
administrators.
Cheers
On Thu, 21 Mar 2019 at 12:56, Renfro, Michael wrote:
> I think all you’re looking for is Generic Resource (GRES) scheduling,
> starting at https://slurm.schedmd.com/gres.html
I think all you’re looking for is Generic Resource (GRES) scheduling, starting
at https://slurm.schedmd.com/gres.html — if you’ve already seen that, then more
details would be helpful.
If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run up to
4 of those jobs at a time and leave the other two queued until a GPU frees up.
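A minimal sketch of the pieces involved, in case it helps (node name, CPU/memory figures and device paths below are placeholders, not taken from your cluster):

# slurm.conf
GresTypes=gpu
NodeName=gpunode01 CPUs=16 RealMemory=64000 Gres=gpu:4 State=UNKNOWN

# gres.conf on the GPU node
Name=gpu File=/dev/nvidia[0-3]

# job script requesting one GPU per job
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
srun ./gpu_application

Submitting six copies of such a script should keep four running and leave two pending until a GPU frees up.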
Hi,
I am new to SLURM.
I have access to a cluster where one of the nodes has 4 GPUs.
We are running Slurm version 17.11.12.
Is there some SBATCH token=value pair I can use to submit jobs
(each of which runs an application that is only able to utilize 1 GPU) so
that if I submit 6 copies, only 4 run at a time and the others wait for a free GPU?
On 3/20/19 4:20 AM, Frava wrote:
Hi Chris, thank you for the reply.
The team that manages that cluster is not very fond of upgrading SLURM,
which I understand.
Do be aware that Slurm 17.11 will stop being maintained once 19.05 is
released in May.
So basically my heterogeneous job that only has one step is considered to
have multiple steps, and that's a bug in SLURM 17.11.12?
On 3/20/19 9:09 AM, Peter Steinbach wrote:
Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf
file, everything stops working again. :/ Should I send around scontrol
outputs? And yes, I made sure to set the --mem flag for the job
submission this time.
Well there you've sa
Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf
file, everything stops working again. :/ Should I send around scontrol
outputs? And yes, I made sure to set the --mem flag for the job
submission this time.
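For reference, the kind of gres.conf entries meant here would be along these lines (the node name and device files are placeholders, not the exact config):

NodeName=g1 Name=gpu File=/dev/nvidia0 Cores=0-1
NodeName=g1 Name=gpu File=/dev/nvidia1 Cores=2-3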
Best,
Peter
Config details:
- Slurm v17.11.8
- QOS-based preemption
- Backfill scheduler (default parameters)
- QOS:
- "normal" = PreemptMode=CANCEL, GraceTime=5 minutes
- Per-stakeholder = Preempt=normal GrpTRES=
- Partitions:
- "standard" (default) = QOS=normal
- Per-stakeholder = QOS=
When users need prio
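For anyone reproducing this kind of setup, the QOS and partition definitions could be created roughly as follows (QOS names, limits and node lists below are placeholders; the real GrpTRES values depend on each stakeholder's share):

sacctmgr add qos normal
sacctmgr modify qos normal set PreemptMode=Cancel GraceTime=300   # 5 minutes, in seconds
sacctmgr add qos stakeholder_a
sacctmgr modify qos stakeholder_a set Preempt=normal GrpTRES=cpu=128,mem=512000

# slurm.conf
PreemptType=preempt/qos
PartitionName=standard Default=YES QOS=normal Nodes=node[01-20]
PartitionName=stakeholder_a QOS=stakeholder_a Nodes=node[01-20]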
Hi Philippe,
thanks for spotting this. This indeed appears to solve this first issue.
Now I can try to make the GPUs available and play with pinning etc.
Superb - if you happen to be at ISC, let me know. I'd buy you a
coffee/beer! ;)
Peter
Hello Peter,
In order to run non-GRES and GRES jobs on the g1 node, you should try to limit
the memory requested for each job, for instance (if suitable):
# sbatch --wrap="sleep 100 && env" -o singlecpu.log -c 3 -J cpu --mem=100
# sbatch --wrap="sleep 6 && env" -o gres.log -c 1 -J gpu --gres=g
Hello,
When I redirect stdout or stderr to files in a folder that does
not exist, the job fails instead of creating the folder.
I do not know if this is the expected behavior and, if so, whether it can be
changed (with some config parameter or environment variable).
I have another installatio
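As far as I know, Slurm does not create missing directories for --output/--error, so the job fails when slurmd cannot open the file. The usual workaround is to create the folder at submission time, e.g. (folder and script names are just examples):

# mkdir -p logs && sbatch -o logs/%j.out -e logs/%j.err jobscript.sh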
Hi Chris, thank you for the reply.
The team that manages that cluster is not very fond of upgrading SLURM,
which I understand.
So basically my heterogeneous job that only has one step is considered to
have multiple steps, and that's a bug in SLURM 17.11.12?
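For anyone else following along, a heterogeneous job in these Slurm versions is submitted with a ':' separator on the command line or with '#SBATCH packjob' inside the batch script; the sketch below is generic, not the actual job from this thread:

sbatch -n 1 --mem=2G : -n 4 --mem=8G hetero_job.sh

#!/bin/bash
#SBATCH -n 1 --mem=2G
#SBATCH packjob
#SBATCH -n 4 --mem=8G
srun --pack-group=0,1 hostname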
On Wed, 20 Mar 2019 at 07:02, Chris Sa
Hi Chris,
I changed the initial state a bit (the number of cores per node was
misconfigured):
https://raw.githubusercontent.com/psteinb/docker-centos7-slurm/18.08.5-with-gres/slurm.conf
But that doesn't change things. Initially, I see this:
# sinfo -N -l
Wed Mar 20 09:03:26 2019
NODELIST NO
Hi Will,
I solved this by creating a new GRES:
Some nodes have VRAM:no_consume:12G
Some nodes have VRAM:no_consume:24G
"no_consume" because it would be for the whole node otherwise.
It only works because the nodes only have one type of GPU each.
It is then requested with --gres=gpu:1,VRAM:16G
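A sketch of what the node definitions for this approach can look like (node names, GPU counts and VRAM sizes below are placeholders):

# slurm.conf
GresTypes=gpu,VRAM
NodeName=gpu-small[01-04] Gres=gpu:1,VRAM:no_consume:12G
NodeName=gpu-big[01-02] Gres=gpu:1,VRAM:no_consume:24G

A request like --gres=gpu:1,VRAM:16G then only matches the 24G nodes, while a plain --gres=gpu:1 can still land on any GPU node.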