Hi,
As their example was limited to "allgpus", I posted my take on this
on the NVIDIA developer blog.
It is basically the same, but it looks up the group ID from the dcgmi group
JSON using jp instead of a file.
https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-manager-s
Hi Everyone,
I'm new to Slurm administration and looking for a bit of help!
I just added accounting to an existing cluster, but job information is not being
added to the accounting MariaDB database. When I submit a test job it gets scheduled
fine and it's visible with squeue, but I get nothing returned from s
We do the same thing. Our prolog has
==
# setup DCGMI job stats
if [ -n "$CUDA_VISIBLE_DEVICES" ] ; then
if [ -d /var/slurm/gpu_stats.run ] ; then
if pgrep -f nv-hostengine >/dev/null 2>&1 ; then
groupstr=$(/usr/bin/dcgmi group -c J$SLURM_JOB_ID -a $CUDA_VISIBLE_DEVICES)
grou
Interesting solution; I didn't know it was possible to do this.
I will try to test this as well!
Sylvain
On 17/10/2024 10:45, Pierre-Antoine Schnell via slurm-users wrote:
CAUTION : External Sender. Please do not click on links or open
attachments from senders you do not trust.
Hello,
we recently
I started testing in the prolog, and you're right!
Before doing anything, I wanted to see whether there was a best practice.
Regards,
Sylvain Maret
On 16/10/2024 18:03, Brian Andrus via slurm-users wrote:
I am using Slurm 23.11.3 and AllowAccounts works for me. We
have a partition defined with AllowAccounts and if one tries to
submit in an account not in the list one will get
srun: error: Unable to allocate resources: Invalid account or
account/partition combination specified
Do you have
Hello,
we recently started monitoring the usage of our GPUs with NVIDIA's DCGM:
https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-manager-slurm/
We create a new dcgmi group for each job and start the statistics
retrieval for it in a prolog script.
Then we stop the retr
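
For readers following along, here is a minimal sketch of the prolog side described above (one dcgmi group per job, then starting statistics for it). It is not the poster's actual script. The helper names parse_group_id and start_job_stats are hypothetical, and the parsing assumes that the text printed by "dcgmi group -c" ends with "group ID of <N>", as in the NVIDIA blog post linked above. It also assumes CUDA_VISIBLE_DEVICES is set in the prolog environment, as in the other prolog quoted in this thread.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not taken from the thread.

# Extract the numeric group ID from the text printed by `dcgmi group -c`.
# Assumption: the message contains "... group ID of <N>".
parse_group_id() {
    printf '%s\n' "$1" | sed -n 's/.*group ID of \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Create a per-job DCGM group and start statistics recording for it.
start_job_stats() {
    job_id=$1
    gpus=$2
    out=$(dcgmi group -c "J${job_id}" -a "${gpus}") || return 1
    gid=$(parse_group_id "$out")
    [ -n "$gid" ] || return 1
    dcgmi stats -g "$gid" -e || return 1   # enable stats watches for the group
    dcgmi stats -g "$gid" -s "$job_id"     # start recording under the job ID
}

# Only act when the job has GPUs and the DCGM host engine is running.
if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] && pgrep -f nv-hostengine >/dev/null 2>&1; then
    start_job_stats "$SLURM_JOB_ID" "$CUDA_VISIBLE_DEVICES" \
        || echo "DCGM stats setup failed for job $SLURM_JOB_ID" >&2
fi
```

Keeping the ID parsing in its own function makes it easy to check against the exact message your dcgmi version prints before relying on it in a prolog.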
Dear all,
we've set up SLURM 24.05.3 on our cluster and are experiencing an issue with
interactive jobs. Before, we used 21.08 and pretty much the same settings, but
without these issues. We've started with a fresh DB etc.
The behavior of interactive jobs is very erratic. Sometimes they start
Hi Laura,
that might work for what we need to catch,
Many Thanks,
Adam
-Original Message-
From: Laura Hild via slurm-users
Sent: 16 October 2024 16:49
To: a...@bramblecfd.com
Cc: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: Dependency jobs
> I know you can show job info