[slurm-users] Re: [EXTERN] How do you guys track which GPU is used by which job ?

2024-10-17 Thread Markus Kötter via slurm-users
Hi, As their example was limited to "allgpus", I had posted my take on this on the NVIDIA developer blog. It is basically the same, but looks up the group ID from the dcgmi group JSON using jp instead of a file. https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-manager-s

[slurm-users] Job information is not being added to accounting database on new setup

2024-10-17 Thread Adrian Brady via slurm-users
Hi Everyone, I'm new to Slurm administration and looking for a bit of help! I just added accounting to an existing cluster, but job information is not being added to the accounting MariaDB database. When I submit a test job it gets scheduled fine and it's visible with squeue, but I get nothing returned from s

[slurm-users] Re: [EXTERN] How do you guys track which GPU is used by which job ?

2024-10-17 Thread Paul Raines via slurm-users
We do the same thing. Our prolog has == # setup DCGMI job stats if [ -n "$CUDA_VISIBLE_DEVICES" ] ; then if [ -d /var/slurm/gpu_stats.run ] ; then if pgrep -f nv-hostengine >/dev/null 2>&1 ; then groupstr=$(/usr/bin/dcgmi group -c J$SLURM_JOB_ID -a $CUDA_VISIBLE_DEVICES) grou

[slurm-users] Re: [EXTERN] How do you guys track which GPU is used by which job ?

2024-10-17 Thread Sylvain MARET via slurm-users
Interesting solution, I didn't know it was possible to do this. Will try to test this too! Sylvain On 17/10/2024 10:45, Pierre-Antoine Schnell via slurm-users wrote: Hello, we recently

[slurm-users] Re: How do you guys track which GPU is used by which job ?

2024-10-17 Thread Sylvain MARET via slurm-users
Started testing in the prolog and you're right! Before doing anything I wanted to see if there were any best practices. Regards, Sylvain Maret On 16/10/2024 18:03, Brian Andrus via slurm-users wrote:

[slurm-users] Re: Why AllowAccounts not work in slurm-23.11.6

2024-10-17 Thread Paul Raines via slurm-users
I am using Slurm 23.11.3 and AllowAccounts works for me. We have a partition defined with AllowAccounts, and if one tries to submit in an account not in the list one will get srun: error: Unable to allocate resources: Invalid account or account/partition combination specified Do you have

[slurm-users] Re: [EXTERN] How do you guys track which GPU is used by which job ?

2024-10-17 Thread Pierre-Antoine Schnell via slurm-users
Hello, we recently started monitoring GPU usage on our GPUs with NVIDIA's DCGM: https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-manager-slurm/ We create a new dcgmi group for each job and start the statistics retrieval for it in a prolog script. Then we stop the retr
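The per-job workflow described in this message (one dcgmi group per job, statistics started in the prolog and stopped afterwards) could be sketched roughly as below. This is only an illustration, not the poster's actual script: the function names are hypothetical, the job ID and GPU list are assumed to be passed in by the caller, and the parsing assumes dcgmi reports a new group with text containing "group ID of N".

```shell
#!/bin/sh
# Sketch of per-job DCGM statistics for a Slurm prolog/epilog pair.
# Function names are hypothetical; nothing here runs until a function is called.

# Extract the numeric group ID from the text printed on group creation.
# Assumption: the message contains "... group ID of <N>".
parse_group_id() {
    printf '%s\n' "$1" | sed -n 's/.*group ID of \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Prolog side: create a group holding the job's GPUs and start recording.
# $1 = job ID, $2 = comma-separated GPU indices. Prints the group ID.
start_job_stats() {
    job_id="$1"
    gpus="$2"
    out=$(dcgmi group -c "J${job_id}" -a "$gpus") || return 1
    gid=$(parse_group_id "$out")
    [ -n "$gid" ] || return 1
    dcgmi stats -g "$gid" -e >/dev/null || return 1            # enable stats watches
    dcgmi stats -g "$gid" -s "$job_id" >/dev/null || return 1  # start recording for the job
    echo "$gid"
}

# Epilog side: stop recording, print the job report, remove the group.
# $1 = job ID, $2 = group ID returned by start_job_stats.
stop_job_stats() {
    dcgmi stats -x "$1" >/dev/null
    dcgmi stats -v -j "$1"
    dcgmi group -d "$2" >/dev/null
}
```

In a prolog, the group ID printed by start_job_stats would need to be saved somewhere the epilog can read it (a per-job file, for example), since the two scripts run as separate processes.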

[slurm-users] Issue with interactive jobs

2024-10-17 Thread Nerjes, Onno via slurm-users
Dear all, we've set up SLURM 24.05.3 on our cluster and are experiencing an issue with interactive jobs. Before, we used 21.08 and pretty much the same settings, but without these issues. We've started with a fresh DB etc. The behavior of interactive jobs is very erratic. Sometimes they start

[slurm-users] Re: Dependency jobs

2024-10-17 Thread Adam Holmes via slurm-users
Hi Laura, that might work for what we need to catch. Many thanks, Adam -Original Message- From: Laura Hild via slurm-users Sent: 16 October 2024 16:49 To: a...@bramblecfd.com Cc: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: Dependency jobs > I know you can show job info