[slurm-users] Re: [EXTERN] How do you guys track which GPU is used by which job ?

2024-10-17 Thread Sylvain MARET via slurm-users
e analysis of it. Best wishes, Pierre-Antoine Schnell On 16.10.24 at 15:10, Sylvain MARET via slurm-users wrote: Hey guys! I'm looking to improve GPU monitoring on our cluster. I want to install this https://github.com/NVIDIA/dcgm-exporter and saw in the README that it can support tra

[slurm-users] Re: How do you guys track which GPU is used by which job ?

2024-10-17 Thread Sylvain MARET via slurm-users
I started testing in the prolog and you're right! Before doing anything, I wanted to see whether there was a best practice. Regards, Sylvain Maret On 16/10/2024 18:03, Brian Andrus via slurm-users wrote: CAUTION : External Sender. Please do not click on links or open attachments from senders you d
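For context on the prolog approach discussed in this thread, here is a minimal sketch. It assumes that dcgm-exporter's HPC job-mapping option reads a directory containing one file per GPU index, with the Slurm job ID as the file's content; check the dcgm-exporter README for the exact flag and file format. The script reads `SLURM_JOB_ID` and `SLURM_JOB_GPUS`, which slurmd exports to Prolog and Epilog scripts. The directory `/run/dcgm-job-mapping` and the `prolog`/`epilog` command-line argument are hypothetical choices made for illustration.

```python
import os
import sys
from pathlib import Path

# Hypothetical path: it must match the mapping directory given to dcgm-exporter.
MAPPING_DIR = Path("/run/dcgm-job-mapping")


def gpu_indices(job_gpus: str) -> list[str]:
    """Split SLURM_JOB_GPUS (e.g. "0,2") into individual GPU indices."""
    return [g.strip() for g in job_gpus.split(",") if g.strip()]


def write_mapping(job_id: str, job_gpus: str, mapping_dir: Path = MAPPING_DIR) -> list[Path]:
    """Prolog side: write one file per allocated GPU index containing the job ID."""
    mapping_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for idx in gpu_indices(job_gpus):
        path = mapping_dir / idx
        path.write_text(job_id + "\n")
        written.append(path)
    return written


def clear_mapping(job_id: str, job_gpus: str, mapping_dir: Path = MAPPING_DIR) -> None:
    """Epilog side: delete each GPU's file, but only if it still names this job."""
    for idx in gpu_indices(job_gpus):
        path = mapping_dir / idx
        if path.exists() and path.read_text().strip() == job_id:
            path.unlink()


if __name__ == "__main__":
    # Hypothetical convention: wrapper scripts call this with "prolog" or "epilog".
    action = sys.argv[1] if len(sys.argv) > 1 else "prolog"
    job_id = os.environ.get("SLURM_JOB_ID", "")
    job_gpus = os.environ.get("SLURM_JOB_GPUS", "")
    if job_id and job_gpus:  # do nothing for jobs without GPUs
        if action == "epilog":
            clear_mapping(job_id, job_gpus)
        else:
            write_mapping(job_id, job_gpus)
```

This assumes each GPU is allocated to a single job at a time. With shared GPUs (for example sharding or MPS), one file per GPU would be overwritten, and the file would need to hold several job IDs.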

[slurm-users] How do you guys track which GPU is used by which job ?

2024-10-16 Thread Sylvain MARET via slurm-users
to hear it out! Regards, Sylvain Maret -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

[slurm-users] Re: SLURM in K8s, any advice?

2024-03-13 Thread Sylvain MARET via slurm-users
er for the code to reproduce her experiment, but I haven't had the time to do so yet. Regards, Sylvain Maret On 13/03/2024 11:04, Nicolas Greneche via slurm-users wrote: Hi Alan, Your

[slurm-users] Need help managing licence

2024-02-16 Thread Sylvain MARET via slurm-users
sure Slurm doesn't launch a job if 20 people are already running code with this license. How do you guys manage paid licenses on your cluster? Any advice would be appreciated! Regards, Sylvain Maret
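Slurm has a built-in mechanism for this requirement. You declare a counted local license in slurm.conf (for example `Licenses=mylicense:20`), and jobs request seats with `sbatch --licenses=mylicense:1` (short form `-L`). A job that cannot obtain its seats stays pending until enough seats are released. The Python below is only a toy model of that admission rule, not Slurm's code, and `mylicense` is a hypothetical name. Note that Slurm counts only seats requested by Slurm jobs; license use outside the cluster is invisible to it.

```python
from dataclasses import dataclass, field


@dataclass
class LicensePool:
    """Toy model of a counted license such as slurm.conf `Licenses=mylicense:20`."""
    name: str
    total: int
    in_use: dict[int, int] = field(default_factory=dict)  # job_id -> seats held

    def available(self) -> int:
        """Seats not currently held by any running job."""
        return self.total - sum(self.in_use.values())

    def try_start(self, job_id: int, seats: int = 1) -> bool:
        """Start the job only if enough seats are free; otherwise it stays pending."""
        if seats <= self.available():
            self.in_use[job_id] = seats
            return True
        return False

    def finish(self, job_id: int) -> None:
        """Release the seats held by a finished job."""
        self.in_use.pop(job_id, None)


if __name__ == "__main__":
    pool = LicensePool("mylicense", total=20)
    started = [pool.try_start(job) for job in range(21)]
    print(started.count(True), "running,", started.count(False), "pending")  # prints: 20 running, 1 pending
    pool.finish(0)
    print("job 20 starts after a release:", pool.try_start(20))  # prints: ... True
```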

[slurm-users] Compilation question

2024-01-17 Thread Sylvain MARET
lurmfull.so: undefined reference to `slurm_setenvpf' /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so: undefined reference to `slurm_list_destroy' collect2: error: ld returned 1 exit status How can I resolve these undefined reference errors? Regards, Sylvain Maret