[slurm-users] Whether slurm can support GPU MIG feature

2020-11-10 Thread Chaofeng Zhang
File=/dev/nvidia0 Thanks Jeff (ChaoFeng Zhang, 张超锋) PMP® zhang...@lenovo.com HPC&AI | Cloud Software Architect (+86) - 18116117420 Software solution development

Re: [slurm-users] [External] Re: Is there a way to get the real-time cpu/memory usage of job processes by using slurm command. (Chaofeng Zhang)

2020-10-26 Thread Chaofeng Zhang
Yes, I am using the icinga2 plugin to do it; I just wanted to check whether Slurm already implements this. Thanks From: 肖正刚 Sent: Tuesday, October 27, 2020 8:19 AM To: Chaofeng Zhang Cc: slurm-users@lists.schedmd.com Subject: Re: [External] Re: Is there a way to get the real-time cpu/memory

Re: [slurm-users] [External] Re: Is there a way to get the real-time cpu/memory usage of job processes by using slurm command. (Chaofeng Zhang)

2020-10-25 Thread Chaofeng Zhang
I can see some CPU usage information from the sstat and sacct commands, but that is not what I want. I want to see real-time CPU usage, as the Linux top command shows it. Thanks From: 肖正刚 Sent: Monday, October 26, 2020 1:25 PM To: slurm-users@lists.schedmd.com; Chaofeng Zhang Subject: [External] Re

[slurm-users] Is there a way to get the real-time cpu/memory usage of job processes by using slurm command.

2020-10-25 Thread Chaofeng Zhang
Does Slurm have a command to get the real-time CPU/memory usage of a job's processes while the job is running? Or must I write a script to get it (scontrol listpids to get the PIDs of the processes, then a script to get the CPU/memory usage of each process)? Thanks Jeff (ChaoFeng Zhang, 张超锋) PMP® zhang
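
The script approach described in the question could look roughly like the sketch below. It is an illustration under stated assumptions, not a Slurm feature: the function names (parse_listpids, parse_ps, job_usage) are hypothetical, the script is assumed to run on the compute node where the job's processes live (scontrol listpids only reports processes on the local node), and the %CPU column from ps is an average over the life of the process, not the instantaneous figure that top shows.

```python
import os
import subprocess
import sys


def parse_listpids(text):
    """Return the PIDs found in `scontrol listpids` style output."""
    pids = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and non-numeric entries such as -1.
        if fields and fields[0].isdigit():
            pids.append(int(fields[0]))
    return pids


def parse_ps(text):
    """Parse header-less `ps` rows of the form: pid pcpu rss comm."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, pcpu, rss, comm = fields
        rows.append({
            "pid": int(pid),
            "cpu_percent": float(pcpu),  # lifetime average, per ps semantics
            "rss_kib": int(rss),         # resident set size in KiB
            "command": comm.strip(),
        })
    return rows


def job_usage(job_id):
    """Hypothetical helper: CPU/memory rows for one job's local processes."""
    listing = subprocess.run(
        ["scontrol", "listpids", str(job_id)],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = parse_listpids(listing)
    if not pids:
        return []
    # One -o option per column; an empty header (pid=) suppresses the header row.
    ps_out = subprocess.run(
        ["ps", "-o", "pid=", "-o", "pcpu=", "-o", "rss=", "-o", "comm=",
         "-p", ",".join(str(p) for p in pids)],
        capture_output=True, text=True,
        env={**os.environ, "LC_ALL": "C"},  # keep decimal points locale-independent
    ).stdout
    return parse_ps(ps_out)


if __name__ == "__main__":
    # Usage (on the compute node): python job_usage.py <job_id>
    for row in job_usage(sys.argv[1]):
        print(row)
```

Running this in a loop (for example once per second) would approximate a top-like view; for truly instantaneous CPU figures, the PIDs could instead be passed to top itself.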

Re: [slurm-users] [External] Re: How to configure slurm not queuing the jobs?

2020-10-13 Thread Chaofeng Zhang
puting On 13. Oct 2020, at 08:26, Chaofeng Zhang <zhang...@lenovo.com> wrote: If no resources are available when I submit a job, I want Slurm to fail the job submission immediately. How to configure sl

[slurm-users] How to configure slurm not queuing the jobs?

2020-10-12 Thread Chaofeng Zhang
If no resources are available when I submit a job, I want Slurm to fail the job submission immediately instead of queuing the job. How do I configure Slurm to do this? Thanks

Re: [slurm-users] [External] Re: Whether I can replace value of the variable when use srun

2018-08-30 Thread Chaofeng Zhang
Sorry, regarding the variable replacement: on 17.02, even if I don't set CUDA_VISIBLE_DEVICES=NoDevFiles in the srun command, the result is the same. From: slurm-users On Behalf Of Chaofeng Zhang Sent: Friday, August 31, 2018 12:16 AM To: Slurm User Community List Subject: [External] Re: [slurm-users

Re: [slurm-users] Whether I can replace value of the variable when use srun

2018-08-30 Thread Chaofeng Zhang
CUDA_VISIBLE_DEVICES=NoDevFiles CUDA_HOME=/usr/local/cuda [root@head ~]# From: Chaofeng Zhang Sent: Friday, August 31, 2018 12:13 AM To: Slurm User Community List Subject: Whether I can replace value of the variable when use srun export CUDA_VISIBLE_DEVICES=0,1 srun -N1 -n1 --nodelist=head --export

[slurm-users] Whether I can replace value of the variable when use srun

2018-08-30 Thread Chaofeng Zhang
export CUDA_VISIBLE_DEVICES=0,1
srun -N1 -n1 --nodelist=head --export=CUDA_VISIBLE_DEVICES=NoDevFiles,ALL env|grep CUDA
The srun result is CUDA_VISIBLE_DEVICES=0,1. How can I replace the value of CUDA_VISIBLE_DEVICES with NoDevFiles? Thanks.

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
e004 ~]$ echo $CUDA_VISIBLE_DEVICES 0,1 = On Aug 30, 2018, at 4:18 AM, Chaofeng Zhang <zhang...@lenovo.com> wrote: CUDA_VISIBLE_DEVICES is used by many AI frameworks, such as TensorFlow, to determine which GPU to use, so this environment variable is critical to us. -Original

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
DA_VISIBLE_DEVICES 0 [renfro@login ~]$ hpcshell --partition=gpu-interactive --gres=gpu:2 [renfro@gpunode004 ~]$ echo $CUDA_VISIBLE_DEVICES 0,1 = > On Aug 30, 2018, at 4:18 AM, Chaofeng Zhang wrote: > > CUDA_VISIBLE_DEVICES is used by many AI frameworks to determine which GPU to >

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
: [External] Re: [slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7 On Thursday, 30 August 2018 6:38:08 PM AEST Chaofeng Zhang wrote: > CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7. > This worked when we used Slurm 17.02. You probably shou

[slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7. This worked when we used Slurm 17.02. Slurm 17.02: [root@head ~]# export CUDA_VISIBLE_DEVICES=0,1 [root@head ~]# srun -N1 -n1 --gres=none --nodelist=head /usr/bin/env|grep CUDA CUDA_HOME=/usr/local/cuda CUDA_VISIBLE_DEVICES=NoD

[slurm-users] OverSubscribe can be used for cpu, but not worked for GPU?

2018-03-09 Thread Chaofeng Zhang
The following works for CPUs: with OverSubscribe, I can have more than 4 processes in the running state. But if I add #SBATCH --gres=gpu:2 to the job file, only 1 process is in the running state and the others are pending. OverSubscribe can only be used for the CPU resource, whether it