File=/dev/nvidia0
Thanks
Jeff (ChaoFeng Zhang, 张超锋) PMP®
zhang...@lenovo.com
HPC&AI | Cloud Software Architect (+86) - 18116117420
Software solution development
Yes, I am using the icinga2 plugin to do it; I just wanted to check whether Slurm
has already implemented this.
Thanks
From: 肖正刚
Sent: Tuesday, October 27, 2020 8:19 AM
To: Chaofeng Zhang
Cc: slurm-users@lists.schedmd.com
Subject: Re: [External] Re: Is there a way to get the real-time cpu/memory
I can see some CPU usage information from the sstat and sacct commands, but that
is not what I want: I want to see real-time CPU usage, like the Linux top
command.
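For reference, these are the kinds of accounting queries being referred to (a sketch, with <jobid> as a placeholder; sstat needs the job to still be running and a JobAcctGather plugin enabled, sacct needs accounting storage):

# Per-step statistics sampled while the job is running:
sstat --format=JobID,AveCPU,AveRSS,MaxRSS -j <jobid>
# Historical statistics once the job has finished:
sacct --format=JobID,Elapsed,TotalCPU,MaxRSS -j <jobid>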
Thanks
From: 肖正刚
Sent: Monday, October 26, 2020 1:25 PM
To: slurm-users@lists.schedmd.com; Chaofeng Zhang
Subject: [External] Re
Does Slurm have a command to get the real-time CPU/memory usage of a job's
processes while the job is running, or do I have to write a script for it
(scontrol listpids to get the PIDs of the processes, then a script to read the
CPU/memory usage of each process)?
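A rough sketch of such a script, offered as an assumption rather than something from this thread: it has to run on the compute node where the job's tasks live, because scontrol listpids only reports PIDs known to the local slurmd; <jobid> is a placeholder.

#!/bin/bash
# Usage: ./job_top.sh <jobid>   (run on the node where the job is executing)
jobid=$1
# scontrol listpids prints a header line followed by "PID JOBID STEPID ..." rows;
# keep just the real PIDs.
pids=$(scontrol listpids "$jobid" | awk 'NR>1 && $1 > 0 {print $1}')
# Show live CPU and memory usage for those PIDs, top-style, via ps.
[ -n "$pids" ] && ps -o pid,pcpu,pmem,rss,comm -p "$(echo $pids | tr ' ' ',')"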
Thanks
Jeff (ChaoFeng Zhang, 张超锋) PMP®
zhang...@lenovo.com
On 13 Oct 2020, at 08:26, Chaofeng Zhang <zhang...@lenovo.com> wrote:
If there is no available resource when I submit a job, I want Slurm to return a
job-submission failure directly. How do I configure Slurm to do that?
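One possibility, sketched here as an assumption rather than a confirmed answer from this thread: for interactive allocations, srun and salloc accept --immediate[=seconds], which gives up instead of queueing if the resources cannot be allocated within that time (sbatch has no direct equivalent as far as I know).

# Give up after 1 second instead of leaving the request pending:
srun --immediate=1 -N1 -n1 --gres=gpu:2 hostname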
Thanks
Sorry, regarding the variable replacement: on 17.02, even if I don't set
CUDA_VISIBLE_DEVICES=NoDevFiles in the srun command, the result is the same.
From: slurm-users On Behalf Of Chaofeng Zhang
Sent: Friday, August 31, 2018 12:16 AM
To: Slurm User Community List
Subject: [External] Re: [slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7
CUDA_VISIBLE_DEVICES=NoDevFiles
CUDA_HOME=/usr/local/cuda
[root@head ~]#
From: Chaofeng Zhang
Sent: Friday, August 31, 2018 12:13 AM
To: Slurm User Community List
Subject: Whether I can replace value of the variable when use srun
export CUDA_VISIBLE_DEVICES=0,1
srun -N1 -n1 --nodelist=head --export=CUDA_VISIBLE_DEVICES=NoDevFiles,ALL
env|grep CUDA
The srun result is still CUDA_VISIBLE_DEVICES=0,1; how can I replace
CUDA_VISIBLE_DEVICES with NoDevFiles?
Thanks.
On Aug 30, 2018, at 4:18 AM, Chaofeng Zhang <zhang...@lenovo.com> wrote:
CUDA_VISIBLE_DEVICES is used by many AI frameworks, such as TensorFlow, to
determine which GPUs to use, so this environment variable is critical to us.
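For context, a minimal illustration of that masking behaviour; my_training_job stands in for any CUDA-based program and is not from this thread:

CUDA_VISIBLE_DEVICES=0,1 ./my_training_job   # sees GPUs 0 and 1
CUDA_VISIBLE_DEVICES=1   ./my_training_job   # sees only GPU 1 (renumbered as device 0)
CUDA_VISIBLE_DEVICES=    ./my_training_job   # sees no GPUs at all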
-----Original Message-----
echo $CUDA_VISIBLE_DEVICES
0
[renfro@login ~]$ hpcshell --partition=gpu-interactive --gres=gpu:2
[renfro@gpunode004 ~]$ echo $CUDA_VISIBLE_DEVICES
0,1
> On Aug 30, 2018, at 4:18 AM, Chaofeng Zhang wrote:
>
> CUDA_VISIBLE_DEVICES is used by many AI frameworks, such as TensorFlow, to
> determine which GPUs to use.
Subject: [External] Re: [slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7
On Thursday, 30 August 2018 6:38:08 PM AEST Chaofeng Zhang wrote:
> CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7.
> This worked when we used Slurm 17.02.
You probably shou
CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7. This worked
when we used Slurm 17.02.
Slurm 17.02:
[root@head ~]# export CUDA_VISIBLE_DEVICES=0,1
[root@head ~]# srun -N1 -n1 --gres=none --nodelist=head /usr/bin/env|grep CUDA
CUDA_HOME=/usr/local/cuda
CUDA_VISIBLE_DEVICES=NoDevFiles
The setup below works for CPUs: with OverSubscribe, I can have more than 4 jobs
in the running state. But if I add #SBATCH --gres=gpu:2 to the job file, only 1
job is in the running state and the others are pending. Can OverSubscribe only
be used for the CPU resource, or does it apply to GPUs as well?
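The job file referred to as "below" is not preserved in the archive; a minimal sketch of that kind of script, with hypothetical partition and program names:

#!/bin/bash
#SBATCH --job-name=oversubscribe-test
#SBATCH --partition=gpu          # hypothetical partition name
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2             # the line that changes the behaviour described above
srun ./my_app                    # hypothetical workload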