Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ryan Novosielski
> On Jul 23, 2018, at 10:31 PM, Ian Mortimer wrote:
>
> On Tue, 2018-07-24 at 02:19 +, Ryan Novosielski wrote:
>
>> Best off running nvidia-persistenced. Handles all of this stuff as a
>> side effect, and also enables persistence mode, provided you don’t
>> configure it otherwise.
>
> Yes.

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ian Mortimer
On Tue, 2018-07-24 at 02:19 +, Ryan Novosielski wrote:
> Best off running nvidia-persistenced. Handles all of this stuff as a
> side effect, and also enables persistence mode, provided you don’t
> configure it otherwise.

Yes. But you have to ensure it starts before slurmd.

--
Ian
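A rough sketch of one way to enforce that ordering from the slurmd side: poll for the GPU device files and only launch slurmd once they exist. The function name, the example device paths, and the timeout values are illustrative assumptions, not something taken from this thread. The `exists` parameter is there only so the check can be exercised without real hardware.

```python
import os
import time


def wait_for_paths(paths, timeout=30.0, interval=0.5, exists=os.path.exists):
    """Poll until every path exists; return the paths still missing at timeout."""
    deadline = time.monotonic() + timeout
    while True:
        # Re-check every path on each pass; device files may appear one by one
        missing = [p for p in paths if not exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)


# Hypothetical usage: exit non-zero if the devices never show up, so that
# whatever starts slurmd can refuse to continue.
#   missing = wait_for_paths(["/dev/nvidia0", "/dev/nvidiactl"])
#   raise SystemExit(1 if missing else 0)
```

A check like this could plausibly be wired in as a pre-start step for slurmd (for example via systemd's `ExecStartPre=`), though ordering the service after nvidia-persistenced, as suggested above, addresses the cause rather than the symptom.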

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ryan Novosielski
Best off running nvidia-persistenced. Handles all of this stuff as a side effect, and also enables persistence mode, provided you don’t configure it otherwise.

--
Ryan Novosielski

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Sean Crosby
Hi Alex,

What's the actual content of your gres.conf file? Seems to me that you have a trailing comma after the location of the nvidia device.

Our gres.conf has

NodeName=gpuhost[001-077] Name=gpu Type=p100 File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=gpuhost[001-077] Name=gpu T
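The two things raised in this message (a stray comma after the device path, and a `File=` entry that cannot be stat'ed) can be checked mechanically. The following is a minimal sketch, assuming simple whitespace-separated `Key=Value` lines like the ones quoted above; the function name is hypothetical, bracketed ranges such as `/dev/nvidia[0-3]` are deliberately not expanded, and this is not how slurmd itself parses the file.

```python
import os


def check_gres_lines(lines, exists=os.path.exists):
    """Return (line_number, message) tuples for suspicious File= entries."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        for token in line.split():
            key, sep, value = token.partition("=")
            if not sep or key.lower() != "file":
                continue
            if value.endswith(","):
                problems.append((number, f"trailing comma in File={value}"))
            path = value.rstrip(",")
            # Assumption: bracketed ranges are skipped rather than expanded
            if "[" not in path and not exists(path):
                problems.append((number, f"cannot stat {path}"))
    return problems
```

To try it on a node, pass the lines of the local gres.conf, e.g. `check_gres_lines(open(path).read().splitlines())`, where `path` is wherever gres.conf lives on that system.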

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ian Mortimer
On Mon, 2018-07-23 at 15:59 -0700, Alex Chekholko wrote:
> However, in this case both 'nvidia-smi' and 'nvidia-smi -L' run just
> fine and produce expected output.

They will, because running nvidia-smi triggers loading of the kernel module and creation of the device files. But are the device file

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Nicholas McCollum
You may want to check and make sure your GPUs are in persistence mode. You can enable it through the nvidia-smi utility.

Nicholas McCollum
Alabama Supercomputer Authority

From: Alex Chekholko
Sent: Monday, July 23, 2018 6:00 PM
To: Slurm User Community List
Subj

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Alex Chekholko
Thanks for the suggestion; if my memory serves me right, I had to do that previously to cause the drivers to load correctly after boot. However, in this case both 'nvidia-smi' and 'nvidia-smi -L' run just fine and produce expected output. One thing I see that is different is my older nodes have t

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Bill
Hi Alex,

Try running nvidia-smi before starting slurmd; I ran into this issue too. I have to run nvidia-smi before slurmd when I reboot the system.

Regards,
Bill

-- Original --
From: Alex Chekholko
Date: Tue, Jul 24, 2018 6:10 AM
To: Slurm User Community List
Subject: Re: [s

[slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Alex Chekholko
Hi all,

I have a few working GPU compute nodes. I bought a couple more identical nodes. They are all diskless, so they all boot from the same disk image. For some reason slurmd refuses to start on the new nodes, and I'm not able to find any differences in hardware or software. Google search