> On Jul 23, 2018, at 10:31 PM, Ian Mortimer wrote:
>
> On Tue, 2018-07-24 at 02:19 +0000, Ryan Novosielski wrote:
>
>> Best off running nvidia-persistenced. Handles all of this stuff as a
>> side effect, and also enables persistence mode, provided you don’t
>> configure it otherwise.
>
> Yes.
On Tue, 2018-07-24 at 02:19 +0000, Ryan Novosielski wrote:
> Best off running nvidia-persistenced. Handles all of this stuff as a
> side effect, and also enables persistence mode, provided you don’t
> configure it otherwise.
Yes. But you have to ensure it starts before slurmd.
--
Ian
Best off running nvidia-persistenced. Handles all of this stuff as a side
effect, and also enables persistence mode, provided you don’t configure it
otherwise.
--
|| \\UTGERS, |---*O*---
||_// the State | Ryan Novosielski -
Hi Alex,
What's the actual content of your gres.conf file? Seems to me that you have
a trailing comma after the location of the nvidia device.
Our gres.conf has
NodeName=gpuhost[001-077] Name=gpu Type=p100 File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=gpuhost[001-077] Name=gpu T
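
For anyone who wants to check their own file for the trailing-comma problem mentioned above, here is a minimal sketch in Python. It is not part of Slurm; the function name find_trailing_commas is made up for illustration, and it assumes that gres.conf entries are whitespace-separated key=value tokens and that "#" starts a comment. The sample text is hypothetical: the first line is modelled on the entry above with a comma deliberately added after the device path, and the second line is invented.

```python
def find_trailing_commas(gres_text):
    """Return (line_number, key, value) for each key=value token whose value ends in a comma."""
    problems = []
    for lineno, line in enumerate(gres_text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # ignore anything after a comment marker
        for token in line.split():
            key, sep, value = token.partition("=")
            # Only look at real key=value tokens; a value ending in "," is suspicious.
            if sep and value.endswith(","):
                problems.append((lineno, key, value))
    return problems


# Hypothetical sample input, not anyone's real configuration.
sample = """\
NodeName=gpuhost[001-077] Name=gpu Type=p100 File=/dev/nvidia0, Cores=0,2,4
NodeName=gpuhost[001-077] Name=gpu Type=p100 File=/dev/nvidia1 Cores=1,3,5
"""

for lineno, key, value in find_trailing_commas(sample):
    print(f"line {lineno}: {key} has a trailing comma: {value!r}")
```

Commas inside a value, as in the Cores list, are left alone; only a value that ends with a comma is reported.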
On Mon, 2018-07-23 at 15:59 -0700, Alex Chekholko wrote:
> However, in this case both 'nvidia-smi' and 'nvidia-smi -L' run just
> fine and produce expected output.
They will because running nvidia-smi triggers loading of the kernel
module and creation of the device files. But are the device file
You may want to check and make sure your GPUs are in persistence mode. You can
enable it through the nvidia-smi utility.
Nicholas McCollum
Alabama Supercomputer Authority
From: Alex Chekholko
Sent: Monday, July 23, 2018 6:00 PM
To: Slurm User Community List
Subj
Thanks for the suggestion; if my memory serves me right, I had to do that
previously to cause the drivers to load correctly after boot.
However, in this case both 'nvidia-smi' and 'nvidia-smi -L' run just fine
and produce expected output.
One thing I see that is different is my older nodes have t
Hi Alex,
Try running nvidia-smi before starting slurmd; I ran into this issue too. I have
to run nvidia-smi before slurmd whenever I reboot the system.
Regards,
Bill
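
A quick way to see whether the device files are there before slurmd starts is to test for them directly. The following is only a sketch in Python, not anything shipped with Slurm or the NVIDIA driver. The helper name missing_gpu_devices is made up, and the path /dev/nvidia0 is an assumption taken from the gres.conf example elsewhere in this thread; substitute whatever your own File= entries list.

```python
import os
import stat


def missing_gpu_devices(paths):
    """Return the paths that do not exist or are not character device files."""
    missing = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Path does not exist or cannot be accessed.
            missing.append(path)
            continue
        if not stat.S_ISCHR(mode):
            # Something is there, but it is not a character device.
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Assumed device list; replace with the File= paths from your gres.conf.
    expected = ["/dev/nvidia0"]
    missing = missing_gpu_devices(expected)
    if missing:
        print("missing device files:", ", ".join(missing))
    else:
        print("all expected device files are present")
```

If this reports missing files right after boot but not after running nvidia-smi, that points to the device files not having been created yet when slurmd started.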
-- Original --
From: Alex Chekholko
Date: Tue, Jul 24, 2018 6:10 AM
To: Slurm User Community List
Subject: Re: [s
Hi all,
I have a few working GPU compute nodes. I bought a couple more
identical nodes. They are all diskless, so they all boot from the same
disk image.
For some reason slurmd refuses to start on the new nodes, and I'm not able
to find any differences in hardware or software. Google search