Hello all,
My error was indeed just the comma in my gres.conf. I was confused because
I had the same file on my running nodes but that's just because slurmd
started before the erroneous comma was added to the config.
So the error message was in fact correct: it could not find the
device
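
The failure mode described above (a stray comma after the File= path, so the path slurmd tries to stat does not exist) can be caught with a quick check before restarting slurmd. The following is only a minimal sketch, not part of Slurm: the function name find_gres_file_problems is hypothetical, it handles only the simple "File=/dev/nvidia0" form seen in this thread, and it deliberately skips bracketed ranges rather than expanding them.

```python
import os
import re


def find_gres_file_problems(conf_text, exists=os.path.exists):
    """Return (line_number, message) pairs for suspicious File= values.

    Hypothetical helper for illustration only; it is not a Slurm tool.
    """
    problems = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        for match in re.finditer(r"\bFile=(\S+)", line, flags=re.IGNORECASE):
            value = match.group(1)
            if value.endswith(","):
                # The comma becomes part of the path, so the stat fails.
                problems.append((lineno, f"trailing comma in File={value}"))
            elif "[" in value:
                # Bracketed ranges are not expanded in this sketch.
                continue
            elif not exists(value):
                problems.append((lineno, f"cannot stat {value}"))
    return problems


if __name__ == "__main__":
    sample = "Name=gpu Type=p100 File=/dev/nvidia0, Cores=0,2,4\n"
    for lineno, message in find_gres_file_problems(sample):
        print(f"line {lineno}: {message}")
```

Because a running slurmd only reads gres.conf at startup, a check like this is most useful right after editing the file, before any node is rebooted or the daemon restarted.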
> On Jul 23, 2018, at 10:31 PM, Ian Mortimer wrote:
>
> On Tue, 2018-07-24 at 02:19 +, Ryan Novosielski wrote:
>
>> Best off running nvidia-persistenced. Handles all of this stuff as a
>> side effect, and also enables persistence mode, provided you don’t
>> configure it otherwise.
>
> Yes.
On Tue, 2018-07-24 at 02:19 +, Ryan Novosielski wrote:
> Best off running nvidia-persistenced. Handles all of this stuff as a
> side effect, and also enables persistence mode, provided you don’t
> configure it otherwise.
Yes. But you have to ensure it starts before slurmd.
--
Ian
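
The ordering concern above (the device files must exist before slurmd starts) would normally be handled by the init system's own service ordering. Purely as an illustration of the condition being waited for, here is a minimal sketch that polls for a device file with a timeout; the function name wait_for_device and its parameters are assumptions for this example, not anything provided by Slurm or the NVIDIA driver.

```python
import os
import time


def wait_for_device(path, timeout=30.0, interval=0.5,
                    exists=os.path.exists, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll until `path` exists; return True if it appeared, False on timeout.

    Hypothetical pre-start check for illustration only.
    """
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # /dev/nvidia0 is the device path quoted in this thread.
    if wait_for_device("/dev/nvidia0", timeout=5.0):
        print("device file present")
    else:
        print("device file still missing")
```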
2018 6:00 PM
To: Slurm User Community List
Subject: Re: [slurm-users] "fatal: can't stat gres.conf"
Thanks for the suggestion; if my memory serves me right, I had to do that
previously to cause the drivers to load correctly after boot.
However, in this case both 'nvidia-smi'
Hi Alex,
What's the actual content of your gres.conf file? Seems to me that you have
a trailing comma after the location of the nvidia device
Our gres.conf has
NodeName=gpuhost[001-077] Name=gpu Type=p100 File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=gpuhost[001-077] Name=gpu T
On Mon, 2018-07-23 at 15:59 -0700, Alex Chekholko wrote:
> However, in this case both 'nvidia-smi' and 'nvidia-smi -L' run just
> fine and produce expected output.
They will, because running nvidia-smi triggers loading of the kernel
module and creation of the device files. But are the device file
> *Date:* Tue, Jul 24, 2018 6:10 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] "fatal: can't stat gres.conf"
>
> Hi all,
>
> I have a few working GPU compute nodes. I bought a couple more
> identical nodes. They are all diskless, so they all boo
Hi all,
I have a few working GPU compute nodes. I bought a couple more
identical nodes. They are all diskless, so they all boot from the same
disk image.
For some reason slurmd refuses to start on the new nodes, and I'm not able
to find any differences in hardware or software. Google search