The firewalls are disabled on all nodes in my cluster, so I don't think it's a
firewall issue. It's probably the network security between the wired and
wireless sides of our network. When I put the nodes back on a wired
controller they work again.
-Original Message-
From: slu
On 06-02-2020 22:40, Dean Schulze wrote:
I've moved two nodes to a different controller. The nodes are wired and
the controller is networked via wifi. I had to open up ports 6817 and
6818 between the wired and wireless sides of our network to get any
connectivity.
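For reference, 6817 and 6818 are Slurm's default slurmctld/slurmd ports. If
your site overrides them, the slurm.conf entries to check look like this
(defaults shown):
SlurmctldPort=6817
SlurmdPort=6818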
Now when I do
srun -N2 ho
So this is related to the gpu/nvml plugin in the source code tree. That
didn't get built because I didn't have the nvidia driver (really the
library libnvidia-ml.so) installed when I built the code. I can see in
config.log where it tries to find -lnvidia-ml, and it skips building the
gpu/nvml plugin in that case.
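In case it helps anyone else, a rough rebuild sequence, assuming a standard
autotools build from the source tree (the package that provides
libnvidia-ml.so varies by distro and driver version):
# install the NVIDIA driver / NVML library first, then rebuild:
$ ./configure
$ grep -i nvidia-ml config.log   # confirm -lnvidia-ml was found this time
$ make && sudo make install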
Hi Dean,
On 2/7/20 8:03 AM, dean.w.schu...@gmail.com wrote:
I just checked the .deb package that I built from source, and there is nothing
in it with nv or cuda in its name.
Are you sure that slurm distributes nvidia binaries?
SchedMD only distributes sources; it's up to distros how they package it.
You're trying to run bash, which, without special configuration, needs a pty.
Try
srun -v -p debug --pty bash
Brian Andrus
On 2/6/2020 10:28 PM, Hector Yuen wrote:
Hello,
I am setting up a very simple configuration: one node running slurmd
and another one running slurmctld.
In the slurmctld m
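A minimal sketch of the slurm.conf lines such a two-host setup needs, with
hypothetical hostnames ctlhost and nodehost (adjust CPUs and memory to your
hardware):
ClusterName=test
SlurmctldHost=ctlhost
NodeName=nodehost CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=nodehost Default=YES State=UP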
I didn't say slurm distributes nvidia binaries. But slurm's gpu_nvml.so
links to libnvidia-ml.so if it was found at build time:
$ ldd lib/slurm/gpu_nvml.so
...
libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
(0x7f2d2bac8000)
...
When you run configure you'll see something along these lines:
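If the configure output has scrolled past, the result of the NVML check is
also recorded in config.log, which you can search after the fact:
$ grep -i nvidia-ml config.log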
I didn't know that slurm had nvml linked into it. I built slurm from source
and didn't notice that nvml was part of the build. I'll check on that again.
-Original Message-
From: slurm-users On Behalf Of Stephan
Roth
Sent: Friday, February 7, 2020 2:23 AM
To: slurm-users@lists.schedmd
On 05.02.20 21:06, Dean Schulze wrote:
> I need to dynamically configure gpus on my nodes. The gres.conf doc
> says to use
>
> Autodetect=nvml
That's all you need in gres.conf, provided you don't configure any
Gres=... entries for your nodes in your slurm.conf.
If you do, make sure the string matches what the autodetection reports.
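For example, a sketch with a hypothetical two-GPU node gpu01 (the GPU count
and node name are placeholders):
# gres.conf
AutoDetect=nvml
# slurm.conf
GresTypes=gpu
NodeName=gpu01 Gres=gpu:2 State=UNKNOWN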