That's what we do here. We build three different sets of RPMs (a rough build sketch follows the list):
- server: because we run the latest MariaDB on our master
- general compute
- gpu compute: because we build against nvml
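Roughly, the only real difference between the three builds is which -devel packages are on the build host when rpmbuild runs; the package names, Slurm version, and commands below are just illustrative, not an exact recipe:

    # server build host: headers for the MariaDB we actually run,
    # so the accounting pieces link against it
    yum install -y munge-devel pam-devel readline-devel mariadb-devel
    rpmbuild -ta slurm-20.02.5.tar.bz2

    # general compute build host: same, minus the MariaDB and GPU extras
    yum install -y munge-devel pam-devel readline-devel
    rpmbuild -ta slurm-20.02.5.tar.bz2

    # gpu compute build host: NVIDIA driver installed so libnvidia-ml.so
    # is present and configure picks up NVML support
    rpmbuild -ta slurm-20.02.5.tar.bz2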
We give all three the same package names, but keep them in separate repos and point each node at the repo that matches its type.
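On the distribution side that just means a different yum repo file per node class; the repo ids and URLs here are made up:

    # /etc/yum.repos.d/slurm.repo on a GPU node
    [slurm]
    name=Slurm (gpu build)
    baseurl=https://repo.example.org/slurm/gpu/el7/
    enabled=1
    gpgcheck=0

    # compute nodes point baseurl at .../slurm/compute/ and the master at
    # .../slurm/server/, so installing the slurm packages pulls the right
    # variant on every node.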
We also keep our slurm.spec file in a git repo, with a branch for each version and build type, to keep things organized.
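For example (the branch names here are just illustrative):

    # one branch per Slurm version and build type
    git checkout -b 20.02.5-server
    git checkout -b 20.02.5-compute
    git checkout -b 20.02.5-gpu
    # when a new version comes out, branch each type from its predecessor
    # and adjust the spec as needed
    git checkout -b 20.02.6-gpu 20.02.5-gpu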
-Paul Edmon-
On 9/24/2020 3:31 PM, Dana, Jason T. wrote:
Hello,
I hopefully have a quick question.
I have compiled Slurm RPMs on a CentOS system with the NVIDIA drivers installed so that I can use the AutoDetect=nvml configuration in our GPU nodes' gres.conf. All seems to be going well on the GPU nodes since I did that. However, I was unable to install the Slurm RPM on the control/master node because the RPM requires libnvidia-ml.so to be present. The control/master and other compute nodes don't have any NVIDIA cards attached to them, so installing the drivers just to satisfy this requirement did not seem like the best idea. To get around it, I rebuilt the RPM on a system without the drivers present, and everything has been working great as far as I can tell.
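For reference, the gres.conf on the GPU nodes is essentially just the auto-detect line (nothing else was needed in our minimal case):

    # /etc/slurm/gres.conf on the GPU nodes
    AutoDetect=nvml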
I am now working on adding PMIx support, which I didn't build in properly the first time, and I am running into the same situation again. I figured I would send up a flag before going further. Is it typical to have to build separate Slurm RPMs for different types of nodes, or am I going about this the wrong way?
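In case it is useful, what I am attempting for the PMIx piece is roughly the following (the package names are my guesses for CentOS, and I have not confirmed whether the spec needs any extra options):

    # install the PMIx headers on the build host so configure can find them,
    # then rebuild the RPMs from the release tarball
    yum install -y pmix pmix-devel
    rpmbuild -ta slurm-20.02.5.tar.bz2
    # sanity check that the mpi_pmix plugin ended up in the packages
    rpm -qlp ~/rpmbuild/RPMS/x86_64/slurm-*.rpm | grep -i pmix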
Thanks in advance!
Jason