Hi,

We also have hybrid clusters.
We use the same nfsroot for all nodes, so technically everything is
installed everywhere, and we compile Slurm once with everything it needs.
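
Roughly, a single "build once with everything" pass can look something like
the sketch below. The paths are placeholders, the --with-nvml / --with-ofed
options should be double-checked against ./configure --help for your Slurm
version, and the build host needs the CUDA/NVML and OFED development files
available:

    # Sketch only: build Slurm once with both GPU (NVML) and IB support.
    # Paths are placeholders; verify the flags with ./configure --help.
    ./configure --prefix=/opt/slurm \
                --with-nvml=/usr/local/cuda \
                --with-ofed=/usr
    make -j && make install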

Users can run "module load cuda" and/or "module load nvidia" to get access
to nvcc and NVIDIA's libraries (cuda and nvidia are installed manually here
as well), so they can compile GPU code, but it won't run on nodes without
NVIDIA hardware.
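
As an illustration only (the module names, file name and GRES setup below
are site-specific placeholders), a user's session on a GPU node might look
like:

    # Illustration only: module names and GRES config are site-specific.
    module load cuda
    nvcc -o my_app my_app.cu                # compiles wherever the module is available
    sbatch --gres=gpu:1 --wrap="./my_app"   # but only runs where a GPU actually exists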

InfiniBand is the same story, although there we don't have hybrid clusters:
one cluster has IB and one doesn't, but they all run the same binaries.

HTH,
    Yair.



On Sat, Feb 29, 2020 at 5:24 PM <dean.w.schu...@gmail.com> wrote:

> There are GPU plugins that won't be built unless you build on a node that
> has the Nvidia drivers installed.
>
> -----Original Message-----
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of
> Brian Andrus
> Sent: Friday, February 28, 2020 7:36 PM
> To: slurm-users@lists.schedmd.com
> Subject: [slurm-users] Hybrid compiling options
>
> All,
>
> Wanted to reach out for input on how folks compile slurm when you have a
> hybrid cluster.
>
> Scenario:
>
> you have 4 node types:
>
> A) CPU only
> B) GPU Only
> C) CPU+IB
> D) GPU+IB
>
> So, you can compile slurm with/without IB support and/or with/without GPU
> support.
> Including either option creates a dependency when packaging (RPM based).
>
> So, do you compile different versions for the different node types, or do
> you install the dependent packages on nodes that have no use for them
> (nvidia in particular here)?
>
> Generally, I have always added the superfluous packages, but I wondered
> what others' thoughts on that are.
>
> Brian Andrus
>
