Open MPI matches the hardware available in the node(s) against its compiled-in capabilities. Those capabilities are expressed as modular shared libraries (see e.g. $PREFIX/lib64/openmpi). You can use environment variables or command-line flags to influence which modules get used for specific purposes. For example, the Byte Transfer Layer (BTL) framework has openib, tcp, self, sm (shared memory), and vader components. As long as your build of Open MPI knew about InfiniBand and the runtime can see the hardware, Open MPI should rank that interface as the highest-performance one and use it.
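As a rough sketch (exact component names vary by Open MPI version and build; the application name below is a placeholder), you can check which BTLs were compiled in with ompi_info and then pin the selection either on the mpirun command line or through the equivalent OMPI_MCA_* environment variables:

  # list the BTL components this Open MPI build knows about
  ompi_info | grep btl

  # inside your sbatch script: request the InfiniBand BTL explicitly,
  # plus self and vader for local/on-node traffic, and turn on verbose
  # BTL selection so you can confirm what was actually chosen
  mpirun --mca btl openib,self,vader \
         --mca btl_base_verbose 100 ./your_mpi_app

  # equivalent via an environment variable
  export OMPI_MCA_btl=openib,self,vader
  mpirun ./your_mpi_app

If openib does not show up in the ompi_info output, the build was configured without verbs support and no runtime flag will make it use InfiniBand. Note that the host names SLURM hands you don't matter here: the openib BTL talks to the verbs devices directly rather than resolving the 192.168.13.x names.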
> On Dec 9, 2019, at 08:54, Sysadmin CAOS <sysadmin.c...@uab.cat> wrote:
>
> Hi mercan,
>
> OK, I forgot to compile Open MPI with InfiniBand support... But I still have a
> doubt: the SLURM scheduler assigns (offers) nodes called "node0x" to my sbatch
> job because the nodes in my SLURM cluster were added with the "node0x" name.
> My Open MPI application has now been compiled with ibverbs support... but how
> do I tell my application, or my SLURM sbatch submit script, that my MPI
> program MUST use the InfiniBand network? If SLURM has assigned me node01 and
> node02 (with IP addresses 192.168.11.1 and 192.168.11.2 on a gigabit network)
> and InfiniBand is 192.168.13.x, who translates "clus01" (192.168.12.1) and
> "clus02" (192.168.12.2) into "infi01" (192.168.13.1) and "infi02"
> (192.168.13.2)?
>
> This step still baffles me...
>
> Sorry if my question is easy for you... but now I have entered a sea of
> doubts.
>
> Thanks.
>
> On 05/12/2019 at 14:27, mercan wrote:
>> Hi;
>>
>> Your MPI and NAMD use your second network because your applications were not
>> compiled for InfiniBand. There are many compiled NAMD versions; the verbs and
>> ibverbs versions are for using InfiniBand. Also, when compiling the MPI
>> source, you should check that the configure script detects the InfiniBand
>> network in order to use InfiniBand. And even while compiling SLURM too.
>>
>> Regards;
>>
>> Ahmet M.
>>
>>
>> On 5.12.2019 15:07, sysadmin.caos wrote:
>>> Hello,
>>>
>>> Really, I don't know if my question belongs on this mailing list... but I
>>> will explain my problem and then you can answer whatever you think ;)
>>>
>>> I manage a SLURM cluster composed of 3 networks:
>>>
>>>  * a gigabit network used for NFS shares (192.168.11.X). On this
>>>    network, my nodes are "node01, node02..." in /etc/hosts.
>>>  * a gigabit network used by SLURM (all my nodes were added to the SLURM
>>>    cluster using this network and the hostname assigned via /etc/hosts
>>>    to this second network) (192.168.12.X). On this network, my nodes
>>>    are "clus01, clus02..." in /etc/hosts.
>>>  * an InfiniBand network (192.168.13.X). On this network, my nodes are
>>>    "infi01, infi02..." in /etc/hosts.
>>>
>>> When I submit an MPI job, the SLURM scheduler offers me "n" nodes called,
>>> for example, clus01 and clus02, and there my application runs perfectly,
>>> using the second network for SLURM connectivity and the first network for
>>> NFS (and NIS) shares. By default, as SLURM connectivity is on the second
>>> network, my nodelist contains nodes called "clus0x".
>>>
>>> However, now I have a "new" problem. I want to use the third network
>>> (InfiniBand), but as SLURM offers me "clus0x" (second network), my MPI
>>> application runs OK but uses the second network. This problem also occurs,
>>> for example, with the NAMD (charmrun) application.
>>>
>>> So, my questions are:
>>>
>>>  1. Is this SLURM configuration correct for using both networks?
>>>     1. If the answer is "no", how do I configure SLURM for my purpose?
>>>     2. But if the answer is "yes", how can I ensure the connections in my
>>>        SLURM job go over InfiniBand?
>>>
>>> Thanks a lot!!
>>>