Open MPI matches the hardware available in the node(s) against its compiled-in 
capabilities.  Those capabilities are expressed as modular shared libraries 
(see e.g. $PREFIX/lib64/openmpi).  You can use environment variables or 
command-line flags to influence which modules get used for specific purposes.  
For example, the Byte Transfer Layer (BTL) framework has openib, tcp, self, 
shared-memory (sm), and vader components.  As long as your build of Open MPI 
knows about InfiniBand and the runtime can see the hardware, Open MPI should 
rank that interface as the highest-performance option and use it.
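
As a rough sketch (the exact component names depend on your Open MPI version, 
since newer releases replace the openib BTL with UCX and rename vader to sm, 
and ./my_mpi_app below is just a placeholder for your binary), you can first 
check what your build actually supports and then steer the BTL selection on 
the mpirun command line or through an environment variable in your sbatch 
script:

    # List the BTL components compiled into this Open MPI build
    ompi_info | grep btl

    # Prefer InfiniBand (openib) plus the self and shared-memory components,
    # rather than letting traffic fall back to tcp
    mpirun --mca btl openib,self,vader ./my_mpi_app

    # The same selection as an environment variable, e.g. in an sbatch script
    export OMPI_MCA_btl=openib,self,vader
    mpirun ./my_mpi_app

If ompi_info shows no openib BTL at all (newer builds use the ucx PML for 
InfiniBand instead), the build itself lacks verbs/UCX support and you would 
need to reconfigure with something like --with-verbs or --with-ucx, as Ahmet 
suggests below.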



> On Dec 9, 2019, at 08:54 , Sysadmin CAOS <sysadmin.c...@uab.cat> wrote:
> 
> Hi mercan,
> 
> OK, I forgot to compile Open MPI with InfiniBand support... But I still have 
> a doubt: the SLURM scheduler assigns (offers) nodes called "node0x" to my 
> sbatch job because the nodes were added to my SLURM cluster under the 
> "node0x" name. My Open MPI application has now been compiled with ibverbs 
> support... but how do I tell my application, or my SLURM sbatch submit 
> script, that my MPI program MUST use the InfiniBand network? If SLURM has 
> assigned me node01 and node02 (with IP addresses 192.168.11.1 and 
> 192.168.11.2 on a gigabit network) and InfiniBand is 192.168.13.x, who 
> transforms "clus01" (192.168.12.1) and "clus02" (192.168.12.2) into "infi01" 
> (192.168.13.1) and "infi02" (192.168.13.2)?
> 
> This step still baffles me...
> 
> Sorry if my question seems easy to you... but I now find myself in a sea of 
> doubts.
> 
> Thanks.
> 
> On 05/12/2019 at 14:27, mercan wrote:
>> Hi;
>> 
>> Your MPI and NAMD use your second network because your applications were 
>> not compiled for InfiniBand. There are many precompiled NAMD versions; the 
>> verbs and ibverbs versions are the ones that use InfiniBand. Also, when you 
>> compile the MPI source, you should check that the configure script detects 
>> the InfiniBand network so that it will be used. The same applies when 
>> compiling SLURM.
>> 
>> Regards;
>> 
>>  Ahmet M.
>> 
>> 
>> On 5.12.2019 15:07, sysadmin.caos wrote:
>>> Hello,
>>> 
>>> Honestly, I don't know whether my question belongs on this mailing list... 
>>> but I will explain my problem and then you can tell me what you think ;)
>>> 
>>> I manage a SLURM cluster composed of 3 networks:
>>> 
>>>   * a gigabit network used for NFS shares (192.168.11.X). On this
>>>     network, my nodes are "node01, node02..." in /etc/hosts.
>>>   * a gigabit network used by SLURM (all my nodes were added to the
>>>     SLURM cluster using this network and the hostname assigned to it
>>>     via /etc/hosts) (192.168.12.X). On this network, my nodes are
>>>     "clus01, clus02..." in /etc/hosts.
>>>   * an InfiniBand network (192.168.13.X). On this network, my nodes are
>>>     "infi01, infi02..." in /etc/hosts.
>>> 
>>> When I submit an MPI job, the SLURM scheduler offers me "n" nodes called, 
>>> for example, clus01 and clus02, and there my application runs perfectly, 
>>> using the second network for SLURM connectivity and the first network for 
>>> NFS (and NIS) shares. By default, since SLURM connectivity is on the second 
>>> network, my nodelist contains nodes called "clus0x".
>>> 
>>> However, I now have a "new" problem. I want to use the third network 
>>> (InfiniBand), but since SLURM offers me "clus0x" (second network), my MPI 
>>> application runs fine but over the second network. This also happens, for 
>>> example, with the NAMD (Charmrun) application.
>>> 
>>> So, my questions are:
>>> 
>>>  1. Is this SLURM configuration correct for using both networks?
>>>      1. If the answer is "no", how do I configure SLURM for my purpose?
>>>      2. And if the answer is "yes", how can I ensure that the connections
>>>         in my SLURM job go over InfiniBand?
>>> 
>>> Thanks a lot!!
>>> 
> 
> 

