Re: [slurm-users] PMIx and Slurm

2017-11-28 Thread r...@open-mpi.org
Very true - one of the risks with installing from packages. However, be aware 
that slurm 17.02 doesn’t support PMIx v2.0, and so this combination isn’t going 
to work anyway.

If you want PMIx v2.x, then you need to pair it with SLURM 17.11.

Ralph

> On Nov 28, 2017, at 2:32 PM, Philip Kovacs  wrote:
> 
> The issue is that pmix 2.0+ provides a "backward compatibility" feature, 
> enabled by default, which installs both libpmi.so and libpmi2.so in addition 
> to libpmix.so.  The route with the least friction for you would probably be 
> to uninstall pmix, then install slurm normally, letting it install its 
> libpmi and libpmi2.  Next, configure and compile a custom pmix with that 
> backward-compatibility feature _disabled_, so it only installs libpmix.so.  
> Slurm will "see" the pmix library after you install it and will load it via 
> its plugin when you use --mpi=pmix.  Again, just use the Slurm pmi and pmi2 
> and install pmix separately with the backward-compatibility option disabled.
> 
> There is a packaging issue there in which two packages are trying to install 
> their own versions of the same files.  That should be brought to the 
> attention of the packagers.  Meantime you can work around it.
> 
> For PMIX:
> 
> ./configure --disable-pmi-backward-compatibility   # ... etc ...
> 
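> For concreteness, a minimal sketch of that sequence (the paths, versions and 
> prefixes below are illustrative assumptions -- adjust for your site):
> 
>     # build PMIx without the libpmi/libpmi2 compatibility shims
>     cd pmix-2.0.2
>     ./configure --prefix=/usr --disable-pmi-backward-compatibility
>     make && make install
> 
>     # then build Slurm; it installs its own libpmi/libpmi2 and builds the
>     # pmix plugin against the PMIx it finds
>     cd ../slurm-17.11.0
>     ./configure --prefix=/usr --with-pmix=/usr
>     make && make install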
> 
> 
> On Tuesday, November 28, 2017 4:44 PM, Artem Polyakov wrote:
> 
> 
> Hello, Paul
> 
> Please see below.
> 
> 2017-11-28 13:13 GMT-08:00 Paul Edmon:
> So in an effort to future-proof ourselves we are trying to build Slurm 
> against PMIx, but when I tried to do so I got the following:
> 
> Transaction check error:
>   file /usr/lib64/libpmi.so from install of slurm-17.02.9-1fasrc02.el7.centos.x86_64 conflicts with file from package pmix-2.0.2-1.el7.centos.x86_64
>   file /usr/lib64/libpmi2.so from install of slurm-17.02.9-1fasrc02.el7.centos.x86_64 conflicts with file from package pmix-2.0.2-1.el7.centos.x86_64
> 
> This is with compiling Slurm with the --with-pmix=/usr option.  A few things:
> 
> 1. I'm surprised that when I tell it to use PMIx it still builds its own 
> versions of libpmi and libpmi2, given that PMIx handles that now.
> 
> PMIx is a plugin, and from multiple perspectives it makes sense to keep the 
> other versions available (e.g. for backward compatibility or performance 
> comparison).
>  
> 
> 2. Does this mean I have to install PMIx in a nondefault location?  If so, 
> how does that work with user-built codes?  I'd rather not have multiple 
> versions of PMI around for people to build against.
> When we introduced PMIx it was in the beta stage and we didn't want to build 
> against it by default. Now it probably makes sense to assume --with-pmix by 
> default.
> I'm also thinking that we might need to solve it at the packaging level by 
> distributing a "slurm-pmix" package that is built against, and depends on, 
> the pmix package currently shipped with the particular Linux distro.
>  
> 
> 3.  What is the right way of building PMIx and Slurm such that they 
> interoperate properly?
> For now it is better to have PMIx installed in a well-known location, and 
> then build your MPIs or other apps against this PMIx installation.
> Starting (I think) from PMIx v2.1 we will have cross-version support that 
> will give some flexibility about which installation to use with an 
> application.
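> 
> As one illustration, building an MPI against that same installation might 
> look roughly like this (a sketch for Open MPI; flags for other MPIs will 
> differ, and the paths/version here are assumptions):
> 
>     cd openmpi-3.0.0
>     ./configure --prefix=/opt/openmpi --with-slurm --with-pmix=/usr
>     make && make install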
>  
> 
> Suffice it to say little to no documentation exists on how to do this 
> properly, so any guidance would be much appreciated.
> Indeed we have some problems with the documentation, as the PMIx technology 
> is relatively new. Hopefully we can fix this in the near future.
> Being the original developer of the PMIx plugin I'll be happy to answer any 
> questions and help to resolve the issues.
> 
> 
>  
> 
> 
> -Paul Edmon-
> 
> 
> 
> 
> 
> 
> -- 
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
> 
> 



Re: [slurm-users] PMIx and Slurm

2017-11-28 Thread r...@open-mpi.org
My apologies - I guess we hadn’t been tracking it that way. I’ll try to add 
some clarification. We presented a nice table at the BoF and I just need to 
find a few minutes to post it.

I believe you do have to build slurm against PMIx so that the pmix plugin is 
compiled. You then also have to specify --mpi=pmix so slurm knows to use that 
plugin for this specific job.
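
For example (the node/task counts and application name here are only an 
illustrative assumption):

    srun --mpi=pmix -N 2 -n 8 ./my_mpi_app

    # or make pmix the site-wide default in slurm.conf:
    # MpiDefault=pmix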

You actually might be able to use the PMIx backward compatibility, and you 
might want to do so with slurm 17.11 and above as Mellanox did a nice job of 
further optimizing launch performance on IB platforms by adding fabric-based 
collective implementations to the pmix plugin. If you replace the slurm libpmi 
and libpmi2 with the ones from PMIx, what will happen is that PMI and PMI2 
calls will be converted to their PMIx equivalent and passed to the pmix plugin. 
This lets you take advantage of what Mellanox did.

The caveat is that your MPI might ask for some PMI/PMI2 feature that we didn’t 
implement. We have tested with MPICH as well as OMPI and it was fine - but we 
cannot give you a blanket guarantee (e.g., I’m pretty sure MVAPICH won’t work). 
Probably safer to stick with the slurm libs for that reason unless you test to 
ensure it all works.


> On Nov 28, 2017, at 6:42 PM, Paul Edmon  wrote:
> 
> Okay, I didn't see any note on the PMIx 2.1 page about which versions of 
> slurm it was compatible with, so I assumed all of them.  My bad.  Thanks for 
> the correction and the help.  I just naively used the rpm spec that was 
> packaged with PMIx, which does enable the legacy support.  It seems best then 
> to let PMIx handle pmix solely and let slurm handle the rest.  Thanks!
> 
> Am I right in reading that you don't have to build slurm against PMIx?  So it 
> just interoperates with it fine if you just have it installed and specify 
> pmix as the launch option?  That's neat.
> -Paul Edmon-
> 
> On 11/28/2017 6:11 PM, Philip Kovacs wrote:
>> Actually, if you're set on installing pmix/pmix-devel from the rpms and then 
>> configuring slurm manually, you could just move the pmix-installed versions 
>> of libpmi.so* and libpmi2.so* to a safe place, configure and install slurm, 
>> which will drop in its versions of those libs, and then either use the slurm 
>> versions or move the pmix versions of libpmi and libpmi2 back into place in 
>> /usr/lib64. 
>> 
>> 
>> On Tuesday, November 28, 2017 5:32 PM, Philip Kovacs wrote:
>> 
>> 
>> The issue is that pmix 2.0+ provides a "backward compatibility" feature, 
>> enabled by default, which installs both libpmi.so and libpmi2.so in addition 
>> to libpmix.so.  The route with the least friction for you would probably be 
>> to uninstall pmix, then install slurm normally, letting it install its 
>> libpmi and libpmi2.  Next, configure and compile a custom pmix with that 
>> backward-compatibility feature _disabled_, so it only installs libpmix.so.  
>> Slurm will "see" the pmix library after you install it and will load it via 
>> its plugin when you use --mpi=pmix.  Again, just use the Slurm pmi and pmi2 
>> and install pmix separately with the backward-compatibility option disabled.
>> 
>> There is a packaging issue there in which two packages are trying to install 
>> their own versions of the same files.  That should be brought to the 
>> attention of the packagers.  Meantime you can work around it.
>> 
>> For PMIX:
>> 
>> ./configure --disable-pmi-backward-compatibility   # ... etc ...
>> 
>> 
>> 
>> On Tuesday, November 28, 2017 4:44 PM, Artem Polyakov wrote:
>> 
>> 
>> Hello, Paul
>> 
>> Please see below.
>> 
>> 2017-11-28 13:13 GMT-08:00 Paul Edmon:
>> So in an effort to future-proof ourselves we are trying to build Slurm 
>> against PMIx, but when I tried to do so I got the following:
>> 
>> Transaction check error:
>>   file /usr/lib64/libpmi.so from install of slurm-17.02.9-1fasrc02.el7.centos.x86_64 conflicts with file from package pmix-2.0.2-1.el7.centos.x86_64
>>   file /usr/lib64/libpmi2.so from install of slurm-17.02.9-1fasrc02.el7.centos.x86_64 conflicts with file from package pmix-2.0.2-1.el7.centos.x86_64
>> 
>> This is with compiling Slurm with the --with-pmix=/usr option.  A few things:
>> 
>> 1. I'm surprised that when I tell it to use PMIx it still builds its own 
>> versions of libpmi and libpmi2, given that PMIx handles that now.
>> 
>> PMIx is a plugin, and from multiple perspectives it makes sense to keep the 
>> other versions available (e.g. for backward compatibility or performance 
>> comparison).
>>  
>> 
>> 2. Does this mean I have to install PMIx in a nondefault location?  If so, 
>> how does that work with user-built codes?  I'd rather not have multiple 
>> versions of PMI around for people to build against.
>> When we introduced PMIx it was in the beta stage and we didn't want to build 
>> against it by default. Now it probably makes sense to assume --with-pmix by 
>> default.
>> I'm also thinking that we might need to solve it at the packaging level by 
>> distributing a "slurm-pmix" package that is built against, and depends on, 
>> the pmix package currently shipped with the particular Linux distro.

Re: [slurm-users] PMIx and Slurm

2017-11-28 Thread r...@open-mpi.org
Thanks for your patience and persistence. I’ll find a place to post your 
experiences to help others as they navigate these waters.


> On Nov 28, 2017, at 8:52 PM, Philip Kovacs  wrote:
> 
> I double-checked and yes, you definitely want the pmix headers and libpmix 
> library installed before you configure slurm.  No need to use --with-pmix if 
> pmix is installed in standard system locations; configure slurm and it will 
> see the pmix installation.  After configuring slurm, but before installing 
> it, manually remove the pmix versions of libpmi.so* and libpmi2.so*.  
> Install slurm and use its versions of those libs.  Test every mpi variant 
> seen when you run `srun --mpi=list hostname`.  You should see pmi2 and pmix 
> in that list, and several others.  The pmix option will invoke a slurm 
> plugin that is linked directly to the libpmix.so library.  If you favor 
> using the pmix versions of libpmi/libpmi2, it sounds like you'll get better 
> performance when using pmi/pmi2, but as mentioned, you would want to test 
> every mpi variant listed to make sure everything works.
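> 
> A quick way to sanity-check that sequence (the output below only indicates 
> what to look for, it is not an exact transcript):
> 
>     $ srun --mpi=list
>     srun: MPI types are...
>     srun: pmi2
>     srun: pmix
>     srun: none
>     $ srun --mpi=pmix -n 4 ./hello_mpi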
> 
> 
> On Tuesday, November 28, 2017 9:57 PM, "r...@open-mpi.org" wrote:
> 
> 
> My apologies - I guess we hadn’t been tracking it that way. I’ll try to add 
> some clarification. We presented a nice table at the BoF and I just need to 
> find a few minutes to post it.
> 
> I believe you do have to build slurm against PMIx so that the pmix plugin is 
> compiled. You then also have to specify --mpi=pmix so slurm knows to use that 
> plugin for this specific job.
> 
> You actually might be able to use the PMIx backward compatibility, and you 
> might want to do so with slurm 17.11 and above as Mellanox did a nice job of 
> further optimizing launch performance on IB platforms by adding fabric-based 
> collective implementations to the pmix plugin. If you replace the slurm 
> libpmi and libpmi2 with the ones from PMIx, what will happen is that PMI and 
> PMI2 calls will be converted to their PMIx equivalent and passed to the pmix 
> plugin. This lets you take advantage of what Mellanox did.
> 
> The caveat is that your MPI might ask for some PMI/PMI2 feature that we 
> didn’t implement. We have tested with MPICH as well as OMPI and it was fine - 
> but we cannot give you a blanket guarantee (e.g., I’m pretty sure MVAPICH 
> won’t work). Probably safer to stick with the slurm libs for that reason 
> unless you test to ensure it all works.
> 
> 
>> On Nov 28, 2017, at 6:42 PM, Paul Edmon wrote:
>> 
> 
> Okay, I didn't see any note on the PMIx 2.1 page about which versions of 
> slurm it was compatible with, so I assumed all of them.  My bad.  Thanks for 
> the correction and the help.  I just naively used the rpm spec that was 
> packaged with PMIx, which does enable the legacy support.  It seems best then 
> to let PMIx handle pmix solely and let slurm handle the rest.  Thanks!
> Am I right in reading that you don't have to build slurm against PMIx?  So it 
> just interoperates with it fine if you just have it installed and specify 
> pmix as the launch option?  That's neat.
> -Paul Edmon-
> 
> On 11/28/2017 6:11 PM, Philip Kovacs wrote:
>> Actually, if you're set on installing pmix/pmix-devel from the rpms and then 
>> configuring slurm manually, you could just move the pmix-installed versions 
>> of libpmi.so* and libpmi2.so* to a safe place, configure and install slurm, 
>> which will drop in its versions of those libs, and then either use the slurm 
>> versions or move the pmix versions of libpmi and libpmi2 back into place in 
>> /usr/lib64. 
>> 
>> 
>> On Tuesday, November 28, 2017 5:32 PM, Philip Kovacs wrote:
>> 
>> 
>> The issue is that pmix 2.0+ provides a "backward compatibility" feature, 
>> enabled by default, which installs both libpmi.so and libpmi2.so in addition 
>> to libpmix.so.  The route with the least friction for you would probably be 
>> to uninstall pmix, then install slurm normally, letting it install its 
>> libpmi and libpmi2.  Next, configure and compile a custom pmix with that 
>> backward-compatibility feature _disabled_, so it only installs libpmix.so.  
>> Slurm will "see" the pmix library after you install it and will load it via 
>> its plugin when you use --mpi=pmix.  Again, just use the Slurm pmi and pmi2 
>> and install pmix separately with the backward-compatibility option disabled.
>> 
>> There is a packaging issue there in which two packages are trying to install 
>> their own versions of the same files.  That should be brought to the 
>> attention of the packagers.  Meantime you can work around it.

Re: [slurm-users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-18 Thread r...@open-mpi.org
Repeated here from the OMPI list:

We have had reports of applications running faster when executing under OMPI’s 
mpiexec versus when started by srun. Reasons aren’t entirely clear, but are 
likely related to differences in mapping/binding options (OMPI provides a very 
large range compared to srun) and optimization flags provided by mpiexec that 
are specific to OMPI.

OMPI uses PMIx for wireup support (starting with the v2.x series), which 
provides a faster startup than other PMI implementations. However, that is also 
available with Slurm starting with the 16.05 release, and some further 
PMIx-based launch optimizations were recently added to the Slurm 17.11 release. 
So I would expect that launch via srun with the latest Slurm release and PMIx 
would be faster than mpiexec - though that still leaves the faster execution 
reports to consider.

HTH
Ralph


> On Dec 18, 2017, at 2:26 PM, Prentice Bisbal  wrote:
> 
> Slurm users,
> 
> I've already posted this question to the OpenMPI and Beowulf lists, but I 
> also wanted to post this question here to get more Slurm-specific opinions, 
> in case some of you don't subscribe to those lists and have meaningful input 
> to provide. For those of you that subscribe to one or more of these lists, I 
> apologize for making you read this a 3rd time.
> 
> We use OpenMPI with Slurm as our scheduler, and a user has asked me this: 
> should they use mpiexec/mpirun or srun to start their MPI jobs through Slurm?
> 
> My inclination is to use mpiexec, since that is the only method that's 
> (somewhat) defined in the MPI standard and therefore the most portable, and 
> the examples in the OpenMPI FAQ use mpirun. However, the Slurm documentation 
> on the SchedMD website says to use srun with the --mpi=pmi option. (See links 
> below.)
> 
> What are the pros/cons of using these two methods, other than the portability 
> issue I already mentioned? Does srun+pmi use a different method to wire up 
> the connections? Some things I read online seem to indicate that. If slurm 
> was built with PMI support, and OpenMPI was built with Slurm support, does it 
> really make any difference?
> 
> https://www.open-mpi.org/faq/?category=slurm
> https://slurm.schedmd.com/mpi_guide.html#open_mpi
> 
> -- 
> Prentice Bisbal
> Lead Software Engineer
> Princeton Plasma Physics Laboratory
> http://www.pppl.gov
> 
> 




Re: [slurm-users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-18 Thread r...@open-mpi.org
If it truly is due to mapping/binding and optimization params, then I would 
expect it to be highly application-specific. The sporadic nature of the reports 
would seem to also support that possibility.

I’d be very surprised to find run time scaling better with srun unless you are 
using some layout option with one launcher that you aren’t using with the 
other. mpiexec 
has all the srun layout options, and a lot more - so I suspect you just aren’t 
using the equivalent mpiexec option. Exploring those might even reveal a 
combination that runs better :-)

Launch time, however, is a different subject.


> On Dec 18, 2017, at 5:23 PM, Christopher Samuel  wrote:
> 
> On 19/12/17 12:13, r...@open-mpi.org wrote:
> 
>> We have had reports of applications running faster when executing under 
>> OMPI’s mpiexec versus when started by srun.
> 
> Interesting, I know that used to be the case with older versions of
> Slurm but since (I think) about 15.x we saw srun scale better than
> mpirun (this was for the molecular dynamics code NAMD).
> 
> -- 
> Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
> 




Re: [slurm-users] [17.11.1] no good pmi intention goes unpunished

2017-12-20 Thread r...@open-mpi.org
On Dec 20, 2017, at 6:21 PM, Philip Kovacs  wrote:
> 
> >  -- slurm.spec: move libpmi to a separate package to solve a conflict with 
> > the
> >version provided by PMIx. This will require a separate change to PMIx as
> >well.
> 
> I see the intention behind this change, since the pmix 2.0+ package provides 
> libpmi/libpmi2 and there is a possible (installation) conflict with the Slurm 
> implementation of those libraries.  We've discussed that issue earlier.
> 
> Now, suppose a user installs the pmix versions of libpmi/pmi2 with the 
> expectation that pmi
> calls will be forwarded to libpmix for greater speed, the so-called "backward 
> compatibility" feature.
> 
> Shouldn't the Slurm mpi_pmi2 plugin attempt to link with libpmi2 instead of 
> its internal 
> implementation of pmi2?  As it stands now, there won't be any forwarding of 
> pmi2 code 
> to libpmix which I imagine users would expect in that scenario.

Sadly, it isn’t quite that simple. Most of the standard PMI2 calls are covered 
by the backward compatibility libraries, so things like MPICH should work 
out-of-the-box.

However, MVAPICH2 added a PMI2 extension call to the SLURM PMI2 library that 
they use and PMIx doesn’t cover (as there really isn’t an easy equivalent, and 
they called it PMIX_foo which causes a naming conflict), and so they would not 
work.




Re: [slurm-users] [17.11.1] no good pmi intention goes unpunished

2017-12-21 Thread r...@open-mpi.org
Hmmm - I think there may be something a little more subtle here. If you build 
your app and link it against “libpmi2”, and that library is actually the one 
from PMIx, then it won’t work with Slurm’s PMI2 plugin because the 
communication protocols are completely different.

So the fact is that if you want to use PMIx backward compatibility, you (a) 
need to link against either libpmix or the libpmi libraries we export (they are 
nothing more than symlinks to libpmix), and (b) specify --mpi=pmix on the srun 
cmd line.
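
One way to check which library a given binary actually picked up (the 
application name and exact output here are assumptions for illustration):

    $ ldd ./my_mpi_app | grep -i pmi
    libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f...)
    $ rpm -qf /usr/lib64/libpmi2.so.0    # shows whether slurm or pmix owns it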



> On Dec 21, 2017, at 11:44 AM, Philip Kovacs  wrote:
> 
> OK, so slurm's libpmi2 is a functional superset of the libpmi2 provided by 
> pmix 2.0+.  That's good to know.
> 
> My point here is that, if you use slurm's mpi/pmi2 plugin, regardless of 
> which libpmi2 is installed, 
> slurm or pmix, you will always run the slurm pmi2 code since it is compiled 
> directly into the plugin.
> 
> 
> On Wednesday, December 20, 2017 10:47 PM, "r...@open-mpi.org" wrote:
> 
> 
> On Dec 20, 2017, at 6:21 PM, Philip Kovacs wrote:
>> 
>> >  -- slurm.spec: move libpmi to a separate package to solve a conflict with 
>> > the
>> >version provided by PMIx. This will require a separate change to PMIx as
>> >well.
>> 
>> I see the intention behind this change since the pmix 2.0+ package provides 
>> libpmi/libpmi2
>> and there is a possible (installation) conflict with the Slurm 
>> implementation of those libraries.  
>> We've discussed that issue earlier.
>> 
>> Now, suppose a user installs the pmix versions of libpmi/pmi2 with the 
>> expectation that pmi
>> calls will be forwarded to libpmix for greater speed, the so-called 
>> "backward compatibility" feature.
>> 
>> Shouldn't the Slurm mpi_pmi2 plugin attempt to link with libpmi2 instead of 
>> its internal 
>> implementation of pmi2?  As it stands now, there won't be any forwarding of 
>> pmi2 code 
>> to libpmix which I imagine users would expect in that scenario.
> 
> Sadly, it isn’t quite that simple. Most of the standard PMI2 calls are 
> covered by the backward compatibility libraries, so things like MPICH should 
> work out-of-the-box.
> 
> However, MVAPICH2 added a PMI2 extension call to the SLURM PMI2 library that 
> they use and PMIx doesn’t cover (as there really isn’t an easy equivalent, 
> and they called it PMIX_foo which causes a naming conflict), and so they 
> would not work.
> 
> 
> 
> 



Re: [slurm-users] [17.11.1] no good pmi intention goes unpunished

2017-12-21 Thread r...@open-mpi.org
I need to correct myself - the libs are not symlinks to libpmix. They are 
actual copies of the libpmix library with their own version triplets which 
change only if/when the PMI-1 or PMI-2 abstraction code changes. If they were 
symlinks, we wouldn’t be able to track independent version triplets.

Just to further clarify: the reason we provide libpmi and libpmi2 is that users 
were requesting access to the backward compatibility feature, but their 
apps/libs were hardcoded to dlopen “libpmi” or “libpmi2”. We suggested they 
just manually create the links, but clearly there was some convenience 
associated with directly installing them. Hence, we added a configure option 
"--enable-pmi-backward-compatibility” to control the behavior and set it to 
enabled by default. Disabling it simply causes the other libs to not be made.
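
If you are unsure which flavor of libpmi/libpmi2 ended up on a node, something 
like this will tell you (the package names shown are what the CentOS rpms in 
this thread would report; other distros will differ):

    $ rpm -qf /usr/lib64/libpmi.so /usr/lib64/libpmi2.so
    pmix-2.0.2-1.el7.centos.x86_64
    pmix-2.0.2-1.el7.centos.x86_64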


> On Dec 21, 2017, at 12:58 PM, Philip Kovacs  wrote:
> 
> >(they are nothing more than symlinks to libpmix)
> 
> This is very helpful to know.
> 
> 
> On Thursday, December 21, 2017 3:28 PM, "r...@open-mpi.org" wrote:
> 
> 
> Hmmm - I think there may be something a little more subtle here. If you build 
> your app and link it against “libpmi2”, and that library is actually the one 
> from PMIx, then it won’t work with Slurm’s PMI2 plugin because the 
> communication protocols are completely different.
> 
> So the fact is that if you want to use PMIx backward compatibility, you (a) 
> need to link against either libpmix or the libpmi libraries we export (they 
> are nothing more than symlinks to libpmix), and (b) specify --mpi=pmix on the 
> srun cmd line.
> 
> 
> 
>> On Dec 21, 2017, at 11:44 AM, Philip Kovacs wrote:
>> 
>> OK, so slurm's libpmi2 is a functional superset of the libpmi2 provided by 
>> pmix 2.0+.  That's good to know.
>> 
>> My point here is that, if you use slurm's mpi/pmi2 plugin, regardless of 
>> which libpmi2 is installed, 
>> slurm or pmix, you will always run the slurm pmi2 code since it is compiled 
>> directly into the plugin.
>> 
>> 
>> On Wednesday, December 20, 2017 10:47 PM, "r...@open-mpi.org" wrote:
>> 
>> 
>> On Dec 20, 2017, at 6:21 PM, Philip Kovacs wrote:
>>> 
>>> >  -- slurm.spec: move libpmi to a separate package to solve a conflict 
>>> > with the
>>> >version provided by PMIx. This will require a separate change to PMIx 
>>> > as
>>> >well.
>>> 
>>> I see the intention behind this change since the pmix 2.0+ package provides 
>>> libpmi/libpmi2
>>> and there is a possible (installation) conflict with the Slurm 
>>> implementation of those libraries.  
>>> We've discussed that issue earlier.
>>> 
>>> Now, suppose a user installs the pmix versions of libpmi/pmi2 with the 
>>> expectation that pmi
>>> calls will be forwarded to libpmix for greater speed, the so-called 
>>> "backward compatibility" feature.
>>> 
>>> Shouldn't the Slurm mpi_pmi2 plugin attempt to link with libpmi2 instead of 
>>> its internal 
>>> implementation of pmi2?  As it stands now, there won't be any forwarding of 
>>> pmi2 code 
>>> to libpmix which I imagine users would expect in that scenario.
>> 
>> Sadly, it isn’t quite that simple. Most of the standard PMI2 calls are 
>> covered by the backward compatibility libraries, so things like MPICH should 
>> work out-of-the-box.
>> 
>> However, MVAPICH2 added a PMI2 extension call to the SLURM PMI2 library that 
>> they use and PMIx doesn’t cover (as there really isn’t an easy equivalent, 
>> and they called it PMIX_foo which causes a naming conflict), and so they 
>> would not work.
>> 
>> 
>> 
>> 
> 
> 
> 



[slurm-users] Using PMIx with SLURM

2018-01-03 Thread r...@open-mpi.org
Hi folks

There have been some recent questions on both this and the OpenMPI mailing 
lists about PMIx use with SLURM. I have tried to capture the various 
conversations in a “how-to” guide on the PMIx web site:

https://pmix.org/support/how-to/slurm-support/ 


There are also some hints about how to debug PMIx-based apps: 
https://pmix.org/support/faq/debugging-pmix/ 


The web site is still in its infancy, so there is still a lot to be added. 
However, it may perhaps begin to help a bit. As always, suggestions are welcome.

Ralph



[slurm-users] Fabric manager interactions: request for comments

2018-02-05 Thread r...@open-mpi.org
I apologize in advance if you received a copy of this from other mailing lists
--

Hello all

The PMIx community is starting work on the next phase of defining support for 
network interactions, looking specifically at things we might want to obtain 
and/or control via the fabric manager. A very preliminary draft is shown here:

https://pmix.org/home/pmix-standard/fabric-manager-roles-and-expectations/ 


We would welcome any comments/suggestions regarding information you might find 
useful to get regarding the network, or controls you would like to set.

Thanks in advance
Ralph




Re: [slurm-users] Allocate more memory

2018-02-07 Thread r...@open-mpi.org
I’m afraid neither of those versions is going to solve the problem here - there 
is no way to allocate memory across nodes.

Simple reason: there is no way for a process to directly address memory on a 
separate node - you’d have to implement that via MPI or shmem or some other 
library.


> On Feb 7, 2018, at 6:57 AM, Loris Bennett  wrote:
> 
> Loris Bennett writes:
> 
>> Hi David,
>> 
>> david martin  writes:
>> 
>>>  
>>> 
>>> Hi,
>>> 
>>> I would like to submit a job that requires 3 GB. The problem is that I have 
>>> 70 nodes available, each node with 2 GB of memory.
>>> 
>>> So the command sbatch --mem=3G will wait for resources to become available.
>>> 
>>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB
>>> available, or is that a particular setup? Meaning, is the memory
>>> restricted to each node? Or should I allocate two nodes so that I
>>> have 2x4 GB available?
>> 
>> Check
>> 
>>  man sbatch
>> 
>> You'll find that --mem means memory per node.  Thus, if you specify 3GB
>> but all the nodes have 2GB, your job will wait forever (or until you buy
>> more RAM and reconfigure Slurm).
>> 
>> You probably want --mem-per-cpu, which is actually more like memory per
>> task.
> 
> The above should read
> 
>  You probably want --mem-per-cpu, which is actually more like memory per
>  core and thus memory per task if you have tasks per core set to 1.
> 
>> This is obviously only going to work if your job can actually run
>> on more than one node, e.g. is MPI enabled.
>> 
>> Cheers,
>> 
>> Loris
> -- 
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de 
> 
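> 
> (To make the above concrete: a minimal sbatch sketch of a per-task memory 
> request that fits 2 GB nodes -- the script name and task count are 
> assumptions.)
> 
>     #!/bin/bash
>     #SBATCH --ntasks=2
>     #SBATCH --mem-per-cpu=1500M
>     # two tasks of 1.5 GB each -> 3 GB total; each task fits on a 2 GB node,
>     # which only helps if the application is MPI-enabled
>     srun ./my_mpi_app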


Re: [slurm-users] Allocate more memory

2018-02-07 Thread r...@open-mpi.org
Afraid not - since you don’t have any nodes that meet the 3G requirement, 
you’ll just hang.

> On Feb 7, 2018, at 7:01 AM, david vilanova  wrote:
> 
> Thanks for the quick response.
> 
> Should the following script do the trick? Meaning, use all required nodes to 
> have at least 3 GB total memory, even though my nodes were set up with 2 GB 
> each?
> 
> #SBATCH --array=1-10:1%10
> 
> #SBATCH --mem-per-cpu=3000M
> 
> srun R CMD BATCH myscript.R
> 
> 
> 
> thanks
> 
> 
> 
> 
> On 07/02/2018 15:50, Loris Bennett wrote:
>> Hi David,
>> 
>> david martin  writes:
>> 
>>> 
>>> 
>>> Hi,
>>> 
>>> I would like to submit a job that requires 3 GB. The problem is that I have 
>>> 70 nodes available, each node with 2 GB of memory.
>>> 
>>> So the command sbatch --mem=3G will wait for resources to become available.
>>> 
>>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB
>>> available, or is that a particular setup? Meaning, is the memory
>>> restricted to each node? Or should I allocate two nodes so that I
>>> have 2x4 GB available?
>> Check
>> 
>>   man sbatch
>> 
>> You'll find that --mem means memory per node.  Thus, if you specify 3GB
>> but all the nodes have 2GB, your job will wait forever (or until you buy
>> more RAM and reconfigure Slurm).
>> 
>> You probably want --mem-per-cpu, which is actually more like memory per
>> task.  This is obviously only going to work if your job can actually run
>> on more than one node, e.g. is MPI enabled.
>> 
>> Cheers,
>> 
>> Loris
>> 
> 
>