Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Mark Hahn
Gus' numbers make sense to me. I assume his workload consists of jobs of multiple sizes: serial, modestly parallel, and parallel jobs using all resources. Without pre-emptive scheduling, the batch queue system has to starve the system in order to run the larger jobs, unless backfill can utilize th

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Gerry Creager
Alan Louis Scheinine wrote: This thread has moved to the question of utilization, discussed by Mark Hahn, Gus Correa and Håkon Bugge. In my previous job most people developed code, though test runs could run for days and use as many as 64 cores. It was convenient for most people to have immediat

Re: [Beowulf] Distributed FS (Was: copying big files)

2008-08-14 Thread Carsten Aulbert
Marian Marinov wrote: > Have you looked at GFarm and Hadoop? Very briefly at GFarm ("ages" ago); Hadoop was unknown to me. Thanks! Carsten

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Alan Louis Scheinine
This thread has moved to the question of utilization, discussed by Mark Hahn, Gus Correa and Håkon Bugge. In my previous job most people developed code, though test runs could run for days and use as many as 64 cores. It was convenient for most people to have immediate access due to the excess co

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Håkon Bugge
Gus' numbers make sense to me. I assume his workload consists of jobs of multiple sizes: serial, modestly parallel, and parallel jobs using all resources. Without pre-emptive scheduling, the batch queue system has to starve the system in order to run the larger jobs. Obviously, before a job which

Re: [Beowulf] Infinipath memory parity errors

2008-08-14 Thread Nifty niftyompi Mitch
On Thu, Aug 14, 2008 at 02:47:58PM -0400, Mark Kosmowski wrote: > > > Which driver is active? Which Infinipath software release > > > is installed? The tool "ipath_control -i" can show which... > > > > QLogic kernel.org driver > > 00: Version: Driver 2.0, InfiniPath_QLE7140, InfiniPath1 4.2, PCI

Re: Linux cpusets and HPC (was Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?)

2008-08-14 Thread Paul Jackson
Chris wrote: > The main purpose we're using them for is a quick and > easy way to catch users who don't know better doing > things like running an OpenMP code as a single CPU job > and overloading a node (and causing chaos for other > users) when it discovers 8 cores. Let me see if I understand th
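
For readers following the cpuset sub-thread, here is a minimal sketch of the kernel interface being discussed, assuming the Linux 2.6 cpuset pseudo-filesystem is mounted at /dev/cpuset as described in the kernel's Documentation/cpusets.txt. The cpuset name "job123" and the CPU/memory values are hypothetical stand-ins for what a batch system such as Torque would write per job; this is not code from the thread.

    /* Minimal sketch (not from the thread): confine a job with a
     * Linux 2.6 cpuset via the pseudo-filesystem interface.
     * Assumes: mount -t cpuset cpuset /dev/cpuset has been done.
     * "job123", "0-1" and "0" are hypothetical example values. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void write_file(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); exit(1); }
        fputs(val, f);
        fclose(f);
    }

    int main(void)
    {
        /* Create the per-job cpuset directory. */
        if (mkdir("/dev/cpuset/job123", 0755) != 0 && errno != EEXIST) {
            perror("mkdir"); exit(1);
        }
        write_file("/dev/cpuset/job123/cpus", "0-1"); /* allowed CPUs */
        write_file("/dev/cpuset/job123/mems", "0");   /* allowed mem nodes */

        /* Move this process in; children inherit the confinement. */
        char pid[32];
        snprintf(pid, sizeof pid, "%d", (int)getpid());
        write_file("/dev/cpuset/job123/tasks", pid);

        /* A real job prologue would now exec the user's job script. */
        return 0;
    }

Children of the attached process inherit the confinement, which is how an OpenMP code that "discovers 8 cores" ends up pinned to only the CPUs it actually requested.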

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Paul Jackson
Chris wrote: > creates a job cpuset which includes the specific cpus > (vnodes) that have been allocated by the scheduler, and > all the mems present (it currently makes no attempt to be > clever about that). I have recently open sourced a major user-level C library, called libcpuset, which inclu

Re: [Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-14 Thread Marian Marinov
On Wednesday 13 August 2008 16:10:18 Prentice Bisbal wrote: > Perry E. Metzger wrote: > > Dave Love <[EMAIL PROTECTED]> writes: > >>> We'd prefer to steer clear of Kerberos, it introduces > >>> arbitrary job limitations through ticket lives that > >>> are not tolerable for HPC work. > > > > Which o

Re: [Beowulf] Distributed FS (Was: copying big files)

2008-08-14 Thread Marian Marinov
On Thursday 14 August 2008 08:23:10 Carsten Aulbert wrote: > Hi all > > Bernard Li wrote: > > I'd like to add a comment -- is the reason why this "issue" hasn't > > been brought up as frequently as I think it should be mainly because a > > lot of folks use distributed FS that eliminates the need to

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Paul Jackson
> the RPM that it was LGPL'd. Yes, libcpuset and libbitmask are LGPL. Have fun! -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Paul Jackson
Chris wrote: > The 2.6 cpuset support in Torque came out of a long Would you have any pointers to some more details of what you've done here? I'm the maintainer, and one of the authors, of Linux 2.6 cpusets, and would like to do what I can with cpusets to make life easier (or at least no more pai

[Beowulf] large MPI adopters

2008-08-14 Thread Andrea Di Blas
Hello, I am curious about which companies, besides the national labs of course, use any implementation of MPI to support large applications of any kind, whether only internally (like MapReduce at Google, for example) or not. Does anybody know of any cases? Thank you and best regards, Andrea

Re: [Beowulf] Infinipath memory parity errors

2008-08-14 Thread Mark Kosmowski
> > > Which driver is active? Which Infinipath software release > > is installed? The tool "ipath_control -i" can show which... > > QLogic kernel.org driver > 00: Version: Driver 2.0, InfiniPath_QLE7140, InfiniPath1 4.2, PCI 2, SW > Compat 2 > > I think this is a 2.1 distribution, whereas there

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Kilian CAVALOTTI
Hi Chris, On Tuesday 12 August 2008 08:29:31 pm Chris Samuel wrote: > We do use things like cpusets to try and limit the impact > that jobs can have on other jobs on the same nodes, I'm actually curious about how you implemented that. Do you have NUMA hardware? Do you use a resource manager, a

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Gus Correa
Hello Mark and list The measurement was based on walltime. It just refers to the user occupancy of the cluster, versus what was left idle (for all reasons, e.g. lack of resources to serve large queued jobs, lack of enough jobs to fill all nodes, etc). The number is simply the utilized resources
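
Since the message is cut off before the bookkeeping is spelled out, here is a plausible reconstruction of a walltime-based utilization figure of the kind Gus describes (an assumption on my part, not necessarily his exact accounting):

    % Walltime-based utilization: core-hours delivered over core-hours available
    \[
      U = \frac{\sum_{j \in \mathrm{jobs}} c_j \, t_j}{C \cdot T}
    \]
    % c_j = cores allocated to job j, t_j = its walltime,
    % C = total cores in the cluster, T = elapsed wall-clock period

On a metric like this, idle time from any cause (draining nodes for large jobs, empty queues) counts against utilization, which is consistent with the 70-77% figures quoted elsewhere in the thread.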

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Mark Hahn
It appears we've averaged almost 77% utilisation since the beginning of 2004 (when our current usage system records begin). Thank you very much for the data point! I've insisted here that above 70% utilization is very good, given the random nature of demand and jobs on queues in academia, e

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Gus Correa
Hello Chris and list Chris Samuel wrote: - "Gus Correa" <[EMAIL PROTECTED]> wrote: One reason not mentioned is serial programs. Well, a cluster is to run parallel jobs. Hmm, a cluster is to run HPC codes; there are plenty of legitimate single-CPU codes to solve embarrassingly par

Re: [Beowulf] copying big files

2008-08-14 Thread Erwan Velu
Henning Fehrmann wrote: Hi everybody, Copying a big file onto all nodes in a cluster is a rather common problem. I would have thought that there might be a standard tool for distributing the files in an efficient way. So far, I haven't found one. Assuming one has a network design which allows
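
Since no standard tool is named in the truncated message, here is one common ad-hoc approach as a sketch (my assumption, not Henning's eventual solution): if MPI is already running on all nodes, broadcast the file in chunks with MPI_Bcast, which most MPI implementations route along a tree so the head node's uplink is not the bottleneck. File paths and the chunk size are hypothetical.

    /* Sketch: copy one big file to local disk on every node via
     * MPI_Bcast. Run one rank per node, e.g.:
     *   mpirun -np <nodes> --bynode ./bcastcp
     * Source/destination paths and the 4 MB chunk are made up. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK (4L * 1024 * 1024)

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        long size = 0;
        FILE *f;
        if (rank == 0) {                     /* root reads the master copy */
            f = fopen("/home/shared/bigfile", "rb");
            if (!f) MPI_Abort(MPI_COMM_WORLD, 1);
            fseek(f, 0, SEEK_END);
            size = ftell(f);
            rewind(f);
        } else {                             /* others write a local copy */
            f = fopen("/scratch/bigfile", "wb");
            if (!f) MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Bcast(&size, 1, MPI_LONG, 0, MPI_COMM_WORLD);

        char *buf = malloc(CHUNK);
        for (long off = 0; off < size; off += CHUNK) {
            int n = (size - off < CHUNK) ? (int)(size - off) : (int)CHUNK;
            if (rank == 0)
                fread(buf, 1, n, f);         /* fill the chunk on the root */
            MPI_Bcast(buf, n, MPI_BYTE, 0, MPI_COMM_WORLD);
            if (rank != 0)
                fwrite(buf, 1, n, f);        /* append on every other node */
        }
        fclose(f);
        free(buf);
        MPI_Finalize();
        return 0;
    }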

Re: [Beowulf] Distributed FS (Was: copying big files)

2008-08-14 Thread Carsten Aulbert
Hi Mark Mark Hahn wrote: > the premise of this approach is that whoever is using the node doesn't > mind the overhead of external accesses. do you have a sense (or even > measurements) on how bad this loss is (cpu, cache, memory, interconnect > overheads)? if you follow the reasoning that curre

Re: [Beowulf] Distributed FS (Was: copying big files)

2008-08-14 Thread Mark Hahn
Extending your storage that way is just darn cheap. In fact, we really have to stop thinking of storage as a significant cost component of clusters. If adding a terabyte disk to a node increases its cost by ~2%, then it's in the noise. Take a look at PVFS (www.pvfs.org). Disclaimer: I work

Re: [Beowulf] copying big files

2008-08-14 Thread Lombard, David N
On Wed, Aug 13, 2008 at 04:14:50PM -0700, Bernard Li wrote: > Hi: > > On Tue, Aug 12, 2008 at 10:10 AM, Lombard, David N > <[EMAIL PROTECTED]> wrote: > > > See Brent Chun's pcp at > > You'll want pcp, authd, and libe. Get gexec while you're at it... > > Dave!!

Re: [Beowulf] Distributed FS (Was: copying big files)

2008-08-14 Thread Robert Latham
On Thu, Aug 14, 2008 at 07:23:10AM +0200, Carsten Aulbert wrote: > Speaking of this, what do people use when they have say ~ 200 nodes > with an extra 1 TB drive in it. I know of glusterFS but very few > others who will be able to utilize this in a somewhat efficient > manner. Are the good/better a

Re: [Beowulf] Distributed FS (Was: copying big files)

2008-08-14 Thread Chris Samuel
- "Carsten Aulbert" <[EMAIL PROTECTED]> wrote: > Speaking of this, what do people use when they have say ~ 200 nodes > with an extra 1 TB drive in it. We're just using them for local scratch space, we thought it was overkill until two weeks after the first test node arrived when we got a use

Re: [Beowulf] Gigabit Ethernet and RDMA

2008-08-14 Thread Chris Samuel
- "Robert Kubrick" <[EMAIL PROTECTED]> wrote: > However, some observers predict TOE cards will be back as 10 Gigabit > Ethernet network capacity leapfrogs processing power again. Van Jacobsen's "channelised" mods to the Linux TCP stack described at Linux.Conf.Au in Dunedin, NZ in 2006 showe

Re: [Beowulf] Infinipath memory parity errors

2008-08-14 Thread Dave Love
Nifty niftyompi Mitch <[EMAIL PROTECTED]> writes: > Which driver is active? Which Infinipath software release > is installed? The tool "ipath_control -i" can show which... QLogic kernel.org driver 00: Version: Driver 2.0, InfiniPath_QLE7140, InfiniPath1 4.2, PCI 2, SW Compat 2 I think this i

Re: Linux cpusets and HPC (was Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?)

2008-08-14 Thread Chris Samuel
- "Paul Jackson" <[EMAIL PROTECTED]> wrote: Hi Paul, > Let me see if I understand this. Is the following right: > > Without the cpuset constraint, such a 'bad' job could tell the > cluster management software (PBS or Torque or ...) it needed just > one CPU, which could end up puttin

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-14 Thread Chris Samuel
- "Paul Jackson" <[EMAIL PROTECTED]> wrote: > I have recently open sourced a major user level C library, > called libcpusets, which includes routines to map cpus to > their corresponding memory nodes. Aha, a bunch of us had been badgering the local Melbourne SGI rep about getting that publis

Linux cpusets and HPC (was Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?)

2008-08-14 Thread Chris Samuel
- "Paul Jackson" <[EMAIL PROTECTED]> wrote: Hi Paul, > Chris wrote: > > The 2.6 cpuset support in Torque came out of a long > > Would you have any pointers to some more details of what you've > done here? Sure - mostly it was discussed on the torquedev list after the initial discussion at