Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-13 Thread Chris Samuel
- "Gus Correa" <[EMAIL PROTECTED]> wrote: > One reason not mentioned is serial programs. > Well a cluster is to run parallel jobs. Hmm, a cluster is to run HPC codes, there are plenty of legitimate single CPU codes to solve embarrassingly parallel problems! :-) [...] > and an average of abo

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-13 Thread Chris Samuel
- "Kilian CAVALOTTI" <[EMAIL PROTECTED]> wrote: > Hi Chris, Hello Kilian, > On Tuesday 12 August 2008 08:29:31 pm Chris Samuel wrote: > > > We do use things like cpusets to try and limit the impact > > that jobs can have on other jobs on the same nodes, > > I'm actually curious about how

[Beowulf] Distributed FS (Was: copying big files)

2008-08-13 Thread Carsten Aulbert
Hi all Bernard Li wrote: > I'd like to add a comment -- is the reason why this "issue" hasn't > been brought up as frequently as I think it should be mainly because a > lot of folks use distributed FS that eliminates the need to do one to > many file transfers? > Speaking of this, what do people u

Re: [Beowulf] Infinipath memory parity errors

2008-08-13 Thread Nifty niftyompi Mitch
On Wed, Aug 13, 2008 at 05:03:46PM +0100, Dave Love wrote: > [I know in an ideal world the vendor between us and PathScale^WQlogic > would sort this out.] > > I'm interested in the cause (and possible cure!) of intermittent errors > on various nodes in our Infinipath system which stop MPI jobs wit

Re: [Beowulf] copying big files

2008-08-13 Thread Bernard Li
I'd like to add a comment -- is the reason why this "issue" hasn't been brought up as frequently as I think it should be mainly because a lot of folks use distributed FS that eliminates the need to do one to many file transfers? Cheers, Bernard On Wed, Aug 13, 2008 at 4:14 PM, Bernard Li <[EMAIL

Re: [Beowulf] copying big files

2008-08-13 Thread Bernard Li
Hi: On Tue, Aug 12, 2008 at 10:10 AM, Lombard, David N <[EMAIL PROTECTED]> wrote: > See Brent Chun's pcp at > You'll want pcp, authd, and libe. Get gexec while you're at it... Dave!! You beat me to mention pcp! :-) I really wished somebody would pick up the
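One way to avoid N-1 sequential copies from a single server is a tree-based fan-out: every node that already holds the file forwards it to one node that does not, so the number of holders doubles each round. The sketch below is a hypothetical Python illustration of that idea, not a description of how pcp itself works. `fanout_schedule` and the host names are made up for the example, and node 0 is assumed to hold the file at the start.

```python
import math


def fanout_schedule(nodes: list[str]) -> list[list[tuple[str, str]]]:
    """Return copy rounds for a tree-based one-to-many transfer.

    Hypothetical helper: nodes[0] is assumed to hold the file initially.
    In each round every current holder sends to one node still waiting,
    so the number of holders doubles per round.
    """
    have = [nodes[0]]
    pending = list(nodes[1:])
    rounds = []
    while pending:
        step = []
        for src in list(have):  # only nodes that held the file at round start
            if not pending:
                break
            dst = pending.pop(0)
            step.append((src, dst))
        have.extend(dst for _, dst in step)
        rounds.append(step)
    return rounds


if __name__ == "__main__":
    hosts = [f"node{i:02d}" for i in range(10)]
    plan = fanout_schedule(hosts)
    for r, step in enumerate(plan, 1):
        print(f"round {r}: " + ", ".join(f"{s}->{d}" for s, d in step))
    # Doubling per round means ceil(log2 N) rounds in total.
    assert len(plan) == math.ceil(math.log2(len(hosts)))
```

For 10 nodes this takes 4 rounds instead of 9 back-to-back copies from one source. The trade-off is that every node must be able to reach its peers and have spare bandwidth while it forwards.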

Re: [Beowulf] Can one Infiniband net support MPI and a parallel file system?

2008-08-13 Thread Robert Latham
On Wed, Aug 06, 2008 at 01:31:09PM -0500, Jason Clinton wrote: > Generally speaking, MPI programs will not be fetching/writing data > from/to storage at the same time they are doing MPI calls so there > tends to not be very much contention to worry about at the node level. Well... if the MPI progr

Re: [Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-13 Thread Perry E. Metzger
Loic Tortay <[EMAIL PROTECTED]> writes: > Perry E. Metzger wrote: > [...] >> >> Maybe some sort of strange myth has been going by so long on this that >> people refuse to believe that the ticket refresh is a single easy >> command? >> > The "myth" is the ability to automatically get a Kerberos t

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-13 Thread Joshua Baker-LePain
On Wed, 13 Aug 2008 at 1:00pm, Gus Correa wrote: > After I banned the use of Matlab (to the dismay and revolt of many users) I am *so* jealous... -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-13 Thread Perry E. Metzger
Dave Love <[EMAIL PROTECTED]> writes: > "Perry E. Metzger" <[EMAIL PROTECTED]> writes: >> I keep seeing these messages go by over and over making it sound like >> this is difficult. It is not difficult. I've seen people say "I have >> seen no document with a recipe for how to do it", perhaps becau

Re: [Beowulf] Re: Kerberos + HPC

2008-08-13 Thread Perry E. Metzger
Dave Love <[EMAIL PROTECTED]> writes: > "Perry E. Metzger" <[EMAIL PROTECTED]> writes: > >> So, you just run kinit in cron as the specified daemon user with the >> appropriate flags and it will renew its own tickets and all is well. > > Who says you can even run kinit from cron if it was appropria
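To see when cron-driven renewal is enough, here is a small Python model that assumes MIT-style renewable tickets. Each `kinit -R` run pushes the expiry to min(now + lifetime, renew-till), but it only works while the current ticket is still valid, and nothing extends past the renew-till limit set by KDC policy. `lapse_time` and all the numbers are hypothetical. A daemon holding a keytab (`kinit -k -t`) can instead request a brand-new ticket, which this model does not cover.

```python
from typing import Optional


def lapse_time(lifetime_h: float, renew_till_h: float,
               cron_every_h: float, horizon_h: float) -> Optional[float]:
    """Hour at which the ticket lapses, or None if it lasts to horizon_h.

    Hypothetical model: the ticket is issued at hour 0 and a cron job
    renews it every cron_every_h hours, starting at hour cron_every_h.
    """
    if cron_every_h <= 0:
        raise ValueError("cron interval must be positive")
    expires = min(lifetime_h, renew_till_h)
    t = 0.0
    while expires < horizon_h:
        t += cron_every_h
        if t >= expires:
            # The renewal ran after expiry (or renew-till was reached):
            # an expired ticket cannot be renewed.
            return expires
        expires = min(t + lifetime_h, renew_till_h)
    return None


if __name__ == "__main__":
    # 10 h tickets, 7-day renew-till, renewal every 8 h, 90-day job.
    print(lapse_time(10, 7 * 24, 8, 90 * 24))  # prints 168
```

With these made-up settings the ticket lapses at hour 168 even though cron keeps running, because renewal cannot go past renew-till. Whether cron alone suffices therefore depends on site policy for the renewable lifetime, which is the point in dispute in this thread.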

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-13 Thread Gus Correa
Hi resource management concerned experts and list I started the thread, before it gained a life of its own and its current incarnation. So, please let me add my two cents to this interesting discussion. We do share nodes on our cluster. After all, we have only 32 nodes, 64 dual processors, sin

[Beowulf] Infinipath memory parity errors

2008-08-13 Thread Dave Love
[I know in an ideal world the vendor between us and PathScale^WQlogic would sort this out.] I'm interested in the cause (and possible cure!) of intermittent errors on various nodes in our Infinipath system which stop MPI jobs with kernel messages like this, in case anyone's familiar with them:

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-13 Thread Craig Tierney
Joe Landman wrote: Craig Tierney wrote: Chris Samuel wrote: - "I Kozin (Igor)" <[EMAIL PROTECTED]> wrote: Generally speaking, MPI programs will not be fetching/writing data from/to storage at the same time they are doing MPI calls so there tends to not be very much contention to worry abo

[Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-13 Thread Dave Love
"Perry E. Metzger" <[EMAIL PROTECTED]> writes: > I keep seeing these messages go by over and over making it sound like > this is difficult. It is not difficult. I've seen people say "I have > seen no document with a recipe for how to do it", perhaps because a > single kinit command in a cron job i

Re: [Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-13 Thread Loic Tortay
Perry E. Metzger wrote: [...] > > Maybe some sort of strange myth has been going by so long on this that > people refuse to believe that the ticket refresh is a single easy > command? > The "myth" is the ability to automatically get a Kerberos ticket on any node in a cluster *especially* for the

[Beowulf] Re: Kerberos + HPC

2008-08-13 Thread Dave Love
"Perry E. Metzger" <[EMAIL PROTECTED]> writes: > So, you just run kinit in cron as the specified daemon user with the > appropriate flags and it will renew its own tickets and all is well. Who says you can even run kinit from cron if it was appropriate? > I'm not sure why people think this is al

[Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-13 Thread Dave Love
Chris Samuel <[EMAIL PROTECTED]> writes: > I was the OP. ;-) [`Post', not `poster'!] >> Why do you need to re-authenticate, > > If I create a 3 month long Kerberos ticket, and my PBS > job will run for 3 months but ends up waiting in the > queue for 2 weeks before it can start due to demand > t
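The arithmetic behind this objection is that the ticket clock starts when the ticket is issued, while the job clock only starts when the scheduler runs the job. Below is a minimal Python sketch. It treats "3 months" as 90 days (an assumption) and uses a hypothetical helper, `ticket_shortfall`, for a ticket that is never renewed.

```python
from datetime import datetime, timedelta


def ticket_shortfall(issued: datetime, lifetime: timedelta,
                     queue_wait: timedelta, runtime: timedelta) -> timedelta:
    """How long the job would still be running after the ticket expires.

    Assumes the ticket is created at submission time (issued) and is
    never renewed; returns timedelta(0) if the job finishes in time.
    """
    ticket_end = issued + lifetime
    job_end = issued + queue_wait + runtime
    return max(job_end - ticket_end, timedelta(0))


if __name__ == "__main__":
    submit = datetime(2008, 8, 13)
    gap = ticket_shortfall(submit, timedelta(days=90),
                           timedelta(days=14), timedelta(days=90))
    print(gap.days)  # 14: the last two weeks of the run have no valid ticket
```

The required lifetime is queue wait plus runtime. Because the queue wait is unknown at submission time, a fixed-length ticket cannot be sized correctly in advance.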

Re: [Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-13 Thread Prentice Bisbal
Perry E. Metzger wrote: > Dave Love <[EMAIL PROTECTED]> writes: >>> We'd prefer to steer clear of Kerberos, it introduces >>> arbitrary job limitations through ticket lives that >>> are not tolerable for HPC work. > > Which of course isn't true. If Wall Street firms, which really cannot > afford

Re: [Beowulf] Re: Linux cluster authenticating against multiple Active Directory domains

2008-08-13 Thread Perry E. Metzger
Dave Love <[EMAIL PROTECTED]> writes: >> We'd prefer to steer clear of Kerberos, it introduces >> arbitrary job limitations through ticket lives that >> are not tolerable for HPC work. Which of course isn't true. If Wall Street firms, which really cannot afford to have their trading systems go do

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-13 Thread Joe Landman
Craig Tierney wrote: Chris Samuel wrote: - "I Kozin (Igor)" <[EMAIL PROTECTED]> wrote: Generally speaking, MPI programs will not be fetching/writing data from/to storage at the same time they are doing MPI calls so there tends to not be very much contention to worry about at the node level

Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

2008-08-13 Thread Ashley Pittman
On Tue, 2008-08-12 at 12:09 -0600, Craig Tierney wrote: > Chris Samuel wrote: > > - "I Kozin (Igor)" <[EMAIL PROTECTED]> wrote: > > But that assumes you're not sharing a node with other > > jobs that may well be doing I/O. > > > I am wondering, who shares nodes in cluster systems with > MPI c