Re: [Beowulf] Does anyone here mix CISC and RISC within their clusters.

2016-10-26 Thread Christopher Samuel
On 27/10/16 16:00, Darren Wise wrote: > Along with seven dual-socket, quad-core AMD x86-64 CISC nodes running > Ubuntu 16.04 LTS, MPICH and OpenMPI are giving me some strange errors but > as soon as I opt out the SUN box everything runs smoothly. You're not just mixing architectures, you're mixing D

Re: [Beowulf] Does anyone here mix CISC and RISC within their clusters.

2016-10-26 Thread Tim Cutts
Years ago our cluster was mixed architecture. At one point it contained four architectures: x86, x86_64, Itanium and Alpha. It worked fine for us, but our workload is largely embarrassingly parallel and we certainly never tried running MPI jobs across architectures. Tim -- The Wellcome Tru

Re: [Beowulf] Does anyone here mix CISC and RISC within their clusters.

2016-10-26 Thread C Bergström
Could this be related to some endian issue? I've never heard of this before and, quite frankly, have no idea how MPI should behave in this setup. Curious. What's the exact error, and have you tried asking the Open MPI or MPICH mailing lists? On Thu, Oct 27, 2016 at 1:00 PM, Darren Wise wrote:
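
For readers following the endianness question, here is a minimal sketch of how one might check a node's byte order before suspecting the MPI layer. It assumes Python is available on every node; the function name `describe_byte_order` is hypothetical and is not taken from the thread. It only shows how the same 32-bit integer is laid out natively versus in explicit big- and little-endian form.

```python
import struct
import sys


def describe_byte_order(value: int = 1) -> dict:
    """Return how a 32-bit unsigned integer is laid out on this host.

    Hypothetical helper for illustration only; not part of any MPI library.
    """
    return {
        # 'little' or 'big', as reported by the interpreter
        "byteorder": sys.byteorder,
        # native layout, standard size, no alignment padding
        "native": struct.pack("=I", value).hex(),
        # explicit big-endian layout
        "big": struct.pack(">I", value).hex(),
        # explicit little-endian layout
        "little": struct.pack("<I", value).hex(),
    }


if __name__ == "__main__":
    for key, val in describe_byte_order().items():
        print(f"{key}: {val}")
```

Running this on each node and comparing the `native` field would show whether the nodes disagree on byte order. If they do, raw binary buffers exchanged between them need conversion somewhere in the stack.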

Re: [Beowulf] non-stop computing

2016-10-26 Thread Prentice Bisbal
On 10/26/2016 10:22 AM, Joe Landman wrote: On 10/26/2016 10:20 AM, Prentice Bisbal wrote: How so? By only having a single seat or node-locked license? Either ... for licensed code this is a non-starter. Which is a shame that we still are talking about node locked/single seat in 2016. I

Re: [Beowulf] non-stop computing

2016-10-26 Thread Greg Keller
Date: Wed, 26 Oct 2016 09:52:13 -0400 From: Joe Landman To: beowulf@beowulf.org Subject: Re: [Beowulf] non-stop computing Message-ID: <918d4484-89be-0a02-5e86-6041ba31d...@scalableinformatics.com> Content-Type: text/plain; charset=windows-1252; format=flowed Licensing might impede

Re: [Beowulf] non-stop computing

2016-10-26 Thread Joe Landman
On 10/26/2016 10:20 AM, Prentice Bisbal wrote: How so? By only having a single seat or node-locked license? Either ... for licensed code this is a non-starter. Which is a shame that we still are talking about node locked/single seat in 2016. -- Joseph Landman, Ph.D Founder and CEO Scala

Re: [Beowulf] non-stop computing

2016-10-26 Thread Prentice Bisbal
How so? By only having a single seat or node-locked license? Prentice Bisbal Lead Software Engineer Princeton Plasma Physics Laboratory http://www.pppl.gov On 10/26/2016 09:52 AM, Joe Landman wrote: Licensing might impede this ... Usually does. On 10/26/2016 09:50 AM, Prentice Bisbal wrote:

Re: [Beowulf] non-stop computing

2016-10-26 Thread John Hearns
Well you CAN have RAM arranged in banks which mirror each other in a RAID-1 fashion. But heck, why not have THREE servers running the same problem - then two of them can vote out the other one, and start to mutter about it behind its back... -Original Message- From: Beowulf [mailto:beo
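
The three-server voting idea above can be sketched as a simple majority check over the results of redundant runs. This is an illustration under stated assumptions, not anything proposed in the thread: it assumes each run yields a comparable, hashable result (for example a checksum string of the output), and the name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the result reported by a strict majority of the runs.

    Raises ValueError when no result is shared by more than half of the runs.
    Hypothetical helper for illustration only.
    """
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no strict majority among the results")
    return value


if __name__ == "__main__":
    # Two runs agree, so the odd one out is outvoted.
    print(majority_vote(["abc123", "abc123", "ffff00"]))
```

With three runs, one faulty node is outvoted; with only two, a disagreement can be detected but not resolved.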

Re: [Beowulf] non-stop computing

2016-10-26 Thread Joe Landman
Licensing might impede this ... Usually does. On 10/26/2016 09:50 AM, Prentice Bisbal wrote: There is an amazing beauty in this simplicity. Prentice On 10/25/2016 02:46 PM, Gavin W. Burris wrote: Hi, Michael. What if the same job ran on two separate nodes, with IO to local scratch? What a

Re: [Beowulf] non-stop computing

2016-10-26 Thread Prentice Bisbal
I would be laughing if this wasn't so true. The sad thing is, the person who took on this convoluted, BS-heavy approach would probably get promoted for managing a "large, complicated project with many moving parts" while the guy who took Gavin's approach would continue to toil away in his base

Re: [Beowulf] non-stop computing

2016-10-26 Thread Prentice Bisbal
There is an amazing beauty in this simplicity. Prentice On 10/25/2016 02:46 PM, Gavin W. Burris wrote: Hi, Michael. What if the same job ran on two separate nodes, with IO to local scratch? What are the odds both nodes would fail in that three week period. No special hardware / software req
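
Gavin's question about the odds of both nodes failing can be put as a rough estimate. The sketch below assumes that node failures are independent and follow an exponential model with a known mean time between failures (MTBF). Neither assumption comes from the thread, and the MTBF figure in the example is a placeholder, not a measurement.

```python
import math


def p_node_fails(run_hours: float, mtbf_hours: float) -> float:
    """Probability that one node fails at least once during the run,
    assuming exponentially distributed failures (an assumption)."""
    return 1.0 - math.exp(-run_hours / mtbf_hours)


def p_both_fail(run_hours: float, mtbf_hours: float) -> float:
    """Probability that two independent nodes both fail during the run."""
    return p_node_fails(run_hours, mtbf_hours) ** 2


if __name__ == "__main__":
    run_hours = 3 * 7 * 24   # the three-week job from the thread
    mtbf_hours = 50_000.0    # placeholder MTBF, purely illustrative
    print(f"one node fails:  {p_node_fails(run_hours, mtbf_hours):.6f}")
    print(f"both nodes fail: {p_both_fail(run_hours, mtbf_hours):.6f}")
```

Under these assumptions the probability of losing both copies is the square of the single-node figure, which is why running the job twice is attractive when no special hardware is available. Correlated failures such as shared power or shared storage would make the real figure worse.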

Re: [Beowulf] non-stop computing

2016-10-26 Thread Justin Y. Shi
John's post is really funny! But I would only endorse Gavin's recommendation for it solves the problem statistically (and correctly). Justin On Wed, Oct 26, 2016 at 12:07 AM, Christopher Samuel wrote: > On 26/10/16 14:45, John Hanks wrote: > > > I'd suggest making NFS mounts hard, so processes