Re: [Beowulf] Here we go again

2019-12-13 Thread Bill Broadley
On 12/12/19 6:35 AM, Douglas Eadline wrote: Anyone see anything like this with Epyc, i.e. poor AMD performance when using Intel compilers or MKL? https://www.pugetsystems.com/labs/hpc/AMD-Ryzen-3900X-vs-Intel-Xeon-2175W-Python-numpy---MKL-vs-OpenBLAS-1560/ I was getting anomalously slow perfor

Re: [Beowulf] Containers in HPC

2019-05-24 Thread Bill Broadley
> A downside of containers is MUCH less visibility from the host OS. Sorry, I meant to say a downside of *virtual machines* is MUCH less visibility from the host OS.

Re: [Beowulf] Containers in HPC

2019-05-24 Thread Bill Broadley
On 5/23/19 5:35 AM, Jonathan Aquilina wrote: > Thanks for the great explanation and clarification. Another question that stems from the below: what mechanisms exist in terms of security for the containers to be as secure as a VM? As usual with security it's complicated. Both VPSs and containers hav

Re: [Beowulf] Containers in HPC

2019-05-23 Thread Bill Broadley
On 5/23/19 3:49 AM, Jonathan Aquilina wrote: > Hi Guys, > > Can someone clarify for me are containers another form of virtualized > systems? > Or are they isolated environments running on bare metal? Generally virtual machines run their own kernel. Typically CPU overhead is close to zero,

Re: [Beowulf] Introduction and question

2019-02-28 Thread Bill Broadley
Yes you belong! Welcome to the list. There's many different ways to run a cluster. But my recommendations: * Making the clusters as identical as possible. * setup ansible roles for head node, NAS, and compute node * avoid installing/fixing things with vi/apt-get/dpkg/yum/dnf, use ansible w

[Beowulf] AMD Epyc + Omni-Path?

2018-03-21 Thread Bill Broadley
Anyone else running AMD Epyc (or any other non-Intel CPU) and Omni-Path? I have some AMD Epyc 7451 nodes working, but I went to buy more only to hear that it's not a configuration that the vendor or Intel will support. I've never needed support from Mellanox or Pathscale/Qlogic/Intel for previ

Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-08 Thread Bill Broadley
Last time I saw this problem was because the chassis was missing the air redirection guides, and not enough air was getting to the CPUs. The OS upgrade might actually be enabling better throttling to keep the CPU cooler.

Re: [Beowulf] cluster deployment and config management

2017-09-06 Thread Bill Broadley
On 09/05/2017 07:14 PM, Stu Midgley wrote: > I'm not feeling much love for puppet. I'm pretty fond of puppet for managing clusters. We use cobbler to go from PXE boot -> installed, then puppet takes over. Some of my favorite features: * Inheritance is handy node -> node for a particular cluster

Re: [Beowulf] cold spare storage?

2017-08-17 Thread Bill Broadley via Beowulf
On 08/17/2017 11:10 AM, Alex Chekholko wrote: > The Google paper from a few years ago showed essentially no correlations > between > the things you ask about and failure rates. So... do whatever is most > convenient for you. Backblaze also has a pretty large data set, granted not as big as googl

Re: [Beowulf] How to debug slow compute node?

2017-08-16 Thread Bill Broadley via Beowulf
On 08/10/2017 07:39 AM, Faraz Hussain wrote: > One of our compute nodes runs ~30% slower than others. It has the exact same > image so I am baffled why it is running slow . I have tested OMP and MPI > benchmarks. Everything runs slower. The cpu usage goes to 2000%, so all looks > normal there. We

Re: [Beowulf] Register article on Epyc

2017-07-04 Thread Bill Broadley
On 07/02/2017 05:43 AM, jaquilina wrote: > What is everyone's thoughts on Intel new i9 cpus as these boast significant > jump > in core count Relabeled Xeons, just like all the previous generations. Same socket and same number of memory channels, it's just marketing. The 8 core variety has been

Re: [Beowulf] Register article on Epyc

2017-06-22 Thread Bill Broadley
On 06/22/2017 08:21 PM, Kilian Cavalotti wrote: > Oh, and at least the higher core-count SKUs like the 32-core 7251 are > actually 4 8-core dies linked together with a new "Infinity Fabric" > interconnect, not a single 32-core die. I completely missed that. And > it's fine, it probably makes sense

Re: [Beowulf] Register article on Epyc (Brian Dobbins)

2017-06-22 Thread Bill Broadley
On 06/22/2017 04:41 PM, mathog wrote: > On 22-Jun-2017 15:05, Greg Lindahl wrote: >> I don't think it hurt AMD that much in the end. > > I disagree. It's hard to say. I agree that AMD very slowly managed to claw some small market share from intel with the Opteron. I believe it was on the order

Re: [Beowulf] Register article on Epyc

2017-06-21 Thread Bill Broadley
On 06/21/2017 05:29 PM, Christopher Samuel wrote: > On 21/06/17 22:39, John Hearns wrote: > >> I would speculate about single socket AMD systems, with a smaller form >> factor motherboard, maybe with onboard Infiniband. Put a lot of these >> cards in a chassis and boot them disklessly and you get

Re: [Beowulf] more automatic building

2016-10-06 Thread Bill Broadley
On 10/02/2016 06:11 PM, Christopher Samuel wrote: > On 30/09/16 23:43, Mikhail Kuzminsky wrote: > >> Are there, by your opinions, some clear OpenHPC minuses ? > > Last I heard their Open-MPI builds don't include Slurm support for > perceived licensing issues (odd to me, but that's lawyers for you),

Re: [Beowulf] more automatic building

2016-09-29 Thread Bill Broadley
On 09/28/2016 07:34 AM, Mikhail Kuzminsky wrote: > I worked always w/very small HPC clusters and built them manually > (each server). Manual installs aren't too bad up to 4 nodes or so. > But what is reasonable to do for clusters containing some tens or > hundred of nodes ? We use cobbler for D

[Beowulf] NFS HPC survey results.

2016-07-20 Thread Bill Broadley
Many thanks for all the responses. Here's the promised raw data: https://wiki.cse.ucdavis.edu/_media/wiki:linux-hpc-nfs-survey.csv I'll summarize the 26 results below. I'll email similar to those that asked. Not everyone answered all questions. 1) cluster OS: 72% Redhat/CentOS/Scientifi

[Beowulf] NFS HPC survey

2016-07-14 Thread Bill Broadley
We use NFS pretty heavily on a mix of a dozen or so small/medium clusters. I was curious how other people have NFS configured for their clusters. I made this survey to collect related information: http://goo.gl/forms/AuXCNR10WhJNgtDw1 It doesn't require a google login, none of the question

Re: [Beowulf] China aims for 100 PF

2016-06-21 Thread Bill Broadley
On 06/21/2016 05:14 AM, Remy Dernat wrote: Hi, 100 PF is really not far from reality right now: http://www.top500.org/news/new-chinese-supercomputer-named-worlds-fastest-system-on-latest-top500-list/ I was curious about the CPU/architecture and I found: http://www.netlib.org/utk/people/Jack

Re: [Beowulf] memory bandwidth scaling

2015-10-05 Thread Bill Broadley
On 10/01/2015 09:27 AM, Orion Poplawski wrote: > We may be looking a getting a couple new compute nodes. I'm leery though of > going too high in processor core counts. Does anyone have any general > experiences with performance scaling up to 12 cores per processor with general > models like CM1/W

[Beowulf] NFS + IB?

2015-02-20 Thread Bill Broadley
I read through the beowulf archives for mentions of NFS + IB. I found nothing newer than 2012. What are peoples current experience with NFS + IB? I'm looking at the options for smaller clusters with /home on NFS. I'll leave distributed filesystems for a separate discussion. The two leading opt

[Beowulf] Open source and the Draft Report of the Task Force on High Performance Computing

2014-08-27 Thread Bill Broadley
The URL: http://energy.gov/seab/downloads/draft-report-task-force-high-performance-computing One piece I found particularly interesting: There has been very little open source that has made its way into broad use within the HPC commercial community where great emphasis is placed on servic

[Beowulf] Nvidia K1 Denver

2014-08-12 Thread Bill Broadley
I was surprised to find the Nvidia K1 to be a significant departure from the ARM Cortex a53 and a57 cores. Summary at: http://blogs.nvidia.com/blog/2014/08/11/tegra-k1-denver-64-bit-for-android/ Details at (if you are willing to share your email address): http://www.tiriasresearch.com/downloads/

[Beowulf] Power8

2014-04-29 Thread Bill Broadley
Sounds like a potentially interesting CPU/platform for HPC. Of particular interest: 1) similar quad socket performance to intel's best 2) embracing 3rd parties access to cc memory 3) up to 8 off chip memory controllers with cache (centaur chip) 4) allowing 3rd party motherboards 5) IBM exploring

[Beowulf] Nvidia and IBM create GPU interconnect for faster supercomputing

2014-03-25 Thread Bill Broadley
Sounds like a memory coherent 80GB/sec link: http://arstechnica.com/information-technology/2014/03/nvidia-and-ibm-create-gpu-interconnect-for-faster-supercomputing/ They mention GPU<->GPU links, but don't quite mention system <-> system links.

Re: [Beowulf] Mutiple IB networks in one cluster

2014-02-04 Thread Bill Broadley
On 02/01/2014 08:17 AM, atchley tds.net wrote: > The cross-bar switch only guarantees non-blocking if the two ports are on > the same line card (i.e. using the same crossbar). Once you start > traversing multiple crossbars, you are sharing links and can experience > congestion. Full backplane mean

Re: [Beowulf] Admin action request

2013-11-22 Thread Bill Broadley
>Option 3: Enforce some of our basic etiquette. If you aren't willing > to abide by the house rules, you won't be allowed into the house to > violate the rules. In this case, I see more than two strikes, so I am > not all that inclined to be terribly forgiving of these breaches. I like #3

Re: [Beowulf] SSD caching for parallel filesystems

2013-02-09 Thread Bill Broadley
On 02/09/2013 01:22 PM, Vincent Diepeveen wrote: > SATA is very bad protocol for SSD's. > > SSD's allows perfectly parallel stores and writes, SATA doesn't. > So SATA really limits the SSD's true performance. SSDs and controllers often support NCQ which allows multiple outstanding requests. Not

Re: [Beowulf] AMD Roadrunner open compute motherboard

2013-01-16 Thread Bill Broadley
On 01/16/2013 10:20 AM, Hearns, John wrote: > http://www.theregister.co.uk/2013/01/16/amd_roadrunner_open_compute_motherboard/ The pictured 1U has what looks like 12 15k RPM fans (not including the power supplies). Or 6 double fans if you prefer. In my experience those fans burn an impressive am

Re: [Beowulf] AMD Roadrunner open compute motherboard

2013-01-16 Thread Bill Broadley
On 01/16/2013 11:27 AM, Vincent Diepeveen wrote: > The thing looks shitty. Just 2 sockets. At 2 sockets AMD is junk. Heh, at least at running chess programs that's of interest to approximately 0.00% of the market. > At > 4 sockets it would be interesting though - yet that's not shown. Dunno, s

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-12 Thread Bill Broadley
On 01/12/2013 07:29 AM, Vincent Diepeveen wrote: > Yes i was the inventor of that test to jump using a RNG randomly. > Paul Hsieh then modified it from calling the RNG and correcting for > the RNG, to the direct pointer math as you show here. Oh come now Vincent, inventor is a very strong word fo
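For anyone skimming the archive: the test being argued over is a random pointer chase, which measures memory latency by making every load depend on the previous one. A minimal sketch of the idea, not the code from the thread; the array size and step count are arbitrary:

/* Minimal pointer-chasing latency sketch (illustrative only, not the code
 * discussed in this thread).  Build one random cycle through a large array,
 * then time dependent loads: each address depends on the previous result,
 * so the loop exposes memory latency rather than bandwidth. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (64 * 1024 * 1024 / sizeof(size_t))  /* ~64 MB, well past cache */
#define STEPS (1 << 24)

int main(void) {
    size_t *next = malloc(N * sizeof *next);
    size_t *perm = malloc(N * sizeof *perm);
    size_t i, j, p = 0;

    for (i = 0; i < N; i++) perm[i] = i;            /* Fisher-Yates shuffle */
    srand(42);
    for (i = N - 1; i > 0; i--) {
        j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (i = 0; i < N; i++)                         /* one cycle over all slots */
        next[perm[i]] = perm[(i + 1) % N];
    free(perm);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < STEPS; i++)
        p = next[p];                                /* serialized, dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load-to-load latency: %.1f ns (p=%zu)\n", ns / STEPS, p);
    free(next);
    return 0;
}

Calling the RNG inside the timed loop (the approach being discussed) adds overhead you then have to correct for; precomputing the cycle as above keeps the timed loop down to a single dependent load.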

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-12 Thread Bill Broadley
On 01/12/2013 04:25 PM, Stu Midgley wrote: > Until the Phi's came along, we were purchasing 1RU, 4 sockets nodes > with 6276's and 256GB ram. On all our codes, we found the throughput > to be greater than any equivalent density Sandy bridge systems > (usually 2 x dual socket in 1RU) at about 10-15

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-12 Thread Bill Broadley
On 01/11/2013 05:22 AM, Vincent Diepeveen wrote: >> Bill - a 2 socket system doesn't deliver 512GB ram. > On 01/11/2013 05:59 AM, Reuti wrote: > Maybe I get it wrong, but I was checking these machines recently: > > IBM's x3550 M4 goes up to 768 GB with 2 CPUs > http://public.dhe.ibm.com/common

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-11 Thread Bill Broadley
On 01/11/2013 04:01 AM, Joshua mora acosta wrote: > Hi Bill, > AMD should pay you for these wise comments ;) > > But since this list is about providing feedback, and sharing knowledge, I > would like to add something to your comments, and somewhat HW agnostic. When > you are running stream bench

[Beowulf] AMD performance (was 500GB systems)

2013-01-10 Thread Bill Broadley
Over the last few months I've been hearing quite a few negative comments about AMD. Seems like most of them are extrapolating from desktop performance. Keep in mind that it's quite a stretch going from a desktop (single socket, 2 memory channels) to a server (dual socket, 4x the cores, 8 memory

Re: [Beowulf] Configuration management tools/strategy

2013-01-09 Thread Bill Broadley
On 01/06/2013 05:38 AM, Walid wrote: > Dear All, > > At work we are starting to evaluate Configuration management to be used > to manage several diverse hpc clusters We are currently managing 15 clusters with puppet and I am very pleased with it. Puppet is one of the critical pieces that allows us

[Beowulf] Intel 82574L problems with newer kernels?

2012-12-11 Thread Bill Broadley
Anyone have some working tweaks to get an Intel E1000e driver + 82574L chip to behave with linux 3.5 or 3.7 kernels? Not sure if this is a problem for all 82574Ls or just ones on recent supermicro motherboards. I noticed stuttering, occasional high latencies, and a continuously increasing droppe

Re: [Beowulf] ARM cpu's and development boards and research

2012-11-27 Thread Bill Broadley
On 11/27/2012 07:46 AM, Vincent Diepeveen wrote: > i dug around in price of ARMs and development boards. > > If you just buy a handful most interesting offer seems to be > > http://www.hardkernel.com/renewal_2011/products/prdt_info.php? > g_code=G133999328931 > > it's $129 and has a quad core A

Re: [Beowulf] A petabyte of objects

2012-11-13 Thread Bill Broadley
If you need an object store and not a file system I'd consider hadoop.

Re: [Beowulf] let's standardize liquid cooling

2012-09-28 Thread Bill Broadley
Sounds expensive, complicated, and challenging. How about a MUCH simpler proposal: eliminate fans from compute nodes. Nodes should: * assume good front to back airflow Racks would: * have large fans front AND back that run at relatively low rpm and are relatively quiet. * If front or rear door o

Re: [Beowulf] FY;) GROMACS on the Raspberry Pi

2012-09-19 Thread Bill Broadley
I taught a MPI class a few times and wanted something simple, fun, and could be improved upon several times as the students learned MPI. It's obviously embarrassingly parallel, but non-trivial to do well. There's often not enough work per pixel or per image to make the communications overhead lo

Re: [Beowulf] cluster building advice?

2012-09-17 Thread Bill Broadley
On 09/16/2012 02:52 PM, Jeffrey Rossiter wrote: > The intention is for the system to be > used for scientific computation. That doesn't narrow it down much. > I am trying to decide on a linux > distribution to use. I suggest doing it yourself based on whatever popular linux distro you have experi

Re: [Beowulf] Status of beowulf.org?

2012-06-15 Thread Bill Broadley
On 06/15/2012 12:25 PM, Jan Wender wrote: > Hi all, > > Arend from Penguin replied and they are looking for the list. They would > like to continue hosting the list, but would ask for some volunteers to > administrate it. Well if they are doing such a poor job and aren't willing to administrate i

Re: [Beowulf] Torrents for HPC

2012-06-13 Thread Bill Broadley
On 06/13/2012 06:40 AM, Bernd Schubert wrote: > What about an easy to setup cluster file system such as FhGFS? Great suggestion. I'm all for a generally useful parallel file systems instead of torrent solution with a very narrow use case. > As one of > its developers I'm a bit biased of course,

Re: [Beowulf] Torrents for HPC

2012-06-12 Thread Bill Broadley
On 06/12/2012 03:47 PM, Skylar Thompson wrote: > We manage this by having users run this in the same Grid Engine > parallel environment they run their job in. This means they're > guaranteed to run the sync job on the same nodes their actual job runs > on. The copied files change so slowly that eve

Re: [Beowulf] Torrents for HPC

2012-06-12 Thread Bill Broadley
Many thanks for the online and offline feedback. I've been reviewing the mentioned alternatives. From what I can tell none of them allow nodes to join/leave at random. Our problem is that a user might submit 500-50,000 jobs that depend on a particular dataset and have a variable number of job

[Beowulf] Torrents for HPC

2012-06-08 Thread Bill Broadley
I've built Myrinet, SDR, DDR, and QDR clusters (no FDR yet), but I still have users whose use cases and budgets only justify GigE. I've set up a 160TB hadoop cluster that is working well, but haven't found justification for the complexity/cost related to lustre. I have high hopes for Ceph, b

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-27 Thread Bill Broadley
On 01/27/2012 02:25 PM, Gilad Shainer wrote: > So I wonder why multiple OEMs decided to use Mellanox for on-board > solutions and no one used the QLogic silicon... That's a strange argument. What does Intel want? Something to make them more money. In the past that's been integrating functional

[Beowulf] HP redstone servers

2011-11-01 Thread Bill Broadley
The best summary I've found: http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_servers/ Specifications at for the ECX-1000: http://www.calxeda.com/products/energycore/ecx1000/techspecs And EnergyCard: http://www.calxeda.com/products/energycards/techspecs The only hint on price that I f

Re: [Beowulf] materials for air shroud?

2011-08-31 Thread Bill Broadley
On 08/31/2011 12:15 PM, David Mathog wrote: > That never crossed my mind. > > You sure about the flammability? I believe it for the ignition due to > temperature (Fahrenheit 451 and all that). However, I have a gut > feeling (but no data) that sparks are fairly likely to ignite cardboard, > and

Re: [Beowulf] dollars-per-teraflop : any lists like the Top500?

2010-06-30 Thread Bill Broadley
On 06/29/2010 10:50 PM, Greg Lindahl wrote: On Wed, Jun 30, 2010 at 12:30:12AM -0500, Rahul Nabar wrote: The Top500 list has many useful metrics but I didn't see any $$ based metrics there. Other communities with $$-based metrics haven't had much success with them. In HPC, many contracts are

Re: [Beowulf] AMD 6100 vs Intel 5600

2010-04-01 Thread Bill Broadley
On 04/01/2010 12:59 AM, Peter Kjellstrom wrote: I'm not convinced, is the number of cores more important than agg. performance and price? Also, if you turn on SMT/HT on a 6-core westmere it may appear very similar to a 12-core Magnycour (performance, appearance, price, ...). I'd be interested i

Re: [Beowulf] AMD 6100 vs Intel 5600

2010-04-01 Thread Bill Broadley
On 04/01/2010 05:34 AM, Jerker Nyberg wrote: On Thu, 1 Apr 2010, Peter Kjellstrom wrote: My experience is that in HPC it always boils down to price/performance and that would in my eyes make apples out of Magnycour and Westmere. I just ordered two desktop systems with Intel i7-860 2.8 GHz QC

Re: [Beowulf] AMD 6100 vs Intel 5600

2010-03-31 Thread Bill Broadley
On 03/31/2010 10:37 AM, Kilian CAVALOTTI wrote: > On Wed, Mar 31, 2010 at 5:27 PM, Orion Poplawski wrote: >> Looks like it's time to start evaluating the AMD 6100 (magny-cours) >> offerings versus the Intel 5600 (Nehalem-EX?) offerings. Any suggestions >> for resources? > > Just for the sake of

Re: [Beowulf] cpufreq, multiple cores, load

2010-03-09 Thread Bill Broadley
David Mathog wrote: > Starting a second cpuburn apparently schedules it > on one of the cores on the unused second processor, rather than > on the equally unused, but already sped up, second core on the first > CPU. Since that gives the most additional performance that seems a reasonable default.
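If you want to choose the placement yourself rather than trust the scheduler, "taskset -c 1 cpuburn" does it from the shell; below is a hedged C sketch of the same thing via sched_setaffinity() on Linux, with the core number picked arbitrarily:

/* Sketch: pin the calling process to one logical CPU so a second
 * cpuburn-style load lands where you want it (e.g. the second core of the
 * already-clocked-up first package).  CPU 1 below is just an example. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);                               /* pick the core you want */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    for (volatile unsigned long i = 0; i < 4000000000UL; i++)
        ;                                           /* burn cycles on that core */
    return 0;
}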

Re: [Beowulf] copying data between clusters

2010-03-05 Thread Bill Broadley
Grid-ftp? http://www.globus.org/toolkit/docs/3.2/gridftp/key/index.html

Re: [Beowulf] hardware RAID versus mdadm versus LVM-striping

2010-01-17 Thread Bill Broadley
Rahul Nabar wrote: > If I have a option between doing Hardware RAID versus having software > raid via mdadm is there a clear winner in terms of performance? No. > Or is > the answer only resolvable by actual testing? I have a fairly fast > machine (Nehalem 2.26 GHz 8 cores) and 48 gigs of RAM. >

Re: [Beowulf] New member, upgrading our existing Beowulf cluster

2009-12-03 Thread Bill Broadley
Greg Lindahl wrote: > On Fri, Dec 04, 2009 at 12:57:07PM +1100, Chris Samuel wrote: > >> If you've got a job running on there for a month >> or two then there's a fairly high opportunity cost >> involved. > > That kind of policy has a fairly high opportunity cost, even before > you factor in link

Re: [Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK

2009-12-02 Thread Bill Broadley
Art Poon wrote: > I've tried resetting the SMC switch to factory defaults (with > auto-negotiate on). I've checked the /etc/beowulf/modprobe.conf and it > doesn't seem to be demanding anything exotic. We've tried swapping out to > another SMC switch but that didn't change anything. I had a very

Re: [Beowulf] Forwarded from a long time reader having trouble posting

2009-12-01 Thread Bill Broadley
Joe Landman wrote: > My apologies if this is bad form, I know Toon from his past > participation on this list, and he asked me to forward. > > Original Message Hi Toon, long time no type. > Dear all, > I've been working on hpux-itanium for the last 2 years (and even > unsubscri

Re: [Beowulf] How Would You Test Infiniband in New Cluster?

2009-11-17 Thread Bill Broadley
Jon Forrest wrote: > I had said "I believe these are with IB." > Now I'm not so sure. I just did a The performance numbers you showed from relay and mpi_nxnlatbw are definitely much faster than GigE. Unless it's multiple copies running on a single machine (thus printing the hostname). Assuming t

Re: [Beowulf] How Would You Test Infiniband in New Cluster?

2009-11-17 Thread Bill Broadley
Jon Forrest wrote: > Bill Broadley wrote: > >> My first suggest sanity test would be to test latency and bandwidth to >> insure >> you are getting IB numbers. So 80-100MB/sec and 30-60us for a small >> packet >> would imply GigE. 6-8 times the bandwidth ce
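A minimal ping-pong sketch of that sanity check (illustrative only; ranks 0 and 1 need to land on different nodes, and the message sizes and iteration counts are arbitrary):

/* Minimal MPI ping-pong sketch for the latency/bandwidth sanity check
 * described above.  Launch it so ranks 0 and 1 are on different nodes,
 * e.g. "mpirun -np 2 --map-by node ./pingpong" (launcher flags vary). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static void pingpong(int rank, int bytes, int iters) {
    char *buf = malloc(bytes);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;
    if (rank == 0)
        printf("%8d bytes: %8.2f us round trip, %8.2f MB/s one way\n",
               bytes, dt / iters * 1e6, 2.0 * bytes * iters / dt / 1e6);
    free(buf);
}

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    pingpong(rank, 8, 10000);       /* tiny message: tens of us suggests GigE, a few us suggests IB */
    pingpong(rank, 1 << 20, 200);   /* 1 MB message: ~100 MB/s suggests GigE, ~1 GB/s or more suggests IB */
    MPI_Finalize();
    return 0;
}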

Re: [Beowulf] How Would You Test Infiniband in New Cluster?

2009-11-17 Thread Bill Broadley
Jon Forrest wrote: > Let's say you have a brand new cluster with > brand new Infiniband hardware, and that > you've installed OFED 1.4 and the > appropriate drivers for your IB > HCAs (i.e. you see ib0 devices > on the frontend and all compute nodes). > The cluster appears to be working > fine but

Re: [Beowulf] One time passwords and two factor authentication for a HPC setup (might be offtopic? )

2009-10-12 Thread Bill Broadley
Rahul Nabar wrote: > [I apologize if this might be somewhat offtopic for HPC;it could be > termed a generic Linux logon problem but I couldn't find many leads in > my typical linux.misc group.] How to secure a valuable network resource like a cluster sounds on topic to me. > I've used RSA type ca

[Beowulf] DRAM error rates: Nightmare on DIMM street

2009-10-06 Thread Bill Broadley
Somewhat of a follow up of the rather large study of disk drive reliability that Google published a while back: http://blogs.zdnet.com/storage/?p=638 PDF on which the article is based: http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

Re: [Beowulf] Nvidia FERMI/gt300 GPU

2009-10-01 Thread Bill Broadley
Craig Tierney wrote: > Bill Broadley wrote: >> Impressive: >> * IEEE floating point, doubles 1/2 as fast as single precision (6 times or >> so faster than the gt200). >> * ECC > > The GDDR5 says it supports ECC, but what is the card going to do? > Is it ECC

[Beowulf] Nvidia FERMI/gt300 GPU

2009-10-01 Thread Bill Broadley
Impressive: * IEEE floating point, doubles 1/2 as fast as single precision (6 times or so faster than the gt200). * ECC * 512 cores (gt200 has 240) * 384 bit bus gddr5 (twice as fast per pin, gt200 has 512 bits) * 3 billion transistors * 64KB of L1 cache per SM, 768KB L2, cache coherent across th

Re: [Beowulf] XEON power variations

2009-09-16 Thread Bill Broadley
Tom Rockwell wrote: > Hi, > > Intel assigns the same power consumption to different clockspeeds of L, > E, X series XEON. All L series have the same rating, all E series etc. > So, taking their numbers, the fastest of each type will always have the > best performance per watt. Wrong, well they

Re: [Beowulf] Intra-cluster security

2009-09-13 Thread Bill Broadley
Stuart Barkley wrote: > - Each user Very dangerous way to say it. Ideally you do everything possible to minimize the work of the user, that way they can't get it wrong. > creates a password-less ssh private key, puts the public I'm a fan of password-less private keys. Before the screaming beg

Re: [Beowulf] how large of an installation have people used NFS with? would 300 mounts kill performance?

2009-09-09 Thread Bill Broadley
Mark Hahn wrote: >> Our new cluster aims to have around 300 compute nodes. I was wondering >> what is the largest setup people have tested NFS with? Any tips or > > well, 300 is no problem at all. though if you're talking to a single > Gb-connected server, you can't hope for much BW per node...

Re: [Beowulf] petabyte for $117k

2009-09-02 Thread Bill Broadley
Greg Lindahl wrote: > As for people's vibrations comments: they own a bunch of them and they > work... For now, I've seen similar setups last 6-12 months before a drive drops, then a rebuild triggers drop #2. > but that is only a single point of evidence and not a history > of working with a vari

Re: [Beowulf] petabyte for $117k

2009-09-02 Thread Bill Broadley
Eugen Leitl wrote: > On Tue, Sep 01, 2009 at 04:28:10PM -0700, Bill Broadley wrote: > >> I'm very curious to hear how they are in production. I've had vibration of > > My thoughts exactly. The lid screws down to apply pressure to a piece of foam. Foam presses down

Re: [Beowulf] petabyte for $117k

2009-09-01 Thread Bill Broadley
Greg Lindahl wrote: > http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/ > > Kinda neat -- how does the price compare to the various 48-drive > systems available? I'm very curious to hear how they are in production. I've had vibration of large sets of dr

Re: [Beowulf] bizarre scaling behavior on a Nehalem

2009-08-14 Thread Bill Broadley
Mikhail Kuzminsky wrote: > In message from Bill Broadley (Thu, 13 Aug 2009 > 17:09:24 -0700): > > Do I understand correctly that these results are for 4 cores & 4 openmp > threads? And what is DDR3 RAM: DDR3/1066? 4 cores and 8 openmp threads. 4 threads is slightly faster:

Re: [Beowulf] bizarre scaling behavior on a Nehalem

2009-08-13 Thread Bill Broadley
Tom Elken wrote: > To add some details to what Christian says, the HPC Challenge version of > STREAM uses dynamic arrays and is hard to optimize. I don't know what's > best with current compiler versions, but you could try some of these that > were used in past HPCC submissions with your program,

Re: [Beowulf] bizarre scaling behavior on a Nehalem

2009-08-12 Thread Bill Broadley
Rahul Nabar wrote: > On Tue, Aug 11, 2009 at 12:06 PM, Bill Broadley wrote: >> Looks to me like you fit in the barcelona 512KB L2 cache (and get good >> scaling) and do not fit in the nehalem 256KB L2 cache (and get poor scaling). > > Thanks Bill! I never realized that the L2

Re: [Beowulf] bizarre scaling behavior on a Nehalem

2009-08-12 Thread Bill Broadley
Gus Correa wrote: > Hi Bill, list > > Bill: This is very interesting indeed. Thanks for sharing! > > Bill's graph seem to show that Shanghai and Barcelona scale > (almost) linearly with the number of cores, whereas Nehalem stops > scaling and flattens out at 4 cores. Right. That's not really

Re: [Beowulf] bizarre scaling behavior on a Nehalem

2009-08-12 Thread Bill Broadley
I've been working on a pthread memory benchmark that is loosely modeled on McCalpin's stream. It's been quite a challenge to remove all the noise/lost performance from the benchmark to get close to performance I expected. Some of the obstacles: * For the compilers that tend to be better at stream
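For context, a heavily stripped-down sketch of the general shape of such a benchmark (per-thread slices of large arrays, timed triad). It deliberately skips the hard parts this post is actually about, such as NUMA first-touch placement, thread pinning, and compiler code generation; array size and thread count are arbitrary:

/* Stripped-down STREAM-triad-style pthread sketch.  Not the benchmark
 * described above; it only shows the basic structure and ignores NUMA
 * placement, pinning, and the compiler issues discussed in this thread. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N        (1L << 25)   /* 32M doubles per array, well past cache */
#define NTHREADS 4            /* arbitrary example value */

static double *a, *b, *c;

static void *triad(void *arg) {
    long id = (long)arg, chunk = N / NTHREADS;
    long lo = id * chunk, hi = (id == NTHREADS - 1) ? N : lo + chunk;
    for (long i = lo; i < hi; i++)
        c[i] = a[i] + 3.0 * b[i];
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    a = malloc(N * sizeof *a); b = malloc(N * sizeof *b); c = malloc(N * sizeof *c);
    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, triad, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* the triad touches 3 * N doubles: two reads and one write per element */
    printf("triad: %.2f GB/s (c[0]=%.1f)\n", 3.0 * N * sizeof(double) / secs / 1e9, c[0]);
    return 0;
}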

Re: [Beowulf] bizarre scaling behavior on a Nehalem

2009-08-11 Thread Bill Broadley
Rahul Nabar wrote: > Exactly! But I thought this was the big advance with the Nehalem that > it has removed the CPU<->Cache<->RAM bottleneck. Not sure I'd say removed, but they have made a huge improvement. To the point where a single socket intel is better than a dual socket barcelona. > So if

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Bill Broadley
Joshua Baker-LePain wrote: > Well, as there are only 8 "real" cores, running a computationally > intensive process across 16 should *definitely* do worse than across 8. I've seen many cases where that isn't true. The P4 rarely justified turning on HT because throughput would often be lower. Wit

Re: [Beowulf] Small form computers as cluster nodes - any comments about the Shuttle brand ?

2009-08-08 Thread Bill Broadley
Gerry Creager wrote: > I'd be trying to find ways to get 1u systems and, if 8 is the number, > you'll find they don't take up much room. Doubly so if you get one of the 2 nodes in 1U or 4 nodes in 2U.

Re: [Beowulf] Approach For Diagnosing Heat Related Failure?

2009-07-21 Thread Bill Broadley
I'd suggest doing a visual inspection. Make sure all fans are not blocked by cables, are spinning. If that looks normal pull the CPU heat sinks and make sure they have good coverage with the heat sink goo, but not so much that it leaks over the edge of the chip. When you put the heat sink back

Re: [Beowulf] Parallel Programming Question

2009-06-26 Thread Bill Broadley
amjad ali wrote: > Hello all, > > In an mpi parallel code which of the following two is a better way: > > 1) Read the input data from input data files only by the master process > and then broadcast it other processes. > > 2) All the processes read the input data directly from input da
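To make option (1) concrete, a hedged sketch of read-on-rank-0 followed by MPI_Bcast; the filename and problem size are placeholders and error handling is minimal:

/* Sketch of option (1): only rank 0 touches the filesystem, then the data
 * is broadcast to everyone else.  "input.dat" and NVALS are placeholders;
 * real code would broadcast the size (or a header) first and check errors. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NVALS 1000000   /* placeholder problem size */

int main(int argc, char **argv) {
    int rank;
    double *data = malloc(NVALS * sizeof *data);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                               /* one reader, not N of them */
        FILE *f = fopen("input.dat", "rb");
        if (!f || fread(data, sizeof *data, NVALS, f) != NVALS)
            MPI_Abort(MPI_COMM_WORLD, 1);
        fclose(f);
    }
    MPI_Bcast(data, NVALS, MPI_DOUBLE, 0, MPI_COMM_WORLD);  /* everyone gets a copy */

    /* ... compute with data ... */

    MPI_Finalize();
    free(data);
    return 0;
}

One reader plus a broadcast is usually much kinder to a shared filesystem than N processes opening the same file at once.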

Re: [Beowulf] crunch per kilowatt: GPU vs. CPU

2009-05-18 Thread Bill Broadley
Craig Tierney wrote: > Where did you get the 1/12th number for NVIDIA? For each streaming > multiprocessor (SM) > has 1 single precision FPU per thread (8 threads per SM), but only 1 double > precision FPU > on the SM. So that ratio would be 1/8. I just used the nvidia provided information: ht

Re: [Beowulf] crunch per kilowatt: GPU vs. CPU

2009-05-18 Thread Bill Broadley
Lux, James P wrote: > Going "off chip" (e.g. for a memory access) will increase energy > consumption because you have to charge and discharge the capacitance of the > PCB traces and > drive the input impedance of the memory. This can be surprisingly large. > > Example: a typical load impedance on
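For a rough sense of scale, with illustrative numbers that are not taken from the quoted message: switching a node capacitance C through a voltage swing V costs about half C V^2 per transition, so for example

% back-of-the-envelope estimate; the C, V, f values below are invented examples
\begin{aligned}
E_{\mathrm{transition}} &\approx \tfrac{1}{2} C V^2
   = \tfrac{1}{2}\,(10\,\mathrm{pF})\,(1.5\,\mathrm{V})^2 \approx 11\,\mathrm{pJ} \\
P_{\mathrm{dynamic}} &\approx \alpha\, C V^2 f
   = (0.5)\,(10\,\mathrm{pF})\,(1.5\,\mathrm{V})^2\,(800\,\mathrm{MHz}) \approx 9\,\mathrm{mW\ per\ pin}
\end{aligned}

Multiply that by the width of a memory bus and it is easy to see why going off chip costs far more energy per bit than staying in cache.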

Re: [Beowulf] crunch per kilowatt: GPU vs. CPU

2009-05-18 Thread Bill Broadley
Joe Landman wrote: > Hi David > > David Mathog wrote: >> Although the folks now using CUDA are likely most interested in crunch >> per unit time (time efficiency), perhaps some of you have measurements >> and can comment on the energy efficiency of GPU vs. CPU computing? That >> is, which uses th

Re: [Beowulf] Should I go for diskless or not?

2009-05-14 Thread Bill Broadley
Dr Cool Santa wrote: > I have a cluster of identical computers. We are planning to add more nodes > later. I was thinking whether I should go the diskless nodes way or not? > Diskless nodes seems as a really exciting, interesting and good option, > however when I did it I needed to troubleshoot a l

Re: [Beowulf] recommendations for cluster upgrades

2009-05-13 Thread Bill Broadley
Rahul Nabar wrote: > On Tue, May 12, 2009 at 7:05 PM, Greg Keller wrote: >> Nehalem is a huge step forward for Memory Bandwidth hogs. We have one code >> that is extremely memory bandwidth sensitive that i > > Thanks Greg. A somewhat naiive question I suspect. > > What's the best way to test th

Re: [Beowulf] recommendations for cluster upgrades

2009-05-13 Thread Bill Broadley
Gerry Creager wrote: > I'm going to take a little issue with Mark's first statement. I've been > bitten by Intel math bugs in the past (rerunning simulations for All CPUs have bugs. > verification of performance results in interestingly different answers). IMO that should happen with all new CP

Re: [Beowulf] Using commercial clouds for HPC

2009-05-07 Thread Bill Broadley
stephen mulcahy wrote: > Hi, > > I'm pretty sure this came up in some shape or form at some stage on this > list but after extensive Googling and Swishing(!?) of the list I can't > find anything concrete so apologies if I'm restarting an old thread. > > Has anyone done any investigation into usin

Re: [Beowulf] newbie

2009-05-02 Thread Bill Broadley
Chris Samuel wrote: > In the sense that they have no desire to support > competitors hardware, yes. Not really surprising, Sure, they could be nice enough to have a flag to disable the check for non-intel cpus. That way intel could avoid the cost of testing/certification of AMD cpus and folks tha

Re: [Beowulf] Surviving a double disk failure

2009-04-10 Thread Bill Broadley
Guy Coates wrote: > Yikes, epic recovery. > >> What are the lessons learnt? > > You forgot the obvious one. I suggest ditching silly old centos/redhat kernels and running something new enough to allow for scrubbing, so that all your disks don't silently start collecting errors waiting to cascade in

Re: [Beowulf] Rackable / SGI

2009-04-04 Thread Bill Broadley
Eugen Leitl wrote: > On Fri, Apr 03, 2009 at 01:32:13PM -0700, Greg Lindahl wrote: > >>> Will have to do with embedded memory or stacked 3d memory a la >>> http://www.cc.gatech.edu/~loh/Papers/isca2008-3Ddram.pdf >> We've been building bigger and bigger SMPs for a long time, making >> changes to i

Re: [Beowulf] Interesting google server design

2009-04-04 Thread Bill Broadley
Robert G. Brown wrote: > On Fri, 3 Apr 2009, Greg Lindahl wrote: > >> On Fri, Apr 03, 2009 at 09:14:37AM -0400, Robert G. Brown wrote: >> >>> b) The idea is to get the heat production OFF the motherboard. One >>> really interesting thing about the google design is that they hang the >>> stock O

Re: [Beowulf] X5500

2009-04-02 Thread Bill Broadley
Vincent Diepeveen wrote: > Bill, > > the ONLY price that matters is that of ECC ram when posting in a cluster > group. Agreed. All the numbers and URLs I mentioned were for ECC ram. > So in short i can completely ignore your posting. > > ECC is a requirement, not a luxury. Maybe you should re

Re: [Beowulf] X5500

2009-04-02 Thread Bill Broadley
Vincent Diepeveen wrote: > I wouldn't bet at registered-ecc DDR3 ram to become cheaper. > To be honest i misjudged that for DDR reg-ecc ram also, > it still is relative spoken expensive. I've heard this a dozen times, seems repeated quite often. Yet when I actually look I see it either so small a

Re: [Beowulf] Interesting google server design

2009-04-02 Thread Bill Broadley
Andrew Piskorski wrote: > On Wed, Apr 01, 2009 at 04:56:53PM -0700, Bill Broadley wrote: >> http://news.cnet.com/8301-1001_3-10209580-92.html > > According to that article, the pictured dual socket server has two > hard drives, but from the second photo, although the view is obs

[Beowulf] Interesting google server design

2009-04-01 Thread Bill Broadley
http://news.cnet.com/8301-1001_3-10209580-92.html

Re: [Beowulf] X5500

2009-03-31 Thread Bill Broadley
Mikhail Kuzminsky wrote: > In message from Kilian CAVALOTTI (Tue, > 31 Mar 2009 10:27:55 +0200): >> ... >> Any other numbers, people? > > I believe there is also a bit other important numbers - prices for Xeon > 55XX and system boards ;-) www.siliconmechanics.com has system pricing, I'm sure the

Re: [Beowulf] Re:running hot?

2009-03-19 Thread Bill Broadley
David Mathog wrote: > Mark Hahn wrote: > >> are you running your machinerooms warm to save power on cooling? > > How much would that really save? Is there a study somewhere > demonstrating substantial power savings? There's a data center going in the bay area and with a few concessions from ven
