Re: [Beowulf] MPI over RoCE?

2025-03-08 Thread Greg Lindahl
On Thu, Feb 27, 2025 at 12:28:11PM -0500, Prentice Bisbal wrote: > I hope this isn't a dumb question: Do the Ethernet switches you're > looking at have crossbar switches inside them? No modern switch chip has a crossbar. They're rings, and the ring has high enough capacity to work as well as a cr

Re: [Beowulf] Your thoughts on the latest RHEL drama?

2023-06-26 Thread Greg Lindahl
On Mon, Jun 26, 2023 at 02:27:23PM -0400, Prentice Bisbal via Beowulf wrote: > By now, most of you should have heard about Red Hat's latest to > eliminate any competition to RHEL. I hadn't heard -- because I left back when "CentOS Stream" was announced. It seems that my astronomy colleagues are

Re: [Beowulf] odd vlan issue

2021-06-15 Thread Greg Lindahl
On Mon, Jun 14, 2021 at 12:38:50PM -0400, Michael Di Domenico wrote: > i got roped into troubleshooting an odd network issue. we have a mix > of cisco (mostly nexus) gear spread over our facility. on one > particular vlan it's operating as if it's a hub instead of switch. I have run into this si

[Beowulf] Ethernet switch OSes

2021-04-23 Thread Greg Lindahl
I'm buying a 100 gig ethernet switch for my lab, and it seems that the latest gear is intended to run a switch OS. Being as cheap as I've always been, free software sounds good. It looks like Open Network Linux is kaput. It looks like SONiC is doing pretty well. And there are several commercial

Re: [Beowulf] Is Crowd Computing the Next Big Thing?

2019-11-26 Thread Greg Lindahl
On Tue, Nov 26, 2019 at 08:07:32PM +, Chuck Petras wrote: > Are there any classes of problems that would be monitizeable in a grid > computing environment to make those efforts financially viable? Sure, currently there are 2 major areas: 1) Creating web proxies for web scraping from residen

Re: [Beowulf] Contents of Compute Nodes Images vs. Login Node Images

2018-10-23 Thread Greg Lindahl
On Tue, Oct 23, 2018 at 05:48:00PM +, Ryan Novosielski wrote: > We’re getting some complaints that there’s not enough stuff in the > compute node images, and that we should just boot compute nodes to > the login node image It's probably worth your while sitting down with your users and learni

Re: [Beowulf] ServerlessHPC

2018-07-24 Thread Greg Lindahl
We should all remember Don Becker's definition of "zero copy" -- it's when you make someone else do the copy and then pretend it was free. That was totally a foreshadowing of "serverless"! -- greg

Re: [Beowulf] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

2018-01-02 Thread Greg Lindahl
On Wed, Jan 03, 2018 at 02:46:07PM +1100, Christopher Samuel wrote: > There appears to be no microcode fix possible and the kernel fix will > incur a significant performance penalty, people are talking about in the > range of 5%-30% depending on the generation of the CPU. :-( The performance hit

Re: [Beowulf] Register article on Epyc (Brian Dobbins)

2017-06-22 Thread Greg Lindahl
On Thu, Jun 22, 2017 at 12:27:30PM -0700, mathog wrote: > Recall that when the Opterons first came out the major manufacturers > did not ship any systems with it for what, a year, maybe longer? I > vaguely recall SuperMicro going in quickly and Dell, HP, and IBM > whistling in a corner. Somethin

Re: [Beowulf] Not OT, but a quick link to an article on InsideHPC

2017-03-23 Thread Greg Lindahl
Ouch, sorry to hear about both of these. HPC has always been a hard business for small companies. I think I spent about 2 years of full-time work on EKOPath. Joe, I've always been impressed that you got so far as a relatively small company trying to build differentiated systems. -- greg On Fri, M

Re: [Beowulf] Suggestions to what DFS to use

2017-02-13 Thread Greg Lindahl
On Mon, Feb 13, 2017 at 07:55:43AM +, Tony Brian Albers wrote: > Hi guys, > > So, we're running a small(as in a small number of nodes(10), not > storage(170TB)) hadoop cluster here. Right now we're on IBM Spectrum > Scale(GPFS) which works fine and has POSIX support. On top of GPFS we > hav

Re: [Beowulf] NUMA zone weirdness

2016-12-16 Thread Greg Lindahl
Wow, that's pretty obscure! I'd recommend reporting it to Intel so that they can add it to the descendants of ipath_checkout / ipath_debug. It's exactly the kind of hidden gotcha that leads to unhappy systems! -- greg On Fri, Dec 16, 2016 at 03:52:34PM +, John Hearns wrote: > Problem solved.

Re: [Beowulf] Dirty COW fix for RHEL 6.x

2016-10-27 Thread Greg Lindahl
On Thu, Oct 27, 2016 at 04:06:26PM -0700, Kilian Cavalotti wrote: > 2. the only supported release is the latest one, meaning that if you > have to stay on say CentOS 6.7 for whatever reason, you don't get a > kernel with the Dirty COW fix. And that is obviously a problem. This is a good time to r

Re: [Beowulf] New System Could Break Bottleneck in Microprocessors

2016-09-13 Thread Greg Lindahl
On Tue, Sep 13, 2016 at 10:57:41AM -0700, chuck_pet...@selinc.com wrote: > "The solution -- born of a discussion with Intel engineers and executed by > Solihin's student, Yipeng Wang, at NC State and at Intel -- was to turn the > software queue into hardware. Which is exactly what Omni-Path does. Goo

Re: [Beowulf] Parallel programming for Xeon Phis

2016-08-24 Thread Greg Lindahl
On Wed, Aug 24, 2016 at 04:44:03PM +, John Hearns wrote: > OK, I guess that the state of the art for a FORTRAN Compiler in the > 60s is pitiful compared to the sophisticated compilers we have > today. Actually, Fortran was designed to be optimized, and IBM's first compiler was an optimizing c

Re: [Beowulf] curiosity killed the cat

2016-06-02 Thread Greg Lindahl
I found it difficult to compare different technologies using n/2. > > Holger > > > > On 01 Jun 2016, at 15:18, Greg Lindahl wrote: > > > > On Wed, Jun 01, 2016 at 03:25:22PM -0400, Andrew Piskorski wrote: > > > >> You mean GAMMA (Genoa Active Mes

Re: [Beowulf] curiosity killed the cat

2016-06-01 Thread Greg Lindahl
On Thu, Jun 02, 2016 at 07:48:10AM +0800, Stu Midgley wrote: > That's it. As I said, I haven't used it since the early 00's. > > With 100Gb becoming common it might be time for these sort of MPI's to come > back. You could do better than these old interfaces, too... modern ethernet chips have mu

Re: [Beowulf] curiosity killed the cat

2016-06-01 Thread Greg Lindahl
On Wed, Jun 01, 2016 at 03:25:22PM -0400, Andrew Piskorski wrote: > You mean GAMMA (Genoa Active Message MAchine): > > http://www.disi.unige.it/project/gamma/ > http://www.linux-mag.com/id/7253/ > http://www.beowulf.org/pipermail/beowulf/2006-February/014846.html I wonder how their flow co

Re: [Beowulf] Optimized math routines/transcendentals

2016-04-30 Thread Greg Lindahl
>> there are many clever ways to calculate common functions > >> that you can find in certain math or algorithms & data structures texts. > >> You would also need intimate knowledge of the target chipset. > >> But it seems that would be way too much time in

Re: [Beowulf] Optimized math routines/transcendentals

2016-04-29 Thread Greg Lindahl
On Sat, Apr 30, 2016 at 02:23:31AM +0800, C Bergström wrote: > Surprisingly, glibc does a pretty respectable job in terms of > accuracy, but alas it's certainly not the fastest. If you go look in the source comments I believe it says which paper's algorithm it is using... doing range reduction fo

Re: [Beowulf] curiosity killed the cat

2016-04-28 Thread Greg Lindahl
And the answer is: http://www.hpcwire.com/2016/04/26/omni-path-steadily-gaining-market-traction-says-intel/ Omni-Path was the winner. I look forward to more articles from Gilad about why offloading is awesome. -- greg On Thu, Oct 22, 2015 at 11:17:02PM -0700, Greg Lindahl wrote: > The

Re: [Beowulf] urgent: cost of fire suppression?

2016-04-21 Thread Greg Lindahl
You live in Australia, right? Halon systems leak all the time, it's not only discharge due to fires that's an issue. The oldest machine room I've ever "owned" had a grandfathered, ozone-destroying AC system that leaked... a lot. On Thu, Apr 21, 2016 at 08:54:25AM -0500, Stu Midgley wrote: > and,

Re: [Beowulf] urgent: cost of fire suppression?

2016-04-19 Thread Greg Lindahl
On Tue, Apr 19, 2016 at 07:32:38PM +0200, Per Jessen wrote: > I thought halon gas was the usual choice for datacentres, has that gone > out of fashion? It was quite popular. However, it's not friendly to the ozone layer... which means it's phased out due to the Montreal Protocol. -- greg ___

Re: [Beowulf] shared memory error

2016-04-18 Thread Greg Lindahl
You might want to look at the semop manpage. EINVAL means something particular. From the looks of it you could add some print statements for debugging. On Mon, Apr 18, 2016 at 11:31:27PM +0100, Jörg Saßmannshausen wrote: > Hi all, > > sorry for the lack of reply but I done some testings and now I

[Beowulf] curiosity killed the cat

2015-10-22 Thread Greg Lindahl
The Tri-lab TLCC2 machine was the biggest InfiniPath sale ever, and its replacement was just announced... with no mention of interconnect. Anyone know? -- greg

Re: [Beowulf] F1 to go CFD only?

2015-10-14 Thread Greg Lindahl
On Wed, Oct 14, 2015 at 08:40:27AM +0100, John Hearns wrote: > Interesting article on HPCwire > > http://www.hpcwire.com/2015/10/07/formula-one-debates-cfd-only-future/ > > i don't see it myself - wind tunnels are an important part of the aero > engineers toolbox, and they don't run them for fun.

Re: [Beowulf] Semour Cray 90th Anniversary

2015-10-14 Thread Greg Lindahl
On Wed, Oct 14, 2015 at 09:11:53PM +0100, James Cownie wrote: > If you read https://www.alcf.anl.gov/articles/introducing-aurora > carefully, > you can notice that Intel is the prime contractor on the Aurora > contract, while Cray is a subcont

Re: [Beowulf] memory bandwidth scaling

2015-10-06 Thread Greg Lindahl
On Tue, Oct 06, 2015 at 12:35:30PM -0700, mathog wrote: > Lately I have been working on a system with >512Gb of RAM and a lot > of processors. > This wouldn't be at all a cost effective beowulf node, but it is a > godsend when the problems being addressed require huge amounts of > memory and do no

Re: [Beowulf] Hyper Convergence Infrastructure

2015-10-03 Thread Greg Lindahl
On Sat, Oct 03, 2015 at 10:18:57AM +, Lechner, David A. wrote: > I am wondering if anyone on this list has benchmarked the impact of > an HCI solution on performance, or how this newest "next big thing" > compares to a new Linux/intel commodity solution? It has "hype" in the name, doesn't tha

Re: [Beowulf] interesting article on HPC vs evolution of 'big data' analysis

2015-04-08 Thread Greg Lindahl
On Wed, Apr 08, 2015 at 03:57:34PM -0400, Scott Atchley wrote: > There is concern by some and outright declaration by others (including > hardware vendors) that MPI will not scale to exascale due to issues like > rank state growing too large for 10-100 million endpoints, That's weird, given that

Re: [Beowulf] mpi slow pairs

2014-08-29 Thread Greg Lindahl
On Fri, Aug 29, 2014 at 08:49:57AM -0700, Greg Lindahl wrote: > Huh, Intel (PathScale/QLogic) has shipped a NxN debugging program for > more than a decade. The first vendor I recall shipping such a program > was Microway. I guess it takes a while for good practices to spread > th

Re: [Beowulf] mpi slow pairs

2014-08-29 Thread Greg Lindahl
On Fri, Aug 29, 2014 at 11:30:09AM -0400, Michael Di Domenico wrote: > > Also have you run ibdiagnet to see if anything is flagged up? > > i've run a multitude of ib diags on the machines, but nothing is > popping out as wrong. what's weird is that it's only certain pairing > of machines not an

Re: [Beowulf] Interesting POV about Hadoop

2014-06-09 Thread Greg Lindahl
On Tue, Jun 03, 2014 at 03:32:25PM +, Lockwood, Glenn wrote: > With all due respect to Stonebraker, I found the whole article very silly. Ditto. We built a search engine by extending the bigtable concept to be even more NoSQL, to support an equivalence between streaming and map/reduce. There

Re: [Beowulf] Good IB network performance when using 7 cores, poor performance on all 8?

2014-04-26 Thread Greg Lindahl
On Thu, Apr 24, 2014 at 11:31:56AM -0400, Brian Dobbins wrote: > Hi everyone, > > We're having a problem with one of our clusters after it was upgraded to > RH6.2 (from CentOS5.5) - the performance of our Infiniband network degrades > randomly and severely when using all 8 cores in our nodes for

Re: [Beowulf] SSDs for HPC?

2014-04-08 Thread Greg Lindahl
On Tue, Apr 08, 2014 at 02:12:58PM +, Lux, Jim (337C) wrote: > Consider something > like a credit card processing system. This is going to have a lot of "add > at the end" transaction data. As opposed to, say, a library catalog where > books are checked out essentially at random, and you upda

Re: [Beowulf] Mutiple IB networks in one cluster

2014-02-06 Thread Greg Lindahl
l of a computer interconnect? I thought that the typical > cross-sectional bandwidth was less, or am I mistaken? > Alan Scheinine > > Greg Lindahl wrote: >> In the usual Clos network, 1/2 of the nodes can make a single call to >> the other 1/2 of the nodes. That's what'

Re: [Beowulf] Mutiple IB networks in one cluster

2014-02-05 Thread Greg Lindahl
In the usual Clos network, 1/2 of the nodes can make a single call to the other 1/2 of the nodes. That's what's non-blocking. Nothing else is. Running any real code, every node talks to more than one other node, and the network is not non-blocking. It makes perfect sense in a telephone network. In

Re: [Beowulf] Intel pulls networking onto Xeon Phi

2013-12-03 Thread Greg Lindahl
On Mon, Dec 02, 2013 at 08:41:26AM -0500, atchley tds.net wrote: > On Mon, Dec 2, 2013 at 8:37 AM, atchley tds.net wrote: > > I am not sure what Aries currently offers that IB does not. The IB in question is the True Scale adapter, which does some things really fast and other things pretty slowl

Re: [Beowulf] Let's just all go home then

2013-08-14 Thread Greg Lindahl
TL;DR: when using only 2 or 4 out of 8 total cores, the virtualized gizmo manages better memory locality and better cache usage because of less motion of processes to different cores. In HPC, we generally lock processes to cores, and they should have done this, too. Even shorter: Pilot error, nothing t

Re: [Beowulf] anyone using SALT on your clusters?

2013-07-02 Thread Greg Lindahl
On Tue, Jul 02, 2013 at 10:54:14AM -0400, Joe Landman wrote: > One argument which is easy to make for salt, which I didn't see anyone > make is, it lets you lower your risk by removing the ssh daemon. You mean raise your risk, because the ssh equivalent in the pub-sub world is going to be less a

Re: [Beowulf] anyone using SALT on your clusters?

2013-07-01 Thread Greg Lindahl
On Mon, Jul 01, 2013 at 02:01:13PM +0100, Jonathan Barber wrote: > Pinging the host prior to connecting only determines that the IP stack is > working, not that the OS is capable of handling an ssh connection. Indeed, that's why we actually have a more complicated liveness algorithm, starting wit

Re: [Beowulf] anyone using SALT on your clusters?

2013-06-28 Thread Greg Lindahl
On Fri, Jun 28, 2013 at 09:45:50AM +0100, Jonathan Barber wrote: > The problem with SSH based approaches is when you have failed nodes - > normally they cause the entire command to hang until the attempted > connection times out. Normally what people do is ping the node before trying ssh on it. A
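
A minimal sketch of that ping-then-ssh pattern, in Python; the host names, timeouts, and remote command are placeholders, and as the follow-up message above notes, a ping only proves the IP stack is up, not that sshd is healthy -- hence the short ssh timeouts as well.

    import subprocess

    def is_alive(host, timeout=2):
        # One ICMP echo with a short deadline; exit status 0 means it answered.
        return subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        ).returncode == 0

    def run_everywhere(hosts, command):
        for host in hosts:
            # Skip hosts that don't answer a ping so a dead node can't
            # stall the whole sweep waiting on a TCP timeout.
            if not is_alive(host):
                print(f"{host}: no ping, skipping")
                continue
            result = subprocess.run(
                ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
                 host, command],
                capture_output=True, text=True,
            )
            print(f"{host}: {(result.stdout or result.stderr).strip()}")

    run_everywhere(["node001", "node002"], "uptime")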

Re: [Beowulf] physical memory

2013-04-24 Thread Greg Lindahl
On Wed, Apr 24, 2013 at 10:30:14AM -0400, Lawrence Stewart wrote: > Does linux recombine physical memory into contiguous regions? See: https://lwn.net/Articles/368869/ We find that it's awfully expensive when it's on with our search engine/nosql workload. In an HPC setting, you could explicitly
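
The LWN link above is about memory compaction and transparent huge pages; here is a rough sketch of inspecting the knob and, with root, disabling it on a node. The sysfs path is the mainline-kernel one (some vendor kernels of that era placed it elsewhere), and whether THP is a win or a loss is workload-dependent, per the message above.

    THP = "/sys/kernel/mm/transparent_hugepage/enabled"

    def current_setting(path=THP):
        # The active choice is the one in brackets, e.g. "always madvise [never]".
        text = open(path).read()
        return text[text.index("[") + 1:text.index("]")]

    def disable(path=THP):
        # Needs root; affects new mappings from that point on.
        with open(path, "w") as f:
            f.write("never")

    print("transparent_hugepage:", current_setting())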

Re: [Beowulf] Intel splits the network

2013-04-22 Thread Greg Lindahl
On Mon, Apr 22, 2013 at 10:35:34AM -0400, atchley tds.net wrote: > I had forgotten about Fulcrum. I was under the impression that Fulcrum made > the chips and sold them to switch vendors. They did. Now at Intel, they've showed off a new architecture (based on shared memory, an unusual implementat

Re: [Beowulf] Power calculations , double precision, ECC and power of APU's

2013-03-19 Thread Greg Lindahl
On Tue, Mar 19, 2013 at 12:02:08AM -0500, Geoffrey Jacobs wrote: > I would be happy to be corrected, but isn't an SNR of 1000:1 considered > to be excellent for a spectrograph If I remember correctly the dim past when I was almost an astronomer, yes, for things outside the Earth other than the Su

Re: [Beowulf] What Intel is doing with Pathscale

2013-02-24 Thread Greg Lindahl
On Fri, Feb 22, 2013 at 12:31:57PM -0500, Douglas Eadline wrote: > > Intel’s True Scale InfiniBand with QDR-80 > > http://semiaccurate.com/2013/02/22/intels-true-scale-infiniband-with-qdr-80/ And I am *very* pleasantly surprised that they're marketing it that way! Also note that they're headed

Re: [Beowulf] The Why and How of an All-Flash Enterprise Storage Array

2013-02-13 Thread Greg Lindahl
On Wed, Feb 13, 2013 at 12:54:26AM -0500, Douglas J. Trainor wrote: > Check out Pure Storage's approach in a seminar at Stanford by Costa > Sapuntzakis -- Enterprise storage on the Beowulf list? What will they think of next? -- greg

Re: [Beowulf] SSD caching for parallel filesystems

2013-02-11 Thread Greg Lindahl
On Sun, Feb 10, 2013 at 07:19:44PM +, Andrew Holway wrote: > >> Find me an application that needs big bandwidth and doesn't need massive > >> storage. > > Databases. Lots of databases. blekko's search engine. We own 1/2 petabyte of 160 gig Intel X-25Ms, are bandwidth-limited, and we wouldn't

Re: [Beowulf] AMD Roadrunner open compute motherboard

2013-01-18 Thread Greg Lindahl
On Fri, Jan 18, 2013 at 03:00:11PM -0500, Mark Hahn wrote: > > As in contradiction to what Mark Hahn says - it needs active cooling. > > if anyone cares: a dimm dissipates roughly 2W, depending on generation, > whether it's LV, number of devices, etc. one of those duplex 14krpm fans > is typical

Re: [Beowulf] AMD Roadrunner open compute motherboard

2013-01-18 Thread Greg Lindahl
On Fri, Jan 18, 2013 at 11:39:23PM +, Lux, Jim (337C) wrote: > they're putting them next to power plants so power costs are low and near > toxic waste dumps or swamps or something where land is cheap). In the Silicon Valley, we build office parks and apartments on waste dumps and swamps, and

Re: [Beowulf] Supercomputers face growing resilience problems

2012-11-23 Thread Greg Lindahl
On Thu, Nov 22, 2012 at 11:19:51PM -0500, Justin YUAN SHI wrote: > The fundamental problem rests in our programming API. If you look at > MPI and OpenMP carefully, you will find that these and all others have > one common assumption: the application-level communication is always > successful. Just

Re: [Beowulf] electricity prices

2012-09-26 Thread Greg Lindahl
On Tue, Sep 25, 2012 at 04:44:29PM -0400, Robert G. Brown wrote: > To be honest, given the high costs of nearly everything in CA I'm amazed > that anyone ever locates anything there. Land prices are high. Housing > costs are astronomical. Electricity is 2-3 times more expensive than it > is in

Re: [Beowulf] General cluster management tools - Re: Southampton engineers a Raspberry Pi Supercomputer

2012-09-16 Thread Greg Lindahl
On Sun, Sep 16, 2012 at 10:43:57PM +0200, Andrew Holway wrote: > case in point: We have based a reasonable chunk of our backend > infrastructure on openindiana. http://lwn.net/Articles/514046/. What > do we do now? Choose more carefully next time? Just like you have to do a little due diligence b

Re: [Beowulf] burn-in bootable cd

2012-09-07 Thread Greg Lindahl
On Fri, Sep 07, 2012 at 07:30:10AM +1000, Christopher Samuel wrote: > I think what you're thinking of is Advanced Clusterings "Breakin": > > http://www.advancedclustering.com/software/breakin.html > > Both downloads and a git repo of what's needed to build your own. Last > updated in July with

[Beowulf] burn-in bootable cd

2012-09-06 Thread Greg Lindahl
There used to be a bootable CD which combined a kernel with extra EDAC stuff and a user-land which ran HPL Linpack to exercise all the cores/dimms. We can't find a modern version of it, does anyone know of one? -- greg

Re: [Beowulf] Servers Too Hot? Intel Recommends a Luxurious Oil Bath

2012-09-03 Thread Greg Lindahl
I think the niche they have in mind is "server farms". The Google/Facebook/whoever modern datacenter design that uses free air cooling and hot/cold segregation and whatnot to get down to a SPUE of 1.15 doesn't leave much energy to cut outside the server. Getting rid of the fans is a ~ 15% win. Whet

Re: [Beowulf] Caption Competition

2012-06-19 Thread Greg Lindahl
On Mon, Jun 18, 2012 at 11:39:12AM -0400, Mark Hahn wrote: > we have a location with a raised floor of ~4 ft. I'm not sure how > that was chosen, but I also can't think of any reason why not. > I mean, in general, raised floors are a chilled air plenum, > so it's clearly good to avoid narrow one

Re: [Beowulf] yikes: intel buys cray's spine

2012-04-24 Thread Greg Lindahl
> http://www.eetimes.com/electronics-news/4371639/Cray-sells-interconnect-hardware-unit-to-Intel This is a real surprise. Intel said then that the IB stuff they bought from QLogic/PathScale was intended for exascale computing. For this buy Intel says: > "This deal does not affect our current Infi

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-30 Thread Greg Lindahl
On Mon, Jan 30, 2012 at 10:04:53AM -0500, Mark Hahn wrote: > > http://www.cscs.ch/fileadmin/user_upload/customers/cscs/Tech_Reports/Performance_Analysis_IB-QDR_final-2.pdf > > as far as I can tell, this paper mainly says "a coalescing stack delivers > benchmark results showing a lot higher bandwi

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-27 Thread Greg Lindahl
On Fri, Jan 27, 2012 at 09:24:10PM -0500, Joe Landman wrote: > > Are you talking about the latency of 1 core on 1 system talking to 1 > > core on one system, or the kind of latency that real MPI programs see, > > running on all of the cores on a system and talking to many other > > systems? I assu

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-27 Thread Greg Lindahl
On Fri, Jan 27, 2012 at 06:10:02PM -0800, Bill Broadley wrote: > Anyone have an estimate on how much latency a direct connect to QPI > would save vs pci-e? ~ 0.2us. Remember that the first 2 generations of InfiniPath were both SDR: one for HyperTransport and one for PCIe. The difference was 0.3us

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-27 Thread Greg Lindahl
On Fri, Jan 27, 2012 at 03:19:31PM -0500, Joe Landman wrote: > >>> That's the whole market, and QLogic says they are #1 in the FCoE > >>> adapter segment of this market, and #2 in the overall 10 gig adapter > >>> market (see > >>> http://seekingalpha.com/article/303061-qlogic-s-ceo-discusses- > >>

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-27 Thread Greg Lindahl
On Fri, Jan 27, 2012 at 11:29:35AM -0800, Håkon Bugge wrote: > That can explain why QLogic is selling, but not why Intel is buying. That's right. This was probably bought, not sold. If you look at the press release Intel put out, it's all about Exascale computing. http://newsroom.intel.com/commu

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-23 Thread Greg Lindahl
On Mon, Jan 23, 2012 at 11:28:26AM -0800, Greg Lindahl wrote: > http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html I figured out the main why: http://seekingalpha.com/news-article/2082171-qlogic-gains-market-share-in-both-fibre-channel-and-10gb-ether

[Beowulf] Intel buys QLogic InfiniBand business

2012-01-23 Thread Greg Lindahl
http://www.hpcwire.com/hpcwire/2012-01-23/intel_to_buy_qlogic_s_infiniband_business.html

[Beowulf] Leif Nixon gets quoted in Forbes

2011-11-21 Thread Greg Lindahl
Beowulf-list-participant Leif Nixon got quoted in Forbes! http://www.forbes.com/sites/andygreenberg/2011/11/17/chinas-great-firewall-tests-mysterious-scans-on-encrypted-connections/ -- greg

Re: [Beowulf] building Infiniband 4x cluster questions

2011-11-13 Thread Greg Lindahl
> Is that for an MPI message ? I'd heard that FDR might > have higher latencies due to the coding changes that > were happening - is this not the case ? When you're asking about MPI latency, are you interested in a 2-node cluster, or a big one? The usual MPI latency benchmark only uses 1 core eac

Re: [Beowulf] Users abusing screen

2011-10-26 Thread Greg Lindahl
On Wed, Oct 26, 2011 at 05:14:13PM -0400, Steve Crusan wrote: > If the issue is processes that run for far too long, and are abusing > the system, cgroups or 'pushing' the users to use a batch system seems > to work better than writing scripts to make decisions on killing > processes. What I saw

Re: [Beowulf] Users abusing screen

2011-10-25 Thread Greg Lindahl
On Mon, Oct 24, 2011 at 09:46:49AM -0400, Prentice Bisbal wrote: > The systems where screen is being abused are not part of the batch > system, and they will not /can not be for reasons I don't want to get > into here. The problem with killing long-running programs is that there > are often long r

Re: [Beowulf] 10GbE topologies for small-ish clusters?

2011-10-12 Thread Greg Lindahl
We just bought a couple of 64-port 10g switches from Blade, for the middle of our networking infrastructure. They were the winner over all the others, lowest price and appropriate features. We also bought Blade top-of-rack switches. Now that they've been bought up by IBM you have to negotiate harde

Re: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud

2011-10-04 Thread Greg Lindahl
On Tue, Oct 04, 2011 at 03:29:28PM -0400, Chris Dagdigian wrote: > I'm largely with RGB on this one with the minor caveat that I think he > might be undervaluing the insane economies of scale that IaaS providers > like Amazon & Google can provide. You can rent that economy of scale if you're in

Re: [Beowulf] materials for air shroud?

2011-09-16 Thread Greg Lindahl
On Fri, Sep 16, 2011 at 08:35:40PM -0400, Andrew Piskorski wrote: > That makes me wonder why they didn't put all 4 sockets in a row. Then > you could have just put one giant heat sink across all 4; I don't see > any capacitors or such sticking up in the way. It's harder than it looks to get shor

Re: [Beowulf] Fwd: H8DMR-82 ECC error

2011-08-17 Thread Greg Lindahl
> Memtest was ok, I done 9 cycles without any problems. You should be using the HPL implementation of the Linpack benchmark for testing memory. It exercises all of the memory and all of the cores, and is what most HPC vendors seem to use for node burnin. There's even a bootable DVD with a kernel w
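
A hedged sketch of the HPL-as-burn-in idea: loop xhpl and flag any run whose residual check reports FAILED. It assumes an already-built xhpl binary and an HPL.dat in the working directory sized to use most of the node's memory; the mpirun invocation and rank count are placeholders to adjust for the node.

    import subprocess, sys

    XHPL = ["mpirun", "-np", "16", "./xhpl"]   # placeholder launcher and rank count

    def one_pass(n):
        out = subprocess.run(XHPL, capture_output=True, text=True).stdout
        # HPL's output marks each residual check as PASSED or FAILED.
        bad = "FAILED" in out
        print("pass", n, "FAILED" if bad else "ok")
        return not bad

    results = [one_pass(n) for n in range(10)]
    sys.exit(0 if all(results) else 1)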

Re: [Beowulf] PetaBytes on a budget, take 2

2011-07-22 Thread Greg Lindahl
On Fri, Jul 22, 2011 at 09:05:11AM +0200, Eugen Leitl wrote: > Additional advantage of zfs is that it can deal with the higher > error rate of consumer or nearline SATA disks (though it can do > nothing against enterprise disk's higher resistance to vibration), > and also with silent bit rot with

Re: [Beowulf] PetaBytes on a budget, take 2

2011-07-21 Thread Greg Lindahl
On Fri, Jul 22, 2011 at 01:44:56AM -0400, Mark Hahn wrote: > to be honest, I don't understand what applications lead to focus on IOPS > (rationally, not just aesthetic/ideologically). it also seems like > battery-backed ram and logging to disks would deliver the same goods... In HPC, the metadat

Re: [Beowulf] PetaBytes on a budget, take 2

2011-07-21 Thread Greg Lindahl
On Fri, Jul 22, 2011 at 12:33:37AM -0400, Mark Hahn wrote: > storage isn't about performance any more. ok, hyperbole, a little. > but even a cheap disk does > 100 MB/s, and in all honesty, there are > not tons of people looking for bandwidth more than a small multiplier > of that. sure, a QDR fi

Re: [Beowulf] PetaBytes on a budget, take 2

2011-07-21 Thread Greg Lindahl
On Thu, Jul 21, 2011 at 08:03:58PM -0400, Ellis H. Wilson III wrote: > Used in a backup solution, triplication won't get you much more > resilience than RAID6 but will pay a much greater performance penalty to > simply get your backup or checkpoint completed. Hey, if you don't see any benefit fro

Re: [Beowulf] PetaBytes on a budget, take 2

2011-07-21 Thread Greg Lindahl
On Thu, Jul 21, 2011 at 02:55:30PM -0400, Ellis H. Wilson III wrote: > My personal experience with getting large amounts of data from local > storage to HDFS has been suboptimal compared to something more raw, If you're writing 3 copies of everything on 3 different nodes, then sure, it's a lot sl

Re: [Beowulf] PetaBytes on a budget, take 2

2011-07-21 Thread Greg Lindahl
On Thu, Jul 21, 2011 at 12:28:00PM -0400, Ellis H. Wilson III wrote: > For traditional Beowulfers, spending a year or two developing custom > software just to manage big data is likely not worth it. There are many open-source packages for big data, HDFS being one file-oriented example in the Hado

[Beowulf] New Tri-Labs cluster running QLogic HCAs/switches

2011-06-17 Thread Greg Lindahl
20,000 nodes: http://www.hpcwire.com/hpcwire/2011-06-17/qlogic_wins_major_deployment_in_nnsa%27s_tri-labs_cluster.html Looks like Mellanox is finally getting some significant competition. Now if only blekko could pull the same trick off against google! -- greg

Re: [Beowulf] Execution time measurements - clarification

2011-05-20 Thread Greg Lindahl
On Fri, May 20, 2011 at 02:26:31PM -0400, Mark Hahn forwarded a message: > When I run 2 identical examples of the same batch job simultaneously, > execution time of *each* job is > LOWER than for single job run ! I'd try locking these sequential jobs to a single core, you can get quite weird eff
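
One way to do that core locking from inside the job on Linux -- the same effect as launching it under taskset -c 0; the core number is only an example.

    import os

    # Restrict this process (pid 0 == ourselves) and any children to core 0,
    # so the scheduler can't bounce the job between cores mid-run.
    os.sched_setaffinity(0, {0})
    print("allowed cores:", sorted(os.sched_getaffinity(0)))

    # ... the sequential workload runs here, no longer migrating between cores ...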

Re: [Beowulf] Curious about ECC vs non-ECC in practice

2011-05-20 Thread Greg Lindahl
On Fri, May 20, 2011 at 08:52:43AM -0700, Lux, Jim (337C) wrote: > As hardware gets smaller and faster and lower power, the "cost" to provide > extra computational resources to implement a strategy like this gets > smaller, relative to the ever increasing human labor cost to try and make > it perf

Re: [Beowulf] Curious about ECC vs non-ECC in practice

2011-05-19 Thread Greg Lindahl
On Fri, May 20, 2011 at 12:35:25AM -0400, Joe Landman wrote: >Does anyone run a large-ish cluster without ECC ram? Or with ECC > turned off at the motherboard level? I am curious if there are numbers > of these, and what issues people encounter. I have some of my own data > from smaller

[Beowulf] How InfiniBand gained confusing bandwidth numbers

2011-05-09 Thread Greg Lindahl
http://dilbert.com/strips/comic/2011-05-10/

Re: [Beowulf] GP-GPU experience

2011-04-04 Thread Greg Lindahl
On Mon, Apr 04, 2011 at 09:54:37AM -0700, Massimiliano Fatica wrote: > If you are old enough to remember the time when the first distribute > computers appeared on the scene, > this is a deja-vu. Not to mention the prior appearance of array processors. Oil+Gas bought a lot of those, too. Some imp

Re: [Beowulf] AMD 8 cores vs 12 cores CPUs and Infiniband

2011-03-30 Thread Greg Lindahl
On Wed, Mar 30, 2011 at 12:02:33AM -0400, Mark Hahn wrote: > > 2 - I've heard that QLogic behavior is better in terms of QP creation, I > > well, they've often bragged about message rates - I'm not sure how relate > that is to QP creation. They are two separate issues. PSM's equivalent of a QP i

Re: [Beowulf] Storage - the end of RAID?

2010-10-29 Thread Greg Lindahl
On Fri, Oct 29, 2010 at 03:02:45PM -0400, Ellis H. Wilson III wrote: > I think it's making a pretty wild assumption to say search engines and > HPC have the same I/O needs (and thus can use the same I/O setups). Well, I'm an HPC guy doing infrastructure for a search engine, so I'm not assuming

Re: [Beowulf] RE: Storage - the end of RAID?

2010-10-29 Thread Greg Lindahl
On Fri, Oct 29, 2010 at 02:46:35PM -0400, Ellis H. Wilson III wrote: > Drives (of the commodity variety) are pretty darn cheap already. I'd be > surprised if this (RAID 1) isn't the better solution today (rather than > RAID2-6), rather than some point in the future. Um, it's not really RAID

Re: [Beowulf] Storage - the end of RAID?

2010-10-29 Thread Greg Lindahl
On Fri, Oct 29, 2010 at 05:42:39PM +0100, Hearns, John wrote: > Quite a perceptive article on ZDnet > > http://www.zdnet.com/blog/storage/the-end-of-raid/1154?tag=nl.e539 This has been going on for a long time. Blekko has 5 petabytes of disk, and no RAID anywhere. RAID went out with SQL. Kinda

Re: [Beowulf] China Wrests Supercomputer Title From U.S.

2010-10-28 Thread Greg Lindahl
On Thu, Oct 28, 2010 at 10:55:32PM -0500, Alan Louis Scheinine wrote: > With regard to networks, a near-future fork in the road between Beowulf > clusters versus supercomputers may be the intelligence concerning global > memory added to the network interface chip for upcoming models of > supercom

Re: [Beowulf] China Wrests Supercomputer Title From U.S.

2010-10-28 Thread Greg Lindahl
On Thu, Oct 28, 2010 at 05:51:26PM +0100, Igor Kozin wrote: > Incidentally, does anyone know existing _production_ GPU clusters used not > for development but to run jobs by ordinary users? That depends on whether you call Cell a GPU. And whether you think RoadRunner has reached production, and i

Re: [Beowulf] Broadcast - not for HPC - or is it?

2010-10-05 Thread Greg Lindahl
On Fri, Sep 24, 2010 at 08:21:55PM +1000, Matt Hurd wrote: > This was not designed for HPC but for low-latency trading as it beats > a switch in terms of speed. Primarily focused on low-latency > distribution of market data to multiple users as the port to port > latency is in the range of 5-7 na

Re: [Beowulf] A sea of wimpy cores

2010-09-20 Thread Greg Lindahl
t. isnt the whole point to try > and speed up the calculation process? > > On Mon, Sep 20, 2010 at 7:47 PM, Greg Lindahl wrote: > > > I'm sure that some BIOSes have that kind of feature, but none of the > > ones that I'm currently using do. > > > > On

Re: [Beowulf] A sea of wimpy cores

2010-09-20 Thread Greg Lindahl
I'm sure that some BIOSes have that kind of feature, but none of the ones that I'm currently using do. On Sun, Sep 19, 2010 at 08:06:39AM +0200, Jonathan Aquilina wrote: > Greg correct me if im wrong but cant you put in the memory which is > compatible with the system and slow the memory bus down

Re: [Beowulf] A sea of wimpy cores

2010-09-17 Thread Greg Lindahl
On Fri, Sep 17, 2010 at 10:53:11AM -0400, Lawrence Stewart wrote: > One of the other ideas at SiCortex was that a slow core wouldn't > affect application performance of codes that were actually limited by > the memory system. We noticed many codes running at 1 - 5% of peak > performance, spending

Re: [Beowulf] 48-port 10gig switches?

2010-09-01 Thread Greg Lindahl
't heard about any 48-port 10GbE switch chips. Fulcrum and Dune > don't show anything like that on their websites. Where did you hear > about 48-port 10G asics? 24-port chips are pretty easy to find, but I > hadn't heard about 48-port'ers. > > Tom > >

[Beowulf] 48-port 10gig switches?

2010-09-01 Thread Greg Lindahl
I'm in the market for 48-port 10gig switches (preferably not a chassis), and was wondering if anyone other than Arista and (soon) Voltaire makes them? Force10 seems to only have a chassis that big? Cisco isn't my favorite vendor anyway. One would think that the availability of a single-chip 48-port

Re: [Beowulf] Kernel action relevant to us

2010-08-16 Thread Greg Lindahl
On Sun, Aug 15, 2010 at 09:13:50AM +1000, Chris Samuel wrote: > On Sat, 14 Aug 2010 04:16:19 pm Walid wrote: > > > do we know if that have made it to any Linux Kernel? > > Looks like it was merged for 2.6.34-rc1 according to gitk, so > yes, it should be in the current kernel. It needs to be expl

Re: [Beowulf] compilers vs mpi?

2010-07-20 Thread Greg Lindahl
On Tue, Jul 20, 2010 at 12:07:32PM -0400, Mark Hahn wrote: > I'm interested in hearing about experiences with mixing compilers > between the application and MPI. that is, I would like to be able > to compile MPI (say, OpenMPI) with gcc, and expect it to work correctly > with apps compiled with

Re: [Beowulf] instances where a failed storage block is not all zero?

2010-07-07 Thread Greg Lindahl
On Wed, Jul 07, 2010 at 03:34:01PM -0700, David Mathog wrote: > With "modern" hardware are there currently any notable instances where a > failed read of a hardware storage area block results in that missing > data being filled in with something other than null bytes? Yes. You might get the wrong
