Re: [Beowulf] Your thoughts on the latest RHEL drama?

2023-06-26 Thread Bill
We're all ears... Bill On 6/26/23 3:00 PM, Douglas Eadline wrote: I'll have more to say later and to me the irony of this situation is Red Hat has become what they were created to prevent*. -- Doug * per conversations with Bob Young back in the day

Re: [Beowulf] LSF vs Slurm

2022-03-10 Thread Bill Benedetto
bout them. They are really quick to respond and quite good about working until there is a resolution of some sort. - Bill Benedetto On Thu, 2022-03-10 at 10:39 -0600, Lohit Valleru via Beowulf wrote: > >   > Hello Everyone, > > I wanted to ask if there is anyone who could explain

Re: [Beowulf] [External] RIP CentOS 8

2020-12-10 Thread Bill Abbott
I still haven't forgiven them for what they did to Sun. Bill On 12/10/20 2:44 PM, Jon Tegner wrote: On 12/10/20 10:55 AM, Jon Tegner wrote: What about https://linux.oracle.com/switch/centos/ Regards, /jon Possibly a good option - if I didn't trust Oracle even less than IBM. 

Re: [Beowulf] First cluster in 20 years - questions about today

2020-02-02 Thread Bill Abbott
h. We do this for both small and medium clusters and it works well. We chose centos 7/warewulf/slurm as our track. Regards, Bill On 2/1/20 10:21 PM, Mark Kosmowski wrote: I've been out of computation for about 20 years since my master degree. I'm getting into the game again as

Re: [Beowulf] 10G and rsync

2020-01-02 Thread Bill Abbott
If you have no choice but to use single rsync then either set up an rsyncd server on the other end to bypass ssh or use something like hpn-ssh for performance. Bill On 1/2/20 10:52 AM, Bill Abbott wrote: > Fpsync and parsyncfp both do a great job with multiple rsyncs, although > you have
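A minimal sketch of the rsync-daemon approach mentioned above, assuming a hypothetical receiving host dest-server with a module named backup (both names are illustrative, not from the post):

    # /etc/rsyncd.conf on the receiving end might define, e.g.:
    #   [backup]
    #       path = /srv/backup
    #       read only = false
    # Push over the native rsync protocol (TCP 873), bypassing ssh:
    rsync -a --whole-file /data/ rsync://dest-server/backup/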

Re: [Beowulf] 10G and rsync

2020-01-02 Thread Bill Abbott
Fpsync and parsyncfp both do a great job with multiple rsyncs, although you have to be careful about --delete. The best performance for fewer, larger files, if it's an initial or one-time transfer, is bbcp with multiple streams. Also jack up the tcp send buffer and turn on jumbo frames.
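A rough sketch of the buffer and MTU tuning mentioned above; the values and the interface name eth0 are placeholders, not settings from the post:

    # Raise TCP send/receive buffer limits (bytes; values are only examples).
    sysctl -w net.core.rmem_max=67108864
    sysctl -w net.core.wmem_max=67108864
    sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
    sysctl -w net.ipv4.tcp_wmem="4096 87380 67108864"
    # Enable jumbo frames on the transfer interface (switch ports must match).
    ip link set dev eth0 mtu 9000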

Re: [Beowulf] Here we go again

2019-12-13 Thread Bill Broadley
On 12/12/19 6:35 AM, Douglas Eadline wrote: Anyone see anything like this with Epyc, i.e. poor AMD performance when using Intel compilers or MKL? https://www.pugetsystems.com/labs/hpc/AMD-Ryzen-3900X-vs-Intel-Xeon-2175W-Python-numpy---MKL-vs-OpenBLAS-1560/ I was getting anomalously slow perfor

Re: [Beowulf] traverse @ princeton

2019-10-10 Thread Bill Wichser
lanes of v3, we get full EDR to both CPU sockets. Bill On 10/10/19 12:57 PM, Scott Atchley wrote: That is better than 80% peak, nice. Is it three racks of 15 nodes? Or two racks of 18 and 9 in the third rack? You went with a single-port HCA per socket and not the shared, dual-port HCA in the

Re: [Beowulf] traverse @ princeton

2019-10-10 Thread Bill Wichser
. It fits there today but who knows what else got on there since June. With the help of nVidia we managed to get 1.09PF across 45 nodes. Bill On 10/10/19 7:45 AM, Michael Di Domenico wrote: for those that may not have seen https://insidehpc.com/2019/10/traverse-supercomputer-to-acceler

Re: [Beowulf] Rsync - checksums

2019-10-01 Thread Bill Wichser
I used xxHash-0.7.0 to build against. You'll need to grab a version and install. For the actual rsync I have a diff, xxhash.patch along with the rpms for rsync in https://tigress-web.princeton.edu/~bill/ If I get time I'll try and pass this to the upstream rsync folks. It is
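A hedged outline of what rebuilding rsync against xxHash involves; the directory and tarball names are placeholders, and the actual patch/rpm layout is whatever is published at the URL above:

    # Build and install the xxHash library first.
    tar xf xxHash-0.7.0.tar.gz && cd xxHash-0.7.0 && make && sudo make install
    # Apply the xxhash patch to an rsync source tree and rebuild.
    cd ../rsync-3.x && patch -p1 < ../xxhash.patch
    ./configure && make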

Re: [Beowulf] Rsync - checksums

2019-09-30 Thread Bill Wichser
Just wanted to circle back on my original question. I changed the rsync code adding xxhash and we see about a 3x speedup. Good enough since it is very close to not using any checksum speedups. Bill On 6/17/19 9:43 AM, Bill Wichser wrote: We have moved to a rsync disk backup system, from TSM

Re: [Beowulf] Rsync - checksums

2019-06-18 Thread Bill Wichser
Was it? I meant it in the most gracious of ways. We did not have that version of rsync and knew nothing of that call. That pointer to an option was most appreciated. Bill On 6/18/19 11:46 AM, Michael Di Domenico wrote: On Tue, Jun 18, 2019 at 11:00 AM Bill Wichser wrote: Well thanks for THAT

Re: [Beowulf] Rsync - checksums

2019-06-18 Thread Bill Wichser
No. Using the rsync daemon on the receiving end. Bill On 6/18/19 11:03 AM, Stu Midgley wrote: Are you rsyncing over ssh?  If so, get HPN-SSH and use the non-cipher. MUCH faster again :) On Tue, Jun 18, 2019 at 11:00 PM Bill Wichser <b...@princeton.edu> wrote: Well t

Re: [Beowulf] Rsync - checksums

2019-06-18 Thread Bill Wichser
Well thanks for THAT pointer! Using --checksum-choice=none results in a speedup of somewhere between 2 and 3 times. That's my validation of the checksum theory things have been pointing towards. Now to get xxhash into rsync and I think we are all set. Thanks, Bill On 6/18/19 9:57 AM, El
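For reference, the option being tested above is passed like this (a sketch; the source path and daemon module are placeholders, and --checksum-choice needs a recent rsync):

    # Skip the transfer-checksum negotiation entirely.
    rsync -a --checksum-choice=none /source/dir/ backuphost::module/dir/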

Re: [Beowulf] Rsync - checksums

2019-06-18 Thread Bill Wichser
's all for naught. Maybe it isn't. But in a few weeks hopefully we'll have determined. Thanks all, Bill On 6/18/19 9:02 AM, Ellis H. Wilson III wrote: On 6/18/19 6:59 AM, Bill Wichser wrote: Just for clarity here, we are NOT using the -c option.  The checksums happen whenever th

Re: [Beowulf] Rsync - checksums

2019-06-18 Thread Bill Wichser
board. This is what sparked my question. Bill On 6/18/2019 5:02 AM, Peter Kjellström wrote: On Mon, 17 Jun 2019 08:29:53 -0700 Christopher Samuel wrote: On 6/17/19 6:43 AM, Bill Wichser wrote: md5 checksums take a lot of compute time with huge files and even with millions of smaller ones.  The

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread bill
does for lustre. On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <bill@princeton.edu> wrote: > > We have moved to a rsync disk backup system, from TSM tape, in order to > have a DR for our 10 PB GPFS filesystem.  We looked at a lot of options > but here we are. > >

[Beowulf] Rsync - checksums

2019-06-17 Thread Bill Wichser
would be wonderful. Ideally this could be some optional plugin for rsync where users could choose which checksummer to use. Bill

Re: [Beowulf] Containers in HPC

2019-05-24 Thread Bill Broadley
> A downside of containers is MUCH less visibility from the host OS. Sorry, I meant to say a downside of *virtual machines* is MUCH less visibility from the host OS.

Re: [Beowulf] Containers in HPC

2019-05-24 Thread Bill Broadley
On 5/23/19 5:35 AM, Jonathan Aquilina wrote: > Thanks for the great explanation and clarification. Another question that stems from the below what mechanisms exist in terms of security for the containers to be as secure as a VM? As usual with security it's complicated. Both VPSs and containers hav

Re: [Beowulf] Containers in HPC

2019-05-23 Thread Bill Broadley
On 5/23/19 3:49 AM, Jonathan Aquilina wrote: > Hi Guys, > > > > Can someone clarify for me are containers another form of virtualized > systems? > Or are they isolated environments running on bare metal? Generally virtual machines run their own kernel. Typically CPU overhead is close to zero,

Re: [Beowulf] Introduction and question

2019-02-28 Thread Bill Broadley
Yes you belong! Welcome to the list. There's many different ways to run a cluster. But my recommendations: * Making the clusters as identical as possible. * setup ansible roles for head node, NAS, and compute node * avoid installing/fixing things with vi/apt-get/dpkg/yum/dnf, use ansible w
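A minimal sketch of the layout implied by that advice, with hypothetical role and inventory names (headnode, nas, compute, site.yml are assumptions, not from the post):

    # Hypothetical repository layout:
    #   roles/headnode/  roles/nas/  roles/compute/
    #   inventory.ini    site.yml
    # Apply the whole site, or limit to one group while iterating:
    ansible-playbook -i inventory.ini site.yml
    ansible-playbook -i inventory.ini site.yml --limit compute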

Re: [Beowulf] If you can help ...

2018-11-11 Thread Bill Wichser
Thanks for the reminder! Bill On 11/10/2018 11:44 PM, Prentice Bisbal via Beowulf wrote: $1,570 so far.  $8,430 left to go. On 11/9/2018 7:27 AM, Douglas Eadline wrote: Everyone: This is a difficult email to write. For years we (Lara Kisielewska, Tim Wilcox, Don Becker, myself, and many

Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-29 Thread Bill Abbott
I like how a thoughtful piece on open source and freedom of choice ends with the phrase "You have no rights". Subtle. Bill On 10/29/18 7:44 PM, Douglas Eadline wrote: > > > In the sage words of Douglas Adams, "Don't Panic" > > My take here:

Re: [Beowulf] Intel Storm on the Horizon ?

2018-07-03 Thread Bill Abbott
I do plan on keeping an eye on 20B market cap HPC vendors. So far I've found exactly one. Bill On 07/03/2018 01:38 PM, C Bergström wrote: On Tue, Jul 3, 2018 at 10:35 PM, Douglas Eadline <deadl...@eadline.org> wrote: > It's an interesting theory, but just b

Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread Bill Abbott
I thought it was for sysadmin, not developer. Disregard most of what I said. Bill On 06/13/2018 02:49 PM, Fred Youhanaie wrote: On 13/06/18 18:07, Jonathan Engwall wrote: John Hearne wrote:  > Stuart Midgley works for DUG?  They are currently  > recruiting for an HPC manager in

Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread Bill Abbott
know how they think and if they can come up with a coherent plan. Another version is "A user says the cluster is slow. What do you do?" Bill On 06/13/2018 02:27 PM, Andrew Latham wrote: Such a broad topic. I would assume things like DHCP, TFTP, Networking, PXE and IPMI whi

Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread Bill Abbott
they're in Australia, so they might use these terms differently. Prentice On 06/13/2018 01:53 PM, Bill Abbott wrote: linux, mostly On 06/13/2018 01:07 PM, Jonathan Engwall wrote: John Hearne wrote:  > Stuart Midgley works for DUG?  They are currently  > recruiting for an HPC manager

Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread Bill Abbott
linux, mostly On 06/13/2018 01:07 PM, Jonathan Engwall wrote: John Hearne wrote: > Stuart Midgley works for DUG?  They are currently > recruiting for an HPC manager in London... Interesting... Recruitment at DUG wants to call me about Low Level HPC. I have at least until 6pm. I am excited bu

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread Bill Abbott
're working towards is to have a vm cluster in one of the commercial cloud providers that only accepts small jobs, and use slurm federation to steer the smaller jobs there, leaving the on-prem nodes for big mpi jobs. We're not there yet but shouldn't be a problem to implement technic

[Beowulf] AMD Epyc + Omni-Path?

2018-03-21 Thread Bill Broadley
Anyone else running AMD Epyc (or any other non Intel CPU) and Omni-Path? I have some AMD Epyc 7451 nodes working, but I went to buy more only to hear that it's not a configuration that the vendor or Intel will support. I've never needed support from Mellanox or Pathscale/Qlogic/Intel for previ

Re: [Beowulf] Infiniband switch topology display

2018-01-17 Thread Bill Abbott
OSU's INAM is free and graphical. Bill On 1/16/18 12:18 PM, Alex Chekholko via Beowulf wrote: Hi John, My Mellanox knowledge is some years out of date, but Mellanox makes a proprietary and expensive monitoring tool with a GUI. I believe it has a 30-day trial mode, so you can install i

Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-08 Thread Bill Broadley
Last time I saw this problem was because the chassis was missing the air redirection guides, and not enough air was getting to the CPUs. The OS upgrade might actually be enabling better throttling to keep the CPU cooler.

Re: [Beowulf] cluster deployment and config management

2017-09-06 Thread Bill Broadley
On 09/05/2017 07:14 PM, Stu Midgley wrote: > I'm not feeling much love for puppet. I'm pretty fond of puppet for managing clusters. We use cobbler to go from PXE boot -> installed, then puppet takes over. Some of my favorite features: * Inheritance is handy node -> node for a particular cluster
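A rough sketch of that PXE-to-puppet handoff, with hypothetical profile and host names; the real cobbler profiles and puppet classification are site-specific:

    # Register a node against an existing cobbler profile so it installs via PXE.
    cobbler system add --name=node42 --profile=centos-compute \
        --mac=AA:BB:CC:DD:EE:FF --ip-address=10.0.0.42
    cobbler sync
    # Once the install finishes, puppet takes over configuration.
    ssh node42 puppet agent --test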

Re: [Beowulf] cold spare storage?

2017-08-17 Thread Bill Broadley via Beowulf
On 08/17/2017 11:10 AM, Alex Chekholko wrote: > The Google paper from a few years ago showed essentially no correlations > between > the things you ask about and failure rates. So... do whatever is most > convenient for you. Backblaze also has a pretty large data set, granted not as big as googl

Re: [Beowulf] How to debug slow compute node?

2017-08-16 Thread Bill Broadley via Beowulf
On 08/10/2017 07:39 AM, Faraz Hussain wrote: > One of our compute nodes runs ~30% slower than others. It has the exact same > image so I am baffled why it is running slow . I have tested OMP and MPI > benchmarks. Everything runs slower. The cpu usage goes to 2000%, so all looks > normal there. We
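A few first-pass checks for a node like that (a generic diagnostic sketch, not the steps the poster ran):

    # Compare clock speed and throttling state against a healthy node.
    grep MHz /proc/cpuinfo | sort | uniq -c
    dmesg | grep -iE 'thermal|throttl'
    # A bad DIMM or unbalanced memory configuration often shows up here.
    numactl --hardware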

Re: [Beowulf] Register article on Epyc

2017-07-04 Thread Bill Broadley
On 07/02/2017 05:43 AM, jaquilina wrote: > What is everyone's thoughts on Intel new i9 cpus as these boast significant > jump > in core count Relabeled Xeons, just like all the previous generations. Same socket and same number of memory channels, it's just marketing. The 8 core variety has been

Re: [Beowulf] Register article on Epyc

2017-06-22 Thread Bill Broadley
On 06/22/2017 08:21 PM, Kilian Cavalotti wrote: > Oh, and at least the higher core-count SKUs like the 32-core 7251 are > actually 4 8-core dies linked together with a new "Infinity Fabric" > interconnect, not a single 32-core die. I completely missed that. And > it's fine, it probably makes sense

Re: [Beowulf] Register article on Epyc (Brian Dobbins)

2017-06-22 Thread Bill Broadley
On 06/22/2017 04:41 PM, mathog wrote: > On 22-Jun-2017 15:05, Greg Lindahl wrote: >> I don't think it hurt AMD that much in the end. > > I disagree. It's hard to say. I agree that AMD very slowly managed to claw some small market share from intel with the Opteron. I believe it was on the order

Re: [Beowulf] Register article on Epyc

2017-06-21 Thread Bill Broadley
On 06/21/2017 05:29 PM, Christopher Samuel wrote: > On 21/06/17 22:39, John Hearns wrote: > >> I would speculate about single socket AMD systems, with a smaller form >> factor motherboard, maybe with onboard Infiniband. Put a lot of these >> cards in a chassis and boot them disklessly and you get

Re: [Beowulf] more automatic building

2016-10-06 Thread Bill Broadley
On 10/02/2016 06:11 PM, Christopher Samuel wrote: > On 30/09/16 23:43, Mikhail Kuzminsky wrote: > >> Are there, by your opinions, some clear OpenHPC minuses ? > > Last I heard their Open-MPI builds don't include Slurm support for > perceived licensing issues (odd to me, but that's lawyers for you),

Re: [Beowulf] more automatic building

2016-09-29 Thread Bill Broadley
On 09/28/2016 07:34 AM, Mikhail Kuzminsky wrote: > I worked always w/very small HPC clusters and built them manually > (each server). Manual installs aren't too bad up to 4 nodes or so. > But what is reasonable to do for clusters containing some tens or > hundred of nodes ? We use cobbler for D

[Beowulf] NFS HPC survey results.

2016-07-20 Thread Bill Broadley
Many thanks for all the responses. Here's the promised raw data: https://wiki.cse.ucdavis.edu/_media/wiki:linux-hpc-nfs-survey.csv I'll summarize the 26 results below. I'll email similar to those that asked. Not everyone answered all questions. 1) cluster OS: 72% Redhat/CentOS/Scientifi

[Beowulf] NFS HPC survey

2016-07-14 Thread Bill Broadley
We use NFS pretty heavily on a mix of a dozen or so small/medium clusters. I was curious how other people have NFS configured for their clusters. I made this survey to collect related information: http://goo.gl/forms/AuXCNR10WhJNgtDw1 It doesn't require a google login, none of the question

Re: [Beowulf] China aims for 100 PF

2016-06-21 Thread Bill Broadley
On 06/21/2016 05:14 AM, Remy Dernat wrote: Hi, 100 PF is really not far from reality right now: http://www.top500.org/news/new-chinese-supercomputer-named-worlds-fastest-system-on-latest-top500-list/ I was curious about the CPU/architecture and I found: http://www.netlib.org/utk/people/Jack

[Beowulf] First experiences with Broadwell and the Dell M630

2016-06-09 Thread Bill Wichser
'd sure like to have a lot more of those better performers! Did I mention GPFS? We have it running on a v3 node with the same kernel. On the Broadwell chips though, it just hangs the kernel. Sigh. The cutting edge. When can I order Skylake? Bill

Re: [Beowulf] memory bandwidth scaling

2015-10-05 Thread Bill Broadley
On 10/01/2015 09:27 AM, Orion Poplawski wrote: > We may be looking at getting a couple new compute nodes. I'm leery though of > going too high in processor core counts. Does anyone have any general > experiences with performance scaling up to 12 cores per processor with general > models like CM1/W

[Beowulf] NFS + IB?

2015-02-20 Thread Bill Broadley
I read through the beowulf archives for mentions of NFS + IB. I found nothing newer than 2012. What are peoples current experience with NFS + IB? I'm looking at the options for smaller clusters with /home on NFS. I'll leave distributed filesystems for a separate discussion. The two leading opt
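The two usual options are plain NFS over IPoIB or NFS/RDMA; a hedged mount sketch, where the server name and export are placeholders and 20049 is the conventional NFS/RDMA port:

    # NFS over the IPoIB interface:
    mount -t nfs -o vers=3,rsize=1048576,wsize=1048576 nas-ib:/home /home
    # NFS over RDMA (needs server-side RDMA support):
    mount -t nfs -o vers=3,proto=rdma,port=20049 nas-ib:/home /home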

Re: [Beowulf] IPoIB failure

2015-01-23 Thread Bill Wichser
quite noisy! Bill On 01/23/2015 09:25 AM, John Hearns wrote: Do you see anything interesting in the opensm logs on that server? In the past I have found looking through opensm logs to be tough going though - generally full fo verbose messages which don't mean a lot. Maybe if you could trac

[Beowulf] IPoIB failure

2015-01-23 Thread Bill Wichser
n the first place. It just seems that the IPoIB layer was at fault here somehow in that routing was not correct across the entire IB network. If anyone has any insights, I'd be most appreciative. It's clear we do not understand this aspect of the IB stack and how this layer works.

Re: [Beowulf] Putting /home on Lusture of GPFS

2014-12-23 Thread Bill Wichser
ps my strongest this month. Bill

[Beowulf] Open source and the Draft Report of the Task Force on High Performance Computing

2014-08-27 Thread Bill Broadley
The URL: http://energy.gov/seab/downloads/draft-report-task-force-high-performance-computing One piece I found particularly interesting: There has been very little open source that has made its way into broad use within the HPC commercial community where great emphasis is placed on servic

[Beowulf] Nvidia K1 Denver

2014-08-12 Thread Bill Broadley
The Nvidia K1 turns out to be a surprising departure from the ARM Cortex a53 and a57 cores. Summary at: http://blogs.nvidia.com/blog/2014/08/11/tegra-k1-denver-64-bit-for-android/ Details at (if you are willing to share your email address): http://www.tiriasresearch.com/downloads/

[Beowulf] Power8

2014-04-29 Thread Bill Broadley
Sounds like a potentially interesting CPU/platform for HPC. Of particular interest: 1) similar quad socket performance to intel's best 2) embracing 3rd parties access to cc memory 3) up to 8 off chip memory controllers with cache (centaur chip) 4) allowing 3rd party motherboards 5) IBM exploring

[Beowulf] Nvidia and IBM create GPU interconnect for faster supercomputing

2014-03-25 Thread Bill Broadley
Sounds like a memory coherent 80GB/sec link: http://arstechnica.com/information-technology/2014/03/nvidia-and-ibm-create-gpu-interconnect-for-faster-supercomputing/ They mention GPU<->GPU links, but don't quite mention system <-> system links.

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2014-02-06 Thread Bill Wichser
On 2/6/2014 9:30 AM, Aaron Knister wrote: Bill Wichser writes: We have tested using c1 instead of c0 but no difference. We don't use logical processors at all. When the problem happens, it doesn't matter what you set the cores for C1/C0, they never get up t

Re: [Beowulf] Mutiple IB networks in one cluster

2014-02-04 Thread Bill Broadley
On 02/01/2014 08:17 AM, atchley tds.net wrote: > The cross-bar switch only guarantees non-blocking if the two ports are on > the same line card (i.e. using the same crossbar). Once you start > traversing multiple crossbars, you are sharing links and can experience > congestion. Full backplane mean

Re: [Beowulf] Admin action request

2013-11-22 Thread Bill Broadley
>Option 3: Enforce some of our basic etiquette. If you aren't willing > to abide by the house rules, you won't be allowed into the house to > violate the rules. In this case, I see more than two strikes, so I am > not all that inclined to be terribly forgiving of these breaches. I like #3

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-09-19 Thread Bill Wichser
nly been a week and a half of some very happy researchers! Thanks, Bill On 09/19/2013 11:32 AM, Christopher Samuel wrote: > On 18/09/13 10:49, Douglas O'Flaherty wrote: > >> "Run in C1. C0 over commits

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-09-17 Thread Bill Wichser
. If we get through a whole month then I would say that after all the firmware and iDrac and CMC updates that a chassis power cycle is the answer. But tomorrow I will look again and hopefully be happily surprised for one more day. Bill On 9/17/2013 8:06 PM, Richard Hickey wrote > You are n

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-09-05 Thread Bill Wichser
onsumption | 192 Watts | ok Current | 0.80 Amps | ok On a normal node the power is upwards of 350W. We are trying to escalate with Dell but that process is SLOW! Thanks, Bill On 08/30/2013 09:03 AM, Bill Wichser wrote: > Since January, when we installed an M620 Sandybridge cl
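The readings quoted above look like ipmitool sensor output; a quick way to spot a capped node (a sketch, with no Dell-specific sensors assumed):

    # Compare live power draw against a known-good node.
    ipmitool sensor | egrep -i 'power|current'
    # Check the event log for power-cap or throttling events.
    ipmitool sel elist | grep -i power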

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-09-03 Thread Bill Wichser
Yes. Especially when you do not see the power limit normal following it. This says that the cores are limited (by power) but never actually receiving enough to then say that they are normal again. We'd see both on ramp up to C0 state. limited then normal; limited then normal. Bill

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-09-03 Thread Bill Wichser
doing using the cpupower command, info we were unable to obtain completely without this BIOS change. I'm not sure about the C1E state being enabled though and will experiment further. Thanks to everyone who offered suggestions. An extra thanks to Don Holmgren who pointed us down this path

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-08-30 Thread Bill Wichser
Thanks for everyone's suggestions so far. I have a very good lead, which we have tested, and appears to have promising effects. I will summarize after we have tested on a few more nodes and confirmed some results. Thanks, Bill

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-08-30 Thread Bill Wichser
odes be bad nodes consistently. They have been mostly moving targets at this point, randomly distributed. > >> Again, the magnitude of the problem is about 5-10% at any time. Given >> 600 > > if I understand you, the prevalence is only 5-10%, but the magnitude > (effect) >

Re: [Beowulf] Problems with Dell M620 and CPU power throttling

2013-08-30 Thread Bill Wichser
re are not sufficient in showing me this though. Thanks, Bill On 08/30/2013 10:44 AM, Mark Hahn wrote: >> We run the RH 6.x release and are up to date with kernel/OS patches. > > have you done any /sys tuning? > >> non-redundant. tuned is set for performance. Turbo mode is >

[Beowulf] Problems with Dell M620 and CPU power throttling

2013-08-30 Thread Bill Wichser
rs an event where this power capping takes effect. At this point we remain clueless about what is causing this to happen. We can detect the condition now and have been power cycling the nodes in order to reset. If anyone has a clue, or better yet, solved the issue, we'd love to

Re: [Beowulf] Intel Phi musings

2013-03-14 Thread Bill Wichser
Apparently, now that we finally received our own Phi's, this blog has been taken down? I read through it as content was being added but did not copy down anything. Did I miss an announcement of a move? Bill On 02/12/2013 10:02 AM, Dr Stuart Midgley wrote: > I've started a blo

Re: [Beowulf] SSD caching for parallel filesystems

2013-02-09 Thread Bill Broadley
On 02/09/2013 01:22 PM, Vincent Diepeveen wrote: > SATA is very bad protocol for SSD's. > > SSD's allows perfectly parallel stores and writes, SATA doesn't. > So SATA really limits the SSD's true performance. SSDs and controllers often support NCQ which allows multiple outstanding requests. Not

Re: [Beowulf] AMD Roadrunner open compute motherboard

2013-01-16 Thread Bill Broadley
On 01/16/2013 10:20 AM, Hearns, John wrote: > http://www.theregister.co.uk/2013/01/16/amd_roadrunner_open_compute_motherboard/ The pictured 1U has what looks like 12 15k RPM fans (not including the power supplies). Or 6 double fans if you prefer. In my experience those fans burn an impressive am

Re: [Beowulf] AMD Roadrunner open compute motherboard

2013-01-16 Thread Bill Broadley
On 01/16/2013 11:27 AM, Vincent Diepeveen wrote: > The thing looks shitty. Just 2 sockets. At 2 sockets AMD is junk. Heh, at least at running chess programs that's of interest to approximately 0.00% of the market. > At > 4 sockets it would be interesting though - yet that's not shown. Dunno, s

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-12 Thread Bill Broadley
On 01/12/2013 07:29 AM, Vincent Diepeveen wrote: > Yes i was the inventor of that test to jump using a RNG randomly. > Paul Hsieh then modified it from calling the RNG and correcting for > the RNG, to the direct pointer math as you show here. Oh come now Vincent, inventor is a very strong word fo

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-12 Thread Bill Broadley
On 01/12/2013 04:25 PM, Stu Midgley wrote: > Until the Phi's came along, we were purchasing 1RU, 4 sockets nodes > with 6276's and 256GB ram. On all our codes, we found the throughput > to be greater than any equivalent density Sandy bridge systems > (usually 2 x dual socket in 1RU) at about 10-15

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-12 Thread Bill Broadley
On 01/11/2013 05:22 AM, Vincent Diepeveen wrote: > >> Bill - a 2 socket system doesn't deliver 512GB ram. > On 01/11/2013 05:59 AM, Reuti wrote: > Maybe I get it wrong, but I was checking these machines recently: > > IBM's x3550 M4 goes up to 768 GB with 2 CP

Re: [Beowulf] AMD performance (was 500GB systems)

2013-01-11 Thread Bill Broadley
On 01/11/2013 04:01 AM, Joshua mora acosta wrote: > Hi Bill, > AMD should pay you for these wise comments ;) > > But since this list is about providing feedback, and sharing knowledge, I > would like to add something to your comments, and somewhat HW agnostic. When > you

[Beowulf] AMD performance (was 500GB systems)

2013-01-10 Thread Bill Broadley
's bandwidth scaling on a quad socket with 64 cores: http://cse.ucdavis.edu/bill/pstream/bm3-all.png I don't have a similar Intel, but I do have a dual socket e5: http://cse.ucdavis.edu/bill/pstream/e5-2609.png

Re: [Beowulf] Configuration management tools/strategy

2013-01-09 Thread Bill Broadley
On 01/06/2013 05:38 AM, Walid wrote: > Dear All, > > At work we are starting to evaluate Configuration management to be used > to manage several diverse hpc clusters We currently manage 15 clusters with puppet and are very pleased with it. Puppet is one of the critical pieces that allows us

[Beowulf] Intel 82574L problems with newer kernels?

2012-12-11 Thread Bill Broadley
Anyone have some working tweaks to get an Intel E1000e driver + 82574L chip to behave with linux 3.5 or 3.7 kernels? Not sure if this is a problem for all 82574Ls or just ones on recent supermicro motherboards. I noticed stuttering, occasional high latencies, and a continuously increasing droppe
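Some generic checks that help narrow this kind of NIC behavior down (a sketch; eth0 is a placeholder and none of this is a confirmed fix for the 82574L issue):

    # Look for drops, errors, and missed-packet counters.
    ip -s link show eth0
    ethtool -S eth0 | egrep -i 'drop|err|miss'
    # Dump current coalescing and offload settings for comparison across kernels.
    ethtool -c eth0
    ethtool -k eth0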

Re: [Beowulf] ARM cpu's and development boards and research

2012-11-27 Thread Bill Broadley
On 11/27/2012 07:46 AM, Vincent Diepeveen wrote: > i dug around in price of ARMs and development boards. > > If you just buy a handful most interesting offer seems to be > > http://www.hardkernel.com/renewal_2011/products/prdt_info.php?g_code=G133999328931 > > it's $129 and has a quad core A

Re: [Beowulf] A petabyte of objects

2012-11-13 Thread Bill Broadley
If you need an object store and not a file system I'd consider hadoop.

Re: [Beowulf] let's standardize liquid cooling

2012-09-28 Thread Bill Broadley
Sounds expensive, complicated, and challenging. How about a MUCH simpler proposal: eliminate fans from compute nodes. Nodes should: * assume good front to back airflow Racks would: * have large fans front AND back that run at relatively low rpm, and relatively quiet. * If front or rear door o

Re: [Beowulf] FY;) GROMACS on the Raspberry Pi

2012-09-19 Thread Bill Broadley
I taught a MPI class a few times and wanted something simple, fun, and could be improved upon several times as the students learned MPI. It's obviously embarrassingly parallel, but non-trivial to do well. There's often not enough work per pixel or per image to make the communications overhead lo

Re: [Beowulf] cluster building advice?

2012-09-17 Thread Bill Broadley
On 09/16/2012 02:52 PM, Jeffrey Rossiter wrote: > The intention is for the system to be > used for scientific computation. That doesn't narrow it down much. > I am trying to decide on a linux > distribution to use. I suggest doing it yourself based on whatever popular linux distro you have experi

Re: [Beowulf] Status of beowulf.org?

2012-06-15 Thread Bill Broadley
On 06/15/2012 12:25 PM, Jan Wender wrote: > Hi all, > > Arend from Penguin replied and they are looking for the list. They would > like to continue hosting the list, but would ask for some volunteers to > administrate it. Well if they are doing such a poor job and aren't willing to administrate i

Re: [Beowulf] Torrents for HPC

2012-06-13 Thread Bill Broadley
On 06/13/2012 06:40 AM, Bernd Schubert wrote: > What about an easy to setup cluster file system such as FhGFS? Great suggestion. I'm all for a generally useful parallel file system instead of a torrent solution with a very narrow use case. > As one of > its developers I'm a bit biased of course,

Re: [Beowulf] Torrents for HPC

2012-06-12 Thread Bill Broadley
On 06/12/2012 03:47 PM, Skylar Thompson wrote: > We manage this by having users run this in the same Grid Engine > parallel environment they run their job in. This means they're > guaranteed to run the sync job on the same nodes their actual job runs > on. The copied files change so slowly that eve

Re: [Beowulf] Torrents for HPC

2012-06-12 Thread Bill Broadley
Many thanks for the online and offline feedback. I've been reviewing the mentioned alternatives. From what I can tell none of them allow nodes to join/leave at random. Our problem is that a user might submit 500-50,000 jobs that depend on a particular dataset and have a variable number of job

[Beowulf] Torrents for HPC

2012-06-08 Thread Bill Broadley
ple bittorrent client and made a 16GB example data set and measured the performance pushing it to 38 compute nodes: http://cse.ucdavis.edu/bill/btbench-2.png The slow ramp up is partially because I'm launching torrent clients with a crude for i in { ssh $i launch_torrent.sh }. I get app
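A slightly less crude version of that launch loop (a sketch; launch_torrent.sh is from the post, the nodes.txt host list is an assumption):

    # Start the torrent client on every node in parallel instead of serially.
    while read -r node; do
        ssh "$node" "nohup ./launch_torrent.sh >/dev/null 2>&1 &" &
    done < nodes.txt
    wait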

Re: [Beowulf] Intel buys QLogic InfiniBand business

2012-01-27 Thread Bill Broadley
On 01/27/2012 02:25 PM, Gilad Shainer wrote: > So I wonder why multiple OEMs decided to use Mellanox for on-board > solutions and no one used the QLogic silicon... That's a strange argument. What does Intel want? Something to make them more money. In the past that's been integrating functional

[Beowulf] HP redstone servers

2011-11-01 Thread Bill Broadley
The best summary I've found: http://www.theregister.co.uk/2011/11/01/hp_redstone_calxeda_servers/ Specifications at for the ECX-1000: http://www.calxeda.com/products/energycore/ecx1000/techspecs And EnergyCard: http://www.calxeda.com/products/energycards/techspecs The only hint on price that I f

Re: [Beowulf] materials for air shroud?

2011-08-31 Thread Bill Broadley
On 08/31/2011 12:15 PM, David Mathog wrote: > That never crossed my mind. > > You sure about the flammability? I believe it for the ignition due to > temperature (Fahrenheit 451 and all that). However, I have a gut > feeling (but no data) that sparks are fairly likely to ignite cardboard, > and

Re: [Beowulf] Infiniband: MPI and I/O?

2011-05-26 Thread Bill Wichser
swer. I know that a 10Gbps pipe hits 4Gbps for sustained periods to our central storage from the cluster. I also know that I can totally overwhelm a 10G connected OSS which is currently I/O bound. My question really was twofold: 1) is anyone doing this successfully and

[Beowulf] Infiniband: MPI and I/O?

2011-05-26 Thread Bill Wichser
abric? Thanks, Bill

Re: [Beowulf] 9 traits of the veteran Unix admin

2011-02-18 Thread Bill Rankin
> Besides, I only HAVE around 20 cases of bottles to fill, and if > I get more the administrative penalty of using our laundry room wall as > a beer storage unit will become, um, "severe". > > In other words, before I reach that point I will have to acknowledge > that I've passed a critical scaling

Re: [Beowulf] maui: how to set different walltime for different users

2011-02-17 Thread Bill Wichser
cted with ACL, is going to be the only way around this. Bill akshar bhosale wrote: > hi, > > we have a cluster on 16 nodes where we run torque+maui. > > We have set max walltime of 4 days for all jobs. we want to set > different max walltimes for different users. e.g. user abc w

Re: [Beowulf] IBM's Watson on Jeopardy tonight

2011-02-16 Thread Bill Rankin
> ariel sabiguero yawelak wrote: > > A dead company used to say "the > computer is the network", and for our brains it seems so. Actually for HPC and esp. clusters that (IMNSHO) is even more true today. My desktop just serves as my window into that network. Loss of a CPU or an entire server I

RE: [Beowulf] RE: Storage - the end of RAID?

2010-11-01 Thread Bill Rankin
a lot of sense to start replacing disks where we have single point bottlenecks in our I/O chain. So I look at the whole discussion as the realization that finally the rest of the world is catching up to us. :-) -bill

RE: [Beowulf] China Wrests Supercomputer Title From U.S.

2010-10-29 Thread Bill Rankin
> Define "real" applications, Something that produces tangible, scientifically useful results that would not have otherwise been realized without the availability and capability of that machine. > but to give my guess at your question "But they didn't. Why?" > > One word - cost Well, that's

RE: [Beowulf] China Wrests Supercomputer Title From U.S.

2010-10-29 Thread Bill Rankin
Douglas: > > [...] > > What this machine does do is validate to some extent the continued > use and development of GPUs in an HPC/cluster setting. > > [...] > > Nvidia claims Tianhe-1A's 4.04 megawatts of CUDA GPUs and Xeon CPUs is > three times more power efficient than CPUs alone. The Nvidia p

RE: [Beowulf] Re: Interesting

2010-10-29 Thread Bill Rankin
ts of hand-written notes and have several fountain pens I use. There are several brands of archival-class inks available and much debate over which ones are "best". Because of their nature they tend to be difficult to use and a mess to
