Re: [Beowulf] immersion

2024-04-07 Thread Scott Atchley
On Sun, Mar 24, 2024 at 2:38 PM Michael DiDomenico wrote: > i'm curious if others think DLC might hit a power limit sooner or later, > like Air cooling already has, given chips keep climbing in watts. > What I am worried about is power per blade/node. The Cray EX design used in Frontier has a l

Re: [Beowulf] immersion

2024-03-24 Thread Scott Atchley
there comes a point where it's just not worth it unless it's a big custom > solution like the HPE stuff > The ORv3 rack design's maximum power is the number of power shelves times the power per shelf. Reach out to me directly at @ ornl.gov and I can connect you with some vendors.

Re: [Beowulf] immersion

2024-03-24 Thread Scott Atchley
On Sat, Mar 23, 2024 at 10:40 AM Michael DiDomenico wrote: > i'm curious to know > > 1 how many servers per vat or U > 2 i saw a slide mention 1500w/sqft, can you break that number into kw per > vat? > 3 can you shed any light on the heat exchanger system? it looks like > there's just two pipes c

Re: [Beowulf] [External] anyone have modern interconnect metrics?

2024-01-22 Thread Scott Atchley
On Mon, Jan 22, 2024 at 11:16 AM Prentice Bisbal wrote: > > >> > Another interesting topic is that nodes are becoming many-core - any >> > thoughts? >> >> Core counts are getting too high to be of use in HPC. High core-count >> processors sound great until you realize that all those cores are no

Re: [Beowulf] [External] anyone have modern interconnect metrics?

2024-01-20 Thread Scott Atchley
On Fri, Jan 19, 2024 at 9:40 PM Prentice Bisbal via Beowulf < beowulf@beowulf.org> wrote: > > Yes, someone is sure to say "don't try characterizing all that stuff - > > it's your application's performance that matters!" Alas, we're a generic > > "any kind of research computing" organization, so t

Re: [Beowulf] [EXTERNAL] Re: anyone have modern interconnect metrics?

2024-01-18 Thread Scott Atchley
you’d solder to a board. And > there are plenty of XAUI->optical kinds of interfaces. And optical cables > are cheap and relatively rugged. > > > > > > *From:* Beowulf *On Behalf Of *Scott Atchley > *Sent:* Wednesday, January 17, 2024 7:18 AM > *To:* Larry Ste

Re: [Beowulf] anyone have modern interconnect metrics?

2024-01-17 Thread Scott Atchley
While I was at Myricom, the founder, Chuck Seitz, used to say that there was Ethernet and Ethernot. He tied Myricom's fate to Ethernet's 10G PHYs. On Wed, Jan 17, 2024 at 9:08 AM Larry Stewart wrote: > I don't know what the networking technology of the future will be like, > but it will be calle

Re: [Beowulf] anyone have modern interconnect metrics?

2024-01-17 Thread Scott Atchley
I don't think that UE networks are available yet. On Wed, Jan 17, 2024 at 3:13 AM Jan Wender via Beowulf wrote: > Hi Mark, hi all, > > The limitations of Ethernet seem to be recognised by many participants in > the network area. That is the reason for the founding of the Ultra-Ethernet > allianc

Re: [Beowulf] Checkpointing MPI applications

2023-03-27 Thread Scott Atchley
On Thu, Mar 23, 2023 at 3:46 PM Christopher Samuel wrote: > On 2/19/23 10:26 am, Scott Atchley wrote: > > > We are looking at SCR for Frontier with the idea that users can store > > checkpoints on the node-local drives with replication to a buddy node. > > SCR will manage

Re: [Beowulf] Checkpointing MPI applications

2023-02-19 Thread Scott Atchley
Hi Chris, It looks like it tries to checkpoint application state without checkpointing the application or its libraries (including MPI). I am curious whether the checkpoint sizes are similar to or significantly larger than the application's typical outputs/checkpoints. If they are much larger, the time to w

Re: [Beowulf] Top 5 reasons why mailing lists are better than Twitter

2022-11-21 Thread Scott Atchley
We have OpenMPI running on Frontier with libfabric. We are using HPE's CXI (Cray eXascale Interface) provider instead of RoCE though. On Sat, Nov 19, 2022 at 2:57 AM Matthew Wallis via Beowulf < beowulf@beowulf.org> wrote: > > > ;-) > > 1. Less spam. > 2. Private DMs, just email the person. > 3.

Re: [Beowulf] likwid vs stream (after HPCG discussion)

2022-03-20 Thread Scott Atchley
On Sat, Mar 19, 2022 at 6:29 AM Mikhail Kuzminsky wrote: > If so, it turns out that for the HPC user, stream gives a more > important estimate - the application is translated by the compiler > (they do not write in assembler - except for modules from mathematical > libraries), and stream will giv
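For anyone wanting to reproduce that kind of comparison, a minimal STREAM run looks roughly like the following. This is a sketch only, assuming the stock stream.c source and a GCC/OpenMP toolchain; the array size and thread count are purely illustrative.
$ gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=80000000 stream.c -o stream
$ OMP_NUM_THREADS=16 OMP_PROC_BIND=spread ./stream   # quote the Triad number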

Re: [Beowulf] Data Destruction

2021-09-29 Thread Scott Atchley
> On 9/29/2021 10:06 AM, Scott Atchley wrote: > > Are you asking about selectively deleting data from a parallel file system > (PFS) or destroying drives after removal from the system either due to > failure or system decommissioning? > > For the latter, DOE does not allow us t

Re: [Beowulf] Data Destruction

2021-09-29 Thread Scott Atchley
Are you asking about selectively deleting data from a parallel file system (PFS) or destroying drives after removal from the system either due to failure or system decommissioning? For the latter, DOE does not allow us to send any non-volatile media offsite once it has had user data on it. When we

Re: [Beowulf] AMD and AVX512

2021-06-16 Thread Scott Atchley
On Wed, Jun 16, 2021 at 1:15 PM Prentice Bisbal via Beowulf < beowulf@beowulf.org> wrote: > Did anyone else attend this webinar panel discussion with AMD hosted by > HPCWire yesterday? It was titled "AMD HPC Solutions: Enabling Your > Success in HPC" > > https://www.hpcwire.com/amd-hpc-solutions-e

Re: [Beowulf] Project Heron at the Sanger Institute [EXT]

2021-02-04 Thread Scott Atchley
On Thu, Feb 4, 2021 at 9:23 AM Jörg Saßmannshausen < sassy-w...@sassy.formativ.net> wrote: > One of the things I heard a few times is the use of GPUs for the analysis. > Is > that something you are doing as well? ORNL definitely is. We were the first to contribute cycles to the COVID-19 HPC Cons

Re: [Beowulf] Julia on POWER9?

2020-10-16 Thread Scott Atchley
% hostname -f login1.summit.olcf.ornl.gov % module avail |& grep julia forge/19.0.4 ibm-wml-ce/1.6.1-1 julia/1.4.2 (E) ppt/2.4.0-beta2 (D) vampir/9.5.0 (D) [atchley@login1]~ % module avail julia
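For reference, loading and sanity-checking that module would look roughly like the lines below; a sketch only, since module names and versions on Summit have certainly changed since this was posted.
$ module load julia/1.4.2
$ julia -e 'println(VERSION)'   # should report 1.4.2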

Re: [Beowulf] Best case performance of HPL on EPYC 7742 processor ...

2020-08-17 Thread Scott Atchley
I do not have any specific HPL hints. I would suggest setting the BIOS NUMA-nodes-per-socket option to 4 (NPS4). I would try running 16 processes, one per CCX (two per CCD), with an OpenMP depth of 4. Dell's HPC blog has a few articles on tuning Rome: https://www.dell.com/support/article/en-us/sln319015
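A rough launch line for that layout (16 ranks, one per CCX, 4 OpenMP threads each) might look like the following. This is a sketch only: it assumes Open MPI's mpirun syntax and an xhpl binary linked against an OpenMP-threaded BLAS, and the exact map-by/bind flags differ between MPI versions.
$ export OMP_NUM_THREADS=4
$ mpirun -np 16 --map-by ppr:16:node:pe=4 --bind-to core \
    -x OMP_NUM_THREADS ./xhpl   # one rank per CCX, four cores per rank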

Re: [Beowulf] Power per area

2020-03-11 Thread Scott Atchley
1.035 in Perth :) > > Come and have a look at our Houston DC :) > > > > On Wed, Mar 11, 2020 at 3:37 AM Scott Atchley > wrote: > >> Hi everyone, >> >> I am wondering whether immersion cooling makes sense. We are most limited >> by datacenter floor spac

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
e to connect you with their founder/inventor. > > --Jeff > > On Tue, Mar 10, 2020 at 1:08 PM Scott Atchley > wrote: > >> Hi Jeff, >> >> Interesting, I have not seen this yet. >> >> Looking at their 52 kW rack's dimensions, it works out to 3.7 kW/

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
> I hope that helps a bit. > > Jörg > > On Tuesday, 10 March 2020, 20:26:18 GMT, David Mathog wrote: > > On Tue, 10 Mar 2020 15:36:42 -0400 Scott Atchley wrote: > > > To make the exercise even more fun, what is the weight per square foot > > > for > >

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
On Tue, Mar 10, 2020 at 4:26 PM David Mathog wrote: > On Tue, 10 Mar 2020 15:36:42 -0400 Scott Atchley wrote: > > > To make the exercise even more fun, what is the weight per square foot > > for > > immersion systems? Our data centers have a limit of 250 or 500 pounds

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
're > based here in San Diego. > > https://ddcontrol.com/ > > --Jeff > > On Tue, Mar 10, 2020 at 12:37 PM Scott Atchley > wrote: > >> Hi everyone, >> >> I am wondering whether immersion cooling makes sense. We are most limited >> by datacenter floor space

[Beowulf] Power per area

2020-03-10 Thread Scott Atchley
Hi everyone, I am wondering whether immersion cooling makes sense. We are most limited by datacenter floor space. We can manage to bring in more power (up to 40 MW for Frontier) and install more cooling towers (ditto), but we cannot simply add datacenter space. We have asked to build new building

Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-17 Thread Scott Atchley
Hi Jim, While we allow both batch and interactive, the scheduler handles them the same. The scheduler uses queue time, node count, requested wall time, project id, and others to determine when items run. We have backfill turned on so that when the scheduler allocates a large job and the time to dr

Re: [Beowulf] HPC demo

2020-01-14 Thread Scott Atchley
s on a project in Oak Ridge. Was that it? > > > > John McCulloch | PCPC Direct, Ltd. | desk 713-344-0923 > > > > *From:* Scott Atchley > *Sent:* Tuesday, January 14, 2020 7:19 AM > *To:* John McCulloch > *Cc:* beowulf@beowulf.org > *Subject:* Re: [Beowulf] HPC de

Re: [Beowulf] HPC demo

2020-01-14 Thread Scott Atchley
We still have Tiny Titan even though Titan is gone. It allows users to toggle processors on and off and the display has a mode where the "water" is color coded by the processor, which has a corresponding light. You can see the frame rate go up as you add processors a

Re: [Beowulf] traverse @ princeton

2019-10-11 Thread Scott Atchley
es of > v3, we get full EDR to both CPU sockets. > > Bill > > On 10/10/19 12:57 PM, Scott Atchley wrote: > > That is better than 80% peak, nice. > > > > Is it three racks of 15 nodes? Or two racks of 18 and 9 in the third > rack? > > > > You went with

Re: [Beowulf] traverse @ princeton

2019-10-10 Thread Scott Atchley
That is better than 80% peak, nice. Is it three racks of 15 nodes? Or two racks of 18 and 9 in the third rack? You went with a single-port HCA per socket and not the shared, dual-port HCA in the shared PCIe slot? On Thu, Oct 10, 2019 at 8:48 AM Bill Wichser wrote: > Thanks for the kind words.

[Beowulf] Exascale Day (10/18 aka 10^18)

2019-10-04 Thread Scott Atchley
Cray is hosting an online panel with speakers from ANL, LLNL, ORNL, ECP, and Cray on Oct. 18: https://www.cray.com/resources/exascale-day-panel-discussion ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscri

Re: [Beowulf] [EXTERNAL] Re: HPE completes Cray acquisition

2019-09-27 Thread Scott Atchley
Cray: This one goes up to 10^18 On Fri, Sep 27, 2019 at 12:08 PM Christopher Samuel wrote: > On 9/27/19 7:40 AM, Lux, Jim (US 337K) via Beowulf wrote: > > > “A HPE company” seems sort of bloodless and corporate. I would kind of > > hope for something like “CRAY – How Fast Do You Want to Go?” o

Re: [Beowulf] HPE completes Cray acquisition

2019-09-25 Thread Scott Atchley
These companies complement each other. HPE is working on some very cool technologies, and their purchasing power should help reduce costs. Cray has experience with leadership-scale systems, several generations of HPC interconnects, and optimizing scientific software. We are waiting to find out

Re: [Beowulf] Titan is no more

2019-08-05 Thread Scott Atchley
111_titan-ornl-olcf-activity-6563128357154762753-ce3m > > > On Sun, Aug 4, 2019 at 1:19 PM Scott Atchley > wrote: > >> Hi everyone, >> >> Titan completed its last job Friday and was powered down at 1 pm. I >> imagine the room was a lot quieter after that.

[Beowulf] Titan is no more

2019-08-04 Thread Scott Atchley
Hi everyone, Titan completed its last job Friday and was powered down at 1 pm. I imagine the room was a lot quieter after that. Once Titan and the other systems in the room are removed, work will begin on putting in the new, stronger floor that will hold Frontier. Scott __

Re: [Beowulf] HPE to acquire Cray

2019-05-20 Thread Scott Atchley
Geez, I take one day of vacation and this happens. My phone was lit up all day. On Fri, May 17, 2019 at 1:20 AM Kilian Cavalotti < kilian.cavalotti.w...@gmail.com> wrote: > > https://www.bloomberg.com/news/articles/2019-05-17/hp-enterprise-said-to-near-deal-to-buy-supercomputer-maker-cray-jvrfiu7

Re: [Beowulf] LFortran ... a REPL/Compiler for Fortran

2019-03-25 Thread Scott Atchley
Hmm, how does this compare to Flang? On Sun, Mar 24, 2019 at 12:33 PM Joe Landman wrote: > See https://docs.lfortran.org/ . Figured Jeff Layton would like this :D > > > -- > Joe Landman > e: joe.land...@gmail.com > t: @hpcjoe > w: https://scalability.o

Re: [Beowulf] Large amounts of data to store and process

2019-03-13 Thread Scott Atchley
I agree with your take about slower progress on the hardware front and that software has to improve. DOE funds several vendors to do research to improve technologies that will hopefully benefit HPC, in particular, as well as the general market. I am reviewing a vendor's latest report on micro-archi

Re: [Beowulf] Introduction and question

2019-02-23 Thread Scott Atchley
Yes, you belong. :-) On Sat, Feb 23, 2019 at 9:41 AM Will Dennis wrote: > Hi folks, > > > > I thought I’d give a brief introduction, and see if this list is a good > fit for my questions that I have about my HPC-“ish” infrastructure... > > > > I am a ~30yr sysadmin (“jack-of-all-trades” type), c

Re: [Beowulf] Simulation for clusters performance

2019-01-04 Thread Scott Atchley
You may also want to look at Sandia's Structural Simulation Toolkit and Argonne's CODES. On Thu, Jan 3, 2019 at 6:26 PM Benson Muite wrote: > There are a number of tools. A possible starting point is: > > http://spcl.inf.ethz.ch/

Re: [Beowulf] If you can help ...

2018-11-09 Thread Scott Atchley
Done and I reposted your request on LinkedIn as well. On Fri, Nov 9, 2018 at 8:28 AM Douglas Eadline wrote: > > Everyone: > > This is a difficult email to write. For years we (Lara Kisielewska, > Tim Wilcox, Don Becker, myself, and many others) have organized > and staffed the Beowulf Bash each

Re: [Beowulf] If I were specifying a new custer...

2018-10-11 Thread Scott Atchley
What do your apps need? • Lots of memory? Perhaps Power9 or Naples with 8 memory channels? Also, Cavium ThunderX2. • More memory bandwidth? Same as above. • Max single thread performance? Intel or Power9? • Are your apps GPU enabled? If not, do you have budget/time/expertise to do the work?

Re: [Beowulf] New Spectre attacks - no software mitigation - what impact for HPC?

2018-07-17 Thread Scott Atchley
seems to have got mixed in there also! > The main thrust dual ARM based and RISC-V > > Also I like the plexiglass air shroud pictured at Barcelona. I saw > something similar at the HPE centre in Grenoble. > Damn good idea. > > > > > > > > On 17 July 2018 at

Re: [Beowulf] New Spectre attacks - no software mitigation - what impact for HPC?

2018-07-17 Thread Scott Atchley
Hi Chris, They say that no announced silicon is vulnerable. Your link makes it clear that no ISA is immune if the implementation performs speculative execution. I think your point about two lines of production may make sense. Vendors will have to assess vulnerabilities and the performance trade-of

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-10 Thread Scott Atchley
On Sun, Jun 10, 2018 at 4:53 AM, Chris Samuel wrote: > On Sunday, 10 June 2018 1:22:07 AM AEST Scott Atchley wrote: > > > Hi Chris, > > Hey Scott, > > > We have looked at this _a_ _lot_ on Titan: > > > > A Multi-faceted Approach to Job Placement for Im

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-09 Thread Scott Atchley
Hi Chris, We have looked at this _a_ _lot_ on Titan: A Multi-faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems https://ieeexplore.ieee.org/document/7877165/ The issue we have is small jobs "inside" large jobs interfering with the larger jobs. The item that is

Re: [Beowulf] HPC Systems Engineer Positions

2018-06-01 Thread Scott Atchley
We have three HPC Systems Engineer positions open in the Technology Integration group within the National Center for Computational Science at ORNL. All are available from http://jobs.ornl.gov. On Fri, Jun 1, 2018 at 9:20 AM, Mahmood Sayed wrote: > Hello fellow HPC community. > > I have potential

Re: [Beowulf] Heterogeneity in a tiny (two-system cluster)?

2018-02-16 Thread Scott Atchley
If it is memory bandwidth limited, you may want to consider AMD's EPYC which has 33% more bandwidth. On Fri, Feb 16, 2018 at 3:41 AM, John Hearns via Beowulf < beowulf@beowulf.org> wrote: > Oh, and while you are at it. > DO a bit of investigation on how the FVCOM model is optimised for use with >

Re: [Beowulf] Intel kills Knights Hill, Xeon Phi line "being revised"

2017-11-18 Thread Scott Atchley
sense. If these research > projects were a start-up, it would have failed hard. > > [1] https://en.wikipedia.org/wiki/X87 > > > > On Sat, Nov 18, 2017 at 8:50 PM, Scott Atchley > wrote: > >> Hmm, can you name a large processor vendor who has not accepted US >> g

Re: [Beowulf] Intel kills Knights Hill, Xeon Phi line "being revised"

2017-11-18 Thread Scott Atchley
Hmm, can you name a large processor vendor who has not accepted US government research funding in the last five years? See DOE's FastForward, FastForward2, DesignForward, DesignForward2, and now PathForward. On Fri, Nov 17, 2017 at 9:18 PM, Jonathan Engwall < engwalljonathanther...@gmail.com> wrot

Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-13 Thread Scott Atchley
Are you logging something that goes to the disk in the local case but competes for network bandwidth when NFS mounting? On Wed, Sep 13, 2017 at 2:15 PM, Scott Atchley wrote: > Are you swapping? > > On Wed, Sep 13, 2017 at 2:14 PM, Andrew Latham wrote: > >> ack, so mayb

Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-13 Thread Scott Atchley
Are you swapping? On Wed, Sep 13, 2017 at 2:14 PM, Andrew Latham wrote: > ack, so maybe validate you can reproduce with another nfs root. Maybe a > lab setup where a single server is serving nfs root to the node. If you > could reproduce in that way then it would give some direction. Beyond that
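As an aside, a quick way to check that on the affected node with standard procps tools:
$ free -h        # any swap in use is a warning sign
$ vmstat 1 5     # non-zero si/so columns mean active swapping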

Re: [Beowulf] Poor bandwith from one compute node

2017-08-17 Thread Scott Atchley
I would agree that the bandwidth points at 1 GigE in this case. For IB/OPA cards running slower than expected, I would recommend ensuring that they are negotiating the correct number of PCIe lanes. On Thu, Aug 17, 2017 at 12:35 PM, Joe Landman wrote: > > > On 08/17/2017 12:00 PM, Faraz Hussain wrote:
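A hedged sketch of checking the negotiated link width and speed for such a card; the PCI address 81:00.0 is purely illustrative.
$ sudo lspci -vv -s 81:00.0 | grep -E 'LnkCap|LnkSta'
  # LnkSta should match LnkCap (e.g. Speed 8GT/s, Width x16); a x8 or x4
  # link would explain lower-than-expected bandwidth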

Re: [Beowulf] Hyperthreading and 'OS jitter'

2017-07-22 Thread Scott Atchley
I would imagine the answer is "It depends". If the application uses the per-CPU caches effectively, then performance may drop when HT shares the cache between the two processes. We are looking at reserving a couple of cores per node on Summit to run system daemons if the user requests it. If the user

Re: [Beowulf] Register article on Epyc

2017-06-22 Thread Scott Atchley
Hi Mark, I agree that these are slightly noticeable but they are far less than accessing a NIC on the "wrong" socket, etc. Scott On Thu, Jun 22, 2017 at 9:26 AM, Mark Hahn wrote: > But now, with 20+ core CPUs, does it still really make sense to have >> dual socket systems everywhere, with NUMA

Re: [Beowulf] Register article on Epyc

2017-06-21 Thread Scott Atchley
In addition to storage, if you use GPUs for compute, the single socket is compelling. If you rely on the GPUs for the parallel processing, then the CPUs are just for serial acceleration and handling I/O. A single socket with 32 cores and 128 lanes of PCIe can handle up to eight GPUs with four CPU c

Re: [Beowulf] Register article on Epyc

2017-06-21 Thread Scott Atchley
The single socket versions make sense for storage boxes that can use RDMA. You can have two EDR ports out the front using 16 lanes each. For the storage, you can have 32-64 lanes internally or out the back for NVMe. You even have enough lanes for two ports of HDR, when it is ready, and 48-64 lanes

Re: [Beowulf] Suggestions to what DFS to use

2017-02-15 Thread Scott Atchley
Hi Chris, Check with me in about a year. After using Lustre for over 10 years to initially serve ~10 PB of disk and now serve 30+ PB with very nice DDN gear, later this year we will be installing 320 PB (250 PB useable) of GPFS (via IBM ESS storage units) to support Summit, our next gen HPC syste

Re: [Beowulf] genz interconnect?

2016-10-12 Thread Scott Atchley
The Gen-Z site looks like it has a detailed FAQ. The CCIX FAQ is a little more sparse. The ARM link you posted is a good overview. On Wed, Oct 12, 2016 at 8:11 AM, Michael Di Domenico wrote: > anyone have any info on this? there isn't much out there on the web. > the arm.com link has more detai

Re: [Beowulf] Parallel programming for Xeon Phis

2016-08-24 Thread Scott Atchley
On Wed, Aug 24, 2016 at 4:54 PM, Greg Lindahl wrote: > On Wed, Aug 24, 2016 at 04:44:03PM +, John Hearns wrote: > > > OK, I guess that the state of the art for a FORTRAN Compiler in the > > 60s is pitiful compared to the sophisticated compilers we have > > today. > > Actually, Fortran was des

Re: [Beowulf] AMD cards with integrated SSD slots

2016-07-27 Thread Scott Atchley
None have AMD CPUs? Number three Titan has AMD Interlagos CPUs and NVIDIA GPUs. Given that the Fiji can access HBM at 512 GB/s, accessing NVM at 4 GB/s will feel rather slow albeit much better than 1-2 GB/s connected to the motherboard's PCIe. On Wed, Jul 27, 2016 at 5:53 PM, Brian Oborn wrote:

Re: [Beowulf] NFS HPC survey results.

2016-07-22 Thread Scott Atchley
Did you mean IB over Ethernet (IBoE)? I thought IB over IP has been around long before RoCE. On Thu, Jul 21, 2016 at 7:34 PM, Christopher Samuel wrote: > Thanks so much Bill, very much appreciated! > > On 21/07/16 09:19, Bill Broadley wrote: > > > 15) If IB what transport (10 responses) > >

[Beowulf] Anyone using Apache Mesos?

2015-11-11 Thread Scott Atchley
Someone asked me and I said I would ask around. Thanks, Scott ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Re: [Beowulf] Semour Cray 90th Anniversary

2015-10-14 Thread Scott Atchley
On Wed, Oct 14, 2015 at 3:58 PM, Prentice Bisbal < prentice.bis...@rutgers.edu> wrote: > On 10/03/2015 01:54 PM, Nathan Pimental wrote: > > Very nice article. Are cray computers still made, and how popular are > they? How pricey are they? :) > > > Yes, Argonne National Lab (ANL) announced in April

Re: [Beowulf] Accelio

2015-08-20 Thread Scott Atchley
They are using this as a basis for the XioMessenger within Ceph to get RDMA support. On Thu, Aug 20, 2015 at 9:24 AM, John Hearns wrote: > I saw this mentioned on the Mellanox site. Has anyone come across it: > > > > http://www.accelio.org/ > > > > Looks interesting. > > > > > > > > Dr. John Hea

[Beowulf] China aims for 100 PF

2015-07-17 Thread Scott Atchley
They will use a homegrown GPDSP (general purpose DSP) accelerator in lieu of the Intel Knights Landing accelerators: http://www.theplatform.net/2015/07/15/china-intercepts-u-s-intel-restrictions-with-homegrown-supercomputer-chips/ Also, hints about their interconnect and file system upgrades. Sc

Re: [Beowulf] Paper describing Google's queuing system "Borg"

2015-04-21 Thread Scott Atchley
Is Omega the successor? The Borg paper mentions Omega: Omega [69] supports multiple parallel, specialized "verticals" that are each roughly equivalent to a Borgmaster minus its persistent store and link shards. Omega schedulers use optimistic concurrency control to manipulate a shared repre-

Re: [Beowulf] CephFS

2015-04-09 Thread Scott Atchley
No, but you might find this interesting: http://dl.acm.org/citation.cfm?id=2538562 On Thu, Apr 9, 2015 at 11:24 AM, Tom Harvill wrote: > > Hello, > > Question: is anyone on this list using CephFS in 'production'? If so, > what are you using > it for (ie. scratch/tmp, archive, homedirs)? In ou

Re: [Beowulf] interesting article on HPC vs evolution of 'big data' analysis

2015-04-09 Thread Scott Atchley
On Wed, Apr 8, 2015 at 9:56 PM, Greg Lindahl wrote: > On Wed, Apr 08, 2015 at 03:57:34PM -0400, Scott Atchley wrote: > > > There is concern by some and outright declaration by others (including > > hardware vendors) that MPI will not scale to exascale due to issues like >

Re: [Beowulf] interesting article on HPC vs evolution of 'big data' analysis

2015-04-08 Thread Scott Atchley
There is concern by some and outright declaration by others (including hardware vendors) that MPI will not scale to exascale due to issues like rank state growing too large for 10-100 million endpoints, lack of reliability, etc. Those that make this claim then offer up their favorite solution (a PG

Re: [Beowulf] Mellanox Multi-host

2015-03-11 Thread Scott Atchley
Looking at this and the above link: http://www.mellanox.com/page/press_release_item?id=1501 It seems that the OCP Yosemite is a motherboard that allows four compute cards to be plugged into it. The compute cards can even have different CPUs (x86, ARM, Power). The Yosemite board has the NIC and co

[Beowulf] Summit

2014-11-14 Thread Scott Atchley
This is what's next: https://www.olcf.ornl.gov/summit/ Scott ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Re: [Beowulf] 10Gb/s iperf test point (TCP) available ?

2010-10-15 Thread Scott Atchley
On Oct 14, 2010, at 10:37 PM, Christopher Samuel wrote: > Apologies if this is off topic, but I'm trying to check > what speeds the login nodes to our cluster and BlueGene > can talk at and the only 10Gb/s iperf server I've been > given access to so far (run by AARNET) showed me just > under 1Gb/s
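For two hosts you control, the same kind of measurement is a one-liner on each end; a sketch assuming classic iperf2, with the hostname a placeholder.
server$ iperf -s                         # listen on the far end
client$ iperf -c login-node -P 4 -t 30   # four parallel TCP streams for 30 seconds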

Re: [Beowulf] 48-port 10gig switches?

2010-09-02 Thread Scott Atchley
On Sep 2, 2010, at 12:58 PM, David Mathog wrote: > A lot of 1 GbE switches use around 15W/port so I thought 10 GbE switches > would be real fire breathers. It doesn't look that way though, the > power consumption cited here: > > http://www.voltaire.com/NewsAndEvents/Press_Releases/press2010/Volt

Re: [Beowulf] OT: recoverable optical media archive format?

2010-06-10 Thread Scott Atchley
On Jun 10, 2010, at 3:20 PM, David Mathog wrote: > Jesse Becker and others suggested: > >>http://users.softlab.ntua.gr/~ttsiod/rsbep.html > > I tried it and it works, mostly, but definitely has some warts. > > To start with I gave it a negative control - a file so badly corrupted > it shoul

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-24 Thread Scott Atchley
On Feb 23, 2010, at 6:16 PM, Brice Goglin wrote: > Greg Lindahl wrote: >>> now that I'm inventorying ignorance, I don't really understand why RDMA >>> always seems to be presented as a big hardware issue. wouldn't it be >>> pretty easy to define an eth or IP-level protocol to do remote puts, >>

Re: [Beowulf] which mpi library should I focus on?

2010-02-23 Thread Scott Atchley
On Feb 20, 2010, at 1:49 PM, Paul Johnson wrote: What are the reasons to prefer one or the other? Why choose? You can install both and test with your application to see if there is a performance difference (be sure to keep your runtime environment paths correct - don't mix libraries and MP
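One hedged way to keep two MPI installs cleanly separated while testing, assuming side-by-side prefixes (the /opt paths are illustrative):
$ export PATH=/opt/openmpi/bin:$PATH
$ export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
$ which mpicc mpirun            # both should resolve to the same prefix
$ ldd ./your_app | grep -i mpi  # confirm the binary picks up the intended library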

Re: [Beowulf] Performance tuning for Jumbo Frames

2009-12-15 Thread Scott Atchley
On Dec 14, 2009, at 12:57 PM, Alex Chekholko wrote: Set it as high as you can; there is no downside except ensuring all your devices are set to handle that large unit size. Typically, if the device doesn't support jumbo frames, it just drops the jumbo frames silently, which can result in odd
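A hedged sketch of raising the MTU and verifying that the whole path honors it; the interface and peer names are placeholders.
$ sudo ip link set dev eth0 mtu 9000
$ ping -M do -s 8972 peer-host   # 8972 B payload + 28 B headers = 9000; this fails
                                 # if any device on the path drops jumbo frames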

Re: [Beowulf] Re: scalability

2009-12-10 Thread Scott Atchley
On Dec 10, 2009, at 9:56 AM, Jörg Saßmannshausen wrote: I have heard of Open-MX before, do you need special hardware for that? No, any Ethernet driver on Linux. http://open-mx.org Scott ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Pen

Re: [Beowulf] mpd ..failed ..!

2009-11-16 Thread Scott Atchley
On Nov 14, 2009, at 7:24 AM, Zain elabedin hammade wrote: I installed mpich2 - 1.1.1-1.fc11.i586.rpm . You should ask this on the mpich list at: https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss I wrote on every machine : mpd & mpdtrace -l You started stand-alone MPD rings of size
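For reference, a single shared ring with the old mpd process manager was started from one machine only, roughly as below (a sketch; mpd has long since been replaced by Hydra in MPICH).
$ mpdboot -n 4 -f mpd.hosts   # one ring spanning the 4 hosts listed in mpd.hosts
$ mpdtrace -l                 # should now list all 4 machines, not just one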

Re: [Beowulf] Re: Ahoy shipmates

2009-10-13 Thread Scott Atchley
On Oct 13, 2009, at 12:06 PM, David Mathog wrote: A sprinkler leak in a computer room is bad, but just imagine the damage that seawater would do. Even a fine mist would be dreadful, as it would be sucked through the cases and the droplets would either short things out immediately, or lead i

Re: [Beowulf] large scratch space on cluster

2009-09-29 Thread Scott Atchley
On Sep 29, 2009, at 1:13 PM, Scott Atchley wrote: On Sep 29, 2009, at 10:09 AM, Jörg Saßmannshausen wrote: However, I was wondering whether it does make any sense to somehow 'export' that scratch space to other nodes (4 cores only). So, the idea behind that is, if I need a vast

Re: [Beowulf] large scratch space on cluster

2009-09-29 Thread Scott Atchley
On Sep 29, 2009, at 10:09 AM, Jörg Saßmannshausen wrote: However, I was wondering whether it does make any sense to somehow 'export' that scratch space to other nodes (4 cores only). So, the idea behind that is, if I need a vast amount of scratch space, I could use the one in the 8 core nod

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Scott Atchley
On Jun 29, 2009, at 1:44 PM, Scott Atchley wrote: Right, and that's what I did before, with sensible results I thought. Repeating it now on Centos 5.2 and OpenSuSE 10.3, it doesn't behave sensibly, and I don't know what's different from the previous SuSE results apart, prob

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Scott Atchley
On Jun 29, 2009, at 12:10 PM, Dave Love wrote: When I test Open-MX, I turn interrupt coalescing off. I run omx_pingpong to determine the lowest latency (LL). If the NIC's driver allows one to specify the interrupt value, I set it to LL-1. Right, and that's what I did before, with sensible r

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Scott Atchley
On Jun 29, 2009, at 5:59 AM, Dave Love wrote: Can you say something about any tuning you did to get decent results? To get the lowest latency, turn off rx interrupt coalescence, either with ethtool or module parameters, depending on the driver. Of course, you may not want to turn it off co
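A hedged example of doing that with ethtool; which of these knobs exist depends on the NIC driver.
$ ethtool -c eth0                                   # show current coalescing settings
$ sudo ethtool -C eth0 adaptive-rx off rx-usecs 0   # disable rx interrupt coalescing
                                                    # (some drivers want rx-frames 1 instead)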

Re: [Beowulf] 10 GbE

2009-02-11 Thread Scott Atchley
On Feb 11, 2009, at 7:57 AM, Igor Kozin wrote: Hello everyone, we are embarking on evaluation of 10 GbE for HPC and I was wondering if someone has already had experience with Arista 7148SX 48 port switch or/and Netxen cards. General pros and cons would be greatly appreciated and in particu

Re: [Beowulf] tcp error: Need ideas!

2009-01-25 Thread Scott Atchley
On Jan 25, 2009, at 10:13 AM, Gerry Creager wrote: -bash-3.2# ethtool -K rx off no offload settings changed You missed the interface here. You should try: -bash-3.2# ethtool -K eth1 rx off -bash-3.2# ethtool -k eth1 Offload parameters for eth1: rx-checksumming: on tx-checksumming: on scatte

Re: [Beowulf] Odd SuperMicro power off issues

2008-12-08 Thread Scott Atchley
Hi Chris, We had a customer with Opterons experience reboots with nothing in the logs, etc. The only thing we saw with "ipmitool sel list" was: 1 | 11/13/2007 | 10:49:44 | System Firmware Error | We traced it to a HyperTransport deadlock, which by default reboots the node. Our engineer fou

Re: [Beowulf] Security issues

2008-10-27 Thread Scott Atchley
On Oct 25, 2008, at 11:17 PM, Marian Marinov wrote: Also a good security addition will be adding SELinux, RSBAC or GRSecurity to the kernel and actually using any of these. Bear in mind that there may be performance trade-offs. Enabling SELinux will cut 2 Gb/s off a 10 Gb/s link as measur

Re: [Beowulf] Has DDR IB gone the way of the Dodo?

2008-10-03 Thread Scott Atchley
On Oct 3, 2008, at 2:24 PM, Bill Broadley wrote: QDR over fiber should be "reasonably priced", here's hoping that the days of Myrinet 250MB/sec optical cables will return. Corrections/comments welcome. I am not in sales and I have no access to pricing besides our list prices, but I am tol

Re: [Beowulf] scratch File system for small cluster

2008-09-25 Thread Scott Atchley
On Sep 25, 2008, at 10:19 AM, Joe Landman wrote: We have measured NFSoverRDMA speeds (on SDR IB at that) at 460 MB/s, on an RDMA adapter reporting 750 MB/s (in a 4x PCIe slot, so ~860 MB/ s max is what we should expect for this). Faster IB hardware should result in better performance, thoug

Re: [Beowulf] 10gig CX4 switches

2008-09-15 Thread Scott Atchley
On Sep 15, 2008, at 8:38 PM, Joe Landman wrote: Greg Lindahl wrote: I have a bunch of 1gig switches with CX4 10gig uplinks (and empty X2 ports) and it's time to buy a 10gig switch. Has anyone done a recent survey of the market? I don't need any layer-3 features, just layer-2. I see that HP h

Re: [Beowulf] Gigabit Ethernet and RDMA

2008-08-11 Thread Scott Atchley
Hi Gus, Are you trying to find software for NICs you currently have? Or are you looking for gigabit Ethernet NICs that natively support some form of kernel-bypass/zero-copy? I do not know of any of the latter (do Chelsio or others offer 1G NICs with iWarp?). As for the former, there are

Re: [Beowulf] Building new cluster - estimate

2008-08-10 Thread Scott Atchley
On Aug 5, 2008, at 10:43 PM, Joe Landman wrote: As a note: I was pointed to a recent lockup (double lock acquisition) in XFS with NFS. I don't think I have seen this one in the wild myself. Right now I am fighting an NFS over RDMA crash in 2.6.26 which seems to have been cured in 2.6.26.

Re: [Beowulf] copying big files (Henning Fehrmann)

2008-08-10 Thread Scott Atchley
On Aug 10, 2008, at 7:57 AM, Scott Atchley wrote: You may want to look at http://loci.cs.utk.edu. If you need to distribute large files within a cluster or across the WAN, you can use the LoRS tools to stripe the file over multiple servers and the clients then try pulling blocks off of

Re: [Beowulf] copying big files (Henning Fehrmann)

2008-08-10 Thread Scott Atchley
On Aug 9, 2008, at 5:03 PM, Reuti wrote: Hi, Am 09.08.2008 um 20:53 schrieb jitesh dundas: We could try and implement this functionality of resuming broken downloads like in some softwares like Download Accelerator and bit-torrent. I hope my views can help, so here goes:- When a file is bei

Re: [Beowulf] Roadrunner picture

2008-07-16 Thread Scott Atchley
On Jul 16, 2008, at 6:50 PM, John Hearns wrote: On Wed, 2008-07-16 at 23:29 +0100, John Hearns wrote: To answer your question more directly, Panasas is a storage cluster to complement your compute cluster. Each storage blade is connected into a shelf (chassis) with an internal ethernet net

Re: [Beowulf] automount on high ports

2008-07-02 Thread Scott Atchley
On Jul 2, 2008, at 10:09 AM, Gerry Creager wrote: Although I believe Lustre's robustness is very good these days, I do not believe that it will work in your setting. I think that they currently do not recommend mounting a client on a node that is also working as a server as you are doin

Re: [Beowulf] automount on high ports

2008-07-02 Thread Scott Atchley
On Jul 2, 2008, at 7:22 AM, Carsten Aulbert wrote: Bogdan Costescu wrote: Have you considered using a parallel file system ? We looked a bit into a few, but would love to get any input from anyone on that. What we found so far was not really convincing, e.g. glusterFS at that time was no

Re: [Beowulf] How Can Microsoft's HPC Server Succeed?

2008-04-03 Thread Scott Atchley
On Apr 3, 2008, at 3:52 PM, Kyle Spaans wrote: On Wed, Apr 2, 2008 at 7:39 PM, Chris Dagdigian <[EMAIL PROTECTED]> wrote: spew out a terabyte per day of raw data and many times that stuff needs to be post processed and distilled down into different forms. A nice little 8-core box running a s
