We ran a number of apps on evaluation systems before determining that the 96-core Intel systems provided the best results. I was responsible for running the HPL and HPCG benchmarks. We had researchers run their various simulation codes.

I agree with just about everything you said, especially (1) - I think turbo frequencies are irrelevant for HPC, since all the cores will typically be pinned during an HPC job. When I calculated the theoretical FLOPS for the evaluation systems, I had to look at the AVX-512 frequencies for the Intel processors, since that is yet another operating frequency. To Intel's credit, the different CPU frequencies, and the conditions under which they are invoked, were documented online for all but their newest processors (probably just hadn't been published yet), whereas I couldn't find any frequency-stepping information for the AMDs.
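For illustration, here is the kind of back-of-the-envelope peak calculation I mean; the socket count, core count, AVX-512 base frequency, and FLOPs/cycle below are placeholder values (assuming two AVX-512 FMA units per core), not the numbers for the actual evaluation systems:

# Theoretical peak double-precision FLOPS for one node (illustrative values only)
sockets = 4                 # quad-socket node
cores_per_socket = 24       # 96 cores/node total
avx512_base_ghz = 2.2       # AVX-512 all-core base frequency (hypothetical)
flops_per_cycle = 32        # 2 AVX-512 FMA units x 8 doubles x 2 ops per FMA

peak_gflops = sockets * cores_per_socket * avx512_base_ghz * flops_per_cycle
print(f"Theoretical peak: {peak_gflops / 1000:.2f} TFLOPS")   # ~6.76 TFLOPS

The point being that plugging in the turbo frequency instead of the AVX-512 base frequency would overstate the peak considerably.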

I often point out to people that clock frequencies have been decreasing as core counts go up. I remember that in the early 2000s, CPUs with a clock of ~3.5 GHz were pretty common. Now it seems most processors have a base clock below 3 GHz and only go above that in "turbo mode".

Where I disagree with you is (3). Whether or not cache size is important depends on the size of the job. If you're iterating through data-parallel loops over a large dataset that exceeds the cache size, the opportunity to reread cached data is probably limited or nonexistent. As we often say here, "it depends". I'm sure someone with better low-level hardware knowledge will pipe in and tell me why I'm wrong (Cunningham's Law).
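As a rough illustration of the "it depends" point, here is a quick working-set check in Python (the array length and cache size are made-up numbers):

# Does the working set of a data-parallel sweep fit in the last-level cache?
n_elements = 50_000_000          # doubles touched per sweep (hypothetical)
bytes_per_element = 8
working_set_mib = n_elements * bytes_per_element / 2**20   # ~381 MiB

l3_cache_mib = 256               # total L3 on the socket (hypothetical)
if working_set_mib > l3_cache_mib:
    print(f"{working_set_mib:.0f} MiB working set vs {l3_cache_mib} MiB L3: "
          "mostly streaming from memory, little chance to reread cached data")
else:
    print("working set fits in cache, so cache size and data reuse matter much more")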

Prentice

On 9/14/21 1:34 PM, Douglas Eadline wrote:
Here are the questions I am curious about. High core count is great,
if it works for your performance goals.

1. Clock speed: All the turbo stuff is great for a low
number of processes, but if you load all the cores, you are
now running at the base clock speed, which, given the large number
of cores and the thermal envelope, is often not that fast.

2. Memory BW: Take the number of memory channels, multiply by
the per-channel memory bandwidth, and divide by the number of cores
(see the sketch after this list). That is of course a worst-case
BW/core; even so, memory-hungry apps may not be able to use all the cores.

3. Cache size: same idea as memory BW; why do you think
things like AMD 3D V-Cache will be landing very soon?
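
A quick back-of-the-envelope version of item 2, just to show the arithmetic (the channel count, transfer rate, and core count are hypothetical):

# Worst-case memory bandwidth per core
channels = 8                  # memory channels per socket (hypothetical)
transfer_rate_mts = 3200      # DDR4-3200, in MT/s
bytes_per_transfer = 8        # 64-bit channel
cores = 64                    # cores per socket (hypothetical)

socket_bw_gbs = channels * transfer_rate_mts * bytes_per_transfer / 1000   # 204.8 GB/s
print(f"{socket_bw_gbs:.1f} GB/s per socket, {socket_bw_gbs / cores:.1f} GB/s per core worst case")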

IMO fat-core processors are designed for, and work best with,
cloud applications: shared use, bursty workloads, containers.
HPC tends to light everything up at once for
long periods of time.

--
Doug



Not anymore, at least not in the HPC realm.  We recently purchased
quad-socket systems with a total of 96 Intel cores/node, and dual socket
systems with 128 AMD cores/node.

With Intel now marketing their "highly scalable" (or something like that)
line of processors, and AMD, who was always pushing higher core counts,
back in the game, I think numbers like that will be common in HPC
clusters purchased in the next year or so.

But, yeah, I guess 28 physical cores is more than the average desktop
has these days.


Prentice

On 8/24/21 6:42 PM, Jonathan Engwall wrote:
EMC offers dual-socket systems with 28 physical cores per processor. That's
a lot of computer.

On Tue, Aug 24, 2021, 1:33 PM Lux, Jim (US 7140) via Beowulf
<beowulf@beowulf.org> wrote:

     Yes, indeed.. I didn't call out Limulus, because it was mentioned
     earlier in the thread.

     And another reason why you might want your own.
     Every so often, the notice from JPL's HPC goes out to the users -
     "Halo/Gattaca/clustername will not be available because it is
     reserved for Mars {Year}"  While Mars landings at JPL are a *big
     deal*, not everyone is working on them (in fact, by that time,
     most of the Martians are now working on something else), and you
     want to get your work done.  I suspect other institutional
     clusters have similar "the 800 pound (363 kg) gorilla has
     requested" scenarios.


     On 8/24/21, 11:34 AM, "Douglas Eadline" <deadl...@eadline.org> wrote:


         Jim,

          You are describing a lot of the design pathway for Limulus
          clusters. The local (non-data center) power, heat, and noise are
          all minimized while performance is maximized.

          A well decked out system is often less than $10K and
          is on par with a fat multi-core workstation
          (and there are reasons a clustered approach performs better).

          Another use case is where there is no available research data
          center hardware because there is no specialized
          sysadmin, space, or budget. (Many smaller colleges and
          universities fall into this group.) Plus, oftentimes, dropping
          something into a data center means an additional cost to the
          researcher's budget.

         --
         Doug


          > I've been looking at "small scale" clusters for a long time (2000?) and
          > talked a lot with the folks from Orion, as well as on this list.
          > They fit in a "hard to market to" niche.
         >
          > My own workflow tends to have use cases that are a bit "off-nominal" -
          > one is the rapid iteration of a computational model while experimenting -
          > That is, I have a python code that generates input to Numerical
          > Electromagnetics Code (NEC), I run the model over a range of parameters,
          > then look at the output to see if I'm getting what I want. If not, I
          > change the code (which essentially changes the antenna design), rerun the
          > models, and see if it worked.  I'd love an iteration time of "a minute or
          > two" for the computation, maybe a minute or two to plot the outputs
          > (fiddling with the plot ranges, etc.).  For reference, for a radio
          > astronomy array on the far side of the Moon, I was running 144 cases,
          > each at 380 frequencies: to run 1 case takes 30 seconds, so farming it
          > out to 12 processors gave me a 6 minute run time, which is in the right
          > range.  Another model of interaction of antennas on a spacecraft runs
          > about 15 seconds/case; and a third is about 120 seconds/case.
         >
         > To get "interactive development", then, I want the "cycle
     time" to be 10
         > minutes - 30 minutes of thinking about how to change the
     design and
         > altering the code to generate the new design, make a couple
     test runs to
         > find the equivalent of "syntax errors", and then turn it
     loose - get a cup
         > of coffee, answer a few emails, come back and see the
     results.  I could
         > iterate maybe a half dozen shots a day, which is pretty
     productive.
         > (Compared to straight up sequential - 144 runs at 30 seconds
     is more than
         > an hour - and that triggers a different working cadence that
     devolves to
         > sort of one shot a day) - The "10 minute" turnaround is also
     compatible
         > with my job, which, unfortunately, has things other than
     computing -
         > meetings, budgets, schedules.  At 10 minute runs, I can
     carve out a few
         > hours and get into that "flow state" on the technical
     problem, before
         > being disrupted by "a person from Porlock."
         >
          > So this is, I think, a classic example of "I want local control" - sure,
          > you might have access to a 1000 or more node cluster, but you're going to
          > have to figure out how to use its batch management system (SLURM and PBS
          > are two I've used) - and that's a bit different than "self managed 100%
          > access". Or, AWS kinds of solutions for EP problems.  There's something
          > very satisfying about getting an idea and not having to "ok, now I have
          > to log in to the remote cluster with TFA, set up the tunnel, move my
          > data, get the job spun up, get the data back" - especially for iterative
          > development.  I did do that using JPL's and TACC's clusters, and "moving
          > data" proved to be a barrier - the other thing was the "iterative code
          > development" in between runs - most institutional clusters discourage
          > interactive development on the cluster (even if you're only sucking up
          > one core).  If the tools were a bit more "transparent" and there were
          > "shared disk" capabilities, this might be more attractive, and while
          > everyone is exceedingly helpful, there are still barriers to making it
          > "run it on my desktop".
         >
          > Another use case that I wind up designing for is the "HPC in places
          > without good communications and limited infrastructure" - The notional
          > use case might be an archaeological expedition wanting to use HPC to
          > process ground penetrating radar data or something like that.  (or,
          > given that I work at JPL, you have a need for HPC on the surface of
          > Mars) - So sending your data to a remote cluster isn't really an option.
          > And here, the "speedup" you need might well be a factor of 10-20 over a
          > single computer, something doable in a "portable" configuration (check
          > it as luggage, for instance). Just as for my antenna modeling problems,
          > turning an "overnight" computation into a "10-20 minute" computation
          > would change the workflow dramatically.
         >
         >
          > Another market is "learn how to cluster" - for which the RPi clusters
          > work (or "packs" of Beagleboards) - they're fun, and in a classroom
          > environment, I think they are an excellent cost effective solution to
          > learning all the facets of "bringing up a cluster from scratch", but I'm
          > not convinced they provide a good "MIPS/Watt" or "MIPS/liter" metric -
          > in terms of convenience.  That is, rather than a cluster of 10 RPis, you
          > might be better off just buying a faster desktop machine.
         >
         > Let's talk design desirements/constraints
         >
          > I've had a chance to use some "clusters in a box" over the last decades,
          > and I'd suggest that while power is one constraint, another is noise.
          > Just the other day, I was in a lab and someone commented that "those
          > computers are amazingly fast, but you really need to put them in another
          > room". Yes, all those 1U and 2U rack mounted boxes with tiny fans
          > screaming are just not "office compatible".  And that kind of brings up
          > another interesting constraint for "deskside" computing - heat.  Sure
          > you can plug in 1500W of computers (or even 3000W if you have two
          > circuits), but can you live in your office with a 1500W space heater?
          > Interestingly, for *my* workflow, that's probably ok - *my* computation
          > has a 10-30% duty cycle - think for 30 minutes, compute for 5-10.  But
          > still, your office mate will appreciate it if you keep the sound level
          > down to 50 dBA.
         >
          > GPUs - some codes can use them, some can't.  They tend, though, to be
          > noisy (all that air flow for cooling). I don't know that GPU
          > manufacturers spend a lot of time on this.  Sure, I've seen charts and
          > specs that claim <50 dBA. But I think they're gaming the measurement,
          > counting on the user to be a gamer wearing headphones or with a big
          > sound system.  I will say, for instance, that the PS4 positively roars
          > when spun up unless you've got external forced ventilation to keep the
          > inlet air temp low.
         >
          > Looking at GSA guidelines for office space - if it's "deskside" it's got
          > to fit in the 50-80 square foot cubicle or your shared part of a 120
          > square foot office.
         >
          > Then one needs to figure out the "refresh cycle time" for buying
          > hardware - This has been a topic on this list forever - you have 2 years
          > of computation to do: do you buy N nodes today at speed X, or do you
          > wait a year, buy N/2 nodes at speed 4X, and finish your computation at
          > the same time?
         >
          > Fancy desktop PCs with monitors, etc. come in at under $5k, including
          > burdens and installation, but not including monthly service charges (in
          > an institutional environment).  If you look at "purchase limits" there
          > are some thresholds (usually around $10k, then increasing in factors of
          > 10 or 100 steps) for approvals.  So a $100k deskside box is going to be
          > a tough sell.
         >
         >
         >
          > On 8/24/21, 6:07 AM, "Beowulf on behalf of Douglas Eadline"
          > <beowulf-boun...@beowulf.org on behalf of deadl...@eadline.org> wrote:
         >
         >     Jonathan
         >
          >     It is a real cluster, available in 4 and 8 node versions.
          >     The design is for non-data center use. That is, local
          >     office, lab, home where power, cooling, and noise
          >     are important. More info here:
          >
          >     https://www.limulus-computing.com
          >     https://www.limulus-computing.com/Limulus-Manual
         >
         >     --
         >     Doug
         >
         >
         >
         >     > Hi Doug,
         >     >
          >     > Not to derail the discussion, but a quick question: you say desk-side
          >     > cluster; is it a single machine that will run a VM cluster?
         >     >
         >     > Regards,
         >     > Jonathan
         >     >
         >     > -----Original Message-----
          >     > From: Beowulf <beowulf-boun...@beowulf.org> On Behalf Of Douglas Eadline
          >     > Sent: 23 August 2021 23:12
          >     > To: John Hearns <hear...@gmail.com>
          >     > Cc: Beowulf Mailing List <beowulf@beowulf.org>
         >     > Subject: Re: [Beowulf] List archives
         >     >
         >     > John,
         >     >
         >     > I think that was on twitter.
         >     >
          >     > In any case, I'm working with these processors right now.
          >     >
          >     > On the new Ryzens, the power usage is actually quite tunable.
         >     > There are three settings.
         >     >
          >     > 1) Package Power Tracking: The PPT threshold is the allowed socket
          >     > power consumption permitted across the voltage rails supplying the
          >     > socket.
         >     >
          >     > 2) Thermal Design Current: The maximum current (TDC) (amps) that
          >     > can be delivered by a specific motherboard's voltage regulator
          >     > configuration in thermally-constrained scenarios.
         >     >
          >     > 3) Electrical Design Current: The maximum current (EDC) (amps)
          >     > that can be delivered by a specific motherboard's voltage regulator
          >     > configuration in a peak ("spike") condition for a short period of
          >     > time.
         >     >
          >     > My goal is to tweak the 105W-TDP R7-5800X so it draws power like
          >     > the 65W-TDP R5-5600X.
         >     >
          >     > This is desk-side cluster low-power stuff.
          >     > I am using extension cable-plugs for Limulus blades that have an
          >     > in-line current meter (normally used for solar panels).
          >     > Now I can load them up and watch exactly how much current is being
          >     > pulled across the 12V rails.
         >     >
         >     > If you need more info, let me know
         >     >
         >     > --
         >     > Doug
         >     >
          >     >> The Beowulf list archives seem to end in July 2021.
          >     >> I was looking for Doug Eadline's post on limiting AMD power and
          >     >> the results on performance.
         >     >>
         >     >> John H
         >     >
         >     >
         >     > --
         >     > Doug
         >     >
         >     >
         >
         >
         >     --
         >     Doug
         >
         >
         >


         --
         Doug






--
Doug

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
