From: Beowulf <beowulf-boun...@beowulf.org> on behalf of Joe Landman 
<joe.land...@gmail.com>
Date: Monday, June 21, 2021 at 6:46 AM
To: Jonathan Engwall <engwalljonathanther...@gmail.com>
Cc: "beowulf@beowulf.org" <beowulf@beowulf.org>
Subject: [EXTERNAL] Re: [Beowulf] AMD and AVX512

On 6/21/21 9:20 AM, Jonathan Engwall wrote:
I have followed this thinking "square peg, round hole."
You have got it again, Joe. Compilers are your problem.

<snip discussion of architecture>

To date, I don’t know that *compilers* pay much attention to things like IO 
(that’s buried in some library call no doubt).

>>Maybe, someday, we'll get a great HPC compiler for C/Fortran.

Wasn’t the Fortran compiler for the 7600 highly optimized? Did vector unrolling 
and all that. And those compilers for the FPS boxes?



I think you mean great HPC compilers for chips that are available and fast 
<grin>



I think, too, that the comments about ARM vs x86 vs whatever are interesting.



We’ve moved a long way from clusters where the Ethernet interconnect was rate 
limiting and the nodes were single core, single memory, single disk (if any). 
The picture changes when you start getting into processors with hundreds of 
cores, or start looking at “nanojoules/instruction” (or is instruction even 
the right thing to be counting? Maybe it’s nanojoules/data operation, where 
that could be a read/write from memory, disk, or interprocessor link).



Look at the (probably) specious claim that Tesla has the 5th fastest 
supercomputer - articles are very light on details, but I think it’s a whole 
bunch of GPUs – but their “number of cores” isn’t very big compared to even 
#100 on the “Top 500” list.



However, it might well be that for Tesla’s specific processing load, 5000 
GPU cores *is* faster than most Top 500 clusters.

And, given the recent news about miners consuming all those joules – maybe our 
metrics should be looking at more than raw speed.

Jim



(who has not just 1, but TWO, ARM based clusters on the shelf behind his desk.. 
Yes, Beaglebones, but it’s an ARM, it’s 4 nodes, and I use various cluster 
tools to manipulate them – the connection fabric for one is kind of slow 
(802.11))




_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
