The recent comments on compilers, caches, etc., are why HPC isn’t a bigger 
deal.  The infrastructure today is reminiscent of what I used in the 1970s on a 
big CDC or Burroughs or IBM machine, perhaps with an FPS box attached.
I prepare a job, with some sort of job control structure, submit it to a batch 
queue, and get my results some time later.  Sure, I’m not dropping off a deck 
or tapes, and I’m not getting green-bar paper or a tape back, but really, it’s 
not much different – I drop a file and get files back either way.

And just like back then, it’s up to me to figure out how best to arrange my 
code to run fastest (for me, that means wall clock time, but for others it 
might be CPU time or cost or something else).

It would be nice if the compiler (or run-time or infrastructure) figured out 
the whole “what’s the arrangement of cores/nodes/scratch storage for this 
application on this particular cluster”.
I also acknowledge that this is a “hard” problem and one that doesn’t have the 
commercial value of, say, serving the optimum ads to me when I read the 
newspaper online.

Yeah, it’s not that hard to call library routines for matrix operations, and to 
put my trust in the library writers – I trust them more than I trust me to find 
the fastest linear equation solver, FFT, etc. – but so far, the next level of 
abstraction up – “how many cores/nodes” is still left to me, and that means 
doing instrumentation, figuring out what the results mean, etc.
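
To make the "figuring out what the results mean" step concrete, here is a 
minimal sketch of reducing a set of timing runs to speedup and parallel 
efficiency, then picking a core count. The function names, the 0.7 efficiency 
floor, and the timings are all assumptions made up for illustration; they are 
not measurements from any real cluster.

```python
def scaling_report(wall_seconds):
    """Turn {worker_count: wall_clock_seconds} into (speedup, efficiency).

    The baseline is the smallest worker count that was measured.
    """
    base_n = min(wall_seconds)
    base_t = wall_seconds[base_n]
    report = {}
    for n in sorted(wall_seconds):
        speedup = base_t / wall_seconds[n]
        # Efficiency: achieved speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (n / base_n)
        report[n] = (speedup, efficiency)
    return report


def largest_efficient_count(wall_seconds, min_efficiency=0.7):
    """Largest worker count whose efficiency stays at or above the floor.

    The 0.7 default is an arbitrary, hypothetical threshold.
    """
    report = scaling_report(wall_seconds)
    ok = [n for n, (_, eff) in report.items() if eff >= min_efficiency]
    return max(ok)


# Hypothetical wall-clock timings in seconds, invented for this example.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 18.0, 16: 15.0}
best = largest_efficient_count(timings)  # 4 with these made-up numbers
```

Someone who cares about cost or CPU time rather than wall clock time would 
swap in a different metric, which is part of why this is hard to automate.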


From: Beowulf <beowulf-boun...@beowulf.org> on behalf of "beowulf@beowulf.org" 
<beowulf@beowulf.org>
Reply-To: Jim Lux <james.p....@jpl.nasa.gov>
Date: Monday, September 20, 2021 at 10:42 AM
To: Lawrence Stewart <stew...@serissa.com>, Jim Cownie <jcow...@gmail.com>
Cc: Douglas Eadline <deadl...@eadline.org>, "beowulf@beowulf.org" 
<beowulf@beowulf.org>
Subject: Re: [Beowulf] [EXTERNAL] Re: Deskside clusters



From: Beowulf <beowulf-boun...@beowulf.org> on behalf of Lawrence Stewart 
<stew...@serissa.com>
Date: Monday, September 20, 2021 at 9:17 AM
To: Jim Cownie <jcow...@gmail.com>
Cc: Lawrence Stewart <stew...@serissa.com>, Douglas Eadline 
<deadl...@eadline.org>, "beowulf@beowulf.org" <beowulf@beowulf.org>
Subject: Re: [Beowulf] [EXTERNAL] Re: Deskside clusters

Well said.  Expanding on this, caches work because of both temporal locality and
spatial locality.  Spatial locality is addressed by having cache lines be
substantially larger than a byte or word.  These days, 64 bytes is pretty common.
Some prefetch schemes, like the L1D version that fetches the VA ^ 64, clearly
affect spatial locality.  Streaming prefetch has an expanded notion of “spatial”
I suppose!
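
As a rough illustration of the cache line arithmetic above, the sketch below 
assumes 64-byte lines and reads "VA ^ 64" as "the other line in the same 
aligned 128-byte pair". The helper names are hypothetical, and real 
prefetchers differ from one CPU to the next.

```python
LINE_BYTES = 64  # assumed cache line size; common, but not universal


def line_base(addr):
    """Address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_BYTES - 1)


def buddy_line(addr):
    """Base of the other line in the same aligned 128-byte pair."""
    return line_base(addr) ^ LINE_BYTES


def lines_touched(start, nbytes):
    """Number of distinct cache lines a contiguous byte range covers."""
    first = line_base(start)
    last = line_base(start + nbytes - 1)
    return (last - first) // LINE_BYTES + 1


# Eight aligned 8-byte doubles share one line, so one miss brings in all eight.
one_line = lines_touched(0x1000, 8 * 8)
# The same eight doubles at a stride of 64 bytes each land on their own line.
strided = sum(lines_touched(0x1000 + i * 64, 8) for i in range(8))
```

That is spatial locality in miniature: the contiguous walk touches one line, 
while the strided walk touches eight.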

What puzzles me is why compilers seem not to have evolved much notion of cache
management. It seems like something a smart compiler could do.  Instead, it is
left to Prof. Goto and the folks at ATLAS and BLIS to figure out how to rewrite
algorithms for efficient cache behavior. To my limited knowledge, compilers
don’t make much use of PREFETCH or any non-temporal loads and stores either. It
seems to me that once the programmer helps with RESTRICT and so forth, then
compilers could perfectly well dynamically move parts of arrays around to
maximize cache use.
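
For readers who have not seen what "rewriting an algorithm for cache behavior" 
looks like, here is a minimal sketch of loop tiling (blocking) for matrix 
multiply. It shows only the loop structure. It is not how GotoBLAS, ATLAS, or 
BLIS are actually implemented, the tile size of 32 is an arbitrary assumption, 
and pure Python will not show the speedup that the same restructuring can give 
in C or Fortran.

```python
def matmul_naive(a, b):
    """Textbook triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=32):
    """Same arithmetic, done one tile-by-tile block at a time.

    Working on small blocks is meant to keep the data being reused
    resident in cache; `tile` is a hypothetical tuning parameter.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Multiply one block of a by one block of b.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        aip = a[i][p]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += aip * b[p][j]
    return c
```

The best tile size depends on the cache sizes and associativity of the 
particular machine, which is the kind of detail the library authors tune by 
hand or by search.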

-L



I suspect that there’s enough variability among cache implementations, and a 
wide enough variety of algorithms that might use them, that writing a 
smart-enough compiler is “hard” and “expensive”.



Leaving it to the library authors is probably the best “bang for the buck”.






_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
