Okay, so SRAM instead of cache. Or at least cache that doesn't care about off-chip coherency (e.g. no bus snooping, and delayed writeback).
A good paged virtual memory manager might work as well.

But here's a question: would a Harvard architecture, with separate code and data paths to memory, be a good idea? It's pretty standard in the DSP world, which is sort of SIMD (except it's not really a single instruction, but you do the same thing to many sets of data over and over). A lot of exascale-type applications, such as finite element codes, would have the same pattern.

On 11/29/12 6:47 AM, "Eugen Leitl" <[email protected]> wrote:

>On Thu, Nov 29, 2012 at 02:19:26PM +0000, Lux, Jim (337C) wrote:
>>
>>
>> On 11/28/12 11:46 PM, "Eugen Leitl" <[email protected]> wrote:
>>
>> >On Thu, Nov 29, 2012 at 01:14:39AM -0500, Mark Hahn wrote:
>> >
>> >I've been waiting for cache to die and be substituted by
>> >on-die SRAM or MRAM. Yet to happen, but if it happens,
>> >it will be with embedded-like systems.
>>
>>
>> When running, SRAM consumes a lot more power and space than almost any
>> kind of DRAM. 2-4 transistors per cell vs 1, if nothing else.
>
>Yes, but we're talking cache. Cache is SRAM with extra logic.
>Even a cache hit is slower than it would take to access on-die
>SRAM. Cache coherency doesn't scale due to relativistically
>constrained signalling. There also cannot be any such thing
>as a global memory, unless you want it to be slow and spend
>a lot of silicon real estate to make multiple writes to the
>same location consistent.
>
>> A big problem is that the CMOS process for dense, low power, fast RAM is
>> different than what you want to use for a CPU. And even between DRAM and
>> SRAM there's a pretty big difference. (trenches, etc.)
>
>This is why we need stacked memories. Notice that MRAM might be compatible
>with CPU fabbing processes. ST-MRAM
>http://www.computerworld.com/s/article/9233516/Everspin_ships_first_ST_MRAM_memory_with_500X_performance_of_flash
>should have very good scaling in terms of performance and power
>dissipation and can potentially be fabricated on top of an
>ordinary CPU core http://www.cs.utexas.edu/~cart/publications/tr01-36.pdf

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
