On Wed, Feb 14, 2007 at 09:51:21AM -0800, Jim Lux wrote:

> I'm not sure you could put any processor (except maybe something like
> a microcontroller) into a DRAM design and keep the densities
> up. There are all sorts of things that might bite you.. aside from

IBM has just announced at the ISSCC a 1-transistor eDRAM substitute for
the 6T-SRAM cell used in caches. (Others demonstrated 1T-SRAM years
ago, AMD has Z-RAM, Intel has Floating Body Cells, T-RAM doesn't need a
capacitor, etc. -- embedded RAM is reasonably common in network
processors, IIRC.)

http://www.heise.de/newsticker/meldung/85295

It's 45 nm SOI (starting 2008), 1.5 ns access (SRAM does 0.8..1 ns),
and is supposed to be far more dissipation-friendly. Theoretically this
gives you 6 times as much eDRAM as SRAM cache, which is at least
12 MBytes, and possibly up to 48 MBytes (the dual-core Power6 has
8 MBytes of on-die cache).

> thermal issues, I suspect that the number of mask layers, etc. is
> fairly small for DRAM. The actual materials on the chip (doping
> levels, etc.) may not allow for a reasonably performing processor
> with reasonable feature sizes and thermal properties. Getting the
> heat away from the junction is a big deal.
>
> I think DRAMs are built with a maximum of 4 layers of interconnect
> with vias, while processors have a lot more layers and a much more
> sophisticated interconnect structure.

The above processes are compatible with CPU processes, so there's some
hope the piggybacking in Terascale doesn't have to be forever.

> Each and every switch has some non-zero power associated with
> changing state. Sure, the core swings smaller voltages and energies,
> but a DRAM cell is a lot smaller than a flipflop or half-adder in the
> CPU, and only one is changing at a time, as opposed to thousands.

On the horizon, there's MRAM, which can also do logic with a little
extension to each cell (a kind of nonvolatile FPGA). It's not hugely
fast, but it's static, and very low power.

> A big advantage of integrating CPU and memory, though, is that you
> don't have to "go offchip" which saves a huge amount in
> drivers/receivers, etc. Of course, this is why everyone is looking

Yes, this is a major advantage. No pads, too, just a few serial
high-speed links.

> to integrated photonics and/or real high speed serial
> interconnects. The I/O buffer might consume a hundred or thousand
> times more power than the onchip logic driving it. Trading some more
> logic inside to serialize and deserialize, and do adaptive
> equalization, in exchange for fewer "wires out of the chip" is a good deal.
>
> Then, there's the speed of light problem. Put two chips 10cm apart

Increasing density to true 3d integration is a very good way to reduce
the average distance. Stacking computation modules on a 3d lattice also
minimizes dead space. Of course, with current cooling you won't get
more than a few tens of MW out of a wastepaper-basket volume before the
cluster goes China syndrome.

> on a board, and the round trip time (say for address to get there and
> data to get back) is going to be in the nanoseconds area, even if the
> chip itself were infinitely fast.

The mammal CNS has a 120 m/s signalling limit, yet it can process
pretty complex stimuli in a few tens of ms.

-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf