On 2/5/16, 9:58 AM, "Douglas Eadline" <deadl...@eadline.org> wrote:
>> What I find interesting about this is that there's only a 3:1 difference
>> between high and low.
>>
>> That's a pretty compelling argument that if you need a 10x speedup,
>> you're not going to get it by "buy a faster computer", and that,
>> rather, parallelism is your friend.
>
> And the clocks go from 2.5 to 3.2 GHz.

I'm not sure how closely CPU clock rate correlates with computational
performance these days. It's all about architecture and things like memory
bandwidth. It used to be that 2x clock gave you 2x MIPS. Today I view any
stated clock rate as essentially part of the part number, there to let you
distinguish between models. While it's not as bad as "peak music watts" in
the pre-FTC audio amplifier days, I think all the clock rate tells you is
that *for the same chip architecture, more GHz is faster than less GHz*,
and that's about it. You're certainly not laying out a PWB to run at 3 GHz.

> I'm not sure how much farther multi-core can go with adding cores.

Well, multi-core is just parallel/cluster computing on a smaller scale with
more cross-resource coupling - maybe more like a crossbar switch to memory.
For the vast majority of software out there (e.g. people running Excel,
most PC apps, etc.), multi-core seems to just be a way to spread threads
across multiple CPUs, perhaps saving some context-switch time and keeping
the memory interface full - they're all still hitting the same big RAM,
network, and disk drive. On my desktop and notebook computers, the programs
that consistently suck up a whole core are the virus scanner and the disk
directory indexing tools - neither of which would be an issue in HPC, I
suspect.

To go farther, software will have to undergo a significant change in
architecture to one that is more amenable to hardware architectures more
like a classic Beowulf: standalone nodes with some communication fabric.
There's still the problem that making RAM is very different from making
CPUs - the chip design is fundamentally different.
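As a back-of-the-envelope sketch of the point above - every core still
hitting the same big RAM - here is a toy calculation of how the per-core
share of a fixed pool of cache and memory bandwidth shrinks as core counts
grow. The 22 MB cache figure comes from the TILE-Gx72 discussion below; the
50 GB/s aggregate bandwidth is a purely hypothetical number for
illustration, not a measured spec.

```python
# Toy model: a chip with a fixed pool of on-chip cache and DRAM bandwidth,
# shared evenly among its cores. All bandwidth figures are assumptions.

def per_core_share(total_cache_mb: float, total_bw_gbs: float, cores: int):
    """Return (cache per core in MB, memory bandwidth per core in GB/s)."""
    return total_cache_mb / cores, total_bw_gbs / cores

# Assumed totals: 22 MB cache and a hypothetical 50 GB/s over 4 DDR ports.
for cores in (1, 4, 18, 72):
    cache_mb, bw_gbs = per_core_share(22.0, 50.0, cores)
    print(f"{cores:3d} cores: {cache_mb:5.2f} MB cache, "
          f"{bw_gbs:6.2f} GB/s per core")
```

At 72 cores the cache share works out to about 0.31 MB per core - the
"1/3 Mbyte" figure mentioned below - which is why a task needs very high
locality of reference before the chip stops being memory bound.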
This is the problem that Tilera/EZ-Chip or Angstrom face. The TILE-Gx72 has
72 cores but only 4 DDR memory ports. There's 22 Mbytes of on-chip cache -
spread that out across all the cores and one core's share is really only
about 1/3 Mbyte. So you need a task that is pretty fine-grained to take
advantage of it. Sure, you're not in the vector pipeline/parallel SIMD
world, but each CPU still has to have very high locality of data reference
or the system becomes memory bound.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf