On 07/07/2011 12:31 PM, Lux, Jim (337C) wrote:
>> On 07/07/2011 10:13 AM, Eugen Leitl wrote:
>>>
>>> http://www.techeye.net/chips/one-million-arm-chips-challenge-intel-bumblebee
>>>
>>> One million ARM chips challenge Intel bumblebee
>>>
>>
>> Now say it like Dr. Evil: one MILLION processors.
>>
>> How long is it going to take to wire them all up? And how fast are they
>> going to fail? If there's an MTBF of one million hours per processor,
>> that's still one failure per hour across the machine.
>
> But this presents a very interesting design challenge: when you get to this
> sort of scale, you have to assume that at any time some of them are going to
> be dead or dying, just like Google's massively parallel database engines.
>
> It's all about ultimate scalability. Anybody with moderate competence
> (certainly anyone on this list) could devise a scheme to use 1000 perfect
> processors that never fail to do 1000 quanta of work in unit time. It's
> substantially more challenging to devise a scheme to do 1000 quanta of work
> in unit time on, say, 1500 processors with a 20% failure rate. Or even in
> 1.2 * unit time.
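Jim's challenge can be made concrete with a toy simulation. The sketch below (mine, not from the thread; names and the scheduling policy are assumptions) models one simple fault-tolerant scheme: each "round" of unit time, all live workers are spread over the pending tasks, so the 500 spare processors become redundant replicas; a task only survives a round if every replica assigned to it fails. It's a minimal sketch, assuming independent per-attempt failures and failure_rate < 1, not a real scheduler.

```python
import random

def run_jobs(num_tasks, num_workers, failure_rate, seed=0):
    """Rounds of unit time needed to finish num_tasks quanta of work
    on num_workers unreliable processors, with spare capacity used
    for redundant replicas of pending tasks."""
    rng = random.Random(seed)
    pending = list(range(num_tasks))
    rounds = 0
    while pending:
        rounds += 1
        # Spread every worker over the pending tasks, round-robin, so
        # excess workers double up as replicas of the same task.
        attempts = {}
        for w in range(num_workers):
            task = pending[w % len(pending)]
            attempts[task] = attempts.get(task, 0) + 1
        # A task stays pending only if all of its replicas fail this
        # round (a task with no attempts trivially stays pending).
        pending = [t for t in pending
                   if all(rng.random() < failure_rate
                          for _ in range(attempts.get(t, 0)))]
    return rounds

print(run_jobs(1000, 1000, 0.0))   # perfect processors: 1 unit of time
print(run_jobs(1000, 1500, 0.2))   # 1500 flaky processors: a little more
```

With 1500 workers and a 20% failure rate, roughly 880 of the 1000 tasks finish in the first round (500 tasks get two replicas), and the stragglers mop up quickly after that, which is exactly the "1.2 * unit time" flavor of overhead Jim describes.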
Just to be clear - I wasn't saying this was a bad idea. Scaling up to this size seems inevitable. I was just imagining the team of admins who would have to work non-stop to replace dead processors! I wonder what the architecture for this system will look like; I imagine it will be built around small, hot-swappable multi-socket blades to handle this.

Prentice
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf