On Fri, Oct 29, 2010 at 03:02:45PM -0400, Ellis H. Wilson III wrote:

> I think it's making a pretty wild assumption to say search engines and
> HPC have the same I/O needs (and thus can use the same I/O setups).

Well, I'm an HPC guy doing infrastructure for a search engine, so I'm
not assuming much. And I didn't say the setup would be the same -- just
that Lustre/PVFS would probably be more reliable and higher performance
if they stored copies on multiple servers instead of using local or SAN
RAID. (Or did they implement this while I wasn't looking?)

> Also, I'd be blown away if Blekko wasn't doing its own
> striping/redundancy -- even if they aren't using RAID 0 or 1 by the
> book, they probably are using the same concepts (though hand-spun for
> search engine needs).

We do the usual thing: store 3 copies on 3 different servers, with
locality picked such that a single network or power failure won't take
out more than 1 copy. Since we are very concerned about transfer rates,
it's well worth buying the extra disks for the increased read speed.

> I don't think the "whole internet" takes up 5 petabytes,

The internet is infinite in size, thanks to websites that generate data
(or crap). Our 3-billion-page crawl (1/5 of the size we dream of) is
257 terabytes compressed, and the corresponding index is 77 terabytes
(very compressed). (Yes, we have a lot of disk space empty at the
moment.)

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
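
[Editor's illustration] The placement rule described in the post -- 3 copies on 3 different servers, located so that no single network or power failure takes out more than one copy -- can be sketched roughly as below. This is not Blekko's actual code; the server names and the "rack" failure domains are hypothetical.

```python
import random

def place_replicas(servers, copies=3):
    """Pick `copies` servers, each from a distinct failure domain.

    `servers` is a list of (server_name, failure_domain) pairs, where a
    failure domain groups machines sharing a switch or power circuit.
    """
    by_domain = {}
    for name, domain in servers:
        by_domain.setdefault(domain, []).append(name)
    if len(by_domain) < copies:
        raise ValueError("not enough failure domains for %d copies" % copies)
    # Choose `copies` distinct domains, then one server within each, so a
    # single domain-wide failure can destroy at most one copy.
    domains = random.sample(list(by_domain), copies)
    return [random.choice(by_domain[d]) for d in domains]

# Hypothetical cluster layout for illustration only.
servers = [
    ("node01", "rack-a"), ("node02", "rack-a"),
    ("node03", "rack-b"), ("node04", "rack-b"),
    ("node05", "rack-c"), ("node06", "rack-c"),
]
replicas = place_replicas(servers)
```

Reads can then be served from any of the three copies, which is why adding disks buys transfer rate as well as durability.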
Well, I'm an HPC guy doing infrastructure for a search engine, so I'm not assuming much. And I didn't say the setup would be the same -- just that Lustre/PVFS would probably be more reliable and higher performance if they stored copies on multiple servers instead of using local or SAN RAID. (Or did they implement this while I wasn't looking?) > Also, I'd be blown away if Blekko wasn't doing it's own > striping/redundancy - even if they aren't using RAID 0 or 1 by the book, > they probably are using the same concepts (though hand-spun for search > engine needs). We do the usual thing: store 3 copies on 3 different servers, locality picked such that a single network or power failure won't take out more than 1 copy. Since we are very concerned about transfer rates, it's well worth buying more disks because the read speed increases. > I don't think the "whole internet" takes up 5 petabytes, The internet is infinite in size thanks to websites that generate data (or crap). Our 3 billion page crawl (1/5 of the size we dream of) is 257 tbytes (compressed), and the corresponding index is 77 terabytes (very compressed). (Yes, we have a lot of disk space empty at the moment.) -- greg _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf