On Apr 6, 2011, at 10:29 PM, Jens Mueller wrote:

> Walter, thanks for the advice. Well, you are right, mentioning Google. My
> question was also to understand how such large systems like Google/Facebook
> are actually working. So my numbers are just theoretical and made up. My
> system will be smaller, but I would be very happy to understand how such
> large systems are built, and I think the approach Ephraim showed should be
> working quite well at large scale.

Understanding what Google does will NOT help you build your engine, just as understanding an F1 race car does not help you build a Toyota Camry. One is built for performance only and requires LOTS of support; the other is built for supportability and stability. Very different engineering goals and designs.

Here is one view of Google's search setup:

http://www.linesave.co.uk/google_search_engine.html

This talk gives a lot more detail. The summary is in the blog post, and the slides are in the PDF. Google's search is entirely in-memory. They load the index off disk and run from memory.

http://glinden.blogspot.com/2009/02/jeff-dean-keynote-at-wsdm-2009.html
http://research.google.com/people/jeff/WSDM09-keynote.pdf

How big will your system be? Does it require real-time updates?

wunder
--
Walter Underwood
Lead Engineer, MarkLogic