Steven D'Aprano <st...@pearwood.info> wrote:
...
> You think that search engine software is hard? It's not hard. Yahoo has
> one, Microsoft has one, Alta Vista had one, Ask Jeeves had one, search
> engines were everywhere, and there still exist a couple of dozen. In
> 2008 alone, TEN new search engines were launched to the public.
> Google's competitive advantage isn't their software, but their data,
> their market share, and name recognition.
I couldn't let that pass. Having written several search engines myself, I have to say that building search engines is VERY hard.

Anyone can write a simple search engine that can run through a few megabytes of data on behalf of a few users. But writing an engine that can process terabytes or petabytes of data, serve millions of users, keep data in sync on thousands of servers, and handle proximity matching, relevance ranking, significant phrase extraction, search word stemming and highlighting in multiple languages, and result set size estimation, and that does it all at blinding speed with stored data structures that are a practical size and can be updated in real time: that's a job that only the best, world-class programmers can do.

I once worked with professors of computer science who had studied search technology for years and had written their own award-winning system (the Inquery search engine), and they couldn't figure out how Infoseek, then the top search engine, was able to do what it did as fast as it did.

Go to any computer science library and try to find published algorithms for optimizing search engines. Try to find out, for example, how to build a postings list that can tell you when three words appear in a particular order in a document, and still be small, compressed, updatable, and yet fast to search. Maybe there's something now, but there wasn't when I looked. It was a black art based on closely guarded, proprietary secrets.

> Yes, Google probably had a competitive advantage due to their software
> algorithm ten years ago, and maybe, just maybe, they wouldn't have had
> one if they had open sourced it then. But even back on day one, Google
> didn't mind telling people what their algorithm was.

Google explained that relevance ranking was partly based on determining how many links existed to a web page. Revealing that secret is like revealing that an internal combustion engine mixes air and gasoline in a carburetor (well, they used to).
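The postings-list problem raised above (telling when words appear in a given order, in a structure that stays small, compressed, and updatable) can at least be sketched at toy scale. The Python below is a minimal illustration under stated assumptions, not how Infoseek, Inquery, or Google did it: the `PositionalIndex` class and its method names are hypothetical, each term's postings are stored as delta-encoded document ids and word positions packed with variable-byte encoding, "updatable" means only appending new documents in increasing id order, and "in a particular order" is taken to mean an exact consecutive phrase, found by intersecting shifted position sets.

```python
from collections import defaultdict
from itertools import accumulate


def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints: 7 payload bits per byte,
    most significant group first, high bit set on each number's last byte."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        groups.reverse()
        groups[-1] |= 0x80  # flag the terminating byte of this number
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # last byte of this number
            numbers.append(n)
            n = 0
    return numbers


class PositionalIndex:
    """Hypothetical append-only positional index. Each term's postings are one
    byte string of variable-byte numbers laid out per document as
    [doc_id gap, position count, first position, position gaps...]."""

    def __init__(self):
        self._postings = defaultdict(bytearray)  # term -> compressed postings
        self._last_doc = {}                      # term -> last doc id stored
        self._next_doc = 0

    def add_document(self, text):
        """Append a document; ids only increase, so doc gaps stay positive."""
        doc_id = self._next_doc
        self._next_doc += 1
        # Assumption: lowercase whitespace tokenization, no stemming.
        positions = defaultdict(list)
        for pos, word in enumerate(text.lower().split()):
            positions[word].append(pos)
        for term, plist in positions.items():
            doc_gap = doc_id - self._last_doc.get(term, -1)
            pos_gaps = [plist[0]] + [b - a for a, b in zip(plist, plist[1:])]
            self._postings[term].extend(vbyte_encode([doc_gap, len(plist)] + pos_gaps))
            self._last_doc[term] = doc_id
        return doc_id

    def _decode(self, term):
        """Yield (doc_id, positions) for every document containing term."""
        nums = vbyte_decode(self._postings.get(term, b""))
        i, doc = 0, -1
        while i < len(nums):
            doc += nums[i]
            count = nums[i + 1]
            # Running sum of the gaps restores absolute word positions.
            yield doc, list(accumulate(nums[i + 2 : i + 2 + count]))
            i += 2 + count

    def phrase_search(self, phrase):
        """Return ids of documents where the words occur consecutively, in order."""
        terms = phrase.lower().split()
        if not terms:
            return []
        per_term = [dict(self._decode(t)) for t in terms]
        candidates = set(per_term[0])
        for postings in per_term[1:]:
            candidates &= set(postings)
        hits = []
        for doc in sorted(candidates):
            # A phrase starting at s matches if term k occurs at s + k for every k.
            starts = set(per_term[0][doc])
            for k, postings in enumerate(per_term[1:], start=1):
                starts &= {p - k for p in postings[doc]}
            if starts:
                hits.append(doc)
        return hits


idx = PositionalIndex()
idx.add_document("the quick brown fox jumps")
idx.add_document("brown quick the fox")
idx.add_document("a quick brown fox again quick brown fox")
print(idx.phrase_search("quick brown fox"))  # prints [0, 2]
```

Most of what makes the real problem hard is deliberately left out here: every query decodes whole postings lists (no skip pointers), documents cannot be deleted or edited in place, everything lives on one machine, and there is no stemming or relevance ranking.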
Yes, knowing about link counting gives you important information. Yes, it was a great innovation in its time. But it doesn't tell you anything about how to make it work, and it's only a tiny sliver of the technology that Google had to build.

--
Alan Meyer
amey...@yahoo.com

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users