Stu,

Interesting! Can you provide more details about your setup? By "load balance the indexing stage" you mean "distribute the indexing process", right? Do you simply take your content to be indexed, split it into N chunks, where N matches the number of TaskTracker nodes in your Hadoop cluster, and provide a map function that does the indexing? What does the reduce function do? Does it call IndexWriter.addIndexes, or do you do that outside Hadoop?
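For concreteness, the map-side step those questions assume might look something like the minimal Lucene sketch below. The class name ChunkIndexer and the single "content" field are assumptions for illustration, and it uses a current Lucene API rather than whatever version is in use on this thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import java.io.IOException;
    import java.nio.file.Paths;
    import java.util.List;

    public class ChunkIndexer {
        // Builds one standalone Lucene index for one chunk of the input.
        // In the scheme described above, each map task would run this over
        // its own chunk, writing the index to task-local storage.
        public static void indexChunk(String indexPath, List<String> docs) throws IOException {
            try (Directory dir = FSDirectory.open(Paths.get(indexPath));
                 IndexWriter writer = new IndexWriter(dir,
                         new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String text : docs) {
                    Document doc = new Document();
                    doc.add(new TextField("content", text, Field.Store.NO));
                    writer.addDocument(doc); // analysis happens here, in parallel across tasks
                }
            } // close() commits the new segments
        }
    }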
Thanks,

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Stu Hood <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, January 7, 2008 7:14:20 PM
Subject: Re: solr with hadoop

As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop allows us to load balance the indexing stage, and then we use the raw Lucene IndexWriter.addIndexes method to merge the data to be hosted on Solr instances.

Thanks,
Stu

-----Original Message-----
From: Mike Klaas <[EMAIL PROTECTED]>
Sent: Friday, January 4, 2008 3:04pm
To: solr-user@lucene.apache.org
Subject: Re: solr with hadoop

On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:

> I have a huge index (about 110 million documents, 100 fields each),
> but the size of the index is reasonable: about 70 GB. All I need is
> to increase performance, since some queries that match a large
> number of documents run slowly. So I was wondering: are there any
> benefits to using Hadoop for this? And if so, what direction should
> I go? Has anybody done anything to integrate Solr with Hadoop? Does
> it give any performance boost?

Hadoop might be useful for organizing your data en route to Solr, but I don't see how it could be used to boost performance over a huge Solr index. To accomplish that, you need to split it up over two machines (for which you might find Hadoop useful).

-Mike
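For reference, the IndexWriter.addIndexes merge step Stu describes might look roughly like the sketch below, whether it runs in a single reduce task or outside Hadoop entirely. The class name IndexMerger and the paths are made up for illustration, again against a current Lucene API:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import java.io.IOException;
    import java.nio.file.Paths;

    public class IndexMerger {
        // Merges the per-chunk indexes produced by the map tasks into one
        // index that a Solr instance can then host.
        public static void merge(String mergedPath, String... chunkPaths) throws IOException {
            try (Directory merged = FSDirectory.open(Paths.get(mergedPath));
                 IndexWriter writer = new IndexWriter(merged,
                         new IndexWriterConfig(new StandardAnalyzer()))) {
                Directory[] chunks = new Directory[chunkPaths.length];
                for (int i = 0; i < chunkPaths.length; i++) {
                    chunks[i] = FSDirectory.open(Paths.get(chunkPaths[i]));
                }
                // Copies whole segments from each chunk index; documents are
                // not re-analyzed during the merge.
                writer.addIndexes(chunks);
                writer.commit();
                for (Directory chunk : chunks) {
                    chunk.close();
                }
            }
        }
    }

A call such as IndexMerger.merge("/indexes/merged", "/indexes/chunk-0", "/indexes/chunk-1") would then produce a single directory that a Solr core could host. Because addIndexes copies segments rather than re-analyzing documents, the merge stays cheap relative to the distributed indexing itself.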