Stu,

Interesting! Can you provide more details about your setup? By "load balance the indexing stage" you mean "distribute the indexing process", right? Do you simply take your content to be indexed, split it into N chunks, where N matches the number of TaskTracker nodes in your Hadoop cluster, and provide a map function that does the indexing? What does the reduce function do? Does it call IndexWriter.addIndexes, or do you do that outside Hadoop?
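For concreteness, the map-side step those questions assume might look something like the minimal Lucene sketch below. The class name ChunkIndexer and the single "content" field are assumptions for illustration, and it uses a current Lucene API rather than whatever version is in use on this thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import java.io.IOException;
    import java.nio.file.Paths;
    import java.util.List;

    public class ChunkIndexer {
        // Builds one standalone Lucene index for one chunk of the input.
        // In the scheme described above, each map task would run this over
        // its own chunk, writing the index to task-local storage.
        public static void indexChunk(String indexPath, List<String> docs) throws IOException {
            try (Directory dir = FSDirectory.open(Paths.get(indexPath));
                 IndexWriter writer = new IndexWriter(dir,
                         new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String text : docs) {
                    Document doc = new Document();
                    doc.add(new TextField("content", text, Field.Store.NO));
                    writer.addDocument(doc); // analysis happens here, in parallel across tasks
                }
            } // close() commits the new segments
        }
    }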
Thanks,

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Stu Hood <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, January 7, 2008 7:14:20 PM
Subject: Re: solr with hadoop

As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop allows us to load balance the indexing stage, and then we use the raw Lucene IndexWriter.addIndexes method to merge the data to be hosted on Solr instances.

Thanks,
Stu

-----Original Message-----
From: Mike Klaas <[EMAIL PROTECTED]>
Sent: Friday, January 4, 2008 3:04pm
To: solr-user@lucene.apache.org
Subject: Re: solr with hadoop

On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:

> I have a huge index (about 110 million documents, 100 fields each),
> but the size of the index is reasonable: about 70 GB. All I need is
> to increase performance, since some queries that match a large
> number of documents run slowly. So I was wondering: are there any
> benefits to using Hadoop for this? And if so, what direction should
> I go? Has anybody done anything to integrate Solr with Hadoop? Does
> it give any performance boost?

Hadoop might be useful for organizing your data en route to Solr, but I don't see how it could be used to boost performance over a huge Solr index. To accomplish that, you need to split it up over two machines (for which you might find Hadoop useful).

-Mike
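For reference, the IndexWriter.addIndexes merge step Stu describes might look roughly like the sketch below, whether it runs in a single reduce task or outside Hadoop entirely. The class name IndexMerger and the paths are made up for illustration, again against a current Lucene API:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import java.io.IOException;
    import java.nio.file.Paths;

    public class IndexMerger {
        // Merges the per-chunk indexes produced by the map tasks into one
        // index that a Solr instance can then host.
        public static void merge(String mergedPath, String... chunkPaths) throws IOException {
            try (Directory merged = FSDirectory.open(Paths.get(mergedPath));
                 IndexWriter writer = new IndexWriter(merged,
                         new IndexWriterConfig(new StandardAnalyzer()))) {
                Directory[] chunks = new Directory[chunkPaths.length];
                for (int i = 0; i < chunkPaths.length; i++) {
                    chunks[i] = FSDirectory.open(Paths.get(chunkPaths[i]));
                }
                // Copies whole segments from each chunk index; documents are
                // not re-analyzed during the merge.
                writer.addIndexes(chunks);
                writer.commit();
                for (Directory chunk : chunks) {
                    chunk.close();
                }
            }
        }
    }

A call such as IndexMerger.merge("/indexes/merged", "/indexes/chunk-0", "/indexes/chunk-1") would then produce a single directory that a Solr core could host. Because addIndexes copies segments rather than re-analyzing documents, the merge stays cheap relative to the distributed indexing itself.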