Marc is referring to the very informative post by Ted Dunning from a month or
so ago.

For what it's worth, we just used Hadoop Streaming, JRuby, and EmbeddedSolr to 
speed up indexing by parallelizing it.
 Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Marc Sturlese <marc.sturl...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, June 22, 2010 12:43:27 PM
> Subject: Re: anyone use hadoop+solr?
> 
> 
Well, the patch consumes the data from a CSV. You have to modify the input to
use TableInputFormat (I don't remember if it's called exactly that) and it
will work. Once you've done that, you have to specify as many reducers as the
number of shards you want.
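One reducer per shard works because the partitioner routes every occurrence of a key to the same reducer. A minimal plain-Python sketch of that routing idea (crc32 stands in for Hadoop's Java-side key hashing; the function and variable names are illustrative, not from the patch):

```python
import zlib

# Toy sketch: route each document id to a shard the way a hash
# partitioner would, so that reducer count == shard count yields
# exactly one index per shard.
def shard_for(doc_id: str, num_shards: int) -> int:
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# The same doc id always lands on the same reducer, hence in one shard:
docs = ["doc1", "doc2", "doc3", "doc1"]
assignments = [shard_for(d, 2) for d in docs]
```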

I know two ways to index using Hadoop.

Method 1 (SOLR-1301 & Nutch):
-Map: just gets the data from the source and creates key-value pairs
-Reduce: does the analysis and indexes the data
So, the index is built on the reducer side.
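The method-1 flow can be sketched as a toy in-memory simulation, with whitespace tokenizing standing in for Solr's analysis chain (plain Python with illustrative names only, not the SOLR-1301 code):

```python
from collections import defaultdict
import zlib

# Toy in-memory simulation of method 1 (reduce-side indexing).
# Map: fetch data from the source and emit (shard_key, doc) pairs.
def map_phase(docs, num_shards):
    for doc_id, text in docs:
        yield zlib.crc32(doc_id.encode()) % num_shards, (doc_id, text)

# Reduce: analyze and index the docs for one shard; the index is
# built here, on the reducer side.
def reduce_phase(grouped_docs):
    index = defaultdict(set)                  # term -> doc ids
    for doc_id, text in grouped_docs:
        for term in text.lower().split():     # stand-in for analysis
            index[term].add(doc_id)
    return index

# Simulate the shuffle: group map output by key, then reduce per shard.
docs = [("a", "hadoop solr"), ("b", "solr index"), ("c", "hadoop index")]
groups = defaultdict(list)
for shard, kv in map_phase(docs, num_shards=2):
    groups[shard].append(kv)
shard_indexes = {shard: reduce_phase(kvs) for shard, kvs in groups.items()}
```

Each reducer's output corresponds to one complete shard index, which is why the reducer count has to match the shard count.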

Method 2 (Hadoop Lucene index contrib):
-Map: does the analysis and opens an IndexWriter to add docs
-Reduce: merges the small indexes built in the map
So, indexes are built on the map side.
Method 2 has no good integration with Solr at the moment.
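Method 2 can be sketched the same way, with a dict union standing in for the index merge (plain Python, illustrative names, not the contrib code):

```python
from collections import defaultdict

# Toy in-memory simulation of method 2 (map-side indexing).
# Map: analyze the docs and build a small index for this mapper's
# input split -- the stand-in for opening an IndexWriter in the map.
def map_build_index(doc_partition):
    small = defaultdict(set)                  # term -> doc ids
    for doc_id, text in doc_partition:
        for term in text.lower().split():     # stand-in for analysis
            small[term].add(doc_id)
    return small

# Reduce: merge the small per-mapper indexes into one.
def reduce_merge(small_indexes):
    merged = defaultdict(set)
    for small in small_indexes:
        for term, ids in small.items():
            merged[term] |= ids
    return merged

partitions = [[("a", "hadoop solr")], [("b", "solr index"), ("c", "hadoop")]]
merged = reduce_merge(map_build_index(p) for p in partitions)
```

With real Lucene the merge step would be something along the lines of IndexWriter.addIndexes rather than a dict union, but the data flow is the same: complete small indexes come out of the maps, and the reducer only combines them.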

In the JIRA issue (SOLR-1301) there's a good explanation of the advantages and
disadvantages of indexing on the map or reduce side. I recommend you read all
the comments on the issue in detail to know exactly how it works.


-- 
View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914625.html
Sent from the Solr - User mailing list archive at Nabble.com.
