Well, the patch consumes the data from a CSV. You have to modify the input to
use TableInputFormat (I don't remember if that's exactly what it's called) and
it will work.
Once you've done that, you have to specify as many reducers as shards you
want.
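
Very roughly, those two changes sit in the job setup. This is just a sketch of
what I mean, not code from the patch: the class name ShardedIndexJob is made
up, and the TableInputFormat line is commented out because I'm not sure of the
exact class the patch expects (HBase ships one under that name).

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ShardedIndexJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ShardedIndexJob.class);
    conf.setJobName("sharded-solr-index");

    // The patch reads CSV by default; swap in whatever InputFormat matches
    // your source here, e.g. the HBase one if the data lives in HBase:
    // conf.setInputFormat(org.apache.hadoop.hbase.mapred.TableInputFormat.class);

    // One reducer per shard: asking for 4 reducers gives you 4 shards.
    conf.setNumReduceTasks(4);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}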

I know 2 ways to index using Hadoop.

Method 1 (SOLR-1301 & Nutch):
-Map: just gets the data from the source and creates key-value pairs
-Reduce: does the analysis and indexes the data
So the index is built on the reducer side (rough sketch below).
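
To make the split concrete, here is a minimal sketch of that shape, assuming a
4-shard setup. It is not the SOLR-1301 code: I'm using a plain Lucene
IndexWriter in the reducer to keep it short, whereas the patch goes through
Solr so the schema and analysis config are honoured. The field name "body" and
the local "shard-N" output directory are placeholders.

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class ReduceSideIndexing {

  // Map: no analysis here, just route each raw record to a shard.
  public static class RouteMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String shard = Integer.toString(Math.floorMod(value.toString().hashCode(), 4));
      ctx.write(new Text(shard), value);
    }
  }

  // Reduce: analysis and indexing happen here, one shard per reducer.
  public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
    private IndexWriter writer;

    @Override
    protected void setup(Context ctx) throws IOException {
      // Writes to the task's working dir; a real job would move the shard to HDFS.
      writer = new IndexWriter(
          FSDirectory.open(Paths.get("shard-" + ctx.getTaskAttemptID().getTaskID().getId())),
          new IndexWriterConfig(new StandardAnalyzer()));
    }

    @Override
    protected void reduce(Text shard, Iterable<Text> records, Context ctx)
        throws IOException {
      for (Text record : records) {
        Document doc = new Document();
        doc.add(new TextField("body", record.toString(), Field.Store.NO));
        writer.addDocument(doc);  // analysis runs here, on the reduce side
      }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException {
      writer.close();
    }
  }
}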

Method 2 (the Hadoop Lucene index contrib):
-Map: does the analysis and opens an IndexWriter to add docs
-Reduce: merges the small indexes built in the map
So the indexes are built on the map side (again, sketch below).
Method 2 has no good integration with Solr at the moment.
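
The merge step in method 2 boils down to one Lucene call. Another rough sketch
(the contrib has its own job classes; MergeSmallIndexes and the path arguments
are just my placeholders), assuming each mapper has already written a small
index to its own directory:

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeSmallIndexes {
  public static void main(String[] args) throws IOException {
    // args[0] = target shard dir, args[1..] = small indexes from the mappers
    try (IndexWriter merged = new IndexWriter(
        FSDirectory.open(Paths.get(args[0])),
        new IndexWriterConfig(new StandardAnalyzer()))) {
      Directory[] parts = new Directory[args.length - 1];
      for (int i = 1; i < args.length; i++) {
        parts[i - 1] = FSDirectory.open(Paths.get(args[i]));
      }
      // In the contrib this runs inside the reducer; the whole "reduce"
      // is essentially this one call.
      merged.addIndexes(parts);
    }
  }
}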

In the JIRA (SOLR-1301) there's a good explanation of the advantages and
disadvantages of indexing on the map side versus the reduce side. I recommend
reading all the comments on the JIRA carefully to understand exactly how it
works.

