Hi,

this topic started a few months ago; however, there are some questions on
my side that I couldn't answer by looking at the SOLR-1301 issue or the
wiki pages.

Let me try to explain my thoughts:
Given: a Hadoop cluster, a Solr search cluster, and Nutch as the crawling
engine, which also performs LinkRank and webgraph-related tasks.

Once Nutch has produced a list of documents, you feed that list plus the
LinkRank values etc. into a Solr+Hadoop job, as described in SOLR-1301, to
index or reindex those documents.
When the shards have been built, they are sent over the network to the
Solr search cluster.
Is this description correct?
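
To make sure I understand the reduce side of that job, here is a rough
sketch of how I picture it (class and field names are my guesses from
skimming the patch; SolrOutputFormat and the schema fields "id" and
"content" may well be named differently, and the value type the output
format expects might need a converter rather than a plain
SolrInputDocument):

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.solr.common.SolrInputDocument;

// One reduce group per URL coming out of the Nutch segments; the job's
// output format (SolrOutputFormat, in my reading of the SOLR-1301 patch)
// would then run an embedded Solr core and write the shard to disk.
public class NutchToSolrReducer
    extends Reducer<Text, Text, Text, SolrInputDocument> {

  @Override
  protected void reduce(Text url, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", url.toString());
    for (Text value : values) {
      // in reality these values would be parsed content, the LinkRank
      // score, anchor texts, etc.; one catch-all field keeps the sketch
      // short
      doc.addField("content", value.toString());
    }
    context.write(url, doc);
  }
}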

What makes me wonder is this:
Assume I have a document X on machine Y, in shard Y.
When I reindex document X together with lots of other documents that may
or may not be present in shard Y, and I put the resulting shard on a
machine Z, how does machine Y notice that it holds an older version of
document X than machine Z does?

Furthermore, assume that shard Y has been replicated to three other
machines: how do they all notice that their version of document X is no
longer the newest one available?
In such an environment we do not have a master (right?), so: how can the
index be kept as consistent as possible?
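
Just to make concrete what I mean by "the newest available version": I
could of course stamp every document with the time at which it was
(re)indexed and compare those stamps, roughly like this (the field name
is made up for the example; as far as I can tell this is not something
SOLR-1301 gives you out of the box):

import java.util.Date;

import org.apache.solr.common.SolrInputDocument;

// Illustration only: attach an indexing timestamp so that two copies of
// the same document in different shards could at least be compared.
public class VersionStampExample {

  // "index_time_dt" is an invented name for a Solr date field
  public static SolrInputDocument stamp(SolrInputDocument doc) {
    doc.setField("index_time_dt", new Date());
    return doc;
  }

  // machine Y's copy of X would be stale if another shard's copy
  // carries a later indexing time
  public static boolean isStale(Date mine, Date theirs) {
    return mine.before(theirs);
  }
}

But even with such a field, every node would still have to know where to
look for the other copies, which is really the heart of my question.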

Thank you for clarifying.

Kind regards