Hi, this topic started a few months ago; however, there are some questions on my side that I couldn't answer by looking at the SOLR-1301 issue or the wiki pages.
Let me try to explain my thoughts. Given: a Hadoop cluster, a Solr search cluster, and Nutch as the crawling engine, which also performs LinkRank and webgraph-related tasks. Once Nutch has produced a list of documents, you feed that list plus the LinkRank values etc. into a Solr+Hadoop job as described in SOLR-1301 to index or reindex those documents. When the shards have been built, they are sent over the network to the Solr search cluster. Is this description correct?

What makes me wonder is this: assume I have a document X on machine Y in shard Y. When I reindex document X together with lots of other documents that may or may not already be present in shard Y, and I put the resulting shard on a machine Z, how does machine Y notice that it holds an older version of document X than machine Z does?

Furthermore, assume that shard Y has been replicated to three other machines: how do they all notice that their copy of document X is not the newest one available? In such an environment we do not have a master (right?), so: how can the index be kept as consistent as possible?

Thank you for clarifying.

Kind regards
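P.S. To make the question more concrete, here is a minimal sketch (plain SolrJ against a single core, not the SOLR-1301 map/reduce code) of what I mean by a per-document "version": stamping every document with an index-time field when it is (re)indexed. The field name "index_time_l", the core URL, and the document values are made up for the example.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class StampedIndexer {
      public static void main(String[] args) throws Exception {
          // Hypothetical search node that receives the freshly built shard.
          SolrServer solr = new CommonsHttpSolrServer("http://machine-z:8983/solr");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "http://example.com/document-x");      // unique key
          doc.addField("text", "re-crawled content of document X");
          doc.addField("index_time_l", System.currentTimeMillis()); // the "version" stamp

          solr.add(doc);
          solr.commit();

          // Open question: how would machine Y, still holding the old copy of
          // document X in shard Y, ever learn that machine Z now has a copy
          // with a larger index_time_l, given that no master coordinates them?
      }
  }

Even if every document carried such a stamp, it is not clear to me how the machines would compare their copies without some coordinating component, which is exactly what I am asking about.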