FWIW, the Solr included with Cloudera Search, by default, "ignores all but the most recent document version" during merges. The conflict resolution is configurable, however; see the documentation for details: http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html (see the user guide PDF, "update-conflict-resolver" parameter).
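To make the difference between the two merge behaviors discussed in this thread concrete, here is a minimal Java sketch. It assumes each document carries a unique id and a version number that orders its revisions; `Doc`, `mergeKeyUnaware` and `mergeNewestWins` are hypothetical names, and neither method is how Solr, Lucene or Cloudera Search actually implements merging. `mergeKeyUnaware` models what Mark describes for a CoreAdmin merge (both copies survive), while `mergeNewestWins` models the "most recent document version" policy quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of two merge policies. Doc and both merge methods
 * are invented for this sketch; they are not Solr, Lucene, or Cloudera Search APIs.
 */
public class NewestVersionMerge {

    /** A document identified by a unique key, with a version that grows on each update. */
    record Doc(String id, long version, String body) {}

    /** Plain concatenation: what a merge that ignores the unique key produces. */
    static List<Doc> mergeKeyUnaware(List<List<Doc>> sources) {
        List<Doc> all = new ArrayList<>();
        for (List<Doc> source : sources) {
            all.addAll(source);
        }
        return all;
    }

    /** Keeps one document per id: the one with the highest version (first seen wins ties). */
    static List<Doc> mergeNewestWins(List<List<Doc>> sources) {
        Map<String, Doc> byId = new LinkedHashMap<>();
        for (List<Doc> source : sources) {
            for (Doc doc : source) {
                // Insert if the id is new; otherwise keep whichever copy is strictly newer.
                byId.merge(doc.id(), doc, (kept, incoming) ->
                        incoming.version() > kept.version() ? incoming : kept);
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Doc> core1 = List.of(new Doc("1", 2, "newer"), new Doc("2", 1, "only in core 1"));
        List<Doc> core2 = List.of(new Doc("1", 1, "older"), new Doc("3", 1, "only in core 2"));

        System.out.println(mergeKeyUnaware(List.of(core1, core2)).size()); // 4: id "1" appears twice
        System.out.println(mergeNewestWins(List.of(core1, core2)).size()); // 3: one doc per id
    }
}
```

Changing the remapping function is all it would take to model a different conflict policy, for example `(kept, incoming) -> kept` to keep the first version seen.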
James

-----Original Message-----
From: anirudh...@gmail.com [mailto:anirudh...@gmail.com] On Behalf Of Anirudha Jadhav
Sent: Tuesday, June 11, 2013 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: index merge question

From my experience, the Lucene mergeTool and the merge invoked by CoreAdmin
are pure Lucene implementations and do not understand the concept of a unique
key (a Solr-land concept). http://wiki.apache.org/solr/MergingSolrIndexes has
a cautionary note at the end.

We do frequent index merges, for which we externally run map/reduce jobs (Java
code using Lucene APIs) to merge the indices and validate the merged result
against the sources.

-Ani

On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller <markrmil...@gmail.com> wrote:

> Yeah, you have to carefully manage things if you are map/reduce building
> indexes *and* updating documents in other ways.
>
> If your 'source' data for MR index building is the 'truth', you also have the
> option of not doing incremental index merging, and you could simply rebuild
> the whole thing every time - of course, depending on your cluster size, that
> could be quite expensive.
>
> - Mark
>
> On Jun 10, 2013, at 8:36 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> Thanks Mark. My question stems from the new Cloudera Search stuff.
>> My concern is that if someone updates a doc while the index is being
>> rebuilt, that update could be lost from a Solr perspective. I guess what
>> would need to happen to ensure the correct information was indexed
>> would be to record the start time and reindex the information that changed
>> since then?
>>
>> On Jun 8, 2013 2:37 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>
>>>
>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>>> When merging through the core admin
>>>> (http://wiki.apache.org/solr/MergingSolrIndexes), what is the policy
>>>> for conflicts during the merge? So, for instance, if I am merging
>>>> core 1 and core 2 into core 0 (the first example), what happens if
>>>> core 1 and core 2 both have a document with the same key, say core 1
>>>> has a newer version of it than core 2? Does the merge fail, or does
>>>> the newer document remain?
>>>
>>> You end up with both documents, both with that ID - not generally a
>>> situation you want to end up in. You need to ensure unique IDs in
>>> the input data or replace the index rather than merging into it.
>>>
>>>>
>>>> Also, if using the srcCore method, what happens if a document with
>>>> key 1 is written while an index that also contains key 1 is being
>>>> merged?
>>>
>>> It depends on the order, I think - if the doc is written after the
>>> merge and it's an update, it will update the doc that was just
>>> merged in. If the merge comes second, you have the doc twice and it's a
>>> problem.
>>>
>>> - Mark
>
--
Anirudha P. Jadhav
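Jamie's idea of recording the start time and reindexing whatever changed since then, combined with Mark's point that a write applied after the merge updates the merged-in document, can be sketched as follows. This is a minimal in-memory model under stated assumptions: the system of record keeps a last-modified timestamp per row, that clock is comparable to the one used to record the rebuild start, and the index behaves like a key-to-value map where re-sending an id replaces the old copy (as ordinary Solr updates with a uniqueKey do, unlike a raw index merge). `SourceRecord`, `changedSince` and `index` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of "record the start time, rebuild, then reindex what
 * changed since then". All names are invented; no Solr or Cloudera Search API is used.
 */
public class CatchUpReindex {

    /** A row in the system of record, with the time it was last modified. */
    record SourceRecord(String id, String value, long lastModified) {}

    /** Records whose lastModified is at or after the given time. */
    static List<SourceRecord> changedSince(List<SourceRecord> source, long since) {
        List<SourceRecord> changed = new ArrayList<>();
        for (SourceRecord r : source) {
            // ">=" so an edit made in the same clock tick the rebuild started is re-sent, not skipped.
            if (r.lastModified() >= since) {
                changed.add(r);
            }
        }
        return changed;
    }

    /** Applies records as updates keyed by id, so a re-sent record replaces the old copy. */
    static void index(Map<String, String> index, List<SourceRecord> records) {
        for (SourceRecord r : records) {
            index.put(r.id(), r.value());
        }
    }

    public static void main(String[] args) {
        long rebuildStart = 100; // 1. record the start time before reading the source

        // 2. the bulk rebuild reads the source as it stood at rebuildStart ...
        List<SourceRecord> snapshot = List.of(
                new SourceRecord("1", "v1", 50),
                new SourceRecord("2", "v1", 60));
        Map<String, String> rebuilt = new HashMap<>();
        index(rebuilt, snapshot);

        // ... while someone updates doc 2 at t=120; the snapshot missed that change.
        List<SourceRecord> sourceNow = List.of(
                new SourceRecord("1", "v1", 50),
                new SourceRecord("2", "v2", 120));

        // 3. once the rebuilt index is in place, re-send everything changed since rebuildStart
        index(rebuilt, changedSince(sourceNow, rebuildStart));

        System.out.println(rebuilt.get("2")); // v2
    }
}
```

The ordering matters: per Mark's reply, a write that lands after the merge updates the merged-in document, whereas if the merge comes second the document ends up in the index twice. Taking the start time before the snapshot and using `>=` errs on the side of re-sending a few unchanged records, which is harmless when updates replace by key.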
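Ani mentions validating merged indices against their sources. One cheap check, sketched here in Java under the assumption that you can already enumerate the unique-key values stored in each source index and in the merged index (how to read them out is left aside), is to look for ids that appear more than once after the merge and ids that went missing. `MergeValidator`, `duplicateIds` and `missingIds` are hypothetical names, not part of any Solr or Lucene API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical post-merge check over unique-key values. Reading the ids out of
 * real indexes is out of scope; all names here are invented for illustration.
 */
public class MergeValidator {

    /** Ids that occur more than once in the merged index (sorted for stable output). */
    static Set<String> duplicateIds(List<String> mergedIds) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : mergedIds) {
            counts.merge(id, 1, Integer::sum);
        }
        Set<String> dups = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    /** Ids present in some source index but absent from the merged index. */
    static Set<String> missingIds(List<List<String>> sourceIds, List<String> mergedIds) {
        Set<String> merged = new HashSet<>(mergedIds);
        Set<String> missing = new TreeSet<>();
        for (List<String> ids : sourceIds) {
            for (String id : ids) {
                if (!merged.contains(id)) {
                    missing.add(id);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> core1 = List.of("1", "2");
        List<String> core2 = List.of("1", "3");
        List<String> merged = List.of("1", "2", "1", "3"); // what a key-unaware merge yields

        System.out.println(duplicateIds(merged));                      // [1]
        System.out.println(missingIds(List.of(core1, core2), merged)); // []
    }
}
```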