From my experience, the Lucene mergeTool and the one invoked by CoreAdmin are pure Lucene implementations and do not understand the concept of a unique key (a Solr-land concept).
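
To illustrate the point, here is a minimal sketch in plain Python. It is a toy model, not Lucene or Solr code: the function names (`lucene_style_merge`, `solr_style_update`, `duplicate_keys`) and the sample documents are hypothetical. It assumes only what the thread describes, namely that a merge appends documents without checking keys, while a Solr update replaces any document with the same key.

```python
from collections import Counter


def lucene_style_merge(target, *sources):
    """Append every source document to the target; no unique-key check."""
    merged = list(target)
    for source in sources:
        merged.extend(source)
    return merged


def solr_style_update(index, doc, key="id"):
    """Drop all existing documents with the same key, then add the new one."""
    return [d for d in index if d[key] != doc[key]] + [doc]


def duplicate_keys(index, key="id"):
    """Return the keys that occur more than once (post-merge validation)."""
    counts = Counter(d[key] for d in index)
    return sorted(k for k, n in counts.items() if n > 1)


# Two cores that both hold a document with id "1" (hypothetical data).
core1 = [{"id": "1", "version": 2}, {"id": "2", "version": 1}]
core2 = [{"id": "1", "version": 1}, {"id": "3", "version": 1}]

merged = lucene_style_merge([], core1, core2)
print(duplicate_keys(merged))  # ['1']
```

A validation pass like `duplicate_keys` is the kind of check an external merge job could run against its sources before the merged index goes live.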
http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note at the end.

We do frequent index merges, for which we externally run map/reduce jobs
(Java code using the Lucene APIs) to merge the indices and validate the
merged indices against their sources.

-Ani

On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller <markrmil...@gmail.com> wrote:

> Yeah, you have to carefully manage things if you are map/reduce building
> indexes *and* updating documents in other ways.
>
> If your 'source' data for MR index building is the 'truth', you also have the
> option of not doing incremental index merging, and you could simply rebuild
> the whole thing every time - of course, depending on your cluster size, that
> could be quite expensive.
>
> - Mark
>
> On Jun 10, 2013, at 8:36 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> Thanks Mark. My question is stemming from the new cloudera search stuff.
>> My concern is that if while rebuilding the index someone updates a doc
>> that update could be lost from a solr perspective. I guess what would need
>> to happen to ensure the correct information was indexed would be to record
>> the start time and reindex the information that changed since then?
>> On Jun 8, 2013 2:37 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>
>>>
>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>>> When merging through the core admin (
>>>> http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
>>>> conflicts during the merge? So for instance if I am merging core 1 and
>>>> core 2 into core 0 (first example), what happens if core 1 and core 2 both
>>>> have a document with the same key, say core 1 has a newer version of core
>>>> 2? Does the merge fail, does the newer document remain?
>>>
>>> You end up with both documents, both with that ID - not generally a
>>> situation you want to end up in. You need to ensure unique IDs in the
>>> input data or replace the index rather than merging into it.
>>>
>>>> Also if using the srcCore method if a document with key 1 is written while
>>>> an index also with key 1 is being merged what happens?
>>>
>>> It depends on the order I think - if the doc is written after the merge
>>> and it's an update, it will update the doc that was just merged in. If the
>>> merge comes second, you have the doc twice and it's a problem.
>>>
>>> - Mark

--
Anirudha P. Jadhav