FWIW, the Solr included with Cloudera Search, by default, "ignores all but the 
most recent document version" during merges.
The conflict resolution is configurable, however.  See the documentation for 
details:
http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html
-- see the user guide PDF, "update-conflict-resolver" parameter
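
To illustrate what "ignores all but the most recent document version" means, 
here is a minimal sketch in plain Python. It is not Cloudera's resolver and 
does not use any Solr or Lucene API; the field names "id" and "version" and 
the function resolve_latest are hypothetical, chosen only for the example.

```python
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]


def resolve_latest(docs: Iterable[Doc], key: str = "id",
                   version: str = "version") -> List[Doc]:
    """Keep, for each key value, only the document with the highest version.

    Assumption: every document carries a comparable `version` field.
    """
    latest: Dict[Any, Doc] = {}
    for doc in docs:
        current = latest.get(doc[key])
        if current is None or doc[version] > current[version]:
            latest[doc[key]] = doc
    return list(latest.values())


# Two hypothetical source indexes that both contain key "a".
core1 = [
    {"id": "a", "version": 2, "title": "new"},
    {"id": "b", "version": 1, "title": "only in core1"},
]
core2 = [{"id": "a", "version": 1, "title": "old"}]

merged = resolve_latest(core1 + core2)
```

With these inputs, merged holds one document per key, and the copy of "a" 
that survives is the one with version 2.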

James

-----Original Message-----
From: anirudh...@gmail.com [mailto:anirudh...@gmail.com] On Behalf Of Anirudha 
Jadhav
Sent: Tuesday, June 11, 2013 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: index merge question

In my experience, the Lucene mergeTool and the one invoked by CoreAdmin are 
pure Lucene implementations and do not understand the concept of a unique 
key (a Solr-land concept).

  http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note at the 
end

We do frequent index merges, for which we externally run map/reduce jobs (Java 
code using the Lucene APIs) to merge the indices and validate the merged 
indices against the sources.
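
As a rough illustration of why that validation step matters, the sketch below 
models a key-unaware merge as simple concatenation and then checks the result 
for duplicate keys. It is plain Python, not our map/reduce code and not the 
Lucene API; raw_merge, duplicate_keys, and the "id" field are hypothetical 
names used only here.

```python
from collections import Counter
from typing import Any, Dict, List

Doc = Dict[str, Any]


def raw_merge(*indexes: List[Doc]) -> List[Doc]:
    """Model a key-unaware merge: documents are appended, keys are not checked."""
    merged: List[Doc] = []
    for index in indexes:
        merged.extend(index)
    return merged


def duplicate_keys(index: List[Doc], key: str = "id") -> List[Any]:
    """Return every key value that occurs more than once in the index."""
    counts = Counter(doc[key] for doc in index)
    return sorted(k for k, n in counts.items() if n > 1)


# Two hypothetical cores that share the key "1".
core1 = [{"id": "1", "body": "newer"}, {"id": "2", "body": "x"}]
core2 = [{"id": "1", "body": "older"}, {"id": "3", "body": "y"}]

merged = raw_merge(core1, core2)
dupes = duplicate_keys(merged)
```

Here merged contains four documents and dupes reports key "1", which is the 
situation a post-merge validation pass would need to catch and resolve.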
-Ani

On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller <markrmil...@gmail.com> wrote:
> Yeah, you have to carefully manage things if you are map/reduce building 
> indexes *and* updating documents in other ways.
>
> If your 'source' data for MR index building is the 'truth', you also have the 
> option of not doing incremental index merging, and you could simply rebuild 
> the whole thing every time - of course, depending on your cluster size, that 
> could be quite expensive.

>
> - Mark
>
> On Jun 10, 2013, at 8:36 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> Thanks Mark.  My question stems from the new Cloudera Search stuff.
>> My concern is that if someone updates a doc while the index is being 
>> rebuilt, that update could be lost from a Solr perspective.  I guess what 
>> would need to happen to ensure the correct information was indexed 
>> would be to record the start time and reindex the information that changed 
>> since then?
>> On Jun 8, 2013 2:37 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>
>>>
>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>>> When merging through the core admin (
>>>> http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy 
>>>> for conflicts during the merge?  So for instance if I am merging 
>>>> core 1 and core 2 into core 0 (first example), what happens if core 
>>>> 1 and core 2 both have a document with the same key, say core 1 has 
>>>> a newer version than core 2?  Does the merge fail, does the newer 
>>>> document remain?
>>>
>>> You end up with both documents, both with that ID - not generally a 
>>> situation you want to end up in. You need to ensure unique id's in 
>>> the input data or replace the index rather than merging into it.
>>>
>>>>
>>>> Also, if using the srcCore method, what happens if a document with 
>>>> key 1 is written while an index that also contains key 1 is being 
>>>> merged?
>>>
>>> It depends on the order I think - if the doc is written after the 
>>> merge and it's an update, it will update the doc that was just 
>>> merged in. If the merge comes second, you have the doc twice and it's a 
>>> problem.
>>>
>>> - Mark
>



--
Anirudha P. Jadhav
