From my experience, the Lucene merge tool and the one invoked by
CoreAdmin are pure Lucene implementations and do not understand the
concept of a uniqueKey (a Solr-land concept).

http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note
at the end.

We do frequent index merges, for which we externally run map/reduce jobs
(Java code using the Lucene APIs) to merge the indices and validate the
merged indices against their sources.
-Ani
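
Since the merge itself does not check the unique key, one way to guard against duplicates is to compare the keys of the source cores before merging. The following is only a minimal sketch of that idea, not the actual map/reduce jobs mentioned above: it assumes the (id, version) pairs have already been extracted from each source index by some other means, and the names find_conflicts and pick_winners are hypothetical.

```python
from collections import defaultdict


def find_conflicts(cores):
    """Return ids that occur in more than one place across the sources.

    `cores` maps a core name to an iterable of (doc_id, version) pairs.
    How those pairs are read out of the index is assumed, not shown.
    """
    seen = defaultdict(list)
    for core_name, docs in cores.items():
        for doc_id, version in docs:
            seen[doc_id].append((version, core_name))
    # Keep only ids seen more than once, newest version first.
    return {
        doc_id: sorted(hits, reverse=True)
        for doc_id, hits in seen.items()
        if len(hits) > 1
    }


def pick_winners(conflicts):
    """For each conflicting id, name the core holding the newest version."""
    return {doc_id: hits[0][1] for doc_id, hits in conflicts.items()}


# Hypothetical sample data: id "a" exists in both cores.
cores = {
    "core1": [("a", 3), ("b", 1)],
    "core2": [("a", 2), ("c", 1)],
}
conflicts = find_conflicts(cores)
winners = pick_winners(conflicts)
```

With a report like this, the older copies can be deleted from their source cores before the merge, so that the merged index ends up with one document per key.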

On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller <markrmil...@gmail.com> wrote:
> Yeah, you have to carefully manage things if you are map/reduce building 
> indexes *and* updating documents in other ways.
>
> If your 'source' data for MR index building is the 'truth', you also have the 
> option of not doing incremental index merging, and you could simply rebuild 
> the whole thing every time - of course, depending on your cluster size, that 
> could be quite expensive.
>
> - Mark
>
> On Jun 10, 2013, at 8:36 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> Thanks Mark.  My question stems from the new Cloudera Search stuff.
>> My concern is that if someone updates a doc while the index is being rebuilt,
>> that update could be lost from a Solr perspective.  I guess what would need
>> to happen to ensure the correct information was indexed would be to record
>> the start time and reindex the information that changed since then?
>> On Jun 8, 2013 2:37 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>
>>>
>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>>> When merging through the core admin (
>>>> http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
>>>> conflicts during the merge?  So for instance if I am merging core 1 and
>>>> core 2 into core 0 (first example), what happens if core 1 and core 2 both
>>>> have a document with the same key, say core 1 has a newer version than core
>>>> 2?  Does the merge fail, does the newer document remain?
>>>
>>> You end up with both documents, both with that ID - not generally a
>>> situation you want to end up in. You need to ensure unique IDs in the
>>> input data or replace the index rather than merging into it.
>>>
>>>>
>>>> Also if using the srcCore method, if a document with key 1 is written while
>>>> an index also with key 1 is being merged, what happens?
>>>
>>> It depends on the order I think - if the doc is written after the merge
>>> and it's an update, it will update the doc that was just merged in. If the
>>> merge comes second, you have the doc twice and it's a problem.
>>>
>>> - Mark
>



-- 
Anirudha P. Jadhav
