Re: index merge question

Mark Miller Tue, 11 Jun 2013 09:11:36 -0700

Right - but that sounds a little different than what we were talking about.


You had brought up the core admin merge cmd that lets you merge an index into 
a running Solr cluster.
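
For reference, a minimal sketch of what that core admin merge request looks like, following the srcCore example on the MergingSolrIndexes wiki page linked further down the thread. The host, port, and core names are placeholders, and the helper `merge_indexes_url` is a hypothetical name used only for illustration; it just builds the URL and does not send anything.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, src_cores):
    """Build a CoreAdmin mergeindexes URL (hypothetical helper, illustration only)."""
    # One "srcCore" parameter per source core, as in the wiki example.
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("srcCore", name) for name in src_cores]
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Placeholder host and core names: merge core1 and core2 into core0.
url = merge_indexes_url("http://localhost:8983/solr", "core0", ["core1", "core2"])
print(url)
```

Nothing in this request carries a conflict policy, which is consistent with the duplicate-document behavior described earlier in the thread.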

We are calling that the golive option in the map reduce indexing code. It has 
the limitations we have discussed.

However, if you are only using map reduce to build indexes, there are 
facilities for dealing with duplicate IDs, as you see in the documentation. 
The merges involved in that are different, though: they happen as the final 
index is being constructed by the map reduce job. The final step is the 
golive step, where the indexes are deployed to the running Solr cluster. 
That step is what uses the core admin merge command, and if you are doing 
updates or adds outside of map reduce, you will face the issues we have 
discussed.
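
To make the difference concrete, here is a rough sketch in plain Python rather than actual Solr or map reduce code. `raw_merge` mimics a merge that knows nothing about unique keys, and `keep_latest` mimics a "keep only the most recent version" policy like the default James quotes below. The function names and the `id` / `version` fields are assumptions made up for this illustration, not the real resolver API.

```python
def raw_merge(*indexes):
    """Append every document from every index, with no unique-key check."""
    merged = []
    for index in indexes:
        merged.extend(index)
    return merged


def keep_latest(docs, key="id", version="version"):
    """Keep only the highest-version document per key (hypothetical policy)."""
    latest = {}
    for doc in docs:
        current = latest.get(doc[key])
        if current is None or doc[version] > current[version]:
            latest[doc[key]] = doc
    return list(latest.values())


# Made-up sample data: both cores hold a document with id "a".
core1 = [{"id": "a", "version": 2, "title": "newer"}]
core2 = [
    {"id": "a", "version": 1, "title": "older"},
    {"id": "b", "version": 1, "title": "other"},
]

merged = raw_merge(core1, core2)   # id "a" is now present twice
deduped = keep_latest(merged)      # one document per id, newest wins
print(len(merged), [d["title"] for d in deduped])
```

The raw merge leaves two documents with id "a", which is the golive problem; resolving duplicates while the final index is built avoids it, but only for documents that actually pass through the map reduce job.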


- Mark

On Jun 11, 2013, at 11:57 AM, James Thomas <jtho...@camstar.com> wrote:

> FWIW, the Solr included with Cloudera Search, by default, "ignores all but 
> the most recent document version" during merges.
> The conflict resolution is configurable, however.  See the documentation for 
> details.
> http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html
> -- see the user guide PDF, "update-conflict-resolver" parameter
> 
> James
> 
> -----Original Message-----
> From: anirudh...@gmail.com [mailto:anirudh...@gmail.com] On Behalf Of 
> Anirudha Jadhav
> Sent: Tuesday, June 11, 2013 10:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: index merge question
> 
> In my experience, the lucene mergeTool and the one invoked by coreAdmin are 
> pure Lucene implementations and do not understand the concept of a unique 
> key (a Solr-land concept).
> 
>  http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note at the 
> end
> 
> We do frequent index merges, for which we externally run map/reduce jobs 
> (Java code using Lucene APIs) to merge the indices and validate the merged 
> indices against the sources.
> -Ani
> 
> On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller <markrmil...@gmail.com> wrote:
>> Yeah, you have to carefully manage things if you are map/reduce building 
>> indexes *and* updating documents in other ways.
>> 
>> If your 'source' data for MR index building is the 'truth', you also have 
>> the option of not doing incremental index merging: you could simply 
>> rebuild the whole thing every time. Of course, depending on your cluster 
>> size, that could be quite expensive.
> 
>> 
>> - Mark
>> 
>> On Jun 10, 2013, at 8:36 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> 
>>> Thanks Mark.  My question stems from the new Cloudera Search stuff.
>>> My concern is that if someone updates a doc while the index is being 
>>> rebuilt, that update could be lost from a Solr perspective.  I guess what 
>>> would need to happen to ensure the correct information was indexed 
>>> would be to record the start time and reindex the information that changed 
>>> since then?
>>> On Jun 8, 2013 2:37 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>> 
>>>> 
>>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>> 
>>>>> When merging through the core admin (
>>>>> http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy 
>>>>> for conflicts during the merge?  So for instance if I am merging 
>>>>> core 1 and core 2 into core 0 (first example), what happens if core 
>>>>> 1 and core 2 both have a document with the same key, say core 1 has 
>>>>> a newer version than core 2?  Does the merge fail, does the newer 
>>>>> document remain?
>>>> 
>>>> You end up with both documents, both with that ID - not generally a 
>>>> situation you want to end up in. You need to ensure unique IDs in 
>>>> the input data or replace the index rather than merging into it.
>>>> 
>>>>> 
>>>>> Also, if using the srcCore method, what happens if a document with 
>>>>> key 1 is written while an index that also contains key 1 is being 
>>>>> merged?
>>>> 
>>>> It depends on the order I think - if the doc is written after the 
>>>> merge and it's an update, it will update the doc that was just 
>>>> merged in. If the merge comes second, you have the doc twice and it's a 
>>>> problem.
>>>> 
>>>> - Mark
>> 
> 
> 
> 
> --
> Anirudha P. Jadhav
