Thanks Erick and Upayavira! This answers my question.

On Mon, Dec 17, 2012 at 8:05 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> See the very last line here:
> http://wiki.apache.org/solr/MergingSolrIndexes
>
> Short answer is that merging will lead to duplicate documents, even with
> uniqueKeys defined.
>
> So you're really kind of stuck handling this outside of merge, either by
> shipping the list of overwritten docs and deleting them from the base index
> or shipping the JSON/XML format and indexing those. Of the two, I'd think
> the latter is easiest/least prone to surprises.
> Especially since you could re-run the indexing as many times as necessary.
>
> The UniqueKey bits are only guaranteed to overwrite older docs when
> indexing, not merging.
>
> Best
> Erick
>
>
> On Thu, Dec 13, 2012 at 3:17 PM, Dikchant Sahi <contacts...@gmail.com> wrote:
>
> > Hi Alex,
> >
> > You got my point right. What I see is that merge adds duplicate documents.
> > Is there a way to overwrite existing documents in one core with those from
> > another? Can a merge operation lead to data corruption, say when the core
> > on the client has uncommitted changes?
> >
> > What would be a better solution for my requirement, merge or indexing
> > XML/JSON?
> >
> > Regards,
> > Dikchant
> >
> > On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >
> > > Not sure I fully understood this and maybe you already cover that by
> > > 'merge', but if you know what you gave the client last time, you can just
> > > build a differential as a second core, then on client mount that second
> > > core and merge it into the first one (e.g. with DIH).
> > >
> > > Just a thought.
> > >
> > > Regards,
> > >    Alex.
> > >
> > > Personal blog: http://blog.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at
> > > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi <contacts...@gmail.com> wrote:
> > >
> > > > Hi Erick,
> > > >
> > > > Sorry for creating the confusion. By slave, I mean that the index on the
> > > > client machine will be a replica of the master; it is not the same as a
> > > > slave in the master-slave model. Below is the detail:
> > > >
> > > > The system is being developed to support a search facility on 1000s of
> > > > systems, a majority of which will be offline.
> > > >
> > > > The idea is that we will have a search system which will be sold on a
> > > > subscription basis. For each subscriber, we will copy the master index
> > > > to their local machine, over a drive or CD. Now, if a subscriber comes
> > > > back after 2 months and wants the updates, we just want to provide the
> > > > deltas for those 2 months, as the volume of data is huge. For this we
> > > > can think of two approaches:
> > > > 1. Fetch the documents which are less than 2 months old in JSON format
> > > > from the master Solr. Copy them to the subscriber machine (via CD /
> > > > memory stick) and index those documents.
> > > > 2. Create separate indexes for each month on our master machine. Copy
> > > > the indexes to the client machine and merge. Prior to the merge we need
> > > > to delete from the client index the records which the new index
> > > > contains, to avoid duplicates.
> > > >
> > > > When the setup is new, we will copy the complete index and restart
> > > > Solr. We are not sure of the best approach for copying the deltas.
> > > >
> > > > Thanks,
> > > > Dikchant
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> > > >
> > > > > This is somewhat confusing. You say that box2 is the slave, yet they're
> > > > > not connected? Then you need to copy the <solr home>/data index from
> > > > > box 1 to box 2 manually (I'd have box2 solr shut down at the time) and
> > > > > restart Solr.
> > > > >
> > > > > Why can't the boxes be connected? That's a much simpler way of going
> > > > > about it.
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > >
> > > > > On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi <contacts...@gmail.com> wrote:
> > > > >
> > > > > > Hi Walter,
> > > > > >
> > > > > > Thanks for the response.
> > > > > >
> > > > > > Commit will help to reflect changes on Box1. We are able to achieve
> > > > > > this. We want the changes to be reflected on Box2.
> > > > > >
> > > > > > We have two indexes. Say
> > > > > > Box1: Master & DB have been set up. Data Import runs on this.
> > > > > > Box2: Slave running.
> > > > > >
> > > > > > We want all the updates on Box1 to be merged into/present in the index
> > > > > > on Box2. The two boxes are not connected over the network. How can we
> > > > > > achieve this?
> > > > > >
> > > > > > Please let me know if I am not clear.
> > > > > >
> > > > > > Thanks again!
> > > > > >
> > > > > > Regards,
> > > > > > Dikchant
> > > > > >
> > > > > > On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> > > > > >
> > > > > > > You do not need to manage online and offline indexes. Commit when you
> > > > > > > are done with your updates and Solr will take care of it for you. The
> > > > > > > changes are not live until you commit.
> > > > > > >
> > > > > > > wunder
> > > > > > >
> > > > > > > On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > How can we do a delta update of offline indexes?
> > > > > > > >
> > > > > > > > We have the master index on which data import will be done. The index
> > > > > > > > directory will be copied to the slave machine in case of a full update,
> > > > > > > > via CD, as the slave/client machine is offline.
> > > > > > > > So, what should be the approach for getting the delta to the slave? I
> > > > > > > > can think of two approaches.
> > > > > > > >
> > > > > > > > 1. Create a separate index of the delta on the master machine, copy it
> > > > > > > > to the slave machine and merge. Before merging the indexes on the
> > > > > > > > client machine, delete all the updated and deleted documents on the
> > > > > > > > client machine, else the merge will add duplicates. So along with the
> > > > > > > > index, we need to transfer the list of documents which have been
> > > > > > > > updated/deleted.
> > > > > > > >
> > > > > > > > 2. Extract all the documents which have changed since a particular
> > > > > > > > time in XML/JSON and index them on the client machine.
> > > > > > > >
> > > > > > > > The indexes are huge, so we cannot roll over the whole index every time.
> > > > > > > >
> > > > > > > > Please help me with your take and the challenges you see in the above
> > > > > > > > approaches. Please suggest any better approach you can think of.
> > > > > > > >
> > > > > > > > Thanks a ton!
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Dikchant
> > > > > > >
> > > > > > > --
> > > > > > > Walter Underwood
> > > > > > > wun...@wunderwood.org
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
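
For anyone finding this thread later: the difference Erick describes (uniqueKey overwrites on indexing, but not on merging) can be illustrated with a toy model. This is only a sketch in plain Python, not Solr or Lucene code; the function names (merge, reindex, delete_then_merge), the "id" key and the sample docs are all made up for illustration, and an index is modelled as nothing more than a list of dicts.

```python
def merge(base, delta):
    """Toy stand-in for an index merge: docs are appended as-is,
    with no check against the unique key."""
    return base + delta


def reindex(base, delta, key="id"):
    """Toy stand-in for indexing the delta docs: a doc whose key
    already exists replaces the older doc."""
    by_key = {doc[key]: doc for doc in base}
    for doc in delta:
        by_key[doc[key]] = doc
    return list(by_key.values())


def delete_then_merge(base, delta, key="id"):
    """Toy stand-in for approach 1: drop the changed docs from the
    base first, then merge the delta in."""
    changed = {doc[key] for doc in delta}
    return [doc for doc in base if doc[key] not in changed] + delta


# Hypothetical sample data: doc 1 was updated, doc 3 is new.
base = [{"id": "1", "title": "old"}, {"id": "2", "title": "unchanged"}]
delta = [{"id": "1", "title": "new"}, {"id": "3", "title": "added"}]

merged = merge(base, delta)
reindexed = reindex(base, delta)
cleaned = delete_then_merge(base, delta)

print("merge:            ", [d["id"] for d in merged])
print("reindex:          ", [d["id"] for d in reindexed])
print("delete then merge:", [d["id"] for d in cleaned])
```

In this model the plain merge ends up with two docs for id 1, while both reindexing and delete-then-merge keep a single, updated copy. Reindexing is also safe to repeat, which matches the "re-run the indexing as many times as necessary" remark above.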
