Re: Questions regarding re-index when using Solr as a data source

Walter Underwood Thu, 09 Jun 2016 09:19:15 -0700

First, using Solr as a repository is pretty risky. I would keep the official 
copy of the data in a database, not in Solr.


Second, you can’t “migrate tables” because Solr doesn’t have tables. You need 
to turn the tables into documents, then index the documents. It can take a lot 
of joins to flatten a relational schema into Solr documents.
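
As an illustration of that flattening step, here is a minimal sketch in Python. The table names, column names, and document field names (customers, orders, order_items, customer_name, item_skus) are all hypothetical, and an in-memory SQLite database stands in for the real relational source. One parent row plus its child rows becomes one document with a multi-valued field.

```python
import sqlite3


def flatten_orders(conn):
    """Join parent and child rows, then fold them into one document per order."""
    rows = conn.execute(
        """
        SELECT o.id, c.name, i.sku
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        LEFT JOIN order_items i ON i.order_id = o.id
        ORDER BY o.id, i.sku
        """
    )
    docs = {}
    for order_id, customer_name, sku in rows:
        # The first row seen for an order creates its document.
        doc = docs.setdefault(
            order_id,
            {
                "id": f"order-{order_id}",
                "customer_name": customer_name,
                "item_skus": [],  # would map to a multi-valued field
            },
        )
        # The LEFT JOIN yields a NULL sku for orders without items.
        if sku is not None:
            doc["item_skus"].append(sku)
    return list(docs.values())


def demo_connection():
    """Build a tiny hypothetical schema so the sketch can run on its own."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE order_items (order_id INTEGER, sku TEXT);
        INSERT INTO customers VALUES (1, 'Acme');
        INSERT INTO orders VALUES (10, 1), (11, 1);
        INSERT INTO order_items VALUES (10, 'A-1'), (10, 'B-2');
        """
    )
    return conn
```

Calling flatten_orders(demo_connection()) returns two documents: one for order 10 with both SKUs, and one for order 11 with an empty item list. A real schema with more child tables would need one such join and fold per multi-valued field.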

Solr does not support schema migration, so yes, you will need to save off all 
the documents, then reload them. I would save them to files. It makes no sense 
to put them in another copy of Solr.
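
A rough sketch of the "save to files, then reload" idea follows, in Python. It assumes every field is stored (as the original post says), that the uniqueKey field is named "id", and that the collection's /select URL is supplied by the caller; the function names are made up for this example. It pages through results with Solr's cursorMark parameter and writes one JSON document per line. Dropping the _version_ field is my assumption about what a clean reload needs, since a stale version value can trip optimistic concurrency checks.

```python
import json
import urllib.parse
import urllib.request


def solr_page_fetcher(select_url, unique_key="id", rows=1000):
    """Return a function that fetches one result page for a given cursorMark.

    select_url is the caller-supplied /select URL of the collection.
    """
    def fetch(cursor_mark):
        params = urllib.parse.urlencode({
            "q": "*:*",
            "fl": "*",
            "wt": "json",
            "rows": rows,
            "sort": f"{unique_key} asc",  # cursors require a sort on the uniqueKey
            "cursorMark": cursor_mark,
        })
        with urllib.request.urlopen(f"{select_url}?{params}") as resp:
            return json.load(resp)
    return fetch


def export_to_file(fetch_page, path):
    """Write every document to path as JSON lines; return the document count."""
    cursor = "*"  # "*" starts a new cursor
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        while True:
            page = fetch_page(cursor)
            for doc in page["response"]["docs"]:
                doc.pop("_version_", None)  # assumption: not wanted on reload
                out.write(json.dumps(doc) + "\n")
                count += 1
            next_cursor = page["nextCursorMark"]
            if next_cursor == cursor:  # an unchanged cursor means no more results
                return count
            cursor = next_cursor


def read_batches(path, batch_size=1000):
    """Yield lists of documents from the dump file, batch_size at a time."""
    batch = []
    with open(path, encoding="utf-8") as src:
        for line in src:
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch
```

For the reload, each batch from read_batches could be sent as a JSON array to the collection's /update handler after the schema change. Fields populated by copyField would likely need to be removed from the dumped documents first, so they are not duplicated on reload.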

Changing the schema will be difficult and time-consuming, but you’ll probably 
run into much worse problems trying to use Solr as a repository.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 9, 2016, at 8:50 AM, Hui Liu <h...@opentext.com> wrote:
> 
> Hi,
> 
> We are porting an application currently hosted on Oracle 11g to SolrCloud
> 6.x; i.e., we plan to migrate all tables in Oracle to collections in Solr,
> index them, and build search tools on top of this. The goal is that we won't
> be using Oracle at all once this has been implemented. Every field in Solr
> will have 'stored=true', and a selected subset of searchable fields will
> have 'indexed=true'.
> 
> The question is what steps we should follow if we need to re-index a
> collection after making some schema changes. Mostly we only add new stored
> fields or make a non-indexed field indexed; we normally do not delete or
> rename existing fields. According to this URL:
> https://wiki.apache.org/solr/HowToReindex it seems we need to set up an
> 'intermediate' Solr1 that only stores the data without any indexing, then
> set up another Solr2 to hold the indexed data, and in case of a re-index,
> just delete all the documents in Solr2 for the collection and re-import the
> data from Solr1 into Solr2 using SolrEntityProcessor (from the dataimport
> handler). Is this still the recommended approach?
> 
> The downside I can see is that if a collection holds a tremendous amount of
> data (some of our collections could have several billion documents),
> re-importing it from Solr1 to Solr2 may take hours or even days, and during
> this time users cannot query the data. Is there a better way to do this that
> avoids this downtime? Any feedback is appreciated!
> 
> Regards,
> Hui Liu
> Opentext, Inc.
