On 12/30/2014 2:16 AM, Samuel García Martínez wrote:
> I'm studying the migration process from our current Solr 3.6 multitenant
> cluster (single master, multiple slaves) to SolrCloud 4.10.3, but I have
> a question about the tlog.
>
> First of all, I will try to give some context:
>
> - 1 single master and N slaves.
> - Around 300 cores (about 10 per client). Manual sharding, just doc
>   routing per doc language. Example: test-client-en_US, test-client-it_IT,
>   test-client2-en_US.
> - Small indexes, about 2GB per core. Bigger cores contain around 200k
>   docs; others hold just a few thousand heavily denormalized (but large)
>   documents.
> - I only do bulk indexing. We have plenty of processes that are full
>   reindexes, i.e., deleteByQuery=*:* and reindex.
> - When a full reindex fails, I want to keep the older version.
>
> According to the docs, it is recommended to use the autoCommit feature
> with openSearcher=false. But my question is: in my context, can not using
> autoCommit cause any issues, beyond the extra storage space and a
> possible delay in startup time?
>
> Another solution I came up with is to create a new collection, issue the
> full reindex into that collection, and "rename" it or create an alias
> that I can use to query. So every time I reindex, the "temp" index is
> created and associated with the alias, and the old one is deleted. Is
> this right, or a crazy idea?
Using the transaction log is recommended. The default Directory implementation (NRTCachingDirectoryFactory) can lose data if the log is not enabled. That Directory implementation is the default for good reason, as long as you have the log enabled.

When the transaction log is enabled, it is *always* recommended that you do regular hard commits, to keep the transaction log from growing out of control. The startup delay can be significant: replaying a transaction log covering all of your documents can take as long as the bulk indexing operations that created it, and if your bulk indexing is multi-threaded or multi-process, replaying can take LONGER than the original indexing. Using the autoCommit feature with openSearcher=false is the easiest way to keep the transaction log size down, but *any* hard commit will do.

You don't need to use SolrCloud when you upgrade. It does make redundancy easier -- replication does not happen during normal SolrCloud operation. Instead, the documents are indexed independently and automatically by every replica.

With traditional Solr, on either version, you can swap/rename cores. With collections in SolrCloud, swapping and renaming aren't an option, but you can create multiple collections and alias a common name to whichever one is active. These features would let you create a new index and exchange it with the old one, just as you described in your last paragraph. I use the core swapping method for the rare full rebuild, without SolrCloud.

Thanks,
Shawn
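P.S. For reference, the autoCommit setup described above might look like this in solrconfig.xml -- a minimal sketch, with the maxTime value chosen only as an illustration; tune it to your indexing rate:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Enable the transaction log (required for SolrCloud, recommended generally) -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit at least once a minute to truncate the tlog,
       without opening a new searcher (no visibility change) -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Because openSearcher is false, these commits only flush segments and roll the transaction log; queries still see the old view until you issue an explicit commit at the end of your bulk load.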
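The alias approach you described for SolrCloud could be driven with the Collections API roughly like this -- collection names and the host are illustrative only:

```shell
# Build the replacement collection (name is hypothetical)
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test-client-en_US-v2&numShards=1&replicationFactor=2"

# ... run the full reindex against test-client-en_US-v2, then commit ...

# Repoint the query alias at the freshly built collection.
# CREATEALIAS on an existing alias name moves it atomically.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test-client-en_US&collections=test-client-en_US-v2"

# Once nothing queries it any more, drop the old collection
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=test-client-en_US-v1"
```

If the reindex fails, you simply never move the alias, so queries keep hitting the old collection -- which matches your requirement of keeping the older version on failure.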
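Without SolrCloud, the equivalent is the CoreAdmin SWAP action mentioned above -- again a sketch, with hypothetical core names:

```shell
# Rebuild into a spare core, then atomically exchange it with the live one;
# queries against test-client-en_US immediately see the new index
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=test-client-en_US&other=test-client-en_US-rebuild"
```

After the swap, the old index lives on under the "rebuild" core name, so it can be kept as a fallback or cleared before the next full reindex.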