On 12/30/2014 2:16 AM, Samuel García Martínez wrote:
> I'm studying the migration process from our current Solr 3.6 multitenant
> cluster (single master, multiple slaves) to SolrCloud 4.10.3, but I have
> a question about the tlog.
>
> First of all, I will try to give some context:
>
> - 1 single master and N slaves.
> - Around 300 cores (about 10 per client). Manual sharding, just doc
>   routing per doc language. Example: test-client-en_US, test-client-it_IT,
>   test-client2-en_US.
> - Small indexes, about 2GB per core. Bigger cores contain around 200k
>   docs; others hold just a few thousand heavily denormalized (but large)
>   documents.
> - I only do bulk indexing. We have plenty of processes that are full
>   reindexes, i.e., deleteByQuery=*:* and reindex.
> - When a full reindex fails, I want to keep the older version.
>
> According to the docs, it is recommended to use the autoCommit feature
> with openSearcher=false. But my question is: in my context, can not using
> autoCommit cause any issues, beyond the extra storage space and a
> possible delay in startup time?
>
> Another solution I came up with is to create a new collection, issue the
> full reindex into that collection, and "rename" it or create an alias
> that I can use to query. So every time I reindex, the "temp" index is
> created and associated with the alias, and the old one is deleted. Is
> this right, or a crazy idea?
Using the transaction log is recommended. The default Directory implementation (NRTCachingDirectoryFactory) can lose data if the log is not enabled. That Directory implementation is the default for good reason, as long as you have the log enabled.

When the transaction log is enabled, it is *always* recommended that you do regular hard commits, to keep the transaction log from growing out of control. The startup delay can be significant: replaying a transaction log covering all of your documents can take as long as the bulk indexing operations that created it, and if your bulk indexing is multi-threaded or multi-process, replaying can take LONGER than the original indexing. Using the autoCommit feature with openSearcher=false is the easiest way to keep the transaction log size down, but *any* hard commit will do.

You don't need to use SolrCloud when you upgrade. It does make redundancy easier -- replication does not happen during normal SolrCloud operation. Instead, the documents are indexed independently and automatically by every replica.

With traditional Solr, on either version, you can swap/rename cores. With collections in SolrCloud, swapping and renaming aren't an option, but you can create multiple collections and alias a common name to whichever one is active. These features would let you create a new index and exchange it with the old one, just as you described in your last paragraph. I use the core swapping method for the rare full rebuild, without SolrCloud.

Thanks,
Shawn
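P.S. For reference, the autoCommit setup described above might look like this in solrconfig.xml -- a minimal sketch, with the maxTime value chosen only as an illustration; tune it to your indexing rate:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Enable the transaction log (required for SolrCloud, recommended generally) -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit at least once a minute to truncate the tlog,
       without opening a new searcher (no visibility change) -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Because openSearcher is false, these commits only flush segments and roll the transaction log; queries still see the old view until you issue an explicit commit at the end of your bulk load.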
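The alias approach you described for SolrCloud could be driven with the Collections API roughly like this -- collection names and the host are illustrative only:

```shell
# Build the replacement collection (name is hypothetical)
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test-client-en_US-v2&numShards=1&replicationFactor=2"

# ... run the full reindex against test-client-en_US-v2, then commit ...

# Repoint the query alias at the freshly built collection.
# CREATEALIAS on an existing alias name moves it atomically.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test-client-en_US&collections=test-client-en_US-v2"

# Once nothing queries it any more, drop the old collection
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=test-client-en_US-v1"
```

If the reindex fails, you simply never move the alias, so queries keep hitting the old collection -- which matches your requirement of keeping the older version on failure.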
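Without SolrCloud, the equivalent is the CoreAdmin SWAP action mentioned above -- again a sketch, with hypothetical core names:

```shell
# Rebuild into a spare core, then atomically exchange it with the live one;
# queries against test-client-en_US immediately see the new index
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=test-client-en_US&other=test-client-en_US-rebuild"
```

After the swap, the old index lives on under the "rebuild" core name, so it can be kept as a fallback or cleared before the next full reindex.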