Thanks for the quick reply!

We just want to use SolrCloud because it simplifies operations and
cluster management: centralized configuration, replica management and
so on.

I've been playing with a 4-node cluster, watching the tlog and the
possible issues, and it seems too dangerous to just let the tlogs grow
and then lose a node when the indexing process is near 9x% done. The
node takes up to 30 minutes to become active again due to tlog
replaying.
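
To keep the tlog bounded I've been testing the autoCommit you
recommend; a minimal sketch of the solrconfig.xml block, where the
maxDocs/maxTime values are just placeholders I'm trying out, not tuned
numbers:

    <!-- inside <updateHandler class="solr.DirectUpdateHandler2"> -->
    <autoCommit>
      <maxDocs>25000</maxDocs>           <!-- hard commit every 25k docs... -->
      <maxTime>300000</maxTime>          <!-- ...or every 5 minutes -->
      <openSearcher>false</openSearcher> <!-- durable, but no new searcher -->
    </autoCommit>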

I think the best solution for this is to create a new collection with
replicas, run the full index against that new collection and, once the
indexing is done, issue a CREATEALIAS pointing at it (moving the alias
from the old collection to the new one). Then drop the old one.
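
Roughly, each reindex cycle would then be these Collections API calls
(the host, config and collection names below are made up for the
example):

    # 1. create the fresh collection next to the live one
    curl 'http://host:8983/solr/admin/collections?action=CREATE&name=client-en_US_v2&numShards=1&replicationFactor=2&collection.configName=client-conf'

    # 2. run the full reindex against client-en_US_v2, then hard commit

    # 3. move the query alias over to the new collection
    curl 'http://host:8983/solr/admin/collections?action=CREATEALIAS&name=client-en_US&collections=client-en_US_v2'

    # 4. drop the old collection
    curl 'http://host:8983/solr/admin/collections?action=DELETE&name=client-en_US_v1'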

I'm starting to think about network problems and the reliability of
this process, though, and I suspect it could become a pain :s

On Tue, Dec 30, 2014 at 3:11 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/30/2014 2:16 AM, Samuel García Martínez wrote:
> > I'm studying the migration process from our current Solr 3.6
> > multitenant cluster (single master, multiple slaves) setup to
> > SolrCloud 4.10.3, but I have a question about the tlog.
> >
> > First of all, I will try to give some context:
> >
> >
> >    - A single master and N slaves.
> >    - Around 300 cores (about 10 per client). Manual sharding, just
> >    doc routing per document language. Example: test-client-en_US,
> >    test-client-it_IT, test-client2-en_US.
> >    - Small indexes, about 2GB per core. The bigger cores contain
> >    around 200k docs; the others hold heavily denormalized docs (a
> >    few thousand, but big documents).
> >    - I only do bulk indexing. Plenty of our processes are full
> >    reindexes, i.e., deleteByQuery=*:* and then reindex everything.
> >    - When a full reindex fails, I want to keep the older version.
> >
> > According to the docs, it is recommended to use the autoCommit
> > feature with openSearcher=false. My question is: in my context, can
> > not using autoCommit cause any issue, beyond the extra storage
> > space and the possible delay in startup time?
> >
> > Another solution I came up with is to create a new collection,
> > issue the full reindex into that collection and "rename" it, or
> > create an alias that I can use to query. So every time I reindex,
> > the "temp" index is created and associated with the alias, and the
> > old one is deleted. Is this sensible or a crazy idea?
>
> Using the transaction log is recommended.  The default Directory
> implementation (NRTCachingDirectoryFactory) can experience data loss if
> it is not enabled.  That Directory implementation is the default for
> good reason, as long as you have the log enabled.
>
> When the transaction log is enabled, it is *always* recommended that you
> do regular hard commits, to avoid the transaction log growing out of
> control.  The startup delay can be significant -- replaying a
> transaction log covering all of your documents can take as long as the
> bulk indexing operations that created the transaction log, and if your
> bulk indexing is multi-threaded or multi-process, replaying can take
> LONGER than the original indexing.  Using the autoCommit feature with
> openSearcher=false is the easiest way to keep the transaction log size
> down, but *any* hard commit will do.
>
> You don't need to use SolrCloud when you upgrade.  It does make
> redundancy easier -- replication does not happen during normal SolrCloud
> operation.  Instead the documents will be indexed independently and
> automatically by every replica.
>
> With traditional Solr, on either version, you can swap/rename cores.
> With collections on SolrCloud, swapping and renaming isn't an option,
> but you can create multiple collections and alias a common name to
> whichever one is active.  These features would let you create a new
> index and exchange it with the old one, just as you have described in
> your last paragraph.  I use the core swapping method for the rare full
> rebuild, without SolrCloud.
>
> Thanks,
> Shawn


-- 
Regards,
Samuel García.
