Hi Everyone,

I am wondering if there is any best practice regarding re-indexing
documents in SolrCloud 6.0.0 without making the data (or the underlying
collection) temporarily unavailable. Wiping all documents in a collection
and performing a full re-indexing is not a viable alternative for us.

Say we had a massive SolrCloud cluster with a number of separate nodes
that host *several hundred* collections, with document counts ranging
from a couple of thousand to several million (say, up to 20 million)
documents, each with 200-300 fields, plus a background batch loader job
that fetches data from a variety of source systems.

We have to keep the cluster and ALL collections online all the time (24
x 365): we cannot allow queries to be blocked while data in a collection is
being updated, and we cannot load everything in a single jumbo commit
(the replication could overload the cluster).

One solution I could imagine is storing an additional "load time-stamp"
field in all documents and having the client (interactive query)
application extend every query with an additional restriction that requires
the document's "load time-stamp" to be the latest known completed "load
time-stamp".
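
A minimal sketch of that client-side restriction, assuming the field is
called load_ts (a hypothetical name) and the restriction is passed through
Solr's standard fq (filter query) parameter. The helper only builds request
parameters; it does not talk to Solr.

```python
from typing import Dict, List

LOAD_TS_FIELD = "load_ts"  # hypothetical field name, not from the original post


def with_load_filter(params: Dict[str, List[str]],
                     completed_load_ts: str) -> Dict[str, List[str]]:
    """Return a copy of the query parameters with an extra fq clause that
    limits results to documents from the latest completed load."""
    restricted = {key: list(values) for key, values in params.items()}
    # The value is quoted so the colons in an ISO time-stamp are not
    # interpreted as field separators by the query parser.
    restricted.setdefault("fq", []).append(
        f'{LOAD_TS_FIELD}:"{completed_load_ts}"'
    )
    return restricted


if __name__ == "__main__":
    print(with_load_filter({"q": ["title:solr"]}, "2016-05-01T00:00:00Z"))
```

Keeping the restriction in fq rather than appending it to q leaves the
user's own query untouched.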

This concept would work as follows:
1.) The batch job would simply start loading new documents, with the new
"load time-stamp". Existing documents would not be touched.
2.) The client (interactive query) application would still use the old data
from the previous load (since all queries are restricted with the old "load
time-stamp")
3.) The batch job would store the new "load time-stamp" as the one to be
used (e.g. in a separate collection etc.) -- after this, all queries would
return the most up-to-date documents
4.) The batch job would purge all documents from the collection whose
"load time-stamp" differs from the latest one.
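
The four steps can be sketched with an in-memory stand-in for a collection.
This is only an illustration of the ordering, not a Solr client: the class,
the load_ts field name and the marker attribute are all hypothetical. It
also assumes that each load writes documents under ids that do not collide
with the previous load's ids, because Solr replaces an existing document
when a new one arrives with the same uniqueKey. The last function shows the
JSON body a delete-by-query request for step 4 might carry.

```python
from typing import Dict, List, Optional

LOAD_TS_FIELD = "load_ts"  # hypothetical field name, not from the original post


class SimulatedCollection:
    """In-memory stand-in for one collection plus the stored load marker."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, str]] = []
        self.current_load_ts: Optional[str] = None

    def load(self, rows: List[Dict[str, str]], load_ts: str) -> None:
        # Step 1: add new documents tagged with the new time-stamp;
        # existing documents are not touched.
        for row in rows:
            self.docs.append({**row, LOAD_TS_FIELD: load_ts})

    def query(self) -> List[Dict[str, str]]:
        # Step 2: clients only see documents of the published load.
        return [d for d in self.docs
                if d[LOAD_TS_FIELD] == self.current_load_ts]

    def publish(self, load_ts: str) -> None:
        # Step 3: record the new time-stamp as the one to be used.
        self.current_load_ts = load_ts

    def purge(self) -> None:
        # Step 4: drop every document that is not from the published load.
        self.docs = [d for d in self.docs
                     if d[LOAD_TS_FIELD] == self.current_load_ts]


def purge_request_body(current_load_ts: str) -> Dict[str, Dict[str, str]]:
    """JSON body for a delete-by-query matching all other loads."""
    query = f'*:* -{LOAD_TS_FIELD}:"{current_load_ts}"'
    return {"delete": {"query": query}}


if __name__ == "__main__":
    coll = SimulatedCollection()
    coll.load([{"name": "old"}], "t1")
    coll.publish("t1")
    coll.load([{"name": "new"}], "t2")   # step 1
    print(coll.query())                  # step 2: still the "t1" document
    coll.publish("t2")                   # step 3
    print(coll.query())                  # now the "t2" document
    coll.purge()                         # step 4
    print(purge_request_body("t2"))
```

Between steps 1 and 4 the collection holds two full copies of the data, so
index size roughly doubles for the duration of a load.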

This approach seems implementable; however, I definitely want to avoid
reinventing the wheel and am wondering whether there is a better
solution or a built-in SolrCloud feature that achieves the same or
something similar.

Thanks,
Peter
