Hi Solr community,

I'm in the process of getting my mindset straight on SolrCloud; more specifically, I'm trying to design a feasible workflow for a use case where we currently use master/slave replication. First, the use case:
We want to:

1. separate indexing workload from query workload
2. deploy config and/or schema changes without interrupting queries

Currently we do (1) with a straightforward master/slave replication setup: N master shards that handle updates, and N slave shards replicating from these. In this setup we can do (2) by temporarily stopping replication, deploying the new configuration/schema to the master shards, possibly re-indexing, switching queries to go to the master shards, re-enabling replication, and, when replication has finished, switching queries back to the slave shards.

So... introducing SolrCloud. We would really like to utilize SolrCloud, especially for the added fault tolerance and simpler distributed indexing, but I'm a bit puzzled about how to achieve something similar to the above.

Re (1): Am I right in thinking that a given update is sent to every replica of the shard to which it belongs for analysis and indexing? And that there is no immediate way to separate indexing from queries within a collection?

Re (2): Deploying a new schema/config should be as simple as uploading it to ZooKeeper and reloading the cores. Right? So for the case where the new config/schema is compatible with the existing index, we're good. For the other case, I think we could do it as follows: upload the new config/schema to ZooKeeper, create a new collection that uses it, index into the new collection, switch queries to the new collection, and delete the old collection. Would this be the way to go? Or is there a simpler way that I cannot see?

Just to bring the scale of our operation into it: our index is approx. 200 million documents, with a total index size of around 0.5 TB. The normal flow of updates is on the order of a few million per day, but we will frequently (say, on a weekly basis) need to re-index all or large parts of our documents, either due to schema changes or re-processing of the original data.

Sorry for dumping my brain on you, but any input you might have on this will be highly appreciated.
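To make the "new collection" idea under Re (2) a bit more concrete, here is a rough Python sketch of the Collections API calls I imagine it would involve. It only builds the request URLs and sends nothing. The host, the collection names, the config name, and the shard/replica counts are all made up for illustration. Using a collection alias (CREATEALIAS) for the "switch queries" step is also an assumption on my part: it requires a Solr version that supports aliases. Uploading the new config to ZooKeeper and the actual re-indexing happen outside this sketch.

```python
from urllib.parse import urlencode

# Hypothetical Solr node; adjust to your own cluster.
SOLR = "http://localhost:8983/solr"


def collections_api(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


def reindex_plan(old, new, config_name, num_shards, replication_factor, alias):
    """Return the Collections API calls for a swap-by-new-collection re-index.

    Assumes the new config set has already been uploaded to ZooKeeper under
    `config_name`, and that queries go through `alias` rather than directly
    to a collection name.
    """
    return [
        # 1. Create the new collection on top of the new config set.
        collections_api(
            "CREATE",
            name=new,
            numShards=num_shards,
            replicationFactor=replication_factor,
            **{"collection.configName": config_name},
        ),
        # (Re-index into `new` here; not a Collections API call.)
        # 2. Repoint the query alias at the new collection.
        collections_api("CREATEALIAS", name=alias, collections=new),
        # 3. Drop the old collection once nothing queries it any more.
        collections_api("DELETE", name=old),
    ]


if __name__ == "__main__":
    for url in reindex_plan("docs_v1", "docs_v2", "docs_conf_v2", 4, 2, "docs"):
        print(url)
```

The point of the alias would be that clients never need to know which physical collection is live, so the switch is a single call rather than a client-side redeploy.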
Regards,
--
Steffen Elberg Godskesen
Programmer
DTU Library
---------------------------------------
Technical University of Denmark
Technical Information Center of Denmark
Anker Engelunds Vej 1
PO Box 777
Building 101D
2800 Kgs. Lyngby
s...@dtic.dtu.dk
http://www.dtic.dtu.dk/