Hi everyone,

I'm working on upgrading a set of clusters from Solr 4.10.4 to Solr 7.1.0.

Our deployment tooling no longer works now that legacyCloud defaults to false 
(SOLR-8256), and I'm hoping to get some advice on what to do going forward.

Our setup is as follows:
  * we run in AWS with multiple independent Solr clusters, each with its own 
Zookeeper tier
  * each cluster hosts only a single collection
  * each machine/node in the cluster has a single core / is a replica for one 
shard in the collection

We bring up new clusters as needed.  This is entirely automated and basically 
works as follows:
  * we first provision and set up a fresh Zookeeper tier
  * then, we provision a Solr bootstrapper machine that uploads collection 
config, specifies numShards and starts up
  * it's then easy to provision the rest of the machines and have them 
automatically join a shard in the collection by hooking them to the right 
Zookeeper cluster and specifying numShards
  * if a node needs to be added to the cluster we just need to spin a machine 
up and start up Solr

The desired final state of such a deployment is a fully configured cluster 
ready to accept updates.

Now that legacyCloud is false, I'm not sure how to preserve this pretty nice, 
hands-off deployment style: the bootstrapping performed by the first node 
provisioned doesn't create a collection, and adding new nodes requires explicit 
configuration.

A new deployment procedure that I've worked out using the Collections API would 
look like:
  * provision Zookeeper tier
  * provision all the Solr nodes, wait for them all to come up
  * upload collection config + solr.xml to Zookeeper
  * create collection using Collections API
  * if a node needs to be added: provision a machine, start up Solr, use 
ADDREPLICA from Collections API passing shard number and coreNodeName
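
To make the last two steps concrete, here is a rough sketch of the Collections 
API requests the tooling would need to issue. It only builds the URLs and does 
not send them. The host name, collection name, config name, and shard/node 
values are made-up placeholders, and the helper function names are mine, not 
anything Solr ships. CREATE and ADDREPLICA and the parameters shown are 
standard Collections API parameters, but please check them against the 
reference guide for your version.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build a Collections API URL. base_url looks like http://host:8983/solr."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(query)}"


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    # Run once, after all nodes are up and the config set is in Zookeeper.
    return collections_api_url(base_url, "CREATE", {
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    })


def add_replica_url(base_url, collection, shard, node):
    # node is the Solr node name as listed in live_nodes, e.g. "10.0.0.5:8983_solr".
    return collections_api_url(base_url, "ADDREPLICA", {
        "collection": collection,
        "shard": shard,
        "node": node,
    })


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    base = "http://solr-1:8983/solr"
    print(create_collection_url(base, "products", 4, 2, "products_conf"))
    print(add_replica_url(base, "products", "shard3", "10.0.0.5:8983_solr"))
```

The tooling would then issue a GET against each URL and check the response 
status before moving on.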

This isn’t a giant deal to build but it adds complexity that I'm not excited 
about as deployment tooling needs to have some understanding of what the global 
state of the cluster is before being able to create a collection or when 
adding/replacing nodes.
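
As an illustration of the kind of "global state" logic I mean, here is a 
minimal sketch that picks which shard a new node should join, given the parsed 
JSON from a CLUSTERSTATUS call. The function name and the "fewest replicas 
wins" policy are my own assumptions, not something Solr provides. The nesting 
cluster -> collections -> <name> -> shards -> <shard> -> replicas is how I 
understand the CLUSTERSTATUS response to be laid out.

```python
def shard_with_fewest_replicas(cluster_status, collection):
    """Return the name of the shard with the fewest replicas.

    cluster_status is the parsed JSON body of a CLUSTERSTATUS response.
    Ties are broken by shard name so the choice is deterministic.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return min(
        sorted(shards),
        key=lambda name: len(shards[name].get("replicas", {})),
    )
```

The result would be passed as the shard parameter of ADDREPLICA when a 
replacement node comes up.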

The questions I was hoping someone would have some time to help me with are:

  * Does the new deployment procedure I've suggested seem reasonable?  Would we 
be doing anything wrong/fighting best practices?
  * Is there a way to keep cluster provisioning automated without having to 
build additional orchestration logic into our deployment tooling (using 
autoscaling, or triggers, or something I don’t know about)?

Apologies for the wall of text and thanks. :)

Marko
