Yes, it looks like you're on track, good luck!

On Mon, Oct 23, 2017 at 5:21 PM, Marko Babic <babma...@abebooks.com> wrote:
> Thanks for the quick reply, Erick.
>
> To follow up:
>
> “
> Well, first you can explicitly set legacyCloud=true by using the
> Collections API CLUSTERPROP command. I don't recommend this, mind you,
> as legacyCloud will not be supported forever.
> “
>
> Yes, but like you say: we’ll have to deal with it at some point, so
> there’s not much benefit in punting.
>
> “
> I'm not following something here though. When you say:
> "The desired final state of such a deployment is a fully configured
> cluster ready to accept updates."
> are there any documents already in the index or is this really a new
> collection?
> “
>
> It’s a brand new collection with new configuration on fresh hardware which 
> we’ll then fully index from a source document store (we do this when we have 
> certain schema changes that require re-indexing or we want to experiment).
>
> “
>     Not sure what you mean here. Configuration of what?  Just spinning up
>     a Solr node pointing to the right ZooKeeper should be sufficient, or
>     I'm not understanding at all.
> “
>
> Apologies, the way I stated that was all wrong: by “requires configuration” I
> just meant to note the need to specify a shard and a node when adding a
> replica (and not even the node, as you point out below ☺).
>
> “
> I suspect you're really talking about the "node" parameter to ADDREPLICA
> “
>
> Ah, yes: that is what I meant, sorry.
>
> It sounds like I haven’t missed too much in the documentation, then; I’ll
> look more into replica placement rules.
>
> Thank you so much again for your time and help.
>
> Marko
>
>
> On 10/23/17, 4:33 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>     Well, first you can explicitly set legacyCloud=true by using the
>     Collections API CLUSTERPROP command. I don't recommend this, mind you,
>     as legacyCloud will not be supported forever.
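>
>     For reference, that's a single Collections API call; a sketch,
>     assuming Solr is listening on localhost:8983 (adjust host/port to
>     taste):
>
>         curl "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=true"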
>
>     I'm not following something here though. When you say:
>     "The desired final state of a such a deployment is a fully configured
>     cluster ready to accept updates."
>     are there any documents already in the index or is this really a new 
> collection?
>
>     and "adding new nodes requires explicit configuration"
>
>     Not sure what you mean here. Configuration of what?  Just spinning up
>     a Solr node pointing to the right ZooKeeper should be sufficient, or
>     I'm not understanding at all.
>
>     If not, your proposed outline seems right with one difference:
>     "if a node needs to be added: provision a machine, start up Solr, use
>     ADDREPLICA from Collections API passing shard number and coreNodeName"
>
>     coreNodeName isn't something you ordinarily need to bother with; to be
>     specific, coreNodeName is an internal name, usually something like
>     core_node7. I suspect you're really talking about the "node" parameter
>     to ADDREPLICA, something like 192.168.1.32:8983_solr, i.e. an entry
>     from live_nodes.
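>
>     In other words, something like this (collection and shard names are
>     placeholders):
>
>         curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=192.168.1.32:8983_solr"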
>
>     Now, all that said, you may be better off just letting Solr add the
>     replica where it wants: it'll usually put a new replica on a node
>     without replicas, so specifying the collection and shard should be
>     sufficient. Also, note that there are replica placement rules that can
>     help enforce this kind of thing.
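>
>     A sketch of both, again with placeholder names (the rule below is the
>     rule-based replica placement from the ref guide; special characters
>     may need URL-encoding depending on your shell):
>
>         # let Solr pick the node:
>         curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1"
>
>         # or bake the constraint into the collection at creation time,
>         # e.g. at most one replica of a shard per node:
>         curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&rule=shard:*,replica:<2,node:*"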
>
>     Best,
>     Erick
>
>     On Mon, Oct 23, 2017 at 3:12 PM, Marko Babic <babma...@abebooks.com> wrote:
>     > Hi everyone,
>     >
>     > I'm working on upgrading a set of clusters from Solr 4.10.4 to
>     > Solr 7.1.0.
>     >
>     > Our deployment tooling no longer works given that legacyCloud
>     > defaults to false (SOLR-8256) and I'm hoping to get some advice on
>     > what to do going forward.
>     >
>     > Our setup is as follows:
>     >   * we run in AWS with multiple independent Solr clusters, each
>     >     with its own Zookeeper tier
>     >   * each cluster hosts only a single collection
>     >   * each machine/node in the cluster has a single core / is a
>     >     replica for one shard in the collection
>     >
>     > We bring up new clusters as needed.  This is entirely automated and
>     > basically works as follows (sketched in commands after the list):
>     >   * we first provision and set up a fresh Zookeeper tier
>     >   * then, we provision a Solr bootstrapper machine that uploads
>     >     collection config, specifies numShards and starts up
>     >   * it's then easy to provision the rest of the machines and have
>     >     them automatically join a shard in the collection by hooking
>     >     them to the right Zookeeper cluster and specifying numShards
>     >   * if a node needs to be added to the cluster we just need to
>     >     spin a machine up and start up Solr
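>     >
>     > Roughly, in Solr 4.x terms (hosts, paths, and shard count below are
>     > placeholders, not our exact flags; run from the example/ directory):
>     >
>     >     # bootstrapper: upload config, name it, set numShards
>     >     java -DzkHost=zk1:2181,zk2:2181,zk3:2181 \
>     >          -Dbootstrap_confdir=./solr/collection1/conf \
>     >          -Dcollection.configName=myconf -DnumShards=3 -jar start.jar
>     >
>     >     # every subsequent node just points at the same ZooKeeper:
>     >     java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar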
>     >
>     > The desired final state of such a deployment is a fully configured
>     > cluster ready to accept updates.
>     >
>     > Now that legacyCloud is false, I'm not sure how to preserve this
>     > pretty nice, hands-off deployment style: the bootstrapping performed
>     > by the first node provisioned doesn't create a collection, and
>     > adding new nodes requires explicit configuration.
>     >
>     > A new deployment procedure that I've worked out using the
>     > Collections API would look like the following (commands sketched
>     > after the list):
>     >   * provision Zookeeper tier
>     >   * provision all the Solr nodes, wait for them all to come up
>     >   * upload collection config + solr.xml to Zookeeper
>     >   * create collection using Collections API
>     >   * if a node needs to be added: provision a machine, start up
>     >     Solr, use ADDREPLICA from Collections API passing shard number
>     >     and coreNodeName
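>     >
>     > For concreteness, a sketch of the upload/create/add steps with
>     > placeholder names (zkhosts, config name, shard counts are all made
>     > up):
>     >
>     >     # upload collection config and solr.xml to ZooKeeper:
>     >     bin/solr zk upconfig -z zk1:2181,zk2:2181,zk3:2181 -n myconf -d /path/to/conf
>     >     bin/solr zk cp file:local/solr.xml zk:/solr.xml -z zk1:2181,zk2:2181,zk3:2181
>     >
>     >     # once all nodes are up, create the collection:
>     >     curl "http://node1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=myconf"
>     >
>     >     # later, after provisioning a machine and starting Solr on it:
>     >     curl "http://node1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard2"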
>     >
>     > This isn’t a giant deal to build, but it adds complexity that I'm
>     > not excited about: the deployment tooling needs to have some
>     > understanding of the global state of the cluster before it can
>     > create a collection or add/replace nodes.
>     >
>     > The questions I was hoping someone would have some time to help me
>     > with are:
>     >
>     >   * Does the new deployment procedure I've suggested seem
>     >     reasonable? Would we be doing anything wrong/fighting best
>     >     practices?
>     >   * Is there a way to keep cluster provisioning automated without
>     >     having to build additional orchestration logic into our
>     >     deployment tooling (using autoscaling, or triggers, or something
>     >     I don’t know about)?
>     >
>     > Apologies for the wall of text and thanks. :)
>     >
>     > Marko
>     >
>
>
