Re: loading zookeeper data

Erick Erickson Fri, 22 Jul 2016 17:20:49 -0700

bq: Zookeeper seems a step backward.....

For stand-alone Solr, I tend to agree it's a bit awkward. But as Shawn
says, there's no _need_ to run Zookeeper with a more recent Solr.
Running Solr without Zookeeper is perfectly possible; we call that
"stand-alone". And if you have no need for sharding etc., there's no
compelling reason to run SolrCloud. Well, there are some good reasons
having to do with fail-over and the like, but...

Where SolrCloud becomes compelling is when you _do_ need to shard
and deal with HA/DR. Then the added step of maintaining things
in Zookeeper is a small price to pay for _not_ having to be sure that
the configs on all the servers are the same. Imagine a cluster
with several hundred replicas out there. Being absolutely sure that
all of them have the same configs, have been restarted and the like
becomes daunting. So having to do an "upconfig" is a good tradeoff
IMO.

The bin/solr script has a "zk -upconfig" parameter that'll take care
of pushing the configs up. Since you already have the configs in VCS,
your process is just to pull them from VCS to "somewhere" and then run

bin/solr zk -upconfig -z zookeeper_address -n configset_name -d directory_you_downloaded_to_from_VCS

Thereafter you simply refer to them by name when you create a
collection and the rest of it is automatic. Every time a core reloads
it gets the new configs.
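
For automating this, here is a rough sketch of how the upload, the
collection create, and the reload Shawn mentioned might be wrapped in a
shell script. The function names and all the arguments are placeholders I
made up for illustration. It assumes you run it from the Solr install
directory, that curl is available, and that bin/solr exits non-zero when
the upload fails, which you should verify on your version before relying
on it.

```shell
#!/bin/sh
# Sketch only: function names and arguments are hypothetical placeholders.

# Build the Collections API URL that creates a collection from a named configset.
# $1 = Solr base URL, $2 = collection name, $3 = number of shards, $4 = configset name
create_url() {
    printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&collection.configName=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Build the Collections API URL that reloads every core of a collection.
# $1 = Solr base URL, $2 = collection name
reload_url() {
    printf '%s/admin/collections?action=RELOAD&name=%s\n' "$1" "$2"
}

# Push one configset to Zookeeper, then reload the collection that uses it.
# $1 = Zookeeper address, $2 = configset name, $3 = config directory,
# $4 = Solr base URL, $5 = collection name
deploy_config() {
    # Assumption: bin/solr returns a non-zero exit status on failure.
    bin/solr zk -upconfig -z "$1" -n "$2" -d "$3" || return 1
    # curl -f turns HTTP error responses into a non-zero exit status.
    curl -fsS "$(reload_url "$4" "$5")" > /dev/null || return 1
}
```

A call would look something like
deploy_config zk1:2181 myconf /path/to/conf http://localhost:8983/solr mycoll
and the script's exit status then tells your deployment tool whether
either step failed.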

If you're trying to manipulate _cores_, that may be where you're going
wrong. Think of them as _collections_. What's not clear from your
problem statement is whether these cores on the various machines are
part of the same collection or not. Do you have multiple shards in one
logical index? Or do you have multiple collections that have
masters/slaves (in which case the master and all the slaves that point
to it will be a "collection")? Do all of the cores you have use the
same configurations? Or is each set of master/slaves using a different
configuration?

Best,
Erick

On Fri, Jul 22, 2016 at 4:41 PM, Aristedes Maniatis <a...@ish.com.au> wrote:
> On 22/07/2016 5:22pm, Aristedes Maniatis wrote:
>> But then what? In the production cluster it seems I then need to
>>
>> 1. Grab the latest configuration bundle for each core and unpack them
>> 2. Launch Java
>> 3. Execute the Solr jars (from the production server since it must be the 
>> right version)
>> - with org.apache.solr.cloud.ZkCLI
>> - and some parameters pointing to the production Zookeeper cluster
>> - pointing also to the unpacked config files
>> 4. Parse the output to understand if any error happened
>> 5. Wait for Solr to pick up the new configuration and do any final 
>> production checks
>
> Shawn wrote:
>
>> If you *do* want to run in cloud mode, then you will need to use zkcli to 
>> upload config changes to zookeeper and then issue a collection reload with 
>> the Collections API. This will find and reload all the cores related to that 
>> collection, across the entire cloud. You have the option of using the ZkCLI 
>> java class, or the zkcli.sh script that can be found in all 5.x and 6.x 
>> installs at server/scripts/cloud-scripts. As of version 5.3, the jars 
>> required for zkcli are already unpacked before Solr is started.
>
>
> Thanks Shawn,
>
> I'm trying to understand the common workflow of deploying configuration to 
> Zookeeper. I'm new to that tool, so at this point it appears to be a big 
> black box which can only be populated with data with a specific Java 
> application. Surely others here on this list use configuration management 
> tools and other non-manual workflows.
>
> I've written a little gradle task to wrap up sending data to zookeeper:
>
> task deployConfig {
>         description = 'Upload configuration to production zookeeper cluster.'
>         file('src/main/resources/solr').eachDir { core ->
>             doLast {
>               javaexec {
>                 classpath configurations.zookeeper
>                 main = 'org.apache.solr.cloud.ZkCLI'
>                 args = [
>                         "-confdir", core,
>                         "-zkhost", "solr.host.com:2181",
>                         "-cmd", "upconfig",
>                         "-confname", core.name
>                 ]
>               }
>             }
>         }
> }
>
>
> That does the trick, although I've not yet figured out how to know whether it 
> was successful because it doesn't return anything. And as I outlined above, 
> it is quite cumbersome to automate. Are you saying that everyone who runs 
> SolrCloud runs all these scripts against their production jars by hand?
>
> Zookeeper seems a step backward from files on disk in terms of ease of 
> automation, inspecting for problems, version control and a new point of 
> failure.
>
> Perhaps because I'm new to it I'm missing a set of tools that make all that 
> much easier. Or for that matter, I'm missing an understanding of what problem 
> Zookeeper solves.
>
> Ari
>
>
> --
> -------------------------->
> Aristedes Maniatis
> CEO, ish
> https://www.ish.com.au
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>
