I thought jar files for custom code were meant to go into the '.system' collection, not zookeeper. Did I miss a new/old storage option?
On Wed, Jan 2, 2019, 12:25 PM Erick Erickson <erickerick...@gmail.com> wrote:

> 1> No. At one point, this could be done in the sense that the
> collections would be reconstructed (legacyCloud), but that turned out
> to have... side effects. Even in that case, though, Solr couldn't
> reconstruct the configsets. (Insert rant that you really must store
> your configsets in a VCS somewhere, IMO.)
>
> 2> Should be fine, as long as the state changes don't include things
> like adding replicas or collections, and you haven't changed your
> configsets. ZK has nothing to do with commits, for instance. Leader
> election is recorded in ZK, but other leaders will be elected if
> necessary. Again, though, if you've changed the topology (added
> replicas and/or collections and/or shards if using implicit routing)
> between the time you took the snapshot and the time ZK failed, you'll
> have an incomplete restored state.
>
> Now, all that said, ZooKeeper data is "just data". Apart from blobs
> stored in ZK, you can manually reconstruct the whole thing with a
> text editor and upload it. This would be tedious and error-prone, to
> be sure, but do-able. Periodically storing away a copy of the
> Collections API CLUSTERSTATUS would help a lot.
>
> Another approach would be to simply re-create your collections with
> the exact same shard count. That'll create replicas with the same
> ranges etc. Then shut your Solr instances down and copy the data
> directory from the correct old replica to the correct new replica.
> Once you're satisfied that things are running, you can delete the old
> (unused) data. As an aside, in this case I'd create my new
> collection(s) as leader-only (1 replica), then copy as necessary and
> verify that things were as expected. Once that was done, I'd use
> ADDREPLICA to build out the new collection(s). This presupposes you
> can get your configsets back from VCS as well as any binary data
> you've stored in ZK (e.g. jar files for custom code and the like).
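For the "periodically store a copy of CLUSTERSTATUS" suggestion above, something like the following might do as a starting point. It is only a sketch: the host, port, and output directory are placeholders, the function names are made up for illustration, and it assumes the Collections API is reachable at `/solr/admin/collections` without authentication.

```python
import json
import time
import urllib.parse
import urllib.request
from pathlib import Path


def clusterstatus_url(solr_base):
    """Build the Collections API CLUSTERSTATUS URL for a Solr base URL."""
    query = urllib.parse.urlencode({"action": "CLUSTERSTATUS", "wt": "json"})
    return solr_base.rstrip("/") + "/admin/collections?" + query


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def save_clusterstatus(solr_base, out_dir, fetch=fetch_json):
    """Write one timestamped CLUSTERSTATUS snapshot and return its path."""
    status = fetch(clusterstatus_url(solr_base))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    path = out_dir / f"clusterstatus-{stamp}.json"
    path.write_text(json.dumps(status, indent=2, sort_keys=True))
    return path


# Example call (hypothetical host and directory), e.g. from a cron job:
# save_clusterstatus("http://localhost:8983/solr", "/backups/clusterstatus")
```

The saved JSON only records topology (collections, shards, replicas, ranges). It does not replace a copy of the configsets or of any binaries kept in ZK.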
>
> So overall it's do-able even without ZK snapshots _assuming_ you can
> find copies of your configsets and any custom code you've stored in
> ZK. Not something I'd really _like_ to do, but in an emergency you
> have options.
>
> But backing up ZK snapshots in a safe place would be, by far, the
> easiest and safest thing to do....
>
> HTH,
> Erick
>
> On Wed, Jan 2, 2019 at 12:36 AM Pavel Micka <pavel.mi...@zoomint.com>
> wrote:
> >
> > Hi,
> > We are currently implementing Solr Cloud, and as part of this effort
> > we are investigating which failure modes may happen between Solr and
> > Zookeeper.
> >
> > We have found quite a lot of articles describing the "happy path"
> > failure, when ZK stops (loses majority) and the Solr cluster ceases
> > to serve write requests (reads continue to work as expected). Once
> > the ZK cluster is reconciled and majority is achieved again,
> > everything continues working as expected.
> >
> > What we have not been able to find is what happens when the ZK
> > cluster catastrophically fails and loses its data, either completely
> > (scenario A) or when it is restarted from backup (scenario B).
> >
> > So now the questions:
> >
> > 1) Scenario A - Is an existing Solr Cloud cluster able to start
> > against a clean Zookeeper and reconstruct all the ZK data from its
> > internal state (using some kind of emergency recovery; it may take
> > long)?
> >
> > 2) Scenario B - What is the worst-case backup/restore scenario? For
> > example, when
> >
> > a. ZK is backed up
> >
> > b. Cluster performs some transition between states "X -> Y" (such
> > as commit shard, elect new leader etc.)
> >
> > c. ZK fails completely
> >
> > d. ZK is restored from the backup created in step a
> >
> > e. Solr Cloud is in state "Y", while ZK is in state "X"
> >
> > Thanks in advance,
> >
> > Pavel