FWIW -- zookeeper is pretty set-and-forget in my experience with settings like autopurge.snapRetainCount, autopurge.purgeInterval, and rotating the zookeeper.out stdout file.
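For anyone who hasn't touched those knobs, this is roughly what I mean in zoo.cfg (hostnames and values below are just illustrative, adjust for your own ensemble):

    # keep the last 3 snapshots and purge older snapshots/txn logs every 24 hours
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=24

    # static ensemble definition -- one server.N line per node; each node's
    # dataDir also needs a myid file containing just its own N
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

plus something like logrotate with copytruncate pointed at zookeeper.out, since ZooKeeper itself never rotates that file.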
It is a big hassle to set up the individual myid files and keep them in sync with the server.$id=hostname lines in zoo.cfg but, again, one-time pain. I think smaller Solr deployments could benefit from an easier way to configure the embedded zookeeper (along the lines of the improved zk upconfig and friends), which might address this entire point? The only reason I don't run embedded zk (I use three small EC2s) is that cpu/disk contention on the same server has burned me in the past.

On Wed, Jun 10, 2020 at 3:30 AM Jan Høydahl <jan....@cominvent.com> wrote:
>
> Curator is just on the client (solr) side, to make it easier to integrate with Zookeeper, right?
>
> If you study Elastic, they had terrible cluster stability a few years ago since everything was too «dynamic» and «zero config». That led to the system outsmarting itself when facing real-life network partitions and other failures. Solr did not have these issues exactly because it relies on Zookeeper, which is very static and hard to change (on purpose), and thus delivers a strong, stable quorum. So what did Elastic do a couple of years ago? They adopted the same best practice as ZK, recommending 3 or 5 (statically defined) master nodes that own the cluster state.
>
> Solr could get rid of ZK the same way as Kafka. But while Kafka already has a distributed log it could replace ZK with (hey, Kafka IS a log), Solr would need to add such a log, and it would need to be embedded in the Solr process to avoid that extra runtime. I believe it could be done with Apache Ratis (https://ratis.incubator.apache.org), which is a Raft Java library. But I'm doubtful the project has the bandwidth and dedication right now to embark on such a project. It would probably be a multi-year effort: first building abstractions on top of ZK, then moving one piece of ZK dependency over to Raft at a time, needing both systems in parallel, before at the end ZK could go away.
>
> I'd like to see it happen. Especially for smaller deployments it would be fantastic.
>
> Jan
>
> > On 10 Jun 2020, at 01:03, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > The intermediate solution is to migrate to Curator. I don't know all the ins and outs of that, or whether it would be easier to set up and maintain.
> >
> > I do know that Zookeeper is deeply embedded in Solr, and replacing it with most anything would be a major pain.
> >
> > I'm also certain that rewriting Zookeeper is a rat-hole that would take a major effort. If anyone would like to try it, all patches welcome.
> >
> > FWIW,
> > er...@curmudgeon.com
> >
> >> On Jun 9, 2020, at 6:01 PM, Dave <hastings.recurs...@gmail.com> wrote:
> >>
> >> Is it horrible that I'm already burnt out from just reading that?
> >>
> >> I'm going to stick to the classic Solr master/slave setup for the foreseeable future; at least that lets me focus more on the search theory rather than the back-end system nonstop.
> >>
> >>> On Jun 9, 2020, at 5:11 PM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> >>>
> >>> My 2 cents: I have a few SolrCloud production installations, and I'd like to share some thoughts on what I've learned over the last 4-5 years (fwiw), just as they come to mind.
> >>>
> >>> - to configure a SolrCloud *production* cluster you have to be a zookeeper expert even if you only need Solr.
> >>> - the Zookeeper ensemble (3 or 5 zookeeper nodes) is recommended to run on separate machines, but for many customers this is too expensive. For the rest, it is expensive just to have the instances (i.e. Docker containers). It is expensive even to have people who know Zookeeper, or just to train them.
> >>> - given the high-availability role of a zookeeper cluster, you have to monitor it and be able to promptly back it up and restore it. But it is hard to monitor (and to configure the monitoring), and it is even harder to back up and restore while it is running.
> >>> - You can't add or remove nodes in zookeeper while it is up. Only the latest version finally makes it possible to add/remove nodes at runtime, but afaik this is still not supported by SolrCloud (out of the box).
> >>> - many people fail when they try to run a SolrCloud cluster because it is hard to set up; for example, SolrCloud's zkcli runs poorly on Windows.
> >>> - it is hard to administer zookeeper remotely; there are basically no utilities that let you easily list/read/write/delete files on the zookeeper filesystem.
> >>> - it was really hard to create a zookeeper ensemble in Kubernetes; only recently have a few solutions appeared. This has been counter-productive for the Solr project, because the world is moving to Kubernetes and there is basically no support.
> >>> - well, after all these troubles, when the SolrCloud clusters are configured correctly they are (rock?) solid. Even if a few Solr nodes/replicas go down, the entire cluster can restore itself almost automatically. But how much work it takes.
> >>>
> >>> Believe me, I like Solr, but at the end of this long journey, sometimes I would rather just use a PaaS/SaaS instead of having to deal with all these troubles.
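PS, re Vincenzo's point above about remote admin: newer Solr releases do ship the "zk upconfig and friends" subcommands I mentioned, which cover most of the day-to-day list/read/write needs without touching zkCli.sh. From memory (hostnames and names below are made up; check bin/solr zk --help on your version), usage looks roughly like:

    bin/solr zk ls -r /configs -z zk1:2181,zk2:2181,zk3:2181
    bin/solr zk upconfig -n myconf -d /path/to/configset -z zk1:2181,zk2:2181,zk3:2181
    bin/solr zk cp zk:/security.json file:/tmp/security.json -z zk1:2181,zk2:2181,zk3:2181

It doesn't help with managing the ensemble itself, but it does take the edge off the "no utilities to list/read/write/delete files" complaint.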