> If people are running 100 or 1000 node clusters and use each node as a ZK server, by default, what kind of impact would that have?
Bad. Very very bad. The largest ZK quorum I've personally seen is 9, and
I've heard rumors of somebody running 15. I think the recommended approach
for distributing load is to use Observers[1], which may provide some
tiering benefits or may be redundant with traditional ZK clients. Maybe an
Observer per failure zone makes sense for the Solr Operator?

[1]: https://zookeeper.apache.org/doc/current/zookeeperObservers.html

On Wed, Sep 15, 2021 at 8:33 AM Houston Putman <hous...@apache.org> wrote:

> If we were to make this work, and support productionized embedded
> Zookeeper, then it would absolutely be something that we want to support by
> default in the Solr Operator.
>
> I don't think we'd be able to cut the Zookeeper Operator dependency really
> quickly, because this is going in at the earliest in Solr 9 and more likely
> Solr 10 (probably). The Solr Operator still needs to support older
> versions, especially Solr 8 for a fair amount of time. So once the minimum
> supported Solr version is one that has this feature, then we can get rid of
> the Zookeeper Operator for good. This is probably my favorite thing about
> the SIP. The Zookeeper Operator is fine, but removing that dependency would
> lift a huge burden off of the Solr Operator's shoulders.
>
> I also think it's a good idea to be able to start solr in a ZK-Only mode.
>
> Also you should be able to tell Solr whether you want it to start as a ZK
> member or observer, or not run ZK on that node at all. I'm not extremely in
> touch with the ZK community at this point, but what cluster sizes are
> people scaling up to nowadays? If people are running 100 or 1000 node
> clusters and use each node as a ZK server, by default, what kind of impact
> would that have?
>
> On Tue, Sep 14, 2021 at 8:20 PM Mike Drob <md...@apache.org> wrote:
>
>> I like the idea of starting nodes in a ZK-only mode, probably we would
>> call it something like coordination mode. It ties in to ideas that I've had
>> while discussing with other folks about other Solr node specializations,
>> like "edge" nodes that are part of a cluster but do not host collections
>> and exist solely for routing http queries to the appropriate places.
>>
>> I think it could be useful in a k8s deployment as well, but I'd have to
>> think about how we want to do all the port magic there. I know that I've
>> had conversations with Houston about wanting to move away from
>> ZookeeperOperator, but those haven't quite taken hold yet.
>>
>> On Tue, Sep 14, 2021 at 6:02 PM Jan Høydahl <jan....@cominvent.com>
>> wrote:
>>
>>> Thanks for kicking this off Mike. I added a few "rejected alternatives"
>>> and put a few questions for thought in a comment. You may want to keep all
>>> discussion in this email thread, so here are the questions copied:
>>>
>>> *This is promising! Question: Would this mode be valuable also for
>>> Kubernetes deployments, i.e. we could get rid of the ZookeeperOperator and
>>> instead let the SolrOperator keep track of which Solr pods also act as
>>> ZK nodes? Would we allow a Solr node to start in a ZK-only mode, i.e. not
>>> eligible for collections/cores/overseer? This would also support those huge
>>> clusters where you want dedicated ZKs.*
>>>
>>> This also ties in with SIP-6 Solr should own the bootstrap process
>>> <https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process>,
>>> as we'd want to control startup/shutdown behavior wrt the zk, e.g. start
>>> embedded zk before solr and stop solr before stopping zk. Perhaps also
>>> gracefully exiting the quorum on planned shutdown?
>>>
>>> Jan
>>>
>>> On 14 Sep 2021, at 22:09, Mike Drob <md...@apache.org> wrote:
>>>
>>> Devs,
>>>
>>> We've previously discussed maintaining ZK as being an operational hurdle
>>> for some groups getting started or migrating to SolrCloud from non-ZK cloud
>>> mode. I'd like to discuss the idea of embedding ZK in our own process
>>> control.
>>>
>>> Please see the SIP at
>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper
>>>
>>> Thank you,
>>> Mike