Alternatively, do you still want to be protected against a single failure during scheduled maintenance?
With a three node ensemble, when one ZooKeeper node is being updated or moved to a new instance, one more failure means it no longer has a quorum. With a five node ensemble, three nodes would still be up, which is still a quorum. If you are OK with that risk, run three nodes. If not, run five.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Jan 21, 2016, at 9:27 AM, Erick Erickson <erickerick...@gmail.com> wrote:

NP. My usual question though is "how often do you expect to lose a second ZK node before you can replace the first one that died?"

My tongue-in-cheek statement is often "If you're losing two nodes regularly, you have problems with your hardware that you're not really going to address by adding more ZK nodes" ;).

And do note that even if you lose quorum, SolrCloud will continue to serve _queries_, although the "picture" each individual Solr node has of the current state of all the Solr nodes will get stale. You won't be able to index, though. That said, the internal Solr load balancers auto-distribute queries to live nodes anyway, so things can limp along.

As always, it's a tradeoff between expense/complexity and robustness, and each and every situation is different in how much risk it can tolerate.

FWIW,
Erick

On Thu, Jan 21, 2016 at 1:49 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:

It's not a typo; I was wrong. For ZooKeeper, 2 nodes out of 3 still count as a majority. It's not the desirable configuration, but it is tolerable.

Thanks Erick.

--
/Yago Riveiro

On Jan 21 2016, at 4:15 am, Erick Erickson <erickerick...@gmail.com> wrote:

bq: 3 is too risky, you lose one you lose quorum

Typo? You need to lose two...

On Wed, Jan 20, 2016 at 6:25 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:

Our ZooKeeper cluster is an ensemble of 5 machines, which is a good starting point; 3 is too risky (you lose one, you lose quorum) and with 7 the sync cost increases.

The ZK cluster runs on machines with no other I/O load and rotational HDDs (no need to use SSDs for I/O performance; ZooKeeper is optimized for spinning disks).

The ZK cluster behaves without problems. The first deploy of ZK was on the same machines as the Solr cluster (with the ZK log on its own HDD) and that didn't work very well; the CPU and network I/O from the Solr cluster was too much.

About schema modifications.

Modifying the schema to add new fields is relatively simple with the new API. In the past all the work was manual: uploading the schema to ZK and reloading all the collections (indexing must be disabled or timeouts and funny errors happen). With the new Schema API this is more user friendly. Anyway, I still stop indexing and reload the collections (I don't know if that's necessary nowadays).

About indexing data.

We have a self-made data importer; it's not Java and it doesn't do batch indexing (with 500 collections, buffering data and building the batches is expensive and complicates error handling).

We use regular HTTP POSTs with JSON. Our throughput is about 1000 docs/s without any kind of optimization.
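For reference, a minimal sketch of both operations, the Schema API add-field call and a plain JSON POST to the update handler; it assumes a managed schema and a local node, and the collection and field names are placeholders:

import requests

SOLR = "http://localhost:8983/solr"   # placeholder host and port
COLLECTION = "mycollection"           # placeholder collection name

# Add a field through the Schema API instead of hand-editing the schema
# in ZooKeeper and reloading every collection.
schema_change = {
    "add-field": {
        "name": "category",    # placeholder field name
        "type": "string",      # must be a fieldType that already exists in the schema
        "stored": True,
        "docValues": True,
    }
}
resp = requests.post(f"{SOLR}/{COLLECTION}/schema", json=schema_change)
resp.raise_for_status()

# Index documents with a plain HTTP POST of JSON.
docs = [
    {"id": "1", "category": "books"},
    {"id": "2", "category": "music"},
]
resp = requests.post(
    f"{SOLR}/{COLLECTION}/update",
    json=docs,
    params={"commitWithin": 10000},   # let Solr commit within 10s instead of per request
)
resp.raise_for_status()

Small JSON batches like this also keep error handling simple: a failed POST tells you exactly which documents need to be retried.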
Sometimes we have issues with replication: a replica can't keep pace with the leader's insertion rate and a full sync is requested. This is bad because syncing the replica again means a lot of I/O wait and CPU, and with 100G replicas it takes an hour or more (normally when this happens, we disable indexing to release I/O and CPU and avoid killing the node with a load of 50 or 60).

In this department my advice is "keep it simple": in the end it's an HTTP POST to a node of the cluster.

--
/Yago Riveiro

On Jan 20 2016, at 1:39 pm, Troy Edwards <tedwards415...@gmail.com> wrote:

Thank you for sharing your experiences/ideas.

Yago, since you have 8 billion documents over 500 collections, can you share what/how you do index maintenance (e.g. adding a field)? And how are you loading data into the index? Any experience with how the ZooKeeper ensemble behaves with so many collections?

Best,

On Tue, Jan 19, 2016 at 6:05 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:

What I can say is:

* SSDs (crucial for performance if the index doesn't fit in memory, and it will not fit).
* Divide and conquer: for that volume of docs you will need more than 6 nodes.
* DocValues, to avoid stressing the Java heap.
* Will you aggregate data? If yes, what is your max cardinality? This question is the most important one for sizing the memory needs correctly.
* Latency matters too: what threshold is acceptable before you consider a query slow?

At my company we are running a 12 terabyte (2 replicas) Solr cluster with 8 billion documents spread over 500 collections. For this we have about 12 machines with SSDs and 32G of RAM each (~24G for the heap).

We don't have a strict need for speed; a 30 second query to aggregate 100 million documents with 1M unique keys is fast enough for us. Normally aggregation performance decreases as the number of unique keys increases; with a low unique key count, queries take less than 2 seconds if the data is in the OS cache.

Personal recommendations:

* Sharding is important and smart sharding is crucial: you don't want to run queries over data that isn't interesting (this slows queries down when the dataset is big).
* If you want to measure speed, do it with about 1 billion documents to simulate something real (real for a 10 billion document world).
* Index with re-indexing in mind. With 10 billion docs, re-indexing the data takes months... This is important if you rely on non-standard features of Solr. In my case I configured DocValues with the disk format (not a standard feature in 4.x) and at some point this format was deprecated. Upgrading Solr to 5.x was an epic 3 month battle to do it without full downtime.
* Solr is like your girlfriend: it will demand love and care, and plenty of free space to fully recover replicas that at some point go out of sync. This happens a lot when restarting nodes (and is annoying with 100G replicas); don't underestimate this point. Free space can save your life.

--
/Yago Riveiro
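To make the aggregation/cardinality point concrete, here is a minimal sketch of that kind of query using the JSON Facet API (available in the 5.x line) over docValues fields; the host, collection, and field names are placeholders:

import json
import requests

SOLR = "http://localhost:8983/solr"   # placeholder host and port
COLLECTION = "mycollection"           # placeholder collection name

# A terms facet over a docValues field with a nested unique() metric.
# The cost of this kind of aggregation grows with the cardinality of
# the faceted field, which is why max cardinality drives memory sizing.
facet = {
    "by_group": {
        "type": "terms",
        "field": "group_field",   # placeholder field, docValues enabled
        "limit": 10,
        "facet": {"distinct_users": "unique(user_id)"},   # placeholder metric field
    }
}

resp = requests.get(
    f"{SOLR}/{COLLECTION}/select",
    params={"q": "*:*", "rows": 0, "wt": "json", "json.facet": json.dumps(facet)},
)
resp.raise_for_status()
print(resp.json()["facets"]["by_group"])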
On Jan 19 2016, at 11:26 pm, Shawn Heisey <apa...@elyograg.org> wrote:

On 1/19/2016 1:30 PM, Troy Edwards wrote:
> We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards with 2 replicas each. The number of documents is about 125000.
>
> We now want to scale this to about 10 billion documents.
>
> What are the steps to prototyping, hardware estimation and stress testing?

There is no general information available for sizing, because there are too many factors that will affect the answers. Some of the important information that you need will be impossible to predict until you actually build it and subject it to a real query load.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

With an index of 10 billion documents, you may not be able to precisely predict performance and hardware requirements from a small-scale prototype. You'll likely need to build a full-scale system on a small testbed, look for bottlenecks, ask for advice, and plan on a larger system for production.

The hard limit for documents on a single shard is slightly less than Java's Integer.MAX_VALUE -- just over two billion. Because deleted documents count against this max, about one billion documents per shard is the absolute max that should be loaded in practice.

BUT, if you actually try to put one billion documents in a single server, performance will likely be awful. A more reasonable limit per machine is 100 million ... but even this is quite large. You might need smaller shards, or you might be able to get good performance with larger shards. It all depends on things that you may not even know yet.

Memory is always a strong driver for Solr performance, and I am speaking specifically of OS disk cache -- memory that has not been allocated by any program. With 10 billion documents, your total index size will likely be hundreds of gigabytes, and might even reach terabyte scale. Good performance with indexes this large will require a lot of total memory, which probably means that you will need a lot of servers and many shards. SSD storage is strongly recommended.
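One way to sanity-check that on a running node is to compare the on-disk index size against the memory actually left over for the OS disk cache. A rough sketch; the node address and heap size are placeholders, and the RAM lookup assumes Linux:

import os
import requests

SOLR_NODE = "http://localhost:8983/solr"   # placeholder node address
HEAP_BYTES = 24 * 1024**3                  # placeholder: assume a 24G JVM heap

# Sum the on-disk size of every core hosted on this node.
status = requests.get(
    f"{SOLR_NODE}/admin/cores", params={"action": "STATUS", "wt": "json"}
).json()
index_bytes = sum(core["index"]["sizeInBytes"] for core in status["status"].values())

# Physical RAM on the node (Linux); whatever is left after the JVM heap
# is roughly what the OS can use for the disk cache.
ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
cache_bytes = max(ram_bytes - HEAP_BYTES, 0)

print(f"index on disk:       {index_bytes / 1024**3:.1f} GiB")
print(f"room for disk cache: {cache_bytes / 1024**3:.1f} GiB")
if index_bytes:
    print(f"cache covers about {100 * cache_bytes / index_bytes:.0f}% of the index")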
For extreme scaling on Solr, especially if the query rate will be high, it is recommended to have only one shard replica per server.

I have just added an "extreme scaling" section to the following wiki page, but it's mostly a placeholder right now. I would like to have a discussion with people who operate very large indexes so I can put real usable information in this section. I'm on IRC quite frequently in the #solr channel.

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
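Putting the per-shard arithmetic and the one-replica-per-server recommendation together, a rough sketch of what the Collections API call could look like for a 10 billion document target; the collection and configset names are placeholders, and the shard count is only a starting point that prototyping would refine:

import math
import requests

SOLR_NODE = "http://localhost:8983/solr"   # placeholder node address

TOTAL_DOCS = 10_000_000_000    # the 10 billion document target
DOCS_PER_SHARD = 100_000_000   # the "more reasonable" per-machine figure above
num_shards = math.ceil(TOTAL_DOCS / DOCS_PER_SHARD)   # 100 shards to start with

# maxShardsPerNode=1 enforces one shard replica per server; with 100 shards
# and replicationFactor=2 this layout needs 200 Solr nodes.
resp = requests.get(
    f"{SOLR_NODE}/admin/collections",
    params={
        "action": "CREATE",
        "name": "bigindex",                  # placeholder collection name
        "collection.configName": "bigconf",  # placeholder configset already uploaded to ZK
        "numShards": num_shards,
        "replicationFactor": 2,
        "maxShardsPerNode": 1,
    },
)
resp.raise_for_status()
print(resp.json())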