Alternatively, do you still want to be protected against a single failure during scheduled maintenance?
With a three node ensemble, when one ZooKeeper node is being updated or moved to a new instance, one more failure means it no longer has a quorum. With a five node ensemble, three nodes would still be up, which is still a quorum. If you are OK with that risk, run three nodes. If not, run five.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Jan 21, 2016, at 9:27 AM, Erick Erickson <erickerick...@gmail.com> wrote:

NP. My usual question though is "how often do you expect to lose a second ZK node before you can replace the first one that died?"

My tongue-in-cheek statement is often "If you're losing two nodes regularly, you have problems with your hardware that you're not really going to address by adding more ZK nodes" ;).

And do note that even if you lose quorum, SolrCloud will continue to serve _queries_, although the "picture" each individual Solr node has of the current state of all the Solr nodes will get stale. You won't be able to index, though. That said, the internal Solr load balancers auto-distribute queries to live nodes anyway, so things can limp along.

As always, it's a tradeoff between expense/complexity and robustness, and each and every situation is different in how much risk it can tolerate.

FWIW,
Erick

On Thu, Jan 21, 2016 at 1:49 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:

It's not a typo; I was wrong. For ZooKeeper, 2 nodes out of 3 still count as a majority. It's not the desirable configuration, but it is tolerable.

Thanks Erick.

--
/Yago Riveiro

On Jan 21 2016, at 4:15 am, Erick Erickson <erickerick...@gmail.com> wrote:

bq: 3 is too risky, you lose one you lose quorum

Typo? You need to lose two...

On Wed, Jan 20, 2016 at 6:25 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:

Our ZooKeeper cluster is an ensemble of 5 machines, which is a good starting point; 3 is too risky (you lose one, you lose quorum) and with 7 the sync cost increases.

The ZK cluster runs on machines with no other I/O load and rotational HDDs (no need to use SSDs for I/O performance; ZooKeeper is optimized for spinning disks).

The ZK cluster behaves without problems. The first deploy of ZK was on the same machines as the Solr cluster (with the ZK log on its own HDD) and that didn't work very well; the CPU and network I/O from the Solr cluster was too much.

About schema modifications.

Modifying the schema to add new fields is relatively simple with the new API. In the past all the work was manual: uploading the schema to ZK and reloading all the collections (indexing must be disabled or timeouts and funny errors happen). With the new Schema API this is more user friendly. Anyway, I still stop indexing and reload the collections (I don't know if that's necessary nowadays).

About indexing data.

We have a self-made data importer; it's not Java and it doesn't do batch indexing (with 500 collections, buffering data and building the batches is expensive and complicates error handling).

We use regular HTTP POSTs with JSON. Our throughput is about 1000 docs/s without any kind of optimization.
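For reference, a minimal sketch of both operations, the Schema API add-field call and a plain JSON POST to the update handler; it assumes a managed schema and a local node, and the collection and field names are placeholders:

import requests

SOLR = "http://localhost:8983/solr"   # placeholder host and port
COLLECTION = "mycollection"           # placeholder collection name

# Add a field through the Schema API instead of hand-editing the schema
# in ZooKeeper and reloading every collection.
schema_change = {
    "add-field": {
        "name": "category",    # placeholder field name
        "type": "string",      # must be a fieldType that already exists in the schema
        "stored": True,
        "docValues": True,
    }
}
resp = requests.post(f"{SOLR}/{COLLECTION}/schema", json=schema_change)
resp.raise_for_status()

# Index documents with a plain HTTP POST of JSON.
docs = [
    {"id": "1", "category": "books"},
    {"id": "2", "category": "music"},
]
resp = requests.post(
    f"{SOLR}/{COLLECTION}/update",
    json=docs,
    params={"commitWithin": 10000},   # let Solr commit within 10s instead of per request
)
resp.raise_for_status()

Small JSON batches like this also keep error handling simple: a failed POST tells you exactly which documents need to be retried.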
Sometimes we have issues with replication: a replica can't keep pace with the leader's insertion rate and a full sync is requested. This is bad because syncing the replica again means a lot of I/O wait and CPU, and with 100G replicas it takes an hour or more (normally when this happens, we disable indexing to release I/O and CPU and avoid killing the node with a load of 50 or 60).

In this department my advice is "keep it simple": in the end it's an HTTP POST to a node of the cluster.

--
/Yago Riveiro

On Jan 20 2016, at 1:39 pm, Troy Edwards <tedwards415...@gmail.com> wrote:

Thank you for sharing your experiences/ideas.

Yago, since you have 8 billion documents over 500 collections, can you share what/how you do index maintenance (e.g. adding a field)? And how are you loading data into the index? Any experience with how the ZooKeeper ensemble behaves with so many collections?

Best,

On Tue, Jan 19, 2016 at 6:05 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:

What I can say is:

* SSDs (crucial for performance if the index doesn't fit in memory, and it will not fit).
* Divide and conquer: for that volume of docs you will need more than 6 nodes.
* DocValues, to avoid stressing the Java heap.
* Will you aggregate data? If yes, what is your max cardinality? This question is the most important one for sizing the memory needs correctly.
* Latency matters too: what threshold is acceptable before you consider a query slow?

At my company we are running a 12 terabyte (2 replicas) Solr cluster with 8 billion documents spread over 500 collections. For this we have about 12 machines with SSDs and 32G of RAM each (~24G for the heap).

We don't have a strict need for speed; a 30 second query to aggregate 100 million documents with 1M unique keys is fast enough for us. Normally aggregation performance decreases as the number of unique keys increases; with a low unique key count, queries take less than 2 seconds if the data is in the OS cache.

Personal recommendations:

* Sharding is important and smart sharding is crucial: you don't want to run queries over data that isn't interesting (this slows queries down when the dataset is big).
* If you want to measure speed, do it with about 1 billion documents to simulate something real (real for a 10 billion document world).
* Index with re-indexing in mind. With 10 billion docs, re-indexing the data takes months... This is important if you rely on non-standard features of Solr. In my case I configured DocValues with the disk format (not a standard feature in 4.x) and at some point this format was deprecated. Upgrading Solr to 5.x was an epic 3 month battle to do it without full downtime.
* Solr is like your girlfriend: it will demand love and care, and plenty of free space to fully recover replicas that at some point go out of sync. This happens a lot when restarting nodes (and is annoying with 100G replicas); don't underestimate this point. Free space can save your life.

--
/Yago Riveiro
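To make the aggregation/cardinality point concrete, here is a minimal sketch of that kind of query using the JSON Facet API (available in the 5.x line) over docValues fields; the host, collection, and field names are placeholders:

import json
import requests

SOLR = "http://localhost:8983/solr"   # placeholder host and port
COLLECTION = "mycollection"           # placeholder collection name

# A terms facet over a docValues field with a nested unique() metric.
# The cost of this kind of aggregation grows with the cardinality of
# the faceted field, which is why max cardinality drives memory sizing.
facet = {
    "by_group": {
        "type": "terms",
        "field": "group_field",   # placeholder field, docValues enabled
        "limit": 10,
        "facet": {"distinct_users": "unique(user_id)"},   # placeholder metric field
    }
}

resp = requests.get(
    f"{SOLR}/{COLLECTION}/select",
    params={"q": "*:*", "rows": 0, "wt": "json", "json.facet": json.dumps(facet)},
)
resp.raise_for_status()
print(resp.json()["facets"]["by_group"])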
On Jan 19 2016, at 11:26 pm, Shawn Heisey <apa...@elyograg.org> wrote:

On 1/19/2016 1:30 PM, Troy Edwards wrote:
> We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards with 2 replicas each. The number of documents is about 125000.
>
> We now want to scale this to about 10 billion documents.
>
> What are the steps to prototyping, hardware estimation and stress testing?

There is no general information available for sizing, because there are too many factors that will affect the answers. Some of the important information that you need will be impossible to predict until you actually build it and subject it to a real query load.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

With an index of 10 billion documents, you may not be able to precisely predict performance and hardware requirements from a small-scale prototype. You'll likely need to build a full-scale system on a small testbed, look for bottlenecks, ask for advice, and plan on a larger system for production.

The hard limit for documents on a single shard is slightly less than Java's Integer.MAX_VALUE -- just over two billion. Because deleted documents count against this max, about one billion documents per shard is the absolute max that should be loaded in practice.

BUT, if you actually try to put one billion documents in a single server, performance will likely be awful. A more reasonable limit per machine is 100 million ... but even this is quite large. You might need smaller shards, or you might be able to get good performance with larger shards. It all depends on things that you may not even know yet.

Memory is always a strong driver for Solr performance, and I am speaking specifically of OS disk cache -- memory that has not been allocated by any program. With 10 billion documents, your total index size will likely be hundreds of gigabytes, and might even reach terabyte scale. Good performance with indexes this large will require a lot of total memory, which probably means that you will need a lot of servers and many shards. SSD storage is strongly recommended.
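One way to sanity-check that on a running node is to compare the on-disk index size against the memory actually left over for the OS disk cache. A rough sketch; the node address and heap size are placeholders, and the RAM lookup assumes Linux:

import os
import requests

SOLR_NODE = "http://localhost:8983/solr"   # placeholder node address
HEAP_BYTES = 24 * 1024**3                  # placeholder: assume a 24G JVM heap

# Sum the on-disk size of every core hosted on this node.
status = requests.get(
    f"{SOLR_NODE}/admin/cores", params={"action": "STATUS", "wt": "json"}
).json()
index_bytes = sum(core["index"]["sizeInBytes"] for core in status["status"].values())

# Physical RAM on the node (Linux); whatever is left after the JVM heap
# is roughly what the OS can use for the disk cache.
ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
cache_bytes = max(ram_bytes - HEAP_BYTES, 0)

print(f"index on disk:       {index_bytes / 1024**3:.1f} GiB")
print(f"room for disk cache: {cache_bytes / 1024**3:.1f} GiB")
if index_bytes:
    print(f"cache covers about {100 * cache_bytes / index_bytes:.0f}% of the index")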
For extreme scaling on Solr, especially if the query rate will be high, it is recommended to have only one shard replica per server.

I have just added an "extreme scaling" section to the following wiki page, but it's mostly a placeholder right now. I would like to have a discussion with people who operate very large indexes so I can put real usable information in this section. I'm on IRC quite frequently in the #solr channel.

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
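Putting the per-shard arithmetic and the one-replica-per-server recommendation together, a rough sketch of what the Collections API call could look like for a 10 billion document target; the collection and configset names are placeholders, and the shard count is only a starting point that prototyping would refine:

import math
import requests

SOLR_NODE = "http://localhost:8983/solr"   # placeholder node address

TOTAL_DOCS = 10_000_000_000    # the 10 billion document target
DOCS_PER_SHARD = 100_000_000   # the "more reasonable" per-machine figure above
num_shards = math.ceil(TOTAL_DOCS / DOCS_PER_SHARD)   # 100 shards to start with

# maxShardsPerNode=1 enforces one shard replica per server; with 100 shards
# and replicationFactor=2 this layout needs 200 Solr nodes.
resp = requests.get(
    f"{SOLR_NODE}/admin/collections",
    params={
        "action": "CREATE",
        "name": "bigindex",                  # placeholder collection name
        "collection.configName": "bigconf",  # placeholder configset already uploaded to ZK
        "numShards": num_shards,
        "replicationFactor": 2,
        "maxShardsPerNode": 1,
    },
)
resp.raise_for_status()
print(resp.json())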