On 6/2/2016 1:28 AM, Selvam wrote:
> We need to run a heavy Solr with 300 million documents, with each
> document having around 350 fields. The average length of the fields
> will be around 100 characters; it may have date and integer fields as
> well. Now we are not sure whether to have a single server or run
> multiple servers (for each node/shard?). We are using Solr 5.5 and
> want the best performance. We are new to SolrCloud. I would like to
> request your input on how many nodes/shards we need to have and how
> many servers for best performance. We primarily use geo-spatial search.
The really fast answer, which I know isn't really an answer, is this:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

This is *also* the answer if I take time to really think about it ... and I do realize that none of this actually helps you.

You will need to prototype. Ideally, your prototype should be the entire index. Performance will generally not scale linearly, so if you make decisions based on a small-scale prototype, you might find that you don't have enough hardware.

The answer will be *heavily* influenced by how many of those 350 fields will be used for searching, sorting, faceting, etc. It will also be influenced by the complexity of the queries, how fast the queries must complete, and how many queries per second the cluster must handle.

With the information you have supplied, your whole index is likely to be in the 10-20TB range. Performance on an index that large, even with plenty of hardware and good tuning, is probably not going to be stellar. You are likely to need several terabytes of total RAM (across all servers) to achieve reasonable performance *on a single copy*. If you want two copies of the index for high availability, your RAM requirements will double. Handling an index this size is not going to be inexpensive.

An unavoidable fact about Solr performance: for best results, Solr must be able to read critical data entirely from RAM for queries. If it must go to disk, then performance will not be optimal -- disks are REALLY slow. Putting the data on SSD will help, but even SSD storage is quite a lot slower than RAM.

For *perfect* performance, the index data on a server must fit entirely into unallocated memory -- which means memory beyond the Java heap and the basic operating system requirements. The operating system (not Java) will automatically handle caching the index in this available memory.
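To show where that 10-20TB figure comes from, here is the back-of-envelope arithmetic using the numbers from your message. This is only a rough sketch of the raw data volume -- the actual on-disk index size will vary quite a bit with your analysis chains, stored fields, docValues, and so on:

```python
# Rough raw-data estimate from the numbers in the original question.
# This is NOT the final Solr index size, just a ballpark starting point.
docs = 300_000_000        # 300 million documents
fields_per_doc = 350      # ~350 fields per document
avg_field_bytes = 100     # ~100 characters (roughly bytes) per field

raw_bytes = docs * fields_per_doc * avg_field_bytes
raw_tb = raw_bytes / 1024**4

print(f"Raw data: roughly {raw_tb:.1f} TB")
# With indexing overhead on some fields and compression on others,
# a total in the 10-20TB range is a plausible outcome.
```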
This perfect situation is usually not required in practice, though -- the *entire* index is not needed when you do a query.

Here's something I wrote about the topic of Solr performance. It is not as comprehensive as I would like it to be, because I have tried to make it relatively concise and useful:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn