Also, I'm not sure about your domain, but you may want to double-check whether you really need all 350 fields to be searchable and stored. Often, when you weigh that requirement against the higher cost of hardware, you find you can reduce the number of searchable/stored fields.
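As a quick back-of-the-envelope sketch (using only the numbers from your post: 300 million docs, ~350 fields, ~100 characters per field on average; actual index size will vary with compression and with which fields are indexed vs. stored), a few lines of Python show how much raw data is in play:

docs = 300_000_000        # documents in the collection (from your post)
avg_field_chars = 100     # average characters per field (from your post)

def raw_terabytes(num_fields):
    # Raw text volume in TB for a given number of fields per document.
    total_bytes = docs * num_fields * avg_field_chars
    return total_bytes / 1e12

print(f"350 fields: ~{raw_terabytes(350):.1f} TB raw")   # ~10.5 TB
print(f"150 fields: ~{raw_terabytes(150):.1f} TB raw")   # ~4.5 TB

The 350-field figure lines up with the 10-20TB estimate below, and every searchable/stored field you drop removes its share of that footprint (and of the RAM needed to cache it). The 150-field count is just an illustrative example.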
Thanks,
Susheel

On Thu, Jun 2, 2016 at 9:21 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 6/2/2016 1:28 AM, Selvam wrote:
> > We need to run a heavy Solr setup with 300 million documents, each
> > document having around 350 fields. The average length of the fields
> > will be around 100 characters; there may be date and integer fields as
> > well. We are not sure whether to have a single server or run multiple
> > servers (one for each node/shard?). We are using Solr 5.5 and want the
> > best performance. We are new to SolrCloud, so I would like to request
> > your input on how many nodes/shards we need to have and how many
> > servers give the best performance. We primarily use geo-spatial search.
>
> The really fast answer, which I know isn't really an answer, is this:
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> This is *also* the answer if I take time to really think about it ...
> and I do realize that none of this actually helps you. You will need to
> prototype. Ideally, your prototype should be the entire index.
> Performance will generally not scale linearly, so if you make decisions
> based on a small-scale prototype, you might find that you don't have
> enough hardware.
>
> The answer will be *heavily* influenced by how many of those 350 fields
> will be used for searching, sorting, faceting, etc. It will also be
> influenced by the complexity of the queries, how fast the queries must
> complete, and how many queries per second the cluster must handle.
>
> With the information you have supplied, your whole index is likely to be
> in the 10-20TB range. Performance on an index that large, even with
> plenty of hardware and good tuning, is probably not going to be
> stellar. You are likely to need several terabytes of total RAM (across
> all servers) to achieve reasonable performance *on a single copy*. If
> you want two copies of the index for high availability, your RAM
> requirements will double. Handling an index this size is not going to
> be inexpensive.
>
> An unavoidable fact about Solr performance: for best results, Solr must
> be able to read critical data entirely from RAM for queries. If it must
> go to disk, then performance will not be optimal -- disks are REALLY
> slow. Putting the data on SSD will help, but even SSD storage is quite
> a lot slower than RAM.
>
> For *perfect* performance, the index data on a server must fit entirely
> into unallocated memory -- which means memory beyond the Java heap and
> the basic operating system requirements. The operating system (not
> Java) will automatically handle caching the index in this available
> memory. This perfect situation is usually not required in practice,
> though -- the *entire* index is not needed when you do a query.
>
> Here's something I wrote about the topic of Solr performance. It is not
> as comprehensive as I would like it to be, because I have tried to make
> it relatively concise and useful:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
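Since geo-spatial search is the primary use case here, a minimal query sketch for prototyping may help. Note that the client library (pysolr), the URL, and the field name "location" are all assumptions for illustration, not something from this thread; adjust them to the actual schema:

import pysolr

# Connect to a Solr core/collection (URL is illustrative).
solr = pysolr.Solr("http://localhost:8983/solr/mycollection", timeout=10)

# Match everything, then filter to documents within 5 km of a point
# using Solr's {!geofilt} query parser. "location" is a hypothetical
# spatial field assumed to exist in the schema.
results = solr.search(
    "*:*",
    fq="{!geofilt sfield=location pt=45.15,-93.85 d=5}",
    rows=10,
)

for doc in results:
    print(doc.get("id"))

Running queries like this against a full-size prototype, as Shawn suggests above, is the only reliable way to see how latency behaves at this scale.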