Also, I'm not sure about your domain, but you may want to double-check whether you really need all 350 fields to be searchable and stored. Often, when you weigh that requirement against the higher cost of hardware, you find you can reduce the number of searchable/stored fields.
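As a quick back-of-the-envelope sketch (using only the numbers from your post: 300 million docs, ~350 fields, ~100 characters per field on average; actual index size will vary with compression and with which fields are indexed vs. stored), a few lines of Python show how much raw data is in play:

docs = 300_000_000        # documents in the collection (from your post)
avg_field_chars = 100     # average characters per field (from your post)

def raw_terabytes(num_fields):
    # Raw text volume in TB for a given number of fields per document.
    total_bytes = docs * num_fields * avg_field_chars
    return total_bytes / 1e12

print(f"350 fields: ~{raw_terabytes(350):.1f} TB raw")   # ~10.5 TB
print(f"150 fields: ~{raw_terabytes(150):.1f} TB raw")   # ~4.5 TB

The 350-field figure lines up with the 10-20TB estimate below, and every searchable/stored field you drop removes its share of that footprint (and of the RAM needed to cache it). The 150-field count is just an illustrative example.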
Thanks,
Susheel

On Thu, Jun 2, 2016 at 9:21 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 6/2/2016 1:28 AM, Selvam wrote:
> > We need to run a heavy Solr setup with 300 million documents, each
> > document having around 350 fields. The average length of the fields
> > will be around 100 characters; there may be date and integer fields as
> > well. We are not sure whether to have a single server or run multiple
> > servers (one for each node/shard?). We are using Solr 5.5 and want the
> > best performance. We are new to SolrCloud, so I would like to request
> > your input on how many nodes/shards we need to have and how many
> > servers give the best performance. We primarily use geo-spatial search.
>
> The really fast answer, which I know isn't really an answer, is this:
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> This is *also* the answer if I take time to really think about it ...
> and I do realize that none of this actually helps you. You will need to
> prototype. Ideally, your prototype should be the entire index.
> Performance will generally not scale linearly, so if you make decisions
> based on a small-scale prototype, you might find that you don't have
> enough hardware.
>
> The answer will be *heavily* influenced by how many of those 350 fields
> will be used for searching, sorting, faceting, etc. It will also be
> influenced by the complexity of the queries, how fast the queries must
> complete, and how many queries per second the cluster must handle.
>
> With the information you have supplied, your whole index is likely to be
> in the 10-20TB range. Performance on an index that large, even with
> plenty of hardware and good tuning, is probably not going to be
> stellar. You are likely to need several terabytes of total RAM (across
> all servers) to achieve reasonable performance *on a single copy*. If
> you want two copies of the index for high availability, your RAM
> requirements will double. Handling an index this size is not going to
> be inexpensive.
>
> An unavoidable fact about Solr performance: for best results, Solr must
> be able to read critical data entirely from RAM for queries. If it must
> go to disk, then performance will not be optimal -- disks are REALLY
> slow. Putting the data on SSD will help, but even SSD storage is quite
> a lot slower than RAM.
>
> For *perfect* performance, the index data on a server must fit entirely
> into unallocated memory -- which means memory beyond the Java heap and
> the basic operating system requirements. The operating system (not
> Java) will automatically handle caching the index in this available
> memory. This perfect situation is usually not required in practice,
> though -- the *entire* index is not needed when you do a query.
>
> Here's something I wrote about the topic of Solr performance. It is not
> as comprehensive as I would like it to be, because I have tried to make
> it relatively concise and useful:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
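Since geo-spatial search is the primary use case here, a minimal query sketch for prototyping may help. Note that the client library (pysolr), the URL, and the field name "location" are all assumptions for illustration, not something from this thread; adjust them to the actual schema:

import pysolr

# Connect to a Solr core/collection (URL is illustrative).
solr = pysolr.Solr("http://localhost:8983/solr/mycollection", timeout=10)

# Match everything, then filter to documents within 5 km of a point
# using Solr's {!geofilt} query parser. "location" is a hypothetical
# spatial field assumed to exist in the schema.
results = solr.search(
    "*:*",
    fq="{!geofilt sfield=location pt=45.15,-93.85 d=5}",
    rows=10,
)

for doc in results:
    print(doc.get("id"))

Running queries like this against a full-size prototype, as Shawn suggests above, is the only reliable way to see how latency behaves at this scale.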