Erick Erickson [erickerick...@gmail.com] wrote:
> Solr requires holding large parts of the index in memory.
> For the entire corpus. At once.
That requirement holds only under the assumption that one must have the lowest possible latency at each individual box. You might as well argue that the fastest possible memory or the fastest possible CPU is a requirement. The advice is good in some contexts and a waste of money in others.

I not-so-humbly point to http://sbdevel.wordpress.com/2014/08/13/whale-hunting-with-solr/ where we (for simple searches) handily achieve our goal of sub-second response times for a 10TB index with just 1.4% of the index cached in RAM. Had our goal been sub-50ms, it would be another matter, but it is not.

Likewise, Wilburn's problem is not to minimize latency on each individual box, but to achieve a certain indexing throughput while also serving searches. His hardware is currently able to keep up, although barely, with 300B documents; he needs to handle 900B.

Tripling (or quadrupling) the number of machines should do the trick. Increasing the amount of RAM on each current machine might also work (given the well-known effect of extra RAM with Lucene/Solr). Using local SSDs, if he is not doing so already, might also help (see the article above).

- Toke Eskildsen
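PS: For concreteness, a quick back-of-envelope sketch of the numbers above (plain Java; the figures are just the ones from this thread, reading 300B/900B as billions of documents, so treat it as an illustration rather than a sizing formula):

    // Rough sizing arithmetic for the numbers discussed above.
    public class SizingSketch {
        public static void main(String[] args) {
            double indexBytes = 10e12;      // our 10TB index
            double cachedFraction = 0.014;  // 1.4% of the index cached in RAM
            System.out.printf("RAM spent on caching: ~%.0f GB%n",
                    indexBytes * cachedFraction / 1e9);

            long currentDocs = 300_000_000_000L; // 300B docs, barely keeping up
            long targetDocs  = 900_000_000_000L; // 900B docs, the goal
            System.out.printf("Naive scale-out factor: %.1fx machines%n",
                    (double) targetDocs / currentDocs);
        }
    }

Which is to say: roughly 140GB of RAM bought us sub-second searches over 10TB, and a naive 3x scale-out matches the 3x growth in documents.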