Erick Erickson [erickerick...@gmail.com] wrote:
> Solr requires holding large parts of the index in memory.
> For the entire corpus. At once.
That requirement holds only under the assumption that one must have the lowest possible latency at each individual box. You might as well argue that the fastest possible memory or the fastest possible CPU is a requirement. The advice is good in some contexts and a waste of money in others.

I not-so-humbly point to http://sbdevel.wordpress.com/2014/08/13/whale-hunting-with-solr/ where we (for simple searches) handily achieve our goal of sub-second response times for a 10TB index with just 1.4% of the index cached in RAM. Had our goal been sub-50ms, it would be another matter, but it is not.

Likewise, Wilburn's problem is not to minimize latency on each individual box, but to achieve a certain indexing throughput while also serving searches. His hardware is currently able to keep up, although barely, with 300B documents; he needs to handle 900B.

Tripling (or quadrupling) the number of machines should do the trick. Increasing the amount of RAM on each current machine might also work (given the well-known effect of extra RAM with Lucene/Solr). Using local SSDs, if he is not doing so already, might also help (see the article above).

- Toke Eskildsen
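PS: For concreteness, a quick back-of-envelope sketch of the numbers above (plain Java; the figures are just the ones from this thread, reading 300B/900B as billions of documents, so treat it as an illustration rather than a sizing formula):

    // Rough sizing arithmetic for the numbers discussed above.
    public class SizingSketch {
        public static void main(String[] args) {
            double indexBytes = 10e12;      // our 10TB index
            double cachedFraction = 0.014;  // 1.4% of the index cached in RAM
            System.out.printf("RAM spent on caching: ~%.0f GB%n",
                    indexBytes * cachedFraction / 1e9);

            long currentDocs = 300_000_000_000L; // 300B docs, barely keeping up
            long targetDocs  = 900_000_000_000L; // 900B docs, the goal
            System.out.printf("Naive scale-out factor: %.1fx machines%n",
                    (double) targetDocs / currentDocs);
        }
    }

Which is to say: roughly 140GB of RAM bought us sub-second searches over 10TB, and a naive 3x scale-out matches the 3x growth in documents.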