Here's a blog outlining why this is so hard to answer: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
Just one example from your post: you mention index size as a metric. It's
often useless. Stored data (stored="true") is placed in files with special
extensions (*.fdt and *.fdx). These have virtually no effect on search
requirements. They can occupy 10% of your on-disk space or 90% of your
disk space...

Gotta prototype and measure...

Best
Erick

On Wed, Aug 29, 2012 at 5:45 PM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:
> Unfortunately the answer for this can vary quite a bit based on a
> number of factors:
>
> 1. Whether or not fields are stored,
> 2. Document size,
> 3. Total term count,
> 4. Solr version
>
> etc.
>
> We have two major indexes, one for servicing online queries, and one
> for batch processing. Our batch index is performance critical and
> therefore was optimized for throughput, was stored in RAM, and has
> fewer stored fields than the online query one. The batch index shards
> are 25 GB or less, and we're trending toward smaller and more numerous
> shards. This is with 1.4, and I'm just finishing up on our migration
> to 3.6.1.
>
> Michael Della Bitta
>
> P.S. Why'd you CC honeybadger? Honeybadger don't care...
>
> ------------------------------------------------
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Wed, Aug 29, 2012 at 5:17 PM, Michael Brandt
> <michael.j.bra...@colorado.edu> wrote:
>> Hi all,
>>
>> I am looking for information on how many documents may be indexed by a
>> single instance of Solr (not using shards) before performance issues are
>> encountered. In searching the internet I've come across some varying
>> answers; one answer suggests 50 GB is
>> problematic<http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p3656848.html>;
>> this blog
>> post<http://harish11g.blogspot.com/2012/02/apache-solr-sharding-amazon-ec2.html> on
>> sharding Solr in AWS says sharding is not necessary until you have
>> "millions of records," but is no more specific.
>>
>> What experiences have you had with this? At what point did you find it
>> necessary to scale up Solr, in terms of both number of records and size of
>> index (whether MB, GB, etc.)?
>>
>> Thanks,
>> Michael Brandt
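
A rough way to act on the "prototype and measure" advice for the stored-field point: the sketch below sums the files in a Lucene index directory by extension and reports what share of the on-disk size is *.fdt/*.fdx. The function names are made up for illustration, and it assumes the index is not using compound files (*.cfs), which bundle the per-segment files together and would hide this breakdown.

```python
import os
from collections import defaultdict

# Extensions named in the thread as holding stored-field data.
STORED_FIELD_EXTENSIONS = {".fdt", ".fdx"}


def size_by_extension(index_dir):
    """Return total bytes per file extension for files directly in index_dir."""
    sizes = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files without an extension (e.g. segments_N) are grouped under "".
            ext = os.path.splitext(name)[1].lower()
            sizes[ext] += os.path.getsize(path)
    return dict(sizes)


def stored_field_fraction(index_dir):
    """Return the share (0.0 to 1.0) of on-disk bytes held in stored-field files."""
    sizes = size_by_extension(index_dir)
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    stored = sum(sizes.get(ext, 0) for ext in STORED_FIELD_EXTENSIONS)
    return stored / total
```

Calling stored_field_fraction() on the index directory of a prototype built from representative documents gives a first idea of how much of the index size is stored data rather than the structures used for searching. It says nothing about query performance by itself; that still has to be measured under load.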