Tom,

Yes, we (Biz360) have indexed 3 billion documents and upwards... If indexing (or rather re-indexing) is the issue, we used SOLR-1301 with Hadoop to re-index efficiently (i.e., in a timely manner). For querying we're currently using the out-of-the-box Solr distributed shards query mechanism, which is hard (read: near impossible) to customize. I've been writing SOLR-1724, which deploys cores out of HDFS. SOLR-1724 works in conjunction with Solr Cloud, which should allow for more efficient failover. Katta has a nice model for replicating cores across multiple servers for redundancy; the catch is that it could feasibly require twice as many servers for 2x replication.
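
For reference, here's a minimal SolrJ sketch of what that stock distributed query looks like (the host names and the ocr_text field are placeholders, not our actual setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedQueryExample {
  public static void main(String[] args) throws Exception {
    // Any shard can act as the aggregator for a distributed request.
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://shard1.example.com:8983/solr");

    SolrQuery query = new SolrQuery("ocr_text:whaling");
    // The stock mechanism: every shard is listed explicitly on each request,
    // which is part of why it's hard to customize (no shard discovery,
    // no built-in failover).
    query.set("shards",
        "shard1.example.com:8983/solr,"
      + "shard2.example.com:8983/solr,"
      + "shard3.example.com:8983/solr");
    query.setRows(10);

    QueryResponse rsp = server.query(query);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}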
If you have more questions, feel free to ping me or whatever. Cheers,

Jason

On Fri, Apr 2, 2010 at 8:57 AM, Burton-West, Tom <tburt...@umich.edu> wrote:
> We are currently indexing 5 million books in Solr, scaling up over the next
> few years to 20 million. However we are using the entire book as a Solr
> document. We are evaluating the possibility of indexing individual pages as
> there are some use cases where users want the most relevant pages regardless
> of what book they occur in. However, we estimate that we are talking about
> somewhere between 1 and 6 billion pages and have concerns over whether Solr
> will scale to this level.
>
> Does anyone have experience using Solr with 1-6 billion Solr documents?
>
> The lucene file format document
> (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations) mentions
> a limit of about 2 billion document ids. I assume this is the lucene
> internal document id and would therefore be a per index/per shard limit. Is
> this correct?
>
> Tom Burton-West.
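
On the 2 billion figure: that's Lucene's internal document id, which is a Java int, so as far as I know the limit applies per index (i.e., per shard), not across a distributed collection. A quick back-of-the-envelope using the upper end of Tom's estimate:

public class ShardMath {
  public static void main(String[] args) {
    // Lucene's internal doc ids are ints, so each index (shard) tops out
    // around Integer.MAX_VALUE (~2.1 billion) documents.
    long perShardCap = Integer.MAX_VALUE;   // 2,147,483,647
    long totalPages = 6000000000L;          // upper end of 1-6 billion pages

    long minShards = (totalPages + perShardCap - 1) / perShardCap;
    // Prints 3 -- the floor imposed by the id limit alone; in practice
    // you'd shard far more aggressively than that for query performance.
    System.out.println("minimum shards: " + minShards);
  }
}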