We are currently indexing 5 million books in Solr, scaling up over the next few years to 20 million. However we are using the entire book as a Solr document. We are evaluating the possibility of indexing individual pages as there are some use cases where users want the most relevant pages regardless of what book they occur in. However, we estimate that we are talking about somewhere between 1 and 6 billion pages and have concerns over whether Solr will scale to this level.
Does anyone have experience using Solr with 1-6 billion Solr documents? The lucene file format document (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations) mentions a limit of about 2 billion document ids. I assume this is the lucene internal document id and would therefore be a per index/per shard limit. Is this correct? Tom Burton-West.