The 2B limitation applies within one shard, because Lucene's internal docid is a signed 32-bit integer. Sharding imposes no such limit: Distributed Search uses the stored unique document id rather than the internal docid.
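For illustration, a minimal sketch of that arithmetic (the class name and printout are mine, not from this thread): Lucene's internal docids are Java ints, so a single index tops out near Integer.MAX_VALUE, while a 13B-document/32-shard layout like the one Rich describes below stays comfortably under that per shard.

// Minimal sketch, not production code. Lucene's internal docids are Java
// ints (signed 32-bit), so one index/shard is capped near
// Integer.MAX_VALUE (~2.147B) documents. The 13B/32 figures come from
// Rich's mail below; everything else here is illustrative.
public class ShardCapacityCheck {
    static final long MAX_DOCS_PER_SHARD = Integer.MAX_VALUE;

    public static void main(String[] args) {
        long totalDocs = 13000000000L; // ~13B small documents
        int shards = 32;
        long docsPerShard = (totalDocs + shards - 1) / shards; // ceiling division
        System.out.printf("%,d docs per shard%n", docsPerShard); // 406,250,000
        System.out.println("fits in one shard: "
                + (docsPerShard <= MAX_DOCS_PER_SHARD)); // true
    }
}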
On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens <richcari...@gmail.com> wrote:
> A colleague of mine is using native Lucene + some home-grown
> patches/optimizations to index over 13B small documents in a 32-shard
> environment, which is around 406M docs per shard.
>
> If there's a 2B doc id limitation in Lucene then I assume he's patched it
> himself.
>
> On Fri, Apr 2, 2010 at 1:17 PM, <dar...@ontrenet.com> wrote:
>
>> My guess is that you will need to take advantage of Solr 1.5's upcoming
>> cloud/cluster renovations and use multiple indexes to comfortably achieve
>> those numbers. Hypothetically, in that case, you won't be limited by the
>> single-index docid limitations of Lucene.
>>
>> > We are currently indexing 5 million books in Solr, scaling up over the
>> > next few years to 20 million. However, we are using the entire book as
>> > a Solr document. We are evaluating the possibility of indexing
>> > individual pages, as there are some use cases where users want the
>> > most relevant pages regardless of what book they occur in. However, we
>> > estimate that we are talking about somewhere between 1 and 6 billion
>> > pages and have concerns over whether Solr will scale to this level.
>> >
>> > Does anyone have experience using Solr with 1-6 billion Solr documents?
>> >
>> > The Lucene file format document
>> > (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations)
>> > mentions a limit of about 2 billion document ids. I assume this is the
>> > Lucene internal document id and would therefore be a per-index/per-shard
>> > limit. Is this correct?
>> >
>> > Tom Burton-West.

--
Lance Norskog
goks...@gmail.com