The 2B limitation applies within a single shard, because Lucene uses a
signed 32-bit integer for its internal docid. Sharding imposes no such
limit: Distributed Search uses the stored unique document id rather
than the internal docid.
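To make the arithmetic concrete, here is a small illustrative sketch (not Lucene code, just the numbers involved): the internal docid ceiling is Integer.MAX_VALUE, and Rich's 13B-document / 32-shard setup stays well under it per shard.

```java
// Illustrative only: shows why the per-shard ceiling is ~2.1B documents.
// Lucene's internal docid is a signed 32-bit int, so Integer.MAX_VALUE
// (2,147,483,647) is the hard limit within one index/shard.
public class DocIdLimit {
    public static void main(String[] args) {
        int maxDocIdPerShard = Integer.MAX_VALUE; // 2,147,483,647
        System.out.println("Max internal docid per shard: " + maxDocIdPerShard);

        // Sharding sidesteps the limit entirely: e.g. 13B documents
        // spread across 32 shards works out to ~406M docs per shard,
        // far below the 32-bit ceiling.
        long totalDocs = 13_000_000_000L;
        int shards = 32;
        long docsPerShard = totalDocs / shards; // 406,250,000
        System.out.println("Docs per shard: " + docsPerShard);
    }
}
```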

On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens <richcari...@gmail.com> wrote:
> A colleague of mine is using native Lucene + some home-grown
> patches/optimizations to index over 13B small documents in a 32-shard
> environment, which is around 406M docs per shard.
>
> If there's a 2B doc id limitation in Lucene then I assume he's patched it
> himself.
>
> On Fri, Apr 2, 2010 at 1:17 PM, <dar...@ontrenet.com> wrote:
>
>> My guess is that you will need to take advantage of Solr 1.5's upcoming
>> cloud/cluster renovations and use multiple indexes to comfortably achieve
>> those numbers. Hypothetically, in that case, you won't be limited by single
>> index docid limitations of Lucene.
>>
>> > We are currently indexing 5 million books in Solr, scaling up over the
>> > next few years to 20 million.  However we are using the entire book as a
>> > Solr document.  We are evaluating the possibility of indexing individual
>> > pages as there are some use cases where users want the most relevant
>> pages
>> > regardless of what book they occur in.  However, we estimate that we are
>> > talking about somewhere between 1 and 6 billion pages and have concerns
>> > over whether Solr will scale to this level.
>> >
>> > Does anyone have experience using Solr with 1-6 billion Solr documents?
>> >
>> > The Lucene file format document
>> > (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations)
>> > mentions a limit of about 2 billion document ids. I assume this is the
>> > Lucene internal document id and would therefore be a per index/per shard
>> > limit.  Is this correct?
>> >
>> >
>> > Tom Burton-West.



-- 
Lance Norskog
goks...@gmail.com
