Hi Mike,

I'm curious about what you said there: "People have constructed (Lucene) 
indices with over a billion documents." Are you referring to somebody 
specific? I've never heard of anyone creating a single Lucene index that 
large, but I'd love to know who did that.

Thanks,
Otis

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 27, 2007 6:20:40 PM
Subject: Re: maximum index size

On 3/27/07, Kevin Osborn <[EMAIL PROTECTED]> wrote:
> I know there are a bunch of variables here (RAM, number of fields, hits, 
> etc.), but I am trying to get a sense of how big an index, in terms of 
> number of documents, Solr can reasonably handle. I have heard of indexes 
> of 3-4 million documents running fine. But I have no idea what a 
> reasonable upper limit might be.

People have constructed (Lucene) indices with over a billion
documents.  But if "reasonable" means something like "<1s query time
for a medium-complexity query on non-astronomical hardware", I
wouldn't go much higher than the figure you quote.

> I have a large number of documents, and about 200-300 customers would have 
> access to varying subsets of those documents. So, one possible strategy is 
> to have everything in a large index, but duplicate the documents for each 
> customer that has access to that document. But that would really make the 
> total number of documents huge. So, I am trying to get a sense of how big 
> is too big. Each document will probably have about 30 fields. Most of them 
> will be strings, but there will be some text, ints, and floats.

If you are going to store a document for each customer then some field
must indicate to which customer the document instance belongs.  In
that case, why not index a single copy of each document, with a field
containing a list of customers having access?
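For illustration, a rough sketch of what that could look like (the field
name, customer id, and URL below are made up, not from your setup):

In schema.xml, declare a multi-valued field holding the customers that
may see each document:

  <field name="customer" type="string" indexed="true" stored="false"
         multiValued="true"/>

Index each document once, listing every customer with access, and then
restrict each search with a filter query:

  http://localhost:8983/solr/select?q=some+query&fq=customer:cust42

The fq result is cached separately from the main query, so repeated
searches by the same customer stay cheap.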

-Mike


