Hi Otis,

I'm afraid I wasn't thinking of anyone specific--just something I
recall reading on the lucene list.  I assumed that the "document" was
a very small piece of data.

Of course, it is also possible that the message I recall reading was
something like http://java2.5341.com/msg/91276.html, which doesn't
exactly boast completion of such a feat!

-Mike

On 3/28/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hi Mike,

I'm curious about what you said there: "People have constructed (lucene)
indices with over a billion documents."  Are you referring to somebody
specific?  I've never heard of anyone creating a single Lucene index that
large, but I'd love to know who did that.

Thanks,
Otis

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 27, 2007 6:20:40 PM
Subject: Re: maximum index size

On 3/27/07, Kevin Osborn <[EMAIL PROTECTED]> wrote:
> I know there are a bunch of variables here (RAM, number of fields, hits,
> etc.), but I am trying to get a sense of how big an index, in terms of
> number of documents, Solr can reasonably handle. I have heard of indexes
> of 3-4 million documents running fine. But I have no idea what a
> reasonable upper limit might be.

People have constructed (lucene) indices with over a billion
documents.  But if "reasonable" means something like "<1s query time
for a medium-complexity query on non-astronomical hardware", I
wouldn't go much higher than the figure you quote.

> I have a large number of documents, and about 200-300 customers would have
> access to varying subsets of those documents. So, one possible strategy is
> to have everything in a large index, but duplicate the documents for each
> customer that has access to that document. But that would really make the
> total number of documents huge. So, I am trying to get a sense of how big
> is too big. Each document will probably have about 30 fields. Most of them
> will be strings, but there will be some text, ints, and floats.

If you are going to store a document for each customer then some field
must indicate to which customer the document instance belongs.  In
that case, why not index a single copy of each document, with a field
containing a list of customers having access?
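
To make the single-copy idea concrete, here is a rough Python sketch. It
is a toy in-memory index, not Solr or Lucene code; the class name
TinyIndex, the field name customer_ids, and the sample documents and
customer names are all made up for illustration. Each document is stored
once, and a search intersects the text matches with the set of documents
the given customer may see.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index; a hypothetical stand-in for a real search index."""

    def __init__(self):
        self.docs = {}                    # doc id -> original text
        self.postings = defaultdict(set)  # (field, value) -> set of doc ids

    def add(self, doc_id, text, customer_ids):
        # Store ONE copy of the document, however many customers can see it.
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.postings[("text", token)].add(doc_id)
        # Multi-valued access field: one posting per customer with access.
        for cid in customer_ids:
            self.postings[("customer_ids", cid)].add(doc_id)

    def search(self, term, customer_id):
        # Text matches, restricted to documents this customer may access.
        matches = self.postings.get(("text", term.lower()), set())
        allowed = self.postings.get(("customer_ids", customer_id), set())
        return sorted(matches & allowed)


index = TinyIndex()
index.add("doc1", "red widget catalog", ["acme", "globex"])
index.add("doc2", "blue widget catalog", ["globex"])
index.add("doc3", "red gadget manual", ["acme"])

print(index.search("widget", "acme"))    # ['doc1']
print(index.search("widget", "globex"))  # ['doc1', 'doc2']
print(len(index.docs))                   # 3 documents stored, not 4
```

The index size then tracks the number of distinct documents rather than
documents times customers. In Solr this would presumably map to a
multi-valued field holding the customer ids, with each query restricted
by a filter on that field.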

-Mike



