Hi Otis,

I'm afraid I wasn't thinking of anyone specific--just something I recall reading on the lucene list. I assumed that the "document" was a very small piece of data.

Of course, it is also possible that the message I recall reading was something like http://java2.5341.com/msg/91276.html, which doesn't exactly boast completion of such a feat!

-Mike

On 3/28/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hi Mike,

I'm curious about what you said there: "People have constructed (lucene) indices with over a billion documents." Are you referring to somebody specific? I've never heard of anyone creating a single Lucene index that large, but I'd love to know who did that.

Thanks,
Otis

Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 27, 2007 6:20:40 PM
Subject: Re: maximum index size

On 3/27/07, Kevin Osborn <[EMAIL PROTECTED]> wrote:
> I know there are a bunch of variables here (RAM, number of fields, hits, etc.), but I am trying to get a sense of how big an index, in terms of number of documents, Solr can reasonably handle. I have heard of indexes of 3-4 million documents running fine. But I have no idea what a reasonable upper limit might be.

People have constructed (lucene) indices with over a billion documents. But if "reasonable" means something like "<1s query time for a medium-complexity query on non-astronomical hardware", I wouldn't go much higher than the figure you quote.

> I have a large number of documents, and about 200-300 customers would have access to varying subsets of those documents. So, one possible strategy is to have everything in a large index, but duplicate the documents for each customer that has access to that document. But that would really make the total number of documents huge. So, I am trying to get a sense of how big is too big. Each document will probably have about 30 fields. Most of them will be strings, but there will be some text, ints, and floats.

If you are going to store a document for each customer, then some field must indicate to which customer the document instance belongs. In that case, why not index a single copy of each document, with a field containing a list of customers having access?

-Mike
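
The single-copy approach suggested above can be sketched in a few lines of Python. This is only an in-memory illustration, not Solr code: the field names (`id`, `title`, `customers`), the customer names, and the helper functions are all hypothetical. It assumes the access list would be stored as a multi-valued field and that each search would be restricted with a filter on that field.

```python
# Illustrative sketch only: field names and customer ids are hypothetical.
# Each document is indexed once, with a multi-valued "customers" field
# listing every customer allowed to see it.
docs = [
    {"id": "doc1", "title": "Pricing guide", "customers": ["acme", "globex"]},
    {"id": "doc2", "title": "Release notes", "customers": ["globex"]},
    {"id": "doc3", "title": "Install manual", "customers": ["acme", "initech", "globex"]},
]


def visible_to(customer, documents):
    """Return the documents whose 'customers' field contains this customer."""
    return [d for d in documents if customer in d["customers"]]


def customer_filter(customer):
    """Build a filter-query string on the hypothetical 'customers' field.

    Assumes the customer id contains no double quotes.
    """
    return f'customers:"{customer}"'


def duplicated_count(documents):
    """Count the index entries needed if each document were copied per customer."""
    return sum(len(d["customers"]) for d in documents)


if __name__ == "__main__":
    print([d["id"] for d in visible_to("acme", docs)])
    print(customer_filter("acme"))
    print(len(docs), "single copies vs", duplicated_count(docs), "duplicated copies")
```

With this layout the index holds one entry per document regardless of how many customers can see it, whereas the per-customer duplication grows with the total number of (document, customer) pairs.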