Hi Mike, I'm curious about what you said there: "People have constructed (Lucene) indices with over a billion documents." Are you referring to someone specific? I've never heard of anyone creating a single Lucene index that large, but I'd love to know who did that.
Thanks,
Otis

Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 27, 2007 6:20:40 PM
Subject: Re: maximum index size

On 3/27/07, Kevin Osborn <[EMAIL PROTECTED]> wrote:

> I know there are a bunch of variables here (RAM, number of fields, hits,
> etc.), but I am trying to get a sense of how big an index, in terms of
> number of documents, Solr can reasonably handle. I have heard of indexes
> of 3-4 million documents running fine. But I have no idea what a
> reasonable upper limit might be.

People have constructed (Lucene) indices with over a billion documents.
But if "reasonable" means something like "<1s query time for a
medium-complexity query on non-astronomical hardware", I wouldn't go much
higher than the figure you quote.

> I have a large number of documents, and about 200-300 customers would
> have access to varying subsets of those documents. So, one possible
> strategy is to have everything in a large index, but duplicate the
> documents for each customer that has access to that document. But that
> would really make the total number of documents huge. So, I am trying to
> get a sense of how big is too big. Each document will probably have
> about 30 fields. Most of them will be strings, but there will be some
> text, ints, and floats.

If you are going to store a document for each customer, then some field
must indicate to which customer the document instance belongs. In that
case, why not index a single copy of each document, with a field
containing a list of customers having access?

-Mike
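For what it's worth, here is a minimal sketch of the single-index approach
Mike suggests: one copy of each document with a multi-valued field listing
the customers that may see it, and a Solr filter query per request. The
field name "customers", the endpoint URL, and the sample values are
illustrative assumptions, not anything from this thread:

import json
import urllib.parse
import urllib.request

# Hypothetical Solr select endpoint; adjust host/core to your setup.
SOLR_URL = "http://localhost:8983/solr/select"

def search_for_customer(query, customer_id, rows=10):
    """Run a query restricted to documents the given customer can access.

    Assumes each document carries a multi-valued 'customers' field listing
    the customer IDs with access, so one filter query narrows the results
    instead of duplicating every document per customer.
    """
    params = urllib.parse.urlencode({
        "q": query,
        # fq is a standard Solr filter query; it is cached independently
        # of q, so the per-customer restriction is cheap on repeat queries.
        # (customer_id is assumed to be a plain token; escape it if it can
        # contain Solr query syntax.)
        "fq": "customers:" + customer_id,
        "rows": rows,
        "wt": "json",
    })
    with urllib.request.urlopen(SOLR_URL + "?" + params) as resp:
        return json.load(resp)

# Example: fetch up to 10 matches visible to a customer "acme".
results = search_for_customer("widget", "acme")
print(results["response"]["numFound"])

The filter-query route also keeps the index at its true document count, so
the 3-4 million figure above refers to distinct documents rather than
document-times-customer copies.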