Thanks all! One last question... If I had a collection of 2.5 billion docs and a demand averaging 200 queries per second, what's the confidence that Solr/Lucene could handle this volume and execute search with sub-second response times?
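
For a rough sense of scale, here is a back-of-the-envelope sketch in Python that extrapolates the figures Charlie reported in the quoted thread below (8.8 million docs, ~24 GB index, about a day to index 8 million docs) to 2.5 billion docs. Linear scaling and the split into shards of 8.8 million docs each are assumptions made purely for illustration, not something anyone in this thread has measured, and the function and constant names are hypothetical. The thread contains no throughput figures, so the 200 queries per second side is not estimated here.

```python
# Back-of-the-envelope scaling from the figures reported in this thread.
# ASSUMPTION: index size and indexing time scale linearly with document
# count. Real indexes rarely behave this neatly, so treat the output as
# an order of magnitude only.

REPORTED_DOCS = 8_800_000            # peak maxDoc reported in the thread
REPORTED_INDEX_GB = 24.0             # "~24 GB" index size from the thread
REPORTED_DOCS_INDEXED_PER_DAY = 8_000_000  # "about a day to index 8 million docs"

TARGET_DOCS = 2_500_000_000          # collection size asked about


def estimate(target_docs: int, docs_per_shard: int = REPORTED_DOCS) -> dict:
    """Linearly extrapolate the reported figures to target_docs.

    docs_per_shard is a hypothetical choice: it simply reuses the largest
    single index size anyone in the thread reports running.
    """
    gb_per_doc = REPORTED_INDEX_GB / REPORTED_DOCS
    shards = -(-target_docs // docs_per_shard)  # ceiling division
    return {
        "index_size_tb": target_docs * gb_per_doc / 1000,
        "shards": shards,
        "single_thread_index_days": target_docs / REPORTED_DOCS_INDEXED_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in estimate(TARGET_DOCS).items():
        print(f"{name}: {value:,.1f}")
```

Under those assumptions this comes to roughly 6.8 TB of index spread over 285 shards, and about 312 days of indexing with a single-threaded loader like the one described below, so the indexing side alone would have to be parallelized.
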

-----Original Message-----
From: Charlie Jackson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 26, 2007 1:32 PM
To: solr-user@lucene.apache.org
Subject: RE: dataset parameters suitable for lucene application

Sorry, I meant that it maxed out in the sense that my maxDoc field on the stats page was 8.8 million, which indicates that the most docs it has ever had was around 8.8 million. It's down to about 7.8 million currently. I have seen no signs of a "maximum" number of docs Solr can handle.

-----Original Message-----
From: Chris Harris [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 26, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: dataset parameters suitable for lucene application

By "maxed out" do you mean that Solr's performance became unacceptable beyond 8.8M records, or that you only had 8.8M records to index? If the former, can you share the particular symptoms?

On 9/26/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
> My experiences so far with this level of data have been good.
>
> Number of records: Maxed out at 8.8 million
> Database size: friggin huge (100+ GB)
> Index size: ~24 GB
>
> 1) It took me about a day to index 8 million docs using a non-optimized
> program I wrote. It's non-optimized in the sense that it's not
> multi-threaded. It batched together groups of about 5,000 docs at a time
> to be indexed.
>
> 2) Search times for a basic search are almost always sub-second. If we
> toss in some faceting, it takes a little longer, but I've hardly ever
> seen it go above 1-2 seconds even with the most advanced queries.
>
> Hope that helps.
>
> Charlie
>
> ____________________________________________
>
> -----Original Message-----
> From: Law, John [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 26, 2007 9:28 AM
> To: solr-user@lucene.apache.org
> Subject: dataset parameters suitable for lucene application
>
> I am new to the list and new to lucene and solr. I am considering Lucene
> for a potential new application and need to know how well it scales.
>
> Following are the parameters of the dataset.
>
> Number of records: 7+ million
> Database size: 13.3 GB
> Index Size: 10.9 GB
>
> My questions are simply:
>
> 1) Approximately how long would it take Lucene to index these documents?
> 2) What would the approximate retrieval time be (i.e. search response
> time)?
>
> Can someone provide me with some informed guidance in this regard?
>
> Thanks in advance,
> John
>
> ______________________________________________
> John Law
> Director, Platform Management
> ProQuest
> 789 Eisenhower Parkway
> Ann Arbor, MI 48106
> 734-997-4877
> [EMAIL PROTECTED]
> www.proquest.com
> www.csa.com
>
> ProQuest... Start here.