One other useful piece of information would be how big you expect your indexes to be, which you should be able to estimate quite easily by indexing, say, 20,000 documents from the relevant databases.
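
Here is a rough sketch of how that measurement might be taken, in Python with only the standard library. The function names and the checkpoint numbers are purely illustrative (nothing here is a Lucene or Solr API); it just adds up the files in an index directory and compares two snapshots, one taken after 10,000 documents and one after 20,000. The linear projection at the end is a crude assumption, not a prediction.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def growth_report(size_first, size_second, docs_first, docs_second):
    """Compare two index-size snapshots.

    Returns (average bytes per document at the second snapshot,
             marginal bytes per document between the two snapshots).
    """
    average = size_second / docs_second
    marginal = (size_second - size_first) / (docs_second - docs_first)
    return average, marginal


def naive_projection(size_second, docs_second, marginal, target_docs):
    """Crude linear extrapolation: assume every further document costs
    the marginal rate seen between the two snapshots (an assumption)."""
    return size_second + marginal * (target_docs - docs_second)
```

To use it, copy or measure the index directory at each checkpoint (for example `dir_size_bytes("/path/to/index")`, where the path is whatever your setup uses), then feed the two sizes to `growth_report`. If the marginal figure comes out well below the average, the index is growing more slowly than the first 10,000 documents would suggest.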
Of particular interest will be the delta between the size of the index at, say, 10,000 documents and 20,000, since size is related to the number of unique terms per field and once you get past a certain number of terms, virtually every term in a new document will already be in your index.

Also, I think that the relevant metric is the size for *unstored* data, since storing the fields isn't particularly relevant to search response time (although it can *certainly* be relevant to *total* time if you assemble a lot of stored fields to return).

If you're new to Lucene, the difference between stored and indexed is a bit confusing, so if the above is gibberish, you'd be well served by understanding the distinction before you go too far <G>.

Best
Erick

On Wed, Jan 21, 2009 at 1:04 PM, Thomas Dowling <tdowl...@ohiolink.edu> wrote:

> On 01/21/2009 12:25 PM, Matthew Runo wrote:
> > At a certain level it will become better to have multiple smaller boxes
> > rather than one huge one. I've found that even an old P4 with 2 gigs of
> > ram has decent response time on our 150,000 item index with only a few
> > users - but it quickly goes downhill if we get more than 5 or 6. How
> > many documents are you going to be storing in your index? How much of
> > them will be "stored" versus "indexed"? Will you be faceting on the
> > results?
>
> Thanks for the tip on multiple boxes. We'll be hosting about 20
> databases total. A couple of them are in the 10- to 20-million record
> range and a couple more are in the 5- to 10-million range. It's highly
> structured data and I anticipate a lot of faceting and indexing almost
> all the fields.
>
> > In general, I'd recommend a 64 bit processor with enough ram to store
> > your index in ram - but that might not be possible with "millions" of
> > records. Our 150,000 item index is about a gig and a half when optimized
> > but yours will likely be different depending on how much you store.
> > Faceting takes more memory than pure searching as well.
>
> This is very helpful. Thanks again.
>
> --
> Thomas Dowling