Heh, I'm not sure if this is valid thinking. :) By *matching* doc distribution I meant: what proportion of your millions of documents ever actually gets matched, and then how many of those matches make it to the UI. If you get 1000 queries in a day and they all end up matching only 3 of your docs, the system will need less RAM than a system where 1000 queries match 50000 different docs.
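To put a number on that, here is a rough sketch of estimating the distinct "working set" your queries actually touch. The per-query matched-doc log format (`query_id<TAB>doc_id`) is made up for illustration; adapt the parsing to whatever logging you have:

```python
# Rough illustration: two workloads with the same query volume can touch
# wildly different numbers of distinct documents, and it is the distinct
# set that determines how much of the index needs to stay hot in RAM.
# The "query_id<TAB>doc_id" log format here is hypothetical.

def working_set(log_lines):
    """Return (num_queries, num_distinct_docs_matched) for an iterable of lines."""
    queries, docs = set(), set()
    for line in log_lines:
        query_id, doc_id = line.split("\t")
        queries.add(query_id)
        docs.add(doc_id)
    return len(queries), len(docs)

# 1000 queries that all match the same 3 docs...
narrow = [f"q{q}\tdoc{d}" for q in range(1000) for d in range(3)]
# ...vs 1000 queries matching 50 docs each, 50000 distinct docs overall.
broad = [f"q{q}\tdoc{d}" for q in range(1000) for d in range(q * 50, q * 50 + 50)]

print(working_set(narrow))  # (1000, 3)
print(working_set(broad))   # (1000, 50000)
```

If the second number is a large fraction of your 14 million docs, the OS cache will be churning no matter how you tune Solr itself.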
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Salman Akram <salman.ak...@northbaysolutions.net>
> To: solr-user@lucene.apache.org
> Sent: Fri, February 4, 2011 3:38:55 PM
> Subject: Re: Performance optimization of Proximity/Wildcard searches
>
> Well, I assume many people out there have indexes larger than 100GB, and
> normally you won't have more than 32GB or 64GB of RAM!
>
> As I mentioned, the queries are mostly phrase, proximity, wildcard, and
> combinations of these.
>
> What exactly do you mean by distribution of documents? On this index our
> documents are no more than a few hundred KB on average (file system size),
> and there are around 14 million documents. 80% of the index size is taken
> up by the positions file. I am not sure if this is what you asked?
>
> On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
>
> > Hi,
> >
> > > Sharding is an option too, but that also comes with limitations, so I
> > > want to keep it as a last resort. I think there must be other options,
> > > because 150GB is not too big for one drive/server with 32GB of RAM.
> >
> > Hmm.... what makes you think 32GB is enough for your 150GB index?
> > It depends on the queries and the distribution of matching documents,
> > for example. What's yours like?
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > ----- Original Message ----
> > > From: Salman Akram <salman.ak...@northbaysolutions.net>
> > > To: solr-user@lucene.apache.org
> > > Sent: Tue, January 25, 2011 4:20:34 AM
> > > Subject: Performance optimization of Proximity/Wildcard searches
> > >
> > > Hi,
> > >
> > > I am facing performance issues with three types of queries (and their
> > > combinations). Some of the queries take more than 2-3 minutes. Index
> > > size is around 150GB.
> > >
> > >    - Wildcard
> > >    - Proximity
> > >    - Phrases (with common words)
> > >
> > > I know CommonGrams and stop words are a good way to resolve such
> > > issues, but they don't fulfill our functional requirements (CommonGrams
> > > seems to have issues with phrase proximity, stop words have issues with
> > > exact match, etc.).
> > >
> > > Sharding is an option too, but that also comes with limitations, so I
> > > want to keep it as a last resort. I think there must be other options,
> > > because 150GB is not too big for one drive/server with 32GB of RAM.
> > >
> > > Cache warming is a good option too, but the index gets updated every
> > > hour, so I am not sure how much that would help.
> > >
> > > What other main tips can help with performance optimization of the
> > > above queries?
> > >
> > > Thanks
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
>
>
> --
> Regards,
>
> Salman Akram
>
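For readers finding this thread later, a minimal sketch of the two techniques discussed above. The field type and words file (`text_common`, `commonwords.txt`) and the warming query are made-up examples; the filter and listener classes are standard Solr. `CommonGramsQueryFilterFactory` is the query-side counterpart that keeps phrase queries matching the shingled index, and a `newSearcher` listener can re-warm caches after each hourly commit:

```xml
<!-- schema.xml: CommonGrams on both index and query analyzers (names are examples) -->
<fieldType name="text_common" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: warm the new searcher after each hourly commit -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">"a representative phrase query"</str></lst>
  </arr>
</listener>
```

Note that with hourly commits the warming queries themselves cost time, so keep the list short and representative of your slowest query shapes.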