Heh, I'm not sure if this is valid thinking. :) By *matching* doc distribution I meant: what proportion of your millions of documents ever actually gets matched, and then how many of those matches make it to the UI. If you get 1000 queries in a day and they all end up matching only 3 of your docs, the system will need less RAM than a system where 1000 queries match 50000 different docs.
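To put a number on that, here is a rough sketch of estimating the distinct "working set" your queries actually touch. The per-query matched-doc log format (`query_id<TAB>doc_id`) is made up for illustration; adapt the parsing to whatever logging you have:

```python
# Rough illustration: two workloads with the same query volume can touch
# wildly different numbers of distinct documents, and it is the distinct
# set that determines how much of the index needs to stay hot in RAM.
# The "query_id<TAB>doc_id" log format here is hypothetical.

def working_set(log_lines):
    """Return (num_queries, num_distinct_docs_matched) for an iterable of lines."""
    queries, docs = set(), set()
    for line in log_lines:
        query_id, doc_id = line.split("\t")
        queries.add(query_id)
        docs.add(doc_id)
    return len(queries), len(docs)

# 1000 queries that all match the same 3 docs...
narrow = [f"q{q}\tdoc{d}" for q in range(1000) for d in range(3)]
# ...vs 1000 queries matching 50 docs each, 50000 distinct docs overall.
broad = [f"q{q}\tdoc{d}" for q in range(1000) for d in range(q * 50, q * 50 + 50)]

print(working_set(narrow))  # (1000, 3)
print(working_set(broad))   # (1000, 50000)
```

If the second number is a large fraction of your 14 million docs, the OS cache will be churning no matter how you tune Solr itself.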
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Salman Akram <salman.ak...@northbaysolutions.net>
> To: solr-user@lucene.apache.org
> Sent: Fri, February 4, 2011 3:38:55 PM
> Subject: Re: Performance optimization of Proximity/Wildcard searches
>
> Well, I assume many people out there have indexes larger than 100GB, and
> normally you won't have more than 32GB or 64GB of RAM!
>
> As I mentioned, the queries are mostly phrase, proximity, wildcard, and
> combinations of these.
>
> What exactly do you mean by distribution of documents? On this index our
> documents are no more than a few hundred KB on average (file system size),
> and there are around 14 million documents. 80% of the index size is taken
> up by the positions file. I am not sure if this is what you asked?
>
> On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
>
> > Hi,
> >
> > > Sharding is an option too, but that also comes with limitations, so I
> > > want to keep it as a last resort. I think there must be other options,
> > > because 150GB is not too big for one drive/server with 32GB of RAM.
> >
> > Hmm.... what makes you think 32GB is enough for your 150GB index?
> > It depends on the queries and the distribution of matching documents,
> > for example. What's yours like?
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > ----- Original Message ----
> > > From: Salman Akram <salman.ak...@northbaysolutions.net>
> > > To: solr-user@lucene.apache.org
> > > Sent: Tue, January 25, 2011 4:20:34 AM
> > > Subject: Performance optimization of Proximity/Wildcard searches
> > >
> > > Hi,
> > >
> > > I am facing performance issues with three types of queries (and their
> > > combinations). Some of the queries take more than 2-3 minutes. Index
> > > size is around 150GB.
> > >
> > >    - Wildcard
> > >    - Proximity
> > >    - Phrases (with common words)
> > >
> > > I know CommonGrams and stop words are a good way to resolve such
> > > issues, but they don't fulfill our functional requirements (CommonGrams
> > > seems to have issues with phrase proximity, stop words have issues with
> > > exact match, etc.).
> > >
> > > Sharding is an option too, but that also comes with limitations, so I
> > > want to keep it as a last resort. I think there must be other options,
> > > because 150GB is not too big for one drive/server with 32GB of RAM.
> > >
> > > Cache warming is a good option too, but the index gets updated every
> > > hour, so I am not sure how much that would help.
> > >
> > > What other main tips can help with performance optimization of the
> > > above queries?
> > >
> > > Thanks
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
>
>
> --
> Regards,
>
> Salman Akram
>
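For readers finding this thread later, a minimal sketch of the two techniques discussed above. The field type and words file (`text_common`, `commonwords.txt`) and the warming query are made-up examples; the filter and listener classes are standard Solr. `CommonGramsQueryFilterFactory` is the query-side counterpart that keeps phrase queries matching the shingled index, and a `newSearcher` listener can re-warm caches after each hourly commit:

```xml
<!-- schema.xml: CommonGrams on both index and query analyzers (names are examples) -->
<fieldType name="text_common" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: warm the new searcher after each hourly commit -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">"a representative phrase query"</str></lst>
  </arr>
</listener>
```

Note that with hourly commits the warming queries themselves cost time, so keep the list short and representative of your slowest query shapes.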