Hi Tom,

my index is around 3 GB and I am using 2 GB of RAM for the JVM, although
some more is available.
When I look at the RAM usage while a slow query runs (via
jvisualvm), I see that only 750 MB of the JVM heap is used.

> Can you give us some examples of the slow queries?

for example, the empty query solr/select?q= takes very long,
as does solr/select?q=http, where 'http' is the most common term

> Are you using stop words?  

yes, a lot. I stored them in stopwords.txt

> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2

this looks interesting. I read through
https://issues.apache.org/jira/browse/SOLR-908 and it seems to be included
in 1.4. I only need to enable it via:

<filter class="solr.CommonGramsFilterFactory" ignoreCase="true" 
words="stopwords.txt"/>

right? Do I need to reindex?
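I assume the full chain would look something like this, with
CommonGramsQueryFilterFactory on the query side (the fieldType name,
tokenizer, and other filters here are my guesses, not my actual schema):

```xml
<fieldType name="text_commongrams" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index time: emit single terms plus common-word bigrams -->
    <filter class="solr.CommonGramsFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- query time: prefer the bigram whenever a common word is involved -->
    <filter class="solr.CommonGramsQueryFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

Does that look correct?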

Regards,
Peter.

> Hi Peter,
>
> A few more details about your setup would help list members to answer your 
> questions.
> How large is your index?  
> How much memory is on the machine and how much is allocated to the JVM?
> Besides the Solr caches, Solr and Lucene depend on the operating system's 
> disk caching for caching of postings lists.  So you need to leave some memory 
> for the OS.  On the other hand if you are optimizing and refreshing every 
> 10-15 minutes, that will invalidate all the caches, since an optimized index 
> is essentially a set of new files.
>
> Can you give us some examples of the slow queries?  Are you using stop words? 
>  
>
> If your slow queries are phrase queries, then you might try either adding the 
> most frequent terms in your index to the stopwords list  or try CommonGrams 
> and add them to the common words list.  (Details on CommonGrams here: 
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>
> Tom Burton-West
>
> -----Original Message-----
> From: Peter Karich [mailto:peat...@yahoo.de] 
> Sent: Tuesday, August 10, 2010 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Improve Query Time For Large Index
>
> Hi,
>
> I have 5 Million small documents/tweets (=> ~3GB) and the slave index
> replicates itself from master every 10-15 minutes, so the index is
> optimized before querying. We are using solr 1.4.1 (patched with
> SOLR-1624) via SolrJ.
>
> Now the search is slow (>2s) for common terms which hit more than 2
> million docs, and acceptable for others: <0.5s. For those numbers I don't
> use highlighting or facets. I am using the following schema [1], and from
> the luke handler I know that numTerms =~ 20 million. The query for common
> terms stays slow if I retry again and again (no cache improvements).
>
> How can I improve the query time for the common terms without using
> Distributed Search [2] ?
>
> Regards,
> Peter.
>
>
> [1]
> <field name="id" type="tlong" indexed="true" stored="true"
> required="true" />
> <field name="date" type="tdate" indexed="true" stored="true" />
> <!-- term* attributes to prepare faster highlighting. -->
> <field name="txt" type="text" indexed="true" stored="true"
>                termVectors="true" termPositions="true" termOffsets="true"/>
>
> [2]
> http://wiki.apache.org/solr/DistributedSearch
>
>


-- 
http://karussell.wordpress.com/
