Hi Tom!

> Hi Peter,
>
> Can you give a few more examples of slow queries?  
> Are they phrase queries? Boolean queries? prefix or wildcard queries?
>   

I am experimenting with one-word queries only at the moment.

> If one-word queries are your slow queries, then CommonGrams won't help.  
> CommonGrams will only help with phrase queries.
>   

hmmh, ok.

> How are you using termvectors? 
yes.

> That may be slowing things down.  I don't have experience with termvectors, 
> so someone else on the list might speak to that.
>   

ok. But for highlighting I'll need them to speed things up (a lot).


> When you say the query time for common terms stays slow, do you mean if you 
> re-issue the exact query, the second query is not faster?  That seems very 
> strange. 

Yes. Indeed. The queryResultCache has no hits at all. Strange.

>  You might restart Solr, and send a first query (the first query always takes 
> a relatively long time.)  Then pick one of your slow queries and send it 2 
> times.  The second time you send the query it should be much faster due to 
> the Solr caches and you should be able to see the cache hit in the Solr admin 
> panel.  If you send the exact query a second time (without enough intervening 
> queries to evict data from the cache, ) the Solr queryResultCache should get 
> hit and you should see a response time in the .01-5 millisecond range.
>   

That's not the case. The second query is only a few milliseconds
faster (and still takes >2s). I'm not sure what I am doing wrong. The
other 3 caches have a good hit ratio, but queryResultCache has 0. For
queryResultCache I am using:
<queryResultCache class="solr.LRUCache" size="400" initialSize="400"
autowarmCount="400"/>

But even doubling that did not raise the hit ratio above 0.
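
The repeat-query check discussed above can be scripted so the two timings are directly comparable. This is only a minimal sketch: the base URL (host, port, path) and the names fetch and time_repeated are assumptions for illustration, not part of Solr or of the setup in this thread. It measures wall-clock time on the client, so it includes network overhead.

```python
import time
import urllib.parse
import urllib.request

# Assumed base URL; adjust host, port and path to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/select"

def fetch(query):
    """Send one query to Solr and return the raw response body."""
    url = SOLR_SELECT + "?" + urllib.parse.urlencode({"q": query})
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def time_repeated(query, runs=2, run_query=fetch):
    """Run the identical query `runs` times; return elapsed seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling time_repeated("http") after a restart and one warm-up query returns two timings. If the queryResultCache were being hit, the second value should be far smaller than the first.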

> How much memory is on the machine?  If your bottleneck is disk i/o for 
> frequent terms, then you want to make sure you have enough memory for the OS 
> disk cache.  
>   

Yes, there should be enough memory for the OS disk cache.

> I assume that http is not in your stopwords.

exactly.


> CommonGrams will only help with phrase queries. CommonGrams was committed and 
> is in Solr 1.4.  If you decide to use CommonGrams you definitely need to 
> re-index and you also need to use both the index time filter and the query 
> time filter.  Your index will be larger.
>
> <fieldType name="foo" ...>
> <analyzer type="index">
> <filter class="solr.CommonGramsFilterFactory" words="new400common.txt"/>
> </analyzer>
>
> <analyzer type="query">
> <filter class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/>
> </analyzer>
> </fieldType>
>   

Thanks, I will try that, if I can solve the current issue :-)
And thanks for all your answers, I will try to experiment with my setup
in more detail now ...
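
To make the point about one-word queries concrete, here is a simplified sketch of what the index-time CommonGrams filter is described as doing: every token is kept, and a bigram is added wherever at least one of two adjacent tokens is a common word. The function name common_grams and the "_" separator are assumptions for illustration. This is not the Lucene implementation and it ignores token positions.

```python
def common_grams(tokens, common_words, sep="_"):
    """Keep every token and add a bigram for each adjacent pair in which
    at least one token is a common word (simplified, positions ignored)."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(tok + sep + nxt)
    return out
```

With common_words = {"the"}, the tokens ["the", "quick", "brown"] become ["the", "the_quick", "quick", "brown"], so a phrase query can match the rarer term the_quick. A single token such as ["http"] has no neighbour and stays ["http"], which is why a one-word query gains nothing from it.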

Kind regards,
Peter.



> Subject: Re: Improve Query Time For Large Index
>
> Hi Tom,
>
> my index is around 3GB and I am using 2GB RAM for the JVM, although
> some more is available.
> If I look at the RAM usage while a slow query runs (via
> jvisualvm), I see that only 750MB of the JVM RAM is used.
>
>   
>> Can you give us some examples of the slow queries?
>>     
> for example the empty query solr/select?q=
> takes very long or solr/select?q=http
> where 'http' is the most common term
>
>   
>> Are you using stop words?  
>>     
> yes, a lot. I stored them in stopwords.txt
>
>   
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
>>     
> this looks interesting. I read through
> https://issues.apache.org/jira/browse/SOLR-908 and it seems to be in 1.4.
> I only need to enable it via:
>
> <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" 
> words="stopwords.txt"/>
>
> right? Do I need to reindex?
>
> Regards,
> Peter.
>
>   
>> Hi Peter,
>>
>> A few more details about your setup would help list members to answer your 
>> questions.
>> How large is your index?  
>> How much memory is on the machine and how much is allocated to the JVM?
>> Besides the Solr caches, Solr and Lucene depend on the operating system's 
>> disk caching for caching of postings lists.  So you need to leave some 
>> memory for the OS.  On the other hand if you are optimizing and refreshing 
>> every 10-15 minutes, that will invalidate all the caches, since an optimized 
>> index is essentially a set of new files.
>>
>> Can you give us some examples of the slow queries?  Are you using stop 
>> words?  
>>
>> If your slow queries are phrase queries, then you might try either adding 
>> the most frequent terms in your index to the stopwords list  or try 
>> CommonGrams and add them to the common words list.  (Details on CommonGrams 
>> here: 
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>>
>> Tom Burton-West
>>
>> -----Original Message-----
>> From: Peter Karich [mailto:peat...@yahoo.de] 
>> Sent: Tuesday, August 10, 2010 9:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Improve Query Time For Large Index
>>
>> Hi,
>>
>> I have 5 Million small documents/tweets (=> ~3GB) and the slave index
>> replicates itself from master every 10-15 minutes, so the index is
>> optimized before querying. We are using solr 1.4.1 (patched with
>> SOLR-1624) via SolrJ.
>>
>> Now the search is slow (>2s) for common terms that hit more than 2
>> million docs and acceptable (<0.5s) for others. For those numbers I don't use
>> highlighting or facets. I am using the following schema [1] and from the
>> luke handler I know that numTerms is ~20 million. The query for common terms
>> stays slow if I retry again and again (no cache improvements).
>>
>> How can I improve the query time for the common terms without using
>> Distributed Search [2] ?
>>
>> Regards,
>> Peter.
>>
>>
>> [1]
>> <field name="id" type="tlong" indexed="true" stored="true"
>> required="true" />
>> <field name="date" type="tdate" indexed="true" stored="true" />
>> <!-- term* attributes to prepare faster highlighting. -->
>> <field name="txt" type="text" indexed="true" stored="true"
>>                termVectors="true" termPositions="true" termOffsets="true"/>
>>
>> [2]
>> http://wiki.apache.org/solr/DistributedSearch
>>     
