Hi Tom,

my index is around 3 GB and I am using 2 GB RAM for the JVM, although some more is available. If I look at the RAM usage while a slow query runs (via jvisualvm), I see that only 750 MB of the JVM heap is used.
> Can you give us some examples of the slow queries?

For example, the empty query solr/select?q= takes very long, as does solr/select?q=http, where 'http' is the most common term.

> Are you using stop words?

Yes, a lot. I stored them in stopwords.txt.

> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2

This looks interesting. I read through https://issues.apache.org/jira/browse/SOLR-908 and it seems to be included in 1.4. I only need to enable it via:

<filter class="solr.CommonGramsFilterFactory" ignoreCase="true" words="stopwords.txt"/>

right? Do I need to reindex?

Regards,
Peter.

> Hi Peter,
>
> A few more details about your setup would help list members to answer your
> questions.
> How large is your index?
> How much memory is on the machine and how much is allocated to the JVM?
> Besides the Solr caches, Solr and Lucene depend on the operating system's
> disk caching for caching of postings lists, so you need to leave some memory
> for the OS. On the other hand, if you are optimizing and refreshing every
> 10-15 minutes, that will invalidate all the caches, since an optimized index
> is essentially a set of new files.
>
> Can you give us some examples of the slow queries? Are you using stop words?
>
> If your slow queries are phrase queries, then you might try either adding the
> most frequent terms in your index to the stopwords list or trying CommonGrams
> and adding them to the common words list. (Details on CommonGrams here:
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>
> Tom Burton-West
>
> -----Original Message-----
> From: Peter Karich [mailto:peat...@yahoo.de]
> Sent: Tuesday, August 10, 2010 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Improve Query Time For Large Index
>
> Hi,
>
> I have 5 million small documents/tweets (=> ~3 GB), and the slave index
> replicates itself from the master every 10-15 minutes, so the index is
> optimized before querying.
> We are using Solr 1.4.1 (patched with
> SOLR-1624) via SolrJ.
>
> Now the search speed is slow (>2 s) for common terms which hit more than 2
> million docs, and acceptable for others (<0.5 s). For those numbers I don't
> use highlighting or facets. I am using the following schema [1], and from
> the Luke handler I know that numTerms is ~20 million. The query for common
> terms stays slow if I retry again and again (no cache improvements).
>
> How can I improve the query time for the common terms without using
> Distributed Search [2]?
>
> Regards,
> Peter.
>
> [1]
> <field name="id" type="tlong" indexed="true" stored="true"
> required="true" />
> <field name="date" type="tdate" indexed="true" stored="true" />
> <!-- term* attributes to prepare faster highlighting. -->
> <field name="txt" type="text" indexed="true" stored="true"
> termVectors="true" termPositions="true" termOffsets="true"/>
>
> [2]
> http://wiki.apache.org/solr/DistributedSearch
>
> --
> http://karussell.wordpress.com/
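P.S.: To make sure I understand where the filter goes, here is my untested sketch of how I would wire CommonGrams into the 'text' fieldType in my schema.xml. The tokenizer and filter chain around it are assumptions based on my current setup, and SOLR-908 suggests the query side should use CommonGramsQueryFilterFactory rather than the index-time factory:

```xml
<!-- sketch only, untested: CommonGrams wired into an analyzer chain -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- index time: emit single terms plus bigrams for common words -->
    <filter class="solr.CommonGramsFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- query time: prefer the bigrams so phrase queries with common
         words hit far fewer postings (per SOLR-908) -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

And since this changes the tokens that actually get written to the index, I assume a full reindex would be required, correct?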