On 8/9/2013 9:36 AM, Neal Ensor wrote:
I have an 8 million document solr index, roughly divided down the middle by
an identifying "product" value, one of two distinct values.  The documents
in both "sides" are very similar, with stored text fields, etc.  I have two
nearly identical request handlers, one for each "side".

When I perform very similar queries on either "side" for random phrases,
requesting 500 rows with highlighting on titles and summaries, I get very
different results.  One "side" consistently returns results in around 1-2
seconds, whereas the other one consistently returns in 6-10 seconds.  I
don't see any reason why it's worse; each run of queries is deliberately
randomized to avoid caches getting in the way.  Each test query returns the
full first 500 in most cases.

My filter query cache configuration looks like:

<filterCache class="solr.FastLRUCache"
                  size="750000"
                  initialSize="10000"
                  autowarmCount="0"/>

This filterCache is *enormous*. Even the initialSize is larger than I would normally expect to see for the total size. With 8 million documents, each entry in the cache can be 1 megabyte (one bit per document: 8,000,000 bits / 8 = 1,000,000 bytes). In practice, an entry will be either very small or the full 1 megabyte, depending on how many documents the filter matches. A cache this large has the potential to chew up a lot of RAM without really doing much for you.
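To put numbers on that, here is a small Java sketch of the arithmetic. It assumes that a full entry is a plain bitset with one bit per document, rounded up to whole 64-bit words, and that a small entry is a sorted array of 4-byte document IDs. Both are simplifications of how Solr actually stores cached filters, and the class and method names are made up for illustration. At 8,000,000 documents a full entry is 1,000,000 bytes, so 10,000 full entries (your initialSize) would be about 10 GB and 750,000 full entries (your size) about 750 GB.

```java
public class FilterCacheMath {
    // Size of one full-bitset entry: one bit per document in the index,
    // rounded up to whole 64-bit words (assumes a long[]-backed bitset).
    static long bitsetBytes(long maxDoc) {
        long words = (maxDoc + 63) / 64;
        return words * Long.BYTES;
    }

    // Size of a small entry stored as a sorted array of int doc IDs
    // (assumption: 4 bytes per matching document, ignoring object overhead).
    static long sortedIntBytes(long matchingDocs) {
        return matchingDocs * Integer.BYTES;
    }

    // Upper bound on cache memory if every cached entry is a full bitset.
    static long worstCaseBytes(long maxDoc, long entries) {
        return bitsetBytes(maxDoc) * entries;
    }

    public static void main(String[] args) {
        long maxDoc = 8_000_000L;
        System.out.println("full entry bytes:        " + bitsetBytes(maxDoc));
        System.out.println("entry for 1,000 matches: " + sortedIntBytes(1_000));
        System.out.println("10,000 full entries:     " + worstCaseBytes(maxDoc, 10_000));
        System.out.println("750,000 full entries:    " + worstCaseBytes(maxDoc, 750_000));
    }
}
```

Real filters land somewhere between these two shapes, but the worst case is what the Java heap would have to absorb if the cache ever filled up.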

If the problem persists after you drastically reduce the size of the filterCache, I would suspect a more basic performance problem. Even 1-2 seconds seems very slow to me.

My first questions are about your index and the server you're running it on. How big is the index in terms of disk space? How much RAM are you allocating to the JVM (the max heap)? How much RAM is in the entire machine? Is the machine running software other than Solr, such as a web server, database server, etc.? What operating system are you running, is it 64-bit, and is Java 64-bit?
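For the disk-space and 64-bit questions, one quick way to gather the numbers is a small Java program run with the same java binary that starts Solr. This is only a sketch: the class name and the default index path below are hypothetical (pass your core's data/index directory as the first argument), and sun.arch.data.model is a HotSpot-specific property that may be missing on other JVMs. It does not tell you how much heap Solr itself has; that comes from the -Xmx option you start Solr with.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class SolrHostReport {
    // Sum the sizes of all regular files under dir (e.g. a core's data/index).
    static long directorySizeBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real index directory as args[0].
        Path index = Paths.get(args.length > 0 ? args[0] : "solr/collection1/data/index");
        System.out.println("index size (bytes): " + directorySizeBytes(index));
        System.out.println("os.name:            " + System.getProperty("os.name"));
        System.out.println("os.arch:            " + System.getProperty("os.arch"));
        // HotSpot-specific: "64" for a 64-bit JVM; may print null elsewhere.
        System.out.println("JVM data model:     " + System.getProperty("sun.arch.data.model"));
    }
}
```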

Next, I'd like to know more about your queries. Can you include typical examples of all query parameters for both "sides"? What does the indexed and stored data look like for a typical document? Depending on what I learn here, I might need to see all or part of your config and schema.

How often do you send updates/deletes to your index? How often, and exactly how, are you doing commits, and do you have autoCommit configured in your solrconfig.xml?

Thanks,
Shawn
