On 8/9/2013 9:36 AM, Neal Ensor wrote:
> I have an 8 million document solr index, roughly divided down the middle by
> an identifying "product" value, one of two distinct values. The documents
> in both "sides" are very similar, with stored text fields, etc. I have two
> nearly identical request handlers, one for each "side".
>
> When I perform very similar queries on either "side" for random phrases,
> requesting 500 rows with highlighting on titles and summaries, I get very
> different results. One "side" consistently returns results in around 1-2
> seconds, whereas the other one consistently returns in 6-10 seconds. I
> don't see any reason why it's worse; each run of queries is deliberately
> randomized to avoid caches getting in the way. Each test query returns the
> full first 500 in most cases.
>
> My filter query cache configuration looks like:
>
> <filterCache class="solr.FastLRUCache"
>              size="750000"
>              initialSize="10000"
>              autowarmCount="0"/>

This filterCache is *enormous*. Even the initialSize is larger than the
total size I would normally expect to see. With 8 million documents, each
entry in the cache can be 1 megabyte (a bitset with one bit per document).
In practice, an entry will be either very small or the full 1 megabyte,
depending on how many documents the filter matches. This has the
potential to chew up a lot of RAM without really doing much for you.
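
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It assumes every entry is a full bitset with one bit per document
and ignores JVM object overhead and the much smaller entries produced by
filters that match few documents, so treat it as an upper bound rather
than a measurement. The function names are mine, not anything in Solr.

```python
def bitset_entry_bytes(max_doc: int) -> int:
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (max_doc + 7) // 8


def worst_case_cache_bytes(max_doc: int, cache_size: int) -> int:
    """Upper bound: every cache slot holds a full-size bitset entry."""
    return bitset_entry_bytes(max_doc) * cache_size


MAX_DOC = 8_000_000      # documents in the index (from the original post)
CACHE_SIZE = 750_000     # filterCache size="750000"
INITIAL_SIZE = 10_000    # filterCache initialSize="10000"

print(bitset_entry_bytes(MAX_DOC))                          # 1000000 bytes per entry
print(worst_case_cache_bytes(MAX_DOC, INITIAL_SIZE) / 1e9)  # 10.0 (GB)
print(worst_case_cache_bytes(MAX_DOC, CACHE_SIZE) / 1e9)    # 750.0 (GB)
```

The cache will only approach that ceiling if you actually use hundreds of
thousands of distinct filters that each match a large share of the index,
but it shows why the configured size is far more than a typical heap could
ever hold.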

If the slowdown is still there after you drastically reduce the size of
the filterCache, I would suspect a more basic performance problem. Even
1-2 seconds seems very slow to me.

My first questions are about your index and the server you're running it
on. How big is the index in terms of disk space? How much RAM are you
allocating to the JVM? How much RAM is in the entire machine? Is the
machine running software other than Solr, such as a web server or a
database server? What operating system are you running, is it 64-bit,
and is Java 64-bit?

Next, I'd like to know more about your queries. Can you include typical
examples of all query parameters for both "sides"? What does the
indexed and stored data look like for a typical document? Depending on
what I learn here, I might need to see all or part of your config and
schema.

How often do you send updates/deletes to your index? How often and
exactly how are you doing commits, and do you have any auto commit in
your config?

Thanks,
Shawn