@Erick: thanks for sharing the knowledge on the hit ratio - evictions interplay. Sounds quite reasonable.
Dmitry

On Sat, Aug 10, 2013 at 3:11 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> To add to what Shawn said, this filterCache is enormous. The key statistics
> are the hit ratio and evictions. Evictions aren't bad if the hit ratio is
> high. If hit ratio is low and evictions are high, only then should you
> consider making it larger. So I'd drop it back to 512.
>
> Hit ratios around 75% are my personal "too low" number, but YMMV...
>
> BUT, it's an LRU cache. So assuming you're forming a filter query for the
> two "sides" and that you append an fq clause to every query,
> you'll only need two entries <G>. Plus, of course, other fqs.
>
> The first thing I'd do is only return 10 rows, turn off highlighting, and
> anything else that comes to mind. Then add them back and see which ones
> are causing you grief.
>
> Or add &debug=timing. That'll return a list of how much time each
> component takes and may give you a clue as well.
>
> Best
> Erick
>
> On Fri, Aug 9, 2013 at 1:55 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
> > On 8/9/2013 9:36 AM, Neal Ensor wrote:
> >
> >> I have an 8 million document solr index, roughly divided down the
> >> middle by an identifying "product" value, one of two distinct values.
> >> The documents in both "sides" are very similar, with stored text
> >> fields, etc. I have two nearly identical request handlers, one for
> >> each "side".
> >>
> >> When I perform very similar queries on either "side" for random
> >> phrases, requesting 500 rows with highlighting on titles and
> >> summaries, I get very different results. One "side" consistently
> >> returns results in around 1-2 seconds, whereas the other one
> >> consistently returns in 6-10 seconds. I don't see any reason why it's
> >> worse; each run of queries is deliberately randomized to avoid caches
> >> getting in the way. Each test query returns the full first 500 in
> >> most cases.
> >>
> >> My filter query cache configuration looks like:
> >>
> >> <filterCache class="solr.FastLRUCache"
> >>              size="750000"
> >>              initialSize="10000"
> >>              autowarmCount="0"/>
> >
> > This filterCache is *enormous* ... even the initialSize is larger than I
> > would normally expect to see for the total size. With 8 million
> > documents, each entry in the cache can be 1 megabyte, and in practice,
> > the entry will be either very small or it will be the full 1 megabyte
> > ... depending on how many documents get matched by a filter. This has
> > the potential to chew up a lot of RAM without really doing much for you.
> >
> > If the same problem happens when you drastically reduce the size of
> > filterCache, I suspect basic performance problems. Even 1-2 seconds
> > seems very slow to me.
> >
> > The first questions I have are some statistics about your index and the
> > server you're running it on. How big is that index in terms of disk
> > space? How much RAM are you allocating to the JVM? How much RAM is in
> > the entire machine? Is the machine running software other than Solr,
> > such as a web server, database server, etc? What operating system are
> > you running on, is it 64 bit, and is Java 64 bit?
> >
> > Next, I'd like to know more about your queries. Can you include typical
> > examples of all query parameters for both "sides"? What does the indexed
> > and stored data look like for a typical document? Depending on what I
> > learn here, I might need to see all or part of your config and schema.
> >
> > How often do you send updates/deletes to your index? How often and
> > exactly how are you doing commits, and do you have any auto commit in
> > your config?
> >
> > Thanks,
> > Shawn
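
For reference, the arithmetic behind the "1 megabyte" per-entry figure quoted above can be sketched roughly as follows. This is only a back-of-the-envelope estimate: it assumes a large filter entry is held as a bitset with one bit per document, and it ignores the much smaller entries Shawn mentions for filters that match few documents. The function names are made up for illustration and are not part of Solr.

```python
# Rough worst-case filterCache memory estimate.
# Assumption: each large cache entry is a bitset with one bit per document
# in the index; small entries and object overhead are ignored.

def bitset_entry_bytes(max_doc: int) -> int:
    """Bytes needed for one bitset entry (one bit per document, rounded up)."""
    return (max_doc + 7) // 8


def worst_case_cache_bytes(max_doc: int, cache_size: int) -> int:
    """Upper bound if every cache slot holds a full-size bitset."""
    return bitset_entry_bytes(max_doc) * cache_size


MAX_DOC = 8_000_000  # index size from the original question

# Compare the configured size (750000) with the suggested size (512).
for size in (750_000, 512):
    total = worst_case_cache_bytes(MAX_DOC, size)
    print(f"size={size}: up to {total / 1e6:,.0f} MB")
```

With 8 million documents a full-size entry comes to 1,000,000 bytes, so a completely filled cache of 750000 such entries would be far beyond any realistic heap, while 512 entries stays around half a gigabyte in the worst case.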