Re: Question about filter query: "half" of my index is slower than the other?

Dmitry Kan Thu, 15 Aug 2013 07:55:48 -0700

@Erick: thanks for sharing your knowledge of how the hit ratio and
evictions interact. Sounds quite reasonable.


Dmitry


On Sat, Aug 10, 2013 at 3:11 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> To add to what Shawn said, this filterCache is enormous. The key statistics
> are the hit ratio and evictions. Evictions aren't bad if the hit ratio is
> high. If hit ratio is low and evictions are high, only then should you
> consider making it larger. So I'd drop it back to 512.
>
> Hit ratios around 75% are my personal "too low" number, but YMMV...
>
> BUT, it's an LRU cache. So assuming you're forming a filter query for the
> two "sides" and that you append an fq clause to every query,
> you'll only need two entries <G>. Plus, of course, other fqs.
>
> The first thing I'd do is only return 10 rows, turn off highlighting, and
> anything else that comes to mind. Then add them back and see which ones
> are causing you grief.
>
> Or add &debug=timing. That'll return a list of how much time each
> component takes and may give you a clue as well.
>
> Best
> Erick
>
>
> On Fri, Aug 9, 2013 at 1:55 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
> > On 8/9/2013 9:36 AM, Neal Ensor wrote:
> >
> >> I have an 8 million document solr index, roughly divided down the middle
> >> by an identifying "product" value, one of two distinct values.  The
> >> documents in both "sides" are very similar, with stored text fields, etc.
> >> I have two nearly identical request handlers, one for each "side".
> >>
> >> When I perform very similar queries on either "side" for random phrases,
> >> requesting 500 rows with highlighting on titles and summaries, I get very
> >> different results.  One "side" consistently returns results in around 1-2
> >> seconds, whereas the other one consistently returns in 6-10 seconds.  I
> >> don't see any reason why it's worse; each run of queries is deliberately
> >> randomized to avoid caches getting in the way.  Each test query returns
> >> the full first 500 in most cases.
> >>
> >> My filter query cache configuration looks like:
> >>
> >> <filterCache class="solr.FastLRUCache"
> >>                   size="750000"
> >>                   initialSize="10000"
> >>                   autowarmCount="0"/>
> >>
> >
> > This filterCache is *enormous* ... even the initialSize is larger than I
> > would normally expect to see for the total size.  With 8 million documents,
> > each entry in the cache can be 1 megabyte, and in practice, the entry will
> > be either very small or it will be the full 1 megabyte ... depending on how
> > many documents get matched by a filter. This has the potential to chew up a
> > lot of RAM without really doing much for you.
> >
> > If the same problem happens when you drastically reduce the size of
> > filterCache, I suspect basic performance problems.  Even 1-2 seconds seems
> > very slow to me.
> >
> > The first questions I have are some statistics about your index and the
> > server you're running it on.  How big is that index in terms of disk space?
> > How much RAM are you allocating to the JVM?  How much RAM is in the entire
> > machine?  Is the machine running software other than Solr, such as a web
> > server, database server, etc?  What operating system are you running on, is
> > it 64 bit, and is Java 64 bit?
> >
> > Next, I'd like to know more about your queries.  Can you include typical
> > examples of all query parameters for both "sides"?  What does the indexed
> > and stored data look like for a typical document?  Depending on what I
> > learn here, I might need to see all or part of your config and schema.
> >
> > How often do you send updates/deletes to your index?  How often and
> > exactly how are you doing commits, and do you have any auto commit in your
> > config?
> >
> > Thanks,
> > Shawn
> >
> >
>
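
For anyone following the arithmetic behind Shawn's "1 megabyte" figure, here
is a rough sketch in Python. It assumes a large filter entry is stored as a
bitset with one bit per document in the index, which is where the 1 megabyte
for 8 million documents comes from. The function names are made up for
illustration and are not part of Solr, and the result is only a worst-case
upper bound: small entries take far less, and JVM object overhead is ignored.

```python
def bitset_entry_bytes(max_doc: int) -> int:
    """Bytes needed for one bit per document, rounded up to a whole byte.

    Assumption: a filter matching many documents is cached as a bitset
    over all documents in the index.
    """
    return (max_doc + 7) // 8


def worst_case_cache_bytes(max_doc: int, cache_size: int) -> int:
    """Upper bound if every cache slot holds a full-size bitset entry."""
    return bitset_entry_bytes(max_doc) * cache_size


MAX_DOC = 8_000_000  # index size mentioned in the thread

# Compare the configured size (750000) with the suggested size (512).
for size in (750_000, 512):
    total = worst_case_cache_bytes(MAX_DOC, size)
    print(f"filterCache size={size}: up to {total / 1e9:.2f} GB")
```

With these numbers the configured size of 750000 could in principle grow to
hundreds of gigabytes, while 512 tops out at roughly half a gigabyte, which
is in line with the advice above to shrink the cache.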
