On Aug 9, 2013, at 17:36 , Neal Ensor <nen...@gmail.com> wrote:
> So, I have an oddball question I have been battling with in the last day or
> two.
> 
> I have an 8 million document solr index, roughly divided down the middle by
> an identifying "product" value, one of two distinct values.  The documents
> in both "sides" are very similar, with stored text fields, etc.  I have two
> nearly identical request handlers, one for each "side".
> 
> When I perform very similar queries on either "side" for random phrases,
> requesting 500 rows with highlighting on titles and summaries, I get very
> different results.  One "side" consistently returns results in around 1-2
> seconds, whereas the other one consistently returns in 6-10 seconds.  I
> don't see any reason why it's worse; each run of queries is deliberately
> randomized to avoid caches getting in the way.  Each test query returns the
> full first 500 in most cases.
> 
> My filter query cache configuration looks like:
> 
> <filterCache class="solr.FastLRUCache"
>                 size="750000"
>                 initialSize="10000"
>                 autowarmCount="0"/>
> 
> (desperately trying to increase it, hoping this would help).  The other
> caches are quite small; the use cases the customer is dealing with don't
> involve much in the way of paging, just returning a large initial set with
> highlighting in the shortest time.
> 
> I'm trying to optimize this down so the disparity between the two "halves"
> is not so dramatic.  Are there any optimizations or things I should be
> looking to tune?  Is it just the "way it is"?  I've tried to argue to
> decrease the return set size, turn off highlighting, etc., but these seem
> to be out of the question.  I would at least like some concrete reason why
> one filter query would be so out of whack relative to the other, given
> the document ranges are very nearly half (3.8 million vs. 4.0 million in
> the slower side).
> 
> Any pointers or suggestions would be appreciated.  Thanks in advance.
> 
> Neal Ensor
> nen...@gmail.com

Does one side have much more data in one of the fields that is being returned?
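
One rough way to check: pull a random sample of documents from each side and compare the size of the stored fields being returned and highlighted. Highlighting generally works over the stored text, so its cost tends to grow with field length. A minimal Python sketch, assuming the sample has already been fetched as a list of dicts (for example from a JSON response with fl restricted to the product, title and summary fields). The field names "product", "title" and "summary" are assumptions based on the description above, not the actual schema.

```python
from collections import defaultdict
from statistics import mean


def avg_field_chars(docs, group_field="product", fields=("title", "summary")):
    """Return {group value: {field: mean character count}} for a document sample.

    Field names are hypothetical defaults; adjust them to the real schema.
    """
    lengths = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        group = doc.get(group_field)
        for field in fields:
            value = doc.get(field) or ""
            # Multi-valued fields come back as lists; measure them joined together.
            if isinstance(value, list):
                value = " ".join(value)
            lengths[group][field].append(len(value))
    return {
        group: {field: mean(sizes) for field, sizes in per_field.items()}
        for group, per_field in lengths.items()
    }
```

If one product value shows summaries several times longer than the other, that would be a plausible reason for the slower highlighting on that side.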
