It seems (from observation only) that most of the documents on both sides
of this equation have the same "weights".  I don't see any wide swaths of
unpopulated fields on the "good" side.  Just wondering if there's some
caching involved that I'm missing here, or something I can balance out
better...
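
One way to move past "from observation only" would be to pull a sample of documents from each side and compare how often the highlighted fields are populated and how long they are. The following is only a rough sketch, not anything from the actual setup: the function name `field_length_stats` and the field names `title` and `summary` are assumptions, and the input is assumed to be the list of document dicts from the `response.docs` part of a Solr JSON response.

```python
from statistics import mean


def field_length_stats(docs, fields):
    """Return {field: (fill_rate, mean_chars)} for a list of Solr docs.

    docs   -- list of dicts, e.g. the "docs" list of a Solr JSON response
    fields -- field names to inspect (hypothetical names in the example)
    """
    stats = {}
    for field in fields:
        lengths = []
        for doc in docs:
            value = doc.get(field)
            if value is None:
                continue  # field not populated on this document
            # Multivalued fields come back as lists; join them so the
            # length reflects the total stored text for the field.
            if isinstance(value, list):
                value = " ".join(str(v) for v in value)
            lengths.append(len(str(value)))
        fill_rate = len(lengths) / len(docs) if docs else 0.0
        stats[field] = (fill_rate, mean(lengths) if lengths else 0.0)
    return stats


if __name__ == "__main__":
    # Made-up sample documents, only to show the call shape.
    sample = [
        {"title": "abcd", "summary": ["ab", "cd"]},
        {"title": "ab"},
    ]
    for name, (fill, avg) in field_length_stats(sample, ["title", "summary"]).items():
        print(f"{name}: fill rate {fill:.2f}, mean length {avg:.1f} chars")
```

Running this over a few thousand sampled documents per "product" value and comparing the two result sets should show whether the slow side simply carries much larger summaries, which would make highlighting more expensive regardless of cache settings.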


On Fri, Aug 9, 2013 at 11:39 AM, Raymond Wiker <rwi...@gmail.com> wrote:

> On Aug 9, 2013, at 17:36 , Neal Ensor <nen...@gmail.com> wrote:
> > So, I have an oddball question I have been battling with in the last day
> > or two.
> >
> > I have an 8 million document solr index, roughly divided down the middle
> > by an identifying "product" value, one of two distinct values.  The
> > documents in both "sides" are very similar, with stored text fields,
> > etc.  I have two nearly identical request handlers, one for each "side".
> >
> > When I perform very similar queries on either "side" for random phrases,
> > requesting 500 rows with highlighting on titles and summaries, I get very
> > different results.  One "side" consistently returns results in around 1-2
> > seconds, whereas the other one consistently returns in 6-10 seconds.  I
> > don't see any reason why it's worse; each run of queries is deliberately
> > randomized to avoid caches getting in the way.  Each test query returns
> > the full first 500 in most cases.
> >
> > My filter query cache configuration looks like:
> >
> > <filterCache class="solr.FastLRUCache"
> >                 size="750000"
> >                 initialSize="10000"
> >                 autowarmCount="0"/>
> >
> > (desperately trying to increase it, hoping this would help).  The other
> > caches are quite small; the use cases the customer is dealing with don't
> > involve much in the way of paging, just returning a large initial set
> > with highlighting in the shortest time.
> >
> > I'm trying to optimize this down so the disparity between the two
> > "halves" is not so dramatic.  Are there any optimizations or settings I
> > should be looking at to tune?  Is it just the "way it is"?  I've tried to
> > argue to decrease the return set size, turn off highlighting, etc., but
> > these seem to be out of the question.  I would at least like some
> > concrete reason why one filter query would be so out of whack relative
> > to the other, given the document ranges are very nearly half (3.8
> > million vs. 4.0 million in the slower side).
> >
> > Any pointers or suggestions would be appreciated.  Thanks in advance.
> >
> > Neal Ensor
> > nen...@gmail.com
>
> Does one side have much more data in one of the fields that is being
> returned?
