It seems (from observation only) that most of the documents on both sides of this equation have the same "weights". I don't see any wide swaths of unpopulated fields on the "good" side. Just wondering if there's some caching involved that I'm missing here, or something I can balance out better...
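For what it's worth, one way to move past "observation only" would be to pull a sample of documents from each side and compare the average size of the fields being returned and highlighted. A minimal Python sketch, assuming the sample has already been fetched as a list of dicts; the field names `product`, `title`, and `summary` and the helper `average_field_sizes` are placeholders for illustration, not names from the actual schema:

```python
from collections import defaultdict


def average_field_sizes(docs, side_field="product", fields=("title", "summary")):
    """Return {side: {field: mean character length}} for a sample of documents.

    All field names here are hypothetical; substitute the real schema names.
    """
    totals = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(int)
    for doc in docs:
        side = doc.get(side_field)
        counts[side] += 1
        for field in fields:
            value = doc.get(field) or ""
            # Multi-valued fields come back as lists; join them before measuring.
            if isinstance(value, list):
                value = " ".join(str(v) for v in value)
            totals[side][field] += len(str(value))
    return {
        side: {field: totals[side][field] / counts[side] for field in fields}
        for side in counts
    }


# Made-up sample data, only to show the shape of the input and output.
sample = [
    {"product": "A", "title": "short", "summary": "x" * 100},
    {"product": "A", "title": "tiny", "summary": "x" * 300},
    {"product": "B", "title": "a much longer title", "summary": "x" * 5000},
]
print(average_field_sizes(sample))
```

If the slower side showed noticeably larger averages for the highlighted fields, that would point at highlighting cost rather than the filter cache.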
On Fri, Aug 9, 2013 at 11:39 AM, Raymond Wiker <rwi...@gmail.com> wrote:

> On Aug 9, 2013, at 17:36 , Neal Ensor <nen...@gmail.com> wrote:
> > So, I have an oddball question I have been battling with in the last day
> > or two.
> >
> > I have an 8 million document solr index, roughly divided down the middle
> > by an identifying "product" value, one of two distinct values. The
> > documents in both "sides" are very similar, with stored text fields, etc.
> > I have two nearly identical request handlers, one for each "side".
> >
> > When I perform very similar queries on either "side" for random phrases,
> > requesting 500 rows with highlighting on titles and summaries, I get very
> > different results. One "side" consistently returns results in around 1-2
> > seconds, whereas the other one consistently returns in 6-10 seconds. I
> > don't see any reason why it's worse; each run of queries is deliberately
> > randomized to avoid caches getting in the way. Each test query returns
> > the full first 500 in most cases.
> >
> > My filter query cache configuration looks like:
> >
> > <filterCache class="solr.FastLRUCache"
> >              size="750000"
> >              initialSize="10000"
> >              autowarmCount="0"/>
> >
> > (desperately trying to increase it, hoping this would help). The other
> > caches are quite small; the use cases the customer is dealing with don't
> > involve much in the way of paging, just returning a large initial set
> > with highlighting in the shortest time.
> >
> > I'm trying to optimize this down so the disparity between the two
> > "halves" is not so dramatic. Are there any optimizations or things I
> > should be looking for to tune? Is it just the "way it is"? I've tried to
> > argue to decrease the return set size, turn off highlighting, etc., but
> > these seem to be out of the question. I would at least like some concrete
> > reason why one filter query would be so out of whack relative to the
> > other, given the document ranges are very nearly half (3.8 million vs.
> > 4.0 million on the slower side).
> >
> > Any pointers or suggestions would be appreciated. Thanks in advance.
> >
> > Neal Ensor
> > nen...@gmail.com
>
> Does one side have much more data in one of the fields that is being
> returned?