On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote:
Thank you for your response; this does shed some light on the subject.
Our basic question was why we were seeing slower responses the smaller
our result set got.
Currently we are searching about 1.2 million documents, with each source
document around 2KB, though we do duplicate some of the data. I bumped up
my filterCache to 5 million, and the second search I did for a non-indexed
term came back in 2.1 seconds, so that is much improved. I am a little
concerned about having this value so high, but that is our problem and we
will play with it.
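For reference, the filterCache is configured in solrconfig.xml; something
along these lines is what I mean by bumping the size to 5 million (the
initialSize and autowarmCount values below are just placeholders, not what
we actually run with):

    <filterCache
      class="solr.LRUCache"
      size="5000000"
      initialSize="4096"
      autowarmCount="4096"/>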
I do have a few follow-up questions. First, regarding the filterCache:
once a single search has been done and facets have been requested, as long
as no new facets are requested and the cache is large enough, the filters
will remain in the cache, correct?
Also, you mention that faceting is more a "function of the number of terms
in the field". The two fields causing our problems are Authors and
Subjects. If we divided the data that makes up these facets into more
specific fields (primary author, secondary author, etc.), would this
perform better? The number of facet fields would increase, but the number
of unique terms for any given facet should be smaller.
There are essentially two facet computation strategies:
1. cached bitsets: a bitset for each term is generated and intersected
with the query result bitset. This is more general and performs well up
to a few thousand terms.
2. field enumeration: cache the field contents, and generate counts from
that data. This is relatively independent of the number of unique terms,
but it requires at most a single facet value per field per document.
So, if you factor author into Primary author/Secondary author, where
each is guaranteed to only have one value per doc, this could greatly
accelerate your faceting. There are probably fewer unique subjects,
so strategy 1 is likely fine.
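For example, once author is split up, a facet request over the new fields
might look something like this (the field names here are only placeholders
for whatever you end up calling them):

    /select?q=solr&facet=true
        &facet.field=author_primary
        &facet.field=author_secondary
        &facet.field=subject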
To use strategy 2, just make sure that multiValued="false" is set for
those fields in schema.xml.
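For instance, the field definitions might look something like the
following (field names and types are just an example; the important part
is multiValued="false" on the single-valued author fields):

    <field name="author_primary"   type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="author_secondary" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="subject"          type="string" indexed="true" stored="true" multiValued="true"/>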
-Mike