On 12/8/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: My data is 492,000 records of book data.  I am faceting on 4 fields:
: author, subject, language, format.
: Format and language are fairly simple as their are only a few unique
: terms.  Author and subject however are much different in that there are
: thousands of unique terms.

by the looks of it, you have a lot more than a few thousand unique terms
in those two fields ... are you tokenizing on these fields?  that's
probably not what you want for fields you're going to facet on.

Right, if any of these are tokenized, then you could make them
non-tokenized (use "string" type).  If they really need to be
tokenized (author for example), then you could use copyField to make
another copy to a non-tokenized field that you can use for faceting.
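
As a sketch of what that might look like in schema.xml (the field and
type names here are illustrative, not taken from your actual schema):

```xml
<!-- tokenized field used for searching author names -->
<field name="author" type="text" indexed="true" stored="true"/>

<!-- untokenized copy ("string" type) used only for faceting -->
<field name="author_facet" type="string" indexed="true" stored="false"/>

<!-- copy the raw author value into the facet field at index time -->
<copyField source="author" dest="author_facet"/>
```

You would then facet on author_facet while still searching against author.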

After that, as Hoss suggests, run a single faceting query with all 4
fields and look at the filterCache statistics.  Take the "lookups"
number and multiply it by, say, 1.5 to leave some room for future
growth, and use that as your cache size.  You probably want to bump up
both initialSize and autowarmCount as well.
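
In solrconfig.xml that tuning might look something like the following
(the numbers are placeholders -- plug in your observed "lookups" figure
times ~1.5):

```xml
<!-- size: observed filterCache lookups from one 4-field facet query, x1.5 -->
<filterCache
    class="solr.LRUCache"
    size="15000"
    initialSize="15000"
    autowarmCount="10000"/>
```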

The first query will still be slow; the second should be relatively fast.
If you hit an OutOfMemoryError, increase the JVM heap size.
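
Raising the heap is just a startup flag; for example (1024m is only a
guess -- size it to your index and cache settings, and adjust the start
command to however you launch Solr):

```
java -Xmx1024m -jar start.jar
```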

-Yonik
