Thanks for all the great answers.
Quick question: did you say you are faceting on the first name field
separately from the last name field? ... why?
You misunderstood. I'm faceting on the first author and the last author
of each paper's author list. Life science papers have an author list in
which the first author is usually the person who did most of the work,
and the last one is usually the boss of the lab. I already have
untokenized author fields for that, populated using copyField.
Second: you mentioned increasing the size of your filterCache
significantly, but we don't really know how heterogeneous your index
is ...
once you made that change did your filterCache hit rate increase? ..
do you have any evictions (you can check on the "Statistics" page)
It was at the default (16000) and it hit the ceiling, so to speak. I set
maxSize=16000000 (for testing purposes) and now I see size: 17038 and 0
evictions. For a single facet field (journal name) with a limit of 5 and
12 facet queries (ranges on publication date), searches now take 0.5
seconds, which is not too bad. The filterCache size stays pretty much
constant no matter how many queries I run.
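
To make the "hit the ceiling" behaviour concrete, here is a toy Python
model (not Solr code). It assumes the cache is a plain LRU and that each
distinct facet term takes one cache entry and is touched once per query;
simulate_lru and its parameters are made-up names for illustration.

```python
from collections import OrderedDict


def simulate_lru(capacity, num_terms, passes):
    """Touch num_terms distinct keys in order, `passes` times, against an LRU cache."""
    cache = OrderedDict()
    hits = misses = evictions = 0
    for _ in range(passes):
        for term in range(num_terms):
            if term in cache:
                cache.move_to_end(term)  # mark as most recently used
                hits += 1
            else:
                misses += 1
                cache[term] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used entry
                    evictions += 1
    return {"size": len(cache), "hits": hits, "misses": misses, "evictions": evictions}


# 17038 distinct terms against a 16000-entry cache versus a much larger one.
small = simulate_lru(capacity=16000, num_terms=17038, passes=3)
large = simulate_lru(capacity=16000000, num_terms=17038, passes=3)
print(small)
print(large)
```

Under these assumptions the smaller cache never gets a hit, because each
entry is evicted before it is needed again, while the larger one settles
at 17038 entries with no evictions.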
However, if I try to add another facet field (such as first_author),
something strange happens: 99% CPU, the filter cache fills up really
fast, the hit ratio goes to hell, there is no disk activity, and it can
stay that way for at least 30 minutes (I didn't test longer, no point
really). It turns out that journal_name has 17038 distinct terms, which
is manageable, but first_author has more than 400,000. I don't think
this will ever yield good performance, so I might only do journal_name
facets.
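
A rough Python sketch of why the number of distinct terms would matter
(again not Solr code). It assumes facet counts are computed by
intersecting one document set per term with the query's result set,
which is a guess based on the cache behaviour described above;
facet_counts and the sample data are hypothetical.

```python
def facet_counts(term_to_docs, result_docs, limit):
    """Count matching documents per term by set intersection; return the top `limit`."""
    counts = []
    intersections = 0
    for term, docs in term_to_docs.items():
        intersections += 1  # one intersection per distinct term in the field
        n = len(docs & result_docs)
        if n:
            counts.append((term, n))
    # Highest count first; ties broken alphabetically by term.
    counts.sort(key=lambda tc: (-tc[1], tc[0]))
    return counts[:limit], intersections


# Made-up data: term -> set of document ids containing that term.
term_to_docs = {"smith": {1, 2, 3}, "jones": {2, 4}, "lee": {5}}
result_docs = {2, 3, 4}  # ids matched by the main query

top, work = facet_counts(term_to_docs, result_docs, limit=2)
print(top, work)
```

In this model the limit only trims the output; the work done is one
intersection per distinct term, so a field with more than 400,000 terms
costs more than 400,000 intersections per query unless every term's set
is already cached.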
Is there a reason why faceting tries to preload every term in the field?
I have noticed that facets are not cached. With facets off, a cached
query takes 0.01 seconds. With facets on, both uncached and cached
queries take 0.7 seconds. Are there any plans for a facet cache? I know
that faceting is still a very early feature, but it's already awesome;
maybe my application is just unrealistic.
Thanks,
Michael