Thanks for all the great answers.

Quick Question: did you say you are faceting on the first name field
separately from the last name field? ... why?
You misunderstood. I'm faceting on the first author and the last author of the author list. Life science papers have author lists, where the first author is usually the guy who did most of the work and the last one is usually the boss of the lab. I already have untokenized author fields for that, populated using copyField.
Second: you mentioned increasing the size of your filterCache
significantly, but we don't really know how heterogeneous your index is ... once you made that change did your filterCache hitrate increase? .. do you
have any evictions (you can check on the "Statistics" page)
It was at the default (16000) and it hit the ceiling, so to speak. I set maxSize=16000000 (for testing purposes), and now I see size: 17038 and 0 evictions. For a single facet field (journal name) with a limit of 5 and 12 facet queries (ranges on publication date), searches now take 0.5 seconds, which is not too bad. The filterCache size stays pretty much constant no matter how many queries I do.
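
The cache size of 17038 matches the number of distinct journal_name tokens, which suggests one cached filter per term. Under that assumption, and assuming plain LRU eviction, here is a rough sketch of why a 16000-entry cache would thrash on that field. This is a toy simulation in Python, not Solr's code, and simulate_facet_passes is a made-up name:

```python
from collections import OrderedDict

def simulate_facet_passes(num_terms, cache_size, passes=3):
    """Toy LRU model: each pass stands for one faceted query that
    looks up one cached filter per distinct term, always in the same order.
    Returns (hits, misses, evictions) summed over all passes."""
    cache = OrderedDict()  # oldest entry first, most recently used last
    hits = misses = evictions = 0
    for _ in range(passes):
        for term in range(num_terms):
            if term in cache:
                cache.move_to_end(term)  # mark as most recently used
                hits += 1
            else:
                misses += 1
                cache[term] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
                    evictions += 1
    return hits, misses, evictions

# 17038 terms against the default-sized cache vs. the enlarged one
print(simulate_facet_passes(17038, 16000))
print(simulate_facet_passes(17038, 16000000))
```

In this model the smaller cache never gets a single hit, because each entry is evicted just before it is needed again, while the larger cache misses only on the first pass. That would be consistent with the constant size and 0 evictions I see now.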

However, if I try to add another facet field (such as first_author), something strange happens: 99% CPU, the filterCache fills up really fast, the hit ratio goes to hell, there is no disk activity, and it can stay that way for at least 30 minutes (I didn't test longer; no point really). It turns out that journal_name has 17038 distinct tokens, which is manageable, but first_author has more than 400,000. I don't think this will ever yield good performance, so I might only do journal_name facets.

Any reason why faceting tries to preload every term in the field?

I have noticed that facets are not cached. With facets off, a cached query takes 0.01 seconds. With facets on, both uncached and cached queries take 0.7 seconds. Any plans for a facet cache? I know that faceting is still a very early feature, but it's already awesome; my application is maybe unrealistic.

Thanks,
Michael
