A little bit of history: about 5 years ago we built a Solr-like solution on Lucene.NET and C#, which included faceted search. To get really good facet performance, we pre-cached all the facet field values in RAM as compact data structures (either a variable-byte-encoded list of integer doc IDs or a bit array, depending on how many docs that value matches). We then sorted those sets by total document frequency so that we enumerate the most frequent facet values first, and we stop looking as soon as we reach a facet value whose total document frequency is lower than the smallest of the top N facet counts found so far. We also used a more efficient intersection between the facet set and the matched doc set, intersecting the internal uint fields of the bit arrays directly when possible.

This works well largely because we package all of these cached data structures into a binary file on the master server and distribute that file to the slaves with each new index snapshot, so the heavy lifting of reading the TermEnum and TermDocs happens only on the master servers. The slaves just pre-load that binary structure directly into RAM in one shot, in the background, when opening a new snapshot for search.

We have 200 million docs, 10 shards, and about 20 facet fields, some of which contain about 20,000 unique values. We show the top 10 facets for about 10 different fields on the results page. Using this technique, we return search results with lots of facets and date counts in around 200-300ms.
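To make the approach described above concrete, here is a minimal Python sketch. It is not the original C# implementation, and every name in it (build_facet, top_n_facets, dense_threshold and so on) is hypothetical. It also makes two assumptions the description does not state: doc IDs are stored as delta gaps before variable-byte encoding, and a Python integer stands in for the bit array.

```python
import heapq

def vbyte_encode(doc_ids):
    """Encode sorted doc IDs as delta gaps, 7 payload bits per byte (assumed layout)."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # high bit set: more bytes follow
            gap >>= 7
        out.append(gap)
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode: returns the sorted list of doc IDs."""
    doc_ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            doc_ids.append(prev)
            value, shift = 0, 0
    return doc_ids

def to_bits(doc_ids):
    """Bit array stand-in: bit d is set when doc d is in the set."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits

def build_facet(value, doc_ids, dense_threshold=4):
    """Pick a representation by how many docs the facet value matches.
    dense_threshold is an illustrative cut-off, not a tuned number."""
    doc_ids = sorted(doc_ids)
    if len(doc_ids) >= dense_threshold:
        postings = to_bits(doc_ids)        # dense: bit array
    else:
        postings = vbyte_encode(doc_ids)   # sparse: compressed ID list
    return (value, len(doc_ids), postings)

def intersect_count(postings, matched_bits):
    """Count docs that are in both the facet set and the matched doc set."""
    if isinstance(postings, int):
        return bin(postings & matched_bits).count("1")  # word-wise AND + popcount
    return sum(1 for d in vbyte_decode(postings) if (matched_bits >> d) & 1)

def top_n_facets(facets, matched_bits, n):
    """Return up to n (value, count) pairs with the highest counts."""
    ordered = sorted(facets, key=lambda f: f[1], reverse=True)
    heap = []  # min-heap of (count, value); heap[0] is the weakest of the top n
    for value, doc_freq, postings in ordered:
        # Early exit: a count can never exceed the value's total doc frequency,
        # and every remaining value has an equal or lower frequency.
        if len(heap) == n and doc_freq < heap[0][0]:
            break
        count = intersect_count(postings, matched_bits)
        if count == 0:
            continue
        if len(heap) < n:
            heapq.heappush(heap, (count, value))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, value))
    return [(v, c) for c, v in sorted(heap, key=lambda cv: (-cv[0], cv[1]))]
```

For example, with facets built from build_facet("red", [0, 1, 2, 3, 4, 5]), build_facet("blue", [1, 2, 3]), build_facet("green", [7]) and build_facet("tiny", [2]), and a matched doc set of to_bits([1, 2, 3, 7]), asking for the top 2 stops after "blue" without ever intersecting "green" or "tiny".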
Currently, we are porting this entire system to Solr. For a single-core index of 8 million docs, using documents and facet fields similar to those in our production indexes, I can't get faceted search to perform anywhere close to 300ms for general searches. It is more like 1.5-3 seconds. I increased the filter cache size to 10,000 and tried both facet.method values (enum and fc), but it is still very slow. I'm running on a server with 2 cores and 3.7 GB of RAM, with the JVM allowed up to 2.5 GB. I also see that Solr takes quite some time to pre-load the filter cache for some of these facet fields when opening a new searcher.

Is there anything else I should look into to get better facet performance? Given these metrics (200m docs, 20 facet fields, some fields with 20,000 unique values), what kind of facet search performance should I expect? Also, we need to issue frequent commits, since we are constantly streaming new content into the system.

Thanks,
Bob