On Wed, 2014-01-22 at 23:59 +0100, Bing Hua wrote:
> I am going to evaluate some Lucene/Solr capabilities on handling faceted
> queries, in particular, with a single facet field that contains large number
> (say up to 1 million) of distinct values. Does anyone have some experience
> on how lucene performs in this scenario?
We facet on Author (11.5M unique values) and Subject (3.8M unique values) on our 12M documents. Each individual document typically has a small number of authors and subjects.

Hardware and setup: two indexes of about 50GB each, 3GB heap, 5GB RAM free for disk cache, SSD, 4-core Intel Xeon L5420@2.50GHz.

Response time is around 1-200 ms for most queries, with some queries taking 1-2 seconds and 1-2% of queries taking 3-10 seconds.

We use a home-grown faceting system under Lucene, but previous tests show performance and memory requirements to be quite similar to Solr faceting, as they use the same algorithm (assuming facet.method=fc). I do not know how our performance compares to Lucene faceting.

The dreaded "Too Many Unique Values" is not a performance problem, but a hard limit on the number of unique values imposed by Solr fc-faceting. 16M, as far as I remember. I do not know if Lucene faceting has the same limit.

- Toke Eskildsen, State and University Library, Denmark
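
For anyone who wants to try the Solr side of this comparison, a facet request using facet.method=fc might be built roughly as in the sketch below. This is only an illustration of the standard Solr request parameters, not the home-grown system described above. The host, the core name "collection1", the field name "author" and the helper name build_facet_url are all assumptions made up for the example.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, field, query="*:*", limit=10):
    """Build a Solr select URL that facets on one field with facet.method=fc.

    base_url and field are placeholders; adjust them to your own core and schema.
    """
    params = [
        ("q", query),             # main query; *:* matches all documents
        ("rows", 0),              # skip document results, we only want facet counts
        ("facet", "true"),        # enable faceting
        ("facet.field", field),   # the high-cardinality field to facet on
        ("facet.method", "fc"),   # the faceting method mentioned in the post
        ("facet.limit", limit),   # number of facet values to return
        ("wt", "json"),           # ask for a JSON response
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical host and core name, for illustration only.
url = build_facet_url("http://localhost:8983/solr/collection1/select", "author")
print(url)
```

Timing such requests against a field with millions of unique values, first cold and then warmed, should give numbers that can be compared with the response times quoted above.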