On Wed, 2014-01-22 at 23:59 +0100, Bing Hua wrote:
> I am going to evaluate some Lucene/Solr capabilities on handling faceted
> queries, in particular, with a single facet field that contains large number
> (say up to 1 million) of distinct values. Does anyone have some experience
> on how lucene performs in this scenario?

We facet on Author (11.5M unique values) and Subject (3.8M unique
values) on our 12M documents. Each individual document typically has
only a few authors and subjects. Two indexes of about 50GB each, 3GB
heap, 5GB RAM free for disk cache, SSD, 4 core Intel Xeon L5420@2.50GHz.

Response time is around 1-200 ms for most queries, some queries taking
1-2 seconds and 1-2% of queries taking 3-10 seconds.

We use a home-grown faceting system under Lucene, but previous tests
show its performance and memory requirements to be quite similar to
Solr faceting, as they use the same algorithm (assuming facet.method=fc).
I do not know how our performance compares to Lucene faceting.
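
For anyone wanting to run a comparable test against Solr, here is a
minimal sketch of building a facet request that forces facet.method=fc.
The host, core name and the field name "author" are placeholders I made
up for illustration, not our actual setup; the parameters themselves
(facet, facet.field, facet.method, facet.limit, rows) are standard Solr
request parameters.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, query="*:*", limit=10):
    """Build a Solr select URL that facets on one field using facet.method=fc.

    base_url and field are hypothetical placeholders supplied by the caller.
    """
    params = [
        ("q", query),             # main query; *:* matches all documents
        ("rows", 0),              # skip document results, we only want facets
        ("facet", "true"),        # enable faceting
        ("facet.field", field),   # the high-cardinality field to facet on
        ("facet.method", "fc"),   # the method referred to above
        ("facet.limit", limit),   # number of facet values to return
    ]
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance and core name, for illustration only.
    print(facet_query_url("http://localhost:8983/solr/collection1", "author"))
```

Timing a representative set of real user queries through such URLs,
rather than only *:*, should give a response time distribution that can
be compared with the numbers above.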


The dreaded "Too Many Unique Values" is not a performance problem, but a
hard limit on the number of unique values imposed by Solr fc-faceting.
16M, as far as I remember. I do not know if Lucene faceting has the same
limit.

- Toke Eskildsen, State and University Library, Denmark

