Tang, Rebecca [rebecca.t...@ucsf.edu] wrote:
> I have an solr index with 14+ million records. We facet on quite a few fields with very
> high-cardinality such as author, person, organization, brand and document type. Some
> of the records contain thousands of persons and organizations. So the person and
> organization fields can be very large.
How many unique values per field in the full index are we talking? Just approximately.

> After this change, the performance improved drastically. But I can't understand why
> building these fields as multi-valued field vs. single-valued field with semicolon
> tokenizer can have such a dramatic performance difference.

It should not. I suspect something else is happening. 10 minutes does not sound unrealistic if it is your first query after an index update. Maybe your measurement for tokenized was unwarmed and your measurement for un-tokenized warmed? Could you give an example of a full query?

Anyway, you should definitely be using DocValues for such high-cardinality facet fields. Depending on your usage pattern and where the bottleneck is, https://issues.apache.org/jira/browse/SOLR-5894 might also help.

- Toke Eskildsen
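
One way to check the warmed-versus-unwarmed question is to run the same facet query several times in a row and compare the timings: a first call that is much slower than the rest points at warm-up cost rather than at the field layout. Below is a minimal Python sketch of that idea. The helper names (`time_repeated`, `make_facet_query`), the URL and the core name `collection1` are assumptions for illustration only; `q`, `rows`, `facet`, `facet.field` and `wt` are standard Solr request parameters.

```python
import time
import urllib.parse
import urllib.request


def time_repeated(run_query, repeats=3):
    """Call run_query `repeats` times and return each wall-clock duration in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def make_facet_query(base_url, facet_field):
    """Build a zero-row facet request against a Solr select handler.

    base_url and facet_field are supplied by the caller; nothing here is
    specific to the index discussed in this thread.
    """
    params = urllib.parse.urlencode({
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": facet_field,
        "wt": "json",
    })
    url = f"{base_url}?{params}"

    def run_query():
        # Read the whole response so the timing covers the full transfer.
        with urllib.request.urlopen(url) as response:
            response.read()

    return run_query


# Example usage (hypothetical host and core name, not run here):
# query = make_facet_query("http://localhost:8983/solr/collection1/select", "person")
# print(time_repeated(query, repeats=3))
```

Run it once right after an index update for each of the two field setups. If only the first timing is large in both cases, the difference seen earlier was most likely a cold-versus-warm effect.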