On Sun, 2015-09-27 at 14:47 +0200, Uwe Reh wrote:
> Like Walter Underwood wrote, in technical sense faceting on authors
> isn't a good idea.

In a technical sense, there is no good or bad about faceting on
high-cardinality fields in Solr. The faceting code is fairly efficient
(modulo the newly discovered regression) and scales well with the number
of references and unique terms. It gives the expected performance when
used with high-cardinality fields: relatively heavy and with substantial
worst-case processing time. As such, it should be enabled with care and a
clear understanding of the cost. But the same can be said of a great many
other features when building an IT system. Labelling it a good or bad
idea only makes sense when looking at the specific context.

I am being a stickler about this because high-cardinality faceting in
Solr has an undeserved bad rep. Rather than discouraging it, we should
be better at describing the consequences of using it.

> In the worst case, the relation book to author is
> n:n. Never the less, thanks to authority files (which are intensively
> used in Germany) the facet 'author' is often helpful.

We have been faceting on Author (10M uniques) since 2007. It helps our
users navigate the corpus. It is a good idea for us.

We tried faceting on 6 billion uniques/machine as default in our Net
Archive (custom hack). It raised our non-pathological 75th percentile
to 2½ seconds, with little value for the researchers. It was a bad idea
for us.

- Toke Eskildsen, State and University Library, Denmark