Re: faceting is unusable slow since upgrade to 5.3.0

Toke Eskildsen Mon, 28 Sep 2015 02:19:50 -0700

On Sun, 2015-09-27 at 14:47 +0200, Uwe Reh wrote:
> Like Walter Underwood wrote, in technical sense faceting on authors 
> isn't a good idea.


In a technical sense, there is no good or bad about faceting on
high-cardinality fields in Solr. The faceting code is fairly efficient
(modulo the newly discovered regression) and scales well with the number
of references and unique terms. It gives the expected performance when
used with high-cardinality fields: Relatively heavy and with substantial
worst-case processing time.
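
To make the kind of request under discussion concrete, here is a minimal sketch that only builds the URL for a facet request and sends nothing. The host, the core name `books`, the field name `author`, and the helper `facet_query_url` are assumptions for illustration and are not taken from this thread. The parameters used are the standard Solr field-faceting ones (`facet`, `facet.field`, `facet.limit`, `facet.mincount`).

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, limit=10, mincount=1):
    """Build a Solr select URL that facets on a single field.

    base_url and field are hypothetical examples; adjust to your setup.
    """
    params = {
        "q": "*:*",                  # match all documents
        "rows": 0,                   # only the facet counts are wanted, no hits
        "facet": "true",             # turn faceting on
        "facet.field": field,        # the (possibly high-cardinality) field
        "facet.limit": limit,        # number of top terms to return
        "facet.mincount": mincount,  # skip terms with fewer matches than this
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local core named "books" with an "author" field.
url = facet_query_url("http://localhost:8983/solr/books/select", "author")
print(url)
```

Timing such a request against your own index, for both typical and worst-case queries, is one way to get at the cost described above before enabling the facet by default.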

As such, it should be enabled with care and a clear understanding of the
cost. But the same can be said of a great many other features when
building an IT system. Labelling it a good or bad idea only makes sense
when looking at the specific context.

I am being a stickler about this because high-cardinality faceting in
Solr has an undeserved bad rep. Rather than discouraging it, we should
be better at describing the consequences of using it.

> In the worst case, the relation book to author is 
> n:n. Never the less, thanks to authority files (which are intensively 
> used in Germany) the facet 'author' is often helpful.

We have been faceting on Author (10M uniques) since 2007. It helps our
users navigate the corpus. It is a good idea for us.

We tried faceting on 6 billion uniques/machine by default in our Net
Archive (custom hack). It raised our non-pathological 75th percentile to
2½ seconds, with little value for the researchers. It was a bad idea for
us.

- Toke Eskildsen, State and University Library, Denmark
