The rollup streaming expression rolls up aggregations on a stream that has been sorted by the group-by fields. This is essentially a MapReduce reduce operation, and it can work with extremely high cardinality (effectively unlimited). The rollup function is designed to roll up data produced by the /export handler, which can also sort data sets with very high cardinality. The docs should describe the correct usage of the rollup expression with the /export handler.
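A minimal sketch of that pairing, using hypothetical collection and field names (`logs`, `day_s`):

```
rollup(
  search(logs,
         q="*:*",
         qt="/export",
         fl="day_s",
         sort="day_s asc"),
  over="day_s",
  count(*))
```

The key requirement is that the sort of the underlying stream matches the `over` fields. Because /export streams the entire result set in sorted order, rollup only ever holds the current group in memory, which is why the cardinality is effectively unbounded.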
Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Feb 20, 2018 at 11:10 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 2/20/2018 4:44 AM, Alfonso Muñoz-Pomer Fuentes wrote:
>> We have a query that we can resolve using either facet or search with rollup. In the Stream Source Reference section of Solr’s Reference Guide (https://lucene.apache.org/solr/guide/7_1/stream-source-reference.html#facet) it says “To support high cardinality aggregations see the rollup function”. I was wondering what is considered “high cardinality”. If it helps, our query returns up to 60k results. I haven’t gotten around to any benchmarking to see if there’s any difference, though, because facet so far performs very well, but I don’t know if I’m near the “tipping point”. Any feedback would be appreciated.
>
> There's no hard-and-fast rule for this. The tipping point is going to be different for every use case. With a little bit of information about your setup, experienced users can make an educated guess about whether or not performance will be good, but cannot say with absolute certainty what you're going to run into.
>
> Let's start with some definitions, which you may or may not already know:
>
> https://en.wikipedia.org/wiki/Cardinality_(data_modeling)
> https://en.wikipedia.org/wiki/Cardinality
>
> You haven't said how many unique values are in your field. The only information I have from you is 60K results from your queries, which may or may not have any bearing on the total number of documents in your index, or the total number of unique values in the field you're using for faceting. So the next paragraph may or may not apply to your index.
>
> In general, 60,000 unique values in a field would be considered very low cardinality, because computers can typically operate on 60,000 values *very* quickly, unless the size of each value is enormous.
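For comparison, the facet() source the question refers to would express the same aggregation roughly as follows (same hypothetical collection and field names as above; the bucket limit is illustrative):

```
facet(logs,
      q="*:*",
      buckets="day_s",
      bucketSorts="count(*) desc",
      bucketSizeLimit=100,
      count(*))
```

facet() pushes the aggregation into Solr's JSON Facet API and materializes the buckets in memory on the server, which is fast at low cardinality but is why the docs point high-cardinality aggregations toward rollup instead.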
> But if the index has 60,000 total documents, then *in relation to other data*, the cardinality is very high, even though most people would say the opposite. Sixty thousand documents or unique values is almost always a very small index, not prone to performance issues.
>
> The warnings about cardinality in the Solr documentation mostly refer to *absolute* cardinality -- how many unique values there are in a field, regardless of the actual number of documents. If there are millions or billions of unique values, then operations like facets, grouping, sorting, etc. are probably going to be slow. If there are far fewer, such as thousands or only a handful, then those operations are likely to be very fast, because the computer will have less information to process.
>
> Thanks,
> Shawn