Thanks, Chris. I just needed to stare at the code I already knew about more intently to see what was really going on. It's super convoluted and super confusing. The keys were the handleResponses method in the main component class and the AbstractStatsValues class that is hidden in the StatsValuesFactory source file. Oddly, the StatsValues source file doesn't contain the classes that implement that interface - they're in the "factory" source file!
BTW, we should have some doc notes on the limitations and performance implications of the stats component. Although, admittedly, it's moot if stats is eventually to be superseded by the analytics component.

-- Jack Krupansky

On Wed, Jan 14, 2015 at 12:26 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : Does anybody know for sure whether the stats component fully supports
> : distributed mode? It is listed in the doc as supporting distributed mode
>
> it's been supported for as long as i can remember -- since Day 1 of the
> StatsComponent i believe.
>
> : (at least for old, non-SolrCloud distrib mode), but... I don't see any code
> : that actually does that. Nor any tests, unless they are hidden somewhere I
> : didn't look.
>
> just like any other SearchComponent: look at StatsComponent.prepare,
> StatsComponent.process, ...distributedProcess, ....modifyRequest,
> ...handleResponses, ...finishStage, etc...
>
>
> : In particular, I am interested in the "countdistinct" parameter which would
> : need to retrieve all distinct values from all other shards to detect
> : whether any of the distinct values overlap between shards.
>
> yep -- that's exactly what it does ... totally naive and not a good idea
> at all for fields with non-trivial cardinality, which is why you have to
> explicitly turn it on with "calcDistinct" and why i want to replace it
> with HyperLogLog approximations...
>
> https://issues.apache.org/jira/browse/SOLR-6968
>
> -Hoss
> http://www.lucidworks.com/
>
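
For anyone skimming the thread later, here is a rough sketch of why the distributed distinct count has to ship every distinct value from every shard. It is not the actual Solr code: the class and method names (DistinctMergeSketch, naiveCountDistinct, sumOfShardCounts) are made up for illustration, and a plain Set<String> stands in for whatever each shard really returns.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical names, not the StatsComponent API.
public class DistinctMergeSketch {

    // Naive exact merge: union the full distinct-value sets from all shards.
    // Memory and network cost grow with the field's cardinality.
    static long naiveCountDistinct(List<Set<String>> perShardDistinctValues) {
        Set<String> merged = new HashSet<>();
        for (Set<String> shardValues : perShardDistinctValues) {
            merged.addAll(shardValues);
        }
        return merged.size();
    }

    // Cheap but wrong alternative: adding per-shard counts double-counts
    // any value that occurs on more than one shard.
    static long sumOfShardCounts(List<Set<String>> perShardDistinctValues) {
        long total = 0;
        for (Set<String> shardValues : perShardDistinctValues) {
            total += shardValues.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Two pretend shards that share the value "blue".
        List<Set<String>> shards = List.of(
                Set.of("red", "green", "blue"),
                Set.of("blue", "yellow"));

        System.out.println(naiveCountDistinct(shards)); // prints 4
        System.out.println(sumOfShardCounts(shards));   // prints 5
    }
}
```

The per-shard counts alone cannot tell you that "blue" overlaps, so the exact answer needs the values themselves. That is the cost that makes the approach a poor fit for high-cardinality fields.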