: So one question is if there's any way to increase StatsComponent performance.
: Does it use any caches, or does it operate without caches? My Solr is running
I believe it uses the field cache to allow fast lookup of numeric values
for documents as it iterates through the document set -- there's not really
any sort of caching it can use that it isn't already.

: But it also occurs to me that the StatsComponent is doing a lot more than I
: need. I just need min/max. And the cardinality of this field is a couple
: orders of magnitude lower than the total number of documents.

But the cardinality of the values isn't really relevant -- it still has to
check the value for every doc in your set to see what value it has. In
things like faceting, term frequency can come into play because we can make
optimizations to see if a given term's index-wide frequency is less than
our cutoff, and if it is we can skip it completely w/o checking how many
docs in our set contain that value -- that type of optimization isn't
possible for min/max (although i suppose there is room for a possible
improvement of checking if the min we've found so far is the "global" min
for that field, and if so don't bother checking any docs ... that seems
like a really niche special case optimization, but if you want to submit a
patch it might be useful).

Honestly: if you have a really small cardinality for these numeric values
(ie: small enough to return every value on every request) perhaps you
should use faceting to find the min/max values (with facet.mincount=1)
instead of stats?  (see the example requests below)

: StatsComponent is also doing a bunch of other things, like sum, median, etc.
: Perhaps if there were a way to _just_ get min/max, it would be faster. Is
: there any way to get min/max values in a result set other than StatsComponent?

I don't think so ... i believe Ryan considered this when he first added
StatsComponent, but he decided it wasn't really worth the trouble -- all of
the stats are computed in a single pass, and the majority of the time is
spent getting the value of every doc in the set -- adding each value to a
running total (for the sum and ultimately computing the median) is a really
cheap operation compared to the actual iteration over the set.

That said: if you wanna work on a patch and can demonstrate that making
these things configurable has performance improvements in the special case
w/o hurting performance in the default case, i don't think anyone will
argue against it.
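To make the faceting suggestion concrete, something along these lines
should work (untested, and assuming a numeric field named "price" whose
indexed terms sort in numeric order, ie: a sortable or trie based type) ...

The stats approach:

  /select?q=*:*&rows=0&stats=true&stats.field=price

...and the faceting alternative:

  /select?q=*:*&rows=0&facet=true&facet.field=price
     &facet.mincount=1&facet.limit=-1&facet.sort=index

With facet.sort=index and facet.mincount=1 the first constraint returned is
the min value found in your result set and the last one is the max.

-Hoss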