Chris Hostetter wrote:
Honestly: if you have a really small cardinality for these numeric values (i.e., small enough to return every value on every request) perhaps you should use faceting to find the min/max values (with facet.mincount=1) instead of stats?
Thanks for the tips and info.

I can't figure out any way to use faceting to find min/max values. If I do a facet.sort=index, and facet.limit=1, then the facet value returned would be the min value... but how could I get the max value? There is no facet.sort=rindex or what have you. Ah, you say small enough to return every value on every request. Nope, it's not THAT small. I've got about 3 million documents, and 2-10k unique integers in a field, and I want to find the min/max.
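
For reference, the min-only facet request described above might be built like this. It is only a sketch: the base URL, the query, and the field name my_field are placeholders, and it assumes the field type indexes its terms in numeric order, which facet.sort=index relies on.

```python
from urllib.parse import urlencode


def facet_min_url(base_url, q, field):
    """Build a Solr select URL whose single facet value is the smallest indexed term.

    base_url, q and field are caller-supplied placeholders (hypothetical here).
    """
    params = [
        ("q", q),
        ("rows", 0),              # only the facet counts are needed, not documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", "index"),  # order facet values by indexed term, not by count
        ("facet.limit", 1),       # keep only the first value in that order
        ("facet.mincount", 1),    # skip values that match no document in the result set
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


url = facet_min_url("http://localhost:8983/solr/select", "*:*", "my_field")
```

There is no matching parameter here for reversing the order, which is why this only yields the min.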

I guess, if I both index and store the field (which I guess I have to do anyway), I can find min and max via two separate queries. Sort by my_field asc, sort by my_field desc, with rows=1 both times, get out the stored field, that's my min/max.
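
A minimal sketch of those two requests, assuming a stored and indexed field and a placeholder base URL (neither is taken from a real setup). It also assumes every matching document has a value in the field, since documents without one can end up first in a sort depending on the field's configuration.

```python
from urllib.parse import urlencode


def min_max_urls(base_url, q, field):
    """Return (min_url, max_url): the same query sorted ascending and descending.

    base_url, q and field are hypothetical placeholders supplied by the caller.
    """
    def build(direction):
        params = [
            ("q", q),
            ("sort", f"{field} {direction}"),  # asc puts the min first, desc the max
            ("rows", 1),                       # only the top document is needed
            ("fl", field),                     # return just the stored field value
            ("wt", "json"),
        ]
        return base_url + "?" + urlencode(params)

    return build("asc"), build("desc")


min_url, max_url = min_max_urls("http://localhost:8983/solr/select", "*:*", "my_field")
```

Any fq parameters from the original request would need to be repeated in both, so that the filter caches filled by the first query get reused.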

That might be what I resort to. But it's a shame, StatsComponent can give me the info "included" in the query I'm already making, as opposed to requiring two additional queries on top of that -- which you'd think would be _slower_, but in fact don't seem to be.


I don't think so .. I believe Ryan considered this when he first added StatsComponent, but he decided it wasn't really worth the trouble -- all of the stats are computed in a single pass, and the majority of the time is spent getting the value of every doc in the set -- adding each value to a running total (for the sum and ultimately computing the mean) is a really cheap operation compared to the actual iteration over the set.
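
To illustrate the single-pass point above: the sketch below is not StatsComponent's actual code, just a generic accumulator showing that once each value is in hand, tracking min, max, sum and count costs a few comparisons and additions per document. The function name and the returned keys are made up for this example.

```python
def single_pass_stats(values):
    """Compute min, max, sum, count and mean in one iteration over values."""
    count = 0
    total = 0.0
    lo = hi = None
    for v in values:
        count += 1
        total += v                    # running total, later used for the mean
        if lo is None or v < lo:
            lo = v                    # smallest value seen so far
        if hi is None or v > hi:
            hi = v                    # largest value seen so far
    if count == 0:
        return None                   # nothing to summarize
    return {"min": lo, "max": hi, "sum": total, "count": count, "mean": total / count}


stats = single_pass_stats([3, 1, 4, 1, 5])
```

Dropping the sum and mean from this loop would save almost nothing, since the iteration itself dominates.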
Yeah, it's really kind of a mystery to me why StatsComponent is being so slow. StatsComponent is slower than faceting on the field, and is even slower than the total time of: 1) First making the initial query, filling all caches, 2) Then making two additional queries with the same q/fq, but with different sorts to get min and max from the result set in #1.

From what you say, there's no good reason for StatsComponent to be slower than these alternatives, but it is, by an order of magnitude (1-2 seconds vs 10-15 seconds).

I guess I'd have to get into Java profiling/debugging to figure it out; maybe I'm tripping over a weird bug or mis-design somewhere.

Konathan
