Chris Hostetter wrote:
> Honestly: if you have a really small cardinality for these numeric
> values (i.e. small enough to return every value on every request), perhaps
> you should use faceting to find the min/max values (with facet.mincount=1)
> instead of stats?
Thanks for the tips and info.
I can't figure out any way to use faceting to find min/max values. If I
do a facet.sort=index, and facet.limit=1, then the facet value returned
would be the min value... but how could I get the max value? There is
no facet.sort=rindex or what have you. Ah, you said "small enough to
return every value on every request." Nope, it's not THAT small. I've
got about 3 million documents and 2-10k unique integers in the field, and
I want to find the min/max.
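A sketch of the facet-based approach Hoss describes, in Python (chosen here for illustration; none of it is Solr code). It assumes a request with facet=true, facet.field=my_field, facet.mincount=1 and facet.limit=-1, so every value comes back (exactly the cost in question with 2-10k unique values), and Solr's default JSON facet layout: a flat list alternating term and count. The helper name `facet_min_max` is hypothetical.

```python
from typing import List, Optional, Tuple, Union

# Assumed layout of facet_counts.facet_fields.my_field in a wt=json response
# (default json.nl=flat): ["3", 12, "7", 5, ...], i.e. term, count, term, count.
FlatFacetList = List[Union[str, int]]


def facet_min_max(flat: FlatFacetList) -> Optional[Tuple[int, int]]:
    """Return (min, max) over integer terms with count >= 1, or None if none.

    Assumes facet.limit=-1 was used, so the list contains every term.
    facet.mincount=1 should already drop zero counts; we filter anyway
    in case the parameter was left off.
    """
    values = [
        int(term)
        for term, count in zip(flat[0::2], flat[1::2])
        if count >= 1
    ]
    if not values:
        return None
    return min(values), max(values)
```

Taking min()/max() over the parsed integers, rather than the first and last entries of a facet.sort=index list, avoids depending on how the field type orders its indexed terms (a string-encoded integer field would put "10" before "2").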
I guess if I both index and store the field (which I guess I have to do
anyway), I can find the min and max via two separate queries: sort by
my_field asc, then by my_field desc, with rows=1 both times, and read out
the stored field. That's my min/max.
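A minimal sketch of those two rows=1 requests, assuming Solr's standard q, fq, sort, rows and fl parameters and the JSON response writer (wt=json). The function names are hypothetical, and the HTTP call itself is left out.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def sorted_probe_params(q: str, fq: Optional[str], field: str, order: str) -> Dict[str, str]:
    """Params for a rows=1 query sorted on `field` ("asc" gives min, "desc" gives max)."""
    params = {
        "q": q,
        "sort": f"{field} {order}",
        "rows": "1",
        "fl": field,  # only the stored value of the field is needed
        "wt": "json",
    }
    if fq is not None:
        params["fq"] = fq  # reuse the same filter as the main query
    return params


def first_value(response: Dict[str, Any], field: str) -> Optional[Any]:
    """Read the stored field from the single returned doc, or None if no docs."""
    docs = response["response"]["docs"]
    return docs[0].get(field) if docs else None


# Query strings for the min and max probes (hypothetical q/fq values).
min_query = urlencode(sorted_probe_params("*:*", "type:book", "my_field", "asc"))
max_query = urlencode(sorted_probe_params("*:*", "type:book", "my_field", "desc"))
```

If some documents have no value in my_field, the schema's missing-value sort behaviour decides where they land, so a None from `first_value` may just mean a value-less document sorted first. Adding an fq such as my_field:[* TO *] to both requests is one way to exclude them (worth checking against your schema).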
That might be what I resort to. But it's a shame: StatsComponent can
give me the info "included" in the query I'm already making, as opposed
to requiring two additional queries on top of that -- which you'd think
would be _slower_, but in fact doesn't seem to be.
> I don't think so... I believe Ryan considered this when he first added
> StatsComponent, but he decided it wasn't really worth the trouble: all
> of the stats are computed in a single pass, and the majority of the time
> is spent getting the value of every doc in the set. Adding each value to
> a running total (for the sum, and ultimately for computing the mean) is a
> really cheap operation compared to the actual iteration over the set.
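To illustrate the single-pass point, here is a Python accumulator for most of what StatsComponent reports (min, max, sum, count, sum of squares, mean, standard deviation). It is not Solr's implementation; it only shows that once a value is in hand, each update is a few additions and comparisons. Using the sample (n-1) standard deviation is an assumption about which variant is wanted.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """One-pass accumulator; every statistic is derived from these fields."""
    count: int = 0
    total: float = 0.0
    sum_of_squares: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: float) -> None:
        # O(1) per value: a handful of additions and comparisons.
        self.count += 1
        self.total += value
        self.sum_of_squares += value * value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan

    @property
    def stddev(self) -> float:
        # Sample standard deviation computed from the running sums.
        if self.count < 2:
            return 0.0
        var = (self.sum_of_squares - self.total * self.total / self.count) / (self.count - 1)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def compute_stats(values: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for v in values:  # in Solr, producing each v is the costly part, not this body
        stats.add(v)
    return stats
```

Dropping everything except min and max would remove three of the five updates in `add`, but the loop over every document (and fetching each value) would remain, which is why a min/max-only mode saves little.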
Yeah, it's really kind of a mystery to me why StatsComponent is so
slow. StatsComponent is slower than faceting on the field, and is even
slower than the total time of: 1) making the initial query, filling all
caches, and 2) then making two additional queries with the same q/fq,
but with different sorts, to get the min and max from the result set in
#1.
From what you say, there's no good reason for StatsComponent to be
slower than these alternatives, but it is, by an order of magnitude (1-2
seconds vs 10-15 seconds).
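One way to make that comparison repeatable is a small timing harness. In this sketch, `fetch` is a hypothetical callable you would supply (for example, something that sends the params to your /select handler and waits for the response); the parameter names are standard Solr ones, but fq is omitted for brevity. Taking the best of several runs approximates the warm-cache timings described above.

```python
import time
from typing import Callable, Dict, List

Params = Dict[str, str]


def approaches(q: str, field: str) -> Dict[str, List[Params]]:
    """The request(s) each approach needs to obtain min/max of `field`."""
    return {
        "stats": [{"q": q, "rows": "0", "stats": "true", "stats.field": field}],
        "facet": [{"q": q, "rows": "0", "facet": "true", "facet.field": field,
                   "facet.mincount": "1", "facet.limit": "-1"}],
        "sort": [{"q": q, "rows": "1", "fl": field, "sort": f"{field} {order}"}
                 for order in ("asc", "desc")],
    }


def best_time(fetch: Callable[[Params], object], requests: List[Params],
              repeat: int = 5) -> float:
    """Fastest wall-clock time, in seconds, to issue all `requests` in order."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for params in requests:
            fetch(params)
        best = min(best, time.perf_counter() - start)
    return best
```

Running `best_time(fetch, reqs)` for each entry of `approaches("*:*", "my_field")` gives one number per approach under the same cache conditions.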
I guess I'd have to get into Java profiling/debugging to figure it out;
maybe there's a weird bug or mis-design somewhere that I'm tripping over.
Konathan