: my documents (products) have a price field, and I want to have
: a "dynamically" calculated range facet for that in the response.

FYI: there have been some previous discussions on this topic...

http://www.nabble.com/blahblah-t2387813.html#a6799060
http://www.nabble.com/faceted-browsing-t1363854.html#a3753053

: AFAICS I do not have the possibility to specify range queries in my
: application, as I do not have a clue what's the lowest and highest
: price in the search result and what are "good" ranges according
: to the (statistical) distribution of prices in the search result.

as mentioned in one of those threads, it's *really* hard to get the
statistical sampling to the point where it's both balanced and user
friendly.  writing code specifically for price ranges in dollars lets you
make some assumptions about things that give you "nice" ranges (rounding
to one significant digit less than the max, doing log based ranges, etc..)
that wouldn't really apply if you were trying to implement a truly
generic dynamic range generator.
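
as a rough illustration of the "log based ranges" idea, here is a minimal
sketch that builds boundaries from a 1-2-5 series (1, 2, 5, 10, 20, 50, ...)
covering a given min and max price in whole dollars.  the class name
NicePriceRanges, the method logBoundaries, and the choice of the 1-2-5
series are all assumptions made for this example, not anything that exists
in solr.

```java
import java.util.ArrayList;
import java.util.List;

final class NicePriceRanges {
    // multipliers for the hypothetical 1-2-5 series
    private static final long[] STEPS = {1, 2, 5};

    /**
     * Returns ascending boundaries from the 1-2-5 series such that the
     * first boundary is <= min and the last boundary is >= max.
     * Prices are assumed to be whole dollars, at least 1.
     */
    static List<Long> logBoundaries(long min, long max) {
        if (min < 1 || max < min || max > 1_000_000_000_000L) {
            throw new IllegalArgumentException("need 1 <= min <= max <= 1e12");
        }
        List<Long> out = new ArrayList<>();
        long lower = 1; // largest series value seen so far that is <= min
        for (long scale = 1; ; scale *= 10) {
            for (long s : STEPS) {
                long b = s * scale;
                if (b <= min) {
                    lower = b;
                    continue;
                }
                if (out.isEmpty()) {
                    out.add(lower); // first boundary sits at or below min
                }
                out.add(b);
                if (b >= max) {
                    return out; // last boundary sits at or above max
                }
            }
        }
    }
}
```

each adjacent pair of boundaries is one candidate range.  this only works
because dollar prices are positive and people are used to round numbers,
which is exactly the kind of assumption a generic generator can't make.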

one thing to keep in mind: it's typically not a good idea to have the
constraint set of a facet change just because some other constraint was
added to the query -- individual constraints might disappear because
they no longer apply, but it can be very disconcerting to a user
when options change on them....  if i search on "ipod" a statistical
analysis of prices might yield facet ranges of $1-20, $20-60, $60-120,
$120-200 ... if i then click on "accessories" the statistics might skew
cheaper, so the new ranges are $1-20, $20-30, $30-40, $40-70 ...  and now
i'm a frustrated user, because i really wanted to use the range $20-60
(that just happens to be my budget) and you offered it to me and then you
took it away ... i have to undo my selection of "accessories" then click
$20-60, and then click accessories to get what i want ... not very nice.
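
to make the alternative concrete, here is a small sketch of counting
matches against a fixed set of ranges, where a range is dropped only when
nothing falls in it.  the boundaries reuse the ipod example above; the
class FixedRangeFacet and its counts method are hypothetical names for
illustration, and the prices are assumed to already be in hand as a
plain array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FixedRangeFacet {
    // hypothetical fixed boundaries in dollars, chosen once up front
    static final int[] BOUNDS = {1, 20, 60, 120, 200};

    /** Counts prices per fixed range [lo, hi); empty ranges are omitted. */
    static Map<String, Integer> counts(double[] prices) {
        int[] n = new int[BOUNDS.length - 1];
        for (double p : prices) {
            for (int i = 0; i < n.length; i++) {
                if (p >= BOUNDS[i] && p < BOUNDS[i + 1]) {
                    n[i]++;
                    break;
                }
            }
        }
        // LinkedHashMap keeps the ranges in ascending order
        Map<String, Integer> out = new LinkedHashMap<>();
        for (int i = 0; i < n.length; i++) {
            if (n[i] > 0) {
                out.put("$" + BOUNDS[i] + "-" + BOUNDS[i + 1], n[i]);
            }
        }
        return out;
    }
}
```

adding another constraint to the query can shrink the counts or make a
range vanish, but $20-60 never turns into $20-30.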

: So if it would be possible to go over each item in the search result
: I could check the price field and define my ranges for the specific
: query on solr side and return the price ranges as a facet.

: Otherwise, what would be a good starting point to plug in such
: functionality into solr?

if you really want to do statistical distributions, one way to avoid doing
all of this work on the client side (and needing to pull back all of the
prices from all of the matches) would be to write a custom request handler
that subclasses whichever one you currently use and does this computation
on the server side -- where it has lower level access to the data and
doesn't need to stream it over the wire.  FieldCache in particular would
come in handy.

it occurs to me that even though there may not be a way to dynamically
create facet ranges that can apply usefully on any numeric field, we could
add generic support to the request handlers for optionally fetching some
basic statistics about a DocSet for clients that want them (either for
building ranges, or for any other purpose)

min, max, mean, median, mode, midrange ... those should all be easy to
compute using the ValueSource from the field type (it would be nice if
FieldTypes had some way of indicating which DocValues function can best
manage the field type, but we can always assume float or have an option
for dictating it ... people might want a float mean for an int field
anyway)
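
a minimal sketch of those statistics, assuming the field values for the
matching docs have already been pulled into a plain float[] (how you get
them out of solr is deliberately left out).  BasicStats is a made-up class
for this example, ties for the mode go to the smallest value, and NaN
values are not handled.

```java
import java.util.Arrays;

final class BasicStats {
    final float min, max, mean, median, mode, midrange;

    BasicStats(float[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        float[] v = values.clone(); // don't reorder the caller's array
        Arrays.sort(v);
        int n = v.length;

        min = v[0];
        max = v[n - 1];
        midrange = (min + max) / 2f;
        // middle element, or average of the two middle elements
        median = (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2f;

        // one pass over the sorted values: sum for the mean, and the
        // longest run of equal values for the mode
        double sum = 0;
        float bestVal = v[0];
        int bestRun = 0;
        int run = 0;
        for (int i = 0; i < n; i++) {
            sum += v[i];
            run = (i > 0 && v[i] == v[i - 1]) ? run + 1 : 1;
            if (run > bestRun) {
                bestRun = run;
                bestVal = v[i];
            }
        }
        mean = (float) (sum / n);
        mode = bestVal;
    }
}
```

the sort is only needed for the median and mode; min, max, mean and
midrange could be done in a single unsorted pass.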

i suppose even stddev could be computed fairly easily ... there's a
formula for that that works well in a single pass over a bunch of values,
right?
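
one well known single pass approach is Welford's online algorithm, which
keeps a running mean and a running sum of squared deviations.  a minimal
sketch follows; the RunningStats class is a hypothetical name for this
example, and it reports the population standard deviation (divide by n),
which is an assumption about what a client would want.

```java
final class RunningStats {
    private long n;      // number of values seen so far
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    /** Folds one more value into the running statistics. */
    void add(double x) {
        n++;
        double delta = x - mean;  // deviation from the old mean
        mean += delta / n;
        m2 += delta * (x - mean); // uses both the old and the new mean
    }

    long count() {
        return n;
    }

    double mean() {
        return mean;
    }

    /** Population standard deviation; NaN if no values were added. */
    double stddev() {
        return n == 0 ? Double.NaN : Math.sqrt(m2 / n);
    }
}
```

this avoids the naive "sum of squares minus square of sum" formula, which
can lose a lot of precision when the values are large relative to their
spread.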




-Hoss
