Chris Hostetter wrote:
: My (our) query plugin uses specialized SolrCaches in lieu of the metadata
: records. For each new searcher installed, each field's possible
: values will be determined and stored in a cache (off the top of my head,
Are you determining the field values based on all indexed values for those
fields, or do you have application specific logic in the plugin that knows
certain fields (like "price") should be ranges, while other fields should
be discrete?
Yes. We filter on facets in 2 major ways.
For unranged attributes we simply specify the normalized value that
should appear in the field (duh).
For ranged attributes we specify the field, an operator and the
normalized value we are comparing against. Here is an example of a
passed parameter.
&atr_A00053=K02147U00054||>4194304
That tells the system to give me only computers with more than 4MB of
RAM (wasn't that obvious?). In this case the K...U... number (which
translates to "4MB") isn't actually used, only the field (A00053) and
the normalized field value (4194304 ... the nonsense value that
currently means 4MB, 4096KB, etc).
This system only exists to maintain compatibility with systems
previously used to manage our AltaVista based search engine. It's not
pretty but it works well given our current functionality requirements.
It also doesn't do bounded searches like 4MB to 8MB.
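
To make the ranged parameter format concrete, here is a minimal Java
sketch of how such a value might be taken apart. This is not our plugin
code: the class and method names (RangedAttribute, parse, accepts) are
hypothetical, and it assumes the value is always laid out as
<display key>||<operator><normalized value>. Only ">" appears in the
example above; the other operators are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not the actual plugin code.
class RangedAttribute {
    // Assumed value layout: <display key>||<operator><normalized value>.
    // Only ">" is shown in the example; "<", ">=" and "<=" are assumptions.
    private static final Pattern VALUE =
        Pattern.compile("([^|]*)\\|\\|(>=|<=|>|<)(\\d+)");

    final String field;     // e.g. "A00053"
    final String operator;  // e.g. ">"
    final long value;       // e.g. 4194304

    private RangedAttribute(String field, String operator, long value) {
        this.field = field;
        this.operator = operator;
        this.value = value;
    }

    // name is the parameter name (e.g. "atr_A00053"), rawValue is the
    // already URL-decoded parameter value.
    static RangedAttribute parse(String name, String rawValue) {
        if (!name.startsWith("atr_")) {
            throw new IllegalArgumentException("not an attribute parameter: " + name);
        }
        Matcher m = VALUE.matcher(rawValue);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized attribute value: " + rawValue);
        }
        // group(1), the K...U... key, is deliberately ignored.
        return new RangedAttribute(
            name.substring("atr_".length()), m.group(2), Long.parseLong(m.group(3)));
    }

    // True if a document's normalized field value satisfies the comparison.
    boolean accepts(long candidate) {
        switch (operator) {
            case ">":  return candidate > value;
            case "<":  return candidate < value;
            case ">=": return candidate >= value;
            default:   return candidate <= value; // "<=" is the only operator left
        }
    }
}
```

With the example parameter, parse("atr_A00053", "K02147U00054||>4194304")
yields field A00053, operator > and value 4194304. A bounded search like
4MB to 8MB would need two such comparisons on the same field, which this
format has no way to express.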
that's the reason why I used special metadata docs -- actually that's only
part of the reason, i needed the facets to be data driven to allow our
site staff to manage them, and i needed to support vastly different facets
based on category (hence: one metadata doc per category).
Right, it's all about customer requirements. As above, the data gets
pulled from a live DB by the web front end to produce the query strings
offered as options to the user, and the logic is embedded in the query
string.
What I'd really like to see is an XML query language so I can toss all
the hackish URL query arguments and really move much of the query plugin
logic out into the query itself instead of in the Java code.
I do intend to revamp our faceting engine in our next major release to
customers. We'll introduce dynamic attribute bucketing. Rather than
produce a list of counts of all values for an attribute and have "at
least" or "at most" options, users will be given ranged lists based on
the actual distribution of the facets. I haven't really worked out
the details since I haven't actually begun the design, but I'm probably
going to see if I can't just treat it like a bell curve and start
picking evenly sized buckets. For example, monitors: <= 15" (10),
15 -> 17 (10), 17 -> 21 (10), 21 -> 25 (10), > 25 (10). Now obviously I
can't force it into a nice distribution like that but I'll figure out
something. In any case, the bucket ranges will need to be based on the
actual distribution (easy to maintain, hard to implement) in the current
result set and not some pre-manufactured bucket categories (easy to
implement, hard to maintain) as those get obsoleted fairly quickly.
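
Since the design isn't worked out, the following is only one possible
reading of "evenly sized buckets": plain equal-frequency (quantile)
bucketing over the attribute values in the current result set. The names
(FacetBuckets, Bucket, bucketize) are hypothetical, and keeping equal
values in the same bucket is my own assumption about what a user would
expect to see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of equal-frequency bucketing, not a finished design.
class FacetBuckets {
    static class Bucket {
        final double low;   // smallest value in the bucket
        final double high;  // largest value in the bucket
        final int count;    // number of results in the bucket

        Bucket(double low, double high, int count) {
            this.low = low;
            this.high = high;
            this.count = count;
        }
    }

    // Splits the values into at most bucketCount buckets holding roughly
    // the same number of results each.
    static List<Bucket> bucketize(double[] values, int bucketCount) {
        if (bucketCount < 1) {
            throw new IllegalArgumentException("bucketCount must be >= 1");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        List<Bucket> buckets = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= bucketCount && start < n; i++) {
            // Ideal cut point for the i-th bucket, but always take at least one value.
            int end = Math.max(start + 1, (int) ((long) i * n / bucketCount));
            // Assumption: identical values must not straddle a boundary.
            while (end < n && sorted[end] == sorted[end - 1]) {
                end++;
            }
            buckets.add(new Bucket(sorted[start], sorted[end - 1], end - start));
            start = end;
        }
        return buckets;
    }
}
```

Ten distinct values split into five buckets give two results per bucket.
With real data the counts come out uneven wherever many results share a
value, which is the "can't force it into a nice distribution" problem in
practice.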