Chris Hostetter wrote:


> : My (our) query plugin uses specialized SolrCaches in lieu of the meta
> : data records.   For each new searcher installed each field's possible
> : values will be determined and stored in a cache (off the top of my head,
>
> Are you determining the field values based on all indexed values for those
> fields, or do you have application specific logic in the plugin that knows
> certain fields (like "price") should be ranges, while other fields should
> be discrete?
Yes. We filter on facets in two major ways. For unranged attributes we simply specify the normalized value that should appear in the field (duh). For ranged attributes we specify the field, an operator, and the normalized value we are comparing against. Here is an example of a passed parameter:

&atr_A00053=K02147U00054||>4194304

That tells the system to give me only computers with more than 4MB of RAM (wasn't that obvious?). In this case the K...U... number isn't actually used (translated, it means "4MB"); only the field (A00053) and the normalized field value are used (4194304, that nonsense value that currently means 4MB, 4096KB, etc.).
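To make the ranged parameter format concrete, here is a minimal Java sketch of parsing and evaluating it. It assumes the value is always `<code>||<operator><normalized value>`, that the code before `||` is ignored (as described above), and that the operator is one of `>`, `>=`, `<`, `<=`; only `>` is confirmed by the example. `AttributeFilter`, `parse` and `matches` are hypothetical names, not part of Solr or of our actual plugin, and the unranged form is not handled because its exact encoding isn't shown here.

```java
/**
 * Hypothetical sketch (not the real plugin): parses a ranged attribute
 * filter such as atr_A00053=K02147U00054||>4194304 and tests a
 * document's normalized field value against it.
 */
final class AttributeFilter {
    final String field; // attribute field name, e.g. "A00053"
    final String op;    // comparison operator (assumed set: >, >=, <, <=)
    final long value;   // normalized value to compare against

    AttributeFilter(String field, String op, long value) {
        this.field = field;
        this.op = op;
        this.value = value;
    }

    /** name is the request parameter name, rawValue its value. */
    static AttributeFilter parse(String name, String rawValue) {
        if (!name.startsWith("atr_")) {
            throw new IllegalArgumentException("not an attribute parameter: " + name);
        }
        String field = name.substring("atr_".length());
        int sep = rawValue.indexOf("||");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '||' separator: " + rawValue);
        }
        // Everything before "||" (the K...U... code) is ignored.
        String expr = rawValue.substring(sep + 2);
        // Check two-character operators first so ">=" is not read as ">".
        for (String op : new String[] {">=", "<=", ">", "<"}) {
            if (expr.startsWith(op)) {
                long v = Long.parseLong(expr.substring(op.length()));
                return new AttributeFilter(field, op, v);
            }
        }
        throw new IllegalArgumentException("unknown operator: " + expr);
    }

    /** True if a document's normalized value for this field passes the filter. */
    boolean matches(long docValue) {
        switch (op) {
            case ">":  return docValue > value;
            case ">=": return docValue >= value;
            case "<":  return docValue < value;
            case "<=": return docValue <= value;
            default:   throw new IllegalStateException("unexpected operator " + op);
        }
    }

    public static void main(String[] args) {
        AttributeFilter f = parse("atr_A00053", "K02147U00054||>4194304");
        System.out.println(f.field + " " + f.op + " " + f.value); // A00053 > 4194304
        System.out.println(f.matches(8388608L)); // true: 8MB is more than 4MB
        System.out.println(f.matches(4194304L)); // false: exactly 4MB is not "more than"
    }
}
```

Note that the comparison is done on the normalized number only, which is why 4MB and 4096KB can share one value.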

This system only exists to maintain compatibility with systems previously used to manage our AltaVista based search engine. It's not pretty, but it works well given our current functionality requirements. It also doesn't do bounded searches like 4MB to 8MB.

> that's the reason why I used special metadata docs -- actually that's only
> part of the reason, i needed the facets to be data driven to allow our
> site staff to manage them, and i needed to support vastly different facets
> based on category (hence: one metadata doc per category).

Right, it's all about customer requirements. As above, the data gets pulled from a live DB by the web front end to produce the query strings offered as options to the user, and the logic is embedded in the query string. What I'd really like to see is an XML query language, so I could toss all the hackish URL query arguments and move much of the query plugin logic out of the Java code and into the query itself.

I do intend to revamp our faceting engine in our next major release to customers. We'll introduce dynamic attribute bucketing: rather than produce a list of counts for every value of an attribute with "at least" or "at most" options, users will be given ranged lists based on the actual distribution of the facet values. I haven't worked out the details since I haven't actually begun the design, but I'll probably see if I can treat it as a bell curve and pick evenly sized buckets, e.g. Monitors <= 15" (10), 15 -> 17 (10), 17 -> 21 (10), 21 -> 25 (10), > 25 (10). Obviously I can't force real data into a distribution that tidy, but I'll figure something out. In any case, the bucket ranges will need to be based on the actual distribution in the current result set (easy to maintain, hard to implement) rather than on pre-manufactured bucket categories (easy to implement, hard to maintain), since those become obsolete fairly quickly.
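One concrete way to get count-balanced buckets from the current result set is equal-frequency (quantile) bucketing: sort the attribute values and cut them into k groups of roughly equal size, keeping equal values together. This is a swapped-in technique offered as a sketch, not a settled design; `FacetBuckets`, `Bucket` and `equalFrequency` are hypothetical names, and the monitor sizes in `main` are made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of distribution-based (equal-frequency) facet buckets. */
final class FacetBuckets {
    /** One bucket: inclusive [min, max] over observed values, plus its document count. */
    static final class Bucket {
        final double min, max;
        final int count;

        Bucket(double min, double max, int count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }

        @Override
        public String toString() {
            return min + " -> " + max + " (" + count + ")";
        }
    }

    /**
     * Splits the values into at most k buckets of roughly equal size.
     * Equal values never straddle a boundary, so buckets can be uneven
     * (and fewer than k) when the data has many ties.
     */
    static List<Bucket> equalFrequency(double[] values, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        List<Bucket> buckets = new ArrayList<>();
        int start = 0;
        for (int b = 0; b < k && start < n; b++) {
            // Spread the remaining values evenly over the remaining buckets.
            int remainingBuckets = k - b;
            int end = start + (int) Math.ceil((n - start) / (double) remainingBuckets);
            // Extend past ties so one value never lands in two buckets.
            while (end < n && sorted[end] == sorted[end - 1]) end++;
            buckets.add(new Bucket(sorted[start], sorted[end - 1], end - start));
            start = end;
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up monitor sizes (inches) standing in for the current result set.
        double[] sizes = {14, 15, 15, 17, 17, 17, 19, 21, 21, 24, 27, 30};
        for (Bucket b : equalFrequency(sizes, 4)) {
            System.out.println(b);
        }
        // 14.0 -> 15.0 (3), 17.0 -> 17.0 (3), 19.0 -> 21.0 (3), 24.0 -> 30.0 (3)
    }
}
```

Because the boundaries come from observed values, the ranges follow whatever the result set actually contains; heavy ties (say, lots of 17" monitors) just make one bucket larger instead of splitting a value across two ranges.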
