: > My main interest
: > is in Solr's fledgling support for faceted search.
: >
: > Are there any ideas on how to support hierarchical facets?

There are two different things that fall under the heading of
"hierarchical facets", and they can be very different...

The first usage is when the word "hierarchy" refers largely to the UI of
the facet, and not to the data itself.  An example of this would be a
facet on "date", where you show each decade as a possible constraint
w/count, and if/when the user picks a decade you show the individual
years as constraints, and if/when they pick a year you show them
months, etc...  until you get to the granularity that makes sense.
Another example of this would be a facet on a "Name" field, where you
start by showing them constraints and counts on the first letter of a
name, and once they pick that you show them all of the two letter combos,
and then the 3 letter combos, etc...
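
To make the first usage concrete, here is a minimal sketch (plain Python,
not Solr code) of the "Name" drill-down: count values by the next
character after whatever prefix the user has already picked.  The function
name prefix_facet, its parameters, and the sample names are all made up
for illustration.

```python
from collections import Counter

def prefix_facet(values, selected="", step=1):
    """Count values by the prefix that is `step` chars longer than `selected`.

    Hypothetical helper: only values that start with the already selected
    prefix (case-insensitively) and are long enough are counted.
    """
    selected = selected.lower()
    width = len(selected) + step
    counts = Counter()
    for value in values:
        key = value.lower()
        if key.startswith(selected) and len(key) >= width:
            counts[key[:width]] += 1
    return dict(counts)

# Made-up sample data.
names = ["Hoss", "Hostetter", "Holmes", "Yonik", "Young", "Hank"]

top = prefix_facet(names)           # first-letter constraints
drill = prefix_facet(names, "h")    # two-letter constraints under "h"
deeper = prefix_facet(names, "ho")  # three-letter constraints under "ho"
```

Nothing about the documents is hierarchical here; the "hierarchy" comes
entirely from which prefix length the UI asks for at each step.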

The second usage is when the documents themselves can be organized into a
hierarchical taxonomy (or perhaps multiple different taxonomies) and you
want to expose that information as a facet ...  Nabble.com's "Narrow
Search Results" right nav fits this model.


It's not always a clear-cut line though ... you might actually think of
your data being organized in a hierarchy of decades, which have years as
sub-categories which have months as sub-categories ... but most people
wouldn't do that.  I don't know anyone who would think that it actually
makes sense to categorize people in a taxonomy based on the first N letters
of their name.  Yonik's Location example, however, is a good one where the
lines are very blurry.  It might make sense to think of it in the first
usage case, where the UI of the facets is presented in such a way that
we only show the State level constraints once a Country level constraint
is picked, etc...  But other people might prefer a UI driven by the second
usage, where a biz search for "Dunkin Donuts" lists the first five
constraints in the Location field as...
     United States/Massachusetts/Boston (103)
     United States/Massachusetts/Cambridge (92)
     United States/Massachusetts/Brookline (84)
     United States/Colorado (77)
     United States/Massachusetts/Newton (55)

...because there are just so damn many Dunkin Donuts in Massachusetts, we
go down to the city level, but for Colorado we just show the state
granularity.


The first type of "hierarchical facet" is obviously a lot easier than the
second -- largely because the second can't typically be done using simple
Term comparisons ... and you need some carefully chosen logic for
deciding when to be granular, and when to be general.
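
As one illustration of what "logic for deciding when to be granular"
could look like, here is a rough Python sketch: roll leaf counts up the
taxonomy, then keep expanding any node whose count is above a threshold
into its children.  This is just one possible heuristic, not something
Solr does; the names rollup and adaptive_constraints, the expand_above
threshold, and the Denver/Boulder split of the Colorado count are all
invented for the example, and it assumes every document sits on a leaf.

```python
from collections import defaultdict

def rollup(leaf_counts, sep="/"):
    """Sum leaf counts into every ancestor path."""
    totals = defaultdict(int)
    for path, n in leaf_counts.items():
        parts = path.split(sep)
        for i in range(1, len(parts) + 1):
            totals[sep.join(parts[:i])] += n
    return dict(totals)

def adaptive_constraints(leaf_counts, expand_above, limit, sep="/"):
    """Pick constraints, going deeper only where a node is 'too big'."""
    totals = rollup(leaf_counts, sep)
    children = defaultdict(list)
    for path in totals:
        if sep in path:
            children[path.rsplit(sep, 1)[0]].append(path)
    frontier = [p for p in totals if sep not in p]  # top-level nodes
    chosen = []
    while frontier:
        node = frontier.pop()
        kids = children.get(node, [])
        if totals[node] > expand_above and kids:
            frontier.extend(kids)   # too general: show children instead
        else:
            chosen.append((node, totals[node]))
    chosen.sort(key=lambda pc: (-pc[1], pc[0]))
    return chosen[:limit]

# Counts from the example above; the Colorado cities are made up so that
# they add up to 77.
leaf_counts = {
    "United States/Massachusetts/Boston": 103,
    "United States/Massachusetts/Cambridge": 92,
    "United States/Massachusetts/Brookline": 84,
    "United States/Massachusetts/Newton": 55,
    "United States/Colorado/Denver": 50,
    "United States/Colorado/Boulder": 27,
}
constraints = adaptive_constraints(leaf_counts, expand_above=100, limit=5)
```

With a threshold of 100, Massachusetts gets broken out by city while
Colorado stays at the state level.  Picking that threshold (or replacing
it with something smarter) is exactly the hard part.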

: An alternative would be to add the concept of a strict facet hierarchy
: into Solr, and it could do the summing itself (useful if there are too
: many leaves to return them all to the client).

FYI: what we found when building the "Category" Facet in the left nav of
this page...
  http://shopper.search.com/search?q=compactflash
...was that sorting on the strict count wasn't very good from a usability
perspective, because the very general categories in the taxonomy tended to
contain so many products they always sorted to the top, even if they
weren't particularly relevant (ie: even if only half of the Digital
Cameras on the market use compact flash cards, that may still be more than
the total number of products in the entire "flash memory" category).  We
wound up computing a custom score for each category based on a function of
the number of matching results in that category, the total number of items in
that category, and a few other metrics.
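
To show the general shape of that kind of scoring (not the actual
function we used, which also involved other metrics), here is a
hypothetical Python sketch that weights the match count by the fraction
of the category that matched.  The name category_score, the alpha
parameter, and all of the numbers are assumptions for illustration only.

```python
def category_score(matching, total, alpha=1.0):
    """Hypothetical score: match count scaled by category coverage.

    matching: number of results for the query in this category
    total:    number of items in the category overall
    alpha:    made-up knob for how much coverage matters
    """
    if total <= 0:
        return 0.0
    return matching * (matching / total) ** alpha

# Made-up (matching, total) numbers in the spirit of the compactflash example.
categories = {
    "Digital Cameras": (1000, 2000),  # big category, half of it matches
    "Flash Memory": (580, 600),       # small category, nearly all matches
}

by_count = sorted(categories, key=lambda c: -categories[c][0])
by_score = sorted(categories, key=lambda c: -category_score(*categories[c]))
```

Sorting on the raw count puts "Digital Cameras" first, while the coverage
weighted score puts "Flash Memory" first, which is closer to what a user
searching for compactflash probably expects.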

It may be hard to roll out a truly "generic" way of sorting hierarchical
counts that would be useful for people.


-Hoss
