: > My main interest is in Solr's fledgling support for faceted search.
: >
: > Are there any ideas on how to support hierarchical facets?

There are two different things that fall under the heading of "hierarchical facets", and they can be very different...

The first usage is when the word "hierarchy" refers largely to the UI of the facet, and not to the data itself. An example of this would be a facet on "date", where you show each decade as a possible constraint w/count, and if/when the user picks a decade, you show the individual years as constraints, and if/when they pick a year you show them months, etc... until you get to the granularity that makes sense. Another example of this would be a facet on a "Name" field, where you start by showing them constraints and counts on the first letter of a name, and once they pick that you show them all of the two letter combos, and then the 3 letter combos, etc...

The second usage is when the documents themselves can be organized into a hierarchical taxonomy (or perhaps multiple different taxonomies) and you want to expose that information as a facet ... Nabble.com's "Narrow Search Results" right nav fits this model.

It's not always a clear cut line though ... you might actually think of your data as being organized in a hierarchy of decades, which have years as sub-categories, which have months as sub-categories ... but most people wouldn't do that. I don't know anyone who would think that it actually makes sense to categorize people in a taxonomy based on the first N letters of their name.

Yonik's Location example, however, is a good one where the lines are very blurry. It might make sense to think of it in the first usage case, where the UI of the facets is presented in such a way that we only show the State level constraints once a Country level constraint is picked, etc... But other people might prefer a UI driven by the second usage, where a biz search for "Dunkin Donuts" lists the first five constraints in the Location field as...

   United States/Massachusetts/Boston (103)
   United States/Massachusetts/Cambridge (92)
   United States/Massachusetts/Brookline (84)
   United States/Colorado (77)
   United States/Massachusetts/Newton (55)

...because there are just so damn many Dunkin Donuts in Massachusetts, we go down to the city level, but for Colorado we just show the state granularity.

The first type of "hierarchical facet" is obviously a lot easier than the second -- largely because the second can't typically be done using simple Term comparisons ... and you need some carefully chosen logic for deciding when to be granular, and when to be general.

: An alternative would be to add the concept of a strict facet hierarchy
: into Solr, and it could do the summing itself (useful if there are too
: many leaves to return them all to the client).

FYI: what we found when building the "Category" Facet in the left nav of this page...

   http://shopper.search.com/search?q=compactflash

...was that sorting on the strict count wasn't very good from a usability perspective, because the very general categories in the taxonomy tended to contain so many products that they always sorted to the top, even if they weren't particularly relevant (ie: even if only half of the Digital Cameras on the market use compact flash cards, that may still be more than the total number of products in the entire "flash memory" category).

We wound up computing a custom score for each category based on a function of the number of matching results in that category, the total number of items in that category, and a few other metrics.

It may be hard to roll out a truly "generic" way of sorting hierarchical counts that would be useful for people.

-Hoss
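
For illustration, the first kind of hierarchical facet described above (the "Name" drill-down by first letter, then two letter combos, and so on) can be sketched with simple prefix comparisons. This is only a rough sketch in plain Python, not Solr code: the function name prefix_facet and the sample names are hypothetical, and it assumes case-insensitive matching, which the message above does not specify.

```python
from collections import Counter


def prefix_facet(names, chosen_prefix=""):
    """Return (constraint, count) pairs one letter deeper than chosen_prefix.

    Hypothetical helper: with no prefix chosen it counts first letters;
    once the user picks "s" it counts the two letter combos under "s", etc.
    """
    chosen = chosen_prefix.lower()
    depth = len(chosen) + 1
    counts = Counter()
    for name in names:
        key = name.lower()  # assumption: facet values compare case-insensitively
        # Only names under the chosen prefix, and long enough to go one level deeper
        if key.startswith(chosen) and len(key) >= depth:
            counts[key[:depth]] += 1
    # Highest counts first, as a facet UI would typically list them
    return counts.most_common()


# Made-up sample data, purely for illustration
names = ["Smith", "Smythe", "Stone", "Jones", "Sanders"]

top_level = prefix_facet(names)         # constraints on the first letter
under_s = prefix_facet(names, "s")      # user picked "s": two letter combos
under_sm = prefix_facet(names, "sm")    # user picked "sm": three letter combos
```

Each drill-down step only needs the prefix the user has already picked, which is why this usage is so much easier than deciding per-branch granularity in a real taxonomy.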