One task when designing a facet-based UI is deciding which fields to facet on. One possibility that I hope to explore is to choose the facet fields dynamically, based on the search results. In particular, I hypothesize that, for a somewhat heterogeneous index (heterogeneous in terms of which fields a given record might contain), the following rule might be helpful: facet on a given field to the extent that it is frequently set in the documents matching the user's search.
For example, let's say my results look like this:

Doc A:
  f1: foo
  f2: bar
  f3: <N/A>
  f4: <N/A>
Doc B:
  f1: foo2
  f2: <N/A>
  f3: <N/A>
  f4: <N/A>
Doc C:
  f1: foo3
  f2: quiz
  f3: <N/A>
  f4: buzz
Doc D:
  f1: foo4
  f2: question
  f3: bam
  f4: bing

The field usage information for these documents could be summarized like this:

  field f1: set in 4 docs
  field f2: set in 3 docs
  field f3: set in 1 doc
  field f4: set in 2 docs

If I were choosing facet fields based on the above rule, I would definitely want to display facets for field f1, since it occurs in all documents. If I had room for another facet in the UI, I would facet on f2. If I wanted another one, I'd go with f4, since it's more popular than f3. I would probably ignore f3 in any case, because it's set for only one document.

Has anyone implemented such a scheme with Solr? Any success? (The closest thing I can find is http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries to pick which facets to display based on a ruleset rather than on frequency.)

As far as implementation goes, the most straightforward approach (which wouldn't involve modifying Solr) would apparently be to add a new multi-valued "indexedfields" field to each document, listing the fields that actually have a value in that document. So when I pass data to Solr at indexing time, it will look something like this (except of course it will be in valid Solr XML, rather than this schematic):

Doc A:
  f1: foo
  f2: bar
  indexedfields: f1, f2
Doc B:
  f1: foo2
  indexedfields: f1
Doc C:
  f1: foo3
  f2: quiz
  f4: buzz
  indexedfields: f1, f2, f4
Doc D:
  f1: foo4
  f2: question
  f3: bam
  f4: bing
  indexedfields: f1, f2, f3, f4

Then, to choose which facets to display, I call

  http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true

and use the frequency information from this query to determine which fields to display in the faceting UI. (To get the actual facet information for those fields, I would query Solr a second time.)
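To make the scheme concrete, here is a rough Python sketch of the client-side pieces: adding the "indexedfields" value to each document before indexing, ranking fields by how many matching documents set them, and building the parameters for the second query. It never talks to Solr; the counts are computed locally to stand in for what the first facet query would return. The function names, the min_docs cutoff, and the limit of three facets are my own assumptions for illustration, not anything Solr provides.

```python
from collections import Counter
from urllib.parse import urlencode


def with_indexed_fields(doc, marker="indexedfields"):
    """Return a copy of doc plus a multi-valued field naming the fields that are set."""
    present = sorted(name for name, value in doc.items()
                     if value not in (None, "", []))
    return {**doc, marker: present}


def parse_flat_facet_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict.

    Assumption: the first query's facet counts for "indexedfields"
    arrive in this flat shape; adjust if your response format differs.
    """
    return dict(zip(flat[0::2], flat[1::2]))


def pick_facet_fields(field_counts, max_facets, min_docs=2):
    """Rank fields by document count (ties broken by name) and keep the top ones.

    min_docs is a hypothetical cutoff: fields set in fewer documents are ignored.
    """
    ranked = sorted(field_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, count in ranked if count >= min_docs][:max_facets]


# The four example documents from the post (unset fields simply omitted).
docs = {
    "A": {"f1": "foo", "f2": "bar"},
    "B": {"f1": "foo2"},
    "C": {"f1": "foo3", "f2": "quiz", "f4": "buzz"},
    "D": {"f1": "foo4", "f2": "question", "f3": "bam", "f4": "bing"},
}

# Step 1 (index time): attach the list of populated fields to each document.
indexed = {doc_id: with_indexed_fields(doc) for doc_id, doc in docs.items()}

# Step 2 (query time): stand-in for faceting on "indexedfields";
# here every document is treated as matching the search.
counts = dict(Counter(name
                      for doc in indexed.values()
                      for name in doc["indexedfields"]))

# Step 3: choose which fields get a facet in the UI.
chosen = pick_facet_fields(counts, max_facets=3)

# Step 4: parameters for the second query, one facet.field per chosen field.
second_query = urlencode(
    [("q", "myquery"), ("facet", "true")]
    + [("facet.field", name) for name in chosen]
)

print(counts)
print(chosen)
print(second_query)
```

With the four example documents, this picks f1, f2 and f4 and drops f3, which matches the reasoning above. The cost of the approach is the extra round trip to Solr per search, plus keeping "indexedfields" in sync whenever a document changes.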
Are there any alternatives that would be easier or more efficient? Thanks, Chris