One task when designing a facet-based UI is deciding which fields to
facet on. One possibility that I hope to explore is to choose the facet
fields dynamically, based on the search results. In particular, I
hypothesize that, for a somewhat heterogeneous index (heterogeneous in
terms of which fields a given record might contain), the following rule
might be helpful: facet on a given field to the extent that it is
frequently set in the documents matching the user's search.
For example, let's say my results look like this:
Doc A:
f1: foo
f2: bar
f3: <N/A>
f4: <N/A>
Doc B:
f1: foo2
f2: <N/A>
f3: <N/A>
f4: <N/A>
Doc C:
f1: foo3
f2: quiz
f3: <N/A>
f4: buzz
Doc D:
f1: foo4
f2: question
f3: bam
f4: bing
The field usage information for these documents could be summarized
like this:
field f1: Set in 4 docs
field f2: Set in 3 docs
field f3: Set in 1 doc
field f4: Set in 2 docs
If I were choosing facet fields based on the above rule, I would
definitely want to display facets for field f1, since it occurs in all
documents. If I had room for another facet in the UI, I would facet on
f2. If I wanted another one, I'd go with f4, since it's more popular
than f3. I probably would ignore f3 in any case, because it's set for
only one document.
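To make the rule concrete, here is a minimal Python sketch that counts
how often each field is set across the four example documents and ranks
the fields accordingly. The function name rank_fields_by_usage and the
min_docs cutoff are my own illustrative assumptions, not anything Solr
provides; the cutoff of 2 is only there to show f3 being dropped.

```python
from collections import Counter


def rank_fields_by_usage(docs, min_docs=2):
    """Return (field, count) pairs, most frequently set field first.

    min_docs is a hypothetical cutoff: fields set in fewer documents
    than this are left out of the ranking.
    """
    usage = Counter()
    for doc in docs:
        for field, value in doc.items():
            if value is not None:  # None stands in for <N/A>
                usage[field] += 1
    # Sort by descending count, then by field name so ties are stable.
    ranked = sorted(usage.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(field, count) for field, count in ranked if count >= min_docs]


# The four example documents; unset fields are simply absent.
results = [
    {"f1": "foo", "f2": "bar"},
    {"f1": "foo2"},
    {"f1": "foo3", "f2": "quiz", "f4": "buzz"},
    {"f1": "foo4", "f2": "question", "f3": "bam", "f4": "bing"},
]

print(rank_fields_by_usage(results))
# -> [('f1', 4), ('f2', 3), ('f4', 2)]
```

Doing this client-side would mean fetching every matching document,
which is why I'd rather have Solr compute the counts.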
Has anyone implemented such a scheme with Solr? Any success? (The
closest thing I can find is
http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries
to pick which facets to display based not on frequency but based more
on a ruleset.)
As for implementation, the most straightforward approach (which
wouldn't involve modifying Solr) appears to be adding a new
multi-valued "indexedfields" field to each document, listing the
fields that actually have a value in that document. So when I pass
data to Solr at indexing time, it will look something like this
(except of course it will be in valid Solr XML, rather than this
schematic):
Doc A:
f1: foo
f2: bar
indexedfields: f1, f2
Doc B:
f1: foo2
indexedfields: f1
Doc C:
f1: foo3
f2: quiz
f4: buzz
indexedfields: f1, f2, f4
Doc D:
f1: foo4
f2: question
f3: bam
f4: bing
indexedfields: f1, f2, f3, f4
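The bookkeeping field could be filled in by a small preprocessing step
before the documents are sent to Solr. The following Python sketch shows
one way to do that; the helper name add_indexed_fields and the choice to
treat None, empty strings, and empty lists as "not set" are assumptions
of mine.

```python
def add_indexed_fields(doc, marker_field="indexedfields"):
    """Return a copy of doc with a multi-valued field listing set fields.

    A field counts as "set" here when its value is not None, not an
    empty string, and not an empty list (an assumption for this sketch).
    """
    tagged = dict(doc)
    tagged[marker_field] = sorted(
        name
        for name, value in doc.items()
        if name != marker_field and value not in (None, "", [])
    )
    return tagged


doc_c = {"f1": "foo3", "f2": "quiz", "f3": None, "f4": "buzz"}
print(add_indexed_fields(doc_c)["indexedfields"])
# -> ['f1', 'f2', 'f4']
```

The resulting dictionaries would still need to be serialized to Solr's
update format before posting.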
Then to choose which facets to display, I call
http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true
and use the frequency information from this query to determine which
fields to display in the faceting UI. (To get the actual facet
information for those fields, I would query Solr a second time.)
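A rough Python sketch of that two-query flow follows. It only builds
the URLs and interprets a response; it makes no HTTP calls. It assumes
the JSON response writer (wt=json) with its default flat facet layout,
where each facet field comes back as an alternating value/count list.
The function names, the max_fields parameter, and the sample response
are illustrative assumptions. It sorts the counts client-side, so it
does not depend on the facet.sort setting.

```python
from urllib.parse import urlencode

BASE_URL = "http://myserver/solr/search"  # placeholder host from above


def usage_query_url(query):
    """First query: facet on indexedfields only, fetching no documents."""
    params = [
        ("q", query),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", "indexedfields"),
        ("wt", "json"),
    ]
    return BASE_URL + "?" + urlencode(params)


def pick_facet_fields(response, max_fields=3):
    """Choose the most frequently set fields from the first response."""
    flat = response["facet_counts"]["facet_fields"]["indexedfields"]
    # The flat layout alternates value, count, value, count, ...
    pairs = list(zip(flat[0::2], flat[1::2]))
    pairs.sort(key=lambda pair: -pair[1])
    return [name for name, count in pairs[:max_fields] if count > 0]


def facet_query_url(query, fields):
    """Second query: the real search, faceting on the chosen fields."""
    params = [("q", query), ("facet", "true"), ("wt", "json")]
    params += [("facet.field", name) for name in fields]
    return BASE_URL + "?" + urlencode(params)


# Hand-written sample response matching the example documents.
sample = {
    "facet_counts": {
        "facet_fields": {"indexedfields": ["f1", 4, "f2", 3, "f4", 2, "f3", 1]}
    }
}
chosen = pick_facet_fields(sample)
print(chosen)  # -> ['f1', 'f2', 'f4']
print(facet_query_url("myquery", chosen))
```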
Are there any alternatives that would be easier or more efficient?
Thanks,
Chris