One task when designing a facet-based UI is deciding which fields to
facet on and display facets for. One possibility that I hope to
explore is to determine which fields to facet on dynamically, based on
the search results. In particular, I hypothesize that, for a somewhat
heterogeneous index (heterogeneous in terms of which fields a given
record might contain), the following rule might be helpful: facet
on a given field to the extent that it is frequently set in the
documents matching the user's search.

For example, let's say my results look like this:

Doc A:
  f1: foo
  f2: bar
  f3: <N/A>
  f4: <N/A>

Doc B:
  f1: foo2
  f2: <N/A>
  f3: <N/A>
  f4: <N/A>

Doc C:
  f1: foo3
  f2: quiz
  f3: <N/A>
  f4: buzz

Doc D:
  f1: foo4
  f2: question
  f3: bam
  f4: bing

The field usage information for these documents could be summarized like this:

field f1: Set in 4 docs
field f2: Set in 3 docs
field f3: Set in 1 doc
field f4: Set in 2 docs
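The tally above can be computed mechanically from the results. Here is a minimal sketch in Python. It assumes each result document is a dict mapping field names to values, with unset fields simply absent (the <N/A> entries above). The names `docs` and `field_usage` are hypothetical and are not part of Solr.

```python
from collections import Counter

# Example results from above; unset fields are omitted rather than stored as <N/A>.
docs = {
    "A": {"f1": "foo", "f2": "bar"},
    "B": {"f1": "foo2"},
    "C": {"f1": "foo3", "f2": "quiz", "f4": "buzz"},
    "D": {"f1": "foo4", "f2": "question", "f3": "bam", "f4": "bing"},
}

def field_usage(results):
    """Count, for each field, how many documents have a value for it."""
    counts = Counter()
    for doc in results.values():
        # Each field is counted at most once per document.
        counts.update(name for name, value in doc.items() if value is not None)
    return counts

usage = field_usage(docs)
for field in sorted(usage):
    print(f"field {field}: set in {usage[field]} doc(s)")
```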

If I were choosing facet fields based on the above rule, I would
definitely want to display facets for field f1, since it occurs in all
documents. If I had room for another facet in the UI, I would facet on
f2. If I wanted a third, I'd go with f4, since it's more popular
than f3. I would probably ignore f3 in any case, because it's set for
only one document.
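The selection rule in the paragraph above fits in a small function. This is only a sketch. `choose_facet_fields`, the number of UI slots, and the `min_docs` cutoff are hypothetical parameters, not anything Solr provides. A cutoff of 2 encodes "ignore f3 because it's set for only one document".

```python
def choose_facet_fields(usage, slots, min_docs=2):
    """Pick up to `slots` fields, most frequently set first.

    Fields set in fewer than `min_docs` documents are skipped entirely;
    ties are broken alphabetically so the choice is deterministic.
    """
    eligible = [(field, n) for field, n in usage.items() if n >= min_docs]
    # Sort by descending document count, then by field name.
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return [field for field, _ in eligible[:slots]]

# Field usage counts from the example documents.
usage = {"f1": 4, "f2": 3, "f3": 1, "f4": 2}
print(choose_facet_fields(usage, slots=1))  # ['f1']
print(choose_facet_fields(usage, slots=3))  # ['f1', 'f2', 'f4']
```

Even with more slots than eligible fields, f3 is never chosen unless `min_docs` is lowered to 1.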

Has anyone implemented such a scheme with Solr? Any success? (The
closest thing I can find is
http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries
to pick which facets to display based more on a ruleset than on
frequency.)

As for implementation, the most straightforward approach (one that
wouldn't involve modifying Solr) seems to be to add a new
multi-valued "indexedfields" field to each document, listing
which fields actually have a value in that document. So when I pass
data to Solr at indexing time, it will look something like this
(except, of course, it will be valid Solr XML rather than this
schematic):

Doc A:
  f1: foo
  f2: bar
  indexedfields: f1, f2

Doc B:
  f1: foo2
  indexedfields: f1

Doc C:
  f1: foo3
  f2: quiz
  f4: buzz
  indexedfields: f1, f2, f4

Doc D:
  f1: foo4
  f2: question
  f3: bam
  f4: bing
  indexedfields: f1, f2, f3, f4
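The schematic above could be turned into Solr's XML update format roughly as follows. This Python sketch makes three assumptions. First, the schema's uniqueKey is a field named "id", which is hypothetical here. Second, "indexedfields" is declared as a multi-valued string field in schema.xml. Third, each document is a dict containing only the fields that are actually set. `build_add_xml` is an illustrative name.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Build a Solr <add> update message, adding a multi-valued
    'indexedfields' field that lists every field set on each document."""
    add = ET.Element("add")
    for doc_id, fields in docs.items():
        doc = ET.SubElement(add, "doc")
        # Assumes the schema's uniqueKey field is named "id".
        ET.SubElement(doc, "field", name="id").text = doc_id
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = value
        # A multi-valued field is sent by repeating the <field> element,
        # once per populated field name.
        for name in fields:
            ET.SubElement(doc, "field", name="indexedfields").text = name
    return ET.tostring(add, encoding="unicode")

docs = {
    "A": {"f1": "foo", "f2": "bar"},
    "B": {"f1": "foo2"},
    "C": {"f1": "foo3", "f2": "quiz", "f4": "buzz"},
    "D": {"f1": "foo4", "f2": "question", "f3": "bam", "f4": "bing"},
}
print(build_add_xml(docs))
```

Computing "indexedfields" from the same dict that supplies the field values keeps it in sync with the document automatically.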

Then, to choose which facets to display, I call

http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true

and use the frequency information from this query to determine which
fields to display in the faceting UI. (To get the actual facet
information for those fields, I would query Solr a second time.)
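The two round trips can be sketched in Python. This sketch assumes the handler at /solr/search from the URL above accepts `wt=json` and returns Solr's default JSON facet format, which is a flat list of alternating terms and counts. The function names, the extra `rows=0` parameter, the cutoffs, and the sample response are illustrative additions. The sample response is invented to match the example documents. It is not output from a real server.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://myserver/solr/search"  # handler URL from the example above

def field_usage_url(query):
    """First request: facet on indexedfields to see which fields are populated.
    rows=0 because only the facet counts are needed, not the documents."""
    params = [("q", query), ("facet", "true"),
              ("facet.field", "indexedfields"), ("facet.sort", "true"),
              ("rows", "0"), ("wt", "json")]
    return SOLR_SELECT + "?" + urlencode(params)

def parse_facet_counts(response, field):
    """Convert Solr's flat [term1, count1, term2, count2, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))

def facet_values_url(query, fields):
    """Second request: facet on the fields chosen from the first response."""
    params = [("q", query), ("facet", "true"), ("wt", "json")]
    params += [("facet.field", f) for f in fields]
    return SOLR_SELECT + "?" + urlencode(params)

# Hypothetical first response, shaped like Solr's default JSON output.
sample = {"facet_counts": {"facet_fields": {
    "indexedfields": ["f1", 4, "f2", 3, "f4", 2, "f3", 1]}}}

usage = parse_facet_counts(sample, "indexedfields")
# Keep fields set in at least 2 docs, at most 3 of them (hypothetical cutoffs).
chosen = [f for f, n in usage.items() if n >= 2][:3]
print(field_usage_url("myquery"))
print(facet_values_url("myquery", chosen))
```

In a real client, `sample` would come from fetching the first URL, for example with `urllib.request.urlopen` and `json.load`.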

Are there any alternatives that would be easier or more efficient?

Thanks,
Chris
