And further on this, if you want a field automatically added to each document with the list of its field names, check out http://issues.apache.org/jira/browse/SOLR-1280

        Erik



On Aug 4, 2009, at 1:01 AM, Avlesh Singh wrote:

I understand the general need here. And just extending what you suggested (indexing the fields themselves inside a multiValued field), you can perform
a query like this -
/search?q=myquery&facet=true&facet.field=indexedfields&facet.field=field1&facet.field=field2...&facet.sort=true

You'll get facets for all the fields (passed as multiple facet.field
params), including the one that gives you field frequency. You can do all
sorts of post-processing on this data to achieve the desired result.

Hope this helps.

Cheers
Avlesh

On Tue, Aug 4, 2009 at 2:20 AM, Chris Harris <rygu...@gmail.com> wrote:

One task when designing a facet-based UI is deciding which fields to
facet on and display facets for. One possibility that I hope to
explore is to determine which fields to facet on dynamically, based on
the search results. In particular, I hypothesize that, for a somewhat
heterogeneous index (heterogeneous in terms of which fields a given
record might contain), the following rule might be helpful: Facet
on a given field to the extent that it is frequently set in the
documents matching the user's search.

For example, let's say my results look like this:

Doc A:
f1: foo
f2: bar
f3: <N/A>
f4: <N/A>

Doc B:
f1: foo2
f2: <N/A>
f3: <N/A>
f4: <N/A>

Doc C:
f1: foo3
f2: quiz
f3: <N/A>
f4: buzz

Doc D:
f1: foo4
f2: question
f3: bam
f4: bing

The field usage information for these documents could be summarized like
this:

field f1: Set in 4 docs
field f2: Set in 3 docs
field f3: Set in 1 doc
field f4: Set in 2 docs

If I were choosing facet fields based on the above rule, I would
definitely want to display facets for field f1, since it occurs in all
documents.  If I had room for another facet in the UI, I would facet
f2. If I wanted another one, I'd go with f4, since it's more popular
than f3. I probably would ignore f3 in any case, because it's set for
only one document.
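
To make the selection rule concrete, here is a minimal Python sketch that counts how many of the four example documents set each field and then picks the most frequently set ones. The function names, the `max_fields` parameter, and the `min_docs` cutoff of 2 are assumptions made for illustration; they are not part of Solr or of the original proposal.

```python
from collections import Counter

def field_usage(docs):
    """Count, for each field name, how many documents set it."""
    counts = Counter()
    for doc in docs:
        for name, value in doc.items():
            if value is not None:
                counts[name] += 1
    return counts

def pick_facet_fields(docs, max_fields, min_docs=2):
    """Return up to max_fields field names, most frequently set first.

    min_docs is an assumed cutoff: fields set in fewer documents are
    ignored, as f3 is in the example.
    """
    counts = field_usage(docs)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, n in ranked if n >= min_docs][:max_fields]

# The four example documents; unset fields are simply absent.
docs = [
    {"f1": "foo", "f2": "bar"},
    {"f1": "foo2"},
    {"f1": "foo3", "f2": "quiz", "f4": "buzz"},
    {"f1": "foo4", "f2": "question", "f3": "bam", "f4": "bing"},
]

print(pick_facet_fields(docs, max_fields=3))  # ['f1', 'f2', 'f4']
```

In practice the counts would come from Solr rather than from the documents themselves, but the ranking step would look much the same.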

Has anyone implemented such a scheme with Solr? Any success? (The
closest thing I can find is
http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries
to pick which facets to display based not on frequency but based more
on a ruleset.)

As far as implementation, the most straightforward approach (which
wouldn't involve modifying Solr) would apparently be to add a new
multi-valued "indexedfields" field to each document, which would note
which fields actually have a value for each document. So when I pass
data to Solr at indexing time, it will look something like this
(except of course it will be in valid Solr XML, rather than this
schematic):

Doc A:
f1: foo
f2: bar
indexedfields: f1, f2

Doc B:
f1: foo2
indexedfields: f1

Doc C:
f1: foo3
f2: quiz
f4: buzz
indexedfields: f1, f2, f4

Doc D:
f1: foo4
f2: question
f3: bam
f4: bing
indexedfields: f1, f2, f3, f4
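
One possible way to produce that extra field on the client side, before the documents are serialized for Solr, is sketched below in Python. The helper name `with_indexed_fields` and the rule for what counts as "has a value" (not None, not an empty string, not an empty list) are assumptions for illustration only.

```python
def with_indexed_fields(doc, marker_field="indexedfields"):
    """Return a copy of doc with a multi-valued field listing its populated fields.

    A field is treated as populated when its value is not None, not an
    empty string, and not an empty list (an assumed definition).
    """
    present = sorted(
        name
        for name, value in doc.items()
        if name != marker_field and value not in (None, "", [])
    )
    enriched = dict(doc)  # leave the caller's dict untouched
    enriched[marker_field] = present
    return enriched

doc_c = {"f1": "foo3", "f2": "quiz", "f4": "buzz"}
print(with_indexed_fields(doc_c)["indexedfields"])  # ['f1', 'f2', 'f4']
```

The resulting dict would still need to be turned into a Solr update message with one entry per value of the multi-valued field.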

Then to choose which facets to display, I call


http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true

and use the frequency information from this query to determine which
fields to display in the faceting UI. (To get the actual facet
information for those fields, I would query Solr a second time.)
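
The two-step flow could be sketched in Python roughly as follows. This only builds the request URLs and reads a response that has already been parsed from JSON; it makes no network calls. It assumes the server path from the URL above, the field name "indexedfields", `wt=json` output with Solr's default flat name/count list for facet fields, and `rows=0` on the first request since only the counts are needed. The function names and the sample response are made up for illustration. Sorting is done client-side, so the sketch does not rely on the server's facet ordering.

```python
from urllib.parse import urlencode

BASE = "http://myserver/solr/search"  # path taken from the example URL above

def usage_query_url(q):
    """First request: facet on indexedfields to learn which fields are set."""
    params = [
        ("q", q),
        ("rows", 0),  # assumed: only facet counts are needed here
        ("facet", "true"),
        ("facet.field", "indexedfields"),
        ("wt", "json"),
    ]
    return BASE + "?" + urlencode(params)

def top_fields(response, how_many):
    """Pick the most frequently set fields from a parsed JSON response."""
    flat = response["facet_counts"]["facet_fields"]["indexedfields"]
    # Default JSON facet output alternates value, count, value, count, ...
    pairs = list(zip(flat[0::2], flat[1::2]))
    pairs.sort(key=lambda p: -p[1])
    return [name for name, _count in pairs[:how_many]]

def facet_query_url(q, fields):
    """Second request: facet on the fields chosen from the first response."""
    params = [("q", q), ("facet", "true"), ("wt", "json")]
    params += [("facet.field", f) for f in fields]
    return BASE + "?" + urlencode(params)

# Made-up response matching the four example documents.
sample = {"facet_counts": {"facet_fields": {
    "indexedfields": ["f1", 4, "f2", 3, "f4", 2, "f3", 1]}}}

chosen = top_fields(sample, how_many=3)
print(chosen)  # ['f1', 'f2', 'f4']
print(facet_query_url("myquery", chosen))
```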

Are there any alternatives that would be easier or more efficient?

Thanks,
Chris

