Hi,

We have a requirement to fetch a set of distinct values of a given field
that match the given query. We also need to fetch the number of items
associated with each field value. I figured out a way to do this for
single-valued fields but am not able to get it to work for multi-valued
fields.

Long Story:
Say you have an index of movies. I would like to get the unique set of
directors matching a query (say "john"), along with the number of movies
directed by each of them. For this example, let's assume that "director" is
a single-valued field.

I came up with one approach to implement this: Search for the query string
in the director field and then apply faceting on the same field (director).
The search will limit the movie results to the ones directed by directors
matching the query. Further, the faceting will provide a unique set of
directors and also the count of movies associated with them. The query will
look something like this:
solr/Movie/select/?q=director:(john)&start=0&rows=0&facet=true&facet.field=raw_director
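
For anyone who wants to reproduce this, here is a minimal Python sketch of
how the request above can be assembled. The helper name build_facet_query
is made up for illustration; the core name, field names and parameters are
the ones from the query above. Note that urlencode percent-encodes the
colon and parentheses in q, which Solr decodes back to the same query.

```python
from urllib.parse import urlencode


def build_facet_query(core, search_field, facet_field, term):
    """Build a select URL that searches one field and facets on another.

    Hypothetical helper, not part of any Solr client library.
    """
    params = [
        ("q", f"{search_field}:({term})"),  # restrict documents to matches
        ("start", 0),
        ("rows", 0),                        # only facet counts are needed
        ("facet", "true"),
        ("facet.field", facet_field),       # untokenized copy of the field
    ]
    return f"solr/{core}/select/?{urlencode(params)}"


url = build_facet_query("Movie", "director", "raw_director", "john")
print(url)
```

Calling the same helper with "actors" and "raw_actors" gives the
multi-valued variant of the request.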

This query works fine for single-valued fields. However, it does not work
for multi-valued fields. Say we perform a similar search on the "actors"
(multi-valued) field; the query will look like:
solr/Movie/select/?q=actors:(john)&start=0&rows=0&facet=true&facet.field=raw_actors
In this case, the search will again limit the movie results to the ones in
which actors matching the query have acted. However, when the results are
faceted on "actors", the facet results will also contain the other actors
who appear in those movies. For example, say we are searching for
actors:malkovich. This returns all movies in which John Malkovich has
acted. When faceting is applied to these results, the facet results
contain John Malkovich with the correct number of movies, but they also
contain other actors who have acted with John Malkovich. The facet results
for the above query look something like this:
<lst name="facet_fields">
    <lst name="raw_actors">
        <int name="John Malkovich">49</int>
        <int name="Catherine Deneuve">4</int>
        <int name="John Cusack">3</int>
        <int name="Angelina Jolie">2</int>
        <int name="Evangeline Lilly">2</int>
        <int name="Glenne Headly">2</int>
        <int name="Jeremy Irons">2</int>
        <int name="Ray Winstone">2</int>
    </lst>
</lst>
The other actors in the above results are obviously not what we expect to
see, since they do not match the original query (i.e. malkovich).
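
To make the expected output concrete, here is a minimal Python sketch of a
client-side workaround: post-filter the facet counts and keep only the
values that contain the query term. The function name filter_facets is
made up, and the plain case-insensitive substring test is an assumption
that is much cruder than Solr's own analysis of the "actors" field. The
sample counts are the ones from the response above.

```python
def filter_facets(facet_counts, term):
    """Keep only facet values whose text contains the query term.

    Hypothetical helper; uses a case-insensitive substring match, which
    only approximates how Solr matched the term in the searched field.
    """
    needle = term.lower()
    return {
        name: count
        for name, count in facet_counts.items()
        if needle in name.lower()
    }


# Facet counts copied from the raw_actors response shown above.
raw_actors = {
    "John Malkovich": 49,
    "Catherine Deneuve": 4,
    "John Cusack": 3,
    "Angelina Jolie": 2,
    "Evangeline Lilly": 2,
    "Glenne Headly": 2,
    "Jeremy Irons": 2,
    "Ray Winstone": 2,
}

matching = filter_facets(raw_actors, "malkovich")
print(matching)  # {'John Malkovich': 49}
```

This only works if all facet values are fetched, so it may not scale when
the matching movies have a very large number of distinct actors.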

Is there any other way I can approach this for multi-valued fields?

Thanks,
karthik c
http://cantspellathing.blogspot.com
