Field collapsing might work for you. I haven't looked at the details
of the implementation and it is still in development, but it is the
right sort of feature. You'd like to see the top N matches for
each value of the author field, right?
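
Until field collapsing is usable, one stopgap might be to re-rank on
the client side: ask Solr for more rows than you display, then cap the
number of documents per author in the top n and push the overflow
further down instead of dropping it. A rough sketch in Python, assuming
each result is a dict with an "author" key; the function name
diversify and the document shape are made up for illustration and are
not anything Solr provides:

```python
from collections import Counter


def diversify(results, key="author", n=10, m=3):
    """Reorder results so at most m of the first n share a value of `key`.

    `results` is a list of dicts already sorted by relevance. Documents
    over the cap are deferred, not dropped, and every group keeps its
    original relative order.
    """
    seen = Counter()          # documents accepted into the top n, per key value
    top, deferred = [], []
    rest_start = len(results)

    for i, doc in enumerate(results):
        if len(top) == n:
            # Top n is full; everything from here on stays in score order.
            rest_start = i
            break
        if seen[doc[key]] < m:
            seen[doc[key]] += 1
            top.append(doc)
        else:
            # Over the per-author cap: hold it back until the top n is full.
            deferred.append(doc)

    # Deferred documents all outranked the untouched tail, so they go first.
    return top + deferred + results[rest_start:]


if __name__ == "__main__":
    authors = ["a", "a", "a", "a", "b", "a", "c"]
    docs = [{"id": i + 1, "author": a} for i, a in enumerate(authors)]
    print([d["id"] for d in diversify(docs, n=4, m=2)])
```

If there are not enough other authors in the rows you fetched, the
deferred documents simply follow the capped ones, so the first n can
still end up with more than m from one author. Fetching a larger rows
value than n makes that less likely.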

wunder

On 1/6/08 3:25 PM, "Charles Hornberger" <[EMAIL PROTECTED]>
wrote:

> I've got a problem that I'm not quite sure how to solve and am wondering if
> anyone has any insight or similar experience to share.
> 
> Here's the situation: Documents in our Solr index include a field
> identifying their author (we have 1000s of authors). When displaying an
> individual document, we also want to display a list of related documents by
> other authors*, so we do a search using the current document's title, author
> name, summary, and keywords as the query. Sometimes the search yields a
> result set in which all of the top n documents (in reality, n is ~10) are
> from one author.
> 
> Apparently, people don't like this.
> 
> So what is being asked for is a result set in which no more than m (where m
> is probably 3) of the top n are from any single author. (It's not that we
> want to exclude documents m+1, m+2, etc. by each author from the result set
> entirely; we just don't want them in the top n.)
> 
> More generically, I can imagine this as a feature that might be occasionally
> useful, e.g. as a kind of "diversity boost function" to be used when scoring
> results, where you specify the fields for which you want to enforce
> diversity (e.g., author name, genre, color, etc.), and provide your values
> for n and m, and Solr, uhm, obliges. :-)
> 
> Any tips or ideas on how to proceed? (We're using Solr 1.2 so we don't have
> MoreLikeThis, but we can upgrade to a newer version if it's likely that
> MoreLikeThis can provide what we're looking for.)
> 
> -Charlie
> 
> * In fact, we wouldn't mind if additional documents by the same author were
> included, but we found that when we didn't exclude the original author from
> the result set, we almost always had the same problem: The first n documents
> were always by the original author.
