Field collapsing might work for you. I haven't looked at the details of the implementation and it is still in development, but it is the right sort of feature. You'd like to see the top N matches for each value of the author field, right?
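If field collapsing turns out not to be usable yet, one stopgap is to reorder the results on the client after Solr returns them. Below is a minimal sketch of the "no more than m of the top n per author" rule from your message. It is not a Solr feature. The function name `diversify`, the `author` key, and the list-of-dicts result layout are all assumptions made for illustration. You would also need to fetch more than n rows so there are enough candidates to promote.

```python
from collections import defaultdict


def diversify(results, key="author", n=10, m=3):
    """Reorder ranked results so at most m of the first n share a key value.

    `results` is assumed to be a list of dicts already sorted by score.
    Documents over the cap are not dropped; they are moved below the
    top block, keeping their original relative order.
    """
    counts = defaultdict(int)   # how many docs per key value are in `top`
    top, deferred = [], []
    for doc in results:
        value = doc[key]
        if len(top) < n and counts[value] < m:
            counts[value] += 1
            top.append(doc)
        else:
            deferred.append(doc)
    # If fewer than n docs satisfied the cap, deferred docs fill the
    # remaining slots, so the cap is best-effort in that case.
    return top + deferred


if __name__ == "__main__":
    # Hypothetical ranked results: five docs by A, two by B, one by C.
    ranked = [{"id": i, "author": a} for i, a in enumerate("AAAAABBC")]
    for doc in diversify(ranked, n=5, m=2):
        print(doc["id"], doc["author"])
```

The reordering ignores score gaps, so a much weaker document can be promoted over a strong one by a capped author. Whether that trade-off is acceptable depends on how people react to the results.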
wunder

On 1/6/08 3:25 PM, "Charles Hornberger" <[EMAIL PROTECTED]> wrote:

> I've got a problem that I'm not quite sure how to solve and am wondering if
> anyone has any insight or similar experience to share.
>
> Here's the situation: Documents in our Solr index include a field
> identifying their author (we have 1000s of authors). When displaying an
> individual document, we also want to display a list of related documents by
> other authors*, so we do a search using the current document's title, author
> name, summary, and keywords as the query. Sometimes the search yields a
> results set in which all of the top n documents (in reality, n is ~10) are
> from one author.
>
> Apparently, people don't like this.
>
> So what is being asked for is a result set in which no more than m (where m
> is probably 3) of the top n are from any single author. (It's not that we
> want to exclude documents m+1, m+2, etc. by each author from the result set
> entirely; we just don't want them in the top n.)
>
> More generically, I can imagine this as a feature that might be occasionally
> useful, e.g. as a kind of "diversity boost function" to be used when scoring
> results, where you specify the fields for which you want to enforce
> diversity (e.g., author name, genre, color, etc.), and provide your values
> for n and m, and Solr, uhm, obliges. :-)
>
> Any tips or ideas on how to proceed? (We're using Solr 1.2 so we don't have
> MoreLikeThis, but we can upgrade to a newer version if it's likely that
> MoreLikeThis can provide what we're looking for.)
>
> -Charlie
>
> * In fact, we wouldn't mind if additional documents by the same author were
> included, but we found that when we didn't exclude the original author from
> the result set, we almost always had the same problem: The first n documents
> were always by the original author.