I've got a problem that I'm not quite sure how to solve and am wondering if anyone has any insight or similar experience to share.

Here's the situation: Documents in our Solr index include a field identifying their author (we have 1000s of authors). When displaying an individual document, we also want to display a list of related documents by other authors*, so we do a search using the current document's title, author name, summary, and keywords as the query.

Sometimes the search yields a result set in which all of the top n documents (in reality, n is ~10) are from one author. Apparently, people don't like this. So what is being asked for is a result set in which no more than m (where m is probably 3) of the top n are from any single author. (It's not that we want to exclude documents m+1, m+2, etc. by each author from the result set entirely; we just don't want them in the top n.)

More generically, I can imagine this as a feature that might be occasionally useful, e.g. as a kind of "diversity boost function" to be used when scoring results, where you specify the fields for which you want to enforce diversity (e.g., author name, genre, color, etc.), and provide your values for n and m, and Solr, uhm, obliges. :-)

Any tips or ideas on how to proceed? (We're using Solr 1.2 so we don't have MoreLikeThis, but we can upgrade to a newer version if it's likely that MoreLikeThis can provide what we're looking for.)

-Charlie

* In fact, we wouldn't mind if additional documents by the same author were included, but we found that when we didn't exclude the original author from the result set, we almost always had the same problem: The first n documents were always by the original author.
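
To make the requirement concrete, here is a rough client-side sketch of the re-ranking described above: at most m of the top n results share an author, and the overflow is demoted rather than dropped. This is only an illustration of the desired behavior, not something Solr does for us. The function name `diversify`, the `key` callback, and the dict-shaped documents are all made up for the example, and it assumes we ask Solr for more than n rows so there are candidates to promote.

```python
from collections import defaultdict


def diversify(results, key, n=10, m=3):
    """Reorder ranked results so at most m of the first n share a key value.

    `results` is a list already sorted by relevance. `key` extracts the
    field to diversify on (e.g. the author). Nothing is removed: documents
    that would exceed the cap are moved below the top n, and everything
    below the top n keeps its original relative order.
    """
    top = []                      # the diversified top-n slots
    rest = []                     # capped-out and lower-ranked documents
    counts = defaultdict(int)     # documents per key value placed in `top`

    for doc in results:
        value = key(doc)
        if len(top) < n and counts[value] < m:
            top.append(doc)
            counts[value] += 1
        else:
            rest.append(doc)

    # If too few distinct values were fetched, `top` ends up shorter than n
    # and the cap cannot be fully honored for the first n positions.
    return top + rest


# Hypothetical ranked results: author "a" dominates the top of the list.
docs = [
    {"id": 1, "author": "a"},
    {"id": 2, "author": "a"},
    {"id": 3, "author": "a"},
    {"id": 4, "author": "a"},
    {"id": 5, "author": "a"},
    {"id": 6, "author": "b"},
    {"id": 7, "author": "c"},
]

reranked = diversify(docs, key=lambda d: d["author"], n=4, m=2)
print([d["id"] for d in reranked])
```

The obvious drawback is that this runs outside Solr, so paging through results gets awkward and we have to guess how many extra rows to fetch, which is why something built into scoring would be preferable.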