Re: eliminating "too many results from the same source"

Charles Hornberger Sun, 06 Jan 2008 16:44:54 -0800

Of course -- and now I feel silly for not having thought of that :-).
Thanks!


On Jan 6, 2008 4:37 PM, Walter Underwood <[EMAIL PROTECTED]> wrote:

> Field collapsing might work for you. I haven't looked at the details
> of the implementation and it is still in development, but it is the
> right sort of feature. You'd like to see the top N matches for
> each value of the author field, right?
>
> wunder
>
> On 1/6/08 3:25 PM, "Charles Hornberger" <[EMAIL PROTECTED]> wrote:
>
> > I've got a problem that I'm not quite sure how to solve and am wondering
> > if anyone has any insight or similar experience to share.
> >
> > Here's the situation: Documents in our Solr index include a field
> > identifying their author (we have 1000s of authors). When displaying an
> > individual document, we also want to display a list of related documents
> > by other authors*, so we do a search using the current document's title,
> > author name, summary, and keywords as the query. Sometimes the search
> > yields a result set in which all of the top n documents (in reality, n
> > is ~10) are from one author.
> >
> > Apparently, people don't like this.
> >
> > So what is being asked for is a result set in which no more than m
> > (where m is probably 3) of the top n are from any single author. (It's
> > not that we want to exclude documents m+1, m+2, etc. by each author from
> > the result set entirely; we just don't want them in the top n.)
> >
> > More generically, I can imagine this as a feature that might be
> > occasionally useful, e.g. as a kind of "diversity boost function" to be
> > used when scoring results, where you specify the fields for which you
> > want to enforce diversity (e.g., author name, genre, color, etc.), and
> > provide your values for n and m, and Solr, uhm, obliges. :-)
> >
> > Any tips or ideas on how to proceed? (We're using Solr 1.2 so we don't
> > have MoreLikeThis, but we can upgrade to a newer version if it's likely
> > that MoreLikeThis can provide what we're looking for.)
> >
> > -Charlie
> >
> > * In fact, we wouldn't mind if additional documents by the same author
> > were included, but we found that when we didn't exclude the original
> > author from the result set, we almost always had the same problem: The
> > first n documents were always by the original author.
>
>
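
For illustration only, the "no more than m of the top n from any single
author" requirement described above can also be approximated on the client
side, independent of any field collapsing support in Solr. The sketch below
is not part of Solr: the function name cap_per_author, the dict-shaped
documents, and the "author" key are all assumptions made for the example.
It takes an already-ranked result list and defers documents that exceed the
per-author cap to just after the top n, rather than dropping them.

```python
from collections import defaultdict


def cap_per_author(results, n=10, m=3, key="author"):
    """Reorder ranked results so at most m of the first n share a key value.

    Hypothetical helper: `results` is assumed to be a list of dicts in rank
    order. Documents over the cap are deferred, not removed; they follow the
    top n in their original relative order.
    """
    counts = defaultdict(int)
    top, deferred, rest = [], [], []
    for i, doc in enumerate(results):
        if len(top) == n:
            # Top n is full: everything from here on keeps its order.
            rest = results[i:]
            break
        if counts[doc[key]] < m:
            counts[doc[key]] += 1
            top.append(doc)
        else:
            deferred.append(doc)
    return top + deferred + rest


# Small example with n=4, m=2 (smaller than the n~10, m=3 in the message).
ranked = [
    {"id": 1, "author": "a"},
    {"id": 2, "author": "a"},
    {"id": 3, "author": "a"},
    {"id": 4, "author": "a"},
    {"id": 5, "author": "b"},
    {"id": 6, "author": "a"},
    {"id": 7, "author": "c"},
    {"id": 8, "author": "a"},
]
reordered = cap_per_author(ranked, n=4, m=2)
```

This only works if the query fetches more rows than n, since the top n can
only be diversified with documents that were actually returned. How many
extra rows are needed depends on how skewed the results are.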
