Interesting, I had not heard of MMR.

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Sep 28, 2018 at 10:43 AM Tim Allison <talli...@apache.org> wrote:

> If you haven’t already, might want to check out maximal marginal
> relevance...original paper: Carbonell and Goldstein.
>
> On Thu, Sep 27, 2018 at 7:29 PM Joel Bernstein <joels...@gmail.com> wrote:
>
> > Yeah, I think your plan sounds fine.
> >
> > Do you have a specific use case for diversity of results? I've been
> > wondering if diversity of results would provide better perceived
> > relevance.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > dceccarel...@bloomberg.net> wrote:
> >
> > > Yeah, I think Kmeans might be a way to implement the "top 3 stories that
> > > are more distant", but you can also have a more naïve (and faster)
> > > strategy, like:
> > >  - send a threshold with the request
> > >  - scan the documents in order of relevance score
> > >  - select the top documents that have diversity > threshold.
> > >
> > > I would make the strategy pluggable and let the request select it.
> > >
> > > From: solr-user@lucene.apache.org At: 09/27/18 18:25:43
> > > To: Diego Ceccarelli (BLOOMBERG/ LONDON), solr-user@lucene.apache.org
> > > Subject: Re: solr and diversification
> > >
> > > I've thought about this problem a little bit. What I was considering was
> > > using Kmeans clustering to cluster the top 50 docs, then pulling the top
> > > scoring doc from each cluster as the top documents. This should be fast
> > > and effective at getting diversity.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Thu, Sep 27, 2018 at 1:20 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > > dceccarel...@bloomberg.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm considering writing a component for diversifying the results. I know
> > > > that diversification can be achieved by using grouping, but I'm thinking
> > > > about something different and query-biased.
> > > > The idea is to have something that gets applied after the normal
> > > > retrieval and selects the top k most diverse documents based on some
> > > > distance metric:
> > > >
> > > > Example:
> > > > imagine that you are asking for 10 rows, and you set diversify.rows=3
> > > > diversity.metric=tfidf  diversify.field=body
> > > >
> > > > Solr might retrieve the top 10 rows as usual, extract tfidf vectors
> > > > for the bodies, and select the top 3 stories that are most distant from
> > > > each other according to cosine similarity.
> > > > This would be different from grouping because documents will be
> > > > 'collapsed' or not based on the subset of documents retrieved for the
> > > > query.
> > > > Do you think it would make sense to have it as a component? Any feedback
> > > > / ideas?
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
>
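
For reference, the naïve threshold strategy from the thread (scan by relevance, keep a document only if its diversity exceeds a threshold) might look roughly like the sketch below. It is plain Python rather than a Solr component, and several things are assumptions, not anything proposed in the thread: the function names (`tfidf_vectors`, `cosine`, `diversify_by_threshold`) are hypothetical, tokenization is a simple whitespace split, the idf is computed over the retrieved documents only, and "diversity" is taken to mean 1 minus the highest cosine similarity to any document already selected.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (term -> weight), computed over the retrieved set only."""
    tokenized = [text.lower().split() for text in texts]  # assumption: whitespace tokens
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf so that no term ends up with zero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversify_by_threshold(texts, rows, threshold):
    """Return indices of up to `rows` documents.

    `texts` must already be sorted by relevance, best first. A document is
    kept only if its diversity (1 - max cosine similarity to the documents
    kept so far) is greater than `threshold`.
    """
    vectors = tfidf_vectors(texts)
    selected = []
    for i, vec in enumerate(vectors):
        if len(selected) == rows:
            break
        diversity = min(
            (1.0 - cosine(vec, vectors[j]) for j in selected), default=1.0
        )
        if diversity > threshold:
            selected.append(i)
    return selected
```

Note that this can return fewer than `rows` documents when the threshold is strict, so a real component would probably need a fallback (for example, filling the remaining slots by relevance).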

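For anyone else who had not come across MMR: as I understand the Carbonell and Goldstein formulation, it greedily picks the next document d maximizing lambda * Sim1(d, q) - (1 - lambda) * max over already selected s of Sim2(d, s). A minimal sketch follows. The function name `mmr`, the precomputed `similarity` matrix, and the assumption that relevance scores and similarities are both scaled to [0, 1] are mine; raw Solr scores are not normalized, so they would need rescaling first.

```python
def mmr(relevance, similarity, k, lam=0.7):
    """Greedy maximal marginal relevance selection.

    relevance[i]     -- query relevance of document i (assumed scaled to [0, 1])
    similarity[i][j] -- document-to-document similarity (assumed in [0, 1])
    lam              -- trade-off: 1.0 is pure relevance, 0.0 is pure diversity
    Returns the indices of up to k documents, in selection order.
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the similarity to the closest already selected doc.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` this degenerates to plain relevance ordering, which makes it easy to compare against the undiversified result list.
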