Interesting, I had not heard of MMR.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Sep 28, 2018 at 10:43 AM Tim Allison <talli...@apache.org> wrote:

> If you haven’t already, might want to check out maximal marginal
> relevance...original paper: Carbonell and Goldstein.
>
> On Thu, Sep 27, 2018 at 7:29 PM Joel Bernstein <joels...@gmail.com> wrote:
>
> > Yeah, I think your plan sounds fine.
> >
> > Do you have a specific use case for diversity of results? I've been
> > wondering if diversity of results would provide better perceived
> > relevance.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > dceccarel...@bloomberg.net> wrote:
> >
> > > Yeah, I think Kmeans might be a way to implement the "top 3 stories
> > > that are more distant", but you can also have a more naïve (and
> > > faster) strategy like
> > > - sending a threshold
> > > - scan the documents according to the relevance score
> > > - select the top documents that have diversity > threshold.
> > >
> > > I would allow the strategy to be defined and selected from the
> > > request.
> > >
> > > From: solr-user@lucene.apache.org At: 09/27/18 18:25:43
> > > To: Diego Ceccarelli (BLOOMBERG/ LONDON ) , solr-user@lucene.apache.org
> > > Subject: Re: solr and diversification
> > >
> > > I've thought about this problem a little bit. What I was considering
> > > was using Kmeans clustering to cluster the top 50 docs, then pulling
> > > the top scoring doc from each cluster as the top documents. This
> > > should be fast and effective at getting diversity.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Thu, Sep 27, 2018 at 1:20 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > > dceccarel...@bloomberg.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm considering writing a component for diversifying the results.
> > > > I know that diversification can be achieved by using grouping but
> > > > I'm thinking about something different and query biased.
> > > > The idea is to have something that gets applied after the normal
> > > > retrieval and selects the top k most diverse documents based on
> > > > some distance metric:
> > > >
> > > > Example:
> > > > imagine that you are asking for 10 rows, and you set
> > > > diversify.rows=3 diversity.metric=tfidf diversify.field=body
> > > >
> > > > Solr might retrieve the top 10 rows as usual, extract tfidf
> > > > vectors for the bodies and select the top 3 stories that are more
> > > > distant according to the cosine similarity.
> > > > This would be different from grouping because documents will be
> > > > 'collapsed' or not based on the subset of documents retrieved for
> > > > the query.
> > > > Do you think it would make sense to have it as a component? Any
> > > > feedback / idea?
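
For reference, the naive threshold strategy described in this thread (scan the documents in relevance order and keep those whose diversity exceeds a threshold) might look roughly like the Python sketch below. It is only an illustration and not Solr code: the function names `tfidf_vectors`, `cosine_similarity`, and `diversify`, the whitespace tokenizer, and the smoothed idf formula are all assumptions made for the example. The `rows` argument stands in for the proposed `diversify.rows`, and `threshold` for the threshold sent with the request.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build sparse tf-idf vectors (term -> weight) over the retrieved docs only."""
    tokenized = [text.lower().split() for text in texts]  # assumed tokenizer
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(texts)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf (an assumption): terms found in every doc keep some weight.
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversify(ranked_texts, rows, threshold):
    """Return indices of up to `rows` docs, scanning in relevance order.

    A doc is kept only if its cosine distance (1 - similarity) to every
    doc already kept is greater than `threshold`.
    """
    vectors = tfidf_vectors(ranked_texts)
    selected = []
    for idx, vec in enumerate(vectors):
        if len(selected) >= rows:
            break
        if all(1.0 - cosine_similarity(vec, vectors[j]) > threshold
               for j in selected):
            selected.append(idx)
    return selected
```

Calling `diversify(bodies, rows=3, threshold=0.5)` on the bodies of the top 10 hits, in score order, returns the positions of the documents to keep. Because the tf-idf weights are computed only over the retrieved subset, the outcome depends on the query, which is the difference from grouping noted in the original post.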