Hi Jason,
Yes, term vectors (TV) will store additional data in the index. Using fields with TV=true simply lets MLT get to the most significant terms more easily. Yes, in the end the terms are used to perform a normal query and get the most similar docs.

This is based on my use of MLT a while back, but I don't think things have changed that much in the last few years.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Jason Rennie <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, August 4, 2008 6:17:28 PM
> Subject: Re: diversity in results
>
> Does the MLT handler simply select a few high tfidf terms from the doc and
> use them as a query? Sounds like a useful tool. Do you know anything about
> relevant performance issues? I noticed that the Solr MoreLikeThis wiki page
> recommends turning on TermVectors for corresponding fields. Can lucene not
> easily return term counts for a document with the standard indexing b/c it's
> term-based (i.e. "inverted")? Does TermVectors=true cause solr/lucene to
> store an additional doc-based index?
>
> Thanks,
>
> Jason
>
> On Mon, Aug 4, 2008 at 5:06 PM, Brian Whitman wrote:
>
> > not out of the box, but I would use the mlt handler on the first result and
> > remove all the ones that appear in both the MLT and query response.
> >
> > B
> >
>
> --
> Jason Rennie
> Head of Machine Learning Technologies, StyleFeeder
> http://www.stylefeeder.com/
> Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
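
For illustration, the idea discussed in this thread (pick a few high tf-idf terms from the source doc, then run them as an ordinary query) can be sketched in plain Python. This is not Lucene's or Solr's actual MoreLikeThis code: the function names, the smoothed idf formula, and the overlap-count scoring are all assumptions made up for the example, and the corpus is just a list of token lists.

```python
import math
from collections import Counter


def top_tfidf_terms(doc_tokens, corpus, k=3):
    """Return the k terms of doc_tokens with the highest tf-idf score.

    Hypothetical helper: the idf smoothing used here is an assumption,
    not the formula Lucene uses.
    """
    n_docs = len(corpus)
    # Document frequency: how many docs in the corpus contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    # Per-document term counts, i.e. the kind of data a term vector holds.
    tf = Counter(doc_tokens)
    scores = {
        term: count * (math.log((n_docs + 1) / (df[term] + 1)) + 1.0)
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def more_like_this(doc_id, corpus, k=3):
    """Rank the other docs by how many of the selected terms they contain."""
    terms = top_tfidf_terms(corpus[doc_id], corpus, k)
    hits = []
    for i, doc in enumerate(corpus):
        if i == doc_id:
            continue  # never return the source document itself
        overlap = len(set(terms) & set(doc))
        if overlap > 0:
            hits.append((overlap, i))
    # Most overlapping terms first; ties are broken by document index.
    return [i for overlap, i in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

Calling more_like_this(0, corpus) on a tokenized corpus returns the indexes of the other docs, most similar first. Brian's suggestion for diversity would then amount to dropping those indexes from the original query's result list.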