Hi Jason,

Yes, TV (term vectors) will store additional per-document data in the index.
With TV=true on the fields, MLT can simply get to the salient terms more
easily.  Yes, in the end the terms are used to perform a normal query and get
the most similar docs.  This is based on my use of MLT a while back, but I
don't think things have changed that much in the last few years.
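
To make the "pick terms, then run a normal query" idea concrete, here is a
rough sketch in Python.  It is not Lucene's or Solr's actual MLT code: the
function names, the smoothed tf-idf formula, and the OR-query format are all
assumptions made for illustration.

```python
import math
from collections import Counter


def top_tfidf_terms(doc_tokens, doc_freq, num_docs, max_terms=5):
    """Return the highest-scoring terms of one document (hypothetical helper).

    doc_tokens: list of analyzed tokens for the document
    doc_freq:   dict mapping term -> number of docs in the index containing it
    num_docs:   total number of docs in the index
    """
    # Per-document term counts; this is the information a term vector
    # makes available without re-analyzing the field text.
    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        df = doc_freq.get(term, 0)
        # Smoothed idf; an assumed formula, not necessarily what Lucene uses.
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0
        scores[term] = count * idf
    # Highest score first, ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]


def build_query(field, terms):
    """Join the selected terms into a plain OR query string on one field."""
    return " OR ".join(f"{field}:{term}" for term in terms)
```

Calling top_tfidf_terms on a document's tokens and passing the result to
build_query gives a query string that can be sent as an ordinary search;
frequent but common words score low because of the idf factor.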

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Jason Rennie <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, August 4, 2008 6:17:28 PM
> Subject: Re: diversity in results
> 
> Does the MLT handler simply select a few high tfidf terms from the doc and
> use them as a query?  Sounds like a useful tool.  Do you know anything about
> relevant performance issues?  I noticed that the Solr MoreLikeThis wiki page
> recommends turning on TermVectors for corresponding fields.  Can Lucene not
> easily return term counts for a document with the standard indexing b/c it's
> term-based (i.e. "inverted")?  Does TermVectors=true cause Solr/Lucene to
> store an additional doc-based index?
> 
> Thanks,
> 
> Jason
> 
> On Mon, Aug 4, 2008 at 5:06 PM, Brian Whitman wrote:
> 
> > not out of the box, but I would use the mlt handler on the first result and
> > remove all the ones that appear in both the MLT and query response.
> >
> > B
> >
> >
> -- 
> Jason Rennie
> Head of Machine Learning Technologies, StyleFeeder
> http://www.stylefeeder.com/
> Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
