Any ideas on this? Is it worth sending a bug report?

Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.
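
In case it helps anyone reproduce the comparison, here is a rough Python sketch of the two scorings I have in mind. The first is simply tf / df, which matches the TermVectorComponent numbers quoted below (tf=1, df=10 gives 0.1). The second is only my assumption about what MLT might do, namely tf * (1 + ln(numDocs / (df + 1))), and I haven't checked it against the source for this Solr version. The tf/df values for "and" and the index size are made up for illustration.

```python
import math


def tvc_tf_idf(tf, df):
    """tf-idf as the TermVectorComponent output appears to compute it: tf / df."""
    return tf / df


def classic_tf_idf(tf, df, num_docs):
    """Assumed MLT-style score (not verified): tf * (1 + ln(num_docs / (df + 1)))."""
    return tf * (1.0 + math.log(num_docs / (df + 1)))


def rank_terms(stats, score):
    """Return term names sorted by descending score."""
    return sorted(stats, key=lambda term: score(*stats[term]), reverse=True)


# (tf, df) per term. "aquaspirillum" comes from the TermVectorComponent
# response quoted below; the values for "and" are hypothetical, chosen to
# give a tf / df of roughly 7.46E-4.
stats = {"aquaspirillum": (1, 10), "and": (1, 1340)}
NUM_DOCS = 2000  # hypothetical index size

by_tvc = rank_terms(stats, tvc_tf_idf)
by_classic = rank_terms(stats, lambda tf, df: classic_tf_idf(tf, df, NUM_DOCS))

print(by_tvc)
print(by_classic)
```

With equal tf, both formulas put "aquaspirillum" ahead of "and", so if MLT really scores terms this way, "and" could only win through a much higher tf or some other cutoff.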

Cheers,

Andrew.


Andrew Clegg wrote:
> 
> Hi,
> 
> If I run a MoreLikeThis query like the following:
> 
> http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1
> 
> one of the hits in the results is "and" (I don't do any stopword removal
> on this field).
> 
> However, if I look inside that document with the TermVectorComponent:
> 
> http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords
> 
> I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms
> with *much* higher tf.idf scores, e.g.:
> 
> <lst name="aquaspirillum">
> <int name="tf">1</int>
> <int name="df">10</int>
> <double name="tf-idf">0.1</double>
> </lst>
> 
> that *don't* appear in the MoreLikeThis list. (I tried adding
> &mlt.maxwl=999 to the end of the MLT query but it makes no difference.)
> 
> What's going on? Surely something with tf.idf = 0.1 is a far better
> candidate for a MoreLikeThis query than something with tf.idf = 7.46E-4?
> Or does MoreLikeThis do some other heuristic magic to select good
> candidates, and sometimes get it wrong?
> 
> BTW the keywords field is indexed, stored, multi-valued and term-vectored.
> 
> Thanks,
> 
> Andrew.
> 
> -- 
> :: http://biotext.org.uk/ ::
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html
Sent from the Solr - User mailing list archive at Nabble.com.
