Solr mlt doesn't return documents with "exactly the same" contents

2014-11-27 Thread hhc
I have two documents with ids "aaa" and "bbb", and the titles of both
documents are "a black fox jumps over a red flower".  I imported both
documents, along with several other test documents, into a core named "test".

I want Solr to return documents similar to document "aaa", so I submitted the
following request:

http://localhost:8983/solr/test/select?q=id:aaa&mlt=true&mlt.fl=title

Solr returned some similar documents.  However, document "bbb", which should
be the document most similar to "aaa", was not in the returned mlt list.
Any ideas how this could happen?  Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-mlt-doesn-t-return-documents-with-exactly-the-same-contents-tp4171284.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr mlt doesn't return documents with "exactly the same" contents

2014-11-27 Thread hhc
Hi Nishant,

Thank you for the reply.  

I believe that Solr removes the first document from the mlt list because a
document is most similar to "itself" and thus should be removed.  In my
case, "aaa" and "bbb" are two different documents.  When searching for
documents similar to "aaa", the document "aaa" should be removed from the
list, but "bbb" should be kept.

I did the experiment you suggested.  Unfortunately, the document "ccc" was
not in the mlt list.  I then modified the title of "ccc" to a slightly
different sentence, "a black fox jumps over a yellow flower", but the
document "ccc" was still not in the list.  :-(

Does anyone have any clues on this?  Thanks!





Re: Solr mlt doesn't return documents with "exactly the same" contents

2014-11-27 Thread hhc
After carefully reading the mlt parameter documentation at
https://wiki.apache.org/solr/MoreLikeThis

I found that I can specify the following parameters to get "bbb" returned
when searching for documents similar to "aaa":

mlt.mintf=1
mlt.mindf=2
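
Appended to the original request, these parameters might be assembled as in
the following Python sketch. The helper name build_mlt_url is made up for
illustration; the host, core name, and field are the ones from this thread.

```python
from urllib.parse import urlencode

def build_mlt_url(doc_id, base="http://localhost:8983/solr/test/select",
                  mintf=1, mindf=2):
    """Build a MoreLikeThis select URL (hypothetical helper, not part of Solr)."""
    params = [
        ("q", f"id:{doc_id}"),   # the source document to find neighbours for
        ("mlt", "true"),         # enable the MoreLikeThis component
        ("mlt.fl", "title"),     # field used for similarity
        ("mlt.mintf", mintf),    # minimum term frequency in the source doc
        ("mlt.mindf", mindf),    # minimum document frequency in the index
    ]
    # safe=":" keeps the colon in "id:aaa" unescaped, matching the URL above.
    return base + "?" + urlencode(params, safe=":")

print(build_mlt_url("aaa"))
```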

Details:
mlt.mintf: Minimum Term Frequency - terms that occur fewer than this many
times in the source doc are ignored.
DEFAULT_MIN_TERM_FREQ = 2
mlt.mindf: Minimum Document Frequency - terms that occur in fewer than this
many docs are ignored.
DEFAULT_MIN_DOC_FREQ = 5
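
To see why the defaults can leave nothing to match on, here is a rough
Python sketch of how the two thresholds described above filter terms. It is
a simplification, not Solr's actual implementation: it assumes plain
whitespace tokenization and ignores stop words, word-length limits, field
analyzers, and term ranking. The three-document corpus is invented for
illustration.

```python
from collections import Counter

def mlt_candidate_terms(source_text, corpus, mintf, mindf):
    """Return the terms of source_text that pass both frequency thresholds."""
    # Term frequency: how often each term occurs in the source document.
    tf = Counter(source_text.lower().split())
    # Document frequency: how many corpus documents contain each term.
    df = Counter()
    for text in corpus:
        df.update(set(text.lower().split()))
    return sorted(t for t, n in tf.items() if n >= mintf and df[t] >= mindf)

title = "a black fox jumps over a red flower"
# Hypothetical index: "aaa", "bbb", and one unrelated test document.
corpus = [title, title, "some other testing document"]

# Defaults (mintf=2, mindf=5): only "a" occurs twice in the title,
# and it appears in just two documents, so no term survives.
print(mlt_candidate_terms(title, corpus, mintf=2, mindf=5))

# Relaxed thresholds from this thread: every title term survives.
print(mlt_candidate_terms(title, corpus, mintf=1, mindf=2))
```

In this toy setup the first call yields an empty list, which would leave
nothing to build a similarity query from, while the second keeps all seven
distinct title terms.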

I hope this helps anyone else who is confused by the mlt results.


