Hey hhc,

I am new to Solr, so pardon me if this throws you off. But I think the
following piece of code is relevant to your problem from
MoreLikeThisHandler#handleRequestBody():

      // Find documents MoreLikeThis - either with a reader or a query
      // --------------------------------------------------------------------------------
      if (reader != null) {
        mltDocs = mlt.getMoreLikeThis(reader, start, rows, filters,
            interesting, flags);
      } else if (q != null) {
        // Matching options
        boolean includeMatch = params.getBool(MoreLikeThisParams.MATCH_INCLUDE,
            true);
        int matchOffset = params.getInt(MoreLikeThisParams.MATCH_OFFSET, 0);
        // Find the base match
        DocList match = searcher.getDocList(query, null, null, matchOffset, 1,
            flags); // only get the first one...
        if (includeMatch) {
          rsp.add("match", match);
        }

        // This is an iterator, but we only handle the first match
        DocIterator iterator = match.iterator();
        if (iterator.hasNext()) {
          // do a MoreLikeThis query for each document in results
          int id = iterator.nextDoc();
          mltDocs = mlt.getMoreLikeThis(id, start, rows, filters, interesting,
              flags);
        }
      } else {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            "MoreLikeThis requires either a query (?q=) or text to find similar documents.");
      }

    } finally {
      if (reader != null) {
        reader.close();
      }
    }

From the searcher.getDocList(...) and mlt.getMoreLikeThis(id, ...) calls
above, it seems like it pulls only the first document matching the query
(which is most likely your duplicate document, as the results seem to be
ranked by score), and issues an mlt query on that.

As an experiment to verify this, you can try the following:
1. Add a *third* document, similar to "aaa", let's say it's called "ccc".
2. Issue the same query that you posted above:
http://localhost:8983/solr/test/select?q=id:aaa&mlt=true&mlt.fl=title
3. If document "ccc" shows up in the results list, that would support my
theory above.
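
If you would rather script steps 2 and 3 than eyeball the browser output,
here is a rough Java sketch. Assumptions on my part: Java 11+ (for
java.net.http), that adding wt=json to the request is acceptable, and that
a crude text match on "id":"..." is good enough for a quick check. The
class and method names (MltExperiment, buildUrl, mentionsId, fetch) are
made up for this example and are not part of Solr.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper for the experiment above; not part of Solr.
final class MltExperiment {

    // Builds the request from step 2, plus wt=json (my assumption) so the
    // response is easy to scan as text.
    static String buildUrl(String coreUrl, String docId, String mltField) {
        String q = URLEncoder.encode("id:" + docId, StandardCharsets.UTF_8);
        String fl = URLEncoder.encode(mltField, StandardCharsets.UTF_8);
        return coreUrl + "/select?q=" + q + "&mlt=true&mlt.fl=" + fl + "&wt=json";
    }

    // Crude check: does the raw response text contain "id":"<docId>"?
    // This does not parse the JSON; it only looks for the id as text.
    static boolean mentionsId(String responseBody, String docId) {
        Pattern p = Pattern.compile(
            "\"id\"\\s*:\\s*\"" + Pattern.quote(docId) + "\"");
        return p.matcher(responseBody).find();
    }

    // Plain HTTP GET that returns the response body as a string.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws InterruptedException {
        String url = buildUrl("http://localhost:8983/solr/test", "aaa", "title");
        try {
            String body = fetch(url);
            System.out.println("bbb listed: " + mentionsId(body, "bbb"));
            System.out.println("ccc listed: " + mentionsId(body, "ccc"));
        } catch (IOException e) {
            System.out.println("Could not reach Solr at " + url + ": " + e);
        }
    }
}
```

Since the main result list for q=id:aaa should only contain "aaa" itself,
any "bbb" or "ccc" found in the response should be coming from the mlt
section.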

Let us know how it goes!

Best Regards,
Nishant Kelkar

On Thu, Nov 27, 2014 at 2:33 AM, hhc <hhchen1...@gmail.com> wrote:

> I have two documents with ids "aaa" and "bbb", and the titles of both
> documents are "a black fox jumps over a red flower".  I imported both
> documents, along with several other testing documents, to a core "test".
>
> I want Solr to return documents similar to document "aaa", so I submitted
> the following:
>
> http://localhost:8983/solr/test/select?q=id:aaa&mlt=true&mlt.fl=title
>
> Solr returned some similar documents.  However, document "bbb", which
> should be the most similar document to "aaa", was not in the returned mlt
> list.  Any ideas how this could happen?  Thanks!