In addition to all the valuable information already shared, I am curious to understand why you think the results are unreliable. Most of the time it is the parameters that cause some of the terms of the original document/corpus to be ignored (something as simple as the min/max document frequency to consider, or the min term frequency in the source doc).
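To check whether those thresholds are what is dropping your terms, one option is to relax them and look at the query that actually gets built. Below is a minimal sketch that only assembles the request URL for Solr's MLT query parser with `debugQuery=true`. The host, collection name (`mycollection`), field (`body`) and document id (`doc-42`) are made-up placeholders, and the helper `build_mlt_debug_url` is hypothetical, not part of any Solr client. If I remember correctly the defaults are `mintf=2` and `mindf=5`, so setting both to 1 is just a way to see which terms come back.

```python
from urllib.parse import urlencode


def build_mlt_debug_url(base_url, collection, doc_id, field,
                        mintf=1, mindf=1, maxdf=None):
    """Build a Solr /select URL that runs the MLT query parser in debug mode.

    doc_id is the uniqueKey value of the source document; field is the
    field used for similarity (the qf local parameter).
    """
    # Local parameters of the MLT query parser: qf, mintf, mindf, maxdf.
    local_params = [f"qf={field}", f"mintf={mintf}", f"mindf={mindf}"]
    if maxdf is not None:
        local_params.append(f"maxdf={maxdf}")

    # Resulting query looks like: {!mlt qf=body mintf=1 mindf=1}doc-42
    q = "{!mlt " + " ".join(local_params) + "}" + doc_id

    # debugQuery=true makes Solr return the parsed query in the response.
    params = {"q": q, "debugQuery": "true", "fl": "id,score"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Placeholder values, replace with your own host, collection, field and id.
url = build_mlt_debug_url("http://localhost:8983/solr", "mycollection",
                          "doc-42", "body")
print(url)
```

Opening that URL and comparing the parsed query in the debug section with the terms you expected from the source document usually shows quickly whether a frequency threshold is filtering them out.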
I have worked a lot on the MLT over the past years and have presented that work (and the internals) at various conferences/meetups. I'll share some slides and some Jira issues that may help you:

https://www.youtube.com/watch?v=jkaj89XwHHw&t=540s
https://www.slideshare.net/SeaseLtd/how-the-lucene-more-like-this-works
https://issues.apache.org/jira/browse/LUCENE-8326
https://issues.apache.org/jira/browse/LUCENE-7802
https://issues.apache.org/jira/browse/LUCENE-7498

Generally speaking, I favour the MLT query parser: it builds the MLT query and gives you the chance to see it using the debug query.

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io