In addition to all the valuable information already shared, I am curious to
understand why you think the results are unreliable.
Most of the time it is the parameters that cause some of the terms of the
original document/corpus to be ignored (something as simple as the min/max
document frequency to consider, or the min term frequency in the source doc).
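
To make that concrete, here is a minimal Python sketch of this kind of
filtering. It is a simplified illustration, not Lucene's actual code: the
function name select_terms, the sample frequencies and the threshold values
are all made up for the example.

```python
def select_terms(term_freqs, doc_freqs, min_tf, min_df, max_df):
    """Keep only the terms that pass the frequency thresholds (simplified)."""
    kept = []
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        if tf < min_tf:
            continue  # too rare in the source document
        if df < min_df:
            continue  # appears in too few documents of the corpus
        if df > max_df:
            continue  # appears in too many documents of the corpus
        kept.append(term)
    return sorted(kept)


# Hypothetical numbers, chosen only to show the effect of the thresholds.
source_doc_tf = {"solr": 3, "lucene": 1, "the": 9, "zookeeper": 2}
corpus_df = {"solr": 40, "lucene": 55, "the": 1000, "zookeeper": 2}

strict = select_terms(source_doc_tf, corpus_df, min_tf=2, min_df=5, max_df=500)
relaxed = select_terms(source_doc_tf, corpus_df, min_tf=1, min_df=1, max_df=2000)
print(strict)
print(relaxed)
```

With the stricter thresholds only one of the four terms survives, so the
resulting query can look very different from what you would expect by
reading the source document.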

I have been working a lot on MLT over the past few years, presenting the
work done (and the internals) at various conferences/meetups.

I'll share some slides and some Jira issues that may help you:

https://www.youtube.com/watch?v=jkaj89XwHHw&t=540s
https://www.slideshare.net/SeaseLtd/how-the-lucene-more-like-this-works

https://issues.apache.org/jira/browse/LUCENE-8326
https://issues.apache.org/jira/browse/LUCENE-7802
https://issues.apache.org/jira/browse/LUCENE-7498

Generally speaking, I favour the MLT query parser: it builds the MLT query
and gives you the chance to inspect it through the debug query output.
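
As a rough sketch of what such a request could look like, the Python
snippet below builds a select URL that uses the {!mlt} query parser with
debugQuery=true. The host, the collection name "articles", the field
"title" and the document id "doc-1" are placeholders I made up; the
mintf/mindf values are only there to show where the thresholds go.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, collection, doc_id, field, mintf=1, mindf=1):
    """Build a Solr select URL for the MLT query parser with debug enabled."""
    # Local params select the MLT query parser and carry its thresholds;
    # the value after the closing brace is the id of the source document.
    local_params = f"{{!mlt qf={field} mintf={mintf} mindf={mindf}}}"
    params = {
        "q": local_params + doc_id,
        "fl": "id,score",
        "debugQuery": "true",  # ask Solr to return the parsed query
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Placeholder values, not taken from any real setup.
url = build_mlt_url("http://localhost:8983/solr", "articles", "doc-1", "title")
print(url)
```

In the response, the parsed query in the debug section should show which
terms actually made it into the MLT query, which is usually the quickest
way to spot terms dropped by the thresholds.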



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io