No, this isn't MLT, just the standard query parser for now. I did
try the heuristic approach, and I might actually stick with it. I ran
the process on known duplicates and collected all of the resulting
scores, which let me see how well the query worked. The scores
seemed focused to one rang
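
A rough sketch of the heuristic described above: take the scores
collected from known duplicates and pick a cutoff that still accepts
most of them. The function name, the keep_percent parameter and the
sample scores are made up for illustration; they are not from my real
data and not part of any Solr API.

```python
def suggest_threshold(duplicate_scores, keep_percent=95):
    """Return a score cutoff that still accepts about keep_percent
    percent of the scores observed for known duplicates."""
    if not duplicate_scores:
        raise ValueError("need at least one score")
    if not 0 < keep_percent <= 100:
        raise ValueError("keep_percent must be in (0, 100]")
    ordered = sorted(duplicate_scores)
    # Number of lowest scores we allow to fall below the cutoff.
    # Integer arithmetic avoids floating-point rounding surprises.
    dropped = len(ordered) * (100 - keep_percent) // 100
    return ordered[dropped]


# Hypothetical scores for ten known duplicate pairs (illustrative only).
scores = [4.1, 4.3, 4.0, 4.2, 1.5, 4.4, 4.25, 4.15, 4.35, 4.05]

# Accept 90% of the known duplicates; the single outlier is ignored.
threshold = suggest_threshold(scores, keep_percent=90)
print(threshold)
```

Candidates scoring at or above the returned value would be treated as
likely duplicates. The cutoff only holds for the same query, fields
and boosts that produced the scores, so it has to be recomputed
whenever any of those change.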
Hi,
Are you using moreLikeThis for that feature?
I have no suggestion for a reliable threshold. I think this depends
on the domain you are operating in and is, IMO, only solvable with a
heuristic. It also depends on fields, boosts, ...
It could be that there is a 'score gap' between duplicates and none