mocobeta commented on PR #940: URL: https://github.com/apache/lucene/pull/940#issuecomment-1146545273
@pminkov thank you for the thorough analysis! Looking at the result, with this fix overly common words no longer appear, as expected, and overly rare words are still not selected, so I think the results will be more balanced.

> Here are the terms that are selected for each document: https://gist.github.com/pminkov/1432b04f794b97d1fc042ffc1ac0dce2

Could you attach the .txt file to this PR? It's a good reference for others, and I think it'd be great to have it directly here.

I think the improvement in quality is clear now. On the other hand, the MLT query is a very widely used feature, so I've been thinking about adding a boolean parameter (say `useRawTermFreq`, maybe) to switch back to the old behavior, giving users the option to stick with the old results if needed. We can set the default to `false`, which uses `TFIDFSimilarity.tf(freq)`, for main (opt-out), and to `true` for 9x (opt-in).

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
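To make the proposed switch concrete, here is a minimal, hypothetical sketch of how a `useRawTermFreq` flag could change a term's score. It is not the PR's code and does not use Lucene classes: the class name `MltTermScoreSketch` and its methods are invented for illustration, the score is assumed to be term weight times idf, and the `tf`/`idf` formulas are assumed to match `ClassicSimilarity` (square root of the frequency, and `log((docCount + 1) / (docFreq + 1)) + 1`).

```java
// Hypothetical illustration only; names and structure are assumptions,
// not the actual MoreLikeThis implementation.
final class MltTermScoreSketch {

    // ClassicSimilarity-style tf: square root of the raw term frequency.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // ClassicSimilarity-style idf: log((docCount + 1) / (docFreq + 1)) + 1.
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    // useRawTermFreq == true  -> old behavior: raw frequency times idf.
    // useRawTermFreq == false -> new behavior: tf(freq) times idf.
    static float score(int freq, long docFreq, long docCount, boolean useRawTermFreq) {
        float termWeight = useRawTermFreq ? (float) freq : tf(freq);
        return termWeight * idf(docFreq, docCount);
    }

    public static void main(String[] args) {
        long docCount = 1_000;
        // A very common term (high freq, high docFreq) vs. a rarer one.
        int[] freqs = {16, 4};
        long[] docFreqs = {500, 10};
        for (int i = 0; i < freqs.length; i++) {
            System.out.println("freq=" + freqs[i] + " docFreq=" + docFreqs[i]
                + " raw=" + score(freqs[i], docFreqs[i], docCount, true)
                + " normalized=" + score(freqs[i], docFreqs[i], docCount, false));
        }
    }
}
```

With the raw frequency, a term that occurs many times in the source document is weighted linearly by its count; with the square-root `tf`, that weight grows more slowly, which is consistent with overly common words dropping out of the selected terms.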