mocobeta commented on PR #940:
URL: https://github.com/apache/lucene/pull/940#issuecomment-1146545273

   @pminkov thank you for the thorough analysis! 
   
   Looking at the result, with this fix overly common words no longer appear, 
as expected, and overly rare words are still not selected, so I think the 
results will be more balanced. 
   
   > Here are the terms that are selected for each document: 
https://gist.github.com/pminkov/1432b04f794b97d1fc042ffc1ac0dce2
   
   Could you attach the .txt file to this PR? It's a good reference for others, 
and I think it'd be great to have it directly here.
   
   I think the improvement in quality is clear now. On the other hand, the MLT 
query is a very widely used feature, so I've been thinking about adding a 
boolean parameter (say `useRawTermFreq`) to switch back to the old behavior, 
giving users the option to keep the old results if needed. We could set the 
default to `false` on main, so that `TFIDFSimilarity.tf(freq)` is used unless 
users opt out, and to `true` on 9x, so that the new behavior is opt-in there. 
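
   To make the proposed switch concrete, here is a minimal, self-contained 
sketch of how such a flag might affect a term's score. It does not use the 
actual `MoreLikeThis` code: the class `MltTermScoreSketch` and its methods are 
hypothetical, and it assumes the classic TF-IDF formulas (tf = sqrt(freq), 
idf = log((docCount + 1) / (docFreq + 1)) + 1) as a stand-in for whatever 
similarity is configured.

```java
public final class MltTermScoreSketch {
    private MltTermScoreSketch() {}

    // Hypothetical helper: term-frequency factor.
    // useRawTermFreq == true  -> old behavior, the raw count is used as-is.
    // useRawTermFreq == false -> dampened tf, assumed here to be sqrt(freq)
    //                            as in the classic TF-IDF similarity.
    static double tf(int freq, boolean useRawTermFreq) {
        return useRawTermFreq ? freq : Math.sqrt(freq);
    }

    // Assumed classic idf: log((docCount + 1) / (docFreq + 1)) + 1.
    static double idf(long docFreq, long docCount) {
        return Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0;
    }

    // Score used to rank candidate terms: tf factor times idf.
    static double score(int freq, long docFreq, long docCount, boolean useRawTermFreq) {
        return tf(freq, useRawTermFreq) * idf(docFreq, docCount);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a term occurring 9 times in the source
        // document and in 50 of 1000 indexed documents.
        int freq = 9;
        long docFreq = 50;
        long docCount = 1000;
        System.out.println("raw tf score:      " + score(freq, docFreq, docCount, true));
        System.out.println("dampened tf score: " + score(freq, docFreq, docCount, false));
    }
}
```

   With the raw count, a term's score grows linearly with its frequency in the 
source document, which is why very frequent words can dominate; the dampened 
variant grows only with the square root, so idf has relatively more influence 
on which terms are selected.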


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

