agorlenko commented on PR #11946:
URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320508166

   > Can you explain why you want the "find all docs with score > T"?
   
   For example, we want to show a user only the documents that suit them. We 
have a custom scorer (based on an ML model, for example) that calculates a 
score. We then compare that score with a threshold to decide whether the 
document is suitable for the user. But that scorer is usually too 
computationally expensive to run on every document that passes the filters. 
To deal with this, we can build a second, much simpler model that selects 
candidates for the heavy model. One basic approach for the light model is 
kNN: we have a vector (embedding) for the user or the user's query and a 
vector (embedding) for every document, so we just find the nearest documents 
and pass them to the heavy scorer. In that case we don't know K, only the 
threshold, which is defined during development of the ranking model. Such 
tasks arise naturally in recommendation systems and in ranking.
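
   A minimal sketch of that two-stage flow, in plain Java rather than the 
Lucene API. All names here (`ThresholdCandidates`, `selectCandidates`, 
`applyHeavyModel`) are hypothetical, cosine similarity is just an assumed 
choice of metric, and the brute-force loop stands in for a real vector search:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only; none of these names come from Lucene or the PR.
class ThresholdCandidates {

    // Cosine similarity between two non-zero vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Stage 1 (light model): keep every document whose similarity to the query
    // reaches the threshold. The result size is not known in advance.
    static List<Integer> selectCandidates(float[] query, float[][] docs, double lightThreshold) {
        List<Integer> candidates = new ArrayList<>();
        for (int doc = 0; doc < docs.length; doc++) {
            if (cosine(query, docs[doc]) >= lightThreshold) {
                candidates.add(doc);
            }
        }
        return candidates;
    }

    // Stage 2 (heavy model): run the expensive scorer on the candidates only.
    static List<Integer> applyHeavyModel(List<Integer> candidates,
                                         IntToDoubleFunction heavyScorer,
                                         double heavyThreshold) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : candidates) {
            if (heavyScorer.applyAsDouble(doc) >= heavyThreshold) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[][] docs = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
        List<Integer> candidates = selectCandidates(query, docs, 0.5);
        // Stand-in for the expensive ML model: only document 2 scores highly.
        List<Integer> suitable = applyHeavyModel(candidates, doc -> doc == 2 ? 0.9 : 0.1, 0.5);
        System.out.println(candidates + " -> " + suitable);
    }
}
```

   The point of the sketch is that stage 1 is parameterized by a threshold, 
not by a result count, so the heavy scorer sees exactly the documents the 
light model considers close enough.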
   
   > That is going to be a scary thing. What if someone asks for T==0? Then the 
computation and memory requirements are unbounded.
   
   The same result can be achieved by setting K = 1000...00, so I don't think 
this adds a new vulnerability. It may be worth adding a warning to the 
documentation (for both K and similarityThreshold).
   
   
   If you still think it's a bad idea to support this functionality in 
Lucene, I will rework this PR for the post-filter case. But I think it can be 
useful for people who add ML ranking to search systems based on Lucene.
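
   For comparison, a rough sketch of what a post-filter could look like, 
assuming "post-filter" means running an ordinary top-K search and then 
dropping hits below the threshold. This is plain Java with hypothetical names 
(`PostFilterSketch`, `topK`, `postFilter`), a dot product as an assumed 
similarity, and a brute-force scan in place of a real kNN search:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not the Lucene API and not the PR's code.
class PostFilterSketch {

    static final class Hit {
        final int doc;
        final double score;
        Hit(int doc, double score) { this.doc = doc; this.score = score; }
    }

    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Brute-force stand-in for a kNN search: the k highest-scoring documents, best first.
    static List<Hit> topK(float[] query, float[][] docs, int k) {
        // Min-heap on score, so the weakest of the current top k is evicted first.
        PriorityQueue<Hit> heap = new PriorityQueue<Hit>((a, b) -> Double.compare(a.score, b.score));
        for (int doc = 0; doc < docs.length; doc++) {
            heap.add(new Hit(doc, dot(query, docs[doc])));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> Double.compare(b.score, a.score));
        return hits;
    }

    // Post-filter: drop every hit whose score is below the threshold.
    static List<Hit> postFilter(List<Hit> hits, double threshold) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.score >= threshold) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[][] docs = {{0.9f, 0f}, {0.8f, 0f}, {0.7f, 0f}, {0.2f, 0f}};
        // Three documents score above 0.5, but K = 2 can return at most two of them.
        System.out.println(postFilter(topK(query, docs, 2), 0.5).size());
        System.out.println(postFilter(topK(query, docs, docs.length), 0.5).size());
    }
}
```

   With a post-filter the caller still has to guess K: too small a K misses 
documents above the threshold, and a K large enough to be safe does the same 
work as the threshold search.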


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org