[GitHub] [lucene] msokolov commented on pull request #11946: add similarity threshold for hnsw

GitBox Mon, 28 Nov 2022 07:18:48 -0800


msokolov commented on PR #11946:
URL: https://github.com/apache/lucene/pull/11946#issuecomment-1329282101


   Hi, I was taking time off for a few days; I'm back now. Have you tried
post-filtering? When we added support for the existing pre-filter (which
accepts a Query), we did extensive testing to determine when it is better to
pre-filter and when to post-filter. The answer is not always clear-cut. If the
filter is not very restrictive (it matches, say, more than 90% of docs), you
are probably better off post-filtering. If it is highly restrictive, then
pre-filtering will likely offer performance gains. If your application can
precompute the filter and cache it for some time (e.g. for a user session),
then you can use the existing pre-filtering operation by creating a BitSet
that matches the docs meeting the threshold criterion.
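A minimal sketch of the BitSet idea, assuming a toy setup rather than Lucene's API: a brute-force scan stands in for the HNSW graph search, `java.util.BitSet` stands in for the cached filter, dot product is assumed as the similarity, and the names `PrefilterSketch`, `thresholdFilter` and `topK` are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy sketch, NOT Lucene's API: a brute-force scan stands in for the HNSW
// graph search and java.util.BitSet stands in for a cached pre-filter.
public class PrefilterSketch {

    // Dot-product similarity (assumed scoring function for this sketch).
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Precompute the filter: set the bit of every doc whose similarity to the
    // query meets the threshold. The result is tied to one query vector, so it
    // can only be cached while that query is reused (e.g. within a session).
    static BitSet thresholdFilter(float[][] docs, float[] query, float threshold) {
        BitSet bits = new BitSet(docs.length);
        for (int doc = 0; doc < docs.length; doc++) {
            if (dot(docs[doc], query) >= threshold) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Pre-filtered top-k: only docs whose bit is set in acceptDocs are scored.
    static List<Integer> topK(float[][] docs, float[] query, int k, BitSet acceptDocs) {
        float[] scores = new float[docs.length];
        // Min-heap by score: the weakest of the current top k sits at the head.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Integer d) -> scores[d]));
        for (int doc = acceptDocs.nextSetBit(0); doc >= 0; doc = acceptDocs.nextSetBit(doc + 1)) {
            scores[doc] = dot(docs[doc], query);
            heap.add(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the lowest-scoring candidate
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Integer d) -> scores[d]).reversed());
        return result;
    }

    public static void main(String[] args) {
        float[][] docs = {{1.0f, 0.0f}, {0.9f, 0.1f}, {0.0f, 1.0f}, {0.6f, 0.4f}, {0.2f, 0.8f}};
        float[] query = {1.0f, 0.0f};
        BitSet filter = thresholdFilter(docs, query, 0.5f); // hypothetical threshold
        System.out.println("filter = " + filter);
        System.out.println("top-3 within filter = " + topK(docs, query, 3, filter));
    }
}
```

Building the BitSet touches every doc once, which is why this only pays off when the filter can be cached and reused across several searches.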
   
   So I would suggest trying a large K with post-filtering to see whether you
get reasonable results.
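A minimal sketch of "large K, then post-filter", again a toy rather than Lucene's API: a brute-force scan stands in for HNSW, dot product is the assumed similarity, and `PostfilterSketch`, `topK` and `postFilter` are hypothetical names. In a real HNSW search the effort grows with K; this stand-in scores every doc, so here only the result size is bounded by K.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy sketch, NOT Lucene's API: a brute-force scan stands in for HNSW.
public class PostfilterSketch {

    // Dot-product similarity (assumed scoring function for this sketch).
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Unfiltered top-k: the heap never holds more than k candidates.
    static List<Integer> topK(float[][] docs, float[] query, int k) {
        float[] scores = new float[docs.length];
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Integer d) -> scores[d]));
        for (int doc = 0; doc < docs.length; doc++) {
            scores[doc] = dot(docs[doc], query);
            heap.add(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the lowest-scoring candidate
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Integer d) -> scores[d]).reversed());
        return result;
    }

    // Post-filter: keep only the hits whose score meets the threshold.
    static List<Integer> postFilter(float[][] docs, float[] query, List<Integer> hits, float threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (dot(docs[doc], query) >= threshold) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        float[][] docs = {{1.0f, 0.0f}, {0.9f, 0.1f}, {0.0f, 1.0f}, {0.6f, 0.4f}, {0.2f, 0.8f}};
        float[] query = {1.0f, 0.0f};
        List<Integer> hits = topK(docs, query, 4); // "large" K relative to 5 docs
        System.out.println("top-4 = " + hits);
        System.out.println("post-filtered = " + postFilter(docs, query, hits, 0.5f));
    }
}
```

The trade-off is that docs above the threshold but outside the top K are silently missed (with K = 2 above, doc 3 would be dropped), which is why K should be generous.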
   
   In short, I think this is too risky/trappy for most users. From a user's
perspective, a highly restrictive score threshold is really not the same as a
large K: the cost is predictable with K (it is not very data-dependent), but
not with a score threshold (as a user, I don't know the score distribution a
priori). So providing a score threshold is definitely more dangerous/trappy.
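To illustrate the predictability argument, here is a toy sketch that uses result-set size as a rough proxy for cost (an assumption; it does not model HNSW traversal). Two hypothetical, deterministic score distributions over the same 1000 docs are compared: with K the result size stays fixed, while a fixed threshold returns wildly different counts. `ThresholdCostSketch` and its methods are hypothetical names.

```java
// Toy sketch: result-set size as a rough proxy for search cost.
public class ThresholdCostSketch {

    // A top-K search never returns more than k results.
    static int resultsWithK(float[] scores, int k) {
        return Math.min(k, scores.length);
    }

    // A threshold search returns however many docs happen to score above it.
    static int resultsWithThreshold(float[] scores, float threshold) {
        int n = 0;
        for (float s : scores) {
            if (s >= threshold) {
                n++;
            }
        }
        return n;
    }

    // Hypothetical distribution 1: scores evenly spread over [0, 1).
    static float[] evenlySpread(int n) {
        float[] scores = new float[n];
        for (int i = 0; i < n; i++) {
            scores[i] = i / (float) n;
        }
        return scores;
    }

    // Hypothetical distribution 2: every score clustered in [0.85, 0.95).
    static float[] clustered(int n) {
        float[] scores = new float[n];
        for (int i = 0; i < n; i++) {
            scores[i] = 0.85f + 0.1f * i / n;
        }
        return scores;
    }

    public static void main(String[] args) {
        float[] spread = evenlySpread(1000);
        float[] tight = clustered(1000);
        System.out.println("K=10:      " + resultsWithK(spread, 10) + " vs " + resultsWithK(tight, 10));
        System.out.println("score>=0.8: " + resultsWithThreshold(spread, 0.8f)
            + " vs " + resultsWithThreshold(tight, 0.8f));
    }
}
```

The same 0.8 threshold yields 200 results on one distribution and all 1000 on the other, whereas K = 10 yields 10 in both cases.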


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

