msokolov commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1329282101
Hi, I was taking time off for a few days, back now. Have you tried post-filtering? When we added support for the existing pre-filter (accepting a Query), there was extensive testing to determine when it is better to pre-filter vs. post-filter. The answer is not always clear-cut. If the filter is not very restrictive (matches > 90% of docs, say), you are probably better off post-filtering. If it is highly restrictive, then pre-filtering will likely offer performance gains. If it's possible in your application to precompute the filter and cache it for some time (e.g. in a user session), then you can use the existing pre-filtering operation by creating a BitSet matching docs that meet the threshold criterion.

So I would suggest trying a large K and post-filtering to see whether you get reasonable results.

In short, I think this is too risky/trappy for most users. Using a highly restrictive score threshold is really not the same as using a large K from a user perspective: the cost is predictable with K (not very data dependent), but not with the score (as a user I don't know the score distribution a priori), so providing a score threshold is definitely more dangerous/trappy.
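To make the "large K plus post-filter" suggestion concrete, here is a minimal sketch in plain Java. It does not use Lucene classes: `PostFilterSketch`, `Hit`, `postFilter`, and `maybeTruncated` are hypothetical names for illustration, and the sketch assumes the hits arrive sorted by descending score, as top-K results normally are.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: post-filters top-K hits by score. */
final class PostFilterSketch {

    /** Minimal stand-in for a scored hit (doc id plus similarity score). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Keeps hits scoring at least minScore; assumes topK is sorted by descending score. */
    static List<Hit> postFilter(List<Hit> topK, float minScore) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : topK) {
            if (h.score < minScore) break; // sorted input: no later hit can pass
            kept.add(h);
        }
        return kept;
    }

    /** True when every hit passed the threshold, so more matches may lie beyond K. */
    static boolean maybeTruncated(List<Hit> topK, List<Hit> kept) {
        return !topK.isEmpty() && kept.size() == topK.size();
    }
}
```

The search cost is still bounded by K, which is the predictability argument above. If `maybeTruncated` returns true, the chosen K was probably too small for that threshold, and the caller can decide whether to retry with a larger K.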