jpountz commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1657050056
I agree that the two cases are similar in that both boil down to whether or not you can accept getting fewer than `k` hits. However, the degradation is brutal with filtering: you either need to evaluate the filter across the entire segment to load it into a bitset (bad for runtime if the filter cardinality is high, and bad for memory usage), or linearly scan all filter matches (not great either). Here the degradation is much more graceful, as you only pay some overhead for vectors that get collected. For filtering, I could see a case for requesting `k' > k` vectors and then doing post-filtering. For this case, I think I would always want to use this feature, potentially combined with the `visitLimit` option to protect against worst-case conditions, like a million child docs per parent, that would make collisions frequent.
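
To make the `k' > k` idea concrete, here is a minimal sketch of over-fetching followed by post-filtering. It is plain Java and does not use any Lucene API: the class `PostFilterSketch`, the method `postFilter`, and the parameter `kPrime` are hypothetical names chosen for illustration, and the candidate doc IDs are assumed to arrive already sorted by descending score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class PostFilterSketch {

    /**
     * Keeps up to k doc IDs that pass the filter, looking only at the first
     * kPrime candidates. Assumes candidateDocs is sorted by descending score.
     * Hypothetical helper, not part of Lucene.
     */
    static List<Integer> postFilter(int[] candidateDocs, int k, int kPrime, IntPredicate filter) {
        List<Integer> kept = new ArrayList<>();
        int limit = Math.min(kPrime, candidateDocs.length);
        for (int i = 0; i < limit && kept.size() < k; i++) {
            // The filter is evaluated only on collected candidates,
            // never across the whole segment.
            if (filter.test(candidateDocs[i])) {
                kept.add(candidateDocs[i]);
            }
        }
        return kept; // may contain fewer than k doc IDs
    }

    public static void main(String[] args) {
        int[] candidates = {4, 7, 2, 9, 6}; // best score first
        IntPredicate evenDocs = doc -> doc % 2 == 0;
        // Ask for k = 2 hits out of the top k' = 4 candidates.
        System.out.println(postFilter(candidates, 2, 4, evenDocs));
    }
}
```

With `k = 3` and the same `kPrime = 4`, this sketch would return only two doc IDs, which is the "fewer than `k` hits" trade-off mentioned above.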