jpountz commented on PR #12434:
URL: https://github.com/apache/lucene/pull/12434#issuecomment-1657050056

   I agree the two cases are similar in that both boil down to whether you 
can accept getting fewer than `k` hits. However, the degradation is brutal 
with filtering: you either need to evaluate the filter across the entire 
segment to load it into a bitset (bad for runtime if the filter cardinality 
is high, and bad for memory usage) or linearly scan all filter matches (not 
great either). Here the degradation is much more graceful, as you only pay 
some overhead for the vectors that get collected. For filtering, I could see 
a case for requesting k' > k vectors and then post-filtering. For this case, 
I think I would always want to use this feature, potentially combined with 
the `visitLimit` option to protect against worst-case conditions, such as a 
million child docs per parent, that would make collisions frequent.
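
To make the "request k' > k, then post-filter" idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class `PostFilterSketch`, the method `postFilter`, and the representation of candidates as an array of doc IDs sorted best-first are all assumptions made for illustration. It also shows why this approach can return fewer than `k` hits.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not part of Lucene.
public class PostFilterSketch {

    /**
     * Keeps the first k candidates that pass the filter.
     *
     * @param candidatesBestFirst doc IDs of the k' nearest vectors, best first
     * @param filter              the filter to apply after the vector search
     * @param k                   number of hits the caller actually wants
     * @return up to k doc IDs, best first; fewer if too many were filtered out
     */
    public static int[] postFilter(int[] candidatesBestFirst, IntPredicate filter, int k) {
        int[] kept = new int[Math.min(k, candidatesBestFirst.length)];
        int size = 0;
        for (int doc : candidatesBestFirst) {
            if (size == kept.length) {
                break; // already have k hits, no need to test more candidates
            }
            if (filter.test(doc)) {
                kept[size++] = doc;
            }
        }
        // Trim in case fewer than k candidates survived the filter.
        return Arrays.copyOf(kept, size);
    }
}
```

The filter is only evaluated on the k' collected candidates rather than across the whole segment, which is the appeal; the cost is that a selective filter may leave fewer than `k` results.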


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

