shubhamvishu commented on PR #12857: URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836604882

@kaivalnp We could use `acceptDocs.cardinality()` when it is a `BitSetIterator` to get an upper bound, but that bound may include deleted docs, so it could still change the decision of whether to go for exact search or not. We don't know how many of those set bits are live; we only know the number of deletes in the segment, not the intersection of the two. One thing we could try is a heuristic that applies a penalty to the cost based on the segment's delete ratio (`ctx.reader().numDeletedDocs() / ctx.reader().maxDoc()`). For example, if 10% of the segment is deleted we could decrease the cost by 10%, or maybe 5%. This might help in cases where we currently miss falling back to exact search, though it would need thorough benchmarking to see what works best (see the sketch at the end of this comment).

On a separate note, I'm wondering whether there are use cases where we don't need to know this cost upfront and could go straight to approximate search. Currently this optimization only kicks in when the iterator is a `BitSetIterator`, but if we could skip the cost step, or estimate the cost with some other heuristic/approximation, we could make the filter check fully lazy using `DISI#advance(docid)` for those use cases.

@msokolov @benwtrent Maybe you could share your thoughts on this?
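Just to make the delete-ratio idea concrete, here is a rough, hypothetical sketch (not part of this PR, and `estimateFilterCost` is a made-up name): take the `BitSetIterator` cardinality as the upper bound and discount it by the segment's delete ratio before comparing it against the visit limit that decides between approximate and exact search. The discount factor is exactly the knob that would need benchmarking.

```java
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSet;
import org.apache.lucene.util.BitSetIterator;

final class FilterCostHeuristic {

  /**
   * Estimates how many filtered docs are live in this segment.
   * When the iterator is a BitSetIterator we know the exact number of set bits,
   * but some of them may point at deleted docs, so we discount the count by the
   * segment's delete ratio (numDeletedDocs / maxDoc), assuming deletes are spread
   * roughly uniformly over the filter.
   */
  static long estimateFilterCost(LeafReaderContext ctx, DocIdSetIterator acceptIterator) {
    long cost;
    if (acceptIterator instanceof BitSetIterator bitSetIterator) {
      BitSet bits = bitSetIterator.getBitSet();
      cost = bits.cardinality(); // upper bound: may still include deleted docs
    } else {
      cost = acceptIterator.cost(); // fall back to the iterator's own estimate
    }
    int maxDoc = ctx.reader().maxDoc();
    int numDeleted = ctx.reader().numDeletedDocs();
    if (numDeleted > 0 && maxDoc > 0) {
      double deleteRatio = (double) numDeleted / maxDoc;
      // Discount the cost proportionally to the delete ratio; the 1.0 multiplier
      // is the penalty knob that would need thorough benchmarking.
      cost = (long) (cost * (1.0 - deleteRatio));
    }
    return cost;
  }
}
```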
@kaivalnp We could use the `acceptDocs.cardinality()` when its a `BitSetIterator` to get the upper bound which might have some deletes but that would still change the decision sometimes of whether to go for exact search or not. Since we don't know how many of those docs are live but we do know the num of deletes in the segment(we don't know the intersections of these two). One thing that might be tried is to come up with some heuristic that adds some penalty to the cost based on the num of deletes in the segment (i.e. `ctx.reader().numDeletedDocs()/ctx.reader().maxDoc()`). Like maybe if there are 10% deletes we could for eg decrease the cost by 10% or maybe 5%. This might help in cases where we miss falling back to exact search. Though this would need some thorough benchmarking to see what works best. On separate note, I'm thinking if there is some use case where we don't require to know this cost upfront and directly go for approximate search only for instance. Currently, this optimization only kicks in when the iterator is of `BitSetIterator` but if its possible to ignore this cost step or get this cost by some other heuristic/approximation then we could completely make it completely lazily evaluated using `DISI#advance(docid)` for those use cases. @msokolov @benwtrent Maybe you could share your thoughts on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org