Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

via GitHub Fri, 01 Dec 2023 06:15:16 -0800


Pulkitg64 commented on PR #12857:
URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836188173

Thanks @shubhamvishu for taking a look.
> I went through the change but I didn't understand how are we not reusing
the bitset in the current approach. We do wrap the BitSetIterator with a
FilteredDocIdSetIterator when there are deleted docs right which would
eventually use the bitset to advance the inner iterator(See
[this](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/FilteredDocIdSetIterator.java#L72-L87)).

Sorry! I think I should have used a different title for this PR. The case I am
trying to optimize in the current approach is when the iterator is a
```BitSetIterator``` instance and live docs are not null. In that case the
current approach creates a new BitSet that takes live docs into account.
Creating this BitSet takes linear time, because we need to iterate over all
matched docs to build it. The new BitSet is not required: we can wrap both the
matched docs bitset and the live docs bitset in a single Bits instance, which
can then be used directly during approximate search. So instead of creating a
new BitSet up front, we compute at runtime whether a document is valid for
searching. This saves the time spent creating the new BitSet.
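
To make the difference concrete, here is a minimal sketch of the two
strategies. It is not the code from the PR: it uses java.util.BitSet and a
hypothetical two-method ```Bits``` interface that only mirrors the shape of
Lucene's ```Bits``` (```get``` and ```length```), and the class and method names
are made up for illustration.

```java
import java.util.BitSet;

public class AcceptDocsSketch {

    /** Hypothetical stand-in with the same shape as Lucene's Bits interface. */
    public interface Bits {
        boolean get(int index);
        int length();
    }

    /** Adapts a java.util.BitSet to the Bits stand-in (illustration only). */
    public static Bits asBits(BitSet bits, int maxDoc) {
        return new Bits() {
            @Override public boolean get(int index) { return bits.get(index); }
            @Override public int length() { return maxDoc; }
        };
    }

    /**
     * Eager strategy: build a new bitset holding docs that are both matched
     * and live. This walks every matched doc before the search can start.
     */
    public static BitSet eagerIntersection(BitSet matched, Bits liveDocs) {
        BitSet result = new BitSet(liveDocs.length());
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            if (liveDocs.get(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    /**
     * Lazy strategy: wrap both inputs in one Bits view. Nothing is copied;
     * each lookup checks the matched bitset and the live docs on demand.
     */
    public static Bits lazyIntersection(BitSet matched, Bits liveDocs) {
        return new Bits() {
            @Override public boolean get(int index) {
                return matched.get(index) && liveDocs.get(index);
            }
            @Override public int length() { return liveDocs.length(); }
        };
    }
}
```

Both strategies answer ```get(doc)``` identically. The lazy view trades the
up-front pass over the matched docs for one extra check per lookup, so it
helps most when the search only visits a small part of the matched docs.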

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

