Re: [PR] Updating Dense#intoBitSet to take into account the provided bitset if upTo equals to NO_MORE_DOCS [lucene]

via GitHub Tue, 08 Jul 2025 03:48:06 -0700


pmpailis commented on PR #14882:
URL: https://github.com/apache/lucene/pull/14882#issuecomment-3048383136


   I think the issue is with the following line:
   ```
   int disiTo = Math.min(upTo, bitSet.length());
   ```
   This computation does not take the initial `offset` into account, so 
we could end up reading less data than was requested. For example, with 
`bitSet.length(): 100`, `offset: 90`, and `upTo: 150`, instead of 
reading `[90, 150]` we would end up operating on just the `[90, 100]` range, 
causing all sorts of issues later in the pipeline. 
   Switching back to 
   ```
   int disiTo = upTo == DocIdSetIterator.NO_MORE_DOCS ? bitSet.length() : upTo;
   ```
   seems to address all failures (50 successful full test runs). 
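
To make the arithmetic concrete, here is a minimal standalone sketch comparing the two computations for the example values above. It is not the actual Lucene code: the class and method names are hypothetical, and the only thing taken from Lucene is that `DocIdSetIterator.NO_MORE_DOCS` equals `Integer.MAX_VALUE`.

```java
// Hypothetical standalone sketch; not the Dense#intoBitSet implementation.
final class DisiToSketch {
    // Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Variant that clamps upTo to the bitset length and ignores the offset.
    static int clampToLength(int upTo, int bitSetLength) {
        return Math.min(upTo, bitSetLength);
    }

    // Variant that substitutes the bitset length only for the sentinel value.
    static int sentinelOnly(int upTo, int bitSetLength) {
        return upTo == NO_MORE_DOCS ? bitSetLength : upTo;
    }

    public static void main(String[] args) {
        int bitSetLength = 100;
        int upTo = 150; // the caller asked for docs starting at offset 90

        System.out.println(clampToLength(upTo, bitSetLength));         // 100
        System.out.println(sentinelOnly(upTo, bitSetLength));          // 150
        System.out.println(sentinelOnly(NO_MORE_DOCS, bitSetLength));  // 100
    }
}
```

With `upTo: 150`, the clamped variant stops at 100 even though the window starts at `offset: 90`, while the sentinel-only variant keeps the requested bound and only falls back to the bitset length when `upTo` is `NO_MORE_DOCS`.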
   
   Would you suggest correcting only when `upTo == NO_MORE_DOCS` (or something 
like `upTo - offset >= bitSet.length()`), or restricting the fix to the 
initial `knn` filter case and providing a custom `upTo` at that point? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


