jtibshirani commented on PR #951:
URL: https://github.com/apache/lucene/pull/951#issuecomment-1163800086

   I looked into this more deeply and realized that there are many cases where 
we decide not to cache a query as a `BitSet`. For example, 
`UsageTrackingQueryCachingPolicy#shouldNeverCache` never caches `TermQuery`. 
So with the default caching policy, the optimization may not kick in very 
often. At this point it's a loose best effort.
   
   For this reason, maybe we can keep the optimization simple for now. We could 
apply it only if the iterator is a `BitSetIterator` and there are no deleted 
docs. The conversion would look something like this:
   
   ```
   private BitSet createBitSet(Weight filterWeight, LeafReaderContext context) throws IOException {
       int maxDoc = context.reader().maxDoc();
       Bits liveDocs = context.reader().getLiveDocs();

       Scorer scorer = filterWeight.scorer(context);
       if (scorer == null) {
           // The filter matches nothing in this segment.
           return new FixedBitSet(maxDoc);
       }

       DocIdSetIterator iterator = scorer.iterator();
       if (liveDocs == null && iterator instanceof BitSetIterator bitSetIterator) {
           // Fast path: no deletions, so reuse the iterator's backing bit set.
           // It is shared, so callers must treat it as read-only.
           return bitSetIterator.getBitSet();
       }

       // Slow path: collect the live matching docs into a new bit set.
       FixedBitSet bitSet = new FixedBitSet(maxDoc);
       for (int doc = iterator.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = iterator.nextDoc()) {
           if (liveDocs == null || liveDocs.get(doc)) {
               bitSet.set(doc);
           }
       }
       return bitSet;
   }
   ```
   
   Then we would pass the `BitSet` for `acceptDocs`, and `BitSet#cardinality` 
for `visitedLimit`. That way the algorithm remains unchanged, but we avoid 
allocating a new bit set when the iterator is already backed by one and the 
segment has no deleted docs.
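   
   To make the reuse-versus-copy idea concrete, here is a rough standalone 
sketch. It is not Lucene code: `java.util.BitSet` stands in for Lucene's 
`BitSet`, and `AcceptDocsSketch`, `AcceptDocs`, and `toAcceptDocs` are 
hypothetical names used only for illustration.

```java
import java.util.BitSet;

// Sketch only: java.util.BitSet stands in for Lucene's BitSet/FixedBitSet,
// and every class and method name here is hypothetical.
final class AcceptDocsSketch {

    /** The docs a search may visit, plus the cap on how many it may visit. */
    static final class AcceptDocs {
        final BitSet bits;
        final int visitedLimit;

        AcceptDocs(BitSet bits) {
            this.bits = bits;
            // The limit is the number of accepted docs.
            this.visitedLimit = bits.cardinality();
        }
    }

    /**
     * @param matches  docs matched by the filter (plays the role of the
     *                 bit set backing the iterator)
     * @param liveDocs live docs, or null when the segment has no deletions
     */
    static AcceptDocs toAcceptDocs(BitSet matches, BitSet liveDocs) {
        if (liveDocs == null) {
            // Fast path: no deletions, so reuse the existing bit set as-is.
            return new AcceptDocs(matches);
        }
        // Slow path: copy the matches, then drop the deleted docs.
        BitSet filtered = (BitSet) matches.clone();
        filtered.and(liveDocs);
        return new AcceptDocs(filtered);
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet(8);
        matches.set(1);
        matches.set(3);
        matches.set(6);

        AcceptDocs noDeletes = toAcceptDocs(matches, null);
        System.out.println(noDeletes.bits == matches); // true: same instance
        System.out.println(noDeletes.visitedLimit);    // 3

        BitSet liveDocs = new BitSet(8);
        liveDocs.set(0, 8);
        liveDocs.clear(3); // doc 3 is deleted
        AcceptDocs withDeletes = toAcceptDocs(matches, liveDocs);
        System.out.println(withDeletes.bits);          // {1, 6}
        System.out.println(withDeletes.visitedLimit);  // 2
    }
}
```

   In the fast path the caller gets back the very same instance, so nothing 
is allocated, which is also why the shared set has to be treated as read-only.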
   
   I also think your approach in this PR makes the code cleaner, so that's a 
nice benefit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

