jtibshirani commented on PR #951:
URL: https://github.com/apache/lucene/pull/951#issuecomment-1163800086
I looked into this more deeply and realized that there are quite a few cases where we decide not to cache a query into a `BitSet`. For example, `UsageTrackingQueryCachingPolicy#shouldNeverCache` decides to never cache a `TermQuery`. So with the default caching policy, the optimization may not kick in very often, and at this point it is only a loose, best-effort improvement.

For this reason, maybe we can keep the optimization simple for now and apply it only when the iterator is a `BitSetIterator` and there are no deleted docs. The conversion would look something like this:

```
private BitSet createBitSet(Weight filterWeight, LeafReaderContext context) throws IOException {
  int maxDoc = context.reader().maxDoc();
  Bits liveDocs = context.reader().getLiveDocs();

  Scorer scorer = filterWeight.scorer(context);
  if (scorer == null) {
    // The filter matches no documents in this segment.
    return new FixedBitSet(maxDoc);
  }

  DocIdSetIterator iterator = scorer.iterator();
  if (liveDocs == null && iterator instanceof BitSetIterator bitSetIterator) {
    // No deletions, and the filter is already backed by a bit set (e.g. a cached query):
    // reuse it directly instead of copying it.
    return bitSetIterator.getBitSet();
  }

  // Otherwise, copy the matching live docs into a new bit set.
  FixedBitSet bitSet = new FixedBitSet(maxDoc);
  for (int doc = iterator.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = iterator.nextDoc()) {
    if (liveDocs == null || liveDocs.get(doc)) {
      bitSet.set(doc);
    }
  }
  return bitSet;
}
```

Then we would pass the `BitSet` as `acceptDocs`, and `BitSet#cardinality` as `visitedLimit`. That way the algorithm remains unchanged, but we avoid creating a new bit set in some cases.

I also think your approach in this PR makes the code cleaner, so that's a nice benefit.
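To make the reuse-versus-copy decision and the `visitedLimit` hand-off concrete without pulling in Lucene, here is a minimal, dependency-free model of the `createBitSet` idea. It is only a sketch under stated assumptions: `DocIterator`, `BitSetDocIterator`, `ArrayDocIterator`, and `FilterBits` are hypothetical stand-ins (for `DocIdSetIterator`, `BitSetIterator`, a non-bit-set iterator, and the helper method), `java.util.BitSet` replaces Lucene's `FixedBitSet`/`Bits`, and a `null` iterator models a `null` `Scorer`.

```java
import java.util.BitSet;

// Hypothetical stand-in for Lucene's DocIdSetIterator: yields increasing doc IDs, then NO_MORE_DOCS.
interface DocIterator {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int nextDoc();
}

// Hypothetical stand-in for Lucene's BitSetIterator: walks the set bits of a backing
// BitSet and exposes that BitSet so callers can reuse it without copying.
final class BitSetDocIterator implements DocIterator {
  private final BitSet bits;
  private int doc = -1;

  BitSetDocIterator(BitSet bits) {
    this.bits = bits;
  }

  BitSet getBitSet() {
    return bits;
  }

  @Override
  public int nextDoc() {
    if (doc == NO_MORE_DOCS) {
      return doc; // already exhausted; avoid overflowing doc + 1
    }
    int next = bits.nextSetBit(doc + 1);
    doc = next < 0 ? NO_MORE_DOCS : next;
    return doc;
  }
}

// Hypothetical iterator that is NOT bit-set backed (e.g. a postings-like list of doc IDs).
final class ArrayDocIterator implements DocIterator {
  private final int[] docs;
  private int i = 0;

  ArrayDocIterator(int... docs) {
    this.docs = docs;
  }

  @Override
  public int nextDoc() {
    return i < docs.length ? docs[i++] : NO_MORE_DOCS;
  }
}

final class FilterBits {
  // liveDocs == null means "no deleted docs", mirroring Lucene's convention.
  static BitSet createBitSet(DocIterator iterator, BitSet liveDocs, int maxDoc) {
    if (iterator == null) {
      return new BitSet(maxDoc); // no scorer: nothing matches
    }
    if (liveDocs == null && iterator instanceof BitSetDocIterator) {
      // Fast path: reuse the existing bit set. It may be shared (e.g. cached), so treat it as read-only.
      return ((BitSetDocIterator) iterator).getBitSet();
    }
    // Slow path: copy the matching live docs into a fresh bit set.
    BitSet bitSet = new BitSet(maxDoc);
    for (int doc = iterator.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = iterator.nextDoc()) {
      if (liveDocs == null || liveDocs.get(doc)) {
        bitSet.set(doc);
      }
    }
    return bitSet;
  }

  // The visitedLimit handed to the search is the number of docs the filter accepts.
  static int visitedLimit(BitSet acceptDocs) {
    return acceptDocs.cardinality();
  }
}
```

In this model only the slow path allocates; the fast path returns the very same instance, which is why the reused bit set should never be mutated by the caller.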