gf2121 commented on code in PR #14679:
URL: https://github.com/apache/lucene/pull/14679#discussion_r2099522207

##########
lucene/core/src/java/org/apache/lucene/search/Scorer.java:
##########
@@ -76,4 +77,57 @@ public int advanceShallow(int target) throws IOException {
    * {@link #advanceShallow(int) shallow-advanced} to included and {@code upTo} included.
    */
   public abstract float getMaxScore(int upTo) throws IOException;
+
+  /**
+   * Return a new batch of doc IDs and scores, starting at the current doc ID, and ending before
+   * {@code upTo}. Because it starts on the current doc ID, it is illegal to call this method if the
+   * {@link #docID() current doc ID} is {@code -1}.
+   *
+   * <p>An empty return value indicates that there are no postings left between the current doc ID
+   * and {@code upTo}.
+   *
+   * <p>Implementations should ideally fill the buffer with a number of entries comprised between 8
+   * and a couple hundreds, to keep heap requirements contained, while still being large enough to
+   * enable operations on the buffer to auto-vectorize efficiently.
+   *
+   * <p>The default implementation is provided below:
+   *
+   * <pre class="prettyprint">
+   * int batchSize = 16; // arbitrary
+   * buffer.growNoCopy(batchSize);
+   * int size = 0;
+   * DocIdSetIterator iterator = iterator();
+   * for (int doc = docID(); doc < upTo && size < batchSize; doc = iterator.nextDoc()) {
+   *   if (liveDocs == null || liveDocs.get(doc)) {
+   *     buffer.docs[size] = doc;
+   *     buffer.scores[size] = score();
+   *     ++size;
+   *   }
+   * }
+   * buffer.size = size;
+   * </pre>
+   *
+   * <p><b>NOTE</b>: The provided {@link DocAndScoreBuffer} should not hold references to internal
+   * data structures.
+   *
+   * <p><b>NOTE</b>: In case this {@link Scorer} exposes a {@link #twoPhaseIterator()
+   * TwoPhaseIterator}, it should be positioned on a matching document before this method is called.
+   *
+   * @lucene.internal
+   */
+  public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buffer)
+      throws IOException {
+    int batchSize = 16; // arbitrary
+    buffer.growNoCopy(batchSize);
+    int size = 0;
+    DocIdSetIterator iterator = iterator();

Review Comment:
   We have many implementations that return a new iterator here (like
   `TwoPhaseIterator.asDocIdSetIterator`). Will constructing a new object for every batch of 16
   docs cause noticeable overhead?


##########
lucene/core/src/java/org/apache/lucene/search/TermScorer.java:
##########
@@ -120,4 +126,50 @@ public void setMinCompetitiveScore(float minScore) {
       impactsDisi.setMinCompetitiveScore(minScore);
     }
   }
+
+  @Override
+  public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buffer)
+      throws IOException {
+    if (docAndFreqBuffer == null) {
+      docAndFreqBuffer = new DocAndFreqBuffer();
+    }
+
+    for (; ; ) {
+      postingsEnum.nextPostings(upTo, docAndFreqBuffer);
+      if (liveDocs != null && docAndFreqBuffer.size != 0) {
+        // An empty return value indicates that there are no more docs before upTo. We may be
+        // unlucky, and there are docs left, but all docs from the current batch happen to be marked
+        // as deleted. So we need to iterate until we find a batch that has at least one non-deleted
+        // doc.
+        docAndFreqBuffer.apply(liveDocs);
+        if (docAndFreqBuffer.size == 0) {
+          continue;
+        }
+      }
+      break;
+    }
+
+    int size = docAndFreqBuffer.size;
+    normValues = ArrayUtil.growNoCopy(normValues, size);
+    if (norms == null) {
+      Arrays.fill(normValues, 0, size, 1L);

Review Comment:
   Can we do this fill only when the array actually grows?
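   Roughly what I have in mind, as an untested sketch against the fields shown in this diff
   (assuming `normValues` starts out empty and `norms` is fixed for the lifetime of the scorer, so
   entries written as `1L` once stay valid across calls):

       int size = docAndFreqBuffer.size;
       if (normValues.length < size) {
         // Only reached when the array actually needs to grow. growNoCopy discards the previous
         // content, so re-fill the new (oversized) array once when there are no norms; later
         // calls that do not grow can then skip the fill entirely.
         normValues = ArrayUtil.growNoCopy(normValues, size);
         if (norms == null) {
           Arrays.fill(normValues, 1L);
         }
       }

   Filling the whole oversized array rather than just `[0, size)` is what allows subsequent
   non-growing calls to skip the fill.
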
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1034,6 +1035,50 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOException {
       }
     }

+    @Override
+    public void nextPostings(int upTo, DocAndFreqBuffer buffer) throws IOException {
+      assert needsRefilling == false;
+
+      if (needsFreq == false) {
+        super.nextPostings(upTo, buffer);
+        return;
+      }
+
+      buffer.size = 0;
+      if (doc >= upTo) {
+        return;
+      }
+
+      // Only return docs from the current block
+      buffer.growNoCopy(BLOCK_SIZE);
+      upTo = (int) Math.min(upTo, level0LastDocID + 1L);
+
+      // Frequencies are decoded lazily, calling freq() makes sure that the freq block is decoded
+      freq();
+
+      int start = docBufferUpto - 1;
+      buffer.size = 0;

Review Comment:
   Nit: `buffer.size` has already been set to 0 above (line 1047), can we avoid this one?
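   i.e. the tail of this hunk could just drop the second reset; a rough sketch, everything else
   unchanged:

       // Frequencies are decoded lazily, calling freq() makes sure that the freq block is decoded
       freq();

       int start = docBufferUpto - 1;
       // buffer.size was already reset to 0 before the `doc >= upTo` check above,
       // so the second reset can simply be dropped.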