[ https://issues.apache.org/jira/browse/LUCENE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539903#comment-17539903 ]
kkewwei commented on LUCENE-10516:
----------------------------------

For the sparse doc values, the data is stored compactly as a sequence of runs: the number of docs that share the same value (sameCount), followed by that value (detailValue). In BKDReader, the loop below hands each run of doc IDs to the visitor together with their shared value, and the visitor then compares that same value again for every doc in the run, so iterating over the run just to repeat the comparison seems useless.

{code:java}
// read cardinality and point
private void visitSparseRawDocValues(int[] commonPrefixLengths, byte[] scratchPackedValue, IndexInput in,
                                     BKDReaderDocIDSetIterator scratchIterator, int count, IntersectVisitor visitor)
    throws IOException {
  int i;
  for (i = 0; i < count;) {
    // read how many docs share the same value
    int length = in.readVInt();
    // read the shared value: only the suffix after each dimension's common prefix
    for (int dim = 0; dim < numDataDims; dim++) {
      int prefix = commonPrefixLengths[dim];
      in.readBytes(scratchPackedValue, dim * bytesPerDim + prefix, bytesPerDim - prefix);
    }
    scratchIterator.reset(i, length);
    // the visitor iterates over the run and compares the same value for every doc
    visitor.visit(scratchIterator, scratchPackedValue);
    i += length;
  }
  if (i != count) {
    throw new CorruptIndexException("Sub blocks do not add up to the expected count: " + count + " != " + i, in);
  }
}
{code}

> reduce unnecessary loop matches in BKDReader
> --------------------------------------------
>
>                 Key: LUCENE-10516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10516
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 8.6.2
>            Reporter: kkewwei
>            Priority: Major
>
> In *BKDReader.visitSparseRawDocValues()*, we read a batch of docIds
> that all have the same point value, *scratchPackedValue*, and then call
> *visitor.visit(scratchIterator, scratchPackedValue)* to find which docIDs
> match the range.
> {code:java}
> default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
>   int docID;
>   while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
>     visit(docID, packedValue);
>   }
> }
> {code}
> Since the packedValue is the same for every docId in the batch, if the first
> doc matches the range, all the other docIds in the batch match it too (and if
> it does not, none of them do), so re-checking the value inside the loop seems
> unnecessary. The method could be implemented as follows instead:
> {code:java}
> public void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
>   if (matches(packedValue)) {
>     int docID;
>     while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
>       visit(docID);
>     }
>   }
> }
> {code}
> https://github.com/apache/lucene/blob/2e941fcfed6cad3d9c8667ff5324cd04858ba547/lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java#L196
> Should we also override *visit(DocIdSetIterator iterator, byte[] packedValue)*
> in *ExitableDirectoryReader$ExitableIntersectVisitor*, so that it does not fall
> back to the default implementation?
> {code:java}
> @Override
> public void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
>   queryCancellation.checkCancelled();
>   in.visit(iterator, packedValue);
> }
> {code}
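To illustrate the run layout described in the comment (sameCount followed by the shared value), here is a minimal, self-contained sketch, not Lucene code, that groups a leaf's sorted values into (length, value) runs and decodes them so that each run is handed to a visitor exactly once. It assumes 1-D int values instead of packed byte[] points and skips the common-prefix compression; the names RunLengthLeafSketch, Run, encodeRuns, visitRuns and RunVisitor are hypothetical. The record needs Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the run layout read by visitSparseRawDocValues:
// int values instead of packed byte[] points, and no common-prefix compression.
public class RunLengthLeafSketch {

    /** One run: `length` consecutive docs (in leaf order) that share `value`. */
    record Run(int length, int value) {}

    /** Groups adjacent equal values into (length, value) pairs, i.e. sameCount + detailValue. */
    static List<Run> encodeRuns(int[] sortedValues) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedValues.length) {
            int j = i;
            while (j < sortedValues.length && sortedValues[j] == sortedValues[i]) {
                j++;
            }
            runs.add(new Run(j - i, sortedValues[i]));
            i = j;
        }
        return runs;
    }

    /** Hypothetical visitor: receives one run's doc IDs as the slice [from, to) plus their shared value. */
    interface RunVisitor {
        void visit(int[] docIds, int from, int to, int value);
    }

    /** Decodes the runs in order and checks that they add up to `count`. */
    static void visitRuns(List<Run> runs, int[] docIds, int count, RunVisitor visitor) {
        int i = 0;
        for (Run run : runs) {
            visitor.visit(docIds, i, i + run.length(), run.value());
            i += run.length();
        }
        if (i != count) {
            // Lucene throws CorruptIndexException here; a plain exception keeps the sketch standalone.
            throw new IllegalStateException("Sub blocks do not add up to the expected count: " + count + " != " + i);
        }
    }

    public static void main(String[] args) {
        int[] values = {5, 5, 5, 7, 9, 9};
        int[] docIds = {12, 3, 40, 8, 21, 2};
        List<Run> runs = encodeRuns(values);
        System.out.println(runs); // [Run[length=3, value=5], Run[length=1, value=7], Run[length=2, value=9]]
        visitRuns(runs, docIds, docIds.length, (ids, from, to, value) ->
            System.out.println("value=" + value + " docs=" + (to - from)));
    }
}
```

Six docs collapse into three runs, so a visitor that only needs the value once per run does three value checks instead of six.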
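A minimal sketch of the saving proposed in the issue description, assuming 1-D int points. SliceIterator, RangeVisitor, visitPerDoc and visitPerRun are hypothetical stand-ins that only mimic the shape of Lucene's DocIdSetIterator and IntersectVisitor; this is not the PointRangeQuery code. Both strategies collect the same doc IDs for a run of four docs sharing value 5, but the per-run version evaluates the range test once instead of four times.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch; these types only mimic the shape of Lucene's
// IntersectVisitor / DocIdSetIterator and are NOT the real Lucene classes.
public class RunVisitSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator over docs[from, to) (hypothetical stand-in for DocIdSetIterator). */
    static final class SliceIterator {
        private final int[] docs;
        private final int end;
        private int pos;

        SliceIterator(int[] docs, int from, int to) {
            this.docs = docs;
            this.end = to;
            this.pos = from;
        }

        int nextDoc() {
            return pos < end ? docs[pos++] : NO_MORE_DOCS;
        }
    }

    /** Range visitor over 1-D int points; counts how often the value test runs. */
    static class RangeVisitor {
        final int min, max;
        final List<Integer> hits = new ArrayList<>();
        int matchesCalls = 0;

        RangeVisitor(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean matches(int value) {
            matchesCalls++;
            return value >= min && value <= max;
        }

        void visit(int docID) {
            hits.add(docID);
        }

        void visit(int docID, int value) {
            if (matches(value)) {
                visit(docID);
            }
        }

        /** Default behaviour: re-test the shared value for every doc in the run. */
        void visitPerDoc(SliceIterator it, int value) {
            int docID;
            while ((docID = it.nextDoc()) != NO_MORE_DOCS) {
                visit(docID, value);
            }
        }

        /** Proposed behaviour: test the shared value once, then collect every doc in the run. */
        void visitPerRun(SliceIterator it, int value) {
            if (matches(value)) {
                int docID;
                while ((docID = it.nextDoc()) != NO_MORE_DOCS) {
                    visit(docID);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] docs = {12, 3, 40, 8}; // four docs sharing value 5
        RangeVisitor perDoc = new RangeVisitor(0, 10);
        perDoc.visitPerDoc(new SliceIterator(docs, 0, 4), 5);
        RangeVisitor perRun = new RangeVisitor(0, 10);
        perRun.visitPerRun(new SliceIterator(docs, 0, 4), 5);
        System.out.println(perDoc.hits + " " + perDoc.matchesCalls); // [12, 3, 40, 8] 4
        System.out.println(perRun.hits + " " + perRun.matchesCalls); // [12, 3, 40, 8] 1
    }
}
```

For a multi-dimensional range the value test compares bytes for every dimension, so the number of avoided comparisons grows with the run length; whether that is measurable in practice would need a benchmark.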