[jira] [Commented] (LUCENE-10516) reduce unnecessary loop matches in BKDReader

kkewwei (Jira) Thu, 19 May 2022 20:31:05 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539903#comment-17539903
 ]


kkewwei commented on LUCENE-10516:
----------------------------------

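For sparse doc values, the data is stored compressed as (sameCount,
detailValue) pairs: a run length followed by the single value shared by that
run. In BKDReader, the visitor compares that same value for every docID of the
batch inside the loop, so the per-docID comparison seems unnecessary.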

{code:java}
  // Reads (cardinality, point value) runs and visits each run.
  private void visitSparseRawDocValues(
      int[] commonPrefixLengths,
      byte[] scratchPackedValue,
      IndexInput in,
      BKDReaderDocIDSetIterator scratchIterator,
      int count,
      IntersectVisitor visitor) throws IOException {
    int i;
    for (i = 0; i < count;) {
      // number of consecutive docIDs that share the same value
      int length = in.readVInt();
      // read the suffix of the shared value for each dimension
      for (int dim = 0; dim < numDataDims; dim++) {
        int prefix = commonPrefixLengths[dim];
        in.readBytes(scratchPackedValue, dim * bytesPerDim + prefix, bytesPerDim - prefix);
      }
      scratchIterator.reset(i, length);
      // the visitor iterates the run and compares the same value for every docID
      visitor.visit(scratchIterator, scratchPackedValue);
      i += length;
    }
    if (i != count) {
      throw new CorruptIndexException(
          "Sub blocks do not add up to the expected count: " + count + " != " + i, in);
    }
  }
{code}
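
To make the cost concrete, here is a minimal, self-contained sketch that does
not use Lucene at all. RangeCounter, visitPerDoc, and visitPerBatch are
hypothetical names invented for this illustration, and the value is a plain int
rather than a packed byte[]. It only counts how often the range check runs for
one batch of docIDs that share a value.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchVisitSketch {

  /** Hypothetical stand-in for a one-dimensional range visitor; not Lucene's API. */
  static final class RangeCounter {
    final int lower;
    final int upper;
    int matchesCalls = 0;                       // how many times the range check ran
    final List<Integer> hits = new ArrayList<>();

    RangeCounter(int lower, int upper) {
      this.lower = lower;
      this.upper = upper;
    }

    boolean matches(int value) {
      matchesCalls++;
      return value >= lower && value <= upper;
    }

    /** Mirrors the default behaviour: the shared value is re-checked for every docID. */
    void visitPerDoc(int[] docIds, int value) {
      for (int docId : docIds) {
        if (matches(value)) {
          hits.add(docId);
        }
      }
    }

    /** Mirrors the proposal: check the shared value once, then collect the whole batch. */
    void visitPerBatch(int[] docIds, int value) {
      if (matches(value)) {
        for (int docId : docIds) {
          hits.add(docId);
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] batch = {3, 7, 8, 15};                // docIDs that all carry the value 12

    RangeCounter perDoc = new RangeCounter(10, 20);
    perDoc.visitPerDoc(batch, 12);

    RangeCounter perBatch = new RangeCounter(10, 20);
    perBatch.visitPerBatch(batch, 12);

    // Both collect the same docIDs; only the number of range checks differs.
    System.out.println(perDoc.matchesCalls + " vs " + perBatch.matchesCalls); // prints "4 vs 1"
  }
}
```

Both variants collect the same hits; the per-batch variant just performs the
comparison once per run instead of once per docID.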


> reduce unnecessary loop matches in BKDReader
> --------------------------------------------
>
>                 Key: LUCENE-10516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10516
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 8.6.2
>            Reporter: kkewwei
>            Priority: Major
>
> In *BKDReader.visitSparseRawDocValues()*, we read a batch of docIDs that
> share the same point value, *scratchPackedValue*, then call
> *visitor.visit(scratchIterator, scratchPackedValue)* to find which docIDs
> match the range. The default implementation is:
> {code:java}
> default void visit(DocIdSetIterator iterator, byte[] packedValue)
>     throws IOException {
>   int docID;
>   while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
>     visit(docID, packedValue);
>   }
> }
> {code}
> The packedValue is the same for the whole batch of docIDs, so if the first
> doc matches the range, every other docID in the batch matches too, and
> re-checking the value on each iteration is redundant.
> We should call a method like the following instead:
> {code:java}
> public void visit(DocIdSetIterator iterator, byte[] packedValue)
>     throws IOException {
>   if (matches(packedValue)) {
>     int docID;
>     while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
>       visit(docID);
>     }
>   }
> }
> {code}
> https://github.com/apache/lucene/blob/2e941fcfed6cad3d9c8667ff5324cd04858ba547/lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java#L196
> We should override *visit(DocIdSetIterator iterator, byte[] packedValue)*
> in *ExitableDirectoryReader$ExitableIntersectVisitor* to avoid calling the
> default implementation:
> {code:java}
> @Override
> public void visit(DocIdSetIterator iterator, byte[] packedValue)
>     throws IOException {
>   queryCancellation.checkCancelled();
>   in.visit(iterator, packedValue);
> }
> {code}
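
The delegation point in the quoted description can be illustrated with a small
Lucene-free sketch. All names here (Visitor, CountingVisitor, NaiveWrapper,
ForwardingWrapper) are hypothetical, and an int[] stands in for the
DocIdSetIterator. It shows the general Java behaviour at play: a wrapper that
does not override a default interface method falls back to the default loop,
so the wrapped visitor's own batch override is never reached.

```java
final class DelegatingVisitorSketch {

  /** Hypothetical visitor with a per-doc method and a default batch method. */
  interface Visitor {
    void visit(int docId, int value);

    // Default batch behaviour: fall back to one per-doc call for each docID.
    default void visit(int[] docIds, int value) {
      for (int docId : docIds) {
        visit(docId, value);
      }
    }
  }

  /** Inner visitor that has its own batch override and records which path ran. */
  static final class CountingVisitor implements Visitor {
    int perDocCalls = 0;
    int batchCalls = 0;

    @Override
    public void visit(int docId, int value) {
      perDocCalls++;
    }

    @Override
    public void visit(int[] docIds, int value) {
      batchCalls++;
    }
  }

  /** Wrapper that forwards only the per-doc method; the batch call uses the default. */
  static final class NaiveWrapper implements Visitor {
    private final Visitor in;

    NaiveWrapper(Visitor in) {
      this.in = in;
    }

    @Override
    public void visit(int docId, int value) {
      in.visit(docId, value);
    }
  }

  /** Wrapper that also forwards the batch method, as the description proposes. */
  static final class ForwardingWrapper implements Visitor {
    private final Visitor in;

    ForwardingWrapper(Visitor in) {
      this.in = in;
    }

    @Override
    public void visit(int docId, int value) {
      in.visit(docId, value);
    }

    @Override
    public void visit(int[] docIds, int value) {
      in.visit(docIds, value);
    }
  }

  public static void main(String[] args) {
    int[] batch = {1, 2, 3};

    CountingVisitor a = new CountingVisitor();
    new NaiveWrapper(a).visit(batch, 42);
    System.out.println("naive: perDoc=" + a.perDocCalls + " batch=" + a.batchCalls);
    // prints "naive: perDoc=3 batch=0"

    CountingVisitor b = new CountingVisitor();
    new ForwardingWrapper(b).visit(batch, 42);
    System.out.println("forwarding: perDoc=" + b.perDocCalls + " batch=" + b.batchCalls);
    // prints "forwarding: perDoc=0 batch=1"
  }
}
```

In this sketch the forwarding wrapper is the only one that lets the inner
visitor's batch-level shortcut run; whether the real wrapper needs extra work
around the delegated call (such as the cancellation check) is outside what the
sketch models.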



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
