mayya-sharipova commented on code in PR #792:
URL: https://github.com/apache/lucene/pull/792#discussion_r862127419

##########
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:
##########
@@ -320,13 +323,19 @@ private static class FieldEntry {
     final int numLevels;
     final int dimension;
     private final int size;
-    final int[] ordToDoc;
-    private final IntUnaryOperator ordToDocOperator;
     final int[][] nodesByLevel;
     // for each level the start offsets in vectorIndex file from where to read neighbours
     final long[] graphOffsetsByLevel;
-
-    FieldEntry(DataInput input, VectorSimilarityFunction similarityFunction) throws IOException {
+    final long docsWithFieldOffset;
+    final long docsWithFieldLength;
+    final short jumpTableEntryCount;
+    final byte denseRankPower;
+    long addressesOffset;

Review Comment:
   Should all these new variables be final?

##########
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:
##########
@@ -258,14 +257,18 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs, int
   }

   private OffHeapVectorValues getOffHeapVectorValues(FieldEntry fieldEntry) throws IOException {
-    IndexInput bytesSlice =
-        vectorData.slice("vector-data", fieldEntry.vectorDataOffset, fieldEntry.vectorDataLength);
-    return new OffHeapVectorValues(
-        fieldEntry.dimension, fieldEntry.size(), fieldEntry.ordToDoc, bytesSlice);
+    if (fieldEntry.docsWithFieldOffset == -2) {
+      return OffHeapVectorValues.emptyOffHeapVectorValues(fieldEntry.dimension);
+    } else {
+      IndexInput bytesSlice =
+          vectorData.slice("vector-data", fieldEntry.vectorDataOffset, fieldEntry.vectorDataLength);
+      return new OffHeapVectorValues(
+          fieldEntry.dimension, fieldEntry.size(), fieldEntry, vectorData, bytesSlice);

Review Comment:
   Should `OffHeapVectorValues` just accept `fieldEntry` as an argument, so that the separate `fieldEntry.dimension` and `fieldEntry.size()` arguments can be removed?

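   To illustrate the suggestion above, here is a minimal sketch using simplified stand-in classes. `FieldEntrySketch`, `VectorValuesSketch` and `ConstructorSketchDemo` are hypothetical names, not Lucene types, and the `IndexInput` arguments of the real constructor are left out. It only shows the constructor reading `dimension` and `size()` from the entry it is given:

```java
// Hypothetical stand-in for FieldEntry: only the fields needed for this sketch.
final class FieldEntrySketch {
  final int dimension;
  private final int size;

  FieldEntrySketch(int dimension, int size) {
    this.dimension = dimension;
    this.size = size;
  }

  int size() {
    return size;
  }
}

// Hypothetical stand-in for OffHeapVectorValues. The real class also takes
// IndexInput arguments, which are omitted here to keep the sketch self-contained.
final class VectorValuesSketch {
  private final int dimension;
  private final int size;
  private final FieldEntrySketch fieldEntry;

  // The entry is the single source for dimension and size, so callers
  // cannot pass values that disagree with it.
  VectorValuesSketch(FieldEntrySketch fieldEntry) {
    this.fieldEntry = fieldEntry;
    this.dimension = fieldEntry.dimension;
    this.size = fieldEntry.size();
  }

  int dimension() {
    return dimension;
  }

  int size() {
    return size;
  }
}

class ConstructorSketchDemo {
  public static void main(String[] args) {
    FieldEntrySketch entry = new FieldEntrySketch(128, 1000);
    VectorValuesSketch values = new VectorValuesSketch(entry);
    System.out.println(values.dimension() + " " + values.size());
  }
}
```

   With this shape the call site would pass `fieldEntry` once instead of repeating `fieldEntry.dimension` and `fieldEntry.size()` next to it.
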
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:
##########
@@ -400,27 +400,42 @@ static class OffHeapVectorValues extends VectorValues
     private final int dimension;
     private final int size;
-    private final int[] ordToDoc;
-    private final IntUnaryOperator ordToDocOperator;
+    // dataIn was used to init a new IndexedDIS for #randomAccess()
     private final IndexInput dataIn;
+    private final IndexInput slice;
     private final BytesRef binaryValue;
     private final ByteBuffer byteBuffer;
     private final int byteSize;
     private final float[] value;
+    private final IndexedDISI disi;
+    private final FieldEntry fieldEntry;
+    final DirectMonotonicReader ordToDoc;
     private int ord = -1;
     private int doc = -1;
-    OffHeapVectorValues(int dimension, int size, int[] ordToDoc, IndexInput dataIn) {
+    OffHeapVectorValues(
+        int dimension, int size, FieldEntry fieldEntry, IndexInput dataIn, IndexInput slice)
+        throws IOException {
       this.dimension = dimension;
       this.size = size;
-      this.ordToDoc = ordToDoc;
-      ordToDocOperator = ordToDoc == null ? IntUnaryOperator.identity() : (ord) -> ordToDoc[ord];
+      this.fieldEntry = fieldEntry;
       this.dataIn = dataIn;
+      this.slice = slice;
+      this.disi = initDISI(dataIn);

Review Comment:
   I was thinking about how to simplify the code around `disi`, as it involves a lot of conditions that need to be carefully checked (maybe by using a general `DocIdSetIterator` instead, but that doesn't seem to work). Anyway, I could not think of anything smart, and this LGTM in its current form.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org