benwtrent commented on code in PR #12314:
URL: https://github.com/apache/lucene/pull/12314#discussion_r1243631127
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##########
@@ -822,40 +836,37 @@ private void writeMeta(
* Writes the byte vector values to the output and returns a set of documents that contains
* vectors.
*/
- private static DocsWithFieldSet writeByteVectorData(
- IndexOutput output, ByteVectorValues byteVectorValues) throws IOException {
- DocsWithFieldSet docsWithField = new DocsWithFieldSet();
- for (int docV = byteVectorValues.nextDoc();
- docV != NO_MORE_DOCS;
- docV = byteVectorValues.nextDoc()) {
- // write vector
- byte[] binaryValue = byteVectorValues.vectorValue();
- assert binaryValue.length == byteVectorValues.dimension() * VectorEncoding.BYTE.byteSize;
+ private static DocsWithVectorsSet writeByteVectorData(
+ IndexOutput output, ByteVectorValues mergedVectorValues) throws IOException {
+ DocsWithVectorsSet docsWithVectors = new DocsWithVectorsSet();
+ for (int vectorId = mergedVectorValues.nextDoc(); vectorId != NO_MORE_DOCS; vectorId = mergedVectorValues.nextDoc()) {
+ int docID = mergedVectorValues.ordToDoc(vectorId);
+ byte[] binaryValue = mergedVectorValues.vectorValue();
+ assert binaryValue.length == mergedVectorValues.dimension() * VectorEncoding.BYTE.byteSize;
output.writeBytes(binaryValue, binaryValue.length);
- docsWithField.add(docV);
+ docsWithVectors.add(docID);
}
- return docsWithField;
+ return docsWithVectors;
Review Comment:
From what I can tell, this simply writes each vector value in order,
regardless of whether the vectors belong to the same document, correct?
I suppose we get the document order for free, since all of a document's
vectors have to be supplied at the same time, correct?
The reason I ask is that if we want to be able to iterate by document,
being able to skip directly to a document and read its vectors is important,
and that wouldn't be easy unless all the vectors in a document were
written right next to each other.
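
To make the concern concrete, here is a minimal sketch of why contiguous per-document storage makes skipping cheap. It is plain Java and not Lucene code: the class `ContiguousVectorLayout` and its methods `add` and `vectorsOf` are hypothetical names invented for this illustration. It assumes fixed-size byte vectors and documents supplied in increasing order, so that remembering only the first vector ordinal of each document is enough to locate all of that document's vectors as one byte range.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene or of this PR. */
final class ContiguousVectorLayout {
  private final int dimension;
  // Flat vector data, written in the order vectors are supplied.
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  // Sorted doc IDs that have at least one vector.
  private final List<Integer> docIds = new ArrayList<>();
  // firstOrd.get(i) is the ordinal of the first vector of docIds.get(i).
  private final List<Integer> firstOrd = new ArrayList<>();
  private int vectorCount = 0;

  ContiguousVectorLayout(int dimension) {
    this.dimension = dimension;
  }

  /** Appends one vector. Assumes doc IDs arrive in non-decreasing order. */
  void add(int docId, byte[] vector) {
    if (docId < 0 || vector.length != dimension) {
      throw new IllegalArgumentException("bad doc ID or vector length");
    }
    int last = docIds.isEmpty() ? -1 : docIds.get(docIds.size() - 1);
    if (docId < last) {
      throw new IllegalArgumentException("documents must be supplied in order");
    }
    if (docId != last) {
      // First vector of a new document: record where its run starts.
      docIds.add(docId);
      firstOrd.add(vectorCount);
    }
    data.write(vector, 0, vector.length);
    vectorCount++;
  }

  /** Returns all vectors of a document by slicing one contiguous range. */
  List<byte[]> vectorsOf(int docId) {
    List<byte[]> result = new ArrayList<>();
    int idx = Collections.binarySearch(docIds, docId);
    if (idx < 0) {
      return result; // the document has no vectors
    }
    int start = firstOrd.get(idx);
    int end = idx + 1 < firstOrd.size() ? firstOrd.get(idx + 1) : vectorCount;
    // Copying the whole buffer keeps the sketch short; a real reader would seek instead.
    byte[] bytes = data.toByteArray();
    for (int ord = start; ord < end; ord++) {
      result.add(Arrays.copyOfRange(bytes, ord * dimension, (ord + 1) * dimension));
    }
    return result;
  }
}
```

If the vectors of one document were interleaved with those of other documents, a single start ordinal per document would no longer be enough, and the reader would need a per-vector mapping to find them.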
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]