mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r754297662
##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##########
@@ -205,6 +215,43 @@ private FieldEntry readField(DataInput input) throws IOException {
     return new FieldEntry(input, similarityFunction);
   }
+  private void fillGraphNodesAndOffsetsByLevel() throws IOException {
+    for (FieldEntry entry : fields.values()) {
+      IndexInput input =

Review comment:
@jpountz I wonder if you can suggest a better way to organize the vector files?

For context: currently, `.vem` (vector metadata) can be quite big because it contains the graph offsets (for 1M docs: 1M * 8 bytes = 8 MB; for 10M docs: 80 MB). This PR tries to store the graph offsets in a separate file, but still loads them during initialization of `FieldEntry`. The problem, as @jtibshirani noticed, is that `FieldEntry` data is then constructed from two files: the metadata file and the file that stores the graph offsets.
Maybe it is worth keeping the original format and storing all this data in the metadata file?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org