mayya-sharipova commented on a change in pull request #416:
URL: https://github.com/apache/lucene/pull/416#discussion_r754297662
##########
File path:
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##########
@@ -205,6 +215,43 @@ private FieldEntry readField(DataInput input) throws
IOException {
return new FieldEntry(input, similarityFunction);
}
+ private void fillGraphNodesAndOffsetsByLevel() throws IOException {
+ for (FieldEntry entry : fields.values()) {
+ IndexInput input =
Review comment:
@jpountz I wonder if you could suggest a better way to organize the vector
files?
For context: currently, `.vem` (vector metadata) can be quite big, as it
contains graph offsets (for 1M docs: 1M * 8 bytes = 8 MB; for 10M docs: 80 MB).
This PR tries to store graph offsets in a separate file, but still loads them
during initialization of `FieldEntry`. The problem, as @jtibshirani noticed, is
that `FieldEntry` data is then constructed from two files: the metadata file and
the file that stores graph offsets. Maybe it is worth keeping the original
format and storing all this data in the metadata file?
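To make the size estimate above concrete, here is a minimal sketch of the
arithmetic. It assumes one 8-byte (`long`) graph offset per document, as in the
numbers quoted above; the class and method names are hypothetical and are not
part of the Lucene codebase.

```java
// Hypothetical helper, for illustration only: estimates how many bytes the
// graph offsets occupy, assuming one long (8 bytes) offset per document.
class OffsetsSizeEstimate {

  // Bytes needed to hold one long offset for each of numDocs documents.
  static long offsetsBytes(long numDocs) {
    return numDocs * Long.BYTES;
  }

  // Converts a byte count to whole decimal megabytes (1 MB = 1,000,000 bytes).
  static long toMegabytes(long bytes) {
    return bytes / 1_000_000L;
  }

  public static void main(String[] args) {
    long[] docCounts = {1_000_000L, 10_000_000L};
    for (long numDocs : docCounts) {
      long bytes = offsetsBytes(numDocs);
      System.out.println(numDocs + " docs: " + toMegabytes(bytes) + " MB of offsets");
    }
  }
}
```

The estimate grows linearly with the number of documents, which is why the
size of the metadata becomes a concern for large segments.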
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]