[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

GitBox Mon, 22 Nov 2021 05:52:08 -0800


mayya-sharipova commented on a change in pull request #416:
URL: https://github.com/apache/lucene/pull/416#discussion_r754297662




##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##########
@@ -205,6 +215,43 @@ private FieldEntry readField(DataInput input) throws IOException {
     return new FieldEntry(input, similarityFunction);
   }
 
+  private void fillGraphNodesAndOffsetsByLevel() throws IOException {
+    for (FieldEntry entry : fields.values()) {
+      IndexInput input =

Review comment:
@jpountz I wonder if you could suggest a better way to organize the vector
files?

For context: currently, the `.vem` (vector metadata) file can be quite big
because it contains graph offsets (for 1M docs: 1M * 8 bytes = 8 MB; for 10M
docs: 80 MB). This PR tries to store the graph offsets in a separate file, but
still loads them during initialization of `FieldEntry`. The problem, as
@jtibshirani noticed, is that `FieldEntry` data is then constructed from two
files: the metadata file and the file that stores graph offsets. Maybe it is
worth keeping the original format and storing all of this data in the metadata
file?
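
To make the size estimate above concrete, here is a minimal sketch of the
arithmetic. It assumes one 8-byte (`long`) offset per document, as in the
figures quoted in the comment, and it ignores any extra offsets for upper
graph levels. The class and method names are hypothetical and are not part of
the Lucene code base.

```java
// Hypothetical helper, not Lucene API: estimates the space taken by
// graph offsets when each document stores a single long offset.
public class GraphOffsetsSize {

    // Assumption: one 8-byte offset per document (Long.BYTES == 8).
    static long offsetsSizeInBytes(long numDocs) {
        return numDocs * Long.BYTES;
    }

    public static void main(String[] args) {
        long[] docCounts = {1_000_000L, 10_000_000L};
        for (long numDocs : docCounts) {
            long bytes = offsetsSizeInBytes(numDocs);
            // Decimal megabytes, matching the "8 MB" / "80 MB" figures above.
            System.out.println(numDocs + " docs -> " + (bytes / 1_000_000L) + " MB");
        }
    }
}
```

Under these assumptions the offsets grow linearly with the number of
documents, which is why keeping them in the metadata file makes that file
large for big segments.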





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


