[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #992: LUCENE-10592 Build HNSW Graph on indexing

GitBox Tue, 12 Jul 2022 12:37:57 -0700


mayya-sharipova commented on code in PR #992:
URL: https://github.com/apache/lucene/pull/992#discussion_r919288022



##########
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java:
##########
@@ -24,28 +24,40 @@
 import org.apache.lucene.index.DocIDMerger;
 import org.apache.lucene.index.FieldInfo;
 import org.apache.lucene.index.MergeState;
+import org.apache.lucene.index.Sorter;
 import org.apache.lucene.index.VectorValues;
 import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.Accountable;
 import org.apache.lucene.util.Bits;
 import org.apache.lucene.util.BytesRef;
 
 /** Writes vectors to an index. */
-public abstract class KnnVectorsWriter implements Closeable {
+public abstract class KnnVectorsWriter implements Accountable, Closeable {
 
   /** Sole constructor */
   protected KnnVectorsWriter() {}
 
-  /** Write all values contained in the provided reader */
-  public abstract void writeField(FieldInfo fieldInfo, KnnVectorsReader 
knnVectorsReader)
+  /** Add new field for indexing */
+  public abstract void addField(FieldInfo fieldInfo) throws IOException;
+
+  /** Add new docID with its vector value to the given field for indexing */
+  public abstract void addValue(FieldInfo fieldInfo, int docID, float[] 
vectorValue)
+      throws IOException;
+
+  /** Flush all buffered data on disk * */
+  public abstract void flush(int maxDoc, Sorter.DocMap sortMap) throws 
IOException;

Review Comment:
   @jtibshirani  Thanks for the suggestion, I thought how to organize it, and I 
could not find a good way to do it, so I left the things as they are.
   
   In `IndexingChain#flush` we could have called `KnnFieldVectorsWriter#flush`, 
but `flush` operation also requires do `writer.finish();` and close the writer, 
so it is better managed by `VectorValuesConsumer` than individual 
`KnnFieldVectorsWriter` objects.
   
   >  this would help make Lucene93HnswVectorsWriter easier to read, because we 
could separate out the complex sorting logic into a class like 
SortingFieldWriter
   
   This is also challenging to implement because whether a field writer needs 
to be `SortingFieldWriter` only becomes known during flush (`if sortMap != 
null` ), so this would require converting usual field writer object to 
`SortingFieldWriter`  on flush, which doesn't look nice. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #992: LUCENE-10592 Build HNSW Graph on indexing

Reply via email to