[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

GitBox Fri, 04 Mar 2022 06:19:35 -0800


mayya-sharipova commented on pull request #728:
URL: https://github.com/apache/lucene/pull/728#issuecomment-1059202282



   @jtibshirani  Thanks a lot for your review.
   
   > If a user had 100 vector fields, then now we might have 100+ files being written concurrently, multiplied by the number of segments we're writing at the same time. It seems like this could cause problems; should we only use this strategy if there are a relatively small number of vector fields?
   
   That's a good concern, and the way you suggested to address it looks reasonable to me. I can explore this in subsequent work if we decide to keep this approach.
   
   > It feels wasteful to be writing the vectors to a temp file in IndexingChain, then immediately reading and writing them to a temp file again in Lucene91HnswVectorsWriter. I wonder if we could make a top-level OffHeapVectorValues class that's more broadly visible, so that Lucene91HnswVectorsWriter could just check if it's dealing with file-backed vector values and avoid creating another one?
   
   I had the same thought and was intending to explore this in subsequent work. One thing to note for now: copying data like this takes almost no time; the only cost is that we temporarily occupy extra disk space while the graph is being built.
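   A rough sketch of the "check if it's already file-backed" idea, assuming a shared vector abstraction that both IndexingChain and the writer can see. All names here (`Vectors`, `FileBackedVectors`, `HeapVectors`, `toFileBacked`) are hypothetical stand-ins, not Lucene's actual OffHeapVectorValues or Lucene91HnswVectorsWriter APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class VectorCopySketch {
    // Hypothetical stand-in for a vector values abstraction; not Lucene's API.
    interface Vectors {
        int size();
    }

    // Hypothetical analogue of a top-level OffHeapVectorValues: vectors already buffered in a temp file.
    static final class FileBackedVectors implements Vectors {
        final String fileName;
        final int count;

        FileBackedVectors(String fileName, int count) {
            this.fileName = fileName;
            this.count = count;
        }

        public int size() { return count; }
    }

    // Vectors still held on the heap.
    static final class HeapVectors implements Vectors {
        final List<float[]> values = new ArrayList<>();

        public int size() { return values.size(); }
    }

    static int tempFilesWritten = 0;

    // Returns file-backed vectors, writing a new temp file only if the input is not already on disk.
    static FileBackedVectors toFileBacked(Vectors vectors) {
        if (vectors instanceof FileBackedVectors) {
            return (FileBackedVectors) vectors; // already on disk: reuse it, no second copy
        }
        tempFilesWritten++;
        // A real writer would serialize each vector here; this sketch only records that a copy happened.
        return new FileBackedVectors("vectors-" + tempFilesWritten + ".tmp", vectors.size());
    }

    public static void main(String[] args) {
        FileBackedVectors buffered = new FileBackedVectors("buffered.tmp", 3);
        System.out.println(toFileBacked(buffered) == buffered); // true
        HeapVectors heap = new HeapVectors();
        heap.values.add(new float[] {1f, 2f});
        System.out.println(toFileBacked(heap).size());          // 1
        System.out.println(tempFilesWritten);                   // 1
    }
}
```

   The `instanceof` check keeps the copy path for heap-buffered input while skipping it when the vectors were already spilled to disk, so the extra temp file only exists when it is actually needed.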


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org