[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

GitBox Thu, 14 Apr 2022 08:36:36 -0700


LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1099309953


   Thanks @jtibshirani for reviewing such a big PR, even though I tried to 
split it into several commits, one per phase of the modification.
   
   >The PR moves the ordToDoc mapping from the metadata file to the vector data 
file. This is great, but it means that we should update the way ordToDoc is 
loaded. We are still loading ordToDoc when the format is opened, which is not 
good, since we are not supposed to touch any data files at this point. I think 
we should follow the same pattern as in the Lucene90DocValuesProducer class 
where we only load the DirectMonotonicReader.Meta file when opening the format, 
then load the full reader later each time we search or load vector values?
   
   Your suggestion really makes sense, so should I move ordToDoc into 
`OffHeapVectorValues` and make it off-heap? 
   By the way, in the method `Lucene91HnswVectorsReader#search`, ordToDoc is 
invoked frequently. I worry that reading it off-heap will add latency compared 
with loading it all into memory.
   
   >This PR both moves the ordToDoc mapping to disk, and adds an IndexedDISI to 
support fast iteration. It'd be nice to focus on one change at a time, since it 
makes it easier to understand and review. Maybe we could just move ordToDoc to 
disk in this PR. Or do you think the two changes need to go together?
   
   In `Lucene91HnswVectorsReader`, ordToDoc is an array used for both 
iteration and mapping. In this PR, ordToDoc is used only for mapping, so I had 
to make these two changes together.
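
To illustrate the two roles described above, here is a minimal sketch, assuming a sorted on-heap `int[]` of doc IDs indexed by vector ordinal. The class `OrdToDocSketch` and its methods `docId` and `advanceOrd` are hypothetical names for illustration only; this is not the actual Lucene code.

```java
import java.util.Arrays;

/** Hypothetical illustration of one array serving two roles; not Lucene code. */
public class OrdToDocSketch {
  // Sorted doc IDs of the documents that have a vector; the index is the vector ordinal.
  private final int[] ordToDoc;

  public OrdToDocSketch(int[] ordToDoc) {
    this.ordToDoc = ordToDoc;
  }

  /** Mapping role: translate a vector ordinal into its doc ID. */
  public int docId(int ord) {
    return ordToDoc[ord];
  }

  /** Iteration role: first ordinal whose doc ID is >= target, or -1 if there is none. */
  public int advanceOrd(int target) {
    int idx = Arrays.binarySearch(ordToDoc, target);
    if (idx < 0) {
      idx = -idx - 1; // convert "not found" result into the insertion point
    }
    return idx < ordToDoc.length ? idx : -1;
  }

  public static void main(String[] args) {
    OrdToDocSketch sketch = new OrdToDocSketch(new int[] {2, 5, 9});
    System.out.println(sketch.docId(1));      // mapping: ordinal 1 -> doc 5
    System.out.println(sketch.advanceOrd(6)); // iteration: next doc >= 6 is doc 9, ordinal 2
  }
}
```

If the array only has to answer `docId`, the iteration role needs another structure, which is why the two changes were made in the same PR.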
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

