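LuXugang commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1099309953

Thanks @jtibshirani for reviewing such a big PR, even though I tried to split it into several commits, one for each phase of the change.

> The PR moves the ordToDoc mapping from the metadata file to the vector data file. This is great, but it means that we should update the way ordToDoc is loaded. We are still loading ordToDoc when the format is opened, which is not good, since we are not supposed to touch any data files at this point. I think we should follow the same pattern as in the Lucene90DocValuesProducer class where we only load the DirectMonotonicReader.Meta file when opening the format, then load the full reader later each time we search or load vector values?

Your suggestion makes sense. So should I move ordToDoc into `OffHeapVectorValues` and keep it off-heap? One concern: `Lucene91HnswVectorsReader#search` calls ordToDoc frequently, and I worry that reading it off-heap will add latency compared with loading it all into memory.

> This PR both moves the ordToDoc mapping to disk, and adds an IndexedDISI to support fast iteration. It'd be nice to focus on one change at a time, since it makes it easier to understand and review. Maybe we could just move ordToDoc to disk in this PR. Or do you think the two changes need to go together?

In `Lucene91HnswVectorsReader`, the ordToDoc array is used for both iteration and ord-to-doc mapping. In this PR, ordToDoc is used only for mapping, and iteration is handled by the IndexedDISI, which is why I made both changes together.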
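The load-metadata-at-open, read-data-at-search pattern the reviewer suggests can be sketched in plain Java, with no Lucene dependency. A small metadata record (the ordToDoc offset in the data file and the number of ordinals) is read when the reader is opened, and the doc IDs stay in the data file until a search asks for them. This is a minimal sketch under stated assumptions: the names `OrdToDocMeta`, `LazyOrdToDoc` and `SketchVectorsReader` are hypothetical, `ByteBuffer`s stand in for the metadata and data files, and doc IDs are stored as plain 4-byte ints rather than with the `DirectMonotonicReader` encoding the real Lucene code uses.

```java
import java.nio.ByteBuffer;

// Hypothetical metadata record, read eagerly when the format is opened.
// It only records where ordToDoc lives in the data file and how many entries it has.
final class OrdToDocMeta {
  final int offset; // byte offset of the ordToDoc block in the data file
  final int count;  // number of vectors, i.e. number of ordinals

  OrdToDocMeta(int offset, int count) {
    this.offset = offset;
    this.count = count;
  }
}

// Hypothetical per-search view: resolves ord -> doc by reading the data file
// directly instead of holding an int[] on the heap.
final class LazyOrdToDoc {
  private final ByteBuffer data;
  private final OrdToDocMeta meta;

  LazyOrdToDoc(ByteBuffer data, OrdToDocMeta meta) {
    this.data = data;
    this.meta = meta;
  }

  int size() {
    return meta.count;
  }

  int ordToDoc(int ord) {
    if (ord < 0 || ord >= meta.count) {
      throw new IndexOutOfBoundsException("ord " + ord + " out of range [0, " + meta.count + ")");
    }
    // Assumption: one plain 4-byte int per ordinal (no monotonic compression).
    return data.getInt(meta.offset + ord * Integer.BYTES);
  }
}

// Hypothetical reader: the constructor reads only the metadata file.
final class SketchVectorsReader {
  private final OrdToDocMeta meta;
  private final ByteBuffer dataFile;

  SketchVectorsReader(ByteBuffer metaFile, ByteBuffer dataFile) {
    // Assumed metadata layout: int offset, then int count.
    this.meta = new OrdToDocMeta(metaFile.getInt(0), metaFile.getInt(Integer.BYTES));
    this.dataFile = dataFile; // kept as a handle; its bytes are not read here
  }

  // Called each time a search (or vector-values load) needs the mapping.
  LazyOrdToDoc ordToDoc() {
    // duplicate() gives each caller its own view over the same bytes.
    return new LazyOrdToDoc(dataFile.duplicate(), meta);
  }
}
```

Each lookup here is one absolute read from the buffer instead of an on-heap array access. With a memory-mapped data file that read is often cheap, but whether it adds measurable latency to `search` is something to benchmark rather than assume.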
Thanks @jtibshirani for reviewing such big PR though I had tried to split into several commits by different phase of modification. >The PR moves the ordToDoc mapping from the metadata file to the vector data file. This is great, but it means that we should update the way ordToDoc is loaded. We are still loading ordToDoc when the format is opened, which is not good, since we are not supposed to touch any data files at this point. I think we should follow the same pattern as in the Lucene90DocValuesProducer class where we only load the DirectMonotonicReader.Meta file when opening the format, then load the full reader later each time we search or load vector values? Your suggestion is really make sense, so I should remove ordToDoc to `OffHeapVectorValues` and make it off-heap? By the way, in method `Lucene91HnswVectorsReader#search`, ordToDoc is a frequent invocation. I worry that off-heap will case latency compared with loading all to memory. >This PR both moves the ordToDoc mapping to disk, and adds an IndexedDISI to support fast iteration. It'd be nice to focus on one change at a time, since it makes it easier to understand and review. Maybe we could just move ordToDoc to disk in this PR. Or do you think the two changes need to go together? In `Lucene91HnswVectorsReader`, ordToDoc as a array used to both iteration and mapping. but now ordToDoc in PR only for mapping, so I had do this two changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org