msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2282182975
Thinking about the implementation a bit, I realized that when we reorder the vector storage for the benefit of HNSW, we will still need a way to iterate over vector values in docid order, and we will need to map from vector ord to docid when searching. None of the existing vector formats handles this: they are optimized for vectors that are stored in docid order.

To make some progress, I'd start with an initial implementation that stores these mappings in a naive way, e.g. as fully-populated arrays, and use that to measure how much improvement we see in graph storage size and search performance. Then we could revisit and use a more efficient data structure for the ord/doc mapping. Since the ordinals would no longer increase with docid, we can't use DirectMonotonicReader/Writer any more; it needs to be something more like what SortedNumericDocValues does. I'm not super familiar with what we have, and I wonder if there is some reusable code that would help.
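As a rough illustration of the naive "fully-populated arrays" idea, here is a minimal sketch in plain Java. It is not Lucene code: the class `NaiveOrdDocMap`, its method names, and the use of `-1` for documents without a vector are all assumptions made for the example. It keeps one array per direction, so ord-to-doc lookups during search and docid-ordered iteration are both simple array reads.

```java
import java.util.Arrays;

/** Hypothetical sketch: ord/doc mapping held in two fully-populated int arrays. */
public final class NaiveOrdDocMap {
    private final int[] ordToDoc; // indexed by vector ord, holds the docid
    private final int[] docToOrd; // indexed by docid, holds the ord or -1 if no vector

    /**
     * @param maxDoc   number of documents in the segment
     * @param ordToDoc docid for each vector ord, in the reordered (non-monotonic) layout
     */
    public NaiveOrdDocMap(int maxDoc, int[] ordToDoc) {
        this.ordToDoc = ordToDoc.clone();
        this.docToOrd = new int[maxDoc];
        Arrays.fill(docToOrd, -1);
        for (int ord = 0; ord < ordToDoc.length; ord++) {
            int doc = ordToDoc[ord];
            if (doc < 0 || doc >= maxDoc || docToOrd[doc] != -1) {
                throw new IllegalArgumentException("invalid or duplicate docid: " + doc);
            }
            docToOrd[doc] = ord;
        }
    }

    /** Used at search time: graph neighbors are ords, results must be docids. */
    public int ordToDoc(int ord) {
        return ordToDoc[ord];
    }

    /** Returns the vector ord for a docid, or -1 if the document has no vector. */
    public int docToOrd(int doc) {
        return docToOrd[doc];
    }

    /** Returns the vector ords visited in increasing docid order. */
    public int[] ordsInDocOrder() {
        int[] result = new int[ordToDoc.length];
        int n = 0;
        for (int doc = 0; doc < docToOrd.length; doc++) {
            if (docToOrd[doc] != -1) {
                result[n++] = docToOrd[doc];
            }
        }
        return result;
    }
}
```

This costs one int per vector plus one int per document, which is the overhead a more compact encoding would later aim to reduce.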