msokolov commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2282182975

   Thinking about the implementation a bit, I realized that when we reorder the
vector storage for the benefit of HNSW, we will still need a way to iterate over
vector values in docid order, and we will need to map from vector ord to docid
when searching. None of the existing vector formats handles this: they are
optimized for vectors that are stored in docid order. To make some progress, I'd
start with an initial implementation that stores these mappings naively, e.g. as
fully-populated arrays, and use it to measure how much improvement we see in
graph storage size and search performance. Then we could revisit the ord/doc
mapping and use a more efficient data structure. Since ordinals would no longer
increase with docid, we can't use DirectMonotonicReader/Writer anymore; it needs
to be something more like what SortedNumericDocValues does. I'm not super
familiar with what we have, so I wonder if there is some reusable code that
would help.
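A minimal sketch of the naive "fully-populated arrays" idea, in plain Java rather than against any real Lucene codec API. The class `NaiveOrdDocMap` and its methods are hypothetical. It assumes the writer already knows the graph-friendly order as an `int[]` from vector ord to docid. It keeps that array for search time (graph nodes are ords that must be turned back into docids), builds the inverse docid-to-ord array, and uses the inverse to walk vectors in docid order. `isMonotonic()` shows why a DirectMonotonic-style encoding no longer fits once the ords are reordered.

```java
import java.util.Arrays;

/**
 * Hypothetical naive ord/doc mapping for a vector field whose vectors are
 * stored in graph (HNSW-friendly) order instead of docid order.
 * Both directions are fully-populated int arrays: simple, but 4 bytes per entry.
 */
public class NaiveOrdDocMap {
  private final int[] ordToDoc; // vector ordinal -> docid; NOT monotonic after reordering
  private final int[] docToOrd; // docid -> vector ordinal, or -1 if the doc has no vector

  public NaiveOrdDocMap(int[] ordToDoc, int maxDoc) {
    this.ordToDoc = ordToDoc.clone();
    this.docToOrd = new int[maxDoc];
    Arrays.fill(docToOrd, -1);
    for (int ord = 0; ord < ordToDoc.length; ord++) {
      int doc = ordToDoc[ord];
      if (doc < 0 || doc >= maxDoc || docToOrd[doc] != -1) {
        throw new IllegalArgumentException("invalid or duplicate docid " + doc + " at ord " + ord);
      }
      docToOrd[doc] = ord;
    }
  }

  /** Search time: map a graph node (vector ord) back to its docid. */
  public int ordToDoc(int ord) {
    return ordToDoc[ord];
  }

  /** Vector ord for a doc, or -1 if the doc has no vector. */
  public int docToOrd(int doc) {
    return docToOrd[doc];
  }

  /** Ords listed in increasing docid order, e.g. for docid-order iteration. */
  public int[] ordsInDocOrder() {
    int[] out = new int[ordToDoc.length];
    int i = 0;
    for (int doc = 0; doc < docToOrd.length; doc++) {
      if (docToOrd[doc] != -1) {
        out[i++] = docToOrd[doc];
      }
    }
    return out;
  }

  /** True if ord -> doc is non-decreasing, which monotonic encodings require. */
  public boolean isMonotonic() {
    for (int ord = 1; ord < ordToDoc.length; ord++) {
      if (ordToDoc[ord] < ordToDoc[ord - 1]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // 4 vectors in a 6-doc segment (docs 1 and 4 have no vector), reordered for the graph.
    int[] graphOrder = {5, 0, 3, 2};
    NaiveOrdDocMap map = new NaiveOrdDocMap(graphOrder, 6);
    System.out.println(map.ordToDoc(2)); // 3
    System.out.println(map.docToOrd(4)); // -1
    System.out.println(Arrays.toString(map.ordsInDocOrder())); // [1, 3, 2, 0]
    System.out.println(map.isMonotonic()); // false
  }
}
```

For scale, with 1,000,000 vectors and maxDoc close to that, the two arrays cost about 2 × 4 bytes × 1,000,000 = 8 MB. That is acceptable for measuring graph size and search speed, but it is the part a more compact (non-monotonic) encoding would later replace.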


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
