[ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julie Tibshirani updated LUCENE-9583:
-------------------------------------
Priority: Blocker (was: Major)
> How should we expose VectorValues.RandomAccess?
> -----------------------------------------------
>
> Key: LUCENE-9583
> URL: https://issues.apache.org/jira/browse/LUCENE-9583
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Blocker
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In the newly added {{VectorValues}} API, we have a {{RandomAccess}}
> sub-interface. [~jtibshirani] pointed out that it is not needed by some
> vector-indexing strategies, which can operate solely with a forward iterator
> (HNSW does need it). In the interest of simplifying the public API, we should
> not expose this internal detail, which also surfaces internal ordinals that
> are of little interest outside the random-access API.
>
> I looked into how to move this inside the HNSW-specific code and remembered
> that we also currently use the random-access (RA) API when merging vector
> fields in sorted indexes. Without it, we would need to load all vectors into
> RAM while flushing/merging, as we currently do in
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost
> for the simpler API.
>
> Another thing I noticed while reviewing this is that I had moved the KNN
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}}
> to {{VectorValues.RandomAccess}}. I think we could move it back and handle
> the HNSW requirements for search elsewhere. Would that alleviate the major
> concern here?
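To make the sorted-merge trade-off in the description concrete, here is a minimal Java sketch. All names (ForwardVectors, RandomAccessVectors, ArrayVectors, SortedMergeSketch, docMap) are hypothetical and are not Lucene's actual VectorValues API; the sketch assumes a sorted index is described by an old-to-new docID mapping. With only a forward iterator, every vector has to be buffered in RAM before it can be written in the new order, similar in spirit to the BinaryDocValuesWriter.BinaryDVs approach the description mentions. With random access, only the ordinals are sorted, and each vector can be fetched by ordinal while writing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// HYPOTHETICAL interfaces for illustration only; NOT Lucene's VectorValues API.

/** Forward-only view: enough for strategies that stream vectors in doc order. */
interface ForwardVectors {
  int NO_MORE_DOCS = Integer.MAX_VALUE;
  int nextDoc();          // advance to the next doc that has a vector
  float[] vectorValue();  // vector of the current doc
}

/** Random-access view: exposes dense internal ordinals, as HNSW needs. */
interface RandomAccessVectors {
  int size();                    // number of vectors (ordinals 0..size-1)
  int ordToDoc(int ord);         // docID for an ordinal
  float[] vectorValue(int ord);  // vector for an ordinal
}

/** In-memory vectors stored by ordinal, in increasing docID order. */
final class ArrayVectors implements RandomAccessVectors {
  private final int[] docs;
  private final float[][] vectors;

  ArrayVectors(int[] docs, float[][] vectors) { this.docs = docs; this.vectors = vectors; }
  public int size() { return docs.length; }
  public int ordToDoc(int ord) { return docs[ord]; }
  public float[] vectorValue(int ord) { return vectors[ord]; }

  /** A forward-only iterator over the same data. */
  ForwardVectors iterator() {
    return new ForwardVectors() {
      private int ord = -1;
      public int nextDoc() { return ++ord < docs.length ? docs[ord] : NO_MORE_DOCS; }
      public float[] vectorValue() { return vectors[ord]; }
    };
  }
}

public class SortedMergeSketch {
  /** Forward-only: buffers a copy of every vector, then reorders by new docID. */
  static List<float[]> sortBuffered(ForwardVectors in, int[] docMap) {
    List<Integer> newDocs = new ArrayList<>();
    List<float[]> buffered = new ArrayList<>();
    for (int doc = in.nextDoc(); doc != ForwardVectors.NO_MORE_DOCS; doc = in.nextDoc()) {
      newDocs.add(docMap[doc]);
      buffered.add(in.vectorValue().clone()); // defensive copy: an iterator may reuse its array
    }
    Integer[] order = new Integer[buffered.size()];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, (a, b) -> Integer.compare(newDocs.get(a), newDocs.get(b)));
    List<float[]> out = new ArrayList<>();
    for (int i : order) out.add(buffered.get(i));
    return out;
  }

  /** Random access: sorts only ordinals; vectors are read via vectorValue(ord) later. */
  static int[] sortedOrds(RandomAccessVectors in, int[] docMap) {
    Integer[] ords = new Integer[in.size()];
    for (int i = 0; i < ords.length; i++) ords[i] = i;
    Arrays.sort(ords, (a, b) -> Integer.compare(docMap[in.ordToDoc(a)], docMap[in.ordToDoc(b)]));
    int[] out = new int[ords.length];
    for (int i = 0; i < out.length; i++) out[i] = ords[i];
    return out;
  }

  public static void main(String[] args) {
    // Docs 0, 2 and 3 have vectors; doc 1 does not.
    ArrayVectors vectors = new ArrayVectors(
        new int[] {0, 2, 3},
        new float[][] {{1f, 0f}, {0f, 1f}, {1f, 1f}});
    int[] docMap = {3, 2, 1, 0}; // hypothetical index sort that reverses doc order
    for (int ord : sortedOrds(vectors, docMap)) {
      System.out.println(Arrays.toString(vectors.vectorValue(ord)));
    }
    System.out.println(Arrays.toString(sortBuffered(vectors.iterator(), docMap).get(0)));
  }
}
```

Both paths produce the same order here (ordinals [2, 1, 0]); the difference is memory. The random-access variant holds one int per vector instead of a copy of every vector, which is the cost being weighed against a smaller public API.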
--
This message was sent by Atlassian Jira
(v8.3.4#803005)